Dear Friends,
- What is HashMap
- When to use HashMap
- Various ways to instantiate HashMap
- How HashMap works Internally
- How HashMap works Internally
- What is the difference between
size and capacity of HashMap and how size and capacity of Hashmap grows
- What happens with existing
elements of HashMap when capacity is doubled? What is Rehashing?
What is HashMap
- HashMap is implementation of Map
interface from collection framework, which stores elements in the form of
Key-Value pairs.
- HashMap is not
synchronized, which means that all methods of HashMap class are not synchronized
and hence are not thread safe which in turn makes HashMap as not good option in
multi-threaded environment.
- HashMap allows one null
key and multiple null values.
- HashMap is an unordered
collection, which means when we will iterate over the elements of the HashMap, the order in which elements will be returned is not guaranteed
to be same in which they were inserted into the HashMap.
- HashMap has complexity of
O(1), which means that time taken for put and get operations for HashMap remains
constant as number of elements in HashMap keeps on increasing, though it is
guaranteed only when hashcode generated for the keys is such that they are
distributed across the capacity(bucket) of the HashMap. A poorly written
hashcode method can degrade performance.
When to use HashMap
HashMap should be used when:
- You want to store data in the
form of key-value pair such that corresponding to each unique key, you
have unique values. For example: Country Name and its Prime minister. Although
it is perfectly allowed to store duplicate values against unique keys, but
it is not allowed to have duplicate keys.
For example, Employee ->
Department
It is possible that two employees
have same department. In this case key(employee) will be unique, though values
can be duplicate.
- You don't care about the order
of the elements.
- You want to use it in single
threaded environment, as concurrent operations on HashMap are not thread safe.
Various ways to instantiate HashMap
Below are three ways, we can instantiate a HashMap
1. Through no argument constructor
Map<String,String> map = new HashMap<String,String>();
Below is the source code of no
argument constructor of HashMap
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR;
threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
table = new Entry[DEFAULT_INITIAL_CAPACITY];
init();
}
Here DEFAULT_LOAD_FACTOR and
DEFAULT_INITIAL_CAPACITY are defined in HashMap
class as below :
/**
* The load factor used when none specified in constructor.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 16;
Here is how Entry class is defined within HashMap class
static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
final int hash;
/**
* Creates new entry.
*/
Entry(int h, K k, V v, Entry<K,V> n) {
value = v;
next = n;
key = k;
hash = h;
}
public final K getKey() {
return key;
}
public final V getValue() {
return value;
}
public final V setValue(V newValue) {
V oldValue = value;
value = newValue;
return oldValue;
}
public final boolean equals(Object o) {
if (!(o instanceof Map.Entry))
return false;
Map.Entry e = (Map.Entry)o;
Object k1 = getKey();
Object k2 = e.getKey();
if (k1 == k2 || (k1 != null && k1.equals(k2))) {
Object v1 = getValue();
Object v2 = e.getValue();
if (v1 == v2 || (v1 != null && v1.equals(v2)))
return true;
}
return false;
}
public final int hashCode() {
return (key==null ? 0 : key.hashCode()) ^
(value==null ? 0 : value.hashCode());
}
public final String toString() {
return getKey() + "=" + getValue();
}
/**
* This method is invoked whenever the value in an entry is
* overwritten by an invocation of put(k,v) for a key k that's already
* in the HashMap.
*/
void recordAccess(HashMap<K,V> m) {
}
/**
* This method is invoked whenever the entry is
* removed from the table.
*/
void recordRemoval(HashMap<K,V> m) {
}
}
So, coming back to our constructor, what it is doing is....it is initializing an array of class type Entry with initial capacity of 16. This Entry class objects will store key value pairs that we add in a HashMap. As you can see above, this Entry class has following instance variables.
final K key;
V value;
Entry<K,V> next;
final int hash;
final K key :
This will store key of HashMap.
2. Through single
argument Constructor
Map<String,String> map = new HashMap<String,String>(16);
We can instantiate HashMap by passing initial
capacity as argument as below:
Source code of single
argument Constructor is as below:
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and the default load factor (0.75).
*
* @param initialCapacity the initial capacity.
* @throws IllegalArgumentException if the initial capacity is negative.
*/
public HashMap(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
As you can see from source code of constructor, ingle argument constructor in turn calls two arguments constructor. With this method you have option to instantiate HashMap with capacity more than 16 also as below:
Map<String,String> map = new HashMap<String,String>(32);
This will be done typically when
we know in advance that our HashMap is going to contain more than 16 number of elements. By instantiating HashMap with capacity as per requirement, we avoid
rehashing (we will see what is rehashing in next sections) as threshold is
reached.
3. Through two argument
constructor
Map<String,String> map = new HashMap<String,String>(16,0.75f);
Here is the source code of
two argument constructor:
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param initialCapacity the initial capacity
* @param loadFactor the load factor
* @throws IllegalArgumentException if the initial capacity is negative
* or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
// Find a power of 2 >= initialCapacity
int capacity = 1;
while (capacity < initialCapacity)
capacity <<= 1;
this.loadFactor = loadFactor;
threshold = (int)(capacity * loadFactor);
table = new Entry[capacity];
init();
}
So, using two argument
constructor, we can pass initial capacity as well as load factor.
How HashMap Works Internally
HashMap stores each Key-Value pair in instance variables of the Entry class, which is static inner class defined within HashMap class and HashMap class has array of these Entry class objects. This array is initialized with initial size of 16 when we instantiate HashMap class with no argument constructor, which means HashMap has initial capacity of 16 or in other words HashMap has capacity to store 16 key-value pairs initially.
Lets take an example
Lets create Employee class with name and empId as its attributes or instance variables. We will use objects of this class as Key of HashMap and corresponding to each unique Employee we will put his/her department as Value of HashMap.
package com.blogspot.javasolutionsguide;
public class Employee {
private String name;
private String empId;
Employee(String name,String id){
this.name = name;
this.empId =id;
}
/**
* @return
*/
public String getName() {
return name;
}
/**
* @param name
*/
public void setName(String name) {
this.name = name;
}
/**
* @return
*/
public String getEmpId() {
return empId;
}
/**
* @param empId
*/
public void setEmpId(String empId) {
this.empId = empId;
}
@Override
public int hashCode() {
return 5*empId.length();
}
@Override
public boolean equals(Object obj) {
if(this == obj)
return true;
Employee emp = (Employee)obj;
if(this.getEmpId().equalsIgnoreCase(emp.getEmpId())){
return true;
}
return false;
}
}
Let's write a Test class where we will put key-value (employee-department) pairs in HashMap.
package com.blogspot.javasolutionsguide;
import java.util.HashMap;
import java.util.Map;
public class TestHashMap {
public static void main(String[] args) {
Employee emp1 = new Employee("Gaurav", "101");
Employee emp2 = new Employee("Saurav", "102");
Employee emp3 = new Employee("Sachin", "1001");
Map<Employee, String> map = new HashMap<Employee, String>();
map.put(emp1, "IT");
map.put(emp2, "Accounts");
map.put(emp3, "Human Resource");
System.out.println(map);
System.out.println("Employee 1 Department:" + map.get(emp1));
System.out.println("Employee 2 Department:" + map.get(emp2));
System.out.println("Employee 3 Department:" + map.get(emp3));
}
}
Let's execute the program and check the output:
Output:
Employee 1 Department:IT
Employee 2 Department:Accounts
Employee 3 Department:Human Resource
Let us see the structure of HashMap. Put a debug point at line 15, select map, right click and click on inspect or press Ctrl + Shift + I.
As you can see from above snapshot that at index 5, we have object of Employee class as key and "Human Resource" as value, but next here is null.
At index 15, we have object of Employee as Key, "Accounts" as value and next variable here is pointing to another Entry object. So here is our third element.
Let's expand it more to see where this next is pointing.
As you can see from above snapshot that next variable of Entry class object at index 15 is pointing to another Entry object which has Employee object as key and "IT" as value. This is because both employees with empId 101 and 102 have same hashCode which is calculated using following code in hashCode() method defined in Employee class :
5 * empId.length;
so for empId 101, hashCode = 5 * 3 = 15
and for empId 102, hashCode = 5 *3 = 15
So we can conclude from this, that if two keys in hashmap have same hashcode, they will be stored at the same index(bucket or Entry array) in the form of linked list.
When we put element in HashMap
Below method of HashMap class gets called
/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
*/
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
Let's see what put method is doing :- If key is null, key value pair is stored at the zeroth index of the array named as table(logical bucket),because hashcode for Null is always zero and index will be always zero in this case.
- If key is not null, hashcode method of the Key object(Employee in our case) is called, which returns hashcode for key object - hash(int) method is called with the calculated hashcode as argument. This extra hashing is added by Java library designers to overcome the issue when developers would have written an inefficient hashcode. In general while writing logic for hashcode ,it should be kept in mind that hashcode returned for different objects should be such that it distributes objects across various available buckets.
- On the basis of calculated hashcode in the above step, index is calculated by calling indexFor(hash, table.length) method, which,which is the index of array table where this key value pair(in the form of Entry object) will be stored.
- If there is no Entry Object already at calculated index, then our new Entry object will be placed at the calculated index.
- If there is already an Entry object at calculated index and new key object also has same hashcode, equals method of Key object is called and
- if value of new key object is also equal to the existing Key object's value, then Value object is replaced with the new Value.
- If value of new Key object is not equal to the value of existing Key object value.then Entry object is iterated unless next variable is null. Once next variable with null value is found, next variable of existing Entry object will start pointing to new Entry Object, thus maintaining Linked List structure.
- If there is already an Entry object at calculated index and new key object also has same hashcode, equals method of Key object is called and
- if value of new key object is also equal to the existing Key object's value, then Value object is replaced with the new Value.
- If value of new Key object is not equal to the value of existing Key object value.then Entry object is iterated unless next variable is null. Once next variable with null value is found, next variable of existing Entry object will start pointing to new Entry Object, thus maintaining Linked List structure.
so because in our case hashcode returned for following two Employee objects were equal, but they were not considered equal by the equals() method due to there different empId values(101, 102), they landed in same array index(bucket) i.e. 15 maintained in the form of Linked list structure.
Employee emp1 = new Employee("Gaurav","101");
Employee emp2 = new Employee("Saurav","102");
hashcode for emp1 = 5*empId.length(); = 5*3 = 15
hashcode for emp2 = 5*empId.length(); = 5*3 = 15
When we get element from HashMap
Below method of
HashMap class gets called
/**
* Returns the value to which the specified key is mapped,
* or {@code null} if this map contains no mapping for the key.
*
* <p>More formally, if this map contains a mapping from a key
* {@code k} to a value {@code v} such that {@code (key==null ? k==null :
* key.equals(k))}, then this method returns {@code v}; otherwise
* it returns {@code null}. (There can be at most one such mapping.)
*
* <p>A return value of {@code null} does not <i>necessarily</i>
* indicate that the map contains no mapping for the key; it's also
* possible that the map explicitly maps the key to {@code null}.
* The {@link #containsKey containsKey} operation may be used to
* distinguish these two cases.
*
* @see #put(Object, Object)
*/
public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
return null;
}
Let's see what get method is doing:
- If key is null, value at zeroth index(bucket) is returned.
- If key is not null, then hashCode method of the Key object is called and hashcode is calculated.
- hash() method is called with calculated hashcode as argument .hash() method returns the index(bucket).
- equals method of key(Employee class) is called and key is searched by traversing linked list and corresponding value is returned.
If we do map.get(emp2);
It will first call the hashCode() method of Employee which would help in finding the index of Entry array or bucket where this key(and corresponding value) can be present. So, which means we will reach at index 15. Now using equals() method, value of empId of emp2 object wiill be compared with each of the keys at index 15 and once value of the key matches, then corresponding value would be returned.
Hope it clarified that how HashMap works internally in Java.
Hope it clarified that how HashMap works internally in Java.
What is the difference
between size and capacity of HashMap and how size and capacity of Hashmap grows
Capacity of HashMap is number of
maximum elements it can store.When we instantiate a HashMap like below,it has
initial default capacity of 16.
Map<String,String> map = new
HashMap<String,String>();
Size of HashMap is actual number
of elements present in it.
Let's see with an example,how size
and capacity of HashMap grows.
package com.blogspot.javasolutionsguide;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;
public class TestHashMap {
public static void main(String[] args) throws Exception {
Employee emp1 = new Employee("Gaurav","101");
Employee emp2 = new Employee("Saurav","102");
Employee emp3 = new Employee("Sachin","1001");
Employee emp4 = new Employee("Sachin1","1002");
Employee emp5 = new Employee("Sachin2","1003");
Employee emp6 = new Employee("Sachin3","1004");
Employee emp7 = new Employee("Sachin4","1005");
Employee emp8 = new Employee("Sachin5","1006");
Employee emp9 = new Employee("Sachin6","1007");
Employee emp10 = new Employee("Sachin7","1008");
Employee emp11 = new Employee("Sachin8","1009");
Employee emp12 = new Employee("Sachin9","1010");
Employee emp13 = new Employee("Sachin10","1011");
Map<Employee,String> map = new HashMap<Employee,String>();
System.out.println("Map size:"+map.size()+" "+"Map Capacity:"+getCapacity(map));
map.put(emp1, "IT");
map.put(emp2, "Accounts");
map.put(emp3, "Human Resource");
map.put(emp4, "Accounts");
map.put(emp5, "Human Resource");
map.put(emp6, "Accounts");
map.put(emp7, "IT");
map.put(emp8, "Human Resource");
map.put(emp9, "Accounts");
map.put(emp10, "IT");
map.put(emp11, "Human Resource");
map.put(emp12, "Accounts");
System.out.println("Map size:"+map.size()+" "+"Map Capacity:"+getCapacity(map));
map.put(emp13, "Human Resource");
System.out.println("Map size:"+map.size()+" "+"Map Capacity:"+getCapacity(map));
}
/*
* This method returns capacity of HashMap by using reflection library.
*/
static int getCapacity(Map<Employee,String> map) throws Exception {
Field dataField = HashMap.class.getDeclaredField("table");
dataField.setAccessible(true);
return ((Object[]) dataField.get(map)).length;
}
}
Output
Map size:0 Map Capacity:16
Map size:12 Map Capacity:16
Map size:13 Map Capacity:32
As you can see from above output
that
- Initial capacity of HashMap is
16,when we instantiate it with no element in it.
- When we added 13th element in
the HashMap,its capacity doubled i.e. from 16 to 32.
- Size of HashMap at any time is
equal to the number of elements(Key value pairs) in the map.
- Capacity at any time will be
greater than size.
How Capacity of HashMap grows
depends upon the threshold.Threshold is maximum size of HashMap after which it
is resized.
threshold = capacity * load factor
Default load factor for HashMap is
0.75f.
As we know,initial capacity of
HashMap is 16,so threshold is calculated as below
i.e (75/100)*16 = 12
Which means ,when we add 13th
element in the HashMap ,HashMap is resized and capacity is increased to double
the initial capacity.
On the similar line,next time
capacity will increase when threshold = (75*100)*32 = 24 is reached, i.e. on
addition of 25th element.
and so on....
What happens with existing elements of HashMap when capacity is doubled?What is Rehashing?
When capacity of HashMap is
doubled, rehashing of the existing elements of the HashMap happens.
Rehashing is a mechanism where in
hashcode of the existing elements(of Keys ofcourse) is calculated again when
capacity of HashMap changes.
Before we understand why rehashing
is done, let's try to understand why we need to increase the capacity of HashMap.
As we know, initial
capacity of HashMap is 16,so say we initially added 16 elements in the
HashMap. A well written hashcode method will place each of these 16 elements at
separate 16 positions(array indexes or buckets),such that if we want to get
element from HashMap, it will take just 1 look up.
Now say we increased number of
elements to 32,keeping capacity to 16 only,in that case a well written hashcode
method will distribute 2 elements in each bucket, such that getting element from
any of the bucket will take maximum 2 look ups.
Now say we increased number of
elements to 64,keeping capacity to 16 only,in that case a well written hashcode
method will distribute 4 elements in each bucket,such that getting element from
each bucket will take maximum 4 look ups.
Now say we increased number of
elements to 128,keeping capacity to 16 only,in that case a well written
hashcode method will distribute 8 elements in each bucket,such that getting
element from each bucket will take maximum 8 look ups.
As we can see from above behavior
that as number of elements in HashMap are increasing keeping capacity to
16,more look ups are required to get elements from HashMap and with number of
elements increasing further it will degrade the performance of HashMap
So, what is the solution?
yes, you got it it right. This is
the reason HashMap is designed in such as way that when number of elements
increased to certain limit(threshold),it's capacity is doubled.
But only doubling capacity will
not help,as long as elements in the map are distributed across the buckets. So
to make sure that when we double the capacity, still elements are well
distributed across buckets, rehashing is done.
Hope this article was
helful to you and you got idea of How Hashmap Works Internally in Java .For more collections tutorials you can go through my below
articles. Also I post on Google+,so you can add me in your circle to get updates.
ArrayList Write Operations explored
ArrayList Read Operations explored