How HashMap Works internally in Java

Dear Friends,

In this post,we will discuss about java.util.HashMap.How does HashMap works internally in Java.



- What is HashMap
- When to use HashMap
- Various ways to instantiate HashMap
- How HashMap works Internally
- What is the difference between size and capacity of HashMap and how size and capacity of Hashmap grows
- What happens with existing elements of HashMap when capacity is doubled? What is Rehashing?

What is HashMap


- HashMap is implementation of Map interface from collection framework, which stores elements in the form of Key-Value pairs.
- HashMap is not synchronized, which means that all methods of HashMap class are not synchronized and hence are not thread safe which in turn makes HashMap as not good option in multi-threaded environment.
- HashMap allows one null key and multiple null values.
- HashMap is an unordered collection, which means when we will iterate over the elements of the HashMap, the order in which elements will be returned is not guaranteed to be same in which they were inserted into the HashMap.
- HashMap has complexity of O(1), which means that time taken for put and get operations for HashMap remains constant as number of elements in HashMap keeps on increasing, though it is guaranteed only when hashcode generated for the keys is such that they are distributed across the capacity(bucket) of the HashMap. A poorly written hashcode method can degrade performance.

When to use HashMap


HashMap should be used when:

- You want to store data in the form of key-value pair such that corresponding to each unique key, you have unique values. For example: Country Name and its Prime minister. Although it is perfectly   allowed to store duplicate values against unique keys, but it is not allowed to have duplicate keys.

For example, Employee -> Department
It is possible that two employees have same department. In this case key(employee) will be unique, though values can be duplicate.
- You don't care about the order of the elements.
- You want to use it in single threaded environment, as concurrent operations on HashMap are not thread safe.

Various ways to instantiate HashMap


Below are three ways, we can instantiate a HashMap


1. Through no argument constructor

Map<String,String> map = new HashMap<String,String>();
Below is the source code of no argument constructor of HashMap
   /**
    * Constructs an empty <tt>HashMap</tt> with the default initial capacity
    * (16) and the default load factor (0.75).
    */
    public HashMap() {
      this.loadFactor = DEFAULT_LOAD_FACTOR;
      threshold = (int)(DEFAULT_INITIAL_CAPACITY  * DEFAULT_LOAD_FACTOR);
      table = new Entry[DEFAULT_INITIAL_CAPACITY];
      init();
   }
  
Here  DEFAULT_LOAD_FACTOR and DEFAULT_INITIAL_CAPACITY  are defined in  HashMap    class    as below :
    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 16;
Here is how Entry class is defined within HashMap class
   static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        final int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry)o;
            Object k1 = getKey();
            Object k2 = e.getKey();
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                Object v1 = getValue();
                Object v2 = e.getValue();
                if (v1 == v2 || (v1 != null && v1.equals(v2)))
                    return true;
            }
            return false;
        }

        public final int hashCode() {
            return (key==null   ? 0 : key.hashCode()) ^
                   (value==null ? 0 : value.hashCode());
        }

        public final String toString() {
            return getKey() + "=" + getValue();
        }

        /**
         * This method is invoked whenever the value in an entry is
         * overwritten by an invocation of put(k,v) for a key k that's already
         * in the HashMap.
         */
        void recordAccess(HashMap<K,V> m) {
        }

        /**
         * This method is invoked whenever the entry is
         * removed from the table.
         */
        void recordRemoval(HashMap<K,V> m) {
        }
    }

So, coming back to our constructor, what it is doing is....it is initializing an array of class type Entry with initial capacity of 16. This Entry class objects will store key value pairs that we add in a HashMap. As you can see above, this Entry class has following instance variables.

final K key;
V value;
Entry<K,V> next;
final int hash;
 final K key : This will store key of HashMap.
 V value : This will store value of HashMap.
 Entry<K,V> next : This will point to another Entry Object. This field is used to maintain a  linked list  structure for elements, when more than one element has same hashcode.
 final int hash : This field is used to store hash value of the key object.
   
2. Through single argument Constructor
Map<String,String> map = new HashMap<String,String>(16);
We can instantiate HashMap by passing initial capacity as argument as below:

Source code of single argument Constructor is as below:
 /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

As you can see from source code of constructor, ingle argument constructor in turn calls two arguments constructor. With this method you have option to instantiate HashMap with capacity more than 16 also as below:

Map<String,String> map = new HashMap<String,String>(32);
This will be done typically when we know in advance that our HashMap is going to contain more than 16 number of elements. By instantiating HashMap with capacity as per requirement, we avoid rehashing (we will see what is rehashing in next sections) as threshold is reached.

3. Through two argument constructor
 Map<String,String> map = new HashMap<String,String>(16,0.75f);
 Here is the source code of two argument constructor:
/**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)(capacity * loadFactor);
        table = new Entry[capacity];
        init();
    }
So, using two argument constructor, we can pass initial capacity as well as load factor.

How HashMap Works Internally 

HashMap stores each Key-Value pair in instance variables of the Entry class, which is static inner class defined within HashMap class and HashMap class has array of these Entry class objects. This array is initialized with initial size of 16 when we instantiate HashMap class with no argument constructor, which means HashMap has initial capacity of 16 or in other words HashMap has capacity to store 16 key-value pairs initially.

Lets take an example

Lets create Employee class with name and empId as its attributes or instance variables. We will use objects of this class as Key of HashMap and corresponding to each unique Employee we will put his/her department as Value of HashMap.

package com.blogspot.javasolutionsguide;

public class Employee {
    private String name;
    private String empId;

    Employee(String name,String id){
        this.name = name;
        this.empId =id;
    }
    /**
     * @return
     */
    public String getName() {
        return name;
    }

    /**
     * @param name
     */
    public void setName(String name) {
        this.name = name;
    }

    /**
     * @return
     */
    public String getEmpId() {
        return empId;
    }

    /**
     * @param empId
     */
    public void setEmpId(String empId) {
        this.empId = empId;
    }

    @Override
    public int hashCode() {
        return 5*empId.length();
    }

    @Override
    public boolean equals(Object obj) {
        if(this == obj)
            return true;
        Employee emp = (Employee)obj;
        if(this.getEmpId().equalsIgnoreCase(emp.getEmpId())){
            return true;

        }
        return false;
    }
}

Let's write a Test class where we will put key-value (employee-department) pairs in HashMap. 

package com.blogspot.javasolutionsguide;

import java.util.HashMap;
import java.util.Map;

public class TestHashMap {
    public static void main(String[] args) {
        Employee emp1 = new Employee("Gaurav", "101");
        Employee emp2 = new Employee("Saurav", "102");
        Employee emp3 = new Employee("Sachin", "1001");
        Map<Employee, String> map = new HashMap<Employee, String>();
        map.put(emp1, "IT");
        map.put(emp2, "Accounts");
        map.put(emp3, "Human Resource");
        System.out.println(map);
        System.out.println("Employee 1 Department:" + map.get(emp1));
        System.out.println("Employee 2 Department:" + map.get(emp2));
        System.out.println("Employee 3 Department:" + map.get(emp3));
    }
}

Let's execute the program and check the output: 

Output:
Employee 1 Department:IT
Employee 2 Department:Accounts
Employee 3 Department:Human Resource

Let us see the structure of HashMap. Put a debug point at line 15, select map, right click and click on inspect or press Ctrl + Shift + I.



As you can see from above snapshot that at index 5, we have object of Employee class as key and "Human Resource" as value, but next here is null.
At index 15, we have object of Employee as Key, "Accounts" as value and next variable here is pointing to another Entry object. So here is our third element.

Let's expand it more to see where this next is pointing.



As you can see from above snapshot that next variable of Entry class object at index 15 is pointing to another Entry object which has Employee object as key and "IT" as value. This is because both employees with empId 101 and 102 have same hashCode which is calculated using following code in hashCode() method defined in Employee class :


  5 * empId.length;

so for empId 101, hashCode = 5 * 3 = 15
and for empId 102, hashCode = 5 *3 = 15


So we can conclude from this, that if two keys in hashmap have same hashcode, they will be stored at the same index(bucket or Entry array) in the form of linked list.


When we put element in HashMap

Below method of HashMap class gets called   
/**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
 Let's see what put method is doing :

- If key is null, key value pair is stored at the zeroth index of the array named as table(logical bucket),because hashcode for Null is always zero and index will be always zero in this case.
- If key is not null, hashcode method of the Key object(Employee in our case) is called, which returns hashcode for key object - hash(int) method is called with the calculated hashcode as argument. This extra hashing is added by Java library designers to overcome the issue when developers would have written an inefficient hashcode. In general while writing logic for hashcode ,it should be kept in mind that hashcode returned for different objects should be such that it distributes objects across various available buckets.
- On the basis of calculated hashcode in the above step, index is calculated by calling indexFor(hash, table.length) method, which,which is the index of array table where this key value pair(in the form of Entry object) will be stored.
- If there is no Entry Object already at calculated index, then our new Entry object will be placed at the calculated index.
- If there is already an Entry object at calculated index and new key object also has same hashcode, equals method of Key object is called and 
   - if value of new key object is also equal to the existing Key object's value, then Value object is replaced with the new  Value.
  - If value of new Key object is not equal to the value of existing Key object value.then Entry object is iterated unless next variable is null. Once next variable with null value is found, next variable of existing Entry object will start pointing to new Entry Object, thus maintaining Linked List structure. 

so because in our case hashcode returned for following two Employee objects were equal, but they were not considered equal by the equals() method due to there different empId values(101, 102), they landed in same array index(bucket) i.e. 15 maintained in the form of Linked list structure.
Employee emp1 = new Employee("Gaurav","101");
Employee emp2 = new Employee("Saurav","102");

hashcode for emp1 = 5*empId.length(); = 5*3 = 15
hashcode for emp2 = 5*empId.length(); = 5*3 = 15
When we get element from HashMap

Below method of HashMap class gets called
/**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

Let's see what get method is doing:

- If key is null, value at zeroth index(bucket) is returned.

- If key is not null, then hashCode method of the Key object is called and hashcode is    calculated.
- hash() method is called with calculated hashcode as argument .hash() method returns the  index(bucket).
- equals method of key(Employee class) is called and key is searched by traversing linked list and corresponding value is returned.

For example:

If we do map.get(emp2);

It will first call the hashCode() method of Employee which would help in finding the index of Entry array or bucket where this key(and corresponding value) can be present. So, which means we will reach at index 15. Now using equals() method, value of empId of emp2 object wiill be compared with each of the keys at index 15 and once value of the key matches, then corresponding value would be returned.

Hope it clarified that how HashMap works internally in Java.

What is the difference between size and capacity of HashMap and how size and capacity of Hashmap grows
  
Capacity of HashMap is number of maximum elements it can store.When we instantiate a HashMap like below,it has initial default capacity of 16.

Map<String,String> map = new HashMap<String,String>();

Size of HashMap is actual number of  elements present in it.

Let's see with an example,how size and capacity of HashMap grows.
package com.blogspot.javasolutionsguide;

import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class TestHashMap {
          public static void main(String[] args) throws Exception {
                     Employee emp1 = new Employee("Gaurav","101");
                     Employee emp2 = new Employee("Saurav","102");
                     Employee emp3 = new Employee("Sachin","1001");
                     Employee emp4 = new Employee("Sachin1","1002");
                     Employee emp5 = new Employee("Sachin2","1003");
                     Employee emp6 = new Employee("Sachin3","1004");
                     Employee emp7 = new Employee("Sachin4","1005");
                     Employee emp8 = new Employee("Sachin5","1006");
                     Employee emp9 = new Employee("Sachin6","1007");
                     Employee emp10 = new Employee("Sachin7","1008");
                     Employee emp11 = new Employee("Sachin8","1009");
                     Employee emp12 = new Employee("Sachin9","1010");
                     Employee emp13 = new Employee("Sachin10","1011");
                     
                     Map<Employee,String> map = new HashMap<Employee,String>();
                     System.out.println("Map size:"+map.size()+" "+"Map Capacity:"+getCapacity(map));
                     map.put(emp1, "IT");
                     map.put(emp2, "Accounts");
                     map.put(emp3, "Human Resource");
                     map.put(emp4, "Accounts");
                     map.put(emp5, "Human Resource");
                     map.put(emp6, "Accounts");
                     map.put(emp7, "IT");
                     map.put(emp8, "Human Resource");
                     map.put(emp9, "Accounts");
                     map.put(emp10, "IT");
                     map.put(emp11, "Human Resource");
                     map.put(emp12, "Accounts");
                     System.out.println("Map size:"+map.size()+" "+"Map Capacity:"+getCapacity(map));
                     map.put(emp13, "Human Resource");
                     System.out.println("Map size:"+map.size()+" "+"Map Capacity:"+getCapacity(map));
          }
/* 
  * This method returns capacity of HashMap by using reflection library.
  */         
 static int getCapacity(Map<Employee,String> map) throws Exception {
        Field dataField = HashMap.class.getDeclaredField("table");
        dataField.setAccessible(true);
        return ((Object[]) dataField.get(map)).length;
}
}
Output
Map size:0 Map Capacity:16
Map size:12 Map Capacity:16
Map size:13 Map Capacity:32

As you can see from above output that 
- Initial capacity of HashMap is 16,when we instantiate it with no element in it.
- When we added 13th element in the HashMap,its capacity doubled i.e. from 16 to 32.
- Size of HashMap at any time is equal to the number of elements(Key value pairs) in the map.
- Capacity at any time will be greater than size.

How Capacity of HashMap grows depends upon the threshold.Threshold is maximum size of HashMap after which it is resized.

threshold = capacity * load factor

Default load factor for HashMap is 0.75f.

As we know,initial capacity of HashMap is 16,so threshold is calculated as below 

i.e (75/100)*16 = 12

Which means ,when we add 13th element in the HashMap ,HashMap is resized and capacity is increased to double the initial capacity.

On the similar line,next time capacity will increase when threshold = (75*100)*32 = 24 is reached, i.e. on addition of 25th element.

and so on....

What happens with existing elements of HashMap when capacity is doubled?What is Rehashing?

When capacity of HashMap is doubled, rehashing of the existing elements of the HashMap happens.
Rehashing is a mechanism where in hashcode of the existing elements(of Keys ofcourse) is calculated again when capacity of HashMap changes.

Before we understand why rehashing is done, let's try to understand why we need to increase the capacity of HashMap.

As  we know, initial capacity of HashMap is 16,so say we initially added 16 elements in the HashMap. A well written hashcode method will place each of these 16 elements at separate 16 positions(array indexes or buckets),such that if we want to get element from HashMap, it will take just 1 look up.

Now say we increased number of elements to 32,keeping capacity to 16 only,in that case a well written hashcode method will distribute 2 elements in each bucket, such that getting element from any of the bucket will take maximum 2 look ups.

Now say we increased number of elements to 64,keeping capacity to 16 only,in that case a well written hashcode method will distribute 4 elements in each bucket,such that getting element from each bucket will take maximum 4 look ups.

Now say we increased number of elements to 128,keeping capacity to 16 only,in that case a well written hashcode method will distribute 8 elements in each bucket,such that getting element from each bucket will take maximum 8 look ups.

As we can see from above behavior that as number of elements in HashMap are increasing keeping capacity to 16,more look ups are required to get elements from HashMap and with number of elements increasing further it will degrade the performance of HashMap

So, what is the solution?

yes, you got it it right. This is the reason HashMap is designed in such as way that when number of elements increased to certain limit(threshold),it's capacity is doubled.

But only doubling capacity will not help,as long as elements in the map are distributed across the buckets. So to make sure that when we double the capacity, still elements are well distributed across buckets, rehashing is done.

Hope this article was helful to you and you got idea of How Hashmap Works Internally in Java .For  more collections tutorials you can go through my below articles. Also I post on Google+,so you can add me in your circle to get updates.


Various ways to Iterate over commonly Used collections vizArrayList,HashSet,HashMap

ArrayList Write Operations explored


ArrayList Read Operations explored