Road of Exploration: HashMap

阿新 • Published: 2019-01-01


HashMap is essential knowledge for mid-level and senior developers. It shows up everywhere, from job interviews to day-to-day production work.

HashMap's data structure

First, let's look at two basic data structures in Java: arrays and linked lists.

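Array: an ordered sequence of elements, made up of indexes and elements, with each index mapping to one element. Access by index is O(1). Searching for a given value means comparing elements one by one, which is O(n); if the array is sorted, binary search brings this down to O(log n). Inserting or deleting an element requires shifting the elements after it, which is O(n).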
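The costs listed above can be sketched in a few lines of Java. This is only an illustration: the class name `ArrayLookupDemo` and its helper methods are hypothetical and do not come from the JDK, apart from `Arrays.binarySearch` and `System.arraycopy`.

```java
import java.util.Arrays;

class ArrayLookupDemo {
    // O(n): compare elements one by one until the target is found
    static int linearSearch(int[] data, int target) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == target) {
                return i;
            }
        }
        return -1;
    }

    // O(n): every element after the removed index has to be copied one slot to the left
    static int[] removeAt(int[] data, int index) {
        int[] result = new int[data.length - 1];
        System.arraycopy(data, 0, result, 0, index);
        System.arraycopy(data, index + 1, result, index, data.length - index - 1);
        return result;
    }

    public static void main(String[] args) {
        int[] sorted = {3, 8, 15, 21, 42};
        System.out.println(sorted[2]);                             // O(1) access by index
        System.out.println(linearSearch(sorted, 21));              // O(n) search by value
        System.out.println(Arrays.binarySearch(sorted, 21));       // O(log n), valid only on sorted input
        System.out.println(Arrays.toString(removeAt(sorted, 1)));  // O(n) removal
    }
}
```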

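Linked list: a storage structure whose nodes are neither contiguous nor sequential in physical memory; the logical order is defined by the pointers that link the nodes. Finding an element requires walking the list, so lookup is slow at O(n). Insertion and deletion are fast: once the position is known, only the neighbouring pointers need to change, which is O(1).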

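HashMap is essentially an array plus linked lists. This structure exists to handle collisions, where different keys hash to the same bucket. Common ways to resolve hash collisions are open addressing, separate chaining, rehashing, and a shared overflow area. HashMap uses separate chaining: all records with the same hash address are linked into the same list.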
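To make separate chaining concrete, here is a minimal sketch of a chained hash table. It is not the JDK implementation: `ChainedMap`, its fixed capacity, and its lack of null-key support and resizing are all simplifying assumptions made for illustration.

```java
class ChainedMap<K, V> {
    // One node of a bucket's singly linked list
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedMap(int capacity) {
        buckets = (Node<K, V>[]) new Node[capacity];
    }

    // Assumption: keys are non-null; the sign bit is cleared so the index is never negative
    private int indexOf(K key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    // Returns the previous value, or null if the key was absent
    V put(K key, V value) {
        int i = indexOf(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        // Colliding keys share a bucket: the new node becomes the head of the chain
        buckets[i] = new Node<>(key, value, buckets[i]);
        return null;
    }

    V get(K key) {
        for (Node<K, V> n = buckets[indexOf(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Capacity 1 forces every key into the same bucket
        ChainedMap<String, Integer> map = new ChainedMap<>(1);
        map.put("a", 1);
        map.put("b", 2);
        System.out.println(map.get("a") + " " + map.get("b"));
    }
}
```

With a capacity of 1 every key collides, yet both lookups still succeed, because the chain is searched with `equals`.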

How HashMap works

In JDK 1.7 the backbone of HashMap is Entry[]; everything in the map is stored in Entry objects:

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;      // the key of the key-value pair
    V value;          // the stored value
    Entry<K,V> next;  // next node in the bucket's linked list
    final int hash;   // hash value of the key

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
}

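Note: JDK 1.8 optimized HashMap. Entry was renamed to Node, and a TreeNode subclass was added so that a long bucket chain can be converted into a red-black tree (a self-balancing binary search tree). The Node source is as follows: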

/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

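[Figure: HashMap structure diagram]
HashMap is built on hashing; the figure above shows its structure. The array is the body of HashMap, and the linked lists exist to resolve collisions. If the located array slot holds no chain, that is, next is null, lookup and insertion are fast: O(1), a single addressing step. If the slot does hold a chain, the insertion step itself is still O(1) in JDK 1.7, because the newest Entry is placed at the head of the list and only the references change (JDK 1.8 appends at the tail instead). A lookup, however, has to walk the chain and compare keys one by one with the key's equals method, which is O(n) in the length of the chain.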
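A quick way to see collision handling at work is to use two distinct strings that share a hashCode. "Aa" and "BB" both hash to 2112 under `String.hashCode`, so they always land in the same bucket. The class name `CollisionDemo` and its method are made up for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Builds a map whose two keys share a hashCode and therefore a bucket
    static Map<String, Integer> buildMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true: both are 2112
        Map<String, Integer> map = buildMap();
        // equals() tells the two keys apart while the bucket's chain is walked
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```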

Now let's look at several of HashMap's default values.

[Figure: HashMap default values]
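For reference, the constants below are the defaults declared in the JDK 1.8 `java.util.HashMap` source, copied into a standalone class so they can be printed. The wrapper class `HashMapDefaults` and the `defaultThreshold` helper are assumptions of this sketch; check the values against the source of the JDK you actually run.

```java
class HashMapDefaults {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16 buckets
    static final int MAXIMUM_CAPACITY = 1 << 30;        // largest table size
    static final float DEFAULT_LOAD_FACTOR = 0.75f;     // resize once size exceeds capacity * 0.75
    static final int TREEIFY_THRESHOLD = 8;             // chain length at which a bucket becomes a tree
    static final int UNTREEIFY_THRESHOLD = 6;           // tree size at which a bucket reverts to a chain
    static final int MIN_TREEIFY_CAPACITY = 64;         // smaller tables are resized instead of treeified

    // Hypothetical helper: the resize threshold implied by the defaults
    static int defaultThreshold() {
        return (int) (DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
    }

    public static void main(String[] args) {
        System.out.println(DEFAULT_INITIAL_CAPACITY); // 16
        System.out.println(defaultThreshold());       // 12
    }
}
```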

HashMap's constructors (the following is taken from JDK 1.8):

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity); // 1.8 only; 1.7 uses the two statements below instead
    /*threshold = initialCapacity;
        // init() has no real implementation in HashMap itself; subclasses such as LinkedHashMap override it
        init();
        */
}
/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param  initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

/**
 * Constructs a new <tt>HashMap</tt> with the same mappings as the
 * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
 * default load factor (0.75) and an initial capacity sufficient to
 * hold the mappings in the specified <tt>Map</tt>.
 *
 * @param   m the map whose mappings are to be placed in this map
 * @throws  NullPointerException if the specified map is null
 */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

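As the first constructor above shows, memory for the table array is not allocated in the constructor. The exception is the fourth constructor, which takes a Map and does allocate space because it inserts the entries right away. In every other case the table is allocated in the put method.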
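The `tableSizeFor` helper used by the first constructor can be tried in isolation. The sketch below copies the method body from the listing above into a hypothetical class `TableSizeForDemo`, with `MAXIMUM_CAPACITY` set to `1 << 30` as in the JDK, because the real method is not accessible from outside `java.util`.

```java
class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same bit-smearing logic as the JDK 1.8 listing: fill every bit below the
    // highest set bit of (cap - 1) with 1, then add 1 to reach a power of two
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16: subtracting 1 first keeps exact powers of two unchanged
        System.out.println(tableSizeFor(17)); // 32
    }
}
```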

The following is the JDK 1.7 put method:

public V put(K key, V value) {
    // If table is still the empty array {}, inflate it (allocate the real storage).
    // The argument is threshold, which at this point holds initialCapacity; the default is 1 << 4 (2^4 = 16).
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // A null key is stored in table[0] or on the collision chain of table[0]
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key); // further mixes the key's hashCode so that hashes spread evenly
    int i = indexFor(hash, table.length); // actual position in table
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        // If a mapping for this key already exists, overwrite it: replace the old value and return it
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++; // counts structural modifications so that iterators fail fast if the map changes during iteration
    addEntry(hash, key, value, i); // add a new entry
    return null;
}
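The behaviour described by this method (return the old value on overwrite, return null for a new key, accept a null key) can be observed from the public API. A minimal sketch; the class name `PutSemanticsDemo` and its `demo` method are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutSemanticsDemo {
    // Collects what put() and get() return, in call order, plus the final size
    static List<Object> demo() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("k", 1);     // new key: returns null
        Integer second = map.put("k", 2);    // existing key: value is overwritten, old value returned
        Integer third = map.put(null, 0);    // a null key is allowed
        Integer nullLookup = map.get(null);  // and can be looked up again
        return Arrays.asList(first, second, third, nullLookup, map.size());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [null, 1, null, 0, 2]
    }
}
```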

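The following is the JDK 1.8 source, which differs considerably from 1.7. When 1.7 rehashes and migrates an old chain into the new table, entries that land on the same new index are inserted at the head, so their order is reversed. 1.8 preserves the relative order during resize and additionally converts long chains into red-black trees.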

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0) // table not allocated yet: resize() creates it
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null) // index i comes from the key's hash; an empty bucket gets a new Node
        tab[i] = newNode(hash, key, value, null);
    else { // the bucket is occupied
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k)))) // same hash and equal key: the old value is overwritten below
            e = p;
        else if (p instanceof TreeNode) // the bucket holds a red-black tree: insert into the tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else { // otherwise walk the chain, append at the tail, and treeify once the chain grows past 8 nodes
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount; // counts structural modifications so that iterators fail fast if the map changes during iteration
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
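The `onlyIfAbsent` flag is what separates `put` from `putIfAbsent`: in JDK 8, `HashMap.putIfAbsent` calls `putVal` with `onlyIfAbsent` set to true. The sketch below shows the visible effect, including the `oldValue == null` branch. The class name `OnlyIfAbsentDemo` is made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OnlyIfAbsentDemo {
    static List<Object> demo() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        Integer kept = map.putIfAbsent("a", 99);   // key present with a non-null value: nothing changes
        map.put("n", null);
        Integer filled = map.putIfAbsent("n", 5);  // key present but mapped to null: the value is replaced
        return Arrays.asList(kept, map.get("a"), filled, map.get("n"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [1, 1, null, 5]
    }
}
```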

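HashMap's array length

The length is always a power of two. Both the 1.7 inflateTable method and the 1.8 resize method get there through bit operations. In 1.7, threshold is the smaller of capacity * loadFactor and MAXIMUM_CAPACITY + 1. The capacity itself never exceeds MAXIMUM_CAPACITY; the product capacity * loadFactor can exceed it only when loadFactor is greater than 1, which is why the minimum is taken.

Below is the JDK 1.7 source: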

private void inflateTable(int toSize) {
    int capacity = roundUpToPowerOf2(toSize); // capacity is always a power of two
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}

private static int roundUpToPowerOf2(int number) {
    // assert number >= 0 : "number must be non-negative";
    return number >= MAXIMUM_CAPACITY
            ? MAXIMUM_CAPACITY
            : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}

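As this shows, HashMap uses bit shifts when sizing the table. Integer.highestOneBit returns the value of the highest set bit, so the result is always a power of two. Take 5 as an example:

0000 0100    (5 - 1 = 4)

0000 1000    ((5 - 1) << 1 = 8)

Taking the highest set bit gives 2 to the power of 3, that is, 8.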
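The rounding can be checked directly with `Integer.highestOneBit`. The sketch copies `roundUpToPowerOf2` from the JDK 1.7 listing into a hypothetical class `RoundUpDemo`, with `MAXIMUM_CAPACITY` assumed to be `1 << 30` as in the JDK.

```java
class RoundUpDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two that is >= number, capped at MAXIMUM_CAPACITY
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString((5 - 1) << 1)); // 1000
        System.out.println(roundUpToPowerOf2(5));   // 8
        System.out.println(roundUpToPowerOf2(16));  // 16
        System.out.println(roundUpToPowerOf2(17));  // 32
    }
}
```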

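So why must HashMap's capacity be a power of two?

The goal is to spread HashMap's elements more evenly. Ideally every slot of the Entry array holds exactly one element, that is, next is empty and there is no chain, so lookups are fast: there is no chain to walk and no equals comparison on keys. The usual way to get an even spread is the modulo operator: hash % capacity = bucketIndex. The JDK authors' approach is shown in the following code: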

JDK1.7

/**
 * Returns index for hash code h. 
 */
static int indexFor(int h, int length) {
    return h & (length-1); // h is the hash finally computed from the key's hashCode, not the hashCode itself; length is the current capacity
}

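JDK1.8 (the index is computed inline in putVal as (n - 1) & hash; tableSizeFor keeps the capacity a power of two)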

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

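As the source shows, both 1.7 and 1.8 rely on bit operations again. h is the hash value finally computed from the key's hashCode, not the hashCode itself, and length is the current capacity. When the capacity is 2^n, h & (length - 1) == h % length for non-negative h. The two give the same result but not at the same cost: bit operations are what the machine computes natively and are very fast, whereas people think in decimal, so the equivalence is not obvious at first sight. First, a quick look at the binary arithmetic:

2^n in binary is a 1 followed by n zeros; subtracting 1 gives a 0 followed by n ones. For example, 16 = 2^4 = 10000 and 15 = 16 - 1 = 2^4 - 1 = 01111.

& operation: a result bit is 1 only when both input bits are 1; otherwise it is 0.

Back to HashMap: if length is 16, the result of h & (16 - 1) is always between 0 and 15 inclusive. If h <= 15, ANDing with 01111 yields h itself. If h > 15, the result depends only on the low four bits of h, which is exactly h % length.

In an & operation, a bit ANDed with 1 keeps its own value, while a bit ANDed with 0 always becomes 0. So a 1 bit in the mask lets the hash decide the outcome (0 or 1, each with about 50% probability for a well-spread hash), whereas a 0 bit in the mask forces the result to 0 every time. A mask of length - 1 = 2^n - 1 is therefore ideal because all of its low bits are 1, which is why a power-of-two capacity is the best fit.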
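The argument above can be verified with a small experiment: compare the mask with the modulo for a power-of-two length, and count how many buckets a non-power-of-two length can actually reach. The class `MaskVersusModDemo` and its methods are hypothetical names used only for this sketch.

```java
import java.util.HashSet;
import java.util.Set;

class MaskVersusModDemo {
    // True if h & (length - 1) equals h % length for every h in [0, hashes)
    static boolean maskEqualsMod(int length, int hashes) {
        for (int h = 0; h < hashes; h++) {
            if ((h & (length - 1)) != h % length) {
                return false;
            }
        }
        return true;
    }

    // Number of distinct bucket indexes produced by the mask for h in [0, hashes)
    static int distinctBuckets(int length, int hashes) {
        Set<Integer> used = new HashSet<>();
        for (int h = 0; h < hashes; h++) {
            used.add(h & (length - 1));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println(maskEqualsMod(16, 1000));  // true: the mask 01111 keeps the low four bits
        System.out.println(maskEqualsMod(10, 1000));  // false: the mask 1001 has zero bits in the middle
        System.out.println(distinctBuckets(16, 100)); // 16: every bucket is reachable
        System.out.println(distinctBuckets(10, 100)); // 4: only indexes 0, 1, 8 and 9 are ever used
    }
}
```

With length 10 the mask is 1001 in binary, so the two middle bits of every index are forced to 0 and six of the ten buckets would stay empty forever.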
