
JDK 1.7 & 1.8 Source Code Comparison [Collections]: HashMap


Preface

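In the article "JDK 1.8 Source Code Analysis [Collections]: HashMap" we analyzed the feature HashMap gained in JDK 1.8 (the red-black tree data structure). But why was this optimization made? In this article we compare JDK 1.7 and 1.8 to find out.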

As is well known, HashMap is built on an array plus linked lists, but the concrete implementation differs somewhat between JDK 1.7 and 1.8.

Contents

I. Comparative Analysis

1. Version 1.7

2. Version 1.8

Summary

I. Comparative Analysis

1. Version 1.7

[Figure: data structure of HashMap in JDK 1.7]

First, let's look at the core fields in 1.7:

/**
 * The default initial capacity - MUST be a power of two.
 * Initial number of buckets; since the backing store is an array, this is the array length.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 * Maximum number of buckets.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 * Default load factor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * An empty table instance to share when the table is not inflated.
 */
static final Entry<?,?>[] EMPTY_TABLE = {};

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 * The array that actually stores the data.
 */
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

/**
 * The number of key-value mappings contained in this map.
 * Number of key-value pairs stored in the map.
 */
transient int size;

/**
 * The next size value at which to resize (capacity * load factor).
 * Resize threshold; before the table is inflated it holds the initial capacity.
 * @serial
 */
// If table == EMPTY_TABLE then this is the initial capacity at which the
// table will be created when inflated.
int threshold;

/**
 * The load factor for the hash table.
 * Load factor; can be specified explicitly at construction time.
 *
 * @serial
 */
final float loadFactor;

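Of these fields, the load factor is the most interesting. The capacity of a HashMap is fixed until it is resized; take the default constructor, for example: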

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}


/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

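With the defaults, the capacity is 16 and the load factor is 0.75. As entries keep being added, once their number reaches the threshold 16 * 0.75 = 12, the table has to grow beyond its current capacity of 16 (in 1.7, addEntry() resizes when size >= threshold and the target bucket is already occupied). Resizing involves rehashing and copying every entry, so it is expensive. It is therefore generally advisable to estimate the size of a HashMap in advance and pass a suitable initial capacity, to minimize the cost of resizing.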
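A quick way to check the arithmetic above, and to pick an initial capacity for a known number of entries, is the sketch below. capacityFor and roundUpToPowerOf2 are hypothetical helpers written for this example: roundUpToPowerOf2 mimics the power-of-two rounding HashMap applies to a requested capacity, but it ignores the MAXIMUM_CAPACITY cap and is not the JDK's own code.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    static final float LOAD_FACTOR = 0.75f; // HashMap's default load factor

    // Resize threshold = capacity * load factor (16 * 0.75 = 12 by default)
    static int resizeThreshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Hypothetical helper: smallest requested capacity whose threshold
    // can hold `expected` entries without resizing.
    static int capacityFor(int expected) {
        return (int) Math.ceil(expected / (double) LOAD_FACTOR);
    }

    // Hypothetical helper mimicking HashMap's rounding of the requested
    // capacity up to a power of two (MAXIMUM_CAPACITY cap omitted).
    static int roundUpToPowerOf2(int number) {
        return number > 1 ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        System.out.println(resizeThreshold(16, LOAD_FACTOR)); // 12

        int expected = 100;
        int requested = capacityFor(expected);     // 134
        int actual = roundUpToPowerOf2(requested); // 256
        System.out.println(requested + " -> " + actual
                + ", threshold " + resizeThreshold(actual, LOAD_FACTOR)); // 134 -> 256, threshold 192

        // Presized map: 100 puts without any intermediate resize
        Map<String, Integer> map = new HashMap<>(requested);
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // 100
    }
}
```

On Java 19 and later, HashMap.newHashMap(int numMappings) performs the expected-size calculation for you.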

The code shows that the data is actually stored in this array:

transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

Next, let's look at how its elements, the Entry objects, are implemented:

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }

    public final K getKey() {
        return key;
    }

    public final V getValue() {
        return value;
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry e = (Map.Entry)o;
        Object k1 = getKey();
        Object k2 = e.getKey();
        if (k1 == k2 || (k1 != null && k1.equals(k2))) {
            Object v1 = getValue();
            Object v2 = e.getValue();
            if (v1 == v2 || (v1 != null && v1.equals(v2)))
                return true;
        }
        return false;
    }

    public final int hashCode() {
        return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
    }

    public final String toString() {
        return getKey() + "=" + getValue();
    }

    /**
     * This method is invoked whenever the value in an entry is
     * overwritten by an invocation of put(k,v) for a key k that's already
     * in the HashMap.
     */
    void recordAccess(HashMap<K,V> m) {
    }

    /**
     * This method is invoked whenever the entry is
     * removed from the table.
     */
    void recordRemoval(HashMap<K,V> m) {
    }
}

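Entry is an inner class of HashMap, and its fields make the structure easy to see:

  • key is the key being written;
  • value is the value associated with key;
  • next implements the linked list and points to the next node in the chain;
  • hash caches the key's hash as computed by hash(key), which further mixes key.hashCode().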

With the basic structure in mind, let's look at the put and get functions.

put function

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    // Initialize the table if it has not been created yet
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // A null key is handled separately: store the value under the null key (bucket 0)
    if (key == null)
        return putForNullKey(value);
    // Compute the hash from the key's hashCode
    int hash = hash(key);
    // Locate the bucket from the hash
    int i = indexFor(hash, table.length);
    // Walk the chain in the bucket, comparing each entry's hash and key with the incoming key; on a match, overwrite the value and return the old one
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    // No existing mapping for the key (the bucket is empty or no entry matched): add a new Entry at this position
    addEntry(hash, key, value, i);
    return null;
}
/**
 * Adds a new entry with the specified key, value and hash code to
 * the specified bucket.  It is the responsibility of this
 * method to resize the table if appropriate.
 *
 * Subclass overrides this to alter the behavior of put method.
 */
void addEntry(int hash, K key, V value, int bucketIndex) {
    // Check whether a resize is needed
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // If so, double the table size, then recompute the current key's hash and bucket index
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

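    // Create the new Entry with the current bucket head as its next; if the bucket was occupied, this forms a chain with the new entry at the head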
    createEntry(hash, key, value, bucketIndex);
}

/**
 * Like addEntry except that this version is used when creating entries
 * as part of Map construction or "pseudo-construction" (cloning,
 * deserialization).  This version needn't worry about resizing the table.
 *
 * Subclass overrides this to alter the behavior of HashMap(Map),
 * clone, and readObject.
 */
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
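put() relies on hash(key) and indexFor(), whose bodies are not shown above. The sketch below reproduces their logic as found in the JDK 1.7 source, to the best of our knowledge, leaving out the optional alternative String hashing controlled by hashSeed; treat it as an illustration rather than the exact library code. Because the table length is always a power of two, h & (length - 1) keeps only the low bits of the hash and equals the non-negative remainder of h modulo length.

```java
public class Hash17Demo {
    // JDK 1.7-style supplemental hash (hashSeed / alternative String hashing omitted):
    // spreads higher bits of hashCode() into the low bits used for indexing.
    static int hash(Object k) {
        int h = k.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Bucket index for a power-of-two table length: keep the low bits of h.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            int h = hash(key);
            // For power-of-two lengths, h & (length - 1) == Math.floorMod(h, length)
            System.out.println(key + " -> bucket " + indexFor(h, length)
                    + " (floorMod: " + Math.floorMod(h, length) + ")");
        }
    }
}
```

The masking trick is why the field comments insist that the capacity "MUST be a power of two".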

get function

Now let's look at the get function:

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

/**
 * Returns the entry associated with the specified key in the
 * HashMap.  Returns null if the HashMap contains no mapping
 * for the key.
 */
final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    // Compute the hash from the key, then locate the bucket
    int hash = (key == null) ? 0 : hash(key);
    // Walk the chain in that bucket
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        // Return the entry whose hash and key both match
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    // Nothing found: return null
    return null;
}
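To see how these pieces fit together, here is a deliberately simplified, hypothetical model of the JDK 1.7 layout (ToyMap17 is not a JDK class): a power-of-two array of singly linked Entry chains, new entries inserted at the head of the bucket as createEntry() does, and lookups that walk the chain. It omits null keys, resizing and the supplemental hash, and it records how many entries get() examined so the cost of a long chain is visible.

```java
public class ToyMap17<K, V> {
    // Simplified 1.7-style node: key, value, cached hash, next pointer
    static final class Entry<K, V> {
        final K key;
        V value;
        final int hash;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16]; // fixed size: no resizing
    int lastProbes; // entries examined by the most recent get()

    private int indexFor(int h) {
        return h & (table.length - 1);
    }

    public V put(K key, V value) {
        int h = key.hashCode();
        int i = indexFor(h);
        // Overwrite if the key is already in the chain
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // Otherwise insert at the head of the chain, like createEntry()
        table[i] = new Entry<>(h, key, value, table[i]);
        return null;
    }

    public V get(K key) {
        int h = key.hashCode();
        lastProbes = 0;
        for (Entry<K, V> e = table[indexFor(h)]; e != null; e = e.next) {
            lastProbes++;
            if (e.hash == h && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyMap17<Integer, String> map = new ToyMap17<>();
        // Integer.hashCode() is the value itself, so 0, 16, 32, 48 all land in bucket 0
        for (int k = 0; k < 64; k += 16) {
            map.put(k, "v" + k);
        }
        System.out.println(map.get(48) + " after " + map.lastProbes + " probe(s)"); // v48 after 1 probe(s)
        System.out.println(map.get(0) + " after " + map.lastProbes + " probe(s)");  // v0 after 4 probe(s)
    }
}
```

Because of head insertion, the most recently added key is found first, while the oldest one sits at the end of the chain.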

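2. Version 1.8

Looking at the 1.7 implementation, can you spot what needs optimizing?

One obvious issue: when hash collisions are severe, the chain in a bucket grows longer and longer, so lookups become slower and slower; the time complexity degrades to O(N).

That is why 1.8 focuses on improving lookup efficiency.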
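Such collisions are easy to produce on purpose. The sketch below relies on a well-known property of String.hashCode(): "Aa" and "BB" share the hash code 2112, and both have length 2, so every string built by concatenating n of these blocks has the same hash code as well. All 2^n keys therefore land in the same bucket whatever the table size; in JDK 1.7 that bucket is a single chain of 2^n entries. The helper name collidingKeys is our own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollidingKeys {
    // Build all 2^n strings made of n blocks, each block "Aa" or "BB".
    // Since "Aa".hashCode() == "BB".hashCode() and both have length 2,
    // every result has the same String.hashCode().
    static List<String> collidingKeys(int n) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < n; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : keys) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = collidingKeys(10); // 1024 keys
        System.out.println(keys.size() + " keys, first hash " + keys.get(0).hashCode()
                + ", last hash " + keys.get(keys.size() - 1).hashCode());

        // HashMap still behaves correctly, but every key shares one bucket
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            map.put(keys.get(i), i);
        }
        System.out.println(map.size() + " entries, value of last key: "
                + map.get(keys.get(keys.size() - 1)));
    }
}
```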

[Figure: data structure of HashMap in JDK 1.8]

As before, let's start with the core fields:

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 * Threshold for converting a bin's linked list into a red-black tree.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

/**
 * JDK 1.7 uses Entry; 1.8 renames it to Node
 */
transient Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;

/**
 * The number of times this HashMap has been structurally modified
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash).  This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.  (See ConcurrentModificationException).
 */
transient int modCount;

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;

/**
 * The load factor for the hash table.
 *
 * @serial
 */
final float loadFactor;

At its core, Node is the same as Entry in 1.7: it stores the key, value, hash and next pointer.

Now let's look at the put and get functions that store and retrieve data.

put function

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is empty, initialize it (resize() takes care of initialization)
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Locate the bucket from the hash; if it is empty there is no collision, so place a new node there directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The bucket is occupied (hash collision): if its first node has the same hash and key as the incoming key, remember it in e
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the bin is a red-black tree, insert through the tree (putTreeVal)
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise the bin is a linked list: wrap the key and value in a new node and append it to the tail of the chain
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
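                    // If the bin held at least TREEIFY_THRESHOLD nodes before this append, convert it to a red-black tree (treeifyBin resizes instead while the table is smaller than MIN_TREEIFY_CAPACITY)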
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Stop walking as soon as a node with the same key is found
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // e != null means a mapping for the key already exists: overwrite its value (unless onlyIfAbsent is set and the old value is non-null)
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Resize if the new size exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
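The decision at the end of the chain walk can be isolated into a small function. The sketch below is our own simplification of putVal() together with treeifyBin(), not JDK code: when a node is appended to a bin that already holds at least TREEIFY_THRESHOLD (8) nodes, 1.8 tries to treeify the bin, but treeifyBin() first checks the table length and simply resizes when the table is smaller than MIN_TREEIFY_CAPACITY (64). The BinAction enum and binActionAfterAppend method are hypothetical names.

```java
public class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum BinAction { KEEP_LIST, RESIZE_TABLE, TREEIFY_BIN }

    // nodesBefore: nodes already in the bin before the new node is appended.
    // putVal's loop treeifies when binCount >= TREEIFY_THRESHOLD - 1, i.e. when
    // the bin held at least TREEIFY_THRESHOLD nodes before the append.
    static BinAction binActionAfterAppend(int nodesBefore, int tableLength) {
        if (nodesBefore < TREEIFY_THRESHOLD) {
            return BinAction.KEEP_LIST;
        }
        // treeifyBin() prefers growing a small table over building a tree
        return tableLength < MIN_TREEIFY_CAPACITY ? BinAction.RESIZE_TABLE : BinAction.TREEIFY_BIN;
    }

    public static void main(String[] args) {
        System.out.println(binActionAfterAppend(7, 16)); // KEEP_LIST (bin grows to 8)
        System.out.println(binActionAfterAppend(8, 16)); // RESIZE_TABLE (table too small)
        System.out.println(binActionAfterAppend(8, 64)); // TREEIFY_BIN
    }
}
```

Preferring a resize for small tables makes sense because a long chain in a small table is more likely caused by too few buckets than by genuinely colliding hashes.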

get function

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

/**
 * Implements Map.get and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Locate the bucket from the key's hash
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Check whether the first node in the bucket (head of a chain or of a tree bin) matches the key; if so, return it
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // Otherwise, if there are more nodes, check whether the bin is a red-black tree or a linked list
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                // Tree bin: search the tree
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise walk the chain until a node matches
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
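get() and putVal() receive the result of hash(key), and getNode() picks the bucket with (n - 1) & hash. In JDK 1.8, hash() is much simpler than in 1.7: it XORs the high 16 bits of hashCode() into the low 16 bits, and a null key hashes to 0. The sketch reproduces that function under the hypothetical name spread and shows why it matters: two hash codes that differ only in their high bits would otherwise share a bucket in a 16-slot table.

```java
public class Hash18Demo {
    // JDK 1.8-style hash: fold the high 16 bits into the low 16 bits; null -> 0
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Same bucket selection as getNode()/putVal(): (n - 1) & hash
    static int bucket(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        Integer a = 0x10000; // hashCode 65536: differs from b only above bit 15
        Integer b = 0x20000; // hashCode 131072
        System.out.println("raw:    " + bucket(a.hashCode(), n) + ", " + bucket(b.hashCode(), n)); // raw:    0, 0
        System.out.println("spread: " + bucket(spread(a), n) + ", " + bucket(spread(b), n));       // spread: 1, 2
        System.out.println("null -> bucket " + bucket(spread(null), n));                              // null -> bucket 0
    }
}
```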

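These two core methods (get/put) show that 1.8 optimizes long chains: once a chain is converted into a red-black tree, lookups in that bucket improve from O(N) to O(log N).

However, HashMap is still not thread-safe. Using it from multiple threads can lose updates or corrupt its internal structure, and can even cause an infinite loop, for example with code like this: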
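The gain can be quantified with standard bounds. For a bucket holding n colliding nodes, a chain lookup may compare against every node, while a red-black tree with n nodes has height at most 2 log2(n + 1). A worked example for n = 1024, assuming the tree bin can order its keys (HashMap's tree bins order by hash and, where possible, by Comparable):

$$
\begin{aligned}
\text{chain (1.7):}\quad & C_{\text{list}}(n) \le n, & C_{\text{list}}(1024) &\le 1024 \\
\text{red-black tree (1.8):}\quad & C_{\text{tree}}(n) \le 2\log_2(n+1), & C_{\text{tree}}(1024) &\le 2\log_2 1025 \approx 20
\end{aligned}
$$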

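import java.util.HashMap;
import java.util.UUID;

public class UnsafeHashMapDemo {
    public static void main(String[] args) {
        // Not thread-safe: 1000 threads write to one shared HashMap without synchronization
        final HashMap<String, String> map = new HashMap<String, String>();
        for (int i = 0; i < 1000; i++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    map.put(UUID.randomUUID().toString(), "");
                }
            }).start();
        }
    }
}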

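Why does this happen? As mentioned above, HashMap calls resize() when it grows. In JDK 1.7, resize() moves the entries into the new table with head insertion (transfer()), which reverses the order of each chain; when several threads resize concurrently, this can form a circular linked list in a bucket. A later get() for a key that does not exist, whose index happens to point at that bucket, then loops forever. JDK 1.8 keeps node order when it splits chains during resize, which removes this particular cause, but concurrent use is still unsafe. The next article explains the cause of the infinite loop in detail.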
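In JDK 1.7 the root cause is that transfer() moves each chain into the new table by inserting every node at the head, which reverses the relative order of nodes that end up in the same new bucket; two threads interleaving inside this reversal can leave a node pointing back at its predecessor. The single-threaded sketch below does not reproduce the race itself. It only contrasts head insertion (1.7-style) with the order-preserving tail insertion used by 1.8's resize, using a hypothetical Node class and hash values chosen to stay in the same bucket.

```java
import java.util.ArrayList;
import java.util.List;

public class TransferOrderDemo {
    // Hypothetical minimal chain node
    static final class Node {
        final int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    static Node chainOf(int... keys) {
        Node head = null;
        for (int i = keys.length - 1; i >= 0; i--) {
            head = new Node(keys[i], head);
        }
        return head;
    }

    static List<Integer> keys(Node head) {
        List<Integer> out = new ArrayList<>();
        for (Node e = head; e != null; e = e.next) out.add(e.key);
        return out;
    }

    // 1.7-style move: take nodes from the old chain one by one, push each onto the head
    static Node moveWithHeadInsertion(Node oldHead) {
        Node newHead = null;
        Node e = oldHead;
        while (e != null) {
            Node next = e.next; // remember the rest of the old chain
            e.next = newHead;   // relink e in front of the nodes already moved
            newHead = e;
            e = next;
        }
        return newHead;
    }

    // 1.8-style move: append each node at the tail, preserving order
    static Node moveWithTailInsertion(Node oldHead) {
        Node head = null, tail = null;
        Node e = oldHead;
        while (e != null) {
            Node next = e.next;
            e.next = null;
            if (tail == null) head = e; else tail.next = e;
            tail = e;
            e = next;
        }
        return head;
    }

    public static void main(String[] args) {
        // Treated as hash values, 0, 32 and 64 share bucket 0 for table lengths 16 and 32
        System.out.println(keys(moveWithHeadInsertion(chainOf(0, 32, 64)))); // [64, 32, 0]
        System.out.println(keys(moveWithTailInsertion(chainOf(0, 32, 64)))); // [0, 32, 64]
    }
}
```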

Another point worth noting is how to iterate over a HashMap. The common approaches are:

// Assumes an existing Map<String, Integer> map

// 1. Iterate over entrySet(): key and value come from the same entry
Iterator<Map.Entry<String, Integer>> entryIterator = map.entrySet().iterator();
while (entryIterator.hasNext()) {
    Map.Entry<String, Integer> next = entryIterator.next();
    System.out.println("key=" + next.getKey() + " value=" + next.getValue());
}

// 2. Iterate over keySet(): each value needs an extra map.get(key) lookup
Iterator<String> iterator = map.keySet().iterator();
while (iterator.hasNext()) {
    String key = iterator.next();
    System.out.println("key=" + key + " value=" + map.get(key));
}

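Iterating with the first approach, over entrySet(), is strongly recommended.

The first approach retrieves the key and the value together, while the second needs an extra map.get(key) lookup for every key, which is less efficient.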
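A self-contained version of both loops, plus the Map.forEach method available since Java 8 (which also hands over key and value together), might look like this. The sample data is invented for illustration; since HashMap's iteration order is unspecified, the sketch sums the values instead of printing them in order.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class IterationDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Preferred: each entry already carries both key and value
    static int sumViaEntrySet(Map<String, Integer> map) {
        int sum = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            sum += entry.getValue();
        }
        return sum;
    }

    // Slower: one extra hash lookup (map.get) per key
    static int sumViaKeySet(Map<String, Integer> map) {
        int sum = 0;
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            sum += map.get(it.next());
        }
        return sum;
    }

    // Java 8+: Map.forEach passes key and value together
    static int sumViaForEach(Map<String, Integer> map) {
        int[] sum = {0}; // array holder because lambdas can only capture effectively final locals
        map.forEach((key, value) -> sum[0] += value);
        return sum[0];
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();
        System.out.println(sumViaEntrySet(map) + " " + sumViaKeySet(map) + " " + sumViaForEach(map)); // 6 6 6
    }
}
```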

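Summary

In both 1.7 and 1.8, the JDK performs no synchronization in HashMap, so concurrent use causes problems and can even lead to an infinite loop that makes the system unavailable. For this reason the JDK provides a dedicated class, ConcurrentHashMap, in the java.util.concurrent package, designed specifically for concurrent access.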
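As a minimal sketch, the concurrent put example from earlier can be rewritten with ConcurrentHashMap. Here the threads are joined before the size is read, which also makes the result checkable; the class and method names are our own, and the random UUIDs are assumed to be unique.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {
    static int putConcurrently(final Map<String, String> map, int threadCount) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> map.put(UUID.randomUUID().toString(), ""));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait until every put has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and stop waiting
                break;
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        // ConcurrentHashMap is safe for concurrent puts: expect 1000 entries
        System.out.println(putConcurrently(new ConcurrentHashMap<String, String>(), 1000));
    }
}
```

Passing a plain HashMap to the same method is the unsafe variant discussed above and may report fewer entries.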
