Reading the Source: How ArrayList, LinkedList, HashMap, and HashSet Store Their Data
阿新 • Published: 2019-02-15
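With some free time on my hands recently, I decided to dig into how the commonly used Java collections store their data internally. Why bother? Because each storage scheme is designed for a different usage pattern. Linked storage can grow and shrink freely, and inserting or removing an element only requires relinking its neighbors; the drawback is that reaching an element means walking the chain node by node from one end, so reads by position are slow. Sequential (array-based) storage is the opposite: any element can be read directly by index, but inserting or removing in the middle may require shifting the elements that follow. Since day-to-day programming constantly involves reading from and writing to collections, understanding how they are laid out in memory is what lets us use them efficiently.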
- ArrayList
The name itself gives it away: the backing storage is an array, i.e. sequential storage. Let's look at the ArrayList source.
The default no-argument constructor of ArrayList contains a single statement, and the field it assigns, array, has type Object[], which is Java's syntax for declaring an array; an array is one form of sequential storage. One property of a Java array is that its length must be fixed when it is created and cannot be changed afterwards. So the guess above was right: because any element can be fetched directly by index, ArrayList suits scenarios with frequent reads. (The listing references EmptyArray.OBJECT and MIN_CAPACITY_INCREMENT, so it appears to come from Android's libcore implementation rather than OpenJDK.)

    public class ArrayList<E> extends AbstractList<E> implements Cloneable, Serializable, RandomAccess {
        /**
         * The minimum amount by which the capacity of an ArrayList will increase.
         * This tuning parameter controls a time-space tradeoff. This value (12)
         * gives empirically good results and is arguably consistent with the
         * RI's specified default initial capacity of 10: instead of 10, we start
         * with 0 (sans allocation) and jump to 12.
         */
        private static final int MIN_CAPACITY_INCREMENT = 12;

        /**
         * The number of elements in this list.
         */
        int size;

        /**
         * The elements in this list, followed by nulls.
         */
        transient Object[] array;

        /**
         * Constructs a new instance of {@code ArrayList} with the specified
         * initial capacity.
         *
         * @param capacity
         *            the initial capacity of this {@code ArrayList}.
         */
        public ArrayList(int capacity) {
            if (capacity < 0) {
                throw new IllegalArgumentException("capacity < 0: " + capacity);
            }
            array = (capacity == 0 ? EmptyArray.OBJECT : new Object[capacity]);
        }

        /**
         * Constructs a new {@code ArrayList} instance with zero initial capacity.
         */
        public ArrayList() {
            array = EmptyArray.OBJECT;
        }

        // ...
    }
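Since the backing array cannot change length, a list built on one has to allocate a larger array and copy the elements over whenever it fills up. The sketch below illustrates that idea with a hypothetical class, GrowableArraySketch. The growth rule (add 12 slots while the list is small, otherwise grow by half) is an assumption made for illustration, loosely based on the MIN_CAPACITY_INCREMENT comment in the listing; it is not a copy of the library's add method.

```java
import java.util.Arrays;

/** Hypothetical growable array, for illustration only; not the library's code. */
final class GrowableArraySketch {
    // Value taken from the listing; how it is used below is an assumption.
    private static final int MIN_CAPACITY_INCREMENT = 12;

    private Object[] array = new Object[0]; // start with zero capacity
    private int size;

    void add(Object element) {
        if (size == array.length) {
            // Assumed policy: jump by 12 while small, otherwise grow by 50%.
            int increment = size < MIN_CAPACITY_INCREMENT / 2
                    ? MIN_CAPACITY_INCREMENT
                    : size >> 1;
            // An array cannot be resized, so allocate a bigger one and copy.
            array = Arrays.copyOf(array, size + increment);
        }
        array[size++] = element;
    }

    /** Reading by index is a single array access, which is why reads are cheap. */
    Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return array[index];
    }

    int size() {
        return size;
    }

    int capacity() {
        return array.length;
    }

    public static void main(String[] args) {
        GrowableArraySketch list = new GrowableArraySketch();
        for (int i = 0; i < 13; i++) {
            list.add("item" + i);
            System.out.println("size=" + list.size() + " capacity=" + list.capacity());
        }
    }
}
```

Under the assumed policy the capacity goes from 0 to 12 on the first add and to 18 on the thirteenth, so most calls to add touch only one array slot and the copying cost is paid occasionally.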
- LinkedList
A link is a connection from one node to the next, which is the hallmark of linked storage, so LinkedList is a linked structure.

    /**
     * Constructs a new empty instance of {@code LinkedList}.
     */
    public LinkedList() {
        voidLink = new Link<E>(null, null, null);
        voidLink.previous = voidLink;
        voidLink.next = voidLink;
    }

    private static final class Link<ET> {
        ET data;

        Link<ET> previous, next;

        Link(ET o, Link<ET> p, Link<ET> n) {
            data = o;
            previous = p;
            next = n;
        }
    }
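Above are one constructor of LinkedList and the constructor of Link. Every element in a LinkedList is wrapped in a Link, which holds the stored data together with references to the links before and after it. The LinkedList constructor also shows that an empty list consists of a single placeholder link, voidLink, whose previous and next both point back to itself.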
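To make the role of previous and next concrete, here is a minimal sketch of a circular doubly linked list built around a placeholder link like voidLink. The class name SentinelListSketch and its methods are hypothetical and written only for illustration; the real LinkedList has a far larger API and is not reproduced here.

```java
import java.util.NoSuchElementException;

/** Hypothetical circular doubly linked list with a placeholder link; illustration only. */
final class SentinelListSketch<E> {
    /** Same shape as the Link class in the listing: payload plus two neighbor references. */
    private static final class Link<T> {
        T data;
        Link<T> previous, next;

        Link(T data, Link<T> previous, Link<T> next) {
            this.data = data;
            this.previous = previous;
            this.next = next;
        }
    }

    private final Link<E> voidLink;
    private int size;

    SentinelListSketch() {
        // An empty list is just the placeholder pointing at itself.
        voidLink = new Link<>(null, null, null);
        voidLink.previous = voidLink;
        voidLink.next = voidLink;
    }

    /** Appends at the tail by relinking; no existing element is moved. */
    void addLast(E element) {
        Link<E> oldLast = voidLink.previous;
        Link<E> newLink = new Link<>(element, oldLast, voidLink);
        oldLast.next = newLink;
        voidLink.previous = newLink;
        size++;
    }

    /** Prepends at the head, again by relinking only. */
    void addFirst(E element) {
        Link<E> oldFirst = voidLink.next;
        Link<E> newLink = new Link<>(element, voidLink, oldFirst);
        voidLink.next = newLink;
        oldFirst.previous = newLink;
        size++;
    }

    E removeFirst() {
        if (size == 0) {
            throw new NoSuchElementException("list is empty");
        }
        Link<E> first = voidLink.next;
        voidLink.next = first.next;
        first.next.previous = voidLink;
        size--;
        return first.data;
    }

    /** Reading by position has to walk the chain from the head. */
    E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        Link<E> link = voidLink.next;
        for (int i = 0; i < index; i++) {
            link = link.next;
        }
        return link.data;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        SentinelListSketch<String> list = new SentinelListSketch<>();
        list.addLast("b");
        list.addFirst("a");
        list.addLast("c");
        System.out.println(list.get(0) + list.get(1) + list.get(2)); // prints "abc"
    }
}
```

The placeholder removes the special cases for an empty list: addFirst and addLast run the same four assignments whether or not the list already has elements, while get shows the cost side, a loop that grows with the index.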
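LinkedList suits scenarios with frequent insertions and deletions, particularly at the ends or at a position an iterator has already reached, since locating an arbitrary index still requires a walk.
- HashMap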
Here the name does not make the storage type obvious. It tells us this is a collection of key-value pairs organized by hash value, but how is that collection actually laid out underneath? Look at the code:

    /**
     * Constructs a new empty {@code HashMap} instance.
     */
    @SuppressWarnings("unchecked")
    public HashMap() {
        table = (HashMapEntry<K, V>[]) EMPTY_TABLE;
        threshold = -1; // Forces first put invocation to replace EMPTY_TABLE
    }

    /**
     * The hash table. If this hash map contains a mapping for null, it is
     * not represented this hash table.
     */
    transient HashMapEntry<K, V>[] table;
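The constructor shows that HashMap keeps its data in a field called table, and table is an array, so the underlying storage of HashMap is sequential array storage. Because every entry carries a hash value, the slot it occupies is determined by that hash rather than by insertion order, and it can change when the table is resized. As noted above, the length of an array cannot change, so how does HashMap make room once it holds more entries than its capacity allows?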
The answer is in the line tab = doubleCapacity(); inside put:

    /**
     * Maps the specified key to the specified value.
     *
     * @param key
     *            the key.
     * @param value
     *            the value.
     * @return the value of any previous mapping with the specified key or
     *         {@code null} if there was no such mapping.
     */
    @Override
    public V put(K key, V value) {
        if (key == null) {
            return putValueForNullKey(value);
        }

        int hash = Collections.secondaryHash(key);
        HashMapEntry<K, V>[] tab = table;
        int index = hash & (tab.length - 1);
        for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                preModify(e);
                V oldValue = e.value;
                e.value = value;
                return oldValue;
            }
        }

        // No entry for (non-null) key is present; create one
        modCount++;
        if (size++ > threshold) {
            tab = doubleCapacity();
            index = hash & (tab.length - 1);
        }
        addNewEntry(key, value, hash, index);
        return null;
    }
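Two details of put are easy to miss in the listing: the slot is computed as hash & (tab.length - 1), and entries that land in the same slot are chained through e.next. The sketch below isolates just those two ideas. TinyChainedMapSketch is a hypothetical class written for illustration; it uses key.hashCode() directly instead of Collections.secondaryHash, rejects null keys, and never resizes, all of which are simplifications.

```java
import java.util.Objects;

/** Hypothetical fixed-size chained hash table; illustration only, it never resizes. */
final class TinyChainedMapSketch<K, V> {
    /** One node in a bucket's chain. */
    private static final class Entry<K, V> {
        final K key;
        final int hash;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, int hash, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.hash = hash;
            this.next = next;
        }
    }

    // The length is a power of two, so (hash & (length - 1)) always lands inside the array.
    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[8];
    private int size;

    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    V put(K key, V value) {
        Objects.requireNonNull(key, "null keys are not handled in this sketch");
        int hash = key.hashCode(); // simplification: no secondary hash is applied
        int index = indexFor(hash, table.length);
        // Walk the chain in this bucket looking for an existing mapping.
        for (Entry<K, V> e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                V oldValue = e.value;
                e.value = value;
                return oldValue;
            }
        }
        // Not found: the new entry becomes the head of the chain.
        table[index] = new Entry<>(key, value, hash, table[index]);
        size++;
        return null;
    }

    V get(Object key) {
        if (key == null) {
            return null;
        }
        int hash = key.hashCode();
        for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        TinyChainedMapSketch<Integer, String> map = new TinyChainedMapSketch<>();
        map.put(1, "one");
        map.put(9, "nine"); // 1 and 9 share bucket 1 in a table of length 8
        System.out.println(map.get(1) + " " + map.get(9)); // prints "one nine"
    }
}
```

With Integer keys the hash is the value itself, so 1 and 9 both map to slot 1 of an 8-slot table and end up on the same chain; a lookup jumps straight to the slot and then compares only the few entries chained there.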
When an insertion pushes the number of entries past the threshold, HashMap allocates a new array twice as long as the old one and then moves the existing entries into the new array.

    /**
     * Doubles the capacity of the hash table. Existing entries are placed in
     * the correct bucket on the enlarged table. If the current capacity is,
     * MAXIMUM_CAPACITY, this method is a no-op. Returns the table, which
     * will be new unless we were already at MAXIMUM_CAPACITY.
     */
    private HashMapEntry<K, V>[] doubleCapacity() {
        HashMapEntry<K, V>[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            return oldTable;
        }
        int newCapacity = oldCapacity * 2;
        HashMapEntry<K, V>[] newTable = makeTable(newCapacity);
        if (size == 0) {
            return newTable;
        }

        for (int j = 0; j < oldCapacity; j++) {
            /*
             * Rehash the bucket using the minimum number of field writes.
             * This is the most subtle and delicate code in the class.
             */
            HashMapEntry<K, V> e = oldTable[j];
            if (e == null) {
                continue;
            }
            int highBit = e.hash & oldCapacity;
            HashMapEntry<K, V> broken = null;
            newTable[j | highBit] = e;
            for (HashMapEntry<K, V> n = e.next; n != null; e = n, n = n.next) {
                int nextHighBit = n.hash & oldCapacity;
                if (nextHighBit != highBit) {
                    if (broken == null)
                        newTable[j | nextHighBit] = n;
                    else
                        broken.next = n;
                    broken = e;
                    highBit = nextHighBit;
                }
            }
            if (broken != null)
                broken.next = null;
        }
        return newTable;
    }
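The expression j | highBit in doubleCapacity relies on the capacity being a power of two: doubling the table makes exactly one more bit of the hash significant, so an entry in old bucket j either stays at j or moves to j + oldCapacity. The following sketch checks that arithmetic in isolation; BucketSplitSketch and its method names are hypothetical and exist only for this illustration.

```java
/** Hypothetical helper that isolates the index arithmetic used when a table doubles. */
final class BucketSplitSketch {
    /** Bucket index for a table whose capacity is a power of two. */
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    /**
     * New index computed the way the listing does it: the old index
     * OR-ed with the single hash bit that becomes significant.
     */
    static int indexAfterDoubling(int hash, int oldCapacity) {
        int oldIndex = indexFor(hash, oldCapacity);
        int highBit = hash & oldCapacity; // either 0 or oldCapacity
        return oldIndex | highBit;
    }

    public static void main(String[] args) {
        int oldCapacity = 4;
        int[] hashes = {1, 5, 9, 13}; // all four share bucket 1 when the capacity is 4
        for (int hash : hashes) {
            System.out.println("hash " + hash
                    + ": old index " + indexFor(hash, oldCapacity)
                    + ", new index " + indexAfterDoubling(hash, oldCapacity));
        }
    }
}
```

For the four sample hashes, 1 and 9 stay in bucket 1 while 5 and 13 move to bucket 5, which matches what hash & 7 gives directly. This is why the resize loop only has to split each old chain in two rather than recompute every position from scratch.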
HashMap is likewise suited to scenarios with frequent reads.
- HashSet
HashSet is a set built on hash values. How does it store its elements? See below:

    public class HashSet<E> extends AbstractSet<E> implements Set<E>, Cloneable, Serializable {

        private static final long serialVersionUID = -5024744406713321676L;

        transient HashMap<E, HashSet<E>> backingMap;

        /**
         * Constructs a new empty instance of {@code HashSet}.
         */
        public HashSet() {
            this(new HashMap<E, HashSet<E>>());
        }

        // ...
    }
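Inside, it creates a HashMap. Well, what do you know: that is all a HashSet is. In other words, the elements of a HashSet are stored in a HashMap, so HashSet is also suited to scenarios with frequent reads, which for a set mostly means frequent membership checks.

    HashSet(HashMap<E, HashSet<E>> backingMap) {
        this.backingMap = backingMap;
    }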
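The delegation can be sketched in a few lines. MapBackedSetSketch below is a hypothetical class written for illustration: it stores each element as a key in a java.util.HashMap and uses a shared placeholder object as the value. The excerpt does not show what the real class stores as the value, so that detail is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical set that delegates all storage to a HashMap; illustration only. */
final class MapBackedSetSketch<E> {
    // Assumed placeholder value; the excerpt does not show what the real class stores.
    private static final Object PRESENT = new Object();

    // Elements live as the keys of this map; the values carry no information.
    private final Map<E, Object> backingMap = new HashMap<>();

    /** Returns true only if the element was not already present. */
    boolean add(E element) {
        return backingMap.put(element, PRESENT) == null;
    }

    boolean contains(Object element) {
        return backingMap.containsKey(element);
    }

    boolean remove(Object element) {
        return backingMap.remove(element) != null;
    }

    int size() {
        return backingMap.size();
    }

    public static void main(String[] args) {
        MapBackedSetSketch<String> set = new MapBackedSetSketch<>();
        System.out.println(set.add("a")); // true: newly added
        System.out.println(set.add("a")); // false: the key was already in the map
        System.out.println(set.contains("a")); // true
    }
}
```

Because map keys are unique, the set gets its no-duplicates guarantee for free, and contains costs the same as a key lookup in the backing map.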