Notes on the HashMap and ConcurrentHashMap Source Code
Introduction: back when I worked on traditional software, HashMap was the map I used most for storage. After moving to an internet company I noticed that many maps are ConcurrentHashMap instead, so I wanted to understand it properly.
From reading the source code, the following can be observed:
HashMap
HashMap is essentially built on an array of Entry objects: get, put and remove all re-hash the key's hashCode, locate a bucket, and then walk that bucket's linked list. It is, of course, not thread-safe.
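The "re-hash then locate" step is done by two small helpers, hash and indexFor, which also appear in the put and get code below; the following is a close paraphrase of the JDK 6 versions:

// Paraphrased from the JDK 6 HashMap source: a supplemental hash spreads
// the bits of the key's hashCode so that poor hash functions still map
// reasonably evenly onto a power-of-two sized table.
static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

// The table length is always a power of two, so masking with (length - 1)
// is equivalent to hash % length but cheaper.
static int indexFor(int h, int length) {
    return h & (length - 1);
}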
put
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
get
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}
Iteration
final Entry<K,V> nextEntry() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    Entry<K,V> e = next;
    if (e == null)
        throw new NoSuchElementException();
    if ((next = e.next) == null) {
        Entry[] t = table;
        while (index < t.length && (next = t[index++]) == null)
            ;
    }
    current = e;
    return e;
}
As nextEntry shows, expectedModCount is recorded when iteration starts; if a structural modification happens during the iteration, a ConcurrentModificationException is thrown.
In addition, put and remove take no lock whatsoever, so there are no guarantees in a highly concurrent environment.
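A minimal sketch of this fail-fast behaviour (the class name FailFastDemo is mine):

import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);

        // Removing through the map (not through the iterator) is a structural
        // modification: modCount no longer equals expectedModCount, so the
        // next call into nextEntry() throws ConcurrentModificationException.
        for (String key : map.keySet()) {
            map.remove("b");
        }
    }
}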
Note: Hashtable simply wraps put, get and remove in synchronized, which serializes every operation on the whole table and greatly reduces throughput.
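Roughly, the shape of Hashtable is the following sketch (a paraphrase of the idea, not the actual JDK source; CoarseLockedMap is my name): one intrinsic lock guards the entire table, so all readers and writers contend on the same monitor.

import java.util.HashMap;
import java.util.Map;

// Every accessor synchronizes on the same monitor, so all operations serialize.
class CoarseLockedMap<K, V> {
    private final Map<K, V> inner = new HashMap<K, V>();

    public synchronized V put(K key, V value) { return inner.put(key, value); }
    public synchronized V get(Object key)     { return inner.get(key); }
    public synchronized V remove(Object key)  { return inner.remove(key); }
}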
ConcurrentHashMap
The material below draws on http://www.iteye.com/topic/344876.
First, a look at how ConcurrentHashMap is implemented, based on its source code.
The core idea of ConcurrentHashMap is to split the HashEntry[] table into segments; a structural operation such as put or remove only locks the segment it touches.
/**
 * The default initial capacity for this table,
 * used when not otherwise specified in a constructor.
 */
static final int DEFAULT_INITIAL_CAPACITY = 16;

/**
 * The default load factor for this table, used when not
 * otherwise specified in a constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * The default concurrency level for this table, used when not
 * otherwise specified in a constructor.
 */
static final int DEFAULT_CONCURRENCY_LEVEL = 16;

/**
 * The maximum capacity, used if a higher value is implicitly
 * specified by either of the constructors with arguments. MUST
 * be a power of two <= 1<<30 to ensure that entries are indexable
 * using ints.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The maximum number of segments to allow; used to bound
 * constructor arguments.
 */
static final int MAX_SEGMENTS = 1 << 16; // slightly conservative

/**
 * Number of unsynchronized retries in size and containsValue
 * methods before resorting to locking. This is used to avoid
 * unbounded retries if tables undergo continuous modification
 * which would make it impossible to obtain an accurate result.
 */
static final int RETRIES_BEFORE_LOCK = 2;
Constant definitions: default capacity 16, default load factor 0.75, default concurrency level 16, maximum capacity 2^30, at most 2^16 segments, and 2 unlocked retries before size and containsValue fall back to locking.
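The concurrency level can also be supplied explicitly through the three-argument constructor; a small usage example (the values here simply spell out the defaults listed above):

import java.util.concurrent.ConcurrentHashMap;

public class ConstructorExample {
    public static void main(String[] args) {
        // initialCapacity = 16, loadFactor = 0.75f, concurrencyLevel = 16;
        // exactly the default constants shown above.
        ConcurrentHashMap<String, Integer> map =
                new ConcurrentHashMap<String, Integer>(16, 0.75f, 16);
        map.put("answer", 42);
        System.out.println(map.get("answer"));
    }
}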
final int segmentMask;

/**
 * Shift value for indexing within segments.
 */
final int segmentShift;

/**
 * The segments, each of which is a specialized hash table
 */
final Segment<K,V>[] segments;
Segment<K,V>[] is the array of segments. segmentMask is the mask used to index into that array (segments.length - 1), and segmentShift is the amount the hash is shifted right before masking.
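A small standalone sketch of how these two fields work (the class and method names are mine; the logic is paraphrased from the JDK 6 constructor and the segmentFor method): the number of segments is the smallest power of two at or above the concurrency level, and the top bits of the already-spread hash pick the segment.

public class SegmentIndexing {
    // Which segment index an (already spread) hash falls into,
    // mirroring segmentShift/segmentMask and segmentFor().
    static int segmentIndexFor(int hash, int concurrencyLevel) {
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        int segmentShift = 32 - sshift;  // with the default level 16: 28
        int segmentMask = ssize - 1;     // with the default level 16: 15
        return (hash >>> segmentShift) & segmentMask;
    }

    public static void main(String[] args) {
        // 0xCAFEBABE >>> 28 == 0xC, so the top 4 bits select segment 12.
        System.out.println(segmentIndexFor(0xCAFEBABE, 16));
    }
}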
Fields of Segment
transient volatile int count;

/**
 * Number of updates that alter the size of the table. This is
 * used during bulk-read methods to make sure they see a
 * consistent snapshot: If modCounts change during a traversal
 * of segments computing size or checking containsValue, then
 * we might have an inconsistent view of state so (usually)
 * must retry.
 */
transient int modCount;

/**
 * The table is rehashed when its size exceeds this threshold.
 * (The value of this field is always <tt>(int)(capacity *
 * loadFactor)</tt>.)
 */
transient int threshold;

/**
 * The per-segment table.
 */
transient volatile HashEntry<K,V>[] table;

/**
 * The load factor for the hash table. Even though this value
 * is same for all segments, it is replicated to avoid needing
 * links to outer object.
 * @serial
 */
final float loadFactor;
put
V put(K key, int hash, V value, boolean onlyIfAbsent) {
    lock();
    try {
        int c = count;
        if (c++ > threshold) // ensure capacity
            rehash();
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue;
        if (e != null) {
            oldValue = e.value;
            if (!onlyIfAbsent)
                e.value = value;
        }
        else {
            oldValue = null;
            ++modCount;
            tab[index] = new HashEntry<K,V>(key, hash, first, value);
            count = c; // write-volatile
        }
        return oldValue;
    } finally {
        unlock();
    }
}
put first checks the capacity (rehashing if the threshold is exceeded), then searches the bucket for an entry with the same key; if one exists the old value is replaced (unless onlyIfAbsent is set), otherwise a new entry is added. The whole operation runs while holding the segment's lock.
Note also that put links the new HashEntry at the head of the bucket's chain.
Note: count is volatile, and the write to count is deliberately the last step of the operation, so that a reader who reads count first always sees up-to-date state.
Quoted: a write to a volatile variable cannot be reordered with the reads and writes that precede it, and a read of a volatile variable cannot be reordered with the reads and writes that follow it.
modCount is updated before count is written. Because count is volatile, the write to modCount cannot be reordered after the write to count, and the read of count cannot be reordered with the reads that follow it, such as the read of modCount. Only with both of these constraints is it guaranteed that, after reading count, a reader sees the modCount value stored by the writer before it wrote count, i.e. a nearly up-to-date modCount.
If count were not read before modCount, the read of modCount could even be reordered to before the write of modCount.
Explaining this in terms of reordering is always convoluted; happens-before is more concise. When a thread I reads count, the value it reads must be the most recent write of count by some other thread, call it thread II. These two operations have a happens-before relationship: thread II's write of count happens-before thread I's read of count, written II:W(count) < I:R(count). By the single-thread (program order) rule we also have II:W(modCount) < II:W(count) (the source always writes modCount before writing count) and I:R(count) < I:R(modCount). By transitivity, II:W(modCount) < I:R(modCount), which means thread I can read at least the modCount value that thread II wrote before it wrote count. I once wrote an article on happens-before; some of the wording may be off, but it is roughly right: http://www.iteye.com/topic/260515.
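The same argument in a minimal sketch (the names data and flag are mine, standing in for modCount and count; this is the standard volatile happens-before idiom, not code taken from ConcurrentHashMap):

public class VolatileVisibility {
    private int data;            // plays the role of modCount (not volatile)
    private volatile int flag;   // plays the role of count (volatile)

    void writer() {
        data = 42;   // program order: W(data) happens-before W(flag)
        flag = 1;    // the volatile write is published last, like "count = c"
    }

    void reader() {
        if (flag == 1) {           // volatile read first, like reading count
            int snapshot = data;   // guaranteed to see 42, because
                                   // W(data) < W(flag) < R(flag) < R(data)
            System.out.println(snapshot);
        }
    }
}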
containsValue
public boolean containsValue(Object value) {
    if (value == null)
        throw new NullPointerException();

    // See explanation of modCount use above
    final Segment<K,V>[] segments = this.segments;
    int[] mc = new int[segments.length];

    // Try a few times without locking
    for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
        int sum = 0;
        int mcsum = 0;
        for (int i = 0; i < segments.length; ++i) {
            int c = segments[i].count;
            mcsum += mc[i] = segments[i].modCount;
            if (segments[i].containsValue(value))
                return true;
        }
        boolean cleanSweep = true;
        if (mcsum != 0) {
            for (int i = 0; i < segments.length; ++i) {
                int c = segments[i].count;
                if (mc[i] != segments[i].modCount) {
                    cleanSweep = false;
                    break;
                }
            }
        }
        if (cleanSweep)
            return false;
    }
    // Resort to locking all segments
    for (int i = 0; i < segments.length; ++i)
        segments[i].lock();
    boolean found = false;
    try {
        for (int i = 0; i < segments.length; ++i) {
            if (segments[i].containsValue(value)) {
                found = true;
                break;
            }
        }
    } finally {
        for (int i = 0; i < segments.length; ++i)
            segments[i].unlock();
    }
    return found;
}
To read the most recent modCount, count must be read first; see the quoted explanation above.
When checking whether a value is present, the map makes two unlocked attempts (RETRIES_BEFORE_LOCK); if the segments were modified in between, it locks all segments and checks once more.
remove
V remove(Object key, int hash, Object value) {
    lock();
    try {
        int c = count - 1;
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue = null;
        if (e != null) {
            V v = e.value;
            if (value == null || value.equals(v)) {
                oldValue = v;
                // All entries following removed node can stay
                // in list, but all preceding ones need to be
                // cloned.
                ++modCount;
                HashEntry<K,V> newFirst = e.next;
                for (HashEntry<K,V> p = first; p != e; p = p.next)
                    newFirst = new HashEntry<K,V>(p.key, p.hash,
                                                  newFirst, p.value);
                tab[index] = newFirst;
                count = c; // write-volatile
            }
        }
        return oldValue;
    } finally {
        unlock();
    }
}
Removing a node: because a HashEntry's next reference is final and cannot be modified, every node in front of the removed node has to be cloned.
Why is next declared final? The explanation found online is that, since next is final and cannot be reassigned, simply unlinking the node would break the continuity of the chain. I don't fully understand this point, because remove is performed under the lock, so concurrent writes are not a problem there; this still needs further study.
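For reference, the JDK 6 HashEntry looks roughly like this (a paraphrase; the field order may differ slightly). The commonly cited reason for the final/volatile layout is that get traverses chains without taking the segment lock, so readers must never observe a half-linked node; the cost is that remove has to rebuild the prefix of the chain, as shown above.

// Paraphrased from the JDK 6 source; HashEntry is a static nested class
// inside ConcurrentHashMap. key, hash and next are final, value is volatile.
// Because next never changes after construction, an unlocked reader walking
// the chain sees either the old list or the fully built new one.
static final class HashEntry<K,V> {
    final K key;
    final int hash;
    volatile V value;
    final HashEntry<K,V> next;

    HashEntry(K key, int hash, HashEntry<K,V> next, V value) {
        this.key = key;
        this.hash = hash;
        this.next = next;
        this.value = value;
    }
}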