C# Dictionary, SortedDictionary, SortedList
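Personally, I find Dictionary, SortedDictionary, and SortedList fairly easy to use: spend a little time looking things up online, then read the source code below, and they become quite clear. So why write this article? Take a look at this code: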
Dictionary<int, object> dict = new Dictionary<int, object>();
//load data to dict
int key = 1;
object obj = null;
if (dict.ContainsKey(key))
{
    obj = dict[key];
}
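The program initializes a Dictionary at startup and then reads from it in many places. A colleague originally wrote the lookups this way, later decided that the ContainsKey lookup was slow, and switched to SortedDictionary, a dictionary ordered by key. I normally just call dict.TryGetValue(key, out obj) on a plain Dictionary, which finds the entry in one lookup instead of the two (ContainsKey plus the indexer) used above. That is what prompted this article. First, the conclusions: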
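The Dictionary<TKey,TValue> generic class provides a mapping from a set of keys to a set of values. Each addition to the dictionary consists of a value and its associated key. Retrieving a value by its key is very fast, close to O(1), because Dictionary<TKey,TValue> is implemented as a hash table. Retrieval speed depends on the quality of the hashing algorithm of the type specified for TKey.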
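Since a Dictionary lookup is already close to O(1), the avoidable cost in the opening snippet is that it looks the key up twice. The following is a minimal sketch of the two patterns side by side; the sample data is hypothetical, since the original only says "load data to dict".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data standing in for "load data to dict".
var dict = new Dictionary<int, object>
{
    [1] = "one",
    [2] = "two",
};

// Pattern from the opening snippet: two hash lookups,
// one inside ContainsKey and another inside the indexer.
object viaContains = null;
if (dict.ContainsKey(1))
{
    viaContains = dict[1];
}

// TryGetValue: a single hash lookup that also returns the value.
bool found = dict.TryGetValue(1, out object viaTryGet);

// For a missing key TryGetValue returns false and sets the out value to default.
bool missingFound = dict.TryGetValue(42, out object missing);

Console.WriteLine($"{viaContains} {viaTryGet} {found} {missingFound}");
```

Both patterns return the same value; TryGetValue simply does it with one call to FindEntry instead of two.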
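The SortedDictionary<TKey, TValue> generic class is a binary search tree with O(log n) retrieval, where n is the number of elements in the dictionary. In this respect it is similar to the SortedList<TKey, TValue> generic class. The two classes have similar object models, and both have O(log n) retrieval. Where they differ is in memory use and in the speed of insertion and removal: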
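SortedList<TKey, TValue> uses less memory than SortedDictionary<TKey, TValue>. SortedDictionary<TKey, TValue> has faster insertion and removal for unsorted data: O(log n), as opposed to O(n) for SortedList<TKey, TValue>. If the list is populated all at once from already sorted data, SortedList<TKey, TValue> is faster than SortedDictionary<TKey, TValue>.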
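To illustrate the "similar object models" point, here is a small sketch that fills both classes from the same unsorted keys. The keys and values are hypothetical sample data, and the comments on cost restate the complexities above rather than measure them.

```csharp
using System;
using System.Collections.Generic;

int[] unsortedKeys = { 5, 1, 4, 2, 3 }; // hypothetical sample keys

var sortedList = new SortedList<int, string>();
var sortedDict = new SortedDictionary<int, string>();
foreach (int k in unsortedKeys)
{
    sortedList.Add(k, "v" + k); // array-backed: may shift elements, O(n)
    sortedDict.Add(k, "v" + k); // tree-backed: O(log n)
}

// Both enumerate their keys in ascending order.
string listOrder = string.Join(",", sortedList.Keys);
string dictOrder = string.Join(",", sortedDict.Keys);

// Because SortedList keeps its data in arrays, keys and values
// can also be read by position.
int smallestKey = sortedList.Keys[0];
string secondValue = sortedList.Values[1];

Console.WriteLine($"{listOrder} | {dictOrder} | {smallestKey} | {secondValue}");
```

Indexed access through Keys and Values is available on SortedList only; SortedDictionary exposes its keys and values as plain collections.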
First, the implementation of Dictionary:
public class Dictionary<TKey,TValue>: IDictionary<TKey,TValue>, IDictionary, IReadOnlyDictionary<TKey, TValue>, ISerializable, IDeserializationCallback
{
    private struct Entry
    {
        public int hashCode;    // Lower 31 bits of hash code, -1 if unused
        public int next;        // Index of next entry, -1 if last
        public TKey key;        // Key of entry
        public TValue value;    // Value of entry
    }

    private int[] buckets;
    private Entry[] entries;
    private IEqualityComparer<TKey> comparer;

    public Dictionary(int capacity): this(capacity, null) {}

    public Dictionary(IEqualityComparer<TKey> comparer): this(0, comparer) {}

    public Dictionary(int capacity, IEqualityComparer<TKey> comparer)
    {
        if (capacity < 0) ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.capacity);
        if (capacity > 0) Initialize(capacity);
        this.comparer = comparer ?? EqualityComparer<TKey>.Default;
    }

    private void Initialize(int capacity)
    {
        int size = HashHelpers.GetPrime(capacity);
        buckets = new int[size];
        for (int i = 0; i < buckets.Length; i++) buckets[i] = -1;
        entries = new Entry[size];
        freeList = -1;
    }

    public TValue this[TKey key]
    {
        get
        {
            int i = FindEntry(key);
            if (i >= 0) return entries[i].value;
            ThrowHelper.ThrowKeyNotFoundException();
            return default(TValue);
        }
        set
        {
            Insert(key, value, false);
        }
    }

    private int FindEntry(TKey key)
    {
        if( key == null)
        {
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
        }

        if (buckets != null)
        {
            int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
            for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next)
            {
                if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
            }
        }
        return -1;
    }

    private void Insert(TKey key, TValue value, bool add)
    {
        if( key == null )
        {
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
        }

        if (buckets == null) Initialize(0);
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        int targetBucket = hashCode % buckets.Length;

        for (int i = buckets[targetBucket]; i >= 0; i = entries[i].next)
        {
            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
            {
                if (add)
                {
                    ThrowHelper.ThrowArgumentException(ExceptionResource.Argument_AddingDuplicate);
                }
                entries[i].value = value;
                version++;
                return;
            }
        }

        int index;
        if (freeCount > 0)
        {
            index = freeList;
            freeList = entries[index].next;
            freeCount--;
        }
        else
        {
            if (count == entries.Length)
            {
                Resize();
                targetBucket = hashCode % buckets.Length;
            }
            index = count;
            count++;
        }

        entries[index].hashCode = hashCode;
        entries[index].next = buckets[targetBucket];
        entries[index].key = key;
        entries[index].value = value;
        buckets[targetBucket] = index;
        version++;

        if(collisionCount > HashHelpers.HashCollisionThreshold && HashHelpers.IsWellKnownEqualityComparer(comparer))
        {
            comparer = (IEqualityComparer<TKey>) HashHelpers.GetRandomizedEqualityComparer(comparer);
            Resize(entries.Length, true);
        }
    }
}
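Dictionary<TKey,TValue> stores each item as an Entry struct. The data actually lives in the Entry[] entries array: the first element added gets index 0, the second gets index 1, and so on. Lookups and insertions go through the key, however, so something has to map a key to an index in entries, and that is the job of the int[] buckets array. As FindEntry shows, the key's hash code is reduced modulo buckets.Length to get a bucket index (for example, in a dictionary initialized for 100 elements the key might land in bucket 50), and the value stored in that bucket is an index into entries (if buckets[50] = 0, the entry to read is entries[0]). Keys that fall into the same bucket are chained through Entry.next, and FindEntry walks that chain until both the hash code and the key match. Note that Initialize rounds the requested capacity up to a prime via HashHelpers.GetPrime, so the bucket count is not exactly the number passed in. If the number of elements is known in advance, it is a good idea to specify capacity so the arrays do not have to be resized as the dictionary fills. The core of Insert is: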
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
int targetBucket = hashCode % buckets.Length;
entries[index].hashCode = hashCode;
entries[index].next = buckets[targetBucket];
entries[index].key = key;
entries[index].value = value;
buckets[targetBucket] = index;
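The relationship between buckets and entries is easier to see in a stripped-down model. The sketch below is not the real class: it assumes int keys and string values, a fixed hypothetical size of 7, and no resizing, removal, or duplicate checking. A tuple array stands in for the Entry struct.

```csharp
using System;

// Simplified, fixed-size model of the buckets/entries layout (illustration only).
const int Size = 7; // hypothetical prime bucket count
int[] buckets = new int[Size];
for (int b = 0; b < buckets.Length; b++) buckets[b] = -1; // -1 marks an empty bucket
var entries = new (int hashCode, int next, int key, string value)[Size];
int count = 0;

void Insert(int key, string value)
{
    int hashCode = key.GetHashCode() & 0x7FFFFFFF;
    int targetBucket = hashCode % buckets.Length;
    int index = count++;
    entries[index].hashCode = hashCode;
    entries[index].next = buckets[targetBucket]; // link to the previous head of the chain
    entries[index].key = key;
    entries[index].value = value;
    buckets[targetBucket] = index;               // the new entry becomes the head
}

int FindEntry(int key)
{
    int hashCode = key.GetHashCode() & 0x7FFFFFFF;
    for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next)
    {
        if (entries[i].hashCode == hashCode && entries[i].key == key) return i;
    }
    return -1;
}

Insert(3, "three"); // bucket 3, stored in entries[0]
Insert(10, "ten");  // 10 % 7 == 3: same bucket, stored in entries[1], chained to entries[0]
Insert(5, "five");  // bucket 5, stored in entries[2]

int headOfBucket3 = buckets[3];
int chainedEntry = entries[headOfBucket3].next;
string foundValue = entries[FindEntry(3)].value;
int notFound = FindEntry(4);

Console.WriteLine($"{headOfBucket3} {chainedEntry} {foundValue} {notFound}");
```

Keys 3 and 10 collide in bucket 3, so the later entry becomes the head of the chain and points back at the earlier one through next. Finding key 3 therefore takes two steps, which is why lookups stay near O(1) only while chains are short.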
Now let's look at the implementation of SortedList:
public class SortedList<TKey, TValue> : IDictionary<TKey, TValue>, System.Collections.IDictionary, IReadOnlyDictionary<TKey, TValue>
{
    private TKey[] keys;
    private TValue[] values;
    private IComparer<TKey> comparer;

    public SortedList(int capacity)
    {
        if (capacity < 0)
            ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.capacity, ExceptionResource.ArgumentOutOfRange_NeedNonNegNumRequired);
        keys = new TKey[capacity];
        values = new TValue[capacity];
        comparer = Comparer<TKey>.Default;
    }

    public SortedList(IDictionary<TKey, TValue> dictionary, IComparer<TKey> comparer)
        : this((dictionary != null ? dictionary.Count : 0), comparer)
    {
        if (dictionary==null)
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.dictionary);

        dictionary.Keys.CopyTo(keys, 0);
        dictionary.Values.CopyTo(values, 0);
        Array.Sort<TKey, TValue>(keys, values, comparer);
        _size = dictionary.Count;
    }

    public TValue this[TKey key]
    {
        get
        {
            int i = IndexOfKey(key);
            if (i >= 0)
                return values[i];

            ThrowHelper.ThrowKeyNotFoundException();
            return default(TValue);
        }
        set
        {
            if (((Object) key) == null) ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
            int i = Array.BinarySearch<TKey>(keys, 0, _size, key, comparer);
            if (i >= 0)
            {
                values[i] = value;
                version++;
                return;
            }
            Insert(~i, key, value);
        }
    }

    public int IndexOfKey(TKey key)
    {
        if (key == null)
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
        int ret = Array.BinarySearch<TKey>(keys, 0, _size, key, comparer);
        return ret >=0 ? ret : -1;
    }

    private void Insert(int index, TKey key, TValue value)
    {
        if (_size == keys.Length) EnsureCapacity(_size + 1);
        if (index < _size)
        {
            Array.Copy(keys, index, keys, index + 1, _size - index);
            Array.Copy(values, index, values, index + 1, _size - index);
        }
        keys[index] = key;
        values[index] = value;
        _size++;
        version++;
    }
}
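SortedList<TKey, TValue> keeps its keys and values in two separate arrays, TKey[] keys and TValue[] values. Key lookup uses binary search, Array.BinarySearch<TKey>(keys, 0, _size, key, comparer), rather than hashing. Insertion, however, contains code like this: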
if (index < _size) {
Array.Copy(keys, index, keys, index + 1, _size - index);
Array.Copy(values, index, values, index + 1, _size - index);
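}
In other words, if a SortedList already holds 10 items and the new key belongs in the first position, all 10 existing elements have to be shifted by one slot. Removing an element has the same kind of cost.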
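The binary search plus shift can be reproduced in a few lines. This is a sketch under simplifying assumptions: int keys, string values, and hypothetical fixed-capacity arrays of 8 slots in place of EnsureCapacity. The helper name Put is made up for the example.

```csharp
using System;

// Hypothetical fixed-capacity arrays; the real class grows them via EnsureCapacity.
int[] keys = new int[8];
string[] values = new string[8];
int size = 0;

void Put(int key, string value)
{
    // A negative result is the bitwise complement of the insertion point.
    int i = Array.BinarySearch(keys, 0, size, key);
    if (i >= 0)
    {
        values[i] = value; // key already present: overwrite
        return;
    }
    int index = ~i;
    if (index < size)
    {
        // Shift everything from index onward one slot to the right: O(n).
        Array.Copy(keys, index, keys, index + 1, size - index);
        Array.Copy(values, index, values, index + 1, size - index);
    }
    keys[index] = key;
    values[index] = value;
    size++;
}

Put(20, "b");
Put(30, "c");
Put(10, "a"); // belongs at index 0, so both existing elements are shifted

int insertionPointFor25 = ~Array.BinarySearch(keys, 0, size, 25);

Console.WriteLine($"{keys[0]},{keys[1]},{keys[2]} insert 25 at {insertionPointFor25}");
```

Appending keys in ascending order never enters the shifting branch, which is consistent with the earlier remark that filling a SortedList from already sorted data is fast.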
Finally, the implementation of SortedDictionary:
public class SortedDictionary<TKey, TValue> : IDictionary<TKey, TValue>, IDictionary, IReadOnlyDictionary<TKey, TValue>
{
    public SortedDictionary(IDictionary<TKey,TValue> dictionary, IComparer<TKey> comparer)
    {
        if( dictionary == null)
        {
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.dictionary);
        }

        _set = new TreeSet<KeyValuePair<TKey, TValue>>(new KeyValuePairComparer(comparer));

        foreach(KeyValuePair<TKey, TValue> pair in dictionary)
        {
            _set.Add(pair);
        }
    }

    public SortedDictionary(IComparer<TKey> comparer)
    {
        _set = new TreeSet<KeyValuePair<TKey, TValue>>(new KeyValuePairComparer(comparer));
    }

    public TValue this[TKey key]
    {
        get
        {
            if ( key == null)
            {
                ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
            }

            TreeSet<KeyValuePair<TKey, TValue>>.Node node = _set.FindNode(new KeyValuePair<TKey, TValue>(key, default(TValue)));
            if ( node == null)
            {
                ThrowHelper.ThrowKeyNotFoundException();
            }

            return node.Item.Value;
        }
        set
        {
            if( key == null)
            {
                ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
            }

            TreeSet<KeyValuePair<TKey, TValue>>.Node node = _set.FindNode(new KeyValuePair<TKey, TValue>(key, default(TValue)));
            if ( node == null)
            {
                _set.Add(new KeyValuePair<TKey, TValue>(key, value));
            }
            else
            {
                node.Item = new KeyValuePair<TKey, TValue>( node.Item.Key, value);
                _set.UpdateVersion();
            }
        }
    }

    internal class TreeSet<T> : SortedSet<T> {}
}

public class SortedSet<T> : ISet<T>, ICollection<T>, ICollection, ISerializable, IDeserializationCallback, IReadOnlyCollection<T>
{
    internal virtual Node FindNode(T item)
    {
        Node current = root;
        while (current != null)
        {
            int order = comparer.Compare(item, current.Item);
            if (order == 0)
            {
                return current;
            }
            else
            {
                current = (order < 0) ? current.Left : current.Right;
            }
        }

        return null;
    }

    public bool Add(T item)
    {
        return AddIfNotPresent(item);
    }

    internal virtual bool AddIfNotPresent(T item)
    {
        if (root == null)
        {   // empty tree
            root = new Node(item, false);
            count = 1;
            version++;
            return true;
        }

        //
        // Search for a node at bottom to insert the new node.
        // If we can guanratee the node we found is not a 4-node, it would be easy to do insertion.
        // We split 4-nodes along the search path.
        //
        Node current = root;
        Node parent = null;
        Node grandParent = null;
        Node greatGrandParent = null;

        //even if we don't actually add to the set, we may be altering its structure (by doing rotations
        //and such). so update version to disable any enumerators/subsets working on it
        version++;

        int order = 0;
        while (current != null)
        {
            order = comparer.Compare(item, current.Item);
            if (order == 0)
            {
                // We could have changed root node to red during the search process.
                // We need to set it to black before we return.
                root.IsRed = false;
                return false;
            }

            // split a 4-node into two 2-nodes
            if (Is4Node(current))
            {
                Split4Node(current);
                // We could have introduced two consecutive red nodes after split. Fix that by rotation.
                if (IsRed(parent))
                {
                    InsertionBalance(current, ref parent, grandParent, greatGrandParent);
                }
            }
            greatGrandParent = grandParent;
            grandParent = parent;
            parent = current;
            current = (order < 0) ? current.Left : current.Right;
        }

        Debug.Assert(parent != null, "Parent node cannot be null here!");
        // ready to insert the new node
        Node node = new Node(item);
        if (order > 0)
        {
            parent.Right = node;
        }
        else
        {
            parent.Left = node;
        }

        // the new node will be red, so we will need to adjust the colors if parent node is also red
        if (parent.IsRed)
        {
            InsertionBalance(node, ref parent, grandParent, greatGrandParent);
        }

        // Root node is always black
        root.IsRed = false;
        ++count;
        return true;
    }
}
SortedDictionary is implemented almost entirely on top of TreeSet<T> (a SortedSet<T>); both lookup and insertion are carried out on a red-black tree.
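The trick that lets a set act as a dictionary is a comparer that orders key/value pairs by key alone, which is the role KeyValuePairComparer plays in the constructor above. The sketch below imitates that idea with the public SortedSet<T>; the string keys, int values, and the byKey comparer are assumptions made for the example, not the internal implementation.

```csharp
using System;
using System.Collections.Generic;

// Compare pairs by key only, ignoring the value.
var byKey = Comparer<KeyValuePair<string, int>>.Create(
    (a, b) => string.CompareOrdinal(a.Key, b.Key));
var set = new SortedSet<KeyValuePair<string, int>>(byKey);

bool addedBanana = set.Add(new KeyValuePair<string, int>("banana", 2));
bool addedApple = set.Add(new KeyValuePair<string, int>("apple", 1));

// Same key, different value: the comparer reports equality, so Add returns false.
bool addedDuplicate = set.Add(new KeyValuePair<string, int>("apple", 99));

// Lookup by key uses a probe pair with a default value,
// the same way the indexer builds its argument for FindNode.
bool hasApple = set.Contains(new KeyValuePair<string, int>("apple", default(int)));
bool hasCherry = set.Contains(new KeyValuePair<string, int>("cherry", default(int)));

KeyValuePair<string, int> smallest = set.Min;

Console.WriteLine($"{smallest.Key}={smallest.Value} {addedDuplicate} {hasApple} {hasCherry}");
```

Because only the key takes part in comparisons, the value in the probe pair is irrelevant, and every lookup is a plain descent through the tree.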
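All three classes have constructors that accept a comparer: Dictionary takes an IEqualityComparer<TKey>, while SortedDictionary and SortedList take an IComparer<TKey>. Dictionary and SortedList store their data in arrays, so both can be given an int capacity; SortedDictionary is built on a tree and has no such parameter. To sum up: Dictionary lookup, insertion, and update are O(1) on average (most of the time goes into the hash computation, and ideally each hash bucket holds a single element). SortedList lookup is O(log n), but insertion and removal have to shift the elements that follow, so they are O(n). SortedDictionary relies on a red-black tree, so lookup, insertion, and update are all O(log n).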
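As a small illustration of the capacity point, the following sketch presizes the two array-backed collections for a hypothetical known element count of 1000. SortedList exposes a Capacity property, which makes the effect observable.

```csharp
using System;
using System.Collections.Generic;

const int Expected = 1000; // hypothetical element count known in advance

var dict = new Dictionary<int, string>(Expected);       // presized hash table
var sortedList = new SortedList<int, string>(Expected); // presized key/value arrays
var sortedDict = new SortedDictionary<int, string>();   // tree: no capacity overload

int listCapacityBefore = sortedList.Capacity;
for (int i = 0; i < Expected; i++)
{
    dict[i] = "v";
    sortedList[i] = "v"; // ascending keys: always appended at the end, no shifting
    sortedDict[i] = "v";
}
int listCapacityAfter = sortedList.Capacity;

Console.WriteLine($"{listCapacityBefore} {listCapacityAfter} {dict.Count} {sortedDict.Count}");
```

The SortedList capacity is the same before and after the loop, so its arrays were never reallocated while it was being filled.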