数据结构 - 10 - 跳表（SkipList）

1.结构

单链表和向量实现难度低，维护容易却查找效率低，AVL树、红黑树等高级数据结构效率高但较为复杂，难以维护。

SkipList的实现较红黑树简单得多，而且同时保证了查询和维护等操作平均仅需O(logn)时间。

不难看出，跳表的结构大致可以被描述为：带索引的有序的链表

底层的链表存储了所有数据，而上层的链表则作为索引存在。

普通单链表节点的指针域中仅有一个next指向后继元素，双向链表中增加了一个前驱。

而在跳表中，一个节点中可以有4个指针，分别指向前驱、后继、上邻、下邻。

2.SkipListNode

根据跳表的定义很容易设计出这样一个类

key：关键码

value：数据域

pre，next，up，down：分别指向前后上下的节点

public class SkipListNode<T> {

    public int key;

    public T value;

    /** 前驱、后继 */
    public SkipListNode<T> pre, next;

    /** 上邻、下邻 */
    public SkipListNode<T> up, down;

    public static final int HEAD_KEY = Integer.MIN_VALUE;
    public static final int TAIL_KEY = Integer.MAX_VALUE;

    public SkipListNode(int k, T v) {
        key = k;
        value = v;
    }

    public int getKey() {
        return key;
    }

    public void setKey(int key) {
        this.key = key;
    }

    public T getValue() {
        return value;
    }

    public void setValue(T value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }
        if (object == null) {
            return false;
        }
        if (!(object instanceof SkipListNode<?>)) {
            return false;
        }
        SkipListNode<T> skipListNode;
        try {
            skipListNode = (SkipListNode<T>) object;
        } catch (ClassCastException ex) {
            return false;
        }
        return (skipListNode.getKey() == key) && (skipListNode.getValue() == value);
    }

    @Override
    public String toString() {
        return "key-value:" + key + "," + value;
    }
}

3.SkipList

包含了几个重要的操作：查找（get），插入（put），删除（remove），加层（addEmptyLevel）

查找（get）

最重要的方法，其他的几个方法都是以查找为基础的。

实际进行查找操作的方法是findNode。

查找的过程很清晰，从顶层的头节点开始顺序查找，一旦遇到下一个关键码大于目标关键码（确定了目标在下层的区间），则转入下层查找。

目标关键码在跳表中不一定存在，则此时findNode返回的将是小于且距离目标关键码的值最近的节点。

    /**
     * 如果传入的key值在跳跃表中不存在，则findNode返回跳跃表中key值小于key，并且key值相差最小的底层节点
     * 所以不能用此方法来代替get
     * @param key key
     * @return key值相差最小的底层节点
     */
    public SkipListNode<T> findNode(int key) {
        SkipListNode<T> p = head;
        while (true) {
            while (p.next.key != SkipListNode.TAIL_KEY && p.next.key <= key) {
                p = p.next;
            }
            //转入下层
            if (p.down != null) {
                p = p.down;
            } else {
                break;
            }
        }
        return p;
    }

因为存在查找失败的情况，所以要给findNode加一层判断，当findNode返回的节点确实等于目标关键码而非最近小于时，才将该节点返回。

    public SkipListNode<T> get(int key) {
        SkipListNode<T> p = findNode(key);
        if (p.key == key) {
            return p;
        }
        return null;
    }

加层（addEmptyLevel）

这个没什么好说的，将各个变量初始化，listLevel++

    public void addEmptyLevel() {
        SkipListNode<T> p1 = new SkipListNode<T>(SkipListNode.HEAD_KEY, null);
        SkipListNode<T> p2 = new SkipListNode<T>(SkipListNode.TAIL_KEY, null);
        p1.next = p2;
        p1.down = head;
        p2.pre = p1;
        p2.down = tail;
        head.up = p1;
        tail.up = p2;
        head = p1;
        tail = p2;
        listLevel++;
    }

插入（put）

实际的插入操作是由insertNode方法完成的。

单纯的插入操作其实很简单，insertNode要求传入待插入节点的前驱。

    /**
     * 单层插入
     * @param p 前驱
     * @param q 待插入的节点
     */
    private void insertNode(SkipListNode<T> p, SkipListNode<T> q) {
        q.next = p.next;
        q.pre = p;
        p.next.pre = q;
        p.next = q;
    }

首先在跳表中findNode，由上可知，findNode的返回值分两种情况：第一种，当key已存在，关键码不能重复，所以终止插入操作；第二种，当key不存在，此时findNode返回最近小于key的节点，正好将这个节点作为待插入节点的前驱，调用insertNode方法即可。

插入操作进行到这里已经完成了，但插入新节点后，还涉及到比较复杂的升层问题。

首先考虑SkipList的空间复杂度，SkipList是基于有序链表实现的，不难看出，为了加快查找速度，SkipList为有序链表建立了索引，即以空间换时间，牺牲了一定的空间效率换取了更高的查找效率，这算得上是比较常用的手段。但是，SkipList的层数并非越多越好，索引也不是越密集越好，如果没有一定的策略，在浪费了空间的同时，时间的效率也无法得到保证。

网上主流的随机策略是升层的概率逐层递减为1/2，根据数学归纳法计算数学期望，总体的消耗量将仅为O(n)。

（证明见《数据结构与算法分析（C++语言版）第3版》邓俊辉 P254）

在jdk实现的ConcurrentSkipListMap中，随机策略有所不同。

    /**
     * 插入
     * put方法有一些需要注意的步骤：
     * 1.如果put的key值在跳跃表中存在，则进行修改操作；
     * 2.如果put的key值在跳跃表中不存在，则需要进行新增节点的操作，并且需要由random随机数决定新加入的节点的高度（最大level）；
     * 3.当新添加的节点高度达到跳跃表的最大level，需要添加一个空白层（除了-oo和+oo没有别的节点）
     * @param key key
     * @param value value
     */
    public void put(int key, T value) {
        SkipListNode<T> p = findNode(key);
        //如果已经存在，则终止插入操作
        if (p.key == key) {
            p.value = value;
            return;
        }

        SkipListNode<T> q = new SkipListNode<>(key, value);
        insertNode(p, q);

        //随机决定是否升层
        int currentLevel = 0;
        while (random.nextDouble() > PROBABILITY) {
            if (currentLevel >= listLevel) {
                addEmptyLevel();
            }
            while (p.up == null) {
                System.out.println(p);
                p = p.pre;
            }
            p = p.up;
            SkipListNode<T> z = new SkipListNode<>(key, null);
            insertNode(p, z);
            z.down = q;
            q.up = z;
            //把指针移到上一层
            q = z;
            currentLevel++;
        }
        size++;
    }

删除（remove）

删除操作本质上和普通链表的删除没有什么区别，唯一的不同是要注意其他层的节点要一并删除。

    /**
     * 首先查找到包含key值的节点，将节点从链表中移除，接着如果有更高level的节点，则重复这个操作即可。
     * @param key key
     * @return T(删除的数据)
     */
    public T remove(int key) {
        SkipListNode<T> p = get(key);
        if (p == null) {
            return null;
        }
        T oldV = p.value;
        SkipListNode<T> q;
        while (p != null) {
            q = p.next;
            q.pre = p.pre;
            p.pre.next = q;
            p = p.up;
        }
        return oldV;
    }

全部代码

public class SkipList<T> {

    private SkipListNode<T> head, tail;

    private int size;
    private int listLevel;

    private final Random random;
    private static final double PROBABILITY = 0.5;

    public SkipList() {
        head = new SkipListNode<T>(SkipListNode.HEAD_KEY, null);
        tail = new SkipListNode<>(SkipListNode.TAIL_KEY, null);
        head.next = tail;
        tail.pre = head;
        size = 0;
        listLevel = 0;
        random = new Random();
    }

    /**
     * 根据key查找
     * @param key key
     * @return SkipListNode<T>
     */
    public SkipListNode<T> get(int key) {
        SkipListNode<T> p = findNode(key);
        if (p.key == key) {
            return p;
        }
        return null;
    }

    /**
     * 首先查找到包含key值的节点，将节点从链表中移除，接着如果有更高level的节点，则repeat这个操作即可。
     * @param key key
     * @return T(删除的数据)
     */
    public T remove(int key) {
        SkipListNode<T> p = get(key);
        if (p == null) {
            return null;
        }
        T oldV = p.value;
        SkipListNode<T> q;
        while (p != null) {
            q = p.next;
            q.pre = p.pre;
            p.pre.next = q;
            p = p.up;
        }
        return oldV;
    }

    /**
     * 插入
     * put方法有一些需要注意的步骤：
     * 1.如果put的key值在跳跃表中存在，则进行修改操作；
     * 2.如果put的key值在跳跃表中不存在，则需要进行新增节点的操作，并且需要由random随机数决定新加入的节点的高度（最大level）；
     * 3.当新添加的节点高度达到跳跃表的最大level，需要添加一个空白层（除了-oo和+oo没有别的节点）
     * @param key key
     * @param value value
     */
    public void put(int key, T value) {
        SkipListNode<T> p = findNode(key);
        //如果已经存在，则终止插入操作
        if (p.key == key) {
            p.value = value;
            return;
        }

        SkipListNode<T> q = new SkipListNode<>(key, value);
        insertNode(p, q);

        //随机决定是否升层
        int currentLevel = 0;
        while (random.nextDouble() > PROBABILITY) {
            if (currentLevel >= listLevel) {
                addEmptyLevel();
            }
            while (p.up == null) {
                System.out.println(p);
                p = p.pre;
            }
            p = p.up;
            SkipListNode<T> z = new SkipListNode<>(key, null);
            insertNode(p, z);
            z.down = q;
            q.up = z;
            //把指针移到上一层
            q = z;
            currentLevel++;
        }
        size++;
    }

    /**
     * 如果传入的key值在跳跃表中不存在，则findNode返回跳跃表中key值小于key，并且key值相差最小的底层节点;
     * 所以不能用此方法来代替get
     * @param key key
     * @return key值相差最小的底层节点
     */
    public SkipListNode<T> findNode(int key) {
        SkipListNode<T> p = head;
        while (true) {
            while (p.next.key != SkipListNode.TAIL_KEY && p.next.key <= key) {
                p = p.next;
            }
            //转入下层
            if (p.down != null) {
                p = p.down;
            } else {
                break;
            }
        }
        return p;
    }

    public boolean isEmpty() {
        return size == 0;
    }

    public int size() {
        return size;
    }

    /**
     * 添加新的一层
     */
    public void addEmptyLevel() {
        SkipListNode<T> p1 = new SkipListNode<T>(SkipListNode.HEAD_KEY, null);
        SkipListNode<T> p2 = new SkipListNode<T>(SkipListNode.TAIL_KEY, null);
        p1.next = p2;
        p1.down = head;
        p2.pre = p1;
        p2.down = tail;
        head.up = p1;
        tail.up = p2;
        head = p1;
        tail = p2;
        listLevel++;
    }

    /**
     * 单层插入
     * @param p 前驱
     * @param q 待插入的节点
     */
    private void insertNode(SkipListNode<T> p, SkipListNode<T> q) {
        q.next = p.next;
        q.pre = p;
        p.next.pre = q;
        p.next = q;
    }

    public int getLevel() {
        return listLevel;
    }
}

SkipList.java

4.java.util.concurrent.ConcurrentSkipListMap

ConcurrentSkipListMap是线程安全的有序的哈希表，适用于高并发的场景。
ConcurrentSkipListMap和TreeMap，它们虽然都是有序的哈希表。但是，第一，它们的线程安全机制不同，TreeMap是非线程安全的，而ConcurrentSkipListMap是线程安全的。第二，ConcurrentSkipListMap是通过跳表实现的，而TreeMap是通过红黑树实现的。

ConcurrentSkipListMap内部实现的跳表主要有三个方法：doGet（查找），doPut（插入），doRemove（删除）

    /**
     * Gets value for key. Almost the same as findNode, but returns
     * the found value (to avoid retries during re-reads)
     *
     * @param key the key
     * @return the value, or null if absent
     */
    private V doGet(Object key) {
        if (key == null)
            throw new NullPointerException();
        Comparator<? super K> cmp = comparator;
        outer: for (;;) {
            for (Node<K,V> b = findPredecessor(key, cmp), n = b.next;;) {
                Object v; int c;
                if (n == null)
                    break outer;
                Node<K,V> f = n.next;
                if (n != b.next)                // inconsistent read
                    break;
                if ((v = n.value) == null) {    // n is deleted
                    n.helpDelete(b, f);
                    break;
                }
                if (b.value == null || v == n)  // b is deleted
                    break;
                if ((c = cpr(cmp, key, n.key)) == 0) {
                    @SuppressWarnings("unchecked") V vv = (V)v;
                    return vv;
                }
                if (c < 0)
                    break outer;
                b = n;
                n = f;
            }
        }
        return null;
    }

    /* ---------------- Insertion -------------- */

    /**
     * Main insertion method.  Adds element if not present, or
     * replaces value if present and onlyIfAbsent is false.
     * @param key the key
     * @param value the value that must be associated with key
     * @param onlyIfAbsent if should not insert if already present
     * @return the old value, or null if newly inserted
     */
    private V doPut(K key, V value, boolean onlyIfAbsent) {
        Node<K,V> z;             // added node
        if (key == null)
            throw new NullPointerException();
        Comparator<? super K> cmp = comparator;
        outer: for (;;) {
            for (Node<K,V> b = findPredecessor(key, cmp), n = b.next;;) {
                if (n != null) {
                    Object v; int c;
                    Node<K,V> f = n.next;
                    if (n != b.next)               // inconsistent read
                        break;
                    if ((v = n.value) == null) {   // n is deleted
                        n.helpDelete(b, f);
                        break;
                    }
                    if (b.value == null || v == n) // b is deleted
                        break;
                    if ((c = cpr(cmp, key, n.key)) > 0) {
                        b = n;
                        n = f;
                        continue;
                    }
                    if (c == 0) {
                        if (onlyIfAbsent || n.casValue(v, value)) {
                            @SuppressWarnings("unchecked") V vv = (V)v;
                            return vv;
                        }
                        break; // restart if lost race to replace value
                    }
                    // else c < 0; fall through
                }

                z = new Node<K,V>(key, value, n);
                if (!b.casNext(n, z))
                    break;         // restart if lost race to append to b
                break outer;
            }
        }

        int rnd = ThreadLocalRandom.nextSecondarySeed();
        if ((rnd & 0x80000001) == 0) { // test highest and lowest bits
            int level = 1, max;
            while (((rnd >>>= 1) & 1) != 0)
                ++level;
            Index<K,V> idx = null;
            HeadIndex<K,V> h = head;
            if (level <= (max = h.level)) {
                for (int i = 1; i <= level; ++i)
                    idx = new Index<K,V>(z, idx, null);
            }
            else { // try to grow by one level
                level = max + 1; // hold in array and later pick the one to use
                @SuppressWarnings("unchecked")Index<K,V>[] idxs =
                    (Index<K,V>[])new Index<?,?>[level+1];
                for (int i = 1; i <= level; ++i)
                    idxs[i] = idx = new Index<K,V>(z, idx, null);
                for (;;) {
                    h = head;
                    int oldLevel = h.level;
                    if (level <= oldLevel) // lost race to add level
                        break;
                    HeadIndex<K,V> newh = h;
                    Node<K,V> oldbase = h.node;
                    for (int j = oldLevel+1; j <= level; ++j)
                        newh = new HeadIndex<K,V>(oldbase, newh, idxs[j], j);
                    if (casHead(h, newh)) {
                        h = newh;
                        idx = idxs[level = oldLevel];
                        break;
                    }
                }
            }
            // find insertion points and splice in
            splice: for (int insertionLevel = level;;) {
                int j = h.level;
                for (Index<K,V> q = h, r = q.right, t = idx;;) {
                    if (q == null || t == null)
                        break splice;
                    if (r != null) {
                        Node<K,V> n = r.node;
                        // compare before deletion check avoids needing recheck
                        int c = cpr(cmp, key, n.key);
                        if (n.value == null) {
                            if (!q.unlink(r))
                                break;
                            r = q.right;
                            continue;
                        }
                        if (c > 0) {
                            q = r;
                            r = r.right;
                            continue;
                        }
                    }

                    if (j == insertionLevel) {
                        if (!q.link(r, t))
                            break; // restart
                        if (t.node.value == null) {
                            findNode(key);
                            break splice;
                        }
                        if (--insertionLevel == 0)
                            break splice;
                    }

                    if (--j >= insertionLevel && j < level)
                        t = t.down;
                    q = q.down;
                    r = q.right;
                }
            }
        }
        return null;
    }

    /* ---------------- Deletion -------------- */

    /**
     * Main deletion method. Locates node, nulls value, appends a
     * deletion marker, unlinks predecessor, removes associated index
     * nodes, and possibly reduces head index level.
     *
     * Index nodes are cleared out simply by calling findPredecessor.
     * which unlinks indexes to deleted nodes found along path to key,
     * which will include the indexes to this node.  This is done
     * unconditionally. We can't check beforehand whether there are
     * index nodes because it might be the case that some or all
     * indexes hadn't been inserted yet for this node during initial
     * search for it, and we'd like to ensure lack of garbage
     * retention, so must call to be sure.
     *
     * @param key the key
     * @param value if non-null, the value that must be
     * associated with key
     * @return the node, or null if not found
     */
    final V doRemove(Object key, Object value) {
        if (key == null)
            throw new NullPointerException();
        Comparator<? super K> cmp = comparator;
        outer: for (;;) {
            for (Node<K,V> b = findPredecessor(key, cmp), n = b.next;;) {
                Object v; int c;
                if (n == null)
                    break outer;
                Node<K,V> f = n.next;
                if (n != b.next)                    // inconsistent read
                    break;
                if ((v = n.value) == null) {        // n is deleted
                    n.helpDelete(b, f);
                    break;
                }
                if (b.value == null || v == n)      // b is deleted
                    break;
                if ((c = cpr(cmp, key, n.key)) < 0)
                    break outer;
                if (c > 0) {
                    b = n;
                    n = f;
                    continue;
                }
                if (value != null && !value.equals(v))
                    break outer;
                if (!n.casValue(v, null))
                    break;
                if (!n.appendMarker(f) || !b.casNext(n, f))
                    findNode(key);                  // retry via findNode
                else {
                    findPredecessor(key, cmp);      // clean index
                    if (head.right == null)
                        tryReduceLevel();
                }
                @SuppressWarnings("unchecked") V vv = (V)v;
                return vv;
            }
        }
        return null;
    }

java.util.concurrent.ConcurrentSkipListMap

实现方法大致相同，值得注意的是其采用的随机策略。

(rnd & 0x80000001) == 0

0x80000001展开为二进制为10000000000000000000000000000001，即只有最高位和最低位为1

排除了负数（负数最高位为1）和奇数（奇数最低位为1），即只有随机数rnd为正偶数的情况下，才会考虑升层。

而升层的次数等于rnd倒数第二位开始的1的位数。

        int rnd = ThreadLocalRandom.nextSecondarySeed();
        if ((rnd & 0x80000001) == 0) { // test highest and lowest bits
            int level = 1, max;
            while (((rnd >>>= 1) & 1) != 0)
                ++level;
            ...
        }

5.ZSET与SkipList

Redis提供的五种数据结构之一的ZSET（有序集合），其底层由跳表实现（zskiplist）

server.h（Redis由C语言实现）

#define ZSKIPLIST_MAXLEVEL 64 /* Should be enough for 2^64 elements */
#define ZSKIPLIST_P 0.25      /* Skiplist P = 1/4 */
/* ZSETs use a specialized version of Skiplists */
typedef struct zskiplistNode {
    sds ele;
    double score;
    struct zskiplistNode *backward;
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned long span;
    } level[];
} zskiplistNode;

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;

typedef struct zset {
    dict *dict;
    zskiplist *zsl;
} zset;

zskiplist是限高的，其随机升层的策略如下

  /* Returns a random level for the new skiplist node we are going to create.
   * The return value of this function is between 1 and ZSKIPLIST_MAXLEVEL
   * (both inclusive), with a powerlaw-alike distribution where higher
   * levels are less likely to be returned. */
  int zslRandomLevel(void) {
      int level = 1;
      while ((random()&0xFFFF) < (ZSKIPLIST_P * 0xFFFF))
          level += 1;
      return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
  }

参考资料：《数据结构与算法分析（C++语言版）第3版》邓俊辉

　　　　　 Redis源码阅读

　　　　　Java并发集合（二）-ConcurrentSkipListMap分析和使用

　　　　　死磕 java集合之ConcurrentSkipListMap源码分析

　　　　　skiplist(跳表)的原理及JAVA实现

posted @ 2020-09-02 23:10 CofJus 阅读(392) 评论(0) 编辑收藏举报

刷新页面返回顶部

CofJus

Talk is cheap,but sometimes necessary.

数据结构 - 10 - 跳表（SkipList）

公告