linux源码解读（十四）：红黑树在内核的应用——红黑树原理和api解析

　 1、红黑树是一种非常重要的数据结构，有比较明显的两个特点：

插入、删除、查找的时间复杂度接近O(logN)，N是节点个数，明显比链表快；是一种性能非常稳定的二叉树！
中序遍历的结果是从小到大排好序的

　　基于以上两个特点，红黑树比较适合的应用场景:

需要动态插入、删除、查找的场景，包括但不限于：
- 　　某些数据库的增删改查，比如select * from xxx where 这类条件检索
- linux内核中进程通过红黑树组织管理，便于快速插入、删除、查找进程的task_struct
- linux内存中内存的管理：分配和回收。用红黑树组织已经分配的内存块，当应用程序调用free释放内存的时候，可以根据内存地址在红黑树中快速找到目标内存块
- hashmap中(key,value)增、删、改查的实现；java 8就采用了RBTree替代链表
- Ext3文件系统，通过红黑树组织目录项
排好序的场景，比如：
- 　　linux定时器的实现：hrtimer以红黑树的形式组织，树的最左边的节点就是最快到期的定时

　　从上述的应用场景可以看出来红黑树是非常受欢迎的一种数据结构，接下来深入分析一些典型的场景，看看linux的内核具体是怎么使用红黑树的！

2、先来看看红黑树的定义，在include\linux\rbtree.h文件中：

struct rb_node {
    unsigned long  __rb_parent_color;
    struct rb_node *rb_right;
    struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));
    /* The alignment might seem pointless, but allegedly CRIS needs it */

　　结构体非常简单，只有3个字段，凡是有一丁点开发经验的人员都会有疑问：红黑树有那么多应用场景，这个结构体居然一个应用场景的业务字段都没有，感觉就像个还没装修的毛坯房，这个该怎么用了？这恰恰是设计的精妙之处：红黑树在linux内核有大量的应用场景，如果把rb_node的定义加上了特定应用场景的业务字段，那这个结构体就只能在这个特定的场景下用了，完全没有了普适性，变成了场景紧耦合的；这样的结构体多了会增加后续代码维护的难度，所以rb_node结构体的定义就极简了，只保留了红黑树节点自身的3个属性：左孩子、右孩子、节点颜色（list_head结构体也是这个思路）；这么简单、不带业务场景属性的结构体该怎么用了？先举个简单的例子，看懂后能更快地理解linux源码的原理。比如一个班级有50个学生，每个学生有id、name和score分数，现在要用红黑树组织所有的学生，先定义一个student的结构体：

struct Student{
    int id;
    char *name;
    int scroe
    struct rb_node s_rb;
};

　　前面3个都是业务字段，第4个是红黑树的字段（student和rb_node结构体看起来是两个分开的结构体，但经过编译器编译后会合并字段，最终就是一块连续的内存，有点类似c++的继承关系）；linux提供了红黑树基本的增、删、改、查、左旋、右旋、设置颜色等操作，如下：

#define rb_parent(r)   ((struct rb_node *)((r)->rb_parent_color & ~3)) //低两位清0
#define rb_color(r)   ((r)->rb_parent_color & 1)                       //取最后一位
#define rb_is_red(r)   (!rb_color(r))                                  //最后一位为0？
#define rb_is_black(r) rb_color(r)                                     //最后一位为1？
#define rb_set_red(r)  do { (r)->rb_parent_color &= ~1; } while (0)    //最后一位置0
#define rb_set_black(r)  do { (r)->rb_parent_color |= 1; } while (0)   //最后一位置1

static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p) //设置父亲
{
    rb->rb_parent_color = (rb->rb_parent_color & 3) | (unsigned long)p;
}
static inline void rb_set_color(struct rb_node *rb, int color)          //设置颜色
{
    rb->rb_parent_color = (rb->rb_parent_color & ~1) | color;
}
//左旋、右旋
void __rb_rotate_left(struct rb_node *node, struct rb_root *root);
void __rb_rotate_right(struct rb_node *node, struct rb_root *root);
//删除节点
void rb_erase(struct rb_node *, struct rb_root *);
void __rb_erase_color(struct rb_node *node, struct rb_node *parent, struct rb_root *root);
//替换节点
void rb_replace_node(struct rb_node *old, struct rb_node *new, struct rb_root *tree);
//插入节点

void rb_link_node(struct rb_node * node, struct rb_node * parent, struct rb_node ** rb_link);

//遍历红黑树
extern struct rb_node *rb_next(const struct rb_node *); //后继
extern struct rb_node *rb_prev(const struct rb_node *); //前驱
extern struct rb_node *rb_first(const struct rb_root *);//最小值
extern struct rb_node *rb_last(const struct rb_root *); //最大值

　　上面的操作接口传入的参数都是rb_node，怎么才能用于来操作用户自定义业务场景的红黑树了，就比如上面的student结构体？既然这些接口的传入参数都是rb_node，如果不改参数和函数实现，就只能按照别人的要求传入rb_node参数，自定义结构体的字段怎么才能“顺带”加入红黑树了？这个也简单，自己生成结构体，然后把结构体的rb_node参数传入即可，如下：

/*
 将对象加到红黑树上
 s_root            红黑树root节点
 ptr_stu        对象指针
 rb_link        对象节点所在的节点
 rb_parent        父节点
 */
void student_link_rb(struct rb_root *s_root, struct Student *ptr_stu,
        struct rb_node **rb_link, struct rb_node *rb_parent)
{
    rb_link_node(&ptr_stu->s_rb, rb_parent, rb_link);
    rb_insert_color(&ptr_stu->s_rb, s_root);
}

void add_student(struct rb_root *s_root, struct Student *stu, struct Student **stu_header)
{
    struct rb_node **rb_link, *rb_parent;
    // 插入红黑树
    student_link_rb(s_root, stu, rb_link, rb_parent);
}

　　假如以score分数作为构建红黑树的key，构建的树如下：每个student节点的rb_right和rb_left指针指向的都是rb_node的起始地址，也就是_rb_parent_color的值，但是score、name、id这些值其实才是业务上急需读写的，怎么得到这些字段的值了?

linux的开发人员早就想好了读取的方法：先得到student实例的开始地址，再通过偏移读字段不就行了么？如下：

#define container_of(ptr, type, member) ({                \
    const typeof( ((type *)0)->member ) *__mptr = (ptr);  \
    (type *)( (char *)__mptr - offsetof(type,member) );})

　　通过上面的宏定义就能得到student实例的首地址了，用法如下：调用container_of方法，传入rbnode的实例（确认student实例的位置）、student结构体和内部rb_node的位置（用以计算rb_node在结构体内部的偏移，然后反推student实例的首地址）：得到student实例的首地址，接下来就可以愉快的直接使用id、name、score等字段了；

struct Student* find_by_id(struct rb_root *root, int id)
{
    struct Student *ptr_stu = NULL;
    struct rb_node *rbnode = root->rb_node;
    while (NULL != rbnode)
    {
//最核心的代码：三个参数分别时rb_node的实例，student结构体的定义和内部的rb_node字段位置
        struct Student *ptr_tmp = container_of(rbnode, struct Student, s_rb);
        if (id < ptr_tmp->id)
        {
            rbnode = rbnode->rb_left;
        }
        else if (id > ptr_tmp->id)
        {
            rbnode = rbnode->rb_right;
        }
        else
        {
            ptr_stu = ptr_tmp;
            break;
        }
    }
    return ptr_stu;
}

　　总结一下红黑树使用的大致流程：

开发人员根据业务场景需求定义结构体的字段，务必包含rb_node；
生成结构体的实例stu，调用rb_link_node添加节点构建红黑树。当然传入的参数是stu->s_rb
遍历查找的时候根据找s_rb实例、自定义结构体、rb_node在结构体的名称得到自定义结构体实例的首地址，然后就能愉快的读写业务字段了！

3、上述的案例够简单吧，linux内部各种复杂场景使用红黑树的原理和这个一毛一样，没有任何本质区别！理解了上述案例的原理，也就理解了linux内核使用红黑树的原理！接下来看看红黑树一些关机api实现的方法了：

（1）红黑树是排好序的，中序遍历的结果就是从小到大排列的；最左边就是整棵树的最小节点，所以一直向左就能找到第一个、也是最小的节点；

/*
 * This function returns the first node (in sort order) of the tree.
 */
struct rb_node *rb_first(const struct rb_root *root)
{
    struct rb_node    *n;

    n = root->rb_node;
    if (!n)
        return NULL;
    while (n->rb_left)
        n = n->rb_left;
    return n;
}

　　同理：一路向右能找到整棵树最大的节点

struct rb_node *rb_last(const struct rb_root *root)
{
    struct rb_node    *n;

    n = root->rb_node;
    if (!n)
        return NULL;
    while (n->rb_right)
        n = n->rb_right;
    return n;
}

　　（2）找到某个节点下一个节点：比如A节点数值是50，从A节点的右孩开始（右孩所有节点都比A大），往左找 as far as get null；也就是整个树中比A大的最小节点；这个功能可以用来做条件查询！

struct rb_node *rb_next(const struct rb_node *node)
{
    struct rb_node *parent;

    if (RB_EMPTY_NODE(node))
        return NULL;

    /*
     * If we have a right-hand child, go down and then left as far
     * as we can.
     */
    if (node->rb_right) {
        node = node->rb_right;
        while (node->rb_left)
            node=node->rb_left;
        return (struct rb_node *)node;
    }

    /*
     * No right-hand children. Everything down and left is smaller than us,
     * so any 'next' node must be in the general direction of our parent.
     * Go up the tree; any time the ancestor is a right-hand child of its
     * parent, keep going up. First time it's a left-hand child of its
     * parent, said parent is our 'next' node.
     */
    while ((parent = rb_parent(node)) && node == parent->rb_right)
        node = parent;

    return parent;
}

　　同理，找到整个树中比A小的最大节点：

struct rb_node *rb_prev(const struct rb_node *node)
{
    struct rb_node *parent;

    if (RB_EMPTY_NODE(node))
        return NULL;

    /*
     * If we have a left-hand child, go down and then right as far
     * as we can.
     */
    if (node->rb_left) {
        node = node->rb_left;
        while (node->rb_right)
            node=node->rb_right;
        return (struct rb_node *)node;
    }

    /*
     * No left-hand children. Go up till we find an ancestor which
     * is a right-hand child of its parent.
     */
    while ((parent = rb_parent(node)) && node == parent->rb_left)
        node = parent;

    return parent;
}

　　（3）替换一个节点：把周围的指针改向，然后改节点颜色

void rb_replace_node(struct rb_node *victim, struct rb_node *new,
             struct rb_root *root)
{
    struct rb_node *parent = rb_parent(victim);

    /* Set the surrounding nodes to point to the replacement */
    __rb_change_child(victim, new, parent, root);
    if (victim->rb_left)
        rb_set_parent(victim->rb_left, new);
    if (victim->rb_right)
        rb_set_parent(victim->rb_right, new);

    /* Copy the pointers/colour from the victim to the replacement */
    *new = *victim;
}

　　（4）插入一个节点：分不同情况左旋、右旋；

static __always_inline void
__rb_insert(struct rb_node *node, struct rb_root *root,
        void (*augment_rotate)(struct rb_node *old, struct rb_node *new))
{
    struct rb_node *parent = rb_red_parent(node), *gparent, *tmp;

    while (true) {
        /*
         * Loop invariant: node is red
         *
         * If there is a black parent, we are done.
         * Otherwise, take some corrective action as we don't
         * want a red root or two consecutive red nodes.
         */
        if (!parent) {
            rb_set_parent_color(node, NULL, RB_BLACK);
            break;
        } else if (rb_is_black(parent))
            break;

        gparent = rb_red_parent(parent);

        tmp = gparent->rb_right;
        if (parent != tmp) {    /* parent == gparent->rb_left */
            if (tmp && rb_is_red(tmp)) {
                /*
                 * Case 1 - color flips
                 *
                 *       G            g
                 *      / \          / \
                 *     p   u  -->   P   U
                 *    /            /
                 *   n            n
                 *
                 * However, since g's parent might be red, and
                 * 4) does not allow this, we need to recurse
                 * at g.
                 */
                rb_set_parent_color(tmp, gparent, RB_BLACK);
                rb_set_parent_color(parent, gparent, RB_BLACK);
                node = gparent;
                parent = rb_parent(node);
                rb_set_parent_color(node, parent, RB_RED);
                continue;
            }

            tmp = parent->rb_right;
            if (node == tmp) {
                /*
                 * Case 2 - left rotate at parent
                 *
                 *      G             G
                 *     / \           / \
                 *    p   U  -->    n   U
                 *     \           /
                 *      n         p
                 *
                 * This still leaves us in violation of 4), the
                 * continuation into Case 3 will fix that.
                 */
                tmp = node->rb_left;
                WRITE_ONCE(parent->rb_right, tmp);
                WRITE_ONCE(node->rb_left, parent);
                if (tmp)
                    rb_set_parent_color(tmp, parent,
                                RB_BLACK);
                rb_set_parent_color(parent, node, RB_RED);
                augment_rotate(parent, node);
                parent = node;
                tmp = node->rb_right;
            }

            /*
             * Case 3 - right rotate at gparent
             *
             *        G           P
             *       / \         / \
             *      p   U  -->  n   g
             *     /                 \
             *    n                   U
             */
            WRITE_ONCE(gparent->rb_left, tmp); /* == parent->rb_right */
            WRITE_ONCE(parent->rb_right, gparent);
            if (tmp)
                rb_set_parent_color(tmp, gparent, RB_BLACK);
            __rb_rotate_set_parents(gparent, parent, root, RB_RED);
            augment_rotate(gparent, parent);
            break;
        } else {
            tmp = gparent->rb_left;
            if (tmp && rb_is_red(tmp)) {
                /* Case 1 - color flips */
                rb_set_parent_color(tmp, gparent, RB_BLACK);
                rb_set_parent_color(parent, gparent, RB_BLACK);
                node = gparent;
                parent = rb_parent(node);
                rb_set_parent_color(node, parent, RB_RED);
                continue;
            }

            tmp = parent->rb_left;
            if (node == tmp) {
                /* Case 2 - right rotate at parent */
                tmp = node->rb_right;
                WRITE_ONCE(parent->rb_left, tmp);
                WRITE_ONCE(node->rb_right, parent);
                if (tmp)
                    rb_set_parent_color(tmp, parent,
                                RB_BLACK);
                rb_set_parent_color(parent, node, RB_RED);
                augment_rotate(parent, node);
                parent = node;
                tmp = node->rb_left;
            }

            /* Case 3 - left rotate at gparent */
            WRITE_ONCE(gparent->rb_right, tmp); /* == parent->rb_left */
            WRITE_ONCE(parent->rb_left, gparent);
            if (tmp)
                rb_set_parent_color(tmp, gparent, RB_BLACK);
            __rb_rotate_set_parents(gparent, parent, root, RB_RED);
            augment_rotate(gparent, parent);
            break;
        }
    }
}

　　rb_node最牛逼的地方：去掉了业务属性的字段，和业务场景松耦合，让rb_node结构体和对应的方法可以做到在不同的业务场景通用；同时配合container_of函数，又能通过rb_node实例地址快速反推出业务结构体实例的首地址，方便读写业务属性的字段，这种做法高！实在是高！

　　4、红黑树为什么这么牛？个人认为最核心的要点在于其动态的高度调整！换句话说：在增、删、改的过程中，为了避免红黑树退化成单向链表，红黑树会动态地调整树的高度，让树高不超过2lg(n+1)；相比AVL 树，红黑树只需维护一个黑高度，效率高很多；这样一来，增删改查的时间复杂度就控制在了O(lgn)! 那么红黑树又是怎么控制树高度的了？就是红黑树那5条规则（这不是废话么？）！最核心的就是第4、5点！

（1）先看看第4点：任何相邻的节点都不能同时为红色，也就是说：红节点是被黑节点隔开的；随意选一条从根节点到叶子节点的路径，因为要满足这点，所以每存在一个红节点，至少对应了一个黑节点，即红色节点个数<=黑色节点个数；假如黑色节点数量是n，那么整棵树节点的数量<=2n;

　（2）再看看第5点：每个节点，从该节点到达其可达叶子节点的所有路径，都包含相同数目的黑色节点；新加入的节点初始颜色是红色，如果其父节点也是红色，就需要挨个往上回溯更改每个父节点的颜色了！更改颜色后如果打破了第5点，就需要通过旋转重构红黑树，本质上是降低整棵树的高度，避免整棵树退化成链表，举个例子：初始红黑树如下：

增加8节点，节点初始是红色，是7节点的右子节点；因为7节点也是红色，所以要调整成黑色；但是这样一来，2->4->6->7就有3个黑节点了，这时需要继续往上回溯6、4、2节点，分别更改这3个节点的颜色，导致根节点2成了红色，同时5和6都是红色，这两个节点都不符合规定；此时再左旋4节点，让4来做根节点，降低了树的高度，后续再增删改查时还是能保持时间复杂度是O(n)!

参考：

1、https://www.bilibili.com/video/BV135411h7wJ?p=1 红黑树介绍

2、https://cloud.tencent.com/developer/article/1922776 数据结构红黑树

3、https://blog.csdn.net/weixin_46381158/article/details/117999284 红黑树基本用法

4、https://rbtree.phpisfuture.com/ 红黑树在线演示

5、https://segmentfault.com/a/1190000023101310 红黑树前世今生

posted @ 2022-01-13 21:42 第七子007 阅读(2147) 评论(0) 编辑收藏举报

刷新页面返回顶部

第七子007

linux源码解读（十四）：红黑树在内核的应用——红黑树原理和api解析

公告