Inheritance and Polymorphism

References:

https://juejin.cn/post/6857057314504081416
https://www.zhihu.com/question/389546003/answer/1194780618
https://www.zhihu.com/question/29251261/answer/1297439131
https://stackoverflow.com/questions/18035293/what-is-early-static-and-late-dynamic-binding-in-c
Test environment: OS: CentOS 8.5 / kernel: 4.18.0 / gcc: 8.5.0 / arch: x86-64

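1. Overview

Polymorphism is one of the three major features C++ supports (alongside encapsulation and inheritance). Through inheritance, a single set of virtual function interfaces declared in a base class can have many different implementations in derived classes.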

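2. Object memory model

Note that this section shows the memory model of objects. An instantiated object owns only the non-static data members of its class. Member functions are shared by all objects and do not belong to any single object; static members are shared by the whole program and likewise belong to no object. Virtual function tables (vtables) are also shared by all objects of a class.

2.1 Single inheritance without virtual functions

Consider the following code: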

class Base {
public:
  ~Base() {}
  void interface() {}
public:
  int b;
};

class Derived : public Base {
public:
  ~Derived() {}
  void interface() {}
public:
  int d;
};

int main() {
  Derived d;
  return 0;
}

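Dump the object memory layout with clang -Xclang -fdump-record-layouts test.cpp.

The dump shows that Derived inherits from Base and that the inherited Base member appears at the start of Derived (offset 0). Derived's own member int d is placed after the Base part, at offset sizeof(int), that is, the size of member b.
Within a Derived object, the inherited Base part is called a (base class) subobject.
Note that even if int b were private in Base, Derived would still inherit it: the member still occupies space in every Derived object, but Derived has no access to it.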

2.2 Multiple inheritance without virtual functions

Consider the following code:

class Base01 {
public:
  ~Base01() {}
  void interface() {}
public:
  int b01;
};

class Base02 {
public:
  ~Base02() {}
  void interface() {}
public:
  int b02;
};

class Derived : public Base01, public Base02 {
public:
  ~Derived() {}
  void interface() {}
public:
  int d;
};

int main() {
  Derived d;
  return 0;
}

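Dump the object memory layout with clang -Xclang -fdump-record-layouts test.cpp.

The dump shows that in a Derived object the members of the first base, Base01, sit at offset 0; the members of the second base, Base02, follow Base01 at offset sizeof(int), the size of b01; and Derived's own member int d comes after both bases at offset sizeof(int) + sizeof(int), the combined size of b01 and b02.
In other words, multiple inheritance resembles single inheritance: the subobjects are laid out back to back, starting at the derived object's start address.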

2.3 Single inheritance with virtual functions

Consider the following code:

class Base {
public:
  virtual ~Base() {}
  virtual void interface() {}
public:
  int b;
};

class Derived : public Base {
public:
  ~Derived() {}
  void interface() override {}
public:
  int d;
};

int main() {
  Derived d;
  return 0;
}

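Dump the object memory layout with clang -Xclang -fdump-record-layouts test.cpp.

The dump shows that in both base and derived objects the offset of int b changes from 0 to 8. Those 8 bytes hold the Base vtable pointer, that is, the virtual table pointer (vptr). Every Base object and every Derived object carries one vptr: in a Base object it points to Base's vtable, and in a Derived object it points to Derived's vtable.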

2.4 Multiple inheritance with virtual functions

Consider the following code:

class Base01 {
public:
  virtual ~Base01() {}
  virtual void interface() {}
public:
  int b01;
};

class Base02 {
public:
  virtual ~Base02() {}
  virtual void interface() {}
public:
  int b02;
};

class Derived : public Base01, public Base02 {
public:
  ~Derived() {}
  void interface() override {}
public:
  int d;
};

int main() {
  Derived d;
  return 0;
}

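Dump the object memory layout with clang -Xclang -fdump-record-layouts test.cpp.

The dump shows that because both base classes of Derived have virtual functions, a Derived object contains two vptrs.

3. Vtable structure

Building an executable goes through four stages: preprocessing, compilation, assembly, and linking. Vtables are generated during compilation and are generally stored in a read-only data section (.rodata).
With single inheritance (where virtual functions are involved), each class has one vtable, and the compiler inserts one vptr into every instantiated object; the vptrs of all objects of the same class point to the same vtable.
With multiple inheritance, each class still has only one vtable, no matter how many of its bases have virtual functions. An instantiated object, however, gets several compiler-inserted vptrs, each pointing to a different position in that table.
The object memory layouts discussed above do not contain the vtable itself, because a vtable does not belong to any object. This section looks at the structure of the vtable.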

3.1 Vtable under single inheritance

Consider the following code:

#include <stdio.h>

class Base {
public:
  virtual ~Base() {}
  virtual void interface() {
    printf("base interface called\n");
  }
public:
  int b;
};

class Derived : public Base {
public:
  ~Derived() {}
  void interface() override {
    printf("derived interface called\n");
  }
public:
  int d;
};

typedef void(*func)();
int main() {
  Derived obj;
  long long* vptr = (long long*)(*(long long*)&obj);
  printf("vptr: %p\n", (void*)vptr);  // %p is the correct format for a pointer
  func f = (func)*(vptr+2);
  f();
  return 0;
}

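Dump the vtable layout with g++ -fdump-lang-class test.cpp.

The vtable of Base has 5 entries. Starting at offset 0 they are: top_offset, the typeinfo address, the complete destructor address, the deleting destructor address, and the address of the virtual function interface. The line vptr=((& Base::_ZTV4Base) + 16) shows that the vptr points to the slot holding the complete destructor address, not to the start of the vtable.
For the roles of the complete destructor and the deleting destructor, see https://www.zhihu.com/question/597138428/answer/3028100403?utm_id=0. In short, under the Itanium C++ ABI used by GCC, the complete destructor destroys the object's members and base class subobjects, while the deleting destructor calls the complete destructor and then frees the memory with operator delete.
The vtable of Derived also has 5 entries, and its interface slot overrides the base class entry. If Derived did not define interface(), that slot would hold the same address as Base::interface().

Compile with g++ test.cpp -o mytest --std=c++11 and run the program.

In main(), the statement long long* vptr = (long long*)(*(long long*)&obj) reads the vptr of the Derived object, and the program prints the address stored in it (an address inside the vtable). The statement func f = (func)*(vptr+2) points a function pointer at slot 2, which holds interface(), and calls it; the output shows that the version called is the one Derived defines. This trick relies on the compiler's object and vtable layout and is not portable C++.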

3.2 Vtable under multiple inheritance

Consider the following code:

#include <stdio.h>

class Base01 {
public:
  virtual ~Base01() {}
  virtual void interface() {
    printf("base01 interface called\n");
  }
public:
  int b01;
};

class Base02 {
public:
  virtual ~Base02() {}
  virtual void interface() {
    printf("base02 interface called\n");
  }
public:
  int b02;
};

class Derived : public Base01, public Base02 {
public:
  ~Derived() {}
  void interface() override {
    printf("derived interface called\n");
  }
public:
  int d;
};

int main() {
  Derived obj;
  return 0;
}

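Dump the vtable layout with g++ -fdump-lang-class test.cpp.

The vtables of the two base classes, Base01 and Base02, have the same structure as in the single-inheritance case. Derived has only one vtable, but its two vptrs point to different positions in that table.
[Figure: the Derived vtable combined with the object memory layout]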

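4. Dynamic binding

4.1 The general concept of dynamic binding

As we know, a compiled function is loaded into the code segment and has an executable address, and calling a function requires knowing that address. The counterpart of dynamic binding is static binding, where the address of the target function (a virtual address, not a physical one) can be determined at compile time. With dynamic binding, the target address cannot be known at compile time and only becomes known at run time.
Dynamic binding does not necessarily involve polymorphism. Consider the following code (from https://stackoverflow.com/questions/18035293/what-is-early-static-and-late-dynamic-binding-in-c):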

#include <iostream>

using FuncType = int(*)(int,int); // pointer to a function
                                  // taking 2 ints and returning one.
int add(int a, int b) { return a + b; }
int substract(int a, int b) { return a - b; }
int main() {
    char op = 0;
    std::cin >> op;
    FuncType const function = op == '+' ? &add : &substract;
    std::cout << function(4, 5) << "\n";
}

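Which function the pointer named function refers to depends on user input. It cannot be determined at compile time, only at run time, so this is also dynamic binding.

4.2 Dynamic binding and polymorphism

In the context of polymorphism, consider the following code: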

#include <stdio.h>

class Base {
public:
  virtual ~Base() {}
  virtual void interface() {
    printf("base interface called\n");
  }
public:
  int b;
};

class Derived : public Base {
public:
  ~Derived() {}
  void interface() override {
    printf("derived interface called\n");
  }
public:
  int d;
};

void job(Base* obj) {
  obj->interface();
}

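In job(), the call targets the virtual function interface(), and the compiler cannot know at compile time which class obj actually points to, so it cannot decide between Base::interface() and Derived::interface(). Only at run time can the actual address of interface() be looked up in the vtable. This is the dynamic binding associated with polymorphism.
Note that the compiler effectively translates obj->interface() into (*(obj->vptr)[n])(obj), where n is the slot index of interface() in the vtable. This is why Derived::interface() and Base::interface() must occupy the same slot index in their respective vtables.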

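5. RTTI (run-time type identification), upcasts, and downcasts

5.1 Upcast

An upcast converts a derived class to a base class, for example Base* obj = new Derived. This conversion is implicit, which means the compiler inserts whatever code is needed at compile time. This matters especially for multiple inheritance:

  // Multiple inheritance: class Derived: public Base01, public Base02 {}
  The statement:
      Base02* obj = new Derived;
  is implicitly converted to:
      Base02* obj = (Base02*)((char*)(new Derived) + sizeof(Base01));

For single inheritance, or for the leftmost base class in multiple inheritance, the subobject starts at the same address as the derived object, so no offset needs to be added.
As this shows, an upcast requires the compiler to know the actual class type and its inheritance relationships; without them the offset cannot be determined. Consider the following code: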

#include <stdio.h>

class Base01 {
public:
  int data = 100;
};

class Base02 {
public:
  int data = 101;
};

class Derived : public Base01, public Base02 {
public:
  int data = 102;
};

int main() {
  void* tmp = new Derived;
  // Derived* tmp = new Derived;
  Base02* obj = (Base02*)(tmp);
  printf("data value: %d\n", obj->data);
  return 0;
}

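Compile with g++ test.cpp -o mytest and run the program.

Because tmp is a void*, the cast to Base02* applies no offset, and the behavior is undefined. In practice obj ends up pointing at the start of the Derived object, so the program typically reads Base01::data (100) instead of Base02::data (101).
Also note that an upcast needs the compiler to know the object's type at compile time; it has nothing to do with RTTI.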

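5.2 Downcast

A downcast converts a base class to a derived class, for example Derived* obj = (Derived*)base or Derived* obj = dynamic_cast<Derived*>(base). Supporting this conversion requires knowing the object's actual type and the offset of the subobject within the derived class.
Unlike an upcast, the compiler cannot know the object's actual type at compile time, so a type-checked downcast cannot be resolved by code inserted at compile time; it can only be completed at run time. (A C-style cast or static_cast applies a fixed offset without checking the actual type.)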

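5.3 RTTI (run-time type identification)

As shown in the vtable discussion, two more slots sit in front of the virtual function pointers:

The first slot, at offset 0, is called top_offset and holds a number.
The second slot, at offset 8, is called typeinfo and holds a pointer to the typeinfo object of the class.

Note that a vtable exists, and RTTI information can be stored, only when the base class has virtual functions. If the base class has no virtual functions, a dynamic_cast downcast is rejected at compile time.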

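5.3.1 top_offset

With multiple inheritance, a downcast must know the offset of the base class subobject within the object, so that a cast such as:

  Derived* obj = (Derived*)base
makes obj point to the start of the Derived object (the offset may be positive or negative):
  Derived* obj = (Derived*)((char*)base + offset)

Unlike an upcast, the offset cannot be obtained at compile time when the object's actual type is unknown. The top_offset slot in the vtable exists to supply this offset. For example, with class Derived: public Base01, public Base02 {}, dump the derived class's vtable with g++ -fdump-lang-class test.cpp.

The top_offset at vtable offset 0 is 0, because the Base01 subobject starts at the same address as the Derived object. The top_offset at vtable offset 40 is -16, so a downcast from Base02 to Derived must subtract 16 bytes from the pointer to get back to the start of the Derived object.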

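5.3.2 typeinfo

A downcast also needs a safety check: the object's actual type must be the target type on the left-hand side (or a class derived from it). Blindly adding top_offset otherwise leads to program errors.
To identify object types at run time, C++ introduces the typeinfo class (std::type_info) and instantiates one typeinfo object per class (typeinfo is in fact a base class). The address of that object is stored in the vtable; during a downcast, the typeinfo object is found through the vtable, which reveals the object's actual type.
Note that with multiple inheritance, the derived class's vtable contains several typeinfo slots.

These slots all point to the same typeinfo object, so every vptr of the same class leads to the same typeinfo object.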

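6. Virtual destructors

Releasing an object involves two tasks:

Calling the destructors correctly
Freeing the allocated memory

Leaving virtual functions aside, consider the following code:

Multiple inheritance (non-virtual):
class Derived: public Base01, public Base02 {}
Object 1:
Derived* obj01 = new Derived;
delete obj01;
Object 2:
Base01* obj02 = new Derived;
delete obj02;
Object 3:
Base02* obj03 = new Derived;
delete obj03;

In Derived's destructor, the compiler appends calls to the destructors of the two base class subobjects after the user code.
delete obj01 calls Derived::~Derived(), which then calls the destructors of the base class subobjects in turn. Because obj01 holds the start address returned by new Derived, freeing the memory also succeeds.
delete obj02 calls only Base01::~Base01(), so the destructors of Derived and Base02 never run. Because the subobject that obj02 points to starts at the same address as the Derived object, freeing the memory still succeeds.
delete obj03 calls only Base02::~Base02(), so the destructors of Derived and Base01 never run. The subobject that obj03 points to does not start at the Derived object's address; it is off by sizeof(Base01), so freeing the memory fails (the program crashes with a core dump).
Formally, both delete obj02 and delete obj03 are undefined behavior here.

If the two base classes declare virtual destructors, delete obj02 and delete obj03 go through the vtable and call Derived::~Derived(), which then calls the destructors of the base class subobjects in turn. The virtual call also adjusts the pointer back to the start of the Derived object (in GCC, through a this-adjusting thunk stored in the vtable), so the destructor receives the correct this pointer, and only then can the memory be freed successfully.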

Posted 2022-01-18 10:50 by 小夕nike
