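Linkage problems caused by the const qualifier in C++

Problem

There are two C++ source files, shown below: one defines a const variable and the other uses it. Compiling them separately works, but linking them together fails with an undefined reference to 'meow::miao'.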

[01:06:44] hansy@hansy:~/testcase$ cat 1.cc 
namespace meow {
  const int miao = 1;
}
[01:06:47] hansy@hansy:~/testcase$ cat 2.cc 
namespace meow {
  extern const int miao;
}

using namespace meow;
int main() {
  return miao;
}
[01:06:49] hansy@hansy:~/testcase$ 
[01:06:52] hansy@hansy:~/testcase$ g++ 1.cc -c -o 1.o && g++ 2.cc -c -o 2.o && gcc 1.o 2.o
2.o: In function `main':
2.cc:(.text+0x6): undefined reference to `meow::miao'
collect2: error: ld returned 1 exit status

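Cause

Inspecting the object files with nm shows that the two files use different symbol names, so it is no surprise that the linker cannot resolve the reference. The puzzling part is that c++filt demangles both symbols to the same name.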

[01:07:05] hansy@hansy:~/testcase$ nm 1.o | grep miao
0000000000000000 r _ZN4meowL4miaoE
[01:07:11] hansy@hansy:~/testcase$ nm 2.o | grep miao
                 U _ZN4meow4miaoE
[01:07:15] hansy@hansy:~/testcase$ c++filt _ZN4meowL4miaoE
meow::miao
[01:07:20] hansy@hansy:~/testcase$ c++filt _ZN4meow4miaoE
meow::miao

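Removing const and recompiling changes the result: the link succeeds, and the symbol defined in 1.o no longer contains the L.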

[01:07:40] hansy@hansy:~/testcase$ g++ 1.cc -c -o 1.o && g++ 2.cc -c -o 2.o && gcc 1.o 2.o
[01:07:42] hansy@hansy:~/testcase$ nm 1.o | grep miao
0000000000000000 D _ZN4meow4miaoE

So const is the culprit. I looked it up in the C++ standard and found the answer in section 3.5.

A name is said to have linkage when it might denote the same object, reference, function, type, template,
namespace or value as a name introduced by a declaration in another scope:
— When a name has external linkage , the entity it denotes can be referred to by names from scopes of
other translation units or from other scopes of the same translation unit.
— When a name has internal linkage , the entity it denotes can be referred to by names from other scopes
in the same translation unit.
— When a name has no linkage , the entity it denotes cannot be referred to by names from other scopes.
A name having namespace scope (3.3.6) has internal linkage if it is the name of
— a variable, function or function template that is explicitly declared static; or,
— a variable that is explicitly declared const or constexpr and neither explicitly declared extern nor
previously declared to have external linkage; or
— a data member of an anonymous union.

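In other words, in C++ a namespace-scope variable declared const has internal linkage by default, so it is visible only inside its own translation unit. To make it visible to other translation units, it must either be explicitly declared extern or have been previously declared with external linkage. (This differs from C, where a file-scope const object has external linkage by default.)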
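The two escape routes named in the standard text can be sketched as a replacement for 1.cc. This is only an illustration: `wang` and `local_only` are hypothetical names that do not appear in the original test case, and in a real project the bare extern declaration would normally live in a header shared by both files.

```cpp
// Sketch of a fixed 1.cc: namespace-scope constants with different linkage.
namespace meow {
  // Option 1: explicit extern on the definition -> external linkage,
  // so the reference in 2.cc can be resolved at link time.
  extern const int miao = 1;

  // Option 2: an earlier extern declaration gives the later
  // definition external linkage as well.
  extern const int wang;       // hypothetical name, declaration only
  const int wang = 2;          // definition, still external linkage

  // For contrast: plain const at namespace scope -> internal linkage,
  // not visible from other translation units.
  const int local_only = 3;    // hypothetical name
}
```

With either option applied to `miao`, running nm on 1.o should show the symbol without the L, matching the undefined symbol that 2.o asks for.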

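So how does the compiler implement this? The experiment above shows that it is done through name mangling: the symbol for a const variable definition has one extra character, L, compared with the symbol for a non-const definition, while undefined (referenced) symbols never carry the L.
I had not studied how name mangling is implemented before, so here is my understanding based on the Wikipedia article:
First, there is more than one mangling scheme, and different schemes are not compatible with each other (obviously, different names mean the linker cannot find the matching symbol). The two most common schemes are the Visual C++ one used on Windows and the Itanium one used on Linux.
Since I do not have the gcc source at hand, I will use the llvm code to see how a compiler uses mangling to deal with linkage visibility.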

The clang/lib/AST directory contains two files, MicrosoftMangle.cpp and ItaniumMangle.cpp, one for each of the two mangling schemes. Here is the relevant part of CXXNameMangler::mangleUnqualifiedName() in ItaniumMangle.cpp:

  switch (Name.getNameKind()) {
  case DeclarationName::Identifier: {
    const IdentifierInfo *II = Name.getAsIdentifierInfo();
    if (II) {
      // Match GCC's naming convention for internal linkage symbols, for
      // symbols that are not actually visible outside of this TU. GCC
      // distinguishes between internal and external linkage symbols in
      // its mangling, to support cases like this that were valid C++ prior
      // to DR426:
      //   
      //   void test() { extern void foo(); }
      //   static void foo();
      //   
      // Don't bother with the L marker for names in anonymous namespaces; the
      // 12_GLOBAL__N_1 mangling is quite sufficient there, and this better
      // matches GCC anyway, because GCC does not treat anonymous namespaces as
      // implying internal linkage.
      if (ND && ND->getFormalLinkage() == InternalLinkage &&
          !ND->isExternallyVisible() &&
          getEffectiveDeclContext(ND)->isFileContext() &&
          !ND->isInAnonymousNamespace())
        Out << 'L'; 

      auto *FD = dyn_cast<FunctionDecl>(ND);
      bool IsRegCall = FD &&
                       FD->getType()->castAs<FunctionType>()->getCallConv() ==
                           clang::CC_X86RegCall;
      if (IsRegCall)
        mangleRegCallName(II);
      else 
        mangleSourceName(II);

      writeAbiTags(ND, AdditionalAbiTags);
      break;
    }
    ......
  }

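Interestingly, the comment already explains what the L is for: GCC has traditionally distinguished internal linkage symbols from external linkage symbols in its mangling, and it prefixes the name of an internal linkage symbol with L. A symbol that is only declared and not defined in the file (as in 2.cc) has external linkage, so its name has no L, and the two symbol names end up different.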
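To make the naming rule concrete, here is a toy model that reproduces the two symbol names seen in the nm output. It is a rough sketch, not how clang or GCC actually implement mangling: `mangle_in_namespace` is a hypothetical helper, and it only covers a variable declared directly inside a single named namespace.

```cpp
#include <cassert>
#include <string>

// Toy model of the Itanium-style name for a variable declared directly
// inside one named namespace: _Z N <len><ns> [L] <len><name> E.
// Real manglers also handle nesting, functions, templates, substitutions...
std::string mangle_in_namespace(const std::string& ns,
                                const std::string& name,
                                bool internal_linkage) {
  assert(!ns.empty() && !name.empty());
  std::string out = "_ZN";
  out += std::to_string(ns.size()) + ns;        // e.g. "4meow"
  if (internal_linkage)
    out += 'L';                                 // GCC-compatible internal linkage marker
  out += std::to_string(name.size()) + name;    // e.g. "4miao"
  out += 'E';
  return out;
}
```

Calling `mangle_in_namespace("meow", "miao", true)` yields `_ZN4meowL4miaoE`, the name defined in 1.o, while passing `false` yields `_ZN4meow4miaoE`, the name 2.o is looking for.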

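Summary

There is not much to summarize. I spent half a day on such a trivial problem. Life is as lonely as snow (read: I am really just too much of a newbie).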

posted @ 2020-10-11 04:16  Five100Miles