LLVM Notes (12) - Instruction Selection (4): legalize

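This note introduces the concept of legalize in instruction selection. Matching mid-end IR to machine instructions exactly requires checking both the operator and its operands; legalize is the process of transforming illegal operators or operands before instruction selection proper begins.
In general, the instruction set supported by a given backend architecture: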

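  1. Cannot necessarily express every mid-end IR operation. An obvious example is running floating-point code on an architecture without floating-point support: the compiler replaces those operations with calls to soft-float functions, whereas on an architecture with floating-point support it emits the corresponding floating-point instructions. Converting unsupported mid-end IR semantics into behavior the architecture supports is called legalize operation.
  2. Conversely, the operands of a given instruction usually do not cover every data type. On ARM, for example, the add instruction adds 32-bit integers; if the mid-end IR adds two 8-bit integers, the compiler must first zero-extend both operands to 32 bits, add them with the add instruction, and then truncate the result back to 8 bits. Changing data types to fit an instruction's operand types is called legalize type (for vector data, legalize vector).
    Unlike combine, then, legalize is strongly target-dependent. During legalization, generic SDNodes are gradually replaced by target-specific SDNodes, and in some cases converted directly into machine instructions. We first introduce some basic legalize concepts and explain how a target sets up callbacks to specify how to legalize, then analyze the implementation.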
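The promote example in item 2 can be sketched in plain C++. This is only an illustration of the extend / add / truncate sequence, not code from LLVM; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: an 8-bit add carried out with a 32-bit add,
// mirroring what type promotion does for a target with only i32 adds.
std::uint8_t addI8ViaI32(std::uint8_t a, std::uint8_t b) {
  std::uint32_t wideA = a;                   // zero-extend i8 -> i32
  std::uint32_t wideB = b;                   // zero-extend i8 -> i32
  std::uint32_t wideSum = wideA + wideB;     // the 32-bit add the target has
  return static_cast<std::uint8_t>(wideSum); // truncate i32 -> i8
}

// Usage: 200 + 100 = 300, which wraps to 44 in 8 bits.
void demoPromotedAdd() { assert(addI8ViaI32(200, 100) == 44); }
```

The truncation at the end is what keeps the 8-bit wraparound semantics intact even though the addition itself happened in a wider type.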

Setting a target's legalize actions

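To indicate how an operation should be legalized, the TargetLoweringBase class (defined in include/llvm/CodeGen/TargetLowering.h) defines enums for the different ways to legalize: LegalizeAction, with five values, for operations, and LegalizeTypeAction for types.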

enum LegalizeAction : uint8_t {
  Legal,      // The target natively supports this operation.
  Promote,    // This operation should be executed in a larger type.
  Expand,     // Try to expand this to other ops, otherwise use a libcall.
  LibCall,    // Don't try to expand this to other ops, always use a libcall.
  Custom      // Use the LowerOperation hook to implement custom lowering.
};

enum LegalizeTypeAction : uint8_t {
  TypeLegal,           // The target natively supports this type.
  TypePromoteInteger,  // Replace this integer with a larger one.
  TypeExpandInteger,   // Split this integer into two of half the size.
  TypeSoftenFloat,     // Convert this float to a same size integer type.
  TypeExpandFloat,     // Split this float into two of half the size.
  TypeScalarizeVector, // Replace this one-element vector with its element.
  TypeSplitVector,     // Split this vector into two of half the size.
  TypeWidenVector,     // This vector should be widened into a larger vector.
  TypePromoteFloat     // Replace this float with a larger one.
};

The LegalizeAction values mean the following:

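  1. legal - the target supports the operation/operand type.
  2. promote - the target supports the operation, but it has to be carried out in a larger data type (e.g. the 8-bit add on ARM mentioned earlier can be turned into a 32-bit add).
  3. expand - the target does not support the operation; it can be expanded into other instructions or into a library call (the opposite of promote, e.g. a 64-bit add on ARM has to be split into a combination of two 32-bit adds).
  4. libcall - the target does not support the operation; it is replaced directly with a runtime library call (e.g. the soft-float operations mentioned earlier).
  5. custom - custom handling, which must be implemented in LowerOperation.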
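To make the expand example in item 3 concrete, here is a minimal sketch of a 64-bit add built from 32-bit adds. It assumes unsigned halves and models the carry explicitly (on real hardware this is typically an add followed by an add-with-carry); the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representation of an i64 value as two i32 halves.
struct I64Parts {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Add two 64-bit values using only 32-bit additions.
I64Parts addI64ViaI32(I64Parts a, I64Parts b) {
  I64Parts r;
  r.lo = a.lo + b.lo;                        // low halves, wraps modulo 2^32
  std::uint32_t carry = r.lo < a.lo ? 1 : 0; // wraparound means a carry out
  r.hi = a.hi + b.hi + carry;                // high halves plus the carry
  return r;
}

// Usage: 0x00000000FFFFFFFF + 1 = 0x0000000100000000.
void demoExpandedAdd() {
  I64Parts r = addI64ViaI32({0xFFFFFFFFu, 0u}, {1u, 0u});
  assert(r.lo == 0u && r.hi == 1u);
}
```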

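If an operation is legal, legalization skips the node; otherwise the node is legalized according to its action. For custom, the compiler developer must implement the callback that makes the node legal.
The LegalizeTypeAction values are analogous, but are subdivided further according to the kind of data type.
Each target must specify the legalize action for every data type/operation; this information is stored in the TargetLoweringBase class.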

class TargetLoweringBase {
  /// This indicates the default register class to use for each ValueType the
  /// target supports natively.
  const TargetRegisterClass *RegClassForVT[MVT::LAST_VALUETYPE];

  /// For any value types we are promoting or expanding, this contains the value
  /// type that we are changing to.  For Expanded types, this contains one step
  /// of the expand (e.g. i64 -> i32), even if there are multiple steps required
  /// (e.g. i64 -> i16).  For types natively supported by the system, this holds
  /// the same type (e.g. i32 -> i32).
  MVT TransformToType[MVT::LAST_VALUETYPE];

  /// For each operation and each value type, keep a LegalizeAction that
  /// indicates how instruction selection should deal with the operation.  Most
  /// operations are Legal (aka, supported natively by the target), but
  /// operations that are not should be described.  Note that operations on
  /// non-legal value types are not described here.
  LegalizeAction OpActions[MVT::LAST_VALUETYPE][ISD::BUILTIN_OP_END];
};

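The register class determines the data types a target operates on, so the RegClassForVT array stores the register class each value type maps to.
When the target does not support a value type, the type must be converted to a legal one; the TransformToType array stores the type to convert to.
Finally, for an operation on a given value type, the legalize action is stored in the two-dimensional array OpActions. When indexing this array, note that if the target does not support a value type, it necessarily does not support any operation on that type. In that case the legalize action must not be looked up in this array directly: the type is first converted to a legal type through TransformToType, and the lookup is then done for the operation on that type.
TargetLoweringBase also stores the legalize actions for extending loads, truncating stores, condition codes and similar nodes; for reasons of space they are not explained one by one here.
Let's first look at how OpActions is modified. TargetLoweringBase provides two interfaces to set and query this array.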

class TargetLoweringBase {
public:
  /// Indicate that the specified operation does not work with the specified
  /// type and indicate what to do about it. Note that VT may refer to either
  /// the type of a result or that of an operand of Op.
  void setOperationAction(unsigned Op, MVT VT,
                          LegalizeAction Action) {
    assert(Op < array_lengthof(OpActions[0]) && "Table isn't big enough!");
    OpActions[(unsigned)VT.SimpleTy][Op] = Action;
  }

  /// Return how this operation should be treated: either it is legal, needs to
  /// be promoted to a larger size, needs to be expanded to some other code
  /// sequence, or the target has a custom expander for it.
  LegalizeAction getOperationAction(unsigned Op, EVT VT) const {
    if (VT.isExtended()) return Expand;
    // If a target-specific SDNode requires legalization, require the target
    // to provide custom legalization for it.
    if (Op >= array_lengthof(OpActions[0])) return Custom;
    return OpActions[(unsigned)VT.getSimpleVT().SimpleTy][Op];
  }
};
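The two accessors above can be modelled with a small stand-alone table. This is a toy sketch, not the LLVM class: the enums are cut down to a handful of hypothetical entries, and the default of Legal for every entry is an assumption of the sketch that matches the "most operations are Legal" comment.

```cpp
#include <array>
#include <cassert>

enum Action { Legal, Promote, Expand, LibCall, Custom };
enum Type { i8, i16, i32, i64, NumTypes };
enum Opcode { ADD, MUL, SDIV, NumOps };

// Toy stand-in for the OpActions table and its setter/getter.
class ToyLowering {
  // Value-initialized to 0, i.e. every entry starts out as Legal.
  std::array<std::array<Action, NumOps>, NumTypes> OpActions{};

public:
  void setOperationAction(Opcode Op, Type VT, Action A) {
    assert(Op < NumOps && "Table isn't big enough!");
    OpActions[VT][Op] = A;
  }

  Action getOperationAction(Opcode Op, Type VT) const {
    return OpActions[VT][Op];
  }
};
```

A target constructor would then only list the exceptions, for example `T.setOperationAction(SDIV, i32, LibCall)`, and leave everything else at the default.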

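Every target has an [arch]TargetLowering class that derives (through TargetLowering) from TargetLoweringBase, and its constructor calls setOperationAction() to initialize OpActions. Taking RISCV as an example, here is part of RISCVTargetLowering::RISCVTargetLowering() (defined in lib/Target/RISCV/RISCVISelLowering.cpp).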

RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
                                         const RISCVSubtarget &STI)
    : TargetLowering(TM), Subtarget(STI) {
  MVT XLenVT = Subtarget.getXLenVT();

  // Set up the register classes.
  addRegisterClass(XLenVT, &RISCV::GPRRegClass);

  // Compute derived properties from the register classes.
  computeRegisterProperties(STI.getRegisterInfo());

  setOperationAction(ISD::BR_JT, MVT::Other, Expand);
  setOperationAction(ISD::BR_CC, XLenVT, Expand);
  setOperationAction(ISD::SELECT, XLenVT, Custom);
  setOperationAction(ISD::SELECT_CC, XLenVT, Expand);

  setOperationAction(ISD::GlobalAddress, XLenVT, Custom);
  setOperationAction(ISD::BlockAddress, XLenVT, Custom);
  setOperationAction(ISD::ConstantPool, XLenVT, Custom);
  setOperationAction(ISD::GlobalTLSAddress, XLenVT, Custom);

  ......
};

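Note that the code above also calls TargetLoweringBase::addRegisterClass() and TargetLoweringBase::computeRegisterProperties() (declared in include/llvm/CodeGen/TargetLowering.h). The former sets RegClassForVT. The latter must be called after all register classes the target supports have been added; it computes TransformToType from the registered classes.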

class TargetLoweringBase {
public:
  /// Add the specified register class as an available regclass for the
  /// specified value type. This indicates the selector can handle values of
  /// that class natively.
  void addRegisterClass(MVT VT, const TargetRegisterClass *RC) {
    assert((unsigned)VT.SimpleTy < array_lengthof(RegClassForVT));
    RegClassForVT[VT.SimpleTy] = RC;
  }
};

/// computeRegisterProperties - Once all of the register classes are added,
/// this allows us to compute derived properties we expose.
void TargetLoweringBase::computeRegisterProperties(
    const TargetRegisterInfo *TRI) {
  // Everything defaults to needing one register.
  for (unsigned i = 0; i != MVT::LAST_VALUETYPE; ++i) {
    NumRegistersForVT[i] = 1;
    RegisterTypeForVT[i] = TransformToType[i] = (MVT::SimpleValueType)i;
  }
  // ...except isVoid, which doesn't need any registers.
  NumRegistersForVT[MVT::isVoid] = 0;

  // Find the largest integer register class.
  unsigned LargestIntReg = MVT::LAST_INTEGER_VALUETYPE;
  for (; RegClassForVT[LargestIntReg] == nullptr; --LargestIntReg)
    assert(LargestIntReg != MVT::i1 && "No integer registers defined!");

  // Every integer value type larger than this largest register takes twice as
  // many registers to represent as the previous ValueType.
  for (unsigned ExpandedReg = LargestIntReg + 1;
       ExpandedReg <= MVT::LAST_INTEGER_VALUETYPE; ++ExpandedReg) {
    NumRegistersForVT[ExpandedReg] = 2*NumRegistersForVT[ExpandedReg-1];
    RegisterTypeForVT[ExpandedReg] = (MVT::SimpleValueType)LargestIntReg;
    TransformToType[ExpandedReg] = (MVT::SimpleValueType)(ExpandedReg - 1);
    ValueTypeActions.setTypeAction((MVT::SimpleValueType)ExpandedReg,
                                   TypeExpandInteger);
  }

  // Inspect all of the ValueType's smaller than the largest integer
  // register to see which ones need promotion.
  unsigned LegalIntReg = LargestIntReg;
  for (unsigned IntReg = LargestIntReg - 1;
       IntReg >= (unsigned)MVT::i1; --IntReg) {
    MVT IVT = (MVT::SimpleValueType)IntReg;
    if (isTypeLegal(IVT)) {
      LegalIntReg = IntReg;
    } else {
      RegisterTypeForVT[IntReg] = TransformToType[IntReg] =
        (MVT::SimpleValueType)LegalIntReg;
      ValueTypeActions.setTypeAction(IVT, TypePromoteInteger);
    }
  }

  ......
};

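computeRegisterProperties() is fairly complex; only the part that computes TransformToType is excerpted here. It first finds the value type of the largest integer register the target supports. Every integer type larger than that is split into two values of the next smaller type (expand), and every smaller type that is illegal is converted to the next larger legal type (promote). Smaller types that are themselves legal need no promotion: X86_64, for example, supports 16-bit, 32-bit and 64-bit integer types, among others.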
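The integer part of that computation can be reproduced on a toy type list. The sketch below follows the same three steps (find the largest legal type, expand everything above it, promote illegal types below it); the enum values and the `IntTypeInfo` struct are hypothetical and much simpler than the real MVT machinery.

```cpp
#include <array>
#include <cassert>

enum Type { i1, i8, i16, i32, i64, i128, NumTypes };
enum TypeAction { TypeLegal, TypePromoteInteger, TypeExpandInteger };

struct IntTypeInfo {
  std::array<Type, NumTypes> TransformTo;
  std::array<TypeAction, NumTypes> Action;
};

// HasRegClass[T] is true when the toy target registered a class for T.
IntTypeInfo computeIntegerTypeInfo(const std::array<bool, NumTypes> &HasRegClass) {
  IntTypeInfo Info;
  for (int T = 0; T < NumTypes; ++T) { // default: every type maps to itself
    Info.TransformTo[T] = static_cast<Type>(T);
    Info.Action[T] = TypeLegal;
  }

  int Largest = NumTypes - 1; // find the largest type with a register class
  while (Largest >= 0 && !HasRegClass[Largest])
    --Largest;
  assert(Largest >= 0 && "No integer registers defined!");

  // Larger types are expanded one step at a time (e.g. i128 -> i64 -> i32).
  for (int T = Largest + 1; T < NumTypes; ++T) {
    Info.TransformTo[T] = static_cast<Type>(T - 1);
    Info.Action[T] = TypeExpandInteger;
  }

  // Smaller illegal types are promoted to the nearest larger legal type.
  int LegalT = Largest;
  for (int T = Largest - 1; T >= 0; --T) {
    if (HasRegClass[T]) {
      LegalT = T;
    } else {
      Info.TransformTo[T] = static_cast<Type>(LegalT);
      Info.Action[T] = TypePromoteInteger;
    }
  }
  return Info;
}
```

With only i32 registered (an RV32-like setup), i8 is promoted to i32 and i64 is expanded to i32; with i8 through i64 registered (an X86_64-like setup), only i1 needs promotion, and it goes to i8.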

legalize type

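SelectionDAG legalizes types first and operations second. The advantage is that after legalize type, every value in the DAG has a type the target supports (even though some operations may still be illegal for some of those types).
Could it be done the other way round, legalizing operations first? My understanding is that it could not: legalize operation needs to know the legalize action, and that classification itself depends on the data type (expand / promote). Legalizing operations first would mean having to deal with type legalization inside operation legalization as well.
The same consideration applies when implementing target-specific custom legalization: legalize data types during legalize type, and legalize operations during legalize operation.
Let's look at the implementation of legalize type first. Its entry point is SelectionDAG::LegalizeTypes() (defined in lib/CodeGen/SelectionDAG/LegalizeTypes.cpp), which calls DAGTypeLegalizer::run(); the latter checks every SDValue in the DAG and legalizes the illegal value types.
DAGTypeLegalizer::run() is implemented somewhat like DAGCombiner::Run(). The difference is that combine does not care about node order as long as every node is visited, whereas legalize type requires a node's operands to be legalized before the node's own results.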

bool DAGTypeLegalizer::run() {
  bool Changed = false;

  // Create a dummy node (which is not added to allnodes), that adds a reference
  // to the root node, preventing it from being deleted, and tracking any
  // changes of the root.
  HandleSDNode Dummy(DAG.getRoot());
  Dummy.setNodeId(Unanalyzed);

  // The root of the dag may dangle to deleted nodes until the type legalizer is
  // done.  Set it to null to avoid confusion.
  DAG.setRoot(SDValue());

  // Walk all nodes in the graph, assigning them a NodeId of 'ReadyToProcess'
  // (and remembering them) if they are leaves and assigning 'Unanalyzed' if
  // non-leaves.
  for (SDNode &Node : DAG.allnodes()) {
    if (Node.getNumOperands() == 0) {
      AddToWorklist(&Node);
    } else {
      Node.setNodeId(Unanalyzed);
    }
  }

  ......
}

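run() first creates a dummy node that references the root node so that the root cannot be deleted, and sets the DAG root to null to avoid a dangling reference. It then walks all nodes in the DAG, adds those without operands to the worklist, and marks the rest as unanalyzed. As mentioned in the earlier note on SDNode, the meaning of SDNode.NodeId depends on the pass being run; in legalize, NodeId represents the node's legalize state, defined in lib/CodeGen/SelectionDAG/LegalizeTypes.h: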

class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
  const TargetLowering &TLI;
  SelectionDAG &DAG;
public:
  /// This pass uses the NodeId on the SDNodes to hold information about the
  /// state of the node. The enum has all the values.
  enum NodeIdFlags {
    /// All operands have been processed, so this node is ready to be handled.
    ReadyToProcess = 0,

    /// This is a new node, not before seen, that was created in the process of
    /// legalizing some other node.
    NewNode = -1,

    /// This node's ID needs to be set to the number of its unprocessed
    /// operands.
    Unanalyzed = -2,

    /// This is a node that has already been processed.
    Processed = -3

    // 1+ - This is a node which has this many unprocessed operands.
  };
};

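Note that a value greater than 0 means the node still has that many operands that have not been legalized. Each time a node is legalized, the count of each of its users is decremented by 1, and a user whose count reaches 0 is added to the worklist. A value below 0 means the node has not been initialized or has already been processed.
Back in run(): as in combine, one node at a time is taken from the worklist and legalized. DAGTypeLegalizer::getTypeAction() calls TargetLowering::getTypeAction() to query how to legalize each result type; the answer derives from the ValueTypeActions table filled in by computeRegisterProperties() shown earlier. After a node is legalized it is set to Processed and its users are examined: each user's pending-operand count is decremented, and a user whose count reaches 0 is added to the worklist.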

bool DAGTypeLegalizer::run() {
  ......
  // Now that we have a set of nodes to process, handle them all.
  while (!Worklist.empty()) {
#ifndef EXPENSIVE_CHECKS
    if (EnableExpensiveChecks)
#endif
      PerformExpensiveChecks();

    SDNode *N = Worklist.back();
    Worklist.pop_back();
    assert(N->getNodeId() == ReadyToProcess &&
           "Node should be ready if on worklist!");

    LLVM_DEBUG(dbgs() << "Legalizing node: "; N->dump(&DAG));
    if (IgnoreNodeResults(N)) {
      LLVM_DEBUG(dbgs() << "Ignoring node results\n");
      goto ScanOperands;
    }

    // Scan the values produced by the node, checking to see if any result
    // types are illegal.
    for (unsigned i = 0, NumResults = N->getNumValues(); i < NumResults; ++i) {
      EVT ResultVT = N->getValueType(i);
      LLVM_DEBUG(dbgs() << "Analyzing result type: " << ResultVT.getEVTString()
                        << "\n");
      switch (getTypeAction(ResultVT)) {
      case TargetLowering::TypeLegal:
        LLVM_DEBUG(dbgs() << "Legal result type\n");
        break;
      // The following calls must take care of *all* of the node's results,
      // not just the illegal result they were passed (this includes results
      // with a legal type).  Results can be remapped using ReplaceValueWith,
      // or their promoted/expanded/etc values registered in PromotedIntegers,
      // ExpandedIntegers etc.
      case TargetLowering::TypePromoteInteger:
        PromoteIntegerResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypeExpandInteger:
        ExpandIntegerResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypeSoftenFloat:
        SoftenFloatResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypeExpandFloat:
        ExpandFloatResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypeScalarizeVector:
        ScalarizeVectorResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypeSplitVector:
        SplitVectorResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypeWidenVector:
        WidenVectorResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypePromoteFloat:
        PromoteFloatResult(N, i);
        Changed = true;
        goto NodeDone;
      }
    }

    ......

NodeDone:

    // If we reach here, the node was processed, potentially creating new nodes.
    // Mark it as processed and add its users to the worklist as appropriate.
    assert(N->getNodeId() == ReadyToProcess && "Node ID recalculated?");
    N->setNodeId(Processed);

    for (SDNode::use_iterator UI = N->use_begin(), E = N->use_end();
         UI != E; ++UI) {
      SDNode *User = *UI;
      int NodeId = User->getNodeId();

      // This node has two options: it can either be a new node or its Node ID
      // may be a count of the number of operands it has that are not ready.
      if (NodeId > 0) {
        User->setNodeId(NodeId-1);

        // If this was the last use it was waiting on, add it to the ready list.
        if (NodeId-1 == ReadyToProcess)
          Worklist.push_back(User);
        continue;
      }

      // If this is an unreachable new node, then ignore it.  If it ever becomes
      // reachable by being used by a newly created node then it will be handled
      // by AnalyzeNewNode.
      if (NodeId == NewNode)
        continue;

      // Otherwise, this node is new: this is the first operand of it that
      // became ready.  Its new NodeId is the number of operands it has minus 1
      // (as this node is now processed).
      assert(NodeId == Unanalyzed && "Unknown node ID!");
      User->setNodeId(User->getNumOperands() - 1);

      // If the node only has a single operand, it is now ready.
      if (User->getNumOperands() == 1)
        Worklist.push_back(User);
    }
  }

  ......
}
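The NodeId counting scheme used by run() can be isolated in a small model. The sketch below keeps only the scheduling logic (no actual type legalization) on a hypothetical graph of integer-indexed nodes; the state values reuse the names from NodeIdFlags, everything else is made up for illustration.

```cpp
#include <cassert>
#include <vector>

enum : int { ReadyToProcess = 0, Unanalyzed = -2, Processed = -3 };

struct Node {
  std::vector<int> Operands; // indices of the nodes this node uses
  int NodeId = Unanalyzed;
};

// Returns the order in which nodes are processed; every node comes after
// all of its operands.
std::vector<int> processInOperandOrder(std::vector<Node> &Nodes) {
  // One user entry per operand use, like iterating over a node's uses.
  std::vector<std::vector<int>> Users(Nodes.size());
  for (int N = 0; N < static_cast<int>(Nodes.size()); ++N)
    for (int Op : Nodes[N].Operands)
      Users[Op].push_back(N);

  std::vector<int> Worklist, Order;
  for (int N = 0; N < static_cast<int>(Nodes.size()); ++N) {
    if (Nodes[N].Operands.empty()) { // leaves are ready immediately
      Nodes[N].NodeId = ReadyToProcess;
      Worklist.push_back(N);
    } else {
      Nodes[N].NodeId = Unanalyzed;
    }
  }

  while (!Worklist.empty()) {
    int N = Worklist.back(); // LIFO, as in run()
    Worklist.pop_back();
    assert(Nodes[N].NodeId == ReadyToProcess);
    Order.push_back(N); // "legalize" the node here
    Nodes[N].NodeId = Processed;

    for (int U : Users[N]) {
      int &Id = Nodes[U].NodeId;
      if (Id == Unanalyzed) // first ready operand: start counting
        Id = static_cast<int>(Nodes[U].Operands.size());
      if (--Id == ReadyToProcess) // last pending operand just finished
        Worklist.push_back(U);
    }
  }
  return Order;
}
```

For a graph where node 2 uses nodes 0 and 1 and node 3 uses node 2, the two leaves are processed first, then node 2, then node 3.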

legalize vector

TODO

legalize operation

The SelectionDAG class provides two interfaces for legalize operation. LegalizeOp() legalizes a single operation, which we have already seen in combine; Legalize() legalizes the whole DAG. Both call SelectionDAGLegalize::LegalizeOp() to do the actual work.

void SelectionDAG::Legalize() {
  AssignTopologicalOrder();

  SmallPtrSet<SDNode *, 16> LegalizedNodes;
  // Use a delete listener to remove nodes which were deleted during
  // legalization from LegalizeNodes. This is needed to handle the situation
  // where a new node is allocated by the object pool to the same address of a
  // previously deleted node.
  DAGNodeDeletedListener DeleteListener(
      *this,
      [&LegalizedNodes](SDNode *N, SDNode *E) { LegalizedNodes.erase(N); });

  SelectionDAGLegalize Legalizer(*this, LegalizedNodes);

  // Visit all the nodes. We start in topological order, so that we see
  // nodes with their original operands intact. Legalization can produce
  // new nodes which may themselves need to be legalized. Iterate until all
  // nodes have been legalized.
  while (true) {
    bool AnyLegalized = false;
    for (auto NI = allnodes_end(); NI != allnodes_begin();) {
      --NI;

      SDNode *N = &*NI;
      if (N->use_empty() && N != getRoot().getNode()) {
        ++NI;
        DeleteNode(N);
        continue;
      }

      if (LegalizedNodes.insert(N).second) {
        AnyLegalized = true;
        Legalizer.LegalizeOp(N);

        if (N->use_empty() && N != getRoot().getNode()) {
          ++NI;
          DeleteNode(N);
        }
      }
    }
    if (!AnyLegalized)
      break;

  }

  // Remove dead nodes now.
  RemoveDeadNodes();
}

bool SelectionDAG::LegalizeOp(SDNode *N,
                              SmallSetVector<SDNode *, 16> &UpdatedNodes) {
  SmallPtrSet<SDNode *, 16> LegalizedNodes;
  SelectionDAGLegalize Legalizer(*this, LegalizedNodes, &UpdatedNodes);

  // Directly insert the node in question, and legalize it. This will recurse
  // as needed through operands.
  LegalizedNodes.insert(N);
  Legalizer.LegalizeOp(N);

  return LegalizedNodes.count(N);
}

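Legalize operation also processes nodes in a specific order, but unlike combine and legalize type, it first sorts the nodes in the DAG topologically.
SelectionDAG::AssignTopologicalOrder() (defined in lib/CodeGen/SelectionDAG/SelectionDAG.cpp) sorts nodes by their operands, with ties broken by the nodes' earlier order in the list; after sorting, a node's NodeId equals its position in the DAG's node list.
Note that SortedPos in the code marks the end of the sorted portion. The first pass sets each node's NodeId to its operand count (nodes with no operands are sorted immediately). The second pass walks the nodes again and decrements the pending count of each user of an already-sorted node; when a count reaches 0 (all of the node's predecessors are sorted), that node joins the sorted portion as well. When the walk reaches the end, the whole list is sorted.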

unsigned SelectionDAG::AssignTopologicalOrder() {
  unsigned DAGSize = 0;

  // SortedPos tracks the progress of the algorithm. Nodes before it are
  // sorted, nodes after it are unsorted. When the algorithm completes
  // it is at the end of the list.
  allnodes_iterator SortedPos = allnodes_begin();

  // Visit all the nodes. Move nodes with no operands to the front of
  // the list immediately. Annotate nodes that do have operands with their
  // operand count. Before we do this, the Node Id fields of the nodes
  // may contain arbitrary values. After, the Node Id fields for nodes
  // before SortedPos will contain the topological sort index, and the
  // Node Id fields for nodes At SortedPos and after will contain the
  // count of outstanding operands.
  for (allnodes_iterator I = allnodes_begin(),E = allnodes_end(); I != E; ) {
    SDNode *N = &*I++;
    checkForCycles(N, this);
    unsigned Degree = N->getNumOperands();
    if (Degree == 0) {
      // A node with no uses, add it to the result array immediately.
      N->setNodeId(DAGSize++);
      allnodes_iterator Q(N);
      if (Q != SortedPos)
        SortedPos = AllNodes.insert(SortedPos, AllNodes.remove(Q));
      assert(SortedPos != AllNodes.end() && "Overran node list");
      ++SortedPos;
    } else {
      // Temporarily use the Node Id as scratch space for the degree count.
      N->setNodeId(Degree);
    }
  }

  // Visit all the nodes. As we iterate, move nodes into sorted order,
  // such that by the time the end is reached all nodes will be sorted.
  for (SDNode &Node : allnodes()) {
    SDNode *N = &Node;
    checkForCycles(N, this);
    // N is in sorted position, so all its uses have one less operand
    // that needs to be sorted.
    for (SDNode::use_iterator UI = N->use_begin(), UE = N->use_end();
         UI != UE; ++UI) {
      SDNode *P = *UI;
      unsigned Degree = P->getNodeId();
      assert(Degree != 0 && "Invalid node degree");
      --Degree;
      if (Degree == 0) {
        // All of P's operands are sorted, so P may sorted now.
        P->setNodeId(DAGSize++);
        if (P->getIterator() != SortedPos)
          SortedPos = AllNodes.insert(SortedPos, AllNodes.remove(P));
        assert(SortedPos != AllNodes.end() && "Overran node list");
        ++SortedPos;
      } else {
        // Update P's outstanding operand count.
        P->setNodeId(Degree);
      }
    }
    if (Node.getIterator() == SortedPos) {
#ifndef NDEBUG
      allnodes_iterator I(N);
      SDNode *S = &*++I;
      dbgs() << "Overran sorted position:\n";
      S->dumprFull(this); dbgs() << "\n";
      dbgs() << "Checking if this is due to cycles\n";
      checkForCycles(this, true);
#endif
      llvm_unreachable(nullptr);
    }
  }

  return DAGSize;
}
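The key trick in AssignTopologicalOrder() is that the sorted prefix of the list doubles as the queue of nodes whose users still need updating. A minimal sketch of that idea on index-based nodes follows; it uses vectors instead of an intrusive list, and the function signature is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Operands[N] lists the nodes used by node N. On return, NodeId[N] is the
// position of N in the sorted order, and the returned vector is that order.
std::vector<int> assignTopologicalOrder(
    const std::vector<std::vector<int>> &Operands, std::vector<int> &NodeId) {
  const int Size = static_cast<int>(Operands.size());
  std::vector<std::vector<int>> Users(Operands.size());
  for (int N = 0; N < Size; ++N)
    for (int Op : Operands[N])
      Users[Op].push_back(N);

  std::vector<int> Sorted; // plays the role of the list before SortedPos
  NodeId.assign(Operands.size(), 0);

  // Pass 1: leaves are sorted at once; other nodes temporarily store their
  // outstanding operand count in NodeId.
  for (int N = 0; N < Size; ++N) {
    if (Operands[N].empty()) {
      NodeId[N] = static_cast<int>(Sorted.size());
      Sorted.push_back(N);
    } else {
      NodeId[N] = static_cast<int>(Operands[N].size());
    }
  }

  // Pass 2: sweep the sorted prefix; it grows while we walk it.
  for (std::size_t Pos = 0; Pos < Sorted.size(); ++Pos) {
    for (int U : Users[Sorted[Pos]]) {
      assert(NodeId[U] > 0 && "Invalid node degree");
      if (--NodeId[U] == 0) { // all operands of U are sorted
        NodeId[U] = static_cast<int>(Sorted.size());
        Sorted.push_back(U);
      }
    }
  }
  return Sorted;
}
```

After the call, every operand has a smaller NodeId than each of its users, which is the property a walk over the sorted list relies on.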

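Note that legalize type starts with only the nodes that have no operands and always takes the next node from the back of the worklist, so it proceeds in a DFS-like manner. Legalize operation sorts first and then handles the nodes one by one along the sorted list (the loop in Legalize() above walks the list from the end backwards), which is closer to a BFS-style sweep. Why do the two use different algorithms?
The excerpt of SelectionDAGLegalize::LegalizeOp() below shows that legalizing a single node works much like legalize type. One difference is that legalize operation has one additional kind, custom legalize.
The custom path calls TargetLowering::LowerOperation() to perform target-defined legalization, and the returned value replaces the original node. Note what happens when a custom lowering returns an empty SDValue: as the code below shows, the node is not replaced and control falls through to the Expand path, so returning an empty value is only safe for nodes the generic expansion can handle (otherwise compilation aborts with an error).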

void SelectionDAGLegalize::LegalizeOp(SDNode *Node) {
  ......

  // Figure out the correct action; the way to query this varies by opcode
  TargetLowering::LegalizeAction Action = TargetLowering::Legal;
  bool SimpleFinishLegalizing = true;
  switch (Node->getOpcode()) {
  ......
  default:
    if (Node->getOpcode() >= ISD::BUILTIN_OP_END) {
      Action = TargetLowering::Legal;
    } else {
      Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
    }
    break;
  }

  if (SimpleFinishLegalizing) {
    SDNode *NewNode = Node;

    switch (Action) {
    case TargetLowering::Legal:
      LLVM_DEBUG(dbgs() << "Legal node: nothing to do\n");
      return;
    case TargetLowering::Custom:
      LLVM_DEBUG(dbgs() << "Trying custom legalization\n");
      // FIXME: The handling for custom lowering with multiple results is
      // a complete mess.
      if (SDValue Res = TLI.LowerOperation(SDValue(Node, 0), DAG)) {
        if (!(Res.getNode() != Node || Res.getResNo() != 0))
          return;

        if (Node->getNumValues() == 1) {
          LLVM_DEBUG(dbgs() << "Successfully custom legalized node\n");
          // We can just directly replace this node with the lowered value.
          ReplaceNode(SDValue(Node, 0), Res);
          return;
        }

        SmallVector<SDValue, 8> ResultVals;
        for (unsigned i = 0, e = Node->getNumValues(); i != e; ++i)
          ResultVals.push_back(Res.getValue(i));
        LLVM_DEBUG(dbgs() << "Successfully custom legalized node\n");
        ReplaceNode(Node, ResultVals.data());
        return;
      }
      LLVM_DEBUG(dbgs() << "Could not custom legalize node\n");
      LLVM_FALLTHROUGH;
    case TargetLowering::Expand:
      if (ExpandNode(Node))
        return;
      LLVM_FALLTHROUGH;
    case TargetLowering::LibCall:
      ConvertNodeToLibcall(Node);
      return;
    case TargetLowering::Promote:
      PromoteNode(Node);
      return;
    }
  }

  switch (Node->getOpcode()) {
  default:
    llvm_unreachable("Do not know how to legalize this operator!");

  case ISD::CALLSEQ_START:
  case ISD::CALLSEQ_END:
    break;
  case ISD::LOAD:
    return LegalizeLoadOps(Node);
  case ISD::STORE:
    return LegalizeStoreOps(Node);
  }
}
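The fallthrough chain Custom -> Expand -> LibCall in the switch above is easy to miss, so here is a reduced model of just the dispatch. The `Hooks` struct and the returned strings are hypothetical; the two callbacks stand in for "LowerOperation returned a non-empty value" and "ExpandNode succeeded".

```cpp
#include <cassert>
#include <functional>
#include <string>

enum Action { Legal, Promote, Expand, LibCall, Custom };

// Hypothetical stand-ins for the target hook and the generic expander.
struct Hooks {
  std::function<bool()> lowerCustom; // true: target produced a replacement
  std::function<bool()> expand;      // true: generic expansion succeeded
};

// Returns a label naming the path that finally handled the node.
std::string legalizeOne(Action A, const Hooks &H) {
  assert(H.lowerCustom && H.expand && "both hooks must be set");
  switch (A) {
  case Legal:
    return "legal";
  case Promote:
    return "promoted";
  case Custom:
    if (H.lowerCustom())
      return "custom";
    // fall through: the target declined, try the generic expansion
  case Expand:
    if (H.expand())
      return "expanded";
    // fall through: expansion failed, use a library call
  case LibCall:
    return "libcall";
  }
  return "unreachable";
}
```

For example, a Custom node whose hook declines but which the generic code can expand ends up on the "expanded" path, and a node that neither path handles ends up as a library call.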

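Take RISCV as an example. Because of an instruction set limitation (a 32-bit immediate cannot be encoded in a single instruction), moving a global address into a register takes two instructions (lui+addi / auipc+addi), so RISCV provides a custom legalize implementation for it.
We saw earlier that RISCV marks the GlobalAddress node as Custom; the corresponding lowering is implemented in lib/Target/RISCV/RISCVISelLowering.cpp. Note that RISCV replaces the original node directly with machine-instruction nodes, which means lowering can encroach on instruction selection (some nodes have their instructions selected early, during lowering).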

SDValue RISCVTargetLowering::LowerOperation(SDValue Op,
                                            SelectionDAG &DAG) const {
  switch (Op.getOpcode()) {
  default:
    report_fatal_error("unimplemented operand");
  case ISD::GlobalAddress:
    return lowerGlobalAddress(Op, DAG);
  ......
  }
}

SDValue RISCVTargetLowering::lowerGlobalAddress(SDValue Op,
                                                SelectionDAG &DAG) const {
  SDLoc DL(Op);
  EVT Ty = Op.getValueType();
  GlobalAddressSDNode *N = cast<GlobalAddressSDNode>(Op);
  int64_t Offset = N->getOffset();
  MVT XLenVT = Subtarget.getXLenVT();

  const GlobalValue *GV = N->getGlobal();
  bool IsLocal = getTargetMachine().shouldAssumeDSOLocal(*GV->getParent(), GV);
  SDValue Addr = getAddr(N, DAG, IsLocal);

  // In order to maximise the opportunity for common subexpression elimination,
  // emit a separate ADD node for the global address offset instead of folding
  // it in the global address node. Later peephole optimisations may choose to
  // fold it back in when profitable.
  if (Offset != 0)
    return DAG.getNode(ISD::ADD, DL, Ty, Addr,
                       DAG.getConstant(Offset, DL, XLenVT));
  return Addr;
}

template <class NodeTy>
SDValue RISCVTargetLowering::getAddr(NodeTy *N, SelectionDAG &DAG,
                                     bool IsLocal) const {
  SDLoc DL(N);
  EVT Ty = getPointerTy(DAG.getDataLayout());

  if (isPositionIndependent()) {
    SDValue Addr = getTargetNode(N, DL, Ty, DAG, 0);
    if (IsLocal)
      // Use PC-relative addressing to access the symbol. This generates the
      // pattern (PseudoLLA sym), which expands to (addi (auipc %pcrel_hi(sym))
      // %pcrel_lo(auipc)).
      return SDValue(DAG.getMachineNode(RISCV::PseudoLLA, DL, Ty, Addr), 0);

    // Use PC-relative addressing to access the GOT for this symbol, then load
    // the address from the GOT. This generates the pattern (PseudoLA sym),
    // which expands to (ld (addi (auipc %got_pcrel_hi(sym)) %pcrel_lo(auipc))).
    return SDValue(DAG.getMachineNode(RISCV::PseudoLA, DL, Ty, Addr), 0);
  }

  switch (getTargetMachine().getCodeModel()) {
  default:
    report_fatal_error("Unsupported code model for lowering");
  case CodeModel::Small: {
    // Generate a sequence for accessing addresses within the first 2 GiB of
    // address space. This generates the pattern (addi (lui %hi(sym)) %lo(sym)).
    SDValue AddrHi = getTargetNode(N, DL, Ty, DAG, RISCVII::MO_HI);
    SDValue AddrLo = getTargetNode(N, DL, Ty, DAG, RISCVII::MO_LO);
    SDValue MNHi = SDValue(DAG.getMachineNode(RISCV::LUI, DL, Ty, AddrHi), 0);
    return SDValue(DAG.getMachineNode(RISCV::ADDI, DL, Ty, MNHi, AddrLo), 0);
  }
  case CodeModel::Medium: {
    // Generate a sequence for accessing addresses within any 2GiB range within
    // the address space. This generates the pattern (PseudoLLA sym), which
    // expands to (addi (auipc %pcrel_hi(sym)) %pcrel_lo(auipc)).
    SDValue Addr = getTargetNode(N, DL, Ty, DAG, 0);
    return SDValue(DAG.getMachineNode(RISCV::PseudoLLA, DL, Ty, Addr), 0);
  }
  }
}
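The (addi (lui %hi(sym)) %lo(sym)) pattern used for the small code model splits a 32-bit address into a 20-bit upper part and a signed 12-bit lower part. The arithmetic can be sketched as follows, assuming a 32-bit address space; the struct and function names are hypothetical. Because addi sign-extends its immediate, the upper part is rounded by adding 0x800 before shifting.

```cpp
#include <cassert>
#include <cstdint>

struct HiLo {
  std::uint32_t hi20; // immediate for lui, i.e. %hi(sym)
  std::int32_t lo12;  // immediate for addi, i.e. %lo(sym), in [-2048, 2047]
};

// Split a 32-bit address the way %hi/%lo do.
HiLo splitHiLo(std::uint32_t addr) {
  HiLo r;
  r.hi20 = ((addr + 0x800u) >> 12) & 0xFFFFFu; // round to compensate for a
                                               // negative lo12
  std::int32_t low = static_cast<std::int32_t>(addr & 0xFFFu);
  r.lo12 = low >= 0x800 ? low - 0x1000 : low;  // sign-extend the low 12 bits
  return r;
}

// Model of the two-instruction sequence: lui followed by addi.
std::uint32_t materialize(HiLo p) {
  assert(p.hi20 <= 0xFFFFFu && p.lo12 >= -2048 && p.lo12 <= 2047);
  std::uint32_t reg = p.hi20 << 12;                // lui  reg, hi20
  return reg + static_cast<std::uint32_t>(p.lo12); // addi reg, reg, lo12
}
```

For the address 0x12345FFF the split is hi20 = 0x12346 and lo12 = -1, and the lui/addi pair reconstructs the original address.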

Summary:

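  1. What is legalize for? It converts data types/operations the target does not support into supported ones, so that instruction selection always has a pattern that covers every node.
  2. What does legalize cover? Type legalize and operation legalize, which legalize data types (which register class a value maps to) and node types (which instruction a node maps to) respectively.
  3. Which backend interfaces affect legalize? The callbacks in TargetLoweringBase.
  4. How to track down legalize problems? Print the DAG at each stage and check whether the DAG after each step is legal for the target. Note that each kind of legalization must be implemented in its own stage; the order must not be mixed up.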
posted @ 2020-05-14 02:01  Five100Miles