TVM: Relay Operator Implementation Workflow
Reposted from: https://blog.csdn.net/zx_ros/article/details/123526147
Steps to add a custom operator:
1. Define the operator attribute node
2. Write the type relation function that infers the operator's output type from its input types
3. Register the operator with the RELAY_REGISTER_OP macro
4. Implement the operator's compute function
5. Register the operator's compute function and schedule
6. Implement the function that creates a call node when the operator is invoked, and register the Python API hook through which the frontend calls it
7. Wrap the hook in a clean Python interface
1. Define the operator attribute node
Operator attributes are parameters that must be supplied at compile time and have fixed, known values, such as the strides and dilation parameters of the convolution operator (python/tvm/relay/op/nn/nn.py):
def conv2d(
    data,
    weight,
    strides=(1, 1),
    padding=(0, 0),
    dilation=(1, 1),
    groups=1,
    channels=None,
    kernel_size=None,
    data_layout="NCHW",
    kernel_layout="OIHW",
    out_layout="",
    out_dtype="",
):
These parameters are defined in the files under include/tvm/relay/attrs/. Taking nn.conv2d as an example, its attributes are defined as follows (see include/tvm/relay/attrs/nn.h):
struct Conv2DAttrs : public tvm::AttrsNode<Conv2DAttrs> {
  Array<IndexExpr> strides;
  Array<IndexExpr> padding;
  Array<IndexExpr> dilation;
  int groups;
  IndexExpr channels;
  Array<IndexExpr> kernel_size;
  tvm::String data_layout;
  tvm::String kernel_layout;
  tvm::String out_layout;
  tvm::String auto_scheduler_rewritten_layout;  // The layout after auto-scheduler's layout rewrite
  DataType out_dtype;

  TVM_DECLARE_ATTRS(Conv2DAttrs, "relay.attrs.Conv2DAttrs") {
    TVM_ATTR_FIELD(strides)
        .set_default(Array<IndexExpr>({1, 1}))
        .describe("Specifies the strides of the convolution.");
    TVM_ATTR_FIELD(padding)
        .set_default(Array<IndexExpr>({0, 0}))
        .describe(
            "If padding is non-zero, then the input is implicitly zero-padded"
            "Padding support both symmetric and asymmetric as"
            "one int : same padding used on all sides"
            "two int : bottom, right will use same padding as top, left"
            "four int : padding width in the order of (top, left, bottom, right)");
    TVM_ATTR_FIELD(dilation)
        .set_default(Array<IndexExpr>({1, 1}))
        .describe("Specifies the dilation rate to use for dilated convolution.");
    TVM_ATTR_FIELD(groups).set_default(1).describe(
        "Controls the connections between inputs and outputs."
        "At groups=1, all inputs are convolved to all outputs."
        "At groups=2, the operation becomes equivalent to having two convolution"
        "layers side by side, each seeing half the input channels, and producing"
        "half the output channels, and both subsequently concatenated.");
    TVM_ATTR_FIELD(channels)
        .describe(
            "The number of output channels in the convolution."
            " If it is not set, inferred by shape of the weight.")
        .set_default(NullValue<IndexExpr>());
    TVM_ATTR_FIELD(kernel_size)
        .describe("Specifies the dimensions of the convolution window.")
        .set_default(NullValue<Array<IndexExpr>>());
    TVM_ATTR_FIELD(data_layout)
        .set_default("NCHW")
        .describe(
            "Dimension ordering of input data. Can be 'NCHW', 'NHWC', etc."
            "'N', 'C', 'H', 'W' stands for batch, channel, height, and width"
            "dimensions respectively. Convolution is applied on the 'H' and"
            "'W' dimensions.");
    TVM_ATTR_FIELD(kernel_layout)
        .set_default("OIHW")
        .describe(
            "Dimension ordering of weight. Can be 'OIHW', 'OIHW16o16i', etc."
            "'O', 'I', 'H', 'W' stands for num_filter, input_channel, height, and width"
            "dimensions respectively.");
    TVM_ATTR_FIELD(out_layout)
        .set_default("")
        .describe(
            "Dimension ordering of output. Can be 'NCHW', 'NHWC', etc."
            "'N', 'C', 'H', 'W' stands for batch, channel, height, and width"
            "dimensions respectively. Default to be same as input layout.");
    // use 0 bits to indicate none.
    TVM_ATTR_FIELD(out_dtype)
        .set_default(NullValue<DataType>())
        .describe("Output data type, set to explicit type under mixed precision setting");
  }
};
Here each attribute is declared with TVM_ATTR_FIELD, documented with describe, and given a default value with set_default; the attribute's type comes from the corresponding struct field declaration.
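Once declared this way, the attribute fields travel with every call to the operator and are visible from Python. A minimal sketch (shapes are arbitrary):

from tvm import relay

data = relay.var("data", shape=(1, 3, 32, 32))
weight = relay.var("weight", shape=(8, 3, 3, 3))
call = relay.nn.conv2d(data, weight, strides=(2, 2), padding=(1, 1))

# The Conv2DAttrs node declared above is attached to the call expression.
print(call.attrs.strides)      # [2, 2]
print(call.attrs.data_layout)  # "NCHW", the declared default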
2. Write the operator's type inference rules
To make operators more flexible and better able to generalize, Relay allows an operator's type to be determined by relations between its input and output types (somewhat like C++ templates). Such a type relation is described by a set of functions, each covering a range of input/output types (not all types). A relation function takes the input and output types as arguments and infers the input/output types that satisfy the relation, including any shape information that can be determined statically at compile time. In essence, besides deriving the operator's output type, a relation can also enforce all necessary typing rules by checking the input types.
Take nn.conv2d: its type inference function is Conv2DRel, implemented in src/relay/op/nn/convolution.h. The function's inputs are an array of types and the operator attributes. The type array has three elements: the first is the type of the input data, the second is the type of the convolution kernel, and the third acts as the return slot, receiving the inferred output type, i.e. the output tensor's shape and data type. The attributes carry the layout descriptions of the input/output tensors and the kernel tensor, along with the group count, dilation, strides, padding, and so on. The function checks the input data type and the kernel type, normalizes the input data layout to NCHW and the kernel layout to OIHW, and derives the output tensor's type and shape from the relevant parameters.
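The effect of Conv2DRel can be observed from Python by running type inference on a module that contains a conv2d call; a minimal sketch with arbitrary shapes:

import tvm
from tvm import relay

data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="float32")
out = relay.nn.conv2d(data, weight, padding=(1, 1))

mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))
mod = relay.transform.InferType()(mod)
# Conv2DRel has filled in the output type from the input and kernel shapes.
print(mod["main"].ret_type)  # Tensor[(1, 16, 224, 224), float32]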
3. Register the operator with the RELAY_REGISTER_OP macro
Next we register the operator and associate it with its calling interface. Registration uses the RELAY_REGISTER_OP macro, which lets the developer specify the following information about the operator:
1. Arity, i.e. the number of arguments the operator takes;
2. Names and descriptions of the positional arguments;
3. A support level (1 indicates an internally implemented operator; higher numbers indicate less internal support or externally supported operators);
4. The operator's type relation;
5. Other annotations useful when optimizing the operator.
Taking the convolution operator as an example, the registration code (in src/relay/op/nn/convolution.cc):
RELAY_REGISTER_OP("nn.conv2d")
    .describe(R"code(2D convolution layer (e.g. spatial convolution over images).
This layer creates a convolution kernel that is convolved
with the layer input to produce a tensor of outputs.
- **data**: This depends on the `layout` parameter. Input is 4D array of shape
            (batch_size, in_channels, height, width) if `layout` is `NCHW`.
- **weight**: (channels, in_channels, kernel_size[0], kernel_size[1])
- **out**: This depends on the `layout` parameter. Output is 4D array of shape
           (batch_size, channels, out_height, out_width) if `layout` is `NCHW`.
)code" TVM_ADD_FILELINE)
    .set_attrs_type<Conv2DAttrs>()
    .set_num_inputs(2)
    .add_argument("data", "Tensor", "The input tensor.")
    .add_argument("weight", "Tensor", "The weight tensor.")
    .set_support_level(2)
    .add_type_rel("Conv2D", Conv2DRel<Conv2DAttrs>)
    .set_attr<FInferCorrectLayout>("FInferCorrectLayout", ConvInferCorrectLayout<Conv2DAttrs>);
Here set_num_inputs is set to 2, matching the two inputs, data and the convolution kernel, whose names and types are declared with add_argument. add_type_rel attaches the relation function Conv2DRel defined in step 2, and set_attrs_type binds the Conv2DAttrs defined in step 1. Finally, set_attr registers ConvInferCorrectLayout under the FInferCorrectLayout key to infer the preferred layouts of the operator's inputs and output. Inside Conv2DRel, the output shape is derived from the input shape, the kernel shape, and the convolution attributes. The registered metadata can be queried back from Python, as shown below.
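A small sketch that reads the registration back through the operator registry (field names follow OpNode in include/tvm/ir/op.h):

from tvm import relay

op = relay.op.get("nn.conv2d")
print(op.num_inputs)     # 2, as set by set_num_inputs
print(op.support_level)  # 2, as set by set_support_level
print([arg.name for arg in op.arguments])  # ['data', 'weight']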
4. Define the operator's compute function
The compute function implements the operator's actual computation. For nn.conv2d, the entry point of the algorithm is the conv2d function defined in python/tvm/topi/nn/conv2d.py, which calls the conv helper in the same file; the helper defines the compute function and invokes it through te.compute:
def conv(
    inp: te.Tensor,
    filt: te.Tensor,
    stride: Union[int, Sequence[int]],
    padding: Union[int, Sequence[int]],
    dilation: Union[int, Sequence[int]],
    groups: int,
    order: str,
    out_dtype: Union[str, None] = None,
):
    ...
    def compute(*args):
        nn, ff, *dim_indices = list(np.array(args)[permutation_to])
        return te.sum(
            temp.__getitem__(
                tuple(
                    np.array(
                        [nn, ff // (num_filter // groups) * (in_channel // groups) + rc]
                        + [
                            di * stride + r * dil
                            for di, stride, r, dil in zip(dim_indices, strides, rs, dilations)
                        ]
                    )[permutation_from]
                )
            ).astype(out_dtype)
            * filt.__getitem__(tuple(np.array([ff, rc] + rs)[permutation_from_kernel])).astype(
                out_dtype
            ),
            # Schedules depend on reduction axes being in the same order as the
            # layout, so we reorder here.
            axis=np.array([rc, *rs])[permutation_from_reductions].tolist(),
        )

    return te.compute(
        list(np.array([batch, out_channel] + out_dimensions)[permutation_from]),
        compute,
        # tag is expected to be lowercase
        tag=f"{'group_' if groups > 1 else ''}conv{dim}d_{order.lower()}",
        name=f"{'group_' if groups > 1 else ''}conv{dim}d_{order.lower()}",
    )
This is only the default compute for conv2d; python/tvm/topi/nn/conv2d.py defines corresponding compute functions for the various parameter combinations and input data layouts. The sketch below shows such a compute being lowered with a default schedule.
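A compute only declares the computation; lowering it to TIR also requires a schedule. A minimal sketch that calls the TOPI compute directly (shapes are arbitrary; strides, padding, and dilation are passed positionally):

import tvm
from tvm import te, topi

data = te.placeholder((1, 3, 224, 224), name="data")
kernel = te.placeholder((16, 3, 3, 3), name="kernel")
out = topi.nn.conv2d(data, kernel, 1, 1, 1)  # strides, padding, dilation

# A default (unoptimized) schedule is enough to lower the compute to TIR.
s = te.create_schedule(out.op)
print(tvm.lower(s, [data, kernel, out], simple_mode=True))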
5. Register the operator's compute function and schedule
After implementing the compute function, we need to attach it to the Relay operator. In TVM this means providing not only the computation but also a matching schedule, i.e. choosing a suitable schedule for the compute. For example, when a 2D convolution is a grouped convolution, it is assigned an appropriate compute and schedule. The conv2d schedules are defined in python/tvm/topi/generic/nn.py; the functions whose names start with schedule_conv2d_ define the schedules for the various data layouts, most of which fall back to the default schedule.
The strategy function for conv2d, conv2d_strategy, is defined in python/tvm/relay/op/strategy/generic.py. Based on the layouts of the input data and the kernel, it picks a compute and a schedule for each layout combination; a compute paired with a schedule is a strategy. A sketch of registering a strategy for a custom operator follows.
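The same machinery is available from Python. A hedged sketch for a hypothetical operator "my_op" (assumed to have been registered in steps 1-3): the compute is a stand-in elementwise add, and schedule_injective is a generic fallback schedule:

from tvm import topi
from tvm.relay.op import op as _op
from tvm.relay.op.strategy.generic import wrap_topi_schedule
from tvm.target import override_native_generic_func

# FTVMCompute: map relay-level attrs/inputs to a list of te.Tensor results.
def my_op_compute(attrs, inputs, out_type):
    return [topi.add(inputs[0], inputs[1])]

# FTVMStrategy: pair the compute with a schedule for the current target.
@override_native_generic_func("my_op_strategy")
def my_op_strategy(attrs, inputs, out_type, target):
    strategy = _op.OpStrategy()
    strategy.add_implementation(
        my_op_compute,
        wrap_topi_schedule(topi.generic.schedule_injective),
        name="my_op.generic",
    )
    return strategy

_op.register_strategy("my_op", my_op_strategy)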
6. Implement the Call node creation function and the Python invocation hook
With the previous steps our operator is now part of Relay; next it must be callable through a Relay Call node. In this step we write an interface that passes the arguments to the operator and returns a Relay Call node, one that can be embedded in the Relay AST.
Calling the operator directly with its Attrs and arguments is not supported, so Op::Get is used to fetch the operator's registration from the operator registry and pass it to the Call node. The Relay Call node creation function for nn.conv2d (src/relay/op/nn/convolution_make.h):
template <typename T>
inline Expr MakeConv(Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
                     Array<IndexExpr> dilation, int groups, IndexExpr channels,
                     Array<IndexExpr> kernel_size, std::string data_layout,
                     std::string kernel_layout, std::string out_layout, DataType out_dtype,
                     std::string op_name) {
  auto attrs = make_object<T>();
  attrs->strides = std::move(strides);
  attrs->padding = std::move(padding);
  attrs->dilation = std::move(dilation);
  attrs->groups = groups;
  attrs->channels = std::move(channels);
  attrs->kernel_size = std::move(kernel_size);
  attrs->data_layout = std::move(data_layout);
  attrs->kernel_layout = std::move(kernel_layout);
  attrs->out_layout = std::move(out_layout);
  attrs->out_dtype = std::move(out_dtype);
  const Op& op = Op::Get(op_name);
  return Call(op, {data, weight}, Attrs(attrs), {});
}
Instantiated with Conv2DAttrs as the template parameter, this produces the Relay Call node for nn.conv2d. The function first allocates a Conv2DAttrs object and fills it with the incoming parameters and attributes, then looks up the registration of the 2D convolution operator, passes both to Call, and finally returns a reference to a CallNode instance.
After defining the Call node creation function, we register an interface so it can be invoked from Python, using the TVM_REGISTER_GLOBAL macro. Once registered, the function can be called from Python in the form relay.op._make.xxx(...). The registration for nn.conv2d:
// relay.nn.conv2d
TVM_REGISTER_NODE_TYPE(Conv2DAttrs);

TVM_REGISTER_GLOBAL("relay.op.nn._make.conv2d")
    .set_body_typed([](Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
                       Array<IndexExpr> dilation, int groups, IndexExpr channels,
                       Array<IndexExpr> kernel_size, String data_layout, String kernel_layout,
                       String out_layout, DataType out_dtype) {
      return MakeConv<Conv2DAttrs>(data, weight, strides, padding, dilation, groups, channels,
                                   kernel_size, data_layout, kernel_layout, out_layout, out_dtype,
                                   "nn.conv2d");
    });
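The registered hook is an ordinary packed function, so its presence can be sanity-checked directly:

import tvm

# Succeeds only if the TVM_REGISTER_GLOBAL above ran for this name.
f = tvm.get_global_func("relay.op.nn._make.conv2d")
print(f)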
7. Wrap the Python hook
To make the hooks registered with TVM_REGISTER_GLOBAL more convenient to use, we can wrap each one in a simple standalone interface. The wrapper for nn.conv2d:
def conv2d(
    data,
    weight,
    strides=(1, 1),
    padding=(0, 0),
    dilation=(1, 1),
    groups=1,
    channels=None,
    kernel_size=None,
    data_layout="NCHW",
    kernel_layout="OIHW",
    out_layout="",
    out_dtype="",
):
    r"""2D convolution.

    Parameters
    ----------
    data : tvm.relay.Expr
        The input data to the operator.
    weight : tvm.relay.Expr
        The weight expressions.
    strides : Optional[int, Tuple[int]]
        The strides of convolution.
    padding : Optional[int, Tuple[int]]
        The padding of convolution on both sides of inputs before convolution.
    dilation : Optional[int, Tuple[int]]
        Specifies the dilation rate to be used for dilated convolution.
    groups : Optional[int]
        Number of groups for grouped convolution.
    channels : Optional[int]
        Number of output channels of this convolution.
    kernel_size : Optional[int, Tuple[int]]
        The spatial dimensions of the convolution kernel.
    data_layout : Optional[str]
        Layout of the input.
    kernel_layout : Optional[str]
        Layout of the weight.
    out_layout : Optional[str]
        Layout of the output, by default, out_layout is the same as data_layout
    out_dtype : Optional[str]
        Specifies the output data type for mixed precision conv2d.

    Returns
    -------
    result : tvm.relay.Expr
        The computed result.
    """
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    if isinstance(strides, int):
        strides = (strides, strides)
    if isinstance(dilation, int):
        dilation = (dilation, dilation)
    # TODO enforce 4-way padding in topi/nn/conv2d after #4644 merged
    # convert 2-way padding to 4-way padding
    padding = get_pad_tuple2d(padding)
    return _make.conv2d(
        data,
        weight,
        strides,
        padding,
        dilation,
        groups,
        channels,
        kernel_size,
        data_layout,
        kernel_layout,
        out_layout,
        out_dtype,
    )
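With the wrapper in place, the operator composes like any other Relay op. A minimal end-to-end sketch that builds, compiles, and runs a small conv2d on CPU (shapes and random inputs are arbitrary):

import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

data = relay.var("data", shape=(1, 3, 8, 8))
weight = relay.var("weight", shape=(4, 3, 3, 3))
func = relay.Function([data, weight], relay.nn.conv2d(data, weight, padding=(1, 1)))

# Compile for CPU and run with random inputs.
lib = relay.build(tvm.IRModule.from_expr(func), target="llvm")
m = graph_executor.GraphModule(lib["default"](tvm.cpu()))
m.set_input("data", np.random.rand(1, 3, 8, 8).astype("float32"))
m.set_input("weight", np.random.rand(4, 3, 3, 3).astype("float32"))
m.run()
print(m.get_output(0).shape)  # (1, 4, 8, 8)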