How to Integrate a Codegen into TVM (Part 2)

Bring DNNL to TVM: JSON Codegen/Runtime

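This section implements a DNNL codegen that serializes a Relay graph into a JSON representation, and then a DNNL JSON runtime that deserializes and executes that graph. If you are instead trying to implement a codegen that generates C-compatible programs, see the section Bring DNNL to TVM: C Source Codegen.

To make the DNNL JSON codegen/runtime in TVM work for this example, make sure DNNL is available on your machine and build TVM with set(USE_DNNL_CODEGEN ON) in config.cmake.

The DNNL codegen is implemented in src/relay/backend/contrib/dnnl/codegen.cc. This file implements the DNNL codegen in both forms, so when tracing the code you can focus on the part covered by the USE_JSON_RUNTIME macro.

We first register the codegen with the TVM registration API (L510). This registration makes the TVM compile engine dispatch Relay functions with Compiler=<your codegen> to relay.ext.<your codegen>. We then implement the entry function of the DNNL compiler (L490). For details, read the comments embedded in the code snippet: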

runtime::Module DNNLCompiler(const ObjectRef& ref) {
  // "ref" should be the partitioned Relay function with kCompiler=dnnl.
  CHECK(ref->IsInstance<FunctionNode>());
  auto func = Downcast<Function>(ref);

  // Get the function name as the symbol to match in runtime.
  auto func_name = GetExtSymbol(func);

  // Serialize the function to a JSON string (introduced later).
  DNNLJSONSerializer serializer(func_name, func);
  serializer.serialize();
  std::string graph_json = serializer.GetJSON();

  // The constant tensor names that have been bound to the module.
  // All constant tensors will be serialized along with the JSON graph
  // when export_library is invoked.
  auto params = serializer.GetParams();

  // The function to create DNNL JSON runtime (introduced later).
  const auto* pf = runtime::Registry::Get("runtime.DNNLJSONRuntimeCreate");
  CHECK(pf != nullptr) << "Cannot find JSON runtime module to create";

  // Create a DNNL runtime module that can run the serialized function.
  auto mod = (*pf)(func_name, graph_json, params);
  return mod;
}

TVM_REGISTER_GLOBAL("relay.ext.dnnl").set_body_typed(DNNLCompiler);

Note that each runtime module is responsible for only one Relay function, so a single .so file may contain multiple DNNL runtime modules.
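The dispatch described above can be pictured with a toy registry. The sketch below is plain Python and does not use TVM: `register_global`, `get_global`, and `compile_partitioned` are hypothetical names standing in for TVM_REGISTER_GLOBAL, runtime::Registry::Get, and the compile engine's lookup, and the dictionaries standing in for Relay functions are assumptions made only for illustration.

```python
# Toy model of the "relay.ext.<compiler>" dispatch; this is not the TVM API.
_REGISTRY = {}


def register_global(name):
    """Hypothetical stand-in for TVM_REGISTER_GLOBAL(name).set_body_typed(f)."""
    def decorator(func):
        _REGISTRY[name] = func
        return func
    return decorator


def get_global(name):
    """Hypothetical stand-in for runtime::Registry::Get(name)."""
    return _REGISTRY.get(name)


@register_global("relay.ext.dnnl")
def dnnl_compiler(func):
    # One runtime module per partitioned function, keyed by its symbol.
    return {"symbol": func["global_symbol"], "kind": "dnnl_module"}


def compile_partitioned(functions):
    """Dispatch each function to the codegen named by its Compiler attribute."""
    modules = []
    for func in functions:
        codegen = get_global("relay.ext." + func["Compiler"])
        if codegen is None:
            raise KeyError("no codegen registered for " + func["Compiler"])
        modules.append(codegen(func))
    return modules


partitioned = [
    {"Compiler": "dnnl", "global_symbol": "dnnl_0"},
    {"Compiler": "dnnl", "global_symbol": "dnnl_1"},
]
modules = compile_partitioned(partitioned)
print([m["symbol"] for m in modules])
```

Two partitioned functions produce two modules, which mirrors the note that one .so file can hold several DNNL runtime modules.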

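DNNL JSON Serialization

Next, we implement the DNNL JSON serializer (L429).

It is derived from the BYOC JSON codegen (src/relay/backend/contrib/codegen_json/codegen_json.h). The special handling in the DNNL JSON serializer is that it serializes a composite function call into a JSON node that the DNNL JSON runtime can interpret. Suppose we have a composite function that matches the pattern dnnl.conv2d_relu. The BYOC JSON codegen then generates the following JSON node: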

{
  op: "kernel",
  name: "dnnl.conv2d_relu",
  inputs: [[0, 0, 0], [1, 0, 0]],
  attrs: {
    PartitionedFromPattern: ["nn.conv2d_nn.relu_"],
    shape: [1, 32, 14, 14]
  }
}

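The problem is that the runtime still needs the Conv2D attributes, such as padding and strides, but the BYOC JSON serializer attaches only the attributes of the composite function, not those of the operators in its body. The customized DNNL JSON serializer, on the other hand, attaches the attributes of the first and only Conv2D in the composite function and generates the following JSON node: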

{
  op: "kernel",
  name: "dnnl.conv2d_relu",
  inputs: [[0, 0, 0], [1, 0, 0]],
  attrs: {
    shape: [1, 32, 14, 14],
    data_layout: ["NCHW"],
    kernel_layout: ["OIHW"],
    strides: [1, 1],
    padding: [1, 1, 1, 1]
  }
}

As the DNNL JSON serializer shows, you can customize the serializer to generate JSON in any form, as long as your JSON runtime can interpret it.
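To make the difference between the two nodes concrete, here is a minimal Python sketch of the customization step. It is not the serializer in codegen.cc: `attach_body_attrs` is a hypothetical helper, and the plain dictionaries are assumed stand-ins for the JSON node and for the attributes of the Conv2D call inside the composite function.

```python
import json


def attach_body_attrs(node, call_attrs,
                      keys=("data_layout", "kernel_layout", "strides", "padding")):
    """Return a new kernel node that carries selected attributes of the body operator."""
    attrs = {"shape": node["attrs"]["shape"]}
    for key in keys:
        attrs[key] = call_attrs[key]
    return {
        "op": node["op"],
        "name": node["name"],
        "inputs": node["inputs"],
        "attrs": attrs,
    }


# The node a generic serializer would emit for the composite function.
generic_node = {
    "op": "kernel",
    "name": "dnnl.conv2d_relu",
    "inputs": [[0, 0, 0], [1, 0, 0]],
    "attrs": {
        "PartitionedFromPattern": ["nn.conv2d_nn.relu_"],
        "shape": [1, 32, 14, 14],
    },
}

# Attributes of the Conv2D call found in the composite function body (assumed values).
conv2d_attrs = {
    "data_layout": ["NCHW"],
    "kernel_layout": ["OIHW"],
    "strides": [1, 1],
    "padding": [1, 1, 1, 1],
}

dnnl_node = attach_body_attrs(generic_node, conv2d_attrs)
print(json.dumps(dnnl_node, indent=2))
```

The runtime then only has to read `attrs` of the kernel node to configure the convolution, without looking into the composite function body.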

DNNL JSON Runtime

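We then implement a DNNL JSON runtime to interpret and execute the serialized JSON graph. It is located in src/runtime/contrib/dnnl/dnnl_json_runtime.cc.

Again, we first register two APIs for creating the runtime so that it can be used anywhere. runtime.DNNLJSONRuntimeCreate is the one used in the previous part after serialization, and runtime.module.loadbinary_dnnl_json is used when loading the .so back.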

// Create a DNNL JSON runtime to interpret and execute the given JSON graph.
TVM_REGISTER_GLOBAL("runtime.DNNLJSONRuntimeCreate")
    .set_body_typed(DNNLJSONRuntimeCreate);

TVM_REGISTER_GLOBAL("runtime.module.loadbinary_dnnl_json")
    .set_body_typed(JSONRuntimeBase::LoadFromBinary<DNNLJSONRuntime>);
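The reason for two entry points is that a runtime module is created once at compile time and once more when the exported library is loaded. The following Python sketch illustrates that round trip under loud assumptions: `ToyJSONModule`, `save_to_binary`, `load_from_binary`, and `ENTRY_POINTS` are hypothetical names, and encoding the payload as JSON bytes is a simplification that does not reflect TVM's actual binary layout.

```python
import json


class ToyJSONModule:
    """Toy module holding what a JSON runtime needs; not TVM's binary format."""

    def __init__(self, symbol, graph_json, const_names):
        self.symbol = symbol
        self.graph_json = graph_json
        self.const_names = list(const_names)

    def save_to_binary(self):
        # Called when the library is exported: pack everything into bytes.
        payload = {
            "symbol": self.symbol,
            "graph_json": self.graph_json,
            "const_names": self.const_names,
        }
        return json.dumps(payload).encode("utf-8")

    @classmethod
    def load_from_binary(cls, blob):
        # Called when the library is loaded back: rebuild the module from bytes.
        payload = json.loads(blob.decode("utf-8"))
        return cls(payload["symbol"], payload["graph_json"], payload["const_names"])


# Two entry points, mirroring the two registrations above (names are hypothetical).
ENTRY_POINTS = {
    "create": ToyJSONModule,
    "loadbinary": ToyJSONModule.load_from_binary,
}

built = ENTRY_POINTS["create"]("dnnl_0", '{"nodes": []}', ["dnnl_0_const_0"])
restored = ENTRY_POINTS["loadbinary"](built.save_to_binary())
print(restored.symbol, restored.const_names)
```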

Now we explain the DNNL JSON runtime implementation. The basic class structure is:

class DNNLJSONRuntime : public JSONRuntimeBase {
  void Init(const Array<NDArray>& consts) override {
    // Initialize the DNNL graph engine.
    BuildEngine();

    // Setup constants entries for weights.
    SetupConstants(consts);
  }

  void Run() override {
    // 1. Fill in the input buffers.
    // 2. Invoke the engine through interpreting the stream.
    // 3. Read and fill output buffers.
  }
};

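The Init function is responsible for building the DNNL engine by interpreting the JSON graph string and for filling the constant weights into the corresponding data entry buffers (SetupConstants is implemented in the JSON runtime base class, so you only need to call it in Init).

Note that this function is called only once, even when inference is run multiple times.

Next, the Run function first writes the input tensors, which may come from user inputs or from constant weights, into the corresponding DNNL memory buffers that were initialized when the DNNL engine was built. It then launches the DNNL engine to execute the JSON graph. Finally, it writes the DNNL output memory buffers back to the corresponding output tensors.

Since the rest of the DNNL JSON runtime implementation is too DNNL-specific, we stop the discussion here. Although the DNNL JSON runtime is a good reference to start with, your JSON runtime can be fully customized to fit your requirements.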
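The Init/Run split can be sketched with a toy interpreter. The code below is an illustration only: `ToyJSONRuntime`, its graph schema, and the `add` and `relu` kernels on Python lists are assumptions and do not correspond to the DNNL engine or to the JSON format TVM produces.

```python
import json


class ToyJSONRuntime:
    """Toy interpreter for a tiny JSON graph; op names and schema are assumptions."""

    KERNELS = {
        "add": lambda a, b: [x + y for x, y in zip(a, b)],
        "relu": lambda a: [max(x, 0.0) for x in a],
    }

    def __init__(self, graph_json):
        self.nodes = json.loads(graph_json)["nodes"]
        self.buffers = {}
        self.init_calls = 0

    def init(self, consts):
        # Build step: runs once and binds constant weights to their buffers.
        self.init_calls += 1
        self.buffers.update(consts)

    def run(self, inputs):
        # 1. Fill in the input buffers.
        self.buffers.update(inputs)
        # 2. Execute the nodes in the order they were serialized.
        for node in self.nodes:
            args = [self.buffers[name] for name in node["inputs"]]
            self.buffers[node["name"]] = self.KERNELS[node["op"]](*args)
        # 3. Read the output buffer of the last node.
        return self.buffers[self.nodes[-1]["name"]]


graph = json.dumps({"nodes": [
    {"name": "t0", "op": "add", "inputs": ["x", "bias"]},
    {"name": "t1", "op": "relu", "inputs": ["t0"]},
]})

rt = ToyJSONRuntime(graph)
rt.init({"bias": [1.0, -5.0]})          # called once
first = rt.run({"x": [1.0, 2.0]})       # can be called many times
second = rt.run({"x": [-4.0, 9.0]})
print(first, second)
```

The constants are bound once in `init`, while each `run` call only refreshes the user inputs, which is the same division of work described for Init and Run.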

Bring DNNL to TVM: C Source Codegen

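This section implements a DNNL codegen that generates C source code, which invokes DNNL APIs to execute the Relay graph. Note that if you are trying to implement a codegen that generates another graph representation, such as JSON, you may want to read the section Bring DNNL to TVM: JSON Codegen/Runtime instead.

To make the DNNL C source codegen in TVM work for this example, make sure DNNL is available on your machine and build TVM with set(USE_DNNL_CODEGEN C_SRC) in config.cmake.

The DNNL codegen is implemented in src/relay/backend/contrib/dnnl/codegen.cc. This file implements the DNNL codegen in both forms, so when tracing the code you can focus on the part NOT covered by the USE_JSON_RUNTIME macro.

We first register the codegen with the TVM registration API. This registration makes the TVM compile engine dispatch Relay functions with Compiler=<your codegen> to relay.ext.<your codegen>. We then implement the entry function of the DNNL compiler: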

runtime::Module DNNLCompiler(const ObjectRef& ref) {
  DNNLModuleCodegen dnnl;
  return dnnl.CreateCSourceModule(ref);
}

TVM_REGISTER_GLOBAL("relay.ext.dnnl").set_body_typed(DNNLCompiler);

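Note that each runtime module is responsible for only one Relay function, so a single .so file may contain multiple DNNL runtime modules.

Then, we derive from CSourceModuleCodegenBase to implement DNNLModuleCodegen. Since CSourceModuleCodegenBase takes care of other module-level processes such as serialization, we only need to implement the DNNL code generation in the CreateCSourceModule function: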

runtime::Module CreateCSourceModule(const ObjectRef& ref) override {
  // Include headers
  // ...skip...
  code_stream_ << "#include <dnnl/dnnl_kernel.h>\n";
  // ...skip...

  // "ref" should be the partitioned Relay function with kCompiler=dnnl.
  CHECK(ref->IsInstance<FunctionNode>());
  auto res = GenDNNLFunc(Downcast<Function>(ref));

  // "code" is the generated C code with DNNL APIs.
  std::string code = code_stream_.str();

  // "res" is a tuple of constant weights (symbols, values).
  // All constant tensors will be serialized along with the generated C code
  // when export_library is invoked.
  String sym = std::get<0>(res);
  Array<String> variables = std::get<1>(res);

  // Create a CSource module with all above artifacts.
  const auto* pf = runtime::Registry::Get("runtime.CSourceModuleCreate");
  CHECK(pf != nullptr) << "Cannot find csource module to create the external runtime module";
  return (*pf)(code, "c", sym, variables);
}

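Next, we implement GenDNNLFunc to generate compilable C code with DNNL APIs, as shown below. See the embedded comments for an explanation of the function interfaces that are compatible with the TVM C source runtime module.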

// The example Relay graph: conv2d -> add -> relu.
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/container.h>
#include <tvm/runtime/packed_func.h>
#include <dlpack/dlpack.h>
#include <dnnl/dnnl_kernel.h>
using namespace tvm::runtime;
using namespace tvm::runtime::contrib;

// Execute the conv2d->add->relu graph with DNNL.
extern "C" void dnnl_0_(float* dnnl_0_i0, float* dnnl_0_i1,
                        float* dnnl_0_i2, float* out0) {
  // Allocate intermediate buffers.
  float* buf_0 = (float*)std::malloc(4 * 4608);
  float* buf_1 = (float*)std::malloc(4 * 4608);
  float* buf_2 = (float*)std::malloc(4 * 4608);

  // Pre-implemented op-based DNNL functions.
  dnnl_conv2d(dnnl_0_i0, dnnl_0_i1, buf_0, 1, 32, 14, 14, 32, 1, 0, 0, 3, 3, 1, 1);
  dnnl_add(buf_0, dnnl_0_i2, buf_1, 1, 32, 12, 12);
  dnnl_relu(buf_1, buf_2, 1, 32, 12, 12);

  // Copy the final output to the corresponding buffer.
  std::memcpy(out0, buf_2, 4 * 4608);
  std::free(buf_0);
  std::free(buf_1);
  std::free(buf_2);
}

// The wrapper function with all arguments in DLTensor type.
extern "C" int dnnl_0_wrapper_(DLTensor* arg0,
                               DLTensor* arg1,
                               DLTensor* arg2,
                               DLTensor* out0) {
  // Cast all DLTensor to primitive type buffers and invoke the above
  // execution function.
  dnnl_0_(static_cast<float*>(arg0->data),
          static_cast<float*>(arg1->data),
          static_cast<float*>(arg2->data),
          static_cast<float*>(out0->data));
  return 0;
}

// The TVM macro to generate TVM runtime compatible function "dnnl_0"
// from our generated "dnnl_0_wrapper_".
TVM_DLL_EXPORT_TYPED_FUNC(dnnl_0, dnnl_0_wrapper_);

The pre-implemented op-based DNNL functions are located in src/runtime/contrib/dnnl/dnnl.cc.
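The buffer size `4 * 4608` in the generated code above follows from the convolution output shape. The short Python check below works through that arithmetic. It assumes that the trailing integer arguments of the dnnl_conv2d call are, in order, N, C, H, W, output channels, groups, padding (h, w), kernel (h, w), and strides (h, w); that ordering is an assumption, although it agrees with the 12 x 12 shape passed to dnnl_add and dnnl_relu. `conv2d_out_dim` is a hypothetical helper.

```python
def conv2d_out_dim(size, kernel, pad, stride):
    """Output extent of one spatial dimension of a convolution."""
    return (size + 2 * pad - kernel) // stride + 1


# Values read from the dnnl_conv2d call (argument order is an assumption).
n, out_channels, height, width = 1, 32, 14, 14
out_h = conv2d_out_dim(height, kernel=3, pad=0, stride=1)
out_w = conv2d_out_dim(width, kernel=3, pad=0, stride=1)

num_elems = n * out_channels * out_h * out_w   # float32 elements per buffer
num_bytes = 4 * num_elems                      # 4 bytes per float32

print(out_h, out_w, num_elems, num_bytes)
```

With a 3 x 3 kernel, no padding, and stride 1, a 14 x 14 input shrinks to 12 x 12, so each intermediate buffer holds 1 * 32 * 12 * 12 = 4608 floats.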

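The rest of the implementation in src/relay/backend/contrib/dnnl/codegen.cc is too DNNL-specific to cover here, so we stop at this point. The main idea is to implement a Relay graph visitor that visits the given Relay function and generates the C code above. As long as your codegen can generate C code that is compatible with the TVM runtime, you can fully customize the codegen to fit your requirements.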
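As a rough illustration of generating such C text, the Python sketch below emits a function similar to the generated dnnl_0_ function for a linear chain of operators. It is not the visitor in codegen.cc: `emit_chain` and its input format are hypothetical, it handles only a straight chain rather than a general Relay graph, and it allocates each buffer right before its call instead of allocating all buffers up front.

```python
def emit_chain(symbol, ops, num_elems):
    """Emit C source text for a linear chain of pre-implemented kernels.

    ops is a list of (kernel, num_new_inputs, shape_args); every op after the
    first also consumes the buffer produced by the previous op.
    """
    inputs = []   # names of the generated function's input pointers
    body = []
    prev = None   # buffer holding the previous op's result
    for idx, (kernel, num_new_inputs, shape_args) in enumerate(ops):
        args = [] if prev is None else [prev]
        for _ in range(num_new_inputs):
            name = f"{symbol}_i{len(inputs)}"
            inputs.append(name)
            args.append(name)
        buf = f"buf_{idx}"
        body.append(f"  float* {buf} = (float*)std::malloc(4 * {num_elems});")
        call_args = ", ".join(args + [buf] + [str(a) for a in shape_args])
        body.append(f"  {kernel}({call_args});")
        prev = buf
    body.append(f"  std::memcpy(out0, {prev}, 4 * {num_elems});")
    body.extend(f"  std::free(buf_{i});" for i in range(len(ops)))
    params = ", ".join(f"float* {name}" for name in inputs + ["out0"])
    header = f'extern "C" void {symbol}_({params}) {{'
    return "\n".join([header] + body + ["}"])


ops = [
    ("dnnl_conv2d", 2, [1, 32, 14, 14, 32, 1, 0, 0, 3, 3, 1, 1]),
    ("dnnl_add", 1, [1, 32, 12, 12]),
    ("dnnl_relu", 0, [1, 32, 12, 12]),
]
code = emit_chain("dnnl_0", ops, 4608)
print(code)
```

A real codegen would walk the Relay expression tree and track shapes itself; the sketch only shows how input names, intermediate buffers, and kernel calls can be threaded together into text.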

C Source Compilation

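The output of DNNLCompiler is a module in which the generated C code is still in text form; it has not yet been compiled by GCC into an executable binary. In fact, the generated C code is compiled when the user calls export_library(mod), as in the following code snippet: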

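def update_lib(lib):
    # Include the path of src/runtime/contrib/dnnl/dnnl.cc
    # (source_dir and lib_path are assumed to be defined elsewhere).
    contrib_path = os.path.join(source_dir, "src", "runtime", "contrib")
    # Setup the gcc flag to compile DNNL code.
    kwargs = {"options": ["-O2", "-std=c++14", "-I" + contrib_path]}
    # The generated C code with DNNL APIs is compiled to a binary lib.so.
    lib.export_library(lib_path, fcompile=False, **kwargs)
    # Load the lib.so back to a runtime module.
    return tvm.runtime.load_module(lib_path)

with tvm.transform.PassContext(opt_level=3):
    json, lib, param = relay.build(mod, target=target, params=params)
lib = update_lib(lib)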

Bring DNNL to TVM: Build TVM with DNNL Codegen/Runtime

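Finally, we create cmake/modules/contrib/DNNL.cmake to include the DNNL codegen when building TVM. For demonstration purposes, the DNNL codegen has two implementations in the same cmake file. You only need to focus on the one that fits your needs.

With the cmake file ready, users can now specify set(USE_DNNL_CODEGEN ON) in their build/config.cmake to enable the DNNL codegen.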

Posted on 2020-12-15 11:47 by 吴建明 (wujianming)
