Efficient Tensor Creation

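There are many ways to create tensors from existing data in C++, each with a different tradeoff between simplicity and performance. This article covers several approaches and their tradeoffs.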

Tip

Make sure to read the C++ guide before continuing.

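Writing data directly into a tensor

This is the preferred approach whenever it is feasible.

Instead of copying data or wrapping existing data, write the data directly into a tensor.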

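Pros

Works without copies for both in-process and out-of-process execution

No memory alignment requirements

No need to work with deleters

Cons

May require substantial refactoring of an existing application to work well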

Examples

You can receive data directly into a tensor's underlying buffer:

    // Allocate a tensor
    auto tensor = allocator->allocate_tensor<float>({6, 6});

    // Get a pointer to the underlying buffer
    auto data = tensor->get_raw_data_ptr();

    // Some function that writes data directly into this buffer
    recv_message_into_buffer(data);

Or you can fill in the tensor manually:

    // Allocate a tensor
    auto tensor = allocator->allocate_tensor<float>({256, 256});
    const auto &dims = tensor->get_dims();

    // Get an accessor
    auto accessor = tensor->accessor<2>();

    // Write data directly into it
    for (int i = 0; i < dims[0]; i++)
    {
        for (int j = 0; j < dims[1]; j++)
        {
            accessor[i][j] = i * j;
        }
    }

You can even parallelize this with TBB:

    #include <tbb/blocked_range2d.h>
    #include <tbb/parallel_for.h>

    // Allocate a tensor
    auto tensor = allocator->allocate_tensor<float>({256, 256});
    const auto &dims = tensor->get_dims();

    // Get an accessor
    auto accessor = tensor->accessor<2>();

    // Write data into the tensor in parallel
    tbb::parallel_for(
        // Parallelize in blocks of 16 by 16
        tbb::blocked_range2d<size_t>(0, dims[0], 16, 0, dims[1], 16),

        // Run this lambda in parallel for each block in the range above
        [&](const tbb::blocked_range2d<size_t> &r) {
            for (size_t i = r.rows().begin(); i != r.rows().end(); i++)
            {
                for (size_t j = r.cols().begin(); j != r.cols().end(); j++)
                {
                    accessor[i][j] = i * j;
                }
            }
        }
    );
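The accessor examples above index the tensor as accessor[i][j] and split the work into 16 by 16 blocks. The following is a minimal, standard-library-only sketch of why that is safe to parallelize, assuming the buffer is contiguous and row-major. View2D and fill_in_blocks are hypothetical names introduced for illustration; they are not part of the tensor library or of TBB, and the blocks are visited sequentially here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2-D accessor over a flat, row-major buffer.
// This is NOT the library's accessor type; it only illustrates the indexing.
struct View2D
{
    float *     data;
    std::size_t rows;
    std::size_t cols;

    // In a row-major layout, element (i, j) lives at offset i * cols + j
    float &at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Fill the view block by block. Each (row block, column block) pair touches a
// disjoint set of elements, which is why such blocks can be handed to
// separate threads without locking.
inline void fill_in_blocks(View2D view, std::size_t block)
{
    assert(block > 0);
    for (std::size_t i0 = 0; i0 < view.rows; i0 += block)
    {
        for (std::size_t j0 = 0; j0 < view.cols; j0 += block)
        {
            // Clamp the last block in each direction to the view's bounds
            const std::size_t i1 = std::min(i0 + block, view.rows);
            const std::size_t j1 = std::min(j0 + block, view.cols);
            for (std::size_t i = i0; i < i1; i++)
            {
                for (std::size_t j = j0; j < j1; j++)
                {
                    view.at(i, j) = static_cast<float>(i * j);
                }
            }
        }
    }
}
```

Calling fill_in_blocks(View2D{buf.data(), rows, cols}, 16) on a std::vector<float> of rows * cols elements writes every element exactly once, even when the dimensions are not multiples of the block size.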

Wrapping existing memory

This works well if you already have the data in a buffer somewhere.

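Pros

Works without copies during in-process execution

Easy to do if you already have the data

Cons

Requires understanding what deleters are and how to use them correctly

The data needs to be 64-byte aligned to work efficiently with TF (TensorFlow)

Note: this is not a hard requirement, but TF may copy unaligned data under the hood

Compared with approach 1 (writing data directly into a tensor), this makes one extra copy during out-of-process execution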
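To make the 64-byte alignment point concrete, here is a small sketch of how a buffer's alignment could be checked before wrapping it, and of one way to declare storage that starts on a 64-byte boundary. is_aligned and AlignedImage are hypothetical names used only for illustration, and the buffer size is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if `ptr` sits on an `alignment`-byte boundary.
// `alignment` must be a nonzero power of two.
inline bool is_aligned(const void *ptr, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// A hypothetical fixed-size image buffer. `alignas(64)` makes every object of
// this type, and therefore its first member, start on a 64-byte boundary.
struct alignas(64) AlignedImage
{
    std::uint8_t pixels[64 * 64 * 3];
};
```

A check such as is_aligned(buffer, 64) before wrapping makes it visible when a hidden copy of unaligned data might occur.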

Examples

Wrapping data from a cv::Mat:

    cv::Mat image = ...; // An image from somewhere

    auto tensor = allocator->tensor_from_memory<uint8_t>(
        // Dimensions
        {1, image.rows, image.cols, image.channels()},

        // Data
        image.data,

        // Deleter
        [image](void *unused) {
            // By capturing `image` in this deleter, we ensure
            // that the underlying data does not get deallocated
            // before we're done with the tensor.
        }
    );
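The deleter in the example above does nothing when called; what matters is that it holds a copy of image. Copies of a cv::Mat share one reference-counted pixel buffer, so the captured copy keeps the pixels alive for as long as the deleter exists. The sketch below reproduces that idea with the standard library only, using a std::shared_ptr in place of cv::Mat. WrappedBuffer and owners_while_wrapped are hypothetical stand-ins, not the library's tensor type.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tensor that wraps memory it does not own.
// When it is destroyed, it calls the deleter it was given.
class WrappedBuffer
{
public:
    WrappedBuffer(void *data, std::function<void(void *)> deleter)
        : data_(data), deleter_(std::move(deleter))
    {
    }

    ~WrappedBuffer()
    {
        if (deleter_)
        {
            deleter_(data_);
        }
    }

    // Non-copyable so the deleter runs exactly once
    WrappedBuffer(const WrappedBuffer &) = delete;
    WrappedBuffer &operator=(const WrappedBuffer &) = delete;

    void *data() const { return data_; }

private:
    void *                      data_;
    std::function<void(void *)> deleter_;
};

// Returns the owner's reference count while a wrapper around its data is alive.
inline long owners_while_wrapped(const std::shared_ptr<std::vector<std::uint8_t>> &owner)
{
    // The lambda captures `owner` by value, so the wrapper holds its own
    // reference to the buffer until the wrapper is destroyed.
    WrappedBuffer wrapped(owner->data(), [owner](void * /*unused*/) {});
    return owner.use_count();
}
```

With a single outside owner, the count is 2 while the wrapper exists and drops back to 1 once it is destroyed, which is the guarantee the capture provides.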

Copying data into a tensor

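Pros

Very easy to do

No memory alignment requirements

No need to work with deleters

Cons

Always makes one extra copy during in-process execution

Compared with approach 1 (writing data directly into a tensor), this makes one extra copy during out-of-process execution (although this copy is written explicitly by the user)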

Examples

Copying from a cv::Mat:

    cv::Mat image = ...; // An image from somewhere

    auto tensor = allocator->allocate_tensor<uint8_t>(
        // Dimensions
        {1, image.rows, image.cols, image.channels()}
    );

    // Copy data into the tensor (this assumes the cv::Mat stores its pixels contiguously)
    tensor->copy_from(image.data, tensor->get_num_elements());
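A single bulk copy from image.data is only correct when the source rows are packed back to back. OpenCV exposes cv::Mat::isContinuous() to check this, and a Mat that is a view into a larger image typically has padding between rows. The following sketch, using only the standard library, shows one way to copy such a strided source into a densely packed destination. copy_rows_packed is a hypothetical helper, and it assumes uint8_t pixel data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy `rows` rows of `row_bytes` bytes each from a source whose rows start
// `src_stride` bytes apart into a densely packed destination buffer.
inline void copy_rows_packed(const std::uint8_t *src,
                             std::size_t         rows,
                             std::size_t         row_bytes,
                             std::size_t         src_stride,
                             std::uint8_t *      dst)
{
    assert(src_stride >= row_bytes);
    if (src_stride == row_bytes)
    {
        // Contiguous source: one bulk copy is enough
        std::memcpy(dst, src, rows * row_bytes);
        return;
    }
    for (std::size_t r = 0; r < rows; r++)
    {
        // Skip the padding at the end of each source row
        std::memcpy(dst + r * row_bytes, src + r * src_stride, row_bytes);
    }
}
```

In OpenCV terms, row_bytes would correspond to cols * channels for 8-bit data and src_stride to the Mat's step; the destination could be a tensor's raw data pointer.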

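Which one should I use?

In general, the approaches rank as follows, from best to worst performance:

1. Writing data directly into a tensor

2. Wrapping existing memory

3. Copying data into a tensor

That said, profiling is your friend.

The tradeoff between simplicity and performance also differs for large and small tensors, because copies are cheaper for small tensors.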
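As a starting point for profiling, the sketch below times a single copy of a given size with std::chrono. time_copy_ns is a hypothetical helper. One measurement is noisy, so a real comparison would repeat it many times and would measure the actual tensor creation paths rather than a bare memcpy.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Time one copy of `bytes` bytes and return the elapsed time in nanoseconds.
inline long long time_copy_ns(std::size_t bytes)
{
    if (bytes == 0)
    {
        return 0;
    }

    std::vector<char> src(bytes, 1);
    std::vector<char> dst(bytes, 0);

    const auto start = std::chrono::steady_clock::now();
    std::memcpy(dst.data(), src.data(), bytes);
    const auto stop = std::chrono::steady_clock::now();

    // Read the result through a volatile so the copy is not optimized away
    volatile char sink = dst.back();
    (void) sink;

    return static_cast<long long>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
}
```

Comparing, for example, time_copy_ns(1 << 10) with time_copy_ns(1 << 24) gives a rough sense of how copy cost grows with tensor size on a particular machine.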

Posted 2020-06-15 12:24 by 吴建明 (wujianming)
