TFRecord Format

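The TFRecord format is TensorFlow's preferred format for storing large amounts of data and reading it efficiently. It is a very simple binary format that just contains a sequence of binary records of varying sizes. Each record consists of a length, a CRC checksum used to check that the length was not corrupted, the actual data, and finally a CRC checksum for the data. You can easily create a TFRecord file with the tf.io.TFRecordWriter class: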

import tensorflow as tf

with tf.io.TFRecordWriter('my_data.tfrecord') as f:
    f.write(b'This is the first tfrecord')
    f.write(b'And this is the second record')
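To make the record layout described above concrete, here is a minimal pure-Python sketch (no TensorFlow needed) that frames and parses records byte by byte. It assumes the layout as I understand it from the TensorFlow source: a little-endian uint64 length, a uint32 *masked* CRC-32C of those 8 length bytes, the data, then a uint32 masked CRC-32C of the data. The helper names (crc32c, masked_crc32c, encode_record, decode_records) are hypothetical and made up for this illustration; they are not TensorFlow's API.

```python
import struct

# Reflected CRC-32C (Castagnoli) polynomial; TFRecord uses CRC-32C, not zlib's CRC-32.
_CRC32C_POLY = 0x82F63B78


def _make_crc32c_table():
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    """Table-driven CRC-32C of `data`."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    """Rotate the CRC right by 15 bits and add a constant (assumed TFRecord masking)."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def encode_record(data: bytes) -> bytes:
    """Frame one record: length, CRC of length, data, CRC of data."""
    length = struct.pack("<Q", len(data))
    return (length + struct.pack("<I", masked_crc32c(length))
            + data + struct.pack("<I", masked_crc32c(data)))


def decode_records(buf: bytes):
    """Yield each record's payload, checking both checksums along the way."""
    offset = 0
    while offset < len(buf):
        length_bytes = buf[offset:offset + 8]
        (length_crc,) = struct.unpack_from("<I", buf, offset + 8)
        if masked_crc32c(length_bytes) != length_crc:
            raise ValueError("corrupted length field")
        (length,) = struct.unpack("<Q", length_bytes)
        start = offset + 12
        data = buf[start:start + length]
        (data_crc,) = struct.unpack_from("<I", buf, start + length)
        if masked_crc32c(data) != data_crc:
            raise ValueError("corrupted record data")
        yield data
        offset = start + length + 4


if __name__ == "__main__":
    records = [b"This is the first tfrecord", b"And this is the second record"]
    stream = b"".join(encode_record(r) for r in records)
    for record in decode_records(stream):
        print(record)
```

Each record costs 16 bytes of framing on top of its payload. If the sketch matches TensorFlow's layout, calling decode_records(open('my_data.tfrecord', 'rb').read()) should give back the same two byte strings; it only handles uncompressed files.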

You can then use tf.data.TFRecordDataset to read one or more TFRecord files:

filepaths = ['my_data.tfrecord']
dataset = tf.data.TFRecordDataset(filepaths)
for item in dataset:
    print(item)

tf.Tensor(b'This is the first tfrecord', shape=(), dtype=string)
tf.Tensor(b'And this is the second record', shape=(), dtype=string)

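By default, TFRecordDataset reads the files one after another, but you can make it read several files in parallel and interleave their records by setting num_parallel_reads. Alternatively, you can get the same result by using list_files() and interleave(), just as we did earlier to read multiple CSV files.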
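To get a feel for the record order that interleaving produces, here is a pure-Python model of the deterministic order of Dataset.interleave(cycle_length=..., block_length=...). It keeps cycle_length sources open, takes block_length items from each in turn, and when a source runs out it frees that slot; the slot is refilled with the next source the next time the cycle comes back to it. Each inner list stands in for the records of one file. The interleave function here is hypothetical, written to mirror the documented behaviour rather than TensorFlow's implementation, and if non-deterministic ordering is allowed the real order can differ.

```python
_EXHAUSTED = object()  # sentinel returned by next() when an iterator is done


def interleave(sources, cycle_length, block_length=1):
    """Yield items from `sources` (an iterable of iterables) in interleave order."""
    if cycle_length < 1 or block_length < 1:
        raise ValueError("cycle_length and block_length must be >= 1")
    inputs = iter(sources)
    slots = [None] * cycle_length  # one open iterator (or None) per cycle slot
    num_open = 0
    end_of_input = False
    cycle_index = 0
    block_index = 0

    while not end_of_input or num_open > 0:
        current = slots[cycle_index]
        if current is not None:
            item = next(current, _EXHAUSTED)
            if item is _EXHAUSTED:
                # This source is done: free the slot and move to the next one.
                slots[cycle_index] = None
                num_open -= 1
                block_index = 0
                cycle_index = (cycle_index + 1) % cycle_length
            else:
                yield item
                block_index += 1
                if block_index == block_length:
                    # Block finished: move on to the next slot.
                    block_index = 0
                    cycle_index = (cycle_index + 1) % cycle_length
        elif not end_of_input:
            # Empty slot: open the next source, if there is one.
            source = next(inputs, _EXHAUSTED)
            if source is _EXHAUSTED:
                end_of_input = True
            else:
                slots[cycle_index] = iter(source)
                num_open += 1
        else:
            # Empty slot and nothing left to open: skip it.
            block_index = 0
            cycle_index = (cycle_index + 1) % cycle_length


if __name__ == "__main__":
    files = [["a0", "a1", "a2"], ["b0", "b1"], ["c0"]]
    print(list(interleave(files, cycle_length=2)))
```

With block_length=1, records from the open files alternate one by one, which is roughly what reading several files with num_parallel_reads gives you (my understanding, not verified here).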

Compressed TFRecord Files

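It can sometimes be useful to compress your TFRecord files, especially if they need to be loaded over a network connection. You can create a compressed TFRecord file by setting the options argument, and then read it back by specifying the same compression type: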

options = tf.io.TFRecordOptions(compression_type='GZIP')
with tf.io.TFRecordWriter('my_compressed.tfrecord', options) as f:
    f.write(b'This is the first tfrecord')
    f.write(b'And this is the second record')

filepaths = ['my_compressed.tfrecord']
# The reader must be told which compression the file uses
dataset = tf.data.TFRecordDataset(filepaths, compression_type='GZIP')
for item in dataset:
    print(item)

tf.Tensor(b'This is the first tfrecord', shape=(), dtype=string)
tf.Tensor(b'And this is the second record', shape=(), dtype=string)

A Brief Introduction to Protocol Buffers

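Even though each record can use any binary format you want, TFRecord files usually contain serialized protocol buffers (also called protobufs). This is a portable, extensible, and efficient binary format developed at Google back in 2001 and open sourced in 2008. Protobufs are now widely used, in particular in gRPC, Google's remote procedure call system.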
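To give a feel for why protobufs are compact, here is a hand-rolled sketch of two building blocks of the protobuf wire format: base-128 varints (7 bits of the value per byte, with the high bit set on every byte except the last) and field keys, computed as (field_number << 3) | wire_type. The function names are hypothetical; in practice you would let the official protobuf library and its generated classes do this rather than encoding by hand.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: low 7-bit groups first, high bit = 'more bytes follow'."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, offset: int = 0):
    """Return (value, offset just past the varint)."""
    result = shift = 0
    while True:
        byte = buf[offset]
        result |= (byte & 0x7F) << shift
        offset += 1
        if not byte & 0x80:
            return result, offset
        shift += 7


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Key with wire type 0 (VARINT), then the value as a varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_bytes_field(field_number: int, data: bytes) -> bytes:
    """Key with wire type 2 (LEN), then the length as a varint, then the bytes."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


if __name__ == "__main__":
    print(encode_varint_field(1, 150).hex())
    print(encode_bytes_field(2, b"testing").hex())
```

Small integers take a single byte, and field names never appear in the encoded message, only field numbers; that is a large part of why protobufs are smaller than, say, JSON.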

Posted @ 2021-10-27 11:15 by 里列昂遗失的记事本
