tf.api

API list

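Basic Dataset usage
- tf.data.Dataset.from_tensor_slices builds a Dataset from in-memory data
- On that Dataset, call repeat (how many times to repeat the data), batch, interleave, map and shuffle; tf.data.Dataset.list_files is a static method that builds a Dataset of file names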
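
As a minimal sketch of the calls listed above, the following builds a small dataset and applies repeat and batch. The input (the integers 0 to 9) and the batch size of 7 are assumptions chosen only for illustration.

```python
import numpy as np
import tensorflow as tf

# Build a dataset of the integers 0..9, one scalar per element (illustrative data).
dataset = tf.data.Dataset.from_tensor_slices(np.arange(10))

# repeat(3): iterate over the data three times (30 elements in total).
# batch(7): group consecutive elements into batches of at most 7.
batched = dataset.repeat(3).batch(7)

batches = [b.numpy().tolist() for b in batched]
for b in batches:
    print(b)
```

The last batch is shorter than the others because 30 is not a multiple of 7; passing drop_remainder=True to batch would drop it.
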
CSV (when reading CSV files)
- Use tf.data.TextLineDataset to read the text file line by line, and tf.io.decode_csv to parse each CSV line
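TFRecord (reading TFRecord files with Dataset), using the following APIs
- tf.train.FloatList, tf.train.Int64List, tf.train.BytesList
- tf.train.Feature, tf.train.Features, tf.train.Example wrap the data into a tf.train.Example that is written to the file
- example.SerializeToString() serializes the Example
- tf.io.parse_single_example parses one serialized Example
- tf.io.VarLenFeature, tf.io.FixedLenFeature describe how each feature of the Example is parsed
- tf.data.TFRecordDataset builds a dataset from TFRecord files; tf.io.TFRecordOptions specifies file options such as the compression type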
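
The TFRecord APIs above fit together roughly as in this sketch, which writes one Example to a temporary file and reads it back. The feature names ("name", "scores", "age"), their values and the file name are hypothetical and only serve the illustration.

```python
import os
import tempfile
import tensorflow as tf

# Wrap raw values in the three list types, then in tf.train.Feature.
features = tf.train.Features(feature={
    "name": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"cat"])),
    "scores": tf.train.Feature(float_list=tf.train.FloatList(value=[1.0, 2.0])),
    "age": tf.train.Feature(int64_list=tf.train.Int64List(value=[3])),
})
example = tf.train.Example(features=features)
serialized = example.SerializeToString()  # bytes ready to be written

# Write the serialized Example to a TFRecord file (hypothetical temp path).
path = os.path.join(tempfile.mkdtemp(), "demo.tfrecords")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(serialized)

# Describe how each feature should be parsed when reading.
expected_features = {
    "name": tf.io.FixedLenFeature([], tf.string),
    "scores": tf.io.VarLenFeature(tf.float32),  # parsed as a SparseTensor
    "age": tf.io.FixedLenFeature([], tf.int64),
}
parsed = [
    tf.io.parse_single_example(record, expected_features)
    for record in tf.data.TFRecordDataset([path])
]

name = parsed[0]["name"].numpy()
scores = tf.sparse.to_dense(parsed[0]["scores"]).numpy().tolist()
age = int(parsed[0]["age"].numpy())
print(name, scores, age)
```

For a compressed file, the writer would take tf.io.TFRecordOptions(compression_type="GZIP") and the reader would take compression_type="GZIP".
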
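interleave applies a function to every element of a dataset, where each call produces a new dataset, and then merges those results into a single new dataset.
Use case: a dataset of file names --> the actual data. Iterate over the file names, read each file into its own dataset, and merge them into one dataset with interleave.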


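import numpy as np
import tensorflow as tf

# `dataset` is the dataset built in an earlier notebook cell (cell [27], not shown in these notes).
dataset2 = dataset.interleave(
    # map_fn: how each element is transformed; here every element v
    # (an item of the dataset from cell [27]) becomes its own dataset of slices.
    lambda v: tf.data.Dataset.from_tensor_slices(v),
    cycle_length=5,  # degree of parallelism: how many elements of dataset are processed concurrently
    block_length=5,  # how many consecutive results are taken from each transformed dataset per turn
    # block_length gives an even mix: 5 results are taken from each dataset in turn; the final
    # dataset only has 8 and 9 left, so the cycle returns to the first one until nothing remains.
)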
for item in dataset2:
    print(item)
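
The cell above depends on a dataset from an earlier cell. As a self-contained sketch, the version below assumes that dataset was the integers 0 to 9 repeated 3 times and batched by 7; this is a guess made for illustration, not something the notes confirm.

```python
import numpy as np
import tensorflow as tf

# Assumed input: 5 batches, the last one holding only [8, 9].
dataset = tf.data.Dataset.from_tensor_slices(np.arange(10)).repeat(3).batch(7)

dataset2 = dataset.interleave(
    # Turn every batch back into a dataset of scalars.
    lambda v: tf.data.Dataset.from_tensor_slices(v),
    cycle_length=5,  # open 5 batches at the same time
    block_length=5,  # take up to 5 consecutive scalars from each before moving on
)

result = [int(item.numpy()) for item in dataset2]
print(result)
```

With these settings the first 5 values come from the first batch and the next 5 from the second batch, which is the even mixing described by the block_length comment.
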

x =  np.array([[1,2], [2,3], [3,4]])
y =  np.array(['cat', 'dag', 'fox'])
# x and y must have the same size along the first dimension
dataset3 = tf.data.Dataset.from_tensor_slices((x,y))
print(dataset3)

for item_x, item_y in dataset3:
    print(item_x, item_y)
for item_x, item_y in dataset3:
    print(item_x.numpy(), item_y.numpy())
    
    

<TensorSliceDataset shapes: ((2,), ()), types: (tf.int64, tf.string)>
tf.Tensor([1 2], shape=(2,), dtype=int64) tf.Tensor(b'cat', shape=(), dtype=string)
tf.Tensor([2 3], shape=(2,), dtype=int64) tf.Tensor(b'dag', shape=(), dtype=string)
tf.Tensor([3 4], shape=(2,), dtype=int64) tf.Tensor(b'fox', shape=(), dtype=string)
[1 2] b'cat'
[2 3] b'dag'
[3 4] b'fox'

dataset4 = tf.data.Dataset.from_tensor_slices({"feature":x,
                                              "label":y})
for item in dataset4:
    print(item)

for item in dataset4:
    print(item["feature"].numpy(), item["label"].numpy())
    
{'feature': <tf.Tensor: id=160, shape=(2,), dtype=int64, numpy=array([1, 2])>, 'label': <tf.Tensor: id=161, shape=(), dtype=string, numpy=b'cat'>}
{'feature': <tf.Tensor: id=162, shape=(2,), dtype=int64, numpy=array([2, 3])>, 'label': <tf.Tensor: id=163, shape=(), dtype=string, numpy=b'dag'>}
{'feature': <tf.Tensor: id=164, shape=(2,), dtype=int64, numpy=array([3, 4])>, 'label': <tf.Tensor: id=165, shape=(), dtype=string, numpy=b'fox'>}
[1 2] b'cat'
[2 3] b'dag'
[3 4] b'fox'

tf.data.Dataset.list_files
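
A minimal sketch of list_files, assuming three small files created in a temporary directory (the directory and the part_*.csv names are hypothetical). It returns a dataset whose elements are the file paths matching a glob pattern.

```python
import os
import tempfile
import tensorflow as tf

# Create a few throwaway files to match against (illustrative only).
tmp_dir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(tmp_dir, f"part_{i}.csv"), "w") as f:
        f.write("a,b\n")

# shuffle=False keeps a deterministic order; by default the file names are shuffled.
files = tf.data.Dataset.list_files(os.path.join(tmp_dir, "part_*.csv"), shuffle=False)

names = [os.path.basename(p.numpy().decode()) for p in files]
print(names)
```

Each element is a scalar string tensor, which is why the path is decoded before use.
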

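tf.io.decode_csv(records, record_defaults) parses CSV records; record_defaults specifies the type (and the default value) of each column
Parameters:
records: a Tensor of type string. Each string is one record/row of the CSV, and all records are expected to have the same format;
record_defaults: a list of Tensor objects with specific types. Accepted types are float32, float64, int32, int64 and string. One tensor per column of the input record, holding either a scalar default value for that column or an empty vector if the column is required;
field_delim=',': optional string, defaults to ",". The character used to separate the fields of a record;
use_quote_delim=True: optional bool, defaults to True. If False, double quotes are treated as regular characters inside string fields;
na_value='': an additional string to recognize as NA/NaN;
select_cols=None: optional sorted list of column indices. If specified, only this subset of columns is parsed and returned;
name=None: a name for the operation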
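
As a sketch of how these parameters are used together with tf.data.TextLineDataset, the example below parses a small CSV file. The file contents, the column layout and the parse_line helper are hypothetical.

```python
import os
import tempfile
import tensorflow as tf

# Write a tiny CSV file with a header and one missing value (illustrative data).
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w") as f:
    f.write("x,y,label\n1.0,2.0,cat\n3.0,,dog\n")

def parse_line(line):
    # One default per column; the default also fixes the column's dtype.
    record_defaults = [
        tf.constant(0.0, dtype=tf.float32),
        tf.constant(-1.0, dtype=tf.float32),  # used when the field is empty
        tf.constant("", dtype=tf.string),
    ]
    x, y, label = tf.io.decode_csv(line, record_defaults)
    # Stack the two numeric fields into one feature vector.
    return tf.stack([x, y]), label

# skip(1) drops the header line before parsing.
ds = tf.data.TextLineDataset(path).skip(1).map(parse_line)

rows = [(feat.numpy().tolist(), label.numpy()) for feat, label in ds]
print(rows)
```

The empty second field of the last row is replaced by the default of -1.0.
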

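tf.stack() stacks a list of tensors into a single tensor (for example, to turn parsed fields into one vector)

aa1 = tf.constant([1, 2, 3])
aa2 = tf.constant([4, 5, 6])
# tf.stack joins tensors of the same shape along a NEW axis chosen by the axis argument
# (unlike tf.concat, which joins along an existing axis).
# axis=0: two shape-(3,) tensors become one shape-(2, 3) tensor.
# axis=1: the same two tensors become one shape-(3, 2) tensor.
ff = tf.stack([aa1, aa2], axis=1)
print(ff)  # values [[1 4] [2 5] [3 6]], shape (3, 2)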

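The buffer_size parameter of shuffle

The Dataset class in TensorFlow has a shuffle method that randomizes the order of the elements; it is used very often in training. Its buffer_size parameter is the size of the buffer used for random sampling, and it is easy to misread. The documentation explains it roughly as follows: the dataset fills a buffer with buffer_size elements, then randomly samples elements from this buffer, replacing each selected element with a new one from the input. Perfect shuffling therefore needs a buffer_size greater than or equal to the dataset size. buffer_size counts dataset elements, so its meaning depends on batching: before batch() an element is one sample, after batch() an element is a whole batch of batch_size samples.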
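
A small sketch of the effect of buffer_size, assuming a dataset of the integers 0 to 9 (the data and the seed are chosen only for illustration).

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)

# buffer_size=1: the buffer only ever holds one element, so the order is unchanged.
unshuffled = [int(x.numpy()) for x in ds.shuffle(buffer_size=1)]

# buffer_size >= dataset size: every element can end up in any position.
shuffled = [int(x.numpy()) for x in ds.shuffle(buffer_size=10, seed=0)]

print(unshuffled)
print(shuffled)
```

A buffer smaller than the dataset only mixes elements that are close to each other in the input order.
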

dataset = dataset.map(fn) applies fn to every element of the dataset and returns a new dataset
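
A minimal sketch of map, assuming a dataset of the integers 0 to 4 and a squaring function chosen for illustration.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(5)

# map applies the function element by element and returns a new dataset;
# the original dataset ds is left unchanged.
squared = ds.map(lambda x: x * x)

values = [int(v.numpy()) for v in squared]
print(values)  # [0, 1, 4, 9, 16]
```
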

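.take(count) returns a new dataset containing at most the first count elements of the dataset (it does not index along an axis the way numpy.take does)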
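
A minimal sketch of take, assuming a dataset of the integers 0 to 9; it is often used to inspect the first few elements of a large dataset.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)

# take(3) yields only the first three elements.
first_three = [int(v.numpy()) for v in ds.take(3)]

# Asking for more elements than exist simply returns the whole dataset.
all_items = [int(v.numpy()) for v in ds.take(100)]

print(first_three)
print(len(all_items))
```
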