Machine Learning: TensorFlow Thread Queues and I/O Operations
TensorFlow queues
When feeding training samples, we want the samples to be read in an orderly way:
tf.FIFOQueue — a first-in, first-out queue; elements are dequeued in the order they were enqueued
tf.RandomShuffleQueue — elements are dequeued in random order
tf.FIFOQueue
FIFOQueue(capacity, dtypes, name='fifo_queue') creates a queue that dequeues elements in first-in, first-out order
- capacity: integer; upper bound on the number of elements that may be stored in this queue
- dtypes: list of DType objects; its length must equal the number of tensors in each queue element, and the dtypes (and shapes) determine the form of the elements enqueued later
Methods:
- dequeue(name=None): dequeue an element
- enqueue(vals, name=None): enqueue an element
- enqueue_many(vals, name=None): returns an op that enqueues each element of the list or tuple vals
- size(name=None): returns a tensor whose value is the current number of elements (an integer)
Example (synchronous operation: dequeue an element, add 1, enqueue it again)
Points to note about enqueuing:
```python
import tensorflow as tf
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'


def fifoqueue():
    # Create a queue holding up to 3 float32 elements
    queue = tf.FIFOQueue(3, tf.float32)
    # Add elements to the queue
    en_many = queue.enqueue_many([[0.1, 0.2, 0.3], ])
    # Define a dequeue op
    deq_op = queue.dequeue()
    # Add 1 to the dequeued element.
    # The + operator is overloaded: 1 is converted to a tensor and tf.add is called
    incre_op = deq_op + 1
    # Enqueue the incremented element again
    enq_op = queue.enqueue(incre_op)
    # The ops must be run inside a session.
    # Everything below runs in the main thread, i.e. synchronously
    with tf.Session() as sess:
        # Run the enqueue_many op (0.1, 0.2, 0.3)
        sess.run(en_many)
        # Process each value once
        for i in range(3):
            sess.run(enq_op)
        # Dequeue the data and hand it to the model for training
        for i in range(queue.size().eval()):
            ret = sess.run(deq_op)
            print(ret)


if __name__ == '__main__':
    fifoqueue()
```
Analysis: when the dataset is large, the enqueue operation reads data from disk into memory, and the main thread has to wait for it to finish before training can proceed. A session can run multiple threads, so reading can be done asynchronously.
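The producer/consumer idea behind asynchronous reading can be sketched without TensorFlow, using only the standard threading and queue modules (a minimal illustration; `produce` and `consume` are hypothetical names, not TF API):

```python
import queue
import threading


def produce(q, n):
    # Producer thread: stands in for the enqueue op reading samples from disk
    for i in range(n):
        q.put(i)


def consume(n):
    # Main thread: "trains" on samples as soon as they become available
    q = queue.Queue(maxsize=32)
    worker = threading.Thread(target=produce, args=(q, n))
    worker.start()                          # reading happens in the background
    samples = [q.get() for _ in range(n)]   # the training loop consumes samples
    worker.join()
    return samples


print(consume(5))  # [0, 1, 2, 3, 4]
```

The main thread never waits for the whole dataset; it blocks only until the next sample is available.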
Queue runner
tf.train.QueueRunner(queue, enqueue_ops=None) creates a QueueRunner
- queue: a Queue
- enqueue_ops: list of enqueue ops to run in threads; [op] * 2 specifies two threads
create_threads(sess, coord=None, start=False) creates threads to run the given session's enqueue ops
- start: boolean; if True, the threads are started immediately; if False, the caller must call start() to launch them
- coord: thread coordinator, needed later for thread management
Asynchronous operation
- Use a queue runner to increment a variable and enqueue it while the main thread dequeues, and observe the effect
Analysis:
- The problem is that the enqueue threads keep running on their own, so even after the dequeue operations we need have finished, the program cannot exit. We need a mechanism to synchronize the threads and terminate the others.
Thread coordinator
tf.train.Coordinator()
- A thread coordinator; implements a simple mechanism to coordinate the termination of a group of threads
- request_stop(): request that the threads stop
- should_stop(): check whether a stop has been requested (rarely needed)
- join(threads=None, stop_grace_period_secs=120): wait for the threads to terminate
- return: a Coordinator instance
```python
import tensorflow as tf


def async_operation():
    """
    Use a queue runner and a thread coordinator to keep incrementing a variable.
    :return: None
    """
    # Define a queue with capacity 1000 and dtype tf.float32
    queue = tf.FIFOQueue(1000, tf.float32)
    # An increment op whose result is enqueued
    var = tf.Variable(0.0)
    # assign_add and enq_op do not run in lockstep; assign_add may run
    # several times before enq_op runs once
    incre_op = tf.assign_add(var, tf.constant(1.0))
    # Enqueue
    enq_op = queue.enqueue(incre_op)
    # Dequeue
    deq_op = queue.dequeue()
    # Define the queue runner with two enqueue threads
    qr = tf.train.QueueRunner(queue=queue, enqueue_ops=[enq_op] * 2)

    init_op = tf.global_variables_initializer()

    # A session created with `with` closes automatically once the main thread
    # finishes. Do the child threads stop then? No -- they do not exit;
    # they are left hanging.
    with tf.Session() as sess:
        sess.run(init_op)
        # Create the thread coordinator
        coord = tf.train.Coordinator()
        # Create the enqueue threads via the queue runner.
        # start=True means the threads start immediately: the enqueue ops
        # are already running, in two threads
        threads = qr.create_threads(sess=sess, coord=coord, start=True)
        # The enqueue ops run in other threads; the main thread dequeues
        for i in range(1000):
            ret = sess.run(deq_op)
            print(ret)
        # The main thread is done; request that the child threads stop
        coord.request_stop()
        # coord.should_stop()
        # Wait for the threads to finish
        coord.join(threads=threads)
    return None


if __name__ == '__main__':
    async_operation()
```
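Conceptually, tf.train.Coordinator is little more than a shared stop flag plus a join loop. A pure-Python analogue, assuming hypothetical names (`MiniCoordinator`, `enqueue_forever`) and only the standard library:

```python
import queue
import threading


class MiniCoordinator:
    # A toy stand-in for tf.train.Coordinator
    def __init__(self):
        self._stop = threading.Event()

    def request_stop(self):
        self._stop.set()          # ask every worker thread to finish

    def should_stop(self):
        return self._stop.is_set()

    def join(self, threads):
        for t in threads:
            t.join()              # wait for each worker to terminate


def enqueue_forever(q, coord):
    value = 0.0
    while not coord.should_stop():   # workers poll the stop flag
        value += 1.0
        q.put(value)


q = queue.Queue(maxsize=100)
coord = MiniCoordinator()
threads = [threading.Thread(target=enqueue_forever, args=(q, coord))
           for _ in range(2)]
for t in threads:
    t.start()
results = [q.get() for _ in range(10)]   # the main thread "dequeues"
coord.request_stop()                     # main thread done: stop the workers
# Drain once so a worker blocked on q.put(...) can observe the flag
while not q.empty():
    q.get()
coord.join(threads)
print(len(results))  # 10
```

Without request_stop() the workers would loop forever, which is exactly the hanging-thread problem described above.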
TensorFlow file reading
File reading pipeline
1. File reading API — building the filename queue
- tf.train.string_input_producer(string_tensor, shuffle=True) outputs strings (e.g. file names) into a pipeline queue
- string_tensor: a rank-1 tensor of file names
- num_epochs: number of passes over the data; by default the data repeats indefinitely
- return: a queue of output strings
2. File reading API — file readers
Choose the reader that matches the file format.
class tf.TextLineReader
- reads text files such as comma-separated values (CSV); reads line by line by default
- return: a reader instance
tf.FixedLengthRecordReader(record_bytes)
- reads binary files in which every record is a fixed number of bytes
- record_bytes: integer, the number of bytes to read each time
- return: a reader instance
tf.TFRecordReader
- reads TFRecords files
All readers share a common read method:
- read(file_queue): reads content from the queue
- returns a tuple of tensors (key: the file name; value: the content read — a line, a fixed number of bytes, or a whole image file)
3. File reading API — content decoders
Because what is read from a file is a string, we need functions to parse those strings into tensors.
tf.decode_csv(records, record_defaults=None, field_delim=None, name=None)
- converts CSV records into tensors; used together with tf.TextLineReader
- records: string tensor; each string is one record (line) of the CSV file
- record_defaults: specifies the type of each column after splitting. For example, if splitting yields three columns, this argument could be [[1], [], ['string']]; leaving a type unspecified (an empty []) is also allowed. If there are many columns, say 100, you can write [[]] * 100
- field_delim: field delimiter, default ","
tf.decode_raw(bytes, out_type, little_endian=None, name=None)
- converts bytes into a vector of numbers; the bytes are a string-typed tensor. Used together with tf.FixedLengthRecordReader to read string-represented binary data as uint8
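The role of record_defaults is easier to see in a plain-Python mock of the per-line decoding (`decode_csv_line` is a hypothetical helper, not the TF kernel): each column's default supplies both the fallback value for an empty field and the target type.

```python
def decode_csv_line(line, record_defaults, field_delim=","):
    # Mimics tf.decode_csv for one record: record_defaults gives, per
    # column, the fallback value (and hence the type) for empty fields
    fields = line.split(field_delim)
    out = []
    for raw, default in zip(fields, record_defaults):
        if raw == "":
            if not default:             # empty default [] means "required"
                raise ValueError("field is required but missing")
            out.append(default[0])
        else:
            target = type(default[0]) if default else str
            out.append(target(raw))
    return out


print(decode_csv_line("5,,hello", [[1], [0], ["null"]]))  # [5, 0, 'hello']
```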
Starting the threads
tf.train.start_queue_runners(sess=None, coord=None) collects all queue threads in the graph and starts them
- sess: the session to run in
- coord: thread coordinator
- return: the list of all started threads
How do we read through the pipeline when there are multiple files or many samples?
Pipeline batching
tf.train.batch(tensors, batch_size, num_threads=1, capacity=32, name=None) reads batches of the specified size
- tensors: can be a list containing tensors
- batch_size: the batch size to read from the queue
- num_threads: number of threads enqueuing
- capacity: integer, maximum number of elements in the queue
- return: tensors
tf.train.shuffle_batch(tensors, batch_size, capacity, min_after_dequeue, num_threads=1)
- reads batches of the specified size in shuffled order
- min_after_dequeue: minimum number of elements to leave in the queue after a dequeue, which keeps the output well shuffled
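How min_after_dequeue keeps the output shuffled can be sketched in plain Python: elements are only dequeued while more than min_after_dequeue remain buffered, so every random pick chooses among at least that many candidates. (`shuffle_batches` is a hypothetical sketch; the real op also caps the buffer at `capacity` and runs in threads.)

```python
import random


def shuffle_batches(stream, batch_size, min_after_dequeue, seed=0):
    # Buffer incoming elements; only dequeue once more than
    # min_after_dequeue are buffered, so each random pick chooses
    # among at least that many candidates
    rng = random.Random(seed)
    buffer, batch = [], []
    for item in stream:
        buffer.append(item)
        if len(buffer) > min_after_dequeue:
            idx = rng.randrange(len(buffer))    # random dequeue
            batch.append(buffer.pop(idx))
            if len(batch) == batch_size:
                yield batch
                batch = []


batches = list(shuffle_batches(range(100), batch_size=10, min_after_dequeue=20))
print(len(batches))  # 8 full batches; 20 elements stay behind in the buffer
```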
CSV file reading example
```python
import tensorflow as tf
import os


def csv_reader():
    # Collect all file names under ./csvdata/
    file_names = os.listdir('./csvdata/')
    file_names = [os.path.join('./csvdata/', file_name) for file_name in file_names]
    print(file_names)
    # Build the file queue from the file names
    file_queue = tf.train.string_input_producer(file_names)
    # Create a file reader that reads line by line
    reader = tf.TextLineReader()
    # read returns a (key, value) pair; value is one line of some file
    key, value = reader.read(file_queue)
    print(key, value)
    # Decode the value
    col1, col2 = tf.decode_csv(value, record_defaults=[['null'], ['null']],
                               field_delim=',')
    # Set up the batching pipeline
    col1_batch, col2_batch = tf.train.batch(tensors=[col1, col2], batch_size=100,
                                            num_threads=2, capacity=10)
    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        ret = sess.run([col1_batch, col2_batch])
        print(ret)
        # When the main thread is done, request that the child threads stop
        coord.request_stop()
        coord.join(threads)


if __name__ == '__main__':
    csv_reader()
```

Output (CPU-feature warnings and arrays abridged):
```
['./csvdata/A.csv', './csvdata/B.csv', './csvdata/C.csv']
Tensor("ReaderReadV2:0", shape=(), dtype=string) Tensor("ReaderReadV2:1", shape=(), dtype=string)
2020-01-13 22:51:39.323455: W ...\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
(the same warning is repeated for SSE2, SSE3, SSE4.1, SSE4.2 and AVX)
[array([b'Sea1', b'Sea2', b'Sea3', b'Alpha1', b'Alpha2', b'Alpha3',
        b'Bee1', b'Bee2', b'Bee3', ...], dtype=object),
 array([b'C1', b'C2', b'C3', b'A1', b'A2', b'A3',
        b'B1', b'B2', b'B3', ...], dtype=object)]
Process finished with exit code 0
```
TensorFlow image reading
The three elements of a digitized image
- Three elements: height, width, number of channels
Relationship between the three elements and tensors
Basic image operations
Goals:
- 1. Make the image data uniform
- 2. Convert all images to a specified size
- 3. Reduce the amount of image data to limit overhead
Operation:
- 1. Shrink the image size
Basic image API
- tf.image.resize_images(images, size) resizes images
- images: image data, a 4-D tensor of shape [batch, height, width, channels] or a 3-D tensor of shape [height, width, channels]
- size: a 1-D int32 tensor: new_height, new_width, the new size of the images; returns the images in 4-D or 3-D format
Image reading API
Image reader
- tf.WholeFileReader: a reader that outputs the entire contents of a file as its value
- return: a reader instance
- read(file_queue): the output is the file name (key) and the contents of that file (value)
Image decoders
tf.image.decode_jpeg(contents)
- decodes a JPEG-encoded image into a uint8 tensor
- return: a uint8 tensor, 3-D with shape [height, width, channels]
tf.image.decode_png(contents)
- decodes a PNG-encoded image into a uint8 or uint16 tensor
- return: a tensor, 3-D with shape [height, width, channels]
Image batching example flow
- 1. Build the image file queue
- 2. Build the image reader
- 3. Read the image data
- 4. Process the image data
```python
import tensorflow as tf
import os


def pic_reader():
    file_names = os.listdir('./dog/')
    file_names = [os.path.join('./dog/', file_name) for file_name in file_names]
    # Build the file queue
    file_queue = tf.train.string_input_producer(file_names)
    # Create the reader
    reader = tf.WholeFileReader()
    # key is the file name, value the raw image bytes
    key, value = reader.read(file_queue)
    # Decode the value into an image tensor
    image = tf.image.decode_jpeg(value)
    # Before batching, every image must have the same shape:
    # [height, width, ?] --> [200, 200, ?]
    resize_image = tf.image.resize_images(image, size=[200, 200])
    # The channel dimension is still unknown ([200, 200, None]);
    # pin it down with set_shape -- batching needs the full static shape
    resize_image.set_shape([200, 200, 3])
    print(resize_image)
    image_batch = tf.train.batch(tensors=[resize_image], batch_size=100,
                                 num_threads=2, capacity=100)
    print(image_batch)
    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess, coord=coord)
        ret = sess.run(image_batch)
        print(ret)
        coord.request_stop()
        coord.join(threads)


if __name__ == '__main__':
    pic_reader()
```

Output (pixel arrays abridged):
```
Tensor("Squeeze:0", shape=(200, 200, 3), dtype=float32)
Tensor("batch:0", shape=(100, 200, 200, 3), dtype=float32)
(repeated CPU-feature warnings omitted)
[[[[ 33.       47.       86.     ]
   [ 36.725    50.725    88.235  ]
   ...
   [ 31.430038 13.430038 11.430038]
   [ 28.       10.        8.      ]]]]
Process finished with exit code 0
```
A small example: reading CIFAR-10 binary files
```python
import tensorflow as tf
import os


class Cifar(object):
    """
    Demonstrates reading binary files, storing the data as TFRecords,
    and reading the TFRecords back.
    """

    def __init__(self):
        self.height = 32
        self.width = 32
        self.channels = 3  # color images
        self.label_bytes = 1
        self.image_bytes = self.height * self.width * self.channels
        # Number of bytes to read per record
        self.bytes = self.label_bytes + self.image_bytes

    def read_and_decode(self, file_names):
        """
        Read and decode the binary images.
        :return: the batched images and labels
        """
        # Build the file queue from the list of file names
        file_queue = tf.train.string_input_producer(file_names)
        # Create the reader, specifying self.bytes bytes per read
        reader = tf.FixedLengthRecordReader(self.bytes)
        # Read the binary data: uint8 bytes, 3073 = 1 + 3072
        key, value = reader.read(file_queue)
        # Decode the binary data
        label_image = tf.decode_raw(value, tf.uint8)
        # Before decoding the image, split the 3073 bytes into 1 + 3072.
        # Slicing takes a start position and a length as rank-1 tensors:
        # split self.bytes into self.label_bytes and self.image_bytes
        label = tf.cast(tf.slice(label_image, [0], [self.label_bytes]), tf.int32)
        image = tf.cast(tf.slice(label_image, [self.label_bytes],
                                 [self.image_bytes]), tf.int32)
        # The image is 3072 bytes; reshape them into the image tensor [32, 32, 3]
        reshape_image = tf.reshape(image, shape=[32, 32, 3])
        # All images already have the same shape, so no resize is needed
        # before batching
        image_batch, label_batch = tf.train.batch(
            [reshape_image, label], batch_size=100, num_threads=2, capacity=100)
        return image_batch, label_batch

    def save_to_tfrecords(self):
        """
        Store the images as a TFRecords file.
        :return:
        """
        return None

    def read_from_tfrecords(self):
        """
        Read the data back from the TFRecords file.
        :return:
        """
        return None


def call_cifar():
    # Collect the file names under the given path
    file_names = os.listdir('./cifar-10-batches-bin/')
    file_names = [os.path.join('./cifar-10-batches-bin/', file_name)
                  for file_name in file_names if file_name[-3:] == 'bin']
    # Create the object and call its method
    cifar = Cifar()
    batch_image, batch_label = cifar.read_and_decode(file_names)
    print("===========")
    print(batch_image, batch_label)
    # Run the graph that has been set up
    with tf.Session() as sess:
        # Start the child threads
        coord = tf.train.Coordinator()  # thread coordinator
        threads = tf.train.start_queue_runners(sess, coord=coord)
        ret = sess.run([batch_image, batch_label])
        print(ret)
        coord.request_stop()
        coord.join(threads)


if __name__ == '__main__':
    call_cifar()
```

Image storage and computation types
- Storage: uint8 (saves space)
- Matrix computation: float32 (better precision)
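The trade-off is easy to quantify with the standard array module (a sketch; a real pipeline would use numpy or tensor dtypes):

```python
from array import array

pixels = list(range(256)) * 12          # 3072 values, one CIFAR-10 image
as_uint8 = array('B', pixels)           # storage type: 1 byte per pixel
as_float32 = array('f', pixels)         # computation type: 4 bytes per value

print(as_uint8.itemsize, as_float32.itemsize)      # 1 4
print(len(as_uint8) * as_uint8.itemsize)           # 3072 bytes
print(len(as_float32) * as_float32.itemsize)       # 12288 bytes: 4x larger
```

Storing images as uint8 and casting to float32 only at computation time keeps files and queues four times smaller.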
TFRecords analysis, writing and reading
- TFRecords is a binary file format built into TensorFlow
- It makes better use of memory and is easier to copy and move; it stores the binary data and the label data (the training class labels) together in a single file
Writing TFRecords
1. Create a TFRecords writer
tf.python_io.TFRecordWriter(path) writes a TFRecords file
- path: path of the TFRecords file
- return: a file writer
Methods:
- write(record): write one string record to the file
- close(): close the file writer
Note: the string is a serialized Example, produced by Example.SerializeToString()
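The writer's job — append serialized records so they can be streamed back one by one — can be mimicked with simple length-prefixed framing (a hypothetical format, not the real TFRecord wire format, which additionally stores CRC checksums around each record):

```python
import io
import struct


def write_records(fp, records):
    # Each record: an 8-byte little-endian length, then the serialized bytes
    for rec in records:
        fp.write(struct.pack("<Q", len(rec)))
        fp.write(rec)


def read_records(fp):
    # Stream the records back: read a length header, then that many bytes
    while True:
        header = fp.read(8)
        if not header:
            return
        (length,) = struct.unpack("<Q", header)
        yield fp.read(length)


buf = io.BytesIO()
write_records(buf, [b"example-1", b"example-2"])
buf.seek(0)
print(list(read_records(buf)))  # [b'example-1', b'example-2']
```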
2. Build an Example protocol block for each sample
tf.train.Example(features=None) writes a TFRecords file
- features: a feature instance of type tf.train.Features
- return: an Example protocol block
tf.train.Features(feature=None) builds the key-value information for each sample
- feature: dict data; the key is the name to store under, the value a tf.train.Feature instance
- return: a Features instance
tf.train.Feature(**options)
- **options: for example
- bytes_list=tf.train.BytesList(value=[bytes])
- int64_list=tf.train.Int64List(value=[value])
- tf.train.Int64List(value=[value])
- tf.train.BytesList(value=[bytes])
- tf.train.FloatList(value=[value])
Reading TFRecords
Same flow as the file readers above, with a parsing step in the middle.
Parsing the Example protocol blocks
tf.parse_single_example(serialized, features=None, name=None) parses a single Example proto
- serialized: scalar string tensor, a serialized Example
- features: dict data; the key is the name to read, the value a FixedLenFeature
- return: a dict of key-value pairs, keyed by the names read
- tf.FixedLenFeature(shape, dtype)
- shape: shape of the input data; usually left unspecified as an empty list
- dtype: type of the input data, which must match the type stored in the file; only float32, int64 and string are allowed
Flow for writing the CIFAR-10 batches into TFRecords
- 1. Build the writer
- 2. Build an Example for each sample
- 3. Write the serialized Examples
Flow for reading TFRecords
- 1. Build the TFRecords reader
- 2. Parse the Examples
- 3. Convert the format, decoding the bytes
```python
"""
Read binary files into tensors, write them into TFRecords,
and read the TFRecords back.
"""
import os

import tensorflow as tf

# Command-line flags
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string("tfrecord_dir", "cifar10.tfrecords",
                           "file name for the written image data")


class CifarRead(object):
    """
    Read binary files into tensors, write them into TFRecords,
    and read the TFRecords back.
    """

    def __init__(self, file_list):
        """
        Initialize the image parameters.
        :param file_list: list of image file paths
        """
        self.file_list = file_list
        # Image size and number of bytes per binary record
        self.height = 32
        self.width = 32
        self.channel = 3
        self.label_bytes = 1
        self.image_bytes = self.height * self.width * self.channel
        self.bytes = self.label_bytes + self.image_bytes

    def read_and_decode(self):
        """
        Parse the binary files into tensors.
        :return: batched image and label tensors
        """
        # 1. Build the file queue
        file_queue = tf.train.string_input_producer(self.file_list)
        # 2. The reader reads the content
        reader = tf.FixedLengthRecordReader(self.bytes)
        key, value = reader.read(file_queue)  # key is the file name
        print(value)
        # 3. Decode and reformat
        label_image = tf.decode_raw(value, tf.uint8)
        print(label_image)
        # Slice out the label; tf.cast converts the uint8 label to int32
        label = tf.cast(tf.slice(label_image, [0], [self.label_bytes]), tf.int32)
        # Slice out the image data
        image = tf.slice(label_image, [self.label_bytes], [self.image_bytes])
        print(image)
        # Reshape the image for batching; its size is already fixed
        image_tensor = tf.reshape(image, [self.height, self.width, self.channel])
        print(image_tensor)
        # Batch the image data
        image_batch, label_batch = tf.train.batch(
            [image_tensor, label], batch_size=10, num_threads=1, capacity=10)
        return image_batch, label_batch

    def write_to_tfrecords(self, image_batch, label_batch):
        """
        Write the data into a TFRecords file.
        :param image_batch:
        :param label_batch:
        :return:
        """
        # Create the TFRecords writer (the path could also come from FLAGS)
        writer = tf.python_io.TFRecordWriter('cifar10.tfrecords')
        # For each sample, build an Example protocol block
        for i in range(10):
            # Take the image value -- the value itself, not a tensor.
            # Examples need bytes, so convert the tensor with tostring()
            image = image_batch[i].eval().tostring()
            # The label goes into the Example as an int, so cast it
            label = int(label_batch[i].eval()[0])
            # Build the sample's Example protocol block
            example = tf.train.Example(features=tf.train.Features(feature={
                "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image])),
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label]))
            }))
            # Write the serialized value -- this packs it into binary data
            writer.write(example.SerializeToString())
        writer.close()
        return None

    def read_from_tfrecords(self):
        """
        Read the image data back from the TFRecords file (parse the Examples).
        :return: image_batch, label_batch
        """
        # 1. Build the file queue (the argument is a list of file names)
        file_queue = tf.train.string_input_producer(['cifar10.tfrecords'])
        # 2. Build the reader
        reader = tf.TFRecordReader()
        key, value = reader.read(file_queue)
        # 3. Parse the protocol block; the result is a dict
        feature = tf.parse_single_example(value, features={
            "image": tf.FixedLenFeature([], tf.string),
            "label": tf.FixedLenFeature([], tf.int64)
        })
        # Handle the label; cast() converts between int and float types
        label = tf.cast(feature["label"], tf.int32)  # int64 -> int32
        # The image is a string and must be decoded.
        # decode_raw() turns string/bytes data back into int/float vectors;
        # since tostring() was used when writing, decode_raw() restores
        # the original uint8 values
        image = tf.decode_raw(feature["image"], tf.uint8)
        # Reshape the image
        image_tensor = tf.reshape(image, [self.height, self.width, self.channel])
        # 4. Batch
        image_batch, label_batch = tf.train.batch(
            [image_tensor, label], batch_size=10, num_threads=1, capacity=10)
        return image_batch, label_batch


if __name__ == '__main__':
    # Build the list of path + file name, e.g. "A.csv"...
    # os.listdir() returns the names of the entries in a directory
    file_names = os.listdir('./cifar-10-batches-bin/')
    file_list = [os.path.join('./cifar-10-batches-bin/', file_name)
                 for file_name in file_names if file_name[-3:] == 'bin']
    # Initialize
    cr = CifarRead(file_list)
    # Read the binary files
    # image_batch, label_batch = cr.read_and_decode()
    # Parse the original data back out of the stored TFRecords file
    image_batch, label_batch = cr.read_from_tfrecords()
    with tf.Session() as sess:
        # Thread coordinator
        coord = tf.train.Coordinator()
        # Start the threads
        threads = tf.train.start_queue_runners(sess, coord=coord)
        print(sess.run([image_batch, label_batch]))
        print("writing into the TFRecords file")
        cr.write_to_tfrecords(image_batch, label_batch)
        print("done writing")
        # Reclaim the threads
        coord.request_stop()
        coord.join(threads)
```
The output is as follows:

Output (arrays abridged):
```
(repeated CPU-feature warnings omitted)
[array([[[[115, 118, 121], [122, 124, 126], [129, 133, 136], ...,
          [222, 222, 222], [224, 234, 246]]]], dtype=uint8),
 array([0, 7, 4, 2, 5, 3, 0, 4, 1, 3])]
writing into the TFRecords file
```