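TensorFlow: reading data from a file in batches, with optional shuffling [Python]
This post uses TensorFlow to read preprocessed text data in batches and serve it as an iterator over those batches:
For example, suppose I have a preprocessed data file, data.csv, with shape (8, 10). In each row, the first five columns are the features and the last five columns are the labels.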
1,2,3,4,5,6,7,8,9,10
11,12,13,14,15,16,17,18,19,20
21,22,23,24,25,26,27,28,29,30
31,32,33,34,35,36,37,38,39,40
41,42,43,44,45,46,47,48,49,50
51,52,53,54,55,56,57,58,59,60
1,1,1,1,1,2,2,2,2,2
3,3,3,3,3,4,4,4,4,4
Now I need to split it into 4 batches, that is, a batch size of 2.
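To make the target layout concrete before bringing in TensorFlow, here is a minimal pure-Python sketch of the same split. It assumes the eight rows are held in a string (`CSV_TEXT`) rather than read from data.csv, and the helper names `load_rows` and `make_batches` are hypothetical, introduced only for illustration.

```python
import csv
import io

# The same eight rows as data.csv, inlined so the sketch is self-contained.
CSV_TEXT = """\
1,2,3,4,5,6,7,8,9,10
11,12,13,14,15,16,17,18,19,20
21,22,23,24,25,26,27,28,29,30
31,32,33,34,35,36,37,38,39,40
41,42,43,44,45,46,47,48,49,50
51,52,53,54,55,56,57,58,59,60
1,1,1,1,1,2,2,2,2,2
3,3,3,3,3,4,4,4,4,4
"""

WORDS_SIZE = 5  # number of feature columns (and of label columns)
BATCH_SIZE = 2  # rows per batch


def load_rows(text):
    """Parse CSV text into (features, label) pairs of integer lists."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        values = [int(v) for v in record]
        rows.append((values[:WORDS_SIZE], values[WORDS_SIZE:]))
    return rows


def make_batches(rows, batch_size):
    """Group rows, in file order, into consecutive batches."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


rows = load_rows(CSV_TEXT)
batches = make_batches(rows, BATCH_SIZE)
print(len(batches))  # number of batches
print(batches[0])    # first batch: two (features, label) pairs
```

Each batch here is a list of (features, label) pairs. The TensorFlow version instead returns one array of features and one array of labels per batch, but the row grouping is the same idea.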
I may also need to shuffle the order of the rows, so two approaches are shown here: sequential and random.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Requires TensorFlow 1.x (queue-based input pipeline).
__author__ = 'xijun1'
import tensorflow as tf
import numpy as np

# data = np.arange(1, 100 + 1)
# print(",".join([str(i) for i in data]))
# data_input = tf.constant(data)

filename_queue = tf.train.string_input_producer(["data.csv"])
reader = tf.TextLineReader(skip_header_lines=0)
key, value = reader.read(filename_queue)

# decode_csv will convert a Tensor from type string (the text line) in
# a tuple of tensor columns with the specified defaults, which also
# sets the data type for each column
words_size = 5  # number of feature columns; each row holds words_size * 2 values
decoded = tf.decode_csv(
    value,
    field_delim=',',
    record_defaults=[[0] for i in range(words_size * 2)])

batch_size = 2  # number of rows in each batch
num_rows = 8    # number of rows in data.csv

# Random order
batch_shuffle = tf.train.shuffle_batch(decoded,
                                       batch_size=batch_size,
                                       capacity=batch_size * words_size,
                                       min_after_dequeue=batch_size)
# File order (allow_smaller_final_batch expects a boolean)
batch_no_shuffle = tf.train.batch(decoded,
                                  batch_size=batch_size,
                                  capacity=batch_size * words_size,
                                  allow_smaller_final_batch=True)

# Each batch is a list of column tensors; stack and transpose to get
# one row per example, then split into features and labels.
shuffle_features = tf.transpose(tf.stack(batch_shuffle[0:words_size]))
shuffle_label = tf.transpose(tf.stack(batch_shuffle[words_size:]))
features = tf.transpose(tf.stack(batch_no_shuffle[0:words_size]))
label = tf.transpose(tf.stack(batch_no_shuffle[words_size:]))

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    # Integer division so that range() also works on Python 3.
    for i in range(num_rows // batch_size):
        # Printing a tuple keeps the same output format on Python 2 and 3.
        print((i + 10, sess.run([shuffle_features, shuffle_label])))
        print((i, sess.run([features, label])))
    coord.request_stop()
    coord.join(threads)
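The `capacity` and `min_after_dequeue` arguments of `tf.train.shuffle_batch` bound how many rows sit in the queue when one is picked at random, and that number determines how thoroughly the rows are mixed. The pure-Python sketch below imitates this idea with a single bounded buffer. It illustrates the buffering principle only and is not TensorFlow's actual implementation. The function name `shuffle_batches` and its `buffer_size` and `seed` parameters are hypothetical; `buffer_size` stands in for the number of queued rows, which in the real queue lies between `min_after_dequeue` and `capacity`.

```python
import random


def shuffle_batches(rows, batch_size, buffer_size, seed=None):
    """Yield batches whose rows are drawn at random from a bounded buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    source = iter(rows)
    buffer, batch = [], []
    while True:
        # Top the buffer up to `buffer_size` from the remaining input.
        for row in source:
            buffer.append(row)
            if len(buffer) >= buffer_size:
                break
        if not buffer:
            break  # input exhausted and buffer drained
        # Pick one buffered row at random; swapping it to the end
        # makes the removal O(1).
        index = rng.randrange(len(buffer))
        buffer[index], buffer[-1] = buffer[-1], buffer[index]
        batch.append(buffer.pop())
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # smaller final batch when rows do not divide evenly


rows = list(range(1, 9))  # stand-ins for the eight CSV rows
for batch in shuffle_batches(rows, batch_size=2, buffer_size=3, seed=0):
    print(batch)

# With buffer_size=1 there is nothing to choose from, so file order is kept.
print(list(shuffle_batches(rows, batch_size=2, buffer_size=1)))
```

A small buffer means a row can only move a few positions away from where it sits in the file, which is why a larger `min_after_dequeue` gives better mixing at the cost of more memory and a longer warm-up.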
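Running it produces the result below. Entries numbered 10 to 13 are the shuffled batches and entries 0 to 3 are the ordered ones. Both batching ops draw from the same reader, so each row read from the file goes to only one of them, and because the filename queue cycles through the file indefinitely, rows reappear once the file has been read through: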
(10, [array([[ 1, 2, 3, 4, 5], [31, 32, 33, 34, 35]], dtype=int32), array([[ 6, 7, 8, 9, 10], [36, 37, 38, 39, 40]], dtype=int32)])
(0, [array([[11, 12, 13, 14, 15], [21, 22, 23, 24, 25]], dtype=int32), array([[16, 17, 18, 19, 20], [26, 27, 28, 29, 30]], dtype=int32)])
(11, [array([[51, 52, 53, 54, 55], [ 3, 3, 3, 3, 3]], dtype=int32), array([[56, 57, 58, 59, 60], [ 4, 4, 4, 4, 4]], dtype=int32)])
(1, [array([[41, 42, 43, 44, 45], [ 1, 1, 1, 1, 1]], dtype=int32), array([[46, 47, 48, 49, 50], [ 2, 2, 2, 2, 2]], dtype=int32)])
(12, [array([[ 3, 3, 3, 3, 3], [11, 12, 13, 14, 15]], dtype=int32), array([[ 4, 4, 4, 4, 4], [16, 17, 18, 19, 20]], dtype=int32)])
(2, [array([[ 1, 2, 3, 4, 5], [21, 22, 23, 24, 25]], dtype=int32), array([[ 6, 7, 8, 9, 10], [26, 27, 28, 29, 30]], dtype=int32)])
(13, [array([[31, 32, 33, 34, 35], [ 1, 1, 1, 1, 1]], dtype=int32), array([[36, 37, 38, 39, 40], [ 2, 2, 2, 2, 2]], dtype=int32)])
(3, [array([[41, 42, 43, 44, 45], [ 1, 1, 1, 1, 1]], dtype=int32), array([[46, 47, 48, 49, 50], [ 2, 2, 2, 2, 2]], dtype=int32)])