Learning TensorFlow distributed training
Reference article:
https://zhuanlan.zhihu.com/p/41473323
Notes based on the following material:
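TensorFlow 1.4 introduced the tf.estimator.train_and_evaluate function to replace the functionality of the Experiment class in older versions. tf.estimator.train_and_evaluate simplifies training, evaluating and exporting Estimator models. It hides the details of distributed training and evaluation, so the same code behaves the same way on a local machine and on a distributed cluster.
This article briefly shows how to define a custom Estimator model and then train and evaluate it with tf.estimator.train_and_evaluate.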
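The post does not show how the "same code" is pointed at a cluster. As far as I know, tf.estimator.train_and_evaluate reads the cluster layout from the TF_CONFIG environment variable. This variable holds a JSON object with a "cluster" map (chief, worker and ps addresses) and a "task" entry naming the current process. Below is a minimal sketch that builds this variable using only the standard library. The host names, ports and cluster shape are hypothetical placeholders.

```python
import json
import os

# Hypothetical cluster layout: host names and ports are placeholders.
CLUSTER = {
    "chief": ["host0:2222"],
    "worker": ["host1:2222", "host2:2222"],
    "ps": ["host3:2222"],
}


def make_tf_config(task_type, task_index):
    """Return the TF_CONFIG JSON string for one process of the cluster."""
    return json.dumps({
        "cluster": CLUSTER,
        "task": {"type": task_type, "index": task_index},
    })


# Each process sets its own TF_CONFIG before calling train_and_evaluate;
# here we pretend to be the second worker (index 1).
os.environ["TF_CONFIG"] = make_tf_config("worker", 1)

# An evaluator process uses task type "evaluator"; it is not listed in CLUSTER.
evaluator_config = make_tf_config("evaluator", 0)
print(os.environ["TF_CONFIG"])
```

Without TF_CONFIG, the same script simply trains and evaluates locally. That is the "same code locally and on a cluster" behavior described above.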
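Main steps:
- Build your own Estimator model
- Define how data is fed to the model during training and evaluation
- Define the training, evaluation and export specifications (TrainSpec and EvalSpec) passed to tf.estimator.train_and_evaluate
- Train and evaluate the model with tf.estimator.train_and_evaluate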
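We use the DBPedia Ontology Classification Dataset (the original post links to a download). It consists of 14 non-overlapping ontology classes selected from DBpedia 2014 (Company, EducationalInstitution, Artist, Athlete, OfficeHolder, MeanOfTransportation, Building, NaturalPlace, Village, Animal, Plant, Album, Film, WrittenWork). For each class, 40,000 training samples and 5,000 test samples were chosen at random, giving 560,000 training samples and 70,000 test samples in total.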
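Here is a quick arithmetic check of the numbers above, plus how they translate into training steps. The batch size of 128 is an assumption for illustration only. The original script reads it from FLAGS.batch_size.

```python
# The 14 DBpedia 2014 ontology classes listed above.
CLASSES = [
    "Company", "EducationalInstitution", "Artist", "Athlete",
    "OfficeHolder", "MeanOfTransportation", "Building", "NaturalPlace",
    "Village", "Animal", "Plant", "Album", "Film", "WrittenWork",
]
TRAIN_PER_CLASS = 40_000
TEST_PER_CLASS = 5_000

num_train = len(CLASSES) * TRAIN_PER_CLASS  # 14 * 40,000
num_test = len(CLASSES) * TEST_PER_CLASS    # 14 * 5,000

# Hypothetical batch size: one pass over the training set then takes
# num_train // BATCH_SIZE steps, a useful reference for TrainSpec's max_steps.
BATCH_SIZE = 128
steps_per_epoch = num_train // BATCH_SIZE

print(len(CLASSES), num_train, num_test, steps_per_epoch)  # 14 560000 70000 4375
```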
Model code
import tensorflow as tf  # TensorFlow 1.x (tf.layers / tf.contrib APIs)

# FLAGS (command-line flags) is defined elsewhere in the original script.
# Assumes every sentence is padded/truncated to FLAGS.sentence_max_len tokens.
def my_model(features, labels, mode, params):
    sentence = features['sentence']
    # Get word embeddings for each token in the sentence
    embeddings = tf.get_variable(name="embeddings", dtype=tf.float32,
                                 shape=[params["vocab_size"], FLAGS.embedding_size])
    sentence = tf.nn.embedding_lookup(embeddings, sentence)  # shape: (batch, sentence_len, embedding_size)
    # Add a channel dim, required by the conv2d and max_pooling2d methods
    sentence = tf.expand_dims(sentence, -1)  # shape: (batch, sentence_len/height, embedding_size/width, channels=1)

    pooled_outputs = []
    for filter_size in params["filter_sizes"]:
        conv = tf.layers.conv2d(
            sentence,
            filters=FLAGS.num_filters,
            kernel_size=[filter_size, FLAGS.embedding_size],
            strides=(1, 1),
            padding="VALID",
            activation=tf.nn.relu)  # shape: (batch, sentence_max_len - filter_size + 1, 1, num_filters)
        pool = tf.layers.max_pooling2d(
            conv,
            pool_size=[FLAGS.sentence_max_len - filter_size + 1, 1],
            strides=(1, 1),
            padding="VALID")  # shape: (batch, 1, 1, num_filters)
        pooled_outputs.append(pool)

    h_pool = tf.concat(pooled_outputs, 3)  # shape: (batch, 1, 1, num_filters * len(filter_sizes))
    h_pool_flat = tf.reshape(h_pool, [-1, FLAGS.num_filters * len(params["filter_sizes"])])  # shape: (batch, num_filters * len(filter_sizes))

    if 'dropout_rate' in params and params['dropout_rate'] > 0.0:
        h_pool_flat = tf.layers.dropout(h_pool_flat, params['dropout_rate'],
                                        training=(mode == tf.estimator.ModeKeys.TRAIN))
    logits = tf.layers.dense(h_pool_flat, FLAGS.num_classes, activation=None)

    optimizer = tf.train.AdagradOptimizer(learning_rate=params['learning_rate'])

    def _train_op_fn(loss):
        return optimizer.minimize(loss, global_step=tf.train.get_global_step())

    # The head builds the loss, metrics and predictions for multi-class classification
    my_head = tf.contrib.estimator.multi_class_head(n_classes=FLAGS.num_classes)
    return my_head.create_estimator_spec(
        features=features,
        mode=mode,
        labels=labels,
        logits=logits,
        train_op_fn=_train_op_fn
    )
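The pooling size in my_model is chosen so that each filter size collapses to a single value per filter. The pure-Python sketch below traces the per-example shapes under VALID padding. It confirms that the flattened width is num_filters * len(filter_sizes). The hyperparameter values are hypothetical, not taken from the post.

```python
def valid_out(size, kernel, stride=1):
    """Output length along one axis for VALID padding (no zero padding)."""
    return (size - kernel) // stride + 1


def text_cnn_shapes(sentence_max_len, embedding_size, filter_sizes, num_filters):
    """Trace per-example shapes (batch dim omitted) through my_model's conv/pool layers."""
    conv_shapes, pool_shapes = [], []
    for k in filter_sizes:
        # conv2d: kernel [k, embedding_size], stride 1, VALID
        conv = (valid_out(sentence_max_len, k),
                valid_out(embedding_size, embedding_size),
                num_filters)
        # max_pooling2d: pool [sentence_max_len - k + 1, 1], stride 1, VALID
        pool = (valid_out(conv[0], sentence_max_len - k + 1),
                valid_out(conv[1], 1),
                num_filters)
        conv_shapes.append(conv)
        pool_shapes.append(pool)
    # tf.concat(pooled_outputs, 3) joins the pooled maps along the channel axis
    height, width = pool_shapes[0][0], pool_shapes[0][1]
    channels = sum(p[2] for p in pool_shapes)
    flat_dim = height * width * channels
    return conv_shapes, pool_shapes, flat_dim


# Hypothetical hyperparameters.
convs, pools, flat = text_cnn_shapes(sentence_max_len=100, embedding_size=50,
                                     filter_sizes=[3, 4, 5], num_filters=128)
print(convs)  # [(98, 1, 128), (97, 1, 128), (96, 1, 128)]
print(pools)  # [(1, 1, 128), (1, 1, 128), (1, 1, 128)]
print(flat)   # 384
```

If a sentence were longer than sentence_max_len, the pooled height would exceed 1 and the reshape in my_model would no longer match. This is why the inputs need a fixed length.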
dataset
def input_fn(path_csv, path_vocab, shuffle_buffer_size, num_oov_buckets):
    # Token -> id table: the i-th line of path_vocab gets id i, unknown tokens go to OOV buckets
    vocab = tf.contrib.lookup.index_table_from_file(path_vocab, num_oov_buckets=num_oov_buckets)
    # Load csv file, one example per line
    dataset = tf.data.TextLineDataset(path_csv)
    # Convert line into list of tokens, splitting by white space; then convert each token to a unique id
    # (parse_line is defined elsewhere in the original script)
    dataset = dataset.map(lambda line: parse_line(line, vocab))
    if shuffle_buffer_size > 0:
        # Training: shuffle and repeat indefinitely; TrainSpec's max_steps bounds training
        dataset = dataset.shuffle(shuffle_buffer_size).repeat()
    dataset = dataset.batch(FLAGS.batch_size).prefetch(1)
    return dataset
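index_table_from_file gives the i-th line of the vocabulary file the id i. With num_oov_buckets > 0, it maps unknown tokens into ids [vocab_size, vocab_size + num_oov_buckets). Otherwise unknown tokens get the default value -1. The post does not show parse_line, so the sketch below is a hypothetical pure-Python analogue of "split on whitespace, then look up ids". The hash is an assumption: zlib.crc32 stands in for TensorFlow's own fingerprint hash, so bucket assignments will differ from TF's.

```python
import zlib


def build_vocab(vocab_lines):
    """Give the i-th vocabulary line the id i, as index_table_from_file does."""
    return {word.strip(): i for i, word in enumerate(vocab_lines)}


def token_id(token, vocab, num_oov_buckets):
    if token in vocab:
        return vocab[token]
    if num_oov_buckets == 0:
        return -1  # index_table_from_file's default_value for unknown tokens
    # Unknown tokens land in [len(vocab), len(vocab) + num_oov_buckets).
    # zlib.crc32 is a stand-in hash; TensorFlow uses a different one.
    return len(vocab) + zlib.crc32(token.encode("utf-8")) % num_oov_buckets


def parse_line_sketch(line, vocab, num_oov_buckets):
    """Hypothetical analogue of parse_line: whitespace split, then id lookup."""
    return [token_id(tok, vocab, num_oov_buckets) for tok in line.split()]


vocab = build_vocab(["the\n", "film\n", "company\n"])
print(parse_line_sketch("the film xyzzy", vocab, num_oov_buckets=1))  # [0, 1, 3]
print(parse_line_sketch("the xyzzy", vocab, num_oov_buckets=0))       # [0, -1]
```

OOV buckets let the model learn something for rare or unseen words instead of collapsing them all to a single invalid id.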
Training
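# config, path_train, path_eval and path_words are defined elsewhere in the original script.
classifier = tf.estimator.Estimator(
    model_fn=my_model,
    params={
        'vocab_size': config["vocab_size"],
        # list(): in Python 3, map() returns a one-shot iterator, but my_model both iterates it and calls len() on it
        'filter_sizes': list(map(int, FLAGS.filter_sizes.split(','))),
        'learning_rate': FLAGS.learning_rate,
        'dropout_rate': FLAGS.dropout_rate
    },
    config=tf.estimator.RunConfig(model_dir=FLAGS.model_dir,
                                  save_checkpoints_steps=FLAGS.save_checkpoints_steps)
)

train_spec = tf.estimator.TrainSpec(
    input_fn=lambda: input_fn(path_train, path_words, FLAGS.shuffle_buffer_size, config["num_oov_buckets"]),
    max_steps=FLAGS.train_steps
)

# shuffle_buffer_size=0: no shuffling and no repeat; evaluation stops when the eval data
# runs out or after EvalSpec's default steps=100, whichever comes first
input_fn_for_eval = lambda: input_fn(path_eval, path_words, 0, config["num_oov_buckets"])
# throttle_secs=300: start a new evaluation at most once every 300 seconds
eval_spec = tf.estimator.EvalSpec(input_fn=input_fn_for_eval, throttle_secs=300)

tf.estimator.train_and_evaluate(classifier, train_spec, eval_spec)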
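This explains why the training code wraps map() in list(). In Python 2, map() returned a list, so the bare map(...) worked there. Under Python 3 it returns a one-shot iterator with no len(). my_model first loops over params["filter_sizes"] and then calls len() on it, which a bare map object cannot support. Below is a standalone demonstration; the flag value "3,4,5" is hypothetical.

```python
filter_sizes_flag = "3,4,5"  # hypothetical value of FLAGS.filter_sizes

lazy = map(int, filter_sizes_flag.split(","))
try:
    len(lazy)
    has_len = True
except TypeError:  # "object of type 'map' has no len()"
    has_len = False

first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # already exhausted, so empty

filter_sizes = list(map(int, filter_sizes_flag.split(",")))
print(has_len, first_pass, second_pass, len(filter_sizes))  # False [3, 4, 5] [] 3
```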
I also read the following article but have not finished it yet:
https://zhuanlan.zhihu.com/p/41663141
Building Distributed Tensorflow Models series: Feature Engineering