Module - Neural network training and inference
API。训练神经网络需要很多步骤。需要指定如何输入训练数据、初始化模型参数、在网络中执行向前和向后传递、基于计算的梯度更新权重、执行模型检查点等。在预测过程中,最终会重复这些步骤中的大多数步骤。对于新手和有经验的开发人员来说,所有这些都是非常令人望而生畏的。
幸运的是,MXNet在module(简称mod)包中模块化了用于训练和推理的常用代码。Module提供了用于执行预定义网络的高级和中级接口。两个接口都可以互换使用。在本教程中,我们将展示这两个接口的用法。
Preliminary
首先是一个初级用法demo:
import logging import random logging.getLogger().setLevel(logging.INFO) import mxnet as mx import numpy as np mx.random.seed(1234) np.random.seed(1234) random.seed(1234) fname = mx.test_utils.download('https://s3.us-east-2.amazonaws.com/mxnet-public/letter_recognition/letter-recognition.data') data = np.genfromtxt(fname, delimiter=',')[:,1:] label = np.array([ord(l.split(',')[0])-ord('A') for l in open(fname, 'r')]) batch_size = 32 ntrain = int(data.shape[0]*0.8) train_iter = mx.io.NDArrayIter(data[:ntrain, :], label[:ntrain], batch_size, shuffle=True) val_iter = mx.io.NDArrayIter(data[ntrain:, :], label[ntrain:], batch_size)
网络定义,利用Symbol:
net = mx.sym.Variable('data') net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64) net = mx.sym.Activation(net, name='relu1', act_type="relu") net = mx.sym.FullyConnected(net, name='fc2', num_hidden=26) net = mx.sym.SoftmaxOutput(net, name='softmax') mx.viz.plot_network(net, node_attrs={"shape":"oval","fixedsize":"false"})
Creating a Module
利用Module类来引入模块,可以通过指定一下参数来构建:
- symbol:网络定义
- context:用于执行的设备(或设备列表)
- data_names:输入data变量名称的列表
- label_names:输入label变量名的列表
对于上面我们定义好的net,仅仅有一个data,且名字为data,仅有一个label被自动命名为softmax_label,这是根据我们在softmaxoutput里的名字softmax来决定的。
mod = mx.mod.Module(symbol=net, context=mx.cpu(), data_names=['data'], label_names=['softmax_label'])
Intermediate-level Interface
我们已经创建了模块。现在让我们看看如何使用模块的中级api运行训练和推理。这些api通过向前和向后运行传递,使开发人员能够灵活地执行逐步计算。它对调试也很有用。为了训练一个module,需要实现以下几个步骤:
- bind:通过分配内存为计算准备环境。
- init_params:分配和初始化参数。
- init_optimizer:初始化优化器。默认为sgd。
- metric.create:从输入度量名称创建计算度量。
- forward:前进计算。
- update_metric:计算并累积最后一次正向计算的输出的计算度量。
- backward:反向计算。
- update:根据已安装的优化器和上一个前向后批处理中计算的梯度更新参数。
具体实现如下:
# allocate memory given the input data and label shapes mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label) # initialize parameters by uniform random numbers mod.init_params(initializer=mx.init.Uniform(scale=.1)) # mxnet.initializer # use SGD with learning rate 0.1 to train mod.init_optimizer(optimizer='sgd', optimizer_params=(('learning_rate', 0.1), )) # mxnet.optimiazer # use accuracy as the metric metric = mx.metric.create('acc') # mxnet.mxtric # train 5 epochs, i.e. going over the data iter one pass for epoch in range(5): train_iter.reset() # 重新迭代 metric.reset() # 重新评估 for batch in train_iter: mod.forward(batch, is_train=True) # compute predictions 前向 mod.update_metric(metric, batch.label) # accumulate prediction accuracy 计算评估指标 mod.backward() # compute gradients # 反向 mod.update() # update parameters # 更新参数 print('Epoch %d, Training %s' % (epoch, metric.get()))
注意module.bind和symbol.bind不一样。
注意:mxnet里有很多缩写: mx.symbol=mx.sys; mx.initializer=mx.init; mx.module=mx.mod
Epoch 0, Training ('accuracy', 0.434625) Epoch 1, Training ('accuracy', 0.6516875) Epoch 2, Training ('accuracy', 0.6968125) Epoch 3, Training ('accuracy', 0.7273125) Epoch 4, Training ('accuracy', 0.7575625)
High-level Interface
上一节利用的是中级API,所以步骤比较多,这一节利用高阶API中的 fit 函数来实现。
1. 训练
# reset train_iter to the beginning 重置迭代器 train_iter.reset() # create a module mod = mx.mod.Module(symbol=net, # 建立module context=mx.cpu(), data_names=['data'], label_names=['softmax_label']) # fit the module # 训练 mod.fit(train_iter, eval_data=val_iter, optimizer='sgd', optimizer_params={'learning_rate':0.1}, eval_metric='acc', num_epoch=7)
INFO:root:Epoch[0] Train-accuracy=0.325437 INFO:root:Epoch[0] Time cost=0.550 INFO:root:Epoch[0] Validation-accuracy=0.568500 INFO:root:Epoch[1] Train-accuracy=0.622188 INFO:root:Epoch[1] Time cost=0.552 INFO:root:Epoch[1] Validation-accuracy=0.656500 INFO:root:Epoch[2] Train-accuracy=0.694375 INFO:root:Epoch[2] Time cost=0.566 INFO:root:Epoch[2] Validation-accuracy=0.703500 INFO:root:Epoch[3] Train-accuracy=0.732187 INFO:root:Epoch[3] Time cost=0.562 INFO:root:Epoch[3] Validation-accuracy=0.748750 INFO:root:Epoch[4] Train-accuracy=0.755375 INFO:root:Epoch[4] Time cost=0.484 INFO:root:Epoch[4] Validation-accuracy=0.761500 INFO:root:Epoch[5] Train-accuracy=0.773188 INFO:root:Epoch[5] Time cost=0.383 INFO:root:Epoch[5] Validation-accuracy=0.715000 INFO:root:Epoch[6] Train-accuracy=0.794687 INFO:root:Epoch[6] Time cost=0.378 INFO:root:Epoch[6] Validation-accuracy=0.802250
默认情况下,fit中的参数是这样的:
eval_metric
set to accuracy
, optimizer
to sgd
and optimizer_params to (('learning_rate', 0.01),)
.
y = mod.predict(val_iter) assert y.shape == (4000, 26)
有时候我们并不关心具体预测值是多少,只想知道在测试集上的指标如何,那么就可以调用score()函数来实现。它将会根据你提供的metric来评估:
score = mod.score(val_iter, ['acc']) print("Accuracy score is %f" % (score[0][1])) assert score[0][1] > 0.76, "Achieved accuracy (%f) is less than expected (0.76)" % score[0][1]
Accuracy score is 0.802250
当然也可以利用其他指标:top_k_acc
(top-k-accuracy), F1
, RMSE
, MSE
, MAE
, ce
(CrossEntropy). 更多指标参见 Evaluation metric.
# construct a callback function to save checkpoints model_prefix = 'mx_mlp' checkpoint = mx.callback.do_checkpoint(model_prefix) mod = mx.mod.Module(symbol=net) mod.fit(train_iter, num_epoch=5, epoch_end_callback=checkpoint) # 写到epoch_end_callback里就会每个epoch保存一次
INFO:root:Epoch[0] Train-accuracy=0.098437 INFO:root:Epoch[0] Time cost=0.421 INFO:root:Saved checkpoint to "mx_mlp-0001.params" INFO:root:Epoch[1] Train-accuracy=0.257437 INFO:root:Epoch[1] Time cost=0.520 INFO:root:Saved checkpoint to "mx_mlp-0002.params" INFO:root:Epoch[2] Train-accuracy=0.457250 INFO:root:Epoch[2] Time cost=0.562 INFO:root:Saved checkpoint to "mx_mlp-0003.params" INFO:root:Epoch[3] Train-accuracy=0.558187 INFO:root:Epoch[3] Time cost=0.434 INFO:root:Saved checkpoint to "mx_mlp-0004.params" INFO:root:Epoch[4] Train-accuracy=0.617750 INFO:root:Epoch[4] Time cost=0.414 INFO:root:Saved checkpoint to "mx_mlp-0005.params"
为了载入模型参数,可以调用load_checkpoint函数,它会将Symbol和相关的参数load进来,然后可以将载好的参数送入module中:
sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, 3)
assert sym.tojson() == net.tojson()
# assign the loaded parameters to the module
mod.set_params(arg_params, aux_params)
如果只想导入一个保存好的模型来继续训练,就可以不用set_params()了,而直接在fit()里面传递参数。此时fit就知道了你想加载一个已有的参数而非随机初始化参数从头训练。此时也可以再设置一下begin_epoch,表明我们是接着从某个epoch训练。
mod = mx.mod.Module(symbol=sym) mod.fit(train_iter, num_epoch=21, arg_params=arg_params, aux_params=aux_params, begin_epoch=3) assert score[0][1] > 0.77, "Achieved accuracy (%f) is less than expected (0.77)" % score[0][1]
INFO:root:Epoch[3] Train-accuracy=0.555438 INFO:root:Epoch[3] Time cost=0.377 INFO:root:Epoch[4] Train-accuracy=0.616625 INFO:root:Epoch[4] Time cost=0.457 INFO:root:Epoch[5] Train-accuracy=0.658438 INFO:root:Epoch[5] Time cost=0.518 ........................................... INFO:root:Epoch[18] Train-accuracy=0.788687 INFO:root:Epoch[18] Time cost=0.532 INFO:root:Epoch[19] Train-accuracy=0.789562 INFO:root:Epoch[19] Time cost=0.531 INFO:root:Epoch[20] Train-accuracy=0.796250 INFO:root:Epoch[20] Time cost=0.531