CS224d Assignment, Problem Set 2 (Part 3): Implementing a Language Model with an RNNLM to Predict the Next Word
Today we continue with the third part of CS224d Problem Set 2.
It turns out coursework at these foreign universities really is this demanding; by comparison, I'll just keep quietly grinding away back home.
Below is the math behind RNN language modeling:
Given a sequence of words $x_1, x_2, \dots, x_t$, the probability of the word $x_{t+1}$ that follows is modeled as:
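Written out (following the standard formulation from the assignment, with $\hat{y}^{(t)}$ the model's output distribution at step $t$):

$$\hat{P}(x_{t+1}=v_j \mid x_t,\dots,x_1) = \hat{y}^{(t)}_j$$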
where $v_j$ is a word in the vocabulary. We implement a recurrent neural network that models the "history" $x_1, x_2, \dots, x_t$ through feedback in the hidden layer:
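Concretely, the per-timestep forward pass is (reconstructed here in its standard form, matching the sigmoid RNN implemented in the code below):

$$h^{(t)} = \operatorname{sigmoid}\left(h^{(t-1)}H + x^{(t)}L\,I + b_1\right)$$

$$\hat{y}^{(t)} = \operatorname{softmax}\left(h^{(t)}U + b_2\right)$$

where: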
$h^{(0)}=h_{0}\in\mathbb{R}^{D_{h}}$ is the initialization vector for the hidden layer
$x^{(t)}L$ is the product of the one-hot row vector $x^{(t)}$ and the embedding matrix $L$; this one-hot row vector simply encodes the index of the word being processed at step $t$ (see the small example after these definitions)
$L$ is the word embedding matrix
$I$ is the input word representation matrix
$H$ is the hidden transition matrix
$U$ is the output word representation matrix
$b_{1}$, $b_{2}$ are the bias terms
$d$ is the word embedding dimension
$|V|$ is the vocabulary size
$D_{h}$ is the hidden layer dimension
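Since $x^{(t)}$ is one-hot, the product $x^{(t)}L$ is just row selection from $L$, which is why the TensorFlow code below uses tf.nn.embedding_lookup rather than an explicit matrix multiply. A tiny NumPy check, with toy sizes made up for illustration:

```python
import numpy as np

L = np.arange(12.0).reshape(4, 3)    # toy embedding matrix: |V| = 4, d = 3
x = np.array([0.0, 0.0, 1.0, 0.0])   # one-hot row vector for word index 2
assert np.allclose(x @ L, L[2])      # one-hot product == row lookup
```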
The output vector $\hat{y}^{(t)}$ is a probability distribution over the entire vocabulary, and we minimize the (unregularized) cross-entropy loss:
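In the usual per-timestep form, with $y^{(t)}$ the one-hot vector for the true next word:

$$J^{(t)}(\theta) = CE\left(y^{(t)},\hat{y}^{(t)}\right) = -\sum_{j=1}^{|V|} y^{(t)}_j \log \hat{y}^{(t)}_j$$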
We evaluate the language model with perplexity, defined as follows:
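Per time step this is the reciprocal of the probability assigned to the correct next word (so lower is better); over a whole dataset it equals $\exp$ of the mean cross-entropy, which is exactly the quantity the training log below reports as pp:

$$PP^{(t)}\left(y^{(t)},\hat{y}^{(t)}\right) = \frac{1}{\hat{P}\left(x^{pred}_{t+1}=x_{t+1}\mid x_t,\dots,x_1\right)} = \frac{1}{\sum_{j=1}^{|V|} y^{(t)}_j\,\hat{y}^{(t)}_j}$$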
Gradients:
The gradients used when optimizing each of the model's parameters are as follows:
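A sketch of the single-timestep gradients, using the row-vector convention above and $\delta$ terms as shorthand (full backpropagation through time additionally passes $\delta^{(t)}_1 H^{\top}$ back into earlier steps):

$$\delta^{(t)}_2 = \hat{y}^{(t)} - y^{(t)}, \qquad \delta^{(t)}_1 = \delta^{(t)}_2 U^{\top} \odot h^{(t)} \odot \left(1-h^{(t)}\right)$$

$$\frac{\partial J^{(t)}}{\partial U} = h^{(t)\top}\delta^{(t)}_2,\quad \frac{\partial J^{(t)}}{\partial b_2} = \delta^{(t)}_2,\quad \frac{\partial J^{(t)}}{\partial H} = h^{(t-1)\top}\delta^{(t)}_1,\quad \frac{\partial J^{(t)}}{\partial I} = \left(x^{(t)}L\right)^{\top}\delta^{(t)}_1,\quad \frac{\partial J^{(t)}}{\partial b_1} = \delta^{(t)}_1,\quad \frac{\partial J^{(t)}}{\partial L_{x^{(t)}}} = \delta^{(t)}_1 I^{\top}$$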
Training then proceeds as follows:
Initialize all of the trainable parameters listed above.
For each training word, compute the derivative of the loss with respect to every parameter according to the formulas above.
Update the parameters with gradient descent.
Plug the updated parameters back into the model; if the loss falls below the preset threshold, stop iterating, otherwise continue.
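A minimal NumPy sketch of one such step, tying together the forward pass, the single-step gradients, and the update (the sizes and the toy word indices here are made up for illustration; the real TensorFlow implementation follows further below):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy sizes: |V| = 8 words, d = 5, D_h = 4
V, d, Dh = 8, 5, 4
rng = np.random.default_rng(0)
L = rng.normal(scale=0.1, size=(V, d))    # word embedding matrix
I = rng.normal(scale=0.1, size=(d, Dh))   # input word representation matrix
H = rng.normal(scale=0.1, size=(Dh, Dh))  # hidden transition matrix
U = rng.normal(scale=0.1, size=(Dh, V))   # output word representation matrix
b1, b2 = np.zeros(Dh), np.zeros(V)

h_prev = np.zeros(Dh)                     # h^(0)
x_idx, y_idx = 3, 5                       # current word and true next word (toy indices)

# Forward pass: h^(t) = sigmoid(h^(t-1) H + x^(t) L I + b1), y_hat = softmax(h^(t) U + b2)
e = L[x_idx]                              # x^(t) L via row lookup
h = 1.0 / (1.0 + np.exp(-(h_prev @ H + e @ I + b1)))
y_hat = softmax(h @ U + b2)
loss = -np.log(y_hat[y_idx])              # cross-entropy for this time step

# Backward pass: single-timestep gradients from the sketch above (no BPTT into earlier steps)
d2 = y_hat.copy()
d2[y_idx] -= 1.0                          # delta_2 = y_hat - y
dU, db2 = np.outer(h, d2), d2
d1 = (U @ d2) * h * (1.0 - h)             # delta_1: back through the sigmoid
dH, dI, db1 = np.outer(h_prev, d1), np.outer(e, d1), d1
dL_row = I @ d1                           # gradient for the embedding row of x^(t)

# Plain gradient-descent update
lr = 0.1
U -= lr * dU
b2 -= lr * db2
H -= lr * dH
I -= lr * dI
b1 -= lr * db1
L[x_idx] -= lr * dL_row
print('cross-entropy at this step: %.4f' % loss)
```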
Below is a structure diagram of the RNNLM.
The diagram above shows the structure of a second-layer RNN node.
The diagram above shows dropout applied to the RNN's variables to reduce overfitting; this is the dropout structure of the first RNN layer.
The diagram above shows the structure of the first RNN layer.
(Heads up: a big wall of cryptic code is incoming.)
'''
Created on 2017-09-26
@author: weizhen
'''
import getpass
import sys
import time

import numpy as np
from copy import deepcopy

from utils import calculate_perplexity, get_ptb_dataset, Vocab
from utils import ptb_iterator, sample

import tensorflow as tf
from model import LanguageModel
from tensorflow.contrib.legacy_seq2seq.python.ops.seq2seq import sequence_loss


class Config(object):
    """Holds the hyperparameters and data information."""
    batch_size = 64
    embed_size = 50
    hidden_size = 100
    num_steps = 10
    max_epochs = 16
    early_stopping = 2
    dropout = 0.9
    lr = 0.001


class RNNLM_Model(LanguageModel):

    def load_data(self, debug=False):
        """Loads the vocabulary and the encoded train/dev/test data."""
        self.vocab = Vocab()
        self.vocab.construct(get_ptb_dataset('train'))
        self.encoded_train = np.array([self.vocab.encode(word) for word in get_ptb_dataset('train')],
                                      dtype=np.int32)
        self.encoded_valid = np.array([self.vocab.encode(word) for word in get_ptb_dataset('valid')],
                                      dtype=np.int32)
        self.encoded_test = np.array([self.vocab.encode(word) for word in get_ptb_dataset('test')])
        if debug:
            num_debug = 1024
            self.encoded_train = self.encoded_train[:num_debug]
            self.encoded_valid = self.encoded_valid[:num_debug]
            self.encoded_test = self.encoded_test[:num_debug]

    def add_placeholders(self):
        """Creates the placeholder variables for the input tensors.

        These placeholders are used elsewhere in the model and are filled in
        during training.

        input_placeholder:   shape (None, num_steps), type tf.int32
        labels_placeholder:  shape (None, num_steps), type tf.int32
        dropout_placeholder: dropout value placeholder (scalar), type tf.float32
        """
        self.input_placeholder = tf.placeholder(tf.int32, shape=[None, self.config.num_steps], name='Input')
        self.labels_placeholder = tf.placeholder(tf.int32, shape=[None, self.config.num_steps], name='Target')
        self.dropout_placeholder = tf.placeholder(tf.float32, name='Dropout')

    def add_embedding(self):
        """Adds the word-embedding layer.

        Hint: this layer should use input_placeholder to index into the embedding.
        Hint: tf.nn.embedding_lookup is useful here.
        Hint: tf.split and tf.squeeze are useful when constructing the inputs.
        Hint: the variable you create has shape
              L: (len(self.vocab), embed_size)

        Returns:
            inputs: a list of length num_steps; each element is a tensor of
                    shape (batch_size, embed_size).

        tf.split(value, num_split, axis) splits the input tensor along the
        given axis into num_split pieces, returned as a list.
        tf.squeeze(input, squeeze_dims=None) removes dimensions of size 1, e.g.
        a tensor of shape [1, 2, 1, 3, 1, 1] squeezed becomes [2, 3], and
        squeezed over dims [2, 4] becomes [1, 2, 3, 1].
        tf.nn.embedding_lookup maps word indices to word vectors.
        """
        with tf.device('/cpu:0'):
            embedding = tf.get_variable('Embedding', [len(self.vocab), self.config.embed_size], trainable=True)
            inputs = tf.nn.embedding_lookup(embedding, self.input_placeholder)
            # split (batch_size, num_steps, embed_size) into num_steps tensors
            # of shape (batch_size, embed_size)
            inputs = [tf.squeeze(x, [1]) for x in tf.split(inputs, self.config.num_steps, 1)]
            return inputs

    def add_projection(self, rnn_outputs):
        """Adds a projection layer.

        The projection layer transforms the hidden representation into a
        distribution over the whole vocabulary.

        Hint: the variables you create have shapes
              U:   (hidden_size, len(vocab))
              b_2: (len(vocab),)

        Args:
            rnn_outputs: a list of length num_steps; each element is a tensor
                         of shape (batch_size, hidden_size).
        Returns:
            outputs: a list of length num_steps; each element is a tensor of
                     shape (batch_size, len(vocab)).
        """
        with tf.variable_scope('Projection'):
            U = tf.get_variable('Matrix', [self.config.hidden_size, len(self.vocab)])
            proj_b = tf.get_variable('Bias', [len(self.vocab)])
            outputs = [tf.matmul(o, U) + proj_b for o in rnn_outputs]
        return outputs

    def add_loss_op(self, output):
        """Adds the loss op to the objective.

        Hint: use sequence_loss from TensorFlow's legacy seq2seq ops to
              implement the sequence loss.

        Args:
            output: a tensor of shape (None, len(self.vocab))
        Returns:
            loss: a 0-d (scalar) tensor
        """
        all_ones = [tf.ones([self.config.batch_size * self.config.num_steps])]
        cross_entropy = sequence_loss([output],
                                      [tf.reshape(self.labels_placeholder, [-1])],
                                      all_ones, len(self.vocab))
        tf.add_to_collection('total_loss', cross_entropy)
        loss = tf.add_n(tf.get_collection('total_loss'))
        return loss

    def add_training_op(self, loss):
        """Adds the training op to the computational graph.

        Creates an optimizer and applies gradient descent to all trainable
        variables.

        Hint: use tf.train.AdamOptimizer for this model;
              optimizer.minimize() returns a train_op object.

        Args:
            loss: loss tensor from the cross-entropy loss
        Returns:
            train_op: the op to run for a training step
        """
        with tf.variable_scope("Optimizer") as scope:
            train_op = tf.train.AdamOptimizer(self.config.lr).minimize(loss)
        return train_op

    def __init__(self, config):
        self.config = config
        self.load_data(debug=False)
        self.add_placeholders()
        self.inputs = self.add_embedding()
        self.rnn_outputs = self.add_model(self.inputs)
        self.outputs = self.add_projection(self.rnn_outputs)

        # We want to check how well we predict the next word.
        # Cast o to float64, otherwise there are numerical issues:
        # sum(output of softmax) = 1.00000298179 rather than 1.
        self.predictions = [tf.nn.softmax(tf.cast(o, 'float64')) for o in self.outputs]
        # Reshape the outputs into rows of length len(vocab)
        output = tf.reshape(tf.concat(self.outputs, 1), [-1, len(self.vocab)])
        self.calculate_loss = self.add_loss_op(output)
        self.train_step = self.add_training_op(self.calculate_loss)

    def add_model(self, inputs):
        """Creates the RNN LM model.

        This is where the RNN LM equations above are implemented.

        Hint: use a zero vector of shape (batch_size, hidden_size) as the
              initial RNN state.
        Hint: store the last RNN output as the instance variable self.final_state.
        Hint: make sure dropout is applied to both the input and output variables.
        Hint: use the variable scope 'RNN' to define the RNN variables.
        Hint: use an explicit for-loop over the inputs; call
              scope.reuse_variables() so the weights are shared across time
              steps, but not on the first iteration, since no variables have
              been created yet.
        Hint: the variables you create have shapes
              H:   (hidden_size, hidden_size)
              I:   (embed_size, hidden_size)
              b_1: (hidden_size,)

        Args:
            inputs: a list of length num_steps; each element is a tensor of
                    shape (batch_size, embed_size).
        Returns:
            outputs: a list of length num_steps; each element is a tensor of
                     shape (batch_size, hidden_size).
        """
        with tf.variable_scope('InputDropout'):
            inputs = [tf.nn.dropout(x, self.dropout_placeholder) for x in inputs]
        with tf.variable_scope('RNN') as scope:
            self.initial_state = tf.zeros([self.config.batch_size, self.config.hidden_size])
            state = self.initial_state
            rnn_outputs = []
            for tstep, current_input in enumerate(inputs):
                if tstep > 0:
                    scope.reuse_variables()
                RNN_H = tf.get_variable('HMatrix', [self.config.hidden_size, self.config.hidden_size])
                RNN_I = tf.get_variable('IMatrix', [self.config.embed_size, self.config.hidden_size])
                RNN_b = tf.get_variable('B', [self.config.hidden_size])
                # h^(t) = sigmoid(h^(t-1) H + x^(t) L I + b_1)
                state = tf.nn.sigmoid(tf.matmul(state, RNN_H) + tf.matmul(current_input, RNN_I) + RNN_b)
                rnn_outputs.append(state)
            self.final_state = rnn_outputs[-1]
        with tf.variable_scope('RNNDropout'):
            rnn_outputs = [tf.nn.dropout(x, self.dropout_placeholder) for x in rnn_outputs]
        return rnn_outputs

    def run_epoch(self, session, data, train_op=None, verbose=10):
        config = self.config
        dp = config.dropout
        if not train_op:
            train_op = tf.no_op()
            dp = 1
        total_steps = sum(1 for x in ptb_iterator(data, config.batch_size, config.num_steps))
        total_loss = []
        state = self.initial_state.eval()
        for step, (x, y) in enumerate(ptb_iterator(data, config.batch_size, config.num_steps)):
            # We need to feed in the initial state and fetch the final state,
            # so the RNN gets the proper history between batches.
            feed = {self.input_placeholder: x,
                    self.labels_placeholder: y,
                    self.initial_state: state,
                    self.dropout_placeholder: dp}
            loss, state, _ = session.run([self.calculate_loss, self.final_state, train_op], feed_dict=feed)
            total_loss.append(loss)
            if verbose and step % verbose == 0:
                sys.stdout.write('\r{} / {} : pp = {} '.format(step, total_steps, np.exp(np.mean(total_loss))))
                sys.stdout.flush()
        if verbose:
            sys.stdout.write('\r')
        return np.exp(np.mean(total_loss))


def generate_text(session, model, config, starting_text='<eos>', stop_length=100, stop_tokens=None, temp=1.0):
    """Generates text from the model.

    Hint: create a feed-dictionary and use sess.run() to execute the model;
          you will need model.initial_state as a key in feed_dict.
    Hint: fetch model.final_state and model.predictions[-1];
          model.final_state is set in add_model() and model.predictions
          is set in __init__.
    Hint: store the returned state and the predicted y_pred values.

    Args:
        session: tf.Session() object
        model: object of type RNNLM_Model
        config: a Config() object
        starting_text: initial text passed to the model
    Returns:
        output: list of word idxs
    """
    state = model.initial_state.eval()
    # Imagine tokens as a batch of size one, length len(tokens[0])
    tokens = [model.vocab.encode(word) for word in starting_text.split()]
    for i in range(stop_length):
        feed = {model.input_placeholder: [tokens[-1:]],
                model.initial_state: state,
                model.dropout_placeholder: 1}
        state, y_pred = session.run([model.final_state, model.predictions[-1]], feed_dict=feed)
        next_word_idx = sample(y_pred[0], temperature=temp)
        tokens.append(next_word_idx)
        if stop_tokens and model.vocab.decode(tokens[-1]) in stop_tokens:
            break
    output = [model.vocab.decode(word_idx) for word_idx in tokens]
    return output


def generate_sentence(session, model, config, *args, **kwargs):
    """Convenience wrapper that generates a single sentence from the model."""
    return generate_text(session, model, config, *args, stop_tokens=['<eos>'], **kwargs)


def test_RNNLM():
    config = Config()
    gen_config = deepcopy(config)
    gen_config.batch_size = gen_config.num_steps = 1

    # Create the training model and the generative model
    with tf.variable_scope('RNNLM', reuse=None) as scope:
        model = RNNLM_Model(config)
        # This instructs gen_model to reuse the same variables as the model above
        scope.reuse_variables()
        gen_model = RNNLM_Model(gen_config)

    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

    with tf.Session() as session:
        best_val_pp = float('inf')
        best_val_epoch = 0
        session.run(init)
        for epoch in range(config.max_epochs):
            print('Epoch {0}'.format(epoch))
            start = time.time()
            train_pp = model.run_epoch(session, model.encoded_train, train_op=model.train_step)
            valid_pp = model.run_epoch(session, model.encoded_valid)
            print('Training perplexity: {0}'.format(train_pp))
            print('Validation perplexity:{0}'.format(valid_pp))
            if valid_pp < best_val_pp:
                best_val_pp = valid_pp
                best_val_epoch = epoch
                saver.save(session, './ptb_rnnlm.weights')
            if epoch - best_val_epoch > config.early_stopping:
                break
            print('Total time : {0}'.format(time.time() - start))
        saver.restore(session, './ptb_rnnlm.weights')
        test_pp = model.run_epoch(session, model.encoded_test)
        print('=-=' * 5)
        print('Test perplexity: {0} '.format(test_pp))
        print('=-=' * 5)
        starting_text = 'in palo alto'
        while starting_text:
            print(' '.join(generate_sentence(session, gen_model, gen_config, starting_text=starting_text, temp=1.0)))
            # prompt for the next seed text; an empty line ends the loop
            starting_text = input('> ')


if __name__ == "__main__":
    test_RNNLM()
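A note on running this: it targets the TensorFlow 1.x API (tf.placeholder, tf.get_variable with variable_scope reuse, tf.contrib.legacy_seq2seq.sequence_loss), so it will not run unmodified on TensorFlow 2.x, and the utils and model modules (Vocab, ptb_iterator, sample, LanguageModel) are assumed to come from the CS224d assignment's starter code.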
(It's not really that cryptic, honestly; it's far simpler than advanced calculus, and hundreds of thousands of times simpler than mathematical analysis.)
Below is the training log:
1380 / 1452 : pp = 266.20892333984375 1390 / 1452 : pp = 265.94439697265625 1400 / 1452 : pp = 265.66845703125 1410 / 1452 : pp = 265.5393981933594 1420 / 1452 : pp = 265.32489013671875 1430 / 1452 : pp = 265.2019348144531 1440 / 1452 : pp = 265.13720703125 1450 / 1452 : pp = 264.954833984375 0 / 115 : pp = 296.9217224121094 10 / 115 : pp = 282.02130126953125 20 / 115 : pp = 279.76800537109375 30 / 115 : pp = 276.4101257324219 40 / 115 : pp = 276.2939147949219 50 / 115 : pp = 270.73565673828125 60 / 115 : pp = 269.88134765625 70 / 115 : pp = 266.8675231933594 80 / 115 : pp = 263.6731872558594 90 / 115 : pp = 260.8569030761719 100 / 115 : pp = 256.3356628417969 110 / 115 : pp = 255.1026611328125 Training perplexity: 264.9092102050781 Validation perplexity:254.84902954101562 Total time : 41.65332388877869 Epoch 3 0 / 1452 : pp = 327.0847473144531 10 / 1452 : pp = 273.9620056152344 20 / 1452 : pp = 270.22943115234375 30 / 1452 : pp = 263.5213317871094 40 / 1452 : pp = 264.0644836425781 50 / 1452 : pp = 258.6029968261719 60 / 1452 : pp = 257.04290771484375 70 / 1452 : pp = 257.59161376953125 80 / 1452 : pp = 256.7600402832031 90 / 1452 : pp = 254.5120391845703 100 / 1452 : pp = 252.44725036621094 110 / 1452 : pp = 250.13954162597656 120 / 1452 : pp = 249.91647338867188 130 / 1452 : pp = 249.50460815429688 140 / 1452 : pp = 247.67440795898438 150 / 1452 : pp = 247.19090270996094 160 / 1452 : pp = 247.8919219970703 170 / 1452 : pp = 247.54322814941406 180 / 1452 : pp = 246.17623901367188 190 / 1452 : pp = 245.78330993652344 200 / 1452 : pp = 246.80552673339844 210 / 1452 : pp = 246.3059844970703 220 / 1452 : pp = 246.19021606445312 230 / 1452 : pp = 246.70140075683594 240 / 1452 : pp = 246.3099822998047 250 / 1452 : pp = 245.1745147705078 260 / 1452 : pp = 244.17384338378906 270 / 1452 : pp = 242.57363891601562 280 / 1452 : pp = 242.8500213623047 290 / 1452 : pp = 243.0492706298828 300 / 1452 : pp = 243.1466522216797 310 / 1452 : pp = 242.89044189453125 320 / 1452 : pp = 243.08045959472656 330 / 1452 : pp = 243.32235717773438 340 / 1452 : pp = 242.34715270996094 350 / 1452 : pp = 242.80972290039062 360 / 1452 : pp = 242.5345458984375 370 / 1452 : pp = 242.0083465576172 380 / 1452 : pp = 241.22708129882812 390 / 1452 : pp = 241.24398803710938 400 / 1452 : pp = 240.63473510742188 410 / 1452 : pp = 240.94094848632812 420 / 1452 : pp = 241.19717407226562 430 / 1452 : pp = 240.8896026611328 440 / 1452 : pp = 240.7772979736328 450 / 1452 : pp = 240.45913696289062 460 / 1452 : pp = 240.06674194335938 470 / 1452 : pp = 239.42198181152344 480 / 1452 : pp = 238.39271545410156 490 / 1452 : pp = 238.0517120361328 500 / 1452 : pp = 237.31752014160156 510 / 1452 : pp = 237.1197967529297 520 / 1452 : pp = 236.64865112304688 530 / 1452 : pp = 236.004638671875 540 / 1452 : pp = 235.192626953125 550 / 1452 : pp = 234.6700439453125 560 / 1452 : pp = 234.1914825439453 570 / 1452 : pp = 233.80899047851562 580 / 1452 : pp = 233.3753662109375 590 / 1452 : pp = 232.8699188232422 600 / 1452 : pp = 232.2629852294922 610 / 1452 : pp = 231.8668212890625 620 / 1452 : pp = 231.478515625 630 / 1452 : pp = 231.0444793701172 640 / 1452 : pp = 231.2737579345703 650 / 1452 : pp = 231.28114318847656 660 / 1452 : pp = 231.4324951171875 670 / 1452 : pp = 231.48513793945312 680 / 1452 : pp = 231.45932006835938 690 / 1452 : pp = 231.17738342285156 700 / 1452 : pp = 231.00570678710938 710 / 1452 : pp = 231.03810119628906 720 / 1452 : pp = 230.96131896972656 730 / 1452 : pp = 230.91110229492188 740 / 1452 : pp = 231.13539123535156 750 / 
1452 : pp = 231.04393005371094 760 / 1452 : pp = 231.03489685058594 770 / 1452 : pp = 231.19744873046875 780 / 1452 : pp = 231.26625061035156 790 / 1452 : pp = 231.38714599609375 800 / 1452 : pp = 231.24441528320312 810 / 1452 : pp = 231.16824340820312 820 / 1452 : pp = 231.11831665039062 830 / 1452 : pp = 231.34886169433594 840 / 1452 : pp = 231.221923828125 850 / 1452 : pp = 231.2562255859375 860 / 1452 : pp = 231.26492309570312 870 / 1452 : pp = 231.1961212158203 880 / 1452 : pp = 231.30506896972656 890 / 1452 : pp = 231.24728393554688 900 / 1452 : pp = 231.15744018554688 910 / 1452 : pp = 231.20175170898438 920 / 1452 : pp = 231.25534057617188 930 / 1452 : pp = 231.09461975097656 940 / 1452 : pp = 231.12612915039062 950 / 1452 : pp = 231.0475616455078 960 / 1452 : pp = 230.86056518554688 970 / 1452 : pp = 230.80377197265625 980 / 1452 : pp = 230.4598846435547 990 / 1452 : pp = 230.24559020996094 1000 / 1452 : pp = 229.91030883789062 1010 / 1452 : pp = 229.9349822998047 1020 / 1452 : pp = 230.01470947265625 1030 / 1452 : pp = 229.8909149169922 1040 / 1452 : pp = 229.9403533935547 1050 / 1452 : pp = 229.84815979003906 1060 / 1452 : pp = 229.60377502441406 1070 / 1452 : pp = 229.74647521972656 1080 / 1452 : pp = 229.80410766601562 1090 / 1452 : pp = 229.78733825683594 1100 / 1452 : pp = 229.64549255371094 1110 / 1452 : pp = 229.26255798339844 1120 / 1452 : pp = 229.00262451171875 1130 / 1452 : pp = 228.6716766357422 1140 / 1452 : pp = 228.55067443847656 1150 / 1452 : pp = 228.61563110351562 1160 / 1452 : pp = 228.50958251953125 1170 / 1452 : pp = 228.3498992919922 1180 / 1452 : pp = 228.29786682128906 1190 / 1452 : pp = 228.33204650878906 1200 / 1452 : pp = 228.27369689941406 1210 / 1452 : pp = 228.11831665039062 1220 / 1452 : pp = 228.21775817871094 1230 / 1452 : pp = 228.3170166015625 1240 / 1452 : pp = 228.22134399414062 1250 / 1452 : pp = 228.3769073486328 1260 / 1452 : pp = 228.37527465820312 1270 / 1452 : pp = 228.33694458007812 1280 / 1452 : pp = 228.27108764648438 1290 / 1452 : pp = 228.1731414794922 1300 / 1452 : pp = 228.12200927734375 1310 / 1452 : pp = 228.10275268554688 1320 / 1452 : pp = 227.9289093017578 1330 / 1452 : pp = 227.77723693847656 1340 / 1452 : pp = 227.79623413085938 1350 / 1452 : pp = 227.7408447265625 1360 / 1452 : pp = 227.72586059570312 1370 / 1452 : pp = 227.49728393554688 1380 / 1452 : pp = 227.37940979003906 1390 / 1452 : pp = 227.20166015625 1400 / 1452 : pp = 227.018310546875 1410 / 1452 : pp = 226.95651245117188 1420 / 1452 : pp = 226.8065643310547 1430 / 1452 : pp = 226.7261199951172 1440 / 1452 : pp = 226.7193145751953 1450 / 1452 : pp = 226.61068725585938 0 / 115 : pp = 269.342041015625 10 / 115 : pp = 255.03016662597656 20 / 115 : pp = 253.8992919921875 30 / 115 : pp = 251.04025268554688 40 / 115 : pp = 250.51756286621094 50 / 115 : pp = 245.3595428466797 60 / 115 : pp = 244.4713897705078 70 / 115 : pp = 241.2674560546875 80 / 115 : pp = 238.3473663330078 90 / 115 : pp = 235.56423950195312 100 / 115 : pp = 231.2281036376953 110 / 115 : pp = 229.8423614501953 Training perplexity: 226.5760040283203 Validation perplexity:229.59939575195312 Total time : 42.202677726745605 Epoch 4 0 / 1452 : pp = 282.2423095703125 10 / 1452 : pp = 240.16258239746094 20 / 1452 : pp = 236.12203979492188 30 / 1452 : pp = 230.3953857421875 40 / 1452 : pp = 231.8789825439453 50 / 1452 : pp = 227.26612854003906 60 / 1452 : pp = 226.22061157226562 70 / 1452 : pp = 227.01885986328125 80 / 1452 : pp = 226.2459716796875 90 / 1452 : pp = 224.3211669921875 100 / 1452 : pp = 
222.65615844726562 110 / 1452 : pp = 220.70326232910156 120 / 1452 : pp = 220.42288208007812 130 / 1452 : pp = 219.8100128173828 140 / 1452 : pp = 218.04432678222656 150 / 1452 : pp = 217.31639099121094 160 / 1452 : pp = 217.86349487304688 170 / 1452 : pp = 217.46597290039062 180 / 1452 : pp = 216.3349151611328 190 / 1452 : pp = 216.12240600585938 200 / 1452 : pp = 216.97842407226562 210 / 1452 : pp = 216.51014709472656 220 / 1452 : pp = 216.46751403808594 230 / 1452 : pp = 216.80126953125 240 / 1452 : pp = 216.45965576171875 250 / 1452 : pp = 215.5008544921875 260 / 1452 : pp = 214.62210083007812 270 / 1452 : pp = 213.29183959960938 280 / 1452 : pp = 213.5621337890625 290 / 1452 : pp = 213.80657958984375 300 / 1452 : pp = 213.8963165283203 310 / 1452 : pp = 213.60653686523438 320 / 1452 : pp = 213.85877990722656 330 / 1452 : pp = 214.07345581054688 340 / 1452 : pp = 213.25421142578125 350 / 1452 : pp = 213.68019104003906 360 / 1452 : pp = 213.41717529296875 370 / 1452 : pp = 213.04920959472656 380 / 1452 : pp = 212.39019775390625 390 / 1452 : pp = 212.4908905029297 400 / 1452 : pp = 212.01914978027344 410 / 1452 : pp = 212.36903381347656 420 / 1452 : pp = 212.6802520751953 430 / 1452 : pp = 212.42697143554688 440 / 1452 : pp = 212.42990112304688 450 / 1452 : pp = 212.14524841308594 460 / 1452 : pp = 211.7836151123047 470 / 1452 : pp = 211.17282104492188 480 / 1452 : pp = 210.27903747558594 490 / 1452 : pp = 209.95211791992188 500 / 1452 : pp = 209.28302001953125 510 / 1452 : pp = 209.1029815673828 520 / 1452 : pp = 208.73855590820312 530 / 1452 : pp = 208.19700622558594 540 / 1452 : pp = 207.4554443359375 550 / 1452 : pp = 207.0062255859375 560 / 1452 : pp = 206.59739685058594 570 / 1452 : pp = 206.27874755859375 580 / 1452 : pp = 205.87144470214844 590 / 1452 : pp = 205.43545532226562 600 / 1452 : pp = 204.90940856933594 610 / 1452 : pp = 204.5686798095703 620 / 1452 : pp = 204.22862243652344 630 / 1452 : pp = 203.8448028564453 640 / 1452 : pp = 204.06576538085938 650 / 1452 : pp = 204.0941925048828 660 / 1452 : pp = 204.22103881835938 670 / 1452 : pp = 204.289794921875 680 / 1452 : pp = 204.3115234375 690 / 1452 : pp = 204.10284423828125 700 / 1452 : pp = 203.99757385253906 710 / 1452 : pp = 204.04971313476562 720 / 1452 : pp = 204.03152465820312 730 / 1452 : pp = 203.99046325683594 740 / 1452 : pp = 204.19786071777344 750 / 1452 : pp = 204.1642608642578 760 / 1452 : pp = 204.19435119628906 770 / 1452 : pp = 204.37786865234375 780 / 1452 : pp = 204.4965057373047 790 / 1452 : pp = 204.6479034423828 800 / 1452 : pp = 204.56117248535156 810 / 1452 : pp = 204.52284240722656 820 / 1452 : pp = 204.50978088378906 830 / 1452 : pp = 204.7531280517578 840 / 1452 : pp = 204.64468383789062 850 / 1452 : pp = 204.71348571777344 860 / 1452 : pp = 204.7399444580078 870 / 1452 : pp = 204.69406127929688 880 / 1452 : pp = 204.7965850830078 890 / 1452 : pp = 204.7594757080078 900 / 1452 : pp = 204.71446228027344 910 / 1452 : pp = 204.7590789794922 920 / 1452 : pp = 204.85772705078125 930 / 1452 : pp = 204.7428741455078 940 / 1452 : pp = 204.8068389892578 950 / 1452 : pp = 204.75791931152344 960 / 1452 : pp = 204.63815307617188 970 / 1452 : pp = 204.60760498046875 980 / 1452 : pp = 204.34347534179688 990 / 1452 : pp = 204.151611328125 1000 / 1452 : pp = 203.8665771484375 1010 / 1452 : pp = 203.9164581298828 1020 / 1452 : pp = 204.0184783935547 1030 / 1452 : pp = 203.95166015625 1040 / 1452 : pp = 204.03045654296875 1050 / 1452 : pp = 203.95846557617188 1060 / 1452 : pp = 203.77114868164062 1070 / 1452 : pp 
= 203.93260192871094 1080 / 1452 : pp = 204.00048828125 1090 / 1452 : pp = 204.00233459472656 1100 / 1452 : pp = 203.8960418701172 1110 / 1452 : pp = 203.5987548828125 1120 / 1452 : pp = 203.38392639160156 1130 / 1452 : pp = 203.08872985839844 1140 / 1452 : pp = 203.01272583007812 1150 / 1452 : pp = 203.0865936279297 1160 / 1452 : pp = 203.02308654785156 1170 / 1452 : pp = 202.9125518798828 1180 / 1452 : pp = 202.9097442626953 1190 / 1452 : pp = 202.98252868652344 1200 / 1452 : pp = 202.95387268066406 1210 / 1452 : pp = 202.851318359375 1220 / 1452 : pp = 202.97671508789062 1230 / 1452 : pp = 203.1051025390625 1240 / 1452 : pp = 203.0526123046875 1250 / 1452 : pp = 203.21417236328125 1260 / 1452 : pp = 203.23617553710938 1270 / 1452 : pp = 203.22802734375 1280 / 1452 : pp = 203.20846557617188 1290 / 1452 : pp = 203.15362548828125 1300 / 1452 : pp = 203.14315795898438 1310 / 1452 : pp = 203.15264892578125 1320 / 1452 : pp = 203.02801513671875 1330 / 1452 : pp = 202.92977905273438 1340 / 1452 : pp = 202.95484924316406 1350 / 1452 : pp = 202.9335479736328 1360 / 1452 : pp = 202.955322265625 1370 / 1452 : pp = 202.7740478515625 1380 / 1452 : pp = 202.68569946289062 1390 / 1452 : pp = 202.55816650390625 1400 / 1452 : pp = 202.41651916503906 1410 / 1452 : pp = 202.38494873046875 1420 / 1452 : pp = 202.27593994140625 1430 / 1452 : pp = 202.21826171875 1440 / 1452 : pp = 202.23272705078125 1450 / 1452 : pp = 202.16099548339844 0 / 115 : pp = 253.23211669921875 10 / 115 : pp = 237.62506103515625 20 / 115 : pp = 237.60557556152344 30 / 115 : pp = 234.9273223876953 40 / 115 : pp = 234.30519104003906 50 / 115 : pp = 229.43960571289062 60 / 115 : pp = 228.6050567626953 70 / 115 : pp = 225.2646484375 80 / 115 : pp = 222.55935668945312 90 / 115 : pp = 219.83255004882812 100 / 115 : pp = 215.5491485595703 110 / 115 : pp = 214.07937622070312 Training perplexity: 202.1349639892578 Validation perplexity:213.85256958007812 Total time : 42.10724234580994 Epoch 5 0 / 1452 : pp = 255.92384338378906 10 / 1452 : pp = 219.5322265625 20 / 1452 : pp = 214.36212158203125 30 / 1452 : pp = 209.12620544433594 40 / 1452 : pp = 210.04193115234375 50 / 1452 : pp = 205.77398681640625 60 / 1452 : pp = 204.8201141357422 70 / 1452 : pp = 205.3955841064453 80 / 1452 : pp = 204.8386688232422 90 / 1452 : pp = 203.21194458007812 100 / 1452 : pp = 201.87643432617188 110 / 1452 : pp = 200.10122680664062 120 / 1452 : pp = 199.82012939453125 130 / 1452 : pp = 199.11192321777344 140 / 1452 : pp = 197.51919555664062 150 / 1452 : pp = 197.03567504882812 160 / 1452 : pp = 197.4231414794922 170 / 1452 : pp = 197.09571838378906 180 / 1452 : pp = 196.17665100097656 190 / 1452 : pp = 196.0064697265625 200 / 1452 : pp = 196.7347869873047 210 / 1452 : pp = 196.3063507080078 220 / 1452 : pp = 196.21388244628906 230 / 1452 : pp = 196.5252227783203 240 / 1452 : pp = 196.203125 250 / 1452 : pp = 195.3251953125 260 / 1452 : pp = 194.53335571289062 270 / 1452 : pp = 193.3546142578125 280 / 1452 : pp = 193.59420776367188 290 / 1452 : pp = 193.83297729492188 300 / 1452 : pp = 193.98489379882812 310 / 1452 : pp = 193.68414306640625 320 / 1452 : pp = 193.89065551757812 330 / 1452 : pp = 194.0518798828125 340 / 1452 : pp = 193.32888793945312 350 / 1452 : pp = 193.76219177246094 360 / 1452 : pp = 193.56106567382812 370 / 1452 : pp = 193.28179931640625 380 / 1452 : pp = 192.7037811279297 390 / 1452 : pp = 192.8145294189453 400 / 1452 : pp = 192.43325805664062 410 / 1452 : pp = 192.81527709960938 420 / 1452 : pp = 193.13760375976562 430 / 1452 : pp = 
192.9148712158203 440 / 1452 : pp = 192.92526245117188 450 / 1452 : pp = 192.70083618164062 460 / 1452 : pp = 192.36647033691406 470 / 1452 : pp = 191.85394287109375 480 / 1452 : pp = 191.07244873046875 490 / 1452 : pp = 190.75401306152344 500 / 1452 : pp = 190.1843719482422 510 / 1452 : pp = 190.03334045410156 520 / 1452 : pp = 189.72938537597656 530 / 1452 : pp = 189.25889587402344 540 / 1452 : pp = 188.59315490722656 550 / 1452 : pp = 188.19313049316406 560 / 1452 : pp = 187.80621337890625 570 / 1452 : pp = 187.5229034423828 580 / 1452 : pp = 187.1091766357422 590 / 1452 : pp = 186.72592163085938 600 / 1452 : pp = 186.2238006591797 610 / 1452 : pp = 185.89695739746094 620 / 1452 : pp = 185.60989379882812 630 / 1452 : pp = 185.2689208984375 640 / 1452 : pp = 185.47567749023438 650 / 1452 : pp = 185.5127410888672 660 / 1452 : pp = 185.64627075195312 670 / 1452 : pp = 185.71311950683594 680 / 1452 : pp = 185.72569274902344 690 / 1452 : pp = 185.56459045410156 700 / 1452 : pp = 185.48681640625 710 / 1452 : pp = 185.5458221435547 720 / 1452 : pp = 185.5598907470703 730 / 1452 : pp = 185.5335235595703 740 / 1452 : pp = 185.73995971679688 750 / 1452 : pp = 185.744384765625 760 / 1452 : pp = 185.81268310546875 770 / 1452 : pp = 186.00088500976562 780 / 1452 : pp = 186.14443969726562 790 / 1452 : pp = 186.30764770507812 800 / 1452 : pp = 186.2595977783203 810 / 1452 : pp = 186.23028564453125 820 / 1452 : pp = 186.23997497558594 830 / 1452 : pp = 186.49057006835938 840 / 1452 : pp = 186.43331909179688 850 / 1452 : pp = 186.48887634277344 860 / 1452 : pp = 186.51502990722656 870 / 1452 : pp = 186.5167999267578 880 / 1452 : pp = 186.62400817871094 890 / 1452 : pp = 186.6103973388672 900 / 1452 : pp = 186.58111572265625 910 / 1452 : pp = 186.64126586914062 920 / 1452 : pp = 186.7366180419922 930 / 1452 : pp = 186.65719604492188 940 / 1452 : pp = 186.71755981445312 950 / 1452 : pp = 186.6977996826172 960 / 1452 : pp = 186.62774658203125 970 / 1452 : pp = 186.62115478515625 980 / 1452 : pp = 186.3773193359375 990 / 1452 : pp = 186.23109436035156 1000 / 1452 : pp = 185.99227905273438 1010 / 1452 : pp = 186.0488739013672 1020 / 1452 : pp = 186.1744384765625 1030 / 1452 : pp = 186.1162109375 1040 / 1452 : pp = 186.18899536132812 1050 / 1452 : pp = 186.1549072265625 1060 / 1452 : pp = 186.01419067382812 1070 / 1452 : pp = 186.17364501953125 1080 / 1452 : pp = 186.27061462402344 1090 / 1452 : pp = 186.28428649902344 1100 / 1452 : pp = 186.2150115966797 1110 / 1452 : pp = 185.95103454589844 1120 / 1452 : pp = 185.77423095703125 1130 / 1452 : pp = 185.5232696533203 1140 / 1452 : pp = 185.4607391357422 1150 / 1452 : pp = 185.56077575683594 1160 / 1452 : pp = 185.53343200683594 1170 / 1452 : pp = 185.46453857421875 1180 / 1452 : pp = 185.4741668701172 1190 / 1452 : pp = 185.5594482421875 1200 / 1452 : pp = 185.53785705566406 1210 / 1452 : pp = 185.4576416015625 1220 / 1452 : pp = 185.5943145751953 1230 / 1452 : pp = 185.7483673095703 1240 / 1452 : pp = 185.70762634277344 1250 / 1452 : pp = 185.8568115234375 1260 / 1452 : pp = 185.90635681152344 1270 / 1452 : pp = 185.8961639404297 1280 / 1452 : pp = 185.89199829101562 1290 / 1452 : pp = 185.85911560058594 1300 / 1452 : pp = 185.86097717285156 1310 / 1452 : pp = 185.88739013671875 1320 / 1452 : pp = 185.79248046875 1330 / 1452 : pp = 185.69700622558594 1340 / 1452 : pp = 185.7310028076172 1350 / 1452 : pp = 185.72613525390625 1360 / 1452 : pp = 185.76829528808594 1370 / 1452 : pp = 185.6322021484375 1380 / 1452 : pp = 185.56378173828125 1390 / 1452 : pp = 
185.4654998779297 1400 / 1452 : pp = 185.35110473632812 1410 / 1452 : pp = 185.33917236328125 1420 / 1452 : pp = 185.2509002685547 1430 / 1452 : pp = 185.20436096191406 1440 / 1452 : pp = 185.2254638671875 1450 / 1452 : pp = 185.16542053222656 0 / 115 : pp = 242.26800537109375 10 / 115 : pp = 226.12258911132812 20 / 115 : pp = 226.4702606201172 30 / 115 : pp = 223.982666015625 40 / 115 : pp = 223.376953125 50 / 115 : pp = 218.65716552734375 60 / 115 : pp = 217.95306396484375 70 / 115 : pp = 214.5392303466797 80 / 115 : pp = 212.07525634765625 90 / 115 : pp = 209.40631103515625 100 / 115 : pp = 205.1455078125 110 / 115 : pp = 203.6289520263672 Training perplexity: 185.14476013183594 Validation perplexity:203.3822784423828 Total time : 42.47052240371704 Epoch 6 0 / 1452 : pp = 233.56707763671875 10 / 1452 : pp = 202.6468505859375 20 / 1452 : pp = 198.2734375 30 / 1452 : pp = 193.47442626953125 40 / 1452 : pp = 195.17147827148438 50 / 1452 : pp = 191.5596923828125 60 / 1452 : pp = 190.4825897216797 70 / 1452 : pp = 191.07681274414062 80 / 1452 : pp = 190.339599609375 90 / 1452 : pp = 188.98277282714844 100 / 1452 : pp = 187.74757385253906 110 / 1452 : pp = 186.10104370117188 120 / 1452 : pp = 185.7500457763672 130 / 1452 : pp = 184.90707397460938 140 / 1452 : pp = 183.340087890625 150 / 1452 : pp = 182.70840454101562 160 / 1452 : pp = 183.1043701171875 170 / 1452 : pp = 182.69776916503906 180 / 1452 : pp = 181.88400268554688 190 / 1452 : pp = 181.8062286376953 200 / 1452 : pp = 182.4969940185547 210 / 1452 : pp = 182.10572814941406 220 / 1452 : pp = 181.9981689453125 230 / 1452 : pp = 182.3802490234375 240 / 1452 : pp = 182.03636169433594 250 / 1452 : pp = 181.23712158203125 260 / 1452 : pp = 180.53726196289062 270 / 1452 : pp = 179.53567504882812 280 / 1452 : pp = 179.70208740234375 290 / 1452 : pp = 179.977783203125 300 / 1452 : pp = 180.16600036621094 310 / 1452 : pp = 179.87294006347656 320 / 1452 : pp = 180.11849975585938 330 / 1452 : pp = 180.31838989257812 340 / 1452 : pp = 179.56759643554688 350 / 1452 : pp = 179.97134399414062 360 / 1452 : pp = 179.80030822753906 370 / 1452 : pp = 179.52085876464844 380 / 1452 : pp = 178.98228454589844 390 / 1452 : pp = 179.0868682861328 400 / 1452 : pp = 178.74569702148438 410 / 1452 : pp = 179.1776580810547 420 / 1452 : pp = 179.5055389404297 430 / 1452 : pp = 179.3883056640625 440 / 1452 : pp = 179.42279052734375 450 / 1452 : pp = 179.2106475830078 460 / 1452 : pp = 178.85311889648438 470 / 1452 : pp = 178.33840942382812 480 / 1452 : pp = 177.60350036621094 490 / 1452 : pp = 177.30335998535156 500 / 1452 : pp = 176.72222900390625 510 / 1452 : pp = 176.6067352294922 520 / 1452 : pp = 176.33998107910156 530 / 1452 : pp = 175.93162536621094 540 / 1452 : pp = 175.30657958984375 550 / 1452 : pp = 174.9462432861328 560 / 1452 : pp = 174.5836639404297 570 / 1452 : pp = 174.31431579589844 580 / 1452 : pp = 173.92300415039062 590 / 1452 : pp = 173.55856323242188 600 / 1452 : pp = 173.08277893066406 610 / 1452 : pp = 172.75930786132812 620 / 1452 : pp = 172.53192138671875 630 / 1452 : pp = 172.20652770996094 640 / 1452 : pp = 172.37454223632812 650 / 1452 : pp = 172.39845275878906 660 / 1452 : pp = 172.52255249023438 670 / 1452 : pp = 172.60935974121094 680 / 1452 : pp = 172.6611328125 690 / 1452 : pp = 172.53118896484375 700 / 1452 : pp = 172.4709014892578 710 / 1452 : pp = 172.5406494140625 720 / 1452 : pp = 172.55447387695312 730 / 1452 : pp = 172.5330047607422 740 / 1452 : pp = 172.7061767578125 750 / 1452 : pp = 172.71054077148438 760 / 1452 : pp = 
172.77743530273438 770 / 1452 : pp = 172.95481872558594 780 / 1452 : pp = 173.11265563964844 790 / 1452 : pp = 173.2832794189453 800 / 1452 : pp = 173.2537841796875 810 / 1452 : pp = 173.22164916992188 820 / 1452 : pp = 173.24148559570312 830 / 1452 : pp = 173.48228454589844 840 / 1452 : pp = 173.43753051757812 850 / 1452 : pp = 173.505615234375 860 / 1452 : pp = 173.5214080810547 870 / 1452 : pp = 173.5009002685547 880 / 1452 : pp = 173.6202392578125 890 / 1452 : pp = 173.622802734375 900 / 1452 : pp = 173.5987091064453 910 / 1452 : pp = 173.68316650390625 920 / 1452 : pp = 173.77330017089844 930 / 1452 : pp = 173.72018432617188 940 / 1452 : pp = 173.79351806640625 950 / 1452 : pp = 173.7653350830078 960 / 1452 : pp = 173.7102508544922 970 / 1452 : pp = 173.69766235351562 980 / 1452 : pp = 173.4836883544922 990 / 1452 : pp = 173.3550262451172 1000 / 1452 : pp = 173.14816284179688 1010 / 1452 : pp = 173.20777893066406 1020 / 1452 : pp = 173.3390655517578 1030 / 1452 : pp = 173.2884063720703 1040 / 1452 : pp = 173.38015747070312 1050 / 1452 : pp = 173.35592651367188 1060 / 1452 : pp = 173.2260284423828 1070 / 1452 : pp = 173.39321899414062 1080 / 1452 : pp = 173.4879913330078 1090 / 1452 : pp = 173.5231475830078 1100 / 1452 : pp = 173.47177124023438 1110 / 1452 : pp = 173.24453735351562 1120 / 1452 : pp = 173.09408569335938 1130 / 1452 : pp = 172.86627197265625 1140 / 1452 : pp = 172.8234100341797 1150 / 1452 : pp = 172.92843627929688 1160 / 1452 : pp = 172.90065002441406 1170 / 1452 : pp = 172.8550567626953 1180 / 1452 : pp = 172.8810272216797 1190 / 1452 : pp = 172.97312927246094 1200 / 1452 : pp = 172.9776611328125 1210 / 1452 : pp = 172.89413452148438 1220 / 1452 : pp = 173.0257568359375 1230 / 1452 : pp = 173.1847381591797 1240 / 1452 : pp = 173.1756591796875 1250 / 1452 : pp = 173.32138061523438 1260 / 1452 : pp = 173.37229919433594 1270 / 1452 : pp = 173.36891174316406 1280 / 1452 : pp = 173.36337280273438 1290 / 1452 : pp = 173.3444366455078 1300 / 1452 : pp = 173.36138916015625 1310 / 1452 : pp = 173.4015655517578 1320 / 1452 : pp = 173.31790161132812 1330 / 1452 : pp = 173.24710083007812 1340 / 1452 : pp = 173.27212524414062 1350 / 1452 : pp = 173.27674865722656 1360 / 1452 : pp = 173.32749938964844 1370 / 1452 : pp = 173.20472717285156 1380 / 1452 : pp = 173.14889526367188 1390 / 1452 : pp = 173.0755157470703 1400 / 1452 : pp = 172.9678497314453 1410 / 1452 : pp = 172.9612579345703 1420 / 1452 : pp = 172.8872833251953 1430 / 1452 : pp = 172.84805297851562 1440 / 1452 : pp = 172.87252807617188 1450 / 1452 : pp = 172.82505798339844 0 / 115 : pp = 236.35635375976562 10 / 115 : pp = 219.06166076660156 20 / 115 : pp = 219.7670440673828 30 / 115 : pp = 217.33587646484375 40 / 115 : pp = 216.6626739501953 50 / 115 : pp = 212.04734802246094 60 / 115 : pp = 211.42068481445312 70 / 115 : pp = 207.9592742919922 80 / 115 : pp = 205.6216583251953 90 / 115 : pp = 202.93597412109375 100 / 115 : pp = 198.62583923339844 110 / 115 : pp = 196.97216796875 Training perplexity: 172.80404663085938 Validation perplexity:196.6871337890625 Total time : 41.52522921562195 Epoch 7 0 / 1452 : pp = 219.23231506347656 10 / 1452 : pp = 192.07225036621094 20 / 1452 : pp = 187.48464965820312 30 / 1452 : pp = 182.9149932861328 40 / 1452 : pp = 184.2945098876953 50 / 1452 : pp = 180.78492736816406 60 / 1452 : pp = 179.377197265625 70 / 1452 : pp = 180.0273895263672 80 / 1452 : pp = 179.2517547607422 90 / 1452 : pp = 177.77540588378906 100 / 1452 : pp = 176.6474151611328 110 / 1452 : pp = 174.84066772460938 120 / 
1452 : pp = 174.46890258789062 130 / 1452 : pp = 173.64573669433594 140 / 1452 : pp = 172.17483520507812 150 / 1452 : pp = 171.57041931152344 160 / 1452 : pp = 171.92059326171875 170 / 1452 : pp = 171.5497283935547 180 / 1452 : pp = 170.77249145507812 190 / 1452 : pp = 170.72103881835938 200 / 1452 : pp = 171.336181640625 210 / 1452 : pp = 170.98524475097656 220 / 1452 : pp = 170.99771118164062 230 / 1452 : pp = 171.39918518066406 240 / 1452 : pp = 171.09925842285156 250 / 1452 : pp = 170.39962768554688 260 / 1452 : pp = 169.7328643798828 270 / 1452 : pp = 168.72225952148438 280 / 1452 : pp = 168.92552185058594 290 / 1452 : pp = 169.20147705078125 300 / 1452 : pp = 169.40338134765625 310 / 1452 : pp = 169.12057495117188 320 / 1452 : pp = 169.31236267089844 330 / 1452 : pp = 169.49945068359375 340 / 1452 : pp = 168.8396759033203 350 / 1452 : pp = 169.25917053222656 360 / 1452 : pp = 169.09388732910156 370 / 1452 : pp = 168.84323120117188 380 / 1452 : pp = 168.3832550048828 390 / 1452 : pp = 168.48275756835938 400 / 1452 : pp = 168.19972229003906 410 / 1452 : pp = 168.5838623046875 420 / 1452 : pp = 168.91119384765625 430 / 1452 : pp = 168.80836486816406 440 / 1452 : pp = 168.90264892578125 450 / 1452 : pp = 168.68589782714844 460 / 1452 : pp = 168.3704071044922 470 / 1452 : pp = 167.90394592285156 480 / 1452 : pp = 167.23373413085938 490 / 1452 : pp = 166.9560546875 500 / 1452 : pp = 166.43161010742188 510 / 1452 : pp = 166.320068359375 520 / 1452 : pp = 166.05902099609375 530 / 1452 : pp = 165.71714782714844 540 / 1452 : pp = 165.10398864746094 550 / 1452 : pp = 164.80430603027344 560 / 1452 : pp = 164.4687042236328 570 / 1452 : pp = 164.2272491455078 580 / 1452 : pp = 163.84312438964844 590 / 1452 : pp = 163.46035766601562 600 / 1452 : pp = 163.01559448242188 610 / 1452 : pp = 162.74134826660156 620 / 1452 : pp = 162.50267028808594 630 / 1452 : pp = 162.2018280029297 640 / 1452 : pp = 162.37130737304688 650 / 1452 : pp = 162.3895721435547 660 / 1452 : pp = 162.51351928710938 670 / 1452 : pp = 162.57684326171875 680 / 1452 : pp = 162.6346893310547 690 / 1452 : pp = 162.5135955810547 700 / 1452 : pp = 162.47052001953125 710 / 1452 : pp = 162.539794921875 720 / 1452 : pp = 162.55381774902344 730 / 1452 : pp = 162.5297088623047 740 / 1452 : pp = 162.71652221679688 750 / 1452 : pp = 162.740966796875 760 / 1452 : pp = 162.79754638671875 770 / 1452 : pp = 162.9949951171875 780 / 1452 : pp = 163.17868041992188 790 / 1452 : pp = 163.33055114746094 800 / 1452 : pp = 163.31591796875 810 / 1452 : pp = 163.2859344482422 820 / 1452 : pp = 163.2958984375 830 / 1452 : pp = 163.528564453125 840 / 1452 : pp = 163.47610473632812 850 / 1452 : pp = 163.5260772705078 860 / 1452 : pp = 163.55352783203125 870 / 1452 : pp = 163.55718994140625 880 / 1452 : pp = 163.67523193359375 890 / 1452 : pp = 163.6920166015625 900 / 1452 : pp = 163.67710876464844 910 / 1452 : pp = 163.7476806640625 920 / 1452 : pp = 163.84803771972656 930 / 1452 : pp = 163.8114013671875 940 / 1452 : pp = 163.86663818359375 950 / 1452 : pp = 163.83531188964844 960 / 1452 : pp = 163.79945373535156 970 / 1452 : pp = 163.80320739746094 980 / 1452 : pp = 163.5953369140625 990 / 1452 : pp = 163.48382568359375 1000 / 1452 : pp = 163.2642822265625 1010 / 1452 : pp = 163.32113647460938 1020 / 1452 : pp = 163.44204711914062 1030 / 1452 : pp = 163.40206909179688 1040 / 1452 : pp = 163.4915313720703 1050 / 1452 : pp = 163.47096252441406 1060 / 1452 : pp = 163.3601531982422 1070 / 1452 : pp = 163.5138397216797 1080 / 1452 : pp = 163.6189727783203 1090 / 
1452 : pp = 163.6471405029297 1100 / 1452 : pp = 163.60406494140625 1110 / 1452 : pp = 163.40736389160156 1120 / 1452 : pp = 163.26841735839844 1130 / 1452 : pp = 163.0680694580078 1140 / 1452 : pp = 163.04591369628906 1150 / 1452 : pp = 163.15478515625 1160 / 1452 : pp = 163.1380615234375 1170 / 1452 : pp = 163.09303283691406 1180 / 1452 : pp = 163.14149475097656 1190 / 1452 : pp = 163.2374267578125 1200 / 1452 : pp = 163.2394561767578 1210 / 1452 : pp = 163.17835998535156 1220 / 1452 : pp = 163.32347106933594 1230 / 1452 : pp = 163.4639434814453 1240 / 1452 : pp = 163.4611358642578 1250 / 1452 : pp = 163.60687255859375 1260 / 1452 : pp = 163.67227172851562 1270 / 1452 : pp = 163.67515563964844 1280 / 1452 : pp = 163.6881103515625 1290 / 1452 : pp = 163.66648864746094 1300 / 1452 : pp = 163.69287109375 1310 / 1452 : pp = 163.7276153564453 1320 / 1452 : pp = 163.6551055908203 1330 / 1452 : pp = 163.58901977539062 1340 / 1452 : pp = 163.6205291748047 1350 / 1452 : pp = 163.63824462890625 1360 / 1452 : pp = 163.69334411621094 1370 / 1452 : pp = 163.5885467529297 1380 / 1452 : pp = 163.54049682617188 1390 / 1452 : pp = 163.4760284423828 1400 / 1452 : pp = 163.38897705078125 1410 / 1452 : pp = 163.3974609375 1420 / 1452 : pp = 163.35009765625 1430 / 1452 : pp = 163.32191467285156 1440 / 1452 : pp = 163.35220336914062 1450 / 1452 : pp = 163.3201904296875 0 / 115 : pp = 232.2108154296875 10 / 115 : pp = 214.35496520996094 20 / 115 : pp = 215.20510864257812 30 / 115 : pp = 212.82754516601562 40 / 115 : pp = 212.0598907470703 50 / 115 : pp = 207.5095672607422 60 / 115 : pp = 206.86976623535156 70 / 115 : pp = 203.36016845703125 80 / 115 : pp = 201.11538696289062 90 / 115 : pp = 198.52120971679688 100 / 115 : pp = 194.1772003173828 110 / 115 : pp = 192.41224670410156 Training perplexity: 163.29916381835938 Validation perplexity:192.09552001953125 Total time : 41.78096055984497 Epoch 8 0 / 1452 : pp = 201.77548217773438 10 / 1452 : pp = 180.4141082763672 20 / 1452 : pp = 176.41432189941406 30 / 1452 : pp = 172.7764434814453 40 / 1452 : pp = 174.69166564941406 50 / 1452 : pp = 171.2933807373047 60 / 1452 : pp = 170.08010864257812 70 / 1452 : pp = 170.6719512939453 80 / 1452 : pp = 170.07589721679688 90 / 1452 : pp = 168.7478485107422 100 / 1452 : pp = 167.57081604003906 110 / 1452 : pp = 166.06971740722656 120 / 1452 : pp = 165.73374938964844 130 / 1452 : pp = 164.80674743652344 140 / 1452 : pp = 163.32821655273438 150 / 1452 : pp = 162.6752471923828 160 / 1452 : pp = 163.02049255371094 170 / 1452 : pp = 162.64120483398438 180 / 1452 : pp = 161.95529174804688 190 / 1452 : pp = 161.91954040527344 200 / 1452 : pp = 162.5446014404297 210 / 1452 : pp = 162.2645721435547 220 / 1452 : pp = 162.3128662109375 230 / 1452 : pp = 162.65872192382812 240 / 1452 : pp = 162.40948486328125 250 / 1452 : pp = 161.75787353515625 260 / 1452 : pp = 161.15213012695312 270 / 1452 : pp = 160.22256469726562 280 / 1452 : pp = 160.3651123046875 290 / 1452 : pp = 160.63780212402344 300 / 1452 : pp = 160.80026245117188 310 / 1452 : pp = 160.54383850097656 320 / 1452 : pp = 160.7539520263672 330 / 1452 : pp = 160.94317626953125 340 / 1452 : pp = 160.3373565673828 350 / 1452 : pp = 160.71763610839844 360 / 1452 : pp = 160.60960388183594 370 / 1452 : pp = 160.37527465820312 380 / 1452 : pp = 159.92990112304688 390 / 1452 : pp = 160.0165557861328 400 / 1452 : pp = 159.75697326660156 410 / 1452 : pp = 160.15274047851562 420 / 1452 : pp = 160.48390197753906 430 / 1452 : pp = 160.4031982421875 440 / 1452 : pp = 160.4693603515625 450 / 
1452 : pp = 160.28016662597656 460 / 1452 : pp = 159.94004821777344 470 / 1452 : pp = 159.48257446289062 480 / 1452 : pp = 158.87998962402344 490 / 1452 : pp = 158.59765625 500 / 1452 : pp = 158.10865783691406 510 / 1452 : pp = 157.96795654296875 520 / 1452 : pp = 157.7591552734375 530 / 1452 : pp = 157.42648315429688 540 / 1452 : pp = 156.85348510742188 550 / 1452 : pp = 156.5618438720703 560 / 1452 : pp = 156.24905395507812 570 / 1452 : pp = 155.9994354248047 580 / 1452 : pp = 155.612060546875 590 / 1452 : pp = 155.25830078125 600 / 1452 : pp = 154.8464813232422 610 / 1452 : pp = 154.5833282470703 620 / 1452 : pp = 154.38040161132812 630 / 1452 : pp = 154.0767364501953 640 / 1452 : pp = 154.2534637451172 650 / 1452 : pp = 154.25875854492188 660 / 1452 : pp = 154.35874938964844 670 / 1452 : pp = 154.4289093017578 680 / 1452 : pp = 154.51412963867188 690 / 1452 : pp = 154.41676330566406 700 / 1452 : pp = 154.37892150878906 710 / 1452 : pp = 154.4234619140625 720 / 1452 : pp = 154.4586639404297 730 / 1452 : pp = 154.4351806640625 740 / 1452 : pp = 154.6002197265625 750 / 1452 : pp = 154.65684509277344 760 / 1452 : pp = 154.73318481445312 770 / 1452 : pp = 154.92935180664062 780 / 1452 : pp = 155.1021728515625 790 / 1452 : pp = 155.24757385253906 800 / 1452 : pp = 155.223876953125 810 / 1452 : pp = 155.2095184326172 820 / 1452 : pp = 155.24009704589844 830 / 1452 : pp = 155.4519500732422 840 / 1452 : pp = 155.3947296142578 850 / 1452 : pp = 155.45306396484375 860 / 1452 : pp = 155.4661102294922 870 / 1452 : pp = 155.45765686035156 880 / 1452 : pp = 155.58758544921875 890 / 1452 : pp = 155.59373474121094 900 / 1452 : pp = 155.59254455566406 910 / 1452 : pp = 155.66854858398438 920 / 1452 : pp = 155.75942993164062 930 / 1452 : pp = 155.73350524902344 940 / 1452 : pp = 155.80740356445312 950 / 1452 : pp = 155.7733917236328 960 / 1452 : pp = 155.73565673828125 970 / 1452 : pp = 155.74404907226562 980 / 1452 : pp = 155.55902099609375 990 / 1452 : pp = 155.45675659179688 1000 / 1452 : pp = 155.2649688720703 1010 / 1452 : pp = 155.31332397460938 1020 / 1452 : pp = 155.44979858398438 1030 / 1452 : pp = 155.4137725830078 1040 / 1452 : pp = 155.49012756347656 1050 / 1452 : pp = 155.46054077148438 1060 / 1452 : pp = 155.3616943359375 1070 / 1452 : pp = 155.5286865234375 1080 / 1452 : pp = 155.63743591308594 1090 / 1452 : pp = 155.6842803955078 1100 / 1452 : pp = 155.65599060058594 1110 / 1452 : pp = 155.4827880859375 1120 / 1452 : pp = 155.35450744628906 1130 / 1452 : pp = 155.1777801513672 1140 / 1452 : pp = 155.15994262695312 1150 / 1452 : pp = 155.26193237304688 1160 / 1452 : pp = 155.26214599609375 1170 / 1452 : pp = 155.23231506347656 1180 / 1452 : pp = 155.29266357421875 1190 / 1452 : pp = 155.37680053710938 1200 / 1452 : pp = 155.3736114501953 1210 / 1452 : pp = 155.3380584716797 1220 / 1452 : pp = 155.474853515625 1230 / 1452 : pp = 155.62986755371094 1240 / 1452 : pp = 155.62831115722656 1250 / 1452 : pp = 155.77101135253906 1260 / 1452 : pp = 155.83445739746094 1270 / 1452 : pp = 155.845458984375 1280 / 1452 : pp = 155.8556365966797 1290 / 1452 : pp = 155.8556365966797 1300 / 1452 : pp = 155.8843994140625 1310 / 1452 : pp = 155.92417907714844 1320 / 1452 : pp = 155.8560791015625 1330 / 1452 : pp = 155.80636596679688 1340 / 1452 : pp = 155.84344482421875 1350 / 1452 : pp = 155.8706512451172 1360 / 1452 : pp = 155.9273681640625 1370 / 1452 : pp = 155.83140563964844 1380 / 1452 : pp = 155.7911376953125 1390 / 1452 : pp = 155.7401885986328 1400 / 1452 : pp = 155.6622314453125 1410 / 1452 : pp = 
155.68531799316406 1420 / 1452 : pp = 155.64041137695312 1430 / 1452 : pp = 155.62216186523438 1440 / 1452 : pp = 155.6437530517578 1450 / 1452 : pp = 155.62757873535156 0 / 115 : pp = 228.70111083984375 10 / 115 : pp = 211.03330993652344 20 / 115 : pp = 212.24957275390625 30 / 115 : pp = 209.8839569091797 40 / 115 : pp = 209.11045837402344 50 / 115 : pp = 204.66351318359375 60 / 115 : pp = 204.03366088867188 70 / 115 : pp = 200.46681213378906 80 / 115 : pp = 198.24404907226562 90 / 115 : pp = 195.63223266601562 100 / 115 : pp = 191.18345642089844 110 / 115 : pp = 189.31134033203125 Training perplexity: 155.61154174804688 Validation perplexity:188.94537353515625 Total time : 42.13483738899231 Epoch 9 0 / 1452 : pp = 197.80628967285156 10 / 1452 : pp = 172.6316680908203 20 / 1452 : pp = 168.6739959716797 30 / 1452 : pp = 164.4781036376953 40 / 1452 : pp = 166.1627960205078 50 / 1452 : pp = 163.05197143554688 60 / 1452 : pp = 161.87924194335938 70 / 1452 : pp = 162.5297088623047 80 / 1452 : pp = 161.7450714111328 90 / 1452 : pp = 160.6148223876953 100 / 1452 : pp = 159.73289489746094 110 / 1452 : pp = 158.4092254638672 120 / 1452 : pp = 158.04653930664062 130 / 1452 : pp = 157.13563537597656 140 / 1452 : pp = 155.71798706054688 150 / 1452 : pp = 155.19161987304688 160 / 1452 : pp = 155.42718505859375 170 / 1452 : pp = 155.0531463623047 180 / 1452 : pp = 154.46897888183594 190 / 1452 : pp = 154.4127197265625 200 / 1452 : pp = 154.97154235839844 210 / 1452 : pp = 154.70169067382812 220 / 1452 : pp = 154.72816467285156 230 / 1452 : pp = 155.03799438476562 240 / 1452 : pp = 154.85601806640625 250 / 1452 : pp = 154.28016662597656 260 / 1452 : pp = 153.7699432373047 270 / 1452 : pp = 152.90948486328125 280 / 1452 : pp = 153.0459747314453 290 / 1452 : pp = 153.298095703125 300 / 1452 : pp = 153.45716857910156 310 / 1452 : pp = 153.22195434570312 320 / 1452 : pp = 153.41664123535156 330 / 1452 : pp = 153.66542053222656 340 / 1452 : pp = 153.06378173828125 350 / 1452 : pp = 153.43923950195312 360 / 1452 : pp = 153.31381225585938 370 / 1452 : pp = 153.13473510742188 380 / 1452 : pp = 152.75267028808594 390 / 1452 : pp = 152.85504150390625 400 / 1452 : pp = 152.62342834472656 410 / 1452 : pp = 153.03152465820312 420 / 1452 : pp = 153.39161682128906 430 / 1452 : pp = 153.30364990234375 440 / 1452 : pp = 153.37896728515625 450 / 1452 : pp = 153.18988037109375 460 / 1452 : pp = 152.88478088378906 470 / 1452 : pp = 152.4380340576172 480 / 1452 : pp = 151.86618041992188 490 / 1452 : pp = 151.5962371826172 500 / 1452 : pp = 151.11614990234375 510 / 1452 : pp = 150.99830627441406 520 / 1452 : pp = 150.8135986328125 530 / 1452 : pp = 150.500732421875 540 / 1452 : pp = 149.9623260498047 550 / 1452 : pp = 149.68028259277344 560 / 1452 : pp = 149.3885040283203 570 / 1452 : pp = 149.140380859375 580 / 1452 : pp = 148.76876831054688 590 / 1452 : pp = 148.43368530273438 600 / 1452 : pp = 148.02598571777344 610 / 1452 : pp = 147.7869110107422 620 / 1452 : pp = 147.59796142578125 630 / 1452 : pp = 147.30068969726562 640 / 1452 : pp = 147.45240783691406 650 / 1452 : pp = 147.4651336669922 660 / 1452 : pp = 147.5808563232422 670 / 1452 : pp = 147.65582275390625 680 / 1452 : pp = 147.7360382080078 690 / 1452 : pp = 147.63075256347656 700 / 1452 : pp = 147.6066131591797 710 / 1452 : pp = 147.7024383544922 720 / 1452 : pp = 147.7445526123047 730 / 1452 : pp = 147.72279357910156 740 / 1452 : pp = 147.87107849121094 750 / 1452 : pp = 147.91436767578125 760 / 1452 : pp = 147.9857635498047 770 / 1452 : pp = 148.18206787109375 
780 / 1452 : pp = 148.3845672607422 790 / 1452 : pp = 148.5517120361328 800 / 1452 : pp = 148.54002380371094 810 / 1452 : pp = 148.51119995117188 820 / 1452 : pp = 148.5664520263672 830 / 1452 : pp = 148.7821044921875 840 / 1452 : pp = 148.72486877441406 850 / 1452 : pp = 148.77452087402344 860 / 1452 : pp = 148.80076599121094 870 / 1452 : pp = 148.79701232910156 880 / 1452 : pp = 148.9181671142578 890 / 1452 : pp = 148.94537353515625 900 / 1452 : pp = 148.9435272216797 910 / 1452 : pp = 149.02102661132812 920 / 1452 : pp = 149.1085968017578 930 / 1452 : pp = 149.06893920898438 940 / 1452 : pp = 149.1317138671875 950 / 1452 : pp = 149.1232452392578 960 / 1452 : pp = 149.10354614257812 970 / 1452 : pp = 149.11656188964844 980 / 1452 : pp = 148.94259643554688 990 / 1452 : pp = 148.8236846923828 1000 / 1452 : pp = 148.633056640625 1010 / 1452 : pp = 148.6830291748047 1020 / 1452 : pp = 148.8126220703125 1030 / 1452 : pp = 148.78089904785156 1040 / 1452 : pp = 148.8600311279297 1050 / 1452 : pp = 148.8486785888672 1060 / 1452 : pp = 148.7664337158203 1070 / 1452 : pp = 148.9337921142578 1080 / 1452 : pp = 149.04441833496094 1090 / 1452 : pp = 149.07284545898438 1100 / 1452 : pp = 149.03318786621094 1110 / 1452 : pp = 148.86428833007812 1120 / 1452 : pp = 148.7332305908203 1130 / 1452 : pp = 148.5670166015625 1140 / 1452 : pp = 148.54661560058594 1150 / 1452 : pp = 148.64219665527344 1160 / 1452 : pp = 148.6490020751953 1170 / 1452 : pp = 148.62420654296875 1180 / 1452 : pp = 148.67665100097656 1190 / 1452 : pp = 148.7633056640625 1200 / 1452 : pp = 148.7782745361328 1210 / 1452 : pp = 148.72500610351562 1220 / 1452 : pp = 148.87493896484375 1230 / 1452 : pp = 149.039794921875 1240 / 1452 : pp = 149.04000854492188 1250 / 1452 : pp = 149.17054748535156 1260 / 1452 : pp = 149.23863220214844 1270 / 1452 : pp = 149.2436065673828 1280 / 1452 : pp = 149.25086975097656 1290 / 1452 : pp = 149.24147033691406 1300 / 1452 : pp = 149.27413940429688 1310 / 1452 : pp = 149.32077026367188 1320 / 1452 : pp = 149.27301025390625 1330 / 1452 : pp = 149.23080444335938 1340 / 1452 : pp = 149.25791931152344 1350 / 1452 : pp = 149.2841033935547 1360 / 1452 : pp = 149.337158203125 1370 / 1452 : pp = 149.2467498779297 1380 / 1452 : pp = 149.21351623535156 1390 / 1452 : pp = 149.15403747558594 1400 / 1452 : pp = 149.0877685546875 1410 / 1452 : pp = 149.110595703125 1420 / 1452 : pp = 149.07241821289062 1430 / 1452 : pp = 149.05166625976562 1440 / 1452 : pp = 149.0776824951172 1450 / 1452 : pp = 149.06771850585938 0 / 115 : pp = 227.0559844970703 10 / 115 : pp = 208.7002410888672 20 / 115 : pp = 210.38775634765625 30 / 115 : pp = 207.9513397216797 40 / 115 : pp = 207.12994384765625 50 / 115 : pp = 202.70811462402344 60 / 115 : pp = 202.05787658691406 70 / 115 : pp = 198.3761444091797 80 / 115 : pp = 196.17637634277344 90 / 115 : pp = 193.5880126953125 100 / 115 : pp = 189.0758819580078 110 / 115 : pp = 187.07528686523438 Training perplexity: 149.0502471923828 Validation perplexity:186.6911163330078 Total time : 47.274805545806885 Epoch 10 0 / 1452 : pp = 181.8408203125 10 / 1452 : pp = 164.99664306640625 20 / 1452 : pp = 161.8847198486328 30 / 1452 : pp = 158.30064392089844 40 / 1452 : pp = 160.13914489746094 50 / 1452 : pp = 157.58743286132812 60 / 1452 : pp = 156.11871337890625 70 / 1452 : pp = 156.82948303222656 80 / 1452 : pp = 156.2889862060547 90 / 1452 : pp = 155.04833984375 100 / 1452 : pp = 154.09327697753906 110 / 1452 : pp = 152.5070343017578 120 / 1452 : pp = 152.20750427246094 130 / 1452 : pp = 
151.3399200439453 140 / 1452 : pp = 149.90740966796875 150 / 1452 : pp = 149.345703125 160 / 1452 : pp = 149.59814453125 170 / 1452 : pp = 149.26539611816406 180 / 1452 : pp = 148.624267578125 190 / 1452 : pp = 148.58819580078125 200 / 1452 : pp = 149.09552001953125 210 / 1452 : pp = 148.8439178466797 220 / 1452 : pp = 148.86605834960938 230 / 1452 : pp = 149.1971435546875 240 / 1452 : pp = 148.96533203125 250 / 1452 : pp = 148.4253387451172 260 / 1452 : pp = 147.9200897216797 270 / 1452 : pp = 147.08816528320312 280 / 1452 : pp = 147.24366760253906 290 / 1452 : pp = 147.52182006835938 300 / 1452 : pp = 147.72222900390625 310 / 1452 : pp = 147.50486755371094 320 / 1452 : pp = 147.73892211914062 330 / 1452 : pp = 147.9404754638672 340 / 1452 : pp = 147.37803649902344 350 / 1452 : pp = 147.6969451904297 360 / 1452 : pp = 147.5704345703125 370 / 1452 : pp = 147.38674926757812 380 / 1452 : pp = 147.03970336914062 390 / 1452 : pp = 147.14231872558594 400 / 1452 : pp = 146.91656494140625 410 / 1452 : pp = 147.34059143066406 420 / 1452 : pp = 147.68496704101562 430 / 1452 : pp = 147.61195373535156 440 / 1452 : pp = 147.68405151367188 450 / 1452 : pp = 147.4711151123047 460 / 1452 : pp = 147.1927032470703 470 / 1452 : pp = 146.72970581054688 480 / 1452 : pp = 146.17173767089844 490 / 1452 : pp = 145.9028778076172 500 / 1452 : pp = 145.42721557617188 510 / 1452 : pp = 145.3111114501953 520 / 1452 : pp = 145.11460876464844 530 / 1452 : pp = 144.81488037109375 540 / 1452 : pp = 144.263916015625 550 / 1452 : pp = 143.997802734375 560 / 1452 : pp = 143.71766662597656 570 / 1452 : pp = 143.47451782226562 580 / 1452 : pp = 143.08474731445312 590 / 1452 : pp = 142.77920532226562 600 / 1452 : pp = 142.39573669433594 610 / 1452 : pp = 142.14906311035156 620 / 1452 : pp = 141.9574432373047 630 / 1452 : pp = 141.67369079589844 640 / 1452 : pp = 141.81556701660156 650 / 1452 : pp = 141.81759643554688 660 / 1452 : pp = 141.9339599609375 670 / 1452 : pp = 142.01248168945312 680 / 1452 : pp = 142.08773803710938 690 / 1452 : pp = 142.00328063964844 700 / 1452 : pp = 141.98086547851562 710 / 1452 : pp = 142.0632781982422 720 / 1452 : pp = 142.10372924804688 730 / 1452 : pp = 142.08055114746094 740 / 1452 : pp = 142.23619079589844 750 / 1452 : pp = 142.2660369873047 760 / 1452 : pp = 142.34678649902344 770 / 1452 : pp = 142.5257568359375 780 / 1452 : pp = 142.70025634765625 790 / 1452 : pp = 142.8614044189453 800 / 1452 : pp = 142.84573364257812 810 / 1452 : pp = 142.8250274658203 820 / 1452 : pp = 142.8540496826172 830 / 1452 : pp = 143.06053161621094 840 / 1452 : pp = 143.0423126220703 850 / 1452 : pp = 143.09634399414062 860 / 1452 : pp = 143.10487365722656 870 / 1452 : pp = 143.0884246826172 880 / 1452 : pp = 143.19387817382812 890 / 1452 : pp = 143.236083984375 900 / 1452 : pp = 143.23390197753906 910 / 1452 : pp = 143.29537963867188 920 / 1452 : pp = 143.3722686767578 930 / 1452 : pp = 143.33795166015625 940 / 1452 : pp = 143.40618896484375 950 / 1452 : pp = 143.3929901123047 960 / 1452 : pp = 143.3693389892578 970 / 1452 : pp = 143.39736938476562 980 / 1452 : pp = 143.2371063232422 990 / 1452 : pp = 143.13893127441406 1000 / 1452 : pp = 142.9658660888672 1010 / 1452 : pp = 143.01544189453125 1020 / 1452 : pp = 143.152587890625 1030 / 1452 : pp = 143.11334228515625 1040 / 1452 : pp = 143.19020080566406 1050 / 1452 : pp = 143.18234252929688 1060 / 1452 : pp = 143.092041015625 1070 / 1452 : pp = 143.24449157714844 1080 / 1452 : pp = 143.34828186035156 1090 / 1452 : pp = 143.38739013671875 1100 / 1452 : pp = 
143.37432861328125 ...
(per-step perplexity output for the remaining epochs elided; the end-of-epoch summaries from the same run are kept below)
Epoch 10: Training perplexity: 143.57354736328125, Validation perplexity: 185.40573120117188, Total time: 46.14846849441528 s
Epoch 11: Training perplexity: 138.9222869873047, Validation perplexity: 184.18101501464844, Total time: 43.92928600311279 s
Epoch 12: Training perplexity: 134.57826232910156, Validation perplexity: 183.8900146484375, Total time: 45.410256147384644 s
Epoch 13: Training perplexity: 130.85043334960938, Validation perplexity: 183.88186645507812, Total time: 45.345656394958496 s
Epoch 14: Training perplexity: 127.60413360595703, Validation perplexity: 183.8877410888672, Total time: 41.6636528968811 s
Epoch 15: Training perplexity: 124.4933853149414, Validation perplexity: 184.32510375976562, Total time: 40.856229066848755 s
=-==-==-==-==-= Test perplexity: 164.0149383544922 =-==-==-==-==-=
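The "pp" values printed during training and the final test number are perplexities. The repository's utils.calculate_perplexity helper is not reproduced here, so the snippet below is only a minimal sketch of how such a running perplexity can be derived from accumulated cross-entropy losses, assuming the standard definition perplexity = exp(mean per-token cross-entropy); the function name running_perplexity and the example loss values are hypothetical, not taken from the original code.

import numpy as np

def running_perplexity(step_losses):
    # Hypothetical helper (not the author's utils.calculate_perplexity):
    # perplexity = exp(mean per-token cross-entropy, in nats).
    return float(np.exp(np.mean(step_losses)))

# Made-up mini-batch cross-entropy losses (nats per token):
losses = [5.1, 4.9, 4.8]
print(running_perplexity(losses))  # ~138.8, the same order of magnitude as the "pp" values in the log above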
For more details, please refer to the link below:
https://github.com/weizhenzhao/cs224d_nlp_problem_set2