Tuning tips
For a given model, you can tune it along the following lines:
1. Initialize the weights and biases (this works well and usually gives a 1-2% improvement).
Point 1 (CNN):
for conv in self.convs1:
    init.xavier_normal(conv.weight, gain=np.sqrt(2.0))  # Xavier (normal-distribution) initialization of the weights
    # init.normal(conv.weight, mean=0, std=0.1)
    # init.constant(conv.bias, 0.1)  # initialize the bias to 0.1
Point 2 (LSTM):
(1) Bias vectors are initialized to zero, except the bias b_f of the LSTM forget gate, which is initialized to 1.0 (see the paper End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF). For the weights, either a Gaussian or a uniform distribution works; see the blog post Deep Learning 之 参数初始化 for details.
(2) A simpler setting is to initialize the weights to 0.1 and the biases to 0.
init.xavier_normal(self.lstm.all_weights[0][0], gain=np.sqrt(2.0))
self.lstm.all_weights[0][3].data[20:40].fill_(1)  # forget gate
self.lstm.all_weights[0][3].data[0:20].fill_(0)
self.lstm.all_weights[0][3].data[40:80].fill_(0)
Note: for the packaged nn.LSTM, parameters can only be reached in bulk through the all_weights interface rather than set one by one. The bias is laid out as input | forget | cell | output gate, each slice of length hidden_size, so with hidden_size = 20 the forget gate corresponds to indices 20-39. If you use LSTMCell instead, you can modify each parameter individually. A name-based alternative is sketched below.
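For reference, a minimal sketch of the same forget-gate initialization that looks the parameters up by name instead of hard-coding all_weights indices; it assumes the module is stored as self.lstm and relies only on PyTorch's per-gate bias layout described above:

# assumes an nn.LSTM stored as self.lstm; each bias tensor is ordered
# [input | forget | cell | output] gate, each slice of length hidden_size
for name, param in self.lstm.named_parameters():
    if 'bias' in name:                 # bias_ih_l0, bias_hh_l0, ...
        n = param.size(0) // 4
        param.data.fill_(0.)
        param.data[n:2 * n].fill_(1.)  # forget-gate bias -> 1.0
    elif 'weight' in name:
        init.xavier_normal(param.data)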
2. Clip gradients: constrain the gradient norm of the weights to a fixed range, which prevents exploding gradients at individual nodes.
optimizer.zero_grad()
logit = model(feature)
loss = F.cross_entropy(logit, target)
loss.backward()
# clip gradients
utils.clip_grad_norm(model.parameters(), 5)
optimizer.step()
3. L2 regularization
The L2 term, also called the penalty term, helps prevent overfitting. PyTorch optimizers expose it directly through the weight_decay argument; a small value such as 1e-8 is a common starting point (the example below sets 0.01).
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=0.01)
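If you want to decay only the weights and leave the biases unregularized (a common refinement, not part of the snippet above), parameter groups can be passed to the optimizer; decay and no_decay below are illustrative names:

# split parameters into a decayed group (weights) and an undecayed group (biases)
decay, no_decay = [], []
for name, param in model.named_parameters():
    (no_decay if 'bias' in name else decay).append(param)
optimizer = torch.optim.Adam(
    [{'params': decay, 'weight_decay': 0.01},
     {'params': no_decay, 'weight_decay': 0.0}],
    lr=args.lr)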
4. Batch normalization: if configured correctly, it reportedly speeds up convergence considerably and has a noticeable effect.
For BatchNorm2d(x), the input has shape (batch_size, channel, height, width) and x corresponds to the channel dimension, i.e. dimension 1. So the mean and variance are computed, and normalization applied, once per channel: once for channel 0, once for channel 1, ..., up to channel x-1. BatchNorm1d behaves the same way: x corresponds to the size of dimension 1, and if your data is laid out differently you have to transpose it first, as in the following example.
import torch
import torch.nn as nn
from torch.autograd import Variable

m = nn.BatchNorm1d(2)
input = torch.randn(2, 10)
input = Variable(input)
input = Variable(torch.transpose(input.data, 0, 1))  # (2, 10) -> (10, 2): the 2 features must sit in dimension 1
print(input)
output = m(input)
print(output)
Point 1 (CNN):
def __init__(self, args):
    super(CNN, self).__init__()
    self.bn = nn.BatchNorm2d(1)

def forward(self, x):
    for conv in self.convs1:
        xx = conv(x)  # Variable [torch.FloatTensor of size 16x200x35x1]
        xx = Variable(torch.transpose(xx.data, 2, 3))
        xx = Variable(torch.transpose(xx.data, 1, 2))
        xx = self.bn(xx)
        xx = F.relu(xx)
        xx = xx.squeeze(1)
        a.append(xx)
Point 2 (LSTM):
class BiLSTM(nn.Module):
    def __init__(self, args):
        super(BiLSTM, self).__init__()
        self.bn1 = nn.BatchNorm1d(2 * self.hidden_size)

    def forward(self, sentence):
        out = self.bn1(out)
        out = F.tanh(out)
        y = self.hidden2label(out)
Result: neither of the above two settings improved accuracy.
Point 3 (BN-LSTM):
See the paper RECURRENT BATCH NORMALIZATION; PyTorch does not provide this out of the box, so it has to be implemented by hand.
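For orientation, here is a minimal sketch of a batch-normalized LSTM cell in the spirit of that paper; the class and attribute names (BNLSTMCell, wx, wh, bn_x, bn_h, bn_c) are illustrative, and it simplifies the paper by sharing BN statistics across time steps and omitting the recommended small initialization of the BN scale parameters:

import torch
import torch.nn as nn

class BNLSTMCell(nn.Module):
    """Sketch of a batch-normalized LSTM cell (after Cooijmans et al.).
    Simplification: one set of BN statistics shared across all time steps."""
    def __init__(self, input_size, hidden_size):
        super(BNLSTMCell, self).__init__()
        self.hidden_size = hidden_size
        self.wx = nn.Linear(input_size, 4 * hidden_size, bias=False)
        self.wh = nn.Linear(hidden_size, 4 * hidden_size, bias=False)
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))
        # separate BN for input-to-hidden, hidden-to-hidden and the cell state
        self.bn_x = nn.BatchNorm1d(4 * hidden_size)
        self.bn_h = nn.BatchNorm1d(4 * hidden_size)
        self.bn_c = nn.BatchNorm1d(hidden_size)

    def forward(self, x, state):
        h, c = state
        # normalize the two linear transforms separately, then add the shared bias
        gates = self.bn_x(self.wx(x)) + self.bn_h(self.wh(h)) + self.bias
        i, f, g, o = gates.chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(self.bn_c(c))
        return h, c

The cell is then unrolled manually over the time dimension, e.g. for x_t in sentence: h, c = cell(x_t, (h, c)).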