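NNI Neural Network Model Compression Tutorial
1. Introduction to NNI
NNI is an open-source automated machine learning (AutoML) toolkit released by Microsoft. It covers most stages of the machine learning lifecycle: feature engineering, neural architecture search (NAS), hyperparameter tuning, and model compression can all be handled with AutoML algorithms.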
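NNI offers the following advantages:
- Multiple frameworks: it provides a Python SDK and supports mainstream frameworks and libraries such as PyTorch, TensorFlow, scikit-learn, and LightGBM.
- Multiple training platforms: besides running directly on the local machine, it can schedule a group of GPU servers over SSH, or schedule large clusters on Kubernetes through FrameworkController, KubeFlow, OpenPAI, and others.
- Multiple stages of the machine learning lifecycle: feature engineering, neural architecture search (NAS), hyperparameter tuning, model compression, and more.
- An easy-to-use command-line tool and a friendly web UI.
- A large number of examples to help you get started quickly.
- Last but not least, all NNI documentation is available in Chinese.
The full Chinese documentation is at https://aka.ms/nnizh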
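2. Channel Pruning
NNI can prune and train models automatically with a range of pruning algorithms, for example SlimPruner, L1FilterPruner, L2FilterPruner, FPGMPruner, LevelPruner, and AGPPruner.
2.1 Pruning workflow
The following uses LevelPruner as the example.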
import os
import torch
from nni.algorithms.compression.pytorch.pruning import LevelPruner
from nni.compression.pytorch import ModelSpeedup

# 'default' uses NNI's default layer types; op_types lists the layers to prune
config_list = [{'sparsity': 0.8, 'op_types': ['default']}]
pruner = LevelPruner(model, config_list)
pruner.compress()
# Save the masked model weights and the masks
model_file = os.path.join(args.experiment_data_dir, 'model_masked.pth')
masks_file = os.path.join(args.experiment_data_dir, 'mask.pth')
pruner.export_model(model_file, masks_file)
# Speedup has to run, otherwise the model does not get any faster
m_speedup = ModelSpeedup(model, dummy_input, masks_file, device)
m_speedup.speedup_model()
evaluation_result = evaluator(model)  # evaluate the model
torch.save(model.state_dict(), os.path.join(args.experiment_data_dir, 'model_speed_up.pth'))
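The last line saves the final model.
The speedup step matters most: only after it does the model actually run faster.
The speedup step does take a long time, though, because it has to work through the masks together with the model parameters.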
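To see why masking alone does not speed anything up, here is a minimal pure-Python sketch. It is not NNI's implementation, and the helper names `apply_mask` and `compact` are made up for illustration. A masked layer keeps its shape, so it still holds (and multiplies) as many numbers as before; only physically dropping the zeroed filters makes it smaller.

```python
def apply_mask(filters, keep):
    """Zero out pruned filters but keep the layer's shape (what a mask does)."""
    return [f if k else [0.0] * len(f) for f, k in zip(filters, keep)]


def compact(filters, keep):
    """Physically drop pruned filters (the idea behind a speedup step)."""
    return [f for f, k in zip(filters, keep) if k]


# Four toy "filters" with two weights each; filters 1 and 3 are pruned
filters = [[0.9, -0.4], [0.01, 0.02], [-0.7, 0.3], [0.03, -0.01]]
keep = [True, False, True, False]

masked = apply_mask(filters, keep)
small = compact(filters, keep)
print(len(masked), len(small))  # 4 2
```

The masked layer still has four filters, while the compacted one has two. A real speedup step also has to shrink every layer that consumes those channels, which is plausibly part of why it takes so long.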
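2.2 Summary of NNI pruning methods
2.2.1 Level Pruner
The simplest, most basic one-shot pruner. You set a target sparsity as a fraction (0.6 means 60% of the weights are pruned). It first sorts the weights of the specified layers by absolute value, then masks the smallest ones to 0 until the requested sparsity is reached.
from nni.algorithms.compression.pytorch.pruning import LevelPruner
# 'default' uses NNI's default layer types; op_types lists the layers to prune
config_list = [{'sparsity': 0.8, 'op_types': ['default']}]
pruner = LevelPruner(model, config_list)
pruner.compress()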
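The sort-then-mask rule of the Level Pruner can be sketched in a few lines of plain Python. This is only an illustration on a flat list of weights; `level_mask` is a hypothetical helper, not an NNI function.

```python
def level_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the smallest-magnitude weights."""
    num_prune = int(len(weights) * sparsity)
    # indices sorted by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


weights = [0.5, -0.1, 0.05, -0.9, 0.3]
mask = level_mask(weights, sparsity=0.6)
print(mask)  # [1, 0, 0, 1, 0]
```

With a sparsity of 0.6, three of the five weights (0.05, -0.1, and 0.3) are masked, and the two largest in magnitude survive.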
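2.2.2 Slim Pruner
A one-shot pruner. During training it applies sparsity regularization to the scaling factors of the batch normalization (BN) layers in order to identify unimportant channels. Channels with small scaling factors are pruned.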
from nni.algorithms.compression.pytorch.pruning import SlimPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['BatchNorm2d'] }]
pruner = SlimPruner(model, config_list)
pruner.compress()
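As a rough illustration of the selection step, the sketch below picks channels from their BN scaling factors. It assumes a single global threshold across all BN layers, as in the Network Slimming idea; whether NNI applies exactly this rule is an assumption, and `slim_channel_masks` is a hypothetical helper.

```python
def slim_channel_masks(bn_gammas, sparsity):
    """bn_gammas: {layer: [gamma per channel]} -> {layer: [0/1 keep mask]}."""
    all_abs = sorted(abs(g) for gammas in bn_gammas.values() for g in gammas)
    num_prune = int(len(all_abs) * sparsity)
    if num_prune == 0:
        return {name: [1] * len(g) for name, g in bn_gammas.items()}
    # largest |gamma| that still gets pruned (ties at the threshold are pruned too)
    threshold = all_abs[num_prune - 1]
    return {name: [0 if abs(g) <= threshold else 1 for g in gammas]
            for name, gammas in bn_gammas.items()}


gammas = {'bn1': [0.9, 0.02, 0.5], 'bn2': [0.01, 0.7, 0.03]}
masks = slim_channel_masks(gammas, sparsity=0.5)
print(masks)  # {'bn1': [1, 0, 1], 'bn2': [0, 1, 0]}
```

Because the threshold is global, layers are not pruned evenly: here bn1 loses one channel and bn2 loses two.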
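2.2.3 FPGM Pruner
A one-shot pruner that prunes the convolutional filters closest to the geometric median of their layer. FPGM picks the filters that are most replaceable by the remaining ones.
from nni.algorithms.compression.pytorch.pruning import FPGMPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']  # prune all convolutional layers
}]
pruner = FPGMPruner(model, config_list)
pruner.compress()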
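One common way to approximate "closest to the geometric median" is to score each filter by the sum of its distances to all other filters in the layer and prune the lowest scores. The sketch below does that on flattened toy filters; `fpgm_prune_indices` is a hypothetical helper, and this is a simplification rather than NNI's code.

```python
import math


def fpgm_prune_indices(filters, num_prune):
    """Indices of the filters with the smallest total distance to all others."""
    def total_distance(i):
        return sum(math.dist(filters[i], other) for other in filters)

    order = sorted(range(len(filters)), key=total_distance)
    return sorted(order[:num_prune])


# Flattened toy filters; the last one sits in the middle of the main cluster
filters = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.4, 0.4]]
pruned = fpgm_prune_indices(filters, num_prune=1)
print(pruned)  # [4]
```

Note that the filter removed is not the one with the smallest norm (that would be index 0) but the one the others can best stand in for, which is the difference from norm-based pruning.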
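2.2.4 L1Filter Pruner
A one-shot pruner that prunes filters in convolutional layers. It ranks the filters by the L1 norm of their weights and removes the smallest ones.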
from nni.algorithms.compression.pytorch.pruning import L1FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L1FilterPruner(model, config_list)
pruner.compress()
2.2.5 L2Filter Pruner
A structured, one-shot pruning algorithm that prunes the convolutional filters whose weights have the smallest L2 norm.
from nni.algorithms.compression.pytorch.pruning import L2FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L2FilterPruner(model, config_list)
pruner.compress()
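The L1Filter and L2Filter pruners differ only in the norm used to rank filters, which a small pure-Python sketch makes easy to see. The helper `filter_norm_mask` and its `p` parameter are made up for illustration; filters are flattened to plain lists.

```python
def filter_norm_mask(filters, sparsity, p=1):
    """Keep-mask over filters: the lowest-norm fraction `sparsity` is pruned."""
    norms = [sum(abs(w) ** p for w in f) ** (1.0 / p) for f in filters]
    num_prune = int(len(filters) * sparsity)
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    pruned = set(order[:num_prune])
    return [0 if i in pruned else 1 for i in range(len(filters))]


filters = [[0.5, -0.5], [0.1, 0.0], [-0.9, 0.2], [0.05, 0.05]]
print(filter_norm_mask(filters, sparsity=0.5, p=1))  # [1, 0, 1, 0]
print(filter_norm_mask(filters, sparsity=0.5, p=2))  # [1, 0, 1, 0]
```

Unlike the Level Pruner, the mask here has one entry per filter, so a whole filter is kept or dropped. That is what makes the result structured and lets a later speedup step remove the channels.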
2.2.6 AGP Pruner
An automated gradual pruning algorithm: over n pruning steps, the sparsity rises from an initial value (usually 0) to the final target sparsity.
from nni.algorithms.compression.pytorch.pruning import AGPPruner
config_list = [{
    'initial_sparsity': 0,
    'final_sparsity': 0.8,
    'start_epoch': 0,
    'end_epoch': 10,
    'frequency': 1,
    'op_types': ['default']
}]
# Load a pretrained model, or train the model before using the pruner.
# model = MyModel()
# model.load_state_dict(torch.load('mycheckpoint.pth'))
# AGP Pruner hooks into optimizer.step() and prunes while the model is being fine-tuned,
# so an optimizer is required for pruning.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
pruner = AGPPruner(model, config_list, optimizer, pruning_algorithm='level')
pruner.compress()
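In my experience AGP pruning raises errors fairly often. One of them was along the lines of '0 cannot be converted to float', so I changed the parameters to:
config_list = [{
    'initial_sparsity': 0.01,
    'final_sparsity': 0.8,
    'start_epoch': 1,
    'end_epoch': 10,
    'frequency': 1,
    'op_types': ['default']
}]
That fixed it for me, although I honestly cannot explain why.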
AGP Pruner uses the LevelPruner algorithm to prune weights by default. You can also set the pruning_algorithm parameter to use a different pruning algorithm:
level: LevelPruner
slim: SlimPruner
l1: L1FilterPruner
l2: L2FilterPruner
fpgm: FPGMPruner
taylorfo: TaylorFOWeightFilterPruner
apoz: ActivationAPoZRankFilterPruner
mean_activation: ActivationMeanRankFilterPruner
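To make the gradual schedule of the AGP Pruner concrete, here is a sketch of the cubic schedule from the automated gradual pruning paper, with the argument names borrowed from the config_list keys above. That NNI computes exactly these per-epoch values is an assumption, and `agp_sparsity` is a hypothetical helper.

```python
def agp_sparsity(epoch, initial_sparsity=0.0, final_sparsity=0.8,
                 start_epoch=0, end_epoch=10):
    """Target sparsity at `epoch` under a cubic gradual-pruning schedule."""
    if epoch <= start_epoch:
        return initial_sparsity
    if epoch >= end_epoch:
        return final_sparsity
    progress = (epoch - start_epoch) / (end_epoch - start_epoch)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


for epoch in (0, 1, 5, 10):
    print(epoch, round(agp_sparsity(epoch), 4))
```

The schedule prunes aggressively early and slows down near the end: with these settings the target is about 0.22 after the first epoch and 0.7 by epoch 5, leaving the last epochs for small steps toward 0.8.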
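2.2.7 NetAdapt Pruner
NetAdapt automatically simplifies a pretrained network to meet a resource budget. Given an overall sparsity, it uses iterative pruning to generate a different sparsity for each layer.
from nni.algorithms.compression.pytorch.pruning import NetAdaptPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = NetAdaptPruner(model, config_list, short_term_fine_tuner=short_term_fine_tuner,
                        evaluator=evaluator, base_algo='l1', experiment_data_dir='./')
pruner.compress()
Here comes the big pitfall: the official documentation on GitHub does not say what the short_term_fine_tuner and evaluator parameters are. base_algo is the base pruning algorithm (here 'l1'), and experiment_data_dir is where the model is saved. After digging through the samples for a long time, I found the definitions of short_term_fine_tuner and evaluator: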
def evaluator(model):
    return test(model, device, criterion, val_loader)

def test(model, device, criterion, val_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in val_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss
            test_loss += criterion(output, target).item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(val_loader.dataset)
    accuracy = correct / len(val_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(val_loader.dataset), 100. * accuracy))
    # return the accuracy so the pruner can use it as the score
    return accuracy
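On closer inspection, evaluator simply returns an accuracy and takes only the model as input.
def short_term_fine_tuner(model, epochs=1):
    for epoch in range(epochs):
        train(args, model, device, train_loader, criterion, optimizer, epoch)
short_term_fine_tuner fine-tunes the parameters for one epoch, that is, a single pass of training.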
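Since both callbacks receive only the model, everything else (data loaders, device, loss) has to be captured from the enclosing scope. One way to do that without globals is a pair of small factory functions. The sketch below is my own illustration: `make_evaluator`, `make_short_term_fine_tuner`, and the fake test/train functions are hypothetical stand-ins, with a plain Python function playing the role of the model.

```python
def make_evaluator(test_fn, *bound_args):
    """Wrap test_fn(model, *bound_args) as evaluator(model) -> float."""
    def evaluator(model):
        return float(test_fn(model, *bound_args))
    return evaluator


def make_short_term_fine_tuner(train_fn, *bound_args):
    """Wrap train_fn(model, *bound_args, epoch) as short_term_fine_tuner(model, epochs=1)."""
    def short_term_fine_tuner(model, epochs=1):
        for epoch in range(epochs):
            train_fn(model, *bound_args, epoch)
    return short_term_fine_tuner


# Stand-ins for a real test/train loop, only to show the call shapes
def fake_test(model, val_loader):
    return sum(1 for x, y in val_loader if model(x) == y) / len(val_loader)


trained_epochs = []


def fake_train(model, train_loader, epoch):
    trained_epochs.append(epoch)


def double(x):
    return 2 * x


evaluator = make_evaluator(fake_test, [(1, 2), (2, 4), (3, 7)])
short_term_fine_tuner = make_short_term_fine_tuner(fake_train, [])
accuracy = evaluator(double)          # 2 of the 3 pairs match
short_term_fine_tuner(double, epochs=2)
print(accuracy, trained_epochs)
```

The contract to keep in mind is that evaluator returns a single number where higher is better, and short_term_fine_tuner trains the model in place and returns nothing.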
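2.2.8 SimulatedAnnealing Pruner
Simulated annealing pruning. This pruner implements a guided heuristic search based on prior experience, namely the simulated annealing (SA) algorithm. The enhanced SA algorithm rests on the observation that deep neural network layers with more weights can usually be compressed more, with less impact on overall accuracy.
from nni.algorithms.compression.pytorch.pruning import SimulatedAnnealingPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = SimulatedAnnealingPruner(model, config_list, evaluator=evaluator, base_algo='l1',
                                  cool_down_rate=0.9, experiment_data_dir='./')
pruner.compress()
evaluator is the same as above.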
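For readers who have not met simulated annealing before, here is a generic, self-contained sketch of the search loop over per-layer sparsities. It is not NNI's implementation: it leaves out the overall-sparsity constraint and the "more weights, more pruning" prior, the temperatures and step size are arbitrary, and `toy_score` stands in for a real evaluator.

```python
import math
import random


def anneal_sparsities(evaluate, num_layers, start_temp=1.0, stop_temp=0.01,
                      cool_down_rate=0.9, step=0.1, seed=0):
    """Search per-layer sparsities that maximize evaluate(sparsities)."""
    rng = random.Random(seed)
    current = [0.5] * num_layers
    current_score = evaluate(current)
    best, best_score = list(current), current_score
    temp = start_temp
    while temp > stop_temp:
        # perturb every layer a little, clamped to [0, 0.95]
        candidate = [min(0.95, max(0.0, s + rng.uniform(-step, step)))
                     for s in current]
        score = evaluate(candidate)
        delta = score - current_score
        # always accept improvements; accept worse moves with prob. exp(delta / temp)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, score
            if current_score > best_score:
                best, best_score = list(current), current_score
        temp *= cool_down_rate  # cool down, so worse moves get rarer
    return best, best_score


# Toy stand-in for evaluator: layer 0 "prefers" 0.8, layer 1 "prefers" 0.2
def toy_score(sparsities):
    return -((sparsities[0] - 0.8) ** 2 + (sparsities[1] - 0.2) ** 2)


best, best_score = anneal_sparsities(toy_score, num_layers=2)
print(best, best_score)
```

The cool_down_rate argument plays the same role as in the pruner call above: the closer it is to 1, the more slowly the temperature drops and the more candidates get evaluated, which with a real evaluator means more full passes over the validation set.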
2.2.9 AutoCompress Pruner
In each round, AutoCompressPruner prunes the model by the same sparsity, so that the overall target sparsity is reached once all rounds are done.
AutoCompress is built on the simulated annealing algorithm.
from nni.algorithms.compression.pytorch.pruning import AutoCompressPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = AutoCompressPruner(
    model, config_list, trainer=trainer, evaluator=evaluator,
    dummy_input=dummy_input, num_iterations=3, optimize_mode='maximize', base_algo='l1',
    cool_down_rate=0.9, admm_num_iterations=30, admm_training_epochs=5, experiment_data_dir='./')
pruner.compress()
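The "same sparsity every round" rule can be worked out with a bit of arithmetic. Assuming each round prunes a fixed fraction r of whatever weights remain, r must satisfy (1 - r)^n = 1 - s for n rounds and overall sparsity s. The helper name `sparsity_per_round` is made up, and whether NNI uses precisely this formula is an assumption on my part.

```python
def sparsity_per_round(total_sparsity, num_iterations):
    """Equal per-round sparsity r with (1 - r) ** num_iterations == 1 - total_sparsity."""
    return 1.0 - (1.0 - total_sparsity) ** (1.0 / num_iterations)


# Same numbers as the config above: sparsity 0.5 over num_iterations=3
r = sparsity_per_round(0.5, 3)
remaining = 1.0
for _ in range(3):
    remaining *= 1.0 - r  # each round removes a fraction r of what is left
print(round(r, 4), round(1.0 - remaining, 4))  # 0.2063 0.5
```

So with sparsity 0.5 and three rounds, each round removes roughly 20.6% of the remaining weights rather than 0.5 / 3 of the original ones.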
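2.2.10 AMC Pruner
AMC Pruner uses reinforcement learning to come up with the model compression policy. Compared with conventional rule-based policies, this learning-based policy achieves a higher compression ratio, preserves accuracy better, and saves human effort.
from nni.algorithms.compression.pytorch.pruning import AMCPruner
config_list = [{
    'op_types': ['Conv2d', 'Linear']
}]
pruner = AMCPruner(model, config_list, evaluator, val_loader, flops_ratio=0.5)
pruner.compress()
val_loader is the validation set, loaded through a PyTorch DataLoader.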
2.3 Test code
import nni
import torch
import torch.nn as nn
from nni.compression.pytorch import ModelSpeedup
import numpy as np
from torch.utils.data import DataLoader
from PIL import Image
from torchvision import transforms
# Project-local modules (not part of NNI or PyTorch)
from model_input import Model_input
import data_loading
from prefetcher import data_prefetcher

if __name__ == '__main__':
    model_name = 'shufflenet'
    pruning_class = 'SimulatedAnnealing'
    model = Model_input(model_name).model_final
    if model_name != 'shufflenet_pruned':
        dict_save_path = 'model/' + model_name + '_20210311.pkl'
    else:
        dict_save_path = 'model/' + model_name + '_' + pruning_class + ".pth"
    model.load_state_dict(torch.load(dict_save_path, map_location="cuda:0"))
    model_final = model
    def evaluator(model):
        loss_fn = nn.CrossEntropyLoss()
        test_dataset = data_loading.Dataset_loading('D:/Dataset_all/weld_Dataset_unlabel/al5083/test/test.json')
        dataloader = DataLoader(test_dataset, shuffle=True, batch_size=32, num_workers=1, pin_memory=True)
        model.cuda()
        model.eval()
        all_data_num = 0
        correct_data_num = 0
        loss_all = []
        corr_num_all = []
        loss = 0
        losses = 0
        prefetcher = data_prefetcher(dataloader)
        with torch.no_grad():
            images, labels = prefetcher.next()
            steps = 0
            while images is not None:
                steps += 1
                images, labels = images.cuda(), labels.cuda()
                output = model(images)
                loss = loss_fn(output, labels)
                losses += loss
                loss_all.append(loss)
                pred_labels = output.argmax(dim=1)
                all_data_num += labels.size(0)
                correct_data_num += (pred_labels == labels).sum().item()
                corr_num_all.append(correct_data_num)
                images, labels = prefetcher.next()
        acc = (correct_data_num / all_data_num)
        # print('Evaluation result: test_loss:', np.array(losses.cpu()), 'test_acc:{:.2f}'.format(acc), '%')
        # loss_record_path = 'train_record/'+str(time.ctime())+'-'+str(epoch)+'-'+'loss'+'.txt'
        # acc_record_path = 'train_record/' + str(time.ctime()) + '-' + str(epoch) + '-'+'acc'+'.txt'
        return acc
    def pruner_load(pruning_class, model):
        global short_term_fine_tuner, evaluator, trainer, dummy_input, val_loader, fine_tuner
        if pruning_class == 'level':
            from nni.algorithms.compression.pytorch.pruning import LevelPruner
            config_list = [{'sparsity': 0.8, 'op_types': ['default']}]
            pruner = LevelPruner(model, config_list)
        elif pruning_class == 'slim':
            from nni.algorithms.compression.pytorch.pruning import SlimPruner
            config_list = [{'sparsity': 0.8, 'op_types': ['BatchNorm2d']}]
            pruner = SlimPruner(model, config_list)
        elif pruning_class == 'FPGM':
            from nni.algorithms.compression.pytorch.pruning import FPGMPruner
            config_list = [{
                'sparsity': 0.5,
                'op_types': ['Conv2d']}]
            pruner = FPGMPruner(model, config_list)
        elif pruning_class == 'L1':
            from nni.algorithms.compression.pytorch.pruning import L1FilterPruner
            config_list = [{'sparsity': 0.8, 'op_types': ['Conv2d']}]
            pruner = L1FilterPruner(model, config_list)
        elif pruning_class == 'L2':
            from nni.algorithms.compression.pytorch.pruning import L2FilterPruner
            config_list = [{'sparsity': 0.8, 'op_types': ['Conv2d']}]
            pruner = L2FilterPruner(model, config_list)
        elif pruning_class == 'AGP':
            from nni.algorithms.compression.pytorch.pruning import AGPPruner
            config_list = [{
                'initial_sparsity': 0,
                'final_sparsity': 0.8,
                'start_epoch': 0,
                'end_epoch': 10,
                'frequency': 1,
                'op_types': ['default']
            }]
            optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
            pruner = AGPPruner(model, config_list, optimizer, pruning_algorithm='level')
        elif pruning_class == 'NetAdapt':
            from nni.algorithms.compression.pytorch.pruning import NetAdaptPruner
            config_list = [{
                'sparsity': 0.5,
                'op_types': ['Conv2d']
            }]
            pruner = NetAdaptPruner(model, config_list, short_term_fine_tuner=short_term_fine_tuner, evaluator=evaluator,
                                    base_algo='l1', experiment_data_dir='./')
        elif pruning_class == 'SimulatedAnnealing':
            from nni.algorithms.compression.pytorch.pruning import SimulatedAnnealingPruner
            config_list = [{
                'sparsity': 0.8,
                'op_types': ['Conv2d']
            }]
            pruner = SimulatedAnnealingPruner(model, config_list, evaluator=evaluator, base_algo='l1', cool_down_rate=0.9,
                                              experiment_data_dir='./')
        elif pruning_class == 'AutoCompress':
            from nni.algorithms.compression.pytorch.pruning import AutoCompressPruner
            config_list = [{
                'sparsity': 0.5,
                'op_types': ['Conv2d']
            }]
            pruner = AutoCompressPruner(
                model, config_list, trainer=trainer, evaluator=evaluator,
                dummy_input=dummy_input, num_iterations=3, optimize_mode='maximize', base_algo='l1',
                cool_down_rate=0.9, admm_num_iterations=30, admm_training_epochs=5, experiment_data_dir='./')
        elif pruning_class == 'AMC':
            from nni.algorithms.compression.pytorch.pruning import AMCPruner
            config_list = [{
                'op_types': ['Conv2d', 'Linear']
            }]
            pruner = AMCPruner(model, config_list, evaluator, val_loader, flops_ratio=0.5)
        elif pruning_class == 'ADMM':
            from nni.algorithms.compression.pytorch.pruning import ADMMPruner
            config_list = [{
                'sparsity': 0.8,
                'op_types': ['Conv2d'],
                'op_names': ['conv1']
            }, {
                'sparsity': 0.92,
                'op_types': ['Conv2d'],
                'op_names': ['conv2']
            }]
            pruner = ADMMPruner(model, config_list, trainer=trainer, num_iterations=30, epochs=5)
        elif pruning_class == 'Sensitivity':
            from nni.algorithms.compression.pytorch.pruning import SensitivityPruner
            config_list = [{
                'sparsity': 0.5,
                'op_types': ['Conv2d']
            }]
            pruner = SensitivityPruner(model, config_list, finetuner=fine_tuner, evaluator=evaluator)
            # eval_args and finetune_args are the arguments passed to evaluator and finetuner respectively
            # pruner.compress(eval_args=[model], finetune_args=[model])
        else:
            raise ValueError(
                "Pruner not supported.")
        return pruner
    # pruner = pruner_load(pruning_class, model_final)
    # model_process = pruner.compress()
    model_process = model_final
    eval_result = evaluator(model_process)
    print(eval_result)
    masks_file = 'model/' + model_name + pruning_class + "_mask.pth"
    model_file = 'model/' + model_name + '_' + pruning_class + ".pth"
    # pruner.export_model(model_path=model_file, mask_path=masks_file)
    dummy_input = torch.randn([1, 1, 224, 224]).to('cuda')
    m_speedup = ModelSpeedup(model_process, dummy_input, masks_file, 'cuda')
    m_speedup.speedup_model()
    eval_result = evaluator(model_process)
    print(eval_result)
    torch.save(model_process.state_dict(), 'model/' + model_name + '_' + pruning_class + "_pruned.pth")
Official documentation: Pruning algorithms supported by NNI