InternLM Open-Source LLM Training Camp - Lecture 5 Notes
1. Background on LLM deployment
a. Challenges of deploying large models
- Hardware demands: large storage/memory footprint
- Inference: token generation speed, dynamic shapes, memory management
- Serving: throughput under high concurrency, per-user response time
b. Technical approaches:
- Model parallelism, quantization
- Transformer optimizations, inference optimizations
2. Introduction to LMDeploy
a. A full-pipeline solution for NVIDIA GPUs, covering lightweight compression, inference, and serving; it does not yet cover mobile devices. Its core is the TurboMind inference engine.
b. Why quantize?
- Lower memory usage: weights and KV cache
- Faster inference
- At inference time, decoder-only models are bottlenecked by memory access; measurements on an A100 show the bottleneck shifting from memory access to compute around batch_size=128 (see the back-of-the-envelope KV-cache estimate below)
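A rough illustration of the KV-cache pressure (my own arithmetic, assuming an InternLM-7B-like decoder with 32 layers, hidden size 4096, and an fp16 cache): each generated token stores K and V in every layer, i.e. 2 × 32 × 4096 × 2 bytes = 512 KB per token. A single 2048-token sequence therefore holds about 1 GB of KV cache, and a batch of 128 such sequences about 128 GB, which is why lowering KV-cache precision and paging it matter so much.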
c. The TurboMind inference engine
- Continuous batch inference
- Stateful inference
- Blocked K/V cache
- Highly optimized CUDA kernels
d. Inference serving via an API server
3. Hands-on
3.1 Environment setup
Check GPU resource usage:
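e.g. with NVIDIA's standard driver utility:

nvidia-smi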
Monitor GPU usage in real time:
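e.g. by polling the same utility every second:

watch -n 1 nvidia-smi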
Create a virtual environment:
/root/share/install_conda_env_internlm_base.sh lmdeploy
List the virtual environments:
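conda env list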
Activate the lmdeploy virtual environment:
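conda activate lmdeploy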
Check PyTorch from Python:
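A minimal one-liner check, e.g.:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"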
Install the latest stable lmdeploy:
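e.g. (the [all] extra pulls in the serving dependencies; the course may pin a specific v0.x release, so adjust the version as needed):

pip install 'lmdeploy[all]'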
Verify that lmdeploy is installed:
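e.g. by importing it and printing its version:

python -c "import lmdeploy; print(lmdeploy.__version__)"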
3.2 Serving deployment
a. Convert the model to TurboMind format:
Online conversion:
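With online conversion, lmdeploy loads an HF-format model and converts it on the fly at startup; a sketch (the model ID is illustrative, and the exact CLI shape varies across lmdeploy versions):

lmdeploy chat turbomind internlm/internlm-chat-7b --model-name internlm-chat-7b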
Offline conversion (convert once on disk, producing a ./workspace directory):
(lmdeploy) root@intern-studio-069640:~/ch5# lmdeploy convert internlm-chat-7b /root/share/model_repos/internlm-chat-7b
create workspace in directory ./workspace
copy triton model templates from "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/turbomind/triton_models" to "./workspace/triton_models"
copy service_docker_up.sh from "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/turbomind/service_docker_up.sh" to "./workspace"
model_name            internlm-chat-7b
model_format          None
inferred_model_format hf
model_path            /root/share/model_repos/internlm-chat-7b
tokenizer_path        /root/share/model_repos/internlm-chat-7b/tokenizer.model
output_format         fp16
WARNING: Can not find tokenizer.json. It may take long time to initialize the tokenizer.
*** splitting layers.0.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1
*** splitting layers.0.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1
*** splitting layers.0.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1
### copying layers.0.attention.wo.bias, shape=torch.Size([4096])
*** splitting layers.0.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1
*** splitting layers.0.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1
*** splitting layers.0.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1
[... identical splitting/copying messages repeat for layers 1 through 31 ...]
Convert to turbomind format: 100%|████████████████████| 32/32 [00:40<00:00, 1.28s/it]
(lmdeploy) root@intern-studio-069640:~/ch5#
Local command-line inference:
First try a local chat session:
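Chatting against the converted ./workspace from the command line, e.g.:

lmdeploy chat turbomind ./workspace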
Serve it with lmdeploy:
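A sketch of launching the RESTful API server on the converted workspace (the port is illustrative, and flag spelling differs between lmdeploy versions; some releases use underscores):

lmdeploy serve api_server ./workspace --server-name 0.0.0.0 --server-port 23333 --tp 1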
Set up a local port forward, and the API docs can then be browsed on a local MacBook:
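A typical SSH local forward; the jump host follows the InternStudio setup and the SSH port is a placeholder for your own value:

ssh -CNg -L 23333:127.0.0.1:23333 root@ssh.intern-ai.org.cn -p <your SSH port>

Then open http://localhost:23333 locally to view the FastAPI docs.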
Web demo:
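A sketch of serving a Gradio UI on top of the running API server (URL and port are illustrative; flags vary by lmdeploy version):

lmdeploy serve gradio http://0.0.0.0:23333 --server-name 0.0.0.0 --server-port 6006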
Set up the local port forward:
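Same pattern as before, now forwarding the Gradio port:

ssh -CNg -L 6006:127.0.0.1:6006 root@ssh.intern-ai.org.cn -p <your SSH port>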
Then it's ready to try!
3.3 Model quantization
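The lecture mentions two quantization targets: the KV cache and the weights (W4A16 via AWQ). A hedged sketch of the lmdeploy lite workflow; subcommand flags differ across lmdeploy versions and the paths below are illustrative, so consult lmdeploy lite --help:

# Step 1: calibrate - collect activation statistics on a small calibration set
lmdeploy lite calibrate --model /root/share/model_repos/internlm-chat-7b --calib_dataset c4 --calib_samples 128 --calib_seqlen 2048 --work_dir ./quant_output

# Step 2a: KV-cache INT8 - export K/V quantization parameters into the TurboMind workspace
lmdeploy lite kv_qparams --work_dir ./quant_output --turbomind_dir ./workspace/triton_models/weights

# Step 2b: W4A16 - quantize the weights to 4-bit with AWQ
lmdeploy lite auto_awq --model /root/share/model_repos/internlm-chat-7b --w_bits 4 --w_group_size 128 --work_dir ./quant_output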