Dalian AI Computing Platform: Huawei Ascend AI platform, heterogeneous HPC computing with a hybrid CPU/GPU model
Good news: the project has funding again, the account can be used again, and I can keep playing with the supercomputer.
==========================================================
Installing PyTorch on the supercomputing platform:
Run:
export REQUESTS_CA_BUNDLE=
export CURL_CA_BUNDLE=
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
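Before submitting a job, it is worth confirming that conda actually placed the packages into the active environment. A minimal sketch, assuming it is run with the Python of the freshly created conda environment (the module list is illustrative):

```python
import importlib.util

def has_module(name):
    """True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None

# Which of the just-installed packages are actually importable here?
print([m for m in ("torch", "torchvision", "torchaudio") if has_module(m)])
```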
Job submission script:
submit_python.sh
/opt/batch/cli/bin/dsub -n task_test -A xxxxxxxxxx --priority 9999 --job_retry 10 --job_type hmpi -R "cpu=1;gpu=1;mem=128" -N 1 -eo error.txt -oo output.txt /home/share/xxxxxxxxxxxx/home/xxxxxxx/xxxxxxx/run_python.sh
MPI launch script:
run_python.sh
#!/bin/sh
echo ----- print env vars -----
if [ "${CCS_ALLOC_FILE}" != "" ]; then
    echo " "
    ls -la ${CCS_ALLOC_FILE}
    echo ------ cat ${CCS_ALLOC_FILE}
    cat ${CCS_ALLOC_FILE}
fi

export HOSTFILE=/tmp/hostfile.$$
rm -rf $HOSTFILE
touch $HOSTFILE

# parse CCS_ALLOC_FILE
## node name, cores, tasks, task_list
# hpcbuild002 8 1 container_22_default_00001_e01_000002
# hpctest005 8 1 container_22_default_00000_e01_000001
ntask=`cat ${CCS_ALLOC_FILE} | awk -v fff="$HOSTFILE" '{}
{
    split($0, a, " ")
    if (length(a[1]) > 0 && length(a[3]) > 0) {
        print a[1]" slots="a[2] >> fff
        total_task += a[3]
    }
}END{print total_task}'`

echo "openmpi hostfile $HOSTFILE generated:"
echo "-----------------------"
cat $HOSTFILE
echo "-----------------------"
echo "Total tasks is $ntask"
echo "mpirun -hostfile $HOSTFILE -n $ntask <your application>"

# start a simple mpi program
# /usr/local/bin/mpirun -hostfile $HOSTFILE -n $ntask hostname
/home/HPCBase/HMPI/hmpi/bin/mpirun -hostfile $HOSTFILE -np $ntask --mca plm_rsh_agent /opt/batch/agent/tools/dstart /home/share/xxxxxxxxx/home/xxxxxxxx/anaconda3/bin/python /home/share/xxxxxxxxx/home/xxxxxx/xxxxxxx/hello.py
ret=$?

rm -rf $HOSTFILE
exit $ret
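The awk one-liner in the script converts CCS_ALLOC_FILE lines of the form `node cores tasks task_list` into OpenMPI hostfile entries plus a total task count. A Python sketch of the same parsing logic, with sample lines taken from the script's own comments:

```python
def parse_alloc(lines):
    """Mimic the awk step: build hostfile entries and sum the task counts."""
    hostfile, total = [], 0
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:  # need at least node, cores, tasks
            hostfile.append("%s slots=%s" % (parts[0], parts[1]))
            total += int(parts[2])
    return hostfile, total

sample = ["hpcbuild002 8 1 container_22_default_00001_e01_000002",
          "hpctest005 8 1 container_22_default_00000_e01_000001"]
print(parse_alloc(sample))
# → (['hpcbuild002 slots=8', 'hpctest005 slots=8'], 2)
```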
Test code:
import mpi4py.MPI as MPI
import sys
import socket
import numpy as np
import torch


def func1(queue, num):
    import time
    # time.sleep(num)
    # time.sleep(180)
    """
    x = np.random.rand(100)
    for _ in range(2000000):
        x += np.random.rand(100)
    num += np.sum(x)
    """
    x = torch.randn(10000000, device="cuda:0")
    for _ in range(200000):
        x += torch.randn(10000000, device="cuda:0")
    # queue.put(num)
    queue.put(torch.sum(x).item())


def run_queue():
    from multiprocessing import Process, Queue
    ps = 1
    queue = Queue(maxsize=200)
    # the following attribute can call in anywhere
    process = [Process(target=func1, args=(queue, num)) for num in range(ps)]
    [p.start() for p in process]
    [p.join() for p in process]
    return [queue.get() for p in process]


comm = MPI.COMM_WORLD
comm_rank = comm.Get_rank()
comm_size = comm.Get_size()
node_name = MPI.Get_processor_name()
# node_name = socket.gethostname()

# point-to-point communication
data_send = [comm_rank] * 1
comm.send(data_send, dest=(comm_rank + 1) % comm_size)
res = run_queue()
data_recv = comm.recv(source=(comm_rank - 1) % comm_size)
# print("my rank is %d, and I received:" % comm_rank, data_recv, file=sys.stdout, flush=True)
# print(data_recv)
with open("/home/share/xxxxxxxx/home/xxxxxx/xxxxxx/results/{}.txt".format(comm_rank), "w") as f:
    f.write("my rank is %d/%d, and node_name: %s I received:" % (comm_rank, comm_size, node_name)
            + str(data_recv) + str(res) + "\n")
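The send/recv calls in the test code form a ring: every rank sends to `(rank + 1) % size` and receives from `(rank - 1) % size`, so the communication never deadlocks on the wrap-around. A pure-Python sketch of that neighbor arithmetic:

```python
def ring_partners(rank, size):
    """Return (destination, source) for a rank in the ring pattern above."""
    return (rank + 1) % size, (rank - 1) % size

# For a 4-rank job: who each rank sends to and receives from.
for r in range(4):
    print(r, ring_partners(r, 4))
```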
Runtime error:
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
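This error means the installed wheel is a CPU-only build; it does not necessarily mean the GPU is absent. A hedged diagnostic sketch that distinguishes the two cases and degrades gracefully when torch is not importable at all:

```python
def torch_cuda_report():
    """Summarize the local torch/CUDA situation in a small dict."""
    report = {}
    try:
        import torch
    except ImportError:
        report["torch"] = None  # torch not installed at all
        return report
    report["torch"] = torch.__version__
    # None here means a CPU-only wheel was installed, which is exactly
    # the "Torch not compiled with CUDA enabled" failure seen above.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    return report

if __name__ == "__main__":
    print(torch_cuda_report())
```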
Monitoring during the run:
The job above failed, so try again.
Submit to the supercomputer:
/opt/batch/cli/bin/dsub -n task_test -A xxxxxxxxxxxx -eo error.txt -oo output.txt -R "gpu=1" /usr/bin/nvidia-smi
Thu Jul 6 12:11:58 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-PCI... On | 00000000:02:00.0 Off | 0 |
| N/A 29C P0 36W / 250W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
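The banner line above already answers a key question: the node runs driver 470.57.02 with CUDA 11.4, which is useful when deciding which CUDA-enabled builds to install. A small sketch for pulling those versions out of the banner programmatically (the banner string is copied from the output above):

```python
import re

def smi_versions(banner):
    """Extract (driver_version, cuda_version) from an nvidia-smi banner line."""
    m = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", banner)
    return m.groups() if m else None

line = "| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |"
print(smi_versions(line))  # → ('470.57.02', '11.4')
```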
Submit to the supercomputer again:
/opt/batch/cli/bin/dsub -n task_test -A xxxxxxxxxxxxxx -eo error.txt -oo output.txt -R "cpu=1" find / -name libcudnn*
Result:
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_adv_train.so
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_train_static_v8.a
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_ops_infer.so
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_adv_infer.so.8.2.0
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_infer.so
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_infer.so.8
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_ops_infer.so.8
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_ops_infer.so.8.2.0
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn.so.8.2.0
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_static.a
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_static_v8.a
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_infer.so.8.2.0
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_infer_static.a
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_adv_train.so.8
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_infer_static_v8.a
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn.so.8
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_train.so
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn.so
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_adv_infer.so
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_train.so.8
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_train.so.8.2.0
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_train_static.a
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_ops_train.so
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_ops_train.so.8
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_adv_train.so.8.2.0
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_ops_train.so.8.2.0
/home/HPCBase/PACKAGE/cuda-11.3/targets/sbsa-linux/lib/libcudnn_adv_infer.so.8
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_adv_train.so
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_train_static_v8.a
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_ops_infer.so
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_adv_infer.so.8.2.0
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_infer.so
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_infer.so.8
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_ops_infer.so.8
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_ops_infer.so.8.2.0
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn.so.8.2.0
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_static.a
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_static_v8.a
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_infer.so.8.2.0
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_infer_static.a
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_adv_train.so.8
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_infer_static_v8.a
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn.so.8
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_train.so
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn.so
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_adv_infer.so
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_train.so.8
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_train.so.8.2.0
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_cnn_train_static.a
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_ops_train.so
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_ops_train.so.8
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_adv_train.so.8.2.0
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_ops_train.so.8.2.0
/home/HPCBase/Application/GROMACS/cuda-11.3/targets/sbsa-linux/lib/libcudnn_adv_infer.so.8
From this listing we can tell that, if CUDA is to be used on this Huawei platform, it should be the arm64 SBSA build (note the sbsa-linux target directories):
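Since which wheel works depends on the CPU architecture, a quick runtime check can confirm this before installing anything. A generic sketch (not specific to this platform):

```python
import platform

def is_arm64():
    """True on aarch64/arm64 machines, where x86_64 wheels will not run."""
    return platform.machine().lower() in ("aarch64", "arm64")

# The sbsa-linux directories above suggest this prints "aarch64 True"
# on the compute nodes; the result naturally differs per machine.
print(platform.machine(), is_arm64())
```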
============================================================
Test again:
/opt/batch/cli/bin/dsub -n task_test -A xxxxxxxx -eo error.txt -oo output.txt -R "cpu=1" ls /usr/local/cuda/bin
Result:
bin2c
compute-sanitizer
crt
cudafe++
cuda-gdb
cuda-gdbserver
cuda-install-samples-11.4.sh
cuda-uninstaller
cu++filt
cuobjdump
fatbinary
ncu
nsight-sys
nsys
nsys-exporter
nsys-ui
nvcc
nvcc.profile
nvdisasm
nvlink
nv-nsight-cu-cli
nvprune
ptxas
Add the following to the environment variables (appended to .bashrc on the job submission host):
export PATH=/usr/local/cuda-11.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
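Whether those .bashrc additions actually reach a job running on the compute host can also be checked from Python. A sketch (matching on the substring "cuda" is a heuristic, not a guarantee that the right version is found):

```python
import os
import shutil

def cuda_env_report():
    """Report whether CUDA directories are visible in the current process."""
    path = os.environ.get("PATH", "").split(os.pathsep)
    ld = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return {
        "cuda_on_path": any("cuda" in p for p in path),
        "cuda_on_ld_library_path": any("cuda" in p for p in ld),
        "nvcc": shutil.which("nvcc"),  # None if nvcc is not reachable
    }

print(cuda_env_report())
```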
Use a submitted job to inspect the environment variables on the compute (slave) host.
The resulting PATH variable:
/home/HPCBase/HMPI/hmpi/bin:/home/HPCBase/HMPI/hmpi/bin:/usr/local/cuda-11.4/bin:/home/share/xxxxxxxxxxx/home/xxxxxxxx/anaconda3/bin:/home/share/xxxxxxxxxx/home/xxxxxxxx/anaconda3/condabin:/home/share/xxxxxxxxxxxxx/home/xxxxxxx/.local/bin:/home/share/xxxxxxxxxxx/home/xxxxxxx/bin:/opt/batch/cli/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
The LD_LIBRARY_PATH variable:
/home/HPCBase/HMPI/hmpi/lib:/home/HPCBase/HMPI/hmpi/lib:/usr/local/cuda-11.4/lib64:
We can break down the PATH variable as follows.
Original entries from the job submission host:
/home/share/xxxxxxxxxxxxx/home/xxxxxxx/.local/bin:/home/share/xxxxxxxxxxx/home/xxxxxxx/bin:/opt/batch/cli/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
Entries set by conda:
/home/share/xxxxxxxxxxx/home/xxxxxxxx/anaconda3/bin:/home/share/xxxxxxxxxx/home/xxxxxxxx/anaconda3/condabin:
Entries set in .bashrc:
/usr/local/cuda-11.4/bin
Entries added by the supercomputer's compute host:
/home/HPCBase/HMPI/hmpi/bin:/home/HPCBase/HMPI/hmpi/bin
And in the LD_LIBRARY_PATH variable:
Entries set in .bashrc:
/usr/local/cuda-11.4/lib64:
Entries added by the supercomputer's compute host:
/home/HPCBase/HMPI/hmpi/lib:/home/HPCBase/HMPI/hmpi/lib:
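Note that the scheduler prepends `/home/HPCBase/HMPI/hmpi/bin` and `/home/HPCBase/HMPI/hmpi/lib` twice. Duplicate entries are harmless but noisy; an order-preserving de-duplication helper, as a generic sketch:

```python
import os

def dedup_path(value):
    """Collapse duplicate PATH-style entries, keeping first positions."""
    seen = []
    for entry in value.split(os.pathsep):
        if entry and entry not in seen:
            seen.append(entry)
    return os.pathsep.join(seen)

# On Linux this keeps only one copy of the duplicated HMPI entry.
print(dedup_path("/home/HPCBase/HMPI/hmpi/bin:/home/HPCBase/HMPI/hmpi/bin:/usr/local/cuda-11.4/bin"))
```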
=============================================================
From the environment variables printed on the compute host, we can guess that the initial code failed because the CUDA path was not configured there. After reconfiguring LD_LIBRARY_PATH as above, install CuPy on the job submission host to test whether the configuration works. Based on the earlier probe of the CUDA install path, the compute host appears to run CUDA 11.4, so install the CuPy build matching that version and run again:
import mpi4py.MPI as MPI
import sys
import os
import socket
import numpy as np
import cupy as cp

comm = MPI.COMM_WORLD
comm_rank = comm.Get_rank()
comm_size = comm.Get_size()
node_name = MPI.Get_processor_name()
# node_name = socket.gethostname()

# point-to-point communication
data_send = [comm_rank] * 1
comm.send(data_send, dest=(comm_rank + 1) % comm_size)
# res = run_queue()

print(os.environ['PATH'])
print(os.environ['LD_LIBRARY_PATH'])
# print(os.environ['CUDA_HOME'])

arr = cp.array([1, 2, 3, 4, 5])
arr += 10
print(arr)
print(type(arr))

data_recv = comm.recv(source=(comm_rank - 1) % comm_size)
# print("my rank is %d, and I received:" % comm_rank, data_recv, file=sys.stdout, flush=True)
# print(data_recv)
res = 111
Result:
The run succeeds, which proves the configuration works, and with that, heterogeneous (CPU+GPU) computing is basically up and running on the Huawei supercomputing platform.
=============================================================
Official documentation:
https://support.huawei.com/enterprise/zh/doc/EDOC1100228705/d690fe77
posted on 2023-07-05 18:47 by Angry_Panda