FAILED: cpu_adam.so
c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/home/deeplp/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
/usr/bin/ld: cannot find -lcurand
collect2: error: ld returned 1 exit status
ninja 编译 adam.so 报错。
报错:
Emitting ninja build file /home/deeplp/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home//anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/TH -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/THC -isystem /home//anaconda3/envs/minicpm/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /home/deeplp/anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
[2/3] c++ -MMD -MF cpu_adam_impl.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home//anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/TH -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/THC -isystem /home//anaconda3/envs/minicpm/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /home/deeplp/anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/cpu_adam_impl.cpp -o cpu_adam_impl.o
[3/3] c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/home/deeplp/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
FAILED: cpu_adam.so
c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/home/deeplp/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
/usr/bin/ld: cannot find -lcurand
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
解释:
1、curand 是一个cuda的包。cuda 有安装好的。不需要另外安装。
2、现在这个 cuda 安装有bug。 安装好后,lib 的地址在这里
/usr/local/cuda/targets/x86_64-linux/lib
。需要特别注意.我给他手工建立了一个软连接到 /usr/local/cuda/lib下。
3、cuda 的版本中。11.7版本curand没有动态链接库,只有静态链接库。重新安装了一个 12.3版本的cuda,拷贝过来才搞定。
这个其实是cuda 的坑。。。