FAILED: cpu_adam.so 

c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/home/deeplp/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so

/usr/bin/ld: cannot find -lcurand

collect2: error: ld returned 1 exit status

 

 

Compiling cpu_adam.so with ninja fails.

Error output:

Emitting ninja build file /home/deeplp/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...

Building extension module cpu_adam...

Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)

[1/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home//anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/TH -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/THC -isystem /home//anaconda3/envs/minicpm/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /home/deeplp/anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o 

[2/3] c++ -MMD -MF cpu_adam_impl.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home//anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/TH -isystem /home//anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/THC -isystem /home//anaconda3/envs/minicpm/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /home/deeplp/anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/cpu_adam_impl.cpp -o cpu_adam_impl.o 

[3/3] c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/home/deeplp/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so

FAILED: cpu_adam.so 

c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/home/deeplp/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so

/usr/bin/ld: cannot find -lcurand

collect2: error: ld returned 1 exit status

ninja: build stopped: subcommand failed.

Traceback (most recent call last):
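One detail in the log is worth spelling out. In the failing [3/3] link step, the only -L directory is torch/lib. The -L/usr/local/cuda/lib64 flag appears only in the [1/3] and [2/3] steps, and those compile with -c, so no linking happens there. The small Python sketch below is an illustration and not part of DeepSpeed. It pulls the -L and -l flags out of that link command to make this visible, and it handles only the attached -Lpath / -lname form that this log uses.

```python
import shlex

# The failing link step, copied from the [3/3] line of the log.
LINK_CMD = (
    "c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand "
    "-L/home/deeplp/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/lib "
    "-lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so"
)


def parse_link_flags(cmd: str) -> tuple[list[str], list[str]]:
    """Split a linker command into (-L search dirs, -l library names).

    Only the attached forms (-Lpath, -lname) are handled, which is all this log uses.
    """
    search_dirs: list[str] = []
    libs: list[str] = []
    for tok in shlex.split(cmd):
        if tok.startswith("-L"):
            search_dirs.append(tok[2:])
        elif tok.startswith("-l"):
            libs.append(tok[2:])
    return search_dirs, libs


if __name__ == "__main__":
    dirs, libs = parse_link_flags(LINK_CMD)
    print("-L dirs:", dirs)
    print("-l libs:", libs)
    # Empty list: nothing on the command line tells ld where the CUDA libraries live.
    print("CUDA dirs on -L:", [d for d in dirs if "cuda" in d])
```

No CUDA directory is on the linker's -L list. As a result, ld has to find libcurand in the directories it searches by default, which is roughly why the location of the CUDA libraries matters here.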

 

 

Explanation:

1. curand (cuRAND) is part of CUDA. CUDA was already installed, so nothing extra needs to be installed.

2. This CUDA installation has a bug: after installation, the lib directory is here:

/usr/local/cuda/targets/x86_64-linux/lib

Pay special attention to this. I manually created a symlink to it under /usr/local/cuda/lib.

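3. As for CUDA versions: in version 11.7, curand has no shared (dynamic) library, only a static one. The build succeeded only after I installed CUDA 12.3 and copied its library over.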
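To check which of these situations applies on a given machine, the Python sketch below can help. It is not from the original post. It lists the curand files in each directory the post mentions and separates shared libraries from static ones. It assumes the usual CUDA file names: libcurand.so* for the shared library and libcurand_static.a for the static one. It also relies on GNU ld's rule that -lcurand resolves only to libcurand.so or libcurand.a. The directory list is an assumption, so adapt it to your install.

```python
import glob
import os

# Directories named in the post; adjust to your own install (assumption).
CANDIDATE_DIRS = [
    "/usr/local/cuda/targets/x86_64-linux/lib",
    "/usr/local/cuda/lib",
    "/usr/local/cuda/lib64",
]


def classify_curand(lib_dir: str) -> dict[str, list[str]]:
    """List the curand files in lib_dir, split into shared and static libraries."""
    found: dict[str, list[str]] = {"shared": [], "static": []}
    if not os.path.isdir(lib_dir):
        return found
    pattern = os.path.join(glob.escape(lib_dir), "libcurand*")
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        if ".so" in name:          # libcurand.so, libcurand.so.10, ...
            found["shared"].append(name)
        elif name.endswith(".a"):  # e.g. libcurand_static.a
            found["static"].append(name)
    return found


def linkable_by_lcurand(lib_dir: str) -> bool:
    """True if `-lcurand` can resolve in lib_dir.

    GNU ld maps -lcurand to libcurand.so (preferred) or libcurand.a, so a
    versioned libcurand.so.N alone, or an archive named libcurand_static.a,
    is not enough.
    """
    return any(
        os.path.exists(os.path.join(lib_dir, name))
        for name in ("libcurand.so", "libcurand.a")
    )


if __name__ == "__main__":
    for d in CANDIDATE_DIRS:
        print(d, classify_curand(d), "linkable:", linkable_by_lcurand(d))
```

Suppose a directory contains only libcurand_static.a. In that case `linkable_by_lcurand` returns False for it, which is consistent with the 11.7 situation described in point 3.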

This is really one of CUDA's pitfalls...

posted on 2024-07-05 16:01  曾冠奇