GpuArrayException: No cuda device available (troubleshooting attempts)

Problem:

Running import keras or import theano produces the following error:

>>> import keras
Using Theano backend.
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 227, in <module>
    use(config.device)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 99, in init_dev
    **args)
  File "pygpu/gpuarray.pyx", line 658, in pygpu.gpuarray.init
  File "pygpu/gpuarray.pyx", line 587, in pygpu.gpuarray.pygpu_init
GpuArrayException: No cuda device available

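I ran pip uninstall theano and reinstalled with conda install theano, which produced even stranger errors. Searching suggested they came from an incompatibility between theano 1.0.4 and numpy 1.16.0, so I uninstalled it again.

After reinstalling with pip install theano, the same error still appears: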

>>> import theano
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 227, in <module>
    use(config.device)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 99, in init_dev
    **args)
  File "pygpu/gpuarray.pyx", line 658, in pygpu.gpuarray.init
  File "pygpu/gpuarray.pyx", line 587, in pygpu.gpuarray.pygpu_init
GpuArrayException: No cuda device available

The rest of the configuration is as follows:

[global]
floatX = float32
device =cuda
[cuda]
root=/usr/local/cuda-8.0

The lines above are the .theanorc file.
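To double-check which values would be read from this file, the settings can be parsed independently of Theano. The following is a minimal sketch using Python 3's standard configparser (the post itself runs Python 2.7); the inline string mirrors the file above, and read_theanorc is a hypothetical helper name, not part of Theano.

```python
import configparser

# Contents mirror the .theanorc shown above.
THEANORC_TEXT = """\
[global]
floatX = float32
device =cuda
[cuda]
root=/usr/local/cuda-8.0
"""

def read_theanorc(text):
    """Return (device, floatX, cuda_root) parsed from .theanorc-style text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as floatX case-sensitive
    parser.read_string(text)
    device = parser.get("global", "device")
    float_x = parser.get("global", "floatX")
    cuda_root = parser.get("cuda", "root", fallback=None)
    return device, float_x, cuda_root

print(read_theanorc(THEANORC_TEXT))
```

To inspect the real file, the text of ~/.theanorc can be passed in instead of the inline string. Note that Theano also reads the THEANO_FLAGS environment variable, which takes precedence over .theanorc, so it is worth checking that it is not set to something unexpected.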
echo $PATH
/data_d/old_home/home/.conda/envs/bin:/usr/local/cuda-8.0/bin:/data_d/public/miniconda2/bin:/usr/local/cuda-9.0/bin:/usr/local/sbin:
/usr/local/bin:/usr/sbin:/usr/bin:/s:/usr/local/cuda-8.0/bin/local/games:/snap/bin:/usr/local/cuda-8.0/bin
CUDA_VISIBLE_DEVICES=1
CUDA_HOME=/usr/local/cuda-8.0
PATH="$PATH:/usr/local/cuda-8.0/bin"
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64"

The CUDA-related lines above are from the .bashrc file.
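The PATH output above contains both cuda-8.0 and cuda-9.0 directories, some of them more than once. A small sketch (Python 3; cuda_dirs_on_path is a hypothetical helper name) that lists the distinct CUDA toolkit directories in lookup order may make it easier to see which toolkit is found first:

```python
import re

def cuda_dirs_on_path(path_value):
    """Return the distinct PATH entries that mention a cuda-X.Y directory, in lookup order."""
    seen = []
    for entry in path_value.split(":"):
        if re.search(r"cuda-\d+\.\d+", entry) and entry not in seen:
            seen.append(entry)
    return seen

# Shortened from the echo $PATH output above.
example = (
    "/usr/local/cuda-8.0/bin:/data_d/public/miniconda2/bin:"
    "/usr/local/cuda-9.0/bin:/usr/bin:/usr/local/cuda-8.0/bin"
)
print(cuda_dirs_on_path(example))
```

On the real machine the function could be called with os.environ["PATH"]. In the shortened example the cuda-8.0 entry comes first, which matches the root setting in .theanorc.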
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR      6
#define CUDNN_MINOR      0
#define CUDNN_PATCHLEVEL 21
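The three defines above correspond to cuDNN 6.0.21. As an illustration only, the same version string can be assembled from the header text with a short Python 3 sketch; cudnn_version is a hypothetical helper name and the sample string is copied from the grep output above.

```python
import re

def cudnn_version(header_text):
    """Return the cuDNN version as 'major.minor.patch' from cudnn.h-style text, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", header_text)
        if match is None:
            return None  # a define is missing, so no version can be reported
        parts.append(match.group(1))
    return ".".join(parts)

sample = """\
#define CUDNN_MAJOR      6
#define CUDNN_MINOR      0
#define CUDNN_PATCHLEVEL 21
"""
print(cudnn_version(sample))
```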

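The theano version in use is 1.0.4, with pygpu 0.7.6.

Could the owner of the cuda-8.0 folder have been changed? That turned out not to be the cause.

Running a test program prints the same error, although training then continues and finishes: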

Using Theano backend.
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 227, in <module>
    use(config.device)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 99, in init_dev
    **args)
  File "pygpu/gpuarray.pyx", line 658, in pygpu.gpuarray.init
  File "pygpu/gpuarray.pyx", line 587, in pygpu.gpuarray.pygpu_init
GpuArrayException: No cuda device available
Training -----------
('train cost: ', array(4.1908903, dtype=float32))
('train cost: ', array(0.10415509, dtype=float32))
('train cost: ', array(0.01151281, dtype=float32))
('train cost: ', array(0.00458441, dtype=float32))

Testing ------------
40/40 [==============================] - 0s 5us/step
('test cost:', 0.005374030210077763)
('Weights=', array([[0.56634265]], dtype=float32), '\nbiases=', array([2.001063], dtype=float32))

Attempt 1:

I changed device to cuda0 in the config file. The result of import theano was then:

[global]
floatX = float32
device =cuda0
[cuda]
root=/usr/local/cuda-8.0
>>> import theano
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 227, in <module>
    use(config.device)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 99, in init_dev
    **args)
  File "pygpu/gpuarray.pyx", line 658, in pygpu.gpuarray.init
  File "pygpu/gpuarray.pyx", line 587, in pygpu.gpuarray.pygpu_init
GpuArrayException: GPU is too old for CUDA version

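According to https://blog.csdn.net/qq_33200967/article/details/80689543, the first thing to check is whether CUDA was installed successfully. Running make directly failed with errors (see https://devtalk.nvidia.com/default/topic/1048902/cuda-setup-and-installation/cuda-samples-ubuntu-make-file-errors/), so I used sudo make -k instead. The resulting deviceQuery output was: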

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: ""
  CUDA Driver Version / Runtime Version          9.0 / 8.0
  CUDA Capability Major/Minor version number:    2.1
  Total amount of global memory:                 963 MBytes (1010040832 bytes)
  ( 1) Multiprocessors, ( 48) CUDA Cores/MP:     48 CUDA Cores
  GPU Max Clock rate:                            1046 MHz (1.05 GHz)
  Memory Clock rate:                             875 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 65536 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = NVS 315
Result = PASS
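The most relevant values in this output are the driver and runtime CUDA versions (9.0 and 8.0) and the compute capability (2.1). NVIDIA's CUDA 9.0 release removed support for compute capability 2.x (Fermi) devices, which may be related to the "GPU is too old for CUDA version" message, though nothing in this post confirms that. A minimal Python 3 sketch for pulling these values out of deviceQuery text follows; parse_device_query is a hypothetical helper name, and the 3.0 threshold is an illustrative assumption rather than a documented requirement of pygpu.

```python
import re

def parse_device_query(output):
    """Extract driver/runtime CUDA versions and compute capability from deviceQuery text."""
    versions = re.search(
        r"CUDA Driver Version / Runtime Version\s+(\d+\.\d+)\s*/\s*(\d+\.\d+)", output)
    capability = re.search(
        r"CUDA Capability Major/Minor version number:\s+(\d+)\.(\d+)", output)
    if versions is None or capability is None:
        return None
    return {
        "driver_cuda": versions.group(1),
        "runtime_cuda": versions.group(2),
        "compute_capability": (int(capability.group(1)), int(capability.group(2))),
    }

# Two lines copied from the deviceQuery output above.
sample = """\
  CUDA Driver Version / Runtime Version          9.0 / 8.0
  CUDA Capability Major/Minor version number:    2.1
"""
info = parse_device_query(sample)
print(info)

# Assumption: 3.0 is only an illustrative threshold; check the requirements
# of the toolkit and library versions that are actually installed.
MIN_CAPABILITY = (3, 0)
print(info["compute_capability"] >= MIN_CAPABILITY)
```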

Check the NVIDIA driver version (see https://blog.csdn.net/s_sunnyy/article/details/64121826):

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.130  Wed Mar 21 03:37:26 PDT 2018
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10) 

List the NVIDIA devices on this machine:

:/dev$ ls -l nvidia*
crw-rw-rw- 1 root root 195,   0 5月  17 12:53 nvidia0
crw-rw-rw- 1 root root 195,   1 5月  17 12:53 nvidia1
crw-rw-rw- 1 root root 195, 255 5月  17 12:53 nvidiactl
crw-rw-rw- 1 root root 195, 254 5月  17 12:53 nvidia-modeset
crw-rw-rw- 1 root root 240,   0 5月  17 12:53 nvidia-uvm
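The listing shows both nvidia0 and nvidia1, and the .bashrc lines earlier set CUDA_VISIBLE_DEVICES=1. Assuming that variable actually reaches the Python process, CUDA hides every device except the one it enumerates as index 1 and renumbers it as device 0, so device =cuda0 in .theanorc would refer to that device. The renumbering can be sketched as follows (Python 3; visible_device_map is a hypothetical helper, and it only handles integer indices, not GPU UUIDs):

```python
def visible_device_map(cuda_visible_devices):
    """Map the index a CUDA program sees to the index named in CUDA_VISIBLE_DEVICES."""
    mapping = {}
    for logical, physical in enumerate(cuda_visible_devices.split(",")):
        physical = physical.strip()
        if not physical.isdigit():
            break  # CUDA ignores an invalid entry and everything after it
        mapping[logical] = int(physical)
    return mapping

print(visible_device_map("1"))    # value used in the .bashrc above
print(visible_device_map("0,1"))  # both devices visible, unchanged order
```

The indices on the right-hand side follow CUDA's own enumeration order, which does not necessarily match the /dev/nvidiaN numbering.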

Check the cudnn version with conda list -n username:

cudatoolkit               10.0.130                      0  
cudnn                     7.3.1                cuda10.0_0  

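These versions seem too high (see https://blog.csdn.net/li57681522/article/details/82491617).

The installed cudatoolkit and cudnn packages are both for CUDA 10.0, but CUDA 10.0 was never actually installed on this machine.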
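Following the reasoning here, that the conda packages should target the same CUDA version as the system toolkit (8.0, per the root setting in .theanorc), the mismatch can be checked mechanically. This is a rough Python 3 sketch; conda_cuda_versions and matches_system_cuda are hypothetical helper names, and the sample text is copied from the conda list output above.

```python
import re

def conda_cuda_versions(conda_list_text):
    """Return the CUDA major.minor that cudatoolkit provides and that cudnn was built for."""
    found = {"cudatoolkit": None, "cudnn_built_for": None}
    for line in conda_list_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        name, version, build = fields[0], fields[1], fields[2]
        if name == "cudatoolkit":
            # Keep only major.minor, e.g. 10.0.130 -> 10.0
            found["cudatoolkit"] = ".".join(version.split(".")[:2])
        elif name == "cudnn":
            match = re.match(r"cuda(\d+\.\d+)", build)
            if match:
                found["cudnn_built_for"] = match.group(1)
    return found

def matches_system_cuda(found, system_cuda):
    """True when both conda packages target the given system CUDA version."""
    return found["cudatoolkit"] == system_cuda and found["cudnn_built_for"] == system_cuda

sample = """\
cudatoolkit               10.0.130                      0
cudnn                     7.3.1                cuda10.0_0
"""
found = conda_cuda_versions(sample)
print(found)
print(matches_system_cuda(found, "8.0"))
```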

So I tried uninstalling them:

conda uninstall cudnn
Fetching package metadata ...........
Solving package specifications: .

Package plan for package removal in environment /data_d/old_home/home/.conda/envs:

The following packages will be REMOVED:

    cudnn: 7.3.1-cuda10.0_0

Proceed ([y]/n)? y
conda uninstall cudatoolkit
Fetching package metadata ...........
Solving package specifications: .

Package plan for package removal in environment /data_d/old_home/home/.conda/envs:

The following packages will be REMOVED:

    cudatoolkit: 10.0.130-0
    cupti:       10.0.130-0

Proceed ([y]/n)? y

Then I installed the versions that match CUDA 8.0:

conda install cudatoolkit=8.0
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /data_d/old_home/home/.conda/envs:

The following NEW packages will be INSTALLED:

    cudatoolkit: 8.0-3

Proceed ([y]/n)? y
conda install cudnn=6.0
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /data_d/old_home/home/.conda/env:

The following NEW packages will be INSTALLED:

    cudnn: 6.0.21-cuda8.0_0

Proceed ([y]/n)? y

Running conda list again now shows:

cudatoolkit               8.0                           3  
cudnn                     6.0.21                cuda8.0_0

The result is still the same error:

GpuArrayException: No cuda device available

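Next I tried reinstalling CUDA and the other packages in a new environment (see https://blog.csdn.net/lyy14011305/article/details/59500819).

While installing numpy, theano, and the other packages according to http://deeplearning.net/software/theano/install_ubuntu.html, the following problem appeared: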

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/__init__.py", line 156, in <module>
    import theano.gpuarray

...
AttributeError: ('The following error happened while compiling the node', DnnVersion(), '\n', "'module' object has no attribute '_get_ndarray_c_version'")

The fix suggested in https://github.com/pymc-devs/pymc3/issues/3340 is to upgrade theano to 1.0.4 (conda had installed 1.0.3), but the upgrade itself ran into a problem:

 conda install theano=1.0.4
Fetching package metadata ...........

PackageNotFoundError: Packages missing in current channels:
            
  - theano 1.0.4*

We have searched for the packages in the following channels:
            
  - https://repo.continuum.io/pkgs/main/linux-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/linux-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/linux-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/linux-64
  - https://repo.continuum.io/pkgs/pro/noarch

So I tried downgrading numpy to 1.15 instead:

conda install numpy=1.15
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /data_d/old_home/home/.conda/envs/xhs2:

The following NEW packages will be INSTALLED:

    mkl_fft:    1.0.12-py27ha843d7b_0
    numpy:      1.15.4-py27h7e9f1db_0

The following packages will be DOWNGRADED:

    numpy-base: 1.16.4-py27hde5b4d6_0 --> 1.15.4-py27hde5b4d6_0

Proceed ([y]/n)? y

The AttributeError above is gone, but the errors that follow are exactly the same as before. With device =cuda0 in .theanorc, the error is:

GpuArrayException: GPU is too old for CUDA version

With device =cuda, the error is:

GpuArrayException: No cuda device available
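Neither setting gets the GPU working. Since the test run earlier still trained after printing the error, one stopgap (not a fix for the GPU problem) is to select the CPU explicitly so that Theano does not attempt to initialize pygpu at all. A minimal Python 3 sketch, assuming the standard THEANO_FLAGS environment variable; force_theano_cpu is a hypothetical helper name:

```python
def force_theano_cpu(environ):
    """Rewrite THEANO_FLAGS in the given mapping so that device=cpu is selected."""
    current = environ.get("THEANO_FLAGS", "")
    # Keep every existing flag except a previous device= setting.
    flags = [f for f in current.split(",") if f and not f.startswith("device=")]
    flags.append("device=cpu")
    environ["THEANO_FLAGS"] = ",".join(flags)
    return environ["THEANO_FLAGS"]

# Demonstrated on a plain dict; pass os.environ to affect the real process.
print(force_theano_cpu({"THEANO_FLAGS": "floatX=float32,device=cuda0"}))
```

In a real script the call would be force_theano_cpu(os.environ), made before import theano or import keras, since the flags are expected to be read at import time.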

 

posted @ 2019-06-08 08:55  lypbendlf