CentOS TensorFlow: how to use the GPU
Command to continuously monitor GPU usage:
watch -n 10 nvidia-smi
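Meaning of the fields in the output:
Fan: fan speed as a percentage (0 to 100%) of the speed the system wants the fan to run at; if the GPU is not fan-cooled or the fan is broken, it shows N/A;
Temp: internal GPU temperature, in degrees Celsius;
Perf: performance state, from P0 to P12; P0 is maximum performance and P12 is minimum performance;
Pwr: power draw (current usage / power cap);
Bus-Id: the GPU's PCI bus ID;
Disp.A: Display Active, i.e. whether a display is initialized on the GPU;
Memory Usage: GPU memory used / total;
Volatile GPU-Util: current GPU utilization, which fluctuates over time;
Compute M.: compute mode;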
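Two of these fields are easy to misread: Memory Usage only says how much GPU memory is allocated (not how busy the GPU is, which is what Volatile GPU-Util reports), and Perf is a P-state label rather than a number. A minimal Python sketch for turning those strings into numbers, assuming the "usedMiB / totalMiB" format of the default nvidia-smi table; the function names and sample values are hypothetical:

```python
import re

# Matches the "used / total" text of the Memory-Usage column, e.g. "1024MiB / 11178MiB".
# Assumes both values are reported in MiB, as in the default nvidia-smi table.
_MEM_RE = re.compile(r"^\s*(\d+)\s*MiB\s*/\s*(\d+)\s*MiB\s*$")


def parse_memory_usage(text):
    """Return (used_mib, total_mib) parsed from a Memory-Usage string."""
    match = _MEM_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised Memory-Usage value: {text!r}")
    return int(match.group(1)), int(match.group(2))


def memory_percent(text):
    """Share of GPU memory in use, in percent, rounded to one decimal place."""
    used, total = parse_memory_usage(text)
    return round(100.0 * used / total, 1)


def perf_level(pstate):
    """Convert a Perf label such as "P0" or "P8" to an int (0 = maximum performance)."""
    match = re.fullmatch(r"P(\d{1,2})", pstate.strip())
    if match is None or not 0 <= int(match.group(1)) <= 12:
        raise ValueError(f"unrecognised P-state: {pstate!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # Hypothetical values copied by hand from an nvidia-smi table.
    print(parse_memory_usage("5589MiB / 11178MiB"))  # (5589, 11178)
    print(memory_percent("5589MiB / 11178MiB"))      # 50.0
    print(perf_level("P2"))                          # 2
```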
To refresh every 5 seconds instead:
watch -n 5 nvidia-smi
The number after -n is the interval between runs, in seconds.
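If watch is not available, or the readings should be collected by a script rather than just displayed, the same polling loop can be written in Python. A minimal sketch, assuming nvidia-smi is on PATH; poll_command is a hypothetical helper, and the command and sleep function are parameters so the loop can be tried on a machine without a GPU:

```python
import shutil
import subprocess
import sys
import time


def poll_command(cmd, interval_s=5.0, count=3, sleep=time.sleep):
    """Run `cmd` `count` times, `interval_s` seconds apart, like `watch -n`.

    Returns the captured stdout of every run. Unlike `watch`, it stops after
    `count` runs instead of looping until interrupted.
    """
    outputs = []
    for i in range(count):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs.append(result.stdout)
        if i < count - 1:
            sleep(interval_s)  # no wait after the last run
    return outputs


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH", file=sys.stderr)
    else:
        # Roughly `watch -n 5 nvidia-smi`, limited to three snapshots.
        for snapshot in poll_command(["nvidia-smi"], interval_s=5, count=3):
            print(snapshot)
```

Returning the snapshots instead of redrawing the screen makes it easy to write them to a log file for later inspection.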
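Select which GPU TensorFlow may use:
import os
# These must be set before tensorflow is imported, or they have no effect.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # number GPUs in PCI bus order, the same order nvidia-smi uses
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # use the first GPU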
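When the GPU choice should not be hard-coded in the script itself, the same two variables can instead be set in the environment of a child process. A minimal sketch; gpu_env and train.py are hypothetical names:

```python
import os


def gpu_env(gpu_indices, base=None):
    """Return a copy of the environment that exposes only the given GPUs.

    CUDA_DEVICE_ORDER=PCI_BUS_ID makes the indices follow PCI bus order (as in
    nvidia-smi); an empty list hides every GPU, so TensorFlow falls back to CPU.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_indices)
    return env


# Usage (hypothetical script name):
#   subprocess.run(["python", "train.py"], env=gpu_env([0]), check=True)
```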
List all local devices that TensorFlow can see:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
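list_local_devices() returns one record per device, each with name, device_type and memory_limit (in bytes) fields, and the full printout is long. A minimal sketch that condenses it to one line per device. Note that tensorflow.python.client is an internal module and may change between releases; returning an empty list when TensorFlow is not installed is an assumption made so the sketch runs anywhere:

```python
def summarize_local_devices():
    """Return (name, device_type, memory_limit_mib) for each device TensorFlow sees."""
    try:
        from tensorflow.python.client import device_lib  # internal, unstable module
    except ImportError:
        return []  # TensorFlow is not installed
    return [
        (d.name, d.device_type, d.memory_limit // (1024 * 1024))
        for d in device_lib.list_local_devices()
    ]


if __name__ == "__main__":
    for name, kind, mib in summarize_local_devices():
        print(f"{name:<30} {kind:<5} {mib} MiB")
```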
Count the GPUs and CPUs:
import tensorflow as tf
# list the physical GPU and CPU devices
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
cpus = tf.config.experimental.list_physical_devices(device_type='CPU')
print(gpus, cpus)
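In TensorFlow 2.1 and later the same listing is also available without "experimental", as tf.config.list_physical_devices. By default TensorFlow reserves most of each visible GPU's memory when it starts; a commonly used option is to enable memory growth so memory is allocated on demand. A minimal sketch, assuming TensorFlow 2.1 or later; configure_gpus is a hypothetical helper, and it has to run before any operation touches the GPU, otherwise TensorFlow raises RuntimeError:

```python
def configure_gpus(memory_growth=True):
    """List GPUs and optionally enable on-demand memory allocation.

    Returns the number of GPUs TensorFlow can see (0 if TensorFlow is missing).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            # Grow GPU memory as needed instead of reserving it all up front.
            tf.config.experimental.set_memory_growth(gpu, memory_growth)
        except RuntimeError as err:
            # Raised when the GPU was already initialized by an earlier operation.
            print(f"could not change memory growth for {gpu.name}: {err}")
    return len(gpus)


if __name__ == "__main__":
    print("GPUs visible to TensorFlow:", configure_gpus())
```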
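Running TensorFlow 1.x code that calls tf.Session() under TensorFlow 2.x fails with:
AttributeError: module 'tensorflow' has no attribute 'Session'
TensorFlow 2.x removed tf.Session from the top-level API; use tf.compat.v1.Session() instead, or rewrite the code to run eagerly.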
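One way to keep 1.x-style code running under TensorFlow 2.x is to build the graph explicitly and run it through tf.compat.v1.Session. A minimal sketch, assuming TensorFlow 2.x (add_with_session is a hypothetical name, and it returns None when TensorFlow is not installed); log_device_placement=True additionally logs whether each operation was placed on the GPU:

```python
def add_with_session(x, y, log_placement=False):
    """Compute x + y the TensorFlow 1.x way, through tf.compat.v1.Session."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed
    tf1 = tf.compat.v1
    graph = tf1.Graph()
    with graph.as_default():  # ops created here are graph ops, not eager tensors
        total = tf1.constant(x) + tf1.constant(y)
    config = tf1.ConfigProto(log_device_placement=log_placement)
    with tf1.Session(graph=graph, config=config) as sess:
        return sess.run(total)


if __name__ == "__main__":
    print(add_with_session(2, 3, log_placement=True))
```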
Viewing GPU information on CentOS
# yum install pciutils lshw -y
# lspci | grep -E "VGA|NVIDIA"
04:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
04:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
# lspci -v -s 04:00.0
04:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Dell Device 3600
Flags: bus master, fast devsel, latency 0, IRQ 63, NUMA node 0
Memory at a3000000 (32-bit, non-prefetchable) [size=16M]
Memory at 90000000 (64-bit, prefetchable) [size=256M]
Memory at a0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 2000 [size=128]
[virtual] Expansion ROM at a4080000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] #19
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
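The "Kernel driver in use: nvidia" line confirms that the proprietary NVIDIA driver is bound to the card; "Kernel modules" only lists modules that are able to drive it, which is why nouveau also appears there. To check this from a script, a minimal sketch that parses lspci text (it assumes pciutils is installed to produce that text; the function names are hypothetical, and SAMPLE is the output shown above):

```python
import re

SAMPLE = """04:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
04:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)"""


def nvidia_devices(lspci_text):
    """Return (slot, description) for each NVIDIA device in plain `lspci` output."""
    devices = []
    for line in lspci_text.splitlines():
        slot, _, desc = line.partition(" ")
        if "NVIDIA" in desc:
            devices.append((slot, desc))
    return devices


def kernel_driver(lspci_v_text):
    """Return the value of 'Kernel driver in use:' from `lspci -v` output, or None."""
    match = re.search(r"Kernel driver in use:\s*(\S+)", lspci_v_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    for slot, desc in nvidia_devices(SAMPLE):
        print(slot, "->", desc)
```

In practice the text would come from subprocess.run(["lspci", "-v", "-s", "04:00.0"], capture_output=True, text=True).stdout.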
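Common GPU management commands
1. List all available NVIDIA devices
nvidia-smi -L
2. List the index, name, UUID and serial number of each GPU
nvidia-smi --query-gpu=index,name,uuid,serial --format=csv
3. Query detailed information for one GPU (specified by its index)
nvidia-smi -i 0 -q
4. Monitor overall GPU usage at a 1-second update interval
nvidia-smi dmon
5. Monitor per-process GPU usage at a 1-second update interval
nvidia-smi pmon
6. Set persistence mode with -pm: 0 = disabled, 1 = enabled
nvidia-smi -pm 1
7. Toggle ECC support with -e: 0 = disabled, 1 = enabled (takes effect after a reboot)
nvidia-smi -e 1
8. Reset a GPU with -r (0 is the GPU index; requires root, and no process may be using the GPU)
nvidia-smi -r -i 0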
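The --query-gpu ... --format=csv form is the easiest one to consume from a script. A minimal sketch that runs it and returns one dict per GPU, assuming nvidia-smi is on PATH; query_gpus and run_nvidia_smi are hypothetical helpers, and the run parameter exists only so the parsing can be tried without a GPU:

```python
import csv
import io
import subprocess

FIELDS = ("index", "name", "uuid", "serial")


def run_nvidia_smi(args):
    """Run nvidia-smi with the given arguments and return its stdout."""
    return subprocess.run(["nvidia-smi", *args], capture_output=True,
                          text=True, check=True).stdout


def query_gpus(fields=FIELDS, run=run_nvidia_smi):
    """Return one {field: value} dict per GPU from `nvidia-smi --query-gpu`."""
    out = run([f"--query-gpu={','.join(fields)}", "--format=csv,noheader"])
    # nvidia-smi separates columns with ", ", so skip the space after each comma.
    reader = csv.reader(io.StringIO(out), skipinitialspace=True)
    return [dict(zip(fields, row)) for row in reader if row]
```

Usage on a GPU machine: for gpu in query_gpus(): print(gpu["index"], gpu["name"]).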
Check whether CUDA is installed (nvcc -V prints the CUDA toolkit version; cuDNN has to be checked separately):
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
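nvcc -V only reports the CUDA toolkit. The cuDNN version is recorded in #define CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL lines in cudnn.h (cuDNN 8 moved them to cudnn_version.h). A minimal sketch that extracts both versions from text; the function names are hypothetical, and /usr/local/cuda/include is a common install location but an assumption here:

```python
import re
from pathlib import Path


def cuda_version(nvcc_output):
    """Return (major, minor) from `nvcc -V` output, e.g. (10, 0) for 'release 10.0'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' in nvcc output")
    return int(match.group(1)), int(match.group(2))


def cudnn_version(header_text):
    """Return (major, minor, patch) from the #define lines of a cuDNN header."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{key}\s+(\d+)", header_text)
        if match is None:
            raise ValueError(f"{key} not found")
        parts.append(int(match.group(1)))
    return tuple(parts)


def find_cudnn_header(include_dir="/usr/local/cuda/include"):
    """Return the first header that carries the cuDNN version, or None (path is an assumption)."""
    for name in ("cudnn_version.h", "cudnn.h"):
        path = Path(include_dir) / name
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    header = find_cudnn_header()
    if header is not None:
        print("cuDNN", cudnn_version(header.read_text()))
```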
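The NVIDIA GeForce GTX 1080 Ti shown in the lspci output above is based on the 16 nm GP102 core, with 11 GB of GDDR5X memory on a 352-bit bus and 3584 stream processors (CUDA cores).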