GPU passthrough setup for Nvidia V100 (Part I)
This guide is based on a Tesla V100 and covers GPU passthrough for compute workloads only. It has two parts: host setup (this part) and guest setup.
Make sure you are using an Nvidia Tesla product, which here means the Maxwell, Pascal, or Volta generation. We do not have a hardware support matrix from Nvidia yet.
Also make sure the host has an extra display card, or at least SSH access, because a GPU bound to vfio-pci is no longer available to the host.
1. Host environment verification
1.1 Make sure the host runs SLES 12 SP3 or later
baird:~/:[0]# cat /etc/issue
Welcome to SUSE Linux Enterprise Server 15 (x86_64) - Kernel \r (\l).
1.2 Make sure the host supports VT-d and that it is enabled in the BIOS:
baird:~/:[0]# dmesg | grep -e "Directed I/O"
[ 12.819760] DMAR: Intel(R) Virtualization Technology for Directed I/O
1.3 Make sure the host has an extra GPU or VGA card:
baird:~/:[0]# lspci | grep -i "vga"
07:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05)
baird:~/:[0]# lspci | grep -i nvidia
03:00.0 3D controller: NVIDIA Corporation GV100 [Tesla V100 PCIe] (rev a1)
2. Enable IOMMU
vim /etc/default/grub
# Make this line look like this
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt rd.driver.pre=vfio-pci"
grub2-mkconfig -o /boot/grub2/grub.cfg
After rebooting, verify that the IOMMU is enabled:
dmesg | grep -e DMAR -e IOMMU
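As a complement to the dmesg check, the sketch below confirms that the parameters from /etc/default/grub actually reached the running kernel. The helper name has_kernel_param is hypothetical; it reads /proc/cmdline by default and accepts another file as a second argument, which is only there to make it easy to try out.

```shell
#!/bin/sh
# has_kernel_param PARAM [FILE]
# Succeeds if PARAM appears as a whole token on the kernel command line.
# FILE defaults to /proc/cmdline; the helper name is illustrative only.
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    [ -r "$file" ] || return 1
    # Put each token on its own line, then require an exact full-line match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Report on the two IOMMU parameters set in step 2.
for p in intel_iommu=on iommu=pt; do
    if has_kernel_param "$p"; then
        echo "$p: present"
    else
        echo "$p: MISSING"
    fi
done
```

If either parameter is reported as missing after a reboot, the usual suspects are a grub.cfg that was not regenerated or a host that booted a different boot entry.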
3. Add nouveau to the blacklist
baird:~/:[0]# vim /etc/modprobe.d/50-blacklist.conf
Add the line "blacklist nouveau" to this file.
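For scripted setups, the same edit can be made without opening an editor. This is a minimal sketch; ensure_line is a hypothetical helper that appends a line only when it is not already present, so running it twice does not duplicate the entry.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Appends LINE to FILE unless an identical line is already there.
# The helper name is illustrative only.
ensure_line() {
    file=$1
    line=$2
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0    # already present, nothing to do
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example (run as root on the host):
# ensure_line /etc/modprobe.d/50-blacklist.conf "blacklist nouveau"
```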
4. Set up VFIO and isolate the GPU used for passthrough
Add a file under /etc/modprobe.d:
baird:~/:[0]# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1db4
10de:1db4 is the vendor ID and device ID of the GPU; lspci -nn reports these values:
baird:~/:[0]# lspci -nn | grep 03:00.0
03:00.0 3D controller [0302]: NVIDIA Corporation GV100 [Tesla V100 PCIe] [10de:1db4] (rev a1)
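To avoid copying the IDs by hand, the vfio.conf line can be derived from the lspci -nn output. The sketch below assumes GNU grep (for -o) and uses a hypothetical helper name, vfio_ids_from_lspci; it picks out every [vvvv:dddd] pair on stdin and joins them with commas.

```shell
#!/bin/sh
# vfio_ids_from_lspci
# Reads `lspci -nn` output on stdin and prints an "options vfio-pci ids=..."
# line built from every [vendor:device] pair found. Class codes such as
# [0302] contain no colon and are therefore ignored.
# The helper name is illustrative only.
vfio_ids_from_lspci() {
    ids=$(grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tr -d '[]' | sort -u | paste -sd, -)
    [ -n "$ids" ] || return 1
    printf 'options vfio-pci ids=%s\n' "$ids"
}

# Example:
# lspci -nn -s 03:00.0 | vfio_ids_from_lspci
```

Fed the lspci -nn line shown above, this prints "options vfio-pci ids=10de:1db4", which can then be redirected into /etc/modprobe.d/vfio.conf.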
5. Load the VFIO driver
baird:~/:[0]# modprobe vfio-pci
or add the VFIO modules to your initrd so they load early at boot:
baird:~/:[0]# cat /etc/dracut.conf.d/gpu-passthrough.conf
add_drivers+=" vfio vfio_iommu_type1 vfio_pci vfio_virqfd "
dracut --force /boot/initrd $(uname -r)
6. Reboot the host, then check that the GPU is isolated in its own IOMMU group and that the vfio-pci driver is in use
find /sys/kernel/iommu_groups/*/devices/*
/sys/kernel/iommu_groups/47/devices/0000:03:00.0
/sys/kernel/iommu_groups/49/devices/0000:07:00.0
lspci -k
03:00.0 3D controller: NVIDIA Corporation GV100 [Tesla V100 PCIe] (rev a1)
Subsystem: NVIDIA Corporation Device 1214
Kernel driver in use: vfio-pci
Kernel modules: nouveau
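Both checks can also be read directly from sysfs for a single device. The following is a minimal sketch; pci_passthrough_status is a hypothetical helper, and its optional second argument (an alternative sysfs root) exists only so the function can be exercised without real hardware.

```shell
#!/bin/sh
# pci_passthrough_status ADDR [SYSFS_ROOT]
# Prints the IOMMU group and bound kernel driver of a PCI device by
# following the iommu_group and driver symlinks in sysfs.
# ADDR is the full address with domain, e.g. 0000:03:00.0.
# The helper name is illustrative only.
pci_passthrough_status() {
    addr=$1
    dev=${2:-/sys}/bus/pci/devices/$addr
    [ -e "$dev" ] || { echo "no such device: $addr" >&2; return 1; }
    group=none
    driver=none
    # Each symlink points at a directory whose name is the value we want.
    [ -L "$dev/iommu_group" ] && group=$(basename "$(readlink "$dev/iommu_group")")
    [ -L "$dev/driver" ] && driver=$(basename "$(readlink "$dev/driver")")
    echo "$addr group=$group driver=$driver"
}

# Example:
# pci_passthrough_status 0000:03:00.0
```

On a host configured as above, the example should report group=47 and driver=vfio-pci for the V100. A driver of "none" or "nouveau" suggests that the blacklist or the vfio.conf IDs did not take effect.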