【verbs】ibv_get_device_name()|ibv_get_device_list()
Author: bandaoyu. Source: https://blog.csdn.net/bandaoyu/article/details/116539866
Prerequisites
ibv_fork_init() should be called before any other function in libibverbs.
ibv_get_device_name
The function ibv_get_device_list() returns an array of the RDMA devices that are currently available.
const char *ibv_get_device_name(struct ibv_device *device);
Description
The function returns the name associated with an RDMA device.
Notes
- The name is unique within a given machine (the same name is not assigned to any other device);
- The name is not unique across an InfiniBand fabric;
- When a machine has more than one RDMA device, changing a device's location in the machine (for example, its position on the bus) may change the associated name;
- To distinguish between devices, it is recommended to use the device GUID, which ibv_get_device_guid() returns.
Parameters (struct ibv_device *device)
One entry of the array returned by ibv_get_device_list(). (That function returns an array of the RDMA devices currently available.)
Return value (const char *)
The function returns a pointer to the device name, or NULL on error.
Name composition
- prefix - identifies the RDMA device vendor and family:
  - cxgb3 - Chelsio Communications, T3 RDMA family
  - cxgb4 - Chelsio Communications, T4 RDMA family
  - ehca - IBM, eHCA family
  - ipathverbs - QLogic
  - mlx4 - Mellanox Technologies, ConnectX family
  - mthca - Mellanox Technologies, InfiniHost family
  - nes - Intel, Intel-NE family
- index - a number that distinguishes devices from the same vendor within one machine
Examples
Examples of RDMA device names that ibv_get_device_name() may return:
- mlx4_0
- mlx4_1
- mthca0
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **device_list;
    int num_devices;
    int i;

    device_list = ibv_get_device_list(&num_devices);
    if (!device_list) {
        fprintf(stderr, "Error, ibv_get_device_list() failed\n");
        return -1;
    }

    printf("%d RDMA device(s) found:\n\n", num_devices);
    for (i = 0; i < num_devices; ++i)
        printf("RDMA device[%d]: name=%s\n", i,
               ibv_get_device_name(device_list[i]));

    ibv_free_device_list(device_list);
    return 0;
}
Script to obtain the device name, ib_port, GID index and MTU for Intel or Mellanox RDMA NICs
Usage: get-rdma-device-info eth0
#!/bin/bash

g_vendor=""
g_hexip=""
g_nic_name=""

MY_UUID=$(cat /proc/sys/kernel/random/uuid)
MY_TMP_FILE_PATH=/tmp/${MY_UUID}.txt
MY_LOG_FILE_PATH=/var/ceph_osd_get_rdma_info.log

# Mellanox or Intel
function set_vendor()
{
    vendor_id=`ibv_devinfo | grep "vendor_id" | awk 'NR==1 {print $2}'`
    if [[ "0x8086" = ${vendor_id} ]]; then
        g_vendor="INTEL"
    elif [[ "0x02c9" = ${vendor_id} ]]; then
        g_vendor="MELLONX"
    else
        echo "unknown rdma hca vendor." >>${MY_LOG_FILE_PATH}
        exit 1
    fi
}

# IPv4 to hex IP (zero-padded, as it appears in sysfs gid entries)
function ip4_to_hex()
{
    tmpifs=${IFS}
    IFS="."
    num=0
    for str in $1
    do
        ip[num]=${str}
        ((num++))
    done
    g_hexip=`printf "%02x%02x:%02x%02x" ${ip[0]} ${ip[1]} ${ip[2]} ${ip[3]}`
    IFS=${tmpifs}
}

# Intel HCA handling
# Reference: https://downloadmirror.intel.com/30368/eng/README_irdma_1.4.22.txt
function intel_hca()
{
    echo "vendor is intel" >>${MY_LOG_FILE_PATH}
    devices=`ibv_devices | awk 'NR > 2 {print $1}'`
    for dev in $devices
    do
        ifname=`ls /sys/class/infiniband/${dev}/device/net`
        if [[ ${g_nic_name} = ${ifname} ]]; then
            device_name=${dev}
        fi
    done
    ethip=`ip route | grep 'link src' | grep ${g_nic_name} | awk '{print $9}'`
    ip4_to_hex ${ethip}
    if [ "X${device_name}" != "X" ]; then
        echo "device_name"=${device_name} >>${MY_TMP_FILE_PATH}
    else
        echo "get device_name failed" >>${MY_LOG_FILE_PATH}
        exit 1
    fi
    for port in `ls /sys/class/infiniband/${device_name}/ports/`
    do
        for gidx in `ls /sys/class/infiniband/${device_name}/ports/${port}/gids`
        do
            hca_hex_ip=`cat /sys/class/infiniband/${device_name}/ports/${port}/gids/${gidx}`
            if [[ ${hca_hex_ip} =~ ${g_hexip} ]]; then
                gid_index=${gidx}
                ib_port=${port}
            fi
        done
    done
    if [ "X${gid_index}" != "X" ]; then
        echo "gid_index"=${gid_index} >>${MY_TMP_FILE_PATH}
    else
        echo "get gid_index failed" >>${MY_LOG_FILE_PATH}
        exit 1
    fi
    if [ "X${ib_port}" != "X" ]; then
        echo "ib_port"=${ib_port} >>${MY_TMP_FILE_PATH}
    else
        echo "get ib_port failed" >>${MY_LOG_FILE_PATH}
        exit 1
    fi
    mtu_index=`ibv_devinfo | grep -A 17 ${device_name} | grep active_mtu | awk '{print $3}' | awk -F "[()]" '{print $2}'`
    if [ "X${mtu_index}" != "X" ]; then
        echo "mtu_index"=${mtu_index} >>${MY_TMP_FILE_PATH}
    else
        echo "get mtu_index failed" >>${MY_LOG_FILE_PATH}
        exit 1
    fi
}

# Mellanox HCA handling (uses ibdev2netdev and show_gids)
function mellonx_hca()
{
    echo "vendor is mellonx" >>${MY_LOG_FILE_PATH}
    device_name=`ibdev2netdev | grep -w ${g_nic_name} | awk -F ' ' '{print $1}'`
    if [ "X$device_name" != "X" ]; then
        echo "device_name"=${device_name} >>${MY_TMP_FILE_PATH}
    else
        echo "get device_name failed" >>${MY_LOG_FILE_PATH}
        exit 1
    fi
    gid_index=`show_gids | grep -w ${device_name} | grep -w "v2" | awk -F ' ' '$5 !="" {print $3}' | head -n 1`
    if [ "X${gid_index}" != "X" ]; then
        echo "gid_index"=${gid_index} >>${MY_TMP_FILE_PATH}
    else
        echo "get gid_index failed" >>${MY_LOG_FILE_PATH}
        exit 1
    fi
    ib_port=`show_gids | grep -w ${device_name} | grep -w "v2" | awk -F ' ' '$5 !="" {print $2}' | head -n 1`
    if [ "X${ib_port}" != "X" ]; then
        echo "ib_port"=${ib_port} >>${MY_TMP_FILE_PATH}
    else
        echo "get ib_port failed" >>${MY_LOG_FILE_PATH}
        exit 1
    fi
    mtu_index=`ibv_devinfo | grep -A 17 ${device_name} | grep active_mtu | awk '{print $3}' | awk -F "[()]" '{print $2}'`
    if [ "X${mtu_index}" != "X" ]; then
        echo "mtu_index"=${mtu_index} >>${MY_TMP_FILE_PATH}
    else
        echo "get mtu_index failed" >>${MY_LOG_FILE_PATH}
        exit 1
    fi
}

#====================================================================
# start shell
#====================================================================
echo "input interface name is:$1" >${MY_LOG_FILE_PATH}
if [ "X$1" == "X" ]; then
    echo "interface is not specified, example: $0 eth0" >>${MY_LOG_FILE_PATH}
    exit 1
fi

g_nic_name=$1
is_virtual=`ls -l /sys/class/net/ | grep " $g_nic_name " | grep "\/virtual\/net\/" | wc -l`
if [ $is_virtual -ne 0 ]; then
    # strip the VLAN suffix, e.g. eth0.100 -> eth0
    g_nic_name=`echo $g_nic_name | awk -F "." 'OFS="." {$NF="" ;print $0}' | sed 's/.$//'`
fi

set_vendor
if [[ "INTEL" = ${g_vendor} ]]; then
    intel_hca
elif [[ "MELLONX" = ${g_vendor} ]]; then
    mellonx_hca
else
    echo "Unable to determine the vendor. exit 1" >>${MY_LOG_FILE_PATH}
    exit 1
fi

cat ${MY_TMP_FILE_PATH}
rm -f ${MY_TMP_FILE_PATH}
ibv_get_device_list
Prerequisites
ibv_fork_init() should be called before any other function in libibverbs.
Description
ibv_get_device_list() returns a NULL-terminated array of the RDMA devices currently available. The array should be released with ibv_free_device_list().
Array entries should not be accessed directly. Instead, operate on them with the service verbs ibv_get_device_name(), ibv_get_device_guid() and ibv_open_device().
Parameters

Name | Direction | Description
---|---|---
num_devices | out | (optional) If non-NULL, set to the number of devices returned in the array
Return value
On success, ibv_get_device_list() returns the array of available RDMA devices; if the request fails, it returns NULL and sets errno.
If no devices are found, num_devices is set to 0 and a non-NULL array is returned.
Possible errno values:
EPERM - permission denied (errno 1)
ENOMEM - insufficient memory to complete the operation (errno 12)
ENOSYS - the kernel does not support RDMA (the relevant functions are not implemented) (errno 38)
Example
Getting the list of devices, without the number of devices:

struct ibv_device **dev_list;

dev_list = ibv_get_device_list(NULL);
if (!dev_list)
    exit(1);

Getting the list of devices, with the number of devices:

struct ibv_device **dev_list;
int num_devices;

dev_list = ibv_get_device_list(&num_devices);
if (!dev_list)
    exit(1);
FAQ
I called ibv_get_device_list() and it returned NULL. What does this mean?
This is a verb that should not fail; check whether the module ib_uverbs is loaded (command: lsmod).
I called ibv_get_device_list() and it did not find any RDMA device at all (an empty list). What does this mean?
The driver could not find any RDMA device.
- Check with lspci whether there is any RDMA device in your machine
- Check with lsmod whether the low-level driver for the RDMA device is loaded
- Check dmesg and /var/log/messages for errors
Translated from: https://www.rdmamojo.com/2012/05/31/ibv_get_device_list/
Further reading: Device Operations - RDMA Aware Programming User Manual v1.7 - NVIDIA Networking Docs
struct ibv_device
{
    struct ibv_device_ops   ops;
    enum ibv_node_type      node_type;
    enum ibv_transport_type transport_type;
    char                    name[IBV_SYSFS_NAME_MAX];
    char                    dev_name[IBV_SYSFS_NAME_MAX];
    char                    dev_path[IBV_SYSFS_PATH_MAX];
    char                    ibdev_path[IBV_SYSFS_PATH_MAX];
};
- ops - pointers to alloc and free functions
- node_type - IBV_NODE_UNKNOWN, IBV_NODE_CA, IBV_NODE_SWITCH, IBV_NODE_ROUTER or IBV_NODE_RNIC
- transport_type - IBV_TRANSPORT_UNKNOWN, IBV_TRANSPORT_IB or IBV_TRANSPORT_IWARP
- name - kernel device name, e.g. "mthca0"
- dev_name - uverbs device name, e.g. "uverbs0"
- dev_path - path to the infiniband_verbs class device in sysfs
- ibdev_path - path to the infiniband class device in sysfs
Troubleshooting record
ibv_get_device_list() found no RDMA devices (num = 0) and reported: No such file or directory
Cause:
ceph manages its daemons with systemd, and /lib/systemd/system/ceph-mds@.service sets PrivateDevices=yes. As a result, the process runs in a private file system namespace in which /dev is replaced by a minimal version containing only nodes for non-physical devices:
When PrivateDevices=yes is set in the [Service] section of a systemd service unit file, the processes run for the service will run in a private file system namespace where /dev is replaced by a minimal version that only includes the device nodes /dev/null, /dev/zero, /dev/full, /dev/urandom, /dev/random, /dev/tty as well as the submounts /dev/shm, /dev/pts, /dev/mqueue, /dev/hugepages, and the /dev/stdout, /dev/stderr, /dev/stdin symlinks. No device nodes for physical devices will be included however.
And according to the ibv_get_device_list() source, it needs to read the list from /sys/class/infiniband:
[root@a1 ceph]# ls /sys/class/infiniband
mlx5_0 mlx5_1
Because at that point the private namespace of the mds process does not include /sys/, the call fails with: No such file or directory.
[Unit]
Description=Ceph metadata server daemon
After=network-online.target local-fs.target time-sync.target
Wants=network-online.target local-fs.target time-sync.target
PartOf=ceph-mds.target

[Service]
LimitCORE=infinity
LimitNOFILE=1048576
LimitNPROC=1048576
EnvironmentFile=-/etc/sysconfig/ceph
Environment=CLUSTER=ceph
ExecStart=/opt/h3c/bin/ceph-mds -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
ExecReload=/bin/kill -HUP $MAINPID
PrivateDevices=yes
ProtectHome=true
ProtectSystem=full
PrivateTmp=true
TasksMax=infinity
Restart=on-failure
StartLimitInterval=30min
StartLimitBurst=50
RestartSec=10s

[Install]
WantedBy=ceph-mds.target
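One way around this (my suggestion, not from the original post; note that relaxing sandboxing has security implications) is a systemd drop-in that overrides PrivateDevices for this unit, rather than editing the shipped unit file:

```ini
# Hypothetical drop-in at /etc/systemd/system/ceph-mds@.service.d/rdma.conf,
# created e.g. with `systemctl edit ceph-mds@.service`
[Service]
PrivateDevices=no
```

followed by `systemctl daemon-reload` and a restart of the service.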
See also: notes on using the ibverbs API | https://blog.csdn.net/bandaoyu/article/details/124327417?spm=1001.2014.3001.5501
Recovering from a failed ibv_devinfo command (IBM Docs)
Last Updated: 2021-03-01
The ibv_devinfo command can fail when modules or hardware drivers fail to load or when libraries are missing.
About this task
The ibv_devinfo command generally fails with one of two common errors. The recovery steps for each of those two errors, and one less common error, are given below.
Procedure
- Error: Failed to get IB devices list: Function not implemented.
One of the common causes of this failure is that the ib_uverbs module might not be loaded, or it might not be enabled at the correct run levels. To recover from this error, complete the following steps:
- To verify the ib_uverbs module is loaded, run the following command and look for similar output:
lsmod | grep ib_uverbs
ib_uverbs 44238 0
- To verify that the RDMA run level is set to on for levels 3 and 5, run the following command and look for similar output:
chkconfig --list | grep rdma
0:off 1:off 2:off 3:on 4:off 5:on 6:off
If RDMA is off, run the following commands to activate RDMA on levels 3 and 5:
chkconfig --level 3 rdma on
chkconfig --level 5 rdma on
Run the following command to restart RDMA:
openibd restart / rdma restart
- If there is a missing library, you will see an error similar to the following:
libibverbs: Warning: couldn't load driver 'mlx4': libmlx4-rdmav2.so: cannot open shared object file: No such file or directory
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
No IB devices found.
If you receive this error, install the libmlx4 user level library.
- Error: No IB devices found.
If no IB devices are found, complete the following steps:
- Check to see if the relevant hardware driver is loaded. If a hardware driver is missing, then run the following command:
modprobe <hardware driver>
- Verify that the hardware driver is loaded by default by editing the configuration file.
- Run the following command to restart RDMA:
openibd restart/rdma restart
- Error: On Red Hat Enterprise Linux 5.x on ppc64, the wrong libraries are installed.
Red Hat Enterprise Linux 5.x on ppc64 requires 32-bit user level libraries like libmlx4. However, by default, the 64-bit libraries are installed. Make sure that you have the correct 32-bit libraries installed.
libibverbs can't find running IB devices (bug 749816)
Driver installed but not working
The driver was installed following Intel's instructions, but it did not work. It turned out that the kernel-mode RDMA driver shipped with the system kernel was Mellanox's, while the rdma-core we installed is the user-mode driver; the two do not match.
This can be seen with the modinfo ib_core command:
The ib_core on the image shipped by H3C is /lib/modules/5.10.38-21.01.el7.x86_64/extra/mlnx-ofa_kernel/drivers/infiniband/core/ib_core.ko, the kernel-mode RDMA core provided by Mellanox,
rather than the native distribution's /lib/modules/5.10.38-21.01.el7.x86_64/kernel/drivers/infiniband/core/ib_core.ko.xz; in other words, the H3C build replaced the native ib_core.ko.xz with Mellanox's ib_core.ko.
Judging from the "mlnx" in the name, this is a Mellanox driver, which should not fit Intel RDMA NICs, and that is why they could not work. The ice driver should be in the same situation.
Fix: delete /lib/modules/5.10.38-21.01.el7.x86_64/extra and reboot so that the system loads the native /lib/modules/5.10.38-21.01.el7.x86_64/kernel/drivers/infiniband/core/ib_core.ko.xz,
then reinstall the Intel driver. Where there are conflicts, use yum remove to uninstall the Mellanox driver.