[verbs] ibv_get_device_name() | ibv_get_device_list()

Contents

Prerequisites

ibv_get_device_name

ibv_get_device_list


Author: bandaoyu  Source: https://blog.csdn.net/bandaoyu/article/details/116539866

Prerequisites

ibv_fork_init() should be called before any other function in libibverbs.
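
A minimal sketch of the intended call order (only functions already named in this post are assumed); ibv_fork_init() returns 0 on success:

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **list;

    /* Must run before any other libibverbs call if fork() support is wanted. */
    if (ibv_fork_init()) {
        fprintf(stderr, "ibv_fork_init() failed\n");
        return 1;
    }

    list = ibv_get_device_list(NULL);   /* now safe to use the rest of the API */
    if (list)
        ibv_free_device_list(list);

    return 0;
}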

ibv_get_device_name

The function ibv_get_device_list() returns the array of currently available RDMA devices.

const char *ibv_get_device_name(struct ibv_device *device);

Description

The function returns the name associated with an RDMA device.

Notes

  • The name is unique within a given machine (the same name is never assigned to another device on that host);
  • The name is not unique across the InfiniBand fabric;
  • When a machine has more than one RDMA device, changing a device's location in the machine (for example, its position on the bus) may change the associated name;
  • To tell devices apart reliably, it is recommended to use the device GUID instead, which ibv_get_device_guid() returns (see the sketch after this list);
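
A minimal sketch of identifying devices by GUID, assuming only the verbs named above; ibv_get_device_guid() returns the GUID in network byte order, so it is converted with be64toh() before printing:

#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **list;
    int num, i;

    list = ibv_get_device_list(&num);
    if (!list)
        return 1;

    for (i = 0; i < num; ++i) {
        /* The GUID, unlike the name, stays stable across bus re-enumeration. */
        uint64_t guid = be64toh(ibv_get_device_guid(list[i]));
        printf("%s: GUID 0x%016llx\n",
               ibv_get_device_name(list[i]), (unsigned long long)guid);
    }

    ibv_free_device_list(list);
    return 0;
}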

Parameters (struct ibv_device *device)

One of the entries in the array returned by ibv_get_device_list(). (That function returns the array of currently available RDMA devices.)

Return value (const char *)

The function returns a pointer to the device name, or NULL on error.

Name composition

  • prefix --- identifies the RDMA device vendor and family
    • cxgb3 - Chelsio Communications, T3 RDMA family
    • cxgb4 - Chelsio Communications, T4 RDMA family
    • ehca - IBM, eHCA family
    • ipathverbs - QLogic
    • mlx4 - Mellanox Technologies, ConnectX family
    • mthca - Mellanox Technologies, InfiniHost family
    • nes - Intel, Intel-NE family
  • index --- a number that distinguishes devices of the same vendor within the same machine

Example

Examples of RDMA device names that ibv_get_device_name() may return:

  • mlx4_0
  • mlx4_1
  • mthca0

A complete program that enumerates the available devices and prints each name:

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **device_list;
    int num_devices;
    int i;

    /* Returns a NULL-terminated array; num_devices receives the element count. */
    device_list = ibv_get_device_list(&num_devices);
    if (!device_list) {
        fprintf(stderr, "Error, ibv_get_device_list() failed\n");
        return -1;
    }

    printf("%d RDMA device(s) found:\n\n", num_devices);

    /* Each entry is queried through the service verb ibv_get_device_name(). */
    for (i = 0; i < num_devices; ++i)
        printf("RDMA device[%d]: name=%s\n", i,
               ibv_get_device_name(device_list[i]));

    ibv_free_device_list(device_list);

    return 0;
}

Script to get the device name, ib_port, GID index, and MTU of an Intel or Mellanox RDMA NIC

Usage: get-rdma-device-info eth0

#!/bin/bash

g_vendor=""
g_hexip=""
g_nic_name=""


MY_UUID=$(cat /proc/sys/kernel/random/uuid)
MY_TMP_FILE_PATH=/tmp/${MY_UUID}.txt
MY_LOG_FILE_PATH=/var/ceph_osd_get_rdma_info.log


#mellonx or intel
function  set_vendor()
{

 vendor_id=`ibv_devinfo|grep "vendor_id"|awk 'NR==1 {print $2}'`


if [[ "0x8086" = ${vendor_id} ]]; then

     g_vendor="INTEL"

elif [[ "0x02c9" = ${vendor_id} ]]; then

     g_vendor="MELLONX"
else

echo "unknown rdma hca vendor." >>${MY_LOG_FILE_PATH}

exit 1

fi

}


#ip4 to hex ip
function ip4_to_hex()
{
tmpifs=${IFS}

IFS="."
num=0

for str in $1
do
ip[num]=${str}
((num++))
done

# zero-pad each octet so the result matches the gids file format (e.g. c0a8:010a)
g_hexip=`printf "%02x%02x:%02x%02x" ${ip[0]} ${ip[1]} ${ip[2]} ${ip[3]}`

IFS=${tmpifs}
}


#intel hca process
#reference:https://downloadmirror.intel.com/30368/eng/README_irdma_1.4.22.txt
function  intel_hca()
{
echo "vendor is intel">>${MY_LOG_FILE_PATH}


devices=`ibv_devices|awk 'NR > 2 {print $1}'`

for dev in $devices
do
ifname=`ls /sys/class/infiniband/${dev}/device/net`
if [[ ${g_nic_name} = ${ifname} ]];then
device_name=${dev}
fi
done

ethip=`ip route|grep 'link src'|grep ${g_nic_name}|awk '{print $9}'`

ip4_to_hex ${ethip}


if [ "X${device_name}" != "X" ];then
  echo "device_name"=${device_name} >>${MY_TMP_FILE_PATH}
else
  echo "get device_name failed">>${MY_LOG_FILE_PATH}
  exit 1
fi


for port in ` ls /sys/class/infiniband/${device_name}/ports/`
{
    for gidx in `ls /sys/class/infiniband/${device_name}/ports/${port}/gids`
    {
    	hca_hex_ip=`cat /sys/class/infiniband/${device_name}/ports/${port}/gids/${gidx}`
     
     	if [[ ${hca_hex_ip} =~ ${g_hexip} ]];then
     		gid_index=${gidx}
     		ib_port=${port}
     	fi

     }
}


if [ "X${gid_index}" != "X" ];then
  echo "gid_index"=${gid_index} >>${MY_TMP_FILE_PATH}
else
  echo "get gid_index failed">>${MY_LOG_FILE_PATH}
  exit 1
fi


if [ "X${ib_port}" != "X" ];then
  echo "ib_port"=${ib_port} >>${MY_TMP_FILE_PATH}
else
  echo "get ib_port failed">>${MY_LOG_FILE_PATH}
  exit 1
fi

mtu_index=`ibv_devinfo|grep -A 17 ${device_name} |grep active_mtu|awk '{print $3}'|awk -F "[()]" '{print $2}'`

if [ "X${mtu_index}" != "X" ];then
  echo "mtu_index"=${mtu_index} >>${MY_TMP_FILE_PATH}
else
  echo "get mtu_index failed">>${MY_LOG_FILE_PATH}
  exit 1
fi

}



#mellonx hca process
#ibdev2netdev, show_gids
function  mellonx_hca()
{

echo "vendor is mellonx">>${MY_LOG_FILE_PATH}

device_name=`ibdev2netdev | grep -w ${g_nic_name} | awk -F ' ' '{print $1}'`

if [ "X$device_name" != "X" ];then
  echo "device_name"=${device_name} >>${MY_TMP_FILE_PATH}
else
  echo "get device_name failed">>${MY_LOG_FILE_PATH}
  exit 1
fi


gid_index=`show_gids | grep -w ${device_name} |grep -w "v2"| awk -F ' ' '$5 !="" {print $3}'|head -n 1`

if [ "X${gid_index}" != "X" ];then
  echo "gid_index"=${gid_index} >>${MY_TMP_FILE_PATH}
else
  echo "get gid_index failed">>${MY_LOG_FILE_PATH}
  exit 1
fi

ib_port=`show_gids | grep -w ${device_name} |grep -w "v2"| awk -F ' ' '$5 !="" {print $2}'|head -n 1`

if [ "X${ib_port}" != "X" ];then
  echo "ib_port"=${ib_port} >>${MY_TMP_FILE_PATH}
else
  echo "get ib_port failed">>${MY_LOG_FILE_PATH}
  exit 1
fi

mtu_index=`ibv_devinfo|grep -A 17 ${device_name} |grep active_mtu|awk '{print $3}'|awk -F "[()]" '{print $2}'`

if [ "X${mtu_index}" != "X" ];then
  echo "mtu_index"=${mtu_index} >>${MY_TMP_FILE_PATH}
else
  echo "get mtu_index failed">>${MY_LOG_FILE_PATH}
  exit 1
fi

}





#====================================================================
#start shell
#====================================================================


echo "input interface name is:$1">${MY_LOG_FILE_PATH}

if [ "X$1" == "X" ]; then
  echo "interface is not specific,excample:$0 eth0">>${MY_LOG_FILE_PATH}
  exit 1
fi

g_nic_name=$1

is_virtual=`ls -l /sys/class/net/ | grep " $g_nic_name " | grep "\/virtual\/net\/" | wc -l`
if [ $is_virtual -ne 0 ]; then
  g_nic_name=`echo $g_nic_name | awk -F "." 'OFS="." {$NF="" ;print $0}' | sed 's/.$//'`
fi


set_vendor


if [[ "INTEL" = ${g_vendor} ]]; then

	intel_hca

elif [[ "MELLONX" = ${g_vendor} ]]; then

	mellonx_hca
else

echo "Unable to determine the vendor. exit 1">>${MY_LOG_FILE_PATH}
exit 1   

fi

cat ${MY_TMP_FILE_PATH}

rm -f ${MY_TMP_FILE_PATH}

ibv_get_device_list

Prerequisites

ibv_fork_init() should be called before any other function in libibverbs.

Description

ibv_get_device_list() returns a NULL-terminated array of the RDMA devices currently available. The array should be released with ibv_free_device_list().

The array entries should not be accessed directly. Instead, operate on them with the following service verbs: ibv_get_device_name(), ibv_get_device_guid(), and ibv_open_device().
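
A minimal sketch that combines these service verbs (no calls beyond the ones named above are assumed): obtain the list, open and close the first device, then free the list:

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **list;
    struct ibv_context *ctx;
    int num;

    list = ibv_get_device_list(&num);
    if (!list) {
        fprintf(stderr, "ibv_get_device_list() failed\n");
        return 1;
    }
    if (num == 0) {
        fprintf(stderr, "no RDMA devices present\n");
        ibv_free_device_list(list);
        return 1;
    }

    /* Entries are only handed to service verbs, never dereferenced here. */
    ctx = ibv_open_device(list[0]);
    if (!ctx) {
        fprintf(stderr, "ibv_open_device() failed for %s\n",
                ibv_get_device_name(list[0]));
        ibv_free_device_list(list);
        return 1;
    }

    printf("opened %s\n", ibv_get_device_name(list[0]));

    ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}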

Parameters

  • num_devices --- direction: out (optional). If not NULL, it is set to the number of devices returned in the array.


Return value

ibv_get_device_list() returns the array of available RDMA devices on success, or NULL on failure with errno set.

If no device is found, num_devices is set to 0 and a non-NULL (empty) array is returned.

Possible errno values:

EPERM - Permission denied (errno 1)
ENOMEM - Insufficient memory to complete the operation (errno 12)
ENOSYS - No kernel support for RDMA; the relevant functions are not implemented (errno 38)
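
A minimal sketch distinguishing the two cases above (a NULL return with errno set, versus a non-NULL but empty list); only calls already documented here are assumed:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **list;
    int num;

    list = ibv_get_device_list(&num);
    if (!list) {
        /* Real failure: errno says why (EPERM, ENOMEM, ENOSYS, ...). */
        fprintf(stderr, "ibv_get_device_list() failed: %s\n", strerror(errno));
        return 1;
    }

    if (num == 0) {
        /* Not an error: the verbs stack works but no device was found. */
        fprintf(stderr, "no RDMA devices present\n");
        ibv_free_device_list(list);
        return 0;
    }

    printf("%d RDMA device(s) found\n", num);
    ibv_free_device_list(list);
    return 0;
}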

Examples

Getting the device list without the optional num_devices argument:

struct ibv_device **dev_list;
 
dev_list = ibv_get_device_list(NULL);
if (!dev_list)
        exit(1);

Getting the device list with the optional num_devices argument:

struct ibv_device **dev_list;
int num_devices;
 
dev_list = ibv_get_device_list(&num_devices);
if (!dev_list)
        exit(1);


FAQ

I called ibv_get_device_list() and it returned NULL. What does that mean?

This is a verb that is not expected to fail; check whether the ib_uverbs module is loaded (lsmod).

I called ibv_get_device_list() and it did not find any RDMA device at all (an empty list). What does that mean?

The driver could not find any RDMA device.
- Check with lspci whether there is any RDMA device in the machine
- Check with lsmod whether the low-level driver of the RDMA device is loaded
- Check dmesg and /var/log/messages for errors
 

Translated from: https://www.rdmamojo.com/2012/05/31/ibv_get_device_list/

Further reading: Device Operations - RDMA Aware Programming User Manual v1.7 - NVIDIA Networking Docs

struct ibv_device
{
    struct ibv_device_ops   ops;
    enum ibv_node_type  node_type;
    enum ibv_transport_type transport_type;
    char    name[IBV_SYSFS_NAME_MAX];
    char    dev_name[IBV_SYSFS_NAME_MAX];
    char    dev_path[IBV_SYSFS_PATH_MAX];
    char    ibdev_path[IBV_SYSFS_PATH_MAX];
};
 
ops             pointers to alloc and free functions
node_type       IBV_NODE_UNKNOWN, IBV_NODE_CA, IBV_NODE_SWITCH, IBV_NODE_ROUTER, IBV_NODE_RNIC
transport_type  IBV_TRANSPORT_UNKNOWN, IBV_TRANSPORT_IB, IBV_TRANSPORT_IWARP
name            kernel device name, e.g. "mthca0"
dev_name        uverbs device name, e.g. "uverbs0"
dev_path        path to the infiniband_verbs class device in sysfs
ibdev_path      path to the infiniband class device in sysfs
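
A minimal sketch that inspects a few of these fields on each listed device (for exploration only; the service verbs above remain the recommended accessors, and ibv_node_type_str() from <infiniband/verbs.h> is used to print node_type):

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **list;
    int num, i;

    list = ibv_get_device_list(&num);
    if (!list)
        return 1;

    for (i = 0; i < num; ++i)
        printf("name=%s dev_name=%s node_type=%s ibdev_path=%s\n",
               list[i]->name,
               list[i]->dev_name,
               ibv_node_type_str(list[i]->node_type),
               list[i]->ibdev_path);

    ibv_free_device_list(list);
    return 0;
}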

Error record

ibv_get_device_list() found no RDMA devices, returned num = 0, and reported: No such file or directory

Root cause:

Ceph manages its daemons with systemd, and /lib/systemd/system/ceph-mds@.service sets PrivateDevices=yes. As a result, once started, the process runs in a private file system namespace in which /dev is replaced by a minimal version containing only non-physical device nodes:

When PrivateDevices=yes is set in the [Service] section of a systemd service unit file, the processes run for the service will run in a private file system namespace where /dev is replaced by a minimal version that only includes the device nodes /dev/null, /dev/zero, /dev/full, /dev/urandom, /dev/random, /dev/tty as well as the submounts /dev/shm, /dev/pts, /dev/mqueue, /dev/hugepages, and the /dev/stdout, /dev/stderr, /dev/stdin symlinks. No device nodes for physical devices will be included however.

Changes/PrivateDevicesAndPrivateNetwork:https://fedoraproject.org/wiki/Changes/PrivateDevicesAndPrivateNetwork

According to the ibv_get_device_list() source code, the list is read from /sys/class/infiniband:

[root@a1 ceph]# ls /sys/class/infiniband
mlx5_0  mlx5_1

Because the private file system namespace of the mds process does not include /sys/ at this point, the error is reported: No such file or directory.

[Unit]
Description=Ceph metadata server daemon
After=network-online.target local-fs.target time-sync.target
Wants=network-online.target local-fs.target time-sync.target
PartOf=ceph-mds.target

[Service]
LimitCORE=infinity
LimitNOFILE=1048576
LimitNPROC=1048576
EnvironmentFile=-/etc/sysconfig/ceph
Environment=CLUSTER=ceph
ExecStart=/opt/h3c/bin/ceph-mds -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
ExecReload=/bin/kill -HUP $MAINPID
PrivateDevices=yes
ProtectHome=true
ProtectSystem=full
PrivateTmp=true
TasksMax=infinity
Restart=on-failure
StartLimitInterval=30min
StartLimitBurst=50
RestartSec=10s

[Install]
WantedBy=ceph-mds.target


Also note: caveats when using the ibverbs API | https://blog.csdn.net/bandaoyu/article/details/124327417?spm=1001.2014.3001.5501

Recovering from a failed ibv_devinfo command: IBM Docs

Recovering from a failed ibv_devinfo command

Last Updated: 2021-03-01

The ibv_devinfo command can fail when modules or hardware drivers fail to load or when libraries are missing.

About this task

The ibv_devinfo command generally fails with one of two common errors. The recovery steps for each of those two errors, and one less common error, are given below.

Procedure

  1. Error: Failed to get IB devices list: Function not implemented.

    One of the common causes of this failure is that the ib_uverbs module might not be loaded or it might not be enabled at the correct run levels. To recover from this error, complete the following steps:

    1. To verify the ib_uverbs module is loaded, run the following command and look for similar output:
      lsmod | grep ib_uverbs
      
      ib_uverbs              44238  0
    2. To verify that the RDMA run level is set to on for levels 3 and 5, run the following command and look for similar output:
      chkconfig --list | grep rdma
      
      0:off 1:off 2:off 3:on 4:off 5:on 6:off 
      If RDMA is off, run the following commands to activate RDMA on levels 3 and 5:
      chkconfig --level 3 rdma on 
      	chkconfig --level 5 rdma on
      Run the following command to restart RDMA:
      openibd restart (or: rdma restart)
    3. If there is a missing library, you will see an error similar to the following:
      libibverbs: Warning: couldn't load driver 'mlx4': libmlx4-rdmav2.so: cannot open shared object file: No such file or directory 
      	libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0 
      	No IB devices found. 

      If you receive this error, install the libmlx4 user level library.

  2. Error: No IB devices found.

    If no IB devices are found, complete the following steps:

    1. Check to see if the relevant hardware driver is loaded. If a hardware driver is missing, then run the following command:
      modprobe <hardware driver>
    2. Verify that the hardware driver is loaded by default by editing the configuration file.
    3. Run the following command to restart RDMA:
      openibd restart (or: rdma restart)
  3. Error: On Red Hat Enterprise Linux 5.x on ppc64, the wrong libraries are installed.

    Red Hat Enterprise Linux 5.x on ppc64 requires 32-bit user level libraries like libmlx4. However, by default, the 64-bit libraries are installed. Make sure that you have the correct 32-bit libraries installed.

libibverbs can't find running IB devices 

749816 – libibverbs can't find running IB devices

Driver installed but not working properly

The driver was installed by following Intel's instructions, but it did not work properly. It turned out that the kernel-space RDMA driver shipped with the system kernel was Mellanox's, while the rdma-core we installed is only the user-space driver; the two do not match.

This can be seen with the modinfo ib_core command:
In the image released by H3C, ib_core is /lib/modules/5.10.38-21.01.el7.x86_64/extra/mlnx-ofa_kernel/drivers/infiniband/core/ib_core.ko, i.e. the kernel-space RDMA core provided by Mellanox,

not the stock distribution's /lib/modules/5.10.38-21.01.el7.x86_64/kernel/drivers/infiniband/core/ib_core.ko.xz. In other words, the H3C release replaced the stock ib_core.ko.xz with Mellanox's ib_core.ko.

Judging from the mlnx in the path, this is the Mellanox driver, which does not fit Intel RDMA NICs, hence the failure. The ice driver is presumably in the same situation.

Fix: delete /lib/modules/5.10.38-21.01.el7.x86_64/extra, reboot, and let the system load the stock /lib/modules/5.10.38-21.01.el7.x86_64/kernel/drivers/infiniband/core/ib_core.ko.xz,

then reinstall the Intel driver. If there are conflicts, uninstall the Mellanox driver with yum remove.
