【verbs】Notes on using the ibverbs API | What's the status of fork() support in libibverbs?
Notes on using the ibverbs API
int ibv_fork_init(void)
Input Parameters:
None
Output Parameters:
None
Return Value:
0 on success, -1 on error. If the call fails, errno will be set to indicate the reason for the failure.
- ibv_fork_init() initializes libibverbs' data structures to handle fork() safely and avoid data corruption, whether fork() is called explicitly or implicitly (for example, by system()).
- Calling ibv_fork_init() is not needed if all parent-process threads are always blocked until all child processes end or change their address space via an exec() operation.
- The function works on Linux kernels that support the MADV_DONTFORK flag of madvise() (2.6.17 and higher).
- Setting the environment variable RDMAV_FORK_SAFE or IBV_FORK_SAFE to any value has the same effect as calling ibv_fork_init().
- Setting the environment variable RDMAV_HUGEPAGES_SAFE to any value tells the library to check the underlying page size used by the kernel for memory regions. This is required if the application uses huge pages, either directly or indirectly via a library such as libhugetlbfs.
- Calling ibv_fork_init() reduces performance, because every memory registration requires an extra system call and extra memory is allocated to track memory regions. The exact performance impact depends on the workload and is usually not significant.
- Setting RDMAV_HUGEPAGES_SAFE adds additional overhead to all memory registrations.
(https://www.freesion.com/article/8223180236/)
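To make the call order concrete, here is a minimal sketch (mine, not from the cited article) that calls ibv_fork_init() before any other verbs call; error handling is simplified:

#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    /* Must run before any device is opened or any memory is registered;
     * exporting RDMAV_FORK_SAFE before starting the process has the
     * same effect. */
    if (ibv_fork_init()) {
        perror("ibv_fork_init");
        return EXIT_FAILURE;
    }

    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list) {
        perror("ibv_get_device_list");
        return EXIT_FAILURE;
    }
    printf("found %d RDMA device(s)\n", num);
    ibv_free_device_list(list);
    return EXIT_SUCCESS;
}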
Q: What's the status of fork() support in libibverbs? Is it still as described below, or has there been some improvement (fewer constraints) in fork() support since then?
- When possible, avoid using fork(), or other system calls that call fork(), for example: system(), popen(), etc.
- If fork() must be used, the following rules should be followed:
Prepare libibverbs to work with fork(): call the verb ibv_fork_init()
or
set the environment variable RDMAV_FORK_SAFE or IBV_FORK_SAFE
This will allocate internal structures in a way which is more fork()-friendly
- RDMA should be used only in the parent process
The child process should immediately call exec*() or exit() (see the sketch after this list)
Note on exec*():
After fork(), the child process calls one of the exec* family of functions to run a new program. This does not create a new process; instead, the new program replaces the child's address space (code segment, data, heap, stack). The PID therefore stays the same, and the new program starts from its main() function. On the exec* family: https://blog.csdn.net/m0_51543618/article/details/109253523
- If huge pages are used, set the environment variable RDMAV_HUGEPAGES_SAFE as well
Warning: Not following these rules may lead to data corruption or a segmentation fault!
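The fork() pattern above can be sketched as follows (a simplified example, assuming fork-safe mode is enabled as described; /bin/true is only a placeholder for the real program the child would exec):

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
#include <infiniband/verbs.h>

int main(void)
{
    /* Enable fork-safe mode first (or export RDMAV_FORK_SAFE=1). */
    if (ibv_fork_init()) {
        perror("ibv_fork_init");
        return EXIT_FAILURE;
    }

    /* ... parent opens the device, registers memory, posts work ... */

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }
    if (pid == 0) {
        /* Child: replace the address space immediately; never touch
         * any verbs object between fork() and exec*(). */
        execl("/bin/true", "true", (char *)NULL);
        _exit(127); /* reached only if exec fails */
    }

    /* Parent: keeps using its RDMA resources as usual. */
    waitpid(pid, NULL, 0);
    return EXIT_SUCCESS;
}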
A: Yes. Besides the points you mentioned, you should also refer to:
https://network.nvidia.com/related-docs/prod_software/RDMA_Aware_Programming_user_manual.pdf
Source: What's the status of fork() support in libibverbs? - Mellanox OFED - NVIDIA Developer Forums
For processes managed with systemctl (systemd), PrivateDevices must not be set to yes
ibv_get_device_list() finds no RDMA devices and returns num=0, failing with the error: No such file or directory.
Root cause:
Ceph manages its daemons with systemd, and /lib/systemd/system/ceph-mds@.service sets PrivateDevices=yes. As a result, after startup the process runs in a private file system namespace in which /dev is replaced by a minimal version containing only non-physical device nodes:
When PrivateDevices=yes is set in the [Service] section of a systemd service unit file, the processes run for the service will run in a private file system namespace where /dev is replaced by a minimal version that only includes the device nodes /dev/null, /dev/zero, /dev/full, /dev/urandom, /dev/random, /dev/tty as well as the submounts /dev/shm, /dev/pts, /dev/mqueue, /dev/hugepages, and the /dev/stdout, /dev/stderr, /dev/stdin symlinks. No device nodes for physical devices will be included however.
Changes/PrivateDevicesAndPrivateNetwork:https://fedoraproject.org/wiki/Changes/PrivateDevicesAndPrivateNetwork
According to the ibv_get_device_list() source, the library needs to read the device list from /sys/class/infiniband:
[root@a1 ceph]# ls /sys/class/infiniband
mlx5_0 mlx5_1
Because the mds process's private file system namespace does not include /sys/ at this point, the call fails with: No such file or directory.
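A small diagnostic sketch (hypothetical, not taken from the Ceph code base) can distinguish a hard failure of ibv_get_device_list() from the empty-list symptom described here:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);

    if (!list) {
        /* Hard failure, e.g. the sysfs entries are not readable
         * from this process's namespace. */
        fprintf(stderr, "ibv_get_device_list: %s\n", strerror(errno));
        return 1;
    }
    if (num == 0) {
        /* Matches the symptom above: check whether the service unit
         * sets PrivateDevices=yes and hides the devices. */
        fprintf(stderr, "no RDMA devices visible to this process\n");
    } else {
        for (int i = 0; i < num; i++)
            printf("%s\n", ibv_get_device_name(list[i]));
    }
    ibv_free_device_list(list);
    return 0;
}

If PrivateDevices is indeed the cause, the fix implied by the heading is to set PrivateDevices=no for the unit (for example via a systemctl edit drop-in, under the [Service] section) and restart the service.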
Source: https://blog.csdn.net/bandaoyu/article/details/116539866