[verbs] libibverbs thread safety level

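Let's start with the bottom line:

The verbs API is fully thread safe, and verbs can be called from every thread in the process. Part of this thread safety is implemented in libibverbs itself and part in the low-level driver libraries. The same resource can even be handled from different threads (the atomicity of the operations is guaranteed).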

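Operations that can be performed from multiple threads include, but are not limited to:

  1. Opening a context on an RDMA device.
  2. Reading asynchronous events: each event is read by exactly one thread.
  3. Acknowledging asynchronous events.
  4. Creating RDMA resources associated with the same object (Context, PD, CQ, SRQ). libibverbs guarantees that each newly created RDMA resource gets its own unique number. For example, at any given time a specific QP number is assigned to only one QP. The same goes for l_key and r_key.
  5. Destroying RDMA resources associated with the same object (Context, PD, CQ, SRQ).
  6. Querying or modifying RDMA resources.
  7. Posting Work Requests to any queue (QP or SRQ): different threads may post to different RDMA resources, and different threads may post to the same RDMA resource.
  8. Polling for Work Completions on a specific CQ: each Work Completion is read by exactly one of the polling threads.
  9. Requesting notifications from a CQ.
  10. Reading completion events.
  11. Acknowledging completion events.
  12. Attaching a QP to, or detaching it from, multicast groups.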
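The "read by exactly one polling thread" guarantee can be modelled without RDMA hardware. The following is a minimal pthread sketch, not libibverbs code: `mock_cq`, `mock_poll_one`, `mock_poller` and `mock_run_pollers` are hypothetical names, and the queue is a plain array that stands in for a CQ. It only illustrates why a short lock around the "take next entry" step hands each completion to a single poller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MOCK_CQ_DEPTH 1024  /* hypothetical number of queued completions */
#define MOCK_POLLERS  4     /* hypothetical number of polling threads */

/* Hypothetical stand-in for a completion queue; NOT a libibverbs type. */
struct mock_cq {
    pthread_spinlock_t lock;   /* guards next and reads[] */
    int next;                  /* index of the next unread completion */
    int count;                 /* number of completions queued */
    int reads[MOCK_CQ_DEPTH];  /* how many times each completion was read */
};

/* Take one completion off the queue; returns its index, or -1 if empty. */
static int mock_poll_one(struct mock_cq *cq)
{
    int idx = -1;

    pthread_spin_lock(&cq->lock);
    if (cq->next < cq->count) {
        idx = cq->next++;
        cq->reads[idx]++;
    }
    pthread_spin_unlock(&cq->lock);
    return idx;
}

/* Poller thread body: keep polling until the queue is empty. */
static void *mock_poller(void *arg)
{
    struct mock_cq *cq = arg;

    while (mock_poll_one(cq) >= 0)
        continue;
    return NULL;
}

/* Run several pollers against one queue. Returns how many completions
 * were read exactly once, or -1 if the lock or threads could not be set up. */
int mock_run_pollers(void)
{
    struct mock_cq cq;
    pthread_t tids[MOCK_POLLERS];
    int started = 0, exactly_once = 0;

    memset(&cq, 0, sizeof(cq));
    if (pthread_spin_init(&cq.lock, PTHREAD_PROCESS_PRIVATE) != 0)
        return -1;
    cq.count = MOCK_CQ_DEPTH;

    while (started < MOCK_POLLERS &&
           pthread_create(&tids[started], NULL, mock_poller, &cq) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_spin_destroy(&cq.lock);
    if (started == 0)
        return -1;

    assert(cq.next == cq.count);  /* the pollers drained the whole queue */
    for (int i = 0; i < MOCK_CQ_DEPTH; i++)
        if (cq.reads[i] == 1)
            exactly_once++;
    return exactly_once;
}
```

Calling `mock_run_pollers()` should report that all `MOCK_CQ_DEPTH` entries were read exactly once. In a real application the equivalent step is several threads calling `ibv_poll_cq()` on the same CQ, with the locking done inside the library rather than by the caller.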

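A little information about the internal implementation: this thread safety is provided by libibverbs and the low-level driver libraries through pthread primitives such as spinlocks, mutexes and condition variables.

In general:

  1. Mutexes protect critical sections in the control path.
  2. Condition variables are used for reference counting of resources in the control path.
  3. Spinlocks protect the parts of RDMA objects that may be accessed in the data path, for example CQ, QP and SRQ. Because these critical sections are very short, a waiting thread spins briefly instead of sleeping, which keeps hand-off between threads fast.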
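The mutex plus condition variable pattern for reference counting can be sketched as follows. This is an illustrative pthread model under stated assumptions, not the libibverbs source: `mock_res`, `mock_res_put`, `mock_res_destroy` and `mock_refcount_demo` are hypothetical names. It resembles the behaviour documented for completion events, where destroying a CQ waits until all events returned by `ibv_get_cq_event()` have been acknowledged with `ibv_ack_cq_events()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical control-path object; NOT a libibverbs type. */
struct mock_res {
    pthread_mutex_t mutex;  /* protects refs */
    pthread_cond_t cond;    /* signalled when refs drops to zero */
    int refs;               /* outstanding references, e.g. unacknowledged events */
    int refs_at_destroy;    /* refs value observed when teardown ran */
};

void mock_res_init(struct mock_res *r, int refs)
{
    pthread_mutex_init(&r->mutex, NULL);
    pthread_cond_init(&r->cond, NULL);
    r->refs = refs;
    r->refs_at_destroy = -1;
}

/* Drop one reference and wake the destroyer once none remain. */
void mock_res_put(struct mock_res *r)
{
    pthread_mutex_lock(&r->mutex);
    assert(r->refs > 0);
    if (--r->refs == 0)
        pthread_cond_broadcast(&r->cond);
    pthread_mutex_unlock(&r->mutex);
}

/* Block until every reference is gone, then tear the object down. */
void mock_res_destroy(struct mock_res *r)
{
    pthread_mutex_lock(&r->mutex);
    while (r->refs > 0)  /* loop guards against spurious wakeups */
        pthread_cond_wait(&r->cond, &r->mutex);
    r->refs_at_destroy = r->refs;
    pthread_mutex_unlock(&r->mutex);
    pthread_cond_destroy(&r->cond);
    pthread_mutex_destroy(&r->mutex);
}

/* Thread body: release the single reference this thread holds. */
static void *mock_releaser(void *arg)
{
    mock_res_put(arg);
    return NULL;
}

/* Three holders release their references while the caller destroys.
 * Returns the reference count seen at teardown. */
int mock_refcount_demo(void)
{
    enum { HOLDERS = 3 };
    struct mock_res r;
    pthread_t tids[HOLDERS];
    int started = 0;

    mock_res_init(&r, HOLDERS);
    while (started < HOLDERS &&
           pthread_create(&tids[started], NULL, mock_releaser, &r) == 0)
        started++;
    /* Release the references of any thread that failed to start. */
    for (int i = started; i < HOLDERS; i++)
        mock_res_put(&r);

    mock_res_destroy(&r);  /* returns only after the last put */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return r.refs_at_destroy;
}
```

The destroyer never observes a non-zero count, whichever order the threads run in. A mutex is acceptable here because this is control-path code, where a sleeping thread costs little compared with the resource teardown itself.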

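However, creating an RDMA resource usually involves dynamic memory allocation, and destroying one usually involves releasing that memory.
The same resource must not be destroyed more than once, in any thread, and a resource must not be used after it has been destroyed. It is up to the user to follow these rules; breaking them may result in a segmentation fault.
A good practice is to release every RDMA resource in the same thread that created it. This isn't mandatory, but it is a good way to prevent double destruction or use of a destroyed resource.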
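When ownership cannot be kept in a single thread, one alternative is to wrap the resource so that only the first destroy call has any effect. The sketch below is an assumption-laden illustration, not part of libibverbs: `guarded_res`, `guarded_destroy` and `guarded_destroy_demo` are hypothetical names, and a heap block stands in for the RDMA resource. It addresses double destruction only; preventing use after destruction still needs separate coordination, such as reference counting.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical owning wrapper; handle stands in for an RDMA resource. */
struct guarded_res {
    atomic_flag destroyed;  /* set by the first caller that destroys */
    void *handle;           /* heap block standing in for the resource */
};

/* Returns 0 if this call performed the destruction, -1 if it was already done. */
int guarded_destroy(struct guarded_res *g)
{
    if (atomic_flag_test_and_set(&g->destroyed))
        return -1;    /* another caller got here first */
    free(g->handle);  /* real code would call a destroy verb such as ibv_destroy_qp() */
    g->handle = NULL;
    return 0;
}

struct race_ctx {
    struct guarded_res res;
    atomic_int wins;  /* number of calls that actually destroyed */
};

/* Thread body: try to destroy and record whether this thread won. */
static void *racer(void *arg)
{
    struct race_ctx *ctx = arg;

    if (guarded_destroy(&ctx->res) == 0)
        atomic_fetch_add(&ctx->wins, 1);
    return NULL;
}

/* Several threads race to destroy one resource.
 * Returns how many destroy calls took effect. */
int guarded_destroy_demo(void)
{
    enum { RACERS = 8 };
    struct race_ctx ctx = {
        .res = { .destroyed = ATOMIC_FLAG_INIT, .handle = NULL },
        .wins = 0,
    };
    pthread_t tids[RACERS];
    int started = 0;

    ctx.res.handle = malloc(64);
    while (started < RACERS &&
           pthread_create(&tids[started], NULL, racer, &ctx) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    /* The creating thread also destroys; this is a no-op unless no racer ran. */
    if (guarded_destroy(&ctx.res) == 0)
        atomic_fetch_add(&ctx.wins, 1);
    return atomic_load(&ctx.wins);
}
```

However many threads race, exactly one destroy call takes effect. The single-owner rule from the text remains the simpler option when the program structure allows it.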

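FAQs


Are the RDMA verbs thread safe, or do I have to protect the RDMA code with a mutex?
The RDMA verbs are fully thread safe; no additional mutex is needed to call them.


What are the limitations of working with RDMA verbs in threads?
There aren't any. Not destroying a resource more than once and not using a destroyed resource are rules that apply regardless of multi-threaded programming.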

Original text: libibverbs thread safe level - RDMAmojo

Let's start with the bottom line: the verbs API is fully thread safe and verbs can be called from every thread in the process. Part of the thread safety is implemented at the libibverbs level and part of it is implemented at the low-level driver library level.

The same resource can even be handled from different threads (the atomicity of the operations is guaranteed). The supported operations that can be performed in multiple threads include, but are not limited to:

  • Opening context using RDMA device.
  • Reading Asynchronous events - each event will be read by exactly one thread.
  • Acknowledging an Asynchronous event.
  • Creating RDMA resources which are associated with the same object (Context, PD, CQ, SRQ). It is guaranteed that each newly created RDMA resource will have its own unique number. For example: a specific QP number will be assigned at a given time to only one QP. Same goes for l_key and r_key.
  • Destroying RDMA resources which are associated with the same object (Context, PD, CQ, SRQ).
  • Querying or modifying RDMA resources.
  • Posting Work Request to any Queue (QP or SRQ) - different threads may post to different RDMA resources, and different threads may post to the same RDMA resource.
  • Polling for Work Completions on a specific CQ - each Work Completion will be read by exactly one of the polling threads.
  • Requesting notifications from a CQ.
  • Reading a Completion event.
  • Acknowledging a Completion Event.
  • Attaching/detaching a QP to/from multicast groups.

A little information about the internal implementation:
This thread safety is guaranteed by libibverbs and the low-level driver libraries through pthread primitives such as spinlocks, mutexes and condition variables. In general:

  • Mutexes are used for protecting critical sections in the control path.
  • Condition variables are used for reference counting of resources in the control path.
  • Spinlocks are used in areas of RDMA objects that may be accessed in the data path, for example: CQ, QP and SRQ. This allows fast scheduling of threads.

However, creating RDMA resources usually involves dynamic memory allocation, and destroying RDMA resources usually involves releasing that memory. The same resource cannot be destroyed more than once, in any thread, and a resource cannot be used after it was destroyed. It is up to the user to follow those rules; not doing so may result in a segmentation fault.

A good practice is to release every RDMA resource in the same thread that it was created in. This isn't mandatory, but it is a good way to prevent double destruction or use of a destroyed resource.

FAQs

Are the RDMA verbs thread safe or do I have to protect the RDMA code with a mutex?

The RDMA verbs are fully thread safe; there is no need to protect them with a mutex.

What are the limitations of working with RDMA verbs in threads?

There aren't any limitations. Not destroying a resource more than once and not working with a resource that was destroyed are rules that aren't related to multi-threaded programming.


 

posted on 2022-10-04 01:22  bdy