OFI

Source: ofi-guide/OFIGuide.md (ofiwg/ofi-guide on GitHub)

 /////////////////////////////

////////////////////////////

Lenovo's DAOS solution product documentation:
"Designing DAOS Solutions with Lenovo ThinkSystem SR630 Servers"
High Performance Network Adapters
To ensure network bandwidth that is balanced against the eight NVMe SSDs (a total of 32 PCIe lanes), two 16-lane Mellanox ConnectX-5 or ConnectX-6 VPI cards provide connectivity to the HPC fabric (one card on each CPU socket).
The recommended HPC fabric for DAOS is InfiniBand, with the libfabric ofi+verbs provider. ConnectX-5 cards support EDR, and ConnectX-6 cards support HDR100.
Since all of these cards are VPI cards, they can also be used in 100 Gbps Ethernet mode (together with the libfabric ofi+tcp or ofi+sockets provider).
////////////////////////////////////

High Performance Network Adapters
To ensure network bandwidth that is balanced against the eight or ten NVMe PCIe 4.0 SSDs (a total of 32 or 40 PCIe 4.0 lanes), two PCIe 4.0 16-lane Mellanox ConnectX-6 VPI cards provide connectivity to the HPC fabric (one card on each CPU socket). The recommended HPC fabric for DAOS is InfiniBand, with the libfabric ofi+verbs provider. The ConnectX-6 cards support full HDR speed (200 Gbps), and since they are VPI cards they can also be used in 100 Gbps or 200 Gbps Ethernet mode (together with the libfabric ofi+tcp or ofi+sockets provider). It is also possible to use dual-port ConnectX-6 HDR100 cards, whose two ports in aggregate also provide 200 Gbps of bandwidth.
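The lane-matching argument above can be checked with simple arithmetic (each NVMe SSD sits on a PCIe x4 link; the x16 figure per adapter is from the text):

```shell
# Bandwidth balance sketch: total PCIe lanes feeding the SSDs vs. the NICs.
ssds=8           # NVMe drives, each on a PCIe x4 link
lanes_per_ssd=4
nics=2           # one x16 adapter per CPU socket
lanes_per_nic=16
echo "SSD lanes: $((ssds * lanes_per_ssd)), NIC lanes: $((nics * lanes_per_nic))"
```

Running it prints `SSD lanes: 32, NIC lanes: 32`: the two x16 adapters exactly match eight x4 SSDs (and with ten SSDs the SSD side grows to 40 lanes, slightly oversubscribing the NICs).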

///////////////////////////

Official DAOS 2.0 documentation
Fabric Support
DAOS relies on libfabric for communication in the DAOS data plane.
DAOS Version 2.0 requires at least libfabric Version 1.14. The RPM distribution of DAOS includes libfabric RPM packages with the correct version. It is strongly recommended to use exactly the provided libfabric version on all DAOS servers and all DAOS clients.
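One way to check the minimum-version requirement on a node is a `sort -V` comparison; the installed value below is a stand-in, and on a real system it would be read from the output of `fi_info --version` (shipped with libfabric):

```shell
# Sketch: verify the installed libfabric meets the 1.14 minimum.
required=1.14.0
installed=1.14.0   # stand-in; on a real node take this from: fi_info --version
oldest=$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)
if [ "$oldest" = "$required" ]; then
    echo "libfabric $installed satisfies the >= $required requirement"
fi
```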
Not all libfabric providers are supported.

DAOS Version 2.0 has been validated mainly with the verbs provider for InfiniBand fabrics, and the sockets provider for other fabrics.
Future DAOS releases may add support for additional libfabric providers.
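As an illustration of selecting the verbs provider on the client side, DAOS/CaRT reads the provider string from environment variables; the interface and domain names below (`ib0`, `mlx5_0`) are site-specific assumptions, not universal values:

```shell
# Illustrative client-side settings for the verbs provider
# (variable names as used by DAOS/CaRT; values are examples only):
export CRT_PHY_ADDR_STR="ofi+verbs;ofi_rxm"   # libfabric provider string
export OFI_INTERFACE=ib0                      # IPoIB interface (assumed name)
export OFI_DOMAIN=mlx5_0                      # verbs device/domain (assumed name)
echo "provider: $CRT_PHY_ADDR_STR"
```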

////////////////////////////////

PSM

High-speed InfiniBand networking from Intel. See fi_psm(7) for more information.

PSM2

High-speed Omni-Path networking from Intel. See fi_psm2(7) for more information.

PSM3

High-speed Ethernet networking from Intel. See fi_psm3(7) for more information.

Verbs

This provider uses the Linux Verbs API for network transport. Application performance is expected to be similar to that of the native Linux Verbs API. Analogous to the Sockets provider, the Verbs provider is intended to enable developers to write, test, and debug application code on platforms that only have Linux Verbs-based networking. See fi_verbs(7) for more information.

 /////////////////////////////////////////////

psm3
The Intel® Performance Scaled Messaging 3 (Intel® PSM3) provider is a high-performance protocol that provides a low-level communication interface for the Intel® Ethernet Fabric Suite family of products. PSM3 enables mechanisms that are necessary for implementing higher level communication interfaces in parallel environments such as MPI and AI training frameworks.
The Intel® PSM3 interface differs from the Intel® Omni-Path PSM2 interface in the following ways:
  • PSM3 includes new features and optimizations for Intel® Ethernet Fabric hardware and processors.
  • PSM3 supports only the Open Fabrics Interface (OFI, aka Libfabric). The PSM API is no longer exposed.
  • PSM3 includes performance improvements specific to the Intel® Ethernet Fabric Suite.
  • PSM3 supports standard Ethernet networks and leverages standard RoCEv2 protocols as implemented by the Intel® Ethernet Fabric Suite NICs.
 /////////////////////////////////////////////

Verbs
The verbs provider enables applications using OFI to run over any verbs hardware (InfiniBand, iWARP, and RoCE). It uses the Linux Verbs API for network transport and translates OFI calls to the appropriate verbs API calls. It uses librdmacm for communication management and libibverbs for other control and data transfer operations.
See the fi_verbs(7) man page for more details.
Dependencies
The verbs provider requires libibverbs (v1.1.8 or newer) and librdmacm (v1.0.16 or newer). If you are compiling libfabric from source and want to enable verbs support, you will also need the matching header files for the above two libraries. If the libraries and header files are not in default paths, specify them in CFLAGS, LDFLAGS and LD_LIBRARY_PATH environment variables.
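A minimal sketch of the environment-variable setup described above, assuming the RDMA libraries were installed under a hypothetical prefix `/opt/rdma` (adjust to your system):

```shell
# Point the libfabric build at non-default libibverbs/librdmacm locations.
RDMA_PREFIX=/opt/rdma   # hypothetical prefix; adjust to your system
export CFLAGS="-I${RDMA_PREFIX}/include"
export LDFLAGS="-L${RDMA_PREFIX}/lib"
export LD_LIBRARY_PATH="${RDMA_PREFIX}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
# Then, from the libfabric source tree:
#   ./configure --enable-verbs && make && make install
```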

Troubleshooting / Known issues

fi_getinfo returns -FI_ENODATA

  • Set FI_LOG_LEVEL=info or FI_LOG_LEVEL=debug (if a debug build of libfabric is available) and check if there are any errors caused by incorrect input parameters to fi_getinfo.
  • Check if “fi_info -p verbs” runs successfully. If it fails, the following checklist may help in ensuring that the RDMA verbs stack is functional:
    • If libfabric was compiled from source, check that the verbs provider was actually built. Building the verbs provider is skipped if its dependencies (listed under Dependencies above) aren’t available on the system.
    • Verify verbs device is functional:
      • Does the ibv_rc_pingpong test (available in libibverbs) work?
      • Does ibv_devinfo (available in libibverbs) show the device with PORT_ACTIVE status? If not:
        • Check if a Subnet Manager (SM) is running on the switch or on one of the nodes in the cluster.
        • Is the cable connected?
    • Verify librdmacm is functional:
      • Does ucmatose test (available in librdmacm) work?
      • Is the IPoIB interface (e.g. ib0) up and configured with a valid IP address?
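The device check in the list above can be scripted; here the `ibv_devinfo` output is a canned sample so the parsing logic is visible without RDMA hardware (on a real node, pipe the actual `ibv_devinfo` output instead):

```shell
# Look for an ACTIVE port in (sample) ibv_devinfo output.
devinfo_sample='hca_id: mlx5_0
        port:   1
                state:                  PORT_ACTIVE (4)'
if printf '%s\n' "$devinfo_sample" | grep -q 'PORT_ACTIVE'; then
    echo "verbs device has an active port"
else
    echo "no active port: check the cable and the Subnet Manager"
fi
```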
posted @ 2022-01-02 16:15  乌鸦嘴-raven