【RDMA】RoCE Debug Flow for Linux

Original article: https://community.mellanox.com/s/article/RoCE-Debug-Flow-for-Linux

 

* For the recommended RoCE configuration and verification, refer to the corresponding article on the Mellanox community site.

This post provides guidelines on how to debug a RoCE network and how to tune RoCE performance. The flowchart below describes the RoCE troubleshooting process.
Information on how to run the tests listed in the flowchart can be found in the subsequent sections.

Figure 1: RoCE Debug Flow Chart


 

Test #1: Check RDMA Connectivity Using ibv_rc_pingpong


This test verifies that RoCE traffic can be sent between the client and the server sides. This test does not require rdma-cm to be enabled.
To check the RDMA connectivity, follow the steps below.

On the server side
  1. Find the server's ibdev(s) using one of the following:
     1.a. The rdma command, if you are working with the upstream kernel.
          The output is a list of the server's InfiniBand devices and their matching netdevs.
# rdma link
1/1: mlx5_0/1: state ACTIVE physical_state LINK_UP netdev enp17s0f0
2/1: mlx5_1/1: state ACTIVE physical_state LINK_UP netdev enp17s0f1
3/1: mlx5_2/1: state ACTIVE physical_state LINK_UP netdev enp134s0f0
4/1: mlx5_3/1: state ACTIVE physical_state LINK_UP netdev enp134s0f1


OR:

     1.b. The ibdev2netdev command, if you are working with MLNX_OFED.
          The output is a list of the server's InfiniBand devices and their matching netdevs.

# ibdev2netdev
mlx5_0 port 1 ==> enp17s0f0 (Up)
mlx5_1 port 1 ==> enp17s0f1 (Up)
mlx5_2 port 1 ==> enp134s0f0 (Up)
mlx5_3 port 1 ==> enp134s0f1 (Up)
 
  2. Find the netdev's IP address. Select an InfiniBand device from the previous step to be tested, and find the matching netdev's IP address.
Note: In the examples that follow, the device used is mlx5_1 (port 1), whose netdev is enp17s0f1, as obtained from the previous step.

# ip address  show dev enp17s0f1
12: enp17s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ec:0d:9a:ae:11:9d brd ff:ff:ff:ff:ff:ff
    inet 12.7.156.240/8 brd 12.255.255.255 scope global enp17s0f1
       valid_lft forever preferred_lft forever
    inet6 fe80::ee0d:9aff:feae:119d/64 scope link
       valid_lft forever preferred_lft forever


 
  3. Find the netdev's GIDs.
        On the same device, find the matching netdev's GIDs using the show_gids command (a sysfs alternative is sketched after these steps).

# show_gids mlx5_1
DEV      PORT     INDEX    GID                                             IPv4              VER      DEV
---      ----     -----    ---                                             ------------      ---      ---
mlx5_1   1        0        fe80:0000:0000:0000:ee0d:9aff:feae:119d                        v1       enp17s0f1
mlx5_1   1        1        fe80:0000:0000:0000:ee0d:9aff:feae:119d                        v2       enp17s0f1
mlx5_1   1        2        0000:0000:0000:0000:0000:ffff:0c07:9cf0    12.7.156.240      v1       enp17s0f1
mlx5_1   1        3        0000:0000:0000:0000:0000:ffff:0c07:9cf0    12.7.156.240      v2       enp17s0f1
n_gids_found=4
 
  4. Run ibv_rc_pingpong as the server to verify that connectivity can be established.
Run the RC ping-pong server using the RoCE v2 GID obtained from the previous step (index 3 in the table above). This can be done with the ibv_rc_pingpong command.

# ibv_rc_pingpong -d mlx5_1 -g 3
  local address:  LID 0x0000, QPN 0x003968, PSN 0x3869d8, GID ::ffff:12.7.156.240
  remote address: LID 0x0000, QPN 0x001960, PSN 0x39c9d6, GID ::ffff:12.7.156.239
8192000 bytes in 0.01 seconds = 12475.92 Mbit/sec
1000 iters in 0.01 seconds = 5.25 usec/iter
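
As noted in step 3 above, if the show_gids script is not available, the GID table and the RoCE version of each entry can usually be read directly from sysfs. This is a minimal sketch, assuming the standard /sys/class/infiniband layout and the example device mlx5_1, port 1, GID index 3; the values shown simply mirror the GID table above, and the exact type string may vary by kernel version:

# cat /sys/class/infiniband/mlx5_1/ports/1/gids/3
0000:0000:0000:0000:0000:ffff:0c07:9cf0
# cat /sys/class/infiniband/mlx5_1/ports/1/gid_attrs/types/3
RoCE v2
# cat /sys/class/infiniband/mlx5_1/ports/1/gid_attrs/ndevs/3
enp17s0f1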

On the client side
  1. Find the client's ibdev(s) using one of the following:
     1.a. The rdma command, if you are working with the upstream kernel.
          The output is a list of the client's InfiniBand devices and their matching netdevs.
 
# rdma link
1/1: mlx5_0/1: state ACTIVE physical_state LINK_DOWN netdev enp17s0f0
2/1: mlx5_1/1: state ACTIVE physical_state LINK_UP netdev enp17s0f1
3/1: mlx5_2/1: state ACTIVE physical_state LINK_DOWN netdev enp134s0f0
4/1: mlx5_3/1: state ACTIVE physical_state LINK_DOWN netdev enp134s0f1


OR:

     1.b. The ibdev2netdev command, if you are working with MLNX_OFED.
          The output is a list of the client's InfiniBand devices and their matching netdevs.

# ibdev2netdev
mlx5_0 port 1 ==> enp17s0f0 (Down)
mlx5_1 port 1 ==> enp17s0f1 (Up)
mlx5_2 port 1 ==> enp134s0f0 (Down)
mlx5_3 port 1 ==> enp134s0f1 (Down)

Note: In the examples that follow, the device used is mlx5_1 (port 1), whose netdev is enp17s0f1.
  2. Find the client's GID using the show_gids command.

# show_gids mlx5_1
DEV      PORT     INDEX    GID                                             IPv4              VER      DEV
---      ----     -----    ---                                             ------------      ---      ---
mlx5_1   1        0        fe80:0000:0000:0000:ee0d:9aff:feae:11e5                        v1       enp17s0f1
mlx5_1   1        1        fe80:0000:0000:0000:ee0d:9aff:feae:11e5                        v2       enp17s0f1
mlx5_1   1        2        0000:0000:0000:0000:0000:ffff:0c07:9cef    12.7.156.239      v1       enp17s0f1
mlx5_1   1        3        0000:0000:0000:0000:0000:ffff:0c07:9cef    12.7.156.239      v2       enp17s0f1
n_gids_found=4
 
  3. Run ibv_rc_pingpong as the client.
Run the RC ping-pong client using the RoCE v2 GID obtained from the previous step (index 3 in the table above) and the server's IP address. This can be done with the ibv_rc_pingpong command.

# ibv_rc_pingpong -d mlx5_1 -g 3 12.7.156.240
  local address:  LID 0x0000, QPN 0x001960, PSN 0x39c9d6, GID ::ffff:12.7.156.239
  remote address: LID 0x0000, QPN 0x003968, PSN 0x3869d8, GID ::ffff:12.7.156.240
8192000 bytes in 0.00 seconds = 14864.14 Mbit/sec
1000 iters in 0.00 seconds = 4.41 usec/iter
 

Results

Success criteria: the average bandwidth reported on the client side is larger than 0.
If the test completed successfully but you still have no RDMA service, contact Mellanox support with the output of the sysinfo snapshot tool, which can be downloaded from https://github.com/Mellanox/linux-sysinfo-snapshot (a sketch of running it follows).
In case of failure, continue with the basic RDMA checks (see Test #2: Basic RDMA Check below).
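
The sysinfo snapshot referenced above is a script hosted in that GitHub repository. A minimal sketch of fetching and running it; the script name sysinfo-snapshot.py and invoking it with python are assumptions based on the repository, not part of the original post:

# git clone https://github.com/Mellanox/linux-sysinfo-snapshot.git
# cd linux-sysinfo-snapshot
# python sysinfo-snapshot.py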


Test #2: Basic RDMA Check

This test verifies some basic preconditions for establishing RDMA traffic.
 
  1. Check that RoCE is enabled on both the server and the client sides.

# lspci -D | grep Mellanox
0000:11:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
0000:11:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
0000:86:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0000:86:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

# cat /sys/bus/pci/devices/0000\:11\:00.1/roce_enable
1
If RoCE is disabled (roce_enable is set to 0), enable it using one of the following (a sketch for re-checking the value follows these options):
 
     1.a. Using the devlink command (if you are working with the upstream kernel); substitute your device's PCI address:
# devlink dev param set pci/0000:00:00.1 name enable_roce value 1 cmode runtime

OR
 
     1.b. By writing to sysfs (if you are working with MLNX_OFED):

# echo 1 > /sys/bus/pci/devices/0000\:11\:00.1/roce_enable
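
Whichever method is used, the setting can be checked again afterwards. A minimal sketch using devlink; the parameter name enable_roce is taken from the devlink example above, and the PCI address is the example one:

# devlink dev param show pci/0000:11:00.1 name enable_roce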
 
  2. Perform an MTU check.
RoCE requires an MTU of at least 1024 bytes of net payload. In the sub-steps below, check for a larger MTU that accounts for additional headers, such as the IP header, VLAN tags, tunnel encapsulation, etc.
The MTU must be guaranteed end-to-end, without segmentation and reassembly.
 

2.a. Check the MTU value on the server and the client sides, and set it if needed (a sketch of setting it follows the output below).

Verify that the MTU is larger than 1250 bytes:
# ip address  show dev enp17s0f1
12: enp17s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ec:0d:9a:ae:11:9d brd ff:ff:ff:ff:ff:ff
    inet 12.7.156.240/8 brd 12.255.255.255 scope global enp17s0f1
       valid_lft forever preferred_lft forever
    inet6 fe80::ee0d:9aff:feae:119d/64 scope link
       valid_lft forever preferred_lft forever
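
If the MTU needs to be changed, it can be set with the ip command on both sides. The interface name and the value 1500 below are just the example ones; any value that satisfies the check above and is supported end-to-end will do:

# ip link set dev enp17s0f1 mtu 1500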
 

2.b. Perform an end-to-end MTU check. Ping the server:


# ping -f -c 100  -s 1250 -M do  12.7.156.240
PING 12.7.156.240 (12.7.156.240) 1250(1278) bytes of data.
 
--- 12.7.156.240 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.003/0.003/0.012/0.001 ms, ipg/ewma 0.008/0.003 ms

Success criteria: both checks above pass. If not, correct the MTU size.

 

  3. Check the device info by running ibv_devinfo on the server.

# ibv_devinfo  -d mlx5_1 -vvv
hca_id:  mlx5_1
          transport:                           InfiniBand (0)
          fw_ver:                               16.24.1000



                             GID[  1]:                   fe80:0000:0000:0000:ee0d:9aff:feae:119d
                             GID[  2]:                   0000:0000:0000:0000:0000:ffff:0c07:9cf0
                             GID[  3]:                   0000:0000:0000:0000:0000:ffff:0c07:9cf0

Success criteria: the command succeeds.
 

Results

If the configuration was updated as a result of this test (for example, the MTU was changed), the test completed successfully. In that case, check IP connectivity (Test #3).
If the issue still exists, redo the steps in Test #1.
In case of failure (the command returned an error, hung, etc.), contact Mellanox support with the output of the sysinfo snapshot tool, which can be downloaded from https://github.com/Mellanox/linux-sysinfo-snapshot

Extra info

More details on ping command can be found at:
https://linux.die.net/man/8/ping
More details on show_gids command can be found at:
https://community.mellanox.com/s/article/understanding-show-gids-script
More details on ibv_devinfo command can be found at:
https://linux.die.net/man/1/ibv_devinfo

Test #3: Check IP Connectivity using Ping

This test verifies that IP traffic can be sent between the client and the server sides.

On the server side:
Find the server’s IP address by following the second step in Test #1 above.

On the client side:
Ping the server:

# ping -f -c 100 12.7.156.240
PING 12.7.156.240 (12.7.156.240) 56(84) bytes of data.
 
--- 12.7.156.240 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.002/0.003/0.015/0.002 ms, ipg/ewma 0.007/0.002 ms

Results

Success criteria: low packet loss; 0% is preferred.
On success, contact Mellanox support with the output of the sysinfo snapshot tool, which can be downloaded from https://github.com/Mellanox/linux-sysinfo-snapshot
Upon failure, go to Test #4: Verify IP and Ethernet Connectivity.

Extra info

More details on the ping command can be found at:
https://linux.die.net/man/8/ping

Test #4: Verify IP and Ethernet Connectivity

This test helps you track down the reason for missing IP connectivity. To check for IP and Ethernet connectivity issues, run the following tests.

Test #4.A: IP connectivity problems might be a result of the interface being down. Check the port state and verify that the physical port is up (an additional link check is sketched after the output):
# ip address  show dev enp17s0f1
12: enp17s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ec:0d:9a:ae:11:9d brd ff:ff:ff:ff:ff:ff
    inet 12.7.156.240/8 brd 12.255.255.255 scope global enp17s0f1
       valid_lft forever preferred_lft forever
    inet6 fe80::ee0d:9aff:feae:119d/64 scope link
       valid_lft forever preferred_lft forever
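
The physical link state can also be checked with ethtool, and a downed interface brought up with ip link. The commands are shown for the example interface, with the ethtool output trimmed to the relevant line:

# ethtool enp17s0f1 | grep "Link detected"
        Link detected: yes
# ip link set dev enp17s0f1 up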

Test #4.B: Make sure that the number of dropped packets does not increase from one run of the ip command to the next (see the sketch below).
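
A minimal sketch of this check, using the statistics view of the ip command on the example interface; compare the RX and TX "dropped" counters between two consecutive runs (the exact column layout varies with the iproute2 version):

# ip -s link show dev enp17s0f1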

Results

Success criteria: the test is completed and IP connectivity is restored.
If the issue still exists, contact Mellanox support and provide the output of the sysinfo snapshot tool, which can be downloaded from https://github.com/Mellanox/linux-sysinfo-snapshot
 
