
[Performance Testing] NIC performance problem

Description

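The investigation so far shows that TCP retransmissions are what keep the bandwidth from going up. Why the retransmission count was so high needed further digging.
After further investigation, the root cause turned out to be that the NIC's FIFO was full.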

What is a FIFO?

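In system design, FIFO (first in, first out) memory is widely used to raise data transfer rates, handle large data streams, and bridge systems that run at different transfer rates, which improves overall system performance.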

The investigation went as follows:

1. Capture the packets with tcpdump while iperf3 is running the stress test. The command is:

tcpdump -i bondmg -e -n -p tcp and host 172.17.2.183 and port 5201 -vvv >>tcp.txt
# Analysis shows that checksum mismatches are frequently reported during both data transfer and the three-way handshake

15:40:27.814376 34:a2:a2:08:9e:03 > 34:a2:a2:08:9f:cf, ethertype 802.1Q (0x8100), length 78: vlan 151, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 64873, offset 0, flags [DF], proto TCP (6), length 60)
10.128.151.123.60508 > 172.17.2.183.targus-getdata1: Flags [S], cksum 0x95a6 (correct), seq 4161619188, win 29200, options [mss 1460,sackOK,TS val 84130205 ecr 0,nop,wscale 8], length 0 
15:40:27.814394 34:a2:a2:08:9f:cf > 34:a2:a2:08:9e:03, ethertype 802.1Q (0x8100), length 78: vlan 151, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
172.17.2.183.targus-getdata1 > 10.128.151.123.60508: Flags [S.], cksum 0x4426 (incorrect -> 0x9f85), seq 924348114, ack 4161619189, win 28960, options [mss 1460,sackOK,TS val 84102162 ecr 84130205,nop,wscale 8], length 0 
15:40:27.814443 34:a2:a2:08:9e:03 > 34:a2:a2:08:9f:cf, ethertype 802.1Q (0x8100), length 70: vlan 151, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 64874, offset 0, flags [DF], proto TCP (6), length 52)
10.128.151.123.60508 > 172.17.2.183.targus-getdata1: Flags [.], cksum 0x3f00 (correct), seq 1, ack 1, win 115, options [nop,nop,TS val 84130205 ecr 84102162], length 0 
15:40:27.814467 34:a2:a2:08:9e:03 > 34:a2:a2:08:9f:cf, ethertype 802.1Q (0x8100), length 107: vlan 151, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 64875, offset 0, flags [DF], proto TCP (6), length 89)
10.128.151.123.60508 > 172.17.2.183.targus-getdata1: Flags [P.], cksum 0x5f23 (correct), seq 1:38, ack 1, win 115, options [nop,nop,TS val 84130205 ecr 84102162], length 37
15:40:27.814473 34:a2:a2:08:9f:cf > 34:a2:a2:08:9e:03, ethertype 802.1Q (0x8100), length 70: vlan 151, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 8897, offset 0, flags [DF], proto TCP (6), length 52)
172.17.2.183.targus-getdata1 > 10.128.151.123.60508: Flags [.], cksum 0x441e (incorrect -> 0x3edc), seq 1, ack 38, win 114, options [nop,nop,TS val 84102162 ecr 84130205], length 0 
15:40:27.814493 34:a2:a2:08:9f:cf > 34:a2:a2:08:9e:03, ethertype 802.1Q (0x8100), length 71: vlan 151, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 8898, offset 0, flags [DF], proto TCP (6), length 53)
172.17.2.183.targus-getdata1 > 10.128.151.123.60508: Flags [P.], cksum 0x441f (incorrect -> 0x35d3), seq 1:2, ack 38, win 114, options [nop,nop,TS val 84102162 ecr 84130205], length 1 
15:40:27.814537 34:a2:a2:08:9e:03 > 34:a2:a2:08:9f:cf, ethertype 802.1Q (0x8100), length 70: vlan 151, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 64876, offset 0, flags [DF], proto TCP (6), length 52)
10.128.151.123.60508 > 172.17.2.183.targus-getdata1: Flags [.], cksum 0x3eda (correct), seq 38, ack 2, win 115, options [nop,nop,TS val 84130205 ecr 84102162], length 0          
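To see which side the checksum mismatches come from, the capture text can be tallied per sending IP. The sketch below is one way to do that with awk. The function name `cksum_by_sender` is hypothetical, and it assumes the two-line `tcpdump -e -vvv` layout shown above, where the line carrying `cksum` starts with `source.port`.

```shell
#!/bin/sh
# Summarize TCP checksum verdicts per sending IP from `tcpdump -vvv` text output.
# Reads tcpdump text on stdin; prints "<ip> correct=<n> incorrect=<n>" per sender.
# cksum_by_sender is a hypothetical helper name, not part of tcpdump.
cksum_by_sender() {
    awk '
        / cksum 0x/ {
            # $1 looks like 172.17.2.183.targus-getdata1; keep the first 4 dot-separated parts.
            split($1, p, ".")
            ip = p[1] "." p[2] "." p[3] "." p[4]
            if ($0 ~ /\(incorrect/) bad[ip]++; else good[ip]++
            seen[ip] = 1
        }
        END {
            for (ip in seen)
                printf "%s correct=%d incorrect=%d\n", ip, good[ip], bad[ip]
        }
    ' | sort
}

# Usage against the capture file written in step 1:
#   cksum_by_sender < tcp.txt
```

In the excerpt above, every mismatch is on a packet sent by 172.17.2.183. If that is the host where tcpdump ran, the mismatches are commonly a side effect of TX checksum offload: the NIC fills in the checksum after tcpdump has already seen the packet, so they are not necessarily evidence of corruption on the wire.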

2. Check which packets have been dropped on the NIC behind this bond. The command is:

ethtool -S ens2
     rx_missed_errors: 294
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_timeout_count: 0
     tx_restart_queue: 313
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     tx_flow_control_xon: 13
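The full statistics list on a real NIC is long, so it can help to print only the counters that are not zero. The following is a minimal sketch that assumes the `name: value` layout shown above; `nonzero_counters` is a hypothetical helper name.

```shell
#!/bin/sh
# Print only the counters with a non-zero value from `ethtool -S` style output.
# Reads "name: value" lines on stdin; prints "name value" for each non-zero counter.
# nonzero_counters is a hypothetical helper name, not part of ethtool.
nonzero_counters() {
    awk -F': *' '
        NF == 2 && $2 ~ /^[0-9]+$/ && $2 + 0 > 0 {
            gsub(/^[ \t]+/, "", $1)   # drop the leading indentation
            print $1, $2
        }
    '
}

# Usage on a live interface (requires ethtool and usually root):
#   ethtool -S ens2 | nonzero_counters
```

On the excerpt above this would leave rx_missed_errors, tx_restart_queue and tx_flow_control_xon.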

3. rx_missed_errors is non-zero. What does this counter mean?

Most drivers interchange their use of the counters rx_missed_errors, rx_fifo_errors, and rx_over_errors, but they typically set one or more of these counters to the MPC (missed packet count) counter, which is incremented when a packet arrives and is lost because the card's FIFO queue is full.
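A single non-zero reading does not show whether the counter is still growing. One way to check is to take a snapshot before and after a load test and compare them. The sketch below assumes the `name: value` layout of `ethtool -S`; the helper names `counter_value` and `counter_delta` and the snapshot file names are hypothetical.

```shell
#!/bin/sh
# counter_value NAME: read `ethtool -S` style text on stdin, print the value of counter NAME.
# Both helpers are hypothetical names used for illustration.
counter_value() {
    awk -v name="$1" -F': *' '
        { key = $1; gsub(/^[ \t]+/, "", key) }   # strip leading indentation
        key == name { print $2; exit }
    '
}

# counter_delta NAME BEFORE_FILE AFTER_FILE: print how much NAME grew between two snapshots.
# A counter that is missing from a snapshot is treated as 0.
counter_delta() {
    before=$(counter_value "$1" < "$2")
    after=$(counter_value "$1" < "$3")
    echo $(( ${after:-0} - ${before:-0} ))
}

# Usage around a load test (file names are hypothetical):
#   ethtool -S ens2 > before.txt
#   ... run the iperf3 test ...
#   ethtool -S ens2 > after.txt
#   counter_delta rx_missed_errors before.txt after.txt
```

A delta that keeps rising while iperf3 is running would be consistent with the receive FIFO overflowing under load.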

4. In other words, the FIFO queue is full. The fix is as follows:

# Check the current settings
ethtool -g ens2 # the current hardware setting turns out to be 512

# Increase the ring sizes
ethtool -G ens2 rx 4096
ethtool -G ens2 tx 4096
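Before raising the ring sizes it is worth confirming the maximum the hardware reports, since a value above the pre-set maximum is rejected. The sketch below pulls the maximum and current RX/TX values out of `ethtool -g` output. It assumes the usual "Pre-set maximums" / "Current hardware settings" layout, and `ring_sizes` is a hypothetical helper name.

```shell
#!/bin/sh
# Extract maximum and current RX/TX ring sizes from `ethtool -g` output on stdin.
# Prints one line: rx_max=<n> rx_cur=<n> tx_max=<n> tx_cur=<n>
# ring_sizes is a hypothetical helper name, not part of ethtool.
ring_sizes() {
    awk '
        /^Pre-set maximums:/          { section = "max" }
        /^Current hardware settings:/ { section = "cur" }
        $1 == "RX:" { rx[section] = $2 }   # "RX Mini:" and "RX Jumbo:" do not match
        $1 == "TX:" { tx[section] = $2 }
        END {
            printf "rx_max=%s rx_cur=%s tx_max=%s tx_cur=%s\n",
                   rx["max"], rx["cur"], tx["max"], tx["cur"]
        }
    '
}

# Usage on a live interface:
#   ethtool -g ens2 | ring_sizes
```

Ring sizes set with `ethtool -G` normally do not survive a reboot, so the change usually also has to be added to the interface configuration to stay in effect.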

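5. Run the iperf3 test again: it reaches the expected bandwidth of 9 Gbit/s.
   Over repeated runs, however, the result is very unstable. When the iperf3 run is extended, for example to 60 s, it ends up settling at around 8 Gbit/s.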
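To put a number on "unstable", the per-interval rates of a longer run can be summarized. The sketch below computes the minimum, average and maximum rate from iperf3's text output. It assumes a single stream and rates printed in Gbits/sec (for example with `-f g`); `iperf_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# Summarize per-interval rates from iperf3 text output on stdin.
# Assumes a single stream and rates reported in Gbits/sec.
# iperf_summary is a hypothetical helper name, not part of iperf3.
iperf_summary() {
    awk '
        /sender|receiver/ { next }     # skip the end-of-run summary lines
        {
            for (i = 2; i <= NF; i++)
                if ($i == "Gbits/sec") {
                    v = $(i - 1) + 0   # the rate is the field before the unit
                    if (n == 0 || v < min) min = v
                    if (n == 0 || v > max) max = v
                    sum += v; n++
                    break
                }
        }
        END {
            if (n > 0)
                printf "intervals=%d min=%.2f avg=%.2f max=%.2f\n", n, min, sum / n, max
        }
    '
}

# Usage for a 60-second run against the server from step 1:
#   iperf3 -c 172.17.2.183 -t 60 -f g | iperf_summary
```

A large gap between min and max across intervals points to instability more directly than the final average alone.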

Fixing the instability

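1. Check the driver version:
   4.4.0-k-rh7.3
2. Check the kernel version:
   3.10.0-693.17.1.el7.x86_64
3. Check the 10-Gigabit NIC model:
   Intel X710
4. Force-upgrading the kernel to 3.10.0-862.14.4.el7.x86_64 and the NIC driver to 5.1.0-k-rh7.5 with rpm -Uvh --nodeps resolved the problem.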
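The post does not show which commands produced the versions above. A common way to collect them is `ethtool -i` for the driver, `uname -r` for the kernel and `lspci` for the NIC model. The sketch below reduces `ethtool -i` output to the two driver fields; `driver_info` is a hypothetical helper name, and the interface name ens2 is taken from the earlier steps.

```shell
#!/bin/sh
# Extract the driver name and driver version from `ethtool -i` output on stdin.
# Prints one line: driver=<name> version=<version>
# driver_info is a hypothetical helper name, not part of ethtool.
driver_info() {
    awk -F': *' '
        $1 == "driver"  { drv = $2 }
        $1 == "version" { ver = $2 }   # exact match, so firmware-version is ignored
        END { printf "driver=%s version=%s\n", drv, ver }
    '
}

# Usage:
#   ethtool -i ens2 | driver_info   # driver name and version
#   uname -r                        # running kernel version
#   lspci | grep -i ethernet        # NIC model
```

Running the same three checks after the upgrade confirms that the new kernel and driver are the ones actually loaded.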

posted @ 2022-02-18 11:09  一介布衣·GZ