Analysis: why a server accumulates large numbers of long-lived FIN_WAIT1 sockets
A server was showing a large number of sockets in the FIN_WAIT1 state.
Environment:
[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.32-358.el6.x86_64
The connection summary looked like this:
ss -s
Total: 2753 (kernel 3336)
TCP: 6046 (estab 730, closed 5001, orphaned 0, synrecv 0, timewait 5000/0), ports 564
Transport Total IP IPv6
* 3336 - -
RAW 1 1 0
UDP 599 599 0
TCP 1045 1037 8
INET 1645 1637 8
FRAG 0 0 0
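Besides ss -s, a per-state breakdown can be pulled with a throwaway one-liner (NR > 1 skips ss's header line):
ss -tan | awk 'NR > 1 {count[$1]++} END {for (s in count) print s, count[s]}'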
[root@localhost ~]# ss -t state fin-wait-1
Recv-Q Send-Q Local Address:Port Peer Address:Port
0 384259 server_ip:serverport 10.231.18.150:44581
0 763099 server_ip:serverport 10.231.17.55:45202
0 543095 server_ip:serverport 10.231.22.76:35348
0 2379216 server_ip:serverport 10.231.22.37:56283
0 1237680 server_ip:serverport 10.231.17.161:48815
0 1720677 server_ip:serverport 10.231.16.73:51550
0 619159 server_ip:serverport 10.231.138.28:58986
0 474399 server_ip:serverport 10.231.18.82:45256
0 928420 server_ip:serverport 10.231.20.171:53326
0 27771 server_ip:serverport 10.231.138.38:58963
0 614664 server_ip:serverport 10.231.26.94:51083
0 152535 server_ip:serverport 10.231.19.184:43375
Only part of the output is shown here.
Look closely: the send queues of these fin-wait-1 sockets still hold data that has not reached the peer. That is legal; when user space calls close(), any unacknowledged data stays buffered in the kernel, which keeps trying to deliver it.
How much memory are the FIN-WAIT-1 sockets holding? Sum the Send-Q column:
ss -tn state fin-wait-1|grep -v Recv-Q|awk 'BEGIN{sum=0}{sum+=$2}END{printf "%.2f\n",sum}'
221797627.00
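Send-Q counts bytes of unsent or unacknowledged payload, so the actual memory footprint (with skb overhead) is somewhat higher, but as a rough conversion:
echo '221797627 / 1024 / 1024' | bc    # roughly 211 MB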
That is still a lot, and this 200-odd MB was measured after I had already shrunk tcp_max_orphans to a small value; before that intervention, the usage had reached 16 GB.
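The orphan limit and the current orphan count can be checked straight from /proc (standard paths; the sockstat output shown in the comment is illustrative):
cat /proc/sys/net/ipv4/tcp_max_orphans
grep '^TCP' /proc/net/sockstat    # e.g. "TCP: inuse 1045 orphan 0 tw 5000 alloc 1645 mem ..."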
ss also has a -m option for inspecting socket memory; I rarely use it, but what it prints boils down to this fragment of the ss source:
if (tb[INET_DIAG_MEMINFO]) {
	const struct inet_diag_meminfo *minfo = RTA_DATA(tb[INET_DIAG_MEMINFO]);

	printf(" mem:(r%u,w%u,f%u,t%u)",
	       minfo->idiag_rmem,   /* receive-queue memory */
	       minfo->idiag_wmem,   /* queued send memory */
	       minfo->idiag_fmem,   /* forward-allocated memory */
	       minfo->idiag_tmem);  /* transmit memory in flight */
}
For example:
[root@ZSL-VS3000-3 ~]# ss -miten -o state last-ack | head -8
Recv-Q Send-Q Local Address:Port Peer Address:Port
1 7 49.115.46.140:60422 218.31.255.143:21 timer:(persist,097ms,1) ino:0 sk:ffff880d91cdd080
	 mem:(r0,w558,f3538,t0) sack cubic wscale:9,9 rto:202 rtt:2.625/1.75 ato:40 cwnd:1 ssthresh:7 send 4.4Mbps rcv_space:14600
0 51 49.115.46.140:21 110.157.1.10:54611 timer:(persist,147ms,1) ino:0 sk:ffff88140f9b0340
	 mem:(r0,w602,f3494,t0) sack cubic wscale:9,9 rto:213 rtt:13.625/10.25 ato:40 cwnd:1 ssthresh:7 send 857.2Kbps rcv_space:14600
0 51 49.115.46.140:21 110.157.2.9:59688 timer:(persist,174ms,1) ino:0 sk:ffff880560f88a40
	 mem:(r0,w602,f3494,t0) sack cubic wscale:9,9 rto:219 rtt:19.875/11.75 ato:40 cwnd:1 ssthresh:3 send 587.7Kbps rcv_space:14600
0 51 49.115.46.140:21 110.157.1.9:51252 timer:(persist,003ms,1) ino:0 sk:ffff88012d400d80
Searching the web mostly turns up advice to tune tcp_fin_timeout, but that knob actually applies to FIN-WAIT-2:
tcp_fin_timeout - INTEGER
Time to hold socket in state FIN-WAIT-2, if it was closed
by our side. Peer can be broken and never close its side,
or even died unexpectedly. Default value is 60sec.
Usual value used in 2.2 was 180 seconds, you may restore
it, but remember that if your machine is even underloaded WEB server, you risk to overflow memory with kilotons of dead sockets,
FIN-WAIT-2 sockets are less dangerous than FIN-WAIT-1,
because they eat maximum 1.5K of memory, but they tend
to live longer. Cf. tcp_max_orphans.
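For reference, this is how that knob is read and adjusted (value in seconds, runtime only); note again that it only bounds FIN-WAIT-2, so it cannot help with the FIN-WAIT-1 pile-up described here:
sysctl net.ipv4.tcp_fin_timeout
sysctl -w net.ipv4.tcp_fin_timeout=30
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout   # equivalent /proc interface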
In the kernel, the transition to TCP_FIN_WAIT1 happens in tcp_close():
} else if (tcp_close_state(sk)) {
/* We FIN if the application ate all the data before
* zapping the connection.
*/
/* RED-PEN. Formally speaking, we have broken TCP state
* machine. State transitions:
*
* TCP_ESTABLISHED -> TCP_FIN_WAIT1
* TCP_SYN_RECV -> TCP_FIN_WAIT1 (forget it, it's impossible)
* TCP_CLOSE_WAIT -> TCP_LAST_ACK
*
* are legal only when FIN has been sent (i.e. in window),
* rather than queued out of window. Purists blame.
*
* F.e. "RFC state" is ESTABLISHED,
* if Linux state is FIN-WAIT-1, but FIN is still not sent.
*
* The visible declinations are that sometimes
* we enter time-wait state, when it is not required really
* (harmless), do not send active resets, when they are
* required by specs (TCP_ESTABLISHED, TCP_CLOSE_WAIT, when
* they look as CLOSING or LAST_ACK for Linux)
* Probably, I missed some more holelets.
* --ANK
*/
tcp_send_fin(sk);
That is, the socket is moved to FIN-WAIT-1 first, and only then is the FIN sent.
In theory, FIN-WAIT-1 should be hard to catch: as soon as the peer ACKs our FIN, the socket moves to FIN-WAIT-2, and if the peer's own FIN arrives first, it moves to CLOSING. Yet this server had plenty of them. Was neither an ACK nor a FIN arriving? Let's capture packets and check.
Capture the FIN packets:
tcpdump -i eth0 "tcp[tcpflags] & (tcp-fin) != 0" -s 90 -p -n
The capture showed FIN retransmissions, and sometimes the peer happened to be advertising a zero window at that moment, so sockets lingered in FIN-WAIT-1 for a long time.
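To catch the zero-window case directly, the filter can be widened; a sketch (the window field sits at bytes 14-15 of the TCP header, so tcp[14:2] = 0 matches zero-window segments):
tcpdump -i eth0 "(tcp[tcpflags] & tcp-fin != 0) or (tcp[14:2] = 0)" -s 90 -p -n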
For example, once I spotted a FIN-WAIT-1 socket, I could capture its subsequent exchange:
[root@localhost ~]# ss -to state Fin-wait-1 | head -200 | grep -i 10.96.152.219
0 387543 server_ip:port client_ip:port timer:(persist,1.140ms,0)
Here we see the persist timer (the zero-window probe timer). Using a script, I collected the FIN-WAIT-1 connections over time; for some of them the timer had backed off all the way to 120 s, after which they kept probing every 120 s.
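Why 120 s? Each unanswered probe backs the interval off exponentially until it hits the kernel ceiling TCP_RTO_MAX (120 s). A throwaway sketch, assuming an initial RTO of 200 ms:
#!/bin/bash
# Illustrate persist-timer backoff capped at TCP_RTO_MAX = 120 s.
# The 200 ms starting RTO is an assumption for the example.
rto=200      # ms
max=120000   # ms
for probe in $(seq 1 12)
do
    printf "probe %2d: %d.%01ds\n" "$probe" $((rto / 1000)) $((rto % 1000 / 100))
    rto=$((rto * 2))
    [ $rto -gt $max ] && rto=$max
done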
The idea behind the script:
#!/bin/bash
while true
do
    ss -tn state Fin-wait-1 src server_ip | awk '{print $4}' | grep -v server_port1 | grep -v server_port2 | sort -n > caq_old
    ss -tn state Fin-wait-1 src server_ip | awk '{print $4}' | grep -v server_port1 | grep -v server_port2 | sort -n > caq_new
    diff caq_old caq_new | grep '>' | awk '{print $2}' > diff_caq
    sleep 1
    ### One second later, re-grab the FIN-WAIT-1 connections. Any connection that
    ### appeared in the second snapshot and is still present now has lived for at
    ### least 1 s; the goal is to catch the ones whose persist timer runs long.
    ss -tn state Fin-wait-1 src server_ip | awk '{print $4}' | grep -v server_port1 | grep -v server_port2 | sort -n > caq_new
    while read line
    do
        if grep -qF "$line" caq_new
        then
            ### lived for at least 1 second
            echo "$line"
            exit
        fi
    done < diff_caq
done
Then feed the result to tcpdump (awk extracts the port from the addr:port line the script prints, and xargs -t appends it to the filter):
./get_fin.sh | awk -F ':' '{print $2}' | xargs -t tcpdump -i eth1 -s 90 -c 200 -p -n port
Part of the capture:
20:34:26.355115 IP server_ip.port > client_ip.port: Flags [.], ack 785942062, win 123, length 0
20:34:26.358210 IP client_ip.port > server_ip.port: Flags [.], ack 1, win 0, length 0
20:34:28.159117 IP server_ip.port > client_ip.port: Flags [.], ack 1, win 123, length 0
20:34:28.162169 IP client_ip.port > server_ip.port: Flags [.], ack 1, win 0, length 0
20:34:31.761502 IP server_ip.port > client_ip.port: Flags [.], ack 1, win 123, length 0
20:34:31.765416 IP client_ip.port > server_ip.port: Flags [.], ack 1, win 0, length 0
The server keeps sending zero-window probes, and the client keeps answering with win 0, so the queued data (and the FIN behind it) can never go out: the socket stays in FIN-WAIT-1, probing indefinitely. That is why so many FIN-WAIT-1 sockets persist for so long.