23 File descriptor exhaustion (part 2): Linux [local]
Linux environment:
cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
ulimit -n
100001
sysctl -a | grep fs.file-nr
832 0 183945
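The three fs.file-nr numbers are the allocated file handles, the allocated but unused handles, and the system-wide maximum (fs.file-max). A minimal sketch of reading them from Java, assuming a Linux /proc filesystem; the class name FileNr is made up for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FileNr {
    /** Parses the three whitespace-separated fields of fs.file-nr: allocated, unused, max. */
    static long[] parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields, got: " + line);
        }
        long[] fields = new long[3];
        for (int i = 0; i < 3; i++) {
            fields[i] = Long.parseLong(parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get("/proc/sys/fs/file-nr"); // Linux only (assumption)
        if (!Files.isReadable(p)) {
            System.out.println("/proc/sys/fs/file-nr is not available on this system");
            return;
        }
        long[] f = parse(new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
        System.out.println("allocated=" + f[0] + " unused=" + f[1] + " max=" + f[2]);
    }
}
```

With the values above, 832 handles are allocated out of a system-wide maximum of 183945, so the experiments below never get near the system-wide limit.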
==================================
1 Files only
file.jar
1.1 ulimit -n 1024
Output:
1016
1017
1018
1019
java.io.IOException: Too many open files
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:1012)
at com.jds.test.bio.p12.FileClient.run(FileClient.java:30)
at com.jds.test.bio.p12.FileClient.main(FileClient.java:18)
1.1.1
[root@VM_0_9_centos ~]# lsof -p 26848|wc -l
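1049 (inaccurate? lsof output also includes a header line and entries that are not descriptors, such as cwd, rtd, txt and mem)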
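The lsof line count is above the limit of 1024 because it counts more than descriptors. To compare against the number the kernel actually enforces, the JVM can report its own descriptor count. A minimal sketch, assuming a HotSpot/OpenJDK runtime on a Unix-like system where com.sun.management.UnixOperatingSystemMXBean is available; the class name FdCount is made up for illustration:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

class FdCount {
    /** Returns {open, max} descriptor counts for this JVM, or null if the platform bean is missing. */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(), // descriptors currently open in this process
                unix.getMaxFileDescriptorCount()   // per-process limit, i.e. what ulimit -n set
            };
        }
        return null; // not a Unix-like JVM
    }

    public static void main(String[] args) {
        long[] r = openAndMax();
        if (r == null) {
            System.out.println("descriptor counts are not available on this JVM");
        } else {
            System.out.println("open=" + r[0] + " max=" + r[1]);
        }
    }
}
```

Printing this value next to the loop counter would show how many descriptors the JVM already holds before the test opens its first file.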
1.2 ulimit -n 2048
Output:
2040
2041
2042
2043
java.io.IOException: Too many open files
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:1012)
at com.jds.test.bio.p12.FileClient.run(FileClient.java:30)
at com.jds.test.bio.p12.FileClient.main(FileClient.java:18)
1.2.1
[root@VM_0_9_centos ~]# lsof -p 27081|wc -l
2073
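The source of FileClient is not shown here. As a rough idea of how such a test can be written, the sketch below creates temporary files and keeps a stream open on each one, so every iteration consumes one descriptor. The class name FileFdHolder and the use of File.createTempFile are assumptions for illustration, not the original code:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class FileFdHolder {
    /**
     * Opens up to maxFiles temp files and keeps every stream open until the end,
     * then closes them all. Returns how many files were open at the same time.
     */
    static int run(int maxFiles) {
        List<FileInputStream> held = new ArrayList<>();
        try {
            for (int i = 0; i < maxFiles; i++) {
                File f = File.createTempFile("fd-test-", ".tmp");
                f.deleteOnExit();
                held.add(new FileInputStream(f)); // one descriptor per open stream
                System.out.println(held.size());
            }
        } catch (IOException e) {
            // With a low ulimit -n this is where "Too many open files" shows up.
            System.out.println("stopped: " + e.getMessage());
        } finally {
            for (FileInputStream in : held) {
                try {
                    in.close();
                } catch (IOException ignored) {
                    // nothing useful to do while cleaning up
                }
            }
        }
        return held.size();
    }

    public static void main(String[] args) {
        int max = args.length > 0 ? Integer.parseInt(args[0]) : 2000;
        System.out.println("held at once: " + run(max));
    }
}
```

Run under ulimit -n 1024 with the default argument, the loop should stop a few files short of 1024, because the JVM itself already holds some descriptors (standard streams, the jar, and so on).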
2 Sockets only
server.jar tcp.jar
2.1 ulimit -n 1024 on both ends
Server:
1013
1014
1015
1016
1017
Exception in thread "main" java.net.SocketException: Too many open files (Accept failed)
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.ServerSocket.implAccept(ServerSocket.java:560)
at java.net.ServerSocket.accept(ServerSocket.java:528)
at com.jds.test.bio.p12.Server.server(Server.java:31)
at com.jds.test.bio.p12.Server.main(Server.java:18)
[root@VM_0_9_centos ~]#
Client:
1014
1015
1016
1017
1018
Too many open files
Too many open files
Too many open files
Too many open files
Too many open files
Too many open files
......
Connection refused (Connection refused)
Connection refused (Connection refused)
Connection refused (Connection refused)
Connection refused (Connection refused)
Connection refused (Connection refused)
Connection refused (Connection refused)
Connection refused (Connection refused)
2.1.1
The server process is gone.
Client:
[root@VM_0_9_centos ~]# lsof -p 29174|wc -l
31 (the server's abnormal exit closed the connections, so this small number is expected)
2.1.2
netstat -an|grep 12123
tcp6 0 0 127.0.0.1:12123 127.0.0.1:53496 TIME_WAIT
tcp6 0 0 127.0.0.1:12123 127.0.0.1:52144 TIME_WAIT
tcp6 0 0 127.0.0.1:12123 127.0.0.1:53312 TIME_WAIT
tcp6 0 0 127.0.0.1:12123 127.0.0.1:52906 TIME_WAIT
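The TIME_WAIT entries on the server port (12123) show that the server side closed the connections when it exited abnormally.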
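The server dies here because the SocketException from accept() propagates out of main. The Server source is not shown, so the following is only a sketch of an accept loop that holds its connections and logs accept failures instead of exiting. The class name HoldingServer, the failure cap and the demo helper are all made up for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

class HoldingServer {
    /** Accepts until maxAccepts sockets are held or accept() has failed maxFailures times. */
    static int acceptAndHold(ServerSocket server, int maxAccepts, int maxFailures, List<Socket> held) {
        int failures = 0;
        while (held.size() < maxAccepts && failures < maxFailures) {
            try {
                held.add(server.accept()); // each accepted socket costs one descriptor
                System.out.println(held.size());
            } catch (IOException e) {
                // e.g. "Too many open files (Accept failed)": log it and keep the process alive
                failures++;
                System.out.println("accept failed: " + e.getMessage());
            }
        }
        return held.size();
    }

    /** Self-contained demo on an ephemeral loopback port; returns the number of accepted sockets. */
    static int demo(int n) {
        List<Socket> clients = new ArrayList<>();
        List<Socket> held = new ArrayList<>();
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            for (int i = 0; i < n; i++) {
                // n must stay below the backlog (50): these connects complete before accept() runs
                clients.add(new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort()));
            }
            return acceptAndHold(server, n, 3, held);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            closeAll(clients);
            closeAll(held);
        }
    }

    static void closeAll(List<Socket> sockets) {
        for (Socket s : sockets) {
            try {
                s.close();
            } catch (IOException ignored) {
                // nothing useful to do while cleaning up
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("accepted: " + demo(5));
    }
}
```

A long-running server would also want to pause briefly after a failed accept, since the call keeps failing until some descriptor is released.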
2.2 ulimit -n 2048 on both ends
Server:
2036
2037
2038
2039
2040
2041
Exception in thread "main" java.net.SocketException: Too many open files (Accept failed)
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.ServerSocket.implAccept(ServerSocket.java:560)
at java.net.ServerSocket.accept(ServerSocket.java:528)
at com.jds.test.bio.p12.Server.server(Server.java:31)
at com.jds.test.bio.p12.Server.main(Server.java:18)
[root@VM_0_9_centos ~]#
Client:
2036
2037
2038
2039
2040
2041
2042
Too many open files
Too many open files
Too many open files
Too many open files
Too many open files
...
Connection refused (Connection refused)
Connection refused (Connection refused)
Connection refused (Connection refused)
Connection refused (Connection refused)
Connection refused (Connection refused)
Connection refused (Connection refused)
2.2.1
The server process is gone.
Client:
[root@VM_0_9_centos ~]# lsof -p 29782|wc -l
31 (the server's abnormal exit closed the connections, so this small number is expected)
2.2.2
netstat -an|grep 12123
tcp6 0 0 127.0.0.1:12123 127.0.0.1:53496 TIME_WAIT
tcp6 0 0 127.0.0.1:12123 127.0.0.1:52144 TIME_WAIT
tcp6 0 0 127.0.0.1:12123 127.0.0.1:53312 TIME_WAIT
tcp6 0 0 127.0.0.1:12123 127.0.0.1:52906 TIME_WAIT
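The TIME_WAIT entries on the server port (12123) show that the server side closed the connections when it exited abnormally.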
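2.3 Default limit of 100001 on both ends, client loops 100,000 times
Both ends printed up to 26260; after that, connections were established at about one per second, which was too slow to wait for.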
3 socket + file
server.jar client1.jar
3.1 ulimit -n 1024 on both ends
Server output:
1013
1014
1015
1016
1017
Exception in thread "main" java.net.SocketException: Too many open files (Accept failed)
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.ServerSocket.implAccept(ServerSocket.java:560)
at java.net.ServerSocket.accept(ServerSocket.java:528)
at com.jds.test.bio.p12.Server.server(Server.java:31)
at com.jds.test.bio.p12.Server.main(Server.java:18)
[root@VM_0_9_centos ~]#
Client output:
1013
1014
1015
1016
1017
1018
Too many open files
Too many open files
Too many open files
Too many open files
Too many open files
Too many open files
Too many open files
.....
Too many open files
Too many open files
Too many open files
Too many open files
tcp done
start file
1
2
3
4
5
6
...
1014
1015
1016
1017
1018
java.io.IOException: Too many open files
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:1012)
at com.jds.test.bio.p12.FileClient.run(FileClient.java:30)
at com.jds.test.bio.p12.Client.run(Client.java:32)
at com.jds.test.bio.p12.Client.main(Client.java:11)
3.1.1
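One odd thing here: the client first used up its 1024 descriptors on sockets and then still went on to create more than 1000 files, which does not seem to add up.
However,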
lsof -p 29619 | wc -l
1049
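Looking closely, the server had died and released all its connections, and the client in turn released those connections and their file descriptors, which is why it could go on to open roughly another 1024 files.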
3.2 Set ulimit -n 2000 on the server so that it does not die; keep the client at 1024
The result is still the same, so let's diagnose.
3.2.1
[root@VM_0_9_centos ~]# lsof -p 10212|wc -l
1050
[root@VM_0_9_centos ~]# lsof -p 10240|wc -l
1049
Both the server and the client show about 1050 lines in lsof.
3.2.2
tcp6 0 0 127.0.0.1:44846 127.0.0.1:12123 FIN_WAIT2
tcp6 1 0 127.0.0.1:12123 127.0.0.1:44518 CLOSE_WAIT
tcp6 1 0 127.0.0.1:12123 127.0.0.1:45590 CLOSE_WAIT
tcp6 0 0 127.0.0.1:44364 127.0.0.1:12123 FIN_WAIT2
tcp6 1 0 127.0.0.1:12123 127.0.0.1:45456 CLOSE_WAIT
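This time the server did not exit abnormally. The socket connections are all part-way through closing, and the close was initiated by the client: the client side is in FIN_WAIT2 and the server side in CLOSE_WAIT.
To find out why the sockets were closed, we take the file part out of the picture and run experiment 3.3.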
3.3 server.jar(2000) + tcp.jar (1024)
tcp6 0 0 127.0.0.1:47724 127.0.0.1:12123 FIN_WAIT2
tcp6 0 0 127.0.0.1:47266 127.0.0.1:12123 FIN_WAIT2
tcp6 1 0 127.0.0.1:12123 127.0.0.1:48202 CLOSE_WAIT
tcp6 1 0 127.0.0.1:12123 127.0.0.1:48142 CLOSE_WAIT
[root@VM_0_9_centos ~]# jps
12161 jar
12115 jar
12230 Jps
[root@VM_0_9_centos ~]# lsof -p 12161|wc -l
31
[root@VM_0_9_centos ~]# lsof -p 12115|wc -l
1050
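The client (the process with 31 lsof lines) no longer holds any socket descriptors: once the socket is closed, a connection in FIN_WAIT2 does not occupy a file descriptor. Let's look at the code: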
public void client() throws Exception {
    final String s2 = "localhost";
    Socket s = new Socket();
    s.connect(new InetSocketAddress(s2, PORT));
    // Count after the connection succeeds
    System.out.println(atomicInteger.incrementAndGet());
}
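I suspect that s in this code becomes unreachable and is garbage-collected. Implementing Closeable does not by itself cause close() to be called; more likely the finalizer of the JDK 8 socket implementation closes the descriptor once the unreferenced Socket is collected. The server never calls close or shutdownOutput, so it never sends its own FIN, which leaves the client side in FIN_WAIT2 and the server side in CLOSE_WAIT.
As for why the server's sockets were not collected first, I won't dig further; the server may simply have had enough memory that no collection was needed.
In addition, on a Mac the same code does not drop any connections: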
JoycedeMacBook:~ joyce$ netstat -an|grep 12123|grep ESTABLISHED|wc -l
8118
JoycedeMacBook:~ joyce$ jps
4384
4995 Launcher
4996 AppMain
4988 AppMain
5020 Jps
JoycedeMacBook:~ joyce$ lsof -p 4996|wc -l
4855
JoycedeMacBook:~ joyce$ lsof -p 4988|wc -l
4138
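Perhaps the Mac had enough memory that no collection was needed, while the Tencent Cloud machine has little memory and the JVM gets even less, so a single GC reclaimed all of the sockets.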
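If the sockets really are being closed after garbage collection, keeping a strong reference to each one should keep the connections open regardless of memory pressure. A minimal sketch of that idea, run against a throwaway loopback server; the class name HeldSockets and the demo helper are made up for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

class HeldSockets {
    // Strong references: while a socket is in this list it cannot be garbage-collected.
    private static final List<Socket> HELD = new ArrayList<>();

    static void client(int port) throws IOException {
        Socket s = new Socket();
        s.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
        HELD.add(s); // unlike a local variable, this reference outlives the method call
        System.out.println(HELD.size());
    }

    /** Connects n sockets (n below the backlog of 50), hints a GC, and counts those still open. */
    static int demo(int n) {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            for (int i = 0; i < n; i++) {
                client(server.getLocalPort());
            }
            System.gc(); // only a hint; the held sockets stay reachable either way
            int open = 0;
            for (Socket s : HELD) {
                if (s.isConnected() && !s.isClosed()) {
                    open++;
                }
            }
            return open;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Socket s : HELD) {
                try {
                    s.close();
                } catch (IOException ignored) {
                    // nothing useful to do while cleaning up
                }
            }
            HELD.clear();
        }
    }

    public static void main(String[] args) {
        System.out.println("still open: " + demo(5));
    }
}
```

Blocking on a read from the socket in a dedicated thread has the same effect, because the running thread keeps the socket reachable.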
4 Wrapping up
4.1 Server 2000, client 1024: run the following code 100 times
public void client2() throws Exception {
    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                final String s2 = "localhost";
                Socket s = new Socket();
                s.connect(new InetSocketAddress(s2, PORT));
                // Count after the connection succeeds
                System.out.println(atomicInteger.incrementAndGet());
                // Block on read so the socket stays referenced and open
                while (s.getInputStream().read(new byte[100]) != -1) {
                    ;
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }).start();
}
tcp6 0 0 127.0.0.1:49238 127.0.0.1:12123 ESTABLISHED
tcp6 0 0 127.0.0.1:12123 127.0.0.1:49272 ESTABLISHED
tcp6 0 0 127.0.0.1:49250 127.0.0.1:12123 ESTABLISHED
tcp6 0 0 127.0.0.1:49164 127.0.0.1:12123 ESTABLISHED
tcp6 0 0 127.0.0.1:12123 127.0.0.1:49120 ESTABLISHED
[root@VM_0_9_centos ~]# jps
13091 jar
12791 jar
13306 Jps
[root@VM_0_9_centos ~]# lsof -p 13091|wc -l
1049
[root@VM_0_9_centos ~]# lsof -p 12791|wc -l
132
Client output:
43
44
45
tcp done
46
47
48
...
98
99
100
start file
1
2
3
4
.......
914
915
916
917
918
java.io.IOException: Too many open files
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:1012)
at com.jds.test.bio.p12.FileClient.run(FileClient.java:30)
at com.jds.test.bio.p12.Client.run(Client.java:32)
at com.jds.test.bio.p12.Client.main(Client.java:11)
100 TCP connections succeeded + 918 files succeeded (1018 fds in total) + the last file failed
4.2 Server 2000, client 100: run the same code 100 times
tcp6 0 0 127.0.0.1:12123 127.0.0.1:50586 ESTABLISHED
tcp6 0 0 127.0.0.1:50602 127.0.0.1:12123 ESTABLISHED
tcp6 0 0 127.0.0.1:12123 127.0.0.1:50710 ESTABLISHED
tcp6 0 0 127.0.0.1:50598 127.0.0.1:12123 ESTABLISHED
tcp6 0 0 127.0.0.1:50630 127.0.0.1:12123 ESTABLISHED
[root@VM_0_9_centos ~]# jps
16112 jar
16149 jar
16319 Jps
[root@VM_0_9_centos ~]# lsof -p 16112|wc -l
126
[root@VM_0_9_centos ~]# lsof -p 16149|wc -l
125
Client output:
43
44
45
tcp done
java.net.SocketException: Too many open files
at java.net.Socket.createImpl(Socket.java:478)
at java.net.Socket.connect(Socket.java:605)
at java.net.Socket.connect(Socket.java:556)
at com.jds.test.bio.p12.TcpClient$1.run(TcpClient.java:59)
at java.lang.Thread.run(Thread.java:748)
46
47
48
49
....
89
90
91
92
93
94
java.net.SocketException: Too many open files
at java.net.Socket.createImpl(Socket.java:478)
at java.net.Socket.connect(Socket.java:605)
at java.net.Socket.connect(Socket.java:556)
at com.jds.test.bio.p12.TcpClient$1.run(TcpClient.java:59)
at java.lang.Thread.run(Thread.java:748)
java.net.SocketException: Too many open files
at java.net.Socket.createImpl(Socket.java:478)
at java.net.Socket.connect(Socket.java:605)
at java.net.Socket.connect(Socket.java:556)
at com.jds.test.bio.p12.TcpClient$1.run(TcpClient.java:59)
at java.lang.Thread.run(Thread.java:748)
java.net.SocketException: Too many open files
at java.net.Socket.createImpl(Socket.java:478)
at java.net.Socket.connect(Socket.java:605)
at java.net.Socket.connect(Socket.java:556)
at com.jds.test.bio.p12.TcpClient$1.run(TcpClient.java:59)
at java.lang.Thread.run(Thread.java:748)
java.net.SocketException: Too many open files
at java.net.Socket.createImpl(Socket.java:478)
at java.net.Socket.connect(Socket.java:605)
at java.net.Socket.connect(Socket.java:556)
at com.jds.test.bio.p12.TcpClient$1.run(TcpClient.java:59)
at java.lang.Thread.run(Thread.java:748)
java.net.SocketException: Too many open files
at java.net.Socket.createImpl(Socket.java:478)
at java.net.Socket.connect(Socket.java:605)
at java.net.Socket.connect(Socket.java:556)
at com.jds.test.bio.p12.TcpClient$1.run(TcpClient.java:59)
at java.lang.Thread.run(Thread.java:748)
start file
java.io.IOException: Too many open files
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:1012)
at com.jds.test.bio.p12.FileClient.run(FileClient.java:30)
at com.jds.test.bio.p12.Client.run(Client.java:32)
at com.jds.test.bio.p12.Client.main(Client.java:11)
94 TCP connections succeeded (94 fds in total) + 6 TCP connections failed + the last file failed
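Conclusions:
0 On this CentOS 7.5 machine, the system-wide limit across all processes is 183945 (the third field of fs.file-nr) and the per-process limit (ulimit -n) is 100001
1 On Linux, at the per-process level (the per-process limit was reached, the system-wide limit was not), running out of file descriptors produced "Too many open files" in all three setups: files only, sockets only, and sockets + files
2 On Linux, at the per-process level, the socket + file setup showed that "Too many open files" caused by sockets also breaks later file access in the same program, which shows that sockets and regular files draw on the same pool of file descriptors
3 Observed that an unreferenced Socket gets closed without an explicit close() call; this is most likely the garbage collector running the socket's finalizer, not something the Closeable interface does by itself
4 Observed that on Linux the value of lsof -p <pid> | wc -l can be larger than ulimit -n
5 Observed that a connection in FIN_WAIT2 does not occupy a file descriptor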
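On the Closeable point: implementing the interface only declares a close() method. Nothing calls it when the variable goes out of scope, unless the resource is managed by try-with-resources or closed explicitly. A small sketch with a made-up Tracked resource that records whether close() ran:

```java
import java.io.Closeable;

class CloseDemo {
    /** A hypothetical resource that only remembers whether close() was called. */
    static final class Tracked implements Closeable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Plain local variable: close() is never called when the scope ends. */
    static boolean plainScope() {
        Tracked t = new Tracked();
        {
            Tracked local = t; // goes out of scope at the closing brace, nothing else happens
        }
        return t.closed;
    }

    /** try-with-resources: the compiler inserts the close() call at the end of the block. */
    static boolean tryWithResources() {
        Tracked t = new Tracked();
        try (Tracked r = t) {
            // use r here
        }
        return t.closed;
    }

    public static void main(String[] args) {
        System.out.println("plain scope closed: " + plainScope());
        System.out.println("try-with-resources closed: " + tryWithResources());
    }
}
```

For code that must not leak descriptors, opening sockets and streams in try-with-resources (or closing them in a finally block) is more dependable than waiting for the garbage collector.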
Attachments: