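Socket programming: shutdown vs close
The TCP/IP four-way handshake
First, I will describe the four-way handshake that tears down a TCP connection. Readers who already know it can skip this section.
Figure: the four-way handshake
This is a classic diagram. A TCP socket passes through many states during its lifetime, and you can inspect them with the ss command. When a connection is torn down, the side that closes first (the client in this article) goes through FIN_WAIT1, FIN_WAIT2 and TIME_WAIT before reaching CLOSED, while the other side (the server) goes through CLOSE_WAIT and LAST_ACK before reaching CLOSED.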
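Besides ss, the states can also be read from inside a program. The sketch below is a minimal illustration that assumes Linux and a usable loopback interface: it reads the kernel's state for a socket through the TCP_INFO socket option. The names TcpPair, make_tcp_pair, tcp_state and states_after_client_fin are hypothetical helpers written for this example, not part of the test program shown later.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: a connected TCP pair on 127.0.0.1 (ephemeral port).
struct TcpPair { int client = -1; int server = -1; };

static TcpPair make_tcp_pair() {
    TcpPair p;
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return p;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel choose a free port
    socklen_t len = sizeof(addr);
    if (bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
        listen(listener, 1) == 0 &&
        getsockname(listener, reinterpret_cast<sockaddr*>(&addr), &len) == 0) {
        p.client = socket(AF_INET, SOCK_STREAM, 0);
        if (p.client >= 0 &&
            connect(p.client, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
            p.server = accept(listener, nullptr, nullptr);
        }
    }
    close(listener);
    return p;
}

// Returns the kernel's TCP state for fd (e.g. TCP_ESTABLISHED), or -1 on error.
static int tcp_state(int fd) {
    tcp_info info{};
    socklen_t len = sizeof(info);
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) != 0) return -1;
    return info.tcpi_state;
}

struct States { int client; int server; };

// The client sends a FIN; both states are sampled once the server has seen it.
static States states_after_client_fin() {
    TcpPair p = make_tcp_pair();
    shutdown(p.client, SHUT_WR);       // client sends FIN
    pollfd pfd{p.server, POLLIN, 0};
    poll(&pfd, 1, 1000);               // wait until the FIN has arrived
    States s{tcp_state(p.client), tcp_state(p.server)};
    close(p.client);
    close(p.server);
    return s;
}
```

At the sampling point the server should report TCP_CLOSE_WAIT. The client reports TCP_FIN_WAIT1 until the server's ACK arrives and TCP_FIN_WAIT2 afterwards, so either value can show up depending on timing.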
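shutdown vs close
In Linux C++ network programming there are two commonly used functions for closing a socket: close and shutdown. In this post I look at how the two differ for TCP/IP.
Functionality
Linux identifies files, sockets and devices alike by a file descriptor and treats them all as I/O objects. As far as close is concerned, a socket fd is no different from any other fd. The close(2) man page describes it as follows: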
close() closes a file descriptor, so that it no longer refers to any
file and may be reused. Any record locks (see fcntl(2)) held on the
file it was associated with, and owned by the process, are removed
(regardless of the file descriptor that was used to obtain the lock).
If fd is the last file descriptor referring to the underlying open
file description (see open(2)), the resources associated with the
open file description are freed; if the file descriptor was the last
reference to a file which has been removed using unlink(2), the file
is deleted.
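There are two things to note. First, calling close in a process decrements the kernel's reference count on the underlying open file description, and the resources are only released when the last descriptor referring to it is closed. Second, once a process has called close, it can no longer use that fd.
For TCP/IP sockets this means that if a socket is shared by n processes, a close(socket fd) in just one of them does not trigger the four-way handshake; the FIN is only sent when the last reference is closed. After calling close, however, that process can no longer use the socket fd to communicate with its peer.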
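The reference-counting behaviour can be sketched in a single process. The example below is a minimal illustration assuming Linux and a usable loopback interface; it uses dup() instead of fork() to create the second reference, because both leave two descriptors pointing at the same open file description. TcpPair, make_tcp_pair, peer_sent_fin and close_with_shared_descriptor are hypothetical names introduced for this sketch.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: a connected TCP pair on 127.0.0.1 (ephemeral port).
struct TcpPair { int client = -1; int server = -1; };

static TcpPair make_tcp_pair() {
    TcpPair p;
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return p;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel choose a free port
    socklen_t len = sizeof(addr);
    if (bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
        listen(listener, 1) == 0 &&
        getsockname(listener, reinterpret_cast<sockaddr*>(&addr), &len) == 0) {
        p.client = socket(AF_INET, SOCK_STREAM, 0);
        if (p.client >= 0 &&
            connect(p.client, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
            p.server = accept(listener, nullptr, nullptr);
        }
    }
    close(listener);
    return p;
}

// True if fd becomes readable within timeout_ms and the read reports EOF,
// which is how the arrival of the peer's FIN shows up to the application.
static bool peer_sent_fin(int fd, int timeout_ms) {
    pollfd pfd{fd, POLLIN, 0};
    if (poll(&pfd, 1, timeout_ms) <= 0) return false;
    char byte;
    return recv(fd, &byte, 1, MSG_PEEK) == 0;
}

struct CloseResult { bool fin_after_first_close; bool fin_after_last_close; };

static CloseResult close_with_shared_descriptor() {
    TcpPair p = make_tcp_pair();
    int copy = dup(p.client);  // second reference to the same open file description
    close(p.client);           // one reference is still open
    bool first = peer_sent_fin(p.server, 200);
    close(copy);               // last reference gone
    bool last = peer_sent_fin(p.server, 1000);
    close(p.server);
    return {first, last};
}
```

With these assumptions the first flag should stay false and the second should become true, which matches the description above: only closing the last reference starts the teardown.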
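TCP is a full-duplex, connection-oriented protocol, so a TCP socket can send and receive at the same time. That makes the following scenario possible: a process no longer needs to send data but still needs to receive it, or the other way round. The shutdown() function provides exactly this capability. Its man page describes it as follows: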
The shutdown() call causes all or part of a full-duplex connection on
the socket associated with sockfd to be shut down. If how is
SHUT_RD, further receptions will be disallowed. If how is SHUT_WR,
further transmissions will be disallowed. If how is SHUT_RDWR,
further receptions and transmissions will be disallowed.
The how argument selects what gets shut down: SHUT_RD closes the read direction, SHUT_WR closes the write direction, and SHUT_RDWR closes both.
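A typical use of SHUT_WR is a half-close: one side announces that it has finished sending but keeps reading the reply. The following is a minimal single-process sketch, assuming Linux and a usable loopback interface; TcpPair, make_tcp_pair and half_close_round_trip are hypothetical names, and the "got " reply prefix is made up for the example.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: a connected TCP pair on 127.0.0.1 (ephemeral port).
struct TcpPair { int client = -1; int server = -1; };

static TcpPair make_tcp_pair() {
    TcpPair p;
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return p;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel choose a free port
    socklen_t len = sizeof(addr);
    if (bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
        listen(listener, 1) == 0 &&
        getsockname(listener, reinterpret_cast<sockaddr*>(&addr), &len) == 0) {
        p.client = socket(AF_INET, SOCK_STREAM, 0);
        if (p.client >= 0 &&
            connect(p.client, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
            p.server = accept(listener, nullptr, nullptr);
        }
    }
    close(listener);
    return p;
}

// The client sends a request, half-closes, and still reads the reply.
static std::string half_close_round_trip() {
    TcpPair p = make_tcp_pair();
    const char request[] = "hello";
    send(p.client, request, sizeof(request) - 1, 0);
    shutdown(p.client, SHUT_WR);  // done sending; the FIN goes out after the data

    // Server side: read until EOF, then answer over the direction that is still open.
    std::string received;
    char buf[64];
    ssize_t n;
    while ((n = recv(p.server, buf, sizeof(buf), 0)) > 0)
        received.append(buf, static_cast<std::size_t>(n));
    std::string reply = "got " + received;
    send(p.server, reply.data(), reply.size(), 0);
    close(p.server);

    // Client side: the read direction keeps working after SHUT_WR.
    std::string answer;
    while ((n = recv(p.client, buf, sizeof(buf), 0)) > 0)
        answer.append(buf, static_cast<std::size_t>(n));
    close(p.client);
    return answer;
}
```

The server's read loop ends because the client's FIN is delivered as end-of-file, while the client can still receive the answer. A plain close() on the client would not allow this, since the descriptor would be gone.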
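In practice
The descriptions make the difference between the two functions clear, but what do they actually trigger at the TCP level? I ran a simple experiment:
I used tcpdump to capture the packets produced by close and by each of the three shutdown modes, and then analyzed what happens on the wire.
The main function of the test program is shown below (feel free to skip the code and go straight to the results):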
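#include <cstdio>        // printf, scanf
#include <cstdlib>       // atoi
#include <iostream>      // std::cin
#include <sys/socket.h>  // shutdown, SHUT_RD, SHUT_WR, SHUT_RDWR
#include <unistd.h>      // close
// CTcp, CBase, recv_cb_ch and close_cb are defined elsewhere in the test program (not shown).

int main(int argc, char const *argv[])
{
    if (argc < 2) {
        printf("must select: 1:server or 2: client.\n");
        return -1;
    }
    int type = atoi(argv[1]);
    CTcp* s = new CTcp();
    if (type == 1) {
        // Server: register the read and close callbacks, then listen on 127.0.0.1:9906.
        s->registryCallBackMethod((void*)recv_cb_ch, NULL, CBase::READ);
        s->registryCallBackMethod((void*)close_cb, NULL, CBase::CLOSE);
        s->bindAddress("127.0.0.1", 9906);
        s->startServer();
        std::cin.get();  // keep the server running until Enter is pressed
    } else {
        // Client: connect, send "hello" (6 bytes including the trailing NUL),
        // then ask which way of closing to test.
        s->connect("127.0.0.1", 9906, 5);
        s->sendMessage("hello", sizeof("hello"));
        printf("must input the test type:\n 1: close 2: shutdown: \n");
        scanf("%d", &type);
        int client = s->socketClient();  // raw socket fd of the connection
        switch (type) {
        case 1: {
            close(client);
            std::cin.get();
        }
        break;
        case 2: {
            printf("please input shutdown type:\n 1: read, 2: write, 3: all\n");
            scanf("%d", &type);
            if (type == 1) {
                shutdown(client, SHUT_RD);
            } else if (type == 2) {
                shutdown(client, SHUT_WR);
            } else {
                shutdown(client, SHUT_RDWR);
            }
            // The first get() consumes the newline left behind by scanf;
            // the second one waits for Enter so the capture can be inspected.
            std::cin.get();
            std::cin.get();
        }
        break;
        default:
            printf("the type is not support %d\n", type);
        }
    }
    delete s;
    s = NULL;
    return 0;
}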
I ran the experiment on a CentOS cloud host, and the socket functions called are the standard library ones. The steps and results follow:
Step 1. Start the server
$ ./shutdown 1
the max is 3
the server time is out
Step 2. Start tcpdump
$ sudo tcpdump -i lo -vv
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
Because the server is bound to 127.0.0.1, tcpdump is pointed at lo (the loopback interface).
Step 3. Start the client
$ ./shutdown 2
connect .....
connect is success!
must input the test type:
1: close 2: shutdown:
After connecting, the client automatically sends a 'hello' message to the server. At this point tcpdump has captured the following packets:
09:17:41.773070 IP (tos 0x0, ttl 64, id 17657, offset 0, flags [DF], proto TCP (6), length 60)
localhost.41894 > localhost.9906: Flags [S], cksum 0xfe30 (incorrect -> 0x3df3), seq 967462950, win 43690, options [mss 65495,sackOK,TS val 2188873883 ecr 0,nop,wscale 7], length 0
09:17:41.773098 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
localhost.9906 > localhost.41894: Flags [S.], cksum 0xfe30 (incorrect -> 0xc0f0), seq 989081322, ack 967462951, win 43690, options [mss 65495,sackOK,TS val 2188873883 ecr 2188873883,nop,wscale 7], length 0
09:17:41.773124 IP (tos 0x0, ttl 64, id 17658, offset 0, flags [DF], proto TCP (6), length 52)
localhost.41894 > localhost.9906: Flags [.], cksum 0xfe28 (incorrect -> 0x9335), seq 1, ack 1, win 342, options [nop,nop,TS val 2188873883 ecr 2188873883], length 0
09:17:41.773168 IP (tos 0x0, ttl 64, id 17659, offset 0, flags [DF], proto TCP (6), length 58)
localhost.41894 > localhost.9906: Flags [P.], cksum 0xfe2e (incorrect -> 0x4f55), seq 1:7, ack 1, win 342, options [nop,nop,TS val 2188873883 ecr 2188873883], length 6
09:17:41.773177 IP (tos 0x0, ttl 64, id 14859, offset 0, flags [DF], proto TCP (6), length 52)
localhost.9906 > localhost.41894: Flags [.], cksum 0xfe28 (incorrect -> 0x932f), seq 1, ack 7, win 342, options [nop,nop,TS val 2188873883 ecr 2188873883], length 0
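Here port 9906 is the server port and port 41894 is the client port. The first three packets complete the three-way handshake, in which the two sides exchange their initial sequence numbers (seq), window sizes (win), maximum segment size (mss) and so on. Packets 4 and 5 carry the 'hello' message and its acknowledgment. tcpdump prints relative sequence numbers that start at 1, and the payload is sizeof("hello") = 6 bytes (the trailing NUL is included), so the client's next sequence number is 1 + 6 = 7.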
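The sequence-number arithmetic can be written down as a tiny sketch. next_seq is a hypothetical helper introduced only for illustration; it works on the relative sequence numbers that tcpdump prints.

```cpp
#include <cstddef>
#include <cstdint>

// The test program sends sizeof("hello") bytes: five letters plus the NUL terminator.
static_assert(sizeof("hello") == 6, "a string literal's size includes the trailing NUL");

// Relative sequence number that follows payload_len bytes starting at first_seq.
constexpr std::uint32_t next_seq(std::uint32_t first_seq, std::size_t payload_len) {
    return first_seq + static_cast<std::uint32_t>(payload_len);
}
```

For the capture above, next_seq(1, sizeof("hello")) gives 7, the value shown in "seq 1:7" and in the server's "ack 7". A FIN also consumes one sequence number, which is why a FIN sent with seq 7 is acknowledged with ack 8 in the captures of the close tests.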
Step 4. Run the tests
- close test:
$ ./shutdown 2
connect .....
connect is success!
must input the test type:
1: close 2: shutdown:
1
tcpdump captured:
09:32:59.182336 IP (tos 0x0, ttl 64, id 1672, offset 0, flags [DF], proto TCP (6), length 52)
localhost.41896 > localhost.9906: Flags [F.], cksum 0xfe28 (incorrect -> 0x5aca), seq 7, ack 1, win 342, options [nop,nop,TS val 2189791308 ecr 2189785847], length 0
09:32:59.223130 IP (tos 0x0, ttl 64, id 48256, offset 0, flags [DF], proto TCP (6), length 52)
localhost.9906 > localhost.41896: Flags [.], cksum 0xfe28 (incorrect -> 0x454c), seq 1, ack 8, win 342, options [nop,nop,TS val 2189791349 ecr 2189791308], length 0
09:33:02.183021 IP (tos 0x0, ttl 64, id 48257, offset 0, flags [DF], proto TCP (6), length 52)
localhost.9906 > localhost.41896: Flags [F.], cksum 0xfe28 (incorrect -> 0x39bc), seq 1, ack 8, win 342, options [nop,nop,TS val 2189794308 ecr 2189791308], length 0
09:33:02.183053 IP (tos 0x0, ttl 64, id 24386, offset 0, flags [DF], proto TCP (6), length 52)
localhost.41896 > localhost.9906: Flags [.], cksum 0x2e03 (correct), seq 8, ack 2, win 342, options [nop,nop,TS val 2189794309 ecr 2189794308], length 0
So close triggered the TCP four-way handshake: each side sent a FIN, each FIN was acknowledged, and the connection ended.
- shutdown + SHUT_RD test:
$ ./shutdown 2
connect .....
connect is success!
must input the test type:
1: close 2: shutdown:
2
please input shutdown type:
1: read, 2: write, 3: all
1
The tcpdump capture shows no new packets at this point, which means that in this case the client sent nothing at all to the server.
- shutdown + SHUT_WR test:
$ ./shutdown 2
connect .....
connect is success!
must input the test type:
1: close 2: shutdown:
2
please input shutdown type:
1: read, 2: write, 3: all
2
tcpdump captured:
localhost.41900 > localhost.9906: Flags [F.], cksum 0xfe28 (incorrect -> 0x173a), seq 7, ack 1, win 342, options [nop,nop,TS val 2190212694 ecr 2190205129], length 0
09:40:00.602136 IP (tos 0x0, ttl 64, id 5571, offset 0, flags [DF], proto TCP (6), length 52)
localhost.9906 > localhost.41900: Flags [.], cksum 0xfe28 (incorrect -> 0xf983), seq 1, ack 8, win 342, options [nop,nop,TS val 2190212735 ecr 2190212694], length 0
09:40:03.561641 IP (tos 0x0, ttl 64, id 5572, offset 0, flags [DF], proto TCP (6), length 52)
localhost.9906 > localhost.41900: Flags [F.], cksum 0xfe28 (incorrect -> 0xedf3), seq 1, ack 8, win 342, options [nop,nop,TS val 2190215694 ecr 2190212694], length 0
09:40:03.561661 IP (tos 0x0, ttl 64, id 30067, offset 0, flags [DF], proto TCP (6), length 52)
localhost.41900 > localhost.9906: Flags [.], cksum 0xfe28 (incorrect -> 0xe23b), seq 8, ack 2, win 342, options [nop,nop,TS val 2190215694 ecr 2190215694], length 0
As the capture shows, this also triggered the TCP four-way handshake.
- shutdown + SHUT_RDWR test:
(output omitted)
This also triggers the four-way handshake.
Summary
A brief summary of the tests above:
operation | sends FIN |
---|---|
close | yes |
shutdown SHUT_RD | no |
shutdown SHUT_WR | yes |
shutdown SHUT_RDWR | yes |
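The table can also be checked without tcpdump by watching for end-of-file on the peer, since that is how an arriving FIN appears to the application. The following is a minimal sketch, assuming Linux and a usable loopback interface; Op, TcpPair, make_tcp_pair, peer_sent_fin and sends_fin are hypothetical names introduced for this example, and the 200 ms wait is an arbitrary choice.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: a connected TCP pair on 127.0.0.1 (ephemeral port).
struct TcpPair { int client = -1; int server = -1; };

static TcpPair make_tcp_pair() {
    TcpPair p;
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return p;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel choose a free port
    socklen_t len = sizeof(addr);
    if (bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
        listen(listener, 1) == 0 &&
        getsockname(listener, reinterpret_cast<sockaddr*>(&addr), &len) == 0) {
        p.client = socket(AF_INET, SOCK_STREAM, 0);
        if (p.client >= 0 &&
            connect(p.client, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
            p.server = accept(listener, nullptr, nullptr);
        }
    }
    close(listener);
    return p;
}

// True if fd becomes readable within timeout_ms and the read reports EOF (FIN arrived).
static bool peer_sent_fin(int fd, int timeout_ms) {
    pollfd pfd{fd, POLLIN, 0};
    if (poll(&pfd, 1, timeout_ms) <= 0) return false;
    char byte;
    return recv(fd, &byte, 1, MSG_PEEK) == 0;
}

enum class Op { Close, ShutRd, ShutWr, ShutRdWr };

// Applies op to the client end and reports whether the server end sees a FIN.
static bool sends_fin(Op op) {
    TcpPair p = make_tcp_pair();
    switch (op) {
        case Op::Close:    close(p.client); p.client = -1; break;
        case Op::ShutRd:   shutdown(p.client, SHUT_RD);    break;
        case Op::ShutWr:   shutdown(p.client, SHUT_WR);    break;
        case Op::ShutRdWr: shutdown(p.client, SHUT_RDWR);  break;
    }
    bool fin = peer_sent_fin(p.server, 200);
    if (p.client >= 0) close(p.client);
    close(p.server);
    return fin;
}
```

Under these assumptions, sends_fin should return true for Op::Close, Op::ShutWr and Op::ShutRdWr, and false for Op::ShutRd, in line with the table.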
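One more point worth mentioning: if the client calls close() and the server never calls close() on its side, the four-way handshake cannot complete. The client's connection then stays in FIN_WAIT2 and the server's in CLOSE_WAIT; TIME_WAIT is only reached once the server's FIN arrives. Because the client's descriptor is already closed, the kernel keeps that orphaned socket only for a limited time before reclaiming its resources (on Linux this is governed by net.ipv4.tcp_fin_timeout). If the server keeps sending after the client has closed, the client's kernel answers with a RST, and a further write on the server raises SIGPIPE. If the server process does not handle or ignore this signal, its default action terminates the process.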
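One way for a server to survive this situation is to ask send() to report the error instead of raising the signal. The sketch below is a minimal illustration, assuming Linux (for MSG_NOSIGNAL) and a usable loopback interface; TcpPair, make_tcp_pair and errno_when_writing_to_closed_peer are hypothetical names, and the retry count and sleep interval are arbitrary.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: a connected TCP pair on 127.0.0.1 (ephemeral port).
struct TcpPair { int client = -1; int server = -1; };

static TcpPair make_tcp_pair() {
    TcpPair p;
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return p;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel choose a free port
    socklen_t len = sizeof(addr);
    if (bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
        listen(listener, 1) == 0 &&
        getsockname(listener, reinterpret_cast<sockaddr*>(&addr), &len) == 0) {
        p.client = socket(AF_INET, SOCK_STREAM, 0);
        if (p.client >= 0 &&
            connect(p.client, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
            p.server = accept(listener, nullptr, nullptr);
        }
    }
    close(listener);
    return p;
}

// Keeps writing to a connection whose peer has closed until send() fails.
// Returns the errno of the failing send(), or 0 if no send failed.
static int errno_when_writing_to_closed_peer() {
    TcpPair p = make_tcp_pair();
    close(p.client);  // the peer closes completely
    int err = 0;
    for (int attempt = 0; attempt < 50 && err == 0; ++attempt) {
        // MSG_NOSIGNAL: report a broken connection through errno instead of SIGPIPE.
        if (send(p.server, "x", 1, MSG_NOSIGNAL) < 0) {
            err = errno;
        } else {
            usleep(10 * 1000);  // give the peer's RST time to arrive
        }
    }
    close(p.server);
    return err;
}
```

The first write usually succeeds because the RST has not arrived yet; a later one fails with EPIPE or ECONNRESET, and the process keeps running. Ignoring the signal for the whole process with signal(SIGPIPE, SIG_IGN) is a common alternative.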