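1. Overview
- TCP connections set up over a SIM-card dial-up link fail easily, and IoT SIM cards are the worst offenders: the modem cannot attach to a base station, the signal drops, the carrier throttles the link, or a private-network card only works with a specific APN. It is exhausting.
- Last week a customer running our company's bonding router with a single IoT card reported that the device lost its connection to the server inside an elevator. The log showed that, while establishing the TCP connection, the connect function kept failing with "Operation now in progress". I searched online for possible causes; there are plenty of articles, but most of them are filler, so I decided to write up my own findings.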
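2. A simple TCP connection
- The code below establishes a very simple TCP connection.
- Note: before calling connect, the code sets the send timeout SO_SNDTIMEO, which shortens the time connect may block from the default to 5 s. The default is often quoted as 75 s; on Linux it is actually determined by net.ipv4.tcp_syn_retries.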
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

int tcp_connect(char *server_ip, int server_port, char *local_ip)
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if(sockfd < 0)
    {
        printf("ERROR:socket\n");
        return -1;
    }
    // bind to the local address; port 0 lets the kernel pick a free port
    struct sockaddr_in local_addr;
    memset(&local_addr, 0, sizeof(local_addr));
    local_addr.sin_family = AF_INET;
    local_addr.sin_port = htons(0);
    local_addr.sin_addr.s_addr = inet_addr(local_ip);
    int ret = bind(sockfd, (struct sockaddr *)&local_addr, sizeof(local_addr));
    if(ret < 0)
    {
        printf("ERROR:bind\n");
        close(sockfd); // do not leak the descriptor on failure
        return -2;
    }
    // server
    struct sockaddr_in serv_addr;
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(server_port);
    serv_addr.sin_addr.s_addr = inet_addr(server_ip);
    struct timeval tm;
    tm.tv_sec = 5;
    tm.tv_usec = 0;
    // send timeout: on Linux this also makes connect block for at most 5 s
    setsockopt(sockfd, SOL_SOCKET, SO_SNDTIMEO, (void *)&tm, sizeof(tm));
    // recv timeout
    setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, (void *)&tm, sizeof(tm));
    // disable NAGLE
    int yes = 1;
    setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, (void *)&yes, sizeof(yes));
    ret = connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));
    if(ret < 0)
    {
        printf("ERROR:connect errno:%d errMsg:%s\n", errno, strerror(errno));
        close(sockfd); // do not leak the descriptor on failure
        return -3;
    }
    return sockfd;
}
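3. EINPROGRESS (errno 115: Operation now in progress)
3.1 man connect
- First, here is how the man page describes it: the error is returned when the socket is nonblocking and the peer does not answer immediately. Every case I ran into used a blocking socket, so this description alone does not explain it. The missing piece is in socket(7): when SO_SNDTIMEO is set and the timeout expires, connect fails with EINPROGRESS even on a blocking socket, just as if the socket were nonblocking.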
EINPROGRESS
The socket is nonblocking and the connection cannot be completed immediately. It is possible to select(2) or poll(2) for completion by
selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level
SOL_SOCKET to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error
codes listed here, explaining the reason for the failure).
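The procedure in the man page excerpt above can be sketched as follows. This is a minimal sketch, not the code used for the tests in this article: connect_with_timeout and close_keep_errno are hypothetical helper names, and the timeout is enforced with poll(2) instead of SO_SNDTIMEO. With this approach a timeout is reported as ETIMEDOUT, so EINPROGRESS never reaches the caller.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd without clobbering the errno that describes the failure. */
static int close_keep_errno(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper: connect to ip:port, waiting at most timeout_ms.
 * Returns a connected (still nonblocking) fd, or -1 with errno set. */
int connect_with_timeout(const char *ip, int port, int timeout_ms)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;                 /* not a dotted-quad IPv4 address */
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return close_keep_errno(fd);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return fd;                      /* connected immediately */
    if (errno != EINPROGRESS)
        return close_keep_errno(fd);    /* e.g. ENETUNREACH */

    /* Handshake is in flight: wait until the socket becomes writable.
     * Note: a retry after EINTR restarts the full timeout. */
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    int n;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);
    if (n == 0)
        errno = ETIMEDOUT;              /* nothing happened in time */
    if (n <= 0)
        return close_keep_errno(fd);

    /* Writable does not mean connected: SO_ERROR holds the real result. */
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return close_keep_errno(fd);
    if (err != 0) {
        errno = err;
        return close_keep_errno(fd);
    }
    return fd;
}
```

A call such as connect_with_timeout("122.112.238.19", 23232, 5000) would wait at most 5 s. The returned descriptor is still nonblocking; clear O_NONBLOCK with fcntl if the rest of the program expects blocking I/O.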
3.2 Connecting to a nonexistent host
- No host on my LAN has the IP 192.168.16.3, so I connect to 192.168.16.3 as a test
avit@ubuntu:connect$ ./a.out -c
client MODE Usage: ./a.out -c [serverIP] [serverPort] [localIP]
avit@ubuntu:connect$ ./a.out -c 192.168.16.3 8888 0.0.0.0 # the process exits after 5 s
ERROR:connect errno:115 errMsg:Operation now in progress
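- Because the send timeout is set to 5 s, a failing connect makes the process exit after 5 s
- On exit the process prints the error message "Operation now in progress"; error code 115 is exactly EINPROGRESS
- The tcpdump capture looks like this: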
root@ubuntu:/home/avit# tcpdump -ni any port 8888
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
10:16:19.754676 IP 192.168.16.135.34549 > 192.168.16.3.8888: Flags [S], seq 2127802455, win 64240, options [mss 1460,sackOK,TS val 3462831271 ecr 0,nop,wscale 7], length 0
10:16:20.777469 IP 192.168.16.135.34549 > 192.168.16.3.8888: Flags [S], seq 2127802455, win 64240, options [mss 1460,sackOK,TS val 3462832293 ecr 0,nop,wscale 7], length 0
10:16:22.793522 IP 192.168.16.135.34549 > 192.168.16.3.8888: Flags [S], seq 2127802455, win 64240, options [mss 1460,sackOK,TS val 3462834293 ecr 0,nop,wscale 7], length 0
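- Starting at 10:16:19, connect sent three SYN packets in total; in other words, it made three attempts to establish the TCP connection before the 5 s timeout expired
- For the details of connection establishment, see chapter 18 of "TCP/IP Illustrated, Volume 1: The Protocols"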
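3.3 connect also fails with EINPROGRESS when network latency is high
- Use tc to simulate a poor network by setting the link delay to 5000 ms
tc qdisc add dev eth1 root netem delay 5000ms
- Test a TCP connection over that link, i.e. establish the connection through eth1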
root@avit:/tmp# ifconfig eth1
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.16.15 netmask 255.255.255.0 broadcast 192.168.16.255
inet6 fe80::2eb:59ff:fe4b:6eda prefixlen 64 scopeid 0x20<link>
ether 00:eb:59:4b:6e:da txqueuelen 1000 (Ethernet)
RX packets 42842 bytes 2945872 (2.8 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 23157 bytes 1530968 (1.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
# 122.112.238.19 is a TCP server on the public Internet, port 23232; the IP of eth1 is 192.168.16.15
root@avit:/tmp# ./a.out -c 122.112.238.19 23232 192.168.16.15
ERROR:connect errno:115 errMsg:Operation now in progress
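- As shown above, excessive network latency also makes connect fail with EINPROGRESS: with every outgoing packet delayed by 5 s, the handshake cannot complete within the 5 s send timeout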
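3.4 EINPROGRESS can also occur under heavy packet loss
- Use tc to set the link's packet loss rate to 80%, simulating a very lossy network (remember to remove the delay rule from the previous step first)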
tc qdisc add dev eth1 root netem loss 80%
root@avit:/tmp# ./a.out -c 122.112.238.19 23232 192.168.16.15
local ip:192.168.16.15 connect 122.112.238.19:23232 succeed
root@avit:/tmp# ./a.out -c 122.112.238.19 23232 192.168.16.15
local ip:192.168.16.15 connect 122.112.238.19:23232 succeed
root@avit:/tmp# ./a.out -c 122.112.238.19 23232 192.168.16.15
ERROR:connect errno:115 errMsg:Operation now in progress
- As shown above, a high packet loss rate can also produce the EINPROGRESS (Operation now in progress) error, but not on every attempt
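A rough estimate helps explain why only some attempts fail. The capture in 3.2 shows three SYN packets within the 5 s window. Assuming that netem drops only outgoing packets on eth1 and that each SYN is lost independently (both are assumptions, not something measured here), connect can only time out when all three SYNs are lost. The helper name all_syns_lost is made up for this sketch.

```c
/* Probability that every one of syn_count SYN packets is dropped,
 * assuming each one is lost independently with probability loss. */
double all_syns_lost(double loss, int syn_count)
{
    double p = 1.0;
    for (int i = 0; i < syn_count; i++)
        p *= loss;   /* one more SYN that also has to be lost */
    return p;
}
```

For the settings used here, all_syns_lost(0.8, 3) evaluates to 0.8 * 0.8 * 0.8, about 0.51, so under these assumptions roughly half of the attempts would be expected to fail. Three runs are far too few to confirm or refute that figure.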
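4. ETIMEDOUT (errno 110: Connection timed out)
- Without a send timeout, connect blocks until the kernel gives up retransmitting the SYN. This limit is often quoted as 75 s; on Linux it is set by net.ipv4.tcp_syn_retries. Connecting to a nonexistent host then fails with ETIMEDOUT
- Comment out the code above that sets the send timeout, recompile, and try to connect to a nonexistent host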
avit@ubuntu:connect$ ./a.out -c 192.168.16.3 8888 0.0.0.0 # the process exits once connect times out
ERROR:connect errno:110 errMsg:Connection timed out
- As shown above, the process exits once connect gives up, and connect reports the error "Connection timed out"
- The tcpdump capture looks like this:
root@ubuntu:/home/avit# tcpdump -ni any port 8888
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
11:18:49.314001 IP 192.168.16.135.50465 > 192.168.16.3.8888: Flags [S], seq 140672580, win 64240, options [mss 1460,sackOK,TS val 3552746896 ecr 0,nop,wscale 7], length 0
11:18:50.345449 IP 192.168.16.135.50465 > 192.168.16.3.8888: Flags [S], seq 140672580, win 64240, options [mss 1460,sackOK,TS val 3552747926 ecr 0,nop,wscale 7], length 0
11:18:52.361119 IP 192.168.16.135.50465 > 192.168.16.3.8888: Flags [S], seq 140672580, win 64240, options [mss 1460,sackOK,TS val 3552749940 ecr 0,nop,wscale 7], length 0
11:18:56.424827 IP 192.168.16.135.50465 > 192.168.16.3.8888: Flags [S], seq 140672580, win 64240, options [mss 1460,sackOK,TS val 3552754000 ecr 0,nop,wscale 7], length 0
11:19:04.617917 IP 192.168.16.135.50465 > 192.168.16.3.8888: Flags [S], seq 140672580, win 64240, options [mss 1460,sackOK,TS val 3552762187 ecr 0,nop,wscale 7], length 0
11:19:20.745391 IP 192.168.16.135.50465 > 192.168.16.3.8888: Flags [S], seq 140672580, win 64240, options [mss 1460,sackOK,TS val 3552778247 ecr 0,nop,wscale 7], length 0
11:19:54.282001 IP 192.168.16.135.50465 > 192.168.16.3.8888: Flags [S], seq 140672580, win 64240, options [mss 1460,sackOK,TS val 3552811695 ecr 0,nop,wscale 7], length 0
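- Seven SYN packets were sent before the timeout (the initial SYN plus six retransmissions, with the interval doubling from about 1 s to about 32 s), i.e. seven attempts to establish the TCP connection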
5. ECONNREFUSED (errno 111: Connection refused)
5.1 Connecting to a port nobody is listening on
- connect targets a host that exists, but no TCP server on that host is listening on port 8888
avit@ubuntu:connect$ ./a.out -c 192.168.16.7 8888 0.0.0.0
ERROR:connect errno:111 errMsg:Connection refused
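5.2 Connecting to a TCP server whose firewall is set to REJECT
- If the TCP server has an iptables rule that blocks the connection, connect also reports "Connection refused", provided the rule's action is REJECT rather than DROP
- On the TCP server host 192.168.16.135, add a firewall rule so that TCP connections to destination port 8888 are rejected
iptables -t filter -I INPUT -p tcp --dport 8888 -j REJECT
- On the TCP client host 192.168.16.8, run the test program and try to connect to port 8888 on 192.168.16.135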
root@avit:/tmp# ./a.out -c 192.168.16.135 8888 0.0.0.0
ERROR:connect errno:111 errMsg:Connection refused
- As shown above, connect returns the "Connection refused" error immediately
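5.3 Connecting to a TCP server whose firewall is set to DROP
- On the TCP server host 192.168.16.135, add a firewall rule so that TCP connections to destination port 8888 are dropped
iptables -t filter -I INPUT -p tcp --dport 8888 -j DROP
- On the TCP client host 192.168.16.8, run the test program and try to connect to port 8888 on 192.168.16.135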
root@avit:/tmp# ./a.out -c 192.168.16.135 8888 0.0.0.0
ERROR:connect errno:110 errMsg:Connection timed out
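- As shown above, the failed connect returns "Connection timed out". Oddly, though, the measured time before connect gives up is far more than 75 s: timing it with the time command showed about 130 s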
root@avit:/tmp# time ./a.out -c 192.168.16.135 8888 0.0.0.0
ERROR:connect errno:110 errMsg:Connection timed out
real 2m10.555s
user 0m0.004s
sys 0m0.001s
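- The tcpdump capture is shown below. The last SYN goes out about 65 s after the first, yet connect does not return until about 130 s. The likely explanation is exponential backoff: after the sixth retransmission the kernel still waits one more retransmission interval (about 64 s) for a reply before reporting ETIMEDOUT, so the total is roughly 1+2+4+8+16+32+64 = 127 s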
root@avit:/# tcpdump -ni any port 8888
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
11:49:14.542022 IP 192.168.16.8.52195 > 192.168.16.135.8888: Flags [S], seq 1005796129, win 65535, options [mss 1460,sackOK,TS val 1397600320 ecr 0,nop,wscale 7], length 0
11:49:15.565112 IP 192.168.16.8.52195 > 192.168.16.135.8888: Flags [S], seq 1005796129, win 65535, options [mss 1460,sackOK,TS val 1397601344 ecr 0,nop,wscale 7], length 0
11:49:17.581088 IP 192.168.16.8.52195 > 192.168.16.135.8888: Flags [S], seq 1005796129, win 65535, options [mss 1460,sackOK,TS val 1397603360 ecr 0,nop,wscale 7], length 0
11:49:21.713075 IP 192.168.16.8.52195 > 192.168.16.135.8888: Flags [S], seq 1005796129, win 65535, options [mss 1460,sackOK,TS val 1397607492 ecr 0,nop,wscale 7], length 0
11:49:29.905086 IP 192.168.16.8.52195 > 192.168.16.135.8888: Flags [S], seq 1005796129, win 65535, options [mss 1460,sackOK,TS val 1397615684 ecr 0,nop,wscale 7], length 0
11:49:46.029123 IP 192.168.16.8.52195 > 192.168.16.135.8888: Flags [S], seq 1005796129, win 65535, options [mss 1460,sackOK,TS val 1397631808 ecr 0,nop,wscale 7], length 0
11:50:19.053024 IP 192.168.16.8.52195 > 192.168.16.135.8888: Flags [S], seq 1005796129, win 65535, options [mss 1460,sackOK,TS val 1397664832 ecr 0,nop,wscale 7], length 0
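The timing in the capture above can be reproduced with a small calculation. This is only a sketch of the backoff arithmetic, assuming an initial retransmission timeout of 1 s that doubles on every retry and six SYN retries (the usual Linux default for net.ipv4.tcp_syn_retries; check the value on the machine under test). The function name syn_backoff_schedule is hypothetical.

```c
#include <stddef.h>

/* Fills send_times[0..retries] with the offsets (seconds after the first
 * SYN) at which each SYN would be sent, and returns the offset at which
 * connect() would give up. send_times may be NULL if only the total is
 * needed; otherwise it must have room for retries + 1 entries. */
int syn_backoff_schedule(int initial_rto, int retries, int *send_times)
{
    int t = 0;
    int rto = initial_rto;
    for (int i = 0; i <= retries; i++) {
        if (send_times != NULL)
            send_times[i] = t;   /* SYN number i goes out at offset t */
        t += rto;                /* wait one RTO for a SYN-ACK */
        rto *= 2;                /* exponential backoff */
    }
    return t;                    /* the last RTO expired without a reply */
}
```

With initial_rto = 1 and retries = 6 the SYNs go out at 0, 1, 3, 7, 15, 31 and 63 s and the function returns 127. That is close to the capture, where the last SYN appears about 65 s after the first, and to the 130 s reported by time; real kernel timers run slightly longer than the idealized values.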
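6. ENETUNREACH (errno 101: Network is unreachable)
- This is the most common error: a network failure, a wrong route configuration, or a missing default gateway all produce it
- For example, unplug the network cable and bind the local IP 0.0.0.0; connect then reports "Network is unreachable"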
root@avit:/tmp# ./a.out -c 122.112.238.19 23233 0.0.0.0
ERROR:connect errno:101 errMsg:Network is unreachable
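The cases tested in this article can be condensed into a small lookup that a caller might log next to strerror(errno). This is a sketch that assumes a blocking socket with SO_SNDTIMEO set, as in tcp_connect above; the function name connect_failure_hint is hypothetical, and the hints cover only the causes observed here, not every possible one.

```c
#include <errno.h>

/* Hypothetical helper: map the errno left by a failed blocking connect()
 * to the causes observed in the tests above. */
const char *connect_failure_hint(int err)
{
    switch (err) {
    case EINPROGRESS:
        return "send timeout (SO_SNDTIMEO) expired before the handshake "
               "completed: peer absent, high latency, or heavy packet loss";
    case ETIMEDOUT:
        return "SYN retransmissions exhausted: peer absent, or a firewall "
               "silently DROPs the SYN";
    case ECONNREFUSED:
        return "peer refused the connection: nothing listens on the port, "
               "or a firewall REJECTs the SYN";
    case ENETUNREACH:
        return "no usable route: link down, bad routing table, or no "
               "default gateway";
    default:
        return "not covered in this article; check strerror(errno)";
    }
}
```

In tcp_connect, the hint could be printed in the connect failure branch, for example printf("hint:%s\n", connect_failure_hint(errno)), before the socket is closed.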