keepalived + LVS TCP check causes "Connection reset by peer" errors on backend services
I. Symptom
2019-01-11 15:10:49.426 ERROR 8 --- [tLoopGroup-4-42] c.c.s.listener.DefaultExceptionListener : Connection reset by peer
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1108)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:345)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:126)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:745)
II. Cause of the failure
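1. The client-side SoTimeout. In Java this is set with the setSoTimeout method of java.net.Socket; setSoTimeout(0) means the timeout is infinite. The value is the timeout applied to blocking reads.
2. The advice found via Baidu to change persistence_timeout in the keepalived configuration file does not help: neither disabling it nor setting it higher solves the problem.
3. It is almost certain that the problem is caused by a conflict between keepalived's health-check mechanism and the way the backend Java service handles socket I/O.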
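To make points 1 and 3 concrete, here is a minimal sketch of the three ways a blocking read on the server side can end, depending on what the peer does. It uses plain java.net sockets on the loopback interface rather than the Netty/NIO stack from the log above. The class name ReadOutcomeDemo, the method readOutcome and the mode strings are made up for illustration. That a health check tears its connection down abortively (emulated here with SO_LINGER 0) is an assumption, not something verified against keepalived.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadOutcomeDemo {

    /**
     * Opens a loopback connection, lets the client behave according to mode,
     * then performs one blocking read on the accepted (server-side) socket.
     * Modes (hypothetical names): "idle" = client stays silent,
     * "fin" = normal close, "rst" = abortive close via SO_LINGER 0.
     */
    static String readOutcome(String mode) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Socket client = new Socket(loopback, server.getLocalPort());
            try (Socket accepted = server.accept()) {
                // Blocking reads give up after 300 ms; 0 would wait forever.
                accepted.setSoTimeout(300);
                if (mode.equals("fin")) {
                    client.close();                 // orderly shutdown (FIN)
                } else if (mode.equals("rst")) {
                    client.setSoLinger(true, 0);    // linger 0: close() aborts the connection
                    client.close();
                }
                Thread.sleep(100);                  // give the close time to reach the server side
                try {
                    int b = accepted.getInputStream().read();
                    return b == -1 ? "eof" : "data";
                } catch (SocketTimeoutException e) {
                    return "timeout";               // SoTimeout expired, connection still usable
                } catch (IOException e) {
                    return "io-error";              // other I/O failure, e.g. a connection reset
                }
            } finally {
                client.close();                     // no-op if already closed
            }
        }
    }

    public static void main(String[] args) throws Exception {
        for (String mode : new String[] {"idle", "fin", "rst"}) {
            System.out.println(mode + " -> " + readOutcome(mode));
        }
    }
}
```

A silent peer only produces a read timeout and a normal close produces end-of-stream; only the abortive close surfaces as an I/O error on the reading side. This is consistent with the observation that tuning timeouts does not make the error disappear.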
III. Solution
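Method 1: Stop using LVS for TCP forwarding and switch to an HTTP reverse proxy instead; this resolves the problem. Of course, this only applies if the business can use HTTP. If the TCP protocol is required, use the method below.

Method 2: Modify the keepalived configuration file.

virtual_server 192.168.20.140 55555 {
    delay_loop 6
    lb_algo wrr
    lb_kind DR
    #persistence_timeout 900
    protocol TCP
    real_server 192.168.20.154 55555 {
        weight 100
        MISC_CHECK {
            misc_path "/data/shell/check_port.pl -h 192.168.20.154 -p 55555 -w 5 -c 10"
            misc_timeout 10
        }
    }
}

# Change the original TCP_CHECK to MISC_CHECK, upload the Perl script to the directory given in misc_path, and make it executable.
# Check whether Perl is installed on the system:
rpm -q perl
# perl -v shows the version information.

Perl script download: https://exchange.nagios.org/directory/Plugins/Network-Protocols/%2A-TCP-and-UDP-%28Generic%29/check_port-2Epl/details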
cat check_port.pl
#!/usr/bin/perl -w
#===============================================================================
#
# FILE: check_port.pl
#
# USAGE: check_port.pl -p <port> -h <host> (-c <critical> -w <warning> -v)
#
# DESCRIPTION: tests to see if the port is responding and can display timing
#
# OPTIONS: ---
# REQUIREMENTS: ---
# BUGS: ---
# NOTES: ---
# AUTHOR: Tim Pretlove
# VERSION: 1.3
# CREATED: 04/12/09 13:57:23
# REVISION: ---
# LICENCE: GNU
#
# AUTHOR: Jim Sander jim.sander@jdsmedia.net
# VERSION: 1.2
# MODIFIED: 10-04-2014 16:00
# BUGS: Socket::pack_sockaddr_in, length is 0 error for unresolvable hostnames
# NOTES: Fixed; now exits with '3', status UNKNOWN, and 'host lookup failed'
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
#===============================================================================
use strict;
use warnings;
use Socket;
use Getopt::Long;
use Time::HiRes qw(gettimeofday tv_interval);
my ($crit, $warn, $timeout, $host, $portnum, $verbose);
GetOptions(
    'critical=s' => \$crit,
    'warning=s'  => \$warn,
    'timeout=s'  => \$timeout,
    'host=s'     => \$host,
    'port=s'     => \$portnum,
    'verbose'    => \$verbose) or HELP_MESSAGE();
# Try a TCP connect to $host:$port.
# Returns (status, elapsed seconds): 0 = connected, 1 = connect failed or
# timed out, 3 = host name could not be resolved.
sub testport {
    my ($host,$port,$protocol,$timeout) = @_;
    my $startsec;
    my $elapsed = 0;
    if (!defined $timeout) { $timeout = 10 }
    if (!defined $protocol) { $protocol = "tcp" }
    my $proto = getprotobyname($protocol);
    my $iaddr = inet_aton($host);
    if ( !defined $iaddr ){ return 3,$elapsed; }
    my $paddr = sockaddr_in($port, $iaddr);
    $startsec = [gettimeofday()];
    socket(SOCKET, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($timeout);
        # die on failure so that the eval reports it through $@
        connect(SOCKET, $paddr) or die "connect: $!\n";
        alarm(0);
    };
    my $failed = $@;
    alarm(0);    # make sure no alarm stays pending after a failed connect
    close(SOCKET) or die "close: $!";
    $elapsed = tv_interval ($startsec, [gettimeofday]);
    if ($failed) {
        return "1",$elapsed;
    } else {
        return "0",$elapsed;
    }
}
sub HELP_MESSAGE {
    print "$0 -p <port> -h <host> (-c <critical> -w <warning> -v)\n";
    print "\t -p <port> # port number to examine\n";
    print "\t -h <hostname> # hostname or ip address to contact\n";
    print "\t -c <seconds> # the number of seconds to wait before going critical\n";
    print "\t -w <seconds> # the number of seconds to wait before flagging a warning\n";
    print "\t -v # displays nagios performance information\n";
    print "\te.g $0 -p 80 -h www.google.com -c 1.5 -w 1.0 -v\n";
    exit(4);
}
# Append Nagios performance data (round-trip time) to the status line.
sub printperf {
    my ($warning,$critical,$elapsed) = @_;
    if ((defined $warning) && (defined $critical)) {
        print "|rta=$elapsed" . "s;$warning;$critical;0;$critical";
    } else {
        print "|rta=$elapsed"
    }
}
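# Run the port test and map the result to a Nagios status code:
# 0 OK, 1 WARNING (slow), 2 CRITICAL (too slow or not responding), 3 UNKNOWN.
sub test {
    my ($critical,$warning,$host,$portnum,$timeout) = @_;
    my $proto = "tcp";
    my ($rc,$elapsed) = testport($host,$portnum,$proto,$timeout);
    if ($rc == 0) {
        if (defined $critical) {
            if ($critical <= $elapsed) {
                return 2,$elapsed;
            }
        }
        if (defined $warning) {
            if ($warning <= $elapsed) {
                return 1,$elapsed;
            }
        }
        return $rc,$elapsed;
    } elsif ($rc == 3) {
        # host lookup failed: report UNKNOWN, as described in the header notes
        return 3,$elapsed;
    } else {
        return 2,$elapsed;
    }
}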
unless ((defined $portnum) && (defined $host)) {
    HELP_MESSAGE();
    exit 1;
}
if ((defined $crit) && (defined $warn)) {
    if ($crit <= $warn) {
        print "Error: warning is greater than critical will never reach warning\n";
        exit 4;
    }
}
my @mess = qw(OK WARNING CRITICAL UNKNOWN);
my @mess2 = ("is responding","is slow responding","is not responding","host lookup failed");
my ($rc,$elapsed) = test($crit,$warn,$host,$portnum,$timeout);
print "PORT $portnum $mess[$rc]: $host/$portnum $mess2[$rc]";
if (defined $verbose) {
    printperf($warn,$crit,$elapsed);
}
exit($rc);
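
For readers more at home in Java than Perl, the decision logic of the script can be sketched roughly as follows. This is an illustration only, not a replacement for check_port.pl: the class name PortCheckSketch and the methods classify and check are invented here. The status codes follow the script's @mess array (0 OK, 1 WARNING, 2 CRITICAL) and its header notes (3 UNKNOWN for a failed host lookup).

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PortCheckSketch {

    /**
     * Maps a connect result and its duration to a status code.
     * A null threshold means "not given", like an undefined -w / -c option.
     */
    static int classify(boolean connected, double elapsedSec, Double warnSec, Double critSec) {
        if (!connected) return 2;                                // not responding
        if (critSec != null && critSec <= elapsedSec) return 2;  // too slow: CRITICAL
        if (warnSec != null && warnSec <= elapsedSec) return 1;  // slow: WARNING
        return 0;                                                // OK
    }

    /** Attempts one TCP connect with a timeout and classifies the outcome. */
    static int check(String host, int port, int timeoutMs, Double warnSec, Double critSec) {
        InetSocketAddress addr = new InetSocketAddress(host, port);
        if (addr.isUnresolved()) return 3;                       // host lookup failed: UNKNOWN
        long start = System.nanoTime();
        boolean connected;
        try (Socket s = new Socket()) {
            s.connect(addr, timeoutMs);                          // closed again right away by try-with-resources
            connected = true;
        } catch (IOException e) {                                // refused, unreachable or timed out
            connected = false;
        }
        double elapsedSec = (System.nanoTime() - start) / 1e9;
        return classify(connected, elapsedSec, warnSec, critSec);
    }

    public static void main(String[] args) throws Exception {
        // Probe a throwaway listener on the loopback interface.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            int status = check("127.0.0.1", server.getLocalPort(), 10_000, 5.0, 10.0);
            System.out.println("status = " + status);
        }
    }
}
```

One point to keep in mind with -w 5 -c 10 as used in misc_path above: as far as I understand MISC_CHECK, keepalived treats exit status 0 as healthy and, unless misc_dynamic is enabled, any other status as a failed check. A WARNING result (exit status 1) would therefore take the real server out of rotation just like CRITICAL does.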