TCP Health Checks
This chapter describes how to configure health checks for TCP.
Introduction
NGINX and NGINX Plus can continually test your TCP upstream servers, avoid the servers that have failed, and gracefully add the recovered servers into the load-balanced group.
Prerequisites
- You have configured an upstream group of TCP servers in the stream context, for example:
    stream {
        ...
        upstream stream_backend {
            server backend1.example.com:12345;
            server backend2.example.com:12345;
            server backend3.example.com:12345;
        }
        ...
    }
- You have configured a server that passes TCP connections to the server group:
    stream {
        ...
        server {
            listen 12345;
            proxy_pass stream_backend;
        }
        ...
    }
Passive TCP Health Checks
If an attempt to connect to an upstream server times out or results in an error, open source NGINX or NGINX Plus can mark the server as unavailable and stop sending requests to it for a defined amount of time. To define the conditions under which NGINX considers an upstream server unavailable, include the following parameters in the server directive:
- fail_timeout – The amount of time within which a specified number of connection attempts must fail for the server to be considered unavailable. Also, the amount of time that NGINX considers the server unavailable after marking it so.
- max_fails – The number of failed attempts that must happen during the specified time for NGINX to consider the server unavailable.
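In other words: once the number of failed attempts for a server in the upstream group reaches max_fails, NGINX stops passing connections to that server for the next fail_timeout period. After that period, the server again receives connections according to the load-balancing algorithm.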
The default values are 10 seconds and 1 attempt. So if a connection attempt times out or fails at least once in a 10-second period, NGINX marks the server as unavailable for 10 seconds. The example shows how to set these parameters to 2 failures within 30 seconds:
    upstream stream_backend {
        server backend1.example.com:12345 weight=5;
        server backend2.example.com:12345 max_fails=2 fail_timeout=30s;
        server backend3.example.com:12346 max_conns=3;
    }
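For comparison, the defaults described above can also be written out explicitly. The following is a minimal sketch that reuses the stream_backend group from the example. It assumes the documented defaults of max_fails=1 and fail_timeout=10s, so the first two server lines should behave identically. The third line relies on the documented behavior that a zero value for max_fails disables the accounting of failed attempts, which means passive checks never mark that server unavailable.

```nginx
upstream stream_backend {
    # Implicit defaults: 1 failed attempt within 10 seconds
    # marks the server unavailable for 10 seconds.
    server backend1.example.com:12345;

    # The same behavior with the defaults spelled out.
    server backend2.example.com:12345 max_fails=1 fail_timeout=10s;

    # max_fails=0 turns off failure accounting for this server.
    server backend3.example.com:12345 max_fails=0;
}
```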
Active TCP Health Checks
Health checks can be configured to test a wide range of failure types. For example, NGINX Plus can continually test upstream servers for responsiveness and avoid servers that have failed.
NGINX Plus sends special health check requests to each upstream server and checks for a response that satisfies certain conditions. If a connection to the server cannot be established, the health check fails, and the server is considered unhealthy. NGINX Plus does not proxy client connections to unhealthy servers. If several health checks are defined for a group of servers, the failure of any one check is enough for the corresponding server to be considered unhealthy.
To enable active health checks:
- Specify a shared memory zone – a special area where the NGINX Plus worker processes share state information about counters and connections. Add the zone directive to the upstream server group and specify the zone name and the amount of memory:
    stream {
        ...
        upstream stream_backend {
            zone stream_backend 64k;
            server backend1.example.com:12345;
            server backend2.example.com:12345;
            server backend3.example.com:12345;
        }
        ...
    }
- Enable health checks for servers in the upstream group. Add the health_check and health_check_timeout directives to the server that proxies connections to the upstream group:
    stream {
        ...
        server {
            listen 12345;
            proxy_pass stream_backend;
            health_check;
            health_check_timeout 5s;
        }
        ...
    }
The health_check directive enables the health check functionality, while health_check_timeout overrides the proxy_timeout value for health checks, because health checks need a significantly shorter timeout.
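To make the relationship between the two timeouts concrete, the sketch below sets both in the same server block. It assumes the stream_backend group from the earlier steps and writes proxy_timeout with its documented default of 10m purely for illustration. Proxied client connections keep the long timeout, while health check probes are limited to 5 seconds.

```nginx
stream {
    upstream stream_backend {
        zone stream_backend 64k;
        server backend1.example.com:12345;
        server backend2.example.com:12345;
    }

    server {
        listen 12345;
        proxy_pass stream_backend;

        # Applies to proxied client connections: close the connection
        # if no data is transmitted for 10 minutes (the default value).
        proxy_timeout 10m;

        # Applies to health checks only: overrides proxy_timeout
        # so that an unresponsive server is detected quickly.
        health_check;
        health_check_timeout 5s;
    }
}
```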
Fine-Tuning TCP Health Checks
By default, NGINX Plus tries to connect to each server in an upstream server group every 5 seconds. If the connection cannot be established, NGINX Plus considers the health check failed, marks the server as unhealthy, and stops forwarding client connections to the server.
To change the default behavior, include parameters in the health_check directive:
- interval – How often (in seconds) NGINX Plus sends health check requests (default is 5 seconds)
- passes – Number of consecutive health checks the server must pass to be considered healthy (default is 1)
- fails – Number of consecutive health checks the server must fail to be considered unhealthy (default is 1)
    stream {
        ...
        server {
            listen 12345;
            proxy_pass stream_backend;
            health_check interval=10 passes=2 fails=3;
        }
        ...
    }
In the example, the time between TCP health checks is increased to 10 seconds, the server is considered unhealthy after 3 consecutive failed health checks, and the server needs to pass 2 consecutive checks to be considered healthy again.
By default, NGINX Plus sends health check messages to the port specified by the server directive in the upstream block. You can specify another port for health checks, which is particularly helpful when monitoring the health of many services on the same host. To override the port, specify the port parameter of the health_check directive:
    stream {
        ...
        server {
            listen 12345;
            proxy_pass stream_backend;
            health_check port=8080;
        }
        ...
    }
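The parameters described in this section can be combined in a single health_check directive. The sketch below merges the two preceding examples, with the upstream group repeated so that the fragment is complete. With these values, a server that stops responding would be marked unhealthy after three failed probes, which is roughly 20 to 30 seconds depending on when the failure falls relative to the probe schedule. This is an estimate, not a guaranteed bound.

```nginx
stream {
    upstream stream_backend {
        zone stream_backend 64k;
        server backend1.example.com:12345;
        server backend2.example.com:12345;
    }

    server {
        listen 12345;
        proxy_pass stream_backend;

        # Probe port 8080 on each upstream host every 10 seconds.
        # 3 consecutive failures mark a server unhealthy;
        # 2 consecutive passes mark it healthy again.
        health_check interval=10 passes=2 fails=3 port=8080;
    }
}
```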
The “match {}” Configuration Block
You can verify server responses to health checks by configuring a number of tests. These tests are defined with the match {} configuration block placed in the stream {} context. Specify the match {} block and set its name, for example, tcp_test:
    stream {
        ...
        match tcp_test {
            ...
        }
    }
Then refer to the block from the health_check directive by including the match parameter and the name of the match block:
    stream {
        ...
        server {
            listen 12345;
            health_check match=tcp_test;
            proxy_pass stream_backend;
        }
        ...
    }
The conditions or tests under which a health check succeeds are set with send and expect parameters:
- send – The text string or hexadecimal literals (“\x” followed by two hex digits) to send to the server
- expect – Literal string or regular expression that the data returned by the server needs to match
These parameters can be used in different combinations, but no more than one send and one expect parameter can be specified at a time:
- If no send or expect parameters are specified, the ability to connect to the server is tested.
- If the expect parameter is specified, the server is expected to unconditionally send data first:
    match pop3 {
        expect ~* "\+OK";
    }
- If the send parameter is specified, it is expected that the connection will be successfully established and the specified string will be sent to the server:
    match pop_quit {
        send QUIT;
    }
- If both the send and expect parameters are specified, the string from the send parameter is sent to the server, and the server's response must match the regular expression from the expect parameter:
    stream {
        ...
        upstream stream_backend {
            zone upstream_backend 64k;
            server backend1.example.com:12345;
        }
        match http {
            send "GET / HTTP/1.0\r\nHost: localhost\r\n\r\n";
            expect ~* "200 OK";
        }
        server {
            listen 12345;
            health_check match=http;
            proxy_pass stream_backend;
        }
    }
The example shows that in order for a health check to pass, the HTTP request must be sent to the server, and the response from the server must contain 200 OK to indicate a successful HTTP response.
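The send parameter also accepts hexadecimal literals, which the HTTP example does not show. The following sketch is an illustration only. It assumes a backend that answers the text line PING with a line beginning with +PONG, as a Redis server typically does. The names redis_backend and redis_ping and the port 6379 are hypothetical choices for this example.

```nginx
stream {
    upstream redis_backend {                  # hypothetical group name
        zone redis_backend 64k;
        server backend1.example.com:6379;     # assumed service port
    }

    match redis_ping {                        # hypothetical block name
        # \x50\x49\x4e\x47 are the bytes "PING", followed by CRLF.
        send "\x50\x49\x4e\x47\r\n";
        # The check passes only if the reply matches this regular expression.
        expect ~ "\+PONG";
    }

    server {
        listen 6379;
        proxy_pass redis_backend;
        health_check match=redis_ping;
    }
}
```

Writing the same probe as the plain string "PING\r\n" should be equivalent. The hexadecimal form is mainly useful for binary protocols whose bytes cannot be typed as text.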