TCP Connection Management (5)
TCP Server Operation
Here we wish to become familiar with how TCP servers use port numbers and how multiple concurrent clients are handled.
TCP Port Numbers
Consider netstat output, produced with the -a, -n, and -t options, on a system with no active secure shell connections.
The -a option reports on all network endpoints, including those in either listening or non-listening state.
The -n flag prints IP addresses as dotted-decimal (or hex) numbers, instead of trying to use the DNS to convert the address to a name,
and prints numeric port numbers (e.g., 22) instead of service names (e.g., ssh).
The -t option selects only TCP endpoints.
The local address (which really means local endpoint) is output as :::22, which is the IPv6-oriented way of referring to the all-zeros address,
also called the wildcard address, along with port number 22.
This means that an incoming connection request (i.e., a SYN) to port 22 will be accepted on any local interface.
Because this host is multihomed, the server could instead specify a single one of the host's IP addresses as the local IP address,
and only connections received on that interface would be accepted.
Here, the foreign IP address and foreign port number are not known yet, because the local endpoint is in the LISTEN state, waiting for a connection to arrive.
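A listening endpoint like the one netstat reports can be created through the sockets API by binding to the wildcard address. The Python sketch below is only an illustration, not the ssh server's code: it uses the IPv4 wildcard and port 0 (let the OS choose a port) instead of port 22 so that it runs without privileges, and the helper name wildcard_listener is hypothetical. An AF_INET6 socket bound to "::" would instead be reported by netstat as :::port, as in the output described above.

```python
import socket

def wildcard_listener(port: int = 0) -> socket.socket:
    """Create a TCP listening endpoint bound to the IPv4 wildcard address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # "" (equivalently "0.0.0.0") is the all-zeros wildcard address:
    # a SYN for this port is accepted on any local interface.
    s.bind(("", port))
    s.listen()  # LISTEN state: the foreign endpoint is not known yet
    return s

if __name__ == "__main__":
    srv = wildcard_listener()
    local_ip, local_port = srv.getsockname()
    print(f"LISTEN {local_ip}:{local_port}")
    srv.close()
```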
We now start a secure shell client on the host 10.0.0.3 that connects to this server, and run netstat again.
The local IP address of the new ESTABLISHED connection corresponds to the interface on which the connection request arrived.
Also notice that the port number for the ESTABLISHED connection does not change: it is 22, the same as the LISTEN endpoint.
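We now initiate another client request from the same system (10.0.0.3) to this server and look at the netstat output once more.
There are now three endpoints with local port 22, so TCP cannot determine which process gets an incoming segment by looking at the destination port number (local port) alone.
It must examine all four elements: the local IP address, local port, foreign IP address, and foreign port.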
Also, the only one of the three endpoints at port 22 that will receive incoming connection requests is the one in the LISTEN state.
The endpoints in the ESTABLISHED state cannot receive SYN segments, and the endpoint in the LISTEN state cannot receive data segments.
The host operating system ensures this. (If it did not, TCP could become quite confused and not work properly.)
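The demultiplexing rule can be modeled in a few lines. This is a simplified, hypothetical model (the function demux, the "*" wildcard marker, and the state strings are ours, not kernel code): ESTABLISHED endpoints are found by the full 4-tuple and never receive SYNs, while only a SYN may fall through to a LISTEN endpoint, whose local IP may be the wildcard. The client ports 1021 to 1023 in the usage example are made up for illustration.

```python
from typing import Dict, Optional, Tuple

# (local_ip, local_port, foreign_ip, foreign_port); "*" marks a wildcard field.
FourTuple = Tuple[str, int, str, int]

def demux(table: Dict[FourTuple, str], local_ip: str, local_port: int,
          foreign_ip: str, foreign_port: int, is_syn: bool) -> Optional[FourTuple]:
    """Return the key of the endpoint that receives a segment, or None."""
    exact = (local_ip, local_port, foreign_ip, foreign_port)
    if table.get(exact) == "ESTABLISHED":
        # In this model an ESTABLISHED endpoint never receives a SYN.
        return None if is_syn else exact
    if not is_syn:
        return None  # a LISTEN endpoint never receives data segments
    # Prefer a listener bound to this specific local IP, then a wildcard one.
    for lip in (local_ip, "*"):
        key = (lip, local_port, "*", "*")
        if table.get(key) == "LISTEN":
            return key
    return None

# The situation described above: one listener and two ssh clients from 10.0.0.3.
table = {
    ("*", 22, "*", "*"): "LISTEN",
    ("10.0.0.1", 22, "10.0.0.3", 1023): "ESTABLISHED",
    ("10.0.0.1", 22, "10.0.0.3", 1022): "ESTABLISHED",
}
```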
Next we initiate a third client connection, from the IP address 169.229.62.97 that is across the DSL PPPoE link from the server 10.0.0.1, and not on the same Ethernet.
The local IP address of the third ESTABLISHED connection now corresponds to the interface address of the PPPoE link on the multihomed host (67.125.227.195).
Note that the Send-Q status is not 0 but is instead 928 bytes.
This means that the server host has sent 928 bytes on this connection that have not yet been acknowledged.
Restricting Local IP Addresses
We can see what happens when the server does not wildcard the local IP address but instead sets it to one particular local address.
If we run our sock program as a server and provide it with a particular IP address, that address becomes the local address of the listening endpoint.
For example:
Linux% sock -s 10.0.0.1 8888
This restricts the server to connections that arrive for the local IPv4 address 10.0.0.1 only.
If we now connect to this server from the local network, from the host 10.0.0.3, it works fine.
If we instead try to connect to this server using a destination address other than 10.0.0.1 (even the local loopback address 127.0.0.1),
the connection request is not accepted by the TCP module.
If we watch with tcpdump, we see that the SYN elicits an RST segment.
The server application never sees the connection request: the rejection is done by the operating system's TCP module,
based on the local address specified by the application and the destination address contained in the arriving SYN segment.
We see that the capability of restricting local IP addresses is quite strict.
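In sockets terms, sock -s 10.0.0.1 8888 corresponds to binding the listening socket to one address instead of the wildcard. The Python sketch below does the same with the loopback address 127.0.0.1 and an OS-chosen port (both substitutions so it runs on any host); restricted_listener and connect_and_report are hypothetical helper names. On a host with several addresses, a connect() to this port through any other local address would be answered with an RST, which Python reports as ConnectionRefusedError.

```python
import socket

def restricted_listener(addr: str, port: int = 0) -> socket.socket:
    """Listen only for connections addressed to addr, not the wildcard."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((addr, port))  # a specific local IP instead of ""
    s.listen()
    return s

def connect_and_report(srv: socket.socket):
    """Connect to the listener; return the server-side local endpoint and the peer."""
    host, port = srv.getsockname()
    cli = socket.create_connection((host, port), timeout=5)
    conn, peer = srv.accept()
    local = conn.getsockname()  # local endpoint of the ESTABLISHED connection
    conn.close()
    cli.close()
    return local, peer

if __name__ == "__main__":
    srv = restricted_listener("127.0.0.1")
    print(connect_and_report(srv))
    srv.close()
```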
Restricting Foreign Endpoints
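The abstract interface functions for TCP given in [RFC0793] allow a server doing a passive open to have either a fully specified foreign endpoint (to wait for a particular client to issue an active open)
or an unspecified foreign endpoint (to wait for any client).
The Berkeley sockets API provides no way to specify the foreign endpoint for a passive open, so a sockets server must leave it unspecified and examine each client's address only after the connection has been established.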
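The RFC 793 semantics can be expressed as a matching rule. This is a hypothetical model of the abstract interface, not a real API (RFC 793 describes OPEN abstractly; the class PassiveOpen and its fields are ours, and client port 5000 is made up): a SYN matches a passive open if it is addressed to the local endpoint and the foreign endpoint is either unspecified or equal to the SYN's source.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)

@dataclass(frozen=True)
class PassiveOpen:
    local: Endpoint
    foreign: Optional[Endpoint] = None  # None: wait for any client

    def accepts(self, syn_src: Endpoint, syn_dst: Endpoint) -> bool:
        """Would a SYN from syn_src to syn_dst match this passive open?"""
        if syn_dst != self.local:
            return False
        # A fully specified foreign endpoint waits for one particular client.
        return self.foreign is None or self.foreign == syn_src

any_client = PassiveOpen(("10.0.0.1", 8888))
one_client = PassiveOpen(("10.0.0.1", 8888), ("10.0.0.3", 5000))
```

Because sockets have no counterpart of the foreign field, a sockets server has to perform the equivalent of accepts() itself, after accept() has already returned a connection.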
Incoming Connection Queue
A concurrent server creates a new process (or thread) to handle each client, but there is still a chance that multiple connection requests will arrive while the listening server is creating a new process,
or while the operating system is busy running other higher-priority processes, or, worse yet, that the server is being attacked with bogus connection requests that are never allowed to be established.
How does TCP handle these scenarios?
To fully explore this question, we must first understand that new connections may be in one of two distinct states before they are made available to an application.
The first case is connections that have not yet completed but for which a SYN has been received (these are in the SYN_RCVD state).
The second case is connections that have already completed the three-way handshake and are in the ESTABLISHED state but have not yet been accepted by the application.
Internally, the operating system ordinarily has two distinct connection queues, one for each of these cases.
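An application has limited control over the sizing of these queues.
In the original Berkeley sockets API, the backlog value gave the application only indirect control over the sum of the sizes of these two queues; in modern Linux kernels the backlog instead limits only the number of connections in the second case (ESTABLISHED connections).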
In Linux, then, the following rules apply:
1 When a connection request arrives (i.e., the SYN segment), the system-wide parameter net.ipv4.tcp_max_syn_backlog is checked (default 1000).
If the number of connections in the SYN_RCVD state would exceed this threshold, the incoming connection is rejected.
2 Each listening endpoint has a fixed-length queue of connections that have been completely accepted by TCP (i.e., the three-way handshake is complete) but not yet accepted by the application.
The application specifies a limit to this queue, commonly called the backlog.
This backlog must be between 0 and a system-specific maximum called net.core.somaxconn, inclusive (default 128).
Keep in mind that this backlog value specifies only the maximum number of queued connections for one listening endpoint.
This backlog has no effect whatsoever on the maximum number of established connections allowed by the system,
or on the number of clients that a concurrent server can handle concurrently.
3 Also, the client may think the server is ready to receive data when the client's active open completes successfully,
before the server application has been notified of the new connection. If this happens, the server's TCP just queues the incoming data.
4 If there is not enough room on the queue for the new connection, the TCP delays responding to the SYN, to give the application a chance to catch up.
Linux is somewhat unique in this behavior: it tries hard not to ignore incoming connections if it possibly can.
If the net.ipv4.tcp_abort_on_overflow system control variable is set, new incoming connections are instead answered with a reset segment.
Sending reset segments on overflow is not generally advisable and is not turned on by default.
The client has attempted to contact the server, and if it receives a reset during the SYN exchange, it may falsely conclude that no server is present (instead of concluding that there is a server present but it is busy).
Being too busy is really a form of “soft” or temporary error rather than a hard error.
Normally, when the queue is full, the application or the operating system is busy, preventing the application from servicing incoming connections. This condition could change in a short while.
But if the server’s TCP responded with a reset, the client’s active open would abort (which is what we saw happen if the server was not started).
Without the reset, if the listening server does not get around to accepting some of the already-accepted connections that have filled its queue to the limit, the client’s active open eventually times out, according to normal TCP mechanisms.
In the case of Linux, the connecting clients are just slowed for a significant period of time: they will neither time out nor be reset.
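The four rules above can be captured in a small simulation. It is a deliberately simplified, hypothetical model (the class ListenQueues and its methods are ours, not kernel structures, and boundary details of the real implementation are ignored): it uses the default values quoted in the text, clamps the application's backlog to somaxconn (Linux silently truncates larger values), and reports a full accept queue either as "reset" when tcp_abort_on_overflow is set or as "delayed", meaning TCP stays quiet and the client must retransmit later.

```python
from collections import deque

class ListenQueues:
    """Toy model of one Linux listening endpoint and its two queues."""

    def __init__(self, backlog: int, somaxconn: int = 128,
                 max_syn_backlog: int = 1000, abort_on_overflow: bool = False):
        self.backlog = max(0, min(backlog, somaxconn))  # rule 2
        self.max_syn_backlog = max_syn_backlog  # net.ipv4.tcp_max_syn_backlog
        self.abort_on_overflow = abort_on_overflow  # net.ipv4.tcp_abort_on_overflow
        self.syn_rcvd = set()    # first case: SYN received, handshake incomplete
        self.accept_q = deque()  # second case: ESTABLISHED, not yet accepted

    def _overflow(self, client) -> str:
        # Rule 4: no room on the accept queue.
        if self.abort_on_overflow:
            self.syn_rcvd.discard(client)
            return "reset"
        return "delayed"  # no response now; the client retransmits later

    def on_syn(self, client) -> str:
        if len(self.accept_q) >= self.backlog:
            return self._overflow(client)
        if len(self.syn_rcvd) + 1 > self.max_syn_backlog:
            return "rejected"  # rule 1
        self.syn_rcvd.add(client)
        return "syn+ack sent"

    def on_ack(self, client) -> str:
        """Final ACK of the three-way handshake arrives."""
        if client not in self.syn_rcvd:
            return "ignored"
        if len(self.accept_q) >= self.backlog:
            return self._overflow(client)
        self.syn_rcvd.discard(client)
        self.accept_q.append(client)
        return "queued"

    def accept(self):
        """The application's accept(): take one completed connection."""
        return self.accept_q.popleft() if self.accept_q else None
```

For example, with ListenQueues(backlog=1) a first client completes its handshake and is queued, and a second client's SYN is "delayed" until the application calls accept().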
We can see what happens when the incoming connection queue becomes full using our sock program.
We invoke it with a new option (-O) that tells it to pause after creating the listening endpoint, before accepting any connection requests.
If we then invoke multiple clients during this pause period, the server’s queue of accepted connections should fill, and we can see what happens with tcpdump.
Linux% sock -s -v -q1 -O30000 6666
The -q1 option sets the backlog of the listening endpoint to 1.
The -O30000 option causes the program to sleep for 30,000s before accepting any client connections.
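A rough Python analogue of sock -s -v -q1 -O30000 6666 is sketched below. Assumptions: it listens on loopback with an OS-chosen port instead of 6666, the pause is a parameter (far shorter than 30,000s), and paused_server_demo is a hypothetical name. It shows that a client whose connection fits in the backlog-1 queue sees connect() succeed while the server is still sleeping, because the kernel completes the handshake without the application.

```python
import socket
import time

def paused_server_demo(pause: float = 0.5) -> bool:
    """Mimic sock -q1 -O<n>: backlog of 1, sleep, then accept one client."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # assumption: loopback, OS-chosen port
    srv.listen(1)               # like -q1: backlog of 1
    # connect() returns once the kernel completes the three-way handshake,
    # even though the application has not called accept() yet.
    cli = socket.create_connection(srv.getsockname(), timeout=5)
    time.sleep(pause)           # like -O: pause before accepting
    conn, peer = srv.accept()   # dequeue the already-established connection
    same_client = peer == cli.getsockname()
    for s in (conn, cli, srv):
        s.close()
    return same_client

if __name__ == "__main__":
    print(paused_server_demo())
```

Starting more clients during the pause, beyond what the queue can hold, is where the Linux behavior described above (SYNs or handshake completions held off) becomes visible with tcpdump.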
With Berkeley sockets, be aware that by the time the application is told that a TCP connection has arrived, the three-way handshake is already over.
If the server then looks at the client’s IP address and port number and decides it does not want to service this client,
all the server can do is either close the connection (causing a FIN to be sent) or reset the connection (causing an RST to be sent).
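A server that filters clients can therefore only check after accept(), as in the following Python sketch. The allowlist, the function names, and the use of loopback are assumptions for illustration (192.0.2.1 is a documentation address, so the loopback client is always rejected). Setting SO_LINGER with a zero timeout before close() is the usual sockets idiom for aborting a connection with an RST instead of closing it with a FIN.

```python
import socket
import struct

def reject(conn: socket.socket, use_reset: bool) -> None:
    """Turn away an accepted connection with a FIN (close) or an RST (abort)."""
    if use_reset:
        # l_onoff=1, l_linger=0: close() discards the connection and sends an RST.
        conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()

def serve_one(srv: socket.socket, allowed: set, use_reset: bool):
    """Accept one connection; keep it only if the client's IP is allowed."""
    conn, (peer_ip, _peer_port) = srv.accept()  # handshake already complete
    if peer_ip in allowed:
        return conn  # a real server would service the client here
    reject(conn, use_reset)
    return None

def client_sees(use_reset: bool) -> str:
    """Connect over loopback, get rejected, and report what the client observes."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    cli = socket.create_connection(srv.getsockname(), timeout=5)
    serve_one(srv, allowed={"192.0.2.1"}, use_reset=use_reset)
    try:
        outcome = "FIN" if cli.recv(1) == b"" else "data"
    except ConnectionResetError:
        outcome = "RST"
    finally:
        cli.close()
        srv.close()
    return outcome
```

In both cases the client's connect() has already succeeded before the server decides anything; the client learns of the rejection only afterward, as end-of-file or as a reset.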
Attacks Involving TCP Connection Management
A SYN flood is a TCP DoS attack whereby one or more malicious clients generate a series of TCP connection attempts (SYN segments) and
send them at a server, often with a “spoofed” (e.g., random) source IP address.
The server allocates some amount of connection resources to each partial connection.
Because the connections are never established, the server may start to deny service to future legitimate requests
because its memory is exhausted holding state for many half-open connections.
One mechanism invented to deal with this issue is called SYN cookies [RFC4987].
The main insight with SYN cookies is that most of the information that would be stored for a connection when a SYN arrives could be encoded inside the Sequence Number field supplied with the SYN + ACK.
The target machine using SYN cookies need not allocate any storage for the incoming connection request—
it allocates real memory only once the SYN + ACK segment has itself been acknowledged (and the initial sequence number is returned).
Producing SYN cookies involves a careful selection of the TCP ISN by the server.
Essentially, the server must encode any essential state in the Sequence Number field in its SYN + ACK that is returned in the ACK Number field from a legitimate client.
There are several ways of doing this, but we will mention the technique adopted by Linux.
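The sketch below illustrates the classic SYN cookie layout from D. J. Bernstein's original scheme: the top 5 bits hold t mod 32, where t is a counter that advances every 64 seconds; the next 3 bits encode one of eight MSS values; the low 24 bits are a keyed hash, using a server-selected secret, of the client and server addresses and ports and t. It is an illustration only: the Linux kernel's actual cookie computation differs in detail, and the MSS table, the use of HMAC-SHA256, and the function names are assumptions.

```python
import hashlib
import hmac
import os
import time

MSS_TABLE = [536, 1220, 1300, 1400, 1440, 1460, 4312, 8960]  # hypothetical values
SECRET = os.urandom(16)  # server-selected secret

def _counter(now=None) -> int:
    """Time counter t that advances every 64 seconds."""
    return int((time.time() if now is None else now) // 64)

def _hash24(four_tuple, t: int) -> int:
    msg = f"{four_tuple}|{t}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "big")  # keep 24 bits

def make_cookie(four_tuple, client_mss: int, now=None) -> int:
    """ISN for the SYN + ACK: 5 bits of t, 3 bits of MSS index, 24-bit hash."""
    t = _counter(now)
    # Largest table MSS not exceeding the client's MSS (index 0 if none).
    idx = max((i for i, m in enumerate(MSS_TABLE) if m <= client_mss), default=0)
    return ((t % 32) << 27) | (idx << 24) | _hash24(four_tuple, t)

def check_cookie(four_tuple, ack_number: int, now=None, max_age: int = 2):
    """Validate the ACK Number of the final ACK (cookie + 1); return MSS or None."""
    cookie = (ack_number - 1) % 2**32
    top5, idx = cookie >> 27, (cookie >> 24) & 0x7
    t_now = _counter(now)
    for age in range(max_age + 1):  # accept cookies a few counter ticks old
        t = t_now - age
        if t % 32 == top5 and _hash24(four_tuple, t) == cookie & 0xFFFFFF:
            return MSS_TABLE[idx]
    return None
```

Only the state that fits in these 32 bits survives; anything else the server would normally remember about the SYN (beyond the coarse MSS choice) is lost, which is the price of allocating no memory until the final ACK arrives.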