TCP Timeout and Retransmission (2)
Retransmission Ambiguity and Karn’s Algorithm
A problem can arise in measuring an RTT sample when a packet is retransmitted.
Say a packet is transmitted, a timeout occurs, the packet is retransmitted, and an acknowledgment is received for it. Is the ACK for the first transmission or the second?
This is an example of the retransmission ambiguity problem. It happens because, unless the Timestamps option is being used, an ACK provides only the ACK number with no indication of which copy (e.g., first or second) of a sequence number is being ACKed.
The paper [KP87] specifies that when a timeout and retransmission occur, we cannot update the RTT estimators when the acknowledgment for the retransmitted data finally arrives. This is the “first part” of Karn’s algorithm. It eliminates the acknowledgment ambiguity problem by removing the ambiguity for purposes of computing the RTT estimate.
If we were to simply ignore retransmitted segments entirely when setting the RTO, however, we would be failing to take into account some useful information being provided by the network (i.e., that it is probably experiencing some form of inability to deliver packets quickly).
In such cases, it would be beneficial to reduce the load on the network by decreasing the retransmission rate, at least until packets are no longer being lost. This reasoning is the basis for the exponential backoff behavior we saw in Figure 14-1.
TCP applies a backoff factor to the RTO, which doubles each time a subsequent retransmission timer expires.
Doubling continues until an acknowledgment is received for a segment that was not retransmitted.
At that time, the backoff factor is set back to 1 (i.e., the binary exponential backoff is canceled), and the retransmission timer returns to its normal value.
Doubling the backoff factor on subsequent retransmissions is the “second part” of Karn’s algorithm.
Note that when TCP times out, it also invokes congestion control procedures that alter its sending rate.
(Congestion control is discussed in detail in Chapter 16.)
Karn’s algorithm, then, really consists of two parts. As quoted directly from the 1987 paper [KP87]:
When an acknowledgement arrives for a packet that has been sent more than once (i.e., is retransmitted at least once),
ignore any round-trip measurement based on this packet, thus avoiding the retransmission ambiguity problem.
In addition, the backed-off RTO for this packet is kept for the next packet.
Only when it (or a succeeding packet) is acknowledged without an intervening retransmission will the RTO be recalculated from SRTT.
This algorithm has been a required procedure in a TCP implementation for some time (since [RFC1122]).
There is an exception, however, when the TCP Timestamps option is being used (see Chapter 13).
In that case, the acknowledgment ambiguity problem can be avoided and the first part of Karn’s algorithm does not apply.
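The two parts of Karn's algorithm can be sketched as a small piece of state. This is a minimal illustration, not code from any real TCP stack: the class name KarnRto, its methods, and the numeric values are hypothetical, and the base RTO is held fixed rather than being recomputed from SRTT as a real implementation would do.

```python
class KarnRto:
    """Toy RTO state following the two parts of Karn's algorithm (hypothetical)."""

    def __init__(self, base_rto):
        self.base_rto = base_rto  # RTO derived from the RTT estimators, in seconds
        self.backoff = 1          # multiplicative backoff factor
        self.samples = []         # RTT samples the estimator was allowed to use

    @property
    def rto(self):
        return self.base_rto * self.backoff

    def on_timeout(self):
        # Second part: double the backoff each time the retransmission timer expires.
        self.backoff *= 2

    def on_ack(self, rtt_sample, was_retransmitted):
        if was_retransmitted:
            # First part: the ACK is ambiguous, so the sample is ignored and
            # the backed-off RTO is kept for the next segment.
            return
        # ACK for a segment sent only once: accept the sample, cancel the backoff.
        self.samples.append(rtt_sample)
        self.backoff = 1


state = KarnRto(base_rto=1.0)
state.on_timeout()
state.on_timeout()
state.on_ack(rtt_sample=0.3, was_retransmitted=True)
rto_after_ambiguous_ack = state.rto
state.on_ack(rtt_sample=0.2, was_retransmitted=False)
rto_after_clean_ack = state.rto
```

In this run the RTO stays backed off after the ambiguous ACK and returns to its base value only after the ACK for a segment that was not retransmitted.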
RTT Measurement (RTTM) with the Timestamps Option
The TCP Timestamps option (TSOPT), in addition to providing a basis for the PAWS algorithm we saw in Chapter 13,
can be used for round-trip time measurement (RTTM) [RFC1323].
Timestamps are enabled by default in FreeBSD and Linux, and later versions of Windows use them when responding to systems that use them.
In Linux, the system configuration variable net.ipv4.tcp_timestamps dictates whether or not they are used (value 0 for not used, value 1 for used).
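As a rough sketch, the Linux setting can be inspected by reading the /proc file that backs the variable. The helper names below are hypothetical, and treating any nonzero value as "used" is an assumption that goes slightly beyond the 0 and 1 values listed above.

```python
from pathlib import Path

# /proc file corresponding to the net.ipv4.tcp_timestamps variable on Linux.
TCP_TIMESTAMPS_PATH = Path("/proc/sys/net/ipv4/tcp_timestamps")


def parse_tcp_timestamps(text):
    """Interpret the variable's text value: 0 means not used.

    Assumption: any nonzero value is treated as "used".
    """
    return int(text.strip()) != 0


def timestamps_enabled(path=TCP_TIMESTAMPS_PATH):
    """Read the setting from `path`; return None if the file does not exist."""
    try:
        return parse_tcp_timestamps(Path(path).read_text())
    except FileNotFoundError:
        return None  # e.g., not running on Linux
```

Calling timestamps_enabled() with no arguments reads the live setting; passing another path makes the helper easy to exercise on systems without /proc.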
The Linux Method
。。。
RTT Estimator Behaviors
The 1s RTO minimum recommended by [RFC6298] has been removed for the standard method for illustration.
Most real-world TCP implementations today violate this directive anyhow [RKS07].
RTTM Robustness to Loss and Reordering
。。。
Timer-Based Retransmission
Once a sending TCP has established its RTO based upon measurements of the time-varying values of effective RTT,
whenever it sends a segment it ensures that a retransmission timer is set appropriately.
When setting a retransmission timer, the sequence number of the so-called timed segment is recorded, and if an ACK is received in time, the retransmission timer is canceled.
The next time the sender emits a packet with data in it, a new retransmission timer is set, the old one is canceled, and the new sequence number is recorded.
The sending TCP therefore continuously sets and cancels one retransmission timer per connection; if no data is ever lost, no retransmission timer ever expires.
TCP considers a timer-based retransmission a fairly major event; it reacts cautiously when one happens by quickly reducing the rate at which it sends data into the network.
It does this in two ways.
The first way is to reduce its sending window size based on congestion control procedures (see Chapter 16).
The other way is to keep increasing a multiplicative backoff factor applied to the RTO each time a retransmitted segment is again retransmitted.
RTO = γRTO
There is typically a maximum backoff factor that γ is not allowed to exceed (Linux ensures that the used RTO never exceeds the value TCP_RTO_MAX, which defaults to 120s).
Once an acceptable ACK is received, γ is reset to 1.
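The capped backoff can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name backed_off_rtos and the 1s base RTO are hypothetical, γ is assumed to double on every timer expiration, and the cap is applied to the RTO actually used, as the text describes for Linux.

```python
TCP_RTO_MAX = 120.0  # seconds; the Linux default cited in the text


def backed_off_rtos(base_rto, retransmissions, rto_max=TCP_RTO_MAX):
    """Return the RTO used before each successive retransmission of one segment."""
    rtos = []
    gamma = 1
    for _ in range(retransmissions):
        gamma *= 2  # backoff factor doubles on each timer expiration
        rtos.append(min(base_rto * gamma, rto_max))  # never exceed the cap
    return rtos


schedule = backed_off_rtos(base_rto=1.0, retransmissions=9)
# schedule == [2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 120.0, 120.0, 120.0]
```

With a 1s base RTO, the cap takes effect on the seventh retransmission, after which the interval stays at 120s until an acceptable ACK resets γ to 1.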
Example
The connection starts out as before, except when the pair of segments with sequence numbers 1 and 1401 is sent, the second packet is dropped.
Presumably the first of these segments reaches the receiver, but the receiver is delaying ACKs and does not respond immediately.
Lacking a response in 219ms, the sender’s retransmission timer expires, causing the packet with sequence number 1 to be resent (this time with a TSV value of 577; the original transmission carried 352).
Its arrival elicits an ACK from the receiver, which returns to the sender.
Because this ACK acknowledges data and moves the sender’s window forward, its TSER value is used to update the srtt and RTO values to 34 and 234, respectively.
The next three ACKs are generated in response to packets that arrive at the receiver.
The ACKs with the asterisks (*) are all duplicate ACKs and contain SACK information.
For now, because these ACKs do not move the sender’s window forward, their TSER values are not used.
With the eventual retransmission and arrival of segment 1401 (at TCP clock time 911) at the receiver,
the repair period is complete, and the receiver responds with ACK number 7001, indicating that all data has been received.
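The rule used in this example, that a TSER value feeds the RTT estimate only when the ACK moves the sender's window forward, might be sketched as follows. The function and parameter names are hypothetical, sequence number wraparound is ignored, and the clock value 600 in the usage lines is invented for illustration (only the TSV value 577 comes from the example).

```python
def rtt_sample_from_ack(now, tser, ack_number, snd_una):
    """Return an RTT sample in TCP clock ticks, or None if the ACK is skipped.

    now        -- sender's current timestamp clock value
    tser       -- timestamp echo reply carried by the ACK
    ack_number -- cumulative ACK number in the ACK
    snd_una    -- oldest unacknowledged sequence number at the sender
    """
    if ack_number <= snd_una:
        # The ACK does not advance the window (e.g., a duplicate ACK),
        # so its TSER value is not used.
        return None
    return now - tser


# ACK for the retransmitted segment (TSV 577) arriving at hypothetical clock 600.
sample = rtt_sample_from_ack(now=600, tser=577, ack_number=1401, snd_una=1)
# A later duplicate ACK for 1401 yields no sample.
skipped = rtt_sample_from_ack(now=650, tser=577, ack_number=1401, snd_una=1401)
```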
A timer-based retransmission often leads to underutilization of the network capacity.
Fortunately, TCP has another method for detecting and repairing lost packets, which is almost always more efficient than timer-based retransmissions.
It is called fast retransmit because it does not require the expiration of a retransmission timer to be invoked.
Fast Retransmit
Because fast retransmit is triggered by feedback from the receiver rather than by the expiration of a retransmission timer, packet loss can often be repaired more quickly and efficiently with it than with timer-based retransmission.
A typical TCP implements both fast retransmit and timer-based retransmission.
Before we describe fast retransmit in more detail, it is important to realize that TCP is required to generate an immediate acknowledgment (a “duplicate ACK”)
when an out-of-order segment is received, and that the loss of a segment implies out-of-order arrivals at the receiver when subsequent data arrives.
When this happens, a hole is created at the receiver. The sender’s job then becomes filling the receiver’s holes as quickly and efficiently as possible.
The duplicate ACKs sent immediately when out-of-order data arrives are not delayed.
When SACK is used, these duplicate ACKs typically contain SACK blocks as well, which can provide information about more than one hole.
A duplicate ACK (with or without SACK blocks) arriving at a sender is a potential indicator that a packet sent earlier has been lost.
If a receiver receives a packet for a sequence number beyond the one it is expecting next, the expected packet could be either missing or merely delayed.
Because we generally do not know which one, TCP waits for a small number of duplicate ACKs (called the duplicate ACK threshold or dupthresh)
to be received before concluding that a packet has been lost and initiating a fast retransmit.
Traditionally, dupthresh has been a constant (with value 3), but some nonstandard implementations (including Linux) alter this value based on the current measured level of reordering.
Without SACK, no more than one segment is typically retransmitted until an acceptable ACK is received.
With SACK, ACKs contain additional information allowing the sender to fill more than one hole in the receiver per RTT.
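The duplicate ACK threshold described above can be sketched as a simple counter on the sender side. This is an illustrative toy, not an implementation of any particular stack: the class name DupAckDetector is hypothetical, dupthresh is fixed at the traditional value of 3, SACK blocks and sequence number wraparound are ignored, and the sequence numbers in the usage lines are chosen for illustration.

```python
DUPTHRESH = 3  # traditional constant value


class DupAckDetector:
    """Counts duplicate ACKs and signals when a fast retransmit should start."""

    def __init__(self, snd_una, dupthresh=DUPTHRESH):
        self.snd_una = snd_una      # oldest unacknowledged sequence number
        self.dupthresh = dupthresh
        self.dup_count = 0

    def on_ack(self, ack_number):
        """Return the sequence number to fast-retransmit, or None."""
        if ack_number > self.snd_una:
            # Acceptable ACK: the window moves forward and the count resets.
            self.snd_una = ack_number
            self.dup_count = 0
            return None
        if ack_number == self.snd_una:
            self.dup_count += 1
            if self.dup_count == self.dupthresh:
                # Enough duplicates: conclude the segment at snd_una was lost.
                return self.snd_una
        return None


detector = DupAckDetector(snd_una=1401)
results = [detector.on_ack(a) for a in (1401, 1401, 1401, 7001)]
# results == [None, None, 1401, None]
```

The third duplicate ACK triggers the retransmission of the segment starting at 1401, and the later ACK for 7001 resets the counter.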