tcpack----- 2sack dack

道当我们接收到ack的时候,我们会判断sack段,如果包含sack段的话,我们就要进行处理。这篇blog就主要来介绍内核如何处理sack段

sack dsack概念

TCP在一个RTO超时后会重传数据包,激进的发送者会在一个RTO之内重传数据包,但是因为不知道具体是哪一个包丢失了,只能从重传队列的开头去重传。

SACK就是接收方收到乱序的数据包后,提供给发送方的已经收到的数据包信息,发包方收到这些信息后就会知道哪些数据丢失,然后立即重传缺失部分即可。这就大大提高了数据重传的速度。
同时乱序包会快速ack。
DSACK则是接收方收到了重传的数据包,通知发送方这个包重复发送了,因此发送方就能知道之前的重传(丢包)是一次”误判”, 发送方就会尝试从之前拥塞状态中恢复。

需要注意的是,sack只是一种建议,接受方发送sack后,有可能因为内存不足等原因sack reneging,释放这部分数据。

 

SACK的产生,RFC2018

SACK通常是由数据接收方产生,如果在connection建立的时候,SYN包中有SACK-Permitted 的选项为true,同时自身也支持SACK,那么可以在接收异常的时候,产生SACK option. 如果要发送,SACK中需要携带接收队列中所有没有被确认的数据段信息。
如果接收方选择发送带有SACK的ACK,需要遵循如下规则:
1. 第一个block需要指出是哪一个segment触发SACK option ,我认为就是谁乱序了,才会导致SACK
2. 尽可能多的把所有的block填满
3. SACK 要报告最近接收的不连续的数据块
接收端的行为:
1. 数据没有被确认前,都会保持在滑动窗口内
2. 每一个数据包都有一个SACKed的标志,对于已经标示的segment,重新发送的时候会忽略
3. 如果SACK丢失,超时重传之后,重置所有数据包SACKed 标志

D-SACK RFC2883

D-SACK主要是使用了SACK来告诉发送方有哪些数据被重复接收了,如果是D-SACK,那么SACK option的第一个block代表被重复发送的序列片段需要注意的点:
1. D-SACK仅仅是接收端报告一个重复的连续的片段
2. 每个重复的连续片段只能在一个block中
3.  重复片段的序列号
4. 第二个block指的是data没有被确认的



TCP选项

Sack-Permitted Option

在三次握手的时候,会在双方的syn包中携带本地是否启动sack扩展。 Sack-Permitted选项就是说明本地启用sack。
只会在握手协商的时候带该选项, 握手完成后双方就能使用sack。

 SACK选项分为两类:SACK-Permitted和SACK选项。其中SACK-Permitted选项的格式如下:

TCP Sack-Permitted Option:
 Kind: 4
 +---------+---------+
 | Kind=4  | Length=2|
 +---------+---------+

Sack Option Format

如果对方启用sack,在tcp建立连接后,接收方收到乱序包后,就会使用sack选项来选择确认。
这里每一个Block都是孤立的,说明(Left Edge of Block)前面的一个字节没有被接收,以及(Right Edge of Block)开始的1字节没有被接收。
因为tcp头长度的限制,2nop+8n+2<=40, 可以知道最多有4个sack.
同时如果启用timestamp(2nop+2+8=12), 2nop+8
n+2 <=40-12=28, 知道sack最多有3个

TCP SACK Option:
Kind: 5
Length: Variable
                  +--------+--------+
                  | Kind=5 | Length |
+--------+--------+--------+--------+
|      Left Edge of 1st Block       |
+--------+--------+--------+--------+
|      Right Edge of 1st Block      |
+--------+--------+--------+--------+
|                                   |
/            . . .                  /
|                                   |
+--------+--------+--------+--------+
|      Left Edge of nth Block       |
+--------+--------+--------+--------+
|      Right Edge of nth Block      |
+--------+--------+--------+--------+

 

 SACK-Permitted选项在SYN或SYN|ACK中设置,用于协商开启SACK功能。SACK选项用于通告对端接收数据的信息

接受端发送sack

接受端的sack会包含最新收到的数据包, 从未把最新的情况通告给发送方。

按照概念,只有可能在慢速路径返回sack,慢速路径会调用tcp_data_queue来处理数据包。

  • 收到的包end_seq <= rcv_nxt, 说明整个都是重传的旧包,直接丢弃这个包,并快速dsack。
  • 收到包序号seq < rcv_next, 则dsack [seq, rcv_next]; 如果还有部分新数据,则处理新数据
  • 收到的包序号seq == rcv_next, 说明收到的是顺序包,会更新rcv_next. 此时还会查看ofo队列,看是否窗口中的hole被当前包填上,从而把包从ofo中移走, 并删除历史sack中因为rcv_next更新而过期的部分
  • 收到的包序号seq > rcv_next, 说明是乱序包,要存到ofo队列中,并设置sack
static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
{
    ...
    if (TCP_SKB_CB(skb)->seq == tp->rcv_nxt) {    //非乱序包
        if (tcp_receive_window(tp) == 0)    //接受窗口满了,不能接受
            goto out_of_window;
        /* Ok. In sequence. In window. */
        if (tp->ucopy.task == current &&    //应用程序上下文中,在tcp_recvmsg
            tp->copied_seq == tp->rcv_nxt && tp->ucopy.len &&    //正要读取这个包,并且应用程序还有剩余缓存
            sock_owned_by_user(sk) && !tp->urg_data) {    //应用程序持有锁
            // copy数据到ucopy中
            ...
        }
        if (eaten <= 0) {    //没有copy到ucopy,或者没有全部copy
queue_and_out:
            ...
            eaten = tcp_queue_rcv(sk, skb, 0, &fragstolen);    //添加到sk_receive_queue
        }
        tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);    //尝试更新rcv_nxt
        if (skb->len)
            tcp_event_data_recv(sk, skb);    
        if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
            tcp_fin(sk);    //fin处理
        if (!RB_EMPTY_ROOT(&tp->out_of_order_queue)) {    //ofo队列非空
            tcp_ofo_queue(sk);    //尝试把ofo中的包移动到sk_receive_queue中
            /* RFC2581. 4.2. SHOULD send immediate ACK, when
             * gap in queue is filled.
             */
            if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
                inet_csk(sk)->icsk_ack.pingpong = 0;    //ofo队列被清空了,需要立即发送ack
        }
        if (tp->rx_opt.num_sacks)    //历史sack数量
            tcp_sack_remove(tp);    //如果当前收到的包带sack,则删除其中不需要的部分,有可能ofo填充造成了rcv_next的移动
        tcp_fast_path_check(sk);    //当前是slow path, 尝试开启快速路径
        if (eaten > 0)
            kfree_skb_partial(skb, fragstolen);    //已经全部copy到ucopy,可以释放skb了
        if (!sock_flag(sk, SOCK_DEAD))
            sk->sk_data_ready(sk);    //通知epoll数据到达
        return;
    }
    //判断是否是重传的旧包,标记dsack
    if (!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt)) {
        /* A retransmit, 2nd most common case.  Force an immediate ack. */
        NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
        tcp_dsack_set(sk, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq);    //设置dsack
out_of_window:
        tcp_enter_quickack_mode(sk);    //进入快速ack
        inet_csk_schedule_ack(sk);    //标记需要ack
drop:
        tcp_drop(sk, skb);    //释放包
        return;
    }
    //收到的包在接收窗口范围之外
    /* Out of window. F.e. zero window probe. */
    if (!before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt + tcp_receive_window(tp)))
        goto out_of_window;    //进入快速ack,通告对方新的接收窗口大小
    //乱序包或者需要dsack都要快速ack
    tcp_enter_quickack_mode(sk);    //快速ack
    if (before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {    //seq < rcv_next < end_seq,部分旧数据
        /* Partial packet, seq < rcv_next < end_seq */
        SOCK_DEBUG(sk, "partial packet: rcv_next %X seq %X - %X\n",
               tp->rcv_nxt, TCP_SKB_CB(skb)->seq,
               TCP_SKB_CB(skb)->end_seq);
        tcp_dsack_set(sk, TCP_SKB_CB(skb)->seq, tp->rcv_nxt);    //设置dsack
        /* If window is closed, drop tail of packet. But after
         * remembering D-SACK for its head made in previous line.
         */
        if (!tcp_receive_window(tp))    //接收窗口不足,快速ack通知对方
            goto out_of_window;
        goto queue_and_out; //调用tcp_queue_rcv添加到sk_receive_queue,会尝试合并
    }
    tcp_data_queue_ofo(sk, skb);    //乱续包,添加到ofo队列
}
static void tcp_dsack_set(struct sock *sk, u32 seq, u32 end_seq)
{
    struct tcp_sock *tp = tcp_sk(sk);
    if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
        int mib_idx;
        if (before(seq, tp->rcv_nxt))
            mib_idx = LINUX_MIB_TCPDSACKOLDSENT;
        else
            mib_idx = LINUX_MIB_TCPDSACKOFOSENT;
        NET_INC_STATS(sock_net(sk), mib_idx);
        tp->rx_opt.dsack = 1;
        tp->duplicate_sack[0].start_seq = seq;
        tp->duplicate_sack[0].end_seq = end_seq;
    }
}

tcp_sack_remove

在tcp_data_queue中,如果收到的包填补了之前的hole,就会更新rcv_next, 如果有历史的sack信息,则会删除失效的部分

    /* Empty ofo queue, hence, all the SACKs are eaten. Clear. */
    if (RB_EMPTY_ROOT(&tp->out_of_order_queue)) {
        tp->rx_opt.num_sacks = 0; //已经没有ofo了,所有sack都被确认
        return;
    }
    for (this_sack = 0; this_sack < num_sacks;) {
        /* Check if the start of the sack is covered by RCV.NXT. */
        if (!before(tp->rcv_nxt, sp->start_seq)) {    //rcv_nxt >= start_seq, 说明当前sack的部分,已经被ofo合并了,可以删除这个sack
            int i;
            /* RCV.NXT must cover all the block! */
            WARN_ON(before(tp->rcv_nxt, sp->end_seq));
            /* Zap this SACK, by moving forward any other SACKS. */
            for (i = this_sack+1; i < num_sacks; i++)
                tp->selective_acks[i-1] = tp->selective_acks[i];    //后面的sack前移,删除当前的sack
            num_sacks--;
            continue;
        }
        this_sack++;    //rcv_nxt < start_seq; 说明还是ofo,不处理
        sp++;
    }
    tp->rx_opt.num_sacks = num_sacks;
}

 

tcp_data_queue_ofo

tcp_data_queue_ofo主要做的就是遍历ofo rbtree, 如果是重复的包,则设置重传的部分为dsack,非重传部分插入ofo rbtree中, 并更新sack。
tp->selective_acks存放里历史的sack记录,如果收到的包填补了hole,则会在tcp_sack_remove中删除无效的部分。
在tcp_data_queue_ofo中,如果更新tp->selective_acks, 则当前包的信息会放在tp->selective_acks[0]中,
也就是说tp->selective_acks[]中的顺序是根据收到乱序包的先后顺序排列的,如果超过数组最大长度,则最旧的sack会被丢弃

static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
{
    ...
    p = &tp->out_of_order_queue.rb_node;
    if (RB_EMPTY_ROOT(&tp->out_of_order_queue)) {    //ofo队列中为空,简单插入新的sack
        /* Initial out of order segment, build 1 SACK. */
        if (tcp_is_sack(tp)) {
            tp->rx_opt.num_sacks = 1;
            tp->selective_acks[0].start_seq = seq;
            tp->selective_acks[0].end_seq = end_seq;
        }
        rb_link_node(&skb->rbnode, NULL, p);
        rb_insert_color(&skb->rbnode, &tp->out_of_order_queue);
        tp->ooo_last_skb = skb;
        goto end;
    }
    /* In the typical case, we are adding an skb to the end of the list.
     * Use of ooo_last_skb avoids the O(Log(N)) rbtree lookup.
     */
    if (tcp_try_coalesce(sk, tp->ooo_last_skb, skb, &fragstolen)) {    //对于普遍场景,先尝试合并skb到上一个乱序包
coalesce_done:    //合并完成
        tcp_grow_window(sk, skb);    //尝试增加窗口通告
        kfree_skb_partial(skb, fragstolen);    //skb已经被合并,可以释放
        skb = NULL;
        goto add_sack;
    }
    /* Can avoid an rbtree lookup if we are adding skb after ooo_last_skb */
    if (!before(seq, TCP_SKB_CB(tp->ooo_last_skb)->end_seq)) {    //如果序号比ooo_last_skb大,则可以直接添加,避免查找
        parent = &tp->ooo_last_skb->rbnode;
        p = &parent->rb_right;    //添加到ooo_last_skb的右子树
        goto insert;
    }
    //需要查找这个ofo包的添加位置
    /* Find place to insert this segment. Handle overlaps on the way. */
    parent = NULL;
    while (*p) {
        parent = *p;
        skb1 = rb_entry(parent, struct sk_buff, rbnode);
        if (before(seq, TCP_SKB_CB(skb1)->seq)) {    //比当前节点小,添加到左子树
            p = &parent->rb_left;
            continue;
        }
        // skb1->seq <= seq
        if (before(seq, TCP_SKB_CB(skb1)->end_seq)) { //skb1->seq <= seq < skb1->end_seq
            if (!after(end_seq, TCP_SKB_CB(skb1)->end_seq)) {    //序号所有部分都已经在当前节点
                /* All the bits are present. Drop. */
                NET_INC_STATS(sock_net(sk),
                          LINUX_MIB_TCPOFOMERGE);
                __kfree_skb(skb);    //重复包,直接释放
                skb = NULL;
                tcp_dsack_set(sk, seq, end_seq);    //添加到dsack
                goto add_sack;
            }
            // skb1->seq <= seq < skb1->end_seq  &&  skb1->end_seq < end_seq
            if (after(seq, TCP_SKB_CB(skb1)->seq)) {    //有部分重叠
                /* Partial overlap. */
                tcp_dsack_set(sk, seq, TCP_SKB_CB(skb1)->end_seq);    //设置重叠部分dsack
            } else {  //skb1->seq = seq <= skb1->end_seq < end_seq
                /* skb's seq == skb1's seq and skb covers skb1.
                 * Replace skb1 with skb.
                 */    //skb中包含了全部的skb1
                rb_replace_node(&skb1->rbnode, &skb->rbnode,    //使用skb替换skb1
                        &tp->out_of_order_queue);
                tcp_dsack_extend(sk,    //设置或合并现有dsack设置
                         TCP_SKB_CB(skb1)->seq,    //因为skb包含了全部skb1部分,则整个skb1都被重传了
                         TCP_SKB_CB(skb1)->end_seq);
                NET_INC_STATS(sock_net(sk),
                          LINUX_MIB_TCPOFOMERGE);
                __kfree_skb(skb1);    //释放skb1
                goto merge_right;    //还要继续查看skb1的右子数有没有需要合并的部分
            }
        } else if (tcp_try_coalesce(sk, skb1, skb, &fragstolen)) { // skb1->seq < skb1->end_seq <= seq
            goto coalesce_done;    //尝试合并
        }
        p = &parent->rb_right;    //比当前加点大,查找右子树
    }
insert:
    /* Insert segment into RB tree. */
    rb_link_node(&skb->rbnode, parent, p);        //找到合适位置后插入ofo队列
    rb_insert_color(&skb->rbnode, &tp->out_of_order_queue);
merge_right:
    /* Remove other segments covered by skb. */
    while ((q = rb_next(&skb->rbnode)) != NULL) {    //查看右子树中有没需要合并的节点
        skb1 = rb_entry(q, struct sk_buff, rbnode);
        if (!after(end_seq, TCP_SKB_CB(skb1)->seq))    //没有交集,不需要合并
            break;
        if (before(end_seq, TCP_SKB_CB(skb1)->end_seq)) {    //有交集
            tcp_dsack_extend(sk, TCP_SKB_CB(skb1)->seq,    //更新dsack
                     end_seq);
            break;
        }
        //完全包含当前节点,删除该节点,并更新dsack
        rb_erase(&skb1->rbnode, &tp->out_of_order_queue);
        tcp_dsack_extend(sk, TCP_SKB_CB(skb1)->seq,
                 TCP_SKB_CB(skb1)->end_seq);
        NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFOMERGE);
        tcp_drop(sk, skb1);    //可以删除skb1
    }
    /* If there is no skb after us, we are the last_skb ! */
    if (!q)
        tp->ooo_last_skb = skb;    //没有下一个skb了,更新ooo_last_skb
add_sack:
    if (tcp_is_sack(tp))
        tcp_sack_new_ofo_skb(sk, seq, end_seq);    //添加新的sack
end:
    if (skb) {    //没有被合并
    //https://github.com/torvalds/linux/commit/4e4f1fc226816905c937f9b29dabe351075dfe0f
    //跟in-order包一样,调整窗口
        tcp_grow_window(sk, skb);
        skb_set_owner_r(skb, sk);
    }
}
static void tcp_sack_new_ofo_skb(struct sock *sk, u32 seq, u32 end_seq)
{
    struct tcp_sock *tp = tcp_sk(sk);
    struct tcp_sack_block *sp = &tp->selective_acks[0];
    int cur_sacks = tp->rx_opt.num_sacks;
    int this_sack;
    if (!cur_sacks)    //还没有sack, 直接更新
        goto new_sack;
    for (this_sack = 0; this_sack < cur_sacks; this_sack++, sp++) {
        if (tcp_sack_extend(sp, seq, end_seq)) {    //尝试合并
            /* Rotate this_sack to the first one. */    //合并成功
            for (; this_sack > 0; this_sack--, sp--)    //把合并成功的sack移动selective_acks[0]
                swap(*sp, *(sp - 1));
            if (cur_sacks > 1)
                tcp_sack_maybe_coalesce(tp);    //尝试把后面的sack合并到selective_acks[0]
            return;
        }
    }
    //没有找到能合并的sack
    /* Could not find an adjacent existing SACK, build a new one,
     * put it at the front, and shift everyone else down.  We
     * always know there is at least one SACK present already here.
     *
     * If the sack array is full, forget about the last one.
     */
    if (this_sack >= TCP_NUM_SACKS) {    //已经到最大值了
        this_sack--;
        tp->rx_opt.num_sacks--;    //直接释放最后一个位置
        sp--;
    }
    for (; this_sack > 0; this_sack--, sp--)
        *sp = *(sp - 1);    //把前面的后移
    //新的sack放到最前面的位置
new_sack:
    /* Build the new head SACK, and we're done. */
    sp->start_seq = seq;
    sp->end_seq = end_seq;
    tp->rx_opt.num_sacks++;
}

目前properly increase rcv_ssthresh for ofo packets

TCP receive window handling is multi staged.

A socket has a memory budget, static or dynamic, in sk_rcvbuf.

Because we do not really know how this memory budget translates to
a TCP window (payload), TCP announces a small initial window
(about 20 MSS).

When a packet is received, we increase TCP rcv_win depending
on the payload/truesize ratio of this packet. Good citizen
packets give a hint that it's reasonable to have rcv_win = sk_rcvbuf/2

This heuristic takes place in tcp_grow_window()

Problem is : We currently call tcp_grow_window() only for in-order
packets.

This means that reorders or packet losses stop proper grow of
rcv_win, and senders are unable to benefit from fast recovery,
or proper reordering level detection.

Really, a packet being stored in OFO queue is not a bad citizen.
It should be part of the game as in-order packets.

也就是:之前只有in-order packet才会grow_window吗,但是这样有个问题,在packet loss以及ofo的时候,没有足够的window,会导致不能快速重传

但是最新结果表明: a sender needs to receive three duplicate ACK before starting fast retransmit. Sending ACK with different receive window do not count. Even if enabling SACK is strongly recommended these days, there still are some cases where it has to be disabled. Not increasing the window seems better than having to rely on RTO.

也就是触发快速重传需要连续3个dupack,但是如果是如果收到dupack的时候window不一样,就不会认为此ack为dupack。所以不会触发快速重传。

所以最新逻辑为:

:
	if (skb) {
		/* For non sack flows, do not grow window to force DUPACK
		 * and trigger fast retransmit.
		 */
		if (tcp_is_sack(tp))
			tcp_grow_window(sk, skb);
		skb_condense(skb);
		skb_set_owner_r(skb, sk);
	}

tcp_send_ack

收到乱序包或者重传包都会快速ack,
调用tcp_send_ack->tcp_transmit_skb->tcp_options_write
在tcp_options_write中更新tcp中dsack和sack选项。如果有dsack,则dsack一定会是第一个sack选项。

struct tcp_sock {
    ...
    /* SACKs data, these 2 need to be together (see tcp_options_write) */
    struct tcp_sack_block duplicate_sack[1]; /* D-SACK block */    //只有一个,且在sack前面
    struct tcp_sack_block selective_acks[4]; /* The SACKS themselves*/
    ...
};
static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
                  struct tcp_out_options *opts)
{
    ...
    if (unlikely(opts->num_sack_blocks)) {
        struct tcp_sack_block *sp = tp->rx_opt.dsack ?
            tp->duplicate_sack : tp->selective_acks;
        int this_sack;
        *ptr++ = htonl((TCPOPT_NOP  << 24) |
                   (TCPOPT_NOP  << 16) |
                   (TCPOPT_SACK <<  8) |
                   (TCPOLEN_SACK_BASE + (opts->num_sack_blocks *
                             TCPOLEN_SACK_PERBLOCK)));
        for (this_sack = 0; this_sack < opts->num_sack_blocks;
             ++this_sack) {
            *ptr++ = htonl(sp[this_sack].start_seq);
            *ptr++ = htonl(sp[this_sack].end_seq);
        }
        tp->rx_opt.dsack = 0;
    }
    ...
}

sack Reneging

接收方收到数据包后需要查看接收缓存是否足够,如果不够,可能会删除ofo中的数据,同时也要重置保存的sack的信息。
这时候发送方收到的sack信息就会失效。
Renege的意思就是违约, 因此sack只是一种hint,接收方不会保证sack的数据不会释放

static int tcp_try_rmem_schedule(struct sock *sk, struct sk_buff *skb,
                 unsigned int size)
{
    if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf || //接收缓存不够
        !sk_rmem_schedule(sk, skb, size)) {    //并且超过系统设置的最大分配空间了
        if (tcp_prune_queue(sk) < 0)    //尝试合并ofo/sk_receive_queue来腾出空间
            return -1;    //还是不够,超过sk_rcvbuf, 返回失败
        while (!sk_rmem_schedule(sk, skb, size)) {    //再次确认
            if (!tcp_prune_ofo_queue(sk)) //释放ofo队列中的数据
                return -1;
        }
    }
    return 0;
}
static bool tcp_prune_ofo_queue(struct sock *sk)
{
    struct tcp_sock *tp = tcp_sk(sk);
    struct rb_node *node, *prev;
    if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
        return false;
    NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
    node = &tp->ooo_last_skb->rbnode;
    do {
        prev = rb_prev(node);
        rb_erase(node, &tp->out_of_order_queue);
        tcp_drop(sk, rb_entry(node, struct sk_buff, rbnode));
        sk_mem_reclaim(sk);
        if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
            !tcp_under_memory_pressure(sk))
            break;
        node = prev;
    } while (node);
    tp->ooo_last_skb = rb_entry(prev, struct sk_buff, rbnode);
    /* Reset SACK state.  A conforming SACK implementation will
     * do the same at a timeout based retransmit.  When a connection
     * is in a sad state like this, we care only about integrity
     * of the connection not performance.
     */
    if (tp->rx_opt.sack_ok)
        tcp_sack_reset(&tp->rx_opt);    //ofo中的数据由于接收缓存不足被删除,sack不能反映ofo中信息了,需要重置清空
    return true;
}

 

 

http://www.cnhalo.net/2016/10/13/linux-tcp-dsack-sack/

SACK & D-SACK Interaction
No difference between SACK & D-SACK, except that first SACK block is used to report a duplicate PDU
in D-SACK.
No separate negotiation / options for D-SACK.
D-SACK is compatible with current implementations of SACK option in TCP.

D-SACK TCP & Retransmissions
D-SACK allows TCP to determine when retransmission is not necessary and thereby undo congestion
control measures.
D-SACK allows TCP to determine if the network is duplicating packets.
D-SACK does not allow a sender to determine if both the original and retransmitted PDU are received,
or the original is lost and the retransmitted PDU is duplicated by the network.

Extension to SACK
When D-SACK is used, the first block of the SACK option should specify the duplicate PDU.
D-SACK block is used to report duplicate contiguous sequence of data received by the receiver in the
most recent packet.
Each duplicate is reported at most once.
Allows the sender TCP to determine when a retransmission is not necessary.

D-SACK Rules
If D-SACK block reports duplicate PDU from (possibly larger) block of data in the receiver buffer above
the cumulative acknowledgement, the second SACK block (the first non D-SACK block) should specify
this block.
The first SACK block is considered as D-SACK block, if multiple sequences are duplicated, only the
first is contained in the D-SACK block.

Motivations
This could be used to make TCP more robust to reordered packets, ACK loss, packet replication,
and/or early retransmit timeouts.
As a result, TCP could be more robust in environments with:
link-level retransmissions
widely-varying round trip times
loss on the return (ACK) path
routing mechanisms that result in packet reordering

D-SACK处理流程
1)look at the first SACK block :
—If the first SACK block is covered by the Cumulative Acknowledgement field, then it is a D-SACK
block, and is reporting duplicate data.
—Else, if the first SACK block is covered by the second SACK block, then the first SACK block is a
D-SACK block, and is reporting duplicate data.

2)otherwise, interpret the SACK blocks using the normal SACK procedures.

D-SACK作用举例
1)Identify a retransmit timeout due to ACK loss

2)Identify an early retransmission timeout

3)Identify a false retransmit due to reordering

4)Identify data packet replication in the network

posted @ 2019-11-23 22:07  codestacklinuxer  阅读(637)  评论(0编辑  收藏  举报