free batches of packets in tcp_prune_ofo_queue()
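Some time ago, while running concurrency stress tests on a WAF, I ran into a problem: tests against the test instrument passed, but in a real environment, when validating timeout and packet-loss scenarios, concurrent CPS (connections per second) was very low.
Running cat /proc/net/netstat showed a very large OfoPruned value. Reading the kernel code revealed why: when memory is short, or rmem exceeds sk_rcvbuf, the kernel frees the ofo (out-of-order) queue, and it freed the entire queue. At the time I changed "free everything" to "free the highest 50%", and the improvement was obvious.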
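To make the OfoPruned check repeatable, the counter can be read programmatically. The sketch below relies on the fact that /proc/net/netstat stores each group as two lines, a line of counter names followed by a line of values, both starting with the same prefix such as "TcpExt:". The helper names netstat_lookup and read_ofo_pruned are hypothetical, chosen only for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: look up `key` in a name-line/value-line pair taken
 * from /proc/net/netstat. Tokens are matched by position.
 * Returns 0 and stores the counter in *out, or -1 if the key is absent. */
int netstat_lookup(const char *names, const char *values,
                   const char *key, unsigned long long *out)
{
    size_t klen = strlen(key);

    while (*names && *values) {
        size_t nlen, vlen;

        /* Skip separators, then measure the next token on each line. */
        names += strspn(names, " \n");
        values += strspn(values, " \n");
        nlen = strcspn(names, " \n");
        vlen = strcspn(values, " \n");
        if (nlen == 0 || vlen == 0)
            break;
        if (nlen == klen && strncmp(names, key, klen) == 0) {
            *out = strtoull(values, NULL, 10);
            return 0;
        }
        names += nlen;
        values += vlen;
    }
    return -1;
}

/* Hypothetical helper: read OfoPruned from a netstat-style file.
 * `path` would normally be "/proc/net/netstat". */
int read_ofo_pruned(const char *path, unsigned long long *out)
{
    static char names[8192], values[8192];
    FILE *fp = fopen(path, "r");
    int ret = -1;

    if (!fp)
        return -1;
    /* Lines come in pairs; only the TcpExt pair holds OfoPruned. */
    while (fgets(names, sizeof(names), fp) &&
           fgets(values, sizeof(values), fp)) {
        if (strncmp(names, "TcpExt:", 7) == 0) {
            ret = netstat_lookup(names, values, "OfoPruned", out);
            break;
        }
    }
    fclose(fp);
    return ret;
}
```

Calling read_ofo_pruned("/proc/net/netstat", &value) before and after a test run and comparing the two values shows how many prune events the run triggered.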
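Today I looked at a newer kernel and found that upstream has changed this behavior as well.
While a TCP socket is receiving data, if the socket's receive memory already exceeds the socket's buffer limit, or the memory used by TCP system-wide exceeds the global threshold, the kernel calls tcp_prune_queue to try to reclaim memory held by the receive queues. It first calls tcp_collapse_ofo_queue to merge duplicate data in out_of_order_queue, then calls tcp_collapse to fold the data in sk_receive_queue into a small number of skb structures. Finally, if receive memory usage is still too high, it calls tcp_prune_ofo_queue to drop packets from out_of_order_queue.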
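The escalation described above can be sketched as a small user-space model. This is not kernel code: struct toy_sock, its fields, and toy_prune_queue are hypothetical names, the two collapse steps are reduced to a single "bytes recoverable" number, and the final step only mimics the batch size of sk_rcvbuf >> 3 without modeling sequence numbers.

```c
/* Toy model of the order of operations in tcp_prune_queue().
 * All names and fields here are hypothetical simplifications. */
struct toy_sock {
    int rmem_alloc;   /* bytes currently charged to the socket */
    int rcvbuf;       /* receive buffer limit (stands in for sk_rcvbuf) */
    int collapsible;  /* bytes the two collapse steps could recover */
    int ofo_bytes;    /* bytes held in the out-of-order queue */
};

enum prune_step { STEP_NONE, STEP_COLLAPSE, STEP_PRUNE_OFO, STEP_FAILED };

/* Returns the last step that was needed to get back under the limit,
 * or STEP_FAILED if the socket is still over the limit afterwards. */
enum prune_step toy_prune_queue(struct toy_sock *sk)
{
    int batch;

    if (sk->rmem_alloc <= sk->rcvbuf)
        return STEP_NONE;

    /* Steps 1 and 2: collapse the ofo queue and the receive queue. */
    sk->rmem_alloc -= sk->collapsible;
    sk->collapsible = 0;
    if (sk->rmem_alloc <= sk->rcvbuf)
        return STEP_COLLAPSE;

    /* Step 3: drop ofo data in batches of rcvbuf / 8 until under the limit. */
    batch = sk->rcvbuf >> 3;
    if (batch < 1)
        batch = 1;
    while (sk->rmem_alloc > sk->rcvbuf && sk->ofo_bytes > 0) {
        int drop = batch < sk->ofo_bytes ? batch : sk->ofo_bytes;

        sk->ofo_bytes -= drop;
        sk->rmem_alloc -= drop;
    }
    return sk->rmem_alloc <= sk->rcvbuf ? STEP_PRUNE_OFO : STEP_FAILED;
}
```

The point of the model is only the ordering: dropping out-of-order data is the last resort, reached only when collapsing did not free enough memory.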
The principles behind the latest change are currently:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/net/ipv4/tcp_input.c?id=36a6503feddadbbad415fb3891e80f94c10a9b21
/*
* Clean the out-of-order queue to make room.
* We drop high sequences packets to :
* 1) Let a chance for holes to be filled.
* This means we do not drop packets from ooo queue if their sequence
* is before incoming packet sequence.
* 2) not add too big latencies if thousands of packets sit there.
* (But if application shrinks SO_RCVBUF, we could still end up
* freeing whole queue here)
* 3) Drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks.
*
* Return true if queue has shrunk.
*/
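1. Give holes a chance to be filled. This means packets are not dropped from the ooo queue if their sequence is before the incoming packet's sequence.
2. Avoid adding too much latency when thousands of packets sit in the queue. (But if the application shrinks SO_RCVBUF, the whole queue may still end up being freed here.)
3. Drop at least 12.5% of sk_rcvbuf to defend against malicious attacks.
The latest kernel code for this part is as follows: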
/*
* Clean the out-of-order queue to make room.
* We drop high sequences packets to :
* 1) Let a chance for holes to be filled.
* This means we do not drop packets from ooo queue if their sequence
* is before incoming packet sequence.
* 2) not add too big latencies if thousands of packets sit there.
* (But if application shrinks SO_RCVBUF, we could still end up
* freeing whole queue here)
* 3) Drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks.
*
* Return true if queue has shrunk.
*/
static bool tcp_prune_ofo_queue(struct sock *sk, const struct sk_buff *in_skb)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct rb_node *node, *prev;
	bool pruned = false;
	int goal;

	if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
		return false;

	goal = sk->sk_rcvbuf >> 3;
	node = &tp->ooo_last_skb->rbnode;

	do {
		struct sk_buff *skb = rb_to_skb(node);

		/* If incoming skb would land last in ofo queue, stop pruning. */
		if (after(TCP_SKB_CB(in_skb)->seq, TCP_SKB_CB(skb)->seq))
			break;
		pruned = true;
		prev = rb_prev(node);
		rb_erase(node, &tp->out_of_order_queue);
		goal -= skb->truesize;
		tcp_drop_reason(sk, skb, SKB_DROP_REASON_TCP_OFO_QUEUE_PRUNE);
		tp->ooo_last_skb = rb_to_skb(prev);
		if (!prev || goal <= 0) {
			if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
			    !tcp_under_memory_pressure(sk))
				break;
			goal = sk->sk_rcvbuf >> 3;
		}
		node = prev;
	} while (node);

	if (pruned) {
		NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
		/* Reset SACK state. A conforming SACK implementation will
		 * do the same at a timeout based retransmit. When a connection
		 * is in a sad state like this, we care only about integrity
		 * of the connection not performance.
		 */
		if (tp->rx_opt.sack_ok)
			tcp_sack_reset(&tp->rx_opt);
	}
	return pruned;
}
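Points to note:
1. /* If incoming skb would land last in ofo queue, stop pruning. */ Pruning walks backwards from the tail of the ofo queue and stops at the first skb whose sequence is before the incoming packet's sequence.
2. The SACK state is reset, and the OfoPruned counter incremented, only if the out-of-order queue was actually pruned.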
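The two points above can be tried out in user space. The sketch below is an illustration under stated assumptions, not the kernel implementation: a sorted array stands in for the rb-tree, struct seg, struct toy_ofo, seq_after and toy_prune_ofo are hypothetical names, and the tcp_under_memory_pressure() check and the SACK reset are left out. seq_after follows the same signed 32-bit difference idea as the kernel's after() macro, so it stays correct when sequence numbers wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* One queued out-of-order segment (hypothetical simplification of an skb). */
struct seg {
    uint32_t seq;   /* starting sequence number */
    int truesize;   /* memory charged for this segment */
};

/* Toy ofo queue: segs[] is sorted by seq, ascending. */
struct toy_ofo {
    struct seg *segs;
    int count;
    int rmem_alloc;  /* stands in for sk_rmem_alloc */
    int rcvbuf;      /* stands in for sk_rcvbuf */
};

/* True if sequence a is after b, safe across 32-bit wraparound. */
static bool seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Drop segments from the tail in batches of rcvbuf / 8, but never drop a
 * segment that starts before the incoming one. Returns true if anything
 * was dropped. */
bool toy_prune_ofo(struct toy_ofo *q, uint32_t in_seq)
{
    bool pruned = false;
    int goal = q->rcvbuf >> 3;

    while (q->count > 0) {
        struct seg *last = &q->segs[q->count - 1];

        /* Incoming segment would land last in the queue: stop pruning. */
        if (seq_after(in_seq, last->seq))
            break;
        pruned = true;
        goal -= last->truesize;
        q->rmem_alloc -= last->truesize;
        q->count--;
        if (q->count == 0 || goal <= 0) {
            /* A batch is done: stop once we are back under the limit. */
            if (q->rmem_alloc <= q->rcvbuf)
                break;
            goal = q->rcvbuf >> 3;
        }
    }
    return pruned;
}
```

With segments at sequences 1000, 2000, 3000 and 4000 and an incoming sequence of 2500, only the segments at 3000 and 4000 are candidates for dropping; the ones at 1000 and 2000 survive no matter how far over the limit the socket is, which is what lets the hole in front of them be filled.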
After commits 36a6503fedda ("tcp: refine tcp_prune_ofo_queue()
to not drop all packets") and 72cd43ba64fc1
("tcp: free batches of packets in tcp_prune_ofo_queue()")
tcp_prune_ofo_queue() drops a fraction of ooo queue,
to make room for incoming packet.
However it makes no sense to drop packets that are
before the incoming packet, in sequence space.
In order to recover from packet losses faster,
it makes more sense to only drop ooo packets
which are after the incoming packet.
* We drop high sequences packets to :
* 1) Let a chance for holes to be filled.
+ * This means we do not drop packets from ooo queue if their sequence
+ * is before incoming packet sequence.
* 2) not add too big latencies if thousands of packets sit there.
* (But if application shrinks SO_RCVBUF, we could still end up
* freeing whole queue here)
@@ -5336,24 +5338,31 @@ new_range:
*
* Return true if queue has shrunk.
*/
-static bool tcp_prune_ofo_queue(struct sock *sk)
+static bool tcp_prune_ofo_queue(struct sock *sk, const struct sk_buff *in_skb)
{
struct tcp_sock *tp = tcp_sk(sk);
struct rb_node *node, *prev;
+ bool pruned = false;
int goal;
if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
return false;
- NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
goal = sk->sk_rcvbuf >> 3;
node = &tp->ooo_last_skb->rbnode;
+
do {
+ struct sk_buff *skb = rb_to_skb(node);
+
+ /* If incoming skb would land last in ofo queue, stop pruning. */
+ if (after(TCP_SKB_CB(in_skb)->seq, TCP_SKB_CB(skb)->seq))
+ break;
+ pruned = true;
prev = rb_prev(node);
rb_erase(node, &tp->out_of_order_queue);
- goal -= rb_to_skb(node)->truesize;
- tcp_drop_reason(sk, rb_to_skb(node),
- SKB_DROP_REASON_TCP_OFO_QUEUE_PRUNE);
+ goal -= skb->truesize;
+ tcp_drop_reason(sk, skb, SKB_DROP_REASON_TCP_OFO_QUEUE_PRUNE);
+ tp->ooo_last_skb = rb_to_skb(prev);
if (!prev || goal <= 0) {
if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
!tcp_under_memory_pressure(sk))
@@ -5362,16 +5371,18 @@ static bool tcp_prune_ofo_queue(struct sock *sk)
}
node = prev;
} while (node);
- tp->ooo_last_skb = rb_to_skb(prev);
- /* Reset SACK state. A conforming SACK implementation will
- * do the same at a timeout based retransmit. When a connection
- * is in a sad state like this, we care only about integrity
- * of the connection not performance.
- */
- if (tp->rx_opt.sack_ok)
- tcp_sack_reset(&tp->rx_opt);
- return true;
+ if (pruned) {
+ NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
+ /* Reset SACK state. A conforming SACK implementation will
+ * do the same at a timeout based retransmit. When a connection
+ * is in a sad state like this, we care only about integrity
+ * of the connection not performance.
+ */
+ if (tp->rx_opt.sack_ok)
+ tcp_sack_reset(&tp->rx_opt);
+ }
+ return pruned;
}
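Version 1: do not drop all packets.
Version 2: free the queue in batches, each batch worth 12.5% of sk_rcvbuf.
Version 3: dropping packets that are before the incoming packet in sequence space makes no sense. To recover from packet loss faster, it makes more sense to drop only the ofo packets that come after the incoming packet.
Back when I was fixing this problem myself, my kernel patch simply dropped 50% of the queue by default and left it at that.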