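Troubleshooting the "Detected Tx Unit Hang" problem
Goal:
Make skb->data point at memory I have already allocated myself, instead of at the data buffer that alloc_skb() would normally allocate.
Part of the code is shown below: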
```c
/*
 * build a new sk_buff
 */
//struct sk_buff *send_skb = kmem_cache_alloc_node(skbuff_head_cache, GFP_ATOMIC & ~__GFP_DMA, NUMA_NO_NODE);
/* allocate only the sk_buff header; the data buffer is supplied separately */
struct sk_buff *send_skb = kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC & ~__GFP_DMA);

if (!send_skb) {
    //spin_unlock(&lock);
    return NF_DROP;
}

//printk("what2\n");
/* clear every field that precedes 'tail' */
memset(send_skb, 0, offsetof(struct sk_buff, tail));
atomic_set(&send_skb->users, 2);
send_skb->cloned = 0;

/* point the skb at the pre-allocated buffer, 1024 bytes in */
send_skb->head = mmap_buf + 1024;
send_skb->data = mmap_buf + 1024;
```
In the final assignment to send_skb->data, mmap_buf is memory that was allocated in advance.
With this code in place, the NIC driver writes the following errors to /var/log/messages:
```
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <13>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <0>, <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <0>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <15>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <1>, <1eb>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1eb>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <1>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <14>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <0>, <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <0>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <4>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <0>, <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ea>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <0>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <12>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <5>, <1ef>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ef>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <5>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <2>
Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <2>, <1ec>
Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ec>
Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <2>
Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
```
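To read the log: TDH and TDT are the NIC's transmit descriptor head and tail registers, and next_to_use / next_to_clean are the driver's own ring indices, all printed in hexadecimal. The gap between tail and head is the number of descriptors the driver has queued but the hardware has not yet consumed. The helper below is a minimal sketch of that arithmetic, not driver code; the name tx_pending is hypothetical, and the ring size of 512 used in the usage note is an assumption because the log does not state it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of descriptors queued by the driver (up to the tail, TDT) that the
 * hardware has not yet consumed (from the head, TDH), on a circular ring.
 * ring_size is an assumption supplied by the caller; it is not in the log.
 */
static uint32_t tx_pending(uint32_t tdh, uint32_t tdt, uint32_t ring_size)
{
    assert(ring_size > 0 && tdh < ring_size && tdt < ring_size);
    /* adding ring_size before subtracting handles tail wrap-around */
    return (tdt + ring_size - tdh) % ring_size;
}
```

With an assumed ring of 512 entries, tx_pending(0x0, 0x1ea, 512) and tx_pending(0x5, 0x1ef, 512) both give 490. Every queue in the log shows the same gap of 490 descriptors, with next_to_clean equal to TDH, which matches a transmit unit that has stopped making progress.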
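After ruling out the other possible causes, I traced the problem to how mmap_buf was allocated. Allocating it with vmalloc() does not work; after switching to kmalloc(), transmission is normal.
Section 12.5 of 《Linux内核设计与实现》 (Linux Kernel Development) explains the difference. The likely reason is that the NIC requires the buffer it is given to be physically contiguous, whereas vmalloc() only guarantees that the virtual addresses are contiguous.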
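The difference between "virtually contiguous" and "physically contiguous" can be illustrated with a small userspace model. This is a toy sketch, not kernel code: the toy_mapping structure, the helper names, and the frame numbers are all made up for illustration, and a 4096-byte page size is assumed. Each mapping lists which physical frame backs each virtual page; a device that is handed one physical start address and a length can only work if the frames behind the buffer are consecutive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size */

/* Toy page table: frame[i] is the physical frame backing virtual page i. */
struct toy_mapping {
    const uint64_t *frame;
    size_t npages;
};

/* Translate an offset inside the virtual mapping to a toy physical address. */
static uint64_t toy_virt_to_phys(const struct toy_mapping *m, size_t voff)
{
    assert(voff / TOY_PAGE_SIZE < m->npages);
    return m->frame[voff / TOY_PAGE_SIZE] * TOY_PAGE_SIZE + voff % TOY_PAGE_SIZE;
}

/* True if the virtual range [voff, voff + len) is one contiguous physical range. */
static bool toy_phys_contiguous(const struct toy_mapping *m, size_t voff, size_t len)
{
    if (len == 0)
        return true;
    size_t first = voff / TOY_PAGE_SIZE;
    size_t last = (voff + len - 1) / TOY_PAGE_SIZE;
    if (last >= m->npages)
        return false;
    /* every page the range touches must be followed by the next frame */
    for (size_t p = first; p < last; p++)
        if (m->frame[p + 1] != m->frame[p] + 1)
            return false;
    return true;
}

/* kmalloc()-like buffer: consecutive frames (made-up numbers). */
static const uint64_t kmalloc_like_frames[] = { 100, 101, 102, 103 };
/* vmalloc()-like buffer: same virtual layout, scattered frames (made-up numbers). */
static const uint64_t vmalloc_like_frames[] = { 100, 57, 912, 3 };
```

In this model, an 8000-byte range starting at offset 1024 is physically contiguous in the kmalloc-like mapping but not in the vmalloc-like one, because it crosses from frame 100 into frame 57. A range that stays inside a single page is contiguous in both, so the model only shows the failure for buffers that span pages.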