Linux port rate limiting (tc + iptables)


About iptables

iptables is packet-filtering software; packets are filtered in the following order:

Every packet is matched against the rules in order, and each rule carries an action (target). Once a packet matches one rule, no further rules are evaluated. But if a frequently hit rule sits at the end of the chain, filtering efficiency drops noticeably, so when designing a ruleset put the most commonly matched rules first. Rule order can be tuned with -A (append) and -I (insert), and there is even a RETURN action; the iptables design really is clever.

These are iptables' internal tables and chains. Think of iptables as a large net: each table is a smaller net, and its chains are the mesh lines; any packet passing through will inevitably touch those lines, so packets the rules "dislike" get caught. (鸟哥's diagram illustrates this beautifully.) The key point is that a packet's path splits in two: it either enters the local host or it does not. Packets entering the local host traverse the INPUT chain, while packets not entering it go to FORWARD. So what counts as entering or not entering?

For example, take a server that acts as a router and also hosts a web server, while a separate database server sits on the internal network behind it and reaches the internet through the router. When an external user accesses the web server, the packets enter the local host, because the web server runs on the same machine as the router; when the user accesses the database server, the packets do not enter the local host, because that is a different machine, so they are forwarded. (For a proper explanation, see 鸟哥's tutorial.) Back to the topic: looking at the structure below, we can hook into iptables wherever it makes sense. For instance, rate limiting on the PREROUTING, FORWARD or POSTROUTING chains is perfectly fine, provided the rules do not conflict; each table has its own role. In general, when writing iptables rules, trace them through this diagram to make sure nothing is wrong.

  1. filter: mainly concerns packets entering the Linux local host; this is the default table. Its chains:
     INPUT: packets destined for our local host;
     OUTPUT: packets our local host sends out;
     FORWARD: not really about the local host itself; it "forwards" packets to the machines behind it, and is more closely related to the nat table below.
  2. nat (address translation): short for Network Address Translation. This table translates source or destination IPs and ports; it has little to do with the Linux host itself and mostly concerns the LAN machines behind it. Its chains:
     PREROUTING: rules applied before the routing decision (DNAT/REDIRECT);
     POSTROUTING: rules applied after the routing decision (SNAT/MASQUERADE);
     OUTPUT: rules for locally generated outgoing packets.
  3. mangle: concerns special routing flags on packets. Originally it only had the PREROUTING and OUTPUT chains; INPUT and FORWARD were added in kernel 2.4.18. Since it deals with special flags, simple setups like ours rarely use it.
Table / Explanation

nat: used for Network Address Translation (NAT); packets that undergo NAT have their addresses rewritten according to our rules. Packets belonging to one flow pass through this table only once: if the first packet of a flow is NATed or masqueraded, the rest of the flow is automatically given the same treatment without traversing the table packet by packet. This is the main reason you should not do any filtering in this table; more on that later. The PREROUTING chain changes a packet's destination address right as it reaches the firewall, if needed; OUTPUT changes the destination address of locally generated packets; POSTROUTING changes the source address just before a packet leaves the firewall.

mangle: used to mangle packets, i.e. alter header fields such as TTL, TOS or MARK. Note that MARK does not really modify the packet: it just sets a mark associated with the packet in kernel space, which other rules or programs inside the firewall (such as tc) can use for filtering or advanced routing. The table has five built-in chains: PREROUTING, POSTROUTING, OUTPUT, INPUT and FORWARD. PREROUTING alters packets after they enter the firewall but before the routing decision; POSTROUTING after all routing decisions; OUTPUT before a packet's destination is determined; INPUT after the packet has been routed to the local host but before user-space programs see it; FORWARD after the initial routing decision but before the final destination change. The mangle table cannot do any NAT: it only changes TTL, TOS or MARK, never source or destination addresses; NAT belongs in the nat table.

filter: dedicated to filtering packets, with three built-in chains where you can DROP, LOG, ACCEPT or REJECT packets without any trouble. FORWARD filters packets that are neither locally generated nor locally destined ("local" meaning the firewall itself); INPUT targets exactly the packets destined for the local host; OUTPUT filters all locally generated packets.

  • iptables works mainly at layers 3 and 4, i.e. it handles IP and TCP; it can occasionally work at layer 7, but only thanks to patches.
  • What is a packet: essentially an IP or TCP packet

    A packet is the unit of data transmitted in TCP/IP communication. Some will ask: isn't what travels on a LAN a "frame"? Correct, but TCP/IP works at OSI layer 3 (network) and layer 4 (transport), while frames live at layer 2 (data link). Each layer's content is carried by the layer below it, so on a LAN the "packet" is carried inside the "frame".

For example, a TCP packet's header contains the following information (among other things):

Field / Meaning / iptables keyword
Source IP address: the IP address the packet was sent from. Keyword: src
Destination IP address: the IP address the packet is sent to. Keyword: dst
Source port: the port of the connection on the source system. Keyword: sport
Destination port: the port of the connection on the destination system. Keyword: dport
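As a hedged illustration of how those keywords surface in rules (the addresses and ports below are invented for this sketch, not taken from the article), -s/-d match the src/dst fields and --sport/--dport the port fields:

```shell
# hypothetical examples only; adjust addresses and ports to your network
iptables -A INPUT -p tcp -s 10.0.0.5 --sport 1024:65535 -j ACCEPT   # match source IP and a source port range
iptables -A INPUT -p tcp -d 10.0.0.1 --dport 22 -j ACCEPT           # match destination IP and destination port
```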

 

About tc

TC--Traffic Control

TC is the traffic-control module in Linux. It builds packet queues from queuing disciplines and defines how the packets in those queues are sent, thereby controlling traffic. Keywords: queuing system, packet receive and transmit.

Traffic control is the name given to the sets of queuing systems and mechanisms by which packets are received and transmitted on a router. This includes deciding which (and whether) packets to accept at what rate on the input of an interface and determining which packets to transmit in what order at what rate on the output of an interface.

Chinese translation: http://my.oschina.net/guol/blog/82453?p=1
Original: http://www.tldp.org/HOWTO/Traffic-Control-HOWTO/

Diagram of where tc operates:

Having used it for a while, my understanding of it has deepened somewhat:

  1. tc is the gatekeeper, like a watchdog. That explains the iptables + tc pairing: the two can cooperate because, as the diagram shows, they work at different points, each doing its own job.
  2. tc treats every packet alike and is solely in charge of queuing and dispatching them. The official description calls it a queuing system by which packets are received and transmitted, and "traffic control" is an apt name. For that reason I think its rate limiting works best: whether it is a p2p packet or some encrypted packet, a packet is a packet and gets shaped, so the ever-changing encapsulation and encryption tricks cannot slip through.
  3. tc matches mainly by mark, so when using it take care that mark values do not clash. The mark is the thing described in the iptables documentation:

    6.5.5. MARK target

    Sets the mark value. This value can only be used in the local mangle table; it cannot be used anywhere else, let alone on a router or another machine, because the mark is not part of the packet itself: it is a field the kernel allocates and associates with the packet while it traverses this computer. It can be combined with the local advanced-routing features so that different packets can use different queuing requirements, and so on. If you want this kind of behaviour to survive transmission, use the TOS target instead. For more on advanced routing, see the Linux Advanced Routing and Traffic Control HOWTO.

    The mark exists only inside the kernel, untouched by anything outside it, so mark values are, I think, the thing that needs the most care when configuring tc, especially if you also run something like wifidog that plays with marks.

  4. tc classes form a tree, with a clear distinction between trunk and leaves. The hierarchy itself is easy to understand; what the documentation makes hard is the how: the command syntax is genuinely painful.

  5. It involves a number of rather deep queuing algorithms, but a rough understanding of the flow-control modes is enough; unless you are in some extreme situation there is no need to dwell on them.
  6. For tc, upload and download are distinguished as follows. Upload means the client sends packets to a server. With a dual-NIC router, the NIC that sends those packets onward is the external NIC, so upload speed is limited on the external NIC. Download means the server sends packets to the user; because of the dual-NIC setup, the NIC that sends them to the user is the internal NIC, so download speed is limited on the internal NIC. Since tc shapes the queue of packets an interface transmits, limiting upload really means limiting how fast the external NIC sends out the packets the user submitted, and limiting download really means limiting how fast the internal NIC sends the packets destined for the user.

 

Test procedure overview

  1. First build the tc policies
  2. Then have iptables invoke them: set a mark on packets, and different mark values select different tc policies

Notes

  1. In the test environment eth0 faces the external network and p3p1 the internal network
  2. To meet a particular requirement, tc limits every packet, so iptables must handle packets sent to the internal server separately: access to the external network is rate-limited while internal access is not

Upload

Clear all queuing rules on eth0

tc qdisc del dev eth0 root 2>/dev/null

Define the top-level (root) qdisc and specify the default class number: attach an htb qdisc to interface eth0 and give it handle 1:0, which identifies the child classes beneath it; unclassified traffic is assigned to the default class 123 (the default is merely declared here and need not be used)

tc qdisc add dev eth0 root handle 1:0 htb default 123

Create a trunk class for the qdisc: bandwidth 100 Mbit, ceiling 100 Mbit (these are bits, so divide by 8 for the actual byte rate), priority 0. HTB trunk classes cannot borrow bandwidth from each other, but all children of one parent class can borrow among themselves. Here parent 1:0 is the handle 1:0 created above, and classid 1:1 is its child class: the number before the colon is the parent's, the one after it the child's

tc class add dev eth0 parent 1:0 classid 1:1 htb rate 100Mbit ceil 100Mbit prio 0

Create the first leaf class under the trunk: bandwidth 10 Mbit, ceiling 10 Mbit, priority 1. All leaf classes sit at a lower priority than the trunk class, to keep important data from being blocked and, above all, to avoid logical confusion; the 10 Mbit class needs a burst of 96 kbit

tc class add dev eth0 parent 1:1 classid 1:11 htb rate 10Mbit ceil 10Mbit prio 1 burst 96kbit
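A quick sanity check on the units mentioned above (a sketch; it assumes tc's 1024-based multiplier for "Mbit", so 10 Mbit comes out at 1280 kilobytes per second, i.e. 1.25 MB/s):

```shell
# convert a tc rate in Mbit to kilobytes per second: * 1024 to get kbit, / 8 to get kilobytes
RATE_MBIT=10
RATE_KBYTES_PER_SEC=$(( RATE_MBIT * 1024 / 8 ))
echo "${RATE_MBIT}Mbit is about ${RATE_KBYTES_PER_SEC} KB/s (1.25 MB/s)"
```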

Attach the scheduler: the SFQ (stochastic fairness) algorithm. parent here refers to the leaf class it hangs under; put the id of whichever leaf class whose traffic you want queue-controlled. Attaching an SFQ qdisc under each class keeps any single connection from hogging the bandwidth, so bandwidth is shared fairly:

# SFQ (Stochastic Fairness Queueing). SFQ's key notion is the "session" (also called a "flow"),
# essentially one TCP session or one UDP flow. Traffic is split into a fair number of FIFO
# queues, one per session, and the data is sent round-robin, so every session takes its turn.
# This is very fair and guarantees that no session is drowned out by the others. SFQ is called
# "stochastic" because it does not really create one queue per session: it uses a hash
# algorithm to map all sessions onto a limited number of queues.
# The perturb parameter is the number of seconds between reconfigurations of the hash; default 10
tc qdisc add dev eth0 parent 1:11 handle 111:0 sfq perturb 10

Create the filter: point it at the parent qdisc and class configured earlier, then set the control number handle, which corresponds to the iptables mark; when there are several different filters, make sure their prio values differ.

tc filter add dev eth0 parent 1:0 protocol ip prio 1 handle 1001 fw classid 1:11

Add the iptables rule on the mangle table's POSTROUTING chain: packets with source address 172.16.1.138 and destination not 192.168.0.10, leaving through NIC eth0, are marked with mark 1001

iptables -t mangle -A POSTROUTING -s 172.16.1.138/32 ! -d 192.168.0.10 -o eth0 -j MARK --set-xmark 1001

The RETURN is added to speed up packet checking. RETURN's order is: child chain -> parent chain -> default policy. Once a packet with source 172.16.1.138 and destination other than 192.168.0.10 has been seen, it returns to the POSTROUTING level and continues with the other chains at that level, so not every such packet has to be checked against the rest of this chain's contents, which noticeably speeds things up

iptables -t mangle -A POSTROUTING -o eth0 -s 172.16.1.138 ! -d 192.168.0.10 -j RETURN
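Once the upload rules above are loaded, the usual way to confirm that packets really hit the mark and the class is to read the counters (standard tc/iptables show commands; they need root and a real interface, so treat this as a config fragment to run on the router):

```shell
tc -s class show dev eth0                 # per-class byte/packet counters; 1:11 should grow under load
tc -s filter show dev eth0                # shows the fw filter bound to mark 1001
iptables -t mangle -L POSTROUTING -v -n   # packet counters on the MARK and RETURN rules
```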

Download

tc qdisc del dev p3p1 root 2>/dev/null
tc qdisc add dev p3p1 root handle 1:0 htb default 123
tc class add dev p3p1 parent 1:0 classid 1:1 htb rate 100Mbit ceil 100Mbit prio 0
tc class add dev p3p1 parent 1:1 classid 1:11 htb rate 10Mbit ceil 10Mbit prio 1
tc qdisc add dev p3p1 parent 1:11 handle 111:0 sfq perturb 10
tc filter add dev p3p1 parent 1:0 protocol ip prio 1 handle 1000 fw classid 1:11

-I is used here to insert the rule so it sorts to the front, since iptables matches in order, and also to avoid conflicts with wifidog's rules

iptables -t mangle -I POSTROUTING -o p3p1 -d 172.16.1.138 ! -s 192.168.0.10 -j MARK --set-mark 1000
iptables -t mangle -I POSTROUTING -o p3p1 -d 172.16.1.138 ! -s 192.168.0.10 -j RETURN

 

================

 

HTB basics: Linux HTB queuing discipline guide (Chinese translation): http://wenku.baidu.com/view/64da046825c52cc58bd6beac.html

TC basics: Linux Advanced Routing and Traffic Control (LARTC)
http://download.csdn.net/detail/wuwentao2000/3963140

 

iptables basics: Chinese HOWTO: http://man.chinaunix.net/network/iptables-tutorial-cn-1.1.19.html

 

Source 1: http://www.right.com.cn/forum/viewthread.php?tid=71981&highlight=QOS

# Now use TC to build the upload and download channels
TCA="tc class add dev br0"
TFA="tc filter add dev br0"
tc qdisc del dev br0 root
tc qdisc add dev br0 root handle 1: htb
tc class add dev br0 parent 1: classid 1:1 htb rate 1600kbit            # 1600 is the total download rate
$TCA parent 1:1 classid 1:10 htb rate 200kbit ceil 400kbit prio 2     # download for channel 10: min 200, max 400, priority 2
$TCA parent 1:1 classid 1:25 htb rate 1000kbit ceil 1600kbit prio 1   # my own special channel 25: download min 1000, max 1600, priority 1; perks of being the admin
$TFA parent 1:0 prio 2 protocol ip handle 10 fw flowid 1:10             
$TFA parent 1:0 prio 1 protocol ip handle 25 fw flowid 1:25
tc qdisc add dev br0 ingress
$TFA parent ffff: protocol ip handle 35 fw police rate 800kbit mtu 12k burst 10k drop      # my own upload channel 35, max speed 800
$TFA parent ffff: protocol ip handle 50 fw police rate 80kbit mtu 12k burst 10k drop         # the shared upload channel 50 for everyone else, max speed 80

# Now use iptables to decide who goes through which channel. Since dd-wrt's iptables does not support ip ranges, each IP needs its own line or the command fails

iptables -t mangle -A POSTROUTING -d 192.168.1.22 -j MARK --set-mark 10     # 192.168.1.22 goes through channel 10
iptables -t mangle -A POSTROUTING -d 192.168.1.22 -j RETURN                        # add a RETURN after each rule for better efficiency
iptables -t mangle -A POSTROUTING -d 192.168.1.23 -j MARK --set-mark 25      # 192.168.1.23 goes through special channel 25; 23 is my own IP, hence the special treatment
iptables -t mangle -A POSTROUTING -d 192.168.1.23 -j RETURN                        # add a RETURN after each rule for better efficiency

iptables -t mangle -A PREROUTING -s 192.168.1.22 -j MARK --set-mark 50         # .22 goes through upload channel 50
iptables -t mangle -A PREROUTING -s 192.168.1.22 -j RETURN                        # add a RETURN after each rule for better efficiency
iptables -t mangle -A PREROUTING -s 192.168.1.23 -j MARK --set-mark 35        # .23 goes through upload channel 35 (my own IP)
iptables -t mangle -A PREROUTING -s 192.168.1.23 -j RETURN                        # add a RETURN after each rule for better efficiency

# I won't write out the rest; swap the IPs yourself and run the commands to put whoever you like on whichever channel. As a small mercy, let everyone use my channels 25 and 35 for web browsing (you don't have to be merciful, of course)
iptables -t mangle -A PREROUTING -p tcp -m tcp --dport 80 -j MARK --set-mark 35    # http uses port 80, hence dport 80; this is the outgoing http request
iptables -t mangle -A PREROUTING -p tcp -m tcp --dport 80 -j RETURN
iptables -t mangle -A POSTROUTING -p tcp -m tcp --sport 80 -j MARK --set-mark 25   # http uses port 80, hence sport 80; this is the http response coming back
iptables -t mangle -A POSTROUTING -p tcp -m tcp --sport 80 -j RETURN

-------------------------

Now let's look at limiting TCP and UDP connection counts. Powerful stuff (not sure whether the standard and lite builds support it; the lines below are not guaranteed to work, it depends on your router's environment):
iptables -I FORWARD -p tcp -m connlimit --connlimit-above 100 -j DROP           # see? In the FORWARD chain, once there are more than 100 tcp connections the packets are dropped; this applies to every IP
iptables -I FORWARD -p udp -m limit --limit 5/sec -j DROP   # UDP has no connection count, so we can only cap UDP packets per second; 5 per second here. 5 is already plenty and 10 is a lot; this is a p2p killer, and 3~5 per second is reasonable. (Caution: as written this rule matches, and drops, at most 5 UDP packets per second and lets the rest through; to allow 5/sec and drop the excess, use "-m limit --limit 5/sec -j ACCEPT" followed by a plain "-p udp -j DROP")
How do you check the commands took effect? Run iptables -L FORWARD and you should see:
DROP       tcp  --  anywhere             anywhere            #conn/32 > 100 
DROP       udp  --  anywhere             anywhere            limit: avg 5/sec bu
If these two lines appear, the connection-limit rules really are in effect; if they don't, your dd-wrt build lacks the connlimit module.

Now, how do I give myself a back door so the connection limit doesn't apply to me? Like this:
iptables -I FORWARD -s 192.168.1.23 -j RETURN          # inserts this rule at the very head of the FORWARD chain so it becomes rule number one; 23 is my IP, so anything from my IP stops before reaching the connection-limit rules below. Thanks to the chain's evaluation order, my IP is exempt.

Here is a one-liner that shows everyone's connection counts:
sed -n 's%.* src=\(192\.168\.[0-9.]*\) .*%\1%p' /proc/net/ip_conntrack | sort | uniq -c    # shows how many connections each IP currently holds

A few puzzling points in the script above deserve explanation:
br0: a dd-wrt bridge that joins the wireless and wired interfaces, so shaping traffic here shapes every wireless and wired user; run ifconfig for the details.
Chain order: the iptables chain order on br0 is peculiar. Normally inbound packets hit the PREROUTING chain first and outbound packets pass through POSTROUTING, but on dd-wrt's br0 bridge the order is exactly reversed:
on dd-wrt, inbound packets are treated as outbound and outbound packets as inbound, which is why the script above is written the way it is.

Not sure where to type the commands?
Log in to dd-wrt's web UI and enable SSH under Administration,
grab an SSH client (download here: http://www.onlinedown.net/soft/20089.htm),
enter the router's IP, username and password, log in, and start typing.

Important: write your scripts in an editor that supports the Unix format (something like UltraEdit); Windows Notepad won't do, because the two systems use different line endings and Linux does not accept the Windows ones.

-----------------------

Source 3: HTB HOME: http://luxik.cdi.cz/~devik/qos/htb/

HTB Linux queuing discipline manual - user guide
Martin Devera aka devik (devik@cdi.cz)
Manual: devik and Don Cohen
Last updated: 5.5.2002



1. Introduction

HTB is meant as a more understandable, intuitive and faster replacement for the CBQ qdisc in Linux. Both CBQ and HTB help you to control the use of the outbound bandwidth on a given link. Both allow you to use one physical link to simulate several slower links and to send different kinds of traffic on different simulated links. In both cases, you have to specify how to divide the physical link into simulated links and how to decide which simulated link to use for a given packet to be sent.

This document shows you how to use HTB. Most sections have examples, charts (with measured data) and discussion of particular problems.

This release of HTB should also be much more scalable. See the comparison at the HTB home page.

Please read: the tc tool (not only HTB) uses shortcuts to denote units of rate. kbps means kilobytes per second and kbit means kilobits per second! This is the most frequently asked question about tc in Linux.
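That warning can be made concrete with trivial arithmetic: 100kbps in tc notation is 100 kilobytes per second, i.e. eight times the rate that 100kbit denotes:

```shell
# tc 'kbps' means kilobytes/s; multiply by 8 to get the same rate in kilobits/s ('kbit')
KBPS=100
KBIT=$(( KBPS * 8 ))
echo "100kbps (kilobytes/s) = ${KBIT}kbit (kilobits/s)"
```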

2. Link sharing

Problem: We have two customers, A and B, both connected to the internet via eth0. We want to allocate 60 kbps to B and 40 kbps to A. Next we want to subdivide A's bandwidth 30kbps for WWW and 10kbps for everything else. Any unused bandwidth can be used by any class which needs it (in proportion of its allocated share).

HTB ensures that the amount of service provided to each class is at least the minimum of the amount it requests and the amount assigned to it. When a class requests less than the amount assigned, the remaining (excess) bandwidth is distributed to other classes which request service.

Also see document about HTB internals - it describes goal above in greater details.

Note: In the literature this is called "borrowing" the excess bandwidth. We use that term below to conform with the literature. We mention, however, that this seems like a bad term since there is no obligation to repay the resource that was "borrowed".

The different kinds of traffic above are represented by classes in HTB. The simplest approach is shown in the picture at the right. 
Let's see what commands to use:

tc qdisc add dev eth0 root handle 1: htb default 12

This command attaches queue discipline HTB to eth0 and gives it the "handle"  1:. This is just a name or identifier with which to refer to it below. The  default 12 means that any traffic that is not otherwise classified will be assigned to class 1:12.

Note: In general (not just for HTB but for all qdiscs and classes in tc), handles are written x:y where x is an integer identifying a qdisc and y is an integer identifying a class belonging to that qdisc. The handle for a qdisc must have zero for its y value and the handle for a class must have a non-zero value for its y value. The "1:" above is treated as "1:0".

 

tc class add dev eth0 parent 1: classid 1:1 htb rate 100kbps ceil 100kbps 
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 30kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:11 htb rate 10kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 60kbps ceil 100kbps

The first line creates a "root" class, 1:1 under the qdisc 1:. The definition of a root class is one with the htb qdisc as its parent. A root class, like other classes under an htb qdisc allows its children to borrow from each other, but one root class cannot borrow from another. We could have created the other three classes directly under the htb qdisc, but then the excess bandwidth from one would not be available to the others. In this case we do want to allow borrowing, so we have to create an extra class to serve as the root and put the classes that will carry the real data under that. These are defined by the next three lines. The ceil parameter is described below.

Note: Sometimes people ask me why they have to repeat dev eth0 when they have already used handle or parent. The reason is that handles are local to an interface, e.g., eth0 and eth1 could each have classes with handle 1:1.

We also have to describe which packets belong in which class. This is really not related to the HTB qdisc. See the tc filter documentation for details. The commands will look something like this:

tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
   match ip src 1.2.3.4 match ip dport 80 0xffff flowid 1:10
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
   match ip src 1.2.3.4 flowid 1:11

(We identify A by its IP address which we imagine here to be 1.2.3.4.)

Note: The U32 classifier has an undocumented design bug which causes duplicate entries to be listed by "tc filter show" when you use U32 classifiers with different prio values.

You may notice that we didn't create a filter for the 1:12 class. It might be more clear to do so, but this illustrates the use of the default. Any packet not classified by the two rules above (any packet not from source address 1.2.3.4) will be put in class 1:12.

Now we can optionally attach queuing disciplines to the leaf classes. If none is specified the default is pfifo.

tc qdisc add dev eth0 parent 1:10 handle 20: pfifo limit 5
tc qdisc add dev eth0 parent 1:11 handle 30: pfifo limit 5
tc qdisc add dev eth0 parent 1:12 handle 40: sfq perturb 10

That's all the commands we need. Let's see what happens if we send packets of each class at 90kbps and then stop sending packets of one class at a time. Along the bottom of the graph are annotations like "0:90k". The horizontal position at the center of the label (in this case near the 9, also marked with a red "1") indicates the time at which the rate of some traffic class changes. Before the colon is an identifier for the class (0 for class 1:10, 1 for class 1:11, 2 for class 1:12) and after the colon is the new rate starting at the time where the annotation appears. For example, the rate of class 0 is changed to 90k at time 0, 0 (= 0k) at time 3, and back to 90k at time 6.

Initially all classes generate 90kb. Since this is higher than any of the rates specified, each class is limited to its specified rate. At time 3 when we stop sending class 0 packets, the rate allocated to class 0 is reallocated to the other two classes in proportion to their allocations, 1 part class 1 to 6 parts class 2. (The increase in class 1 is hard to see because it's only 4 kbps.) Similarly at time 9 when class 1 traffic stops its bandwidth is reallocated to the other two (and the increase in class 0 is similarly hard to see.) At time 15 it's easier to see that the allocation to class 2 is divided 3 parts for class 0 to 1 part for class 1. At time 18 both class 1 and class 2 stop so class 0 gets all 90 kbps it requests.

It might be a good time to touch on the concept of quantums now. In fact, when several classes want to borrow bandwidth, each is given some number of bytes before the other competing classes are served. This number is called the quantum. You should see that if several classes are competing for the parent's bandwidth, they get it in proportion to their quantums. It is important to know that for precise operation quantums need to be as small as possible while still larger than the MTU. 
Normally you don't need to specify quantums manually, as HTB chooses precomputed values. It computes a class's quantum (when you add or change the class) as its rate divided by the r2q global parameter. The default r2q is 10, and because a typical MTU is 1500 the default is good for rates from 15 kBps (120 kbit) upward. For smaller minimal rates, specify r2q 1 when creating the qdisc; it works from 12 kbit, which should be enough. If needed, you can specify the quantum manually when adding or changing a class; that also avoids warnings in the log when the precomputed value would be bad. When you specify the quantum on the command line, r2q is ignored for that class.
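The quantum arithmetic can be sketched like this (assuming tc's 1024-based units; with the default r2q of 10, the 224Kbit leaf class shown in the chapter 7 statistics comes out at the 2867 bytes printed there):

```shell
# quantum = rate in bytes per second, divided by the global r2q parameter
RATE_KBIT=224
R2Q=10
QUANTUM=$(( RATE_KBIT * 1024 / 8 / R2Q ))
echo "quantum for a ${RATE_KBIT}Kbit class with r2q ${R2Q}: ${QUANTUM} bytes"
```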

This might seem like a good solution if A and B were not different customers. However, if A is paying for 40kbps then he would probably prefer his unused WWW bandwidth to go to his own other service rather than to B. This requirement is represented in HTB by the class hierarchy.

3. Sharing hierarchy

The problem from the previous chapter is solved by the class hierarchy in this picture. Customer A is now explicitly represented by its own class. Recall from above that the amount of service provided to each class is at least the minimum of the amount it requests and the amount assigned to it. This applies to htb classes that are not parents of other htb classes. We call these leaf classes. For htb classes that are parents of other htb classes, which we call interior classes, the rule is that the amount of service is at least the minimum of the amount assigned to it and the sum of the amounts requested by its children. In this case we assign 40kbps to customer A. That means that if A requests less than the allocated rate for WWW, the excess will be used for A's other traffic (if there is demand for it), at least until the sum is 40kbps.

Notes: Packet classification rules can assign to inner nodes too; then you have to attach another filter list to the inner node. Eventually a packet should reach a leaf or the special 1:0 class. The rate supplied for a parent should be the sum of the rates of its children.

The commands are now as follows:

tc class add dev eth0 parent 1: classid 1:1 htb rate 100kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:2 htb rate 40kbps ceil 100kbps
tc class add dev eth0 parent 1:2 classid 1:10 htb rate 30kbps ceil 100kbps
tc class add dev eth0 parent 1:2 classid 1:11 htb rate 10kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 60kbps ceil 100kbps

We now turn to the graph showing the results of the hierarchical solution. When A's WWW traffic stops, its assigned bandwidth is reallocated to A's other traffic so that A's total bandwidth is still the assigned 40kbps.
If A were to request less than 40kbs in total then the excess would be given to B.

4. Rate ceiling

The ceil argument specifies the maximum bandwidth that a class can use. This limits how much bandwidth that class can borrow. The default ceil is the same as the rate. (That's why we had to specify it in the examples above to show borrowing.) We now change the ceil 100kbps for classes 1:2 (A) and 1:11 (A's other) from the previous chapter to ceil 60kbps and ceil 20kbps.

The graph at right differs from the previous one at time 3 (when WWW traffic stops) because A/other is limited to 20kbps. Therefore customer A gets only 20kbps in total and the unused 20kbps is allocated to B.
The second difference is at time 15 when B stops. Without the ceil, all of its bandwidth was given to A, but now A is only allowed to use 60kbps, so the remaining 40kbps goes unused.

This feature should be useful for ISPs because they probably want to limit the amount of service a given customer gets even when other customers are not requesting service. (ISPs probably want customers to pay more money for better service.) Note that root classes are not allowed to borrow, so there's really no point in specifying a ceil for them.

Notes: The ceil for a class should always be at least as high as the rate. Also, the ceil for a class should always be at least as high as the ceil of any of its children.

5. Burst

Networking hardware can only send one packet at a time and only at a hardware dependent rate. Link sharing software can only use this ability to approximate the effects of multiple links running at different (lower) speeds. Therefore the rate and ceil are not really instantaneous measures but averages over the time that it takes to send many packets. What really happens is that the traffic from one class is sent a few packets at a time at the maximum speed and then other classes are served for a while. The burst and cburst parameters control the amount of data that can be sent at the maximum (hardware) speed without trying to serve another class.

If cburst is smaller (ideally one packet size) it shapes bursts to not exceed ceil rate in the same way as TBF's peakrate does.

When you set burst for parent class smaller than for some child then you should expect the parent class to get stuck sometimes (because child will drain more than parent can handle). HTB will remember these negative bursts up to 1 minute.

You can ask why I want bursts. Well, it is a cheap and simple way to improve response times on a congested link. For example, www traffic is bursty: you ask for a page, get it in a burst, then read it. During that idle period the burst will "charge" again.

Note: The burst and cburst of a class should always be at least as high as that of any of it children.

On the graph you can see the case from the previous chapter where I changed the burst for the red and yellow (agency A) class to 20kb while cburst remained at the default (circa 2 kb).
The green hill at time 13 is due to the burst setting on the SMTP (A) class: it had been under its limit since time 9 and accumulated 20 kb of burst. The hill reaches up to 20 kbps (limited by ceil, because the class has cburst near one packet size).
A clever reader may wonder why there is no red and yellow hill at time 7. It is because yellow is already at its ceil limit, so it has no room for further bursts.
There is at least one unwanted artifact: the magenta crater at time 4. It is there because I intentionally "forgot" to add burst to the root link (1:1) class. It remembered the hill from time 1, and when at time 4 the blue class wanted to borrow yellow's rate, it denied the request and compensated itself.

Limitation: when you operate with high rates on a computer with a low-resolution timer, you need some minimal burst and cburst set for all classes. The timer resolution is 10ms on i386 systems and 1ms on Alphas. The minimal burst can be computed as max_rate * timer_resolution. So for 10Mbit on a plain i386 you need a 12kb burst.
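A sketch of that computation (assuming tc's 1024-based Mbit and the 10 ms i386 timer tick): one tick's worth of traffic at 10 Mbit is about 13 KB, matching the ~12kb figure above:

```shell
# minimal burst = max_rate (bytes per second) * timer resolution (seconds)
RATE_MBIT=10
TICK_MS=10   # i386 timer resolution
BURST_BYTES=$(( RATE_MBIT * 1024 * 1024 / 8 * TICK_MS / 1000 ))
echo "minimal burst for ${RATE_MBIT}Mbit at ${TICK_MS}ms ticks: about ${BURST_BYTES} bytes"
```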

If you set the burst too small you will see a lower rate than you configured. The latest tc tool computes and sets the smallest possible burst when it is not specified.

6. Prioritizing bandwidth share

Prioritizing traffic has two sides. First, it affects how the excess bandwidth is distributed among siblings. Up to now we have seen that excess bandwidth was distributed according to rate ratios. Now I used the basic configuration from chapter 3 (hierarchy without ceilings and bursts) and changed the priority of all classes to 1, except SMTP (green), which I set to 0 (higher).
From the sharing view you see that that class got all the excess bandwidth. The rule is that classes with higher priority are offered excess bandwidth first, while the rules about guaranteed rate and ceil are still met.

There is also a second face to the problem: the total delay of a packet. It is relatively hard to measure on ethernet, which is so fast that the delay is negligible. But there is a simple trick: we can add an HTB with a single class rate-limited to less than 100 kbps and attach the HTB we want to measure as its child. Then we simulate a slower link with larger delays.
For simplicity's sake I use a simple two-class scenario:

# qdisc for delay simulation
tc qdisc add dev eth0 root handle 100: htb
tc class add dev eth0 parent 100: classid 100:1 htb rate 90kbps

# real measured qdisc
tc qdisc add dev eth0 parent 100:1 handle 1: htb
AC="tc class add dev eth0 parent"
$AC 1: classid 1:1 htb rate 100kbps
$AC 1:2 classid 1:10 htb rate 50kbps ceil 100kbps prio 1
$AC 1:2 classid 1:11 htb rate 50kbps ceil 100kbps prio 1
tc qdisc add dev eth0 parent 1:10 handle 20: pfifo limit 2
tc qdisc add dev eth0 parent 1:11 handle 21: pfifo limit 2

Note: HTB as a child of another HTB is NOT the same as a class under another class within the same HTB. This is because when a class in HTB is allowed to send, it sends as soon as the hardware can, so the delay of an under-limit class is bounded only by the hardware, not by its ancestors.
In the HTB-under-HTB case the outer HTB simulates new hardware equipment with all the consequences (larger delay).

The simulator is set to generate 50 kbps for both classes, and at time 3s it executes the command:

tc class change dev eth0 parent 1:2 classid 1:10 htb \
 rate 50kbps ceil 100kbps burst 2k prio 0

As you see, the delay of the WWW class dropped nearly to zero while SMTP's delay increased. When you prioritize one class to get better delay, it always makes other classes' delays worse.
Later (time 7s) the simulator starts to generate WWW at 60 kbps and SMTP at 40 kbps. There you can observe the next interesting behaviour: when a class is over its limit (WWW), HTB prioritizes the under-limit part of its bandwidth first.

Which class should you prioritize? Generally, those classes where you really need low delays. Examples are video or audio traffic (and you will really need to use a correct rate here to prevent that traffic from starving the others) or interactive (telnet, SSH) traffic, which is bursty in nature and will not negatively affect other flows.
A common trick is to prioritize ICMP to get nice ping delays even on fully utilized links (though from a technical point of view that is not what you want when measuring connectivity).

7. Understanding statistics

The tc tool allows you to gather statistics of queuing disciplines in Linux. Unfortunately the statistic results are not explained by the authors, so you often can't use them. Here I try to help you understand HTB's stats.
First, the whole-HTB stats. The snippet below was taken during the simulation from chapter 3.

# tc -s -d qdisc show dev eth0
 qdisc pfifo 22: limit 5p
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 

 qdisc pfifo 21: limit 5p
 Sent 2891500 bytes 5783 pkts (dropped 820, overlimits 0) 

 qdisc pfifo 20: limit 5p
 Sent 1760000 bytes 3520 pkts (dropped 3320, overlimits 0) 

 qdisc htb 1: r2q 10 default 1 direct_packets_stat 0
 Sent 4651500 bytes 9303 pkts (dropped 4140, overlimits 34251) 

The first three disciplines are HTB's children. Let's ignore them, as PFIFO stats are self-explanatory.
overlimits tells you how many times the discipline delayed a packet. direct_packets_stat tells you how many packets were sent through the direct queue. The other stats are self-explanatory. Let's look at the class stats:

tc -s -d class show dev eth0
class htb 1:1 root prio 0 rate 800Kbit ceil 800Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 10240 level 3 
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0) 
 rate 70196bps 141pps 
 lended: 6872 borrowed: 0 giants: 0

class htb 1:2 parent 1:1 prio 0 rate 320Kbit ceil 4000Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 4096 level 2 
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0) 
 rate 70196bps 141pps 
 lended: 1017 borrowed: 6872 giants: 0

class htb 1:10 parent 1:2 leaf 20: prio 1 rate 224Kbit ceil 800Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 2867 level 0 
 Sent 2269000 bytes 4538 pkts (dropped 4400, overlimits 36358) 
 rate 14635bps 29pps 
 lended: 2939 borrowed: 1599 giants: 0

I deleted the 1:11 and 1:12 classes to make the output shorter. As you can see, the parameters we set are there, along with the level and the DRR quantum information.
overlimits shows how many times the class was asked to send a packet but couldn't due to rate/ceil constraints (currently counted for leaves only).
rate, pps tell you the actual (10-second averaged) rate going through the class. It is the same rate as used by gating.
lended is the number of packets donated by this class (from its rate) and borrowed counts packets for which we borrowed from the parent. Lends are always counted class-locally, while borrows are transitive (when 1:10 borrows from 1:2, which in turn borrows from 1:1, both the 1:10 and 1:2 borrow counters are incremented).
giants is the number of packets larger than the mtu set in the tc command. HTB will work with these, but the rates will not be accurate at all; add mtu to your tc command (it defaults to 1600 bytes).

8. Making, debugging and sending error reports

If you have kernel 2.4.20 or newer you don't need to patch it: everything is in the vanilla tarball. The only thing you need is the tc tool. Download the HTB 3.6 tarball and use tc from it.

You have to patch older kernels to make it work. Download the kernel source and use patch -p1 -i htb3_2.X.X.diff to apply the patch, then make menuconfig; make bzImage as before. Don't forget to enable QoS and HTB.
You will also have to use the patched tc tool. The patch is in downloads too, or you can download a precompiled binary.

If you think you have found an error, I will appreciate an error report. For oopses I need the ksymoops output. For weird qdisc behaviour, add the parameter debug 3333333 to your tc qdisc add ... htb command. It will log many megabytes to the syslog facility kern at level debug. You will probably want to add a line like:
kern.debug -/var/log/debug
to your /etc/syslog.conf. Then bzip the log and send it to me via email (up to 10MB after bzipping) along with a description of the problem and its time.

 

==================

I. Basic concepts

A packet is received on the input NIC (ingress) and, after a routing lookup, is determined to be either destined for this host or in need of forwarding. If it is for this host, it is handed straight up to the upper-layer protocol, e.g. TCP; if it is to be forwarded, it leaves through the output NIC (egress). Traffic control usually happens at the output NIC. Control at the router's ingress is also possible, and Linux has the functionality, but since we cannot control devices outside our own network, ingress control is comparatively hard; this article concentrates on egress control. A basic concept in traffic control is the queue (qdisc): every NIC is associated with a qdisc, and whenever the kernel needs to send a packet out of a NIC, it first adds the packet to the qdisc configured on that NIC, and the qdisc decides the order in which packets are sent.
So it is fair to say that all traffic control happens inside queues; the detailed flow is shown in the diagram below.


Some queues are very simple: packets leave in the order they arrived. Others are more elaborate: they queue and classify different packets and send them in different orders according to different policies. To achieve this, such complex queues use filters to divide packets into classes; queues of this kind are called classful qdiscs. Powerful traffic control is generally impossible without classful queues, which makes class and filter the other two important basic concepts of traffic control. Figure 2 shows an example of a classful queue.

 

 

II. Port rate limiting (TC and IPTABLES)

In Linux, traffic control is done with the TC tool; to rate-limit a port, however, you need IPTABLES to bind the port to a TC class. Configuring traffic control on a NIC usually involves these steps:

  • configure a qdisc on the NIC;
  • create classes under that qdisc;
  • create child qdiscs and classes as needed;
  • create a filter for each class;
  • use iptables to bind the ports to the tc classes.

Next, let's set up rate limits for ports 80 and 22 concretely:

1. Use ifconfig to inspect the server's NICs. Suppose eth0 is the outward-facing NIC, i.e. users connect to the system through it; that is the NIC whose bandwidth we limit.

ifconfig

2. Create the root qdisc on eth0

tc qdisc add dev eth0 root handle 1: htb default 20
Explanation:
add: add a rule;
dev eth0: the NIC to operate on is eth0;
root: what we attach to eth0 is a root qdisc;
handle 1:: the qdisc's handle is 1:;
htb: the qdisc to add is an HTB qdisc;
default 20: an HTB-specific parameter; all unclassified traffic is assigned to class 1:20.

3. Create the root class

tc class add dev eth0 parent 1:0 classid 1:1 htb rate 3Mbit
Explanation:
Create root class 1:1 under qdisc 1:0, type htb, rate limited to 3Mbit.
rate 3Mbit: the system guarantees this class 3Mbit of bandwidth.

4. Create a child class

tc class add dev eth0 parent 1:1 classid 1:11 htb rate 2Mbit ceil 3Mbit
Explanation:
Create class 1:11 with root class 1:1 as its parent, limited to 2Mbit-3Mbit (htb can borrow bandwidth from other classes).
ceil 3Mbit: the class may occupy at most 3Mbit of bandwidth.

5. Create the filter and specify its handle

tc filter add dev eth0 parent 1:0 prio 1 protocol ip handle 11 fw flowid 1:11
prio 1: the filter's priority; the lower the prio value, the higher the priority.
protocol ip: the filter should examine the packet's protocol field.
flowid 1:11: assign the matched traffic to class 1:11.

6. Use iptables to bind the ports to the tc class

iptables -A OUTPUT -t mangle -p tcp --sport 80 -j MARK --set-mark 11
iptables -A OUTPUT -t mangle -p tcp --sport 22 -j MARK --set-mark 11
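The six steps above can be gathered into one dry-run sketch. It only prints the commands (swap the echo for real execution on the target host); DEV and the rates are this article's example values:

```shell
# dry-run: print the tc/iptables commands for limiting ports 80 and 22
DEV=eth0
run() { echo "$@"; }   # replace 'echo "$@"' with "$@" to actually execute

run tc qdisc add dev $DEV root handle 1: htb default 20
run tc class add dev $DEV parent 1:0 classid 1:1 htb rate 3Mbit
run tc class add dev $DEV parent 1:1 classid 1:11 htb rate 2Mbit ceil 3Mbit
run tc filter add dev $DEV parent 1:0 prio 1 protocol ip handle 11 fw flowid 1:11
run iptables -A OUTPUT -t mangle -p tcp --sport 80 -j MARK --set-mark 11
run iptables -A OUTPUT -t mangle -p tcp --sport 22 -j MARK --set-mark 11
```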

III. Viewing and deleting rules

1. List the TC qdiscs

tc -s qdisc ls dev eth0

2. Delete the TC qdisc

tc qdisc del dev eth0 root

3. Delete the iptables mangle rules

If you only delete the TC qdisc, the old mangle-table rules will still take effect the next time you configure, so to reset the whole rate-limiting setup delete both the TC qdisc and the mangle rules.

List the mangle rules with line numbers:
iptables -L -t mangle --line-numbers
Delete a rule by its number (x is the rule number; use the chain the rules were added to, which is OUTPUT in the example above):
iptables -t mangle -D OUTPUT x

Reference articles:

Linux tc QoS explained
Traffic-control examples on Linux (limiting port-80 traffic)
Bandwidth limiting by port with IPTABLES and TC on Linux (port rate limiting)

 

==================== End

 

posted @ lsgxeva