UDT: UDP-based Data Transfer Protocol
draft-gg-udt-03
UDT: UDP-based Data Transfer Protocol (preliminary translation)
(Translator: Jack)
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on October 15, 2010.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Abstract
This document describes UDT, or the UDP based Data Transfer protocol. UDT is designed to be an alternative data transfer protocol for the situations when TCP does not work well. One of the most common cases, and also the original motivation of UDT, is to overcome TCP's inefficiency in high bandwidth-delay product (BDP) networks. Another important target use scenario is to allow networking researchers, students, and application developers to easily implement and deploy new data transfer algorithms and protocols. Furthermore, UDT can also be used to better support firewall traversing.
UDT is completely built on top of UDP. However, UDT is connection oriented, unicast, and duplex. It supports both reliable data streaming and partial reliable messaging. The congestion control module is an open framework that can be used to implement and/or deploy different control algorithms. UDT also has a native/default control algorithm based on AIMD rate control.
Table of Contents
1. Introduction
2. Packet Structures
3. UDP Multiplexer
4. Timers
5. Connection Setup and Shutdown
   5.1 Client/Server Connection Setup
   5.2 Rendezvous Connection Setup
   5.3 Shutdown
6. Data Sending and Receiving
   6.1 The Sender's Algorithm
   6.2 The Receiver's Algorithm
   6.3 Flow Control
   6.4 Loss Information Compression Scheme
7. Configurable Congestion Control (CCC)
   7.1 CCC Interface
   7.2 UDT's Native Control Algorithm
Security Considerations
Normative References
Informative References
Author's Addresses
1. Introduction
The Transmission Control Protocol (TCP) [RFC5681] has been very successful and greatly contributes to the popularity of today's Internet. Today TCP still contributes the majority of the traffic on the Internet.
However, TCP is not perfect and it is not designed for every specific application. In the last several years, with the rapid advance of optical networks and rich Internet applications, TCP has been found inefficient as the network bandwidth-delay product (BDP) increases. Its AIMD (additive increase multiplicative decrease) algorithm reduces the TCP congestion window drastically but fails to recover it to the available bandwidth quickly. Theoretical flow level analysis has shown that TCP becomes more vulnerable to packet loss as the BDP increases [LM97].
Overcoming TCP's inefficiency over high speed wide area networks is the original motivation of UDT. Although there are new TCP variants deployed today (for example, BiC TCP [XHR04] on Linux and Compound TCP [TS06] on Windows), certain problems still exist. For example, none of the new TCP variants address RTT unfairness, the situation in which connections with shorter RTT consume more bandwidth.
Moreover, as the Internet continues to evolve, new challenges and requirements to the transport protocol will always emerge. Researchers need a platform to rapidly develop and test new algorithms and protocols. Network researchers and students can use UDT to easily implement their ideas on transport protocols, in particular congestion control algorithms, and conduct experiments over real networks.
Finally, there are other situations in which UDT can be more helpful than TCP. For example, a UDP-based protocol usually makes it easier to punch through NAT firewalls. As another example, TCP's congestion control and reliability control are not desirable in certain applications such as VoIP and wireless communication. Application developers can use UDT (with or without modification) to suit their requirements.
For all the reasons and motivations described above, we believe that it is necessary to design a well defined and developed UDP-based data transfer protocol. As its name suggests, UDT is built solely on top of UDP [RFC768]. Both data and control packets are transferred using UDP. UDT is connection-oriented in order to easily maintain congestion control, reliability, and security. It is a unicast protocol; multicast is not considered here. Finally, data can be transferred over UDT in duplex.
UDT supports both reliable data streaming and partial reliable messaging. The data streaming semantics is similar to that of TCP, while the messaging semantics can be regarded as a subset of SCTP [RFC4960].
This document defines UDT's protocol specification. The detailed description and performance analysis can be found in [GG07], and a fully functional reference implementation can be found at [UDT].
2. Packet Structures
UDT has two kinds of packets: the data packets and the control packets. They are distinguished by the 1st bit (flag bit) of the packet header.
The data packet header structure is as follows.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|                   Packet Sequence Number                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |FF |O|                     Message Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Time Stamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Destination Socket ID                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The data packet header starts with 0. The packet sequence number uses the 31 bits following the flag bit. UDT uses packet-based sequencing, i.e., the sequence number is increased by 1 for each data packet sent, in the order of packet sending. The sequence number wraps after it reaches the maximum number (2^31 - 1). (Translator's note: a retransmitted packet does not increase the sequence number.) The next 32-bit field in the header is for messaging. The first two bits, "FF", flag the position of the packet in a message: "10" is the first packet, "01" is the last one, "11" is the only packet, and "00" is any packet in the middle. The third bit, "O", indicates whether the message should be delivered in order (1) or not (0). A message to be delivered in order requires that all previous messages be either delivered or dropped. The remaining 29 bits are the message number, similar to the packet sequence number (but independent of it). A UDT message may contain multiple UDT packets.
Following these are the 32-bit time stamp of when the packet is sent and the destination socket ID. The time stamp is a relative value starting from the time when the connection is set up. The time stamp information is not required by UDT or its native control algorithm. It is included only in case a user-defined control algorithm requires the information (see Section 7).
The Destination Socket ID is used by the UDP multiplexer. Multiple UDT sockets can be bound to the same UDP port, and this UDT socket ID is used to differentiate the UDT connections.
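The data header layout above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and network byte order is an assumption, since the text above does not state the byte order.

```python
import struct

# Values of the two "FF" bits (position of the packet within a message).
MIDDLE, LAST, FIRST, ONLY = 0b00, 0b01, 0b10, 0b11

def pack_data_header(seq, msg_pos, in_order, msg_no, timestamp, dst_socket_id):
    """Build the 16-byte data packet header (assumed network byte order)."""
    word0 = seq & 0x7FFFFFFF                      # flag bit stays 0 for data
    word1 = ((msg_pos & 0x3) << 30) | ((1 if in_order else 0) << 29) \
        | (msg_no & 0x1FFFFFFF)                   # FF, O, 29-bit message number
    return struct.pack("!IIII", word0, word1,
                       timestamp & 0xFFFFFFFF, dst_socket_id & 0xFFFFFFFF)

def unpack_data_header(buf):
    """Parse the first 16 bytes of a data packet into a dict of fields."""
    word0, word1, timestamp, dst = struct.unpack("!IIII", buf[:16])
    if word0 >> 31:
        raise ValueError("flag bit is 1: this is a control packet")
    return {
        "seq": word0 & 0x7FFFFFFF,
        "msg_pos": word1 >> 30,
        "in_order": bool((word1 >> 29) & 1),
        "msg_no": word1 & 0x1FFFFFFF,
        "timestamp": timestamp,
        "dst_socket_id": dst,
    }

def next_seq(seq):
    """Advance the 31-bit sequence number, wrapping after 2^31 - 1."""
    return (seq + 1) & 0x7FFFFFFF
```

A receiver would check the flag bit first and call unpack_data_header only when it is 0.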
If the flag bit of a UDT packet is 1, then it is a control packet and parsed according to the following structure.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|            Type             |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     |                     Additional Info                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Time Stamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Destination Socket ID                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                   Control Information Field                   ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
There are 8 types of control packets in UDT and the type information is put in bit field 1 - 15 of the header. The contents of the following fields depend on the packet type. The first 128 bits must exist in the packet header, whereas there may be an empty control information field, depending on the packet type.
Particularly, UDT uses sub-sequencing for ACK packet. Each ACK packet is assigned a unique increasing 16-bit sequence number, which is independent of the data packet sequence number. The ACK sequence number uses bits 32 - 63 ("Additional Info") in the control packet header. The ACK sequence number ranges from 0 to (2^31 - 1).
TYPE 0x0: Protocol Connection Handshake
   Additional Info: Undefined
   Control Info:
      1) 32 bits: UDT version
      2) 32 bits: Socket Type (STREAM or DGRAM)
      3) 32 bits: initial packet sequence number
      4) 32 bits: maximum packet size (including UDP/IP headers)
      5) 32 bits: maximum flow window size
      6) 32 bits: connection type (regular or rendezvous)
      7) 32 bits: socket ID
      8) 32 bits: SYN cookie
      9) 128 bits: the IP address of the peer's UDP socket

TYPE 0x1: Keep-alive
   Additional Info: Undefined
   Control Info: None

TYPE 0x2: Acknowledgement (ACK)
   Additional Info: ACK sequence number
   Control Info:
      1) 32 bits: The packet sequence number to which all the previous packets have been received (excluding)
      [The following fields are optional]
      2) 32 bits: RTT (in microseconds)
      3) 32 bits: RTT variance
      4) 32 bits: Available buffer size (in bytes)
      5) 32 bits: Packets receiving rate (in number of packets per second)
      6) 32 bits: Estimated link capacity (in number of packets per second)

TYPE 0x3: Negative Acknowledgement (NAK)
   Additional Info: Undefined
   Control Info:
      1) 32 bits integer array of compressed loss information (see Section 6.4).

TYPE 0x4: Unused

TYPE 0x5: Shutdown
   Additional Info: Undefined
   Control Info: None

TYPE 0x6: Acknowledgement of Acknowledgement (ACK2)
   Additional Info: ACK sequence number
   Control Info: None

TYPE 0x7: Message Drop Request
   Additional Info: Message ID
   Control Info:
      1) 32 bits: First sequence number in the message
      2) 32 bits: Last sequence number in the message

TYPE 0x7FFF: Explained by bits 16 - 31, reserved for user defined Control Packet
Finally, Time Stamp and Destination Socket ID also exist in the control packets.
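As an illustration of the control packet layout, the following Python sketch packs and parses the 128-bit control header and builds an ACK carrying only its mandatory field. The function names are hypothetical and network byte order is assumed; neither comes from this document.

```python
import struct

ACK, ACK2 = 0x2, 0x6   # control packet types from the list above

def pack_control(ptype, additional_info, timestamp, dst_socket_id,
                 control_info=b"", reserved=0):
    """Build a control packet: 16-byte header plus optional control info."""
    # Flag bit 1, then the 15-bit type, then the 16 reserved bits.
    word0 = (1 << 31) | ((ptype & 0x7FFF) << 16) | (reserved & 0xFFFF)
    header = struct.pack("!IIII", word0, additional_info & 0xFFFFFFFF,
                         timestamp & 0xFFFFFFFF, dst_socket_id & 0xFFFFFFFF)
    return header + control_info

def unpack_control(buf):
    """Parse a control packet; the control info is returned as raw bytes."""
    word0, info, timestamp, dst = struct.unpack("!IIII", buf[:16])
    if not word0 >> 31:
        raise ValueError("flag bit is 0: this is a data packet")
    return {
        "type": (word0 >> 16) & 0x7FFF,
        "reserved": word0 & 0xFFFF,
        "additional_info": info,
        "timestamp": timestamp,
        "dst_socket_id": dst,
        "control_info": buf[16:],
    }

def pack_minimal_ack(ack_seq, ack_no, timestamp, dst_socket_id):
    """ACK with only field 1): the ACK number. Optional fields are omitted."""
    return pack_control(ACK, ack_seq, timestamp, dst_socket_id,
                        struct.pack("!I", ack_no))
```

The ACK sequence number travels in the Additional Info word, while the ACK number itself is the first word of the control information field.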
3. UDP Multiplexer
A UDP multiplexer is used to handle concurrent UDT connections sharing the same UDP port. The multiplexer dispatches incoming UDT packets to the corresponding UDT sockets according to the destination socket ID in the packet header.
One multiplexer is used for all UDT connections bound to the same UDP port. That is, UDT sockets on different UDP ports will be handled by different multiplexers.
A multiplexer maintains two queues. The sending queue includes the sockets with at least one packet scheduled for sending. The UDT sockets in the sending queue are ordered by the next packet sending time. A high performance timer is maintained by the sending queue and when it is time for the first socket in the queue to send its packet, the packet will be sent and the socket will be removed. If there are more packets for that socket to be sent, the socket will be re-inserted to the queue.
The receiving queue reads incoming packets and dispatches them to the corresponding sockets. If the destination ID is 0, the packet will be sent to the listening socket (if there is any), or to a socket that is in rendezvous connection phase. (See Section 5.)
Similar to the sending queue, the receiving queue also maintains a list of sockets waiting for incoming packets. The receiving queue scans the list to check if any timer expires for each socket every SYN (SYN = 0.01 second, defined in Section 4).
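The two queues can be sketched roughly as below. The class and method names are hypothetical, a heap is one possible way (an assumption, not stated in this document) to keep sockets ordered by next sending time, and rendezvous sockets and the per-SYN timer scan are left out for brevity.

```python
import heapq

class Multiplexer:
    """Toy multiplexer for the UDT sockets sharing one UDP port."""

    def __init__(self):
        self.inboxes = {}       # socket ID -> list of received packets
        self.listener = None    # inbox of the listening socket, if any
        self.send_queue = []    # heap of (next_send_time, socket_id)

    def register(self, socket_id):
        self.inboxes[socket_id] = []

    def dispatch(self, dst_socket_id, packet):
        """Receiving queue: hand a packet to its socket by destination ID."""
        if dst_socket_id == 0:
            inbox = self.listener           # ID 0 goes to the listener
        else:
            inbox = self.inboxes.get(dst_socket_id)
        if inbox is None:
            return False                    # no matching socket: drop
        inbox.append(packet)
        return True

    def schedule(self, socket_id, next_send_time):
        """Sending queue: (re-)insert a socket that has a packet to send."""
        heapq.heappush(self.send_queue, (next_send_time, socket_id))

    def pop_due(self, now):
        """Remove and return the first socket whose send time has come."""
        if self.send_queue and self.send_queue[0][0] <= now:
            return heapq.heappop(self.send_queue)[1]
        return None
```

A socket returned by pop_due would send one packet and call schedule again if it has more packets to send.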
4. Timers
UDT uses four timers to trigger different periodical events. Each event has its own period and they are all independent. They use the system time as origins and should process wrapping if the system time wraps.
For a certain periodical event E in UDT, suppose the time variable is ET and its period is p. If E is set or reset at system time t0 (ET = t0), then at any time t1, (t1 - ET >= p) is the condition to check if E should be triggered.
The four timers are ACK, NAK, EXP and SND. SND is used in the sender only for rate-based packet sending (see Section 6.1), whereas the other three are used in the receiver only.
ACK is used to trigger an acknowledgement (ACK). Its period is set by the congestion control module. However, UDT will send an ACK at least once every 0.01 second, even if the congestion control does not need timer-based ACKs. Here, 0.01 second is defined as the SYN time, or synchronization time, and it affects many of the other timers used in UDT.
NAK is used to trigger a negative acknowledgement (NAK). Its period is dynamically updated to 4 * RTT + RTTVar + SYN, where RTTVar is the variance of RTT samples.
EXP is used to trigger data packet retransmission and maintain connection status. Its period is dynamically updated to N * (4 * RTT + RTTVar + SYN), where N is the number of continuous timeouts. To avoid unnecessary timeouts, a minimum threshold (e.g., 0.5 second) should be used in the implementation.
The recommended granularity of their periods is microseconds. However, accurate time keeping is not necessary, except for SND.
In the rest of this document, a name of a time variable will be used to represent the associated event, the variable itself, or the value of its period, depending on the context. For example, ACK can mean either the ACK event or the value of ACK period.
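The trigger condition and the NAK and EXP periods can be written out as a small Python sketch. The function names are hypothetical, times are in seconds, system time wrapping is not handled, and applying the 0.5 second floor to the whole EXP period is an assumption based on the example threshold above.

```python
SYN = 0.01  # synchronization time, in seconds

def is_due(now, et, period):
    """Event E set at time ET with period p triggers when t1 - ET >= p."""
    return now - et >= period

def nak_period(rtt, rtt_var):
    """NAK period: 4 * RTT + RTTVar + SYN."""
    return 4 * rtt + rtt_var + SYN

def exp_period(rtt, rtt_var, timeouts, min_period=0.5):
    """EXP period: N * (4 * RTT + RTTVar + SYN), floored at min_period."""
    return max(timeouts * (4 * rtt + rtt_var + SYN), min_period)
```

With RTT = 0.1 s and RTTVar = 0.05 s, the NAK period is about 0.46 s, so the first EXP period is raised to the 0.5 s floor and the second is about 0.92 s.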
5. Connection Setup and Shutdown
UDT supports two different connection setup methods, the traditional client/server mode and the rendezvous mode. In the latter mode, both UDT sockets connect to each other at (approximately) the same time.
The UDT client (in rendezvous mode, both peers are clients) sends a handshake request (type 0 control packet) to the server or the peer side. The handshake packet has the following information (suppose UDT socket A sends this handshake to B):
1) UDT version: this value is for compatibility purpose. The current version is 4.
2) Socket Type: STREAM (0) or DGRAM (1).
3) Initial Sequence Number: It is the sequence number for the first data packet that A will send out. This should be a random value.
4) Packet Size: the maximum size of a data packet (including all headers). This is usually the value of MTU.
5) Maximum Flow Window Size: This value may not be necessary; however, it is needed in the current reference implementation.
6) Connection Type. This information is used to differentiate the connection setup modes and request/response.
7) Socket ID. The client UDT socket ID.
8) Cookie. This is a cookie value used to avoid SYN flooding attack [RFC4987].
9) Peer IP address: B's IP address.
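The nine handshake fields can be serialized as in the following Python sketch. The field names are hypothetical, network byte order is assumed, and placing an IPv4 address in the first 4 bytes of the 128-bit field with zero padding is an assumption; this document does not specify that layout.

```python
import ipaddress
import struct

# Eight 32-bit fields (connection type is signed) plus a 128-bit address.
HANDSHAKE_FMT = "!IIIIIiII16s"
FIELDS = ("version", "socket_type", "initial_seq", "packet_size",
          "flow_window", "conn_type", "socket_id", "cookie", "peer_ip")

def pack_handshake(version, socket_type, initial_seq, packet_size,
                   flow_window, conn_type, socket_id, cookie, peer_ip):
    """Serialize the handshake control information (48 bytes)."""
    addr = ipaddress.ip_address(peer_ip).packed.ljust(16, b"\x00")
    return struct.pack(HANDSHAKE_FMT, version, socket_type, initial_seq,
                       packet_size, flow_window, conn_type, socket_id,
                       cookie, addr)

def unpack_handshake(buf):
    """Parse 48 bytes of handshake control information into a dict."""
    return dict(zip(FIELDS, struct.unpack(HANDSHAKE_FMT, buf[:48])))
```

The connection type is unpacked as a signed integer because the setup procedures use the values 1, 0, -1, and -2.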
5.1 Client/Server Connection Setup
One UDT entity starts first as the server (listener). The server accepts and processes incoming connection requests, and creates a new UDT socket for each new connection. A client that wants to connect to the server will send a handshake packet first. The client should keep sending the handshake packet at a constant interval until it receives a response handshake from the server or a timeout timer expires.
When the server first receives the connection request from a client, it generates a cookie value according to the client address and a secret key and sends it back to the client. The client must then send back the same cookie to the server.
The server, when receiving a handshake packet and the correct cookie, compares the packet size and maximum window size with its own values and sets its own values to the smaller ones. The resulting values are also sent back to the client in a response handshake packet, together with the server's version and initial sequence number. The server is ready for sending/receiving data right after this step is finished. However, it must send back a response packet whenever it receives any further handshakes from the same client.
The client can start sending/receiving data once it gets a response handshake packet from the server. Further response handshake messages, if any are received, should be ignored. The connection type from the client should be set to 1 and the response from the server should be set to -1. The client should also check if the response is from the server that the original request was sent to.
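The server-side cookie and parameter negotiation might look like the sketch below. The text only says the cookie is derived from the client address and a secret key; using HMAC-SHA256 truncated to 32 bits is an assumption made here for illustration, and all function names are hypothetical.

```python
import hashlib
import hmac

def make_cookie(secret, client_ip, client_port):
    """Derive a 32-bit cookie from the client address and a secret key."""
    msg = f"{client_ip}:{client_port}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def check_cookie(secret, client_ip, client_port, cookie):
    """Verify the cookie echoed back by the client."""
    expected = make_cookie(secret, client_ip, client_port)
    return hmac.compare_digest(expected.to_bytes(4, "big"),
                               (cookie & 0xFFFFFFFF).to_bytes(4, "big"))

def negotiate(own_packet_size, own_window, req_packet_size, req_window):
    """Take the smaller packet size and the smaller flow window."""
    return min(own_packet_size, req_packet_size), min(own_window, req_window)
```

Because the cookie is recomputed from the address, the server does not need to store any state for a client until the cookie has been echoed back.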
5.2 Rendezvous Connection Setup
In this mode, both clients send a connect request to each other at the same time. The initial connection type is set to 0. Once a peer receives a connection request, it sends back a response. If the connection type is 0, then the response sends back -1; if the connection type is -1, then the response sends back -2; No response will be sent for -2 request.
The rendezvous peer performs the same checks on the handshake messages (version, packet size, window size, etc.) as described in Section 5.1. In addition, the peer only processes connection requests from the address to which it has sent a connection request. Finally, a rendezvous connection should be rejected by a regular UDT server (listener).
A peer initializes the connection when it receives -1 response. The rendezvous connection setup is useful when both peers are behind firewalls. It can also provide better security and usability when a listening server is not desirable.
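The connection type exchange in rendezvous mode reduces to a small mapping, sketched here in Python with hypothetical function names.

```python
def rendezvous_reply(request_type):
    """Connection type to send back, or None when no response is sent."""
    if request_type == 0:
        return -1
    if request_type == -1:
        return -2
    return None            # a -2 request is not answered

def initializes_connection(received_type):
    """A peer initializes the connection when it receives a -1 response."""
    return received_type == -1
```

For example, a peer that receives the initial request (type 0) answers with -1, and the other side initializes the connection when that -1 arrives.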
5.3 Shutdown
If one of the connected UDT entities is being closed, it will send a shutdown message to the peer side. The peer side, after receiving this message, will also be closed. This shutdown message, delivered using UDP, is sent only once and is not guaranteed to be received. If the message is not received, the peer side will be closed after 16 continuous EXP timeouts (see Section 4). However, the total timeout value should be between a minimum threshold and a maximum threshold. In our reference implementation, we use 3 seconds and 30 seconds, respectively.
6. Data Sending and Receiving
Each UDT entity has two logical parts: the sender and the receiver. The sender sends (and retransmits) application data according to the flow control and congestion control. The receiver receives both data packets and control packets, and sends out control packets according to the received packets and the timers. The receiver is responsible for triggering and processing all control events, including congestion control and reliability control, and their related mechanisms.
UDT always tries to pack application data into fixed size packets (the maximum packet size negotiated during connection setup), unless there is not enough data to be sent. We explained the rationale of some of the UDT data sending/receiving schemes in [GHG04b].
6.1 The Sender's Algorithm
Data Structures and Variables:
1. Sender's Loss List: The sender's loss list is used to store the sequence numbers of the lost packets fed back by the receiver through NAK packets or inserted in a timeout event. The numbers are stored in increasing order.
Data Sending Algorithm:
1) If the sender's loss list is not empty, retransmit the first packet in the list and remove it from the list. Go to 5).
2) In messaging mode, if a packet has been in the loss list for longer than the application specified TTL (time-to-live), send a message drop request and remove all related packets from the loss list. Go to 1).
3) Wait until there is application data to be sent.
4) a. If the number of unacknowledged packets exceeds the flow/congestion window size, wait until an ACK comes. Go to 1).
b. Pack a new data packet and send it out.
5) If the sequence number of the current packet is 16n, where n is an integer, go to 2).
6) Wait (SND - t) time, where SND is the inter-packet interval updated by congestion control and t is the total time used by step 1 to step 5. Go to 1).
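One pass through steps 1 to 5 can be sketched as a small state machine. This is a simplified illustration with hypothetical names: the messaging-mode TTL check of step 2 is omitted, and the caller is expected to perform the actual I/O and the SND wait of step 6.

```python
from collections import deque

class Sender:
    def __init__(self, isn, window):
        self.loss_list = []       # lost sequence numbers, increasing order
        self.pending = deque()    # application payloads not yet sent
        self.next_seq = isn       # sequence number of the next new packet
        self.unacked = 0          # packets sent but not yet acknowledged
        self.window = window      # flow/congestion window, in packets

    def step(self):
        """Run steps 1, 3, 4 and 5 once; return (action, seq, pace)."""
        if self.loss_list:                      # step 1: retransmit first
            seq = self.loss_list.pop(0)
            action = "retransmit"
        elif not self.pending:                  # step 3: nothing to send
            return ("wait_data", None, False)
        elif self.unacked >= self.window:       # step 4a: window is full
            return ("wait_ack", None, False)
        else:                                   # step 4b: send a new packet
            self.pending.popleft()
            seq = self.next_seq
            self.next_seq = (seq + 1) & 0x7FFFFFFF
            self.unacked += 1
            action = "send"
        # Step 5: after sequence number 16n the SND wait of step 6 is skipped.
        pace = seq % 16 != 0
        return (action, seq, pace)
```

When pace is False the caller sends the next packet immediately, so packets 16n and 16n + 1 leave back to back.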
6.2 The Receiver's Algorithm
Data Structures and Variables:
1) Receiver's Loss List:
It is a list of tuples whose values include:
the sequence numbers of detected lost data packets, the latest feedback time of each tuple, and a parameter k that is the number of times each one has been fed back in NAK. Values are stored in the increasing order of packet sequence numbers.
2) ACK History Window:
A circular array that records each sent ACK and the time it is sent out. The most recent value will overwrite the oldest one if there is no more free space in the array.
3) PKT History Window:
A circular array that records the arrival time of each data packet.
4) Packet Pair Window:
A circular array that records the time interval between each probing packet pair.
5) LRSN:
A variable to record the largest received data packet sequence number. LRSN is initialized to the initial sequence number minus 1.
6) ExpCount:
A variable to record number of continuous EXP time-out events.
Data Receiving Algorithm:
1) Query the system time to check if ACK, NAK, or EXP timer has expired. If there is any, process the event (as described below in this section) and reset the associated time variables. For ACK, also check the ACK packet interval.
2) Start time bounded UDP receiving. If no packet arrives, go to 1). Otherwise, reset the ExpCount to 1; if there is no unacknowledged data packet, or if this is an ACK or NAK control packet, also reset the EXP timer.
3) Check the flag bit of the packet header. If it is a control packet, process it according to its type and go to 1).
4) If the sequence number of the current data packet is 16n + 1, where n is an integer, record the time interval between this packet and the last data packet in the Packet Pair Window.
5) Record the packet arrival time in PKT History Window.
6) a. If the sequence number of the current data packet is greater than LRSN + 1, put all the sequence numbers between (but excluding) these two values into the receiver's loss list and send them to the sender in an NAK packet.
b. If the sequence number is less than LRSN, remove it from the receiver's loss list.
7) Update LRSN. Go to 1).
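Steps 6 and 7 of the receiving algorithm can be sketched as follows. The function name is hypothetical, the loss list is reduced to a plain set of sequence numbers (feedback times and k are left out), and sequence number wrap-around is ignored for brevity.

```python
def on_data_packet(seq, lrsn, loss_list):
    """Apply steps 6 and 7; return (new_lrsn, sequence numbers to NAK)."""
    to_nak = []
    if seq > lrsn + 1:
        # Step 6a: everything strictly between LRSN and seq is missing.
        to_nak = list(range(lrsn + 1, seq))
        loss_list.update(to_nak)
    elif seq < lrsn:
        # Step 6b: a retransmitted packet fills a hole.
        loss_list.discard(seq)
    # Step 7: LRSN only ever moves forward.
    return max(lrsn, seq), to_nak
```

With LRSN = 1, receiving packet 5 reports 2, 3 and 4 as lost; a later arrival of packet 3 removes it from the loss list without changing LRSN.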
ACK Event Processing:
1) Find the sequence number prior to which all the packets have been received by the receiver (ACK number) according to the following rule:
if the receiver's loss list is empty, the ACK number is LRSN + 1; otherwise it is the smallest sequence number in the receiver's loss list.
2) If (a) the ACK number equals to the largest ACK number ever acknowledged by ACK2, or (b) it is equal to the ACK number in the last ACK and the time interval between this two ACK packets is less than 2 RTTs, stop (do not send this ACK).
3) Assign this ACK a unique increasing ACK sequence number. Pack the ACK packet with RTT, RTT Variance, and flow window size (available receiver buffer size). If this ACK is not triggered by ACK timers,send out this ACK and stop.
4) Calculate the packet arrival speed according to the following algorithm:
Calculate the median value (AI) of the last 16 packet arrival intervals using the values stored in the PKT History Window. Of these 16 values, remove those that are either greater than AI*8 or less than AI/8. If more than 8 values remain, calculate their average AI'; the packet arrival speed is then 1/AI' (number of packets per second). Otherwise, return 0.
5) Calculate the estimated link capacity according to the following algorithm:
Calculate the median value of the last 16 packet pair intervals (PI) using the values in Packet Pair Window, and the link capacity is 1/PI (number of packets per second).
6) Pack the packet arrival speed and estimated link capacity into the ACK packet and send it out.
7) Record the ACK sequence number, ACK number and the departure time of this ACK in the ACK History Window.
ACK事件处理:
1)根据以下规则找出接收端已收到其之前所有数据包的那个序列号(ACK号):如果接收端丢失列表为空,ACK号为LRSN + 1;否则为接收端丢失列表中最小的序列号。
2)如果(a)该ACK号等于曾被ACK2确认过的最大ACK号,或者(b)它等于上一个ACK中的ACK号,且这两个ACK包之间的时间间隔小于2个RTT,则停止(不发送此ACK)。
3)为这个ACK分配一个唯一且递增的ACK序列号。将RTT、RTT方差和流量窗口大小(接收端可用缓冲区大小)打包进ACK包。如果此ACK不是由ACK定时器触发的,发出此ACK并停止。
4)按以下算法计算包到达速度:
使用PKT历史窗口中保存的值,计算最近16个包到达时间间隔的中值(AI)。在这16个值中,去掉大于AI*8或小于AI/8的值。如果剩余的值多于8个,计算剩余值的平均值AI',包到达速度为1/AI'(每秒数据包数)。否则,返回0。
5)按以下算法计算估计的链路容量:使用包对窗口中的值,计算最近16个包对时间间隔的中值(PI),链路容量为1/PI(每秒数据包数)。
6)将包到达速度和估计的链路容量打包进ACK包并发送。
7)在ACK历史窗口中记录此ACK的ACK序列号、ACK号和发送时间。
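The ACK number rule (step 1) and the two median-based estimates (steps 4 and 5) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, intervals are given in seconds as plain lists, and sequence-number wraparound is ignored.

```python
from statistics import median


def ack_number(lrsn, loss_list):
    """Step 1: smallest lost number, or LRSN + 1 if nothing is missing."""
    return min(loss_list) if loss_list else lrsn + 1


def arrival_speed(intervals):
    """Step 4: median-filtered packet arrival speed in packets per second.

    `intervals` holds the last 16 packet arrival intervals, in seconds.
    """
    ai = median(intervals)
    kept = [x for x in intervals if ai / 8 <= x <= ai * 8]
    if len(kept) <= 8:
        return 0.0          # too few samples survive the filter
    return 1.0 / (sum(kept) / len(kept))


def link_capacity(pair_intervals):
    """Step 5: estimated link capacity in packets per second."""
    return 1.0 / median(pair_intervals)


# 14 regular 1 ms gaps plus two outliers that the filter discards
speed = arrival_speed([0.001] * 14 + [0.1, 0.00001])
# half the samples are outliers: only 8 values survive, so the result is 0
unreliable = arrival_speed([0.001] * 8 + [1.0] * 8)
capacity = link_capacity([0.002] * 16)
```

The median filter keeps a few unusually long or short gaps (for example, gaps caused by the sender pausing) from distorting the average.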
NAK Event Processing:
Search the receiver's loss list, find out all those sequence numbers whose last feedback time is k*RTT before, where k is initialized as 2 and increased by 1 each time the number is fed back. Compress (according to section 6.4) and send these numbers back to the sender in an NAK packet.
EXP Event Processing:
1) Put all the unacknowledged packets into the sender's loss list.
2) If (ExpCount > 16) and at least 3 seconds have elapsed since the last time ExpCount was reset to 1, or if 3 minutes have elapsed, close the UDT connection and exit.
3) If the sender's loss list is empty, send a keep-alive packet to the peer side.
4) Increase ExpCount by 1.
NAK事件处理:
搜索接收端丢失列表,找出所有上次反馈时间距今已达k*RTT的序列号,其中k初始为2,该序列号每被反馈一次k加1。将这些序列号压缩(见6.4节)后以NAK数据包发回发送端。
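The growing k*RTT back-off in NAK event processing can be sketched as follows. This is a hypothetical illustration: `LossEntry` and `due_for_nak` are invented names, times are integer milliseconds, and "k*RTT before" is assumed to mean "at least k*RTT ago".

```python
class LossEntry:
    """Hypothetical per-sequence-number record in the receiver's loss list."""

    def __init__(self, seq, now):
        self.seq = seq
        self.last_feedback = now   # time this loss was last reported
        self.k = 2                 # initial multiplier, as stated in the text


def due_for_nak(entries, now, rtt):
    """Return the sequence numbers to report again and update their state."""
    due = []
    for e in entries:
        if now - e.last_feedback >= e.k * rtt:
            due.append(e.seq)
            e.last_feedback = now
            e.k += 1               # wait one more RTT before the next report
    return due


entries = [LossEntry(7, now=0)]
rtt = 100  # milliseconds
reports = [due_for_nak(entries, t, rtt) for t in (100, 200, 400, 500)]
```

With an RTT of 100 ms, the loss is reported again after 200 ms and then only after a further 300 ms, so repeated NAKs for the same packet become progressively less frequent.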
EXP 事件处理:
1)将所有未确认的数据包放入发送端丢失列表。
2)如果(ExpCount > 16)且自上次ExpCount被重置为1以来至少已过去3秒,或者已过去3分钟,关闭UDT连接并退出。
3)如果发送端丢失列表为空,向对端发送一个心跳(keep-alive)包。
4)ExpCount加1。
On ACK packet received:
1) Update the largest acknowledged sequence number.
2) Send back an ACK2 with the same ACK sequence number in this ACK.
3) Update RTT and RTTVar.
4) Update both ACK and NAK period to 4 * RTT + RTTVar + SYN.
5) Update flow window size.
6) If this is a Light ACK, stop.
7) Update packet arrival rate: A = (A * 7 + a) / 8, where a is the value carried in the ACK.
8) Update estimated link capacity: B = (B * 7 + b) / 8, where b is the value carried in the ACK.
9) Update sender's buffer (by releasing the buffer that has been acknowledged).
10) Update sender's loss list (by removing all those that have been acknowledged).
收到ACK数据包:
1)更新已确认的最大序列号。
2)发回一个ACK2,其ACK序列号与此ACK中的相同。
3)更新RTT和RTTVar。
4)将ACK和NAK周期更新为4 * RTT + RTTVar + SYN。
5)更新流量窗口大小。
6)如果这是一个Light ACK,则停止。
7)更新包到达速率:A = (A * 7 + a) / 8,其中a的值取自ACK。
8)更新估计的链路容量:B = (B * 7 + b) / 8,其中b的值取自ACK。
9)更新发送端缓冲(释放已被确认的缓冲)。
10)更新发送端丢失列表(移除所有已被确认的序列号)。
On NAK packet received:
1) Add all sequence numbers carried in the NAK into the sender's loss list.
2) Update the SND period by rate control (see section 3.6).
3) Reset the EXP time variable.
收到NAK数据包:
1)将NAK中携带的所有序列号加入发送端丢失列表。
2)通过速率控制更新SND周期(见3.6节)。
3)重置EXP时间变量。
On ACK2 packet received:
1) Locate the related ACK in the ACK History Window according to the ACK sequence number in this ACK2.
2) Update the largest ACK number that has ever been acknowledged.
3) Calculate the new rtt according to the ACK2 arrival time and the ACK departure time, and update the RTT value as: RTT = (RTT * 7 + rtt) / 8.
4) Update RTTVar by: RTTVar = (RTTVar * 3 + abs(RTT - rtt)) / 4.
5) Update both ACK and NAK period to 4 * RTT + RTTVar + SYN.
接收到ACK2:
1)根据此ACK2中的ACK序列号,在ACK历史窗口中找到对应的ACK。
2)更新曾被确认过的最大ACK号。
3)根据ACK2到达的时间和ACK发出的时间计算新的rtt,并更新RTT值:RTT = (RTT * 7 + rtt) / 8。
4)更新RTTVar:RTTVar = (RTTVar * 3 + abs(RTT - rtt)) / 4。
5)将ACK和NAK周期更新为4 * RTT + RTTVar + SYN。
On message drop request received:
1) Tag all packets belonging to the message in the receiver buffer so that they will not be read.
2) Remove all corresponding packets from the receiver's loss list.
接收到丢弃消息请求:
1)标记接收缓冲中属于该消息的所有数据包,使其不再被读取。
2)从接收端丢失列表中移除所有相应的数据包。
On Keep-alive packet received:
Do nothing.
接收到心跳包:
不做任何事。
On Handshake/Shutdown packet received:
See Section 5.
接收到握手/关闭包:
见第5节。
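The RTT smoothing performed when an ACK2 arrives can be sketched as below. The function name `on_ack2` is hypothetical, all values are in seconds, and the SYN time of 0.01 s is an assumption because it is not defined in this section. The sketch applies the steps in the order they are listed, so RTTVar uses the already-updated RTT; the text does not say whether the old or the new RTT is intended, so this ordering is also an assumption.

```python
def on_ack2(rtt, rtt_var, sample, syn=0.01):
    """Apply ACK2 steps 3 to 5; `sample` is the newly measured rtt."""
    rtt = (rtt * 7 + sample) / 8                       # step 3
    rtt_var = (rtt_var * 3 + abs(rtt - sample)) / 4    # step 4
    period = 4 * rtt + rtt_var + syn                   # step 5: ACK/NAK period
    return rtt, rtt_var, period


# smoothed RTT 100 ms, variance 50 ms, new sample 180 ms
new_rtt, new_var, new_period = on_ack2(0.100, 0.050, 0.180)
```

Here a single 180 ms sample moves the smoothed RTT from 100 ms to only 110 ms, which shows how the 7/8 weighting damps short-lived spikes.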
6.3 Flow Control 流量控制
The flow control window size is 16 initially.
On ACK packet received: The flow window size is updated to the receiver's available buffer size.
流量控制窗口大小初始为16。
收到ACK包时:流量窗口大小更新为接收端可用缓冲区大小。
6.4 Loss Information Compression Scheme 丢失信息压缩方案
The loss information carried in an NAK packet is an array of 32-bit integers. If an integer in the array is a normal sequence number (1st bit is 0), it means that the packet with this sequence number is lost; if the 1st bit is 1, it means all the packets starting from (including) this number to (including) the next number in the array (whose 1st bit must be 0) are lost.
For example, the following information carried in an NAK:
0x00000002, 0x80000006, 0x0000000B, 0x0000000E
means packets with sequence number 2, 6, 7, 8, 9, 10, 11, and 14 are lost.
NAK数据包中携带的丢失信息是一个32位整数数组。如果数组中的某个整数是正常的序列号(第一位是0),表示该序列号对应的数据包丢失了;如果第一位是1,表示从这个序列号(含)到数组中下一个数(含,其第一位必须为0)之间的所有数据包都丢失了。
例如,NAK中携带的如下信息:
0x00000002, 0x80000006, 0x0000000B, 0x0000000E
表示序列号为2、6、7、8、9、10、11和14的数据包丢失了。
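The compression scheme can be sketched as a pair of helper functions. The names `compress` and `decompress` are hypothetical; the sketch assumes the input is a sorted list of 31-bit sequence numbers and ignores wraparound.

```python
RANGE_FLAG = 0x80000000   # first (most significant) bit of a 32-bit integer


def compress(lost):
    """Encode a sorted list of lost sequence numbers as 32-bit integers."""
    out = []
    i = 0
    while i < len(lost):
        j = i
        # extend j to the end of the run of consecutive numbers
        while j + 1 < len(lost) and lost[j + 1] == lost[j] + 1:
            j += 1
        if j > i:
            out.append(lost[i] | RANGE_FLAG)   # range start, flag set
            out.append(lost[j])                # range end, flag clear
        else:
            out.append(lost[i])                # single lost packet
        i = j + 1
    return out


def decompress(words):
    """Expand the NAK array back into individual sequence numbers."""
    lost = []
    it = iter(words)
    for w in it:
        if w & RANGE_FLAG:
            start = w & 0x7FFFFFFF
            end = next(it)                     # the following entry closes the range
            lost.extend(range(start, end + 1))
        else:
            lost.append(w)
    return lost


encoded = compress([2, 6, 7, 8, 9, 10, 11, 14])
decoded = decompress([0x00000002, 0x80000006, 0x0000000B, 0x0000000E])
```

Running the example from the text through `decompress` yields 2, 6, 7, 8, 9, 10, 11, and 14, and `compress` produces the same four-word array in the other direction.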
7. Configurable Congestion Control (CCC)
7. 可配置的拥塞控制
The congestion control in UDT is an open framework so that user-defined control algorithms can be easily implemented and switched. In particular, the native control algorithm is also implemented by this framework. A user-defined algorithm may redefine several control routines to read and adjust several UDT parameters. The routines are called when certain events occur. For example, when an ACK is received, the control algorithm may increase the congestion window size.
UDT里的拥塞控制是一个开放的框架,便于实现和切换用户自定义的控制算法。UDT自带的控制算法也是基于这个框架实现的。用户自定义算法可以重定义若干控制例程(成员函数),以读取和调整若干UDT参数。这些例程将在特定事件发生时被调用。例如,当收到ACK时,控制算法可以增大拥塞窗口大小。
7.1 CCC Interface CCC接口
UDT allows users to access two congestion control parameters: the congestion window size and the inter-packet sending interval. Users may adjust these two parameters to realize window-based control, rate-based control, or a hybrid approach.
In addition, the following parameters should also be exposed.
1) RTT
2) Maximum Segment/Packet Size
3) Estimated Bandwidth
4) The latest packet sequence number that has been sent so far
5) Packet arriving rate at the receiver side
UDT允许用户访问两个拥塞控制参数:拥塞窗口大小和包间发送间隔。用户可以调整这两个参数来实现基于窗口的控制、基于速率的控制或两者混合的方式。
另外,下面的参数也应该暴露出来。
1)RTT
2)最大分段/数据包大小
3)带宽估值
4)到目前为止已发送的最新数据包序列号
5)接收端的数据包到达速率
A UDT implementation may expose additional parameters as well. This information can be used in user-defined congestion control algorithms to adjust the packet sending rate.
The following control events can be redefined via CCC (e.g., by a callback function).
1) init: when the UDT socket is connected.
2) close: when the UDT socket is closed.
3) onACK: when ACK is received.
4) onLOSS: when NAK is received.
5) onTimeout: when timeout occurs.
6) onPktSent: when a data packet is sent.
7) onPktRecv: when a data packet is received.
一个UDT的实现也可以暴露其他参数。用户自定义的拥塞控制算法可以利用这些信息来调整数据包发送速率。
下面的控制事件可以通过CCC重定义(例如通过回调函数)。
1)init:当UDT套接字连接上时。
2)close:当UDT套接字关闭时。
3)onACK:当接收到ACK时。
4)onLOSS:当接收到NAK时。
5)onTimeout:发生超时时。
6)onPktSent:当一个数据包发送后。
7)onPktRecv:当一个数据包接收后。
Users can also adjust the following parameters in the user-defined control algorithms.
1) ACK interval: An ACK may be sent every fixed number of packets. Users may define this interval. If this value is -1, no ACK will be sent based on packet interval.
2) ACK Timer: An ACK will also be sent every fixed time interval. This is mandatory in UDT. The maximum and default ACK time interval is SYN.
3) RTO: UDT uses 4 * RTT + RTTVar to compute RTO. Users may redefine this.
Detailed description and discussion of UDT/CCC can be found in [GG05].
用户还可以在自定义控制算法中调整下面的参数。
1)ACK间隔:可以每收到固定数量的数据包发送一个ACK。用户可以定义这个间隔。如果它的值为-1,表示不会基于包间隔发送ACK。
2)ACK定时器:每隔固定的时间间隔也会发送一个ACK。这在UDT中是强制的。最大的和默认的ACK时间间隔是SYN。
3)RTO:UDT使用4 * RTT + RTTVar计算RTO。用户可以重新定义它。
UDT/CCC的详细说明和讨论见[GG05]。
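The CCC interface described in this section can be pictured as a base class with one overridable method per control event. The sketch below is hypothetical: the class names, method signatures, and attribute names are invented for illustration and are not the API of any UDT implementation, and `SimpleAimd` is a toy window-based policy, not UDT's native algorithm.

```python
class CCC:
    """Hypothetical base class: one no-op method per control event."""

    def __init__(self):
        self.cwnd = 16.0         # congestion window size, in packets (assumed start)
        self.snd_period = 0.0    # inter-packet sending interval, in seconds
        self.ack_interval = -1   # -1 disables ACKs based on packet count

    def init(self): pass
    def close(self): pass
    def on_ack(self, ack_seq): pass
    def on_loss(self, lost_seqs): pass
    def on_timeout(self): pass
    def on_pkt_sent(self, seq): pass
    def on_pkt_recv(self, seq): pass


class SimpleAimd(CCC):
    """Toy window-based control that only overrides two events."""

    def on_ack(self, ack_seq):
        self.cwnd += 1.0                       # additive increase per ACK

    def on_loss(self, lost_seqs):
        self.cwnd = max(self.cwnd / 2, 1.0)    # multiplicative decrease


cc = SimpleAimd()
cc.init()
cc.on_ack(100)
cc.on_ack(200)
cc.on_loss([250, 251])
```

Because `SimpleAimd` never touches `snd_period`, it is purely window-based; a rate-based policy would instead adjust `snd_period`, and a hybrid one would adjust both.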
7.2 UDT's Native Control Algorithm UDT 默认控制算法
UDT has a native and default control algorithm, which will be used if no user-defined algorithm is implemented and configured. The native UDT algorithm should be implemented using CCC.
UDT's native algorithm is a hybrid congestion control algorithm, hence it adjusts both the congestion window size and the inter-packet interval. The native algorithm uses timer-based ACK and the ACK interval is SYN.
The initial congestion window size is 16 packets and the initial inter-packet interval is 0. The algorithm starts with the Slow Start phase, which lasts until the first ACK or NAK arrives.
UDT有一个自带的默认控制算法,如果没有实现和配置用户自定义算法,就会使用它。UDT的默认算法应使用CCC来实现。
UDT的默认算法是一种混合拥塞控制算法,因此它既调整拥塞窗口大小,也调整包间发送间隔。默认算法使用基于定时器的ACK,ACK间隔为SYN。
初始拥塞窗口大小是16个数据包,初始包间间隔是0。算法从慢启动阶段开始,直到第一个ACK或NAK到达为止。
On ACK packet received:
1) If the current status is in the slow start phase, set the congestion window size to the product of packet arrival rate and (RTT + SYN). Slow Start ends. Stop.
2) Set the congestion window size (CWND) to: CWND = A * (RTT + SYN) + 16.
3) The number of sent packets to be increased in the next SYN period (inc) is calculated as:
      if (B <= C)
         inc = 1/PS;
      else
         inc = max(10^(ceil(log10((B-C)*PS*8))) * Beta/PS, 1/PS);
   where B is the estimated link capacity and C is the current sending speed. Both are counted in packets per second. PS is the fixed size of a UDT packet counted in bytes. Beta is a constant value of 0.0000015.
4) The SND period is updated as: SND = (SND * SYN) / (SND * inc + SYN).
当接收到一个ACK包:
1)如果当前处于慢启动阶段,将拥塞窗口大小设置为包到达速率与(RTT + SYN)的乘积。慢启动结束。停止。
2)设置拥塞窗口大小(CWND)为:CWND = A * (RTT + SYN) + 16。
3)下一个SYN周期内要增加的发送数据包数量(inc)计算方法如下:
      if (B <= C)
         inc = 1/PS;
      else
         inc = max(10^(ceil(log10((B-C)*PS*8))) * Beta/PS, 1/PS);
   其中B是链路容量估值,C是当前发送速率,二者都以每秒数据包数计。PS是固定的UDT数据包大小,以字节为单位。Beta是一个常数0.0000015。
4)SND周期更新计算方法:SND = (SND * SYN) / (SND * inc + SYN)。
These four parameters are used in rate decrease, and their initial values are in the parentheses: AvgNAKNum (1), NAKCount (1), DecCount (1), LastDecSeq (initial sequence number - 1).
We define a congestion period as the period between two NAKs in which the first biggest lost packet sequence number is greater than LastDecSeq, where LastDecSeq is the biggest sequence number at the time the packet sending rate was last decreased.
AvgNAKNum is the average number of NAKs in a congestion period. NAKCount is the current number of NAKs in the current period.
这四个参数用于降低速率,它们的初始值在括号中给出:AvgNAKNum (1),NAKCount (1),DecCount (1),LastDecSeq (初始序列号 - 1)。
我们将拥塞周期定义为两个NAK之间的时间段,这两个NAK中第一个最大的丢失数据包序列号大于LastDecSeq,LastDecSeq是上次降低数据包发送速率时的最大序列号。
AvgNAKNum是一个拥塞周期内NAK的平均个数。NAKCount是当前周期内目前的NAK个数。
On NAK packet received:
1) If it is in slow start phase, set inter-packet interval to 1/recvrate. Slow start ends. Stop.
2) If this NAK starts a new congestion period, increase inter-packet interval (snd) to snd = snd * 1.125; update AvgNAKNum, reset NAKCount to 1, and set DecRandom to a random (uniformly distributed) number between 1 and AvgNAKNum. Update LastDecSeq. Stop.
3) If DecCount <= 5, and NAKCount == DecCount * DecRandom:
   a. Update SND period: SND = SND * 1.125;
   b. Increase DecCount by 1;
   c. Record the current largest sent sequence number (LastDecSeq).
The native UDT control algorithm is designed for bulk data transfer over high BDP networks [GHG04a].
接收到NAK包:
1)如果处于慢启动阶段,将包间间隔设置为1 / recvrate。慢启动结束。停止。
2)如果这个NAK开始了一个新的拥塞周期,将包间间隔(snd)增大为snd = snd * 1.125;更新AvgNAKNum,将NAKCount重置为1,并将DecRandom设为1和AvgNAKNum之间的一个随机数(均匀分布)。更新LastDecSeq。停止。
3)如果DecCount <= 5,且NAKCount == DecCount * DecRandom:
   a.更新SND周期:SND = SND * 1.125;
   b.DecCount加1;
   c.记录当前已发送的最大序列号(LastDecSeq)。
默认的UDT控制算法是为高BDP网络上的大批量数据传输而设计的[GHG04a]。
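The rate increase on ACK (steps 3 and 4) and the basic rate decrease on NAK can be sketched numerically as follows. The function names are hypothetical, the SND period is assumed to be expressed in seconds, and the SYN time of 0.01 s is an assumption because it is not defined in this section. The NAK bookkeeping (AvgNAKNum, NAKCount, DecCount, DecRandom) is omitted.

```python
import math

BETA = 0.0000015   # constant given in the text
SYN = 0.01         # assumed SYN time, in seconds


def increase(b, c, ps):
    """Step 3: packets to add per SYN period; b and c in packets/s, ps in bytes."""
    if b <= c:
        return 1.0 / ps
    step = 10 ** math.ceil(math.log10((b - c) * ps * 8)) * BETA / ps
    return max(step, 1.0 / ps)


def update_snd_period(snd, inc, syn=SYN):
    """Step 4: new inter-packet interval after adding `inc` packets per SYN."""
    return (snd * syn) / (snd * inc + syn)


def decrease_snd_period(snd):
    """Rate decrease on a NAK that starts a new congestion period."""
    return snd * 1.125


# 1500-byte packets, 10000 packets/s of spare capacity (120 Mbit/s)
inc = increase(b=20000.0, c=10000.0, ps=1500)
snd = 0.0001                          # 100 packets per SYN period
faster = update_snd_period(snd, inc)  # about 101 packets per SYN period
slower = decrease_snd_period(snd)
```

With these numbers the spare capacity rounds up to 10^9 bit/s, so inc is 1 packet per SYN, and the updated period corresponds to sending 101 instead of 100 packets in each SYN interval. When B <= C the increase falls back to the minimum of 1/PS.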
Security Considerations
UDT's security mechanism is similar to that of TCP. Most of TCP's countermeasures against security attacks should also be implemented in UDT.
安全考虑
UDT 的安全机制类似 TCP,TCP 大多数针对安全攻击的对策也应在 UDT 中实现。
IANA Considerations
This document has no actions for IANA.
Normative References 引用标准
[RFC768] J. Postel, User Datagram Protocol, Aug. 1980.
Informative References
[RFC4987] W. Eddy, TCP SYN Flooding Attacks and Common Mitigations.
[GG07] Yunhong Gu and Robert L. Grossman, UDT: UDP-based Data Transfer for High-Speed Wide Area Networks, Computer Networks (Elsevier). Volume 51, Issue 7. May 2007.
[GG05] Yunhong Gu and Robert L. Grossman, Supporting Configurable Congestion Control in Data Transport Services, SC 2005, Nov 12 - 18, Seattle, WA, USA.
[GHG04b] Yunhong Gu, Xinwei Hong, and Robert L. Grossman, Experiences in Design and Implementation of a High Performance Transport Protocol, SC 2004, Nov 6 - 12, Pittsburgh, PA, USA.
[GHG04a] Yunhong Gu, Xinwei Hong, and Robert L. Grossman, An Analysis of AIMD Algorithms with Decreasing Increases, First Workshop on Networks for Grid Applications (Gridnets 2004), Oct. 29, San Jose, CA, USA.
[LM97] T. V. Lakshman and U. Madhow, The Performance of TCP/IP for Networks with High Bandwidth-Delay Products and Random Loss, IEEE/ACM Trans. on Networking, vol. 5, no. 3, July 1997, pp. 336-350.
[RFC5681] Allman, M., Paxson, V. and E. Blanton, TCP Congestion Control, September 2009.
[RFC4960] R. Stewart, Ed. Stream Control Transmission Protocol. September 2007.
[TS06] K. Tan, Jingmin Song, Qian Zhang, Murari Sridharan, A Compound TCP Approach for High-speed and Long Distance Networks, in IEEE Infocom, April 2006, Barcelona, Spain.
[UDT] UDT: UDP-based Data Transfer, URL http://udt.sf.net.
[XHR04] Lisong Xu, Khaled Harfoush, and Injong Rhee, Binary Increase Congestion Control for Fast Long-Distance Networks, INFOCOM 2004.
Author's Addresses 作者地址
Yunhong Gu
National Center for Data Mining
University of Illinois at Chicago
713 SEO, M/C 249, 851 S Morgan St
Chicago, IL 60607, USA
Phone: +1 (312) 413-9576
Email: yunhong@lac.uic.edu
译者注: 水平有限, 译错之处在所难免, 欢迎指出.