Multipath TCP: an overview


https://lwn.net/Articles/544399/


By Jonathan Corbet
March 26, 2013


The world was a simpler place when the TCP/IP network protocol suite was first designed. The net was slow and primitive and it was often a triumph to get a connection to a far-away host at all. The machines at either end of a TCP session normally did not have to concern themselves with how that connection was made; such details were left to routers. As a result, TCP is built around the notion of a (single) connection between two hosts. The Multipath TCP (MPTCP) project looks to change that view of networking by adding support for multiple transport paths to the endpoints; it offers a lot of benefits, but designing a deployable protocol for today's Internet is surprisingly hard.

Things have gotten rather more complicated in the years since TCP was first deployed. Connections to multiple networks, once the province of large server systems, are now ubiquitous; a smartphone, for example, can have separate, simultaneous interfaces to a cellular network, a WiFi network, and, possibly, other networks via Bluetooth or USB ports. Each of those networks provides a possible way to reach a remote host, but any given TCP session will use only one of them. That leads to obvious policy considerations (which interface should be used when) and operational difficulties: most handset users are familiar with how a WiFi-based TCP session will be broken if the device moves out of range of the access point, for example.

What if a TCP session could make use of all of the available paths between the two endpoints at any given time? There would be performance improvements, since each of the paths could carry data in parallel, and congested paths could be avoided in favor of faster paths as conditions change. Sessions could also be more robust. Imagine a video stream that is established over both WiFi and cellular networks; if the watcher leaves the house (one hopes somebody else is driving), the stream would shift transparently to the cellular connection without interruption. Data centers, where multiple paths between systems and variable congestion are both common, could also make use of a multipath-capable transport protocol.

The problem is that TCP does not work that way. Enter MPTCP, which is designed to do exactly that.

How it works

A TCP session is normally set up by way of a three-way handshake. The initiating host sends a packet with the SYN flag set; the receiving host, if it is amenable to the connection, responds with a packet containing both the SYN and ACK flags. The final ACK packet sent by the initiator puts the connection into the "established" state; after that, data can be transferred in either direction.
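The handshake can be modeled in a few lines. This is a minimal Python sketch, not a real TCP implementation: the `Segment` type and `three_way_handshake()` function are hypothetical, and the only protocol rules encoded are that each side picks an initial sequence number (ISN) and that a SYN consumes one sequence number, so each side acknowledges the other's ISN plus one.

```python
from dataclasses import dataclass

SEQ_SPACE = 2**32  # TCP sequence numbers are 32 bits and wrap around

@dataclass(frozen=True)
class Segment:
    """Hypothetical, simplified view of a TCP segment: flags and numbers only."""
    flags: frozenset
    seq: int
    ack: int = 0  # only meaningful when "ACK" is in flags

def three_way_handshake(client_isn: int, server_isn: int):
    """Return the SYN, SYN-ACK and ACK segments exchanged to open a connection."""
    syn = Segment(frozenset({"SYN"}), seq=client_isn)
    # A SYN occupies one sequence number, so the reply acknowledges ISN + 1.
    syn_ack = Segment(frozenset({"SYN", "ACK"}), seq=server_isn,
                      ack=(client_isn + 1) % SEQ_SPACE)
    # The initiator's final ACK moves both ends to ESTABLISHED.
    ack = Segment(frozenset({"ACK"}), seq=(client_isn + 1) % SEQ_SPACE,
                  ack=(server_isn + 1) % SEQ_SPACE)
    return syn, syn_ack, ack

for seg in three_way_handshake(client_isn=1000, server_isn=5000):
    print(sorted(seg.flags), seg.seq, seg.ack)
```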

An MPTCP session starts in the same way, with one change: the initiator adds the new MP_CAPABLE option to the SYN packet. If the receiving host supports MPTCP, it will add that option to its SYN-ACK reply; the two hosts will also include cryptographic keys in these packets for later use. The final ACK (which must also carry the MP_CAPABLE option) establishes a multipath session, albeit a session using a single path just like traditional TCP.
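The negotiation, including the fallback to plain TCP when either end does not speak MPTCP, can be sketched as follows. This is an illustration under simplifying assumptions: options are modeled as a Python dict, `mptcp_handshake()` is a hypothetical name, and the 64-bit key size and the rule that the final ACK echoes both keys come from my reading of RFC 6824 rather than from the article.

```python
import secrets

def mptcp_handshake(client_mptcp: bool, server_mptcp: bool):
    """Return ("MPTCP", (client_key, server_key)) or ("TCP", None)."""
    syn_opts, synack_opts = {}, {}
    if client_mptcp:
        # The initiator offers MP_CAPABLE along with its own 64-bit key.
        syn_opts["MP_CAPABLE"] = secrets.randbits(64)
    # The receiver echoes MP_CAPABLE only if it supports MPTCP and saw the offer.
    if server_mptcp and "MP_CAPABLE" in syn_opts:
        synack_opts["MP_CAPABLE"] = secrets.randbits(64)
    if "MP_CAPABLE" not in synack_opts:
        # No MP_CAPABLE in the SYN-ACK: this is an ordinary TCP connection.
        return "TCP", None
    # The final ACK carries MP_CAPABLE too, modeled here as echoing both keys.
    ack_opts = {"MP_CAPABLE": (syn_opts["MP_CAPABLE"], synack_opts["MP_CAPABLE"])}
    return "MPTCP", ack_opts["MP_CAPABLE"]

print(mptcp_handshake(True, True)[0])   # MPTCP
print(mptcp_handshake(True, False)[0])  # TCP
```

Because a receiver that does not understand the option simply omits it from its reply, an unmodified server completes an ordinary handshake and neither side has to do anything special.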

When MPTCP is in use, both sides recognize a distinction between the session itself and any specific "subflow" used by that session. At any point, then, either party to the session can initiate another TCP connection to the other side, with the proviso that the address and/or port at one end or the other of the connection must differ. So, if a smartphone has initiated an MPTCP connection to a server using its WiFi interface:

[Cheesy diagram]

It can add another subflow at any time by connecting to the same server by way of its cellular interface:

[Cheesy diagram]
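The rule that a new subflow must differ in at least one address or port amounts to requiring a distinct TCP 4-tuple. The sketch below is hypothetical bookkeeping (`FourTuple` and `MptcpSession` are illustrative names, not kernel structures), using documentation-range IP addresses for the phone's WiFi and cellular interfaces.

```python
from typing import NamedTuple

class FourTuple(NamedTuple):
    local_addr: str
    local_port: int
    remote_addr: str
    remote_port: int

class MptcpSession:
    """Hypothetical bookkeeping for one MPTCP session and its subflows."""

    def __init__(self, first: FourTuple) -> None:
        self.subflows = [first]  # the initial subflow from the MP_CAPABLE handshake

    def add_subflow(self, candidate: FourTuple) -> bool:
        # Each subflow is a separate TCP connection, so its 4-tuple must differ
        # from every existing subflow in at least one address or port.
        if candidate in self.subflows:
            return False
        self.subflows.append(candidate)
        return True

wifi = FourTuple("192.0.2.10", 50000, "198.51.100.1", 443)
cell = FourTuple("203.0.113.20", 50000, "198.51.100.1", 443)
session = MptcpSession(wifi)
print(session.add_subflow(cell))  # True: the local address differs
print(session.add_subflow(wifi))  # False: identical 4-tuple
```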

That subflow is added by sending a SYN packet with the MP_JOIN option; it also includes information on which MPTCP session is to be joined. Needless to say, the protocol designers are concerned that a hostile party might try to join somebody else's session; the previously exchanged cryptographic keys are used to prevent such attacks from succeeding. If the receiving server is amenable to adding the subflow, it will allow the establishment of the new TCP connection and add it to the MPTCP session.
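How the previously exchanged keys block a hijacked join can be sketched with Python's standard `hmac` and `hashlib` modules. The details follow my reading of RFC 6824, not the article, and should be checked against the RFC: the session is named by a token (the top 32 bits of a SHA-1 hash of the receiver's key), each side contributes a 32-bit random nonce, and each side proves it knows both keys with an HMAC-SHA1 over the two nonces, the responder sending a truncated 64-bit HMAC and the initiator the full one. `Server`, `join()`, and the callback structure are hypothetical simplifications of what is really a packet exchange.

```python
import hashlib
import hmac
import secrets

def token(key: bytes) -> bytes:
    """Session token: the most significant 32 bits of SHA-1(key)."""
    return hashlib.sha1(key).digest()[:4]

def mac(key1: bytes, key2: bytes, r1: bytes, r2: bytes) -> bytes:
    """HMAC-SHA1 keyed with key1 + key2 over the message r1 + r2."""
    return hmac.new(key1 + key2, r1 + r2, hashlib.sha1).digest()

class Server:
    """Hypothetical receiver that remembers the keys from each MP_CAPABLE handshake."""

    def __init__(self) -> None:
        self.sessions = {}  # token of the server's key -> (key_a, key_b)

    def register(self, key_a: bytes, key_b: bytes) -> None:
        self.sessions[token(key_b)] = (key_a, key_b)

    def accept_join(self, tok: bytes, r_a: bytes, respond) -> bool:
        """Handle an MP_JOIN SYN; `respond` stands in for the initiator's reply."""
        if tok not in self.sessions:
            return False  # no such session: refuse the subflow
        key_a, key_b = self.sessions[tok]
        r_b = secrets.token_bytes(4)
        # SYN-ACK: the server's nonce plus a truncated (64-bit) HMAC.
        mac_a = respond(mac(key_b, key_a, r_b, r_a)[:8], r_b)
        # Final ACK: the initiator's full HMAC, compared in constant time.
        return mac_a is not None and hmac.compare_digest(mac_a, mac(key_a, key_b, r_a, r_b))

def join(server: Server, key_a: bytes, key_b: bytes) -> bool:
    """Initiator side: try to add a subflow to the session identified by key_b."""
    r_a = secrets.token_bytes(4)

    def respond(mac_b: bytes, r_b: bytes):
        # Check that the server knows the keys before proving that we do.
        if not hmac.compare_digest(mac_b, mac(key_b, key_a, r_b, r_a)[:8]):
            return None
        return mac(key_a, key_b, r_a, r_b)

    return server.accept_join(token(key_b), r_a, respond)

key_a, key_b = secrets.token_bytes(8), secrets.token_bytes(8)  # from MP_CAPABLE
server = Server()
server.register(key_a, key_b)
print(join(server, key_a, key_b))                   # True
print(join(server, secrets.token_bytes(8), key_b))  # False: wrong key_a
```

A party that lacks either key is refused: without the right token the session is never found, and without both keys neither HMAC can be produced.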

Once a session has more than one subflow, it is up to the systems on each end to decide how to split traffic between them (though it is possible to mark a specific subflow for use only when all others have stopped working). A single receive window applies to the session as a whole. Each subflow looks like a normal TCP connection, with its own sequence numbers, but the session as a whole has its own, separate sequence number space; another TCP option (DSS, or "Data Sequence Signal") is used to inform the other end how data on each subflow fits into the overall stream.
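The two levels of sequence numbers can be illustrated with a toy receiver. In this sketch a simplified `Mapping` (a hypothetical stand-in for the information a DSS option conveys) says where a run of subflow bytes belongs in the session stream; the receiver translates each arriving segment's subflow offset into a session offset and releases data in session order. It assumes each segment lies entirely within one mapping and ignores overlaps and retransmissions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mapping:
    """Simplified DSS mapping: where a run of subflow bytes sits in the session stream."""
    data_seq: int     # offset in the session-level (data) sequence space
    subflow_seq: int  # offset relative to the start of this subflow
    length: int

def to_data_seq(m: Mapping, subflow_offset: int) -> int:
    """Translate a subflow-relative offset into a session-level offset."""
    if not m.subflow_seq <= subflow_offset < m.subflow_seq + m.length:
        raise ValueError("offset not covered by this mapping")
    return m.data_seq + (subflow_offset - m.subflow_seq)

def reassemble(segments) -> bytes:
    """segments: (mapping, subflow_offset, payload) tuples in arrival order.
    Returns the in-order prefix of the session stream received so far."""
    pending = {}
    for m, off, payload in segments:
        pending[to_data_seq(m, off)] = payload
    stream, next_seq = bytearray(), 0
    while next_seq in pending:  # stop at the first hole
        chunk = pending.pop(next_seq)
        stream += chunk
        next_seq += len(chunk)
    return bytes(stream)

wifi_1 = Mapping(data_seq=0, subflow_seq=0, length=6)
cell_1 = Mapping(data_seq=6, subflow_seq=0, length=5)
wifi_2 = Mapping(data_seq=11, subflow_seq=6, length=1)
arrivals = [(cell_1, 0, b"world"), (wifi_2, 6, b"!"), (wifi_1, 0, b"hello ")]
print(reassemble(arrivals))  # b'hello world!'
```

Each subflow numbers its own bytes from zero, so the cellular subflow's first byte and the WiFi subflow's first byte share a subflow offset but land at different places in the session stream.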

Subflows can come and go over the life of an MPTCP connection. They can be explicitly closed by either end, or they can simply vanish if one of the paths becomes unavailable. If the underlying machinery is working well, applications should not even notice these changes. Just as IP can hide routing changes, MPTCP can hide the details of which paths it is using at any given time. It should, from an application's point of view, just work.
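One way an endpoint might keep data flowing as subflows appear and disappear is sketched below. The policy is chosen purely for illustration: prefer the live, non-backup subflow with the lowest round-trip time, and use backup subflows only when nothing else is left. `Subflow` and `pick_subflow()` are hypothetical names; real schedulers weigh more factors, such as congestion windows.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Subflow:
    name: str
    rtt_ms: float          # smoothed round-trip time estimate
    backup: bool = False   # use only when no regular subflow is available
    alive: bool = True

def pick_subflow(subflows: Iterable[Subflow]) -> Optional[Subflow]:
    live = [s for s in subflows if s.alive]
    regular = [s for s in live if not s.backup]
    candidates = regular or live  # fall back to backups only if needed
    return min(candidates, key=lambda s: s.rtt_ms, default=None)

wifi = Subflow("wifi", rtt_ms=20.0)
cell = Subflow("cellular", rtt_ms=80.0, backup=True)
print(pick_subflow([wifi, cell]).name)  # wifi
wifi.alive = False                      # the watcher leaves the house
print(pick_subflow([wifi, cell]).name)  # cellular
```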

Needless to say, there are vast numbers of details that have been glossed over here. Making a protocol extension like this work requires thinking about issues like congestion control, how to manage retransmissions over a different path, how one party can tell the other about additional addresses (paths) it could use, how to decide when setting up multiple subflows is worth the expense, and so on. The MPTCP designers have done much of that thinking; see RFC 6824 for the details.

The dreaded middlebox

One set of details merits a closer look, though. The designers of MPTCP are not interested in going through an idle academic exercise; they want to create a solution to real problems that will be deployed on the existing Internet. And that means designing something that will function with the net as it exists now. At one level, that means making things work transparently for TCP-based applications. At another, it means surviving the network itself: there is an entire section in the RFC that is concerned with "middleboxes" and how they can sabotage any attempt to introduce a new protocol.

Middleboxes are routers that impose some sort of constraint or transformation on network traffic passing through them. Network address translation (NAT) boxes are one example: they hide an entire network behind a translation layer that will change the address and port of a connection on its way through. NAT boxes can also insert data into a stream (adding commands to make FTP work, for example). Some boxes will acknowledge data on its way through, well before it arrives at the real destination, in an attempt to increase pipelining. Some routers will drop packets with unknown options; that behavior made the rollout of the selective acknowledgment (SACK) feature much harder than it needed to be. Firewalls will kill connections with holes in the sequence number stream; they will also, sometimes, transform sequence numbers on the way through. Splitting and coalescing of segments can cause options to be dropped or duplicated. And so on; the list of potential problems is impressive.
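The sequence-number rewriting mentioned above is subtle enough to deserve a small demonstration. The sketch below models a hypothetical firewall that shifts every sequence number in the TCP header by a fixed offset while leaving options untouched (a modeling assumption). Any absolute sequence number that an endpoint copies into an option is then wrong by the time it arrives, whereas an offset measured from the initial sequence number (ISN) still agrees with what the receiver sees.

```python
SEQ_SPACE = 2**32  # TCP sequence numbers wrap modulo 2^32

def firewall_rewrite(seq: int, offset: int) -> int:
    """Hypothetical middlebox: shifts a header sequence number by `offset`."""
    return (seq + offset) % SEQ_SPACE

def option_survives(sender_isn: int, stream_offset: int, fw_offset: int):
    """Return (absolute_ok, relative_ok) for a sequence number also carried in an option."""
    # Byte `stream_offset` of the stream; the SYN consumed one sequence number.
    header_seq = (sender_isn + 1 + stream_offset) % SEQ_SPACE
    absolute_in_option = header_seq
    relative_in_option = (header_seq - sender_isn) % SEQ_SPACE
    # The receiver sees rewritten header values but untouched options.
    seen_isn = firewall_rewrite(sender_isn, fw_offset)
    seen_seq = firewall_rewrite(header_seq, fw_offset)
    absolute_ok = seen_seq == absolute_in_option
    relative_ok = (seen_seq - seen_isn) % SEQ_SPACE == relative_in_option
    return absolute_ok, relative_ok

print(option_survives(sender_isn=1000, stream_offset=500, fw_offset=123456))
# (False, True)
print(option_survives(sender_isn=1000, stream_offset=500, fw_offset=0))
# (True, True)
```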

On top of that, anybody trying to introduce an entirely new transport-layer protocol is likely to discover that it will not make it across the Internet at all. Much of the routing infrastructure on the net assumes that TCP and UDP are all there is; anything else has a poor chance of making it through.

Working around these issues drove the design of MPTCP at all levels. TCP was never designed for multiple subflows; rather than bolting that idea onto the protocol, it might well have been better to start over. One could have incorporated the lessons learned from TCP in all ways, including doing things entirely differently where that made sense. But the resulting protocol would not work on today's Internet, so the designers had no choice but to create a protocol that, to almost every middlebox out there, looks like plain old TCP.

So every subflow is an independent TCP connection in every respect. Since holes in sequence numbers can cause problems, each subflow has its own sequence number space, and a mapping layer must be added on top. That mapping layer uses relative sequence numbers because some middlebox may have changed the absolute numbers in transit. Similarly, each side assigns an "address identifier" to each IP address of its interfaces; when one side tells the other about an available interface, it includes that identifier, and future messages about the interface use the identifier rather than the address, since a NAT box in the middle might change the visible address. Special checks exist for subflows that corrupt data, insert preemptive acknowledgments, or strip unknown options; such subflows will not be used. And the whole thing is designed to fall back gracefully to ordinary TCP if the interference is too strong to overcome.
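Address identifiers are easy to illustrate. In the sketch below, a phone's WiFi address is rewritten by a NAT, so a request that names the raw address finds nothing, while one that names the identifier removes the right subflow. `PeerAddresses` and its methods are hypothetical; the idea that the identifier travels in the MP_JOIN and ADD_ADDR options and that REMOVE_ADDR carries only the identifier is my reading of RFC 6824, not something the article states.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port) as observed on the wire

class PeerAddresses:
    """Hypothetical server-side table of a peer's subflows, keyed by address ID."""

    def __init__(self) -> None:
        self.by_id: Dict[int, Endpoint] = {}

    def subflow_joined(self, addr_id: int, observed: Endpoint) -> None:
        # addr_id is chosen by the peer; `observed` is whatever survived any NAT.
        self.by_id[addr_id] = observed

    def remove_addr(self, addr_id: int) -> Optional[Endpoint]:
        """Handle a removal request that names the address only by its ID."""
        return self.by_id.pop(addr_id, None)

    def remove_by_address(self, addr: str) -> Optional[Endpoint]:
        """What a naive design would do: look the subflow up by raw address."""
        for addr_id, (ip, _port) in list(self.by_id.items()):
            if ip == addr:
                return self.by_id.pop(addr_id)
        return None

peer = PeerAddresses()
# The phone's WiFi interface is 10.0.0.5 (its ID is 1), but a NAT rewrites it.
peer.subflow_joined(1, ("203.0.113.7", 40001))
print(peer.remove_by_address("10.0.0.5"))  # None: the server never saw 10.0.0.5
print(peer.remove_addr(1))                 # ('203.0.113.7', 40001)
```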

It is all a clever bit of design on the part of the MPTCP developers, but it also highlights an area of concern: the "dumb" Internet with end-to-end transparent routing of data is a thing of the distant past. What we have now is inflexible and somewhat hostile to the deployment of new technologies. The MPTCP developers have been able to work around these limitations, but the effort required was considerable. In the future, we may find that the net is broken in fundamental ways and it simply cannot be fixed; some might say that the difficulties in moving to IPv6 show that this has already happened.

Future directions

The current MPTCP code can be found at the MPTCP GitHub repository; it adds a good 10,000 lines to the mainline kernel's networking subtree. While it has apparently been the subject of discussions with various networking developers, it has not yet been posted for public review or inclusion into the mainline. It does, however, seem to work: the MPTCP developers claim to have implemented the fastest TCP connection ever by transmitting at a rate of 51.8Gb/s over six 10Gb links.

MPTCP is still relatively young, so there is almost certainly quite a bit of work yet to be done before it is ready for mainline merging or production use. There is also some thinking to be done on the application side; it may be possible for MPTCP-aware applications to make better use of the available paths. Projects like this are arguably never finished (we are still refining TCP, after all), but MPTCP does seem to have reached the point where more users may want to start experimenting with it.

Anybody wanting to play with this code can grab the project's kernel repository and build a custom kernel. For those who are not up to that level of effort, the project offers a number of other options, including a Debian repository, instructions for running MPTCP on Amazon's EC2, and kernels for a handful of Android-based handsets. Needless to say, the developers are highly interested in hearing bug reports or other testing results.


Posted @ 2016-09-04 21:33 by 张同光