3关于tcp delayedack实践(三)服务间调用

上一篇关于tcp delayedack实践(二)http中,实践了一条http长连接,4次请求响应,应证了关于tcp delayedack实践(一)tcp中 “http短链接不会deleyed ack,但请注意,微服务中的长连接就有这样的情况了”,比如包太长被分了,这个例子好在http request未像tcp篇章中被分为2段,所以delayed ack与response一起返回给客户端,未造成时间浪费,在下面的案例中,就没那么幸运了

 

案例(1)

一次简单的 HTTP 调用,为什么时延这么大?抓个包分析下

http server response被分为2段(为什么不得而知,有可能超过MSS https://www.cnblogs.com/wlei/archive/2011/12/23/2299239.html),且未开tcpnodelay,客户端delayed ack

通过开启服务端tcpnodelay解决

 

案例(2)

https://gocardless.com/blog/in-search-of-performance-how-we-shaved-200ms-off-every-post-request/

Ruby's Net::HTTP splits POST requests across two TCP packets - one for the headers, and another for the body. curl, by contrast, combines the two if they'll fit in a single packet.

客户端 http client request被分为2段,且未开tcpnodelay,服务端delayed ack

通过客户端开启tcpnodelay解决

 

In search of performance - how we shaved 200ms off every POST request

By Chris SinjakliNorberto Lopes — Aug 2015 — 4 min read

While doing some work on our Pro dashboard, we noticed that search requests were taking around 300ms. We've got some people in the team who have used Elasticsearch for much larger datasets, and they were surprised by how slow the requests were, so we decided to take a look.

Today, we'll show how that investigation led to a 200ms improvement on all internal POST requests.

What we did

We started by taking a typical search request from the app and measuring how long it took. We tried this with both Ruby's Net::HTTP and from the command line using curl. The latter was visibly faster. Timing the requests showed that the request from Ruby took around 250ms, whereas the one from curl took only 50ms.

We were confident that whatever was going on was isolated to Ruby1, but we wanted to dig deeper, so we moved over to our staging environment. At that point, the problem disappeared entirely.

For a while, we were stumped. We run the same versions of Ruby and Elasticsearch in staging and production. It didn't make any sense! We took a step back, and looked over our stack, piece by piece. There was something in the middle which we hadn't thought about - HAProxy.

We quickly discovered that, due to an ongoing Ubuntu upgrade2, we were using different versions of HAProxy in staging (1.4.24) and production (1.4.18). Something in those 6 patch revisions was responsible, so we turned our eyes to the commit logs. There were a few candidates, but one patch stood out in particular.

We did a custom build of HAProxy 1.4.18, with just that patch added, and saw request times drop by around 200ms. Job done.

Under the hood

Since this issue was going to be fixed by the Ubuntu upgrades we were doing, we decided it wasn't worth shipping a custom HAProxy package. Before calling it a day, we decided to take a look at the whole request cycle using tcpdump, to really understand what was going on.

What we found was that Ruby's Net::HTTP splits POST requests across two TCP packets - one for the headers, and another for the body. curl, by contrast, combines the two if they'll fit in a single packet. To make things worse, Net::HTTP doesn't set TCP_NODELAY on the TCP socket it opens, so it waits for acknowledgement of the first packet before sending the second. This behaviour is a consequence of Nagle's algorithm.

Moving to the other end of the connection, HAProxy has to choose how to acknowledge those two packets. In version 1.4.18 (the one we were using), it opted to use TCP delayed acknowledgement.

Delayed acknowledgement interacts badly with Nagle's algorithm, and causes the request to pause until the server reaches its delayed acknowledgement timeout3.

 

 

blog > images > haproxy-200ms-diagram.svg

HAProxy 1.4.19 adds a special case for incomplete HTTP POST requests - if it receives a packet which only contains the first part of the request, it enables TCP_QUICKACK on the socket, and immediately acknowledges that packet.

 

More than just search

Having understood what was happening, we realised the fix had a far wider reach than our search endpoint. We run all of our services behind HAProxy, and it's no secret that we write a lot of Ruby. This combination meant that almost every POST request made inside our infrastructure incurred a 200ms delay. We took some measurements before and after rolling out the new version of HAProxy:

POST /endpointA
average (ms/req) before HAProxy upgrade: 271.13
average (ms/req) after HAProxy upgrade: 19.08

POST /endpointB
average (ms/req) before HAProxy upgrade: 323.78
average (ms/req) after HAProxy upgrade: 66.47

Quite the improvement!

Wrap-up

Even though the fix was as simple as upgrading a package, the knowledge we gained along the way is invaluable in the long-term.

It would have been easy to say search was fast enough and move on, but by diving into the problem we got to know more about how our applications run in production.

Doing this work really reinforced our belief that it's worth taking time to understand your stack.

posted on 2019-12-02 21:30  silyvin  阅读(293)  评论(0编辑  收藏  举报