Controlling gRPC traffic; gRPC performance; managing gRPC protocol sample traffic

Performance best practices with gRPC | Microsoft Docs https://docs.microsoft.com/en-us/aspnet/core/grpc/performance?view=aspnetcore-7.0

Performance best practices with gRPC

By James Newton-King

gRPC is designed for high-performance services. This document explains how to get the best performance possible from gRPC.

Reuse gRPC channels

A gRPC channel should be reused when making gRPC calls. Reusing a channel allows calls to be multiplexed through an existing HTTP/2 connection.

If a new channel is created for each gRPC call then the amount of time it takes to complete can increase significantly. Each call will require multiple network round-trips between the client and the server to create a new HTTP/2 connection:

  1. Opening a socket
  2. Establishing TCP connection
  3. Negotiating TLS
  4. Starting HTTP/2 connection
  5. Making the gRPC call

Channels are safe to share and reuse between gRPC calls:

  • gRPC clients are created with channels. gRPC clients are lightweight objects and don't need to be cached or reused.
  • Multiple gRPC clients can be created from a channel, including different types of clients.
  • A channel and clients created from the channel can safely be used by multiple threads.
  • Clients created from the channel can make multiple simultaneous calls.

gRPC client factory offers a centralized way to configure channels. It automatically reuses underlying channels. For more information, see gRPC client factory integration in .NET.

Connection concurrency

HTTP/2 connections typically limit the maximum number of concurrent streams (active HTTP requests) on a connection at one time. By default, most servers set this limit to 100 concurrent streams.

A gRPC channel uses a single HTTP/2 connection, and concurrent calls are multiplexed on that connection. When the number of active calls reaches the connection stream limit, additional calls are queued in the client. Queued calls wait for active calls to complete before they are sent. Applications with high load, or long running streaming gRPC calls, could see performance issues caused by calls queuing because of this limit.

.NET 5 introduces the SocketsHttpHandler.EnableMultipleHttp2Connections property. When set to true, additional HTTP/2 connections are created by a channel when the concurrent stream limit is reached. When a GrpcChannel is created its internal SocketsHttpHandler is automatically configured to create additional HTTP/2 connections. If an app configures its own handler, consider setting EnableMultipleHttp2Connections to true:

C#
var channel = GrpcChannel.ForAddress("https://localhost", new GrpcChannelOptions
{
    HttpHandler = new SocketsHttpHandler
    {
        EnableMultipleHttp2Connections = true,

        // ...configure other handler settings
    }
});

There are a couple of workarounds for .NET Core 3.1 apps:

  • Create separate gRPC channels for areas of the app with high load. For example, the Logger gRPC service might have a high load. Use a separate channel to create the LoggerClient in the app.
  • Use a pool of gRPC channels, for example, create a list of gRPC channels. Random is used to pick a channel from the list each time a gRPC channel is needed. Using Random randomly distributes calls over multiple connections.
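The channel-pool workaround above can be sketched roughly as follows. `RandomPool<T>` is a hypothetical helper, not part of any gRPC library, and it is generic so that the sketch compiles without gRPC packages. In a real app, `T` would presumably be `GrpcChannel` and the factory would call `GrpcChannel.ForAddress`. A lock guards the `Random` instance because `Random` is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

public static class Program
{
    public static void Main()
    {
        // Strings stand in for GrpcChannel instances so the sketch runs without gRPC packages.
        var pool = new RandomPool<string>(4, i => $"channel-{i}");
        Console.WriteLine(pool.Next());
    }
}

// Hypothetical helper: creates a fixed set of items once and hands out a random one per request.
public sealed class RandomPool<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly Random _random = new Random();
    private readonly object _lock = new object();

    public RandomPool(int size, Func<int, T> factory)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        for (var i = 0; i < size; i++)
        {
            _items.Add(factory(i));
        }
    }

    public int Count => _items.Count;

    // Safe to call from multiple threads: only the index selection needs the lock.
    public T Next()
    {
        int index;
        lock (_lock)
        {
            index = _random.Next(_items.Count);
        }
        return _items[index];
    }
}
```

The pool is created once and shared for the lifetime of the app, so each pooled channel keeps its own long-lived HTTP/2 connection.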

 Important

Increasing the maximum concurrent stream limit on the server is another way to solve this problem. In Kestrel this is configured with MaxStreamsPerConnection.

Increasing the maximum concurrent stream limit is not recommended. Too many streams on a single HTTP/2 connection introduces new performance issues:

  • Thread contention between streams trying to write to the connection.
  • Connection packet loss causes all calls to be blocked at the TCP layer.

ServerGarbageCollection in client apps

The .NET garbage collector has two modes: workstation garbage collection (GC) and server garbage collection. Each is tuned for different workloads. ASP.NET Core apps use server GC by default.

Highly concurrent apps generally perform better with server GC. If a gRPC client app is sending and receiving a high number of gRPC calls at the same time, then there may be a performance benefit in updating the app to use server GC.

To enable server GC, set <ServerGarbageCollection> in the app's project file:

XML
<PropertyGroup>
  <ServerGarbageCollection>true</ServerGarbageCollection>
</PropertyGroup>

For more information about garbage collection, see Workstation and server garbage collection.

 Note

ASP.NET Core apps use server GC by default. Enabling <ServerGarbageCollection> is only useful in non-server gRPC client apps, for example in a gRPC client console app.
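One way to confirm which mode a client app actually runs with is to read `GCSettings.IsServerGC` at startup. This is a small sketch for a console app; the log wording is an assumption.

```csharp
using System;
using System.Runtime;

// GCSettings.IsServerGC reports whether the runtime started with server GC.
var mode = GCSettings.IsServerGC ? "server" : "workstation";
Console.WriteLine($"GC mode: {mode}, latency mode: {GCSettings.LatencyMode}");
```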

Load balancing

Some load balancers don't work effectively with gRPC. L4 (transport) load balancers operate at a connection level, by distributing TCP connections across endpoints. This approach works well for load balancing API calls made with HTTP/1.1. Concurrent calls made with HTTP/1.1 are sent on different connections, allowing calls to be load balanced across endpoints.

Because L4 load balancers operate at a connection level, they don't work well with gRPC. gRPC uses HTTP/2, which multiplexes multiple calls on a single TCP connection. All gRPC calls over that connection go to one endpoint.

There are two options to effectively load balance gRPC:

  • Client-side load balancing
  • L7 (application) proxy load balancing

 Note

Only gRPC calls can be load balanced between endpoints. Once a streaming gRPC call is established, all messages sent over the stream go to one endpoint.

Client-side load balancing

With client-side load balancing, the client knows about endpoints. For each gRPC call, it selects a different endpoint to send the call to. Client-side load balancing is a good choice when latency is important. There's no proxy between the client and the service, so the call is sent to the service directly. The downside to client-side load balancing is that each client must keep track of the available endpoints that it should use.

Lookaside client load balancing is a technique where load balancing state is stored in a central location. Clients periodically query the central location for information to use when making load balancing decisions.

For more information, see gRPC client-side load balancing.
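To illustrate only the selection step ("for each gRPC call, it selects a different endpoint"), here is a minimal round-robin sketch. `RoundRobinPicker` and the endpoint addresses are hypothetical, and this is not how the gRPC client library implements load balancing. It only shows the idea with thread-safe counting.

```csharp
using System;
using System.Threading;

// Hypothetical endpoint list; a real client would get it from DNS or a lookaside service.
var picker = new RoundRobinPicker(new[]
{
    "https://10.0.0.1:5001",
    "https://10.0.0.2:5001",
    "https://10.0.0.3:5001"
});

for (var i = 0; i < 4; i++)
{
    Console.WriteLine(picker.Next()); // cycles through the endpoints in order
}

// Hypothetical helper: returns the next endpoint on every call.
public sealed class RoundRobinPicker
{
    private readonly string[] _endpoints;
    private int _counter = -1;

    public RoundRobinPicker(string[] endpoints)
    {
        if (endpoints == null || endpoints.Length == 0)
        {
            throw new ArgumentException("At least one endpoint is required.", nameof(endpoints));
        }
        _endpoints = (string[])endpoints.Clone();
    }

    public string Next()
    {
        // Interlocked.Increment is atomic and wraps on overflow;
        // the uint cast keeps the index non-negative after wrapping.
        var n = (uint)Interlocked.Increment(ref _counter);
        return _endpoints[(int)(n % (uint)_endpoints.Length)];
    }
}
```

The sketch assumes a fixed endpoint list. The tracking cost mentioned above appears as soon as endpoints are added, removed, or become unhealthy.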

Proxy load balancing

An L7 (application) proxy works at a higher level than an L4 (transport) proxy. L7 proxies understand HTTP/2, and are able to distribute gRPC calls multiplexed to the proxy on one HTTP/2 connection across multiple endpoints. Using a proxy is simpler than client-side load balancing, but can add extra latency to gRPC calls.

There are many L7 proxies available. Some options are:

Inter-process communication

gRPC calls between a client and service are usually sent over TCP sockets. TCP is great for communicating across a network, but inter-process communication (IPC) is more efficient when the client and service are on the same machine.

Consider using a transport like Unix domain sockets or named pipes for gRPC calls between processes on the same machine. For more information, see Inter-process communication with gRPC.

Keep alive pings

Keep alive pings can be used to keep HTTP/2 connections alive during periods of inactivity. Having an existing HTTP/2 connection ready when an app resumes activity allows for the initial gRPC calls to be made quickly, without a delay caused by the connection being reestablished.

Keep alive pings are configured on SocketsHttpHandler:

C#
var handler = new SocketsHttpHandler
{
    PooledConnectionIdleTimeout = Timeout.InfiniteTimeSpan,
    KeepAlivePingDelay = TimeSpan.FromSeconds(60),
    KeepAlivePingTimeout = TimeSpan.FromSeconds(30),
    EnableMultipleHttp2Connections = true
};

var channel = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions
{
    HttpHandler = handler
});

The preceding code configures a channel that sends a keep alive ping to the server every 60 seconds during periods of inactivity. The ping ensures the server and any proxies in use won't close the connection because of inactivity.

Flow control

HTTP/2 flow control is a feature that prevents apps from being overwhelmed with data. When using flow control:

  • Each HTTP/2 connection and request has an available buffer window. The buffer window is how much data the app can receive at once.
  • Flow control activates if the buffer window is filled up. When activated, the sending app pauses sending more data.
  • Once the receiving app has processed data, then space in the buffer window is available. The sending app resumes sending data.

Flow control can have a negative impact on performance when receiving large messages. If the buffer window is smaller than incoming message payloads or there's latency between the client and server, then data can be sent in start/stop bursts.
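A rough way to see why latency matters: in a simplified model, a sender can have at most one full window of data in flight per round trip, so per-stream throughput is bounded by window size divided by round-trip time. The sketch below works through that arithmetic. The 50 ms round trip is an assumed figure, and the model ignores the connection-level window and the timing of window updates.

```csharp
using System;

// Upper bound on per-stream throughput: at most one full window in flight per round trip.
static double MaxBytesPerSecond(int windowBytes, TimeSpan roundTrip) =>
    windowBytes / roundTrip.TotalSeconds;

var rtt = TimeSpan.FromMilliseconds(50); // assumed client-to-server round-trip time
const double MB = 1024 * 1024;

Console.WriteLine($"96 KB window: {MaxBytesPerSecond(96 * 1024, rtt) / MB:F2} MB/s");
Console.WriteLine($"1 MB window:  {MaxBytesPerSecond(1024 * 1024, rtt) / MB:F2} MB/s");
```

Under these assumptions, a 96 KB window caps a stream at roughly 1.9 MB/s, while a 1 MB window allows about 20 MB/s.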

Flow control performance issues can be fixed by increasing buffer window size. In Kestrel, this is configured with InitialConnectionWindowSize and InitialStreamWindowSize at app startup:

C#
builder.WebHost.ConfigureKestrel(options =>
{
    var http2 = options.Limits.Http2;
    http2.InitialConnectionWindowSize = 2 * 1024 * 1024; // 2 MB
    http2.InitialStreamWindowSize = 1024 * 1024; // 1 MB
});

Recommendations:

  • If a gRPC service often receives messages larger than 96 KB, Kestrel's default stream window size, then consider increasing the connection and stream window size.
  • The connection window size should always be equal to or greater than the stream window size. A stream is part of the connection, and the sender is limited by both.

For more information about how flow control works, see HTTP/2 Flow Control (blog post).

 Important

Increasing Kestrel's window size allows Kestrel to buffer more data on behalf of the app, which possibly increases memory usage. Avoid configuring an unnecessarily large window size.

Streaming

gRPC bidirectional streaming can be used to replace unary gRPC calls in high-performance scenarios. Once a bidirectional stream has started, streaming messages back and forth is faster than sending messages with multiple unary gRPC calls. Streamed messages are sent as data on an existing HTTP/2 request, which eliminates the overhead of creating a new HTTP/2 request for each unary call.

Example service:

C#
public override async Task SayHello(IAsyncStreamReader<HelloRequest> requestStream,
    IServerStreamWriter<HelloReply> responseStream, ServerCallContext context)
{
    await foreach (var request in requestStream.ReadAllAsync())
    {
        var helloReply = new HelloReply { Message = "Hello " + request.Name };

        await responseStream.WriteAsync(helloReply);
    }
}

Example client:

C#
var client = new Greet.GreeterClient(channel);
using var call = client.SayHello();

Console.WriteLine("Type a name then press enter.");
while (true)
{
    var text = Console.ReadLine();

    // Send and receive messages over the stream
    await call.RequestStream.WriteAsync(new HelloRequest { Name = text });
    await call.ResponseStream.MoveNext();

    Console.WriteLine($"Greeting: {call.ResponseStream.Current.Message}");
}

Replacing unary calls with bidirectional streaming for performance reasons is an advanced technique and is not appropriate in many situations.

Using streaming calls is a good choice when:

  1. High throughput or low latency is required.
  2. gRPC and HTTP/2 are identified as a performance bottleneck.
  3. A worker in the client is sending or receiving regular messages with a gRPC service.

Be aware of the additional complexity and limitations of using streaming calls instead of unary:

  1. A stream can be interrupted by a service or connection error. Logic is required to restart the stream if there is an error.
  2. RequestStream.WriteAsync is not safe for multi-threading. Only one message can be written to a stream at a time. Sending messages from multiple threads over a single stream requires a producer/consumer queue like Channel<T> to marshal messages.
  3. A gRPC streaming method is limited to receiving one type of message and sending one type of message. For example, rpc StreamingCall(stream RequestMessage) returns (stream ResponseMessage) receives RequestMessage and sends ResponseMessage. Protobuf's support for unknown or conditional messages using Any and oneof can work around this limitation.
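The producer/consumer queue from point 2 might look like the following sketch. `SerializedWriter<T>` is a hypothetical helper. It takes a write delegate that stands in for `call.RequestStream.WriteAsync`, so the sketch runs without gRPC packages. Any thread may enqueue, but a single pump loop performs the writes one at a time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

// Stand-in for call.RequestStream.WriteAsync: records messages instead of sending them.
var sent = new List<string>();
var writer = new SerializedWriter<string>(message =>
{
    sent.Add(message); // only ever invoked from the single pump loop
    return Task.CompletedTask;
});

// Several producers enqueue concurrently.
await Task.WhenAll(Enumerable.Range(0, 8).Select(i =>
    Task.Run(() => writer.Enqueue($"message-{i}"))));

await writer.CompleteAsync();
Console.WriteLine($"Wrote {sent.Count} messages");

// Hypothetical helper: funnels messages from many threads into one sequential writer.
public sealed class SerializedWriter<T>
{
    private readonly Channel<T> _queue = Channel.CreateUnbounded<T>(
        new UnboundedChannelOptions { SingleReader = true });
    private readonly Task _pump;

    public SerializedWriter(Func<T, Task> writeAsync)
    {
        _pump = PumpAsync(writeAsync);
    }

    // Safe to call from any thread. Returns false once CompleteAsync has been called.
    public bool Enqueue(T message) => _queue.Writer.TryWrite(message);

    // Stops accepting messages and waits until the queued ones have been written.
    public Task CompleteAsync()
    {
        _queue.Writer.TryComplete();
        return _pump;
    }

    private async Task PumpAsync(Func<T, Task> writeAsync)
    {
        // Each write is awaited before the next message is read, so writes never overlap.
        await foreach (var message in _queue.Reader.ReadAllAsync())
        {
            await writeAsync(message);
        }
    }
}
```

The unbounded queue keeps the sketch short. A bounded channel would add backpressure when producers outpace the stream, and a real implementation would also need to surface write failures and restart the stream.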

Binary payloads

Binary payloads are supported in Protobuf with the bytes scalar value type. A generated property in C# uses ByteString as the property type.

ProtoBuf
syntax = "proto3";

message PayloadResponse {
    bytes data = 1;
}  

Protobuf is a binary format that efficiently serializes large binary payloads with minimal overhead. Text based formats like JSON require encoding bytes to base64 and add 33% to the message size.
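The 33% figure follows from base64 emitting four output characters for every three input bytes. A quick check, using an arbitrary 3,000,000-byte buffer as the assumed payload:

```csharp
using System;

var payload = new byte[3_000_000];                          // raw binary data
var base64Length = Convert.ToBase64String(payload).Length;  // 4 characters per 3 bytes

Console.WriteLine($"{payload.Length} bytes -> {base64Length} base64 characters");
// prints "3000000 bytes -> 4000000 base64 characters"
```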

When working with large ByteString payloads there are some best practices to avoid unnecessary copies and allocations that are discussed below.

Send binary payloads

ByteString instances are normally created using ByteString.CopyFrom(byte[] data). This method allocates a new ByteString and a new byte[]. Data is copied into the new byte array.

Additional allocations and copies can be avoided by using UnsafeByteOperations.UnsafeWrap(ReadOnlyMemory<byte> bytes) to create ByteString instances.

C#
var data = await File.ReadAllBytesAsync(path);

var payload = new PayloadResponse();
payload.Data = UnsafeByteOperations.UnsafeWrap(data);

Bytes are not copied with UnsafeByteOperations.UnsafeWrap so they must not be modified while the ByteString is in use.

UnsafeByteOperations.UnsafeWrap requires Google.Protobuf version 3.15.0 or later.

Read binary payloads

Data can be efficiently read from ByteString instances by using ByteString.Memory and ByteString.Span properties.

C#
var byteString = UnsafeByteOperations.UnsafeWrap(new byte[] { 0, 1, 2 });
var data = byteString.Span;

for (var i = 0; i < data.Length; i++)
{
    Console.WriteLine(data[i]);
}

These properties allow code to read data directly from a ByteString without allocations or copies.

Most .NET APIs have ReadOnlyMemory<byte> and byte[] overloads, so ByteString.Memory is the recommended way to use the underlying data. However, there are circumstances where an app might need to get the data as a byte array. If a byte array is required then the MemoryMarshal.TryGetArray method can be used to get an array from a ByteString without allocating a new copy of the data.

C#
var byteString = GetByteString();

ByteArrayContent content;
if (MemoryMarshal.TryGetArray(byteString.Memory, out var segment))
{
    // Success. Use the ByteString's underlying array.
    content = new ByteArrayContent(segment.Array, segment.Offset, segment.Count);
}
else
{
    // TryGetArray didn't succeed. Fall back to creating a copy of the data with ToByteArray.
    content = new ByteArrayContent(byteString.ToByteArray());
}

var httpRequest = new HttpRequestMessage();
httpRequest.Content = content;

The preceding code:

  • Attempts to get an array from ByteString.Memory with MemoryMarshal.TryGetArray.
  • Uses the ArraySegment<byte> if it was successfully retrieved. The segment has a reference to the array, offset and count.
  • Otherwise, falls back to allocating a new array with ByteString.ToByteArray().

gRPC services and large binary payloads

gRPC and Protobuf can send and receive large binary payloads. Although binary Protobuf is more efficient than text-based JSON at serializing binary payloads, there are still important performance characteristics to keep in mind when working with large binary payloads.

gRPC is a message-based RPC framework, which means:

  • The entire message is loaded into memory before gRPC can send it.
  • When the message is received, the entire message is deserialized into memory.

Binary payloads are allocated as a byte array. For example, a 10 MB binary payload allocates a 10 MB byte array. Messages with large binary payloads can allocate byte arrays on the large object heap. Large allocations impact server performance and scalability.

Advice for creating high-performance applications with large binary payloads:

 

 

Performance best practices with gRPC (Chinese edition) | Microsoft Docs https://docs.microsoft.com/zh-cn/aspnet/core/grpc/performance?view=aspnetcore-7.0


Manage gRPC protocol sample traffic https://help.aliyun.com/document_detail/187134.html

 

 

 

 

GRPC 性能最佳做法

作者:James Newton-King

gRPC 专用于高性能服务。 本文档介绍如何从 gRPC 获得最佳性能。

重用 gRPC 通道

进行 gRPC 调用时,应重新使用 gRPC 通道。 重用通道后通过现有的 HTTP/2 连接对调用进行多路复用。

如果为每个 gRPC 调用创建一个新通道,则完成此操作所需的时间可能会显著增加。 每次调用都需要在客户端和服务器之间进行多个网络往返,以创建新的 HTTP/2 连接:

  1. 打开套接字
  2. 建立 TCP 连接
  3. 协商 TLS
  4. 启动 HTTP/2 连接
  5. 进行 gRPC 调用

在 gRPC 调用之间可以安全地共享和重用通道:

  • gRPC 客户端是使用通道创建的。 gRPC 客户端是轻型对象,无需缓存或重用。
  • 可从一个通道创建多个 gRPC 客户端(包括不同类型的客户端)。
  • 通道和从该通道创建的客户端可由多个线程安全使用。
  • 从通道创建的客户端可同时进行多个调用。

GRPC 客户端工厂提供了一种集中配置通道的方法。 它会自动重用基础通道。 有关详细信息,请参阅 .NET 中的 gRPC 客户端工厂集成

连接并发

HTTP/2 连接通常会限制一个连接上同时存在的最大并发流(活动 HTTP 请求)数。 默认情况下,大多数服务器将此限制设置为 100 个并发流。

gRPC 通道使用单个 HTTP/2 连接,并且并发调用在该连接上多路复用。 当活动调用数达到连接流限制时,其他调用会在客户端中排队。 排队调用等待活动调用完成后再发送。 由于此限制,具有高负载或长时间运行的流式处理 gRPC 调用的应用程序可能会因调用排队而出现性能问题。

.NET 5 引入 SocketsHttpHandler.EnableMultipleHttp2Connections 属性。 如果设置为 true,则当达到并发流限制时,通道会创建额外的 HTTP/2 连接。 创建 GrpcChannel 时,会自动将其内部 SocketsHttpHandler 配置为创建额外的 HTTP/2 连接。 如果应用配置其自己的处理程序,请考虑将 EnableMultipleHttp2Connections 设置为 true

C#
var channel = GrpcChannel.ForAddress("https://localhost", new GrpcChannelOptions
{
    HttpHandler = new SocketsHttpHandler
    {
        EnableMultipleHttp2Connections = true,

        // ...configure other handler settings
    }
});

.NET Core 3.1 应用有几种解决方法:

  • 为具有高负载的应用的区域创建单独的 gRPC 通道。 例如,Logger gRPC 服务可能具有高负载。 使用单独的通道在应用中创建 LoggerClient
  • 使用 gRPC 通道池,例如创建 gRPC 通道列表。 每次需要 gRPC 通道时,使用 Random 从列表中选取一个通道。 使用 Random 在多个连接上随机分配调用。

 重要

提升服务器上的最大并发流限制是解决此问题的另一种方法。 在 Kestrel 中,这是用 MaxStreamsPerConnection 配置的。

不建议提升最大并发流限制。 单个 HTTP/2 连接上的流过多会带来新的性能问题:

  • 尝试写入连接的流之间发生线程争用。
  • 连接数据包丢失导致在 TCP 层阻止所有调用。

客户端应用中的 ServerGarbageCollection

.NET 垃圾回收器有两种模式:工作站垃圾回收 (GC) 和服务器垃圾回收。 每种模式都针对不同的工作负荷进行了微调。 ASP.NET Core 应用默认使用服务器 GC。

高并发应用通常在服务器 GC 下性能更佳。 如果 gRPC 客户端应用同时发送和接收大量 gRPC 调用,则在更新应用以使用服务器 GC 方面可能会有性能优势。

若要启用服务器 GC,请在应用的项目文件中设置 <ServerGarbageCollection>

XML
<PropertyGroup>
  <ServerGarbageCollection>true</ServerGarbageCollection>
</PropertyGroup>

有关垃圾回收的详细信息,请参阅工作站和服务器垃圾回收

 备注

ASP.NET Core 应用默认使用服务器 GC。 启用 <ServerGarbageCollection> 仅对非服务器 gRPC 客户端应用有用,例如在 gRPC 客户端控制台应用中。

负载均衡

一些负载均衡器不能与 gRPC 一起高效工作。 通过在终结点之间分布 TCP 连接,L4(传输)负载均衡器在连接级别上运行。 这种方法非常适合使用 HTTP / 1.1 进行的负载均衡 API 调用。 使用 HTTP/1.1 进行的并发调用在不同的连接上发送,实现调用在终结点之间的负载均衡。

由于 L4 负载均衡器是在连接级别运行的,它们不太适用于 gRPC。 GRPC 使用 HTTP/2,在单个 TCP 连接上多路复用多个调用。 通过该连接的所有 gRPC 调用都将前往一个终结点。

有两种方法可以高效地对 gRPC 进行负载均衡:

  • 客户端负载均衡
  • L7(应用程序)代理负载均衡

 备注

只有 gRPC 调用可以在终结点之间进行负载均衡。 一旦建立了流式 gRPC 调用,通过流发送的所有消息都将前往一个终结点。

客户端负载均衡

对于客户端负载均衡,客户端了解终结点。 对于每个 gRPC 调用,客户端会选择一个不同的终结点作为将该调用发送到的目的地。 如果延迟很重要,那么客户端负载均衡是一个很好的选择。 客户端和服务之间没有代理,因此调用直接发送到服务。 客户端负载均衡的缺点是每个客户端必须跟踪它应该使用的可用终结点。

Lookaside 客户端负载均衡是一种将负载均衡状态存储在中心位置的技术。 客户端定期查询中心位置以获取在作出负载均衡决策时要使用的信息。

有关详细信息,请参阅 gRPC 客户端负载均衡

代理负载均衡

L7(应用程序)代理的工作级别高于 L4(传输)代理。 L7 代理了解 HTTP/2,并且能够在多个终结点之间的一个 HTTP/2 连接上将多路复用的 gRPC 调用分发给代理。 使用代理比客户端负载均衡更简单,但会增加 gRPC 调用的额外延迟。

有很多 L7 代理可用。 一些选项包括:

进程内通信

客户端和服务之间的 gRPC 调用通常通过 TCP 套接字发送。 TCP 非常适用于网络中的通信,但当客户端和服务在同一台计算机上时,进程间通信 (IPC) 的效率更高。

考虑在同一台计算机上的进程之间使用 Unix 域套接字或命名管道之类的传输进行 gRPC 调用。 有关详细信息,请参阅使用 gRPC 进行进程内通信

保持活动 ping

保持活动 ping 可用于在非活动期间使 HTTP/2 连接保持为活动状态。 如果在应用恢复活动时已准备好现有 HTTP/2 连接,则可以快速进行初始 gRPC 调用,而不会因重新建立连接而导致延迟。

在 SocketsHttpHandler 上配置保持活动 ping:

C#
var handler = new SocketsHttpHandler
{
    PooledConnectionIdleTimeout = Timeout.InfiniteTimeSpan,
    KeepAlivePingDelay = TimeSpan.FromSeconds(60),
    KeepAlivePingTimeout = TimeSpan.FromSeconds(30),
    EnableMultipleHttp2Connections = true
};

var channel = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions
{
    HttpHandler = handler
});

前面的代码配置了一个通道,该通道在非活动期间每 60 秒向服务器发送一次保持活动 ping。 ping 确保服务器和使用中的任何代理不会由于不活动而关闭连接。

流量控制

HTTP/2 流量控制是一项防止应用被数据阻塞的功能。 使用流量控制时:

  • 每个 HTTP/2 连接和请求都有可用的缓冲区窗口。 缓冲区窗口是应用一次可以接收的数据量。
  • 如果填充缓冲区窗口,流量控制功能将激活。 激活后,发送应用会暂停发送更多数据。
  • 接收应用处理完数据后,缓冲区窗口中的空间将变为可用。 发送应用将恢复发送数据。

流量控制在接收大消息时可能会对性能产生负面影响。 如果缓冲区窗口小于传入消息有效负载或客户端和服务器之间出现延迟,则可以在启动/停止突发中发送数据。

流量控制性能问题可以通过增加缓冲区窗口大小来解决。 在 Kestrel 中,这是在应用启动时使用 InitialConnectionWindowSize 和 InitialStreamWindowSize 配置的:

C#
builder.WebHost.ConfigureKestrel(options =>
{
    var http2 = options.Limits.Http2;
    http2.InitialConnectionWindowSize = 2 * 1024 * 1024 * 2; // 2 MB
    http2.InitialStreamWindowSize = 1024 * 1024; // 1 MB
});

建议:

  • 如果 gRPC 服务通常接收大于 96 KB 的消息,即 Kestrel 的默认流窗口大小,请考虑增加连接和流窗口大小。
  • 连接窗口大小应始终等于或大于流窗口大小。 流是连接的一部分,发送方受到两者的限制。

有关流量控制工作原理的详细信息,请参阅 HTTP/2 流量控制(博客文章)

 重要

增加 Kestrel 的窗口大小允许 Kestrel 代表应用缓冲更多数据,这可能会增加内存使用量。 避免配置不必要的大型窗口大小。

流式处理

在高性能方案中,可使用 gRPC 双向流式处理取代一元 gRPC 调用。 双向流启动后,来回流式处理消息比使用多个一元 gRPC 调用发送消息更快。 流式处理消息作为现有 HTTP/2 请求上的数据发送,节省了为每个一元调用创建新的 HTTP/2 请求的开销。

示例服务:

C#
public override async Task SayHello(IAsyncStreamReader<HelloRequest> requestStream,
    IServerStreamWriter<HelloReply> responseStream, ServerCallContext context)
{
    await foreach (var request in requestStream.ReadAllAsync())
    {
        var helloReply = new HelloReply { Message = "Hello " + request.Name };

        await responseStream.WriteAsync(helloReply);
    }
}

示例客户端:

C#
var client = new Greet.GreeterClient(channel);
using var call = client.SayHello();

Console.WriteLine("Type a name then press enter.");
while (true)
{
    var text = Console.ReadLine();

    // Send and receive messages over the stream
    await call.RequestStream.WriteAsync(new HelloRequest { Name = text });
    await call.ResponseStream.MoveNext();

    Console.WriteLine($"Greeting: {call.ResponseStream.Current.Message}");
}

将一元调用替换为双向流式处理是一种高级技术,由于性能原因,这在许多情况下并不适用。

有以下情况时,使用流式处理调用是一个不错的选择:

  1. 需要高吞吐量或低延迟。
  2. gRPC 和 HTTP/2 被标识为性能瓶颈。
  3. 客户端的辅助程序使用 gRPC 服务发送或接收常规消息。

请注意使用流式处理调用而不是一元调用的其他复杂性和限制:

  1. 流可能会因服务或连接错误而中断。 需要在出现错误时重启流的逻辑。
  2. 对于多线程处理,RequestStream.WriteAsync 并不安全。 一次只能将一条消息写入流中。 通过单个流从多个线程发送消息需要制造者/使用者队列(如 Channel<T>)来整理消息。
  3. gRPC 流式处理方法仅限于接收一种类型的消息并发送一种类型的消息。 例如,rpc StreamingCall(stream RequestMessage) returns (stream ResponseMessage) 接收 RequestMessage 并发送 ResponseMessage。 Protobuf 对使用 Any 和 oneof 支持未知消息或条件消息,可以解决此限制。

二进制有效负载

Protobuf 支持标量值类型为 bytes 的二进制有效负载。 C# 中生成的属性使用 ByteString 作为属性类型。

ProtoBuf
syntax = "proto3";

message PayloadResponse {
    bytes data = 1;
}  

Protobuf 是一种二进制格式,它以最小开销有效地序列化大型二进制有效负载。 基于文本的格式(如 JSON)需要将字节编码为 base64,并将 33% 添加到消息大小。

使用大型 ByteString 有效负载时,有一些最佳做法可以避免下面所讨论的不必要副本和分配。

发送二进制有效负载

ByteString 实例通常使用 ByteString.CopyFrom(byte[] data) 创建。 此方法会分配新的 ByteString 和新的 byte[]。 数据会复制到新的字节数组中。

通过使用 UnsafeByteOperations.UnsafeWrap(ReadOnlyMemory<byte> bytes) 创建 ByteString 实例,可以避免其他分配和复制操作。

C#
var data = await File.ReadAllBytesAsync(path);

var payload = new PayloadResponse();
payload.Data = UnsafeByteOperations.UnsafeWrap(data);

字节不会通过 UnsafeByteOperations.UnsafeWrap 进行复制,因此在使用 ByteString 时,不得修改字节。

UnsafeByteOperations.UnsafeWrap 要求使用 Google.Protobuf 版本 3.15.0 或更高版本。

读取二进制有效负载

通过使用 ByteString.Memory 和 ByteString.Span 属性,可以有效地从 ByteString 实例读取数据。

C#
var byteString = UnsafeByteOperations.UnsafeWrap(new byte[] { 0, 1, 2 });
var data = byteString.Span;

for (var i = 0; i < data.Length; i++)
{
    Console.WriteLine(data[i]);
}

这些属性允许代码直接从 ByteString 读取数据,而无需分配或副本。

大多数 .NET API 具有 ReadOnlyMemory<byte> 和 byte[] 重载,因此建议使用 ByteString.Memory 来使用基础数据。 但是,在某些情况下,应用可能需要将数据作为字节数组获取。 如果需要字节数组,则 MemoryMarshal.TryGetArray 方法可用于从 ByteString 获取数组,而无需分配数据的新副本。

C#
var byteString = GetByteString();

ByteArrayContent content;
if (MemoryMarshal.TryGetArray(byteString.Memory, out var segment))
{
    // Success. Use the ByteString's underlying array.
    content = new ByteArrayContent(segment.Array, segment.Offset, segment.Count);
}
else
{
    // TryGetArray didn't succeed. Fall back to creating a copy of the data with ToByteArray.
    content = new ByteArrayContent(byteString.ToByteArray());
}

var httpRequest = new HttpRequestMessage();
httpRequest.Content = content;

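The preceding code:

  • Attempts to get an array from ByteString.Memory with MemoryMarshal.TryGetArray.
  • Uses the ArraySegment<byte> if it was successfully retrieved. The segment has a reference to the array, offset, and count.
  • Otherwise, falls back to allocating a new array with ByteString.ToByteArray().

gRPC services and large binary payloads

gRPC and Protobuf can send and receive large binary payloads. Although binary Protobuf is more efficient than text-based JSON at serializing binary payloads, there are still important performance characteristics to keep in mind when working with large binary payloads.

gRPC is a message-based RPC framework, which means:

  • The entire message is loaded into memory before gRPC can send it.
  • When the message is received, the entire message is deserialized into memory.

Binary payloads are allocated as a byte array. For example, a 10 MB binary payload allocates a 10 MB byte array. Messages with large binary payloads can allocate byte arrays on the large object heap. Large allocations impact server performance and scalability.

Advice for creating high-performance applications with large binary payloads: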

Performance best practices with gRPC

By James Newton-King

gRPC is designed for high-performance services. This document explains how to get the best performance possible from gRPC.

Reuse gRPC channels

A gRPC channel should be reused when making gRPC calls. Reusing a channel allows calls to be multiplexed through an existing HTTP/2 connection.

If a new channel is created for each gRPC call, the time each call takes to complete can increase significantly. Each call will require multiple network round-trips between the client and the server to create a new HTTP/2 connection:

  1. Opening a socket
  2. Establishing TCP connection
  3. Negotiating TLS
  4. Starting HTTP/2 connection
  5. Making the gRPC call

Channels are safe to share and reuse between gRPC calls:

  • gRPC clients are created with channels. gRPC clients are lightweight objects and don't need to be cached or reused.
  • Multiple gRPC clients can be created from a channel, including different types of clients.
  • A channel and clients created from the channel can safely be used by multiple threads.
  • Clients created from the channel can make multiple simultaneous calls.

gRPC client factory offers a centralized way to configure channels. It automatically reuses underlying channels. For more information, see gRPC client factory integration in .NET.

Connection concurrency

HTTP/2 connections typically have a limit on the maximum number of concurrent streams (active HTTP requests) on a connection at one time. By default, most servers set this limit to 100 concurrent streams.

A gRPC channel uses a single HTTP/2 connection, and concurrent calls are multiplexed on that connection. When the number of active calls reaches the connection stream limit, additional calls are queued in the client. Queued calls wait for active calls to complete before they are sent. Applications with high load, or long running streaming gRPC calls, could see performance issues caused by calls queuing because of this limit.

.NET 5 introduces the SocketsHttpHandler.EnableMultipleHttp2Connections property. When set to true, additional HTTP/2 connections are created by a channel when the concurrent stream limit is reached. When a GrpcChannel is created its internal SocketsHttpHandler is automatically configured to create additional HTTP/2 connections. If an app configures its own handler, consider setting EnableMultipleHttp2Connections to true:

C#
var channel = GrpcChannel.ForAddress("https://localhost", new GrpcChannelOptions
{
    HttpHandler = new SocketsHttpHandler
    {
        EnableMultipleHttp2Connections = true,

        // ...configure other handler settings
    }
});

There are a couple of workarounds for .NET Core 3.1 apps:

  • Create separate gRPC channels for areas of the app with high load. For example, the Logger gRPC service might have a high load. Use a separate channel to create the LoggerClient in the app.
  • Use a pool of gRPC channels, for example, create a list of gRPC channels. Random is used to pick a channel from the list each time a gRPC channel is needed. Using Random randomly distributes calls over multiple connections.
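The pool workaround above can be sketched with a small generic helper. RandomPool<T> is a hypothetical name, not part of Grpc.Net.Client; in an app, T would be GrpcChannel and each item would come from GrpcChannel.ForAddress. A lock guards the Random instance because Random is not thread-safe on .NET Core 3.1.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: holds a fixed set of items (for example, GrpcChannel
// instances) and hands one out at random each time Next is called.
public sealed class RandomPool<T>
{
    private readonly T[] _items;
    private readonly Random _random = new Random();
    private readonly object _lock = new object();

    public RandomPool(IEnumerable<T> items)
    {
        _items = new List<T>(items).ToArray();
        if (_items.Length == 0)
        {
            throw new ArgumentException("The pool needs at least one item.", nameof(items));
        }
    }

    public int Count => _items.Length;

    // Safe to call from multiple threads: the lock serializes access to Random.
    public T Next()
    {
        int index;
        lock (_lock)
        {
            index = _random.Next(_items.Length);
        }
        return _items[index];
    }
}
```

A client would then be created from pool.Next() each time one is needed, which spreads calls over the pooled connections.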

 Important

Increasing the maximum concurrent stream limit on the server is another way to solve this problem. In Kestrel this is configured with MaxStreamsPerConnection.

Increasing the maximum concurrent stream limit is not recommended. Too many streams on a single HTTP/2 connection introduce new performance issues:

  • Thread contention between streams trying to write to the connection.
  • Connection packet loss causes all calls to be blocked at the TCP layer.

ServerGarbageCollection in client apps

The .NET garbage collector has two modes: workstation garbage collection (GC) and server garbage collection. Each is tuned for different workloads. ASP.NET Core apps use server GC by default.

Highly concurrent apps generally perform better with server GC. If a gRPC client app is sending and receiving a high number of gRPC calls at the same time, then there may be a performance benefit in updating the app to use server GC.

To enable server GC, set <ServerGarbageCollection> in the app's project file:

XML
<PropertyGroup>
  <ServerGarbageCollection>true</ServerGarbageCollection>
</PropertyGroup>

For more information about garbage collection, see Workstation and server garbage collection.
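To confirm which mode a client app actually runs with, the GCSettings.IsServerGC property can be read at startup. The GcModeReport class below is a hypothetical wrapper used only for illustration.

```csharp
using System;
using System.Runtime;

// Hypothetical helper that reports the GC mode of the current process.
public static class GcModeReport
{
    // GCSettings.IsServerGC is true when server garbage collection is enabled.
    public static string Describe() =>
        GCSettings.IsServerGC ? "Server GC" : "Workstation GC";
}
```

Logging GcModeReport.Describe() once at startup is enough to verify that the project setting took effect.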

 Note

ASP.NET Core apps use server GC by default. Enabling <ServerGarbageCollection> is only useful in non-server gRPC client apps, for example in a gRPC client console app.

Load balancing

Some load balancers don't work effectively with gRPC. L4 (transport) load balancers operate at a connection level, by distributing TCP connections across endpoints. This approach works well for load balancing API calls made with HTTP/1.1. Concurrent calls made with HTTP/1.1 are sent on different connections, allowing calls to be load balanced across endpoints.

Because L4 load balancers operate at a connection level, they don't work well with gRPC. gRPC uses HTTP/2, which multiplexes multiple calls on a single TCP connection. All gRPC calls over that connection go to one endpoint.

There are two options to effectively load balance gRPC:

  • Client-side load balancing
  • L7 (application) proxy load balancing

 Note

Only gRPC calls can be load balanced between endpoints. Once a streaming gRPC call is established, all messages sent over the stream go to one endpoint.

Client-side load balancing

With client-side load balancing, the client knows about endpoints. For each gRPC call, it selects a different endpoint to send the call to. Client-side load balancing is a good choice when latency is important. There's no proxy between the client and the service, so the call is sent to the service directly. The downside to client-side load balancing is that each client must keep track of the available endpoints that it should use.
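The endpoint selection step can be illustrated with a round-robin picker. This is a minimal sketch under assumed names (RoundRobinPicker, Pick); it is not the load balancing API that ships with the gRPC client, and it ignores endpoint discovery and health checks.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: cycles through a fixed list of endpoint addresses.
public sealed class RoundRobinPicker
{
    private readonly string[] _endpoints;
    private int _next = -1;

    public RoundRobinPicker(params string[] endpoints)
    {
        if (endpoints == null || endpoints.Length == 0)
        {
            throw new ArgumentException("At least one endpoint is required.", nameof(endpoints));
        }
        _endpoints = (string[])endpoints.Clone();
    }

    // Interlocked.Increment keeps the counter consistent across threads.
    // The unchecked cast to uint keeps the index non-negative after overflow.
    public string Pick()
    {
        uint ticket = unchecked((uint)Interlocked.Increment(ref _next));
        return _endpoints[(int)(ticket % (uint)_endpoints.Length)];
    }
}
```

Each call asks the picker for an address, so consecutive calls land on different endpoints.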

Lookaside client load balancing is a technique where load balancing state is stored in a central location. Clients periodically query the central location for information to use when making load balancing decisions.

For more information, see gRPC client-side load balancing.

Proxy load balancing

An L7 (application) proxy works at a higher level than an L4 (transport) proxy. L7 proxies understand HTTP/2, and are able to distribute gRPC calls multiplexed to the proxy on one HTTP/2 connection across multiple endpoints. Using a proxy is simpler than client-side load balancing, but can add extra latency to gRPC calls.

There are many L7 proxies available. Some options are:

Inter-process communication

gRPC calls between a client and service are usually sent over TCP sockets. TCP is great for communicating across a network, but inter-process communication (IPC) is more efficient when the client and service are on the same machine.

Consider using a transport like Unix domain sockets or named pipes for gRPC calls between processes on the same machine. For more information, see Inter-process communication with gRPC.

Keep alive pings

Keep alive pings can be used to keep HTTP/2 connections alive during periods of inactivity. Having an existing HTTP/2 connection ready when an app resumes activity allows for the initial gRPC calls to be made quickly, without a delay caused by the connection being reestablished.

Keep alive pings are configured on SocketsHttpHandler:

C#
var handler = new SocketsHttpHandler
{
    PooledConnectionIdleTimeout = Timeout.InfiniteTimeSpan,
    KeepAlivePingDelay = TimeSpan.FromSeconds(60),
    KeepAlivePingTimeout = TimeSpan.FromSeconds(30),
    EnableMultipleHttp2Connections = true
};

var channel = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions
{
    HttpHandler = handler
});

The preceding code configures a channel that sends a keep alive ping to the server every 60 seconds during periods of inactivity. The ping ensures the server and any proxies in use won't close the connection because of inactivity.

Flow control

HTTP/2 flow control is a feature that prevents apps from being overwhelmed with data. When using flow control:

  • Each HTTP/2 connection and request has an available buffer window. The buffer window is how much data the app can receive at once.
  • Flow control activates if the buffer window is filled up. When activated, the sending app pauses sending more data.
  • Once the receiving app has processed data, then space in the buffer window is available. The sending app resumes sending data.

Flow control can have a negative impact on performance when receiving large messages. If the buffer window is smaller than incoming message payloads or there's latency between the client and server, then data can be sent in start/stop bursts.
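A rough way to see the effect is to count how many times a message fills the window. The model below is a deliberately simplified worst case and an assumption of this sketch: the sender pauses for one full round trip each time the window fills. Real HTTP/2 stacks usually send window updates earlier, so actual stalls are shorter. FlowControlEstimate and its members are hypothetical names.

```csharp
using System;

// Hypothetical back-of-the-envelope model of start/stop sending.
public static class FlowControlEstimate
{
    // Number of window-sized chunks needed to send the message (rounded up).
    public static long WindowFills(long messageBytes, long windowBytes)
    {
        if (messageBytes < 0) throw new ArgumentOutOfRangeException(nameof(messageBytes));
        if (windowBytes <= 0) throw new ArgumentOutOfRangeException(nameof(windowBytes));
        return (messageBytes + windowBytes - 1) / windowBytes;
    }

    // Assumes one round trip of waiting between consecutive chunks.
    public static TimeSpan WorstCaseStall(long messageBytes, long windowBytes, TimeSpan roundTrip)
    {
        long pauses = Math.Max(0, WindowFills(messageBytes, windowBytes) - 1);
        return TimeSpan.FromTicks(roundTrip.Ticks * pauses);
    }
}
```

Under this model, a 1 MB message with a 96 KB window needs 11 chunks, so a 50 ms round trip adds up to 500 ms of waiting, while a 1 MB window needs a single chunk and adds none.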

Flow control performance issues can be fixed by increasing buffer window size. In Kestrel, this is configured with InitialConnectionWindowSize and InitialStreamWindowSize at app startup:

C#
builder.WebHost.ConfigureKestrel(options =>
{
    var http2 = options.Limits.Http2;
    http2.InitialConnectionWindowSize = 2 * 1024 * 1024; // 2 MB
    http2.InitialStreamWindowSize = 1024 * 1024; // 1 MB
});

Recommendations:

  • If a gRPC service often receives messages larger than 96 KB, Kestrel's default stream window size, then consider increasing the connection and stream window size.
  • The connection window size should always be equal to or greater than the stream window size. A stream is part of the connection, and the sender is limited by both.

For more information about how flow control works, see HTTP/2 Flow Control (blog post).

 Important

Increasing Kestrel's window size allows Kestrel to buffer more data on behalf of the app, which possibly increases memory usage. Avoid configuring an unnecessarily large window size.

Streaming

gRPC bidirectional streaming can be used to replace unary gRPC calls in high-performance scenarios. Once a bidirectional stream has started, streaming messages back and forth is faster than sending messages with multiple unary gRPC calls. Streamed messages are sent as data on an existing HTTP/2 request, which eliminates the overhead of creating a new HTTP/2 request for each unary call.

Example service:

C#
public override async Task SayHello(IAsyncStreamReader<HelloRequest> requestStream,
    IServerStreamWriter<HelloReply> responseStream, ServerCallContext context)
{
    await foreach (var request in requestStream.ReadAllAsync())
    {
        var helloReply = new HelloReply { Message = "Hello " + request.Name };

        await responseStream.WriteAsync(helloReply);
    }
}

Example client:

C#
var client = new Greet.GreeterClient(channel);
using var call = client.SayHello();

Console.WriteLine("Type a name then press enter.");
while (true)
{
    var text = Console.ReadLine();

    // Send and receive messages over the stream
    await call.RequestStream.WriteAsync(new HelloRequest { Name = text });
    await call.ResponseStream.MoveNext();

    Console.WriteLine($"Greeting: {call.ResponseStream.Current.Message}");
}

Replacing unary calls with bidirectional streaming for performance reasons is an advanced technique and is not appropriate in many situations.

Using streaming calls is a good choice when:

  1. High throughput or low latency is required.
  2. gRPC and HTTP/2 are identified as a performance bottleneck.
  3. A worker in the client is sending or receiving regular messages with a gRPC service.

Be aware of the additional complexity and limitations of using streaming calls instead of unary:

  1. A stream can be interrupted by a service or connection error. Logic is required to restart the stream if there is an error.
  2. RequestStream.WriteAsync is not safe for multi-threading. Only one message can be written to a stream at a time. Sending messages from multiple threads over a single stream requires a producer/consumer queue like Channel<T> to marshal messages.
  3. A gRPC streaming method is limited to receiving one type of message and sending one type of message. For example, rpc StreamingCall(stream RequestMessage) returns (stream ResponseMessage) receives RequestMessage and sends ResponseMessage. Protobuf's support for unknown or conditional messages using Any and oneof can work around this limitation.
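The producer/consumer queue mentioned in the second point can be sketched with System.Threading.Channels. SerializedWriter<T> is a hypothetical helper, and its writeAsync delegate stands in for a call such as call.RequestStream.WriteAsync; error handling and stream restarts are left out.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical helper: many threads enqueue messages, and a single consumer
// loop writes them one at a time, so writes never overlap.
public sealed class SerializedWriter<T>
{
    private readonly Channel<T> _queue = Channel.CreateUnbounded<T>(
        new UnboundedChannelOptions { SingleReader = true });
    private readonly Task _pump;

    public SerializedWriter(Func<T, Task> writeAsync)
    {
        _pump = Task.Run(async () =>
        {
            // Each write is awaited before the next message is dequeued.
            await foreach (var message in _queue.Reader.ReadAllAsync())
            {
                await writeAsync(message);
            }
        });
    }

    // Safe to call from any thread. Returns false once CompleteAsync was called.
    public bool TryEnqueue(T message) => _queue.Writer.TryWrite(message);

    // Stops accepting messages and completes when queued messages are written.
    public Task CompleteAsync()
    {
        _queue.Writer.TryComplete();
        return _pump;
    }
}
```

An unbounded queue keeps the sketch short. A bounded channel is the usual choice when producers can outpace the stream, because it applies backpressure instead of growing without limit.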

Binary payloads

Binary payloads are supported in Protobuf with the bytes scalar value type. A generated property in C# uses ByteString as the property type.

ProtoBuf
syntax = "proto3";

message PayloadResponse {
    bytes data = 1;
}  

Protobuf is a binary format that efficiently serializes large binary payloads with minimal overhead. Text-based formats like JSON require encoding bytes as base64, which adds about 33% to the message size.
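The 33% figure follows from how base64 works: every 3 input bytes become 4 output characters. The Base64Overhead class below is a hypothetical helper that makes the arithmetic explicit.

```csharp
using System;

// Hypothetical helper that computes the size cost of base64 encoding.
public static class Base64Overhead
{
    // Base64 emits 4 characters per 3 input bytes and pads the final group.
    public static long EncodedLength(long byteCount) => 4 * ((byteCount + 2) / 3);

    // Extra size as a percentage of the original payload.
    public static double OverheadPercent(long byteCount) =>
        byteCount == 0 ? 0 : 100.0 * (EncodedLength(byteCount) - byteCount) / byteCount;
}
```

For a 3,000-byte payload the encoded text is 4,000 characters, which matches the length of Convert.ToBase64String(new byte[3000]).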

When working with large ByteString payloads, the following best practices avoid unnecessary copies and allocations.

Send binary payloads

ByteString instances are normally created using ByteString.CopyFrom(byte[] data). This method allocates a new ByteString and a new byte[]. Data is copied into the new byte array.

Additional allocations and copies can be avoided by using UnsafeByteOperations.UnsafeWrap(ReadOnlyMemory<byte> bytes) to create ByteString instances.

C#
var data = await File.ReadAllBytesAsync(path);

var payload = new PayloadResponse();
payload.Data = UnsafeByteOperations.UnsafeWrap(data);

Bytes are not copied with UnsafeByteOperations.UnsafeWrap, so they must not be modified while the ByteString is in use.
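The difference between wrapping and copying can be shown without Google.Protobuf. In the sketch below, ReadOnlyMemory<byte> stands in for ByteString, and WrapVersusCopy is a hypothetical class used only for illustration.

```csharp
using System;

// Hypothetical illustration: ReadOnlyMemory<byte> stands in for ByteString.
public static class WrapVersusCopy
{
    // Shares the caller's array, the way UnsafeWrap shares its input.
    public static ReadOnlyMemory<byte> Wrap(byte[] data) => data;

    // Duplicates the array first, the way ByteString.CopyFrom does.
    public static ReadOnlyMemory<byte> Copy(byte[] data) => (byte[])data.Clone();
}
```

If the caller changes data[0] after both calls, the wrapped view sees the new value and the copied view keeps the old one. That shared state is why a wrapped buffer has to stay untouched until the message has been sent.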

UnsafeByteOperations.UnsafeWrap requires Google.Protobuf version 3.15.0 or later.

Read binary payloads

Data can be efficiently read from ByteString instances by using ByteString.Memory and ByteString.Span properties.

C#
var byteString = UnsafeByteOperations.UnsafeWrap(new byte[] { 0, 1, 2 });
var data = byteString.Span;

for (var i = 0; i < data.Length; i++)
{
    Console.WriteLine(data[i]);
}

These properties allow code to read data directly from a ByteString without allocations or copies.

Most .NET APIs have ReadOnlyMemory<byte> and byte[] overloads, so ByteString.Memory is the recommended way to use the underlying data. However, there are circumstances where an app might need to get the data as a byte array. If a byte array is required then the MemoryMarshal.TryGetArray method can be used to get an array from a ByteString without allocating a new copy of the data.

C#
var byteString = GetByteString();

ByteArrayContent content;
if (MemoryMarshal.TryGetArray(byteString.Memory, out var segment))
{
    // Success. Use the ByteString's underlying array.
    content = new ByteArrayContent(segment.Array, segment.Offset, segment.Count);
}
else
{
    // TryGetArray didn't succeed. Fall back to creating a copy of the data with ToByteArray.
    content = new ByteArrayContent(byteString.ToByteArray());
}

var httpRequest = new HttpRequestMessage();
httpRequest.Content = content;

The preceding code:

  • Attempts to get an array from ByteString.Memory with MemoryMarshal.TryGetArray.
  • Uses the ArraySegment<byte> if it was successfully retrieved. The segment has a reference to the array, offset and count.
  • Otherwise, falls back to allocating a new array with ByteString.ToByteArray().

gRPC services and large binary payloads

gRPC and Protobuf can send and receive large binary payloads. Although binary Protobuf is more efficient than text-based JSON at serializing binary payloads, there are still important performance characteristics to keep in mind when working with large binary payloads.

gRPC is a message-based RPC framework, which means:

  • The entire message is loaded into memory before gRPC can send it.
  • When the message is received, the entire message is deserialized into memory.

Binary payloads are allocated as a byte array. For example, a 10 MB binary payload allocates a 10 MB byte array. Messages with large binary payloads can allocate byte arrays on the large object heap. Large allocations impact server performance and scalability.
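Whether a buffer landed on the large object heap can be checked with GC.GetGeneration. The sketch assumes default runtime settings, where arrays of roughly 85,000 bytes or more go to the large object heap and are reported as the oldest generation. LargeObjectHeapProbe is a hypothetical name.

```csharp
using System;

// Hypothetical probe, meaningful only for freshly allocated buffers:
// a small array that has survived collections is also reported as gen 2.
public static class LargeObjectHeapProbe
{
    // Large object heap allocations are reported as GC.MaxGeneration.
    public static bool IsInOldestGeneration(byte[] buffer) =>
        GC.GetGeneration(buffer) == GC.MaxGeneration;
}
```

With default settings, a new 10 MB array is reported in the oldest generation, while a new 1 KB array starts in generation 0.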

Advice for creating high-performance applications with large binary payloads:

posted @ 2022-06-09 14:28  papering