Effective C# 原则34:创建大容量的Web API(译)
Effective C# 原则34:创建大容量的Web API
Item 34: Create Large-Grain Web APIs
交互协议的开销与麻烦就是对数据媒体的如何使用。在交互过程中可能要不同的使用媒体,例如在交流中要不同的使用电话号码,传真,地址,和电子邮件地址。让我们再回头来看看上次的订购目录,当你用电话订购时,你要回答售货员的一系列问题:
“你可以把第一项填一下吗?”
“这一项的号码是123-456”
"您想订购多少呢?"
"三件"
这样的问题一直要问到销售人员填写完所有的信息为止,例如还要知道你的订购地址,信用卡信息,运送地址,以及其它一些必须的信息来完成这比交易。在电话上完成这样一来一回的讨论还是令人鼓舞的。因为你不会是一个人长时间的自言自语,而且你也不会长时间忍受销售人员是否还要哪里的安静状态。
与传真订购相比,你要填写整个订购文档,然后把整个文档发给公司。一个文件一次性传输完成,你不用很填写产品编号,发传真,然后填写地址,然后再传真,填写信用卡号,然后再发传真。
这里演示了一个定义糟糕的web方法接口会遇到的常见缺陷。当你使用web服务,或者.Net远程交互时,你必须记住:最昂贵的开销是在两台远程机器之间进行对象传输时出现。你不应该只是通过重新封装一下原来在本地计算机上使用的接口来创建远程API。虽然这样是可以工作的,但效率是很低的。
这就有点类似是用电话的方式来完成用传真订购的任务。你的应用程序大部份时间都在每次向信道上发送一段数据后等待网络。使用越是小块的API,应用程序在等待服务器数据返回的时间应用比就更高。
相反,我们在创建基于web的接口时,应该把服务器与客户端的一系列对象进行序列化,然后基于这个序列化后的文档进行传输。你的远程交流应该像用传真订购时使用的表单一样:客户端应该有一个不与服务器进行通信的扩展运行时间段。这时,当所用的信息已经填写完成时,用户就可以一次性的提交这个文档到服务器上。服务器上还是做同样的事情:当服务器上返回到客户上的信息到达时,客户的手头上就得到了完成订购任务必须的所有信息。
比喻说我们要粘贴一个客户订单,我们要设计一个客户的订购处理系统,而且它要与中心服务器和桌面用户通过网络访问信息保持一致。系统其中的一个类就是客户类。如果你忽略传输问题,那么客户类可能会像这样设计,这充许用户取回或者修改姓名,运输地址,以及账号信息:
public class Customer
{
public Customer( )
{
}
// Properties to access and modify customer fields:
public string Name
{
// get and set details elided.
}
public Address shippingAddr
{
// get and set details elided.
}
public Account creditCardInfo
{
// get and set details elided.
}
}
这个客户类不包含远程调用的API,在服务器和客户之间调用一个远程的用户会产生严重的交通阻塞:
// create customer on the server.
Customer c = new Server.Customer( );
// round trip to set the name.
c.Name = dlg.Name.Text;
// round trip to set the addr.
c.shippingAddr = dlg.Addr;
// round trip to set the cc card.
c.creditCardInfo = dlg.credit;
相反,你应该在本机创建一个完整的客户对象,然后等用户填写完所有的信息后,再输送这个客户对象到服务器:
// create customer on the client.
Customer c = new Customer( );
// Set local copy
c.Name = dlg.Name.Text;
// set the local addr.
c.shippingAddr = dlg.Addr;
// set the local cc card.
c.creditCardInfo = dlg.credit;
// send the finished object to the server. (one trip)
Server.AddCustomer( c );
这个客户的例子清楚简单的演示了这个问题:在服务器与客户端之间一来一回的传输整个对象。但为了写出高效的代码,你应该扩展这个简单的例子,应该让它包含正确的相关对象集合。在远程请求中,使用对象的单个属性就是使用太小的粒子(译注:这里的粒子就是指一次交互时所包含的信息量)。但,对于每次在服务器与客户之间传输来说,一个客户实例可能不是大小完全正确的粒子。
让我们来再扩展一下这个例子,让它更接近现实设计中会遇到的一些问题,我们再对系统做一些假设。这个软件主要支持一个拥有1百万客户的在线卖主。假设每个用户有一个订购房子的主要目录,平均一点,去年有15个订单。
每个电话接线员使用一台机器轮班操作,而且不管电话订单者是否回答电话,他们都要查找或者创建这条订单记录。你的设计任务是决定大多数在客户和服务器之间传输的高效对象集合。
你一开始可能消除一些显而易见的选择,例如取回每一个客户以及每次的订单信息是应该明确禁止的:1百万客户以及15百万(1千5百万)订单记录显然是太大了而不应该反回到做一个客户那里去。这样很容易在另一个用户上遇到瓶颈问题。在每次可能要更新数据时,都会给服务器施加轰炸式打击,你要发送一个包含15百万对象的请求。当然,这只是一次事务,但它确实太低效了。
相反,考虑如何可以最好的取回一个对象的集合,你可以创建一个好的数据集合代理,处理一些在后来几分钟一定会使用的对象。一个接线员回复一个电话,而且可能对某个客户有兴趣。在电话交谈的过程中,接线员可能添加或者移除订单,修改订单,或者修改一个客户的账号信息。明显的选择就是取回一个客户,以及这个用户的所有订单。服务器上的方法可能会是这样的:
public OrderData FindOrders( string customerName )
{
// Search for the customer by name.
// Find all orders by that customer.
}
对的吗?传送到客户而且客户已经接收到的订单很可能在客户机上是不须要的。一个更好的做法就是为每个请求的用户只取回一条订单。服务器的方法可能修改成这个样子:
public OrderData FindOpenOrders( string customerName )
{
// Search for the customer by name.
// Find all orders by that customer.
// Filter out those that have already
// been received.
}
这样你还是要让客户机为每个电话订单创建一个新的请求。有一个方法来优化通信吗?比下载用户包含的所有订单更好的方法。我们会在业务处理中添加一些新的假设,从而给你一些方法。假设呼叫中心是分布的,这样每个工作组收到的电话具有不同的区号。现在你就可以修改你的设计了,从而对交互进行一个不小的优化。
每个区域的接线员可能在一开始轮班时,就取回并且更新客户以及订单信息。在每次电话后,客户应用程序应该把修改后的数据返回到服务上,而且服务器应该响应上次客户请求数据以后的所有修改。结果就是,在每次电话后,接线员发送所有的修改,这些修改包含这个组中其它接线员所做的所有修改。这样的设计就是说,每一个电话只有一次会话,而且每一个接线员应该在每次回复电话时,手里有数据集合访问权。这样服务器上可能就有两个这样的方法:
public CustomerSet RetrieveCustomerData(
AreaCode theAreaCode )
{
// Find all customers for a given area code.
// Foreach customer in that area code:
// Find all orders by that customer.
// Filter out those that have already
// been received.
// Return the result.
}
public CustomerSet UpdateCustomer( CustomerData
updates, DataTime lastUpdate, AreaCode theAreaCode )
{
// First, save any updates, marking each update
// with the current time.
// Next, get the updates:
// Find all customers for a given area code.
// Foreach customer in that area code:
// Find all orders by that customer that have been
// updated since the last time. Add those to the result.
// Return the result.
}
但这样可能还是要浪费一些带宽。当每个已知客户每天都有电话时,最后一个设计是最有效。但这很可能是不对的。如果是的,那么你的公司应该在客户服务上存在很大的问题,而这个问题应该用软件是无法解决的。
如何更进一步限制传输大小呢,要求不增加会话次数和及服务器的响应延时?你可以对数据库里的一些准备打电话的客户进行一些假设。你可以跟踪一些统计表,然后可以发现,如果一些客户已经有6个月没有订单了,那么他们很可能就不会再有订单了。这时你就应该在那一天的一开始就停止取回这些客户以及他们的订单。这可以收缩传输的初始大小,你同样可以发现,很多客户在通过一个简短电话下了订单过后,经常会再打电话来询问上次订单的事。因此,你可以修改订单列表,只传输最后的一些订单而不是所有的订单。这可能不用修改服务器上的方法签名,但这会收缩传输给客户上的包的大小。
这些假设的讨论焦点是要给你一些关于远程交互的想法:你减少两机器间的会话频率和会话时数据包的大小。这两个目标是矛盾的,你要在这两者中做一个平衡的选择。你应该取两个极端的中点,而不是错误的选择过大,或者过小的会话。
============================
Item 34: Create Large-Grain Web APIs
The cost and inconvenience of a communication protocol dictates how you should use the medium. You communicate differently using the phone, fax, letters, and email. Think back on the last time you ordered from a catalog. When you order by phone, you engage in a question-and-answer session with the sales staff:
"Can I have your first item?"
"Item number 123-456."
"How many would you like?"
"Three."
This conversation continues until the sales staff has your entire order, your billing address, your credit-card information, your shipping address, and any other information necessary to complete the transaction. It's comforting on the phone to have this back-and-forth discussion. You never give long soliloquies with no feedback. You never endure long periods of silence wondering if the salesperson is still there.
Contrast that with ordering by fax. You fill out the entire document and fax the completed document to the company. One document, one transaction. You do not fill out one product line, fax it, add your address, fax again, add your credit number, and fax again.
This illustrates the common pitfalls of a poorly defined web method interface. Whether you use a web service or .NET Remoting,you must remember that the most expensive part of the operation comes when you transfer objects between distant machines. You must stop creating remote APIs that are simply a repackaging of the same local interfaces that you use. It works, but it reeks of inefficiency. It's using the phone call metaphor to process your catalog request via fax. Your application waits for the network each time you make a round trip to pass a new piece of information through the pipe. The more granular the API is, the higher percentage of time your application spends waiting for data to return from the server.
Instead, create web-based interfaces based on serializing documents or sets of objects between client and server. Your remote communications should work like the order form you fax to the catalog company: The client machine should be capable of working for extended periods of time without contacting the server. Then, when all the information to complete the transaction is filled in, the client can send the entire document to the server. The server's responses work the same way: When information gets sent from the server to the client, the client receives all the information necessary to complete all the tasks at hand.
Sticking with the customer order metaphor, we'll design a customer order-processing system that consists of a central server and desktop clients accessing information via web services. One class in the system is the customer class. If you ignore the transport issues, the customer class might look something like this, which allows client code to retrieve or modify the name, shipping address, and account information:
public class Customer
{
public Customer( )
{
}
// Properties to access and modify customer fields:
public string Name
{
// get and set details elided.
}
public Address shippingAddr
{
// get and set details elided.
}
public Account creditCardInfo
{
// get and set details elided.
}
}
The customer class does not contain the kind of API that should be called remotely. Calling a remote customer results in excessive traffic between the client and the server:
// create customer on the server.
Customer c = new Server.Customer( );
// round trip to set the name.
c.Name = dlg.Name.Text;
// round trip to set the addr.
c.shippingAddr = dlg.Addr;
// round trip to set the cc card.
c.creditCardInfo = dlg.credit;
Instead, you would create a local Customer object and transfer the Customer to the server after all the fields have been set:
// create customer on the client.
Customer c = new Customer( );
// Set local copy
c.Name = dlg.Name.Text;
// set the local addr.
c.shippingAddr = dlg.Addr;
// set the local cc card.
c.creditCardInfo = dlg.credit;
// send the finished object to the server. (one trip)
Server.AddCustomer( c );
The customer example illustrates an obvious and simple example: transfer entire objects back and forth between client and server. But to write efficient programs, you need to extend that simple example to include the right set of related objects. Making remote invocations to set a single property of an object is too small of a granularity. But one customer might not be the right granularity for transactions between the client and server, either.
To extend this example into the real-world design issues you'll encounter in your programs, we'll make a few assumptions about the system. This software system supports a major online vendor with more than 1 million customers. Imagine that it is a major catalog ordering house and that each customer has, on average, 15 orders in the last year. Each telephone operator uses one machine during the shift and must lookup or create customer records whenever he or she answers the phone. Your design task is to determine the most efficient set of objects to transfer between client machines and the server.
You can begin by eliminating some obvious choices. Retrieving every customer and every order is clearly prohibitive: 1 million customers and 15 million order records are just too much data to bring to each client. You've simply traded one bottleneck for another. Instead of constantly bombarding your server with every possible data update, you send the server a request for more than 15 million objects. Sure, it's only one transaction, but it's a very inefficient transaction.
Instead, consider how you can best retrieve a set of objects that can constitute a good approximation of the set of data that an operator must use for the next several minutes. An operator will answer the phone and be interacting with one customer. During the course of the phone call, that operator might add or remove orders, change orders, or modify a customer's account information. The obvious choice is to retrieve one customer, with all orders that have been placed by that customer. The server method would be something like this:
public OrderData FindOrders( string customerName )
{
// Search for the customer by name.
// Find all orders by that customer.
}
Or is that right? Orders that have been shipped and received by the customer are almost certainly not needed at the client machine. A better answer is to retrieve only the open orders for the requested customer. The server method would change to something like this:
public OrderData FindOpenOrders( string customerName )
{
// Search for the customer by name.
// Find all orders by that customer.
// Filter out those that have already
// been received.
}
You are still making the client machine create a new request for each customer phone call. Are there ways to optimize this communication channel more than including orders in the customer download? We'll add a few more assumptions on the business processes to give you some more ideas. Suppose that the call center is partitioned so that each working team receives calls from only one area code. Now you can modify your design to optimize the communication quite a bit more.
Each operator would retrieve the updated customer and order information for that one area code at the start of the shift. After each call, the client application would push the modified data back to the server, and the server would respond with all changes since the last time this client machine asked for data. The end result is that after every phone call, the operator sends any changes made and retrieves all changes made by any other operator in the same work group. This design means that there is one transaction per phone call, and each operator should always have the right set of data available when he or she answers a call. Now the server contains two methods that would look something like this:
public CustomerSet RetrieveCustomerData(
AreaCode theAreaCode )
{
// Find all customers for a given area code.
// Foreach customer in that area code:
// Find all orders by that customer.
// Filter out those that have already
// been received.
// Return the result.
}
public CustomerSet UpdateCustomer( CustomerData
updates, DataTime lastUpdate, AreaCode theAreaCode )
{
// First, save any updates, marking each update
// with the current time.
// Next, get the updates:
// Find all customers for a given area code.
// Foreach customer in that area code:
// Find all orders by that customer that have been
// updated since the last time. Add those to the result.
// Return the result.
}
But you might still be wasting some bandwidth. Your last design works best when every known customer calls every day. That's probably not true. If it is, your company has customer service problems that are far outside of the scope of a software program.
How can we further limit the size of each transaction without increasing the number of transactions or the latency of the service rep's responsiveness to a customer? You can make some assumptions about which customers in the database are going to place calls. You track some statistics and find that if customers go six months without ordering, they are very unlikely to order again. So you stop retrieving those customers and their orders at the beginning of the day. That shrinks the size of the initial transaction. You also find that any customer who calls shortly after placing an order is usually inquiring about the last order. So you modify the list of orders sent down to the client to include only the last order rather than all orders. This would not change the signatures of the server methods, but it would shrink the size of the packets sent back to the client.
This hypothetical discussion focused on getting you to think about the communication between remote machines: You want to minimize both the frequency and the size of the transactions sent between machines. Those two goals are at odds, and you need to make trade-offs between them. You should end up close to the center of the two extremes, but err toward the side of fewer, larger transactions.
/\_/\
(=^o^=) Wu.Country@侠缘
(~)@(~) 一辈子,用心做一件事!
--------------------------------
学而不思则罔,思而不学则怠!
================================
posted on 2007-03-25 22:25 Wu.Country@侠缘 阅读(856) 评论(0) 编辑 收藏 举报