Transient Fault Handling for Windows Azure Applications

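For the past couple of days I have been building a small feature that backs up files to Windows Azure Blob storage. Every time I called CloudBlockBlob.UploadFromStream to upload a local file to Blob Storage, it failed with the exception "Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host." A Google search turned up the suggestion that the file was too large and that uploading the FileStream in chunks would fix it. I tried that and got exactly the same error. I was stuck until a chance find on MSDN turned the problem around.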

The keyword I stumbled on was "Transient Fault Handling".

So what are "transient faults"?

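MSDN's definition, roughly: a cloud-based application that relies on other cloud-based services will often hit errors caused by temporary conditions, such as network problems or intermittent faults at the service infrastructure level, and these usually clear up on their own after a short time. SQL Azure, for example, may temporarily throttle database connections, or even drop existing ones, because of excessive resource usage, long-running operations, failover to a standby SQL Azure instance, load balancing, or poor network conditions. Temporary errors of this kind are called "transient faults", and they can usually be resolved with a small number of retries.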

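The usual way to mitigate transient faults is a Retry Policy. (It does not guarantee that the problem goes away; it only reduces how often the error surfaces and makes your program more robust.) To see how a Retry Policy is used, let's start with a diagram.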

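As the diagram shows, a Retry Policy combines a Detection strategy with a Retry strategy. You reach the cloud service through the policy's ExecuteAction method, which wraps the actual call to the service; the code further down demonstrates this.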
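To make that composition concrete, here is a minimal, library-free sketch of the idea. It is not the block's implementation: SimpleRetryPolicy and Demo are hypothetical names, the detection strategy is reduced to a predicate, and the retry strategy is reduced to a fixed list of delays.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical, simplified stand-in for a retry policy (not the library's class).
public sealed class SimpleRetryPolicy
{
    private readonly Func<Exception, bool> isTransient; // plays the role of the detection strategy
    private readonly IList<TimeSpan> delays;            // plays the role of the retry strategy: one delay per retry

    public SimpleRetryPolicy(Func<Exception, bool> isTransient, IList<TimeSpan> delays)
    {
        this.isTransient = isTransient;
        this.delays = delays;
    }

    // Runs the action, retrying only while the failure looks transient
    // and there are delays (that is, retries) left.
    public void ExecuteAction(Action action)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                action();
                return;
            }
            catch (Exception ex)
            {
                if (attempt >= delays.Count || !isTransient(ex))
                {
                    throw; // out of retries, or not a transient fault
                }
                Thread.Sleep(delays[attempt]);
            }
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        int calls = 0;
        var policy = new SimpleRetryPolicy(
            ex => ex is TimeoutException, // assumption: only timeouts count as transient here
            new[] { TimeSpan.FromMilliseconds(10), TimeSpan.FromMilliseconds(20) });

        policy.ExecuteAction(() =>
        {
            calls++;
            if (calls < 3) throw new TimeoutException("simulated transient fault");
        });

        Console.WriteLine("Succeeded after {0} calls", calls); // prints "Succeeded after 3 calls"
    }
}
```

The point of the split is that the two halves vary independently: which exceptions are worth retrying depends on the service, while how long to wait depends on how patient the caller can be.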

A Detection strategy identifies the exceptions that are likely to indicate a transient fault. Detection strategies are provided for the following services:

SQL Azure
Windows Azure Service Bus
Windows Azure Storage Service
Windows Azure Caching Service

Next, the Retry strategy. There are three kinds; the Example column lists a sample sequence of intervals between retries for each.

Retry strategy	Example (intervals between retries in seconds)
Fixed interval	2,2,2,2,2,2
Incremental intervals	2,4,6,8,10,12
Random exponential back-off intervals	2, 3.755, 9.176, 14.306, 31.895
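As a rough sketch, the three strategies in the table correspond to three strategy classes in the application block. The namespace below is an assumption based on the Enterprise Library 5.0 package and differs in later versions; the class name RetryStrategies and the back-off bounds (2 to 32 seconds) are illustrative values, not taken from the table.

```csharp
using System;
using Microsoft.Practices.TransientFaultHandling; // assumed namespace for the Enterprise Library 5.0 package

// Hypothetical holder class, for illustration only.
public static class RetryStrategies
{
    // Fixed interval: up to 5 retries, each scheduled 2 seconds after the previous failure.
    public static readonly RetryStrategy Fixed =
        new FixedInterval(5, TimeSpan.FromSeconds(2));

    // Incremental: up to 5 retries, starting at 2 seconds and adding 2 seconds per retry.
    public static readonly RetryStrategy Linear =
        new Incremental(5, TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(2));

    // Exponential back-off: up to 5 retries with randomized delays that grow
    // roughly exponentially, kept between the minimum (2 s) and maximum (32 s).
    public static readonly RetryStrategy Exponential =
        new ExponentialBackoff(5, TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(32), TimeSpan.FromSeconds(2));
}
```

Randomized exponential back-off is usually the gentlest choice for a service that is throttling you, since it spreads out retries from many clients instead of having them all return at the same moment.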

Below I use uploading a file to Blob storage as a concrete example of how a Retry Policy mitigates transient faults and makes a Windows Azure application more robust and stable.

First, add references to the Transient Fault Handling Application Block assemblies.

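In the solution, right-click the project node and choose "Manage NuGet Packages" from the context menu. In the dialog that opens, select "Online", type "topaz" into "Search Online", and install the "Enterprise Library 5.0 - Transient Fault Handling Application Block" package. After that, the namespaces we need can be imported.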

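Retry policies can be defined either in code or in the application configuration file. For a small program that invokes retry logic in only a few places, define them directly in code; otherwise, define them in the configuration file. This example defines the policy in code. How to define policies in configuration is covered in the MSDN documentation.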

Enough talk; here is the code.

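            // Namespaces to import at the top of the file (assumed for the
            // Enterprise Library 5.0 package; later versions use different ones):
            //   using System.Diagnostics;
            //   using Microsoft.Practices.TransientFaultHandling;
            //   using Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling.AzureStorage;

            // Define your retry strategy: retry 5 times, starting 1 second apart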
            // and adding 2 seconds to the interval each retry.
            var retryStrategy = new Incremental(5, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2));

            // Define your retry policy using the retry strategy and the Windows Azure storage
            // transient fault detection strategy.
            var retryPolicy = new RetryPolicy<StorageTransientErrorDetectionStrategy>(retryStrategy);

            // Receive notifications about retries.
            retryPolicy.Retrying += (sender, args) =>
                {
                    // Log details of the retry.
                    var msg = String.Format("Retry - Count:{0}, Delay:{1}, Exception:{2}",
                        args.CurrentRetryCount, args.Delay, args.LastException);

                    Trace.WriteLine(msg, "Information");
                };

            try
            {
                retryPolicy.ExecuteAction(
                    () =>
                    {
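                        // Rewind the stream first: a failed attempt may already have read
                        // part of it, and a retry must upload the file from the beginning.
                        fileStream.Position = 0;

                        // Call a method that uses Windows Azure storage and which may
                        // throw a transient exception.
                        backupBlob.UploadFromStream(fileStream);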
                    }
                );
            }
            catch (Exception ex)
            {
                // Reached when all retries failed or the exception was not transient.
                Trace.WriteLine(ex, "Error");
            }

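With this in place, when UploadFromStream hits a transient fault, the Retry Policy calls it again after a delay that grows with each attempt (starting at 1 second and increasing by 2 seconds per retry with the strategy above, not at a fixed 2-second interval), until the call succeeds or the 5 retries are used up. In my own Blob storage backup, UploadFromStream had failed 8 or 9 times in a row before I added the Retry Policy; with it, the upload succeeded after a single retry. I can't rule out that the failures came from the file size or the network, but either way the program is now more robust.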

Finally, when can you use Transient Fault Handling?

If your application uses any of the four Windows Azure services listed under Detection strategy above, you can use Transient Fault Handling with it.

Transient Fault Handling can also be applied to a service you define yourself; the MSDN documentation describes how.
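For a custom service, the general approach is to supply your own detection strategy by implementing the block's ITransientErrorDetectionStrategy interface. The sketch below is an assumption-laden illustration: the namespace is the one assumed for the Enterprise Library 5.0 package, MyServiceTransientErrorDetectionStrategy and MyServiceClient are hypothetical names, and the choice of which exceptions count as transient is only a guess at what a typical HTTP-based service would need.

```csharp
using System;
using System.Net;
using Microsoft.Practices.TransientFaultHandling; // assumed namespace for the Enterprise Library 5.0 package

// Hypothetical detection strategy for a custom HTTP-based service.
public class MyServiceTransientErrorDetectionStrategy : ITransientErrorDetectionStrategy
{
    public bool IsTransient(Exception ex)
    {
        // Assumption: timeouts are worth retrying.
        if (ex is TimeoutException)
        {
            return true;
        }

        // Assumption: dropped or refused connections are worth retrying too.
        var webEx = ex as WebException;
        if (webEx != null)
        {
            return webEx.Status == WebExceptionStatus.Timeout
                || webEx.Status == WebExceptionStatus.ConnectionClosed
                || webEx.Status == WebExceptionStatus.ConnectFailure;
        }

        // Everything else is treated as a permanent failure and rethrown immediately.
        return false;
    }
}

// Hypothetical caller showing how the custom strategy plugs into a retry policy.
public static class MyServiceClient
{
    public static void CallWithRetries(Action callMyService)
    {
        var retryPolicy = new RetryPolicy<MyServiceTransientErrorDetectionStrategy>(
            new FixedInterval(3, TimeSpan.FromSeconds(1)));

        retryPolicy.ExecuteAction(callMyService);
    }
}
```

It is worth being conservative in IsTransient: retrying an error that will never succeed (bad credentials, a missing resource) only delays the failure.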

Reference:

　　http://msdn.microsoft.com/en-us/library/hh680901(v=pandp.50).aspx

posted @ 2014-02-20 10:30 wuminxss
