Parallel WebClient (Part 1)
2018-09-11 16:28 音乐让我说
I came across a question on Stack Overflow about running WebClient in parallel. The answerer's code looked like a useful reference, so I am recording it here for future use:
html = RetrieveHTML(int index);
returnCollection = RegexProcess(html, index);
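I usually call this with up to 20,000 indices. The first sub-function is network-bound (it uses WebClient.DownloadString to fetch the HTML of several URLs from a single server); the second sub-function is mostly CPU-bound.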
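I got lost in the world of Parallel.ForEach and Tasks (continuations, FromAsync) and am having trouble solving this. I first tried Parallel.ForEach, but found that its performance, specifically the network I/O, degrades over consecutive calls (the first loop is fast, later loops get slower). The solution should release the HTML objects, because there are many of them and they are large. I am using .NET 4.0.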
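To make the setup in the question concrete, here is a minimal sketch of running both steps per index while keeping only the match results, so each HTML string can be collected as soon as its iteration ends. The class name `PerIndexSketch`, the `fetch` delegate (standing in for the question's RetrieveHTML), the `maxParallel` cap, and the body of `RegexProcess` are all assumptions for illustration; the question does not show them.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

static class PerIndexSketch
{
    // Hypothetical stand-in for the question's RegexProcess:
    // returns the matched values found in one page.
    public static List<string> RegexProcess(Regex regex, string html)
    {
        return regex.Matches(html).Cast<Match>().Select(m => m.Value).ToList();
    }

    // Runs both steps for every index in [0, count).
    // `fetch` is an assumed delegate standing in for RetrieveHTML.
    public static IDictionary<int, List<string>> Run(
        int count, Func<int, string> fetch, Regex regex, int maxParallel)
    {
        var results = new ConcurrentDictionary<int, List<string>>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.For(0, count, options, index =>
        {
            string html = fetch(index);                  // network-bound step
            results[index] = RegexProcess(regex, html);  // CPU-bound step
            // html goes out of scope here; only the match lists stay reachable.
        });

        return results;
    }
}
```

Calling `PerIndexSketch.Run(20000, RetrieveHTML, regex, 8)` would then hold at most eight pages in flight at a time; the value 8 is only a placeholder to tune.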
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

class Program
{
    private static Regex _regex = new Regex("net");

    static void Main(string[] args)
    {
        ProcessInParallell();
    }

    private static void ProcessInParallell()
    {
        Uri[] resourceUri = new Uri[]
        {
            new Uri("http://www.microsoft.com"),
            new Uri("http://www.google.com"),
            new Uri("http://www.amazon.com")
        };

        // Stage 1: download the HTML.
        // A BlockingCollection lets the concurrent downloads add results safely.
        BlockingCollection<string> htmlDataList = new BlockingCollection<string>();
        Parallel.For(0, resourceUri.Length, index =>
        {
            var html = RetrieveHTML(resourceUri[index]);
            htmlDataList.TryAdd(html);
        });
        // Signal completion only after every download has finished. Parallel.For does
        // not run iterations in order, so completing inside the loop on the last index
        // could make a later TryAdd throw InvalidOperationException.
        htmlDataList.CompleteAdding();

        // Stage 2: find matches.
        // This concurrent bag stores the results of the matching stage.
        ConcurrentBag<string> matchedHtml = new ConcurrentBag<string>();
        IList<Task> processingTasks = new List<Task>();

        // Enumerate each downloaded HTML document.
        foreach (var html in htmlDataList.GetConsumingEnumerable())
        {
            // Create a new task to match the downloaded HTML.
            var task = Task.Factory.StartNew((data) =>
            {
                var downloadedHtml = data as string;
                if (downloadedHtml == null)
                    return;
                if (_regex.IsMatch(downloadedHtml))
                {
                    matchedHtml.Add(downloadedHtml);
                }
            }, html);

            // Add the task to the waiting list.
            processingTasks.Add(task);
        }

        // Wait for all tasks to complete.
        Task.WaitAll(processingTasks.ToArray());

        foreach (var html in matchedHtml)
        {
            // Do something with the matched result.
        }
    }

    private static string RetrieveHTML(Uri uri)
    {
        using (WebClient webClient = new WebClient())
        {
            // Set this to null if there is no proxy.
            webClient.Proxy = null;
            byte[] data = webClient.DownloadData(uri);
            return Encoding.UTF8.GetString(data);
        }
    }
}
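In the code above, Parallel.For blocks until every download has finished, so the matching stage starts only once all pages are already in memory. With thousands of large pages, one possible variation is to bound the BlockingCollection and run the downloads on a separate task, so matching overlaps with downloading and only a limited number of pages wait in the queue. This is a sketch under assumptions, not the answerer's code: `BoundedPipelineSketch`, the `fetch` delegate, and the `capacity` parameter are names introduced here for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

static class BoundedPipelineSketch
{
    // `fetch` is an assumed delegate; the RetrieveHTML method above could be passed in.
    public static List<string> Run(
        IList<Uri> uris, Func<Uri, string> fetch, Regex regex, int capacity)
    {
        // Bounded collection: Add blocks once `capacity` pages are waiting.
        using (var pages = new BlockingCollection<string>(capacity))
        {
            // Producer: download on a separate task so matching can start right away.
            Task producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    Parallel.ForEach(uris, uri => pages.Add(fetch(uri)));
                }
                finally
                {
                    // Always signal completion, or the consumer would wait forever.
                    pages.CompleteAdding();
                }
            }, TaskCreationOptions.LongRunning);

            // Consumer: match each page as it arrives, on the calling thread.
            var matched = new List<string>();
            foreach (string html in pages.GetConsumingEnumerable())
            {
                if (regex.IsMatch(html))
                {
                    matched.Add(html);
                }
            }

            // Surfaces any download failure as an AggregateException.
            producer.Wait();
            return matched;
        }
    }
}
```

The single consumer keeps the sketch short; if the regex work dominates, the matching could be spread over several tasks the way the code above does. Pages that do not match become unreachable as soon as the loop moves on, which is the memory behaviour the question asks for.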
Author: 音乐让我说 (音乐让我说 - 博客园)