On the StreamReader.ReadToEnd Method
I used to like using ReadToEnd in my page-scraping code because it is simple and convenient. Later I found that when crawling pages over a slow connection, ReadToEnd times out quite often. After rewriting the download loop with Read, timeouts became much rarer. The complete code is as follows:
// Requires the System.IO, System.Net, System.Text and System.Threading namespaces.
/// <summary>
/// HttpPost
/// </summary>
public static string HttpPost(string url, string data)
{
    byte[] bArr = Encoding.UTF8.GetBytes(data);
    // Set up the request; m_Cookie is a CookieContainer field defined elsewhere in the class
    HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
    request.CookieContainer = m_Cookie;
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = bArr.Length;
    request.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)";
    Stream postStream = request.GetRequestStream();
    postStream.Write(bArr, 0, bArr.Length);
    postStream.Close();
    // Send the request and get the response.
    // The POST is not actually sent to the target page until request.GetResponse() is called.
    HttpWebResponse response = request.GetResponse() as HttpWebResponse;
    Stream responseStream = response.GetResponseStream();
    // Read the returned HTML in fixed-size chunks instead of calling ReadToEnd
    MemoryStream memoryStream = new MemoryStream();
    bArr = new byte[1024];
    int size = responseStream.Read(bArr, 0, bArr.Length);
    while (size > 0)
    {
        memoryStream.Write(bArr, 0, size);
        size = responseStream.Read(bArr, 0, bArr.Length);
        Thread.Sleep(1);
    }
    string content = Encoding.UTF8.GetString(memoryStream.ToArray());
    return content;
}
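For contrast, the ReadToEnd-based version mentioned at the start would read the response body roughly like this. This is a minimal sketch of the response-reading part only; the request setup is the same as in HttpPost above:

// ReadToEnd variant: simpler, but more prone to timeouts on slow connections
using (StreamReader reader = new StreamReader(responseStream, Encoding.UTF8))
{
    return reader.ReadToEnd();
}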
The Thread.Sleep(1); call in the HttpPost code above can also be removed.
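If the same pattern is needed in more than one place (for example a matching HttpGet), the chunked read can be factored into a small helper. The name ReadStreamToString below is only an illustration, not part of the original code:

// Hypothetical helper: reads a response stream in fixed-size chunks,
// avoiding StreamReader.ReadToEnd on slow connections.
private static string ReadStreamToString(Stream responseStream)
{
    MemoryStream memoryStream = new MemoryStream();
    byte[] buffer = new byte[1024];
    int size = responseStream.Read(buffer, 0, buffer.Length);
    while (size > 0)
    {
        memoryStream.Write(buffer, 0, size);
        size = responseStream.Read(buffer, 0, buffer.Length);
    }
    return Encoding.UTF8.GetString(memoryStream.ToArray());
}

With such a helper, the end of HttpPost would reduce to return ReadStreamToString(responseStream);.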