当一个被采集的网页是开启压缩了的话,如果使用HtmlAgilityPack 的HtmlWeb默认配置去下载,下载回来的HTML代码是乱码,应该进行如下操作
HtmlWeb web = new HtmlWeb(); HtmlAgilityPack.HtmlWeb.PreRequestHandler handler = delegate(HttpWebRequest request) { request.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate"; request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip; request.CookieContainer = new System.Net.CookieContainer(); return true; }; web.PreRequest += handler; web.OverrideEncoding = Encoding.Default;
而如果仅仅只是网页的编码问题,则只需要配置这个参数:
web.OverrideEncoding = Encoding.Default;