解决下载经过GZip压缩后的网页乱码问题
目前很多网站默认采用GZip压缩,如果不进行解压缩,下载后生成的html页面打开后会出现中文乱码
乱码前:
string url = "http://quote.eastmoney.com/stocklist.html"; using (var client = new HttpClient()) { client.BaseAddress = new Uri(url); var response = client.GetAsync(url).Result; var content = response.Content.ReadAsStringAsync().Result; File.WriteAllText(@"C:\stock.html", content, Encoding.Default); }
乱码效果:
解决代码:
string url = "http://quote.eastmoney.com/stocklist.html"; using (var client = new HttpClient()) { client.BaseAddress = new Uri(url); //关键代码1:设置请求头采用GZip和deflate两种压缩算法 client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip, deflate"); var response = client.GetAsync(url).Result; var fileStream = response.Content.ReadAsStreamAsync().Result; //关键代码2:对文件流采用GZip算法解压 GZipStream gzip = new GZipStream(fileStream, CompressionMode.Decompress); using (StreamReader reader = new StreamReader(gzip, Encoding.GetEncoding("gb2312")))//中文编码处理 { File.WriteAllText(@"C:\stock.html", reader.ReadToEnd(), Encoding.Default); } }
解决后效果:
乱码有的时候不能单单靠转File.WriteAllText(@"C:\stock.html", reader.ReadToEnd(), Encoding.GetEncoding("gb2312"));方式解决,具体情况具体分析,思维多发散发散。
作者:江宁织造
博客:http://www.cnblogs.com/wgx0428/
博客:http://www.cnblogs.com/wgx0428/