How to fix garbled Chinese text when scraping a website with Python + requests?

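I originally copied the article under the title above from someone else without adding a link to the original source, and now I can't find it anymore!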

These days I add a link to the original source for any good content I repost: https://www.2cto.com/kf/201207/142453.html

"""
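Inside Python, strings are represented as Unicode.
So when converting between encodings, Unicode is normally used as the intermediate form: first decode the string from its original encoding into Unicode,
then encode it from Unicode into the target encoding.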

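decode converts a string in some other encoding into Unicode. For example, str1.decode('gb2312') converts str1, a GB2312-encoded string, into Unicode. (In Python 3, decode is a method of bytes, and the result is a str.)
encode converts Unicode into a string in another encoding. For example, str2.encode('gb2312') converts the Unicode string str2 into GB2312. (In Python 3, encode is a method of str, and the result is a bytes object.)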

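So when transcoding, first work out which encoding the string str is actually in, then decode it into Unicode, and only then encode it into the other encoding.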

"""

header={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"}
yuan = requests.get("http://www.hebfda.gov.cn/CL0384/", timeout=30)
html = yuan.content.decode('utf-8','ignore')
htmls  =yuan.text.encode('gbk','ignore')
print(html)
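A common cause of garbled Chinese with requests: when the server sends a text/* Content-Type without a charset, requests follows RFC 2616 and decodes response.text as ISO-8859-1, so GBK or UTF-8 bytes come out as mojibake. The sketch below simulates this offline (no network; the byte string stands in for response.content), repairs it, and tries a list of candidate encodings in strict mode. The helper decode_with_fallback and its candidate list are hypothetical, chosen for illustration, and are not part of requests.

```python
def decode_with_fallback(raw: bytes, candidates=("utf-8", "gb18030")) -> str:
    """Hypothetical helper: try each candidate encoding strictly and return
    the first successful decode; if all fail, decode as UTF-8 with replacement."""
    for enc in candidates:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    return raw.decode("utf-8", errors="replace")


# Simulated page body encoded as GBK (stands in for response.content)
raw = "中文乱码解决方案".encode("gbk")

# What response.text would look like if requests fell back to ISO-8859-1
garbled = raw.decode("iso-8859-1")

# Repair: ISO-8859-1 maps bytes 1:1, so encoding back recovers the original bytes
repaired = garbled.encode("iso-8859-1").decode("gbk")

print(garbled == "中文乱码解决方案")  # False
print(repaired)                        # 中文乱码解决方案
print(decode_with_fallback(raw))       # 中文乱码解决方案
```

UTF-8 is tried first because GBK bytes almost never form valid UTF-8, so a strict UTF-8 failure is a strong hint to try GB18030 (a superset of GBK). Unlike 'ignore', strict decoding makes a wrong guess visible instead of silently dropping characters. With a real requests response, the equivalent fix is to set yuan.encoding (for example to yuan.apparent_encoding) before reading yuan.text.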

posted @ 2017-11-16 23:39 诡道！！！
