Post archive "November 2022" - kuba8

Scraping with Scrapy: garbled Chinese text, converting gb2312 to utf-8

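Summary: I had not used Scrapy much for a while. Recently, while scraping a page, I found that it was encoded as gb2312. A quick search turned up all kinds of approaches: some set the export encoding in settings (# FEED_EXPORT_ENCODING = 'utf-8' FEED_EXPORT_ENCODING = 'GB2312'), others set it in the spider with r... Read more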
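The conversion named in the title can be sketched with the standard library alone. This is a minimal illustration, not the post's own code: the helper name reencode_to_utf8 is hypothetical, and it assumes the page bytes really are gb2312. Note that FEED_EXPORT_ENCODING only controls how Scrapy writes exported feeds; it does not change how a response body is decoded.

```python
def reencode_to_utf8(body: bytes, source_encoding: str = "gb2312") -> bytes:
    """Decode raw bytes from source_encoding and re-encode them as UTF-8.

    Hypothetical helper: in a spider, `body` would be response.body.
    Raises UnicodeDecodeError if the bytes are not valid in source_encoding.
    """
    text = body.decode(source_encoding)  # bytes -> str
    return text.encode("utf-8")          # str -> UTF-8 bytes


# Simulate a gb2312-encoded page fragment and convert it.
raw = "中文".encode("gb2312")
utf8_bytes = reencode_to_utf8(raw)
```

If a page contains characters outside gb2312, passing "gbk" or "gb18030" as source_encoding may work better, since both are supersets of it.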

posted @ 2022-11-24 12:30 kuba8 Views (378) Comments (0) Recommended (0)

Garbled Chinese after scraping with Scrapy: fixing the cp1252 encoding problem in Word-to-HTML pages

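Summary: Approach 1: loop over candidate encodings by brute force, although this is not as good as approach 3. def parse(self, response): print(response.text[:100]) body = response.body  # already bytes; response.text is str encodings = ['utf-8', 'g Read more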
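The brute-force idea in approach 1 might look roughly like the sketch below. The summary is cut off before the full encoding list, so the CANDIDATES list and the helper name decode_with_candidates are assumptions for illustration, not the author's values. It runs on plain bytes so it can be tried without Scrapy; in a spider, the input would be response.body.

```python
from typing import Iterable, Optional, Tuple


def decode_with_candidates(
    body: bytes, encodings: Iterable[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, encoding) for the first encoding that decodes body strictly.

    Returns (None, None) when no candidate succeeds.
    """
    for enc in encodings:
        try:
            return body.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # Invalid bytes for this codec, or unknown codec name: try the next.
            continue
    return None, None


# Hypothetical candidate list. Order matters: permissive single-byte codecs
# such as cp1252 accept almost any input, so they should come last.
CANDIDATES = ["utf-8", "gbk", "cp1252"]

sample = "中文".encode("gbk")
text, used = decode_with_candidates(sample, CANDIDATES)
```

A successful decode only proves the bytes are valid in that codec, not that the result is the intended text, which is likely why the author rates this approach below the others.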

posted @ 2022-11-23 14:45 kuba8 Views (326) Comments (0) Recommended (0)

Fixing garbled text in Scrapy XPath results

Summary: First check the page's encoding: response.encoding shows 'cp1252'. Then response.xpath("//title/text()").getall()[0].encode('cp1252').decode('gbk') fixes it. Read more
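The encode('cp1252').decode('gbk') round trip works because the text was decoded with the wrong codec: encoding it back with cp1252 recovers the original bytes, which can then be decoded as gbk. A small sketch of that repair, using a hypothetical helper name fix_mojibake and falling back to the input when the round trip is not possible:

```python
def fix_mojibake(text: str, wrong: str = "cp1252", right: str = "gbk") -> str:
    """Undo a wrong decode: re-encode with `wrong`, then decode with `right`.

    Returns the input unchanged if either step fails, for example when the
    text was never mis-decoded in the first place.
    """
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Simulate what a selector returns when gbk bytes were decoded as cp1252.
garbled = "中文".encode("gbk").decode("cp1252")
repaired = fix_mojibake(garbled)
```

One caveat: cp1252 leaves a few byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so gbk text containing those bytes cannot always be recovered this way; decoding the raw response.body as gbk avoids the problem.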

posted @ 2022-11-23 11:15 kuba8 Views (301) Comments (0) Recommended (0)
