Scrapy returns garbled Chinese text

For the garbled data coming out of Scrapy, I first set FEED_EXPORT_ENCODING = 'utf-8' in settings.py, but found it still had no effect.
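
For reference, this is roughly what the first attempt looked like (a minimal sketch): FEED_EXPORT_ENCODING only controls the encoding of exported feeds such as JSON/CSV output, not how the downloaded response body itself is decoded, which is why it didn't help here.

# settings.py -- first attempt; affects exported feeds only,
# not the decoding of the response body
FEED_EXPORT_ENCODING = 'utf-8'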

So I turned to a downloader middleware: when the response comes back from the downloader, re-encode the returned content there.

from scrapy.http import HtmlResponse

class ZyZhanDownloaderMiddleware:

    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.
        # Rebuild the response with the page encoding forced to utf-8.
        return HtmlResponse(
            url=response.url,
            status=200,
            request=request,
            body=response.body,
            encoding='utf-8',
        )
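
A leaner variant, assuming the downloader already returns a TextResponse subclass (HtmlResponse is one), is to copy the response with response.replace() and only override the encoding:

    def process_response(self, request, response, spider):
        # Same idea: force utf-8 when decoding response.text,
        # but keep the original url, status, headers and body.
        return response.replace(encoding='utf-8')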

Enable this middleware under DOWNLOADER_MIDDLEWARES in the spider's custom_settings, and the returned content will be re-encoded.

custom_settings = {
    'ITEM_PIPELINES': {'AggProject.pipelines.AggprojectPipeline': 100},
    'DOWNLOADER_MIDDLEWARES': {
        'AggProject.middlewares.ZyZhanDownloaderMiddleware': 543,
    },
    'DEFAULT_REQUEST_HEADERS': {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7,en-GB;q=0.6',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    },
    'REDIRECT_ENABLED': True,
    'COOKIES_ENABLED': False,
    'DOWNLOAD_DELAY': 1.5,
    'CONCURRENT_REQUESTS': 6,
    'RETRY_ENABLED': True,
    'RETRY_TIMES': 2,
    'DEPTH_LIMIT': 3,
}
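
If every spider in the project needs this, the same entry can also go into the project-wide settings.py instead of per-spider custom_settings (a sketch using the same middleware path as above):

# settings.py -- project-wide alternative to custom_settings
DOWNLOADER_MIDDLEWARES = {
    'AggProject.middlewares.ZyZhanDownloaderMiddleware': 543,
}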

  
