Scrapy: Setting a Proxy IP and Verifying It

1. For setting up a proxy, you can refer to this article

 https://blog.csdn.net/qq_42712552/article/details/88906955

2. Configure the proxy in middlewares.py

  Find the xxx_DownloaderMiddleware downloader middleware class. My project is named scrapy_sample, so the class is ScrapySampleDownloaderMiddleware. Set the proxy in its process_request method (note that base64 must be imported at the top of middlewares.py). The code is shown below:

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.
        abuyun_proxy = "http://xxxxx.com:9020"
        proxy_user = b"Hxxxxxxxxx"
        proxy_pass = b"48xxxxxxxx"
        # Build the Basic auth value for the proxy
        # (requires "import base64" at the top of middlewares.py)
        proxyAuth = "Basic " + base64.b64encode(proxy_user + b":" + proxy_pass).decode()
        # meta is a dict; Scrapy picks up the proxy address from meta['proxy']
        request.meta['proxy'] = abuyun_proxy
        request.headers['Proxy-Authorization'] = proxyAuth
        request.headers["Connection"] = "close"
        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        return None
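
The method above lives inside the downloader middleware class that Scrapy generated for the project. A minimal sketch of the surrounding middlewares.py, showing only the module-level import and class declaration the snippet relies on:

# middlewares.py -- sketch of the surrounding context only
import base64

class ScrapySampleDownloaderMiddleware:

    def process_request(self, request, spider):
        ...  # body as shown above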

3. Enable the downloader middleware

  In settings.py, find DOWNLOADER_MIDDLEWARES and enable the middleware as shown below:

DOWNLOADER_MIDDLEWARES = {
    'scrapy_sample.middlewares.ScrapySampleDownloaderMiddleware': 543,
}
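
If the proxy should only apply to a single spider rather than the whole project, the same DOWNLOADER_MIDDLEWARES setting can also be placed in that spider's custom_settings attribute. A minimal sketch, assuming a hypothetical spider named sample:

import scrapy

class SampleSpider(scrapy.Spider):
    # Hypothetical spider; only illustrates scoping the middleware to one spider
    name = "sample"
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_sample.middlewares.ScrapySampleDownloaderMiddleware': 543,
        },
    }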

4. Verify

  Use the scrapy shell command to send a request:

F:\python_work\scrapy_sample> scrapy  shell  https://www.cnblogs.com/MrHSR/p/16386803.html

# Inspect the request headers
In [1]: request.headers
Out[1]: 
{b'Proxy-Authorization': b'Basic xxxxx',
 b'Connection': b'close',
 b'Accept': b'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
 b'Accept-Language': b'en',
 b'User-Agent': b'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36',
 b'Accept-Encoding': b'gzip, deflate'}

# Inspect the meta information
In [2]: request.meta['proxy']
Out[2]: 'http://xxxxx.com:9020'
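
Inspecting request.headers and request.meta only confirms that the middleware set the fields. To check that traffic actually goes out through the proxy, you can also fetch an IP-echo service from the same shell and compare the returned address with the proxy's exit IP; a minimal sketch using https://httpbin.org/ip as an example endpoint (the origin value shown here is a placeholder):

# Fetch an IP-echo endpoint through the configured proxy
In [3]: fetch('https://httpbin.org/ip')

# The "origin" field should show the proxy's exit IP, not your local IP
In [4]: response.text
Out[4]: '{\n  "origin": "x.x.x.x"\n}\n'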

 
