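Proxy operations
Role of the downloader middleware: it intercepts requests, so the IP address a request is sent from can be replaced with a proxy.
Workflow:
1. Define a custom downloader middleware class
a) inherit from object
b) override the process_request(self, request, spider) method
2. Enable the downloader middleware in the settings file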
middlewares.py
# -*- coding: utf-8 -*-

# Define here the models for your spider middleware
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/spider-middleware.html

from scrapy import signals


class middleadd(object):

    def process_request(self, request, spider):
        # Route every outgoing request through the proxy.
        # The proxy value is a full URL, so it includes the scheme.
        request.meta["proxy"] = "http://157.65.31.220:3128"
Enable the middleware in settings.py
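The notes do not show the settings entry itself, so here is a minimal sketch of what it might look like. The package name myproject is a hypothetical placeholder for the actual project name; the priority 543 is simply the value that appears in the commented-out example of Scrapy's generated settings.py.

```python
# settings.py (fragment)
# "myproject" is an assumed project package name; replace it with your own.
# The number is the middleware's order among the downloader middlewares.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.middleadd": 543,
}
```

Without this entry Scrapy never calls process_request on the class, and requests go out from the local IP.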
spider/midtest.py
import scrapy


class MidtestSpider(scrapy.Spider):
    name = 'midtest'
    # allowed_domains = ['www.baidu.com']
    start_urls = ["https://www.baidu.com/s?wd=ip"]

    def parse(self, response):
        # Save the result page so the IP reported by the search can be checked.
        with open("record.html", "w", encoding="utf-8") as fp:
            fp.write(response.text)
Free proxies can be obtained from www.goubanjia.com
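Free proxies tend to be short-lived, so one possible variation is to pick a proxy at random for each request instead of hard-coding a single address. The following is only a sketch under assumptions: the PROXY_POOL list and the class name RandomProxyMiddleware are hypothetical and not part of the original notes, and the addresses are documentation-range placeholders rather than working proxies.

```python
import random

# Hypothetical placeholder proxies (documentation-range addresses);
# replace them with live entries collected from a proxy list.
PROXY_POOL = [
    "http://203.0.113.10:3128",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8888",
]


class RandomProxyMiddleware(object):
    """Downloader middleware sketch: pick one proxy per request."""

    def process_request(self, request, spider):
        # Assign a randomly chosen proxy URL to this request.
        request.meta["proxy"] = random.choice(PROXY_POOL)
        # Returning None lets Scrapy continue processing the request.
        return None
```

Like the middleadd class, this class only takes effect once it is enabled as a downloader middleware in settings.py.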