Customizing Start Requests in Scrapy
How the Scrapy engine pulls the starting URLs from the spider:
1. Call the start_requests method (the parent Spider class provides a default implementation) and take its return value.
2. Turn the return value into an iterator with iter().
3. Pull values from it one by one via __next__().
4. Put every returned request into the scheduler (see the sketch after this list).
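This is not Scrapy's internal code verbatim, only a minimal sketch of the same iterator protocol described above, assuming start_requests() yields plain strings rather than Request objects:

# Sketch of how the engine consumes start_requests() as an iterator.
def start_requests():
    for url in ['https://example.com/1', 'https://example.com/2']:
        yield url  # in a real spider these would be Request objects

it = iter(start_requests())      # step 2: wrap the return value with iter()
while True:
    try:
        request = next(it)       # step 3: pull values via __next__()
    except StopIteration:
        break
    print('schedule:', request)  # step 4: each value is handed to the scheduler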
Override the start_requests method in the spider class:
from scrapy import Request, Spider
from urllib.parse import quote


class XXSpider(Spider):
    name = 'XX'
    allowed_domains = ['www.xx.com']
    base_url = 'https://xx.com/search?q='

    def start_requests(self):
        # Read the keyword list and page count from the project settings
        for key in self.settings.get('KEYWORDS'):
            for page in range(1, self.settings.get('MAX_PAGE') + 1):
                url = self.base_url + quote(key)
                yield Request(url=url, callback=self.parse,
                              meta={'page': page}, dont_filter=True)
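The spider above assumes that KEYWORDS and MAX_PAGE are defined in the project's settings.py; the names come from the example, but the values below are only illustrative:

# Illustrative settings.py entries assumed by the spider above
KEYWORDS = ['ipad', 'iphone']
MAX_PAGE = 100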
Note: the original start_urls attribute should be removed.
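Since each request carries the page number in meta={'page': page}, the callback can read it back from response.meta. A hypothetical parse method (not part of the original example) might look like this:

    def parse(self, response):
        # The page number set in start_requests is available on response.meta
        page = response.meta['page']
        self.logger.info('Parsed page %s of %s', page, response.url)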