Setting up a distributed crawler environment with scrapy-redis
For the full write-up, see the original post: https://www.cnblogs.com/pythoner6833/p/9148937.html
Five settings need to be added to the project's settings.py file:
1. DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
2. SCHEDULER = "scrapy_redis.scheduler.Scheduler"
3. SCHEDULER_PERSIST = True
4. ITEM_PIPELINES = {
       'scrapy_redis.pipelines.RedisPipeline': 100,
   }
5. REDIS_URL = "redis://127.0.0.1:6379"
   or, equivalently:
   REDIS_HOST = '127.0.0.1'
   REDIS_PORT = 6379
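Taken together, the five settings above can be sketched as a minimal settings.py fragment. The comments explain what each setting does in a distributed crawl; the Redis address is the local default from the original note, so adjust it for your own deployment:

```python
# settings.py -- minimal scrapy-redis configuration (sketch)

# Use the Redis-backed request fingerprint filter so every worker
# shares one deduplication set instead of a per-process one.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Replace Scrapy's default scheduler with the Redis-backed one:
# the request queue then lives in Redis and is shared by all workers.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Keep the request queue and fingerprint set in Redis when the
# spider closes, so a crawl can be paused and resumed.
SCHEDULER_PERSIST = True

ITEM_PIPELINES = {
    # Push scraped items into a Redis list ("<spidername>:items").
    'scrapy_redis.pipelines.RedisPipeline': 100,
}

# Redis connection -- either a single URL...
REDIS_URL = "redis://127.0.0.1:6379"
# ...or host and port separately (use one form, not both):
# REDIS_HOST = '127.0.0.1'
# REDIS_PORT = 6379
```

With these settings in place, a spider is typically written as a subclass of `scrapy_redis.spiders.RedisSpider`, which reads its start URLs from a Redis list (by default `"<spidername>:start_urls"`) so that URLs can be fed to the whole cluster with a command such as `redis-cli lpush myspider:start_urls http://example.com`.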