Scrapy custom extensions
1. Create an extension file and define a class in it. The class must provide a from_crawler class method:
from scrapy import signals


class MyExtend:
    def __init__(self, crawler):
        self.crawler = crawler
        # Attach a handler to the hook (signal)
        crawler.signals.connect(self.start, signals.engine_started)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this to build the extension instance
        return cls(crawler)

    def start(self):
        # Custom logic goes here
        print('signals.engine_started')
2. Register the extension in settings
EXTENSIONS = {
    'day96.extensions.MyExtend': 300,
}
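As a small illustration of the setting above: each key is the import path of an extension class and each value is its order. Setting the value to None disables an extension. The sketch below assumes you also want to switch off the built-in telnet console; that second entry is only an example and is not part of the original project.

```python
# settings.py (sketch; the second entry is an illustrative assumption)
EXTENSIONS = {
    # Import path of the custom extension and its order value
    'day96.extensions.MyExtend': 300,
    # A value of None disables an extension, here a built-in one
    'scrapy.extensions.telnet.TelnetConsole': None,
}
```

Extensions usually do not depend on one another, so the exact order value matters less here than it does for middlewares.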
3. Signals you can hook into
# Sent when the engine starts running
engine_started = object()
# Sent when the engine stops running
engine_stopped = object()

spider_opened = object()
spider_idle = object()
spider_closed = object()
spider_error = object()

request_scheduled = object()
request_dropped = object()
response_received = object()
response_downloaded = object()

# Sent when an Item is yielded and has passed through the pipelines
item_scraped = object()
# Sent when an Item is dropped
item_dropped = object()
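Each signal in the list above is just a unique sentinel object, and connect registers a callback under that object. The following is a toy model of that idea in plain Python, not Scrapy's actual signal manager. ToySignalManager, ToyCrawler and ItemCounter are hypothetical names invented for this sketch, and the real item_scraped signal passes more arguments to its handlers than the single item used here.

```python
# Toy model only: this is NOT Scrapy's real signal manager.

# Signals are unique sentinel objects, matched by identity.
engine_started = object()
item_scraped = object()


class ToySignalManager:
    """Hypothetical stand-in for crawler.signals."""

    def __init__(self):
        self._receivers = {}  # signal object -> list of callbacks

    def connect(self, receiver, signal):
        # Same argument order as in the extension above: callback, then signal
        self._receivers.setdefault(signal, []).append(receiver)

    def send(self, signal, **kwargs):
        # Call every callback registered for this exact signal object
        return [receiver(**kwargs) for receiver in self._receivers.get(signal, [])]


class ToyCrawler:
    """Hypothetical stand-in for the crawler passed to from_crawler."""

    def __init__(self):
        self.signals = ToySignalManager()


class ItemCounter:
    """Extension-style class that counts items via a signal handler."""

    def __init__(self, crawler):
        self.count = 0
        crawler.signals.connect(self.on_item, item_scraped)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def on_item(self, item):
        self.count += 1
        return self.count


crawler = ToyCrawler()
ext = ItemCounter.from_crawler(crawler)
crawler.signals.send(item_scraped, item={'title': 'a'})
crawler.signals.send(item_scraped, item={'title': 'b'})
crawler.signals.send(engine_started)  # no receivers connected, nothing runs
print(ext.count)  # prints 2
```

Because the sentinels are compared by identity, a handler connected to item_scraped never fires for engine_started, which is why bare object() instances are enough to name the hooks.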