Running multiple Scrapy spiders in parallel
Sometimes you need to run more than one spider inside a single Scrapy project. I tried two methods found online.
Environment: Scrapy 2.3.0 + Python 3.8
Method 1:
# coding:utf-8
from scrapy import cmdline

cmdline.execute("scrapy crawl spider1".split())
# never reached: cmdline.execute() calls sys.exit() when the crawl finishes
cmdline.execute("scrapy crawl spider2".split())
This does not work: only the first spider runs, because cmdline.execute() calls sys.exit() once the command finishes, so the second call is never reached.
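If you want to stay close to the command-line approach, one workaround (not from the original post) is to start each crawl in its own OS process, so one process exiting cannot stop the other. A minimal sketch, assuming scrapy is on PATH, the script runs inside the project directory, and spider1/spider2 are the placeholder names from the snippet above:

# coding:utf-8
import subprocess

# Launch each crawl in a separate process; they run in parallel.
procs = [subprocess.Popen(["scrapy", "crawl", name])
         for name in ("spider1", "spider2")]

# Wait for both crawls to finish.
for p in procs:
    p.wait()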
Method 2: run multiple spiders in one process
# -*- coding: utf-8 -*-
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl("gov_mof_tuwenzhibo")      # the name attribute of a spider in this project
process.crawl("gov_mof_caizhengxinwen")  # the name attribute of a spider in this project
process.start()  # blocks until both spiders finish
Note:
from scrapy.utils.project import get_project_settings is part of Scrapy itself, not something you write yourself; just import it as shown.
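The same pattern also works outside a Scrapy project if you pass spider classes to process.crawl() instead of name strings. A self-contained sketch for reference (the DemoSpider1/DemoSpider2 classes and their URLs are made up for illustration):

# -*- coding: utf-8 -*-
import scrapy
from scrapy.crawler import CrawlerProcess

class DemoSpider1(scrapy.Spider):
    name = "demo1"
    start_urls = ["https://example.com"]

    def parse(self, response):
        yield {"spider": self.name, "title": response.css("title::text").get()}

class DemoSpider2(scrapy.Spider):
    name = "demo2"
    start_urls = ["https://example.org"]

    def parse(self, response):
        yield {"spider": self.name, "title": response.css("title::text").get()}

process = CrawlerProcess()    # no project settings needed here
process.crawl(DemoSpider1)    # pass the class, not an instance
process.crawl(DemoSpider2)
process.start()               # both spiders run in parallel in one reactor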
References:
[python - Running Multiple spiders in scrapy for 1 website in parallel? - Stack Overflow](https://stackoverflow.com/questions/39365131/running-multiple-spiders-in-scrapy-for-1-website-in-parallel)
[backtest/spider_runner.py at e5e7af64fac54fdd57033deaae8650461442e4b7 · futurecoming/backtest](https://github.com/futurecoming/backtest/blob/e5e7af64fac54fdd57033deaae8650461442e4b7/utils/spider_runner.py)