
Running multiple Scrapy spiders in parallel

Sometimes you need to run several spiders inside one Scrapy project. I tried two methods found online.

Environment: Scrapy 2.3.0 + Python 3.8

Method 1:

# coding:utf-8

from scrapy import cmdline
cmdline.execute("scrapy crawl spider1".split())
cmdline.execute("scrapy crawl spider2".split())

This doesn't work: only the first spider runs, because cmdline.execute() calls sys.exit() once the crawl command finishes, so the second call is never reached.
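
If you only need the spiders to run one after another, a minimal sketch using the standard subprocess module works, since each crawl gets its own child process (spider1/spider2 are the placeholder names from above):

```python
# coding:utf-8
import subprocess

# Each `scrapy crawl` runs in its own child process, so this script
# survives the first crawl and can go on to launch the next one.
for name in ["spider1", "spider2"]:
    subprocess.run(["scrapy", "crawl", name], check=True)
```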

 

Method 2: run multiple spiders in one process

# -*- coding: utf-8 -*-
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl("gov_mof_tuwenzhibo")      # the `name` of a spider in this project
process.crawl("gov_mof_caizhengxinwen")  # the `name` of a spider in this project

process.start()  # starts both spiders in the same reactor and blocks until they finish
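
For reference, process.crawl() also accepts a spider class instead of a name string, and you can tweak the loaded project settings before handing them to CrawlerProcess. A minimal sketch, assuming hypothetical import paths and class names:

```python
# -*- coding: utf-8 -*-
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Hypothetical spider classes; adjust to your project layout.
from myproject.spiders.tuwenzhibo import TuwenzhiboSpider
from myproject.spiders.caizhengxinwen import CaizhengxinwenSpider

settings = get_project_settings()
settings.set("LOG_LEVEL", "INFO")  # per-run override on top of settings.py

process = CrawlerProcess(settings)
process.crawl(TuwenzhiboSpider)    # spider classes work as well as name strings
process.crawl(CaizhengxinwenSpider)
process.start()
```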

 

Note:

from scrapy.utils.project import get_project_settings ships with Scrapy itself (it is not something you write yourself); import it directly and it loads the settings from your project's settings.py.
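
Relatedly, if you want the spiders to run one after another in a single process instead of in parallel, the Scrapy docs describe a CrawlerRunner pattern based on Twisted's inlineCallbacks; a sketch reusing the spider names above:

```python
# -*- coding: utf-8 -*-
from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

configure_logging()
runner = CrawlerRunner(get_project_settings())

@defer.inlineCallbacks
def crawl():
    # Each yield waits for the previous crawl to finish before starting the next.
    yield runner.crawl("gov_mof_tuwenzhibo")
    yield runner.crawl("gov_mof_caizhengxinwen")
    reactor.stop()

crawl()
reactor.run()  # the script blocks here until reactor.stop() is called
```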


References:

[python - Running Multiple spiders in scrapy for 1 website in parallel? - Stack Overflow](https://stackoverflow.com/questions/39365131/running-multiple-spiders-in-scrapy-for-1-website-in-parallel)

[backtest/spider_runner.py at e5e7af64fac54fdd57033deaae8650461442e4b7 · futurecoming/backtest](https://github.com/futurecoming/backtest/blob/e5e7af64fac54fdd57033deaae8650461442e4b7/utils/spider_runner.py)
