Scrapy crawler workflow
Create the project
scrapy startproject weather
cd weather
scrapy genspider SZtianqi suzhou.tianqi.com
Directory structure
├── scrapy.cfg
└── weather
    ├── __init__.py
    ├── __pycache__
    │   ├── __init__.cpython-36.pyc
    │   ├── items.cpython-36.pyc
    │   ├── pipelines.cpython-36.pyc
    │   └── settings.cpython-36.pyc
    ├── data    # a folder I added myself to hold the saved data; you won't have it yet, so ignore it
    │   └── weather.json
    ├── items.py
    ├── middlewares.py
    ├── pipelines.py
    ├── settings.py
    └── spiders
        ├── SZtianqi.py
        ├── __init__.py
        └── __pycache__
            ├── SZtianqi.cpython-36.pyc
            └── __init__.cpython-36.pyc
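Item: defines the fields to be scraped and stored
Spider: the part that sends requests and processes the responses
Pipeline: the part that saves the data; here you choose whether to write to a database or to a text file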
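To make the Pipeline role concrete, here is a minimal sketch of a pipeline that writes each item to a text file as one JSON object per line. The class name JsonLinesPipeline, the default path and the item fields used in the usage note are assumptions for illustration, not the code from this project. The three method names (open_spider, process_item, close_spider) are the hooks Scrapy calls on a pipeline, and a pipeline is just a plain Python class.

```python
import json


class JsonLinesPipeline:
    """Hypothetical pipeline: append every item to a file, one JSON object per line."""

    def __init__(self, path="weather.json"):
        # Assumed output path; adjust it to wherever you keep your data.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Scrapy calls this once when the spider starts.
        self.file = open(self.path, "a", encoding="utf-8")

    def process_item(self, item, spider):
        # Scrapy calls this for every item the spider yields.
        # ensure_ascii=False keeps Chinese text readable in the file.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        # Return the item so that any later pipeline also receives it.
        return item

    def close_spider(self, spider):
        # Scrapy calls this once when the spider finishes.
        self.file.close()
```

Outside of Scrapy you can try it by hand: create JsonLinesPipeline("test.json"), call open_spider(None), pass a dict such as {"date": "2018-01-01", "temperature": "5"} to process_item, then call close_spider(None) and look at the file.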
Register the pipelines you wrote in settings.py
BOT_NAME = 'weather'
SPIDER_MODULES = ['weather.spiders']
NEWSPIDER_MODULE = 'weather.spiders'
ITEM_PIPELINES = {'weather.pipelines.W2mysql': 300,
                  'weather.pipelines.WeatherPipeline': 500,
                  'weather.pipelines.W2json': 400}
ROBOTSTXT_OBEY = True
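The numbers in ITEM_PIPELINES decide the order in which the pipelines run: Scrapy passes each item through them from the lowest number to the highest, regardless of the order they are written in. The small sketch below only sorts the dictionary from the settings above to show that order; run_order is a name made up for this example.

```python
# Copied from the settings above.
ITEM_PIPELINES = {
    'weather.pipelines.W2mysql': 300,
    'weather.pipelines.WeatherPipeline': 500,
    'weather.pipelines.W2json': 400,
}

# Sort the pipeline paths by their number, lowest first,
# which is the order Scrapy uses when processing an item.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)

for pipeline_path in run_order:
    print(ITEM_PIPELINES[pipeline_path], pipeline_path)
```

So with these settings an item goes to W2mysql first, then W2json, and WeatherPipeline last.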
Run the project