Scrapy crawler: the order of steps

Create the project

scrapy startproject weather
cd weather
scrapy genspider SZtianqi suzhou.tianqi.com

Directory structure

├── scrapy.cfg
└── weather
    ├── __init__.py
    ├── __pycache__
    │   ├── __init__.cpython-36.pyc
    │   ├── items.cpython-36.pyc
    │   ├── pipelines.cpython-36.pyc
    │   └── settings.cpython-36.pyc
    ├── data    # a folder I added myself to store the scraped data; you won't have it yet, so ignore it
    │   └── weather.json
    ├── items.py
    ├── middlewares.py
    ├── pipelines.py
    ├── settings.py
    └── spiders
        ├── SZtianqi.py
        ├── __init__.py
        └── __pycache__
            ├── SZtianqi.cpython-36.pyc
            └── __init__.cpython-36.pyc

Item: defines the fields to be scraped and stored (items.py)
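When `startproject weather` generates items.py, it contains a `WeatherItem(scrapy.Item)` class whose fields are declared as `some_field = scrapy.Field()`. Scrapy 2.2 and later also accept a plain dataclass as an item, and the minimal sketch below uses one so it needs only the standard library. The four field names are hypothetical, since the post does not list which fields it scrapes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeatherItem:
    """Hypothetical fields for one day of Suzhou weather (assumed, not from the post)."""
    date: Optional[str] = None         # e.g. "2017-11-08"
    weather: Optional[str] = None      # weather description text
    temperature: Optional[str] = None  # temperature range as shown on the page
    wind: Optional[str] = None         # wind direction / strength
```

Defaulting every field to None lets the spider create an item first and fill in the fields as it extracts them.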

Spider: sends the requests and processes the responses (spiders/SZtianqi.py)

PIPELINE: the data-saving part, where you choose whether to save to a database or to a text file (pipelines.py)
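A pipeline is a plain class: Scrapy calls `open_spider()` once at the start, `process_item()` for every item, and `close_spider()` at the end. The author's W2json and W2mysql classes are not shown, so the sketch below is a hypothetical `JsonWriterPipeline` that writes one JSON object per line (JSON Lines); the default output path is an assumption.

```python
import json
from dataclasses import asdict, is_dataclass


class JsonWriterPipeline:
    """Hypothetical pipeline: writes each item as one JSON object per line."""

    def __init__(self, path="weather.jl"):
        # Scrapy instantiates pipelines without arguments, so the path needs a default
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes
        self.file.close()

    def process_item(self, item, spider):
        # Dataclass items need asdict(); dicts and scrapy.Item are dict-like
        data = asdict(item) if is_dataclass(item) else dict(item)
        # ensure_ascii=False keeps Chinese text readable in the output file
        self.file.write(json.dumps(data, ensure_ascii=False) + "\n")
        # Return the item so the next pipeline in ITEM_PIPELINES receives it
        return item
```

For plain file output, Scrapy's feed exports (`scrapy crawl SZtianqi -o weather.jl`) do the same job without a custom pipeline; a pipeline pays off when saving needs custom logic, such as writing to MySQL.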

Register the pipelines you wrote in settings.py (the ITEM_PIPELINES setting):

BOT_NAME = 'weather'

SPIDER_MODULES = ['weather.spiders']
NEWSPIDER_MODULE = 'weather.spiders'
ITEM_PIPELINES = {'weather.pipelines.W2mysql': 300,
                  'weather.pipelines.WeatherPipeline': 500,
                  'weather.pipelines.W2json': 400}

ROBOTSTXT_OBEY = True
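The values in ITEM_PIPELINES set the order, not the dict's key order: Scrapy passes every item through the enabled pipelines from the lowest value to the highest, and the values are customarily kept between 0 and 1000. Sorting the dict by value reproduces the run order for the settings above:

```python
# Same mapping as in settings.py above
ITEM_PIPELINES = {
    "weather.pipelines.W2mysql": 300,
    "weather.pipelines.WeatherPipeline": 500,
    "weather.pipelines.W2json": 400,
}

# Lower values run first
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
for path in run_order:
    print(path)
```

So each item goes to W2mysql first, then W2json, and WeatherPipeline sees it last. Each `process_item()` must return the item (or raise `scrapy.exceptions.DropItem` to discard it) for the next pipeline to receive it.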

Run the project

scrapy crawl SZtianqi

posted @ 2017-11-08 11:41  点||点