Scrapy framework: the command line tool

I. Global commands

  1 startproject

  https://docs.scrapy.org/en/latest/topics/commands.html#startproject

  2 genspider, e.g. scrapy genspider -t basic baidu www.baidu.com

  https://docs.scrapy.org/en/latest/topics/commands.html#genspider

  3 settings

  https://docs.scrapy.org/en/latest/topics/commands.html#settings

  4 runspider

  https://docs.scrapy.org/en/latest/topics/commands.html#runspider

  5 shell, e.g. scrapy shell https://www.baidu.com

  https://docs.scrapy.org/en/latest/topics/commands.html#shell

In [1]: response
Out[1]: <200 https://www.baidu.com>

In [2]: request
Out[2]: <GET https://www.baidu.com>

In [3]: view(response)
Out[3]: True

 

  6 fetch

  https://docs.scrapy.org/en/latest/topics/commands.html#fetch

  7 view

  https://docs.scrapy.org/en/latest/topics/commands.html#view

  8 version

  https://docs.scrapy.org/en/latest/topics/commands.html#version

II. Project-only commands

  1 crawl

  https://docs.scrapy.org/en/latest/topics/commands.html#crawl

  2 check

  https://docs.scrapy.org/en/latest/topics/commands.html#check

  3 list

  https://docs.scrapy.org/en/latest/topics/commands.html#list

  4 edit (rarely useful)

  https://docs.scrapy.org/en/latest/topics/commands.html#edit

  5 parse

  https://docs.scrapy.org/en/latest/topics/commands.html#parse

  6 bench

  https://docs.scrapy.org/en/latest/topics/commands.html#bench

III. Custom commands

  Official documentation:

  https://docs.scrapy.org/en/latest/topics/commands.html#custom-project-commands

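  Define a class that inherits from ScrapyCommand and implements the run method, then point the COMMANDS_MODULE setting at the module that contains it.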

 

IV. Adding command-line arguments

  When starting a spider with crawl, pass arguments with the -a option (one -a NAME=VALUE per argument), for example:

scrapy crawl Wangyi -a category=打车

  Then accept the argument in the spider's constructor:

import scrapy

class WangyiSpider(scrapy.Spider):
    name = "Wangyi"

    def __init__(self, category=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Keep the value passed with -a category=... for later use in callbacks
        self.category = category
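
  To make the flow concrete, here is a rough, Scrapy-free sketch of what happens to the -a options: each NAME=VALUE pair is split on the first "=", and the resulting dict is handed to the spider's constructor as keyword arguments. parse_spider_args and DemoSpider are hypothetical names used only for illustration. They are not part of Scrapy, and the real implementation may differ in detail.

```python
def parse_spider_args(pairs):
    """Turn ["category=打车", "page=2"] into {"category": "打车", "page": "2"}."""
    parsed = {}
    for pair in pairs:
        if "=" not in pair:
            raise ValueError(f"expected NAME=VALUE, got {pair!r}")
        # Split on the first '=' only, so a value may itself contain '='.
        name, value = pair.split("=", 1)
        parsed[name] = value
    return parsed


class DemoSpider:
    """Hypothetical stand-in for a spider; it only stores its arguments."""

    def __init__(self, category=None, **kwargs):
        self.category = category
        self.extra = kwargs


kwargs = parse_spider_args(["category=打车", "page=2"])
spider = DemoSpider(**kwargs)
print(spider.category, spider.extra)
```

  Note that every value arrives as a string ("2", not 2), so numeric or boolean arguments have to be converted inside the spider.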

 

posted @ 2018-04-17 20:19 骑者赶路