scrapy使用记录

　1 进入pip安装目录

python -m pip install --upgrade pip

pip install Scrapy

2. 创建一个项目

scrapy startproject test

3. 验证是否可用

scrapy shell https://blog.csdn.net/oscer2016/article/details/78007472

view(response)会用浏览器打开网页

4 开始爬虫

scrapy crawl quotes

5 谷歌xpath-helper 可用验证xpath获取的内容

http://quotes.toscrape.com/

/html/body/div/div[2]/div[1]/div[@class="quote"]/span[1] 获取所有的标题

response.xpath(‘/html/body/div/div[2]/div[1]/div[@class="quote"]/span[1]‘).extract()

import scrapy

class FirstSpider(scrapy.Spider):
    name = 'first'
    start_urls = ['http://quotes.toscrape.com/page/2/']

    def parse(self, response):
        print(response)
        content_list =  response.xpath('/html/body/div/div[2]/div[1]/div[@class="quote"]/span[1]/text()').extract()
        author_list =  response.xpath('/html/body/div/div[2]/div[1]/div[@class="quote"]/span[2]/small/text()').extract()
        for i,j in zip(content_list,author_list):
            print(i,":",j)

　scrapy crawl first　

USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
设置ua setting文件里面配置

posted @ 2018-04-11 12:26 brady-wang 阅读(306) 评论(0) 编辑收藏举报

刷新页面返回顶部

风行天下

天地不仁以万物为刍狗

scrapy使用记录

公告