Scrapy usage notes

1. Go to the pip installation directory and install Scrapy

python -m pip install --upgrade pip

pip install Scrapy
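The usual way to confirm the install is to run scrapy version on the command line. From Python, the minimal sketch below looks up the installed distribution's version with the standard library's importlib.metadata, without importing Scrapy. The helper name installed_version is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    installed_version is a hypothetical helper, not part of Scrapy.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("Scrapy")
    if version is None:
        print("Scrapy is not installed; run: pip install Scrapy")
    else:
        print(f"Scrapy {version} is installed")
```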

2. Create a project

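scrapy startproject test

Note: startproject rejects a name that clashes with an existing Python module, and "test" is normally Python's standard-library test package, so a different name (for example tutorial) may be required.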
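scrapy startproject refuses a project name that is not a valid identifier or that collides with a module Python can already import. The function below is an approximation of that rule using only the standard library, useful for testing a name before running the command. The name is_valid_project_name is hypothetical; it is not Scrapy's own code.

```python
import re
from importlib.util import find_spec


def is_valid_project_name(name: str) -> bool:
    """Approximate the name check done by scrapy startproject (hypothetical helper)."""
    # Must start with a letter or underscore and contain only letters, digits and underscores.
    if not re.fullmatch(r"[_a-zA-Z]\w*", name):
        return False
    # Must not shadow an importable module (e.g. "os", or "test" where the stdlib test package exists).
    try:
        spec = find_spec(name)
    except (ImportError, ValueError):
        spec = None
    return spec is None or spec.loader is None


if __name__ == "__main__":
    for candidate in ["os", "1project", "my-project", "quotes_demo"]:
        print(candidate, is_valid_project_name(candidate))
```

For example, is_valid_project_name("os") is False, so a more distinctive name is the safer choice.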

 

3. Verify that it works

scrapy shell https://blog.csdn.net/oscer2016/article/details/78007472

Inside the shell, view(response) opens the downloaded page in your browser.
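Roughly speaking, view(response) writes the downloaded body to a temporary .html file, inserts a <base> tag so relative links and stylesheets still resolve against the original URL, and hands the file to the default browser. The standard-library sketch below illustrates that idea; the function open_html_in_browser and its opener parameter are hypothetical, the parameter being there only so the behavior can be tried without launching a browser.

```python
import re
import tempfile
import webbrowser
from pathlib import Path
from typing import Callable


def open_html_in_browser(body: bytes, url: str,
                         opener: Callable[[str], object] = webbrowser.open) -> Path:
    """Save an HTML body to a temp file, open it, and return the file path (hypothetical helper)."""
    if b"<base" not in body:
        # Insert <base href="..."> right after the opening <head> tag (not <header>).
        base_tag = f'<base href="{url}">'.encode()
        body = re.sub(rb"<head(?:\s[^<>]*)?>", lambda m: m.group(0) + base_tag, body, count=1)
    with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as fh:
        fh.write(body)
    path = Path(fh.name).resolve()
    opener(path.as_uri())  # webbrowser.open by default
    return path
```

In normal use you would call open_html_in_browser(html_bytes, url) and let the default opener start the browser.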

 

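4. Start crawling (run this inside the project directory; "quotes" is the name attribute of a spider in the project)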

scrapy crawl quotes
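The argument to scrapy crawl is matched against the name attribute of the spider classes Scrapy finds in the project's spider modules, not against a file name or class name. The toy sketch below shows that lookup with plain classes; QuotesSpider, FirstSpider, build_registry and find_spider are hypothetical stand-ins, not Scrapy's actual SpiderLoader.

```python
from typing import Dict, Iterable


class QuotesSpider:   # stand-in for a scrapy.Spider subclass
    name = "quotes"


class FirstSpider:    # another stand-in spider class
    name = "first"


def build_registry(classes: Iterable[type]) -> Dict[str, type]:
    """Index spider classes by their name attribute."""
    return {cls.name: cls for cls in classes}


def find_spider(registry: Dict[str, type], name: str) -> type:
    """Look a spider up by name, failing with a 'Spider not found' style error."""
    try:
        return registry[name]
    except KeyError:
        raise KeyError(f"Spider not found: {name}") from None


if __name__ == "__main__":
    registry = build_registry([QuotesSpider, FirstSpider])
    print(find_spider(registry, "quotes").__name__)
```

Looking up "QuotesSpider" fails here, because only the name attribute is indexed.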

5. The XPath Helper extension for Google Chrome can be used to check what an XPath expression selects

http://quotes.toscrape.com/

/html/body/div/div[2]/div[1]/div[@class="quote"]/span[1] selects the first span of every quote, i.e. the quote text

response.xpath('/html/body/div/div[2]/div[1]/div[@class="quote"]/span[1]').extract()
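The same path can also be checked offline. Python's xml.etree.ElementTree supports a small XPath subset (child steps, [n] positions, [@attr='value']), which is enough to evaluate this path on a hand-written, well-formed mock of the page layout. The mock markup is an assumption modeled on quotes.toscrape.com, and ElementTree has no text() step, so .text is read in Python instead. This swaps in a different library purely for illustration; Scrapy itself evaluates XPath through parsel and lxml.

```python
import xml.etree.ElementTree as ET

# Hand-written, well-formed mock of the quotes.toscrape.com layout (an assumption, not the real page).
MOCK_PAGE = """
<html><body>
  <div class="container">
    <div class="row header-box"><h1>Quotes to Scrape</h1></div>
    <div class="row">
      <div class="col-md-8">
        <div class="quote"><span class="text">Quote one</span><span>by <small class="author">Author A</small></span></div>
        <div class="quote"><span class="text">Quote two</span><span>by <small class="author">Author B</small></span></div>
      </div>
      <div class="col-md-4 tags-box"><h2>Top Ten tags</h2></div>
    </div>
  </div>
</body></html>
"""

# ElementTree paths are evaluated relative to the root <html> element, so "/html/" is dropped.
TEXT_PATH = "body/div/div[2]/div[1]/div[@class='quote']/span[1]"
AUTHOR_PATH = "body/div/div[2]/div[1]/div[@class='quote']/span[2]/small"


def select_texts(html: str, path: str) -> list:
    """Return the .text of every element matched by an ElementTree path."""
    root = ET.fromstring(html.strip())
    return [el.text for el in root.findall(path)]


if __name__ == "__main__":
    print(select_texts(MOCK_PAGE, TEXT_PATH))    # the quote texts
    print(select_texts(MOCK_PAGE, AUTHOR_PATH))  # the authors
```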

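import scrapy


class FirstSpider(scrapy.Spider):
    # "scrapy crawl first" finds the spider by this name
    name = 'first'
    start_urls = ['http://quotes.toscrape.com/page/2/']

    def parse(self, response):
        print(response)  # e.g. <200 http://quotes.toscrape.com/page/2/>
        # text() returns the text node instead of the whole <span> element
        content_list = response.xpath('/html/body/div/div[2]/div[1]/div[@class="quote"]/span[1]/text()').extract()
        author_list = response.xpath('/html/body/div/div[2]/div[1]/div[@class="quote"]/span[2]/small/text()').extract()
        # Pair each quote with its author (zip stops at the shorter list)
        for content, author in zip(content_list, author_list):
            print(content, ":", author)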

Save the file in the project's spiders directory and run: scrapy crawl first
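Zipping two separately extracted lists works on this page, but zip stops silently at the shorter list, so a single quote without an author would mispair or drop entries with no error. A common alternative is to select each quote block once and extract the text and author relative to it (in Scrapy, roughly: for quote in response.xpath('//div[@class="quote"]'), then quote.xpath('./span[1]/text()').get()). The sketch below shows the difference with the standard library's ElementTree on a hand-written mock page (an assumption) in which the second quote has no author.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Mock markup (an assumption): the second quote deliberately lacks an author.
MOCK_PAGE = """
<div class="col-md-8">
  <div class="quote"><span class="text">Quote one</span><span>by <small class="author">Author A</small></span></div>
  <div class="quote"><span class="text">Quote two</span></div>
  <div class="quote"><span class="text">Quote three</span><span>by <small class="author">Author C</small></span></div>
</div>
"""


def pair_with_zip(root: ET.Element) -> List[Tuple[str, str]]:
    """Extract two flat lists first, then zip them."""
    texts = [el.text for el in root.findall("div[@class='quote']/span[1]")]
    authors = [el.text for el in root.findall("div[@class='quote']/span[2]/small")]
    return list(zip(texts, authors))


def pair_per_block(root: ET.Element) -> List[Tuple[str, Optional[str]]]:
    """Visit each quote block once and look up text and author relative to it."""
    pairs = []
    for quote in root.findall("div[@class='quote']"):
        text = quote.findtext("span[1]")
        author = quote.findtext("span[2]/small")  # None when the author is missing
        pairs.append((text, author))
    return pairs


if __name__ == "__main__":
    root = ET.fromstring(MOCK_PAGE.strip())
    print(pair_with_zip(root))   # "Quote two" is wrongly paired with Author C; "Quote three" is lost
    print(pair_per_block(root))  # every quote keeps its own author (or None)
```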

 

Set the user agent (UA) in the project's settings.py file:
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
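Scrapy's default USER_AGENT is "Scrapy/VERSION (+https://scrapy.org)", which some sites refuse, hence overriding it in settings.py. Whatever string is configured ends up as the User-Agent request header. For comparison only, this is how the same header would be attached to a single request with the standard library's urllib.request; nothing is sent, and build_request is a hypothetical helper.

```python
import urllib.request

USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36")


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Build (but do not send) a request carrying a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request("http://quotes.toscrape.com/")
    # urllib stores header names via str.capitalize(), hence the "User-agent" key.
    print(req.get_header("User-agent"))
```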
posted @ 2018-04-11 12:26  brady-wang