scrapy — terminal-based persistent storage

Notes

- Terminal-command based:
    - Requirement: only the return value of the parse method can be persisted to a local file, e.g. scrapy crawl dou -o ./douban.csv
    - Note: the output format is limited to 'json', 'jsonlines', 'jsonl', 'jl', 'csv', 'xml', 'marshal', 'pickle'
    - Pros: simple and efficient
    - Cons: fairly limited (data can only be written to files with the supported extensions)
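To illustrate the shape of the file that scrapy crawl dou -o ./douban.csv produces, the list of dicts returned by parse can be serialized the same way with the standard library alone. A minimal sketch; the sample titles and contents here are invented, not scraped:

```python
import csv
import io

# hypothetical records, shaped like the dicts the parse method returns
all_data = [
    {'title': 'Movie A', 'content': 'intro A'},
    {'title': 'Movie B', 'content': 'intro B'},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['title', 'content'])
writer.writeheader()        # first row holds the column names
writer.writerows(all_data)  # one row per returned dict
print(buf.getvalue())
```

Scrapy's feed export does the equivalent work for each supported suffix, which is why only those file types are accepted.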

Code

import scrapy


class DouSpider(scrapy.Spider):
    name = "dou"
    # allowed_domains = ["www.douban.com"]
    start_urls = ["https://www.douban.com/doulist/113652271/"]

    def parse(self, response):
        all_data = []  # collects every parsed record
        div_ = response.xpath('/html/body/div[3]/div[1]/div/div[1]')
        div_list = div_.xpath('./div[contains(@class, "doulist-item")]')
        for div in div_list:
            title = div.xpath('./div/div[2]/div[2]/a/text()').extract_first()
            content = div.xpath('./div/div[2]/div[4]/text()').extract_first()
            dic = {
                'title': title,
                'content': content
            }
            all_data.append(dic)
        # returning the list is what lets `scrapy crawl dou -o ./douban.csv` persist it
        return all_data
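Among the allowed suffixes, 'json' writes the whole result as one JSON array, while 'jsonlines' / 'jsonl' / 'jl' write one JSON object per line; the Scrapy docs recommend a line-oriented format when appending. A stdlib sketch of the difference, with invented sample data:

```python
import json

# hypothetical records, shaped like the dicts the parse method returns
all_data = [{'title': 'Movie A'}, {'title': 'Movie B'}]

# -o out.json style: a single JSON array
as_json = json.dumps(all_data, ensure_ascii=False)

# -o out.jl style: one JSON object per line
as_jsonlines = '\n'.join(json.dumps(d, ensure_ascii=False) for d in all_data)

print(as_json)
print(as_jsonlines)
```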

posted @ 2024-03-02 12:04  会秃头的小白