Web Scraping using Python Scrapy_BS4 - using Scrapy and Python(2)

Scrapy Architecture

Creating a Spider.

Spiders are classes you define that Scrapy uses to scrape (extract) information from one or more websites.

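import scrapy

class QuoteSpider(scrapy.Spider):
    # Unique name that identifies this spider
    name = "quote"
    # Pages Scrapy requests first; each response is passed to parse()
    start_urls = [
        'https://bluelimelearning.github.io/my-fav-quotes/'
    ]

    def parse(self, response):
        # Each div.quotes block holds one quote and its author
        for quote in response.css('div.quotes'):
            yield {
                # extract() returns a list of every matching text node
                'quote': quote.css('p.aquote::text').extract(),
                # extract_first() returns the first match, or None
                'author': quote.css('p.author::text').extract_first(),
            }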

Running your spider and saving scraped data.

scrapy runspider quotes_spiders.py -o quotes.xml
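Once the spider has written quotes.xml, the exported items can be read back with Python's standard library. The sketch below is illustrative only: the sample XML is made up, and its layout (an `items` root, one `item` per scraped record, list fields wrapped in `value` elements) is an assumption about how Scrapy's XML exporter serializes the dictionaries yielded by `parse`. The helper name `parse_quotes` is hypothetical. Check your own quotes.xml before relying on it.

```python
import xml.etree.ElementTree as ET

# Made-up sample standing in for quotes.xml; the element layout is an
# assumption about the exporter's output, not copied from a real run.
SAMPLE_XML = """<items>
<item><quote><value>First quote text.</value></quote><author>First Author</author></item>
<item><quote><value>Second quote text.</value></quote><author>Second Author</author></item>
</items>"""


def parse_quotes(xml_text):
    """Turn exported XML into a list of {'quote': ..., 'author': ...} dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall('item'):
        # 'quote' was yielded as a list, so it may contain several <value> parts
        parts = [value.text or '' for value in item.findall('quote/value')]
        rows.append({
            'quote': ' '.join(parts),
            'author': item.findtext('author'),
        })
    return rows


rows = parse_quotes(SAMPLE_XML)
for row in rows:
    print(row['author'], '-', row['quote'])
```

To read the real file instead of the sample string, `ET.parse('quotes.xml').getroot()` can replace the `ET.fromstring` call.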

https://www.cleancss.com/strip-xml/

Scraping data with Scrapy Shell

scrapy shell "https://bluelimelearning.github.io/my-fav-quotes/"

response.css('title')

response.css('title::text').extract()

response.css('h1::text').extract()

quote = response.css("div.quotes")[0]
aquote = quote.css("p.aquote::text").extract()
aquote

Posted 2019-11-04 21:34 by 晨风_Eric
