Web Scraping using Python Scrapy_BS4 - using Scrapy and Python (2)

Scrapy Architecture

Creating a Spider

Spiders are classes that you define and that Scrapy uses to scrape (extract) information from a website or a group of websites.

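# Save this as quotes_spiders.py to match the runspider command below
import scrapy


class QuoteSpider(scrapy.Spider):
    # Unique name that identifies this spider within Scrapy
    name = "quote"
    # URLs requested first; each response is passed to parse()
    start_urls = [
        'https://bluelimelearning.github.io/my-fav-quotes/',
    ]

    def parse(self, response):
        # One <div class="quotes"> block per quote on the page
        for quote in response.css('div.quotes'):
            yield {
                # extract() returns a list of all matching text nodes
                'quote': quote.css('p.aquote::text').extract(),
                # extract_first() returns the first match only (None if absent)
                'author': quote.css('p.author::text').extract_first(),
            }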

Running your spider and saving the scraped data

scrapy runspider quotes_spiders.py -o quotes.xml
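Per the documentation of Scrapy's XmlItemExporter, the XML feed wraps everything in an <items> root with one <item> per scraped item, and a multi-valued field (here quote, which is a list because the spider calls extract()) is written as one <value> child per entry. A minimal sketch for reading such a file back with only the standard library; load_quotes and SAMPLE_FEED are hypothetical names, and the sample values only illustrate that layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed in the layout described above; a real quotes.xml holds the crawled values
SAMPLE_FEED = b"""<?xml version="1.0" encoding="utf-8"?>
<items>
<item><quote><value>Example quote, line one.</value><value>Line two.</value></quote><author>Jane Doe</author></item>
<item><quote><value>Another quote.</value></quote><author>John Roe</author></item>
</items>"""


def load_quotes(xml_bytes):
    """Convert an <items><item>...</item></items> feed into a list of dicts (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    records = []
    for item in root.iter("item"):
        record = {}
        for field in item:
            values = field.findall("value")
            if values:
                # Multi-valued field (a Python list in the spider): one string per <value>
                record[field.tag] = [v.text or "" for v in values]
            else:
                # Single-valued field: the element's own text
                record[field.tag] = field.text or ""
        records.append(record)
    return records


quotes = load_quotes(SAMPLE_FEED)
```

For the real output, read the file as bytes and pass it in, e.g. with open("quotes.xml", "rb") as f: records = load_quotes(f.read()). If the data is only needed in Python, exporting with -o quotes.json and loading it with the json module is simpler.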

https://www.cleancss.com/strip-xml/

Scraping data with Scrapy Shell

scrapy shell "https://bluelimelearning.github.io/my-fav-quotes/"

response.css('title')

response.css('title::text').extract()

response.css('h1::text').extract()

quote = response.css("div.quotes")[0]
aquote = quote.css("p.aquote::text").extract()
aquote

Posted by 晨风_Eric