A small Scrapy spider example

Target site: https://m.qingting.fm/rank/

Tools: Scrapy and XPath

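Create the spider file with Scrapy's genspider command. (If you have not created a Scrapy project yet, see the previous post.) The command below uses placeholder arguments; for this example the spider name is qingting and the domain is m.qingting.fm.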

scrapy genspider example example.com

On success, Scrapy generates the spider file (the original post shows a screenshot here).

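Scrapy requests each URL in start_urls and passes the response to the spider's parse method, which is where the data is extracted: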

    def parse(self, response: HtmlResponse, **kwargs):
        # Each <a> under div.rank-list is one entry in the ranking
        entries = response.xpath('//div[@class="rank-list"]/a')
        for entry in entries:
            # extract_first() returns None if the XPath matches nothing
            rank_number = entry.xpath('./div[@class="badge"]/text()').extract_first()
            img = entry.xpath('./img/@src').extract_first()
            title = entry.xpath('./div[@class="content"]/div[@class="title"]/text()').extract_first()
            desc = entry.xpath('.//div[@class="desc"]/text()').extract_first()
            print('Rank:', rank_number, 'Title:', title, 'Cover:', img, 'Popularity:', desc)

These are very simple XPath expressions.
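The XPath expressions used in parse can be tried without running a crawl. The sketch below is not Scrapy: it swaps in the standard library's xml.etree.ElementTree, which supports only a subset of XPath, so text() and @src are replaced by findtext() and .get("src"). The SAMPLE markup and the extract_rows helper are invented for illustration; the real page structure may differ.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample markup that mimics the structure
# the spider's XPath expressions expect. The real page may differ.
SAMPLE = """
<html><body>
<div class="rank-list">
  <a href="/channels/1">
    <div class="badge">1</div>
    <img src="https://example.com/cover1.jpg" />
    <div class="content">
      <div class="title">First Show</div>
      <div class="desc">12.3k</div>
    </div>
  </a>
  <a href="/channels/2">
    <div class="badge">2</div>
    <img src="https://example.com/cover2.jpg" />
    <div class="content">
      <div class="title">Second Show</div>
      <div class="desc">9.8k</div>
    </div>
  </a>
</div>
</body></html>
"""


def extract_rows(markup):
    """Hypothetical helper: apply the same paths as parse() to a string."""
    root = ET.fromstring(markup.strip())
    rows = []
    # Same selection as response.xpath('//div[@class="rank-list"]/a')
    for a in root.findall('.//div[@class="rank-list"]/a'):
        rows.append({
            "rank": a.findtext('./div[@class="badge"]'),
            "img": a.find('./img').get("src"),
            "title": a.findtext('./div[@class="content"]/div[@class="title"]'),
            "desc": a.findtext('.//div[@class="desc"]'),
        })
    return rows


rows = extract_rows(SAMPLE)
for row in rows:
    print(row)
```

Note the difference between './div[...]', which matches only direct children of the current node, and './/div[...]', which matches descendants at any depth. That is why the desc lookup works without spelling out the intermediate content div.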

The complete code:

import scrapy
from scrapy import cmdline
from scrapy.http import HtmlResponse


class QingtingSpider(scrapy.Spider):
    name = "qingting"
    allowed_domains = ["m.qingting.fm"]
    start_urls = ["https://m.qingting.fm/rank/"]

    def parse(self, response: HtmlResponse, **kwargs):
        # Each <a> under div.rank-list is one entry in the ranking
        entries = response.xpath('//div[@class="rank-list"]/a')
        for entry in entries:
            # extract_first() returns None if the XPath matches nothing
            rank_number = entry.xpath('./div[@class="badge"]/text()').extract_first()
            img = entry.xpath('./img/@src').extract_first()
            title = entry.xpath('./div[@class="content"]/div[@class="title"]/text()').extract_first()
            desc = entry.xpath('.//div[@class="desc"]/text()').extract_first()
            print('Rank:', rank_number, 'Title:', title, 'Cover:', img, 'Popularity:', desc)


if __name__ == '__main__':
    # Equivalent to running "scrapy crawl qingting" from inside the project
    cmdline.execute('scrapy crawl qingting'.split())
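Because extract_first() returns None when nothing matches, and extracted text often carries stray whitespace, it can help to tidy the values before using them. The following is a minimal sketch, not part of the original spider: build_item and its clean helper are hypothetical names, and whether the site actually serves relative or protocol-relative cover URLs is an assumption.

```python
from urllib.parse import urljoin

PAGE_URL = "https://m.qingting.fm/rank/"  # start URL from the spider


def build_item(rank, title, img, desc, base_url=PAGE_URL):
    """Hypothetical helper: tidy the four extracted strings into a dict."""

    def clean(value):
        # Keep None as None; otherwise trim surrounding whitespace
        return value.strip() if value is not None else None

    img = clean(img)
    return {
        "rank": clean(rank),
        "title": clean(title),
        # Resolve relative or protocol-relative cover URLs against the page URL
        "img": urljoin(base_url, img) if img else None,
        "desc": clean(desc),
    }


# Made-up input values, for illustration only
item = build_item(" 1 ", "Sample Title\n", "//pic.example.com/cover.jpg", None)
print(item)
```

Inside parse, the print call could be replaced by yielding build_item(...) with the four extracted values. Scrapy collects dicts yielded from parse as scraped items, which lets them be exported with the -o option of scrapy crawl.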
Posted by 怪~咖