A Small Scrapy Spider Example

Target site: https://m.qingting.fm/rank/

Techniques: Scrapy and XPath

Create the spider file with the Scrapy command line (if you don't know how to create a Scrapy project yet, see the previous article):

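scrapy genspider example example.com

Here "example" is the spider name and "example.com" is the domain to crawl. For this site the command would be: scrapy genspider qingting m.qingting.fm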

On success, Scrapy creates the new spider file in the project's spiders directory.

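Scrapy downloads each URL in start_urls and passes the response to the spider's parse method, which is where the data is extracted: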

    def parse(self, response: HtmlResponse, **kwargs):
        # Each <a> under the rank-list div is one entry in the ranking
        entries = response.xpath('//div[@class="rank-list"]/a')
        for entry in entries:
            rank_number = entry.xpath('./div[@class="badge"]/text()').get()  # rank badge
            img = entry.xpath('./img/@src').get()  # cover image URL
            title = entry.xpath('./div[@class="content"]/div[@class="title"]/text()').get()
            desc = entry.xpath('.//div[@class="desc"]/text()').get()  # popularity text
            print('Rank:', rank_number, 'Title:', title, 'Cover:', img, 'Popularity:', desc)

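The XPath here is simple: //div[@class="rank-list"]/a selects every entry link in the ranking list, and each path starting with ./ is evaluated relative to that entry. The desc path uses .// so it matches a desc div at any depth inside the entry. .get() (the newer name for .extract_first()) returns the first match as a string, or None if nothing matches.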

The complete code:

import scrapy
from scrapy import cmdline
from scrapy.http import HtmlResponse


class QingtingSpider(scrapy.Spider):
    name = "qingting"
    allowed_domains = ["m.qingting.fm"]
    start_urls = ["https://m.qingting.fm/rank/"]

    def parse(self, response: HtmlResponse, **kwargs):
        # Each <a> under the rank-list div is one entry in the ranking
        entries = response.xpath('//div[@class="rank-list"]/a')
        for entry in entries:
            rank_number = entry.xpath('./div[@class="badge"]/text()').get()  # rank badge
            img = entry.xpath('./img/@src').get()  # cover image URL
            title = entry.xpath('./div[@class="content"]/div[@class="title"]/text()').get()
            desc = entry.xpath('.//div[@class="desc"]/text()').get()  # popularity text
            print('Rank:', rank_number, 'Title:', title, 'Cover:', img, 'Popularity:', desc)


if __name__ == '__main__':
    # Equivalent to typing "scrapy crawl qingting" in a terminal. The working
    # directory must be the Scrapy project (or a subdirectory) so scrapy.cfg is found.
    cmdline.execute('scrapy crawl qingting'.split())

Posted 2024-02-21 16:11 by 怪~咖
