
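A small Scrapy crawler example

Target site: https://m.qingting.fm/rank/

Techniques: Scrapy and XPath

Create the spider file with the Scrapy command below (if you do not know how to create a Scrapy project, see the previous article). The command takes a spider name and a domain. The line below uses placeholder values; the spider in this post is named qingting and targets m.qingting.fm.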

scrapy genspider example example.com

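A successful creation looks like the figure below.

The response data is handled in the parse function, which Scrapy calls with the downloaded response for each URL in start_urls: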

    def parse(self, response: HtmlResponse, **kwargs):
        # Each <a> directly under the rank-list container is one ranking entry.
        entries = response.xpath('//div[@class="rank-list"]/a')
        for entry in entries:
            # extract_first() returns the first match, or None if nothing matched.
            rank_number = entry.xpath('./div[@class="badge"]/text()').extract_first()
            img = entry.xpath('./img/@src').extract_first()
            title = entry.xpath('./div[@class="content"]/div[@class="title"]/text()').extract_first()
            desc = entry.xpath('.//div[@class="desc"]/text()').extract_first()
            print('Rank:', rank_number, 'Title:', title, 'Cover:', img, 'Popularity:', desc)

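These are very simple XPath expressions: a leading // searches the whole document, ./ selects direct children of the current element, .// searches all of its descendants, @src reads an attribute, and text() reads an element's text.

The complete code is as follows: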
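To try the relative paths without running a crawl, here is a minimal sketch on a made-up snippet. The markup in SAMPLE is an assumption inferred from the XPath expressions in parse, so the real page may differ. It also swaps in Python's standard-library xml.etree.ElementTree rather than Scrapy's selectors. ElementTree supports only a subset of XPath, so findtext() and .get("src") stand in for text() and @src, and the search from the root is written .// instead of //.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, inferred from the XPath expressions in parse();
# the real page may be structured differently.
SAMPLE = """
<html><body>
  <div class="rank-list">
    <a href="/channels/1">
      <div class="badge">1</div>
      <img src="https://example.com/cover1.jpg"/>
      <div class="content">
        <div class="title">First show</div>
        <div class="desc">12345</div>
      </div>
    </a>
    <a href="/channels/2">
      <div class="badge">2</div>
      <img src="https://example.com/cover2.jpg"/>
      <div class="content">
        <div class="title">Second show</div>
        <div class="desc">678</div>
      </div>
    </a>
  </div>
</body></html>
"""


def extract_rows(markup):
    """Return one dict per <a> entry found under the rank-list container."""
    root = ET.fromstring(markup)
    rows = []
    # './/' searches every descendant of the root for the container,
    # then '/a' keeps only its direct <a> children.
    for link in root.findall(".//div[@class='rank-list']/a"):
        rows.append({
            # './div' matches direct children of the current <a> only.
            "rank": link.findtext("./div[@class='badge']"),
            # ElementTree has no '@src' step, so read the attribute from the element.
            "img": link.find("./img").get("src"),
            "title": link.findtext("./div[@class='content']/div[@class='title']"),
            # './/div' matches at any depth below the current <a>.
            "desc": link.findtext(".//div[@class='desc']"),
        })
    return rows


rows = extract_rows(SAMPLE)
for row in rows:
    print(row)
```

The desc lookup works with .// even though the element sits two levels below the a element, while the badge lookup with ./ would miss it if it were nested any deeper.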

import scrapy
from scrapy import cmdline
from scrapy.http import HtmlResponse


class QingtingSpider(scrapy.Spider):
    name = "qingting"
    allowed_domains = ["m.qingting.fm"]
    start_urls = ["https://m.qingting.fm/rank/"]

    def parse(self, response: HtmlResponse, **kwargs):
        # Each <a> directly under the rank-list container is one ranking entry.
        entries = response.xpath('//div[@class="rank-list"]/a')
        for entry in entries:
            # extract_first() returns the first match, or None if nothing matched.
            rank_number = entry.xpath('./div[@class="badge"]/text()').extract_first()
            img = entry.xpath('./img/@src').extract_first()
            title = entry.xpath('./div[@class="content"]/div[@class="title"]/text()').extract_first()
            desc = entry.xpath('.//div[@class="desc"]/text()').extract_first()
            print('Rank:', rank_number, 'Title:', title, 'Cover:', img, 'Popularity:', desc)

if __name__ == '__main__':
    cmdline.execute('scrapy crawl qingting'.split())
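extract_first() returns None when an expression matches nothing, so the printed values can be None or carry stray whitespace. A minimal sketch of one way to tidy them follows. clean_row is a hypothetical helper for illustration. It is not part of Scrapy or of the original code.

```python
def clean_row(rank, title, img, desc):
    """Build a dict from raw extracted strings, tolerating missing values."""

    def tidy(value):
        # A missing match arrives as None; keep it as None instead of failing on .strip().
        return value.strip() if value is not None else None

    return {
        "rank": tidy(rank),
        "title": tidy(title),
        "img": tidy(img),
        "desc": tidy(desc),
    }


# Example with one missing field (img) and some surrounding whitespace.
print(clean_row(" 1 ", "First show\n", None, "12345"))
```

Inside parse, such a dict could be yielded instead of printed, so that Scrapy collects it as a scraped item.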

 

Posted 2024-02-21 16:11 by 怪~咖