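A Small Scrapy Spider Example
Target site: https://m.qingting.fm/rank/
Tools: Scrapy and XPath
Create the spider file with the Scrapy genspider command (if you do not know how to create a Scrapy project, see the previous article). The general form is shown below; for this example the spider name would be qingting and the domain m.qingting.fm.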
scrapy genspider example example.com
When the command succeeds, the new spider file is created, as shown in the figure below.
Scrapy hands each downloaded response to the spider's parse method, so the extraction logic goes there:
def parse(self, response: HtmlResponse, **kwargs):
    # Each <a> directly under the rank-list container is one ranking entry.
    urls = response.xpath('//div[@class="rank-list"]/a')
    for url in urls:
        # All paths below are relative to the current <a> element.
        rank_number = url.xpath('./div[@class="badge"]/text()').extract_first()
        img = url.xpath('./img/@src').extract_first()
        title = url.xpath('./div[@class="content"]/div[@class="title"]/text()').extract_first()
        desc = url.xpath('.//div[@class="desc"]/text()').extract_first()
        print('Rank:', rank_number, 'Title:', title, 'Cover:', img, 'Popularity:', desc)
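These are very simple XPath expressions: // searches the whole document, ./ selects children of the current node, .// selects its descendants, @src reads an attribute, and text() returns a node's text.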
The complete code is as follows:
import scrapy
from scrapy import cmdline
from scrapy.http import HtmlResponse


class QingtingSpider(scrapy.Spider):
    name = "qingting"
    allowed_domains = ["m.qingting.fm"]
    start_urls = ["https://m.qingting.fm/rank/"]

    def parse(self, response: HtmlResponse, **kwargs):
        # Each <a> directly under the rank-list container is one ranking entry.
        urls = response.xpath('//div[@class="rank-list"]/a')
        for url in urls:
            # All paths below are relative to the current <a> element.
            rank_number = url.xpath('./div[@class="badge"]/text()').extract_first()
            img = url.xpath('./img/@src').extract_first()
            title = url.xpath('./div[@class="content"]/div[@class="title"]/text()').extract_first()
            desc = url.xpath('.//div[@class="desc"]/text()').extract_first()
            print('Rank:', rank_number, 'Title:', title, 'Cover:', img, 'Popularity:', desc)


if __name__ == '__main__':
    # Equivalent to running "scrapy crawl qingting" from the project directory.
    cmdline.execute('scrapy crawl qingting'.split())