A Scrapy guide to house-hunting in Shanghai
Spider:
# -*- coding: utf-8 -*-
import scrapy
from scrapy_zhaopin.items import ScrapyHouseItem
from scrapy.http import Request


class MySpider(scrapy.Spider):
    name = "spiderhouse"
    allowed_domains = ["sh.lianjia.com"]
    start_urls = ["https://sh.lianjia.com/ershoufang/rs徐泾北城/"]

    def parse(self, response):
        # Each listing is an <li class="clear"> inside the list module
        for line in response.xpath('//*[contains(@log-mod,"list")]//li[contains(@class,"clear")]'):
            item = ScrapyHouseItem()
            # The page <title> carries the area name, e.g. "徐泾北城二手房房源_..."
            item['title'] = line.xpath('//title/text()').extract()[0].split("_")[0].replace("二手房房源", "")
            item['name'] = line.xpath('.//*[@class="title"]/a/text()').extract()
            item['address'] = line.xpath('.//*[@class="positionInfo"]/a/text()').extract()
            item['house_info'] = line.xpath('.//*[@class="houseInfo"]/text()').extract()
            item['price'] = line.xpath('.//*[@class="totalPrice"]//span/text()').extract()
            # Strip the "单价" label and "元/平米" unit, keeping only the number
            item['unit_price'] = line.xpath('.//*[@class="unitPrice"]//span/text()').extract()[0].replace("单价", "").replace("元/平米", "")
            yield item

        # Queue the same parse for every neighbourhood of interest;
        # Scrapy's built-in dupefilter drops URLs already visited
        address_list = ["徐盈路", "徐泾镇", "华新镇", "嘉定北", "中山公园", "汇金路",
                        "青浦新城", "爱博家园", "九亭", "佘山", "泗泾", "洞泾", "赵巷"]
        for i in address_list:
            address_url = f'https://sh.lianjia.com/ershoufang/rs{i}/'
            yield Request(address_url, callback=self.parse)

        # Pagination (disabled). Note: the XPath returns a SelectorList, so the
        # value must be extracted and cast to int before comparing with self.page
        # (the original also misspelled the class "num-item" as "num-iten"):
        # if self.page < int(response.xpath('(//*[@class="pager-num"]//*[@class="num-item"])[last()]/text()').extract_first()):
        #     self.page += 1
        #     page_url = self.page_url % self.page
        #     yield Request(page_url, callback=self.parse)
Category:
Scrapy
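The inline string cleaning in the spider (trimming the page title down to the area name, stripping the "单价"/"元/平米" wrapper from the unit price) can be factored into small, testable helpers; a sketch:

```python
def clean_title(raw_title: str) -> str:
    """Trim the page <title> down to the area name.

    e.g. "徐泾北城二手房房源_链家" -> "徐泾北城"
    """
    return raw_title.split("_")[0].replace("二手房房源", "")


def clean_unit_price(raw_price: str) -> str:
    """Strip the "单价" label and "元/平米" unit from a unit-price string.

    e.g. "单价65000元/平米" -> "65000"
    """
    return raw_price.replace("单价", "").replace("元/平米", "")
```

Keeping the cleaning logic out of the XPath loop makes the parse method easier to read and lets the string handling be unit-tested without a live page.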