Heibanke Crawler Challenge: Level 1
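Analysis: read the address of the next page from the start page and jump to it, then read the next address from that page, and repeat until the level is cleared.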
Challenge URL: http://www.heibanke.com/lesson/crawler_ex00/
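Note the difference between the start page and the pages you jump to: in the source of the jump pages, the text before the number contains an extra character '是' ('数字是' instead of '数字').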
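The difference between the two page types can be checked offline with the two regular expressions alone. The sketch below is only an illustration: the two sample strings are assumed stand-ins for the page text, not copies of the real page source, and `extract` is a hypothetical helper name.

```python
import re

# Pattern for the start page: '数字' directly followed by five digits.
START_PATTERN = re.compile(r'数字([0-9]{5})')
# Pattern for the jump pages: '数字是' followed by five digits.
NEXT_PATTERN = re.compile(r'数字是([0-9]{5})')

# Assumed sample fragments, standing in for the real page text.
start_text = '输入数字12345'
next_text = '数字是67890'


def extract(pattern, text):
    """Return the captured five-digit string, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(extract(START_PATTERN, start_text))  # start pattern on start page
print(extract(NEXT_PATTERN, next_text))    # jump pattern on jump page
print(extract(START_PATTERN, next_text))   # start pattern fails on jump page
```

Because the extra '是' sits between '数字' and the digits, each pattern matches only its own page type, which is why the script uses two separate functions.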
import re
import datetime
import requests

BASE_URL = 'http://www.heibanke.com/lesson/crawler_ex00/'
HEADERS = {'authorization': 'Client-ID c94869b36aa272dd62dfaeefed769d4115fb3189a9d1ec88ed457207747be626'}


def Go1(url, i):
    """Fetch the start page and build the URL of the first jump page."""
    html = requests.get(url=url, headers=HEADERS)
    text = html.text
    # Start page: '数字' is followed directly by the five-digit number.
    number = re.findall(r'数字([0-9]{5})', text)
    url = url + number[0]
    print(url + ' ' + str(i))
    return url


def Go2(url, i):
    """Fetch a jump page and build the URL of the next page."""
    html = requests.get(url=url, headers=HEADERS)
    text = html.text
    # Jump pages differ from the start page: the source has an extra '是'.
    number = re.findall(r'数字是([0-9]{5})', text)
    # number[0] raises IndexError on the last page, where no number is found.
    url = BASE_URL + number[0]
    print(url + ' ' + str(i))
    return url


def main():
    i = 1
    begin_time = datetime.datetime.now()
    url = Go1(BASE_URL, i)
    while True:
        i = i + 1
        try:
            url = Go2(url, i)
        except (IndexError, requests.RequestException):
            # No further number (or a failed request): stop and report.
            print('Final page URL: ' + url)
            print('Elapsed time: ' + str(datetime.datetime.now() - begin_time))
            break


if __name__ == '__main__':
    main()

"""
Output:
http://www.heibanke.com/lesson/crawler_ex00/65392 1
http://www.heibanke.com/lesson/crawler_ex00/36133 2
http://www.heibanke.com/lesson/crawler_ex00/72324 3
http://www.heibanke.com/lesson/crawler_ex00/57633 4
http://www.heibanke.com/lesson/crawler_ex00/91251 5
http://www.heibanke.com/lesson/crawler_ex00/87016 6
http://www.heibanke.com/lesson/crawler_ex00/77055 7
http://www.heibanke.com/lesson/crawler_ex00/30366 8
http://www.heibanke.com/lesson/crawler_ex00/83679 9
http://www.heibanke.com/lesson/crawler_ex00/31388 10
http://www.heibanke.com/lesson/crawler_ex00/99446 11
http://www.heibanke.com/lesson/crawler_ex00/69428 12
http://www.heibanke.com/lesson/crawler_ex00/34798 13
http://www.heibanke.com/lesson/crawler_ex00/16780 14
http://www.heibanke.com/lesson/crawler_ex00/36499 15
http://www.heibanke.com/lesson/crawler_ex00/21070 16
http://www.heibanke.com/lesson/crawler_ex00/96749 17
http://www.heibanke.com/lesson/crawler_ex00/71822 18
http://www.heibanke.com/lesson/crawler_ex00/48739 19
http://www.heibanke.com/lesson/crawler_ex00/62816 20
http://www.heibanke.com/lesson/crawler_ex00/80182 21
http://www.heibanke.com/lesson/crawler_ex00/68171 22
http://www.heibanke.com/lesson/crawler_ex00/45458 23
http://www.heibanke.com/lesson/crawler_ex00/56056 24
http://www.heibanke.com/lesson/crawler_ex00/87450 25
http://www.heibanke.com/lesson/crawler_ex00/52695 26
http://www.heibanke.com/lesson/crawler_ex00/36675 27
http://www.heibanke.com/lesson/crawler_ex00/25997 28
http://www.heibanke.com/lesson/crawler_ex00/73222 29
http://www.heibanke.com/lesson/crawler_ex00/93891 30
http://www.heibanke.com/lesson/crawler_ex00/29052 31
http://www.heibanke.com/lesson/crawler_ex00/72996 32
http://www.heibanke.com/lesson/crawler_ex00/73999 33
http://www.heibanke.com/lesson/crawler_ex00/23814 34
http://www.heibanke.com/lesson/crawler_ex00/98084 35
http://www.heibanke.com/lesson/crawler_ex00/51103 36
http://www.heibanke.com/lesson/crawler_ex00/39603 37
http://www.heibanke.com/lesson/crawler_ex00/34316 38
http://www.heibanke.com/lesson/crawler_ex00/55719 39
http://www.heibanke.com/lesson/crawler_ex00/53685 40
http://www.heibanke.com/lesson/crawler_ex00/77771 41
http://www.heibanke.com/lesson/crawler_ex00/69187 42
http://www.heibanke.com/lesson/crawler_ex00/89677 43
http://www.heibanke.com/lesson/crawler_ex00/71935 44
http://www.heibanke.com/lesson/crawler_ex00/98538 45
http://www.heibanke.com/lesson/crawler_ex00/79152 46
http://www.heibanke.com/lesson/crawler_ex00/70999 47
http://www.heibanke.com/lesson/crawler_ex00/35102 48
http://www.heibanke.com/lesson/crawler_ex00/75956 49
http://www.heibanke.com/lesson/crawler_ex00/19122 50
Final page URL: http://www.heibanke.com/lesson/crawler_ex00/19122
Elapsed time: 0:01:40.219459
"""
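As a possible variation on the script above (not the original approach), the two functions could be merged by making '是' optional in the pattern. The sketch below assumes that a single regex `数字是?([0-9]{5})` is enough for both page types. `follow_chain`, `fetch_with_requests`, and the `FAKE_PAGES` data are hypothetical names and stand-in values introduced here so the loop can be run without touching the network.

```python
import re

BASE_URL = 'http://www.heibanke.com/lesson/crawler_ex00/'
# '是?' makes the extra character optional, so one pattern covers both pages.
NUMBER_PATTERN = re.compile(r'数字是?([0-9]{5})')


def follow_chain(fetch, base_url=BASE_URL, max_steps=200):
    """Follow page numbers until a page has none; return the URLs visited.

    `fetch` is any callable that maps a URL to the page text.
    `max_steps` is a safety cap so a misbehaving site cannot loop forever.
    """
    url = base_url
    visited = []
    for _ in range(max_steps):
        match = NUMBER_PATTERN.search(fetch(url))
        if match is None:
            break  # no number on this page: the chain has ended
        url = base_url + match.group(1)
        visited.append(url)
    return visited


def fetch_with_requests(url):
    """Network fetcher; requests is imported lazily so the demo runs offline."""
    import requests
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


# Offline demo with made-up page text (not the real site content).
FAKE_PAGES = {
    BASE_URL: '数字11111',
    BASE_URL + '11111': '数字是22222',
    BASE_URL + '22222': 'done',
}
chain = follow_chain(FAKE_PAGES.__getitem__)
print(chain)
```

To run it against the live site, `follow_chain(fetch_with_requests)` would be the call. Passing the fetcher in as an argument keeps the loop logic testable without network access.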