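Automate the Boring Stuff with Python (Python编程快速上手): Scraping Information from the Web
Using the webbrowser module
The open() function of the webbrowser module opens the given URL in a new browser window or tab.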
    >>> import webbrowser
    >>> webbrowser.open('http://www.baidu.com/')
    True
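Because webbrowser.open() only needs a URL string, it combines naturally with URL building. The following is a minimal sketch, not part of the original notes: the helper names build_search_url and open_search are hypothetical, and the Baidu search URL pattern (the wd query parameter) is an assumption used for illustration.

```python
import webbrowser
from urllib.parse import urlencode


def build_search_url(query):
    """Build a search URL for query.

    The 'https://www.baidu.com/s?wd=...' pattern is an assumption
    made for this sketch; adjust it for the site you actually use.
    """
    return 'https://www.baidu.com/s?' + urlencode({'wd': query})


def open_search(query):
    """Open the search page in the default browser.

    Returns whatever webbrowser.open() returns (True when a browser
    could be launched).
    """
    return webbrowser.open(build_search_url(query))


# Example usage (opens a browser window, so it is left commented out):
# open_search('python requests')
```

urlencode() takes care of escaping spaces and non-ASCII characters, so the query can be passed in as ordinary text.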
Downloading files from the Web with the requests module
Downloading a web page with the requests.get() function
    >>> import requests
    >>> res = requests.get('http://www.gutenberg.org/cache/epub/1112/pg1112.txt')
    >>> type(res)
    <class 'requests.models.Response'>
    >>> len(res.text)
    179380
    >>> print(res.text[:250])
    The Project Gutenberg EBook of Romeo and Juliet, by William Shakespeare


    *******************************************************************
    THIS EBOOK WAS ONE OF PROJECT GUTENBERG'S EARLY FILES PRODUCED AT A
    TIME WHEN PROOFING METHODS AND TOO
Checking for errors
    >>> res = requests.get('http://inventwithpython.com/page_that_does_not_exist')
    >>> res.raise_for_status()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python37\lib\site-packages\requests\models.py", line 960, in raise_for_status
        raise HTTPError(http_error_msg, response=self)
    requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://inventwithpython.com/page_that_does_not_exist
You can wrap the raise_for_status() call in try and except statements to handle this error so that the program does not crash.
    import requests
    res = requests.get('http://inventwithpython.com/page_that_does_not_exist')
    try:
        res.raise_for_status()
    except Exception as exc:
        print('There was a problem: %s' % (exc))
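As a variation on the try/except pattern, the check can be moved into a small helper that catches requests.exceptions.HTTPError specifically rather than the broad Exception. This is only a sketch under stated assumptions: the name check_response is hypothetical and is not part of requests.

```python
import requests


def check_response(res):
    """Return True if the response succeeded; otherwise report the error.

    check_response is a hypothetical helper name used for illustration.
    """
    try:
        res.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        # raise_for_status() raises HTTPError for 4xx and 5xx status codes.
        print('There was a problem: %s' % exc)
        return False
    return True


# Example usage (needs network access, so it is left commented out):
# res = requests.get('http://inventwithpython.com/page_that_does_not_exist')
# if check_response(res):
#     print(len(res.text))
```

Catching the narrower exception keeps unrelated bugs from being silently swallowed. Connection failures are raised by requests.get() itself, so handling those would mean placing that call inside the try block and catching requests.exceptions.RequestException.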
Saving the downloaded file to disk
    >>> import requests
    >>> res = requests.get('http://www.gutenberg.org/cache/epub/1112/pg1112.txt')
    >>> res.raise_for_status()
    >>> playFile = open('RomeoAndJuliet.txt', 'wb')
    >>> for chunk in res.iter_content(1000000):
    ...     playFile.write(chunk)
    ...
    179382
    >>> playFile.close()
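The same steps can be collected into a reusable function. The sketch below is not from the original notes: save_response and its parameters are hypothetical names, and it uses a with statement in place of the explicit open()/close() pair so the file is closed even if a write fails.

```python
import requests


def save_response(res, path, chunk_size=100000):
    """Write the body of a requests Response to path in binary mode.

    Returns the total number of bytes written. save_response and its
    parameter names are hypothetical, chosen only for this sketch.
    """
    total = 0
    with open(path, 'wb') as out_file:
        # iter_content() yields the body in pieces of at most chunk_size bytes.
        for chunk in res.iter_content(chunk_size):
            total += out_file.write(chunk)
    return total


# Example usage (needs network access, so it is left commented out):
# res = requests.get('http://www.gutenberg.org/cache/epub/1112/pg1112.txt')
# res.raise_for_status()
# print(save_response(res, 'RomeoAndJuliet.txt'))
```

Opening the file in 'wb' mode writes the raw bytes as received, which preserves the original encoding of the text.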