urllib and urllib2

The urllib module provides the urlretrieve() function, which downloads remote data directly to a local file.

urllib.urlretrieve(img_url, file_path + str(save_count) + '.jpg')

urlretrieve(url, filename=None, reporthook=None, data=None)
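
As a rough sketch of how these parameters fit together, the snippet below uses the Python 3 equivalent, urllib.request.urlretrieve (in Python 2 the same function is urllib.urlretrieve). The helper name download_with_progress and the progress list are made up for illustration and are not part of urllib.

```python
import urllib.request


def download_with_progress(url, filename):
    """Download url to filename and record each reporthook call.

    Hypothetical helper for illustration; not part of urllib.
    """
    progress = []

    def reporthook(block_count, block_size, total_size):
        # Called once when the connection is opened and once after each
        # block is read. total_size is -1 if no length was reported.
        progress.append((block_count, block_size, total_size))

    # urlretrieve returns a (local_filename, headers) tuple.
    local_path, headers = urllib.request.urlretrieve(
        url, filename=filename, reporthook=reporthook
    )
    return local_path, headers, progress
```

If filename is omitted, urlretrieve saves to a temporary file and returns that path instead.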

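How can you tell whether urllib.urlretrieve fetched the page successfully?

Consider using urllib2 if your situation allows it. It is more advanced and easier to use than urllib.
You can easily detect any HTTP error: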

>>> import urllib2
>>> resp = urllib2.urlopen("http://google.com/abc.jpg")
Traceback (most recent call last):
  ...
urllib2.HTTPError: HTTP Error 404: Not Found

When the request succeeds, resp is a file-like response object that supports a number of useful operations:

>>> resp = urllib2.urlopen("http://google.com/")
>>> resp.code
200
>>> resp.headers["content-type"]
'text/html; charset=windows-1251'
>>> resp.read()
"..."

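In a script, the same check is usually written as a try/except. The following is a minimal sketch for Python 3, where urllib2 was split into urllib.request and urllib.error. The helper name fetch and its return shape are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return (status, content_type, body); body is None on an HTTP error.

    Hypothetical helper for illustration; not part of urllib.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type"), resp.read()
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx responses; err.code holds the status code.
        return err.code, None, None
```

Network-level failures such as an unreachable host raise urllib.error.URLError, which this sketch deliberately lets propagate.
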
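However, this check does not cover redirects: urllib2 follows them automatically, so a request that was redirected elsewhere can still report 200. Redirected requests have to be analyzed case by case.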
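
One possible starting point for that case-by-case analysis is to compare the requested URL with the URL of the final response. The sketch below assumes Python 3 (urllib.request), and the helper name fetch_and_check_redirect is hypothetical. Whether a redirect counts as a failure still depends on the site.

```python
import urllib.request


def fetch_and_check_redirect(url, timeout=10):
    """Return (status, final_url, was_redirected) for url.

    Hypothetical helper for illustration; not part of urllib.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # geturl() gives the URL of the response actually retrieved,
        # which differs from the requested URL after a redirect.
        final_url = resp.geturl()
        return resp.status, final_url, final_url != url
```

A caller could, for example, treat an image download as failed when was_redirected is true and the final URL points to a placeholder page.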

References:
[1] Notes on using the urlretrieve() function in Python crawlers https://my.oschina.net/liuyuantao/blog/748338
[2] python - How to know whether urllib.urlretrieve succeeded? https://codeday.me/bug/20170825/62906.html
[3] Using a crawler to fetch a dataset from ImageNet https://blog.csdn.net/feiye1023/article/details/74073636
[4] Crawling images with Python (using urllib2) https://www.jianshu.com/p/6094ff96536d
[5] Three ways to download a URL and save it to a file in Python http://outofmemory.cn/code-snippet/83/sanzhong-Python-xiazai-url-save-file-code
[6] Python crawler study notes 1: simple web page image scraping http://www.voidcn.com/article/p-ofegnqbt-vz.html
[7] Methods in Python's urllib module https://blog.csdn.net/chengxuyuanyonghu/article/details/68067131

posted @ 2022-06-22 09:45 xiaoxuxli
