Web Scraping in Practice: Downloading Images

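import os
import re

import requests

BASE_URL = 'http://www.netbian.com'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36 Core/1.94.178.400 QQBrowser/11.2.5170.400'
}

# Make sure the output directory exists before writing any files.
os.makedirs('img', exist_ok=True)

for page in range(1, 11):
    # The first listing page has no numeric suffix; later pages are index_2.htm, index_3.htm, ...
    if page == 1:
        url = f'{BASE_URL}/meinv/index.htm'
    else:
        url = f'{BASE_URL}/meinv/index_{page}.htm'

    response = requests.get(url, headers=HEADERS)
    response.encoding = response.apparent_encoding
    # (.*?) is a capture group for the text we want to keep (detail-page link and title);
    # a bare .*? matches text that must be skipped over but is not kept.
    img_info = re.findall('<a href="(.*?)" title=".*?" target="_blank"><img src=".*?" alt="(.*?)" />', response.text)
    for link, title in img_info:
        link_url = BASE_URL + link
        response_1 = requests.get(url=link_url, headers=HEADERS)
        response_1.encoding = response_1.apparent_encoding
        matches = re.findall('target="_blank"><img src="(.*?)" alt=".*?"', response_1.text)
        if not matches:
            continue  # Skip detail pages where no image URL was found.
        img_url = matches[0]
        print(img_url)
        img_content = requests.get(url=img_url, headers=HEADERS).content
        # Replace characters that are not allowed in Windows file names.
        safe_title = re.sub(r'[\\/:*?"<>|]', '_', title)
        with open(os.path.join('img', safe_title + '.jpg'), mode='wb') as f:
            f.write(img_content)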
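
To see what the listing-page pattern captures without touching the network, here is a minimal sketch that runs the same regular expression on a made-up HTML fragment. SAMPLE_HTML, extract_entries and safe_filename are hypothetical names introduced only for this illustration, and the real markup on the site may differ from the sample.

```python
import re

# Hypothetical listing-page fragment (an assumption; the real markup may differ).
SAMPLE_HTML = (
    '<li><a href="/desk/1001.htm" title="Sample A" target="_blank">'
    '<img src="http://example.com/a_small.jpg" alt="Sample A" /></a></li>'
    '<li><a href="/desk/1002.htm" title="Sample B" target="_blank">'
    '<img src="http://example.com/b_small.jpg" alt="Sample: B?" /></a></li>'
)

# Same pattern as in the script: two capture groups (href and alt),
# and two non-captured spans (title and src) that are only skipped over.
LISTING_PATTERN = re.compile(
    r'<a href="(.*?)" title=".*?" target="_blank"><img src=".*?" alt="(.*?)" />'
)


def extract_entries(html):
    """Return a list of (link, title) tuples found in a listing page."""
    return LISTING_PATTERN.findall(html)


def safe_filename(title):
    """Replace characters that Windows forbids in file names with underscores."""
    return re.sub(r'[\\/:*?"<>|]', '_', title).strip()


entries = extract_entries(SAMPLE_HTML)
for link, title in entries:
    print(link, '->', safe_filename(title) + '.jpg')
```

Because every quantifier is non-greedy, each match stops at the nearest closing quote, so two adjacent entries are not merged into one match. The safe_filename step matters because the alt text is used directly as the file name, and a title containing a character such as ? or : would make open() fail on Windows.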

posted @ 2023-03-02 18:22 jesion_wang
