Web Scraping: Grabbing Pictures from Mzitu (妹子图)
Lesson one of web scraping, and a little treat for the otaku crowd~
Without further ado, here is the code. The script walks each gallery in `url_list`, reads the total page count from the pagination bar, then downloads every image page by page.
```python
# -*- encoding: utf-8 -*-
# FUNCTION: Capture beauty pictures
import requests
from bs4 import BeautifulSoup
import os
import time

url_list = ['http://www.mzitu.com/201024', 'http://www.mzitu.com/169782']  # galleries of interest
headers = {
    'referer': 'https://www.mzitu.com/201024',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/77.0.3865.120 Safari/537.36'
}


def get_page_num(url):
    """Read the total page count and the gallery name from the first page."""
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')
    page_num = soup.find(class_='pagenavi').find_all('a')[-2].text
    name = soup.find(class_='currentpath').text.split()[-1]
    return page_num, name  # page_num is a string


def parse_page(url):
    """
    Get the image on a single page.
    :param url: page URL
    :return: image link, image name
    """
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')
    pic_url = soup.find(class_='main-image').find('img')['src']
    pic_name = soup.find(class_='main-title').text
    return pic_url, pic_name


def get_pic(pic_url, pic_name, name):
    """Download and save one image."""
    response = requests.get(pic_url, headers=headers, allow_redirects=False)
    filepath = '/home/f/crawler/Beauty/photo/' + name + '/' + pic_name + '.jpg'
    with open(filepath, 'wb') as f:
        f.write(response.content)


def main():
    for url in url_list:
        page_num, name = get_page_num(url)
        try:
            os.mkdir('/home/f/crawler/Beauty/photo/' + name)
        except FileExistsError:
            pass
        for page in range(1, int(page_num) + 1):  # iterate over every page of the gallery
            page_url = url + '/' + str(page)
            print(page_url)
            pic_url, pic_name = parse_page(page_url)
            get_pic(pic_url, pic_name, name)
            time.sleep(2)  # be polite: pause between requests


if __name__ == '__main__':
    main()
```
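Two details are worth calling out. First, the site rejects image requests that lack a plausible `referer` header (anti-hotlinking), which is why one is hard-coded above; with `allow_redirects=False`, a failed referer check typically shows up as a 3xx response whose body is not an image. Second, the `os.mkdir` plus `try/except` dance can be replaced by `os.makedirs(..., exist_ok=True)`, and a shared `requests.Session` reuses one connection across downloads. Here is a minimal sketch of a hardened `get_pic` under those assumptions; the `SAVE_DIR` constant and the status-code check are my additions, not part of the original script:

```python
import os
import requests

SAVE_DIR = '/home/f/crawler/Beauty/photo'  # assumed base directory, matching the script above

session = requests.Session()   # reuse one TCP connection for all downloads
session.headers.update(headers)  # the headers dict defined in the script above


def get_pic(pic_url, pic_name, name):
    """Download one image, skipping anti-hotlink redirects and bad responses."""
    response = session.get(pic_url, allow_redirects=False, timeout=10)
    if response.status_code != 200:  # a 3xx here usually means the referer check failed
        print('skipped', pic_url, response.status_code)
        return
    folder = os.path.join(SAVE_DIR, name)
    os.makedirs(folder, exist_ok=True)  # replaces mkdir + FileExistsError handling
    filepath = os.path.join(folder, pic_name + '.jpg')
    with open(filepath, 'wb') as f:
        f.write(response.content)
```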
Bookmark this and work through it at your own pace!


————————————————————————————————————————————
Follow on WeChat: **爬虫王者**
