Machine Learning: A Beginner Learns Linux (2): Crawling and Saving Images
Code reference: https://www.cnblogs.com/chenyuan404/p/10192758.html
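First activate your environment and cd into the target folder. Run the command "vi food_pic.py" to create the file food_pic.py, switch to insert mode, and type in the code. After saving, run the command "python food_pic.py" to execute it.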
Analyze the website
View the page source
Extract the image links with a regular expression (the re module)
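As a small illustration of the regular-expression step, the sketch below runs re.findall on a made-up string rather than a real page. The sample_html fragment, the example.com addresses, and the helper name extract_thumb_urls are assumptions for demonstration only. The pattern is the same one the script uses, with the leading brace escaped.

```python
import re

# Hypothetical fragment, shaped like the JSON objects embedded in the page source.
sample_html = (
    '{"thumbURL":"https://example.com/a.jpg","middleURL":"https://example.com/a_m.jpg"},'
    '{"thumbURL":"https://example.com/b.jpg","middleURL":"https://example.com/b_m.jpg"}'
)


def extract_thumb_urls(html):
    """Return the value of every "thumbURL" key, in the order it appears."""
    # (.*?) is non-greedy, so each match stops at the first '",' after the URL.
    return re.findall(r'\{"thumbURL":"(.*?)",', html)


print(extract_thumb_urls(sample_html))
# ['https://example.com/a.jpg', 'https://example.com/b.jpg']
```

Because the group is non-greedy, each match ends at the closing quote of its own URL instead of running on to the last quote in the page.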
import os
import re
from urllib import request

import requests


# Request the search pages and collect the image links
def Get_PIC_list(keyword, max_page):
    all_picture_list = []
    for page in range(max_page):
        offset = page * 30  # the pn parameter advances by 30 per page
        url = 'https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word={}&pn={}'.format(keyword, offset)
        html = requests.get(url).content.decode('utf-8')
        picture_list = re.findall(r'\{"thumbURL":"(.*?)",', html)
        all_picture_list.extend(picture_list)

    # Remove duplicate links before downloading
    all_picture_list = set(all_picture_list)
    download_picture(all_picture_list)


# Download the images
def download_picture(all_picture_list):
    # urlretrieve fails if the target folder does not exist, so create it first
    os.makedirs('picture', exist_ok=True)
    for i, pic_url in enumerate(all_picture_list):
        print(i)
        string = 'picture/{}.jpg'.format(str(i + 1))
        request.urlretrieve(pic_url, string)


# Entry function
def start():
    keyword = '美食照片'  # search keyword, "food photos"
    max_page = 2
    Get_PIC_list(keyword, max_page)


if __name__ == '__main__':
    start()
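The script above can be made a little more robust. The following is only a sketch under stated assumptions, not part of the referenced code: the helper names build_page_url, fetch_page, and dedupe_keep_order, the User-Agent value, and the 10-second timeout are all hypothetical choices. It uses the standard-library urllib instead of requests, URL-encodes the keyword, sends a browser-like header, and removes duplicates without losing the original order.

```python
from urllib import request
from urllib.parse import urlencode

PAGE_SIZE = 30  # the script above advances pn by 30 per page

# Assumed header value; any ordinary browser User-Agent string should work.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_page_url(keyword, page_index):
    """Build the search URL for one result page (page_index starts at 0)."""
    query = urlencode({
        "tn": "baiduimage",
        "ie": "utf-8",
        "word": keyword,  # urlencode percent-encodes non-ASCII keywords
        "pn": page_index * PAGE_SIZE,
    })
    return "https://image.baidu.com/search/flip?" + query


def fetch_page(keyword, page_index, timeout=10):
    """Download one result page; raises on network errors or timeouts."""
    req = request.Request(build_page_url(keyword, page_index), headers=HEADERS)
    with request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def dedupe_keep_order(urls):
    """Drop duplicate URLs while keeping first-seen order (a set does not)."""
    return list(dict.fromkeys(urls))
```

With a set, the numbering of the saved files can change from run to run. Keeping first-seen order makes picture/1.jpg correspond to the first link found on the first page.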