Scraping Web Images with Python

Contents

Preface
I. Getting the URL
II. Analyzing the HTML
- 1. Scraping the href
- 2. Pagination
III. Writing the code
Summary

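Preface

I recently finished learning web scraping, and as a normal guy (lsp), scraping pictures of pretty girls naturally felt like a must. This is the first scraper I have written on my own; it took fifteen hours, and it just about gets the job done.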

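I. Getting the URL

I don't recommend scraping Baidu directly: it has a huge number of images from mixed sources, so analyzing its HTML takes more skill. A small or mid-sized image site is an easier target. The site I scraped is 彼岸图网. url=http://pic.netbian.com/ My scraper fetches pages with requests, so I went one level deeper into the site, into the 4K美女 category: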

url=http://pic.netbian.com/4kmeinv/
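Fetching that category page with requests can be sketched roughly as follows. The helper name `fetch_html` and its `session` parameter are my own illustration, not part of the original script; the 30-second timeout and the `apparent_encoding` step mirror what the full code later in this post does.

```python
import requests

CATEGORY_URL = "http://pic.netbian.com/4kmeinv/"  # URL from the article


def fetch_html(url, session=None, timeout=30):
    """Return the decoded HTML of `url`, or None if the request fails.

    `session` is a hypothetical hook so a shared or fake session can be
    passed in; by default a fresh requests.Session is used.
    """
    session = session or requests.Session()
    try:
        resp = session.get(url, timeout=timeout)
        resp.raise_for_status()  # treat 4xx/5xx responses as failures
    except requests.RequestException:
        return None
    # Let requests guess the charset from the body, as the article's code does.
    resp.encoding = resp.apparent_encoding
    return resp.text
```

Calling `fetch_html(CATEGORY_URL)` should then return the page source as a string, or `None` when the site cannot be reached.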

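II. Analyzing the HTML

1. Scraping the href

I first tried scraping the jpg files straight from the list page, but their resolution was very low; set as a desktop wallpaper they were one big mosaic. So instead I scraped the hyperlink that each image points to, visited that page, and then scraped the image from there.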
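The two-step idea (list page, then detail page, then image) can be sketched without any network access. To keep the sketch dependency-free it uses the standard library's `html.parser` in place of BeautifulSoup, which is a deliberate swap. The sample HTML and its paths (`/tupian/1.html`, `/uploads/...`) are made up for illustration and are only assumed to resemble the real pages; the `ul` with class `clearfix` is taken from the full script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://pic.netbian.com/"


class PageParser(HTMLParser):
    """Collect list-page links and the first image of a page."""

    def __init__(self):
        super().__init__()
        self.ul_depth = 0      # > 0 while inside <ul class="clearfix">
        self.list_hrefs = []   # href of every <a> inside that <ul>
        self.first_img = None  # src of the first <img> on the page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            classes = (attrs.get("class") or "").split()
            if self.ul_depth or "clearfix" in classes:
                self.ul_depth += 1
        elif tag == "a" and self.ul_depth and attrs.get("href"):
            self.list_hrefs.append(attrs["href"])
        elif tag == "img" and self.first_img is None and attrs.get("src"):
            self.first_img = attrs["src"]

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_depth:
            self.ul_depth -= 1


def parse(html):
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser


def detail_page_urls(list_html):
    """Step 1: absolute URLs of the detail pages linked from a list page."""
    return [urljoin(BASE_URL, h) for h in parse(list_html).list_hrefs]


def full_image_url(detail_html):
    """Step 2: absolute URL of the first image on a detail page, or None."""
    src = parse(detail_html).first_img
    return urljoin(BASE_URL, src) if src else None


# Hypothetical markup, only for demonstration.
LIST_HTML = (
    '<ul class="clearfix">'
    '<li><a href="/tupian/1.html"><img src="/uploads/thumb1.jpg"></a></li>'
    '<li><a href="/tupian/2.html"><img src="/uploads/thumb2.jpg"></a></li>'
    '</ul><a href="/about.html">about</a>'
)
DETAIL_HTML = '<div><a href="#"><img src="/uploads/full1.jpg"></a></div>'

print(detail_page_urls(LIST_HTML))
print(full_image_url(DETAIL_HTML))
```

Using `urljoin` here means it does not matter whether the scraped path starts with a slash, so the base URL and the path never end up joined with a double slash.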

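2. Pagination

Looking at the page URLs reveals the pattern: the first page is index.html and the following pages are index_2.html, index_3.html, and so on, so a loop over the page number is all it takes.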
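As a small sketch of that loop, the page URLs can be generated like this. The function names `page_url` and `page_urls` are my own; the URL pattern is the one used in the full script.

```python
CATEGORY_URL = "http://pic.netbian.com/4kmeinv/"


def page_url(page):
    """Return the list-page URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 has no numeric suffix; later pages are index_<n>.html.
    name = "index.html" if page == 1 else f"index_{page}.html"
    return CATEGORY_URL + name


def page_urls(last_page):
    """Return the URLs of pages 1..last_page in order."""
    return [page_url(n) for n in range(1, last_page + 1)]


for url in page_urls(3):
    print(url)
```

How many pages exist depends on the site at the time of scraping, so `last_page` is left as a parameter rather than a fixed number.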

III. Writing the code

Without further ado, here is the code.

from urllib.parse import urljoin
import os

from bs4 import BeautifulSoup
import requests

BASE_URL = "http://pic.netbian.com/"


def GetPicture(src):
    """Download one image, given the src attribute taken from a detail page."""
    # urljoin copes with src values with or without a leading '/'
    url = urljoin(BASE_URL, src)
    root = "./"
    path = os.path.join(root, url.split('/')[-1])
    try:
        os.makedirs(root, exist_ok=True)
        if os.path.exists(path):
            print("File already exists")
            return
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        # the with block closes the file, so no explicit close() is needed
        with open(path, 'wb') as f:
            f.write(r.content)
        print("File saved")
    except (requests.RequestException, OSError):
        print("Download failed")


def ScrapeListPage(page_url):
    """Download the image behind every thumbnail link on one list page."""
    r = requests.get(page_url, timeout=30)
    r.encoding = r.apparent_encoding
    for href in Gethtml(r.text):
        r = requests.get(urljoin(BASE_URL, href), timeout=30)
        r.encoding = r.apparent_encoding
        soup = BeautifulSoup(r.text, "html.parser")
        # only the first <img> on the detail page is taken
        img = soup.find('img')
        if img is not None and img.get('src'):
            print(img.get('src'))
            GetPicture(img.get('src'))


def FirstPage():
    ScrapeListPage(BASE_URL + "4kmeinv/index.html")


def untilLast():
    # pages 2..173; the upper bound is hard-coded and needs adjusting
    # if the number of pages on the site changes
    for a in range(2, 174):
        ScrapeListPage(BASE_URL + "4kmeinv/index_" + str(a) + ".html")


def Gethtml(html):
    """Return the href of every link inside <ul class="clearfix">."""
    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find("ul", class_="clearfix")
    if ul is None:
        return []
    # calling find_all on the <ul> itself skips its NavigableString children,
    # so there is no AttributeError to catch
    return [a.get('href') for a in ul.find_all('a') if a.get('href')]


def main():
    FirstPage()
    untilLast()


if __name__ == "__main__":
    main()
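The "skip the file if it is already on disk" part of the script can be tried out on its own, without touching the network. This is only a sketch: `local_path` and `save_once` are hypothetical helper names, the example URL is made up, and unlike the script above it uses `urlsplit` so that a query string, if a URL ever had one, would not end up in the file name.

```python
import os
import tempfile
from urllib.parse import urlsplit


def local_path(image_url, root):
    """Map an image URL to a file path under `root`, ignoring any query string."""
    name = os.path.basename(urlsplit(image_url).path)
    return os.path.join(root, name)


def save_once(image_url, data, root):
    """Write `data` under `root` unless the file exists. Return True if written."""
    os.makedirs(root, exist_ok=True)
    path = local_path(image_url, root)
    if os.path.exists(path):
        return False
    with open(path, "wb") as f:
        f.write(data)
    return True


# Demonstration in a throwaway directory with fake image bytes.
with tempfile.TemporaryDirectory() as tmp:
    url = "http://example.invalid/uploads/pic.jpg"
    print(save_once(url, b"fake image bytes", tmp))  # first call writes the file
    print(save_once(url, b"other bytes", tmp))       # second call skips it
```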

As long as the site has not been updated, you can copy this code as is and use it to scrape the pictures.

Summary

Beginner-level scraping is fairly simple; it just takes plenty of practice.

posted @ 2021-02-18 14:18 lcc-666
