Python - Image and Video File Crawlers

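I've been learning Python lately. After going through a fair number of tutorials and videos, I got interested in web crawlers, and so far I have scraped web page text, images and videos. Text needs little comment: you just pull it out of the page by its HTML tags. Images and videos take one more step: once you have obtained their links, you still have to download the files. Below is a backup of my code for downloading images and videos: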

Images and videos:

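import os
from urllib.parse import urlsplit

import requests


# e.g. url:  http://dynamic-image.yesky.com/740x-/uploadImages/2016/338/21/7058TW4EAC62.JPG
#      path: D:\\pic\\
def pic_down(url, path):
    # Name the file after the last URL segment instead of always 'pic.jpg',
    # so consecutive downloads into the same folder do not overwrite each other
    fileName = os.path.join(path, os.path.basename(urlsplit(url).path) or 'pic.jpg')
    imgRes = requests.get(url, timeout=30)
    imgRes.raise_for_status()  # don't save an HTTP error page as an image
    with open(fileName, 'wb') as f:
        f.write(imgRes.content)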


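def audio_down(url, path, max_retries=5):
    # Despite its name, this downloads a video (or any large file) to `path`,
    # resuming from the bytes already on disk if an earlier attempt was cut off
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.3.2.1000 Chrome/30.0.1599.101 Safari/537.36"}
    for _ in range(max_retries):
        try:
            # Size of the partial file left by earlier attempts (0 if none)
            downloaded = os.path.getsize(path) if os.path.exists(path) else 0
            req_headers = dict(headers)
            if downloaded:
                # Resume: request only the bytes we do not have yet
                req_headers['Range'] = 'bytes=%d-' % downloaded
            res = requests.get(url, stream=True, headers=req_headers, timeout=30)
            if res.status_code == 416:
                # Requested range starts past the end: the file is already complete
                return
            res.raise_for_status()
            if res.status_code == 200:
                # Server ignored Range and is sending the whole file: start over
                downloaded = 0
            # With a 206 reply, Content-Length counts only the remaining bytes
            total = downloaded + int(res.headers.get('content-length', 0))
            with open(path, 'ab' if downloaded else 'wb') as file:
                # Write in chunks rather than via res.content, so the whole
                # video is never held in memory at once
                for chunk in res.iter_content(chunk_size=64 * 1024):
                    file.write(chunk)
                    downloaded += len(chunk)
            print('receive data, file size: %d  total size: %d' % (downloaded, total))
            if downloaded >= total:
                return
        except requests.RequestException as e:
            # Connection dropped mid-transfer: report it and retry from disk
            print(e)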
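
For resuming, the important detail is which header carries the full file size. When a Range header is sent, the server answers 206 Partial Content, and Content-Length then counts only the bytes still to come, so it cannot be compared directly with the size of the file on disk. The full size is in Content-Range, which the HTTP range-request spec (RFC 7233) formats as `bytes first-last/complete-length` (with `*` when the length is unknown), or `bytes */complete-length` on a 416 reply. The helper below is a hypothetical sketch, not part of requests; `total_size_from_headers` and its arguments are my own names.

```python
import re

# Satisfied range, e.g. "bytes 100-999/1000" or "bytes 0-499/*" (total unknown)
_RANGE = re.compile(r'^bytes (\d+)-(\d+)/(\d+|\*)$')
# Unsatisfied range sent with a 416 reply, e.g. "bytes */1000"
_UNSATISFIED = re.compile(r'^bytes \*/(\d+)$')


def total_size_from_headers(status_code, headers):
    """Return the full size of the remote file in bytes, or None if unknown.

    Hypothetical helper: `headers` is any mapping of response headers,
    such as requests' Response.headers.
    """
    content_range = headers.get('Content-Range', '').strip()
    if status_code == 206:
        m = _RANGE.match(content_range)
        if m and m.group(3) != '*':
            return int(m.group(3))
        return None
    if status_code == 416:
        m = _UNSATISFIED.match(content_range)
        return int(m.group(1)) if m else None
    if status_code == 200 and 'Content-Length' in headers:
        # The whole body was sent, so Content-Length is the full size
        return int(headers['Content-Length'])
    return None
```

With this value, a download can be treated as complete once `os.path.getsize(path)` equals it.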

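The code above only covers the download step. The step that comes before it, pulling image and video links out of a page by their tags, can be sketched with the standard-library `html.parser`. This assumes the media are plain `<img>`, `<video>` or `<source>` tags with a `src` attribute in static HTML. Sites that load media through JavaScript or attributes such as `data-src` are not handled. `MediaLinkParser`, `extract_media_links` and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MediaLinkParser(HTMLParser):
    """Collect image and video URLs from <img>, <video> and <source> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []
        self.videos = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get('src')
        if not src:
            return
        # Resolve relative links against the URL of the page being parsed
        full = urljoin(self.base_url, src)
        if tag == 'img':
            self.images.append(full)
        elif tag in ('video', 'source'):
            # Simplification: every <source> is treated as video,
            # even one nested inside <audio>
            self.videos.append(full)


def extract_media_links(html, base_url):
    parser = MediaLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.images, parser.videos


if __name__ == '__main__':
    page = ('<html><body><p>hello</p>'
            '<img src="/uploadImages/a.jpg">'
            '<video><source src="clips/b.mp4" type="video/mp4"></video>'
            '</body></html>')
    imgs, vids = extract_media_links(page, 'http://example.com/news/index.html')
    print(imgs)  # ['http://example.com/uploadImages/a.jpg']
    print(vids)  # ['http://example.com/news/clips/b.mp4']
```

Each collected URL can then be passed to `pic_down` or `audio_down`.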
 

posted @ 2018-03-18 22:11  墨林2015