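Python: Crawling Image and Video Files
I have been learning Python recently and have gone through a fair amount of reading material and videos. Crawlers caught my interest, and so far I have scraped page text, images, and videos. Text needs little comment: it can be pulled straight out of the page by tag. Images and videos take one more step, because the file has to be downloaded once its link has been obtained. The image and video download code is kept below for reference: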
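As a rough illustration of the "get the link first" step, here is a minimal sketch that collects media links from a page using only the standard library. The class name `MediaLinkParser`, the helper `extract_media_links`, and the sample HTML are all made up for this example and are not part of the original code. It only looks at the `src` attribute of `<img>`, `<video>`, and `<source>` tags, so pages that load media through scripts would need a different approach.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MediaLinkParser(HTMLParser):
    """Collect absolute URLs from the src attribute of media tags."""

    MEDIA_TAGS = {"img", "video", "source"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.MEDIA_TAGS:
            return
        for name, value in attrs:
            if name == "src" and value:
                # Resolve relative links against the page URL
                self.links.append(urljoin(self.base_url, value))


def extract_media_links(html, base_url):
    """Return the media URLs found in html, in document order."""
    parser = MediaLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content, used only to show the call
sample = (
    '<div><img src="/a/1.jpg">'
    '<video src="http://example.com/v.mp4"></video>'
    '<p>text</p></div>'
)
print(extract_media_links(sample, "http://example.com/page/index.html"))
```

Each URL returned this way can then be handed to a download function.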
Images and videos:
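import os
import requests

# e.g. url: http://dynamic-image.yesky.com/740x-/uploadImages/2016/338/21/7058TW4EAC62.JPG
# path: D:\\pic\\ (a directory, including the trailing separator)
def pic_down(url, path):
    fileName = path + 'pic.jpg'
    imgRes = requests.get(url, timeout=30)
    imgRes.raise_for_status()  # do not save an error page as an image
    with open(fileName, 'wb') as f:
        f.write(imgRes.content)

# Here path is the full path of the output file, not a directory.
def audio_down(url, path):
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.3.2.1000 Chrome/30.0.1599.101 Safari/537.36"}
        # If the file already exists, resume from its current size
        downloaded = os.path.getsize(path) if os.path.exists(path) else 0
        if downloaded:
            headers['Range'] = 'bytes=%d-' % downloaded
        res = requests.get(url, stream=True, headers=headers, timeout=30)
        # 416: the requested range starts at or past the end of the remote file,
        # so there is nothing left to fetch
        if res.status_code == 416:
            res.close()
            return
        res.raise_for_status()
        # 206: the server honoured the Range header, so append to the file.
        # 200: the server ignored it and is sending the whole file, so start over.
        if res.status_code != 206:
            downloaded = 0
        mode = 'ab' if downloaded else 'wb'
        # With a Range request, content-length is the number of remaining bytes,
        # not the full size (0 here if the server does not report a length)
        content_length = int(res.headers.get('content-length', 0))
        total = downloaded + content_length
        # Write the received video data in chunks instead of holding it all in memory
        with open(path, mode) as file:
            for chunk in res.iter_content(chunk_size=64 * 1024):
                file.write(chunk)
        print('receive data,file size : %d total size:%d' % (os.path.getsize(path), total))
    except Exception as e:
        print(e)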
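One detail of resuming a download is worth spelling out. When a request carries a `Range` header and the server answers with 206, `Content-Length` covers only the bytes sent in that response, while the full file size appears after the slash in the `Content-Range` header. The sketch below shows one way to read that total and decide whether the local file is complete. The function names `total_size_from_content_range` and `is_complete` are invented for illustration, and the header values are examples, not output captured from a real server.

```python
import re


def total_size_from_content_range(value):
    """Return the total size from a Content-Range header value.

    Accepts forms such as "bytes 200-1000/67589" and "bytes */67589".
    Returns None if the value is malformed or the total is "*" (unknown).
    """
    match = re.fullmatch(r"bytes\s+(?:\d+-\d+|\*)/(\d+|\*)", value.strip())
    if match is None or match.group(1) == "*":
        return None
    return int(match.group(1))


def is_complete(local_size, content_range):
    """True if the local file already holds at least the full remote size."""
    total = total_size_from_content_range(content_range)
    return total is not None and local_size >= total


# Example header values (illustrative only)
print(total_size_from_content_range("bytes 200-1000/67589"))
print(is_complete(67589, "bytes */67589"))
print(is_complete(1000, "bytes 1000-67588/67589"))
```

With `requests`, the header value would come from `res.headers.get('content-range')`, which may be missing if the server does not support range requests.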