Problems I ran into while scraping web images with Python: a summary

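1. I had only written import urllib, and reading the page with page = urllib.urlopen(url) failed with "'module' object has no attribute 'urlopen'". I tried several things without success, then found the cause: in Python 3, urlopen lives in the urllib.request submodule. The fix is to call page = urllib.request.urlopen(url) and to import the request module at the top of the file: from urllib import request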

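2. Downloading and renaming the scraped images with urllib.urlretrieve() raised a similar error: "'module' object has no attribute 'urlretrieve'". The cause is the same. In Python 3 this function is also in urllib.request, so changing the call to urllib.request.urlretrieve() fixed it.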

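3. A minor one: near the end of the run I got "cannot use a string pattern on a bytes-like object". After searching online, it turned out to be an encoding issue: page.read() returns bytes, while the regular expression is a str pattern. Decoding the HTML to UTF-8 before parsing it, with html = html.decode('utf-8'), solved this as well.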
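The three fixes above can be tried out without network access. This is only a minimal sketch: it assumes a `data:` URL as a stand-in for a real page, and the sample markup, the temporary directory, and the variable names are made up for illustration.

```python
import os
import re
import tempfile
from urllib import request

# A data: URL stands in for a real web page so the sketch runs offline.
# The markup is illustrative and percent-encoded to keep the URL valid.
sample_url = "data:text/html,%3Cimg%20src=%22a.jpg%22%20alt=%22a%22%3E"

# Fix 1: in Python 3, urlopen lives in urllib.request.
with request.urlopen(sample_url) as page:
    raw = page.read()  # read() returns bytes, not str

# Fix 3: applying a str pattern to bytes raises TypeError.
pattern = re.compile(r'src="(.*?\.jpg)" alt')
error_message = None
try:
    pattern.findall(raw)
except TypeError as exc:
    error_message = str(exc)

html = raw.decode("utf-8")  # decode first, then match
img_urls = pattern.findall(html)

# Fix 2: urlretrieve also lives in urllib.request.
target_path = os.path.join(tempfile.mkdtemp(), "0.jpg")
saved_path, headers = request.urlretrieve(sample_url, target_path)

print(error_message)
print(img_urls)
print(saved_path)
```

With a real site, only `sample_url` would change. The page may also not be UTF-8, in which case the `decode` call would need the page's actual encoding.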

The complete code is below:

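import re
import urllib
from urllib import request

def getHtml(url):
    # urlopen is in urllib.request in Python 3
    page = urllib.request.urlopen(url)
    html = page.read()
    # read() returns bytes; decode so the str regex below can match
    html = html.decode('utf-8')
    return html

def getImg(html):
    # capture the src of every .jpg image that is followed by an alt attribute
    reg = r'src="(.*?\.jpg)" alt'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    x = 0
    for imgurl in imglist:
        # save each image as 0.jpg, 1.jpg, ... in the current directory
        urllib.request.urlretrieve(imgurl, '%s.jpg' % x)
        x += 1

html = getHtml("http://photo.bitauto.com/?WT.mc_id=360tpdq")
getImg(html)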
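One thing the script above does not handle: urlretrieve needs a full URL with a scheme, so a relative src such as /img/a.jpg would fail. A possible sketch of resolving the matches against the page URL first, using urllib.parse.urljoin. The helper name `plan_downloads` and the sample markup are hypothetical, and nothing is downloaded here.

```python
import re
from urllib.parse import urljoin

# Same pattern as in the script: .jpg sources followed by an alt attribute.
IMG_PATTERN = re.compile(r'src="(.*?\.jpg)" alt')

def plan_downloads(html, base_url):
    """Return (absolute_url, filename) pairs; performs no network access."""
    jobs = []
    for index, src in enumerate(IMG_PATTERN.findall(html)):
        # urljoin leaves absolute URLs alone and resolves relative ones
        jobs.append((urljoin(base_url, src), '%s.jpg' % index))
    return jobs

# Illustrative markup with one relative and one absolute image URL.
sample_html = (
    '<img src="/img/a.jpg" alt="a">'
    '<img src="http://example.com/b.jpg" alt="b">'
)
jobs = plan_downloads(sample_html, "http://photo.bitauto.com/")
for url, filename in jobs:
    print(url, filename)
```

Each pair could then be passed to urllib.request.urlretrieve(url, filename), as in the loop in getImg.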

posted on 2017-10-23 14:24 by 不吃西红柿a