Problems Encountered While Writing Web Crawlers

I. urllib and urllib2 differences between Python 2 and Python 3

1. urllib2 no longer exists in Python 3. Fix:

In Python 3.x, urllib2 was replaced by urllib.request (its exception classes moved to urllib.error).
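For code that has to run on both interpreters, one possible approach is a guarded import. The sketch below is only an illustration: the User-Agent value is an assumption, and building a Request object does not send anything over the network.

```python
try:
    # Python 3: urllib2 was split into urllib.request and urllib.error
    from urllib.request import Request, urlopen
    from urllib.error import URLError, HTTPError
except ImportError:
    # Python 2 fallback
    from urllib2 import Request, urlopen, URLError, HTTPError

# Building the request is offline; urlopen(req) would perform the actual fetch.
req = Request(
    "http://www.pythonscraping.com/pages/page1.html",
    headers={"User-Agent": "Mozilla/5.0"},  # illustrative value
)
print(req.get_full_url())
```

Code written only for Python 3 can drop the try/except and keep just the first two imports.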

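2. AttributeError: 'module' object has no attribute 'urlencode'. Fix:

Add import urllib.parse and call urllib.parse.urlencode. In Python 3, urlencode lives in urllib.parse rather than directly on urllib.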
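As a minimal sketch of the Python 3 call, the snippet below encodes a couple of form fields. The field names and values are hypothetical and only serve to show the output format.

```python
from urllib.parse import urlencode

# Hypothetical form fields; a list of pairs keeps the output order explicit.
values = [("name", "test user"), ("page", 2)]

# urlencode returns a str; spaces become '+' and non-str values are converted.
query = urlencode(values)
print(query)  # name=test+user&page=2
```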

3. TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str. Fix:

Change the original line:
data = urllib.urlencode(values)
to:
data = urllib.parse.urlencode(values).encode(encoding='UTF8')
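The fix above can be sketched end to end as follows. The login URL and the field names are assumptions made for illustration; the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical login form fields.
values = {"username": "alice", "password": "secret"}

# urlencode gives a str; Python 3 requires the POST body to be bytes.
data = urlencode(values).encode("utf-8")

# Supplying data makes this a POST request. The URL is a placeholder.
req = Request("http://example.com/login", data=data)
print(type(data).__name__, req.get_method())
```

Passing req to urllib.request.urlopen would then submit the form.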

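4. TypeError: Can't convert 'bytes' object to str implicitly. Fix:

Encode or decode explicitly so that str and bytes are never concatenated directly:
import urllib.parse
# values is assumed to be a dict of query parameters defined earlier
data = urllib.parse.urlencode(values).encode(encoding='utf8')
url = 'https://passport.cnblogs.com/user/signin?ReturnUrl=xxxxxxxxxx'
# url already contains a query string, so join with '&' rather than '?'
geturl = url + '&' + data.decode()
print(geturl)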
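For a GET request the parameters can stay as str the whole way, which sidesteps the bytes/str mismatch. The helper below is a hypothetical sketch (add_query is not part of urllib) that also copes with a base URL that may or may not already carry a query string; the example URL and parameter are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_query(url, params):
    """Return url with params merged into its query string (all str, no bytes)."""
    parts = urlsplit(url)
    # Keep whatever query parameters the URL already has.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.extend(params.items())
    return urlunsplit(parts._replace(query=urlencode(pairs)))

print(add_query("https://example.com/signin?ReturnUrl=home", {"user": "alice"}))
# https://example.com/signin?ReturnUrl=home&user=alice
```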

5. UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). Fix:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page1.html")
bsObj = BeautifulSoup(html.read(),"html.parser")
print(bsObj.h1)

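Pass "html.parser" as the second argument to the BeautifulSoup constructor, as in the code above, so that the parser is specified explicitly.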

posted @ 2017-02-07 10:14 逍遥无名
