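An Introduction to the urllib Package
Python 3.x merges the old urllib and urllib2 modules into a single urllib package made up of five modules: 1. robotparser; 2. parse; 3. request; 4. response; 5. error. The key functions of each module are: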
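parse module: urlencode (turns a dict or a list of key/value pairs into a percent-encoded query string; this is URL encoding, not encryption)
request module: Request (describes a request; it is sent as GET unless data is supplied, in which case it becomes POST), urlopen (sends a URL or a Request and returns a file-like response object)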
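The two functions above can be tried without touching the network. The following is a minimal sketch: the "pn" parameter and the non-ASCII search term are made up for illustration, and nothing is sent because urlopen is never called.

```python
from urllib import parse, request

# urlencode accepts either a dict or a list of (key, value) pairs.
query_from_dict = parse.urlencode({"wd": "codemo"})
query_from_list = parse.urlencode([("wd", "codemo"), ("pn", "10")])  # "pn" is a hypothetical parameter
print(query_from_dict)   # wd=codemo
print(query_from_list)   # wd=codemo&pn=10

# Non-ASCII text is percent-encoded as UTF-8 by default; it is not encrypted.
query_non_ascii = parse.urlencode({"wd": "中文"})
print(query_non_ascii)   # wd=%E4%B8%AD%E6%96%87

# A Request only describes the request; nothing is sent until urlopen is called.
get_req = request.Request("http://www.baidu.com/s?" + query_from_dict)
post_req = request.Request("http://www.baidu.com/s", data=query_from_dict.encode("ascii"))
print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```

Passing a list of pairs instead of a dict keeps the parameter order explicit and allows the same key to appear more than once.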
The simplest way to use the package:

from urllib import request

response = request.urlopen('http://www.baidu.com')
html = response.read()
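Note that read() returns bytes, not str, so the page usually has to be decoded before it can be handled as text. Here is a minimal sketch, assuming the server declares its charset in the Content-Type header. The helper name read_text and the UTF-8 fallback are assumptions, and a data: URL stands in for a real page so that the example runs offline.

```python
from urllib import request


def read_text(url, default_charset="utf-8"):
    """Fetch url and return its body as str (hypothetical helper)."""
    with request.urlopen(url) as response:
        raw = response.read()  # bytes
        # Use the charset from the Content-Type header when one is declared.
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset)


# A data: URL is handled locally by urllib, so no network access is needed here.
text = read_text("data:text/plain;charset=utf-8,hello%20urllib")
print(text)
```

For a real site the same call would be read_text('http://www.baidu.com'); if the page declares no charset, the fallback may be wrong and decoding can fail.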
The following opens a web page with either GET or POST:

from urllib import parse
from urllib import request


def get_method(url, search):
    # GET: append the URL-encoded parameters as the query string.
    url += "?" + parse.urlencode(search)
    return request.Request(url)


def post_method(url, search):
    # POST: the data must be encoded to bytes first, otherwise urlopen raises
    # TypeError: POST data should be bytes or an iterable of bytes. It cannot be str.
    data = parse.urlencode(search)
    return request.Request(url, data.encode('gb2312'))


def main():
    url = "http://www.baidu.com/s"
    search = [("wd", "codemo")]
    # req = get_method(url, search)  # send the data with GET
    req = post_method(url, search)   # send the data with POST
    # Write the raw bytes; str(fd.read()) would save the b'...' repr instead of the page.
    with request.urlopen(req) as fd, open("baidu.html", "wb") as fobj:
        fobj.write(fd.read())


main()
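The script above assumes the request always succeeds. The error module listed at the start provides URLError and its subclass HTTPError for the failure cases. The following is a minimal sketch of catching them: the function name fetch, the 10-second timeout, and the deliberately invalid URL scheme are all assumptions for illustration.

```python
from urllib import error, request


def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request fails (hypothetical helper)."""
    try:
        with request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        print("HTTP error:", exc.code, exc.reason)
    except error.URLError as exc:
        # No usable response: unknown scheme, DNS failure, refused connection, etc.
        print("URL error:", exc.reason)
    return None


# An unsupported scheme fails locally with URLError, so this runs without network access.
result = fetch("nosuchscheme://example")
print(result)  # None
```

HTTPError has to be caught before URLError because it is a subclass; with the order reversed, the HTTPError branch would never run.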