The urllib package explained

In Python 3.x the old urllib and urllib2 modules were merged into the urllib package, which consists of five modules: 1. robotparser; 2. parse; 3. request; 4. response; 5. error. The most important functions are:

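parse module: urlencode (turns a dict or a sequence of (key, value) pairs into a percent-encoded query string; the result is URL-encoded, not encrypted)
request module: Request (builds a request object, which is sent as POST if data is supplied and as GET otherwise) and urlopen (opens a URL string or a Request object and returns a file-like response object)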
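
The following offline sketch, assuming Python 3, shows what these functions actually produce. Nothing is sent over the network: urlencode only builds text, and Request only builds an object whose get_method() reports GET or POST depending on whether data was attached. The GB2312 line is there only to show that a non-UTF-8 charset is chosen through urlencode's encoding parameter.

```python
from urllib import parse, request

# urlencode accepts a dict or a sequence of (key, value) pairs and returns
# a percent-encoded query string (plain ASCII text, not encrypted data).
print(parse.urlencode({"wd": "codemo"}))              # wd=codemo
print(parse.urlencode([("wd", "中文"), ("pn", 10)]))   # wd=%E4%B8%AD%E6%96%87&pn=10

# Non-ASCII values are encoded as UTF-8 by default; another charset can be
# chosen with the encoding parameter (e.g. for a site that expects GB2312).
print(parse.urlencode([("wd", "中文")], encoding="gb2312"))  # wd=%D6%D0%CE%C4

# Request only builds a request object; nothing is sent until urlopen() is called.
get_req = request.Request("http://www.baidu.com/s?" + parse.urlencode({"wd": "codemo"}))
post_req = request.Request("http://www.baidu.com/s", data=b"wd=codemo")
print(get_req.get_method())   # GET  (no data attached)
print(post_req.get_method())  # POST (data attached)
```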

The simplest way to use the package:

from urllib import request
response = request.urlopen('http://www.baidu.com')
html = response.read()
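
Note that response.read() returns bytes, not str. A minimal sketch of turning the page into text might look like this. fetch_text is a hypothetical helper name, and falling back to UTF-8 when the server declares no charset is an assumption, not something urlopen does by itself. A data: URL is used in the demo so it runs without network access; an http:// URL is passed the same way.

```python
from urllib import request


def fetch_text(url, timeout=10):
    """Hypothetical helper: open url and return the body decoded to str."""
    # `with` closes the response (and its socket) when the block ends.
    with request.urlopen(url, timeout=timeout) as response:
        body = response.read()  # read() returns bytes
        # Prefer the charset declared in the Content-Type header; the UTF-8
        # fallback is an assumption for pages that declare none.
        charset = response.headers.get_content_charset() or "utf-8"
    return body.decode(charset, errors="replace")


if __name__ == "__main__":
    # A data: URL keeps this demo offline; 'http://www.baidu.com' works the same way.
    print(fetch_text("data:text/plain;charset=utf-8,hello%20urllib"))  # hello urllib
```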

The following example opens a web page using either GET or POST:

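from urllib import parse
from urllib import request

def get_method(url, search):
    # GET: append the URL-encoded parameters to the URL as a query string
    url += "?" + parse.urlencode(search)
    return request.Request(url)

def post_method(url, search):
    # POST: send the URL-encoded parameters in the request body
    data = parse.urlencode(search)
    # The data must be encoded to bytes first; passing a str raises
    # TypeError: POST data should be bytes or an iterable of bytes. It cannot be str.
    # urlencode() output is pure ASCII, so encoding it as ASCII is enough.
    return request.Request(url, data.encode('ascii'))

def main():
    url = "http://www.baidu.com/s"
    search = [("wd", "codemo")]
    #req = get_method(url, search)  # send the data with GET
    req = post_method(url, search)   # send the data with POST
    with request.urlopen(req) as fd:
        # Write the raw bytes in binary mode; str(fd.read()) would save the
        # "b'...'" representation of the bytes instead of the page itself.
        with open("baidu.html", "wb") as fobj:
            fobj.write(fd.read())

if __name__ == "__main__":
    main()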
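
To see what GET and POST actually put on the wire without depending on an external site, one option is a tiny local echo server. This is a sketch under stated assumptions: EchoHandler and demo are hypothetical names, the server binds to 127.0.0.1 on a free port, and proxies are disabled so an http_proxy environment variable does not intercept the local request. With GET the parameters arrive in the path's query string; with POST they arrive in the request body.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import parse, request


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler that reports the method, path and body it received."""

    def _reply(self, body):
        text = f"{self.command} {self.path} body={body!r}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(text)))
        self.end_headers()
        self.wfile.write(text)

    def do_GET(self):
        self._reply(b"")

    def do_POST(self):
        # urllib sends Content-Length with POST data, so read exactly that many bytes
        length = int(self.headers.get("Content-Length", 0))
        self._reply(self.rfile.read(length))

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def demo():
    """Send the same search once with GET and once with POST; return both echoes."""
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An opener with an empty ProxyHandler ignores any proxy environment variables
    opener = request.build_opener(request.ProxyHandler({}))
    url = f"http://127.0.0.1:{server.server_port}/s"
    search = [("wd", "codemo")]
    try:
        get_req = request.Request(url + "?" + parse.urlencode(search))
        post_req = request.Request(url, parse.urlencode(search).encode("ascii"))
        results = []
        for req in (get_req, post_req):
            with opener.open(req, timeout=5) as response:
                results.append(response.read().decode("utf-8"))
        return tuple(results)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Running it should print `GET /s?wd=codemo body=b''` followed by `POST /s body=b'wd=codemo'`.
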
posted @ 2012-05-16 21:40  icamel