Using urllib2 in Python
Three ways to use Basic Auth
1. Set the Authorization header directly
import urllib2
from base64 import b64encode

user, password = 'user', 'password'  # placeholders; replace with real credentials

# b64encode never inserts line breaks, so there is no trailing newline to strip
# (encodestring wraps long values and appends '\n').
headers = {
    'Content-Type': 'application/json;charset=UTF-8',
    'Authorization': 'Basic %s' % b64encode('%s:%s' % (user, password)),
}

def http_request(url, data_json, headers):
    # data_json is a JSON string. If it is not None the request is sent as a POST;
    # otherwise it is sent as a GET.
    req = urllib2.Request(url, data_json, headers=headers)
    response = urllib2.urlopen(req)
    return response.getcode(), response.read()
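The header value itself can be checked without making a request. The following is a minimal sketch in Python 3 syntax, on the assumption that you may be running Python 3, where urllib2 was split into urllib.request and urllib.error. The helper name basic_auth_header is hypothetical and does not appear in the code above.

```python
import base64


def basic_auth_header(user, password):
    """Return the value of an Authorization header for HTTP Basic auth.

    Hypothetical helper for illustration only.
    """
    # Credentials are joined with ':' and base64-encoded as bytes.
    raw = ('%s:%s' % (user, password)).encode('utf-8')
    # b64encode returns bytes without any line breaks.
    token = base64.b64encode(raw).decode('ascii')
    return 'Basic ' + token


if __name__ == '__main__':
    # Credentials taken from the well-known RFC example, not from a real service.
    print(basic_auth_header('Aladdin', 'open sesame'))
    # prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

The returned string can be used as the 'Authorization' entry of a headers dict like the one above.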
2. Use HTTPBasicAuthHandler and install it. The credentials are configured once and reused for later requests.
import urllib2

# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib2.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib2.install_opener(opener)

# The configured user can now request several URLs in turn.
# Credentials are only sent for URLs that match the realm and uri registered above.
urllib2.urlopen('http://www.example.com/login.html')
urllib2.urlopen('http://www.example.com/login.html')
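3. Use HTTPPasswordMgrWithDefaultRealm
For pages that require authentication, the following code works for both http and https. (If you still get a permission error, the request headers may be the cause.)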
import urllib2

class WebRequester(object):
    def __init__(self, top_level_url, user, passwd):
        # Passing None as the realm applies the credentials to any realm
        # under top_level_url.
        password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, top_level_url, user, passwd)
        handler = urllib2.HTTPBasicAuthHandler(password_mgr)
        opener = urllib2.build_opener(handler)
        urllib2.install_opener(opener)

    def get_page(self, url):
        fd = urllib2.urlopen(url)
        # read() already returns a byte string; calling .encode('utf-8') on it
        # raises UnicodeDecodeError for non-ASCII pages in Python 2.
        return fd.read()
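How the password manager decides which credentials apply can be inspected offline with find_user_password. This is a minimal sketch in Python 3 syntax (urllib.request instead of urllib2, which is an assumption about your interpreter); the URL, user name, and password are placeholders.

```python
from urllib.request import HTTPPasswordMgr, HTTPPasswordMgrWithDefaultRealm

TOP_LEVEL_URL = 'http://www.example.com/'          # placeholder
PAGE_URL = 'http://www.example.com/login.html'     # placeholder

# Realm None: credentials apply to any realm under TOP_LEVEL_URL.
default_mgr = HTTPPasswordMgrWithDefaultRealm()
default_mgr.add_password(None, TOP_LEVEL_URL, 'klem', 'secret')

# Explicit realm: credentials apply only when the server reports that realm.
strict_mgr = HTTPPasswordMgr()
strict_mgr.add_password('PDQ Application', TOP_LEVEL_URL, 'klem', 'secret')

# The default-realm manager falls back to the None entry for an unknown realm.
default_hit = default_mgr.find_user_password('Some Realm', PAGE_URL)
# The plain manager has nothing registered for 'Some Realm'.
strict_miss = strict_mgr.find_user_password('Some Realm', PAGE_URL)
# A URL on a different host never matches.
other_host = default_mgr.find_user_password(None, 'http://other.example.org/')

if __name__ == '__main__':
    print(default_hit)   # ('klem', 'secret')
    print(strict_miss)   # (None, None)
    print(other_host)    # (None, None)
```

If a request still fails with a permission error, comparing the realm and URL the server reports against what was registered is a reasonable first check.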
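Note: __init__ installs an opener as the default opener. Unless you specify otherwise, later requests go through it, and urllib2.urlopen uses that opener.
Calling open on a separately built opener does not use the installed one. So for a one-off request that should not carry these credentials, you can call urllib2.build_opener().open(url) directly.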
opener = urllib2.build_opener()
fd = opener.open(url)
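The difference between the two openers can be seen by looking at the handlers each one carries. This is a minimal sketch in Python 3 syntax (urllib.request rather than urllib2, as an assumption); the helper has_basic_auth and the credentials are hypothetical, and the check relies on the handlers list that OpenerDirector keeps in CPython.

```python
import urllib.request as request

# Opener configured like the one installed in WebRequester.__init__.
password_mgr = request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://www.example.com/', 'klem', 'secret')
auth_opener = request.build_opener(request.HTTPBasicAuthHandler(password_mgr))

# Fresh opener with only the default handlers.
plain_opener = request.build_opener()


def has_basic_auth(opener):
    """Return True if the opener carries an HTTPBasicAuthHandler."""
    return any(isinstance(h, request.HTTPBasicAuthHandler)
               for h in opener.handlers)


if __name__ == '__main__':
    print(has_basic_auth(auth_opener))   # True
    print(has_basic_auth(plain_opener))  # False
```

Because plain_opener.open(url) only consults its own handlers, it never answers a Basic auth challenge, regardless of which opener has been installed globally.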