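Automating form handling with Mechanize
mechanize replaces part of the functionality of urllib2. It simulates browser behavior more faithfully and gives more complete control over web access.
Features of mechanize: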
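1 Support for HTTP, HTTPS and other protocols
2 Simple HTML form filling
3 Browser history and reload
4 Correct addition of the Referer HTTP header
5 Automatic observance of robots.txt
6 Automatic handling of HTTP-EQUIV and Refresh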
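As a rough illustration of item 5, the sketch below shows what a robots.txt check amounts to. It uses the standard library's urllib.robotparser (Python 3) rather than mechanize itself, and the rules, the "my-bot" agent name, and the example.com URLs are all invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real site serves this at /robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "my-bot" is an invented user agent name
allowed = parser.can_fetch("my-bot", "http://example.com/index.html")
blocked = parser.can_fetch("my-bot", "http://example.com/private/data.html")
print(allowed, blocked)
```

mechanize applies a comparable check before each request unless set_handle_robots(False) is called.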
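Commonly used functions
.CookieJar(): creates a cookie jar for storing cookies
.Browser(): creates a browser object
.addheaders: the list of request headers (an attribute, not a method), typically used to set a User-Agent so the request looks like it comes from a regular browser
.open(): opens a page; according to the official documentation it can open any URL, not only http
.select_form(): selects a form; take care when selecting a form by its ID
.form[]: fills in a form field
.submit(): submits the form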
1. Installation:
pip install mechanize
Note:
Older mechanize releases run only on Python 2.x; Python 3 support was added in version 0.4.0.
2. Basic usage
import mechanize
br = mechanize.Browser()
br.open("http://www.cnblogs.com/baby123/p/8078508.html")
print(br.title())
import mechanize
request2 = mechanize.Request("https://news.cnblogs.com/")
response2 = mechanize.urlopen(request2)
print(response2.geturl())
print(response2.info())
Note: response2.info() returns the response headers; response2.read() returns the body.
3. Searching with Baidu
# coding=UTF-8
import mechanize
br = mechanize.Browser()
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.set_debug_http(True)
br.set_debug_redirects(True)
br.set_debug_responses(True)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.open("https://www.baidu.com/")
br.select_form(nr=0)  # select the first form on the page
br.form['wd'] = 'python mechanize'  # 'wd' is the name of the search field
br.submit()
brr = br.response().read()
print(brr)
4. Logging in
# coding=UTF-8
import mechanize
br = mechanize.Browser()
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.set_debug_http(True)
br.set_debug_redirects(True)
br.set_debug_responses(True)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.open("https://passport.csdn.net/account/login?service=http://www.csdn.net")
br.select_form(nr=0)
br.form['username'] = 'XXXXXXX'
br.form['password'] = '123456'
br.submit()
brr = br.response().read()
# read() returns the raw body, so write the file in binary mode
with open("logininfo.txt", "wb") as f:
    f.write(brr)
The HTML page returned after logging in is written to logininfo.txt. Judging from the contents of the file, the login succeeded.
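Instead of reading the file by eye, the check could be automated. The sketch below is only one possible approach: login_succeeded() is a hypothetical helper, and the marker strings and sample pages are assumptions, since each site signals a successful login differently.

```python
def login_succeeded(html, markers=(b"logout", b"sign out")):
    """Return True if any marker appears in the page (case-insensitive).

    The default markers are guesses; choose text that the target site
    shows only to logged-in users.
    """
    lowered = html.lower()
    return any(marker in lowered for marker in markers)


# Invented sample pages standing in for real responses
logged_in_page = b"<html><body>Welcome! <a href='/logout'>Logout</a></body></html>"
login_page = b"<html><body><form><input name='username'></form></body></html>"

result_ok = login_succeeded(logged_in_page)
result_failed = login_succeeded(login_page)
print(result_ok, result_failed)
```

With the script above, the call would be login_succeeded(brr).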