Simulating a Website Login with Python

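People often use Python for data mining, but some websites only show their content after you log in. So how do you simulate a login with Python? There is already plenty written about this online. A while ago, though, I ran into a site that required a cookie before it would accept a login, and the site was also quirky: the first login request fails and has to be sent again. It took me a lot of effort to get it working, so I'm posting the solution here to share.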

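The first version uses the urllib package from the Python 3 standard library, which takes care of most details (such as storing and resending cookies) for you: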

#!/usr/bin/env python3
# -*- coding:utf-8 -*-

import http.cookiejar
import urllib.error
import urllib.parse
import urllib.request

StudentInfoURL = 'http://210.x.x.1:90/student/index.jsp'
loginURL = 'http://210.x.x.1:90/login.jsp'
loginCheckURL = 'http://210.x.x.1:90/j_security_check'
# In Python 3, POST data must be bytes, so encode the urlencoded string
post_data = urllib.parse.urlencode({'j_username': 'xxxxxxx', 'j_password': 'xxxxxxx'}).encode('utf-8')
headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36'
}

# The CookieJar stores cookies from responses and sends them back automatically
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
# You must connect once here, otherwise no cookie is obtained
try:
    opener.open(loginCheckURL)
except urllib.error.HTTPError:
    pass  # even on an error status, the Set-Cookie header has already been stored in cj
urllib.request.install_opener(opener)


###################### Exception handling: just log in a second time ######################
request = urllib.request.Request(loginCheckURL, post_data, headers)
try:
    response = urllib.request.urlopen(request)
except urllib.error.HTTPError:
    # the first login attempt fails on this server; retrying once succeeds
    response = urllib.request.urlopen(request)
print(response.read().decode('GBK'))


###################### Now the site can be accessed normally ######################
request = urllib.request.Request(StudentInfoURL, headers=headers)
fp = urllib.request.urlopen(request)
print(fp.read().decode('GBK'))
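The real server (210.x.x.1) can't be reached from outside, so here is a sketch that replays the same urllib flow against a small local stand-in server and can be run anywhere. The stand-in, `FakeLoginHandler`, is hypothetical: its behaviour is an assumption modeled on the description above (a session cookie handed out on the first GET of `/j_security_check`, a 400 on the first login POST), not the real site. `simulated_login` and `demo` are likewise illustrative names. Unlike the script above, the helper retries only on HTTP 400 rather than on any error, and it disables proxy environment variables so the local demo connects directly.

```python
import http.cookiejar
import http.server
import threading
import urllib.error
import urllib.parse
import urllib.request


class FakeLoginHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local stand-in for the site in this post; NOT the real server."""
    post_attempts = 0
    logged_in = False

    def log_message(self, *args):  # silence per-request logging
        pass

    def _send(self, status, body, cookie=None):
        data = body.encode('utf-8')
        self.send_response(status)
        if cookie:
            self.send_header('Set-Cookie', cookie + '; Path=/')
        self.send_header('Content-Length', str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def _has_session(self):
        return 'JSESSIONID=abc123' in (self.headers.get('Cookie') or '')

    def do_GET(self):
        if self.path == '/j_security_check':
            self._send(200, 'login', cookie='JSESSIONID=abc123')  # hand out a session cookie
        elif self.path == '/student/index.jsp':
            ok = FakeLoginHandler.logged_in and self._has_session()
            self._send(200 if ok else 403, 'student page' if ok else 'forbidden')
        else:
            self._send(404, 'not found')

    def do_POST(self):
        self.rfile.read(int(self.headers.get('Content-Length', 0)))
        FakeLoginHandler.post_attempts += 1
        # Assumed quirk: the first login POST is rejected with 400, the second succeeds.
        if FakeLoginHandler.post_attempts == 1 or not self._has_session():
            self._send(400, 'bad request')
        else:
            FakeLoginHandler.logged_in = True
            self._send(200, 'logged in')


def simulated_login(base_url, username, password, encoding='GBK'):
    """Collect the session cookie, log in (retrying once on HTTP 400), fetch the protected page."""
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # connect directly, ignoring *_proxy env vars
        urllib.request.HTTPCookieProcessor(cj))
    check_url = base_url + '/j_security_check'
    try:
        with opener.open(check_url) as resp:  # first visit: the server sets the cookie
            resp.read()
    except urllib.error.HTTPError as err:
        err.close()  # cookies from an error response are still stored in cj
    data = urllib.parse.urlencode({'j_username': username, 'j_password': password}).encode('ascii')
    login = urllib.request.Request(check_url, data=data, headers={'User-Agent': 'Mozilla/5.0'})
    for attempt in range(2):
        try:
            with opener.open(login) as resp:
                resp.read()
            break
        except urllib.error.HTTPError as err:
            err.close()
            if err.code != 400 or attempt == 1:
                raise
    with opener.open(base_url + '/student/index.jsp') as page:
        return page.read().decode(encoding)


def demo():
    """Run simulated_login against a throwaway local server and return the page text."""
    FakeLoginHandler.post_attempts = 0
    FakeLoginHandler.logged_in = False
    server = http.server.HTTPServer(('127.0.0.1', 0), FakeLoginHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return simulated_login('http://127.0.0.1:%d' % server.server_address[1], 'xxxxxxx', 'xxxxxxx')
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(demo())  # prints: student page
```

Against the real site, the same helper would be called as `simulated_login('http://210.x.x.1:90', user, password)`, assuming a direct connection and that the site behaves as described.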

Below is another version, built on the lower-level client module of the http package. Personally, I like this version a lot:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-

import http.client
import urllib.parse

###########################################################
HOST = '210.x.x.1:90'
UserName = "xxxxxxx"
PassWord = "xxxxxxx"
# urlencode escapes special characters (&, =, % ...) in the credentials
data = urllib.parse.urlencode({'j_username': UserName, 'j_password': PassWord})
Headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729)",
}
###########################################################


# Connect to the server
conn = http.client.HTTPConnection(HOST, timeout=30)
conn.connect()

# GET /j_security_check first to obtain the session cookie
conn.request("GET", "/j_security_check", None, Headers)
res = conn.getresponse()
set_cookie = res.getheader("Set-Cookie")
if set_cookie is None:
    raise RuntimeError("the server did not send a Set-Cookie header")
m_cookie = set_cookie.split(';')[0]  # keep only "name=value", drop Path etc.
res.read()  # the body must be read before the connection can be reused

# POST the credentials to log in
Headers["Cookie"] = m_cookie
conn.request("POST", "/j_security_check", data, Headers)
res = conn.getresponse()
res.read()
if res.status == 400:
    # The first login attempt returns 400 on this server; send it again
    conn.request("POST", "/j_security_check", data, Headers)
    res = conn.getresponse()
    res.read()
conn.close()


###################### Now the site can be accessed normally ######################
conn2 = http.client.HTTPConnection(HOST, timeout=30)
conn2.request("GET", "/student/index.jsp", None, Headers)
fp = conn2.getresponse()
print(fp.status)
print(fp.read().decode("GBK"))
conn2.close()
###########################################################
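`HTTPResponse.getheader('Set-Cookie')` joins multiple Set-Cookie headers with `', '`, so taking `split(';')[0]` in the http.client script keeps only the first cookie. That is enough for this site, but if a server sends several cookies, a helper along these lines can merge all of them into one Cookie header. It is a sketch using the standard `http.cookies.SimpleCookie`; the name `build_cookie_header` is hypothetical, and attributes such as Path or Expires are simply dropped.

```python
from http.cookies import SimpleCookie


def build_cookie_header(set_cookie_values):
    """Merge Set-Cookie header values into a single Cookie request header.

    Only the name=value pairs are kept; attributes such as Path are discarded.
    """
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)  # parse one Set-Cookie line, including its attributes
    return '; '.join('%s=%s' % (name, morsel.coded_value) for name, morsel in jar.items())


if __name__ == '__main__':
    values = ['JSESSIONID=abc123; Path=/', 'lang=zh; Path=/']
    print(build_cookie_header(values))  # prints: JSESSIONID=abc123; lang=zh
```

In the http.client script, the cookie could then be computed as `m_cookie = build_cookie_header(res.headers.get_all('Set-Cookie') or [])`, because `get_all` returns each header separately (or `None` when there is none).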

Comments and criticism are welcome.

posted @ 2014-03-22 12:38 oOXuOo