Python web scraping notes (3): simulated login
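1. Logging in to Chaoxing MOOC: capture the traffic with Chrome DevTools, replicate the request headers, and extract the form's hidden fields to build the POST parameters.
The key part is the CAPTCHA image URL. The page's JavaScript generates this URL dynamically from a timestamp, `new Date().getTime()`. The Python counterpart is `time.time()`, with one difference: `getTime()` returns milliseconds, while `time.time()` returns seconds. The script builds the CAPTCHA URL from the timestamp and downloads the image to local disk. You then read the CAPTCHA and type it in by hand. The full script: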
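The script hard-codes the hidden fields after reading them in DevTools. The sketch below shows one way to automate that step. It collects every `<input type="hidden">` from a login page and merges in the user's own fields. It uses the standard library's `html.parser` instead of BeautifulSoup, so it runs without extra packages. `sample_html` is a made-up fragment, not the real Chaoxing page. Only the field names `pid`, `fid`, `isCheckNumCode`, `uname`, `password` and `numcode` come from the script. If the real page fills some fields in with JavaScript, those fields will not appear in the raw HTML and still have to be copied by hand.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) tuples
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            # A missing value attribute is treated as an empty string
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_login_form(html, **user_fields):
    """Merge the page's hidden fields with user-supplied fields (hypothetical helper)."""
    parser = HiddenInputParser()
    parser.feed(html)
    form = dict(parser.fields)
    form.update(user_fields)  # user input overrides hidden defaults
    return form


# Hypothetical page fragment, for illustration only
sample_html = """
<form action="/login" method="post">
  <input type="hidden" name="pid" value="1">
  <input type="hidden" name="fid" value="12007">
  <input type="hidden" name="isCheckNumCode" value="1">
  <input type="text" name="uname">
</form>
"""

form = build_login_form(sample_html, uname="alice", password="secret", numcode="abcd")
print(form)
print(urlencode(form))  # the body as it would be sent form-encoded
```

With BeautifulSoup, the equivalent lookup is `soup.find_all('input', type='hidden')`.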
```python
import time

import requests

header = {
    'Referer': 'http://aust.fanya.chaoxing.com/portal',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36',
}

name = input("input name:")
password = input("input password:")

# JS new Date().getTime() is in milliseconds; time.time() is in seconds
num = int(time.time() * 1000)
code_url = 'http://passport2.chaoxing.com/num/code/?' + str(num)  # CAPTCHA image URL

# Use one Session for every request so the cookie set while fetching the
# CAPTCHA is sent back with the login POST
session = requests.Session()
r = session.get(code_url)
with open('/home/zhanyunwu/code.jpg', 'wb') as f:
    f.write(r.content)

numcode = input("input code:")  # read the CAPTCHA from code.jpg

# POST parameters: the form's hidden fields plus the user's input
params = {
    'refer_0x001': 'http%3A%2F%2Fi.mooc.chaoxing.com%2Fspace%2Findex.shtml',
    'pid': '1',
    'pidName': '',
    'fid': '12007',
    'fidName': '安徽理工大学',  # Anhui University of Science and Technology
    'allowJoin': '0',
    'isCheckNumCode': '1',
    'f': '0',
    'uname': name,
    'password': password,
    'numcode': numcode,
}

url = 'http://passport2.chaoxing.com/login'  # URL the form submits to
# data= sends params as a form-encoded request body
req = session.post(url, data=params, headers=header)

# Visit another page as the logged-in user; the session already holds the
# login cookies, so they do not need to be passed again
courses = session.get('http://mooc12.chaoxing.com/visit/courses', headers=header)
```
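The timestamp's unit matters if you want the request to match what the page itself sends. `new Date().getTime()` counts milliseconds since the epoch, while `time.time()` returns seconds as a float. The server may well accept either value, since the number likely just keeps the browser from reusing a cached image. Still, the helper below reproduces the JavaScript value. The function name and the optional `now` parameter are illustrative; the base URL is the one used in the script. `now` lets you pass a fixed time, which makes the output testable.

```python
import time


def captcha_url(base="http://passport2.chaoxing.com/num/code/?", now=None):
    """Return the CAPTCHA URL with a millisecond timestamp appended.

    `now` is a time in seconds (as returned by time.time()); it defaults to
    the current time. Scaling by 1000 mimics JavaScript's getTime().
    """
    if now is None:
        now = time.time()
    return base + str(int(now * 1000))


print(captcha_url(now=1471684466.5))
print(captcha_url())
```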
2. With the browser already logged in, log in to Douban by reusing the cookies it saved.
```python
import requests

session = requests.Session()

# Cookie string copied from the browser's request headers after logging in
allcookie = 'll="118190"; bid=c3kC6ui9q28; _pk_id.100001.8cb4=4c5ed6a80ede35ed.1471684466.1.1471684546.1471684466.; _pk_ses.100001.8cb4=*; __utma=30149280.794301906.1471684473.1471684473.1471684473.1; __utmb=30149280.2.9.1471684473; __utmc=30149280; __utmz=30149280.1471684473.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmt=1; dbcl2="140658732:f1Vx65Uloqc"; ck=FGYf; push_noty_num=0; push_doumail_num=0; _vwo_uuid_v2=0B4AF16F37C54670B861F7D7A7C5B679|5b7205084917bf0bf6bd9380a8224a9d'

# Split "name=value; name=value; ..." into a dict. Strip the space that
# follows each ';', and split only on the first '=' because some values
# (e.g. __utmz) contain '=' themselves
cookie = {}
for c in allcookie.split(";"):
    key, value = c.strip().split("=", 1)
    cookie[key] = value

s = session.get('http://www.douban.com/people/140658732/', cookies=cookie)
print(s.text)
with open("/home/zhanyunwu/test.html", "wb") as f1:
    f1.write(s.content)
```
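The parsing loop in step 2 assumes that every `;`-separated piece contains an `=`. A trailing `;` or an empty piece therefore makes the tuple unpacking raise `ValueError`. Below is a slightly more defensive version written as a standalone function. The name `parse_cookie_string` is just for illustration. It is tested on a shortened excerpt of the cookie string above.

```python
def parse_cookie_string(raw):
    """Turn a browser 'Cookie:' header string into a dict.

    Pieces are separated by ';'. Surrounding whitespace is stripped, empty
    pieces (e.g. from a trailing ';') are skipped, and only the first '='
    separates name from value, because values such as __utmz contain '='.
    """
    cookies = {}
    for piece in raw.split(";"):
        piece = piece.strip()
        if not piece:
            continue  # empty piece, e.g. after a trailing ';'
        name, sep, value = piece.partition("=")
        if not sep:
            continue  # malformed piece without '=': skip it
        cookies[name.strip()] = value.strip()
    return cookies


sample = 'bid=c3kC6ui9q28; __utmz=30149280.1471684473.1.1.utmcsr=(direct)|utmccn=(direct); ck=FGYf;'
print(parse_cookie_string(sample))
```

You can pass the resulting dict as `cookies=` to `session.get`, exactly as the script does.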