Scraping used-car listings from the Guazi website with Python Scrapy (Part 1)

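Goal: collect every used-car listing on the Guazi used-car website, including the listing ID (车源号), registration date (上牌时间), odometer mileage (表显里程), engine displacement (排量), gearbox (变速箱), title, price, and the other details shown on the page, and store them in a MongoDB database.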

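Step 1: Create a new Scrapy project
Open a command-line terminal in the folder that will hold the project (on Windows, Shift + right-click and choose "Open command window here"), then run scrapy startproject guazi to create a Scrapy project named guazi.
Change into the guazi folder with cd guazi, then run scrapy genspider guazi_spider www.guazi.com to create the spider file guazi_spider.py.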

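Step 2: Open the target site, check what needs to be scraped, and define the targets in items.py
Open any used-car detail page, for example https://www.guazi.com/changzhi/1a509a9c9b0419e2x.htm#fr_page=list&fr_pos=city&fr_no=0, and note the data this task needs: the title, listing ID, registration date, odometer mileage, engine displacement, gearbox, price, and so on as shown on the page. Other fields can be added if needed; this project settles on these for now. Then define items.py: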

import scrapy

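class GuaziItem(scrapy.Item):
    # City the listing belongs to
    city = scrapy.Field()
    # Listing title
    title = scrapy.Field()
    # Listing ID (车源号)
    source_id = scrapy.Field()
    # Registration date (上牌时间)
    license_time = scrapy.Field()
    # Odometer mileage (表显里程)
    mileage = scrapy.Field()
    # Engine displacement (排量)
    displacement = scrapy.Field()
    # Gearbox type (变速箱)
    gearbox_type = scrapy.Field()
    # Price (价格)
    full_price = scrapy.Field()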

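Step 3: Work out how to fetch the data from the target site and write a first version of the crawler, guazi_requests.py
Start with the used-car page whose location is shown as "全国" (nationwide), https://www.guazi.com/www/buy/, and request it: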

import requests

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'
}

url = 'https://www.guazi.com/www/buy'
response = requests.get(url, headers=headers)
response.encoding = 'utf-8'
print(response)
print(response.text)

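The output shows a status code of 203, and the body is a block of JS code plus the text "正在打开中,请稍后..." ("Opening, please wait..."), so the real page source was not retrieved. Adding a cookie and other request headers to headers and requesting again still does not return the correct response, so we need to crack the JS code in the response.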
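Before going further, it can help to have one place that decides whether a response is the challenge page or the real listing page. The following is a minimal sketch, not part of the original crawler: the helper name is_challenge_page is hypothetical, and it assumes the two signals described in this article (status code 203, or the waiting text together with an anti('...','...') call) are sufficient to recognize the challenge.

```python
import re

# Marker text and call pattern taken from the challenge page shown in this article.
# Only the first few characters of the waiting text are used, so the check does
# not depend on which kind of comma follows them.
CHALLENGE_TEXT = '正在打开中'
ANTI_CALL = re.compile(r"anti\('([^']*)','([^']*)'\)")


def is_challenge_page(status_code, html):
    """Return True when the response looks like the JS challenge page."""
    if status_code == 203:
        return True
    return CHALLENGE_TEXT in html and ANTI_CALL.search(html) is not None
```

A caller could use it as is_challenge_page(response.status_code, response.text) right after the first request.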

Step 4: Crack the JS code in the response
The HTML returned in the previous step contains the following JS code:

<script type="text/javascript">eval(function(p,a,c,k,e,r){e=function(c){return(c<62?'':e(parseInt(c/62)))+((c=c%62)>35?String.fromCharCode(c+29):c.toString(36))};if('0'.replace(0,e)==0){while(c--)r[e(c)]=k[c];k=[function(e){return r[e]||e}];e=function(){return'([efhj-pru-wzA-Y]|1\\w)'};c=1};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);return p}('f u(x,y){e M=(x&N)+(y&N);e 1f=(x>>16)+(y>>16)+(M>>16);h(1f<<16)|(M&N)}f 1g(O,P){h(O<<P)|(O>>>(32-P))}f C(q,a,b,x,s,t){h u(1g(u(u(a,q),u(x,t)),s),b)}f j(a,b,c,d,x,s,t){h C((b&c)|((~b)&d),a,b,x,s,t)}f k(a,b,c,d,x,s,t){h C((b&d)|(c&(~d)),a,b,x,s,t)}f l(a,b,c,d,x,s,t){h C(b^c^d,a,b,x,s,t)}f m(a,b,c,d,x,s,t){h C(c^(b|(~d)),a,b,x,s,t)}f D(x,w){x[w>>5]|=0x80<<(w%32);x[(((w+64)>>>9)<<4)+14]=w;e i;e Q;e R;e S;e T;e a=1732584193;e b=-271733879;e c=-1732584194;e d=271733878;v(i=0;i<x.n;i+=16){Q=a;R=b;S=c;T=d;a=j(a,b,c,d,x[i],7,-680876936);d=j(d,a,b,c,x[i+1],12,-389564586);c=j(c,d,a,b,x[i+2],17,606105819);b=j(b,c,d,a,x[i+3],22,-1044525330);a=j(a,b,c,d,x[i+4],7,-176418897);d=j(d,a,b,c,x[i+5],12,1200080426);c=j(c,d,a,b,x[i+6],17,-1473231341);b=j(b,c,d,a,x[i+7],22,-45705983);a=j(a,b,c,d,x[i+8],7,1770035416);d=j(d,a,b,c,x[i+9],12,-1958414417);c=j(c,d,a,b,x[i+10],17,-42063);b=j(b,c,d,a,x[i+11],22,-1990404162);a=j(a,b,c,d,x[i+12],7,1804603682);d=j(d,a,b,c,x[i+13],12,-40341101);c=j(c,d,a,b,x[i+14],17,-1502002290);b=j(b,c,d,a,x[i+15],22,1236535329);a=k(a,b,c,d,x[i+1],5,-165796510);d=k(d,a,b,c,x[i+6],9,-1069501632);c=k(c,d,a,b,x[i+11],14,643717713);b=k(b,c,d,a,x[i],20,-373897302);a=k(a,b,c,d,x[i+5],5,-701558691);d=k(d,a,b,c,x[i+10],9,38016083);c=k(c,d,a,b,x[i+15],14,-660478335);b=k(b,c,d,a,x[i+4],20,-405537848);a=k(a,b,c,d,x[i+9],5,568446438);d=k(d,a,b,c,x[i+14],9,-1019803690);c=k(c,d,a,b,x[i+3],14,-187363961);b=k(b,c,d,a,x[i+8],20,1163531501);a=k(a,b,c,d,x[i+13],5,-1444681467);d=k(d,a,b,c,x[i+2],9,-51403784);c=k(c,d,a,b,x[i+7],14,1735328473);b=k(b,c,d,a,x[i+12],20,-1926607734);a=l(a,b,c,d,x[i+5],4,-378558);d=l(d,a,b,c,x[
i+8],11,-2022574463);c=l(c,d,a,b,x[i+11],16,1839030562);b=l(b,c,d,a,x[i+14],23,-35309556);a=l(a,b,c,d,x[i+1],4,-1530992060);d=l(d,a,b,c,x[i+4],11,1272893353);c=l(c,d,a,b,x[i+7],16,-155497632);b=l(b,c,d,a,x[i+10],23,-1094730640);a=l(a,b,c,d,x[i+13],4,681279174);d=l(d,a,b,c,x[i],11,-358537222);c=l(c,d,a,b,x[i+3],16,-722521979);b=l(b,c,d,a,x[i+6],23,76029189);a=l(a,b,c,d,x[i+9],4,-640364487);d=l(d,a,b,c,x[i+12],11,-421815835);c=l(c,d,a,b,x[i+15],16,530742520);b=l(b,c,d,a,x[i+2],23,-995338651);a=m(a,b,c,d,x[i],6,-198630844);d=m(d,a,b,c,x[i+7],10,1126891415);c=m(c,d,a,b,x[i+14],15,-1416354905);b=m(b,c,d,a,x[i+5],21,-57434055);a=m(a,b,c,d,x[i+12],6,1700485571);d=m(d,a,b,c,x[i+3],10,-1894986606);c=m(c,d,a,b,x[i+10],15,-1051523);b=m(b,c,d,a,x[i+1],21,-2054922799);a=m(a,b,c,d,x[i+8],6,1873313359);d=m(d,a,b,c,x[i+15],10,-30611744);c=m(c,d,a,b,x[i+6],15,-1560198380);b=m(b,c,d,a,x[i+13],21,1309151649);a=m(a,b,c,d,x[i+4],6,-145523070);d=m(d,a,b,c,x[i+11],10,-1120210379);c=m(c,d,a,b,x[i+2],15,718787259);b=m(b,c,d,a,x[i+9],21,-343485551);a=u(a,Q);b=u(b,R);c=u(c,S);d=u(d,T)}h[a,b,c,d]}f U(o){e i;e p=\'\';e 1h=o.n*32;v(i=0;i<1h;i+=8){p+=String.fromCharCode((o[i>>5]>>>(i%32))&1i)}h p}f F(o){e i;e p=[];p[(o.n>>2)-1]=1j;v(i=0;i<p.n;i+=1){p[i]=0}e 1k=o.n*8;v(i=0;i<1k;i+=8){p[i>>5]|=(o.1l(i/8)&1i)<<(i%32)}h p}f 1m(s){h U(D(F(s),s.n*8))}f rstrHMAC(G,V){e i;e A=F(G);e H=[];e I=[];e W;H[15]=I[15]=1j;z(A.n>16){A=D(A,G.n*8)}v(i=0;i<16;i+=1){H[i]=A[i]^0x36363636;I[i]=A[i]^0x5C5C5C5C}W=D(H.1n(F(V)),1o+V.n*8);h U(D(I.1n(W),1o+128))}f 1p(o){e X=\'0123456789abcdef\';e p=\'\';e x;e i;v(i=0;i<o.n;i+=1){x=o.1l(i);p+=X.Y((x>>>4)&1q)+X.Y(x&1q)}h p}f 1r(o){h unescape(encodeURIComponent(o))}f 1s(s){h 1m(1r(s))}f 1t(s){h 1p(1s(s))}f 1u(){e 18="";e 19="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";e w=J.1v(J.1w()*2);v(e i=0;i<w;i++){18+=19.Y(J.1v(J.1w()*19.n))}h 18}f 1x(s){s=s.1y(/[a-zA-Z]/g,\'#\');e E=s.split(\'\');v(e i=0;i<E.n;i++){z(E[i]==\'#\'){E[i]=1u()}}h E.join(\'\')}f anti(1z,G){e 
1A=1t(1z);h 1x(1A)}f xredirect(1a,1B,r){e K=new Date();K.setTime(K.getTime()+2592000000);e 1b="; 1b="+K.toUTCString();1C.1c=1a+"="+1B+1b+"; path=/";z(1C.1c.1D(1a)===-1&&navigator.cookieEnabled){alert(\'请修改浏览器设置,允许1c缓存\')}1E{z(r==\'\'){e r=B.1F;z(B.1d!=\'L:\'){r=\'L:\'+1G.B.1F.1H(1G.B.1d.n)}}1E{z(B.1d!=\'L:\'){r=\'L:\'+r}}e 1e=r.1D(\'#\');z(1e!==-1){r=r.1H(0,1e)}B.1y(r)}}',[],106,'||||||||||||||var|function||return||ff|gg|hh|ii|length|input|output||url|||safeAdd|for|len|||if|bkey|location|cmn|binl|arr|rstr2binl|key|ipad|opad|Math|date|https|lsw|0xFFFF|num|cnt|olda|oldb|oldc|oldd|binl2rstr|data|hash|hexTab|charAt||||||||||text|possible|name|expires|cookie|protocol|ulen|msw|bitRotateLeft|length32|0xFF|undefined|length8|charCodeAt|rstr|concat|512|rstr2hex|0x0F|str2rstrUTF8|raw|hex|uid|floor|random|charRun|replace|string|estring|value|document|indexOf|else|href|window|substring'.split('|'),0,{}));var value=anti('VRMRrdnPvYJBbZ3nwNI2HdWgkkYQx5QQQ5DZE4MwG08=','178728346661452438343796');var name='antipas';var url='';xredirect(name,value,url,'https://');</script>

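This JS code cannot be read directly because it is obfuscated, but two parts can be told apart: a function expression (starting at function(p,a,c,k,e,r)) and, at the very end, the statements that define variables and trigger the action: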

var value=anti('VRMRrdnPvYJBbZ3nwNI2HdWgkkYQx5QQQ5DZE4MwG08=','178728346661452438343796');
var name='antipas';var url='';
xredirect(name,value,url,'https://');

Copy out the whole function expression: start right after eval, at (function(p,a,c,k,e,r), and stop at the semicolon just before var value=anti('VRMRrdnPvYJBbZ3nwNI2HdWgkkYQx5QQQ5DZE4MwG08=','178728346661452438343796') (semicolon included), that is:

(function(p,a,c,k,e,r){e=function(c){return(c<62?'':e(parseInt(c/62)))+((c=c%62)>35?String.fromCharCode(c+29):c.toString(36))};if('0'.replace(0,e)==0){while(c--)r[e(c)]=k[c];k=[function(e){return r[e]||e}];e=function(){return'([efhj-pru-wzA-Y]|1\\w)'};c=1};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);return p}('f u(x,y){e M=(x&N)+(y&N);e 1f=(x>>16)+(y>>16)+(M>>16);h(1f<<16)|(M&N)}f 1g(O,P){h(O<<P)|(O>>>(32-P))}f C(q,a,b,x,s,t){h u(1g(u(u(a,q),u(x,t)),s),b)}f j(a,b,c,d,x,s,t){h C((b&c)|((~b)&d),a,b,x,s,t)}f k(a,b,c,d,x,s,t){h C((b&d)|(c&(~d)),a,b,x,s,t)}f l(a,b,c,d,x,s,t){h C(b^c^d,a,b,x,s,t)}f m(a,b,c,d,x,s,t){h C(c^(b|(~d)),a,b,x,s,t)}f D(x,w){x[w>>5]|=0x80<<(w%32);x[(((w+64)>>>9)<<4)+14]=w;e i;e Q;e R;e S;e T;e a=1732584193;e b=-271733879;e c=-1732584194;e d=271733878;v(i=0;i<x.n;i+=16){Q=a;R=b;S=c;T=d;a=j(a,b,c,d,x[i],7,-680876936);d=j(d,a,b,c,x[i+1],12,-389564586);c=j(c,d,a,b,x[i+2],17,606105819);b=j(b,c,d,a,x[i+3],22,-1044525330);a=j(a,b,c,d,x[i+4],7,-176418897);d=j(d,a,b,c,x[i+5],12,1200080426);c=j(c,d,a,b,x[i+6],17,-1473231341);b=j(b,c,d,a,x[i+7],22,-45705983);a=j(a,b,c,d,x[i+8],7,1770035416);d=j(d,a,b,c,x[i+9],12,-1958414417);c=j(c,d,a,b,x[i+10],17,-42063);b=j(b,c,d,a,x[i+11],22,-1990404162);a=j(a,b,c,d,x[i+12],7,1804603682);d=j(d,a,b,c,x[i+13],12,-40341101);c=j(c,d,a,b,x[i+14],17,-1502002290);b=j(b,c,d,a,x[i+15],22,1236535329);a=k(a,b,c,d,x[i+1],5,-165796510);d=k(d,a,b,c,x[i+6],9,-1069501632);c=k(c,d,a,b,x[i+11],14,643717713);b=k(b,c,d,a,x[i],20,-373897302);a=k(a,b,c,d,x[i+5],5,-701558691);d=k(d,a,b,c,x[i+10],9,38016083);c=k(c,d,a,b,x[i+15],14,-660478335);b=k(b,c,d,a,x[i+4],20,-405537848);a=k(a,b,c,d,x[i+9],5,568446438);d=k(d,a,b,c,x[i+14],9,-1019803690);c=k(c,d,a,b,x[i+3],14,-187363961);b=k(b,c,d,a,x[i+8],20,1163531501);a=k(a,b,c,d,x[i+13],5,-1444681467);d=k(d,a,b,c,x[i+2],9,-51403784);c=k(c,d,a,b,x[i+7],14,1735328473);b=k(b,c,d,a,x[i+12],20,-1926607734);a=l(a,b,c,d,x[i+5],4,-378558);d=l(d,a,b,c,x[i+8],11,-2022574463);c=l(c,d,a,b,x[
i+11],16,1839030562);b=l(b,c,d,a,x[i+14],23,-35309556);a=l(a,b,c,d,x[i+1],4,-1530992060);d=l(d,a,b,c,x[i+4],11,1272893353);c=l(c,d,a,b,x[i+7],16,-155497632);b=l(b,c,d,a,x[i+10],23,-1094730640);a=l(a,b,c,d,x[i+13],4,681279174);d=l(d,a,b,c,x[i],11,-358537222);c=l(c,d,a,b,x[i+3],16,-722521979);b=l(b,c,d,a,x[i+6],23,76029189);a=l(a,b,c,d,x[i+9],4,-640364487);d=l(d,a,b,c,x[i+12],11,-421815835);c=l(c,d,a,b,x[i+15],16,530742520);b=l(b,c,d,a,x[i+2],23,-995338651);a=m(a,b,c,d,x[i],6,-198630844);d=m(d,a,b,c,x[i+7],10,1126891415);c=m(c,d,a,b,x[i+14],15,-1416354905);b=m(b,c,d,a,x[i+5],21,-57434055);a=m(a,b,c,d,x[i+12],6,1700485571);d=m(d,a,b,c,x[i+3],10,-1894986606);c=m(c,d,a,b,x[i+10],15,-1051523);b=m(b,c,d,a,x[i+1],21,-2054922799);a=m(a,b,c,d,x[i+8],6,1873313359);d=m(d,a,b,c,x[i+15],10,-30611744);c=m(c,d,a,b,x[i+6],15,-1560198380);b=m(b,c,d,a,x[i+13],21,1309151649);a=m(a,b,c,d,x[i+4],6,-145523070);d=m(d,a,b,c,x[i+11],10,-1120210379);c=m(c,d,a,b,x[i+2],15,718787259);b=m(b,c,d,a,x[i+9],21,-343485551);a=u(a,Q);b=u(b,R);c=u(c,S);d=u(d,T)}h[a,b,c,d]}f U(o){e i;e p=\'\';e 1h=o.n*32;v(i=0;i<1h;i+=8){p+=String.fromCharCode((o[i>>5]>>>(i%32))&1i)}h p}f F(o){e i;e p=[];p[(o.n>>2)-1]=1j;v(i=0;i<p.n;i+=1){p[i]=0}e 1k=o.n*8;v(i=0;i<1k;i+=8){p[i>>5]|=(o.1l(i/8)&1i)<<(i%32)}h p}f 1m(s){h U(D(F(s),s.n*8))}f rstrHMAC(G,V){e i;e A=F(G);e H=[];e I=[];e W;H[15]=I[15]=1j;z(A.n>16){A=D(A,G.n*8)}v(i=0;i<16;i+=1){H[i]=A[i]^0x36363636;I[i]=A[i]^0x5C5C5C5C}W=D(H.1n(F(V)),1o+V.n*8);h U(D(I.1n(W),1o+128))}f 1p(o){e X=\'0123456789abcdef\';e p=\'\';e x;e i;v(i=0;i<o.n;i+=1){x=o.1l(i);p+=X.Y((x>>>4)&1q)+X.Y(x&1q)}h p}f 1r(o){h unescape(encodeURIComponent(o))}f 1s(s){h 1m(1r(s))}f 1t(s){h 1p(1s(s))}f 1u(){e 18="";e 19="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";e w=J.1v(J.1w()*2);v(e i=0;i<w;i++){18+=19.Y(J.1v(J.1w()*19.n))}h 18}f 1x(s){s=s.1y(/[a-zA-Z]/g,\'#\');e E=s.split(\'\');v(e i=0;i<E.n;i++){z(E[i]==\'#\'){E[i]=1u()}}h E.join(\'\')}f anti(1z,G){e 1A=1t(1z);h 1x(1A)}f xredirect(1a,1B,r){e 
K=new Date();K.setTime(K.getTime()+2592000000);e 1b="; 1b="+K.toUTCString();1C.1c=1a+"="+1B+1b+"; path=/";z(1C.1c.1D(1a)===-1&&navigator.cookieEnabled){alert(\'请修改浏览器设置,允许1c缓存\')}1E{z(r==\'\'){e r=B.1F;z(B.1d!=\'L:\'){r=\'L:\'+1G.B.1F.1H(1G.B.1d.n)}}1E{z(B.1d!=\'L:\'){r=\'L:\'+r}}e 1e=r.1D(\'#\');z(1e!==-1){r=r.1H(0,1e)}B.1y(r)}}',[],106,'||||||||||||||var|function||return||ff|gg|hh|ii|length|input|output||url|||safeAdd|for|len|||if|bkey|location|cmn|binl|arr|rstr2binl|key|ipad|opad|Math|date|https|lsw|0xFFFF|num|cnt|olda|oldb|oldc|oldd|binl2rstr|data|hash|hexTab|charAt||||||||||text|possible|name|expires|cookie|protocol|ulen|msw|bitRotateLeft|length32|0xFF|undefined|length8|charCodeAt|rstr|concat|512|rstr2hex|0x0F|str2rstrUTF8|raw|hex|uid|floor|random|charRun|replace|string|estring|value|document|indexOf|else|href|window|substring'.split('|'),0,{}));
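The same slice can also be taken programmatically rather than by hand. The sketch below is only an illustration: the function name extract_packed_expression is hypothetical, and it assumes the challenge page keeps the layout shown above, with a single eval( call followed directly by var value=anti(.

```python
def extract_packed_expression(html):
    """Return the text after 'eval' up to and including the ';' before 'var value=anti('."""
    # Skip the word 'eval' itself but keep its opening parenthesis.
    start = html.index('eval(') + len('eval')
    # Stop right before the trailing statements; the ';' stays in the slice.
    end = html.index('var value=anti(', start)
    return html[start:end]
```

str.index raises ValueError when either marker is missing, which makes a changed page layout fail loudly instead of returning a wrong slice.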

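Copy this obfuscated JS code (be careful with the start and end positions). Open Chrome, press F12 or right-click and choose Inspect to open the developer tools, switch to the Console tab, paste the code with Ctrl+V, and press Enter. The console prints the de-obfuscated JS code. Copy that output and run it through a code beautifier to get normally formatted JS, then append the variable definitions and the final call from earlier and save everything as guazi.js (in the same directory as items.py). The contents are as follows: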

function safeAdd(x, y) {
    var lsw = (x & 0xFFFF) + (y & 0xFFFF);
    var msw = (x >> 16) + (y >> 16) + (lsw >> 16);
    return (msw << 16) | (lsw & 0xFFFF)
}
function bitRotateLeft(num, cnt) {
    return (num << cnt) | (num >>> (32 - cnt))
}
function cmn(q, a, b, x, s, t) {
    return safeAdd(bitRotateLeft(safeAdd(safeAdd(a, q), safeAdd(x, t)), s), b)
}
function ff(a, b, c, d, x, s, t) {
    return cmn((b & c) | ((~b) & d), a, b, x, s, t)
}
function gg(a, b, c, d, x, s, t) {
    return cmn((b & d) | (c & (~d)), a, b, x, s, t)
}
function hh(a, b, c, d, x, s, t) {
    return cmn(b ^ c ^ d, a, b, x, s, t)
}
function ii(a, b, c, d, x, s, t) {
    return cmn(c ^ (b | (~d)), a, b, x, s, t)
}
function binl(x, len) {
    x[len >> 5] |= 0x80 << (len % 32);
    x[(((len + 64) >>> 9) << 4) + 14] = len;
    var i;
    var olda;
    var oldb;
    var oldc;
    var oldd;
    var a = 1732584193;
    var b = -271733879;
    var c = -1732584194;
    var d = 271733878;
    for (i = 0; i < x.length; i += 16) {
        olda = a;
        oldb = b;
        oldc = c;
        oldd = d;
        a = ff(a, b, c, d, x[i], 7, -680876936);
        d = ff(d, a, b, c, x[i + 1], 12, -389564586);
        c = ff(c, d, a, b, x[i + 2], 17, 606105819);
        b = ff(b, c, d, a, x[i + 3], 22, -1044525330);
        a = ff(a, b, c, d, x[i + 4], 7, -176418897);
        d = ff(d, a, b, c, x[i + 5], 12, 1200080426);
        c = ff(c, d, a, b, x[i + 6], 17, -1473231341);
        b = ff(b, c, d, a, x[i + 7], 22, -45705983);
        a = ff(a, b, c, d, x[i + 8], 7, 1770035416);
        d = ff(d, a, b, c, x[i + 9], 12, -1958414417);
        c = ff(c, d, a, b, x[i + 10], 17, -42063);
        b = ff(b, c, d, a, x[i + 11], 22, -1990404162);
        a = ff(a, b, c, d, x[i + 12], 7, 1804603682);
        d = ff(d, a, b, c, x[i + 13], 12, -40341101);
        c = ff(c, d, a, b, x[i + 14], 17, -1502002290);
        b = ff(b, c, d, a, x[i + 15], 22, 1236535329);
        a = gg(a, b, c, d, x[i + 1], 5, -165796510);
        d = gg(d, a, b, c, x[i + 6], 9, -1069501632);
        c = gg(c, d, a, b, x[i + 11], 14, 643717713);
        b = gg(b, c, d, a, x[i], 20, -373897302);
        a = gg(a, b, c, d, x[i + 5], 5, -701558691);
        d = gg(d, a, b, c, x[i + 10], 9, 38016083);
        c = gg(c, d, a, b, x[i + 15], 14, -660478335);
        b = gg(b, c, d, a, x[i + 4], 20, -405537848);
        a = gg(a, b, c, d, x[i + 9], 5, 568446438);
        d = gg(d, a, b, c, x[i + 14], 9, -1019803690);
        c = gg(c, d, a, b, x[i + 3], 14, -187363961);
        b = gg(b, c, d, a, x[i + 8], 20, 1163531501);
        a = gg(a, b, c, d, x[i + 13], 5, -1444681467);
        d = gg(d, a, b, c, x[i + 2], 9, -51403784);
        c = gg(c, d, a, b, x[i + 7], 14, 1735328473);
        b = gg(b, c, d, a, x[i + 12], 20, -1926607734);
        a = hh(a, b, c, d, x[i + 5], 4, -378558);
        d = hh(d, a, b, c, x[i + 8], 11, -2022574463);
        c = hh(c, d, a, b, x[i + 11], 16, 1839030562);
        b = hh(b, c, d, a, x[i + 14], 23, -35309556);
        a = hh(a, b, c, d, x[i + 1], 4, -1530992060);
        d = hh(d, a, b, c, x[i + 4], 11, 1272893353);
        c = hh(c, d, a, b, x[i + 7], 16, -155497632);
        b = hh(b, c, d, a, x[i + 10], 23, -1094730640);
        a = hh(a, b, c, d, x[i + 13], 4, 681279174);
        d = hh(d, a, b, c, x[i], 11, -358537222);
        c = hh(c, d, a, b, x[i + 3], 16, -722521979);
        b = hh(b, c, d, a, x[i + 6], 23, 76029189);
        a = hh(a, b, c, d, x[i + 9], 4, -640364487);
        d = hh(d, a, b, c, x[i + 12], 11, -421815835);
        c = hh(c, d, a, b, x[i + 15], 16, 530742520);
        b = hh(b, c, d, a, x[i + 2], 23, -995338651);
        a = ii(a, b, c, d, x[i], 6, -198630844);
        d = ii(d, a, b, c, x[i + 7], 10, 1126891415);
        c = ii(c, d, a, b, x[i + 14], 15, -1416354905);
        b = ii(b, c, d, a, x[i + 5], 21, -57434055);
        a = ii(a, b, c, d, x[i + 12], 6, 1700485571);
        d = ii(d, a, b, c, x[i + 3], 10, -1894986606);
        c = ii(c, d, a, b, x[i + 10], 15, -1051523);
        b = ii(b, c, d, a, x[i + 1], 21, -2054922799);
        a = ii(a, b, c, d, x[i + 8], 6, 1873313359);
        d = ii(d, a, b, c, x[i + 15], 10, -30611744);
        c = ii(c, d, a, b, x[i + 6], 15, -1560198380);
        b = ii(b, c, d, a, x[i + 13], 21, 1309151649);
        a = ii(a, b, c, d, x[i + 4], 6, -145523070);
        d = ii(d, a, b, c, x[i + 11], 10, -1120210379);
        c = ii(c, d, a, b, x[i + 2], 15, 718787259);
        b = ii(b, c, d, a, x[i + 9], 21, -343485551);
        a = safeAdd(a, olda);
        b = safeAdd(b, oldb);
        c = safeAdd(c, oldc);
        d = safeAdd(d, oldd)
    }
    return [a, b, c, d]
}
function binl2rstr(input) {
    var i;
    var output = '';
    var length32 = input.length * 32;
    for (i = 0; i < length32; i += 8) {
        output += String.fromCharCode((input[i >> 5] >>> (i % 32)) & 0xFF)
    }
    return output
}
function rstr2binl(input) {
    var i;
    var output = [];
    output[(input.length >> 2) - 1] = undefined;
    for (i = 0; i < output.length; i += 1) {
        output[i] = 0
    }
    var length8 = input.length * 8;
    for (i = 0; i < length8; i += 8) {
        output[i >> 5] |= (input.charCodeAt(i / 8) & 0xFF) << (i % 32)
    }
    return output
}
function rstr(s) {
    return binl2rstr(binl(rstr2binl(s), s.length * 8))
}
function rstrHMAC(key, data) {
    var i;
    var bkey = rstr2binl(key);
    var ipad = [];
    var opad = [];
    var hash;
    ipad[15] = opad[15] = undefined;
    if (bkey.length > 16) {
        bkey = binl(bkey, key.length * 8)
    }
    for (i = 0; i < 16; i += 1) {
        ipad[i] = bkey[i] ^ 0x36363636;
        opad[i] = bkey[i] ^ 0x5C5C5C5C
    }
    hash = binl(ipad.concat(rstr2binl(data)), 512 + data.length * 8);
    return binl2rstr(binl(opad.concat(hash), 512 + 128))
}
function rstr2hex(input) {
    var hexTab = '0123456789abcdef';
    var output = '';
    var x;
    var i;
    for (i = 0; i < input.length; i += 1) {
        x = input.charCodeAt(i);
        output += hexTab.charAt((x >>> 4) & 0x0F) + hexTab.charAt(x & 0x0F)
    }
    return output
}
function str2rstrUTF8(input) {
    return unescape(encodeURIComponent(input))
}
function raw(s) {
    return rstr(str2rstrUTF8(s))
}
function hex(s) {
    return rstr2hex(raw(s))
}
function uid() {
    var text = "";
    var possible = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    var len = Math.floor(Math.random() * 2);
    for (var i = 0; i < len; i++) {
        text += possible.charAt(Math.floor(Math.random() * possible.length))
    }
    return text
}
function charRun(s) {
    s = s.replace(/[a-zA-Z]/g, '#');
    var arr = s.split('');
    for (var i = 0; i < arr.length; i++) {
        if (arr[i] == '#') {
            arr[i] = uid()
        }
    }
    return arr.join('')
}
function anti(string, key) {
    var estring = hex(string);
    return charRun(estring)
}
function xredirect(name, value, url) {
    var date = new Date();
    date.setTime(date.getTime() + 2592000000);
    var expires = "; expires=" + date.toUTCString();
    document.cookie = name + "=" + value + expires + "; path=/";
    if (document.cookie.indexOf(name) === -1 && navigator.cookieEnabled) {
        alert('请修改浏览器设置,允许cookie缓存')
    } else {
        if (url == '') {
            var url = location.href;
            if (location.protocol != 'https:') {
                url = 'https:' + window.location.href.substring(window.location.protocol.length)
            }
        } else {
            if (location.protocol != 'https:') {
                url = 'https:' + url
            }
        }
        var ulen = url.indexOf('#');
        if (ulen !== -1) {
            url = url.substring(0, ulen)
        }
        location.replace(url)
    }
}

var value=anti('C7G9FPet7ALt/7wKd3KSP0wTC5VfAwEUI0sDVelfjH0=','323933112359342737291');
var name='antipas';
var url='';
xredirect(name,value,url,'https://');
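The packed script follows the common eval(function(p,a,c,k,e,r){...}) packer layout: a payload string whose short tokens are indexes, written with the digits 0-9, a-z, A-Z, into a word list. As a possible alternative to the Chrome console, the substitution step can be sketched in Python. This is a rough sketch under assumptions: encode_index and unpack are hypothetical names, and the payload string and word list are assumed to have been pulled out of the script already (with the \' escapes in the payload turned back into plain quotes).

```python
import re

DIGITS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'


def encode_index(n, radix=62):
    """Mirror the packer's e(c): write n in the given radix using 0-9, a-z, A-Z."""
    head = '' if n < radix else encode_index(n // radix, radix)
    return head + DIGITS[n % radix]


def unpack(payload, words, radix=62):
    """Replace every encoded token in payload with its word from the list."""
    # Empty entries mean "this token stands for itself", so they are skipped.
    table = {encode_index(i, radix): w for i, w in enumerate(words) if w}
    # re.ASCII keeps \w and \b limited to ASCII, as in JavaScript, so a token
    # that sits next to Chinese text is still found.
    return re.sub(r'\b\w+\b',
                  lambda m: table.get(m.group(0), m.group(0)),
                  payload,
                  flags=re.ASCII)
```

In the word list above, index 14 is var and index 15 is function, which matches the tokens e and f that open the payload.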

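Looking at this JS, the last thing it executes is the xredirect function, called with value=anti(...), name='antipas', and url=''. Inside that function is the statement document.cookie = name + "=" + value + expires + "; path=/"; so we infer that the site expects a cookie built the way this function builds it. Once we know what name, value, and expires are, we can reconstruct the cookie.

From the code above: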

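  1. name='antipas' is a fixed value;
  2. value is the result of calling the anti function. Its two arguments also appear in the page returned with the first 203 response, here 'C7G9FPet7ALt/7wKd3KSP0wTC5VfAwEUI0sDVelfjH0=' and '323933112359342737291' (they differ on every response), and can be extracted with a regular expression. Python's execjs module can run JS code; install it with pip install pyexecjs, then use it to call anti and obtain value. Only anti is needed from Python. The trailing var value=... and xredirect(...) lines in guazi.js touch browser objects such as document, so if execjs reports an error outside a browser, removing those trailing lines is a reasonable first thing to try;
  3. expires is "; expires=" + date.toUTCString(). In JS, date.toUTCString() converts a Date object to a string in Universal Time (UTC), and date.setTime(date.getTime() + 2592000000) sets the time to the current time + 2592000000 milliseconds (30 days). Python usually works with time in seconds, so 2592000000 ms becomes 2592000 s. The following statements produce a string in the same format and give the value of expires: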
import datetime
expire_time = datetime.datetime.utcnow() + datetime.timedelta(seconds=2592000)
expires = "; expires=" + expire_time.strftime('%a, %d %b %Y %H:%M:%S GMT')
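As a variation on the statements above, the same string can be built from a timezone-aware datetime, which avoids datetime.utcnow() (deprecated since Python 3.12) and does not depend on the locale used by strftime. This is a sketch only: the function name expires_attribute and its now parameter are hypothetical and exist mainly to make the result reproducible.

```python
import datetime
from email.utils import format_datetime

THIRTY_DAYS_MS = 2592000000  # offset added by xredirect in guazi.js


def expires_attribute(now=None):
    """Build the '; expires=...' fragment 30 days ahead of now, in UTC."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    expire_time = now + datetime.timedelta(milliseconds=THIRTY_DAYS_MS)
    # With usegmt=True the result looks like 'Fri, 11 Dec 2020 00:02:00 GMT',
    # the same layout as the strftime pattern used above.
    return '; expires=' + format_datetime(expire_time, usegmt=True)
```

format_datetime with usegmt=True only accepts a datetime whose tzinfo is UTC, so a naive datetime passed as now raises ValueError rather than silently producing a wrong time.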

With these three values the cookie can be assembled:
cookie = name + "=" + value + expires + "; path=/"
Add the resulting cookie to the request headers: headers['Cookie'] = cookie
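One detail worth noting: expires and path are attributes a browser uses when it stores a cookie, whereas a Cookie request header normally carries only name=value pairs. The article's full string works as described; the sketch below shows the shorter form as an untested alternative. The helper name with_antipas_cookie is hypothetical, and whether the site accepts the shorter header is an assumption that has not been verified here.

```python
def with_antipas_cookie(headers, value, name='antipas'):
    """Return a copy of headers whose Cookie carries only the name=value pair."""
    updated = dict(headers)  # leave the caller's dict untouched
    updated['Cookie'] = name + '=' + value
    return updated
```

Returning a copy keeps the original headers reusable for the first, cookie-less request.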

Putting the logic above into code and requesting the page again:

import requests
import re
import execjs
import datetime

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'
}

url = 'https://www.guazi.com/www/buy'
response = requests.get(url, headers=headers)
response.encoding = 'utf-8'
html = response.text
if '正在打开中,请稍后' in html:
    pattern = re.compile(r'value=anti\(\'(.*?)\',\'(.*?)\'\)')
    string_ = pattern.search(html).group(1)
    key = pattern.search(html).group(2)
    print('string_: {}, key: {}'.format(string_, key))
    with open('guazi.js', 'r', encoding='utf-8') as f:
        js_compile = execjs.compile(f.read())
        value = js_compile.call('anti', string_, key)
    name = 'antipas'
    expire_time = datetime.datetime.utcnow() + datetime.timedelta(seconds=2592000)
    expires = "; expires=" + expire_time.strftime('%a, %d %b %Y %H:%M:%S GMT')
    cookie = name + "=" + value + expires + "; path=/"
    headers['Cookie'] = cookie
    response = requests.get(url, headers=headers)
    print(response)
    print(response.text)

The status code is now 200 and the page data is retrieved normally, so the JS challenge has been cracked.
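A closer look at guazi.js suggests the execjs dependency may be avoidable. The initial constants and round functions in binl match the standard MD5 algorithm, so hex(s) appears to be the MD5 hex digest of the UTF-8 string, and anti then replaces every letter of that digest with either nothing or one random letter; its key argument is never used. The following sketch reproduces that reading in pure Python. It is an interpretation of the JS, not something the original post confirms, so it is worth comparing its output with the execjs result (the digits and their order should agree) before relying on it.

```python
import hashlib
import random
import string

# Same alphabet as the 'possible' string in uid() of guazi.js.
POSSIBLE = string.ascii_uppercase + string.ascii_lowercase


def uid(rng=random):
    """Mirror uid() in guazi.js: an empty string or one random ASCII letter."""
    length = int(rng.random() * 2)  # 0 or 1, like Math.floor(Math.random() * 2)
    return ''.join(rng.choice(POSSIBLE) for _ in range(length))


def anti(text, key=None, rng=random):
    """Assumed equivalent of anti() in guazi.js; key is accepted but unused, as in the JS."""
    digest = hashlib.md5(text.encode('utf-8')).hexdigest()
    # charRun(): every letter of the digest is replaced by uid(), digits are kept.
    return ''.join(uid(rng) if ch.isalpha() else ch for ch in digest)
```

Because the letters are randomized, two calls with the same input return different strings; only the digits are stable.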

This is the end of this part; the rest is covered in the follow-up post: https://www.cnblogs.com/achangblog/p/13956987.html

posted @ 2020-11-11 00:02  脱下长日的假面