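Reverse Engineering the Jiasule (加速乐) Cookie Parameters
Introduction
Jiasule (加速乐) is a service that speeds up slow websites and protects them against attacks.
Scraping a site that uses it requires sending specific cookie parameters (__jsl_clearance_s on some sites, __jsl_clearance on others). This project reverse engineers one such site as an example.
The complete code and a packaged cookie-fetching script are available on GitHub.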
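Step 1: Obtain the __jsluid_h parameter
Target URL (Base64-encoded) = aHR0cDovL3d3dy56b25neWFuZy5nb3YuY24vb3Blbm5lc3MvT3Blbm5lc3NDb250ZW50L3Nob3dMaXN0LzE0NDIvNDU3MTIvcGFnZV8xLmh0bWw=
The first request to the site returns status code 521, and the response body is JS code obfuscated with AAEncode.
The __jsluid_h parameter we need is in the response headers of that first request.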
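import base64
import requests

# Decode the Base64-encoded target URL given above.
url = base64.b64decode('aHR0cDovL3d3dy56b25neWFuZy5nb3YuY24vb3Blbm5lc3MvT3Blbm5lc3NDb250ZW50L3Nob3dMaXN0LzE0NDIvNDU3MTIvcGFnZV8xLmh0bWw=').decode()
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
}
response = requests.get(url, headers=headers)
# The Set-Cookie response header carries __jsluid_h.
print(response.headers['Set-Cookie'])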
__jsluid_h has now been obtained.
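The raw Set-Cookie header also carries attributes such as max-age or path. As a minimal sketch, the standard library's http.cookies.SimpleCookie can pull out just the name and value. The function name parse_set_cookie and the header string below are made up for illustration; they are not a real response from the site.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value):
    """Return {name: value} for each cookie in a Set-Cookie header string."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {name: morsel.value for name, morsel in jar.items()}


# Hypothetical header value, for illustration only.
sample = "__jsluid_h=0123456789abcdef; max-age=31536000; path=/; HttpOnly"
print(parse_set_cookie(sample))
```

This avoids hand-splitting on ';' and '=', which is what the later snippets in this article do.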
Step 2: Obtain the __jsl_clearance parameter
The precursor of __jsl_clearance is generated by the JS returned in the body of the first request.
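import base64
import requests

# Decode the Base64-encoded target URL given above.
url = base64.b64decode('aHR0cDovL3d3dy56b25neWFuZy5nb3YuY24vb3Blbm5lc3MvT3Blbm5lc3NDb250ZW50L3Nob3dMaXN0LzE0NDIvNDU3MTIvcGFnZV8xLmh0bWw=').decode()
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
}
response = requests.get(url, headers=headers)
# The body of the 521 response is the obfuscated JS.
print(response.text)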
Use a regular expression to pull the JS out of the response and execute it to obtain the first __jsl_clearance value.
cookie = re.findall(r'(cookie=.*?)location', response.text)[0]
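import base64
import re
import execjs  # provided by the PyExecJS package; needs a JS runtime such as Node.js
import requests

# Decode the Base64-encoded target URL given above.
url = base64.b64decode('aHR0cDovL3d3dy56b25neWFuZy5nb3YuY24vb3Blbm5lc3MvT3Blbm5lc3NDb250ZW50L3Nob3dMaXN0LzE0NDIvNDU3MTIvcGFnZV8xLmh0bWw=').decode()
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
}
response = requests.get(url, headers=headers)
# Grab the "cookie=...;" assignment that precedes the location redirect.
cookie = re.findall(r'(cookie=.*?)location', response.text)[0]
# Wrap it in a function so the computed cookie string can be returned.
js_code = "function get_cookies(){"+cookie+"return cookie}"
print(execjs.compile(js_code).call('get_cookies'))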
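To see what the regular expression captures, here is a sketch that runs it against a made-up response body. The real first-stage body is obfuscated and much longer; the sample only imitates its assumed overall shape of a document.cookie assignment followed by a location redirect, and the helper name build_js is hypothetical. Because the match starts at "cookie=", the captured snippet assigns to a plain variable named cookie, which is why the wrapper function can simply return cookie.

```python
import re

# Made-up stand-in for the first 521 response body, for illustration only.
sample_body = (
    "<script>document.cookie=('__jsl_clearance')+('=')+('abc')+(';max-age=3600');"
    "location.href=location.pathname+location.search</script>"
)


def build_js(body):
    """Wrap the cookie assignment found in body in a callable JS function."""
    snippet = re.findall(r'(cookie=.*?)location', body)[0]
    return "function get_cookies(){" + snippet + "return cookie}"


print(build_js(sample_body))
```

Only the string handling is shown here; executing the resulting JS still requires execjs and a JS runtime, as in the snippet above.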
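Then send a second request. The site again returns status code 521, and this time the response body is OB-obfuscated JS code.
Send this request with the cookies obtained in the previous step.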
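import base64
import re
import execjs
import requests

# Decode the Base64-encoded target URL given above.
url = base64.b64decode('aHR0cDovL3d3dy56b25neWFuZy5nb3YuY24vb3Blbm5lc3MvT3Blbm5lc3NDb250ZW50L3Nob3dMaXN0LzE0NDIvNDU3MTIvcGFnZV8xLmh0bWw=').decode()
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
}
# First request: take __jsluid_h from the Set-Cookie response header.
response = requests.get(url, headers=headers)
cookie = response.headers['Set-Cookie'].split(';')[0].split('=')
cookies = {cookie[0]: cookie[1]}
# Run the returned JS to compute the first __jsl_clearance value.
cookie = re.findall(r'(cookie=.*?)location', response.text)[0]
js_code = "function get_cookies(){"+cookie+"return cookie}"
cookie = execjs.compile(js_code).call('get_cookies').split(';')[0].split('=')
cookies.update({cookie[0]: cookie[1]})
print(cookies)
# Second request: send both cookies; the body is the OB-obfuscated JS.
response = requests.get(url, cookies=cookies, headers=headers)
print(response.text)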
This returns a large block of obfuscated code.
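Since both challenge responses come back with status 521, it can help to tell them apart before deciding what to do next. The helper below is a hypothetical sketch: the function name challenge_stage and the substring checks ("document.cookie=" for the first stage, "go({" for the second) are assumptions based on the responses described in this article, not a documented protocol.

```python
def challenge_stage(status_code, body):
    """Guess which stage a response belongs to.

    Returns 'done' when the status is not 521, 'second' for the obfuscated
    script that calls go({...}), 'first' for the cookie-setting script,
    and 'unknown' for any other 521 body.
    """
    if status_code != 521:
        return 'done'
    if 'go({' in body:
        return 'second'
    if 'document.cookie=' in body:
        return 'first'
    return 'unknown'


# Made-up bodies, for illustration only.
print(challenge_stage(521, "<script>document.cookie=('a');location.href='/'</script>"))
print(challenge_stage(521, '<script>/* obfuscated */;go({"ha":"md5"})</script>'))
print(challenge_stage(200, "<html>real page</html>"))
```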
Step 3: Analyze and reverse the obfuscated code
Near the bottom of the obfuscated code, find the call to the go function:
go({"bts":["1665989922.614|0|Q7i","PVU56j4JKfYysAKA6m6TpE%3D"],"chars":"muuwQudeqEBeV7IGhOHlff","ct":"4ed606e7793bd9acaa47abf7f9223f09","ha":"md5","tn":"__jsl_clearance","vt":"3600","wt":"1500"})
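The argument passed to go is a JSON object, so once it is cut out of the script it can be parsed directly. A minimal sketch, assuming the call appears exactly in the form shown above with a flat object (no nested braces); the surrounding script text and the helper name parse_go_argument are placeholders for illustration.

```python
import json
import re

# Placeholder script text ending with the go(...) call shown above.
script = (
    'function go(a){/* obfuscated body omitted */};'
    'go({"bts":["1665989922.614|0|Q7i","PVU56j4JKfYysAKA6m6TpE%3D"],'
    '"chars":"muuwQudeqEBeV7IGhOHlff","ct":"4ed606e7793bd9acaa47abf7f9223f09",'
    '"ha":"md5","tn":"__jsl_clearance","vt":"3600","wt":"1500"})'
)


def parse_go_argument(text):
    """Return the dict passed to the last go({...}) call in text."""
    matches = re.findall(r'go\((\{.*?\})\)', text)
    return json.loads(matches[-1])


data = parse_go_argument(script)
print(data["tn"], data["ha"], data["bts"])
```

Requiring an opening brace right after "go(" skips the function definition, so no hard-coded match index is needed under these assumptions.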
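The go function takes the object passed in and concatenates the first element of the bts array, one character from chars, another character from chars, and the second element of bts into a single string, cookie.
cookie = data["bts"][0] + i + j + data["bts"][1]
It then hashes that string and compares the digest with the ct value in the object. If they match, the candidate string is correct, and it is the value of the __jsl_clearance cookie.
The ha field names the hash algorithm to use. There are three possibilities: MD5, SHA1, and SHA256. Hashing each candidate with the named algorithm and comparing against ct yields the correct __jsl_clearance value.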
from hashlib import md5, sha1, sha256

def go(data):
    chars = data["chars"]
    # Try every ordered pair of characters from chars.
    for i in chars:
        for j in chars:
            cookie = data["bts"][0] + i + j + data["bts"][1]
            if data['ha'] == 'md5':
                encrypt = md5()
            elif data['ha'] == 'sha1':
                encrypt = sha1()
            elif data['ha'] == 'sha256':
                encrypt = sha256()
            encrypt.update(cookie.encode(encoding='utf-8'))
            # The candidate whose digest equals ct is the cookie value.
            if encrypt.hexdigest() == data['ct']:
                return cookie
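The same search can be written more compactly with hashlib.new, which accepts the algorithm name directly. The sketch below is an alternative formulation, not the site's code; the function name solve and the test object are invented for illustration, with ct computed locally so the expected answer is known in advance.

```python
import hashlib
from itertools import product


def solve(data):
    """Return the string whose digest equals data['ct'], or None if no pair fits."""
    head, tail = data["bts"]
    for i, j in product(data["chars"], repeat=2):
        candidate = head + i + j + tail
        digest = hashlib.new(data["ha"], candidate.encode("utf-8")).hexdigest()
        if digest == data["ct"]:
            return candidate
    return None


# Invented challenge: the correct pair is "b" followed by "a".
expected = "12345.678|0|" + "ba" + "tail%3D"
demo = {
    "bts": ["12345.678|0|", "tail%3D"],
    "chars": "abc",
    "ha": "sha1",
    "ct": hashlib.sha1(expected.encode("utf-8")).hexdigest(),
}
print(solve(demo) == expected)
```

With the chars string from the example above (22 characters), the search space is at most 22 × 22 = 484 candidates, so brute force finishes immediately.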
The complete code:
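import ast
import base64
import re
import execjs
import requests
from hashlib import md5, sha1, sha256

# Decode the Base64-encoded target URL given above.
url = base64.b64decode('aHR0cDovL3d3dy56b25neWFuZy5nb3YuY24vb3Blbm5lc3MvT3Blbm5lc3NDb250ZW50L3Nob3dMaXN0LzE0NDIvNDU3MTIvcGFnZV8xLmh0bWw=').decode()
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
}

def go(data):
    chars = data["chars"]
    # Try every ordered pair of characters from chars.
    for i in chars:
        for j in chars:
            cookie = data["bts"][0] + i + j + data["bts"][1]
            if data['ha'] == 'md5':
                encrypt = md5()
            elif data['ha'] == 'sha1':
                encrypt = sha1()
            elif data['ha'] == 'sha256':
                encrypt = sha256()
            encrypt.update(cookie.encode(encoding='utf-8'))
            # The candidate whose digest equals ct is the cookie value.
            if encrypt.hexdigest() == data['ct']:
                return cookie

# First request: take __jsluid_h from the Set-Cookie response header.
response = requests.get(url, headers=headers)
cookie = response.headers['Set-Cookie'].split(';')[0].split('=')
cookies = {cookie[0]: cookie[1]}
# Run the returned JS to compute the first __jsl_clearance value.
cookie = re.findall(r'(cookie=.*?)location', response.text)[0]
js_code = "function get_cookies(){"+cookie+"return cookie}"
cookie = execjs.compile(js_code).call('get_cookies').split(';')[0].split('=')
cookies.update({cookie[0]: cookie[1]})
# Second request: the body contains the go({...}) call.
response = requests.get(url, cookies=cookies, headers=headers)
# Take the second go(...) match, which holds the parameter object.
data = ast.literal_eval(re.findall(r'go\((.*?)\)', response.text)[1])
print(go(data))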
Finally
Send the request once more with these two cookies and the site returns the correct response.
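In terms of what goes over the wire, carrying the two cookies means sending both name=value pairs in the Cookie request header, which requests builds from the cookies dict. A small sketch of that assembly: the helper name cookie_header and the values are placeholders, and using the tn field of the go object as the name of the second cookie is an assumption based on the example object shown earlier.

```python
def cookie_header(cookies):
    """Join a {name: value} dict into a Cookie request header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Placeholder values, for illustration only.
cookies = {
    "__jsluid_h": "0123456789abcdef",
    "__jsl_clearance": "1665989922.614|0|placeholder%3D",
}
print(cookie_header(cookies))
```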
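A corresponding synchronous script has already been packaged; it is added here as a supplement.
PS: When making requests with cookies obtained from the script, the User-Agent request header should match the one used in the script; otherwise the request may fail with status 521. (The URLs requested usually start with https; using http without the s can cause errors, a pitfall I ran into myself.)
For the complete code, see GitHub: https://github.com/futurebook/SpiderReverse.git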
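Supplement
An implementation that patches the environment (note: it requires a Node.js environment, and depending on the site some of the JS code may need adjusting)
PS: environment patching suits a single site, while the script above works across multiple sites.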
# -*- coding: utf-8 -*-
# @Time : 2023/3/2 16:01
# @Author : 红后
# @Email : not_enabled@163.com
# @blog : https://www.cnblogs.com/Red-Sun
# @File : GetCookie.py
# @Software: PyCharm
import re
import execjs
import requests


class GetCookie:
    def __init__(self, ja3=False, headers=None):
        if ja3:
            # Only takes effect on urllib3 1.x; urllib3 2.x removed DEFAULT_CIPHERS.
            requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = 'DH+AES:RSA+AES'
        self.session = requests.session()
        if headers is None:
            self.session.headers = {
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
            }
        else:
            self.session.headers = headers
        self.cookies = {}
        # Stage 1 environment: stub document and location, then eval the script.
        self.sdk_one = '''
        function get_cookie(js_text){
            document = {};
            location = {};
            eval(js_text);
            return document.cookie
        }'''
        # Stage 2 environment: also stub window, navigator and setTimeout.
        self.sdk_two = '''
        function get_cookie(js_text){
            window = global;
            window.navigator = {
                'userAgent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 Edg/110.0.1587.50'
            };
            document = {};
            location = {};
            setTimeout = function (ref, t) {
                ref()
            }
            eval(js_text);
            return document.cookie
        }'''

    def set_cookie(self, cookie):
        # Keep only the leading "name=value" pair of the cookie string.
        k = cookie.split(';')[0].split('=')
        return {k[0]: k[1]}

    def update_cookies(self, sdk, response):
        # Run the inline script inside the patched environment and store the cookie it sets.
        cookie_js = re.findall(r'<script>(.*?)</script>', response.text)[0]
        cookie = execjs.compile(sdk).call('get_cookie', cookie_js)
        self.cookies.update(self.set_cookie(cookie))

    def get_cookies(self, url):
        # First request: __jsluid_h plus the first-stage script.
        response = self.session.get(url)
        self.cookies.update(response.cookies.get_dict())
        self.update_cookies(self.sdk_one, response)
        # Second request: the OB-obfuscated script that sets the final cookie.
        response = self.session.get(url, cookies=self.cookies)
        self.update_cookies(self.sdk_two, response)
        return self.cookies


if __name__ == '__main__':
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
    }
    a = GetCookie(headers=headers)
    url = 'URL of the page protected by Jiasule'
    cookies = a.get_cookies(url=url)
    response = requests.get(url, headers=headers, cookies=cookies)
    print(response)
    print(cookies)
    print(response.text)