Yuanrenxue (猿人学) Web Scraping Attack & Defense Contest, Challenge 7: Dynamic Fonts, Drifting with the Wind
Challenge URL: https://match.yuanrenxue.cn/match/7
Solution steps
First, inspect the captured traffic.
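The request carries nothing encrypted: its session contains no encrypted fields, and the request headers have no encrypted parameters either.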
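In the response, values of the form `&#xb428` correspond one-to-one to the numbers shown on the page, so it looks as if you could build a dictionary, fetch every page, and translate. Unfortunately it is not that simple: request the same page again and the correspondence has changed.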
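So that idea does not hold. Looking at the response again, the value of its `woff` field changes along with it, which suggests the mapping is tied to `woff`. Set a breakpoint and trigger the request to see where the field is used.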
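A woff file is a font file; at its core it maps character codes to glyphs (the shapes that get drawn). In a value such as `&#xc134`, `&#x` is the prefix of an HTML hexadecimal character reference and `c134` is the character's code. The custom font draws a digit for that code, so the page shows a number.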
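A quick check of the entity format with Python's standard library: `html.unescape` turns the reference into the single character the browser inserts, and `ord` gives its code back. Without the custom font, U+C134 is just a Hangul syllable, which is why the raw values look like Korean text. `entity_to_code` is a hypothetical helper name. Judging by the scripts in this post, which strip `uni` from glyph names and look them up with the codes from `value`, the font's glyphs are named `uni` plus the same lower-case hex code; that is an inference, not something the site documents.

```python
import html


def entity_to_code(entity: str) -> str:
    """'&#xc134' -> 'c134': the hex code point, lower-case, no '0x' prefix."""
    char = html.unescape(entity)   # the single character the browser inserts
    return format(ord(char), 'x')  # its code point as lower-case hex


ref = '&#xc134'
char = html.unescape(ref)
print(char, hex(ord(char)))         # 섴 0xc134
print(entity_to_code(ref))          # c134
print('uni' + entity_to_code(ref))  # unic134, the assumed glyph-name form
```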
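The page code that consumes the field:

```javascript
// data is the parsed API response; data.woff holds the font as base64
ttf = data.woff;
// empty the .font element and append an @font-face rule that registers the font as "fonteditor"
$('.font').text('').append('<style type="text/css">@font-face { font-family:"fonteditor";src: url(data:font/truetype;charset=utf-8;base64,' + ttf + '); }</style>');
```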
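The JS declares the payload as `font/truetype` even though the response field is named `woff`. A font's first four bytes say what it really is (`wOFF` for WOFF, `wOF2` for WOFF2, `0x00010000` or `true` for TrueType, `OTTO` for CFF-based OpenType), so decoding just the start of the string is enough to check. A minimal stdlib sketch; `sniff_font` is a hypothetical helper:

```python
import base64
import struct


def sniff_font(b64_payload: str) -> dict:
    """Read the header of a base64-encoded font to see what format it really is."""
    head = base64.b64decode(b64_payload[:16])  # 16 base64 chars -> first 12 bytes
    tag = head[:4]
    if tag == b'wOFF':
        return {'format': 'woff'}
    if tag == b'wOF2':
        return {'format': 'woff2'}
    if tag == b'\x00\x01\x00\x00' or tag == b'true':
        kind = 'truetype'
    elif tag == b'OTTO':
        kind = 'opentype-cff'
    else:
        return {'format': 'unknown'}
    # sfnt offset table, big-endian: sfntVersion (uint32), then
    # numTables, searchRange, entrySelector, rangeShift (uint16 each)
    _, num_tables, _, _, _ = struct.unpack('>IHHHH', head)
    return {'format': kind, 'num_tables': num_tables}


# first 16 characters of the sample "woff" field in this post
print(sniff_font('AAEAAAAKAIAAAwAg'))  # {'format': 'truetype', 'num_tables': 10}
```

For the sample response in this post the header reads as TrueType with 10 tables, which is consistent with saving the payload as `.ttf` and loading it with `TTFont`.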
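The `src` is not a download link but a `data:` URL: the font is embedded directly as base64 and declared as `font/truetype`. So although the field is called `woff`, it can be base64-decoded and saved as a .ttf file. The following Python does that and prints each glyph's outline fingerprint:

```python
from base64 import b64decode

from fontTools.ttLib import TTFont  # pip install fontTools
from parsel import Selector  # pip install parsel


def demo(data):
    """data is the JSON returned by the API."""
    with open('7.ttf', mode='wb') as file:
        file.write(b64decode(data['woff']))  # base64-decode the woff field and write it to a file
    font = TTFont('7.ttf')  # load the font file
    font.saveXML('7.xml')  # dump it as XML
    # read the XML back
    with open('7.xml', mode='r', encoding='utf-8') as f:
        xml_data = f.read()
    select = Selector(xml_data)
    glyf = select.css('glyf > TTGlyph')  # every TTGlyph tag under glyf
    for TTGlyph in glyf[1:]:  # the first tag is not a digit glyph, so start from index 1
        name = TTGlyph.css('::attr(name)').get().replace('uni', '')  # glyph name with the 'uni' prefix removed
        pt_tag = TTGlyph.css('pt')  # every pt tag of this glyph
        on_list = []
        for pt in pt_tag:
            on = pt.css('::attr(on)').get()  # the pt tag's on attribute
            on_list.append(on)
        # print a dict entry: the joined on flags are the key, the code is the value
        print(f"'{''.join(on_list)}': '{name}',")
```

Call it on an API response. A sample response follows as captured; the long `woff` string is wrapped across lines, and the `&#x` references in `data` appear rendered as the characters they encode:

resp = { "woff":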
"AAEAAAAKAIAAAwAgT1MvMv/BOMUAAAEoAAAAYGNtYXDlXV9jAAABpAAAAYZnbHlmS99AtgAAA0QAAAQCaGVhZB6SqjgAAACsAAAANmhoZWEG0QEyAAAA5AAAACRobXR4ArwAAAAAAYgAAAAabG9jYQWOBpkAAAMsAAAAGG1heHABGABFAAABCAAAACBuYW1lUGhGMAAAB0gAAAJzcG9zdCjmdk0AAAm8AAAAiAABAAAAAQAA83P19l8PPPUACQPoAAAAANnIUd8AAAAA418UhQAH/+wCRwMDAAAACAACAAAAAAAAAAEAAAQk/qwAfgJYAAAAKwItAAEAAAAAAAAAAAAAAAAAAAACAAEAAAALADkAAwAAAAAAAgAAAAoACgAAAP8AAAAAAAAABAIqAZAABQAIAtED0wAAAMQC0QPTAAACoABEAWkAAAIABQMAAAAAAAAAAAAAEAAAAAAAAAAAAAAAUGZFZABApDXINwQk/qwAfgQkAVQAAAABAAAAAAAAAAAAAAAgAAAAZAAAAlgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAMAAAAcAAEAAAAAAIAAAwABAAAAHAAEAGQAAAAUABAAAwAEpDWmhaeTtoTCV8KGxCHFkcg3//8AAKQ1poWnk7aEwlfChcQhxZHIN///W89ZhVhwSYQ9sgAAO+A6dTfQAAEAAAAAAAAAAAAAAAoAAAAAAAAAAAACAAUAAAEGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALADoAcgDIAQkBHwFiAX8BsAHuAgEAAQAH/+wAIwAHAAIAADczFQccBxsAAAIAI//yAhcDAwAMABkAAAEmBwYQFxYyNzYQJyYHMhcWEAcGIicmEDc2ASR8REFBRP1rBwdrgWUfKSkfvCEfHyEC3iWiU/7EdWtrdQE8U6J4bDn+9DtdXTsBDDlsAAABACr/8gIhAt4AJAAAEwMzNjM2MxYWFRQGIyInJjcjFhcWMzI3NjU0JiMGBwYHIzchNWMVQhwpIi1ZY3RGSBs7BWMURVJCfloyfHIfNisbDCABWALe/lVOKBtIZFVcKRg/PVAyTjZjeIYDFQsu+10AAwAj//ICJQLeAB8ALAA4AAABIgcGFRQXFhc1JgcGFRQWMjc2NTQnJgcVNjc0NTQnJgcyFxYUBwYiJyY0NzYTMhcWFAcGIiY0NzYBJGk4NQEcRTgTQov4OEcnI0BDMidUcVMlU0kjqAZJKSxDXCU2NiifZjQyAt5AJWc4HjkJAwRJMUZhbjk1YUYxSQQDCTkeOGclQFcfD38bJiYbfw8f/tYjOoIkJkqCOiMAAAIAKv/yAhoDAwAbACgAAAEiBwYVFBcWFzY2NCYjIgciByM3NDc2FzYXMyYDFjcWFAcGJyInJjQ2AUFsUllBRZFbe0iNMyNAJQcNSBdVjiArKLtGUgInJ0xaIi9gAt6DS7N7m1QBAX/ich1MJHFSWg8Pd9z+mhsncZ8rPQkqfkd0AAABAHAAAAFoAt4ACQAAAQYGBxU2NxEzEQEjH2oqfChUAt5ALhhUHDv9pQLeAAEAKv/yAhcC3gArAAABIgcGFzM2NjcWFxYUBiMjFTMWFhQHBiMmJyYnIxYXFhcyNjU0JyYHNjU0JgE3XFRHC0kTPlREHkVdTENNPmERTEJfMikFWBRHP31
ceiIVRXRqAt5ALm5CRwMDIw+MMEwCRX0xKxEXQEB8KUEBfGQ7OWMzJ4NBfQAAAgAaAAACRwLeAAoADgAAAQEVIRUzNTM1IxEHMxEhAYP+lwFpSnp6UAb+1gLe/hlilZVDAgaL/oUAAAEANAAAAjAC3gAdAAABIgYHMzY3NhcyFhUUBwYHBgcGFSE1ITY3Njc2NCYBQW2JC0cRJSBaRnJaHltzJk8B5v58P2hwJl2UAt6Wckw3NgRKIV4ZQz1VLjtrQFFfVmMJqYMAAgAq//ICFwLeABwAKAAAASIGFRQXFjMWNjczFxQHBiMiJyMWMzY3NjU0JyYHMhcWFAYjIiY1NDYBJGCaRTqYCG4YEQ9FM1NgOEEEzXlNNS5OhGYgUo9JQ0dPAt6Fc3I6Ogk7NAmDWEhstAF4Wb2hTm5PNjeUVIIqXksAAAEASwAAAg8C3gAGAAATFSEBMwE1SwGG/uBTAQsC3mX9hwKeQAAAAAAAABIA3gABAAAAAAAAABcAAAABAAAAAAABAAwAFwABAAAAAAACAAcAIwABAAAAAAADABQAKgABAAAAAAAEABQAKgABAAAAAAAFAAsAPgABAAAAAAAGABQAKgABAAAAAAAKACsASQABAAAAAAALABMAdAADAAEECQAAAC4AhwADAAEECQABABgAtQADAAEECQACAA4AzQADAAEECQADACgA2wADAAEECQAEACgA2wADAAEECQAFABYBAwADAAEECQAGACgA2wADAAEECQAKAFYBGQADAAEECQALACYBb0NyZWF0ZWQgYnkgZm9udC1jYXJyaWVyLlBpbmdGYW5nIFNDUmVndWxhci5QaW5nRmFuZy1TQy1SZWd1bGFyVmVyc2lvbiAxLjBHZW5lcmF0ZWQgYnkgc3ZnMnR0ZiBmcm9tIEZvbnRlbGxvIHByb2plY3QuaHR0cDovL2ZvbnRlbGxvLmNvbQBDAHIAZQBhAHQAZQBkACAAYgB5ACAAZgBvAG4AdAAtAGMAYQByAHIAaQBlAHIALgBQAGkAbgBnAEYAYQBuAGcAIABTAEMAUgBlAGcAdQBsAGEAcgAuAFAAaQBuAGcARgBhAG4AZwAtAFMAQwAtAFIAZQBnAHUAbABhAHIAVgBlAHIAcwBpAG8AbgAgADEALgAwAEcAZQBuAGUAcgBhAHQAZQBkACAAYgB5ACAAcwB2AGcAMgB0AHQAZgAgAGYAcgBvAG0AIABGAG8AbgB0AGUAbABsAG8AIABwAHIAbwBqAGUAYwB0AC4AaAB0AHQAcAA6AC8ALwBmAG8AbgB0AGUAbABsAG8ALgBjAG8AbQAAAgAAAAAAAAAOAAAAAAAAAAAAAAAAAAAAAAAAAAAACwALAAABCgEEAQkBCAELAQYBBQEDAQIBBwd1bmljMjU3B3VuaWI2ODQHdW5pYzI4NQd1bmljODM3B3VuaWM1OTEHdW5pYTY4NQd1bmlhNDM1B3VuaWE3OTMHdW5pYzQyMQd1bmljMjg2", "status": "1", "state": "success", "data": [ { "value": "양 뚄 양 ꐵ " }, { "value": "슅 쐡 젷 슆 " }, { "value": "양 쉗 슅 ꞓ " }, { "value": "ꞓ 슅 슅 쐡 " }, { "value": "ꚅ 쐡 양 ꚅ " }, { "value": "ꞓ ꞓ 쉗 ꞓ " }, { "value": "뚄 슆 쉗 쐡 " }, { "value": "ꞓ 젷 쐡 쐡 " }, { "value": "젷 슅 쐡 쐡 " }, { "value": "ꚅ 젷 ꚅ ꞓ " } ] } demo(resp)
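Running it prints one dictionary entry per glyph: the glyph's sequence of `on` flags as the key and its code as the value.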
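The post matches codes to digits by eye. The same pairing can be done with a small helper, assuming you have `demo`'s printed pairs as a dict and, for the same response, the codes of each row plus the number that row shows on the rendered page (the codes change per response, so both must come from one request). `learn_on_map` is hypothetical, and the codes and page digits in the example are made up; only the fingerprint strings are taken from the mapping table in this post.

```python
def learn_on_map(printed: dict, row_codes: list, row_digits: list) -> dict:
    """Build {fingerprint: digit} from one response and the page rendered from it.

    printed:    demo()'s output as a dict, {fingerprint: code}
    row_codes:  per row, the codes from that response's "value", e.g. ['c591', 'b684']
    row_digits: per row, the number read off the page, as a string, e.g. '14'
    """
    fingerprint_of = {code: fp for fp, code in printed.items()}  # invert to code -> fingerprint
    on_map = {}
    for codes, digits in zip(row_codes, row_digits):
        if len(codes) != len(digits):
            raise ValueError(f'row has {len(codes)} glyphs but {digits!r} has {len(digits)} digits')
        for code, digit in zip(codes, digits):
            fp = fingerprint_of[code]
            # the same outline must always stand for the same digit
            if on_map.setdefault(fp, digit) != digit:
                raise ValueError(f'fingerprint of {code} seen as both {on_map[fp]} and {digit}')
    return on_map


printed = {'1001101111': 'c591', '1111111': 'a435', '111111111111111': 'b684'}
print(learn_on_map(printed, [['c591', 'b684'], ['a435', 'c591']], ['14', '71']))
# {'1001101111': '1', '111111111111111': '4', '1111111': '7'}
```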
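Although the codes change with every response, each digit's outline appears to stay the same, so its sequence of `on` flags works as a stable key. Matching each printed code with the digit it renders as on the page (for the same response) and keying by the flag sequence gives the mapping dictionary:

```python
on_map = {
    '1001101111': '1',
    '101010101101010001010101101010101010010010010101001000010': '8',
    '10101010100001010111010101101010010101000': '6',
    '10100100100101010010010010': '0',
    '1110101001001010110101010100101011111': '5',
    '10010101001110101011010101010101000100100': '9',
    '100110101001010101011110101000': '2',
    '111111111111111': '4',
    '1111111': '7',
    '10101100101000111100010101011010100101010100': '3',
}
```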
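The lookup that decoding needs can be isolated into two small functions: one turns a glyph's point flags into the fingerprint string, the other translates a font's {code: fingerprint} table into {code: digit}. A sketch with hypothetical names (`fingerprint`, `digits_by_code`); the example reuses two keys from `on_map` above with made-up codes.

```python
def fingerprint(flags) -> str:
    """One '1'/'0' per point. Bit 0 of a TrueType point flag is the on-curve bit,
    which is what fontTools writes as <pt on="...">."""
    return ''.join(str(flag & 1) for flag in flags)


def digits_by_code(fingerprints: dict, on_map: dict) -> dict:
    """{code: fingerprint} for the current font -> {code: digit}."""
    result = {}
    for code, fp in fingerprints.items():
        if fp not in on_map:
            # an outline never seen before: on_map must be extended first
            raise KeyError(f'unknown outline for {code}: {fp}')
        result[code] = on_map[fp]
    return result


on_map = {'1001101111': '1', '1111111': '7'}  # two entries from the table above
current = {'c591': fingerprint([1, 0, 0, 1, 1, 0, 1, 1, 1, 1]), 'a435': fingerprint([1] * 7)}
print(digits_by_code(current, on_map))  # {'c591': '1', 'a435': '7'}
```

If you load the font with fontTools directly, `font['glyf'][glyph_name].getCoordinates(font['glyf'])` should return the point flags as its third item, which would let you skip the XML round trip; treat that as an untested alternative to the approach in this post.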
With this dictionary, any page can be requested and decoded into the real numbers:

```python
from base64 import b64decode

import requests
from fontTools.ttLib import TTFont  # pip install fontTools
from parsel import Selector  # pip install parsel


def save_font(font_data):
    # fingerprint (sequence of on flags) -> digit, from the previous step
    on_map = {
        '1001101111': '1',
        '101010101101010001010101101010101010010010010101001000010': '8',
        '10101010100001010111010101101010010101000': '6',
        '10100100100101010010010010': '0',
        '1110101001001010110101010100101011111': '5',
        '10010101001110101011010101010101000100100': '9',
        '100110101001010101011110101000': '2',
        '111111111111111': '4',
        '1111111': '7',
        '10101100101000111100010101011010100101010100': '3',
    }
    with open('7.ttf', mode='wb') as f:
        f.write(b64decode(font_data['woff']))  # save the font file
    font = TTFont('7.ttf')  # load the font file
    font.saveXML('7.xml')  # dump it as XML
    # read the XML back
    with open('7.xml', mode='r', encoding='utf-8') as f:
        xml_data = f.read()
    select = Selector(xml_data)
    # every TTGlyph tag under <glyf>; the first is not a digit glyph, so start from index 1
    TTGlyph = select.css('glyf > TTGlyph')[1:]
    rep_dist = {}  # code (e.g. 'c591') -> digit
    for tt in TTGlyph:
        name = tt.css('::attr(name)').get().replace('uni', '')  # glyph name without the 'uni' prefix
        pt = tt.css('pt')  # every pt tag of this glyph
        on_list = []
        for pt_tag in pt:
            on_list.append(pt_tag.css('::attr(on)').get())  # the on attribute of each pt
        rep_dist[name] = on_map[''.join(on_list)]  # look up the digit by fingerprint
    result_dict = []  # decoded numbers, one per row
    for data in font_data['data']:
        num_list = []
        # value looks like '&#xc591 &#xb684 ... '; the trailing space leaves an empty last item, hence [0:-1]
        for nums in data['value'].replace('&#x', '').split(' ')[0:-1]:
            num_list.append(rep_dist[nums])
        result_dict.append(int(''.join(num_list)))
    return result_dict


headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
}
cookies = {
    "sessionid": "mhgiqaaxkqpt3ybutbub96ubi9rr5gtk"
}
url = "https://match.yuanrenxue.cn/api/match/7?page=1"
resp = requests.get(url, headers=headers, cookies=cookies)
print(save_font(resp.json()))
```
Running it prints the decoded values for page 1. They match the numbers shown on the first page, so the decoding works.
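`save_font` splits `value` on spaces and drops the last item with `[0:-1]`, which relies on every value ending in a space; if a value ever arrived without it, the last digit would be silently lost. A regex over the `&#x` references avoids that dependency. A sketch with a hypothetical `decode_value` and an illustrative code table (not from a real font):

```python
import re

# '&#x' + hex digits, with an optional ';' (the values in this API have none)
ENTITY = re.compile(r'&#x([0-9a-fA-F]+);?')


def decode_value(value: str, digit_by_code: dict) -> int:
    """'&#xc591 &#xb684 ' -> int, using a {code: digit} table for the current font."""
    codes = [code.lower() for code in ENTITY.findall(value)]
    return int(''.join(digit_by_code[code] for code in codes))


table = {'c591': '1', 'b684': '9', 'a435': '0'}  # illustrative only
print(decode_value('&#xc591 &#xb684 &#xc591 &#xa435 ', table))  # 1910
print(decode_value('&#xc591 &#xb684 &#xc591 &#xa435', table))   # 1910, trailing space not needed
```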
Next we need the names of all the summoners. They are not in the API response; the page fills them in with JavaScript. The corresponding JS code:

```javascript
let page = 1;
let name = ['极镀ギ紬荕', '爷灬霸气傀儡', '梦战苍穹', '傲世哥', 'мaη肆風聲', '一刀メ隔世', '横刀メ绝杀', 'Q不死你R死你',
    '魔帝殤邪', '封刀不再战', '倾城孤狼', '戎马江湖', '狂得像风', '影之哀伤', '謸氕づ独尊', '傲视狂杀', '追风之梦',
    '枭雄在世', '傲视之巅', '黑夜刺客', '占你心为王', '爷来取你狗命', '御风踏血', '凫矢暮城', '孤影メ残刀', '野区霸王',
    '噬血啸月', '风逝无迹', '帅的睡不着', '血色杀戮者', '冷视天下', '帅出新高度', '風狆瑬蒗', '灵魂禁锢', 'ヤ地狱篮枫ゞ',
    '溅血メ破天', '剑尊メ杀戮', '塞外う飛龍', '哥‘K纯帅', '逆風祈雨', '恣意踏江山', '望断、天涯路', '地獄惡灵', '疯狂メ孽杀',
    '寂月灭影', '骚年霸称帝王', '狂杀メ无赦', '死灵的哀伤', '撩妹界扛把子', '霸刀☆藐视天下', '潇洒又能打', '狂卩龙灬巅丷峰',
    '羁旅天涯.', '南宫沐风', '风恋绝尘', '剑下孤魂', '一蓑烟雨', '领域★倾战', '威龙丶断魂神狙', '辉煌战绩', '屎来运赚',
    '伱、Bu够档次', '九音引魂箫', '骨子里的傲气', '霸海断长空', '没枪也很狂', '死魂★之灵'];
let heroArray = [];
for (let i = 0; i <= 4; i++) {  // five pages
    let yyq = 1;
    // ten placeholder items = ten rows per page (the callback arguments are unused)
    ['', '', '', '', '', '', '', '', '', ''].forEach((index, val) => {
        // console.log(name[yyq + (page - 1) * 10]);
        heroArray.push(name[yyq + (page - 1) * 10]);
        yyq += 1;
    });
    page += 1;
}
console.log(heroArray);
```
Running it prints the 50 names in page order, and they match the names shown on the page.
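The JS indexes `name` with `yyq + (page - 1) * 10`, where `yyq` runs from 1 to 10, so `name[0]` is never shown and five pages cover `name[1]` through `name[50]`. The same selection in Python, as a sketch with a hypothetical `names_for_pages` and placeholder names instead of the real list:

```python
def names_for_pages(name: list, pages: int = 5, per_page: int = 10) -> list:
    """Python version of the JS loop: row k (1-based) of page p shows
    name[k + (p - 1) * per_page], so name[0] is never used."""
    picked = []
    for page in range(1, pages + 1):
        for row in range(1, per_page + 1):
            picked.append(name[row + (page - 1) * per_page])
    return picked


demo_names = [f'n{i}' for i in range(67)]  # stand-in for the 67-entry list
hero_array = names_for_pages(demo_names)
print(len(hero_array), hero_array[0], hero_array[-1])  # 50 n1 n50
print(hero_array == demo_names[1:51])                  # True: it is just a slice
```

On the real list, `name[1:51]` gives the `hero_array` used in the final script.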
Finally, write the Python script that finds the summoner with the highest win points:
```python
from base64 import b64decode

import requests
from fontTools.ttLib import TTFont  # pip install fontTools
from parsel import Selector  # pip install parsel

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
}
cookies = {
    "sessionid": "xxxxx"  # replace with your own sessionid cookie
}


def send_match6(page):
    """Fetch one page of challenge 7 and keep only the font and the encoded rows."""
    url = "https://match.yuanrenxue.cn/api/match/7"
    params = {"page": f"{page}"}
    response = requests.get(url, headers=headers, cookies=cookies, params=params)
    payload = response.json()
    return {'woff': payload['woff'], 'data': payload['data']}


def save_font(font_data):
    # fingerprint (sequence of on flags) -> digit
    on_map = {
        '1001101111': '1',
        '101010101101010001010101101010101010010010010101001000010': '8',
        '10101010100001010111010101101010010101000': '6',
        '10100100100101010010010010': '0',
        '1110101001001010110101010100101011111': '5',
        '10010101001110101011010101010101000100100': '9',
        '100110101001010101011110101000': '2',
        '111111111111111': '4',
        '1111111': '7',
        '10101100101000111100010101011010100101010100': '3',
    }
    with open('7.ttf', mode='wb') as f:
        f.write(b64decode(font_data['woff']))  # save the font file
    font = TTFont('7.ttf')  # load the font file
    font.saveXML('7.xml')  # dump it as XML
    with open('7.xml', mode='r', encoding='utf-8') as f:
        xml_data = f.read()
    select = Selector(xml_data)
    # every TTGlyph tag under <glyf>; the first is not a digit glyph, so start from index 1
    TTGlyph = select.css('glyf > TTGlyph')[1:]
    rep_dist = {}  # code (e.g. 'c591') -> digit
    for tt in TTGlyph:
        name = tt.css('::attr(name)').get().replace('uni', '')  # glyph name without the 'uni' prefix
        pt = tt.css('pt')  # every pt tag of this glyph
        on_list = []
        for pt_tag in pt:
            on_list.append(pt_tag.css('::attr(on)').get())  # the on attribute of each pt
        rep_dist[name] = on_map[''.join(on_list)]  # look up the digit by fingerprint
    result_dict = []  # decoded numbers, one per row
    for data in font_data['data']:
        num_list = []
        # the trailing space in value leaves an empty last item, hence [0:-1]
        for nums in data['value'].replace('&#x', '').split(' ')[0:-1]:
            num_list.append(rep_dist[nums])
        result_dict.append(int(''.join(num_list)))
    return result_dict


if __name__ == '__main__':
    # summoner names in page order (name[1] .. name[50] from the page's JS)
    hero_array = [
        '爷灬霸气傀儡', '梦战苍穹', '傲世哥', 'мaη肆風聲', '一刀メ隔世', '横刀メ绝杀', 'Q不死你R死你', '魔帝殤邪',
        '封刀不再战', '倾城孤狼', '戎马江湖', '狂得像风', '影之哀伤', '謸氕づ独尊', '傲视狂杀', '追风之梦', '枭雄在世',
        '傲视之巅', '黑夜刺客', '占你心为王', '爷来取你狗命', '御风踏血', '凫矢暮城', '孤影メ残刀', '野区霸王', '噬血啸月',
        '风逝无迹', '帅的睡不着', '血色杀戮者', '冷视天下', '帅出新高度', '風狆瑬蒗', '灵魂禁锢', 'ヤ地狱篮枫ゞ', '溅血メ破天',
        '剑尊メ杀戮', '塞外う飛龍', '哥‘K纯帅', '逆風祈雨', '恣意踏江山', '望断、天涯路', '地獄惡灵', '疯狂メ孽杀',
        '寂月灭影', '骚年霸称帝王', '狂杀メ无赦', '死灵的哀伤', '撩妹界扛把子', '霸刀☆藐视天下', '潇洒又能打'
    ]
    hero_nums = []  # every decoded value across all pages
    for page in range(1, 6):
        math6_data = send_match6(page)
        nums_list = save_font(math6_data)
        print(page, nums_list)  # page number and its decoded values
        hero_nums.extend(nums_list)
    print(hero_nums)  # all values
    max_num = max(hero_nums)  # highest win points
    hero_index = hero_nums.index(max_num)  # index of the first row with that value
    print(hero_array[hero_index])  # the matching summoner name
```
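Running it prints the name of the summoner with the highest win points. Submit that name, and the challenge is passed.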
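The last lines of the final script (find the maximum, then look up its index) amount to an argmax over rows. A compact equivalent, assuming `hero_array` and the decoded values line up one-to-one; `top_summoner` is a hypothetical helper, and Python's `max` returns the first of several equal maxima, just as `list.index` does in the script.

```python
def top_summoner(names: list, points: list) -> tuple:
    """Return (name, points) for the highest value; on ties the earliest row wins."""
    if len(names) != len(points):
        raise ValueError('names and points must line up row by row')
    return max(zip(names, points), key=lambda pair: pair[1])


print(top_summoner(['a', 'b', 'c'], [3, 9, 9]))  # ('b', 9)
```

In the script this would be `top_summoner(hero_array, hero_nums)`.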