Yuanrenxue Web Crawler Attack-and-Defense Contest, Problem 7: Dynamic Fonts, Drifting with the Wind

Problem URL: https://match.yuanrenxue.cn/match/7

Solution steps

  1. Inspect the network traffic.

  2. The session sent with the captured request carries no encrypted fields, and the request headers contain no encrypted parameters either.

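  3. Values such as &#xb428 in the response correspond one-to-one with the digits shown on the page, so the obvious plan is to build a dictionary once, fetch each page, and look the values up. Unfortunately it is not that simple: request the same page again and the correspondence has changed.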

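  4. So the idea above does not work. The woff field in the response changes along with the correspondence, which suggests the two are related. Set a breakpoint and trigger the request.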

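    A woff file is a font file; in effect it is a mapping table between code points and glyphs. Take &#xc134 (the character 섴) as an example: &#x is the prefix of a hexadecimal character reference, and c134 is the character's code point.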
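To make the notation concrete, here is a minimal Python sketch (an addition, not part of the original solution) that splits one value string from the response into its hexadecimal code points. The helper name `code_points` is made up for illustration; the sample string is one of the value entries in the sample response shown later in this post.

```python
import html
import re


def code_points(value):
    """Return the hex code points found in a string like '&#xc591 &#xb684 '."""
    return [m.lower() for m in re.findall(r'&#x([0-9a-fA-F]+)', value)]


# One "value" entry copied from the sample response in this post.
sample = "&#xc591 &#xb684 &#xc591 &#xa435 "
print(code_points(sample))

# A character reference is just another spelling of a code point:
# with the terminating semicolon added, it decodes to the character itself.
print(html.unescape("&#xc134;") == chr(0xC134))
```

The code point alone says nothing about which digit is drawn; that depends on the glyph the font assigns to it, which is why the font has to be inspected.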

    ttf = data.woff;
    $('.font').text('').append('<style type="text/css">@font-face { font-family:"fonteditor";src: url(data:font/truetype;charset=utf-8;base64,' + ttf + '); }</style>');
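The font/truetype label in that @font-face rule can be checked against the data itself. The sketch below (an addition, not from the original post) base64-decodes the start of the woff field and compares the first four bytes with well-known font signatures: 00 01 00 00 marks TrueType outlines, while an actual WOFF container starts with the ASCII bytes wOFF. The function name `sniff_font_format` is hypothetical; the sample prefix is the first 16 characters of the woff value in the sample response shown later in this post.

```python
from base64 import b64decode

# Leading four bytes of common font containers.
SIGNATURES = {
    b'\x00\x01\x00\x00': 'truetype',
    b'OTTO': 'opentype-cff',
    b'wOFF': 'woff',
    b'wOF2': 'woff2',
}


def sniff_font_format(b64_text):
    """Guess the font container from the first bytes of a base64 string."""
    # 8 base64 characters decode to 6 bytes, enough to cover the signature.
    head = b64decode(b64_text[:8])[:4]
    return SIGNATURES.get(head, 'unknown')


print(sniff_font_format("AAEAAAAKAIAAAwAg"))  # truetype
```

So despite the field name, the payload in this sample is a plain TrueType font, which is why saving it with a .ttf extension works.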
    

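    The src is a data: URI with the base64 woff value embedded inline, and it is declared as font/truetype, so the payload can be treated as a TTF file. Decode it and save it with Python: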

    from fontTools.ttLib import TTFont  # pip install fontTools
    from base64 import b64decode
    from parsel import Selector  # pip install parsel
    
    
    def demo(data):
    	"""data is the parsed JSON returned by the API."""
    
    	with open('7.ttf', mode='wb') as file:
    		file.write(b64decode(data['woff']))  # base64-decode the woff field and write it to a file
    
    	font = TTFont('7.ttf')  # load the font file
    	font.saveXML('7.xml')  # dump it as XML
    	# read the XML file back
    	with open('7.xml', mode='r', encoding='utf-8') as f:
    		xml_data = f.read()
    
    	select = Selector(xml_data)
    	glyf = select.css('glyf > TTGlyph')  # all TTGlyph tags under glyf
    	for TTGlyph in glyf[1:]:  # the tag at index 0 is not needed, so start from index 1
    		name = TTGlyph.css('::attr(name)').get().replace('uni', '')  # name attribute of the TTGlyph tag, with the "uni" prefix removed
    
    		pt_tag = TTGlyph.css('pt')  # all pt tags under this TTGlyph
    		on_list = []
    		for pt in pt_tag:  # iterate over the pt tags
    			on = pt.css('::attr(on)').get()  # the on attribute of this pt tag
    			on_list.append(on)  # collect the on values in order
    		print(f"'{''.join(on_list)}': '{name}',")  # print one entry in dict-literal form
    		# ''.join(on_list) is the dict key
    		# name is the dict value
    resp = {
    	"woff": "AAEAAAAKAIAAAwAgT1MvMv/BOMUAAAEoAAAAYGNtYXDlXV9jAAABpAAAAYZnbHlmS99AtgAAA0QAAAQCaGVhZB6SqjgAAACsAAAANmhoZWEG0QEyAAAA5AAAACRobXR4ArwAAAAAAYgAAAAabG9jYQWOBpkAAAMsAAAAGG1heHABGABFAAABCAAAACBuYW1lUGhGMAAAB0gAAAJzcG9zdCjmdk0AAAm8AAAAiAABAAAAAQAA83P19l8PPPUACQPoAAAAANnIUd8AAAAA418UhQAH/+wCRwMDAAAACAACAAAAAAAAAAEAAAQk/qwAfgJYAAAAKwItAAEAAAAAAAAAAAAAAAAAAAACAAEAAAALADkAAwAAAAAAAgAAAAoACgAAAP8AAAAAAAAABAIqAZAABQAIAtED0wAAAMQC0QPTAAACoABEAWkAAAIABQMAAAAAAAAAAAAAEAAAAAAAAAAAAAAAUGZFZABApDXINwQk/qwAfgQkAVQAAAABAAAAAAAAAAAAAAAgAAAAZAAAAlgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAMAAAAcAAEAAAAAAIAAAwABAAAAHAAEAGQAAAAUABAAAwAEpDWmhaeTtoTCV8KGxCHFkcg3//8AAKQ1poWnk7aEwlfChcQhxZHIN///W89ZhVhwSYQ9sgAAO+A6dTfQAAEAAAAAAAAAAAAAAAoAAAAAAAAAAAACAAUAAAEGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALADoAcgDIAQkBHwFiAX8BsAHuAgEAAQAH/+wAIwAHAAIAADczFQccBxsAAAIAI//yAhcDAwAMABkAAAEmBwYQFxYyNzYQJyYHMhcWEAcGIicmEDc2ASR8REFBRP1rBwdrgWUfKSkfvCEfHyEC3iWiU/7EdWtrdQE8U6J4bDn+9DtdXTsBDDlsAAABACr/8gIhAt4AJAAAEwMzNjM2MxYWFRQGIyInJjcjFhcWMzI3NjU0JiMGBwYHIzchNWMVQhwpIi1ZY3RGSBs7BWMURVJCfloyfHIfNisbDCABWALe/lVOKBtIZFVcKRg/PVAyTjZjeIYDFQsu+10AAwAj//ICJQLeAB8ALAA4AAABIgcGFRQXFhc1JgcGFRQWMjc2NTQnJgcVNjc0NTQnJgcyFxYUBwYiJyY0NzYTMhcWFAcGIiY0NzYBJGk4NQEcRTgTQov4OEcnI0BDMidUcVMlU0kjqAZJKSxDXCU2NiifZjQyAt5AJWc4HjkJAwRJMUZhbjk1YUYxSQQDCTkeOGclQFcfD38bJiYbfw8f/tYjOoIkJkqCOiMAAAIAKv/yAhoDAwAbACgAAAEiBwYVFBcWFzY2NCYjIgciByM3NDc2FzYXMyYDFjcWFAcGJyInJjQ2AUFsUllBRZFbe0iNMyNAJQcNSBdVjiArKLtGUgInJ0xaIi9gAt6DS7N7m1QBAX/ich1MJHFSWg8Pd9z+mhsncZ8rPQkqfkd0AAABAHAAAAFoAt4ACQAAAQYGBxU2NxEzEQEjH2oqfChUAt5ALhhUHDv9pQLeAAEAKv/yAhcC3gArAAABIgcGFzM2NjcWFxYUBiMjFTMWFhQHBiMmJyYnIxYXFhcyNjU0JyYHNjU0JgE3XFRHC0kTPlREHkVdTENNPmERTE
JfMikFWBRHP31ceiIVRXRqAt5ALm5CRwMDIw+MMEwCRX0xKxEXQEB8KUEBfGQ7OWMzJ4NBfQAAAgAaAAACRwLeAAoADgAAAQEVIRUzNTM1IxEHMxEhAYP+lwFpSnp6UAb+1gLe/hlilZVDAgaL/oUAAAEANAAAAjAC3gAdAAABIgYHMzY3NhcyFhUUBwYHBgcGFSE1ITY3Njc2NCYBQW2JC0cRJSBaRnJaHltzJk8B5v58P2hwJl2UAt6Wckw3NgRKIV4ZQz1VLjtrQFFfVmMJqYMAAgAq//ICFwLeABwAKAAAASIGFRQXFjMWNjczFxQHBiMiJyMWMzY3NjU0JyYHMhcWFAYjIiY1NDYBJGCaRTqYCG4YEQ9FM1NgOEEEzXlNNS5OhGYgUo9JQ0dPAt6Fc3I6Ogk7NAmDWEhstAF4Wb2hTm5PNjeUVIIqXksAAAEASwAAAg8C3gAGAAATFSEBMwE1SwGG/uBTAQsC3mX9hwKeQAAAAAAAABIA3gABAAAAAAAAABcAAAABAAAAAAABAAwAFwABAAAAAAACAAcAIwABAAAAAAADABQAKgABAAAAAAAEABQAKgABAAAAAAAFAAsAPgABAAAAAAAGABQAKgABAAAAAAAKACsASQABAAAAAAALABMAdAADAAEECQAAAC4AhwADAAEECQABABgAtQADAAEECQACAA4AzQADAAEECQADACgA2wADAAEECQAEACgA2wADAAEECQAFABYBAwADAAEECQAGACgA2wADAAEECQAKAFYBGQADAAEECQALACYBb0NyZWF0ZWQgYnkgZm9udC1jYXJyaWVyLlBpbmdGYW5nIFNDUmVndWxhci5QaW5nRmFuZy1TQy1SZWd1bGFyVmVyc2lvbiAxLjBHZW5lcmF0ZWQgYnkgc3ZnMnR0ZiBmcm9tIEZvbnRlbGxvIHByb2plY3QuaHR0cDovL2ZvbnRlbGxvLmNvbQBDAHIAZQBhAHQAZQBkACAAYgB5ACAAZgBvAG4AdAAtAGMAYQByAHIAaQBlAHIALgBQAGkAbgBnAEYAYQBuAGcAIABTAEMAUgBlAGcAdQBsAGEAcgAuAFAAaQBuAGcARgBhAG4AZwAtAFMAQwAtAFIAZQBnAHUAbABhAHIAVgBlAHIAcwBpAG8AbgAgADEALgAwAEcAZQBuAGUAcgBhAHQAZQBkACAAYgB5ACAAcwB2AGcAMgB0AHQAZgAgAGYAcgBvAG0AIABGAG8AbgB0AGUAbABsAG8AIABwAHIAbwBqAGUAYwB0AC4AaAB0AHQAcAA6AC8ALwBmAG8AbgB0AGUAbABsAG8ALgBjAG8AbQAAAgAAAAAAAAAOAAAAAAAAAAAAAAAAAAAAAAAAAAAACwALAAABCgEEAQkBCAELAQYBBQEDAQIBBwd1bmljMjU3B3VuaWI2ODQHdW5pYzI4NQd1bmljODM3B3VuaWM1OTEHdW5pYTY4NQd1bmlhNDM1B3VuaWE3OTMHdW5pYzQyMQd1bmljMjg2",
    	"status": "1",
    	"state": "success",
    	"data": [
    		{
    			"value": "&#xc591 &#xb684 &#xc591 &#xa435 "
    		},
    		{
    			"value": "&#xc285 &#xc421 &#xc837 &#xc286 "
    		},
    		{
    			"value": "&#xc591 &#xc257 &#xc285 &#xa793 "
    		},
    		{
    			"value": "&#xa793 &#xc285 &#xc285 &#xc421 "
    		},
    		{
    			"value": "&#xa685 &#xc421 &#xc591 &#xa685 "
    		},
    		{
    			"value": "&#xa793 &#xa793 &#xc257 &#xa793 "
    		},
    		{
    			"value": "&#xb684 &#xc286 &#xc257 &#xc421 "
    		},
    		{
    			"value": "&#xa793 &#xc837 &#xc421 &#xc421 "
    		},
    		{
    			"value": "&#xc837 &#xc285 &#xc421 &#xc421 "
    		},
    		{
    			"value": "&#xa685 &#xc837 &#xa685 &#xa793 "
    		}
    	]
    }
    demo(resp)
    

    Running it prints one mapping entry per glyph.

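  5. Replace each code point in the printed entries with the digit its glyph actually draws to get the mapping dictionary. The keys are the glyphs' on sequences, which this approach relies on staying the same for a given digit even when its code point changes.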

    on_map = {
    	'1001101111': '1',
    	'101010101101010001010101101010101010010010010101001000010': '8',
    	'10101010100001010111010101101010010101000': '6',
    	'10100100100101010010010010': '0',
    	'1110101001001010110101010100101011111': '5',
    	'10010101001110101011010101010101000100100': '9',
    	'100110101001010101011110101000': '2',
    	'111111111111111': '4',
    	'1111111': '7',
    	'10101100101000111100010101011010100101010100': '3',
    }
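The on sequences that key this dictionary can also be pulled out of the XML dump with the standard library alone. The following is a sketch under assumptions, not the author's code: `on_signatures` is a hypothetical helper, it takes the XML text rather than a file path, and it skips glyphs by name (anything not starting with uni) instead of by position. The sample fragment is hand-written in the shape of a fontTools XML dump; its coordinates and flags are invented.

```python
import xml.etree.ElementTree as ET


def on_signatures(ttx_text):
    """Map each 'uniXXXX' glyph code (without 'uni') to its string of on flags."""
    root = ET.fromstring(ttx_text)
    result = {}
    for glyph in root.findall('./glyf/TTGlyph'):
        name = glyph.get('name', '')
        if not name.startswith('uni'):
            continue  # skip .notdef and any other glyph without a uniXXXX name
        # Concatenate the on attribute of every point, in document order.
        flags = ''.join(pt.get('on') for pt in glyph.iter('pt'))
        result[name[3:]] = flags
    return result


# Invented fragment for illustration; not taken from the real font.
SAMPLE = """<ttFont>
  <glyf>
    <TTGlyph name=".notdef" xMin="0" yMin="0" xMax="10" yMax="10">
      <contour><pt x="0" y="0" on="1"/></contour>
    </TTGlyph>
    <TTGlyph name="unic257" xMin="0" yMin="0" xMax="10" yMax="10">
      <contour>
        <pt x="0" y="0" on="1"/>
        <pt x="5" y="9" on="0"/>
        <pt x="10" y="0" on="1"/>
      </contour>
    </TTGlyph>
  </glyf>
</ttFont>"""

print(on_signatures(SAMPLE))  # {'c257': '101'}
```

Looking each returned sequence up in on_map then gives the digit for that code point in the current font.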
    
  6. With the mapping dictionary in hand, we can request a page and decode the correct numbers.

    from fontTools.ttLib import TTFont  # pip install fontTools
    from base64 import b64decode
    from parsel import Selector
    import requests
    
    def save_font(font_data):
    	on_map = {
    		'1001101111': '1',
    		'101010101101010001010101101010101010010010010101001000010': '8',
    		'10101010100001010111010101101010010101000': '6',
    		'10100100100101010010010010': '0',
    		'1110101001001010110101010100101011111': '5',
    		'10010101001110101011010101010101000100100': '9',
    		'100110101001010101011110101000': '2',
    		'111111111111111': '4',
    		'1111111': '7',
    		'10101100101000111100010101011010100101010100': '3',
    	}
    
    	with open('7.ttf', mode='wb') as f:
    		f.write(b64decode(font_data['woff']))  # save the font file
    
    	font = TTFont('7.ttf')  # load the font file
    	font.saveXML('7.xml')  # dump it as XML
    
    	# read the XML file back
    	with open('7.xml', mode='r', encoding='utf-8') as f:
    		xml_data = f.read()
    
    	select = Selector(xml_data)
    	# <glyf> --> all TTGlyph tags
    	TTGlyph = select.css('glyf > TTGlyph')[1:]  # the tag at index 0 is not needed, so start from index 1
    	rep_dist = {}
    	for tt in TTGlyph:
    		name = tt.css('::attr(name)').get().replace('uni', '')  # TTGlyph tag --> name value
    		pt = tt.css('pt')  # <glyf> --> TTGlyph --> pt tags, whose on values are collected below
    		on_list = []
    		for pt_tag in pt:
    			on_list.append(pt_tag.css('::attr(on)').get())
    		rep_dist[name] = on_map[''.join(on_list)]  # map the on sequence to the digit it renders as
    
    	result_dict = []
    	for data in font_data['data']:
    		num_list = []
    		for nums in data['value'].replace('&#x', '').split(' ')[0:-1]:
    			num_list.append(rep_dist[nums])
    		result_dict.append(int(''.join(num_list)))
    		#     print(rep_dist[nums], end='')
    		# print()
    	return result_dict
    
    
    headers = {
    	"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
    }
    cookies = {
    	"sessionid": "mhgiqaaxkqpt3ybutbub96ubi9rr5gtk"
    }
    url = "https://match.yuanrenxue.cn/api/match/7?page=1"
    resp = requests.get(url, headers=headers, cookies=cookies)
    print(save_font(resp.json()))
    

    The output matches the data shown on page 1, so the decoding works.
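The last stage of save_font, turning one value string into an integer, can be isolated as a small function. This is a sketch rather than the author's code: `decode_value` is a hypothetical name, and the mapping in the example is invented purely to show the mechanics, since the real code-to-digit mapping changes with every response.

```python
def decode_value(value, code_to_digit):
    """Turn '&#xc591 &#xb684 ...' into an int using a code -> digit mapping."""
    # str.split() with no argument ignores the trailing space, so no slicing
    # is needed to drop an empty last element.
    codes = value.replace('&#x', '').split()
    return int(''.join(code_to_digit[code] for code in codes))


# Invented mapping for illustration only; real mappings come from the font.
example_map = {'c591': '1', 'b684': '2', 'a435': '3'}
print(decode_value("&#xc591 &#xb684 &#xc591 &#xa435 ", example_map))  # 1213
```

A code that is missing from the mapping raises KeyError, which is a useful signal that the font and the data came from different responses.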

  7. Next, get the names of all the summoners. The page builds them in its JavaScript; the corresponding JS code is:

    let page = 1;
    let name = ['极镀ギ紬荕', '爷灬霸气傀儡', '梦战苍穹', '傲世哥', 'мaη肆風聲', '一刀メ隔世', '横刀メ绝杀', 'Q不死你R死你', '魔帝殤邪', '封刀不再战', '倾城孤狼', '戎马江湖', '狂得像风', '影之哀伤', '謸氕づ独尊', '傲视狂杀', '追风之梦', '枭雄在世', '傲视之巅', '黑夜刺客', '占你心为王', '爷来取你狗命', '御风踏血', '凫矢暮城', '孤影メ残刀', '野区霸王', '噬血啸月', '风逝无迹', '帅的睡不着', '血色杀戮者', '冷视天下', '帅出新高度', '風狆瑬蒗', '灵魂禁锢', 'ヤ地狱篮枫ゞ', '溅血メ破天', '剑尊メ杀戮', '塞外う飛龍', '哥‘K纯帅', '逆風祈雨', '恣意踏江山', '望断、天涯路', '地獄惡灵', '疯狂メ孽杀', '寂月灭影', '骚年霸称帝王', '狂杀メ无赦', '死灵的哀伤', '撩妹界扛把子', '霸刀☆藐视天下', '潇洒又能打', '狂卩龙灬巅丷峰', '羁旅天涯.', '南宫沐风', '风恋绝尘', '剑下孤魂', '一蓑烟雨', '领域★倾战', '威龙丶断魂神狙', '辉煌战绩', '屎来运赚', '伱、Bu够档次', '九音引魂箫', '骨子里的傲气', '霸海断长空', '没枪也很狂', '死魂★之灵'];
    let heroArray = []
    for (let i = 0; i <= 4; i++) {
    	let yyq = 1;
    	// the ten empty strings stand for the ten rows on one page
    	['', '', '', '', '', '', '', '', '', ''].forEach((index, val) => {
    		// console.log(name[yyq + (page - 1) * 10]);
    		heroArray.push(name[yyq + (page - 1) * 10])
    		yyq += 1
    	})
    	page += 1;
    }
    console.log(heroArray)
    

    The output matches the names shown on the page.
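The index expression name[yyq + (page - 1) * 10], with yyq running from 1 to 10, amounts to a slice that starts one past the beginning of the list. A Python sketch of the same arithmetic follows; `names_for_page` is a hypothetical helper, and the list here holds only the first twelve names from the JS array to keep the example short.

```python
def names_for_page(names, page, page_size=10):
    """Return the names shown on a 1-based page, mirroring the page's JS."""
    # yyq starts at 1, so index 0 of the list is never displayed.
    start = 1 + (page - 1) * page_size
    return names[start:start + page_size]


# First twelve entries of the name array from the page's JS.
names = ['极镀ギ紬荕', '爷灬霸气傀儡', '梦战苍穹', '傲世哥', 'мaη肆風聲', '一刀メ隔世',
         '横刀メ绝杀', 'Q不死你R死你', '魔帝殤邪', '封刀不再战', '倾城孤狼', '戎马江湖']

print(names_for_page(names, 1))
```

This is why the hard-coded list in the final script begins with the second name of the JS array.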

  8. Finally, write the Python code that finds the summoner with the highest win points.

    import requests
    from base64 import b64decode
    from fontTools.ttLib import TTFont  # pip install fontTools
    from parsel import Selector
    
    headers = {
    	"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
    }
    cookies = {
    	"sessionid": "xxxxx"
    }
    
    
    def send_match6(page):
    	url = "https://match.yuanrenxue.cn/api/match/7"
    	params = {
    		"page": f"{page}"
    	}
    	response = requests.get(url, headers=headers, cookies=cookies, params=params)
    	return {
    		'woff': response.json()['woff'],
    		'data': response.json()['data']
    	}
    
    
    def save_font(font_data):
    	on_map = {
    		'1001101111': '1',
    		'101010101101010001010101101010101010010010010101001000010': '8',
    		'10101010100001010111010101101010010101000': '6',
    		'10100100100101010010010010': '0',
    		'1110101001001010110101010100101011111': '5',
    		'10010101001110101011010101010101000100100': '9',
    		'100110101001010101011110101000': '2',
    		'111111111111111': '4',
    		'1111111': '7',
    		'10101100101000111100010101011010100101010100': '3',
    	}
    
    	with open('7.ttf', mode='wb') as f:
    		f.write(b64decode(font_data['woff']))  # save the font file
    
    	font = TTFont('7.ttf')  # load the font file
    	font.saveXML('7.xml')  # dump it as XML
    
    	# read the XML file back
    	with open('7.xml', mode='r', encoding='utf-8') as f:
    		xml_data = f.read()
    
    	select = Selector(xml_data)
    	# <glyf> --> TTGlyph
    	TTGlyph = select.css('glyf > TTGlyph')[1:]  # the tag at index 0 is not needed, so start from index 1
    	rep_dist = {}
    	for tt in TTGlyph:
    		name = tt.css('::attr(name)').get().replace('uni', '')  # TTGlyph tag --> name value
    		pt = tt.css('pt')  # <glyf> --> TTGlyph --> pt tags, whose on values are collected below
    		on_list = []
    		for pt_tag in pt:
    			on_list.append(pt_tag.css('::attr(on)').get())
    		rep_dist[name] = on_map[''.join(on_list)]  # map the on sequence to the digit it renders as
    
    	result_dict = []
    	for data in font_data['data']:
    		num_list = []
    		for nums in data['value'].replace('&#x', '').split(' ')[0:-1]:
    			num_list.append(rep_dist[nums])
    		result_dict.append(int(''.join(num_list)))
    		#     print(rep_dist[nums], end='')
    		# print()
    	return result_dict
    
    
    if __name__ == '__main__':
    	hero_array = [
    		'爷灬霸气傀儡', '梦战苍穹', '傲世哥', 'мaη肆風聲', '一刀メ隔世', '横刀メ绝杀', 'Q不死你R死你', '魔帝殤邪', '封刀不再战', '倾城孤狼',
    		'戎马江湖', '狂得像风', '影之哀伤', '謸氕づ独尊', '傲视狂杀', '追风之梦', '枭雄在世', '傲视之巅', '黑夜刺客', '占你心为王',
    		'爷来取你狗命', '御风踏血', '凫矢暮城', '孤影メ残刀', '野区霸王', '噬血啸月', '风逝无迹', '帅的睡不着', '血色杀戮者', '冷视天下',
    		'帅出新高度', '風狆瑬蒗', '灵魂禁锢', 'ヤ地狱篮枫ゞ', '溅血メ破天', '剑尊メ杀戮', '塞外う飛龍', '哥‘K纯帅', '逆風祈雨', '恣意踏江山',
    		'望断、天涯路', '地獄惡灵', '疯狂メ孽杀', '寂月灭影', '骚年霸称帝王', '狂杀メ无赦', '死灵的哀伤', '撩妹界扛把子', '霸刀☆藐视天下', '潇洒又能打'
    	]  # summoner names in the order they appear across the pages
    	hero_nums = []  # decoded values from every page, in order
    	for page in range(1, 6):
    		match7_data = send_match6(page)
    		nums_list = save_font(match7_data)
    		print(page, nums_list)  # the page number and that page's decoded values
    		hero_nums.extend(nums_list)  # append this page's values to hero_nums
    	print(hero_nums)  # the values from all pages
    
    	max_num = max(hero_nums)  # the highest value
    	hero_index = hero_nums.index(max_num)  # index of the highest value in the list
    	print(hero_array[hero_index])  # the summoner name at that index
    

    Running it prints the name of the top summoner.
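The final lookup depends on the name list and the value list lining up index for index. A small sketch of that step with an explicit length check is given below; `top_summoner` is a hypothetical helper and the scores in the example are invented, not contest data.

```python
def top_summoner(names, scores):
    """Return (name, score) for the highest score; the first one wins ties."""
    if len(names) != len(scores):
        # A failed or short page would otherwise shift every later name.
        raise ValueError(f"{len(names)} names but {len(scores)} scores")
    best = max(range(len(scores)), key=scores.__getitem__)
    return names[best], scores[best]


# Invented scores for illustration only.
demo_names = ['爷灬霸气傀儡', '梦战苍穹', '傲世哥']
demo_scores = [4821, 9176, 3305]
print(top_summoner(demo_names, demo_scores))
```

Like list.index(max(...)), the max call here returns the first position when several scores tie.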

  9. Submit the answer; it is accepted.

posted @ 2024-11-17 11:48  死不悔改奇男子