破解/解决‘动态’字体

## 声明：
破解思路由同事提供一些思路完成破解，原文章是他整理的我拷贝过来的，进一步完善封装的的代码我已附上，如有引用或者转发请附上地址或经由我2人其一人同意即可，谢谢~
## 前言
我想大家也是网上搜索到的都是对简单的静态字体破解，简简单单做个映射表而已，这个思路没毛病，可是...
当你要去批量请求帖子网页的时候突然发现每个网页的字体库都不一样的时候。（难道每个网页写个映射表？！？...）
当我要破解这个动态的时候也确实难以下手，所以我也从网上搜了很多资料发现都是标题党内容如出一辙的解决静态的(⊙o⊙)…
我敢说这个文章算是全网首发，这里要感谢一下我的同事能够帮我记录下来总结成这篇文章。ヾ(◍°∇°◍)ﾉﾞ
头大的我连吃饭也在想怎么解决这个东西，然后想到了我用模型算法去训练识别最相近的字母，也就是knn算法。
原理：首先将一个网页的字体作为一个参考把它的点记录在坐标系中，然后用活动的去做一个参考比对最相近的也就是这个字体的坐标，
当然这里用的不的真的knn算法只是因这个算法的思路，然后有了接下来的思路，通过简化，可以用方差来替代向量的算法从而找到最接近的点。

分析页面

打开一个页面, 发现字体文件地址是动态的, 这个倒是好说, 写个正则, 就可以动态匹配出来

先下载下来一个新页面的字体文件, 做一下对比, 如图

头脑风暴ing.gif

(与伙伴对话ing...)

不着急, 还是要冷静下来, 再想想哪里还有突破点

同一个页面的字体文件地址是动态的, 但是, 里面的字体编码和顺序是不会变的呀

可以使用某一个页面的字体文件做一个标准的字体映射表呀!

好像发现了新世界的大门, 可门还没开开, 就被自己堵死了, 就想做出来映射表然后呢!(又要奔腾了)

想呀想呀想呀想, 最后叫上小伙伴一起想

突然就想到了, 虽然那么多不一样, 但是, 但是, 相同文字的坐标点相同呀! 突然又打开了大门

首先排除特别的文字的情况下, 只是在这个字体文件的情况下, 60%的字坐标点一样

那剩下的怎么办呢! 先不管了, 先把这60%给弄出来

提取60%的字体映射表

制作标准编码文字映射表(某页字体文件为准)

def extract_ttf_file(self, file_name, get_word_map=True):
        _font = TTFont(file_name)
        uni_list = _font.getGlyphOrder()[1:]

        # 被替换的字体的列表
        word_list = [
            "坏", "少", "远", "大", "九", "左", "近", "呢", "十", "高", "着",
            "矮", "八", "二", "右", "是", "得", "的", "小", "短", "很", "一", "了",
            "地", "好", "多", "七", "不", "长", "低", "三", "五", "六", "下", "更",
            "和", "四", "上"
        ]
        utf_word_map = {}
        utf_coordinates_map = {}

        for index, uni_code in enumerate(uni_list):
            utf_word_map[uni_code] = word_list[index]
            utf_coordinates_map[uni_code] = list(_font['glyf'][uni_code].coordinates)

        if get_word_map:
            return utf_word_map, utf_coordinates_map
        return utf_coordinates_map

    # self.local_utf_word_map, self.local_utf_coordinates_map = self.extract_ttf_file(self.local_ttf_name)

# According one font file to get font code and font, then use them to make a dict map named font_rule like: {font_code: font} (all the new font code will All data will be based on this table to take the corresponding information))
font_rule = {
    "edd2":"坏",...
    "eca5":"四",
    "ede5":"上"}

制作新标准编码映射表

下载要破解的字体文件, 并替换标准编码字体映射表

def replace_ttf_map(self):
        unicode_mlist_map = []
        new_utf_coordinates_map = self.extract_ttf_file(self.download_ttf_name, get_word_map=False)
        for local_unicode, local_coordinate in self.local_utf_coordinates_map.items():
            coordinate_equal_list = []
            for new_unicode, new_coordinate in new_utf_coordinates_map.items():
                if len(new_coordinate) == len(local_coordinate):
                    coordinate_equal_list.append({"norm_key": local_unicode, "norm_coordinate": local_coordinate, "new_key": new_unicode, "new_coordinate": new_coordinate})

            if len(coordinate_equal_list) == 1:
                unicode_mlist_map.append(coordinate_equal_list[0])for unicode_dict in unicode_mlist_map:
            self.new_unicode_map[unicode_dict["new_key"]] = self.local_utf_word_map[unicode_dict["norm_key"]]

        print("new unicode map extract success\n", self.new_unicode_map)

会得到22个字体的映射表, 共38个:

{'uniED23': '坏', 'uniED75': '少', 'uniEC18': '九', 'uniECB0': '左', 'uniECF7': '呢', 'uniEC7A': '高', 'uniECA7': '矮', 'uniEDD6': '八', 'uniEDA0': '二', 'uniEDF2': '右', 'uniEC96': '的', 'uniEDF0': '小', 'uniEC8B': '短', 'uniECDB': '一', 'uniEDD5': '地', 'uniED8F': '好', 'uniECC1': '多', 'uniEDBA': '七', 'uniEC2A': '长', 'uniED59': '下', 'uniEC94': '和', 'uniED73': '四'}

替换40%的字体映射表

重组新标准映射表

接下来, 就用坐标点来解决, 以下为思路

使用两点坐标差来判断, 但是这个偏差值拿不准

相同文字, 坐标点数量必须一致, 即所有坐标点“(y-x)的平方差的绝对值”的和最小的就为同一个字。

公式：(x1-x2)**2 + (y1-y2)**2

来先试试

 def get_distence(self, norm_coordinate, new_coordinate):
        distance_total = 0
        for index, coordinate_point in enumerate(norm_coordinate):
            distance_total += abs(new_coordinate[index][0] - coordinate_point[0]) + abs(new_coordinate[index][1] - coordinate_point[1])
        return distance_total

然后在重组标准编码, 标准坐标, 新的编码, 和新坐标

(这是想, 找出最相近的坐标, 使用新坐标提取出标准编码, 然后用标准编码提取对应的文字, 在替换成使用本页用的编码映射表)

# 准备替换的编码坐标映射表
{"norm_key": local_unicode, "norm_coordinate": local_coordinate, "new_key": new_unicode, "new_coordinate": new_coordinate}

提取所有坐标点加起来最小的元素

def handle_subtraction(self, coordinate_equal_list):
        coordinate_min_list = []
        for coordinate_equal in coordinate_equal_list:
            n = self.get_distence(coordinate_equal.get('norm_coordinate'), coordinate_equal.get('new_coordinate'))
            coordinate_min_list.append(n)

        return coordinate_equal_list[coordinate_min_list.index(min(coordinate_min_list))]

替换, 生成新的标准映射表

self.new_unicode_map[unicode_dict["new_key"]] = self.local_utf_word_map[unicode_dict["norm_key"]]

加入判断

在以上替换60%的字体映射表再加入一个判断, 改成如下

def replace_ttf_map(self):
        unicode_mlist_map = []
        new_utf_coordinates_map = self.extract_ttf_file(self.download_ttf_name, get_word_map=False)
        for local_unicode, local_coordinate in self.local_utf_coordinates_map.items():
            coordinate_equal_list = []
            for new_unicode, new_coordinate in new_utf_coordinates_map.items():
                if len(new_coordinate) == len(local_coordinate):
                    coordinate_equal_list.append({"norm_key": local_unicode, "norm_coordinate": local_coordinate, "new_key": new_unicode, "new_coordinate": new_coordinate})

            if len(coordinate_equal_list) == 1:
                unicode_mlist_map.append(coordinate_equal_list[0])
            elif len(coordinate_equal_list) > 1:
                min_word = self.handle_subtraction(coordinate_equal_list)
                unicode_mlist_map.append(min_word)
        for unicode_dict in unicode_mlist_map:
            self.new_unicode_map[unicode_dict["new_key"]] = self.local_utf_word_map[unicode_dict["norm_key"]]

        print("new unicode map extract success\n", self.new_unicode_map)

输出一个标准的坐标值, 这里我就不上图进行对比了, 经过对比, 发现没什么问题

{'uniED23': '坏', 'uniED75': '少', 'uniEDC5': '远', 'uniED5A': '大', 'uniEC18': '九', 'uniECB0': '左', 'uniECA5': '近', 'uniECF7': '呢', 'uniED3F': '十', 'uniEC7A': '高', 'uniEC44': '着', 'uniECA7': '矮', 'uniEDD6': '八', 'uniEDA0': '二', 'uniEDF2': '右', 'uniED09': '是', 'uniEC32': '得', 'uniEC96': '的', 'uniEDF0': '小', 'uniEC8B': '短', 'uniED3D': '很', 'uniECDB': '一', 'uniEC60': '了', 'uniEDD5': '地', 'uniED8F': '好', 'uniECC1': '多', 'uniEDBA': '七', 'uniED2D': '不', 'uniEC2A': '长', 'uniED11': '低', 'uniEC5E': '三', 'uniECDD': '五', 'uniEDBC': '六', 'uniED59': '下', 'uniEE02': '更', 'uniEC94': '和', 'uniED73': '四', 'uniED6A': '上'}

源码

# -*- coding: utf-8 -*-
# @Author: Mehaei
# @Date:   2020-01-10 14:51:53
# @Last Modified by:  Presley
# @Last Modified time: 2020-01-13 10:10:13
import re
import os
import requests
from lxml import etree
from fontTools.ttLib import TTFont


class NotFoundFontFileUrl(Exception):
    pass


class CarHomeFont(object):
    def __init__(self, url, *args, **kwargs):
        self.local_ttf_name = "norm_font.ttf"
        self.download_ttf_name = 'new_font.ttf'
        self.new_unicode_map = {}
        self._making_local_code_map()
        self._download_ttf_file(url, self.download_ttf_name)

    def _download_ttf_file(self, url, file_name):
        self.page_html = self.download(url) or ""
        # 获取字体的连接文件
        font_file_name = (re.findall(r",url\('(//.*\.ttf)?'\) format", self.page_html) or [""])[0]
        if not font_file_name:
            raise NotFoundFontFileUrl("not found font file name")
        # 下载字体文件
        file_content = self.download("https:%s" % font_file_name, content=True)
        # 讲字体文件保存到本地
        with open(file_name, 'wb') as f:
            f.write(file_content)
        print("font file download success")

    def _making_local_code_map(self):
        if not os.path.exists(self.local_ttf_name):
            # 这个url为标准字体文件地址, 如要更改, 请手动更改字体列表
            url = "https://club.autohome.com.cn/bbs/thread/62c48ae0f0ae73ef/75904283-1.html"
            self._download_ttf_file(url, self.local_ttf_name)
        self.local_utf_word_map, self.local_utf_coordinates_map = self.extract_ttf_file(self.local_ttf_name)
        print("local ttf load done")

    def get_distence(self, norm_coordinate, new_coordinate):
        distance_total = 0
        for index, coordinate_point in enumerate(norm_coordinate):
            distance_total += abs(new_coordinate[index][0] - coordinate_point[0]) + abs(new_coordinate[index][1] - coordinate_point[1])
        return distance_total

    def handle_subtraction(self, coordinate_equal_list):
        coordinate_min_list = []
        for coordinate_equal in coordinate_equal_list:
            n = self.get_distence(coordinate_equal.get('norm_coordinate'), coordinate_equal.get('new_coordinate'))
            coordinate_min_list.append(n)

        return coordinate_equal_list[coordinate_min_list.index(min(coordinate_min_list))]

    def replace_ttf_map(self):
        unicode_mlist_map = []
        new_utf_coordinates_map = self.extract_ttf_file(self.download_ttf_name, get_word_map=False)
        for local_unicode, local_coordinate in self.local_utf_coordinates_map.items():
            coordinate_equal_list = []
            for new_unicode, new_coordinate in new_utf_coordinates_map.items():
                if len(new_coordinate) == len(local_coordinate):
                    coordinate_equal_list.append({"norm_key": local_unicode, "norm_coordinate": local_coordinate, "new_key": new_unicode, "new_coordinate": new_coordinate})

            if len(coordinate_equal_list) == 1:
                unicode_mlist_map.append(coordinate_equal_list[0])
            elif len(coordinate_equal_list) > 1:
                min_word = self.handle_subtraction(coordinate_equal_list)
                unicode_mlist_map.append(min_word)
        for unicode_dict in unicode_mlist_map:
            self.new_unicode_map[unicode_dict["new_key"]] = self.local_utf_word_map[unicode_dict["norm_key"]]

        print("new unicode map extract success\n", self.new_unicode_map)

    def extract_ttf_file(self, file_name, get_word_map=True):
        _font = TTFont(file_name)
        uni_list = _font.getGlyphOrder()[1:]

        # 被替换的字体的列表
        word_list = [
            "坏", "少", "远", "大", "九", "左", "近", "呢", "十", "高", "着",
            "矮", "八", "二", "右", "是", "得", "的", "小", "短", "很", "一", "了",
            "地", "好", "多", "七", "不", "长", "低", "三", "五", "六", "下", "更",
            "和", "四", "上"
        ]
        utf_word_map = {}
        utf_coordinates_map = {}

        for index, uni_code in enumerate(uni_list):
            utf_word_map[uni_code] = word_list[index]
            utf_coordinates_map[uni_code] = list(_font['glyf'][uni_code].coordinates)

        if get_word_map:
            return utf_word_map, utf_coordinates_map
        return utf_coordinates_map

    def repalce_source_code(self):
        replaced_html = self.page_html
        for utf_code, word in self.new_unicode_map.items():
            replaced_html = replaced_html.replace("&#x%s;" % utf_code[3:].lower(), word)
        return replaced_html

    def get_subject_content(self):
        normal_html = self.repalce_source_code()
        # 使用xpath 获取 主贴
        xp_html = etree.HTML(normal_html)
        subject_text = ''.join(xp_html.xpath('//div[@xname="content"]//div[@class="tz-paragraph"]//text()'))
        return subject_text

    def download(self, url, *args, try_time=5, method="GET", content=False, **kwargs):
        kwargs.setdefault("headers", {})
        kwargs["headers"].update({"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"})
        while try_time:
            try:
                response = requests.request(method.upper(), url, *args, **kwargs)
                if response.ok:
                    if content:
                        return response.content
                    return response.text
                else:
                    continue
            except Exception as e:
                try_time -= 1
                print("download error: %s" % e)


if __name__ == "__main__":
    url = "https://club.autohome.com.cn/bbs/thread/34d6bcc159b717a9/85794510-1.html#pvareaid=6830286"
    car = CarHomeFont(url)
    car.replace_ttf_map()
    text = car.get_subject_content()
    print(text)

  1 import os
  2 
  3 from fontTools.ttLib import TTFont
  4 from utils.font_rule import font_rule
  5 import logging
  6 logging.getLogger("chardet").setLevel(logging.WARNING)
  7 logging.getLogger("fontTools").setLevel(logging.WARNING)
  8 
  9 
 10 def get_distance(t_lst, t_lst2):
 11     distance_lst = []
 12     for i, el in enumerate(t_lst):
 13         distance = (t_lst2[i][0] - el[0]) ** 2 + (t_lst2[i][1] - el[1]) ** 2
 14         distance_lst.append(distance)
 15     # print('[distance_lst]', distance_lst)
 16     num = 0
 17     for n in distance_lst:
 18         num += n
 19     return num
 20 
 21 
 22 def handle_subtraction(lst):
 23     n_lst = []
 24     for d in lst:
 25         old_index = d.get('old_index')
 26         new_index = d.get('new_index')
 27         # n = test(old_index, new_index)
 28         n = get_distance(old_index, new_index)
 29         n_lst.append(n)
 30     # print('[n_lst]', n_lst)
 31     min_n = min(n_lst)
 32     # print('[min_n]', min_n)
 33     return lst[n_lst.index(min_n)]
 34     # for i, num in enumerate(n_lst):
 35     #     if num == min_n:
 36     #         r_dic = lst[i]
 37     #         return r_dic
 38 
 39 
 40 def get_font_map_lst(old_font_lst, new_font_lst, old_font, new_font):
 41     font_map_lst = []
 42     for el in old_font_lst:
 43         u_lst = []
 44         coordinates = list(old_font['glyf'][el].coordinates)
 45         for j in new_font_lst:
 46             new_coordinates = list(new_font['glyf'][j].coordinates)
 47             if len(coordinates) == len(new_coordinates):
 48                 dic = {'old_key': f'{el[-4:]}', 'new_key': f'{j[-4:]}', 'old_index': coordinates,
 49                        'new_index': new_coordinates}
 50                 u_lst.append(dic)
 51         if len(u_lst) > 1:
 52             r_lst = handle_subtraction(u_lst)
 53             font_map_lst.append(r_lst)
 54         elif len(u_lst) == 1:
 55             font_map_lst.append(u_lst.pop())
 56     # print('[len_map_font]', len(font_map_lst))
 57     return font_map_lst
 58 
 59 
 60 def get_font_map(font_map_lst):
 61     font_dic = {}
 62     for map_dic in font_map_lst:
 63         old_key = map_dic.get('old_key')
 64         new_key = map_dic.get('new_key')
 65         font = font_rule.get(old_key.lower())
 66         if not font:
 67             print(map_dic)
 68         font_dic[new_key.lower()]=str(font)
 69     return font_dic
 70 
 71 
 72 def decrypt_font(new_file_name):
 73     old_file_name = 'old_font.ttf'
 74     if not os.path.exists(old_file_name):
 75         old_file_name = './utils/old_font.ttf'
 76     new_file_name = new_file_name
 77     old_font = TTFont(old_file_name)
 78     # print(old_font)
 79     new_font = TTFont(new_file_name)
 80     old_font_lst = old_font.getGlyphOrder()[1:]
 81     print('[old_font_lst]', old_font_lst)
 82     new_font_lst = new_font.getGlyphOrder()[1:]
 83     print('[new_font_lst]', new_font_lst)
 84     font_map_lst = get_font_map_lst(old_font_lst, new_font_lst, old_font, new_font)
 85     # print('font_map_lst', font_map_lst)
 86     font_map = get_font_map(font_map_lst)
 87     # print('[new_font_map]', font_map)
 88     return font_map
 89 
 90 
 91 if __name__ == '__main__':
 92     # old_file_name = 'old_font.ttf'
 93     new_file_name = 'new.ttf'
 94     # old_font = TTFont(old_file_name)
 95     # new_font = TTFont(new_file_name)
 96     # #
 97     # old_font_lst = old_font.getGlyphOrder()[1:]
 98     # new_font_lst = new_font.getGlyphOrder()[1:]
 99     # print(old_font_lst)
100     # print(new_font_lst)
101     # for obvious
102     # old_font.saveXML("old_font.xml")
103     # new_font.saveXML('new.xml')
104     # font_map_lst = get_font_map_lst(old_font_lst, new_font_lst)
105     # font_map = get_font_map(font_map_lst)
106     font_map = decrypt_font(new_file_name)
107     print(font_map)
108     print('[len_font_map]', len(font_map))

进一步封装完善代码

posted @ 2020-08-06 16:13 ￣□￣阅读(234) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

￣□￣