Generate Chinese character images with Pillow and OpenCV

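Background

Rendering verification codes (CAPTCHAs) as images helps defend against automated brute-force attacks by scripted tools.

The characters need to be rendered as images in different fonts, possibly with distortions added:

Translation
Rotation
A distracting background
Glyph distortion

Another use case is generating standard copybooks in different fonts for handwriting practice.

Reference code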

https://github.com/atlantistin/Blogs/tree/master/20190801-hanzi-datasets

TTF fonts

https://zhuanlan.zhihu.com/p/28179203

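The TTF (TrueType Font) format was developed by Apple, and later adopted by Microsoft, as a competitor to Adobe's PostScript Type 1 fonts. TTF has long been the most common font format on Mac and Windows, and all major browsers support it. IE8, however, does not support TTF, and IE9 supports it only when the font is marked "installable" (translator's note: don't bother, convert to another format).

TTF can embed a very basic digital rights management flag: a built-in flag states whether the font's author permits the font to be embedded in PDFs, websites, and so on, so licensing issues may arise. Another drawback is that TTF and OTF fonts are uncompressed, so their files are larger.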

https://en.wikipedia.org/wiki/TrueType

TrueType is an outline font standard developed by Apple in the late 1980s as a competitor to Adobe's Type 1 fonts used in PostScript. It has become the most common format for fonts on the classic Mac OS, macOS, and Microsoft Windows operating systems.

The primary strength of TrueType was originally that it offered font developers a high degree of control over precisely how their fonts are displayed, right down to particular pixels, at various font sizes. With widely varying rendering technologies in use today, pixel-level control is no longer certain in a TrueType font.

https://docs.microsoft.com/en-us/typography/truetype/

TrueType is a digital font technology designed by Apple Computer, and now used by both Apple and Microsoft in their operating systems. Microsoft has distributed millions of quality TrueType fonts in hundreds of different styles, including them in its range of products and the popular TrueType Font Packs.

TrueType fonts offer the highest possible quality on computer screens and printers, and include a range of features which make them easy to use.

Chinese font downloads

https://www.fonts.net.cn/fonts-zh-1.html

Code (refactored)

https://github.com/fanqingsong/code_snippet/blob/master/machine_learning/pillow/chinese_chars/app.py

from PIL import Image, ImageDraw, ImageFont, ImageFilter
import numpy as np
import glob as gb
import shutil
import cv2
import os


def prepare_output_dir():
    # start every run from an empty output directory
    if os.path.exists(output_dir):
        shutil.rmtree(output_dir)

    os.makedirs(output_dir)


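def make_background_generator():
    background_image_paths = gb.glob(os.path.join(background_dir, "*"))

    background_image_count = len(background_image_paths)

    while True:
        # np.random.randint(0) raises ValueError, so fall back to a plain
        # white background when the background directory is empty
        if background_image_count == 0:
            yield Image.new("RGB", image_size, (255, 255, 255))
            continue

        background_image_index = np.random.randint(background_image_count)
        one_background_image_path = background_image_paths[background_image_index]

        # convert to RGB so the array shape matches the character image
        yield Image.open(one_background_image_path).convert("RGB").resize(image_size)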


def draw_char_on_image(char_image, char, font_path):
    font_size = height // 6 * 5
    if not regular_mode:
        font_size = np.random.randint(height // 2, height // 6 * 5)

    font = ImageFont.truetype(font_path, font_size, encoding="utf-8")

    draw = ImageDraw.Draw(char_image)

    # font color is black by default
    rgb = (0, 0, 0)
    if not regular_mode:
        r, g, b = np.random.randint(150, 255), np.random.randint(150, 255), np.random.randint(150, 255)
        rgb = (r, g, b)

    # Pillow coordinates are (x, y): x runs along the width, y along the height
    xy = ((width - font_size) // 2, (height - font_size) // 2)

    draw.text(xy, char, rgb, font=font)


def shear_image(char_image):
    theta = np.random.randint(-15, 15) * np.pi / 180

    m_shear = np.array([[1, np.tan(theta), 0], [0, 1, 0]], dtype=np.float32)

    image_shear = cv2.warpAffine(np.array(char_image), m_shear, image_size)

    char_image = Image.fromarray(image_shear)

    return char_image


def get_char_image(char, font_path):
    # set up a white background
    image_data = np.full(image_shape, 255, dtype="u1")
    char_image = Image.fromarray(image_data)

    draw_char_on_image(char_image, char, font_path)

    # uglify the form of char
    if not regular_mode:
        char_image = char_image.rotate(np.random.randint(-100, 100))

        char_image = char_image.filter(ImageFilter.GaussianBlur(radius=0.7))

        char_image = shear_image(char_image)

    return char_image


def make_char_image_generator_for_all_fonts(char):
    font_paths = gb.glob(os.path.join(font_dir, "*"))
    print(font_paths)

    for font_path in font_paths:
        yield get_char_image(char, font_path)


def make_char_image_generator_randomly(char, n):
    font_paths = gb.glob(os.path.join(font_dir, "*"))
    print(font_paths)

    font_count = len(font_paths)
    # yield exactly n images, each rendered with a randomly chosen font
    for _ in range(n):
        font_path = font_paths[np.random.randint(font_count)]

        char_image = get_char_image(char, font_path)

        yield char_image


def make_char_image_generator(char, random=False, n=3):
    if not random:
        char_image_generator = make_char_image_generator_for_all_fonts(char)
        return char_image_generator

    char_image_generator = make_char_image_generator_randomly(char, n)
    return char_image_generator


def get_all_chars(chars_file):
    with open(chars_file, 'r', encoding="utf-8") as fh:
        # drop newlines and other whitespace so they are not treated as characters
        all_chars = "".join(fh.read().split())
        print(all_chars)

        return all_chars


def prepare_one_char_dir(one_char):
    char_dir = os.path.join(output_dir, one_char)

    if not os.path.exists(char_dir):
        os.makedirs(char_dir)

    return char_dir


def main():
    prepare_output_dir()

    all_chars = get_all_chars(chars_file)
    for i, one_char in enumerate(all_chars):
        print(f"{i} word is {one_char}")

        char_dir = prepare_one_char_dir(one_char)

        char_image_generator = make_char_image_generator(one_char)
        background_generator = make_background_generator()

        char_and_background_generator = zip(char_image_generator, background_generator)

        for index, (char, back) in enumerate(char_and_background_generator):
            img_data = np.array(char)

            # need to blend one background pic
            if not regular_mode:
                img_data = np.array(char) // 5 * 3 + np.array(back) // 5 * 2

            img_path = os.path.join(char_dir, str(index) + ".jpg")
            img = Image.fromarray(img_data)
            img.save(img_path, "JPEG")


height, width = 64, 64
# Pillow and OpenCV take sizes as (width, height); NumPy shapes are (rows, cols, channels)
image_size = (width, height)
image_shape = (height, width, 3)

# white background and black font
regular_mode = True

output_dir = "output"
background_dir = "background"
font_dir = "font"
chars_file = "all.txt"

if __name__ == "__main__":
    main()

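With the default settings, the script generates images of black characters on a white background.

Set regular_mode to False and run it again to see images with the distortions applied.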
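
When regular_mode is off, main() mixes the character image and the background at weights of 3/5 and 2/5. Dividing before multiplying keeps every intermediate value inside the uint8 range. The sketch below illustrates that arithmetic on synthetic arrays; the names blend, char_px, and back_px are illustrative and do not appear in the script.

```python
import numpy as np


def blend(char_px, back_px):
    """Mix two uint8 images at weights 3/5 and 2/5, as main() does."""
    # Integer-divide first: 255 // 5 * 3 = 153 and 255 // 5 * 2 = 102,
    # so the sum never exceeds 255 and the uint8 values cannot wrap around.
    return char_px // 5 * 3 + back_px // 5 * 2


# Synthetic 2x2 RGB "images": a white character canvas and a grey background.
char_px = np.full((2, 2, 3), 255, dtype=np.uint8)
back_px = np.full((2, 2, 3), 100, dtype=np.uint8)

mixed = blend(char_px, back_px)
print(mixed.dtype, mixed[0, 0])
```

The cost of this ordering is a small loss of precision, since each input is rounded down to a multiple of 5 before it is weighted.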

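Affine transformation

The distortion code includes a snippet that deforms the glyph using cv2's warpAffine function.

"warp" means to bend or distort.

"affine" refers to an affine mapping.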

def shear_image(char_image):
    theta = np.random.randint(-15, 15) * np.pi / 180

    m_shear = np.array([[1, np.tan(theta), 0], [0, 1, 0]], dtype=np.float32)

    image_shear = cv2.warpAffine(np.array(char_image), m_shear, image_size)

    char_image = Image.fromarray(image_shear)

    return char_image
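
With a zero translation column, the shear above keeps the top row of the image fixed, so the lower part of the glyph slides sideways and may be clipped at the border. One possible refinement, which is not part of the original script, is to add a horizontal offset so that the middle row stays fixed instead. The helper name centered_shear_matrix is hypothetical.

```python
import numpy as np


def centered_shear_matrix(theta_deg, height):
    """2x3 x-shear matrix whose fixed row is the vertical middle of the image."""
    t = np.tan(np.deg2rad(theta_deg))
    # x' = x + t * (y - height / 2), y' = y
    return np.array([[1, t, -t * height / 2],
                     [0, 1, 0]], dtype=np.float32)


m = centered_shear_matrix(15, 64)

# A point on the middle row stays where it is; rows above and below it
# move in opposite directions.
middle = m @ np.array([32, 32, 1], dtype=np.float32)
top = m @ np.array([32, 0, 1], dtype=np.float32)
bottom = m @ np.array([32, 64, 1], dtype=np.float32)
print(middle, top, bottom)
```

The resulting matrix has the same 2x3 shape as m_shear, so it could be passed to cv2.warpAffine in its place.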

affine

https://www.merriam-webster.com/dictionary/affine

Definition of affine

(Entry 1 of 2)

: a relative by marriage : in-law

affine
adjective

Definition of affine (Entry 2 of 2)

: of, relating to, or being a transformation (such as a translation, a rotation, or a uniform stretching) that carries straight lines into straight lines and parallel lines into parallel lines but may alter distance between points and angles between lines (example: affine geometry)

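n.
1. relative by marriage
2. kin

adj.
1. of the same family
2. corresponding
3. (chemistry) having affinity; (mathematics) quasi-similar; distantly related
4. (mathematics) distantly related; affine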

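Affine transformation

A linear transformation followed by a translation. It maps one vector space into another, possibly of a different dimension.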

https://baike.baidu.com/item/%E4%BB%BF%E5%B0%84%E5%87%BD%E6%95%B0/9276178

https://www.cnblogs.com/bnuvincent/p/6691189.html

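Affine transformation
An affine transformation maps 2D coordinates to 2D coordinates by a linear transformation plus a translation. It preserves the "straightness" of 2D figures (translator's note: a straight line is still a straight line after the transformation and does not bend) and their "parallelness" (translator's note: parallel lines remain parallel; angles and distances, however, are in general not preserved).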
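
A quick numeric check of these properties, as a sketch with an arbitrarily chosen 30-degree shear: parallel directions remain parallel, while the angle between two lines generally changes, in line with the Merriam-Webster definition quoted earlier. The helper names angle_deg and cross2 are illustrative.

```python
import numpy as np

# Linear part of an x-shear of 30 degrees (an affine map with no translation).
shear = np.array([[1.0, np.tan(np.deg2rad(30))],
                  [0.0, 1.0]])


def angle_deg(u, v):
    """Angle between two direction vectors, in degrees."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def cross2(u, v):
    """z-component of the 2D cross product; zero means the vectors are parallel."""
    return u[0] * v[1] - u[1] * v[0]


horizontal = np.array([1.0, 0.0])
vertical = np.array([0.0, 1.0])
also_vertical = np.array([0.0, 2.0])  # parallel to `vertical`

# Parallel directions stay parallel after the transformation ...
parallel_after = np.isclose(cross2(shear @ vertical, shear @ also_vertical), 0.0)
# ... but the right angle between the two axes is not preserved.
angle_after = angle_deg(shear @ horizontal, shear @ vertical)
print(parallel_after, angle_after)
```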

The figure below shows the difference between c and d:

An affine transformation can be built by composing a sequence of atomic transformations: translation, scaling, flipping, rotation, and shear.
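
The composition of atomic transformations can be sketched with 3x3 homogeneous matrices, where chaining transformations is plain matrix multiplication. The function names below are illustrative and not taken from the script. The example rotates a point about the center of a 64x64 image by combining two translations and a rotation.

```python
import numpy as np


def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=np.float64)


def scale(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=np.float64)


def rotation(theta_deg):
    c, s = np.cos(np.deg2rad(theta_deg)), np.sin(np.deg2rad(theta_deg))
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=np.float64)


def shear_x(theta_deg):
    t = np.tan(np.deg2rad(theta_deg))
    return np.array([[1, t, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float64)


# A horizontal flip is a scale by -1 along x.
flip_x = scale(-1, 1)

# Rotate 90 degrees about the point (32, 32): move that point to the origin,
# rotate, then move it back. The matrices apply from right to left.
about_center = translation(32, 32) @ rotation(90) @ translation(-32, -32)

corner = about_center @ np.array([64, 32, 1.0])
affine_2x3 = about_center[:2]  # the 2x3 form that cv2.warpAffine accepts
print(corner[:2], affine_2x3.shape)
```

Dropping the last row of the composed matrix gives the 2x3 array that cv2.warpAffine expects, so any such chain can be applied to an image in one call.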

An affine transformation can be expressed by the following formula:
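
In the usual column-vector convention, which the 2x3 matrix passed to cv2.warpAffine also follows, a 2D affine transformation can be written as below. The symbol names are chosen here for illustration.

```latex
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
+
\begin{bmatrix} t_x \\ t_y \end{bmatrix}
```

The 2x2 matrix is the linear part (rotation, scaling, flipping, shear) and the vector (t_x, t_y) is the translation.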

Reference: http://wenku.baidu.com/view/826a796027d3240c8447ef20.html

Shear transformation (named after shear force)

https://www.quora.com/What-is-the-difference-between-stress-and-shear-stress

See the picture of the element below for an idea of normal and shear stresses, and how each deforms the element.

https://www.ques10.com/p/22033/define-the-terms-with-example-1-reflection-2-shear/

The three basic transformations of scaling, rotating, and translating are the most useful and most common. There are some other transformations which are useful in certain applications. Two such transformations are reflection and shear.

Reflection:-

A reflection is a transformation that produces a mirror image of an object relative to an axis of reflection. We can choose an axis of reflection in the xy plane or perpendicular to the xy plane.

Shear:-

A transformation that slants the shape of an object is called the shear transformation. Two common shearing transformations are used. One shifts x co-ordinate values and the other shifts y co-ordinate values. However, in both cases only one co-ordinate (x or y) changes and the other preserves its values.

X Shear:-

The x shear preserves the y co-ordinates, but changes the x values, which causes vertical lines to tilt right or left as shown in the figure below. The transformation matrix for x shear is given as
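
In the column-vector convention used by the shear_image code earlier, with sh_x denoting the shear factor (tan(theta) in that code), the x shear is commonly written as follows. The cited page may use the transposed, row-vector form.

```latex
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} 1 & sh_x \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix},
\qquad x' = x + sh_x \, y, \quad y' = y
```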

Y shear:-

The y shear preserves the x coordinates, but changes the y values which causes horizontal lines to transform into lines which slope up or down, as shown in the figure below. The transformation matrix for y shear is given as
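
In the same column-vector convention, with sh_y denoting the shear factor (a symbol chosen here for illustration), the y shear is commonly written as follows. The cited page may use the transposed, row-vector form.

```latex
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ sh_y & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix},
\qquad x' = x, \quad y' = y + sh_y \, x
```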
