font GSUB
- Character is an overloaded term that can mean many things.
- A code point is the atomic unit of information. Text is a sequence of code points. Each code point is a number which is given meaning by the Unicode standard.
- A code unit is the unit of storage of a part of an encoded code point. In UTF-8 this means 8 bits, in UTF-16 this means 16 bits. A single code unit may represent a full code point, or part of a code point. For example, the snowman (☃, U+2603) is a single code point but 3 UTF-8 code units and 1 UTF-16 code unit.
- A grapheme is a sequence of one or more code points that are displayed as a single graphical unit that a reader recognizes as a single element of the writing system. For example, both a and ä are graphemes, but they may consist of multiple code points (e.g. ä may be two code points, one for the base character a followed by one for the combining diaeresis; there is also an alternative, legacy, single code point representing this grapheme). Some code points are never part of any grapheme (e.g. the zero-width non-joiner, or directional overrides).
grapheme: the smallest meaningful unit of a language's writing system
- A glyph is an image, usually stored in a font (which is a collection of glyphs), used to represent graphemes or parts thereof. Fonts may compose multiple glyphs into a single representation; for example, even if the above ä is a single code point, a font may choose to render it as two separate, spatially overlaid glyphs. For OpenType fonts (OTF), the font's GSUB and GPOS tables contain the substitution and positioning information to make this work. A font may also contain multiple alternative glyphs for the same grapheme.
glyph: the visual form (shape) used to draw a character
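
The code point, code unit, and grapheme distinctions above can be checked with a minimal Python sketch. It assumes Python 3, where a `str` is a sequence of code points. The helper names (`code_points`, `utf8_units`, `utf16_units`) are illustrative and not part of any library. The standard library does not segment text into graphemes, so the sketch only shows that the same grapheme ä can be spelled with one or two code points.

```python
import unicodedata


def code_points(text):
    """Return the code points of text in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


def utf8_units(text):
    """Number of 8-bit code units needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))


def utf16_units(text):
    """Number of 16-bit code units needed to encode text as UTF-16."""
    # "utf-16-le" avoids the byte order mark that plain "utf-16" prepends.
    return len(text.encode("utf-16-le")) // 2


snowman = "\u2603"       # the snowman, one code point
composed = "\u00E4"      # ä as the single legacy code point
decomposed = "a\u0308"   # ä as base letter a + combining diaeresis

print(code_points(snowman), utf8_units(snowman), utf16_units(snowman))
print(code_points(composed), code_points(decomposed))

# The two spellings differ as code point sequences,
# but Unicode normalization maps one onto the other.
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)
```

Comparing text that a reader would consider identical therefore usually calls for normalizing both sides first. Splitting text into graphemes requires a segmentation algorithm that this sketch does not implement.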