Why cat-ing a binary file can leave the terminal showing garbled text
I. Terminal display
You designate hard character sets (ASCII, DEC supplemental graphics, DEC special graphics, and national replacement character sets) as G0 through G3, using the following escape sequence formats.
Escape Sequence | Designated As |
---|---|
1/11 2/8 ESC ( {final} | G0 |
1/11 2/9 ESC ) {final} | G1 |
1/11 2/10 ESC * {final} | G2 |
1/11 2/11 ESC + {final} | G3 |
NOTE: You cannot designate a G2 and G3 set in VT100 mode.
The final character in the escape sequences above represents the character set you want to designate. Table 4-4 lists the available character sets and their associated final character.
To designate a character set as any of the graphic sets (G0 through G3), you must include a final character in one of the escape sequences in Table 4-4. For example, to designate the ASCII character set as the G0 graphic set you would use the following escape sequence.
ESC ( B
To designate the ASCII character set as the G2 graphic set you would use this escape sequence.
ESC * B
Note that there is a definite pattern of escape sequences in the designation process.
IV. About character sets
2.4 Character Sets
You cannot change the functions of the C0 or C1 codes. However, you can map different sets of graphic characters into the GL and/or GR codes. The sets are stored in the terminal as a graphic repertoire. But you cannot use these graphics character sets until you map them into the GL or GR codes. Chapter 4 describes the commands for mapping graphic character sets into GL or GR.
The terminal's graphic repertoire consists of the following character sets, described in the following sections.
- DEC multinational (consists of the ASCII graphics set and DEC supplemental graphics set)
- DEC special graphics
- National replacement character (NRC) sets
- Down-line-loadable
2.4.1 DEC Multinational Character Set
By factory default, when you power up or reset the terminal, the DEC multinational character set (Table 2-3) is mapped into the 8-bit code matrix (columns 0 through 15).
The 7-bit compatible left half of the DEC multinational set is the ASCII graphics set. The C0 codes are the ASCII control characters, and the GL codes are the ASCII graphics set.
The 8-bit compatible right half of the DEC multinational set includes the C1 8-bit control characters in columns 8 and 9. The GR codes are the DEC supplemental graphics set. The DEC supplemental graphics set has alphabetic characters with accents and diacritical marks that appear in the major Western European alphabets. It also has other symbols not included in the ASCII graphics set.
The terminal can work with over a dozen national (Western European) keyboards. All keyboards assume the default DEC multinational character set mapping. The code descriptions in the rest of this manual also assume this mapping. Various characters from the DEC supplemental graphics set appear as standard (printing character) keys on different keyboards.
The DEC supplemental graphics character set is not available in VT52 and VT100 modes.
2.4.2 DEC Special Graphics Character Set
The terminal's graphic repertoire includes the DEC special graphics set (also known as the VT100 line drawing character set). This character set (Table 2-4) has about two-thirds of the ASCII graphic characters. It also has special symbols and short line segments. The line segments let you create a limited range of pictures while still using text mode.
Commands described in Chapter 4 let you map the DEC special graphics set into either GL or GR, replacing either the ASCII graphics set or the DEC supplemental graphics set. Digital recommends that you switch between ASCII and DEC special graphics in GL, because the latter has most of the ASCII graphic characters. Also, this mapping is compatible with a VT100 terminal.
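Table 2-4 in this excerpt shows exactly the so-called "garbage" we see on screen. Note that uppercase letters stay the same in the new set: in the DEC set only the 32 codes from 0x5F to 0x7E (underscore, backquote, the lowercase letters and { | } ~) are replaced. The "special graphics set", also called the "line drawing character set", is meant for drawing simple lines; driven by a program, it is enough for the most primitive "RPG" games.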
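V. How to fix it
tsecer@ harry echo -e '\016'    (printing SO, 0x0E, switches GL to the G1 set; the lines below are what the screen showed afterwards)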
+_ece_@ ha__y +_
.ba_h_hi_+-_y .__h2 H-+_T-_C-+fig_S+SE10 chec+.+-g +5age++-bi+32-3.3.2.+a_.gz _e_+ice_
.c-a+ ._+b+e__i-+ age++_fi+e__c c-+-1.+|+ _e--_+.ac+i-+ _+a+ic+e_+
.+e__h_+ .+i+i+f- a++_+i_.+|+ +5_age++ _c_-_-+ec+_+0.01 +a--
.+y_-+_hi_+-_y Cha+geL-g_2.6 bac++- +5age++-bi+32-3.3.2 _ec+_i+y_bac++- +i+de+-
+_ece_@ ha__y UPPER    (uppercase letters are displayed unchanged)
-ba_h: UPPER: c-++a+d +-+ f-++d    (this is "-bash: UPPER: command not found")
+_ece_@ ha__y ech- -e '\017'    (running echo -e '\017' sends SI, 0x0F, and the display returns to normal)
tsecer@ harry
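VI. How the kernel VT handles this
In the 7-bit era these special characters were produced by interpreting the same byte values as different glyphs. Since the move to Unicode, each of them has its own unique code point. The kernel defines these mappings in consolemap.c: for each special mode there is a table giving the Unicode code point of every byte: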
static unsigned short translations[][256] = {
/* 8-bit Latin-1 mapped to Unicode -- trivial mapping */
{
0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007,
0x0008, 0x0009, 0x000a, 0x000b, 0x000c, 0x000d, 0x000e, 0x000f,
0x0010, 0x0011, 0x0012, 0x0013, 0x0014, 0x0015, 0x0016, 0x0017,
0x0018, 0x0019, 0x001a, 0x001b, 0x001c, 0x001d, 0x001e, 0x001f,
0x0020, 0x0021, 0x0022, 0x0023, 0x0024, 0x0025, 0x0026, 0x0027,
0x0028, 0x0029, 0x002a, 0x002b, 0x002c, 0x002d, 0x002e, 0x002f,
0x0030, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037,
0x0038, 0x0039, 0x003a, 0x003b, 0x003c, 0x003d, 0x003e, 0x003f,
0x0040, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047,
0x0048, 0x0049, 0x004a, 0x004b, 0x004c, 0x004d, 0x004e, 0x004f,
0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055, 0x0056, 0x0057,
0x0058, 0x0059, 0x005a, 0x005b, 0x005c, 0x005d, 0x005e, 0x005f,
0x0060, 0x0061, 0x0062, 0x0063, 0x0064, 0x0065, 0x0066, 0x0067,
0x0068, 0x0069, 0x006a, 0x006b, 0x006c, 0x006d, 0x006e, 0x006f,
0x0070, 0x0071, 0x0072, 0x0073, 0x0074, 0x0075, 0x0076, 0x0077,
0x0078, 0x0079, 0x007a, 0x007b, 0x007c, 0x007d, 0x007e, 0x007f,
0x0080, 0x0081, 0x0082, 0x0083, 0x0084, 0x0085, 0x0086, 0x0087,
0x0088, 0x0089, 0x008a, 0x008b, 0x008c, 0x008d, 0x008e, 0x008f,
0x0090, 0x0091, 0x0092, 0x0093, 0x0094, 0x0095, 0x0096, 0x0097,
0x0098, 0x0099, 0x009a, 0x009b, 0x009c, 0x009d, 0x009e, 0x009f,
0x00a0, 0x00a1, 0x00a2, 0x00a3, 0x00a4, 0x00a5, 0x00a6, 0x00a7,
0x00a8, 0x00a9, 0x00aa, 0x00ab, 0x00ac, 0x00ad, 0x00ae, 0x00af,
0x00b0, 0x00b1, 0x00b2, 0x00b3, 0x00b4, 0x00b5, 0x00b6, 0x00b7,
0x00b8, 0x00b9, 0x00ba, 0x00bb, 0x00bc, 0x00bd, 0x00be, 0x00bf,
0x00c0, 0x00c1, 0x00c2, 0x00c3, 0x00c4, 0x00c5, 0x00c6, 0x00c7,
0x00c8, 0x00c9, 0x00ca, 0x00cb, 0x00cc, 0x00cd, 0x00ce, 0x00cf,
0x00d0, 0x00d1, 0x00d2, 0x00d3, 0x00d4, 0x00d5, 0x00d6, 0x00d7,
0x00d8, 0x00d9, 0x00da, 0x00db, 0x00dc, 0x00dd, 0x00de, 0x00df,
0x00e0, 0x00e1, 0x00e2, 0x00e3, 0x00e4, 0x00e5, 0x00e6, 0x00e7,
0x00e8, 0x00e9, 0x00ea, 0x00eb, 0x00ec, 0x00ed, 0x00ee, 0x00ef,
0x00f0, 0x00f1, 0x00f2, 0x00f3, 0x00f4, 0x00f5, 0x00f6, 0x00f7,
0x00f8, 0x00f9, 0x00fa, 0x00fb, 0x00fc, 0x00fd, 0x00fe, 0x00ff
},
/* VT100 graphics mapped to Unicode */
{
0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007,
0x0008, 0x0009, 0x000a, 0x000b, 0x000c, 0x000d, 0x000e, 0x000f,
0x0010, 0x0011, 0x0012, 0x0013, 0x0014, 0x0015, 0x0016, 0x0017,
0x0018, 0x0019, 0x001a, 0x001b, 0x001c, 0x001d, 0x001e, 0x001f,
0x0020, 0x0021, 0x0022, 0x0023, 0x0024, 0x0025, 0x0026, 0x0027,
0x0028, 0x0029, 0x002a, 0x2192, 0x2190, 0x2191, 0x2193, 0x002f,
0x2588, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037,
0x0038, 0x0039, 0x003a, 0x003b, 0x003c, 0x003d, 0x003e, 0x003f,
0x0040, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047,
0x0048, 0x0049, 0x004a, 0x004b, 0x004c, 0x004d, 0x004e, 0x004f,
0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055, 0x0056, 0x0057,
0x0058, 0x0059, 0x005a, 0x005b, 0x005c, 0x005d, 0x005e, 0x00a0,
0x25c6, 0x2592, 0x2409, 0x240c, 0x240d, 0x240a, 0x00b0, 0x00b1,
0x2591, 0x240b, 0x2518, 0x2510, 0x250c, 0x2514, 0x253c, 0x23ba,
0x23bb, 0x2500, 0x23bc, 0x23bd, 0x251c, 0x2524, 0x2534, 0x252c,
0x2502, 0x2264, 0x2265, 0x03c0, 0x2260, 0x00a3, 0x00b7, 0x007f,
0x0080, 0x0081, 0x0082, 0x0083, 0x0084, 0x0085, 0x0086, 0x0087,
0x0088, 0x0089, 0x008a, 0x008b, 0x008c, 0x008d, 0x008e, 0x008f,
0x0090, 0x0091, 0x0092, 0x0093, 0x0094, 0x0095, 0x0096, 0x0097,
0x0098, 0x0099, 0x009a, 0x009b, 0x009c, 0x009d, 0x009e, 0x009f,
0x00a0, 0x00a1, 0x00a2, 0x00a3, 0x00a4, 0x00a5, 0x00a6, 0x00a7,
0x00a8, 0x00a9, 0x00aa, 0x00ab, 0x00ac, 0x00ad, 0x00ae, 0x00af,
0x00b0, 0x00b1, 0x00b2, 0x00b3, 0x00b4, 0x00b5, 0x00b6, 0x00b7,
0x00b8, 0x00b9, 0x00ba, 0x00bb, 0x00bc, 0x00bd, 0x00be, 0x00bf,
0x00c0, 0x00c1, 0x00c2, 0x00c3, 0x00c4, 0x00c5, 0x00c6, 0x00c7,
0x00c8, 0x00c9, 0x00ca, 0x00cb, 0x00cc, 0x00cd, 0x00ce, 0x00cf,
0x00d0, 0x00d1, 0x00d2, 0x00d3, 0x00d4, 0x00d5, 0x00d6, 0x00d7,
0x00d8, 0x00d9, 0x00da, 0x00db, 0x00dc, 0x00dd, 0x00de, 0x00df,
0x00e0, 0x00e1, 0x00e2, 0x00e3, 0x00e4, 0x00e5, 0x00e6, 0x00e7,
0x00e8, 0x00e9, 0x00ea, 0x00eb, 0x00ec, 0x00ed, 0x00ee, 0x00ef,
0x00f0, 0x00f1, 0x00f2, 0x00f3, 0x00f4, 0x00f5, 0x00f6, 0x00f7,
0x00f8, 0x00f9, 0x00fa, 0x00fb, 0x00fc, 0x00fd, 0x00fe, 0x00ff
},
……