Why the screen may turn to garbage after you cat a binary file

1. Terminal display

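Most SecureCRT users have probably been through this: you accidentally cat a binary file, the whole screen turns to garbage, neither reset nor stty sane brings it back, and the only way out is to open a new terminal. I already knew that the SO (Shift Out) control character was responsible, but I had never looked into the deeper reason. Later I tried to reproduce the problem in PuTTY, another (open-source) terminal, to see what was really going on, but it did not show up there, so this appears to be SecureCRT behavior (any other terminal that acts on the SO control character would be affected in the same way).
Searching Google for "secureCRT shift out" turns up the release notes for SecureCRT(R) 7.2 (Beta) -- December 5, 2013: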
  - Added a session INI-file-only option "Ignore Shift Out Sequence"
    that prevents SecureCRT from going into graphics mode when Shift
    Out (\016) is received.
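In other words, when the terminal receives this character it enters graphics mode by default. The option to ignore it only exists from the 7.2 beta onward; those release notes are dated December 5, 2013, and the versions most people run are older than that.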
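
Before cat-ing a file of unknown type, it can help to check whether it contains the shift characters at all. The following is a minimal sketch; the function names are made up for this illustration. It also assumes the terminal shows line-drawing characters after SO, as SecureCRT does according to the release notes above.

```python
from typing import List, Tuple

SO = 0x0E  # Shift Out, written \016 in octal
SI = 0x0F  # Shift In, written \017 in octal


def find_shift_bytes(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, name) for every SO or SI byte found in data."""
    names = {SO: "SO", SI: "SI"}
    return [(i, names[b]) for i, b in enumerate(data) if b in names]


def leaves_terminal_shifted(data: bytes) -> bool:
    """True if the last shift byte in data is SO, so the terminal stays shifted."""
    hits = find_shift_bytes(data)
    return bool(hits) and hits[-1][1] == "SO"
```

For a real file one would call `leaves_terminal_shifted(open(path, "rb").read())`, where `path` is whatever file is about to be displayed.
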
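2. The model of the original terminals
The most popular early terminal was DEC's VT100. Displaying Western text was nowhere near as hard as displaying Chinese, but such terminals still faced a multi-language problem. Fortunately most users at the time were in Europe and wrote with alphabetic scripts, so ASCII covered most needs. The terminal's approach was to interpret the same code differently: set it to French and some of the same ASCII codes stand for different characters. That "interpretation" is really just the way a code is drawn on the screen. For the terminal hardware, drawing a letter comes down to a font stored in ROM. Anyone who has taken a microcomputer principles course will know that the display hardware stores a dot matrix for each character, or pixel data if you prefer.
Terminals of that era defined a set of control sequences (introduced by ESC; think of them as instructions sent from one machine to another) that select the language the terminal displays. Note that although the display changes, the host and the terminal still exchange 7-bit ASCII; selecting a language only tells the hardware which font to use when drawing each ASCII code. You can verify this with SecureCRT's session logging: after the screen has turned to garbage, the local log file still contains the correct ASCII, which shows that Shift Out only changes how the same ASCII codes are displayed.
Terminals of this family can hold several designated character sets at once: G0 through G3, of which only G0 and G1 are available in VT100 mode. Designation sequences decide which set occupies which slot, and the slot currently mapped into GL (the one used to display 7-bit codes) is switched with the Shift In and Shift Out control characters: SI maps G0 into GL and SO maps G1 into GL.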
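
The model can be sketched as a toy state machine. This is an illustration only, not how any real emulator is written: `ToyTerminal` is a hypothetical class, the graphics set holds just four entries, and putting a line-drawing set in G1 by default is an assumption based on the SecureCRT behavior described earlier.

```python
SO, SI = 0x0E, 0x0F

# G0: plain ASCII, so nothing is remapped.
ASCII_SET = {}
# G1: a four-entry excerpt of a line-drawing set, for illustration only.
GRAPHICS_SET = {ord("q"): "─", ord("x"): "│", ord("l"): "┌", ord("k"): "┐"}


class ToyTerminal:
    def __init__(self):
        self.slots = [ASCII_SET, GRAPHICS_SET]  # G0 and G1
        self.gl = 0                             # index of the slot mapped into GL
        self.log = bytearray()                  # raw bytes as received from the host

    def feed(self, data: bytes) -> str:
        """Consume bytes from the host and return what would be drawn."""
        drawn = []
        for b in data:
            self.log.append(b)   # the log records the byte before any interpretation
            if b == SO:
                self.gl = 1      # SO maps G1 into GL
            elif b == SI:
                self.gl = 0      # SI maps G0 into GL
            else:
                drawn.append(self.slots[self.gl].get(b, chr(b)))
        return "".join(drawn)
```

After feeding `b"\x0eqx"`, the terminal draws line segments while `log` still holds the untouched bytes `q` and `x`, which matches the observation about the session log.
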
[Figure: diagram from the vt200 site; see the original image location]
3. How G0 through G3 are designated

You designate hard character sets (ASCII, DEC supplemental graphics, DEC special graphics, and national replacement character sets) as G0 through G3, using the following escape sequence formats.

Escape Sequence        Column/Row      Designated As
ESC   (   {final}      1/11 2/8        G0
ESC   )   {final}      1/11 2/9        G1
ESC   *   {final}      1/11 2/10       G2
ESC   +   {final}      1/11 2/11       G3

NOTE: You cannot designate a G2 and G3 set in VT100 mode.

The final character in the escape sequences above represents the character set you want to designate. Table 4-4 lists the available character sets and their associated final character.

To designate a character set as any of the graphic sets (G0 through G3), you must include a final character in one of the escape sequences in Table 4-4. For example, to designate the ASCII character set as the G0 graphic set you would use the following escape sequence.

   ESC  (  B

To designate the ASCII character set as the G2 graphic set you would use this escape sequence.

   ESC  *  B

Note that there is a definite pattern of escape sequences in the designation process.
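
The pattern can be written down as a small helper. This is a sketch; `designate` and `column_row` are names invented here. The only final character used is `B` for ASCII, which is the one given in the quoted text.

```python
ESC = b"\x1b"

# Intermediate character that selects the slot, as in the table above.
INTERMEDIATE = {"G0": b"(", "G1": b")", "G2": b"*", "G3": b"+"}


def designate(slot: str, final: str) -> bytes:
    """Build the escape sequence that designates a character set into a slot."""
    if slot not in INTERMEDIATE:
        raise ValueError("slot must be one of G0, G1, G2, G3")
    if len(final) != 1:
        raise ValueError("final must be a single character")
    return ESC + INTERMEDIATE[slot] + final.encode("ascii")


def column_row(byte: int) -> str:
    """Format a code in the manual's column/row notation, e.g. 0x1B -> '1/11'."""
    return f"{byte >> 4}/{byte & 0x0F}"
```

`designate("G0", "B")` yields the bytes for `ESC ( B`, and `column_row(ord("("))` gives `2/8`, matching the first row of the table.
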

4. Notes on the character sets

Quoted from the following address:

2.4 Character Sets

You cannot change the functions of the C0 or C1 codes. However, you can map different sets of graphic characters into the GL and/or GR codes. The sets are stored in the terminal as a graphic repertoire. But you cannot use these graphics character sets until you map them into the GL or GR codes. Chapter 4 describes the commands for mapping graphic character sets into GL or GR.

The terminal's graphic repertoire consists of the following character sets, described in the following sections.

  • DEC multinational (consists of the ASCII graphics set and DEC supplemental graphics set)
  • DEC special graphics
  • National replacement character (NRC) sets
  • Down-line-loadable

2.4.1 DEC Multinational Character Set

By factory default, when you power up or reset the terminal, the DEC multinational character set (Table 2-3) is mapped into the 8-bit code matrix (columns 0 through 15).

The 7-bit compatible left half of the DEC multinational set is the ASCII graphics set. The C0 codes are the ASCII control characters, and the GL codes are the ASCII graphics set.

The 8-bit compatible right half of the DEC multinational set includes the C1 8-bit control characters in columns 8 and 9. The GR codes are the DEC supplemental graphics set. The DEC supplemental graphics set has alphabetic characters with accents and diacritical marks that appear in the major Western European alphabets. It also has other symbols not included in the ASCII graphics set.

The terminal can work with over a dozen national (Western European) keyboards. All keyboards assume the default DEC multinational character set mapping. The code descriptions in the rest of this manual also assume this mapping. Various characters from the DEC supplemental graphics set appear as standard (printing character) keys on different keyboards.

The DEC supplemental graphics character set is not available in VT52 and VT100 modes.

2.4.2 DEC Special Graphics Character Set

The terminal's graphic repertoire includes the DEC special graphics set (also known as the VT100 line drawing character set). This character set (Table 2-4) has about two-thirds of the ASCII graphic characters. It also has special symbols and short line segments. The line segments let you create a limited range of pictures while still using text mode.

Commands described in Chapter 4 let you map the DEC special graphics set into either GL or GR, replacing either the ASCII graphics set or the DEC supplemental graphics set. Digital recommends that you switch between ASCII and DEC special graphics in GL, because the latter has most of the ASCII graphic characters. Also, this mapping is compatible with a VT100 terminal.

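Table 2-4 in that manual shows exactly the "garbage" we see on screen, and you can see that the uppercase English letters stay the same in the new set. The "special graphics set" is also called the "line drawing character set". It is meant for drawing simple lines, and under program control it is enough to build the most primitive kind of "RPG" game.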
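
As a small illustration of the line-drawing set, the sketch below builds the bytes for a rectangle. It follows the manual's advice to switch between ASCII and DEC special graphics in GL by re-designating G0. Two things here are not in the quoted text: the final character `0` for the special graphics set, and the letters `l q k x m j` for the corner and edge glyphs. Both come from DEC's documentation of that set. `box` is a name made up for this example.

```python
ESC = "\x1b"
ENTER_GRAPHICS = ESC + "(0"  # designate DEC special graphics as G0
LEAVE_GRAPHICS = ESC + "(B"  # designate ASCII as G0 again


def box(width: int, height: int) -> str:
    """Return the characters that draw a width x height rectangle."""
    if width < 2 or height < 2:
        raise ValueError("width and height must be at least 2")
    top = "l" + "q" * (width - 2) + "k"     # upper-left, horizontal, upper-right
    middle = "x" + " " * (width - 2) + "x"  # vertical edges
    bottom = "m" + "q" * (width - 2) + "j"  # lower-left, horizontal, lower-right
    rows = [top] + [middle] * (height - 2) + [bottom]
    return ENTER_GRAPHICS + "\r\n".join(rows) + LEAVE_GRAPHICS + "\r\n"
```

Writing `box(10, 4)` to a terminal that implements the set should draw a frame; the string ends by switching back to ASCII so later output is not affected.
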

5. How to fix it:

tsecer@ harry echo -e '\016'    <- the display turns to garbage once SO is received

+_ece_@ ha__y +_
.ba_h_hi_+-_y   .__h2          H-+_T-_C-+fig_S+SE10  chec+.+-g            +5age++-bi+32-3.3.2.+a_.gz  _e_+ice_
.c-a+           ._+b+e__i-+    age++_fi+e__c         c-+-1.+|+            _e--_+.ac+i-+               _+a+ic+e_+
.+e__h_+        .+i+i+f-       a++_+i_.+|+           +5_age++             _c_-_-+ec+_+0.01            +a--
.+y_-+_hi_+-_y  Cha+geL-g_2.6  bac++-                +5age++-bi+32-3.3.2  _ec+_i+y_bac++-             +i+de+-
+_ece_@ ha__y UPPER
-ba_h: UPPER: c-++a+d +-+ f-++d
+_ece_@ ha__y ech- -e '\017'    <- running echo -e '\017' restores the display

tsecer@ harry 
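
The same recovery can be scripted. The sketch below sends SI, as in the transcript, and additionally re-designates ASCII into G0 with `ESC ( B` from section 3, in case G0 itself was changed. That extra step is an assumption of this example and is not something the transcript needed. `recovery_bytes` and `restore` are hypothetical names, and whether a particular emulator honors the sequence has not been checked here.

```python
import sys

SI = b"\x0f"                 # Shift In: map G0 back into GL
ASCII_INTO_G0 = b"\x1b(B"    # ESC ( B: designate ASCII as G0


def recovery_bytes() -> bytes:
    """Bytes that should return a shifted terminal to plain ASCII display."""
    return ASCII_INTO_G0 + SI


def restore(stream=None) -> None:
    """Write the recovery bytes to a binary stream (stdout by default)."""
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(recovery_bytes())
    stream.flush()
```
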

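6. How the kernel vt code handles this

In the 7-bit era these special characters were produced by drawing the same internal code in different ways. With Unicode, each of them has its own unique code point. The kernel's consolemap.c defines these mappings, giving the corresponding Unicode code point for the characters of each special mode: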

static unsigned short translations[][256] = {
  /* 8-bit Latin-1 mapped to Unicode -- trivial mapping */
  {
    0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007,
    0x0008, 0x0009, 0x000a, 0x000b, 0x000c, 0x000d, 0x000e, 0x000f,
    0x0010, 0x0011, 0x0012, 0x0013, 0x0014, 0x0015, 0x0016, 0x0017,
    0x0018, 0x0019, 0x001a, 0x001b, 0x001c, 0x001d, 0x001e, 0x001f,
    0x0020, 0x0021, 0x0022, 0x0023, 0x0024, 0x0025, 0x0026, 0x0027,
    0x0028, 0x0029, 0x002a, 0x002b, 0x002c, 0x002d, 0x002e, 0x002f,
    0x0030, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037,
    0x0038, 0x0039, 0x003a, 0x003b, 0x003c, 0x003d, 0x003e, 0x003f,
    0x0040, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047,
    0x0048, 0x0049, 0x004a, 0x004b, 0x004c, 0x004d, 0x004e, 0x004f,
    0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055, 0x0056, 0x0057,
    0x0058, 0x0059, 0x005a, 0x005b, 0x005c, 0x005d, 0x005e, 0x005f,
    0x0060, 0x0061, 0x0062, 0x0063, 0x0064, 0x0065, 0x0066, 0x0067,
    0x0068, 0x0069, 0x006a, 0x006b, 0x006c, 0x006d, 0x006e, 0x006f,
    0x0070, 0x0071, 0x0072, 0x0073, 0x0074, 0x0075, 0x0076, 0x0077,
    0x0078, 0x0079, 0x007a, 0x007b, 0x007c, 0x007d, 0x007e, 0x007f,
    0x0080, 0x0081, 0x0082, 0x0083, 0x0084, 0x0085, 0x0086, 0x0087,
    0x0088, 0x0089, 0x008a, 0x008b, 0x008c, 0x008d, 0x008e, 0x008f,
    0x0090, 0x0091, 0x0092, 0x0093, 0x0094, 0x0095, 0x0096, 0x0097,
    0x0098, 0x0099, 0x009a, 0x009b, 0x009c, 0x009d, 0x009e, 0x009f,
    0x00a0, 0x00a1, 0x00a2, 0x00a3, 0x00a4, 0x00a5, 0x00a6, 0x00a7,
    0x00a8, 0x00a9, 0x00aa, 0x00ab, 0x00ac, 0x00ad, 0x00ae, 0x00af,
    0x00b0, 0x00b1, 0x00b2, 0x00b3, 0x00b4, 0x00b5, 0x00b6, 0x00b7,
    0x00b8, 0x00b9, 0x00ba, 0x00bb, 0x00bc, 0x00bd, 0x00be, 0x00bf,
    0x00c0, 0x00c1, 0x00c2, 0x00c3, 0x00c4, 0x00c5, 0x00c6, 0x00c7,
    0x00c8, 0x00c9, 0x00ca, 0x00cb, 0x00cc, 0x00cd, 0x00ce, 0x00cf,
    0x00d0, 0x00d1, 0x00d2, 0x00d3, 0x00d4, 0x00d5, 0x00d6, 0x00d7,
    0x00d8, 0x00d9, 0x00da, 0x00db, 0x00dc, 0x00dd, 0x00de, 0x00df,
    0x00e0, 0x00e1, 0x00e2, 0x00e3, 0x00e4, 0x00e5, 0x00e6, 0x00e7,
    0x00e8, 0x00e9, 0x00ea, 0x00eb, 0x00ec, 0x00ed, 0x00ee, 0x00ef,
    0x00f0, 0x00f1, 0x00f2, 0x00f3, 0x00f4, 0x00f5, 0x00f6, 0x00f7,
    0x00f8, 0x00f9, 0x00fa, 0x00fb, 0x00fc, 0x00fd, 0x00fe, 0x00ff
  },
  /* VT100 graphics mapped to Unicode */
  {
    0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007,
    0x0008, 0x0009, 0x000a, 0x000b, 0x000c, 0x000d, 0x000e, 0x000f,
    0x0010, 0x0011, 0x0012, 0x0013, 0x0014, 0x0015, 0x0016, 0x0017,
    0x0018, 0x0019, 0x001a, 0x001b, 0x001c, 0x001d, 0x001e, 0x001f,
    0x0020, 0x0021, 0x0022, 0x0023, 0x0024, 0x0025, 0x0026, 0x0027,
    0x0028, 0x0029, 0x002a, 0x2192, 0x2190, 0x2191, 0x2193, 0x002f,
    0x2588, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037,
    0x0038, 0x0039, 0x003a, 0x003b, 0x003c, 0x003d, 0x003e, 0x003f,
    0x0040, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047,
    0x0048, 0x0049, 0x004a, 0x004b, 0x004c, 0x004d, 0x004e, 0x004f,
    0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055, 0x0056, 0x0057,
    0x0058, 0x0059, 0x005a, 0x005b, 0x005c, 0x005d, 0x005e, 0x00a0,
    0x25c6, 0x2592, 0x2409, 0x240c, 0x240d, 0x240a, 0x00b0, 0x00b1,
    0x2591, 0x240b, 0x2518, 0x2510, 0x250c, 0x2514, 0x253c, 0x23ba,
    0x23bb, 0x2500, 0x23bc, 0x23bd, 0x251c, 0x2524, 0x2534, 0x252c,
    0x2502, 0x2264, 0x2265, 0x03c0, 0x2260, 0x00a3, 0x00b7, 0x007f,
    0x0080, 0x0081, 0x0082, 0x0083, 0x0084, 0x0085, 0x0086, 0x0087,
    0x0088, 0x0089, 0x008a, 0x008b, 0x008c, 0x008d, 0x008e, 0x008f,
    0x0090, 0x0091, 0x0092, 0x0093, 0x0094, 0x0095, 0x0096, 0x0097,
    0x0098, 0x0099, 0x009a, 0x009b, 0x009c, 0x009d, 0x009e, 0x009f,
    0x00a0, 0x00a1, 0x00a2, 0x00a3, 0x00a4, 0x00a5, 0x00a6, 0x00a7,
    0x00a8, 0x00a9, 0x00aa, 0x00ab, 0x00ac, 0x00ad, 0x00ae, 0x00af,
    0x00b0, 0x00b1, 0x00b2, 0x00b3, 0x00b4, 0x00b5, 0x00b6, 0x00b7,
    0x00b8, 0x00b9, 0x00ba, 0x00bb, 0x00bc, 0x00bd, 0x00be, 0x00bf,
    0x00c0, 0x00c1, 0x00c2, 0x00c3, 0x00c4, 0x00c5, 0x00c6, 0x00c7,
    0x00c8, 0x00c9, 0x00ca, 0x00cb, 0x00cc, 0x00cd, 0x00ce, 0x00cf,
    0x00d0, 0x00d1, 0x00d2, 0x00d3, 0x00d4, 0x00d5, 0x00d6, 0x00d7,
    0x00d8, 0x00d9, 0x00da, 0x00db, 0x00dc, 0x00dd, 0x00de, 0x00df,
    0x00e0, 0x00e1, 0x00e2, 0x00e3, 0x00e4, 0x00e5, 0x00e6, 0x00e7,
    0x00e8, 0x00e9, 0x00ea, 0x00eb, 0x00ec, 0x00ed, 0x00ee, 0x00ef,
    0x00f0, 0x00f1, 0x00f2, 0x00f3, 0x00f4, 0x00f5, 0x00f6, 0x00f7,
    0x00f8, 0x00f9, 0x00fa, 0x00fb, 0x00fc, 0x00fd, 0x00fe, 0x00ff
  },

……
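
To see what the second table does to ordinary text, the sketch below rebuilds it in Python from the entries that differ from the trivial Latin-1 table. The values are copied from the excerpt above. The names `LAT1`, `GRAF`, `build_graphics_table` and `translate` are made up for this illustration and are not kernel identifiers.

```python
# Entries 0x60..0x7e of the "VT100 graphics" table, copied from the excerpt.
_RUN_60_7E = [
    0x25C6, 0x2592, 0x2409, 0x240C, 0x240D, 0x240A, 0x00B0, 0x00B1,
    0x2591, 0x240B, 0x2518, 0x2510, 0x250C, 0x2514, 0x253C, 0x23BA,
    0x23BB, 0x2500, 0x23BC, 0x23BD, 0x251C, 0x2524, 0x2534, 0x252C,
    0x2502, 0x2264, 0x2265, 0x03C0, 0x2260, 0x00A3, 0x00B7,
]


def build_graphics_table():
    """Start from the trivial mapping and apply the graphics overrides."""
    table = list(range(256))
    table[0x2B:0x2F] = [0x2192, 0x2190, 0x2191, 0x2193]  # arrows
    table[0x30] = 0x2588                                  # full block
    table[0x5F] = 0x00A0                                  # no-break space
    table[0x60:0x7F] = _RUN_60_7E                         # line drawing and symbols
    return table


LAT1 = list(range(256))        # first table: every code maps to itself
GRAF = build_graphics_table()  # second table


def translate(data: bytes, table) -> str:
    """Map each byte through a translation table to a Unicode string."""
    return "".join(chr(table[b]) for b in data)
```

With these tables, `translate(b"UPPER", GRAF)` is unchanged while `translate(b"lqk", GRAF)` becomes the top edge of a box, consistent with the transcript in which only the uppercase command survived.
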

posted on 2019-03-07 09:44  tsecer