ASCII and Unicode
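ASCII (American Standard Code for Information Interchange) and Unicode are both character encoding standards. Unicode has replaced ASCII as the dominant character encoding scheme because it is far more comprehensive and supports many more languages and symbols. Its wide adoption makes representing characters across different languages and cultures simpler and more uniform.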
Scope
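ASCII is a 7-bit character encoding standard that defines 128 characters (codes 0-127): 26 uppercase letters, 26 lowercase letters, the digits 0-9, 32 punctuation marks and other symbols, the space, and 33 control characters (codes 0-31 and 127).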
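The character counts above can be checked with a short Python sketch. It relies only on the standard `string` module and on `str.isprintable()`, which returns `False` for control characters. The variable names are illustrative.

```python
import string

# All 128 ASCII characters, code points 0 through 127
ascii_chars = [chr(code) for code in range(128)]

# Control characters are the ones str.isprintable() rejects: codes 0-31 and 127
controls = [c for c in ascii_chars if not c.isprintable()]

print(len(string.ascii_uppercase))  # 26
print(len(string.ascii_lowercase))  # 26
print(len(string.digits))           # 10
print(len(string.punctuation))      # 32 printable symbols (punctuation marks etc.)
print(len(controls))                # 33

# 26 + 26 + 10 + 32 + 1 (space) + 33 = 128
total = (len(string.ascii_uppercase) + len(string.ascii_lowercase)
         + len(string.digits) + len(string.punctuation) + 1 + len(controls))
print(total)                        # 128
```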
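Unicode is a far larger character set. It aims to represent almost every character and symbol in use worldwide, including the scripts of many languages, special symbols, and emoji. Unicode itself assigns each character a number called a code point, from U+0000 to U+10FFFF, and its first 128 code points are identical to ASCII. How code points are stored as bytes is defined by separate encoding forms. The most common are UTF-8 and UTF-16, which are both variable-length, and UTF-32, which is fixed-length.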
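Unicode gives every character a number called a code point. In Python, `ord()` returns a character's code point and `chr()` does the reverse, which makes both the code-point model and the ASCII subset easy to inspect. This is a minimal sketch; the sample characters are arbitrary.

```python
import sys

# Code points are conventionally written as U+ followed by at least four hex digits
for ch in ["A", "é", "中", "😀"]:
    print(ch, f"U+{ord(ch):04X}")
# A U+0041
# é U+00E9
# 中 U+4E2D
# 😀 U+1F600

# The largest code point Unicode defines
print(f"U+{sys.maxunicode:04X}")  # U+10FFFF

# Code points 0-127 are exactly ASCII, so pure-ASCII text yields
# identical bytes whether it is encoded as ASCII or as UTF-8
text = "Hello, ASCII!"
print(text.encode("ascii") == text.encode("utf-8"))  # True
```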
Encoding
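ASCII uses a fixed-length encoding. Each character is a 7-bit code, which in practice is stored in one 8-bit byte with the highest bit set to 0.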
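Python's `bytes` type makes the "7 bits stored in one byte" layout visible. This sketch encodes a short ASCII string and prints each byte in binary. The leading (eighth) bit is always 0.

```python
data = "Hello".encode("ascii")  # one byte per character

print(list(data))  # [72, 101, 108, 108, 111]
print([format(b, "08b") for b in data])
# ['01001000', '01100101', '01101100', '01101100', '01101111']

# Every ASCII byte is below 0x80 (128), i.e. its highest bit is 0
print(all(b < 0x80 for b in data))  # True

# The largest ASCII code, 127 (DEL), is seven 1-bits
print(format(127, "b"))  # 1111111
```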
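How many bytes a Unicode character occupies depends on the encoding form:
- In UTF-8, an ASCII character takes 1 byte and any other character takes 2 to 4 bytes; most Chinese characters take 3.
- In UTF-16, a character takes 2 or 4 bytes.
- In UTF-32, every character takes exactly 4 bytes.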
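The byte counts described above can be reproduced with Python's built-in codecs. The helper `encoded_sizes` is hypothetical and written only for this illustration. It uses the `-le` (little-endian) codec variants because plain `"utf-16"` and `"utf-32"` prepend a byte-order mark (BOM) that would inflate the counts.

```python
def encoded_sizes(ch):
    """Hypothetical helper: byte length of ch in UTF-8, UTF-16 and UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # -le: no byte-order mark is added
        len(ch.encode("utf-32-le")),
    )

for ch in ["A", "é", "中", "😀"]:
    print(ch, encoded_sizes(ch))
# A (1, 2, 4)
# é (2, 2, 4)
# 中 (3, 2, 4)
# 😀 (4, 4, 4)
```

The emoji needs 4 bytes in UTF-16 because its code point (U+1F600) lies above U+FFFF, so it is stored as a surrogate pair.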
Coverage
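ASCII covers only the unaccented Latin alphabet. That suffices for English text, but ASCII cannot represent accented letters (such as é) or the characters of languages such as Chinese, Japanese, or Russian.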
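Encoding non-ASCII text with Python's `ascii` codec shows this limitation directly. The function `to_ascii` is a hypothetical helper for illustration. It catches the `UnicodeEncodeError` that `str.encode` raises and reports the first character that ASCII cannot represent.

```python
def to_ascii(text):
    """Hypothetical helper: encode text as ASCII, or describe why it fails."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as err:
        # err.start is the index of the first character outside 0-127
        return f"cannot encode {text[err.start]!r} at index {err.start}"

print(to_ascii("hello"))  # b'hello'
print(to_ascii("café"))   # cannot encode 'é' at index 3
print(to_ascii("中文"))    # cannot encode '中' at index 0

# errors="replace" substitutes "?" for each character ASCII lacks
print("Привет".encode("ascii", errors="replace"))  # b'??????'
```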
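Unicode covers the characters of almost all written languages, so text in different languages can be mixed in a single document. It also includes emoji, special symbols, and many other writing forms.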
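This short Python sketch puts text in several scripts into one string. UTF-8 round-trips the string without loss, and the standard `unicodedata` module looks up each character's official Unicode name. The sample string is arbitrary.

```python
import unicodedata

# One string mixing Latin letters, a Chinese character, a Cyrillic letter and an emoji
mixed = "Hi 中 Я 😀"

# UTF-8 can encode every code point, so encoding and decoding is lossless
encoded = mixed.encode("utf-8")
print(encoded.decode("utf-8") == mixed)  # True

# Every assigned character has a name in the Unicode Character Database
for ch in "中Я😀":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D
# U+042F CYRILLIC CAPITAL LETTER YA
# U+1F600 GRINNING FACE
```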
Printing the ASCII Table
+------+------+------+------+---------------------------------+
|  Dec |  Hex |  Oct | Char | Description                     |
+------+------+------+------+---------------------------------+
|    0 |   00 |  000 |      | NUL (null terminator)           |
|    1 |   01 |  001 |      | SOH (start of header)           |
|    2 |   02 |  002 |      | STX (start of text)             |
|    3 |   03 |  003 |      | ETX (end of text)               |
|    4 |   04 |  004 |      | EOT (end of transmission)       |
|    5 |   05 |  005 |      | ENQ (enquiry)                   |
|    6 |   06 |  006 |      | ACK (acknowledge)               |
|    7 |   07 |  007 |      | BEL (bell, alert)               |
|    8 |   08 |  010 |      | BS (backspace)                  |
|    9 |   09 |  011 |      | HT (horizontal tab)             |
|   10 |   0A |  012 |      | LF (line feed, new line)        |
|   11 |   0B |  013 |      | VT (vertical tab)               |
|   12 |   0C |  014 |      | FF (form feed, new page)        |
|   13 |   0D |  015 |      | CR (carriage return)            |
|   14 |   0E |  016 |      | SO (shift out)                  |
|   15 |   0F |  017 |      | SI (shift in)                   |
|   16 |   10 |  020 |      | DLE (data link escape)          |
|   17 |   11 |  021 |      | DC1 (device control 1, XON)     |
|   18 |   12 |  022 |      | DC2 (device control 2)          |
|   19 |   13 |  023 |      | DC3 (device control 3, XOFF)    |
|   20 |   14 |  024 |      | DC4 (device control 4)          |
|   21 |   15 |  025 |      | NAK (negative acknowledge)      |
|   22 |   16 |  026 |      | SYN (synchronous idle)          |
|   23 |   17 |  027 |      | ETB (end of transmission block) |
|   24 |   18 |  030 |      | CAN (cancel)                    |
|   25 |   19 |  031 |      | EM (end of medium)              |
|   26 |   1A |  032 |      | SUB (substitute)                |
|   27 |   1B |  033 |      | ESC (escape)                    |
|   28 |   1C |  034 |      | FS (file separator)             |
|   29 |   1D |  035 |      | GS (group separator)            |
|   30 |   1E |  036 |      | RS (record separator)           |
|   31 |   1F |  037 |      | US (unit separator)             |
|   32 |   20 |  040 |      | Space (space character)         |
|   33 |   21 |  041 |  !   | !                               |
|   34 |   22 |  042 |  "   | "                               |
|   35 |   23 |  043 |  #   | #                               |
|   36 |   24 |  044 |  $   | $                               |
|   37 |   25 |  045 |  %   | %                               |
|   38 |   26 |  046 |  &   | &                               |
|   39 |   27 |  047 |  '   | '                               |
|   40 |   28 |  050 |  (   | (                               |
|   41 |   29 |  051 |  )   | )                               |
|   42 |   2A |  052 |  *   | *                               |
|   43 |   2B |  053 |  +   | +                               |
|   44 |   2C |  054 |  ,   | ,                               |
|   45 |   2D |  055 |  -   | -                               |
|   46 |   2E |  056 |  .   | .                               |
|   47 |   2F |  057 |  /   | /                               |
|   48 |   30 |  060 |  0   | 0                               |
|   49 |   31 |  061 |  1   | 1                               |
|   50 |   32 |  062 |  2   | 2                               |
|   51 |   33 |  063 |  3   | 3                               |
|   52 |   34 |  064 |  4   | 4                               |
|   53 |   35 |  065 |  5   | 5                               |
|   54 |   36 |  066 |  6   | 6                               |
|   55 |   37 |  067 |  7   | 7                               |
|   56 |   38 |  070 |  8   | 8                               |
|   57 |   39 |  071 |  9   | 9                               |
|   58 |   3A |  072 |  :   | :                               |
|   59 |   3B |  073 |  ;   | ;                               |
|   60 |   3C |  074 |  <   | <                               |
|   61 |   3D |  075 |  =   | =                               |
|   62 |   3E |  076 |  >   | >                               |
|   63 |   3F |  077 |  ?   | ?                               |
|   64 |   40 |  100 |  @   | @                               |
|   65 |   41 |  101 |  A   | A                               |
|   66 |   42 |  102 |  B   | B                               |
|   67 |   43 |  103 |  C   | C                               |
|   68 |   44 |  104 |  D   | D                               |
|   69 |   45 |  105 |  E   | E                               |
|   70 |   46 |  106 |  F   | F                               |
|   71 |   47 |  107 |  G   | G                               |
|   72 |   48 |  110 |  H   | H                               |
|   73 |   49 |  111 |  I   | I                               |
|   74 |   4A |  112 |  J   | J                               |
|   75 |   4B |  113 |  K   | K                               |
|   76 |   4C |  114 |  L   | L                               |
|   77 |   4D |  115 |  M   | M                               |
|   78 |   4E |  116 |  N   | N                               |
|   79 |   4F |  117 |  O   | O                               |
|   80 |   50 |  120 |  P   | P                               |
|   81 |   51 |  121 |  Q   | Q                               |
|   82 |   52 |  122 |  R   | R                               |
|   83 |   53 |  123 |  S   | S                               |
|   84 |   54 |  124 |  T   | T                               |
|   85 |   55 |  125 |  U   | U                               |
|   86 |   56 |  126 |  V   | V                               |
|   87 |   57 |  127 |  W   | W                               |
|   88 |   58 |  130 |  X   | X                               |
|   89 |   59 |  131 |  Y   | Y                               |
|   90 |   5A |  132 |  Z   | Z                               |
|   91 |   5B |  133 |  [   | [                               |
|   92 |   5C |  134 |  \   | \                               |
|   93 |   5D |  135 |  ]   | ]                               |
|   94 |   5E |  136 |  ^   | ^                               |
|   95 |   5F |  137 |  _   | _                               |
|   96 |   60 |  140 |  `   | `                               |
|   97 |   61 |  141 |  a   | a                               |
|   98 |   62 |  142 |  b   | b                               |
|   99 |   63 |  143 |  c   | c                               |
|  100 |   64 |  144 |  d   | d                               |
|  101 |   65 |  145 |  e   | e                               |
|  102 |   66 |  146 |  f   | f                               |
|  103 |   67 |  147 |  g   | g                               |
|  104 |   68 |  150 |  h   | h                               |
|  105 |   69 |  151 |  i   | i                               |
|  106 |   6A |  152 |  j   | j                               |
|  107 |   6B |  153 |  k   | k                               |
|  108 |   6C |  154 |  l   | l                               |
|  109 |   6D |  155 |  m   | m                               |
|  110 |   6E |  156 |  n   | n                               |
|  111 |   6F |  157 |  o   | o                               |
|  112 |   70 |  160 |  p   | p                               |
|  113 |   71 |  161 |  q   | q                               |
|  114 |   72 |  162 |  r   | r                               |
|  115 |   73 |  163 |  s   | s                               |
|  116 |   74 |  164 |  t   | t                               |
|  117 |   75 |  165 |  u   | u                               |
|  118 |   76 |  166 |  v   | v                               |
|  119 |   77 |  167 |  w   | w                               |
|  120 |   78 |  170 |  x   | x                               |
|  121 |   79 |  171 |  y   | y                               |
|  122 |   7A |  172 |  z   | z                               |
|  123 |   7B |  173 |  {   | {                               |
|  124 |   7C |  174 |  |   | |                               |
|  125 |   7D |  175 |  }   | }                               |
|  126 |   7E |  176 |  ~   | ~                               |
|  127 |   7F |  177 |      | DEL (delete)                    |
+------+------+------+------+---------------------------------+
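The table shows two regularities. First, every row gives the same code in decimal, hexadecimal and octal. Second, each lowercase letter sits exactly 32 (0x20) above its uppercase form, so the two cases differ in a single bit. This minimal Python sketch demonstrates both; the bit tricks apply only to the ASCII letters A-Z and a-z.

```python
# One code, three bases: int() parses a string in a given base
print(int("41", 16), int("101", 8), ord("A"))  # 65 65 65
print(f"{65:02X} {65:03o} {chr(65)}")          # 41 101 A

# Upper- and lowercase letters differ only in bit 5 (value 0x20 = 32)
print(ord("a") - ord("A"))    # 32
print(chr(ord("A") | 0x20))   # a  (setting the bit gives lowercase)
print(chr(ord("z") & ~0x20))  # Z  (clearing the bit gives uppercase)

# Digits 0-9 occupy codes 48-57, so subtracting ord("0") yields the digit's value
print(ord("7") - ord("0"))    # 7
```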
ascii_table.py
def print_ascii_table():
    # Short descriptions for the control characters (0-31 and 127) and the space (32)
    descriptions = {
        0: "NUL (null terminator)",  # also terminates strings in C
        1: "SOH (start of header)",
        2: "STX (start of text)",
        3: "ETX (end of text)",
        4: "EOT (end of transmission)",
        5: "ENQ (enquiry)",
        6: "ACK (acknowledge)",
        7: "BEL (bell, alert)",
        8: "BS (backspace)",
        9: "HT (horizontal tab)",
        10: "LF (line feed, new line)",
        11: "VT (vertical tab)",
        12: "FF (form feed, new page)",
        13: "CR (carriage return)",
        14: "SO (shift out)",
        15: "SI (shift in)",
        16: "DLE (data link escape)",
        17: "DC1 (device control 1, XON)",  # XON: resume transmission (flow control)
        18: "DC2 (device control 2)",
        19: "DC3 (device control 3, XOFF)",  # XOFF: pause transmission (flow control)
        20: "DC4 (device control 4)",
        21: "NAK (negative acknowledge)",
        22: "SYN (synchronous idle)",
        23: "ETB (end of transmission block)",
        24: "CAN (cancel)",
        25: "EM (end of medium)",
        26: "SUB (substitute)",
        27: "ESC (escape)",  # starts terminal escape sequences
        28: "FS (file separator)",
        29: "GS (group separator)",
        30: "RS (record separator)",
        31: "US (unit separator)",
        32: "Space (space character)",
        127: "DEL (delete)",
        # Codes 33-126 are printable; their descriptions are omitted to keep the
        # code short, and the character itself is shown instead
    }

    # Each cell is padded with one space on both sides, so the content widths
    # (4, 4, 4, 4, 31) line up with the divider widths (6, 6, 6, 6, 33)
    header = f"| {'Dec':>4} | {'Hex':>4} | {'Oct':>4} | {'Char':^4} | {'Description':<31} |"
    divider = "+" + "-" * 6 + "+" + "-" * 6 + "+" + "-" * 6 + "+" + "-" * 6 + "+" + "-" * 33 + "+"
    print(divider)
    print(header)
    print(divider)

    # Print one row for each of the 128 ASCII codes
    for i in range(128):
        char = chr(i)  # the character with code point i
        # str.isprintable() is False for the control characters 0-31 and 127;
        # the space (32) counts as printable, but it looks blank either way
        display_char = char if char.isprintable() else " "
        desc = descriptions.get(i, display_char)  # fall back to the character itself
        hex_code = f"{i:02X}"  # two-digit uppercase hexadecimal
        oct_code = f"{i:03o}"  # three-digit octal
        print(f"| {i:>4} | {hex_code:>4} | {oct_code:>4} | {display_char:^4} | {desc:<31} |")
    print(divider)


if __name__ == "__main__":
    print_ascii_table()