Code points of € and é
https://www.compart.com/en/unicode/U+20AC
Name: Euro Sign[1]
Unicode Version: 2.1 (May 1998)[2]
Block: Currency Symbols, U+20A0 - U+20CF[3]
Plane: Basic Multilingual Plane, U+0000 - U+FFFF[3]
Script: Code for undetermined script (Zyyy) [4]
Category: Currency Symbol (Sc) [1]
Bidirectional Class: European Terminator (ET) [1]
Combining Class: Not Reordered (0) [1]
Character is Mirrored: No [1]
GCGID: SC200000[5]
HTML Entity:
&#8364;
&#x20AC;
&euro;
UTF-8 Encoding: 0xE2 0x82 0xAC
UTF-16 Encoding: 0x20AC
UTF-32 Encoding: 0x000020AC
https://www.utf8-chartable.de/unicode-utf8-table.pl
U+20AC € e2 82 ac EURO SIGN
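The values listed above can be reproduced from code. The following is a minimal sketch, not taken from the original post: the class name `CodePointDemo` and the helpers `CodePoint` and `Hex` are made up for illustration, and big-endian encoders without a BOM are chosen only so that the printed bytes read like the code unit values.

```csharp
using System;
using System.Text;

public static class CodePointDemo
{
    // Formats the first code point of s as "U+XXXX" (hypothetical helper).
    public static string CodePoint(string s) =>
        "U+" + char.ConvertToUtf32(s, 0).ToString("X4");

    // Hex dump of the bytes an encoding produces, e.g. "E2-82-AC" (hypothetical helper).
    public static string Hex(Encoding enc, string s) =>
        BitConverter.ToString(enc.GetBytes(s));

    public static void Main()
    {
        // Big-endian, no byte order mark, so the bytes match the code unit values.
        Encoding utf16 = new UnicodeEncoding(bigEndian: true, byteOrderMark: false);
        Encoding utf32 = new UTF32Encoding(bigEndian: true, byteOrderMark: false);

        // Escapes are used so the source file encoding does not matter.
        foreach (string s in new[] { "\u20AC", "\u00E9" }) // € and é
        {
            Console.WriteLine(
                $"{CodePoint(s)} UTF-8={Hex(Encoding.UTF8, s)} " +
                $"UTF-16={Hex(utf16, s)} UTF-32={Hex(utf32, s)}");
        }
        // U+20AC UTF-8=E2-82-AC UTF-16=20-AC UTF-32=00-00-20-AC
        // U+00E9 UTF-8=C3-A9 UTF-16=00-E9 UTF-32=00-00-00-E9
    }
}
```

é (U+00E9) falls in the two-byte UTF-8 range, while € (U+20AC) needs three bytes.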
UTF-8 encoding
Since the restriction of the Unicode code-space to 21-bit values in 2003, UTF-8 is defined to encode code points in one to four bytes, depending on the number of significant bits in the numerical value of the code point. The following table shows the structure of the encoding. The x characters are replaced by the bits of the code point.
First code point | Last code point | Byte 1 | Byte 2 | Byte 3 | Byte 4 |
---|---|---|---|---|---|
U+0000 | U+007F | 0xxxxxxx | | | |
U+0080 | U+07FF | 110xxxxx | 10xxxxxx | | |
U+0800 | U+FFFF | 1110xxxx | 10xxxxxx | 10xxxxxx | |
U+10000 | U+10FFFF | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |
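The code point of € is 0x20AC, which falls in the three-byte range (U+0800 to U+FFFF), so it has to be converted using the three-byte pattern.
In binary, 0x20AC is 0010000010101100.
Split these 16 bits into groups of 4, 6 and 6 bits (0010, 000010, 101100) and fill them into the x positions of 1110xxxx 10xxxxxx 10xxxxxx. The result is the three bytes 11100010 10000010 10101100, which is 0xE2 0x82 0xAC in hexadecimal.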
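The bit manipulation described above can be sketched as a small hand-written encoder. This is an illustration only, assuming a made-up class `ManualUtf8` with a hypothetical `Encode` method; in real code `Encoding.UTF8.GetBytes` does this job.

```csharp
using System;

public static class ManualUtf8
{
    // Encodes one Unicode scalar value using the byte patterns from the table.
    public static byte[] Encode(int cp)
    {
        // Reject values outside the code space and the surrogate range.
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw new ArgumentOutOfRangeException(nameof(cp));

        if (cp <= 0x7F)                      // 0xxxxxxx
            return new[] { (byte)cp };

        if (cp <= 0x7FF)                     // 110xxxxx 10xxxxxx
            return new[]
            {
                (byte)(0xC0 | (cp >> 6)),
                (byte)(0x80 | (cp & 0x3F)),
            };

        if (cp <= 0xFFFF)                    // 1110xxxx 10xxxxxx 10xxxxxx
            return new[]
            {
                (byte)(0xE0 | (cp >> 12)),
                (byte)(0x80 | ((cp >> 6) & 0x3F)),
                (byte)(0x80 | (cp & 0x3F)),
            };

        return new[]                         // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        {
            (byte)(0xF0 | (cp >> 18)),
            (byte)(0x80 | ((cp >> 12) & 0x3F)),
            (byte)(0x80 | ((cp >> 6) & 0x3F)),
            (byte)(0x80 | (cp & 0x3F)),
        };
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Encode(0x20AC))); // E2-82-AC
    }
}
```

For 0x20AC the three-byte branch runs: the top 4 bits (0010) go into the first byte, and the two remaining 6-bit groups go into the continuation bytes.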
What happens when the UTF-8 bytes of € are decoded with other encodings
    using System;
    using System.Text;
    using NUnit.Framework; // assumed from the [Test] attribute

    [Test]
    public void Test20210413001()
    {
        // On .NET Core / .NET 5+, code pages 936 and 1252 are only available after
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).

        // €
        // UTF-8 Encoding: 0xE2 0x82 0xAC
        // UTF-16 Encoding: 0x20AC
        // UTF-32 Encoding: 0x000020AC
        string str = "€";
        var array = Encoding.UTF8.GetBytes(str);
        // The original called a GetHexString helper that is not shown here.
        Console.WriteLine(BitConverter.ToString(array)); // E2-82-AC

        // The UTF-16 bytes 0x20 0xAC are not the UTF-8 form of € (that takes three bytes).
        // Decoded as UTF-8 they give a space followed by U+FFFD, not €.
        var bytes = new byte[] { 0x20, 0xac };
        var str2 = Encoding.UTF8.GetString(bytes);
        Console.WriteLine(str2);

        // 936 gb2312 ANSI/OEM Simplified Chinese (PRC, Singapore); Chinese Simplified (GB2312)
        var str3 = Encoding.GetEncoding(936).GetString(array);
        Console.WriteLine(str3);

        // 1252 windows-1252 ANSI Latin 1; Western European (Windows)
        var str4 = Encoding.GetEncoding(1252).GetString(array);
        Console.WriteLine(str4); // â‚¬

        // 28591 iso-8859-1 ISO 8859-1 Latin 1; Western European (ISO)
        var str5 = Encoding.GetEncoding(28591).GetString(array);
        Console.WriteLine(str5);
    }
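Encoding the euro sign with windows-1252 and iso-8859-1 gives 80 and 3F respectively. In windows-1252 the euro sign really is 0x80. ISO-8859-1 has no euro sign at all, so the encoder substitutes '?' (0x3F).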
    // €
    // UTF-8 Encoding: 0xE2 0x82 0xAC
    // UTF-16 Encoding: 0x20AC
    // UTF-32 Encoding: 0x000020AC
    string str = "€";

    // 1252 windows-1252 ANSI Latin 1; Western European (Windows)
    var array = Encoding.GetEncoding(1252).GetBytes(str);
    // The original called a GetHexString helper that is not shown here.
    Console.WriteLine(BitConverter.ToString(array)); // 80

    // 28591 iso-8859-1 ISO 8859-1 Latin 1; Western European (ISO)
    var array6 = Encoding.GetEncoding(28591).GetBytes(str);
    Console.WriteLine(BitConverter.ToString(array6)); // 3F, the '?' replacement byte
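The 3F in the iso-8859-1 result is easy to miss, because the default encoder silently replaces unsupported characters with '?'. One way to detect this, sketched here under the assumption that an exception is preferable to silent data loss, is to request the encoding with `EncoderFallback.ExceptionFallback`. The class `StrictLatin1` and the method `FitsLatin1` are hypothetical names, not part of the original post.

```csharp
using System;
using System.Text;

public static class StrictLatin1
{
    // Returns true when every character of s can be represented in ISO-8859-1.
    public static bool FitsLatin1(string s)
    {
        // Same code page as above, but unsupported characters throw
        // instead of being replaced by '?' (0x3F).
        Encoding strict = Encoding.GetEncoding(
            28591,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(s);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FitsLatin1("\u00E9")); // é (U+00E9) exists in ISO-8859-1: True
        Console.WriteLine(FitsLatin1("\u20AC")); // € (U+20AC) does not: False
    }
}
```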
https://unicode.scarfboy.com/?s=U%2b4F60
This site can also be searched by the character itself to find its code point:
https://unicode.scarfboy.com/?s=%E7%8E%A9
The search results include a U+73A9 link; clicking it jumps to the page for that code point.
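The `%E7%8E%A9` in the search URL is simply the UTF-8 encoding of 玩 (U+73A9), percent-encoded. A minimal sketch of that relationship follows; `LookupDemo` and `CodePoint` are made-up names for illustration.

```csharp
using System;

public static class LookupDemo
{
    // Formats the first code point of s as "U+XXXX" (hypothetical helper).
    public static string CodePoint(string s) =>
        "U+" + char.ConvertToUtf32(s, 0).ToString("X4");

    public static void Main()
    {
        string ch = "\u73A9"; // 玩

        // The code point that the site reports for this character.
        Console.WriteLine(CodePoint(ch));            // U+73A9

        // Uri.EscapeDataString percent-encodes the UTF-8 bytes E7 8E A9,
        // which is the query value used in the search URL.
        Console.WriteLine(Uri.EscapeDataString(ch)); // %E7%8E%A9
    }
}
```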
Author: Chuck Lu