encoding and Endian

Unicode: a code point is the numeric value assigned to each character in the Unicode table. In a program it is held in an integer type (int, long, long long).

Unicode defines a codespace of 1,114,112 code points in the range 0x0 to 0x10FFFF.
Plane 0 (0000-FFFF), called the Basic Multilingual Plane (BMP), contains most commonly used characters, including most Chinese characters.
Code points in plane 0 are encoded as a single code unit in UTF-16 and as one to three bytes in UTF-8;
the other planes are supplementary planes, whose code points are encoded as surrogate pairs in UTF-16 and as four bytes in UTF-8.
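
The split between plane 0 and the supplementary planes can be sketched in C as below. This is only an illustration, not code from the original notes: the helper name utf16_encode is hypothetical, and the caller is assumed to pass a buffer of two code units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16 code units.
 * Returns the number of units written (1 or 2), or 0 if cp is not encodable.
 * utf16_encode is a hypothetical helper name used for illustration. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                                /* out of range, or a lone surrogate */
    if (cp < 0x10000) {                          /* plane 0: a single code unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

With this sketch, U+4E2D (in plane 0) yields the single unit 0x4E2D, while U+1F600 (in a supplementary plane) yields the pair 0xD83D 0xDE00.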

UTF-8, an 8-bit variable-width encoding which maximizes compatibility with ASCII;
UTF-16, a 16-bit, variable-width encoding;
UTF-32, a 32-bit, fixed-width encoding
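
To make "variable-width" concrete, here is a minimal UTF-8 encoder sketch. The name utf8_encode is hypothetical, and the caller is assumed to supply a 4-byte buffer. It shows how the byte count grows from one to four as the code point grows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8.
 * Returns the number of bytes written (1 to 4), or 0 if cp is not encodable.
 * utf8_encode is a hypothetical helper name used for illustration. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                                     /* invalid code point */
    if (cp < 0x80) {                                  /* ASCII: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                                 /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                               /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* supplementary planes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Under this sketch 'A' (0x41) stays a single byte 0x41, which is the ASCII compatibility mentioned above, and code point 666 (0x29A) becomes the two bytes 0xCA 0x9A.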

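Storage in memory and endianness:
a character is encoded, and the resulting bytes are stored in memory.
A character with code point 666 (0x29A) is larger than 0xFF, so it does not fit in one byte; it is usually stored either as two 1-byte units (for example UTF-8) or as one 2-byte unit (for example UTF-16).
In C the encoded value can be held as:
char s[] = "<CharA>"; or wchar_t c = L'<CharA>'; (where <CharA> stands for the character)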
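
A small sketch of the "two 1-byte units versus one wide unit" point. It assumes the char array holds valid UTF-8, and the helper name utf8_count is hypothetical. It counts characters rather than bytes by skipping continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Count the code points in a UTF-8 string by skipping continuation
 * bytes (bit pattern 10xxxxxx). Assumes s is valid, NUL-terminated UTF-8.
 * utf8_count is a hypothetical helper name used for illustration. */
size_t utf8_count(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;                 /* a lead byte starts a new character */
    }
    return n;
}
```

For code point 666, char s[] = "\xCA\x9A" occupies two bytes plus the terminator, yet utf8_count(s) reports one character. The same character fits in a single wchar_t, as in wchar_t w = 0x29A. The size of wchar_t is implementation-defined (commonly 2 or 4 bytes).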

For a C executable on a 32-bit system, the address space typically looks like:
Mem---------MAX:0xffffffff--------------
kernel mem space------------
stack------------------bottom
        |-----------
        |-----------
        |--------------top (grows toward lower addresses)
        
unused (unmapped)-----------
        |
        |
        
Heap------------------------ (grows toward higher addresses)
Data------------------------
Code------------------------
Mem---------MIN:0x00000000---------------
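
One way to look at this layout on a real machine is to record one address from each region. The sketch below does this under loud assumptions: the names probe_layout, print_layout, struct layout and global_var are hypothetical, and the actual addresses and their relative order depend on the OS, the compiler and address-space randomization, so the picture above is only the typical case.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int global_var = 42;          /* initialized global: data segment */

struct layout { uintptr_t code, data, heap, stack; };

/* Record one address from each region. Returns 0 on success, -1 if malloc fails.
 * Hypothetical helper; converting pointers to integers is implementation-defined. */
int probe_layout(struct layout *out)
{
    assert(out != NULL);
    int local = 0;                            /* automatic variable: stack */
    void *block = malloc(16);                 /* dynamic allocation: heap */
    if (block == NULL)
        return -1;
    out->code  = (uintptr_t)&probe_layout;    /* function: code segment */
    out->data  = (uintptr_t)&global_var;
    out->heap  = (uintptr_t)block;
    out->stack = (uintptr_t)&local;
    free(block);
    return 0;
}

/* Print the recorded addresses from the highest region in the diagram to the lowest. */
void print_layout(const struct layout *l)
{
    assert(l != NULL);
    printf("stack 0x%" PRIxPTR "\n", l->stack);
    printf("heap  0x%" PRIxPTR "\n", l->heap);
    printf("data  0x%" PRIxPTR "\n", l->data);
    printf("code  0x%" PRIxPTR "\n", l->code);
}
```

Calling probe_layout and then print_layout on a typical Linux build usually shows the stack address well above the heap address, but this is not guaranteed.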

For example, take the value 0x123456 (the stack bottom is the highest address, the top the lowest):
Big-Endian: stack---------bottom          Little-Endian: stack---------bottom
                    |0x56                                          |0x12
                    |0x34                                          |0x34
                    |0x12                                          |0x56
                    |-----top                                      |----top
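
The byte order in the picture can be checked by copying an integer's bytes out in address order. This is a minimal sketch: bytes_in_memory is a hypothetical helper name, and the value is held in a 32-bit integer, so a fourth byte 0x00 appears in addition to the three bytes drawn above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the bytes of value into out[] in increasing address order,
 * so out[0] is the byte stored at the lowest address.
 * bytes_in_memory is a hypothetical helper name used for illustration. */
void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}
```

For 0x00123456 a little-endian machine gives 56 34 12 00 from the lowest address upward, and a big-endian machine gives 00 12 34 56.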

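To test at run time, look at the byte at the lowest address of an int holding 1; if it is 0x1 (the least significant byte), the machine is little-endian:
int b = 1; char *p = (char *)&b; if (p[0] == 0x1) { /* little-endian */ } else { /* big-endian */ }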

Big-endian stores the most significant byte at the lowest address; little-endian stores the least significant byte at the lowest address.

posted @ 2017-02-27 14:09 HEIS老妖