Encoding and Endian

Unicode: a Code Point is the numeric value assigned to every character in the Unicode table; in C it is held in an integer type (int, long, long long)

Unicode defines a codespace of 1,114,112 code points in the range 0x0 to 0x10FFFF.
Plane 0 (0000-FFFF), called the Basic Multilingual Plane, contains most commonly used characters, including the common Chinese characters.
Code points in plane 0 are encoded as a single code unit in UTF-16 and as one to three bytes in UTF-8;
the others lie in the supplementary planes and are encoded as surrogate pairs in UTF-16 and as four bytes in UTF-8.
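
To make the single code unit vs. surrogate pair split concrete, here is a minimal sketch in C. The function name utf16_encode is made up for this note and is not a library call; it assumes the caller passes one code point at a time.

```c
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16 code units.
 * Returns the number of units written (1 or 2), or 0 if cp is not
 * encodable (above 0x10FFFF, or inside the surrogate range). */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {                 /* plane 0: one code unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits are left */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: low 10 bits */
    return 2;
}
```

With this sketch, U+4E2D (in plane 0) comes out as the single unit 0x4E2D, while U+1F600 (in a supplementary plane) comes out as the pair 0xD83D 0xDE00.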

UTF-8, an 8-bit variable-width encoding which maximizes compatibility with ASCII;
UTF-16, a 16-bit, variable-width encoding;
UTF-32, a 32-bit, fixed-width encoding
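
The variable width of UTF-8 can be sketched in C as below. This is only an illustration: utf8_encode is a hypothetical name, and the byte patterns follow the standard UTF-8 layout (1 byte below 0x80, 2 bytes below 0x800, 3 bytes below 0x10000, 4 bytes up to 0x10FFFF).

```c
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
 * Returns the number of bytes written (1 to 4), or 0 if cp is not encodable. */
int utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* ASCII: stored unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* outside the Unicode codespace */
}
```

For example, 'A' (0x41) stays one byte, which is the ASCII compatibility mentioned above, and U+4E2D becomes the three bytes E4 B8 AD.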

Storage in the computer and endian:
characters are encoded, and the resulting bytes are stored in memory
a character with code point 666 is larger than 0xFF, so one byte is not enough; we usually store the encoded value as 1 byte * 2 (as in UTF-8) or 2 bytes * 1 (as in UTF-16)
we can read the value in C by:
char s[] = "CharA"; or wchar_t w[] = L"CharA"; (CharA stands for the character)
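
A small sketch of looking at the stored bytes in C. The names utf8_text, wide_char and hex_bytes are invented for this note. The code assumes the code point is decimal 666 (U+029A) and writes its two UTF-8 bytes with escapes so the result does not depend on how the source file is saved. The size of wchar_t differs between platforms, so only its value is shown.

```c
#include <stddef.h>
#include <stdio.h>

/* U+029A (decimal 666) as its two UTF-8 bytes: 1 byte * 2. */
const char utf8_text[] = "\xCA\x9A";

/* The same code point in one wide unit. */
const wchar_t wide_char = 0x29A;

/* Hypothetical helper: format the bytes of a NUL-terminated string
 * as hex text in dst, for example "CA 9A". Stops early if dst is full. */
void hex_bytes(const char *s, char *dst, size_t dst_size)
{
    size_t used = 0;

    if (dst_size == 0)
        return;
    dst[0] = '\0';
    for (size_t i = 0; s[i] != '\0'; i++) {
        int n = snprintf(dst + used, dst_size - used, "%s%02X",
                         i ? " " : "", (unsigned)(unsigned char)s[i]);
        if (n < 0 || (size_t)n >= dst_size - used) {
            dst[used] = '\0';           /* drop the piece that did not fit */
            break;
        }
        used += (size_t)n;
    }
}
```

Calling hex_bytes(utf8_text, buf, sizeof buf) with a 16-byte buf leaves "CA 9A" in buf: the char array holds the encoded bytes, not the code point itself.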

for a C executable, its memspace is like:
Mem---------MAX:0xffffffff--------------
kernel mem space------------
stack------------------bottom
|-----------
|-----------
|-----------
|--------------top
NULL------------------------
|
|
Heap------------------------
Data------------------------
Code------------------------
Mem---------MIN:0x00000000---------------

for example, a value of 0x123456 (the top of the stack is the lowest address):
Big-Endian:                  little-Endian:
stack---------bottom         stack---------bottom
|0x56                        |0x12
|0x34                        |0x34
|0x12                        |0x56
|-----top                    |----top
int b = 1; int *a = &b; char *p = (char *)a; if (p[0] == 0x1) the machine is little-Endian, otherwise Big-Endian
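
The same check written out as a minimal sketch. The names is_little_endian and memory_bytes are hypothetical, and uint32_t replaces int so the width is fixed. If the byte at the lowest address of the value 1 is 1, the least significant byte comes first, so the host is little-Endian.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the byte at the lowest address of the integer 1 is 1,
 * which means the least significant byte is stored first (little-Endian). */
int is_little_endian(void)
{
    uint32_t value = 1;
    unsigned char first;

    memcpy(&first, &value, 1);          /* read the lowest-address byte */
    return first == 1;
}

/* Copy the in-memory bytes of value into out, lowest address first. */
void memory_bytes(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, 4);
}
```

On a little-Endian host, memory_bytes(0x00123456, out) gives 56 34 12 00; on a Big-Endian host it gives 00 12 34 56.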

Big-Endian stores the most significant byte at the smallest address; little-Endian stores the least significant byte at the smallest address
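
When bytes have to be written in a fixed order whatever the host is (a file or a network packet, for example), one common approach is to place each byte with shifts instead of copying memory. A sketch with hypothetical names store_be32, store_le32 and load_be32:

```c
#include <stdint.h>

/* Big-Endian: most significant byte goes to the smallest index. */
void store_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* little-Endian: least significant byte goes to the smallest index. */
void store_le32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)v;
    out[1] = (unsigned char)(v >> 8);
    out[2] = (unsigned char)(v >> 16);
    out[3] = (unsigned char)(v >> 24);
}

/* Rebuild the value from Big-Endian bytes. */
uint32_t load_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because only shifts are used, the output does not depend on the byte order of the machine running the code: store_be32(0x00123456, out) always gives 00 12 34 56 and store_le32 always gives 56 34 12 00.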

posted @ 2017-02-27 14:09  HEIS老妖