UTF-8
http://code.alexreisner.com/articles/character-encoding.html
For a byte in UTF-8 encoded data, looking at its leading bits:
if it starts with 0 | it’s an ASCII character
if it starts with 10 | it’s a continuation of a multi-byte character
if it starts with 110 | it’s the first byte of a 2-byte character
if it starts with 1110 | it’s the first byte of a 3-byte character
if it starts with 11110 | it’s the first byte of a 4-byte character
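
The rules above can be sketched as a small Python function that masks off the leading bits of a byte and compares them with each prefix. The function name `classify_utf8_byte` and the returned labels are made up for this illustration and are not from the linked article. The sketch only inspects the prefix of a single byte; it does not check that a sequence is well-formed (for example, that the right number of continuation bytes follows a lead byte).

```python
def classify_utf8_byte(b: int) -> str:
    """Classify one byte value (0-255) by its leading bits.

    The labels returned here are hypothetical names chosen for this sketch.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a byte value in the range 0-255")
    if (b & 0b10000000) == 0b00000000:   # 0xxxxxxx
        return "ascii"
    if (b & 0b11000000) == 0b10000000:   # 10xxxxxx
        return "continuation"
    if (b & 0b11100000) == 0b11000000:   # 110xxxxx
        return "lead-2"
    if (b & 0b11110000) == 0b11100000:   # 1110xxxx
        return "lead-3"
    if (b & 0b11111000) == 0b11110000:   # 11110xxx
        return "lead-4"
    # 11111xxx matches none of the prefixes listed above.
    return "other"


def classify_text(text: str) -> list[str]:
    """Encode text as UTF-8 and classify every resulting byte."""
    return [classify_utf8_byte(b) for b in text.encode("utf-8")]
```

For example, `classify_text("aé")` encodes to the three bytes `61 C3 A9`, so one ASCII byte is followed by the lead byte and the continuation byte of a 2-byte character.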