中文unicode范围及unicode编解码

中文unicode范围 : [\u4e00-\u9fa5]  

 

普通字符串可以用多种方式编码成Unicode字符串,具体要看你究竟选择了哪种编码:
unicodestring = u"Hello world" 
# 将Unicode转化为普通Python字符串:"encode" 

utf8string = unicodestring.encode("utf-8") 
asciistring = unicodestring.encode("ascii") 
isostring = unicodestring.encode("ISO-8859-1") 
utf16string = unicodestring.encode("utf-16") 

 

# 将普通Python字符串转化为Unicode:"decode" 

plainstring1 = unicode(utf8string, "utf-8") 
plainstring2 = unicode(asciistring, "ascii") 
plainstring3 = unicode(isostring, "ISO-8859-1") 
plainstring4 = unicode(utf16string, "utf-16") 
assert plainstring1 == plainstring2 == plainstring3 == plainstring4

 

posted on 2015-12-23 17:19  星空守望者--jkmiao  阅读(680)  评论(0编辑  收藏  举报