Python String Encoding and Decoding
In an IPython shell running Python 2, with the shell's encoding set to UTF-8:
In [15]: s1 = '编程'
In [16]: s2 = u'编程'
In [17]: print s1
编程
In [18]: print s2
编程
In [19]: s1
Out[19]: '\xe7\xbc\x96\xe7\xa8\x8b'
In [20]: s2
Out[20]: u'\u7f16\u7a0b'
In [21]: s1 == s2
/usr/bin/ipython:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
#! /usr/bin/python
Out[21]: False
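In In [16], the u prefix on '编程' makes s2 a unicode object (a sequence of code points). s1 is a byte string that holds the UTF-8 encoding of '编程', because the shell passes UTF-8 input through. Both print the same text, but they are different types, and s1 is not equal to s2. For the comparison in In [21], Python 2 tries to decode s1 implicitly with its default codec (sys.getdefaultencoding(), normally ASCII). That fails on the non-ASCII bytes, so Python issues a UnicodeWarning and treats the two values as unequal.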
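The same distinction carries over to Python 3. There, the Python 2 byte string (str) corresponds to bytes, and unicode corresponds to str. The following is a minimal sketch that assumes Python 3 and reuses the names s1 and s2. It shows the same byte values and code points as the transcript above. Python 3 never converts implicitly between the two types, so the comparison is simply False.

```python
# Python 3 sketch: bytes plays the role of the Python 2 str,
# str plays the role of the Python 2 unicode.
s1 = '编程'.encode('utf-8')  # UTF-8 bytes, like s1 = '编程' in a UTF-8 Python 2 shell
s2 = '编程'                  # text (code points), like s2 = u'编程'

print(repr(s1))         # b'\xe7\xbc\x96\xe7\xa8\x8b'  (same bytes as Out[19])
print(ascii(s2))        # '\u7f16\u7a0b'  (same code points as Out[20])
print(len(s1), len(s2)) # 6 2  (six bytes vs. two characters)

# bytes and str never compare equal; no implicit decoding is attempted.
print(s1 == s2)         # False
```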
Next:
In [22]: r1 = s1.decode('utf-8')
In [23]: r2 = s2.encode('utf-8')
In [24]: print r1
编程
In [25]: print r2
编程
In [27]: r1
Out[27]: u'\u7f16\u7a0b'
In [28]: r2
Out[28]: '\xe7\xbc\x96\xe7\xa8\x8b'
In [29]: r1 == r2
Out[29]: False
In [30]: r1 == s2
Out[30]: True
In [31]: r2 == s1
Out[31]: True
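In [22] decodes s1, converting the UTF-8 bytes into a unicode object.
In [23] encodes s2, converting the unicode object into UTF-8 bytes.
Decoding always goes from bytes to unicode, and encoding always goes from unicode to bytes. It therefore makes no sense to decode s2, or to speak of "encoding s1 into unicode". Python 2 does give unicode a decode() method and str an encode() method, but each one first performs an implicit conversion with the default codec, and that conversion fails for non-ASCII text like this.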
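In Python 3 the types themselves enforce this direction. bytes has decode() but no encode(), and str has encode() but no decode(). The following minimal sketch assumes Python 3 and repeats the checks from In [22] to In [31]:

```python
s1 = '编程'.encode('utf-8')  # bytes, the Python 3 counterpart of s1
s2 = '编程'                  # str, the Python 3 counterpart of s2

r1 = s1.decode('utf-8')  # decode: bytes -> text
r2 = s2.encode('utf-8')  # encode: text -> bytes

print(r1 == s2)  # True: decoding s1 recovers the text
print(r2 == s1)  # True: encoding s2 recovers the bytes
print(r1 == r2)  # False: text vs. bytes

# The "wrong-direction" methods that Python 2 allowed no longer exist.
print(hasattr(s2, 'decode'))  # False
print(hasattr(s1, 'encode'))  # False
```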
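To convert a UTF-8 encoded string to GBK, first decode the UTF-8 bytes to unicode, then encode the unicode to GBK, for example s1.decode('utf-8').encode('gbk').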
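The conversion goes through unicode as an intermediate step. The following minimal sketch shows that two-step conversion in Python 3. The helper name utf8_to_gbk is hypothetical and used only for illustration.

```python
def utf8_to_gbk(data: bytes) -> bytes:
    """Hypothetical helper: re-encode UTF-8 bytes as GBK via text (code points)."""
    text = data.decode('utf-8')  # step 1: UTF-8 bytes -> text
    return text.encode('gbk')    # step 2: text -> GBK bytes

utf8_bytes = '编程'.encode('utf-8')
gbk_bytes = utf8_to_gbk(utf8_bytes)

print(len(utf8_bytes), len(gbk_bytes))  # 6 4  (3 bytes per character vs. 2)
print(gbk_bytes.decode('gbk') == '编程')  # True: the GBK bytes round-trip
```

By default, encode('gbk') raises UnicodeEncodeError for characters that GBK cannot represent. Passing errors='replace' substitutes '?' for those characters instead.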