Python String Encoding and Decoding

In an IPython shell (Python 2) whose terminal encoding is UTF-8:


In [15]: s1 = '编程'

In [16]: s2 = u'编程'

In [17]: print s1
编程

In [18]: print s2
编程

In [19]: s1
Out[19]: '\xe7\xbc\x96\xe7\xa8\x8b'

In [20]: s2
Out[20]: u'\u7f16\u7a0b'

In [21]: s1 == s2
/usr/bin/ipython:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  #! /usr/bin/python
Out[21]: False

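In In [16], the u prefix before '编程' makes s2 a unicode object, that is, a sequence of Unicode code points. s1 has no prefix, so it is a byte string (str) holding the UTF-8 bytes the shell passed in. Both print the same text, but s1 does not equal s2: to compare them, Python 2 tries to decode s1 with its default ASCII codec, fails on the non-ASCII bytes, emits the UnicodeWarning above, and treats the two as unequal.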
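The comparison can be made to succeed by converting one side explicitly so that both operands have the same type. The following is a minimal sketch, not taken from the original session: it uses explicit b'' and u'' literals (an assumption, so that it does not depend on the terminal encoding and runs on both Python 2 and Python 3), and the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
# Explicit literals, so the result does not depend on the terminal encoding.
utf8_bytes = b'\xe7\xbc\x96\xe7\xa8\x8b'   # the same bytes as s1
text = u'\u7f16\u7a0b'                     # the same code points as s2

# Compare text to text: decode the bytes first.
equal_as_text = utf8_bytes.decode('utf-8') == text

# Or compare bytes to bytes: encode the text first.
equal_as_bytes = utf8_bytes == text.encode('utf-8')

# The implicit conversion Python 2 attempts uses ASCII, which cannot
# decode these bytes. Doing it explicitly shows the underlying error.
try:
    utf8_bytes.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(equal_as_text)
print(equal_as_bytes)
print(ascii_failed)
```

Naming the codec explicitly on every conversion avoids relying on the interpreter's default encoding.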

Next:


In [22]: r1 = s1.decode('utf-8')

In [23]: r2 = s2.encode('utf-8')

In [24]: print r1
编程

In [25]: print r2
编程

In [27]: r1
Out[27]: u'\u7f16\u7a0b'

In [28]: r2
Out[28]: '\xe7\xbc\x96\xe7\xa8\x8b'

In [29]: r1 == r2
Out[29]: False

In [30]: r1 == s2
Out[30]: True

In [31]: r2 == s1
Out[31]: True

In [22] decodes s1, converting it from UTF-8 encoded bytes to unicode.

In [23] encodes s2, converting it from unicode to UTF-8 encoded bytes.

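s2 is already unicode, so there is nothing to decode. Likewise, it is wrong to say that s1 is "encoded into unicode": going from bytes to unicode is always decoding, and going from unicode to bytes is always encoding.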
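The two directions can be seen in a short round trip. This is a minimal sketch under the same assumptions as the session above (the text '编程' and the UTF-8 codec). The variable names are illustrative, and the literal is written with escapes so the code runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
text = u'\u7f16\u7a0b'              # unicode: two code points

encoded = text.encode('utf-8')      # unicode -> bytes (encoding)
decoded = encoded.decode('utf-8')   # bytes -> unicode (decoding)

print(len(text))        # 2: counts code points
print(len(encoded))     # 6: each of these characters takes 3 bytes in UTF-8
print(decoded == text)  # True: the round trip is lossless
```

The differing lengths are a quick way to tell which kind of object is in hand: the unicode object counts characters, the encoded string counts bytes.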

To convert a UTF-8 encoded string to GBK, first decode the UTF-8 bytes to unicode, then encode the unicode as GBK.
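The two-step conversion can be sketched as follows. The helper name utf8_to_gbk is hypothetical and is not part of the original post or of any library. It only chains the standard decode and encode calls, and it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
def utf8_to_gbk(utf8_bytes):
    """Hypothetical helper: re-encode UTF-8 bytes as GBK via unicode."""
    text = utf8_bytes.decode('utf-8')   # step 1: UTF-8 bytes -> unicode
    return text.encode('gbk')           # step 2: unicode -> GBK bytes

gbk_bytes = utf8_to_gbk(b'\xe7\xbc\x96\xe7\xa8\x8b')

print(len(gbk_bytes))                              # 4: GBK uses 2 bytes per Chinese character
print(gbk_bytes.decode('gbk') == u'\u7f16\u7a0b')  # True: same text, different bytes
```

Unicode serves as the common intermediate form, so the same pattern applies to any pair of encodings that can both represent the text.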

posted @ 2015-08-02 21:04  Coder816