UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 4: ordinal not in range(128)

Rohit Agarwal's notes. Source: https://notes.rohitagarwal.org/2013/05/28/fixing-unicodedecodeerror-in-python.html

Fixing UnicodeDecodeError in Python

May 28, 2013

>>> a = "He said, “Hi, there.” She didn't reply."
>>> type(a)
<type 'str'>
>>> a
"He said, \xe2\x80\x9cHi, there.\xe2\x80\x9d She didn't reply."
>>> print a
He said, “Hi, there.” She didn't reply.

a is a str: a sequence of bytes, here the utf-8 encoding of the text.

>>> b = unicode(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)

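This doesn't work because ascii is the default encoding in Python 2. With no encoding given, unicode(a) tries to decode a as ascii, and the byte 0xe2 is outside the ascii range.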
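The same failure can be reproduced on Python 3, with the caveat that the type names shift: bytes there plays the role of Python 2's str, and str plays the role of unicode. The sketch below assumes Python 3; try_decode is a hypothetical helper written for illustration, not part of any library. It decodes the same bytes as ascii, which is what unicode(a) attempts on Python 2 when no encoding is given, and then inspects the error.

```python
# Python 3 sketch (assumption: the session above is Python 2).
# These are the same bytes that the Python 2 session stored in `a`.
a = b"He said, \xe2\x80\x9cHi, there.\xe2\x80\x9d She didn't reply."


def try_decode(data, encoding):
    """Hypothetical helper: return (text, None) on success, (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Decoding as ascii is what Python 2's unicode(a) attempts when no encoding is given.
text, err = try_decode(a, "ascii")
print(err.start, hex(a[err.start]))  # offset and value of the first rejected byte

# Naming the real encoding succeeds.
decoded, no_err = try_decode(a, "utf-8")
print(ascii(decoded))
```

The offset and byte value reported by the error object match the "byte 0xe2 in position 9" part of the traceback.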

>>> b = unicode(a, "utf-8")
>>> type(b)
<type 'unicode'>
>>> b
u"He said, \u201cHi, there.\u201d She didn't reply."
>>> print b
He said, “Hi, there.” She didn't reply.

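b is not a str. It is a unicode object. It has no encoding attached: it holds Unicode code points, and you can encode it to bytes with whichever encoding you choose.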
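To make "encode it with different encodings" concrete, here is a minimal sketch, assuming Python 3 (where str corresponds to Python 2's unicode object). The shortened sample text and the choice of codecs are illustrative assumptions, not taken from the original note.

```python
# Python 3 sketch: str here corresponds to Python 2's unicode object.
b = "\u201cHi\u201d"  # the word Hi wrapped in curly double quotes

# One text, three different byte sequences, depending on the codec chosen.
as_utf8 = b.encode("utf-8")
as_utf16 = b.encode("utf-16-le")
as_cp1252 = b.encode("cp1252")

for name, data in [("utf-8", as_utf8), ("utf-16-le", as_utf16), ("cp1252", as_cp1252)]:
    # Decoding with the matching codec always recovers the same text.
    assert data.decode(name) == b
    print(name, len(data), data)
```

The text itself stays the same throughout; only its byte representation changes with the codec.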

>>> c = b.encode("utf-8")
>>> type(c)
<type 'str'>
>>> c
"He said, \xe2\x80\x9cHi, there.\xe2\x80\x9d She didn't reply."
>>> print c
He said, “Hi, there.” She didn't reply.

c is now the same as a: a str encoded in utf-8. We created it by encoding the unicode object.

>>> d = a.encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)

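a is already encoded in utf-8. Calling encode on a str makes Python 2 first decode a to unicode and then encode the result. The decode step fails because it uses the default encoding, ascii. That is why an encode call raises a UnicodeDecodeError.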
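The hidden decode-then-encode sequence can be spelled out explicitly. This is a sketch assuming Python 3; py2_style_encode is a hypothetical helper that imitates the Python 2 behaviour described here, not a real API.

```python
# Python 3 sketch of the two-step behaviour of Python 2's str.encode.
# py2_style_encode is a hypothetical helper written for illustration.
def py2_style_encode(data, target, default="ascii"):
    """Mimic Python 2's str.encode(target): decode with the default codec, then encode."""
    text = data.decode(default)  # implicit step; fails on non-ascii bytes
    return text.encode(target)   # only reached if the decode succeeded


a = b"He said, \xe2\x80\x9cHi, there.\xe2\x80\x9d She didn't reply."

# Pure-ascii bytes pass through both steps without error.
print(py2_style_encode(b"plain ascii", "utf-8"))

try:
    py2_style_encode(a, "utf-8")
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__
    offset = exc.start
    print(error_name, "at byte offset", offset)
```

On Python 3 itself this mistake cannot happen silently, because bytes objects have no encode method at all.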

>>> e = a.decode("utf-8")
>>> type(e)
<type 'unicode'>
>>> e
u"He said, \u201cHi, there.\u201d She didn't reply."
>>> print e
He said, “Hi, there.” She didn't reply.

Now e is the same as b: a unicode object.
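A short check of that equivalence, sketched for Python 3 as an assumption (bytes stands in for Python 2's str, and str for unicode). It also confirms that encoding the result again gives back the original bytes.

```python
# Python 3 sketch: bytes stands in for Python 2's str, str for unicode.
a = b"He said, \xe2\x80\x9cHi, there.\xe2\x80\x9d She didn't reply."

b = str(a, "utf-8")    # counterpart of unicode(a, "utf-8")
e = a.decode("utf-8")  # counterpart of a.decode("utf-8")

same_text = (b == e)                   # both spellings give the same text
round_trip = (e.encode("utf-8") == a)  # encoding again restores the original bytes
print(same_text, round_trip)
```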

>>> f = a.decode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)

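This just makes explicit what we said earlier: decoding a as ascii fails, and it is the same error the implicit decodes raised above.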

>>> g = b.encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 9: ordinal not in range(128)

Note that this is a UnicodeEncodeError, not a UnicodeDecodeError. A unicode object that contains characters outside the ascii range cannot be encoded as ascii.
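The two error types can be told apart programmatically. The sketch below assumes Python 3 and a shortened sample text. The final line uses the "replace" error handler, which the original note does not cover; it is included only as one possible way to avoid the exception at the cost of losing characters.

```python
# Python 3 sketch: str stands in for Python 2's unicode object.
b = "\u201cHi\u201d"

try:
    b.encode("ascii")
except UnicodeEncodeError as exc:
    # Encode errors report the offending characters, not bytes.
    bad_char = exc.object[exc.start]
    is_unicode_error = isinstance(exc, UnicodeError)

try:
    b.encode("utf-8").decode("ascii")
except UnicodeDecodeError as exc:
    # Decode errors report the offending bytes instead.
    bad_byte = exc.object[exc.start:exc.start + 1]

# Optional (not in the original note): an error handler trades the exception for lossy output.
lossy = b.encode("ascii", "replace")
print(ascii(bad_char), bad_byte, lossy)
```

Both exceptions derive from UnicodeError, so a single except clause on that base class catches either direction when the distinction does not matter.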

posted on 2019-12-30 17:01 zhangmingda
