UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 4: ordinal not in range(128)
Rohit Agarwal's notes. Source: https://notes.rohitagarwal.org/2013/05/28/fixing-unicodedecodeerror-in-python.html
Fixing UnicodeDecodeError in Python
>>> a = "He said, “Hi, there.” She didn't reply."
>>> type(a)
<type 'str'>
>>> a
"He said, \xe2\x80\x9cHi, there.\xe2\x80\x9d She didn't reply."
>>> print a
He said, “Hi, there.” She didn't reply.
Here a is a byte string (str) encoded in UTF-8.
>>> b = unicode(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)
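This does not work because ascii is the default encoding in Python 2. With no encoding given, Python assumes a is ascii, and the bytes in a cannot be decoded as ascii.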
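The failure can also be inspected programmatically. The following is a minimal sketch, not part of the original note. It is written to run on both Python 2.7 and Python 3, so the string is spelled as an explicit byte literal and the decode step names ascii explicitly, which is what Python 2 assumes implicitly in unicode(a). The name failure is illustrative only.

```python
# The same UTF-8 bytes as `a` above, written as an explicit byte literal.
a = b"He said, \xe2\x80\x9cHi, there.\xe2\x80\x9d She didn't reply."

try:
    # Name the ascii codec explicitly; Python 2 picks it implicitly in unicode(a).
    a.decode("ascii")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

# The exception records which codec failed and at which byte offset.
print(failure.encoding)  # ascii
print(failure.start)     # 9
# The offending byte is the first byte of the UTF-8 sequence for the opening quote.
print(repr(a[failure.start:failure.end]))
```

The offset 9 matches the position reported in the traceback: the first nine bytes, "He said, ", are plain ascii.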
>>> b = unicode(a, "utf-8")
>>> type(b)
<type 'unicode'>
>>> b
u"He said, \u201cHi, there.\u201d She didn't reply."
>>> print b
He said, “Hi, there.” She didn't reply.
b is not a byte string. It is a unicode object. As I understand it, it has no encoding of its own; you can encode it using different encodings.
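To make "different encodings" concrete, here is a small sketch that encodes the same text two ways. It is an illustration rather than part of the original note. The choice of UTF-16 as the second codec and the variable names are assumptions, and the u"" literal is used so the code runs on Python 2.7 and Python 3.3+.

```python
# The same text as `b` above, as a unicode literal.
b = u"He said, \u201cHi, there.\u201d She didn't reply."

utf8_bytes = b.encode("utf-8")
utf16_bytes = b.encode("utf-16-le")

# Same characters, different byte sequences and lengths.
print("%d code points, %d UTF-8 bytes, %d UTF-16 bytes"
      % (len(b), len(utf8_bytes), len(utf16_bytes)))

# Decoding each byte string with its own codec recovers the identical text.
assert utf8_bytes.decode("utf-8") == b
assert utf16_bytes.decode("utf-16-le") == b
```

The text is the same in both cases; only the byte representation depends on the codec chosen at encode time.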
>>> c = b.encode("utf-8")
>>> type(c)
<type 'str'>
>>> c
"He said, \xe2\x80\x9cHi, there.\xe2\x80\x9d She didn't reply."
>>> print c
He said, “Hi, there.” She didn't reply.
c is now the same as a. It is a byte string encoded in UTF-8, and we created it by encoding the unicode object.
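The round trip can be checked directly instead of comparing the printed output by eye. This is a minimal sketch under the same assumptions as the transcript (the bytes are UTF-8), written with a byte literal so it behaves the same on Python 2.7 and Python 3.

```python
# Start from the UTF-8 bytes, decode to text, then encode back to bytes.
a = b"He said, \xe2\x80\x9cHi, there.\xe2\x80\x9d She didn't reply."
b = a.decode("utf-8")
c = b.encode("utf-8")

# The round trip is lossless: same bytes, same type.
assert c == a
assert type(c) is type(a)

# The intermediate value is text, not bytes, so its type differs.
assert type(b) is not type(a)
```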
>>> d = a.encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)
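a is already encoded in UTF-8. What happens here is that Python 2 first tries to decode a to a unicode object and only then encodes the result. The decoding step fails because the default encoding is assumed to be ascii, which is why an encode call raises a UnicodeDecodeError.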
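The hidden two-step behaviour can be spelled out as an ordinary function. The sketch below is an emulation for illustration, not Python's actual implementation. The helper name py2_style_encode and its parameters are hypothetical, and the code is written to run on both Python 2.7 and Python 3.

```python
def py2_style_encode(byte_string, target_encoding, default_encoding="ascii"):
    """Mimic Python 2's str.encode(): decode with the default codec, then encode."""
    text = byte_string.decode(default_encoding)  # the hidden step that fails
    return text.encode(target_encoding)


a = b"He said, \xe2\x80\x9cHi, there.\xe2\x80\x9d She didn't reply."

try:
    py2_style_encode(a, "utf-8")
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__

print(error_name)  # UnicodeDecodeError

# Pure-ascii bytes survive the hidden decode, so the problem only shows up
# once non-ascii data arrives.
plain = py2_style_encode(b"Hi, there.", "utf-8")

# Naming the real source encoding makes the first step succeed.
fixed = py2_style_encode(a, "utf-8", default_encoding="utf-8")
assert fixed == a
```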
>>> e = a.decode("utf-8")
>>> type(e)
<type 'unicode'>
>>> e
u"He said, \u201cHi, there.\u201d She didn't reply."
>>> print e
He said, “Hi, there.” She didn't reply.
Now e is the same as b. It is a unicode object.
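A common way to apply this in practice is to decode bytes once, at the point where they enter the program, and work with text from then on. The helper below is a hypothetical sketch of that idea. The name to_text and the UTF-8 default are assumptions, not something the original note prescribes. It runs on Python 2.7 and Python 3.

```python
def to_text(value, encoding="utf-8"):
    """Return text, decoding only when the value is a byte string."""
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding)
    return value  # already text, leave untouched


a = b"He said, \xe2\x80\x9cHi, there.\xe2\x80\x9d She didn't reply."
e = to_text(a)

# Calling the helper again on text is a no-op, so it is safe to apply twice.
assert to_text(e) is e
assert e == u"He said, \u201cHi, there.\u201d She didn't reply."
```

Because already-decoded values pass through unchanged, the helper avoids the implicit ascii decode that triggers the error.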
>>> f = a.decode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)
This just demonstrates what we said earlier: decoding a with ascii fails.
>>> g = b.encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 9: ordinal not in range(128)
Note that this is a UnicodeEncodeError, not a UnicodeDecodeError. A unicode object that contains characters outside the ascii range cannot be encoded as ascii.
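If ascii output is genuinely required, the encode call accepts an error handler as its second argument that says what to do with characters the codec cannot represent. The sketch below goes slightly beyond the original note and shows three of the standard handlers. Which one is appropriate depends on the application, and every one of them loses or rewrites the original characters. It runs on Python 2.7 and Python 3.

```python
b = u"He said, \u201cHi, there.\u201d She didn't reply."

# Substitute a question mark for each unencodable character.
replaced = b.encode("ascii", "replace")

# Drop unencodable characters silently.
ignored = b.encode("ascii", "ignore")

# Emit XML/HTML numeric character references instead.
xml_refs = b.encode("ascii", "xmlcharrefreplace")

assert replaced == b"He said, ?Hi, there.? She didn't reply."
assert ignored == b"He said, Hi, there. She didn't reply."
assert xml_refs == b"He said, &#8220;Hi, there.&#8221; She didn't reply."
```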
Posted on 2019-12-30 17:01 by zhangmingda