Python 2: Chinese text in HTTP JSON responses shows up as \uXXXX Unicode escapes
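Python 2 uses unicode as the intermediate form when converting between encodings: to transcode, decode first and then encode.
decode takes bytes and decodes them into unicode using the given encoding.
encode takes unicode and encodes it into bytes using the given encoding.
Data is transmitted and stored as bytes, so a program has to decode those bytes with the correct encoding to display them.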
decode: From bytes To Unicode
encode: From Unicode To bytes
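As a small illustration of the decode-then-encode round trip described above, here is a minimal sketch. The sample text and the choice of GBK as the source encoding are assumptions made for the example only.

```python
# -*- coding: utf-8 -*-
# Unicode text is the intermediate form; bytes are what gets stored or sent.
text = u"中文"

utf8_bytes = text.encode("utf-8")  # Unicode -> bytes (UTF-8)
gbk_bytes = text.encode("gbk")     # Unicode -> bytes (GBK)

# Transcoding GBK bytes to UTF-8 bytes: decode first, then encode.
transcoded = gbk_bytes.decode("gbk").encode("utf-8")
```

The same characters produce different byte sequences under different encodings, which is why bytes have to be decoded with the encoding they were written in.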
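I tried many encode/decode combinations in Python 2, and none of them stopped Chinese text from showing up in escaped Unicode form.
It turned out the problem came from how the HTTP framework serializes JSON data.
The relevant code and docstring from Python's json module: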
def dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True,
        allow_nan=True, cls=None, indent=None, separators=None,
        encoding='utf-8', default=None, sort_keys=False, **kw):
    """Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object).

    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
    will be skipped instead of raising a ``TypeError``.

    If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
    output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
    instance consisting of ASCII characters only. If ``ensure_ascii`` is
    false, some chunks written to ``fp`` may be ``unicode`` instances.
    This usually happens because the input contains unicode strings or the
    ``encoding`` parameter is used. Unless ``fp.write()`` explicitly
    understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
    cause an error.
The part to note is the description of the ensure_ascii parameter.
In short: if ensure_ascii is true, every non-ASCII character is escaped into the \uXXXX form.
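The effect of ensure_ascii can be seen with json.dumps directly. This is a minimal sketch; the sample dict is made up for illustration.

```python
# -*- coding: utf-8 -*-
import json

data = {u"name": u"中文"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)

# ensure_ascii=False: the characters are kept as they are.
readable = json.dumps(data, ensure_ascii=False)

# Both forms are valid JSON and parse back to the same data.
same = json.loads(escaped) == json.loads(readable)
```

The escaped output is not corrupted data. It only looks unreadable when the raw response body is viewed without being parsed as JSON.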
Now look at tornado's write method: for dict data, write always serializes to JSON with escape.json_encode.
The code of the two methods:
def write(self, chunk):
    if self._finished:
        raise RuntimeError("Cannot write() after finish()")
    if not isinstance(chunk, (bytes, unicode_type, dict)):
        message = "write() only accepts bytes, unicode, and dict objects"
        if isinstance(chunk, list):
            message += ". Lists not accepted for security reasons; see http://www.tornadoweb.org/en/stable/web.html#tornado.web.RequestHandler.write"
        raise TypeError(message)
    if isinstance(chunk, dict):
        chunk = escape.json_encode(chunk)
        self.set_header("Content-Type", "application/json; charset=UTF-8")
    chunk = utf8(chunk)
    self._write_buffer.append(chunk)
===================================================
def json_encode(value):
    return json.dumps(value).replace("</", "<\\/")
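As you can see, json_encode calls json.dumps without passing ensure_ascii, so it takes its default value True. That means every non-ASCII character in the strings being serialized is escaped into the \uXXXX form.
The fix is to serialize the JSON yourself with ensure_ascii set to False, and pass the resulting string (rather than the dict) to write().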
json.dumps(value, ensure_ascii=False)
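A minimal sketch of that workaround follows. The helper name json_encode_readable is hypothetical (it is not part of tornado), and the handler usage is shown only in comments so the snippet runs without tornado installed. Because write() sets the Content-Type header only for dict input, the sketch assumes the header is set by hand.

```python
# -*- coding: utf-8 -*-
import json


def json_encode_readable(value):
    """Hypothetical helper: like json_encode above, but keeps non-ASCII text."""
    text = json.dumps(value, ensure_ascii=False).replace("</", "<\\/")
    # On Python 2, json.dumps may already return a byte str when the input
    # contains no unicode strings; encode only when we hold text.
    if isinstance(text, bytes):
        return text
    return text.encode("utf-8")


# Inside a tornado RequestHandler method (not executed here):
#     self.set_header("Content-Type", "application/json; charset=UTF-8")
#     self.write(json_encode_readable({u"msg": u"中文"}))
body = json_encode_readable({u"msg": u"中文"})
```

Passing bytes to write() skips the dict branch, so json_encode and its default ensure_ascii=True are never involved.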
Nothing can be done about old projects; new projects will certainly be on Python 3.
https://www.cnblogs.com/haiton/p/18159481
Please credit the source when reposting!