python2: Chinese text in HTTP JSON responses displayed as unicode \uXXXX escapes

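In Python 2, converting between encodings goes through unicode as the intermediate representation: you decode first, then encode.
decode turns bytes into unicode, interpreting them with the given encoding.
encode turns unicode into bytes using the given encoding.
Data is transmitted and stored as bytes, so a program must decode those bytes with the correct encoding before it can display them.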
decode: From bytes To Unicode
encode: From Unicode To bytes
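A minimal sketch of that round trip, written for Python 3 (where str is text and bytes is raw data; in Python 2 the same roles are played by unicode and str). UTF-8 and GBK are illustrative choices of encoding:

```python
# Text -> bytes -> text round trip; UTF-8 and GBK are illustrative codecs.
text = u"中文"                      # unicode text (Python 3 str)

utf8_bytes = text.encode("utf-8")   # encode: unicode -> bytes (6 bytes)
gbk_bytes = text.encode("gbk")      # same text, different bytes (4 bytes)

# decode: bytes -> unicode; the codec must match the one used to encode
assert utf8_bytes.decode("utf-8") == text
assert gbk_bytes.decode("gbk") == text

# Transcoding goes through unicode as the intermediate form:
# GBK bytes -> unicode -> UTF-8 bytes
transcoded = gbk_bytes.decode("gbk").encode("utf-8")
print(transcoded == utf8_bytes)     # True
```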

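I tried many encode/decode combinations in Python 2, but none of them stopped the Chinese text from showing up as unicode escapes.
It eventually turned out that the problem came from how the HTTP framework serialized the JSON data.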
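A quick check (Python 3, standard json module only) suggests why no decode/encode combination helps: with default settings the serialized text is already pure ASCII, and the \uXXXX sequences are literal characters rather than mis-decoded bytes:

```python
import json

payload = {"name": u"中文"}
body = json.dumps(payload)          # ensure_ascii defaults to True

print(body)                         # {"name": "\u4e2d\u6587"}

# Every character is ASCII: the escapes are literal backslash-u text
assert all(ord(ch) < 128 for ch in body)

# Round-tripping through any codec leaves the escapes untouched
for codec in ("utf-8", "gbk", "latin-1"):
    assert body.encode(codec).decode(codec) == body

# Only a JSON parser turns \uXXXX back into the original characters
assert json.loads(body)["name"] == u"中文"
```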

The relevant docstring from Python 2's json module (json.dump) reads:

def dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True,
        allow_nan=True, cls=None, indent=None, separators=None,
        encoding='utf-8', default=None, sort_keys=False, **kw):
    """Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object).

    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
    will be skipped instead of raising a ``TypeError``.

    If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
    output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
    instance consisting of ASCII characters only.  If ``ensure_ascii`` is
    false, some chunks written to ``fp`` may be ``unicode`` instances.
    This usually happens because the input contains unicode strings or the
    ``encoding`` parameter is used. Unless ``fp.write()`` explicitly
    understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
    cause an error.

It includes the description of the ensure_ascii parameter.
In short: if ensure_ascii is true, every non-ASCII character in the output is escaped into a \uXXXX sequence.
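To illustrate the parameter, here is a small comparison of the two settings (Python 3; the ensure_ascii behavior is the same in Python 2.7's json module, although there the unescaped result may be a unicode object):

```python
import json

data = {"msg": u"你好"}

escaped = json.dumps(data)                       # ensure_ascii=True (default)
readable = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters

print(escaped)    # {"msg": "\u4f60\u597d"}
print(readable)   # {"msg": "你好"}

# Both strings encode exactly the same JSON value
assert json.loads(escaped) == json.loads(readable) == data
```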
Now look at Tornado's write method: for dict data it always serializes with escape.json_encode.
The code of both functions is:

    # tornado/web.py: RequestHandler.write
    def write(self, chunk):
        if self._finished:
            raise RuntimeError("Cannot write() after finish()")
        if not isinstance(chunk, (bytes, unicode_type, dict)):
            message = "write() only accepts bytes, unicode, and dict objects"
            if isinstance(chunk, list):
                message += ". Lists not accepted for security reasons; see http://www.tornadoweb.org/en/stable/web.html#tornado.web.RequestHandler.write"
            raise TypeError(message)
        if isinstance(chunk, dict):
            # dicts are serialized here, using json.dumps defaults
            chunk = escape.json_encode(chunk)
            self.set_header("Content-Type", "application/json; charset=UTF-8")
        chunk = utf8(chunk)  # unicode -> UTF-8 bytes
        self._write_buffer.append(chunk)

    # tornado/escape.py
    def json_encode(value):
        # ensure_ascii is not passed, so it defaults to True;
        # "</" is escaped so the JSON can sit safely inside a <script> tag
        return json.dumps(value).replace("</", "<\\/")

As you can see, json_encode calls json.dumps without passing ensure_ascii, so the default value True applies: every non-ASCII character in the serialized strings is escaped into \uXXXX form.
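To reproduce this without a running server, the sketch below uses a local copy of the json_encode one-liner quoted above (it does not import Tornado) and then encodes to UTF-8, mirroring the utf8(chunk) step in write(). Python 3 is assumed:

```python
import json

def json_encode(value):
    # Local copy of the tornado.escape.json_encode one-liner quoted above
    # (ensure_ascii is left at its default, True)
    return json.dumps(value).replace("</", "<\\/")

chunk = json_encode({"name": u"中文", "html": "</script>"})
body = chunk.encode("utf-8")   # mirrors utf8(chunk) in write(): str -> bytes

print(chunk)   # {"name": "\u4e2d\u6587", "html": "<\/script>"}
```

By the time the bytes leave the server, the Chinese characters have already become six-character escape sequences, so encoding them as UTF-8 changes nothing.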

The fix is to serialize the JSON yourself, with ensure_ascii set to False:

    json.dumps(value, ensure_ascii=False)
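A minimal sketch of the fix, assuming the data is serialized before calling write(). json_encode_utf8 is a hypothetical helper name, and the code targets Python 3. It keeps Tornado's "</" escaping and returns UTF-8 bytes:

```python
import json

def json_encode_utf8(value):
    # Hypothetical helper: like tornado.escape.json_encode, but with
    # ensure_ascii=False so non-ASCII characters are kept as-is
    text = json.dumps(value, ensure_ascii=False)
    # Keep the "</" escaping so the output stays safe inside <script>
    text = text.replace("</", "<\\/")
    # Return UTF-8 bytes, matching the charset declared in the header
    return text.encode("utf-8")

body = json_encode_utf8({"name": u"中文"})
print(body.decode("utf-8"))   # {"name": "中文"}
```

Because write() only sets the JSON Content-Type automatically for dict arguments, a handler passing these bytes would set it itself, e.g. self.set_header("Content-Type", "application/json; charset=UTF-8") followed by self.write(json_encode_utf8(data)). Under Python 2.7, json.dumps(..., ensure_ascii=False) may return unicode, and mixing non-ASCII byte strings with unicode in the input can raise UnicodeDecodeError, so decoding input byte strings to unicode first is the safer route.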

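This is a legacy project, so Python 2 can't be avoided; new projects will certainly use Python 3 (where json.dumps still defaults to ensure_ascii=True, so the same parameter applies).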

https://www.cnblogs.com/haiton/p/18159481
Please credit the source when reposting!

posted @ 2024-04-26 10:39 华腾海神
