Fixing garbled Chinese text in Nginx logs with Python 3
Chinese text in Nginx logs comes out garbled, as in the following entry:
{\x22code\x22: \x22000\x22, \x22msg\x22: \x22\x5Cu6210\x5Cu529f\x22, \x22data\x22: {\x22store_id\x22: 322589}, \x22subcode\x22: \x22100000\x22}
Decoding with Python 3
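import json
# Regular (non-raw) string literal: Python itself turns \x22 into " and \x5C into \
msg = """
{\x22code\x22: \x22000\x22, \x22msg\x22: \x22\x5Cu6210\x5Cu529f\x22, \x22data\x22: {\x22store_id\x22: 322589}, \x22subcode\x22: \x22100000\x22}
"""
# raw_unicode_escape writes each character below U+0100 as one byte, so UTF-8 bytes
# logged as \xE6\x88\x90 would be reassembled here; json.loads then resolves \uXXXX
res_obj = json.loads(msg.encode('raw_unicode_escape').decode('utf8'))
# ensure_ascii=False prints Chinese characters directly instead of \uXXXX escapes
print(json.dumps(res_obj, ensure_ascii=False))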
The result is:
{"code": "000", "msg": "成功", "data": {"store_id": 322589}, "subcode": "100000"}
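The snippet above works because the sample is pasted into a regular Python string literal, where Python itself interprets \x22 and \x5C. A line read from an actual log file holds those escapes as literal text, so they need to be reversed explicitly. The following is a minimal sketch, assuming the log was written with Nginx's default escaping (so every backslash in the line starts a \xXX sequence); unescape_nginx is a hypothetical helper name, not part of any library:

```python
import json
import re

# Matches the literal four-character sequences \xHH that appear in the log text
_HEX_ESCAPE = re.compile(rb'\\x([0-9A-Fa-f]{2})')


def unescape_nginx(line: str) -> str:
    """Turn literal \\xHH sequences back into bytes, then decode as UTF-8.

    Hypothetical helper; assumes the line uses Nginx's default escaping.
    """
    raw = line.encode('utf-8')
    # Single pass: the backslash produced from \x5C is not scanned again
    raw = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return raw.decode('utf-8')


# Raw string literal, so the escapes stay literal, as they would when read from a file
line = r'{\x22code\x22: \x22000\x22, \x22msg\x22: \x22\x5Cu6210\x5Cu529f\x22, \x22data\x22: {\x22store_id\x22: 322589}, \x22subcode\x22: \x22100000\x22}'

res_obj = json.loads(unescape_nginx(line))
print(json.dumps(res_obj, ensure_ascii=False))
```

The same helper also handles Chinese that Nginx escaped byte by byte, for example a field logged as \xE6\x88\x90\xE5\x8A\x9F.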
Summary
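- By default Nginx does not write Chinese text to the log as-is: it escapes ", \ and every byte below 32 or above 126 as a hexadecimal \xXX sequence. In the sample above the application had already JSON-escaped the Chinese as \uXXXX, and Nginx then escaped the quotes and backslashes on top of that.
- In Python 3, encoding and then decoding:
msg.encode('raw_unicode_escape').decode('utf8')
performs the conversion.
- Nginx can also be configured to keep Chinese text readable in JSON logs: add escape=json when defining the access log format, as shown below: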
log_format main escape=json '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
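To see why escape=json leaves Chinese readable, the two escaping modes can be mimicked in Python. This is only a rough sketch based on the escaping rules described in the Nginx log module documentation, not Nginx's actual code; escape_default and escape_json are hypothetical names, and the hex digit case used for control characters in the JSON mode is an assumption:

```python
def escape_default(value: str) -> str:
    """Mimic the default mode: ", \\ and bytes below 32 or above 126 become \\xXX."""
    out = []
    for b in value.encode('utf-8'):
        if b in (0x22, 0x5C) or b < 32 or b > 126:
            out.append('\\x%02X' % b)
        else:
            out.append(chr(b))
    return ''.join(out)


# Short escapes used for common control characters in JSON strings
_JSON_SHORT = {0x08: '\\b', 0x09: '\\t', 0x0A: '\\n', 0x0C: '\\f', 0x0D: '\\r'}


def escape_json(value: str) -> str:
    """Mimic escape=json: only ", \\ and control characters are escaped."""
    raw = bytearray()
    for b in value.encode('utf-8'):
        if b == 0x22:
            raw += b'\\"'
        elif b == 0x5C:
            raw += b'\\\\'
        elif b < 32:
            # Hex digit case here is an assumption, not verified against Nginx
            raw += _JSON_SHORT.get(b, '\\u%04X' % b).encode('ascii')
        else:
            raw.append(b)  # non-ASCII bytes such as UTF-8 Chinese pass through
    return raw.decode('utf-8')


print(escape_default('成功'))
print(escape_json('成功'))
```

Under these rules the default mode turns each UTF-8 byte of the Chinese text into a \xXX sequence, while the JSON mode leaves those bytes untouched.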
The palest ink is better than the best memory!