Convert \uXXXX Strings to Unicode Characters in Python 3.x

Converting \uXXXX

In Python 3.x:

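  1. str.decode no longer exists in 3.x. That is why the error from reference 1, AttributeError: 'str' object has no attribute 'decode', is thrown.
  2. The unicode literal '\uxxxx\uxxxx' written in source code is different from a string that holds the characters '\uxxxx\uxxxx' at run time (for example, one read from the command line). The parser turns each escape in a literal into a single character, while the run-time string still contains a backslash, a 'u', and four hex digits.
    If you don't understand what "literal" means, check the Python 3.x documentation.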
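Both points above can be checked in an interpreter. This is only a minimal sketch; the variable names `literal` and `escaped` are illustrative and not part of the original script.

```python
# A unicode literal: the parser turns each \uXXXX escape into one character.
literal = '\u627e\u4e0d'
# A raw string keeps the backslashes, like text read from argv or a file.
escaped = r'\u627e\u4e0d'

print(len(literal))        # 2
print(len(escaped))        # 12
print(literal == escaped)  # False

# In Python 3 only bytes has decode(); str does not.
print(hasattr(str, 'decode'))    # False
print(hasattr(bytes, 'decode'))  # True
```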
Example invocation of the script below (it prints 找不到该词的解释):

./descape.py '\u627e\u4e0d\u5230\u8be5\u8bcd\u7684\u89e3\u91ca'

#!/usr/bin/env python3
# file : descape.py
# convert the escaped chars like `\u45e3` to unicode

import re
import sys

def h2d(a):
    """Convert a 4-digit hex string such as '627e' to the character it encodes."""
    return chr(int(a, 16))

def descape(utext):
    # Replace every \uXXXX escape with its character.
    # Text that is not part of an escape is kept as-is.
    return re.sub(r'\\u([0-9a-fA-F]{4})', lambda m: h2d(m.group(1)), utext)

if __name__ == '__main__':
    # sys.argv[1] is a plain str that contains backslashes, not a unicode literal
    text = sys.argv[1]
    print(descape(text))
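As an alternative to parsing the hex digits by hand, the standard library's unicode_escape codec can do the same conversion. The sketch below assumes the input is ASCII-only (as in the example invocation), and the helper name `descape_builtin` is hypothetical. Note that this codec also interprets other escapes such as \n and \xNN, which may or may not be what you want.

```python
def descape_builtin(s):
    # Hypothetical helper. Assumes s contains only ASCII characters:
    # encode to bytes first, because only bytes has decode() in Python 3.
    return s.encode('ascii').decode('unicode_escape')

escaped = r'\u627e\u4e0d\u5230'
# Compare against the equivalent unicode literal.
print(descape_builtin(escaped) == '\u627e\u4e0d\u5230')  # True
```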

json module

json.dumps() and json.dump() take a parameter ensure_ascii, which defaults to True. Set it to False and Chinese characters will no longer be encoded as \uxxxx.
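A short sketch of the difference; the dictionary `data` and its key are made up for illustration.

```python
import json

data = {'msg': '\u627e\u4e0d\u5230'}  # same as {'msg': '找不到'}

default = json.dumps(data)                       # ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep non-ASCII as-is

print(default)                          # {"msg": "\u627e\u4e0d\u5230"}
print(readable == '{"msg": "找不到"}')  # True

# json.loads restores the same characters from either form.
print(json.loads(default) == json.loads(readable))  # True
```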

References:

  1. Python 3.4: str : AttributeError: 'str' object has no attribute 'decode'
posted @ 2016-09-12 21:02  乌祁班岚图