Python character-set conversion: very slow

2013-08-01 09:21  江湖么名

Code

import chardet

def toUni(text):
    # Decode a byte string to unicode using the encoding chardet guesses.
    result = text  # not named `str`, so the builtin str() stays usable below
    try:
        charstyle = chardet.detect(text)
        # print('confidence: ', charstyle['confidence'])  # detection confidence
        # GB2312, gbk, utf-8 and everything else are decoded the same way,
        # so a single call covers all cases.
        result = text.decode(charstyle['encoding'], 'replace')
    except Exception as e:
        # Also reached when chardet cannot guess and returns encoding None.
        print('[changeToUni.except] %s' % str(e))
        result = text
    return result

One more note: this is very time-consuming. A typical web page takes 1-3 seconds, which is really not worth it.
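
If the pages are known to come in only a few encodings, one way to sidestep the detection cost might be to skip chardet and simply try those encodings in order. The sketch below is not from the code above: the helper name `decode_with_candidates` and the candidate list (`utf-8`, then `gb18030`, which is a superset of GBK and GB2312) are assumptions for illustration. UTF-8 goes first because a strict UTF-8 decode rarely succeeds on text in another encoding.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper: tries a fixed list of encodings instead of
# running statistical detection on the whole page.
CANDIDATE_ENCODINGS = ("utf-8", "gb18030")  # assumed candidates; adjust per site


def decode_with_candidates(data, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    for name in encodings:
        try:
            # Strict decode: raises on the first byte that does not fit.
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    # Nothing matched: fall back to the first candidate and replace bad bytes.
    return data.decode(encodings[0], "replace"), None


if __name__ == "__main__":
    sample = u"字符集转换".encode("gbk")
    text, used = decode_with_candidates(sample)
    print(used)
```

This trades generality for speed: it only helps when the set of likely encodings is small and known in advance, and a short input could still decode cleanly under the wrong candidate.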