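Python encoding problems

Garbled output in the Windows DOS console

A file saved as UTF-8 gets interpreted as GBK on Windows, so the text it prints comes out garbled. The same thing happens when you fetch a web page and print it in the DOS console: the console uses GBK, so bytes in any other encoding are displayed wrongly or cause errors.

Fix: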

print "大家好".decode('utf-8').encode('GBK')
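To make the mismatch concrete, here is a small sketch of what happens to the bytes. It is written for Python 3 (an assumption on my part, since the post itself uses Python 2), where `bytes` and `str` play the roles of Python 2's `str` and `unicode`. The variable names are only illustrative.

```python
# Python 3 sketch: bytes ~ Python 2 str, str ~ Python 2 unicode.
text = "大家好"
utf8_bytes = text.encode("utf-8")   # what a UTF-8 file actually stores

# A GBK console reads those same bytes as GBK, which gives garbled text.
garbled = utf8_bytes.decode("gbk", errors="replace")

# The fix above: decode as UTF-8 first, then encode as GBK for the console.
gbk_bytes = utf8_bytes.decode("utf-8").encode("gbk")
shown_by_console = gbk_bytes.decode("gbk")

print(len(utf8_bytes), len(gbk_bytes))
print(shown_by_console == text, garbled == text)
```

The byte counts differ because each of these characters takes three bytes in UTF-8 but two in GBK, which is why the console cannot simply reuse the UTF-8 bytes.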

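Another case: some editors (Notepad, for example) insert an invisible BOM (0xEF 0xBB 0xBF) at the very start of the file when saving as UTF-8.

The codecs module can be used to deal with it: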

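import codecs
# read the raw bytes so the BOM can be compared byte for byte
filehandle = open("test.txt", 'rb')
content = filehandle.read()
filehandle.close()
# drop the 3-byte UTF-8 BOM if the file starts with one
if content[:3] == codecs.BOM_UTF8:
    content = content[3:]
print content.decode("utf-8")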
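The same idea as a Python 3 sketch that works on in-memory bytes, so it runs without a `test.txt` on disk. The helper name `decode_utf8` is made up for this example. The standard `utf-8-sig` codec does the BOM stripping for you and is shown alongside for comparison.

```python
import codecs

def decode_utf8(data):
    """Decode UTF-8 bytes, dropping one leading BOM if present."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")

plain = "测试".encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain   # what Notepad would write

# Manual stripping and the utf-8-sig codec agree, with or without a BOM.
print(decode_utf8(with_bom) == decode_utf8(plain))
print(with_bom.decode("utf-8-sig") == plain.decode("utf-8-sig"))

# A plain utf-8 decode keeps the BOM as the character U+FEFF.
print(repr(with_bom.decode("utf-8")[0]))
```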

P.S. A BOM can also be used to bypass some file-content checks (xdcms 2015 code audit, challenge 4):

    private function check_content($name)
    {
        if(isset($_FILES[$name]["tmp_name"])) {
            $content = file_get_contents($_FILES[$name]["tmp_name"]);
            if(strpos($content, "<?") === 0) {
                return false;
            }
        }
        return true;
    }
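Why the BOM gets past this check can be modelled in a few lines of Python 3. This is only a rough model of the PHP logic above, not the challenge's actual code: `check_content` here takes the file bytes directly, and the payload is a harmless placeholder.

```python
import codecs

def check_content(content):
    """Mimic the PHP check: reject only if '<?' sits at offset 0."""
    return content.find(b"<?") != 0

payload = b"<?php echo 1; ?>"
with_bom = codecs.BOM_UTF8 + payload

print(check_content(payload))    # rejected: '<?' is at offset 0
print(check_content(with_bom))   # accepted: '<?' has moved to offset 3
print(with_bom.find(b"<?"))
```

The check only looks at the very first position, so three extra bytes in front of `<?` are enough to slip through.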

No encoding declared at the top of the .py file

s = "测试"
print s
  File "/Users/l3m0n/study/program/python/code_study/test3.py", line 1
SyntaxError: Non-ASCII character '\xe6' in file /Users/l3m0n/study/program/python/code_study/test3.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

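Python 2 assumes a source file is ASCII unless told otherwise, so the non-ASCII bytes of the Chinese text are rejected while the file is being parsed, before print ever runs.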
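The '\xe6' in the error message is simply the first UTF-8 byte of 测. The Python 3 sketch below is a rough model of the failure, not how the Python 2 parser is actually implemented: it encodes the source line as UTF-8 and then tries to read it back as ASCII.

```python
source_line = 's = "测试"'
raw = source_line.encode("utf-8")   # the bytes stored in the .py file

# The first byte outside the ASCII range is the one the error reports.
first_non_ascii = next(b for b in raw if b >= 0x80)
print(hex(first_non_ascii))

# Treating the line as ASCII fails at exactly that byte.
failed_at = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    failed_at = err.start
print(failed_at, hex(raw[failed_at]))
```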

Fix:

# coding=utf-8
or
#!/usr/bin/python
# -*- coding: utf-8 -*-
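Both forms are recognised because PEP 263 only looks for a `coding[:=]` pattern in a comment on the first or second line. One way to check what the interpreter would pick up is Python 3's `tokenize.detect_encoding`. Note the assumption: this function lives in the Python 3 standard library and is used here purely as an illustration, and `declared_encoding` is a made-up helper name.

```python
import io
import tokenize

def declared_encoding(source_bytes):
    """Return the source encoding Python 3's tokenizer detects."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _ = tokenize.detect_encoding(readline)
    return encoding

short_form = b"# coding=utf-8\ns = 1\n"
emacs_form = b"#!/usr/bin/python\n# -*- coding: utf-8 -*-\ns = 1\n"
gbk_form = b"# coding=gbk\ns = 1\n"

print(declared_encoding(short_form))
print(declared_encoding(emacs_form))
print(declared_encoding(gbk_form))
```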

Errors when concatenating strings

# coding=utf-8
s = "测试" + u"一下"
print s
Traceback (most recent call last):
  File "/Users/l3m0n/study/program/python/code_study/test3.py", line 2, in <module>
    s = "测试" + u"一下"
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

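The left operand is a byte string (str) holding Chinese text and the right operand is unicode. To join them, Python 2 first converts the str to unicode by decoding it with the default encoding, ASCII. Bytes in the range 0-127 decode fine, but as soon as the str contains a byte of 128 or above, ASCII cannot handle it and an exception is raised.

There are two ways to fix it: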

1. Convert the str to unicode (the file is declared as UTF-8, so decode with utf-8):
s = "测试".decode("utf-8") + u"一下"

2. Encode the unicode as UTF-8:
s = "测试" + u"一下".encode("utf-8")
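Here are both fixes side by side as a Python 3 sketch. Python 3 refuses to mix `bytes` and `str` at all instead of decoding implicitly, so the implicit ASCII step is spelled out by hand. Treat it as a model of the Python 2 behaviour rather than the real thing.

```python
raw = "测试".encode("utf-8")   # Python 2 str     ~ Python 3 bytes
text = "一下"                   # Python 2 unicode ~ Python 3 str

# What Python 2 tried behind the scenes: decode the bytes as ASCII.
try:
    raw.decode("ascii")
    implicit_ok = True
except UnicodeDecodeError:
    implicit_ok = False

# Fix 1: decode the bytes explicitly; the result is text.
as_text = raw.decode("utf-8") + text

# Fix 2: encode the text explicitly; the result is bytes.
as_bytes = raw + text.encode("utf-8")

print(implicit_ok)
print(as_text)
print(as_bytes.decode("utf-8") == as_text)
```

Which fix to choose depends on what the rest of the program expects: fix 1 keeps everything as text, fix 2 keeps everything as bytes.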

Problems with the default encoding

Traceback (most recent call last):
  File "/Users/l3m0n/study/program/python/code_study/mangzhu.py", line 14, in <module>
    print sqli(1);
UnicodeEncodeError: 'ascii' codec can't encode characters in position 275-281: ordinal not in range(128)

Fix:

import sys
reload(sys)
sys.setdefaultencoding('utf8')
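If you would rather not change the interpreter-wide default, one alternative (my suggestion, not something from the original script) is to encode explicitly at the point of output. The Python 3 sketch below reproduces the same kind of UnicodeEncodeError with the ASCII codec and then shows the explicit encode; `message` merely stands in for whatever `sqli(1)` returned.

```python
message = "测试"   # stand-in for the text that failed to print

# Encoding with ASCII fails just like the traceback above.
try:
    message.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly as UTF-8 works without touching any global setting.
encoded = message.encode("utf-8")

print(ascii_ok)
print(len(encoded))
```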
posted @ 2016-05-20 16:19  l3m0n