pandas (Python 2): reading Chinese data and handling Chinese column names

Key points:

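Change Python's default encoding to utf-8 (a Python 2-only workaround; reload(sys) and sys.setdefaultencoding are not available in Python 3);
When reading csv or xls files, pass encoding="gbk"; if gbk still cannot decode the data, use "gb18030", which covers a wider range of characters.
For Chinese column names, call decode('utf-8') on the literal or write it as u'中文列名'; to solve this once and for all: from __future__ import unicode_literals
Use the codecs module to read Chinese text files.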
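The gbk-then-gb18030 fallback in the second point can be sketched as a small helper. This is a minimal sketch, not the author's code: `decode_with_fallback` is a hypothetical name, and it works on raw bytes rather than going through pandas. It is written to run under both Python 2.7 and Python 3. U+20000, a CJK Extension B character, serves as an example of a character that GB18030 can encode but GBK cannot.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def decode_with_fallback(data, encodings=("gbk", "gb18030")):
    """Hypothetical helper: decode bytes with each encoding in turn.

    Returns a (text, encoding_used) tuple.
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this codec cannot decode the bytes; try the next one
    raise ValueError("none of %r could decode the data" % (encodings,))


# Common Chinese characters are covered by GBK, so the first attempt works.
print(decode_with_fallback("成交金额".encode("gbk")))

# U+20000 exists in GB18030 but not in GBK: the GBK attempt raises
# UnicodeDecodeError and the GB18030 attempt succeeds.
rare = "\U00020000"
print(decode_with_fallback(rare.encode("gb18030")))
```

With pandas the same idea would be to retry `read_csv` with `encoding="gb18030"` when `encoding="gbk"` raises `UnicodeDecodeError`.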

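# -*- coding: utf-8 -*-
import sys
reload(sys)                      # Python 2 only: restores sys.setdefaultencoding, which site.py removes
sys.setdefaultencoding('utf8')   # implicit str <-> unicode conversions now use UTF-8
import pandas as pd

path_1 = 'brokerUserfeeList.xls'

x = pd.read_excel(path_1, encoding="gbk")
print x.columns
print x["成交金额".decode('utf-8')]   # decode the UTF-8 byte-string literal to unicode

#print x[u"成交金额"]  # the u prefix (or the __future__ import) is recommended; it is also Python 3 compatible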

#### Output:

Index([u'序号', u'成交金额'], dtype='object')
0 11,053.00
1 43,935.40
2 467,327.83
3 32,811.07
4 17,651.10
5 4,629.80
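The effect of `from __future__ import unicode_literals` on column lookups can be illustrated without pandas. In this sketch, which is an assumption rather than the original code, a plain list decoded from a hypothetical GBK-encoded CSV header stands in for `x.columns`. Under the import, the literal `"成交金额"` is unicode in Python 2 as well as Python 3, so it matches the decoded column name without `.decode('utf-8')`.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# Hypothetical header line, as raw GBK bytes from a Chinese CSV file.
raw_header = "序号,成交金额".encode("gbk")

# Decode once at the boundary; afterwards every column name is unicode.
columns = raw_header.decode("gbk").split(",")

# With unicode_literals this literal is unicode in Python 2 too,
# so it compares equal to the decoded column name.
target = "成交金额"
print(type(target).__name__, columns.index(target))
```

The design choice is to decode bytes as early as possible and keep everything inside the program as unicode, which is what the `u` prefix and the `__future__` import make convenient.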

=======================================================

On Windows, to read Chinese text you can read the file and then call decode('gbk'), i.e. decode the bytes into unicode:

open(u'C:\\Users\\Administrator\\Desktop\\222.txt' ).read().decode('gbk')

When writing, use encode('gbk') to encode the unicode string into a byte stream first, then write the bytes:

ttt = u'看了看打扮卡了号地块编码，vas'

with open(ur'c:\Users\Administrator\Desktop\222222.txt', 'w') as f:
    f.write(ttt.encode('gbk'))
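The manual decode-on-read / encode-on-write pair above can be checked with a round trip. This sketch swaps the Desktop paths for a temporary file (an assumption, for portability) and opens it in binary mode ('wb'/'rb') so the same code runs under Python 2 and Python 3; the bytes on disk are exactly `ttt.encode('gbk')`.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import os
import tempfile

ttt = "看了看打扮卡了号地块编码，vas"

# A temporary file stands in for the Desktop path used above.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    # Write: encode unicode -> GBK bytes, then write the bytes.
    with open(path, "wb") as f:
        f.write(ttt.encode("gbk"))

    # Read: read the raw bytes, then decode GBK bytes -> unicode.
    with open(path, "rb") as f:
        raw = f.read()
    text = raw.decode("gbk")
finally:
    os.remove(path)

print(len(raw), len(text))
print(text == ttt)
```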

Using the codecs module is recommended: codecs.open() takes an encoding argument and handles the conversion directly:

import codecs

with codecs.open(ur'c:\Users\Administrator\Desktop\2222.txt', 'w', encoding='gbk') as f:
    f.write(ttt)  # codecs encodes the unicode string to GBK on write
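A round-trip sketch with `codecs.open`, again using a temporary file instead of the Desktop path (an assumption). Encoding happens on write and decoding on read, so the code never calls `.encode` or `.decode` itself; the final check reads the raw bytes to confirm they are plain GBK.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import codecs
import os
import tempfile

ttt = "看了看打扮卡了号地块编码，vas"

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    # codecs.open encodes unicode to GBK on write ...
    with codecs.open(path, "w", encoding="gbk") as f:
        f.write(ttt)
    # ... and decodes GBK back to unicode on read.
    with codecs.open(path, "r", encoding="gbk") as f:
        text = f.read()
    # The bytes on disk are ordinary GBK.
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(text == ttt, raw == ttt.encode("gbk"))
```

`io.open(path, 'w', encoding='gbk')` accepts the same encoding argument and behaves like Python 3's built-in `open`. One difference: `codecs.open` always opens the underlying file in binary mode, so it performs no newline translation.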

posted @ 2017-02-20 22:27 willowj
