1. Purpose of the module
The chardet module detects the character encoding of strings and files.
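2. Downloading and installing chardet
Download: http://pypi.python.org/pypi/chardet/
After downloading chardet, extract the archive and place the chardet folder directly in your application's directory; import chardet then works from that application. Alternatively, copy the chardet folder into Python's own library directory (for example, site-packages) so that every Python program on the machine can simply use import chardet.
Third-party Python modules usually ship with a setup.py file. In CMD, change to the chardet directory and run: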
python setup.py install
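If that command fails with ImportError: No module named setuptools, setuptools has to be installed first.
3. Downloading and installing setuptools
Download setuptools-0.6c11.win32-py2.7.exe and double-click the exe file to install it. Once the installation finishes, Python libraries can be installed directly with python setup.py install.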
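A quick way to confirm that both installations succeeded is to ask Python whether the modules can be found. The following is a minimal sketch, assuming Python 3.4 or later (the sessions in this article use Python 2.7, where importlib.util.find_spec is not available); is_installed is a hypothetical helper name, not part of chardet or setuptools.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found in the current environment."""
    # find_spec() returns None when no top-level module of that name exists.
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two packages discussed above.
for name in ("setuptools", "chardet"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If chardet is reported as missing, the chardet folder is probably not in a directory that Python searches for modules.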
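4. Usage examples
chardet.detect() returns a dictionary in which confidence is the detector's confidence in its guess (a value between 0 and 1) and encoding is the name of the detected encoding.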
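One way the returned dictionary might be put to use is to decode the bytes with the guessed encoding only when the confidence is high enough. The sketch below assumes Python 3; decode_with_guess, min_confidence, and fallback are hypothetical names chosen for illustration, and the 0.5 threshold is an arbitrary assumption rather than a value recommended by chardet.

```python
def decode_with_guess(raw, guess, min_confidence=0.5, fallback="utf-8"):
    """Decode `raw` bytes using a chardet-style result dictionary.

    `guess` is expected to look like {'encoding': 'GB2312', 'confidence': 0.99}.
    """
    encoding = guess.get("encoding")
    confidence = guess.get("confidence") or 0.0
    # Use the fallback when there is no guess or the guess is too weak.
    if encoding is None or confidence < min_confidence:
        encoding = fallback
    # errors="replace" substitutes undecodable bytes instead of raising.
    return raw.decode(encoding, errors="replace")


# A hand-written result dictionary stands in for chardet.detect(sample) here.
sample = "电源管理".encode("gb2312")
text = decode_with_guess(sample, {"encoding": "GB2312", "confidence": 0.99})
print(text == "电源管理")
```

With chardet installed, the call would typically be decode_with_guess(raw, chardet.detect(raw)).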
(1) Detecting the encoding of a web page:
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import chardet
>>> import urllib
>>> rawdata = urllib.urlopen('http://www.byhh.net/').read()
>>> chardet.detect(rawdata)
{'confidence': 0.99, 'encoding': 'GB2312'}
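The session above relies on urllib.urlopen, which exists only in Python 2. A rough Python 3 equivalent might look like the sketch below; fetch_bytes, detect_page_encoding, and the max_bytes limit are hypothetical names and values introduced for illustration, and chardet is assumed to be installed when detect_page_encoding is called.

```python
import urllib.request


def fetch_bytes(url, max_bytes=200_000):
    """Download at most `max_bytes` raw (undecoded) bytes from `url`."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read(max_bytes)


def detect_page_encoding(url):
    """Return chardet's guess for the page at `url` as a dictionary."""
    # Imported here so that fetch_bytes() stays usable without chardet.
    import chardet
    return chardet.detect(fetch_bytes(url))
```

Calling detect_page_encoding('http://www.byhh.net/') should return a dictionary of the same shape as in the session above; the actual values depend on what the site serves at the time.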
(2) Detecting the encoding of a file
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import chardet
>>> f = open("C://log.txt", "r")
>>> r = f.readline()
>>> print chardet.detect(r)
{'confidence': 1.0, 'encoding': 'ascii'}
>>> r = f.readline()
>>> print chardet.detect(r)
{'confidence': 1.0, 'encoding': 'ascii'}
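Checking a single line at a time can mislead: a file whose first lines are plain ASCII may contain other characters further down. chardet also provides a UniversalDetector class that can be fed a file piece by piece. The following is a minimal sketch assuming Python 3; detect_file_encoding, chunk_size, and the optional detector parameter are hypothetical names added for illustration.

```python
def detect_file_encoding(path, chunk_size=4096, detector=None):
    """Feed a file to a chardet UniversalDetector chunk by chunk."""
    if detector is None:
        # Third-party import; assumes chardet is installed.
        from chardet.universaldetector import UniversalDetector
        detector = UniversalDetector()
    # Binary mode: detection works on raw bytes, not on decoded text.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            detector.feed(chunk)
            if detector.done:  # the detector has already settled on an answer
                break
    detector.close()  # finalizes the result
    return detector.result
```

Calling detect_file_encoding("C:/log.txt") returns a dictionary like the ones printed above, but based on as much of the file as the detector needed rather than on one line.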
(3) Detecting the encoding of a variable
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import chardet
>>> strPowerManage = "电源管理"
>>> chardet.detect(strPowerManage)
{'confidence': 0.99, 'encoding': 'GB2312'}
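In the Python 2 session above, the string literal is a byte string in whatever encoding the console uses, which is presumably why GB2312 is reported on a Chinese Windows system. In Python 3 (assumed in the sketch below), str holds Unicode text and chardet.detect() accepts only bytes, so the text has to be encoded first. The sketch shows that the same characters produce different bytes under different encodings.

```python
text = "电源管理"

# The same four characters become different bytes under each encoding;
# those bytes, not the characters, are what an encoding detector inspects.
as_gb2312 = text.encode("gb2312")
as_utf8 = text.encode("utf-8")

print(len(as_gb2312))        # 2 bytes per character
print(len(as_utf8))          # 3 bytes per character
print(as_gb2312 == as_utf8)  # the two byte sequences differ
```

Passing as_gb2312 or as_utf8 to chardet.detect() is the Python 3 counterpart of the session above; results for such short inputs can vary.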