Batch-converting files to UTF-8 with signature (BOM) in Python to fix garbled Chinese text on Linux

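Given a folder, the script walks every file in it and in its subfolders and picks out the files with the .cpp extension. It uses chardet to detect each file's encoding. If the encoding is not UTF-8-SIG (Python's codec name for UTF-8 with a BOM), the script converts the file to UTF-8-SIG.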
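
The BOM ("signature") is the three bytes EF BB BF at the very start of the file. The short standard-library sketch below shows how the utf-8 and utf-8-sig codecs differ. has_utf8_bom is a hypothetical helper written only for this illustration.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

text = "中文"
plain = text.encode("utf-8")       # UTF-8 without BOM
signed = text.encode("utf-8-sig")  # the same bytes with EF BB BF prepended

print(plain)   # b'\xe4\xb8\xad\xe6\x96\x87'
print(signed)  # b'\xef\xbb\xbf\xe4\xb8\xad\xe6\x96\x87'
print(has_utf8_bom(plain), has_utf8_bom(signed))  # False True

# Decoding with utf-8-sig strips the BOM; plain utf-8 keeps it as U+FEFF.
assert signed.decode("utf-8-sig") == text
assert signed.decode("utf-8") == "\ufeff" + text
```

Decoding with utf-8-sig also accepts input that has no BOM, so it is a safe codec for reading files of either kind.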

The Python script is as follows:

import os
import sys
import codecs
import chardet

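def convert(filename, out_enc="UTF-8-SIG"):
  '''Re-encode filename as out_enc (UTF-8 with BOM) unless it already is.'''
  try:
    with open(filename, 'rb') as f:
      content = f.read()
    source_encoding = chardet.detect(content)["encoding"]
    print(source_encoding)

    # chardet returns None when it cannot guess (e.g. an empty file)
    if source_encoding is None:
      print("skip file " + filename)
      return
    # A GB2312 guess can fail on GBK-only characters; GB18030 is a superset
    if source_encoding.upper() == "GB2312":
      source_encoding = "GB18030"
    if source_encoding.upper() != "UTF-8-SIG":
      content = content.decode(source_encoding).encode(out_enc)
      with open(filename, 'wb') as f:
        f.write(content)
      print("convert file " + filename)
  except (IOError, UnicodeDecodeError, LookupError) as err:
    # UnicodeDecodeError: wrong guess; LookupError: unknown codec name
    print("error in {0}: {1}".format(filename, err))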

def removeBom(file):
  '''Remove the BOM bytes from a UTF-8 file (the reverse of convert).'''
  with open(file, 'rb') as f:
    data = f.read()
  if data[:3] == codecs.BOM_UTF8:
    # Drop the 3-byte BOM and write the rest back unchanged
    with open(file, 'wb') as f:
      f.write(data[3:])


def explore(root_dir):
  '''Walk root_dir and its subfolders, converting every .cpp file.'''
  for root, dirs, files in os.walk(root_dir):
    for file in files:
      if os.path.splitext(file)[1] == '.cpp':
        print(file)
        path = os.path.join(root, file)
        convert(path)
        # removeBom(path)

def main():
  if len(sys.argv) < 2:
    print("usage: python ToUtf8.py <folder>")
    sys.exit(1)
  explore(sys.argv[1])

if __name__=="__main__":
  main()
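
The script relies on chardet's statistical guess, which can be wrong, especially for short files. If the .cpp files are known to be either UTF-8 or a GB-family encoding (GB2312/GBK), a standard-library-only alternative is trial decoding. It skips files that already start with a BOM, tries strict UTF-8, then falls back to GB18030, a superset of GB2312 and GBK. This is a swapped-in technique, not the author's chardet approach, and it assumes exactly those candidate encodings. to_utf8_sig and convert_tree are hypothetical names.

```python
import codecs
import os
import sys

# Assumption: every file is UTF-8 (with or without BOM) or GB2312/GBK,
# which GB18030 decodes as a superset. Order matters: UTF-8 is tried first.
CANDIDATE_ENCODINGS = ("utf-8", "gb18030")

def to_utf8_sig(path):
    """Rewrite path as UTF-8 with BOM. Return the encoding it was decoded
    from, "utf-8-sig" if it already had a BOM, or None if it was skipped."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # already in the target format; leave untouched
    for enc in CANDIDATE_ENCODINGS:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # not this encoding, try the next candidate
        with open(path, "wb") as f:
            f.write(text.encode("utf-8-sig"))
        return enc
    return None  # no candidate decoded cleanly; the file is left as-is

def convert_tree(root_dir, ext=".cpp"):
    """Walk root_dir recursively and convert every file with extension ext."""
    results = {}
    for root, _dirs, files in os.walk(root_dir):
        for name in files:
            if os.path.splitext(name)[1] == ext:
                path = os.path.join(root, name)
                results[path] = to_utf8_sig(path)
    return results

if __name__ == "__main__" and len(sys.argv) > 1:
    for path, enc in convert_tree(sys.argv[1]).items():
        print(path, enc)
```

Trying UTF-8 first is deliberate. GBK-encoded Chinese text is rarely valid UTF-8, whereas GB18030 would accept many UTF-8 byte sequences and silently misread them. As in the original script, pure-ASCII files decode as UTF-8 and gain a BOM.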

If you get an error saying that chardet cannot be found (ModuleNotFoundError: No module named 'chardet'), run pip install chardet in cmd to install it.
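
pip installs packages for the interpreter it belongs to. If several Python versions are installed, python -m pip install chardet targets the same python that will run the script. To confirm that this interpreter can see the package, a small check with importlib.util.find_spec from the standard library is enough. chardet_available is a hypothetical helper name.

```python
import importlib.util
import sys

def chardet_available():
    """Return True if the chardet package is importable by this interpreter."""
    return importlib.util.find_spec("chardet") is not None

if __name__ == "__main__":
    print(sys.executable)  # the interpreter that pip must install into
    if chardet_available():
        print("chardet found")
    else:
        print("chardet missing: run  python -m pip install chardet")
```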

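Then run the command python ToUtf8.py test in cmd, where ToUtf8.py is the script above and test is the name of the folder. This detects and converts the encodings of all the .cpp files in the folder in one batch. The files are rewritten in place.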
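
Because the files are rewritten in place, it can help to check the result afterwards without modifying anything. The read-only sketch below walks the same folder and reports whether each .cpp file now starts with the UTF-8 BOM. report_bom is a hypothetical helper and is not part of the script above.

```python
import codecs
import os
import sys

def report_bom(root_dir, ext=".cpp"):
    """Map each file under root_dir with extension ext to True if it starts
    with the UTF-8 BOM (EF BB BF), else False. Nothing is written."""
    status = {}
    for root, _dirs, files in os.walk(root_dir):
        for name in files:
            if os.path.splitext(name)[1] == ext:
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    status[path] = f.read(3) == codecs.BOM_UTF8
    return status

if __name__ == "__main__" and len(sys.argv) > 1:
    for path, has_bom in sorted(report_bom(sys.argv[1]).items()):
        print("BOM   " if has_bom else "no BOM", path)
```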

posted @ 2024-12-04 17:32 一字千金
