Using Python to remove non-Chinese characters and blank lines from text
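This method strips every non-Chinese character from a text file and then deletes the blank lines that are left behind.
The work is done in two steps.
The text file to process: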
当时这个
3
00:00:04,02 --> 00:00:05,13
有喝多的武将
4
00:00:05,18 --> 00:00:07,05
一看许姬很漂亮
5
00:00:07,12 --> 00:00:09,06
就欲行非礼就拽上去
6
00:00:09,09 --> 00:00:10,21
这许姬手也挺快
7
00:00:11,03 --> 00:00:12,14
黑咕隆能看不见呢
8
00:00:12,14 --> 00:00:13,08
顺手夸
9
00:00:13,13 --> 00:00:17,05
把这武将头盔顶上那鹰带给摘下来了
10
00:00:17,12 --> 00:00:19,03
哎就是头盔上绑着带了
The following code removes the non-Chinese characters from the file:
import re

def del_no_china(infile, outfile):
    # Keep only the Chinese characters on each line. Every input line still
    # produces one output line, so a line without Chinese becomes empty.
    with open(infile, 'r', encoding="utf-8") as infopen, \
         open(outfile, 'w', encoding="utf-8") as outfopen:
        for line in infopen:
            k = re.findall('[\u4e00-\u9fa5]', line)  # list of Chinese characters
            s = ''.join(k)
            outfopen.write(s)
            outfopen.write("\n")

del_no_china("处理前.txt", "处理中.txt")
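The code above produces the following. Each input line that contained no Chinese characters (the index and timestamp lines) is written out as an empty line; those empty lines are omitted from this listing: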
当时这个
有喝多的武将
一看许姬很漂亮
就欲行非礼就拽上去
这许姬手也挺快
黑咕隆能看不见呢
顺手夸
把这武将头盔顶上那鹰带给摘下来了
哎就是头盔上绑着带了
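To see what the regular expression does to a single line, here is a minimal sketch that applies the same character class to a few in-memory strings instead of a file. The helper name keep_chinese and the sample strings are assumptions made for illustration; they are not part of the original code.

```python
import re

# Same character class as in the article: one character from U+4E00 to U+9FA5.
CHINESE_CHAR = re.compile('[\u4e00-\u9fa5]')

def keep_chinese(line):
    """Return only the characters of `line` that the pattern matches."""
    return ''.join(CHINESE_CHAR.findall(line))

samples = [
    "有喝多的武将\n",                   # subtitle text: kept, newline dropped
    "4\n",                              # index line: nothing left
    "00:00:05,18 --> 00:00:07,05\n",    # timestamp line: nothing left
]
for sample in samples:
    print(repr(keep_chinese(sample)))
```

Digits, Latin letters, whitespace and punctuation (including full-width Chinese punctuation, which lies outside this range) are all dropped, which is why the index and timestamp lines end up empty.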
The following code removes the blank lines from the text above:
def delblankline(infile, outfile):
    # Copy only the lines that contain at least one non-whitespace character.
    with open(infile, 'r', encoding="utf-8") as infopen, \
         open(outfile, 'w', encoding="utf-8") as outfopen:
        for line in infopen:
            if line.split():  # empty list (falsy) for a blank line
                outfopen.write(line)

delblankline("处理中.txt", "处理后.txt")
The code above produces:
当时这个
有喝多的武将
一看许姬很漂亮
就欲行非礼就拽上去
这许姬手也挺快
黑咕隆能看不见呢
顺手夸
把这武将头盔顶上那鹰带给摘下来了
哎就是头盔上绑着带了
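The blank-line test relies on str.split() returning an empty list for a string that is empty or contains only whitespace. A small sketch on an in-memory list shows the effect; the function name drop_blank_lines and the sample list are hypothetical and only serve the illustration.

```python
def drop_blank_lines(lines):
    """Keep only lines that contain at least one non-whitespace character."""
    # str.split() with no arguments returns [] for empty or whitespace-only
    # strings, so its truthiness works as a "not blank" test.
    return [line for line in lines if line.split()]

intermediate = ["当时这个\n", "\n", "   \n", "有喝多的武将\n", "\n"]
print(drop_blank_lines(intermediate))
```

Lines made up only of spaces or tabs are treated as blank as well, not just lines that consist of a bare newline.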
The two steps combined:
import re

# The first function removes the non-Chinese characters from the text.
def del_no_china(infile, outfile):
    with open(infile, 'r', encoding="utf-8") as infopen, \
         open(outfile, 'w', encoding="utf-8") as outfopen:
        for line in infopen:
            k = re.findall('[\u4e00-\u9fa5]', line)
            s = ''.join(k)
            outfopen.write(s)
            outfopen.write('\n')  # end the output line

del_no_china("处理前.txt", "处理中.txt")

# The second function removes the blank lines from the text.
def delblankline(infile, outfile):
    with open(infile, 'r', encoding="utf-8") as infopen, \
         open(outfile, 'w', encoding="utf-8") as outfopen:
        for line in infopen:
            if line.split():
                outfopen.write(line)

delblankline("处理中.txt", "处理后.txt")
The final result is the same!
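As a possible variation, the two passes can be folded into one so that no intermediate file is needed: a line is written only if something remains after filtering. This is a sketch under that assumption, not the method used in the article; the function name extract_chinese_lines and the throwaway file names in the demo are made up for illustration.

```python
import os
import re
import tempfile

CHINESE_CHAR = re.compile('[\u4e00-\u9fa5]')

def extract_chinese_lines(infile, outfile):
    """Write the Chinese characters of each input line, skipping empty results."""
    with open(infile, 'r', encoding='utf-8') as src, \
         open(outfile, 'w', encoding='utf-8') as dst:
        for line in src:
            chinese = ''.join(CHINESE_CHAR.findall(line))
            if chinese:  # nothing left means the line would have been blank
                dst.write(chinese + '\n')

# Demo on a throwaway file so the sketch runs without the article's input file.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "before.txt")
    dst_path = os.path.join(tmp, "after.txt")
    with open(src_path, 'w', encoding='utf-8') as f:
        f.write("当时这个\n3\n00:00:04,02 --> 00:00:05,13\n有喝多的武将\n")
    extract_chinese_lines(src_path, dst_path)
    with open(dst_path, encoding='utf-8') as f:
        result = f.read()

print(result)
```

With the article's files, the equivalent call would be extract_chinese_lines("处理前.txt", "处理后.txt").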
Author: 龙飞