
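Python: removing non-Chinese characters and blank lines from a text file

This method strips every non-Chinese character from a text file and then deletes the lines that are left blank.

The work is first done in two separate steps.

The text file to process: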

当时这个

3
00:00:04,02 --> 00:00:05,13
有喝多的武将

4
00:00:05,18 --> 00:00:07,05
一看许姬很漂亮

5
00:00:07,12 --> 00:00:09,06
就欲行非礼就拽上去

6
00:00:09,09 --> 00:00:10,21
这许姬手也挺快

7
00:00:11,03 --> 00:00:12,14
黑咕隆能看不见呢

8
00:00:12,14 --> 00:00:13,08
顺手夸

9
00:00:13,13 --> 00:00:17,05
把这武将头盔顶上那鹰带给摘下来了

10
00:00:17,12 --> 00:00:19,03
哎就是头盔上绑着带了
处理前.txt (before processing)

The following code removes the non-Chinese characters from the file:

import re

def del_no_china(infile, outfile):
    """Copy infile to outfile, keeping only the Chinese characters on each line."""
    with open(infile, 'r', encoding="utf-8") as infopen, \
         open(outfile, 'w', encoding="utf-8") as outfopen:
        for line in infopen:
            # Collect every character in the range U+4E00 to U+9FA5.
            k = re.findall('[\u4e00-\u9fa5]', line)
            s = ''.join(k)
            # A line with no Chinese characters is written as an empty line.
            outfopen.write(s + "\n")

del_no_china("处理前.txt", "处理中.txt")

Running the code above produces the following:

当时这个



有喝多的武将



一看许姬很漂亮



就欲行非礼就拽上去



这许姬手也挺快



黑咕隆能看不见呢



顺手夸



把这武将头盔顶上那鹰带给摘下来了



哎就是头盔上绑着带了
处理中.txt (intermediate result)
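
To see where the empty lines in the intermediate file come from, the same regular expression can be tried on a few lines in memory. This is only an illustrative sketch: the helper name `keep_chinese` and the `sample` list are assumptions made for the example, not part of the original script. Note that the range U+4E00 to U+9FA5 covers the common ideographs only, so characters in the CJK extension blocks would also be dropped.

```python
import re

# Same character class as in del_no_china (U+4E00 to U+9FA5).
CHINESE_CHAR = re.compile('[\u4e00-\u9fa5]')

def keep_chinese(line):
    """Return only the characters of `line` matched by CHINESE_CHAR (hypothetical helper)."""
    return ''.join(CHINESE_CHAR.findall(line))

# A few lines shaped like the subtitle file: text, blank, index, timestamp, text.
sample = ["当时这个\n", "\n", "3\n", "00:00:04,02 --> 00:00:05,13\n", "有喝多的武将\n"]
filtered = [keep_chinese(line) for line in sample]
print(filtered)
```

The blank line, the subtitle index and the timestamp each reduce to an empty string, which is why three empty lines appear between every pair of subtitle lines.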

The following code removes the blank lines from the text above:

def delblankline(infile, outfile):
    """Copy infile to outfile, skipping lines that contain only whitespace."""
    with open(infile, 'r', encoding="utf-8") as infopen, \
         open(outfile, 'w', encoding="utf-8") as outfopen:
        for line in infopen:
            # line.split() returns an empty list for a whitespace-only line.
            if line.split():
                outfopen.write(line)

delblankline("处理中.txt", "处理后.txt")

Running the code above produces the following:

当时这个
有喝多的武将
一看许姬很漂亮
就欲行非礼就拽上去
这许姬手也挺快
黑咕隆能看不见呢
顺手夸
把这武将头盔顶上那鹰带给摘下来了
哎就是头盔上绑着带了
处理后.txt (after processing)
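
The blank-line test relies on `str.split()` with no arguments returning an empty list when a line holds nothing but whitespace. A minimal sketch of that check on an in-memory list, where the helper name `drop_blank_lines` is an assumption made for the example:

```python
def drop_blank_lines(lines):
    """Keep only lines that contain at least one non-whitespace character."""
    # "\n".split() and "   \n".split() both return [], which is falsy.
    return [line for line in lines if line.split()]

lines = ["当时这个\n", "\n", "   \n", "有喝多的武将\n"]
kept = drop_blank_lines(lines)
print(kept)
```

Lines that contain only spaces or tabs are dropped as well, not just completely empty ones.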

 

The two steps combined into one script:

import re

def del_no_china(infile, outfile):
    """Copy infile to outfile, keeping only the Chinese characters on each line."""
    with open(infile, 'r', encoding="utf-8") as infopen, \
         open(outfile, 'w', encoding="utf-8") as outfopen:
        for line in infopen:
            k = re.findall('[\u4e00-\u9fa5]', line)
            s = ''.join(k)
            # Write the kept characters followed by a line break.
            outfopen.write(s + '\n')

del_no_china("处理前.txt", "处理中.txt")
# The first function removes the non-Chinese characters from the text.

def delblankline(infile, outfile):
    """Copy infile to outfile, skipping lines that contain only whitespace."""
    with open(infile, 'r', encoding="utf-8") as infopen, \
         open(outfile, 'w', encoding="utf-8") as outfopen:
        for line in infopen:
            if line.split():
                outfopen.write(line)

delblankline("处理中.txt", "处理后.txt")
# The second function removes the blank lines from the text.

The final result is the same.
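
Since the second step only discards lines that the first step left empty, the two passes could also be folded into a single pass with no intermediate file. The sketch below is one possible variant, not the author's code: the function name `keep_chinese_lines` and the temporary file names are assumptions made so the example runs on its own.

```python
import os
import re
import tempfile

CHINESE_CHAR = re.compile('[\u4e00-\u9fa5]')

def keep_chinese_lines(infile, outfile):
    """Write the Chinese characters of each input line, skipping lines left empty."""
    with open(infile, 'r', encoding='utf-8') as src, \
         open(outfile, 'w', encoding='utf-8') as dst:
        for line in src:
            s = ''.join(CHINESE_CHAR.findall(line))
            if s:  # only write lines that still have content
                dst.write(s + '\n')

# Self-contained demo using a temporary directory instead of 处理前.txt.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'in.txt')
    dst_path = os.path.join(tmp, 'out.txt')
    with open(src_path, 'w', encoding='utf-8') as f:
        f.write("当时这个\n\n3\n00:00:04,02 --> 00:00:05,13\n有喝多的武将\n")
    keep_chinese_lines(src_path, dst_path)
    with open(dst_path, encoding='utf-8') as f:
        result = f.read()

print(result)
```

With the real files, the call would presumably be `keep_chinese_lines("处理前.txt", "处理后.txt")`.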

 

posted @ 2021-10-18 12:04  longfei825