How to filter languages in text

Requirement: remove the Korean text from A.txt, keeping only the English and Chinese.

The code is as follows:

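#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copy only the Chinese and English characters from "abc" into "123".
# Everything else (Korean, digits, punctuation, whitespace) is dropped.
with open("abc", "r", encoding="utf-8") as f1, open("123", "w", encoding="utf-8") as f2:
    buf = f1.read()
    for i in buf:
        # Chinese Unicode range: [\u4e00-\u9fa5]
        if '\u4e00' <= i <= '\u9fa5':
            f2.write(i)
        # English Unicode ranges: A-Z [\u0041-\u005a] and a-z [\u0061-\u007a]
        elif 'A' <= i <= 'Z' or 'a' <= i <= 'z':
            f2.write(i)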

You can also use a regular expression to filter out the languages you do not want.
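
The post does not show the regular-expression version, so the following is only a minimal sketch of how it might look. It assumes "Chinese" means the range [\u4e00-\u9fa5] and "English" means the ASCII letters A-Z and a-z. The names KEEP_PATTERN and keep_chinese_and_english are hypothetical, and unlike the loop above this sketch also keeps whitespace so that words stay separated.

```python
import re

# Assumption: keep Chinese (U+4E00-U+9FA5), ASCII letters, and whitespace.
# The pattern matches every character that is NOT in that set.
KEEP_PATTERN = re.compile(r"[^\u4e00-\u9fa5A-Za-z\s]")


def keep_chinese_and_english(text):
    """Delete every character that is not Chinese, an ASCII letter, or whitespace."""
    return KEEP_PATTERN.sub("", text)


if __name__ == "__main__":
    # Hypothetical sample line mixing English, Korean, and Chinese.
    sample = "Hello 안녕하세요 你好"
    print(keep_chinese_and_english(sample))
```

To process a file, read it with encoding="utf-8" as in the script above, pass the contents through keep_chinese_and_english, and write the result with the same encoding.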

posted @ 2017-08-10 17:55 会开车的好厨师
