ChineseErrorCorrector: A One-Click Grammatical Error Augmentation Tool

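Welcome to my recently open-sourced one-click grammatical error augmentation tool. It can generate 14 types of grammatical errors, so teams in different industries can inject errors into their own data and use the result to train their own grammar and spelling correction models. I hope it helps advance domain-specific text correction, and stars are welcome. The 14 error types are covered in the numbered sections below.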

The usage for each error type is as follows:

Installation

`pip install ChineseErrorCorrector`

Data augmentation by error type

1. Missing characters

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
print(cged_tool.lack_word("小明住在北京"))
 
# Output: 小明在北京
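
To make the idea of a missing-character error concrete, here is a minimal sketch of one way such an error could be produced: delete a single character at a random position. This is an illustration only and not how `lack_word` is implemented (the post does not show the library's internals); `drop_random_char` is a hypothetical helper name.

```python
import random
from typing import Optional


def drop_random_char(text: str, rng: random.Random) -> Optional[str]:
    """Delete one randomly chosen character; return None if text is too short."""
    if len(text) < 2:
        return None
    index = rng.randrange(len(text))
    return text[:index] + text[index + 1:]


rng = random.Random(0)  # fixed seed so repeated runs give the same result
noisy = drop_random_char("小明住在北京", rng)
print(noisy)
```

Returning None for inputs that cannot be augmented mirrors the behaviour the library shows for some error types later in this post.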

2. Wrong characters (typos)

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
print(cged_tool.wrong_word("小明住在北京"))
# Output: 小明住在北鲸

3. Missing punctuation

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
print(cged_tool.lack_char("小明住在北京，热爱NLP。"))
# Output: 小明住在北京热爱NLP。

4. Misused punctuation

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
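# The input needs punctuation to replace; here the comma is swapped for a full stop.
print(cged_tool.wrong_char("小明住在北京，热爱NLP。"))
# Output: 小明住在北京。热爱NLP。

5. Unclear subject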

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
print(cged_tool.unknow_sub("小明住在北京"))
# Output: 住在北京

6. Missing predicate

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
print(cged_tool.unknow_pred("小明住在北京"))
# Output: 小明在北京

7. Missing object

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
print(cged_tool.lack_obj("小明住在北京，热爱NLP。"))
# Output: 小明住在北京，热爱。

8. Other missing components

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
print(cged_tool.lack_others("小明住在北京，热爱NLP。"))
# Output: 小明住北京，热爱NLP。

9. Redundant function words

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
print(cged_tool.red_fun("小明住在北京，热爱NLP。"))
# Output: 小明所住的在北京，热爱NLP。

10. Other redundant components

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
print(cged_tool.red_component("小明住在北京，热爱NLP。"))
# Output: 小明住在北京，热爱NLP。，看着

11. Redundant subject

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
print(cged_tool.red_sub("小明住在北京，热爱NLP。"))
# Output: 小明住在北京，小明热爱NLP。

12. Improper word order

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
print(cged_tool.wrong_sentence_order("小明住在北京，热爱NLP。"))
# Output: 热爱NLP。，小明住在北京

13. Improper verb-object collocation

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
print(cged_tool.wrong_ver_obj("小明住在北京，热爱NLP。"))
# Output: None, meaning this error type cannot be generated for this sentence

14. Other improper collocations

from ChineseErrorCorrector.dat import GrammarErrorDat
 
cged_tool = GrammarErrorDat()
print(cged_tool.other_wrong("小明住在北京，热爱NLP。"))
# Output: None, meaning this error type cannot be generated for this sentence
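
Because some methods return None when a sentence does not support a given error type, a pipeline that builds training data probably needs to skip those cases. The sketch below shows one way to collect (erroneous, correct) pairs. It is not part of the library: `build_pairs` is a hypothetical helper, and `drop_first_char` and `always_fails` are stand-in augmenters so the snippet runs without the package installed. In practice you would pass bound methods such as `cged_tool.lack_word` instead.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# An augmenter takes a correct sentence and returns a noisy one, or None.
Augmenter = Callable[[str], Optional[str]]


def build_pairs(
    sentences: Iterable[str], augmenters: Iterable[Augmenter]
) -> List[Tuple[str, str]]:
    """Return (erroneous, correct) pairs, skipping failed augmentations."""
    augmenter_list = list(augmenters)
    pairs = []
    for sentence in sentences:
        for augment in augmenter_list:
            noisy = augment(sentence)
            # Skip None (augmentation not possible) and unchanged outputs.
            if noisy is None or noisy == sentence:
                continue
            pairs.append((noisy, sentence))
    return pairs


# Stand-in augmenters (hypothetical, for demonstration only).
def drop_first_char(text: str) -> Optional[str]:
    return text[1:] if len(text) > 1 else None


def always_fails(text: str) -> Optional[str]:
    return None


pairs = build_pairs(["小明住在北京"], [drop_first_char, always_fails])
print(pairs)  # [('明住在北京', '小明住在北京')]
```

Keeping the correct sentence alongside each noisy one gives the source/target pairs that a correction model is typically trained on.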

Code: https://github.com/TW-NLP/ChineseErrorCorrector

Posted by TW-NLP on 2024-07-29 09:32
