A First Look at Apache OpenNLP
https://blog.csdn.net/Richard_vi/article/details/78909939?utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7Edefault-5.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7Edefault-5.control
Environment: IntelliJ IDEA + JDK 8 + Maven 3.5.2
Create a new Maven project and add the OpenNLP Maven dependency:
    <dependency>
        <groupId>org.apache.opennlp</groupId>
        <artifactId>opennlp-tools</artifactId>
        <version>1.8.4</version>
    </dependency>
With that in place, the OpenNLP toolkit is ready to use. Let's look at a few examples:
    // Split a paragraph into sentences
    public static void SentenceDetect() throws IOException {
        String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
        // try-with-resources closes the model stream even if loading the model fails
        try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-sent.bin")) {
            SentenceModel model = new SentenceModel(is);
            SentenceDetectorME sdetector = new SentenceDetectorME(model);
            String[] sentences = sdetector.sentDetect(paragraph);
            for (String single : sentences) {
                System.out.println(single);
            }
        }
    }
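This is an example of English sentence detection. We first need to download the English sentence detection model (en-sent.bin); here I put it in the E:\NLP_Practics\models\ directory.
More models can be downloaded from:
http://maven.tamingtext.com/opennlp-models/models-1.5/
Let's look at the corresponding output: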
    Hi. How are you?
    This is JD_Dog.
    He is my good friends.He is very kind.but he is no more handsome than me.
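Magical, isn't it? Haha, not really. This is just a demonstration with an existing simple model. A model is distilled from a large amount of training data, so the quality of the analysis depends on the model you use.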
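The examples in this post hardcode the model location as E:\NLP_Practics\models\. As a minimal sketch of one alternative, the helper below resolves a model file relative to a configurable directory and fails early with a clear message when the file is missing. The class name ModelPaths, the nlp.models.dir system property, and the models fallback directory are all hypothetical names chosen for illustration; none of them come from OpenNLP. It uses only the JDK.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ModelPaths {
    // Hypothetical system property name; it is not defined by OpenNLP.
    static final String MODELS_DIR_PROPERTY = "nlp.models.dir";

    // Directory taken from the system property, falling back to ./models (an assumption).
    static Path modelsDir() {
        return Paths.get(System.getProperty(MODELS_DIR_PROPERTY, "models"));
    }

    // Resolves fileName inside modelsDir and fails early if it is not a regular file.
    static Path resolve(Path modelsDir, String fileName) throws IOException {
        Path model = modelsDir.resolve(fileName);
        if (!Files.isRegularFile(model)) {
            throw new FileNotFoundException("Model file not found: " + model.toAbsolutePath());
        }
        return model;
    }

    // Opens the model file as a stream; the caller is responsible for closing it.
    static InputStream open(String fileName) throws IOException {
        return Files.newInputStream(resolve(modelsDir(), fileName));
    }
}
```

Assuming the models were downloaded into that directory, the stream returned by ModelPaths.open("en-sent.bin") could be passed to new SentenceModel(...) in place of the FileInputStream, for instance after starting the JVM with -Dnlp.models.dir=E:\NLP_Practics\models.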
Next, let's look at an English tokenization example:
    // Split text into tokens (words and punctuation)
    public static void Tokenize() throws IOException {
        // try-with-resources closes the model stream even if loading the model fails
        try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-token.bin")) {
            TokenizerModel model = new TokenizerModel(is);
            Tokenizer tokenizer = new TokenizerME(model);
            String[] tokens = tokenizer.tokenize("Hi. How are you? This is Richard. Richard is still single. please help him find his girl");
            for (String a : tokens) {
                System.out.println(a);
            }
        }
    }
Output:
    Hi
    .
    How
    are
    you
    ?
    This
    is
    Richard
    .
    Richard
    is
    still
    single
    .
    please
    help
    him
    find
    his
    girl
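For comparison with the output above, here is a rough rule-based tokenizer written with nothing but java.util.regex. This is not how TokenizerME works (that one is a trained maximum-entropy model); the class name RuleBasedTokenizer and the regular expression are my own assumptions, meant only to show what "separating words from punctuation" looks like in its simplest form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleBasedTokenizer {
    // A token is either a run of word characters or one non-space, non-word character.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Returns the tokens of text in order of appearance; whitespace is discarded.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hi. How are you? This is Richard. Richard is still single. "
                + "please help him find his girl";
        for (String token : tokenize(text)) {
            System.out.println(token);
        }
    }
}
```

On this particular sentence the regex yields the same 21 tokens as listed above, but it breaks down quickly on less tidy input: "don't", for example, comes out as three tokens (don, ', t). Handling cases like that is where a trained tokenizer model earns its keep.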
Complete test code:
    package package01;

    import opennlp.tools.sentdetect.SentenceDetectorME;
    import opennlp.tools.sentdetect.SentenceModel;
    import opennlp.tools.tokenize.Tokenizer;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class Test01 {

        // Split a paragraph into sentences
        public static void SentenceDetect() throws IOException {
            String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
            // try-with-resources closes the model stream even if loading the model fails
            try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-sent.bin")) {
                SentenceModel model = new SentenceModel(is);
                SentenceDetectorME sdetector = new SentenceDetectorME(model);
                String[] sentences = sdetector.sentDetect(paragraph);
                for (String single : sentences) {
                    System.out.println(single);
                }
            }
        }

        // Split text into tokens (words and punctuation)
        public static void Tokenize() throws IOException {
            try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-token.bin")) {
                TokenizerModel model = new TokenizerModel(is);
                Tokenizer tokenizer = new TokenizerME(model);
                String[] tokens = tokenizer.tokenize("Hi. How are you? This is Richard. Richard is still single. please help him find his girl");
                for (String a : tokens) {
                    System.out.println(a);
                }
            }
        }

        public static void main(String[] args) throws IOException {
            // Test01.SentenceDetect();
            Test01.Tokenize();
        }
    }