A First Look at Apache OpenNLP

https://blog.csdn.net/Richard_vi/article/details/78909939?utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7Edefault-5.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7Edefault-5.control

 

 

Environment: IntelliJ IDEA + JDK 8 + Maven 3.5.2
Create a new Maven project and add the OpenNLP Maven dependency:

<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-tools</artifactId>
    <version>1.8.4</version>
</dependency>

After that, the OpenNLP toolkit is ready to use. Let's look at some examples:

// Sentence detection: split a paragraph into sentences with the en-sent.bin model
public static void SentenceDetect() throws IOException {
    String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
    // try-with-resources closes the model stream even if loading throws
    try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-sent.bin")) {
        SentenceModel model = new SentenceModel(is);
        SentenceDetectorME sdetector = new SentenceDetectorME(model);
        String[] sentences = sdetector.sentDetect(paragraph);
        for (String single : sentences) {
            System.out.println(single);
        }
    }
}

  

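This is an English sentence-detection example (splitting a paragraph into sentences). First, download the pre-trained English sentence model, en-sent.bin; here I put it in the E:\NLP_Practics\models\ directory.
More models can be downloaded from:
http://maven.tamingtext.com/opennlp-models/models-1.5/
Here is the corresponding output: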

Hi. How are you?
This is JD_Dog.
He is my good friends.He is very kind.but he is no more handsome than me.

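Pretty magical, isn't it? Ha, not really. This is only a demonstration with an existing, simple pre-trained model. A model is learned from a large amount of training data, so the quality of the results depends on the model you use. In this run, for example, "Hi." and "How are you?" came back as a single sentence, and "friends.He is very kind.but ..." was not split at all, presumably because no space follows those periods.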
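To make the role of the model concrete, a hand-written baseline is a useful comparison. This is a minimal sketch of my own, not OpenNLP code: the class RuleBasedSentenceSplitter is hypothetical, and it assumes one naive rule, namely that a sentence ends at ".", "?" or "!" when the next character is whitespace or the end of the text. It needs only the JDK.

```java
import java.util.ArrayList;
import java.util.List;

class RuleBasedSentenceSplitter {

    // Naive rule (an assumption for illustration, not OpenNLP's algorithm):
    // a sentence ends at '.', '?' or '!' followed by whitespace or end of text.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isTerminator = c == '.' || c == '?' || c == '!';
            boolean atBoundary = i + 1 == text.length()
                    || Character.isWhitespace(text.charAt(i + 1));
            if (isTerminator && atBoundary) {
                String sentence = text.substring(start, i + 1).trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
                start = i + 1;
            }
        }
        // Keep any trailing text that has no terminator.
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
        for (String s : split(paragraph)) {
            System.out.println(s);
        }
    }
}
```

On the article's paragraph this rule prints four sentences: unlike the model, it separates "Hi." from "How are you?", but like the model it leaves "He is my good friends.He is very kind.but he is no more handsome than me." in one piece. Rules like this break quickly on abbreviations such as "Mr.", which is the kind of case a trained model is meant to handle.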
Now let's look at an English tokenization example:

// Tokenization: split text into words and punctuation with the en-token.bin model
public static void Tokenize() throws IOException {
    // try-with-resources closes the model stream even if loading throws
    try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-token.bin")) {
        TokenizerModel model = new TokenizerModel(is);
        Tokenizer tokenizer = new TokenizerME(model);
        String[] tokens = tokenizer.tokenize("Hi. How are you? This is Richard. Richard is still single. please help him find his girl");
        for (String token : tokens) {
            System.out.println(token);
        }
    }
}

Output:

Hi
.
How
are
you
?
This
is
Richard
.
Richard
is
still
single
.
please
help
him
find
his
girl
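The tokens above come from the trained TokenizerME model. For plain text like this, a rule-based tokenizer gets close; OpenNLP itself also ships one, opennlp.tools.tokenize.SimpleTokenizer, which needs no model file. The sketch below is not that class: RegexTokenizer is a hypothetical JDK-only example that assumes a token is either a run of word characters or a single punctuation character. It also returns [start, end) character offsets, similar in spirit to the Span[] that Tokenizer.tokenizePos returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenizer {

    // A token is a run of letters/digits/underscores, or one character that is
    // neither a word character nor whitespace (i.e. punctuation).
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Returns the [start, end) character offsets of each token.
    static List<int[]> tokenizePos(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    // Turns the offsets back into token strings.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (int[] span : tokenizePos(text)) {
            tokens.add(text.substring(span[0], span[1]));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hi. How are you? This is Richard. Richard is still single. please help him find his girl";
        for (int[] span : tokenizePos(text)) {
            System.out.println(text.substring(span[0], span[1]) + "\t[" + span[0] + ", " + span[1] + ")");
        }
    }
}
```

On this sentence it yields the same 21 tokens as the model. It will likely diverge on contractions ("don't"), abbreviations ("Mr.") or decimals ("3.14"), which it splits mechanically.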

  

 

Complete test code:

package package01;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Test01 {

    // Sentence detection: split a paragraph into sentences with the en-sent.bin model
    public static void SentenceDetect() throws IOException {
        String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
        // try-with-resources closes the model stream even if loading throws
        try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-sent.bin")) {
            SentenceModel model = new SentenceModel(is);
            SentenceDetectorME sdetector = new SentenceDetectorME(model);
            String[] sentences = sdetector.sentDetect(paragraph);
            for (String single : sentences) {
                System.out.println(single);
            }
        }
    }

    // Tokenization: split text into words and punctuation with the en-token.bin model
    public static void Tokenize() throws IOException {
        try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-token.bin")) {
            TokenizerModel model = new TokenizerModel(is);
            Tokenizer tokenizer = new TokenizerME(model);
            String[] tokens = tokenizer.tokenize("Hi. How are you? This is Richard. Richard is still single. please help him find his girl");
            for (String token : tokens) {
                System.out.println(token);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Test01.SentenceDetect();  // uncomment to run sentence detection
        Test01.Tokenize();
    }
}
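The listings above hard-code the Windows model directory E:\NLP_Practics\models\, and a model that has not been downloaded yet only surfaces as a FileNotFoundException. One possible refinement is a small helper that resolves the model file from a configurable directory, fails with a clearer message, and always closes the stream. This is a sketch under assumptions: ModelFiles and ModelReader are hypothetical names, not part of OpenNLP, and the helper relies only on the model classes having an InputStream constructor, as new SentenceModel(is) and new TokenizerModel(is) do above. The demo passes a stand-in reader so that it runs with the JDK alone.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class ModelFiles {

    // Hypothetical functional interface; constructors such as
    // SentenceModel(InputStream) match it, so SentenceModel::new fits.
    @FunctionalInterface
    interface ModelReader<T> {
        T read(InputStream in) throws IOException;
    }

    // Resolves modelDir/fileName, fails with a clear message if the file is
    // missing, and closes the stream via try-with-resources.
    static <T> T load(Path modelDir, String fileName, ModelReader<T> reader) throws IOException {
        Path file = modelDir.resolve(fileName);
        if (!Files.isRegularFile(file)) {
            throw new IOException("Model file not found: " + file.toAbsolutePath()
                    + " (download it into " + modelDir.toAbsolutePath() + ")");
        }
        try (InputStream in = Files.newInputStream(file)) {
            return reader.read(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in "model": a temp file, read by a reader that counts bytes.
        Path dir = Files.createTempDirectory("models");
        Files.write(dir.resolve("dummy.bin"), new byte[] {1, 2, 3});
        Integer size = load(dir, "dummy.bin", in -> {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        });
        System.out.println("bytes read: " + size);
    }
}
```

With opennlp-tools on the classpath, a call in Test01 could then read, for example, SentenceModel model = ModelFiles.load(Paths.get("E:\\NLP_Practics\\models"), "en-sent.bin", SentenceModel::new); keeping the directory in one place makes it easy to move the models later.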

  

posted @ 尐鱼儿  Views (444)  Comments (0)