Apache OpenNLP: basic usage (Java)

  • Maven dependency

    <dependency>
       <groupId>org.apache.opennlp</groupId>
       <artifactId>opennlp-tools</artifactId>
       <version>1.9.4</version>
    </dependency>
    
  • Model files (http://opennlp.sourceforge.net/models-1.5/)

    • en-sent.bin
    • en-token.bin
    • en-pos-perceptron.bin
  • Create a new project and put the model files in a resources directory. This post uses src/test/resources (create the directory if it does not exist).
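Since every example in this post opens a model file from the same directory, the loading step can be factored out. The following is a minimal sketch, not part of OpenNLP: `ModelFiles`, `StreamFactory`, and `DEFAULT_DIR` are hypothetical names introduced here for illustration. It fails early with a readable message when a model file is missing and always closes the stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of OpenNLP): opens a model file and hands the stream to a factory. */
final class ModelFiles {

    /** Like java.util.function.Function, but allowed to throw IOException, as model constructors do. */
    @FunctionalInterface
    interface StreamFactory<T> {
        T create(InputStream in) throws IOException;
    }

    /** Directory used in this post; adjust it if the models live elsewhere. */
    static final Path DEFAULT_DIR = Paths.get("src", "test", "resources");

    private ModelFiles() {
    }

    static <T> T load(Path dir, String fileName, StreamFactory<T> factory) throws IOException {
        Path file = dir.resolve(fileName);
        if (!Files.isRegularFile(file)) {
            // Fail early with a hint instead of a bare stack trace from the stream constructor
            throw new NoSuchFileException(file.toString(), null,
                    "model file not found; download it and place it in " + dir);
        }
        // try-with-resources closes the stream once the factory has consumed it
        try (InputStream in = Files.newInputStream(file)) {
            return factory.create(in);
        }
    }
}
```

With the OpenNLP dependency on the classpath, a call would presumably look like `SentenceModel model = ModelFiles.load(ModelFiles.DEFAULT_DIR, "en-sent.bin", in -> new SentenceModel(in));`.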

  • Sentence detection

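    // Imports for the test class (org.junit.Test assumes JUnit 4; the original post does not say which version)
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import opennlp.tools.sentdetect.SentenceDetectorME;
    import opennlp.tools.sentdetect.SentenceModel;
    import org.junit.Test;

    private static String text = "Java is a programming language and computing platform first released by Sun Microsystems in 1995. It has evolved from humble beginnings to power a large share of today’s digital world, by providing the reliable platform upon which many services and applications are built. New, innovative products and digital services designed for the future continue to rely on Java, as well.";
    @Test
    public void sentenceDetectionTest() throws IOException {
    	Path path = Paths.get("src", "test", "resources", "en-sent.bin");
    	// try-with-resources closes the model stream once the model is loaded
    	try (InputStream is = Files.newInputStream(path)) {
    		SentenceModel model = new SentenceModel(is);
    		SentenceDetectorME sdetector = new SentenceDetectorME(model);
    		for (String sentence : sdetector.sentDetect(text)) {
    			System.out.println(sentence);
    		}
    	}
    }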
    
    
    // -- Output
    // Java is a programming language and computing platform first released by Sun Microsystems in 1995.
    // It has evolved from humble beginnings to power a large share of today’s digital world, by providing the reliable platform upon which many services and applications are built.
    // New, innovative products and digital services designed for the future continue to rely on Java, as well.
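As a quick cross-check that needs no model file, the same kind of text can be split with the JDK's `java.text.BreakIterator`. This is a different, rule-based technique and not what OpenNLP does internally; the sketch below (the class name `RuleBasedSentences` is hypothetical) is only meant as a baseline to compare the model's output against.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical baseline (JDK only, no OpenNLP): rule-based sentence splitting. */
final class RuleBasedSentences {

    private RuleBasedSentences() {
    }

    static List<String> split(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) range is one sentence candidate
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }
}
```

A rule-based splitter tends to break after abbreviations such as "Mr.", which is one reason to prefer a trained sentence model for real text.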
    
  • Part-of-speech (POS) tagging

    // Imports used here: java.io.IOException, java.io.InputStream, java.nio.file.Files, java.nio.file.Path,
    // java.nio.file.Paths, opennlp.tools.sentdetect.SentenceDetectorME, opennlp.tools.sentdetect.SentenceModel,
    // opennlp.tools.tokenize.TokenizerME, opennlp.tools.tokenize.TokenizerModel,
    // opennlp.tools.postag.POSModel, opennlp.tools.postag.POSTaggerME
    @Test
    public void posTagTest() throws IOException {
    	String text = "Java is a programming language and computing platform first released by Sun Microsystems in 1995. ";
    	Path resources = Paths.get("src", "test", "resources");
    	// Load each model once, outside the sentence loop, and close all streams afterwards
    	try (InputStream sentStream = Files.newInputStream(resources.resolve("en-sent.bin"));
    			InputStream tokenStream = Files.newInputStream(resources.resolve("en-token.bin"));
    			InputStream posStream = Files.newInputStream(resources.resolve("en-pos-perceptron.bin"))) {
    		SentenceDetectorME sdetector = new SentenceDetectorME(new SentenceModel(sentStream));
    		TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokenStream));
    		POSTaggerME posTagger = new POSTaggerME(new POSModel(posStream));
    		for (String sentence : sdetector.sentDetect(text)) {
    			// Pipeline per sentence: tokenize, then tag the tokens
    			String[] tokens = tokenizer.tokenize(sentence);
    			String[] tagArray = posTagger.tag(tokens);
    			for (int i = 0; i < tokens.length; i++) {
    				System.out.printf("%s -- %s%n", tokens[i], tagArray[i]);
    			}
    		}
    	}
    }
    
    
    // -- Output
    // Java -- NNP
    // is -- VBZ
    // a -- DT
    // programming -- NN
    // language -- NN
    // and -- CC
    // computing -- NN
    // platform -- NN
    // first -- JJ
    // released -- VBN
    // by -- IN
    // Sun -- NNP
    // Microsystems -- NNPS
    // in -- IN
    // 1995 -- CD
    // . -- .
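The tags in the output above look like Penn Treebank part-of-speech tags. Assuming that is the tag set this model uses, a small lookup table makes the output easier to read. The sketch below covers only the tags that appear in this example; `PennTags` is a hypothetical helper, not an OpenNLP class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical glossary for the tags seen above, assuming the Penn Treebank tag set. */
final class PennTags {

    private static final Map<String, String> DESCRIPTIONS = new LinkedHashMap<>();

    static {
        DESCRIPTIONS.put("CC", "coordinating conjunction");
        DESCRIPTIONS.put("CD", "cardinal number");
        DESCRIPTIONS.put("DT", "determiner");
        DESCRIPTIONS.put("IN", "preposition or subordinating conjunction");
        DESCRIPTIONS.put("JJ", "adjective");
        DESCRIPTIONS.put("NN", "noun, singular or mass");
        DESCRIPTIONS.put("NNP", "proper noun, singular");
        DESCRIPTIONS.put("NNPS", "proper noun, plural");
        DESCRIPTIONS.put("VBN", "verb, past participle");
        DESCRIPTIONS.put("VBZ", "verb, 3rd person singular present");
        DESCRIPTIONS.put(".", "sentence-final punctuation");
    }

    private PennTags() {
    }

    /** Returns a human-readable description, or a placeholder for tags outside this small table. */
    static String describe(String tag) {
        return DESCRIPTIONS.getOrDefault(tag, "(not in this table)");
    }

    /** Formats one token/tag pair in the same "token -- tag" style as the output above. */
    static String format(String token, String tag) {
        return token + " -- " + tag + " (" + describe(tag) + ")";
    }
}
```

Inside the tagging loop, `System.out.println(PennTags.format(tokens[i], tagArray[i]));` could replace the `printf` call to print the description next to each tag.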
    
posted by cc-31415926