OpenNLP Chunker
6 Chunker
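Overview: Text chunking divides a text into syntactically correlated groups of words, such as noun groups and verb groups, but specifies neither the internal structure of these groups nor their role in the main sentence.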
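To make "groups of words" concrete: OpenNLP's chunker labels each token with a BIO chunk tag (B-XX starts a chunk of type XX, I-XX continues it, O means the token is outside every chunk). The sketch below is a hypothetical helper, `BioBrackets`, that turns such tags into flat bracketed chunks. The sentence and its tags are hand-labeled for illustration, not produced by a trained model. The brackets never nest: each chunk has a type and a token range, but no internal structure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: renders tokens plus BIO chunk tags as flat, non-nested bracketed chunks.
class BioBrackets {

    static String bracket(String[] tokens, String[] chunkTags) {
        if (tokens.length != chunkTags.length) {
            throw new IllegalArgumentException("exactly one chunk tag per token is required");
        }
        List<String> parts = new ArrayList<>();
        StringBuilder open = null;   // text of the chunk being built, or null outside a chunk
        String openType = null;      // type of the open chunk, e.g. "NP"
        for (int i = 0; i < tokens.length; i++) {
            String tag = chunkTags[i];
            // I-XX continues the open chunk only if the types match
            boolean continues = open != null && tag.equals("I-" + openType);
            if (open != null && !continues) {
                parts.add(open.append(']').toString());
                open = null;
                openType = null;
            }
            if (tag.equals("O")) {
                parts.add(tokens[i]);                 // token outside every chunk
            } else if (continues) {
                open.append(' ').append(tokens[i]);
            } else {
                // B-XX starts a chunk; a stray I-XX with no matching open chunk is treated the same way
                openType = tag.substring(2);
                open = new StringBuilder("[").append(openType).append(' ').append(tokens[i]);
            }
        }
        if (open != null) {
            parts.add(open.append(']').toString());
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Hand-labeled example for illustration, not the output of a trained model
        String[] tokens = {"The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."};
        String[] tags = {"B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP", "I-NP", "O"};
        System.out.println(bracket(tokens, tags));
        // [NP The quick brown fox] [VP jumps] [PP over] [NP the lazy dog] .
    }
}
```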
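API: The chunker also provides an API for training new chunker models. The test code below uses pretrained models instead: it POS-tags each line of a text file and then chunks it: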
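Training a new chunker model requires annotated text in a format based on CoNLL-2000: one token per line holding the word, its POS tag and its BIO chunk tag separated by spaces, with an empty line after each sentence. The sketch below is a hypothetical standalone reader for that layout (`ConllChunkReader` and `Sentence` are illustrative names); it is not OpenNLP's own `ChunkSampleStream` and does not train anything, it only shows how the training lines map to words, POS tags and chunk tags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone reader for CoNLL-2000-style chunker training text.
// It is NOT OpenNLP's ChunkSampleStream; it only illustrates the file layout.
class ConllChunkReader {

    // One sentence as three parallel lists: words, POS tags and BIO chunk tags.
    record Sentence(List<String> words, List<String> posTags, List<String> chunkTags) {}

    static List<Sentence> parse(String text) {
        List<Sentence> sentences = new ArrayList<>();
        List<String> words = new ArrayList<>();
        List<String> pos = new ArrayList<>();
        List<String> chunks = new ArrayList<>();
        for (String line : text.split("\\R", -1)) {
            String trimmed = line.strip();
            if (trimmed.isEmpty()) {
                // An empty line ends the current sentence
                if (!words.isEmpty()) {
                    sentences.add(new Sentence(words, pos, chunks));
                    words = new ArrayList<>();
                    pos = new ArrayList<>();
                    chunks = new ArrayList<>();
                }
                continue;
            }
            // Every other line is "word POS-tag chunk-tag"
            String[] cols = trimmed.split("\\s+");
            if (cols.length != 3) {
                throw new IllegalArgumentException("expected 3 columns, got: " + line);
            }
            words.add(cols[0]);
            pos.add(cols[1]);
            chunks.add(cols[2]);
        }
        // Accept a final sentence that is not followed by an empty line
        if (!words.isEmpty()) {
            sentences.add(new Sentence(words, pos, chunks));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // First sentence uses the tokens and tags from this article's result;
        // the second is hand-labeled for illustration.
        String sample = String.join("\n",
                "Hi. NNP B-NP", "How WRB B-ADVP", "are VBP O", "you? JJ B-NP",
                "This DT I-NP", "is VBZ B-VP", "Mike. NNP O",
                "",
                "The DT B-NP", "fox NN I-NP", "jumps VBZ B-VP");
        for (Sentence s : parse(sample)) {
            System.out.println(s.words() + " " + s.chunkTags());
        }
        // [Hi., How, are, you?, This, is, Mike.] [B-NP, B-ADVP, O, B-NP, I-NP, B-VP, O]
        // [The, fox, jumps] [B-NP, I-NP, B-VP]
    }
}
```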
Test code
package package01;

import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.cmdline.postag.POSModelLoader;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.Span;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class Test06 {

    public static void main(String[] args) throws IOException {
        Test06.chunk();
    }

    /**
     * 5. Sequence labeling: Chunker
     * The chunker partitions a sentence into a set of chunks, using the tokens
     * produced by the tokenizer and the tags produced by the POS tagger.
     *
     * Input:
     * Hi. How are you? This is Mike.
     */
    public static void chunk() throws IOException {
        // Load the pretrained POS tagger model (POSModelLoader prints a progress message)
        POSModel posModel = new POSModelLoader().load(new File("E:\\NLP_Practics\\models\\en-pos-maxent.bin"));
        POSTaggerME tagger = new POSTaggerME(posModel);

        // Load the pretrained chunker model once, before reading any input
        ChunkerModel chunkerModel;
        try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-chunker.bin")) {
            chunkerModel = new ChunkerModel(is);
        }
        ChunkerME chunker = new ChunkerME(chunkerModel);

        // Read the input file line by line as UTF-8; the stream is closed automatically
        InputStreamFactory isf = new MarkableFileInputStreamFactory(new File("E:\\myText.txt"));
        try (ObjectStream<String> lineStream = new PlainTextByLineStream(isf, StandardCharsets.UTF_8)) {
            String line;
            while ((line = lineStream.read()) != null) {
                // Split on whitespace, then assign one POS tag per token
                String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(line);
                String[] tags = tagger.tag(tokens);
                System.out.println(new POSSample(tokens, tags));

                // Chunk each line: one BIO chunk tag per token (B-NP, I-NP, O, ...)
                String[] chunkTags = chunker.chunk(tokens, tags);
                for (String s : chunkTags) {
                    System.out.println(s);
                }
                // The same chunks as half-open token ranges: [start..end) TYPE
                Span[] spans = chunker.chunkAsSpans(tokens, tags);
                for (Span s : spans) {
                    System.out.println(s);
                }
            }
        }
        System.out.println("--------------5-------------");
    }
}
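The test code tokenizes with `WhitespaceTokenizer`, which splits only on whitespace, so punctuation stays attached to neighboring words; this is why tokens such as `Hi.` and `you?` appear in the tagged output, and it may contribute to odd tags such as `you?_JJ`. The sketch below contrasts a whitespace split (an approximation of `WhitespaceTokenizer` using `String.split`) with a naive regex that separates punctuation. The class name and the regex are illustrative assumptions, not one of OpenNLP's tokenizers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compares whitespace-only tokenization with a hypothetical punctuation-splitting regex.
class TokenizeCompare {

    // Approximates WhitespaceTokenizer: split on runs of whitespace only.
    static String[] whitespaceTokens(String line) {
        String trimmed = line.strip();
        // "".split(...) would return [""], so handle blank input explicitly
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Naive illustrative alternative: runs of word characters, or single punctuation marks.
    private static final Pattern WORD_OR_PUNCT = Pattern.compile("\\w+|[^\\w\\s]");

    static String[] punctuationAwareTokens(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD_OR_PUNCT.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "Hi. How are you? This is Mike.";
        System.out.println(String.join(" | ", whitespaceTokens(line)));
        // Hi. | How | are | you? | This | is | Mike.
        System.out.println(String.join(" | ", punctuationAwareTokens(line)));
        // Hi | . | How | are | you | ? | This | is | Mike | .
    }
}
```

Feeding the tagger and chunker punctuation-separated tokens would change the token indices, so the chunk spans would differ from the result shown here.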
Result
Loading POS Tagger model ... done (0.554s)
Hi._NNP How_WRB are_VBP you?_JJ This_DT is_VBZ Mike._NNP
B-NP
B-ADVP
O
B-NP
I-NP
B-VP
O
[0..1) NP
[1..2) ADVP
[3..5) NP
[5..6) VP
--------------5-------------
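The two chunk views in the result carry the same information: `chunk()` returns one BIO tag per token, while `chunkAsSpans()` returns half-open token ranges `[start..end)` with a chunk type, so `[3..5) NP` covers tokens 3 and 4 (`you?` and `This`). As a minimal consistency check, the hypothetical `SpansToBio` sketch below rebuilds the BIO tags from the printed spans. `ChunkSpan` is an illustrative stand-in record, not OpenNLP's `Span` class.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical check that the span view and the BIO view of the result describe the same chunks.
class SpansToBio {

    // Illustrative stand-in for a chunk span: half-open token range [start, end) plus a type.
    record ChunkSpan(int start, int end, String type) {
        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;   // same layout as the printed spans
        }
    }

    static String[] toBio(int tokenCount, List<ChunkSpan> spans) {
        String[] tags = new String[tokenCount];
        Arrays.fill(tags, "O");                  // tokens not covered by any span
        for (ChunkSpan s : spans) {
            tags[s.start()] = "B-" + s.type();   // first token of the chunk
            for (int i = s.start() + 1; i < s.end(); i++) {
                tags[i] = "I-" + s.type();       // remaining tokens; end is exclusive
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // Spans copied from the result, for the 7 tokens of "Hi. How are you? This is Mike."
        List<ChunkSpan> spans = List.of(
                new ChunkSpan(0, 1, "NP"), new ChunkSpan(1, 2, "ADVP"),
                new ChunkSpan(3, 5, "NP"), new ChunkSpan(5, 6, "VP"));
        System.out.println(String.join(" ", toBio(7, spans)));
        // B-NP B-ADVP O B-NP I-NP B-VP O
    }
}
```

The rebuilt tags match the BIO column of the result token for token, which confirms that the span end index is exclusive.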
https://github.com/godmaybelieve