public class TestNorms { public void createIndex() throws IOException { Directory d = new SimpleFSDirectory(new File("d:/falconTest/lucene3/norms")); IndexWriter writer = new IndexWriter(d, new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED); Field field = new Field("desc", "", Field.Store.YES, Field.Index.ANALYZED); Document doc = new Document(); field.setValue("Hello students was drive"); doc.add(field); writer.addDocument(doc); writer.optimize(); writer.close(); } public void search() throws IOException { Directory d = new SimpleFSDirectory(new File("d:/falconTest/lucene3/norms")); IndexReader reader = IndexReader.open(d); IndexSearcher searcher = new IndexSearcher(reader); TopDocs docs = searcher.search(new TermQuery(new Term("desc","drove")), 10); System.out.println(docs.totalHits); } public static void main(String[] args) throws IOException { TestNorms test= new TestNorms(); test.createIndex(); test.search(); } }
public class PorterStemAnalyzer extends Analyzer { @Override public TokenStream tokenStream(String fieldName, Reader reader) { return new PorterStemFilter(new LowerCaseTokenizer(reader)); } }
把此分词器用在你的程序中,就能够识别单复数和规则的词型变化了。
public void createIndex() throws IOException { Directory d = new SimpleFSDirectory(new File("d:/falconTest/lucene3/norms")); IndexWriter writer = new IndexWriter(d, new PorterStemAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
Field field = new Field("desc", "", Field.Store.YES, Field.Index.ANALYZED); Document doc = new Document(); field.setValue("Hello students was driving cars professionally"); doc.add(field);
drove processing drove 3 PARADIGM 0: normal form 'DROVE' part of speech:0 PARADIGM 1: normal form 'DROVE' part of speech:2 PARADIGM 2: normal form 'DRIVE' part of speech:2
was processing was 3 PARADIGM 0: normal form 'BE' part of speech:3 PARADIGM 1: normal form 'BE' part of speech:3 PARADIGM 2: normal form 'BE' part of speech:3
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 如何编写易于单元测试的代码
· 10年+ .NET Coder 心语,封装的思维:从隐藏、稳定开始理解其本质意义
· .NET Core 中如何实现缓存的预热?
· 从 HTTP 原因短语缺失研究 HTTP/2 和 HTTP/3 的设计差异
· AI与.NET技术实操系列:向量存储与相似性搜索在 .NET 中的实现
· 地球OL攻略 —— 某应届生求职总结
· 周边上新:园子的第一款马克杯温暖上架
· Open-Sora 2.0 重磅开源!
· .NET周刊【3月第1期 2025-03-02】
· [AI/GPT/综述] AI Agent的设计模式综述