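Custom Chinese Full-Text Index
I. Chinese word segmentation plugin
Chinese full-text indexing in Neo4j uses IKAnalyzer as the word segmentation component. IKAnalyzer needs a few adjustments to work with newer Lucene versions.
For a reference implementation targeting newer versions, see ELASTICSEARCH-IKAnalyzer.
1. Adjusting the segmentation component
The adjusted segmentation component is in the package casia.isiteam.zdr.wltea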
// Adjusted implementation
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;

public final class IKAnalyzer extends Analyzer {

    // Fine-grained segmentation by default (true = smart segmentation, false = fine-grained segmentation)
    private Configuration configuration = new Configuration(false);

    /**
     * IK analyzer: Lucene Analyzer implementation.
     * <p>
     * Uses the fine-grained segmentation algorithm by default.
     */
    public IKAnalyzer() {
    }

    /**
     * IK analyzer: Lucene Analyzer implementation.
     *
     * @param configuration IK configuration
     */
    public IKAnalyzer(Configuration configuration) {
        super();
        this.configuration = configuration;
    }

    /**
     * Overrides the Analyzer method that builds the tokenization components.
     */
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer _IKTokenizer = new IKTokenizer(configuration);
        return new TokenStreamComponents(_IKTokenizer);
    }
}
2. Segmentation test
Custom segmentation function
RETURN zdr.index.iKAnalyzer('复联终章快上映了好激动,据说知识图谱与人工智能技术应用到了那部电影!吖啶基氨基甲烷磺酰甲氧基苯胺是一种药嘛?',true) AS words
/**
 * Segments Chinese and English text.
 *
 * @param text     the text to segment
 * @param useSmart true for smart segmentation, false for fine-grained segmentation
 * @return the list of tokens
 */
@UserFunction(name = "zdr.index.iKAnalyzer")
@Description("Fulltext index iKAnalyzer - RETURN zdr.index.iKAnalyzer({text},true) AS words")
public List<String> iKAnalyzer(@Name("text") String text, @Name("useSmart") boolean useSmart) {
    PropertyConfigurator.configureAndWatch("dic" + File.separator + "log4j.properties");
    Configuration cfg = new Configuration(useSmart);
    StringReader input = new StringReader(text.trim());
    IKSegmenter ikSegmenter = new IKSegmenter(input, cfg);
    List<String> results = new ArrayList<>();
    try {
        // Pull lexemes from the segmenter until the input is exhausted
        for (Lexeme lexeme = ikSegmenter.next(); lexeme != null; lexeme = ikSegmenter.next()) {
            results.add(lexeme.getLexemeText());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return results;
}
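To make the useSmart flag more concrete, here is a toy dictionary segmenter in plain Java. It is not IK's algorithm and does not use the IK classes: the class ToySegmenter, its methods, and the caller-supplied dictionary are all hypothetical. The fine-grained mode emits every dictionary word found at any position, so tokens may overlap. The "smart" mode is stood in for by forward maximum matching, which keeps only the longest word at each position. This only roughly mirrors the difference in granularity that the flag controls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy illustration only: this is NOT the IK algorithm, and the dictionary is supplied by the caller.
final class ToySegmenter {

    // Fine-grained: emit every dictionary word that starts at any position (tokens may overlap).
    static List<String> fineGrained(String text, Set<String> dict, int maxWordLen) {
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = 1; len <= maxWordLen && start + len <= text.length(); len++) {
                String candidate = text.substring(start, start + len);
                if (dict.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    // "Smart" stand-in (forward maximum matching): keep only the longest word at each position.
    static List<String> smart(String text, Set<String> dict, int maxWordLen) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int matched = 0;
            for (int len = Math.min(maxWordLen, text.length() - start); len >= 1; len--) {
                if (dict.contains(text.substring(start, start + len))) {
                    matched = len;
                    break;
                }
            }
            if (matched == 0) {
                start++; // character is not covered by the dictionary, skip it
            } else {
                tokens.add(text.substring(start, start + matched));
                start += matched;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("know", "knowledge", "graph");
        System.out.println(fineGrained("knowledgegraph", dict, 9)); // overlapping tokens
        System.out.println(smart("knowledgegraph", dict, 9));       // longest matches only
    }
}
```

The same logic applies to Chinese text: with a dictionary of 知识, 图谱, and 知识图谱 and a maximum word length of 4, the text 知识图谱 yields [知识, 知识图谱, 图谱] in fine-grained mode and [知识图谱] in the longest-match mode.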
II. Preparing sample data
// Build the sample data
MERGE (a:Loc {name:'A'}) SET a.description='复联终章快上映了好激动,据说知识图谱与人工智能技术应用到了那部电影!吖啶基氨基甲烷磺酰甲氧基苯胺是一种药嘛?'
MERGE (b:Loc {name:'B'}) SET b.description='复联终章快上映了好激动,据说知识图谱与人工智能技术应用到了那部电影!吖啶基氨基甲烷磺酰甲氧基苯胺是一种药嘛?'
MERGE (c:Loc {name:'C'}) SET c.description='复联终章快上映了好激动,据说知识图谱与人工智能技术应用到了那部电影!吖啶基氨基甲烷磺酰甲氧基苯胺是一种药嘛?'
MERGE (d:Loc {name:'D'}) SET d.description='复联终章快上映了好激动,据说知识图谱与人工智能技术应用到了那部电影!吖啶基氨基甲烷磺酰甲氧基苯胺是一种药嘛?'
MERGE (e:Loc {name:'E'}) SET e.description='复联终章快上映了好激动,据说知识图谱与人工智能技术应用到了那部电影!吖啶基氨基甲烷磺酰甲氧基苯胺是一种药嘛?'
MERGE (f:Loc {name:'F'}) SET f.description='复联终章快上映了好激动,据说知识图谱与人工智能技术应用到了那部电影!吖啶基氨基甲烷磺酰甲氧基苯胺是一种药嘛?'
MERGE (a)-[:ROAD {cost:50}]->(b)
MERGE (a)-[:ROAD {cost:50}]->(c)
MERGE (a)-[:ROAD {cost:100}]->(d)
MERGE (b)-[:ROAD {cost:40}]->(d)
MERGE (c)-[:ROAD {cost:40}]->(d)
MERGE (c)-[:ROAD {cost:80}]->(e)
MERGE (d)-[:ROAD {cost:30}]->(e)
MERGE (d)-[:ROAD {cost:80}]->(f)
MERGE (e)-[:ROAD {cost:40}]->(f);
III. Creating a node index with the Chinese full-text segmentation component
Custom index-creation procedure
CALL zdr.index.addChineseFulltextIndex('IKAnalyzer', 'Loc', ['description']) YIELD message RETURN message
@Procedure(value = "zdr.index.addChineseFulltextIndex", mode = Mode.WRITE)
@Description("CALL zdr.index.addChineseFulltextIndex(String indexName, String labelName, List<String> propKeys) YIELD message RETURN message," +
        "adds the specified properties of all nodes with the given label to the index")
public Stream<NodeIndexMessage> addChineseFulltextIndex(@Name("indexName") String indexName,
                                                        @Name("labelName") String labelName, @Name("properties") List<String> propKeys) {
    Label label = Label.label(labelName);
    List<NodeIndexMessage> output = new ArrayList<>();

    // Find all nodes that carry the label
    ResourceIterator<Node> nodes = db.findNodes(label);
    System.out.println("nodes:" + nodes.toString());

    int nodesSize = 0;
    int propertiesSize = 0;
    while (nodes.hasNext()) {
        nodesSize++;
        Node node = nodes.next();
        System.out.println("current nodes:" + node.toString());

        // Properties of this node that should be indexed
        Set<Map.Entry<String, Object>> properties = node.getProperties(propKeys.toArray(new String[0])).entrySet();
        System.out.println("current node properties" + properties);

        // Remove any entries this node already has in the index
        Index<Node> index = db.index().forNodes(indexName, FULL_INDEX_CONFIG);
        System.out.println("current node index" + index);
        index.remove(node);

        // Add a full-text index entry for each property to be indexed
        for (Map.Entry<String, Object> property : properties) {
            propertiesSize++;
            index.add(node, property.getKey(), property.getValue());
        }
    }

    String message = "IndexName:" + indexName + ",LabelName:" + labelName + ",NodesSize:" + nodesSize + ",PropertiesSize:" + propertiesSize;
    NodeIndexMessage indexMessage = new NodeIndexMessage(message);
    output.add(indexMessage);
    return output.stream();
}
IV. Querying the Chinese segmentation index
Custom index-query procedure
CALL zdr.index.chineseFulltextIndexSearch('IKAnalyzer', 'description:吖啶基氨基甲烷磺酰甲氧基苯胺', 100) YIELD node RETURN node
CALL zdr.index.chineseFulltextIndexSearch('IKAnalyzer', 'description:复联* AND year:1999', 100) YIELD node,weight RETURN node.name,node.year,node.description,weight
@Procedure(value = "zdr.index.chineseFulltextIndexSearch", mode = Mode.WRITE)
@Description("CALL zdr.index.chineseFulltextIndexSearch(String indexName, String query, long limit) YIELD node RETURN node," +
        "runs a Lucene full-text search and returns the top {limit} results")
public Stream<ChineseHit> chineseFulltextIndexSearch(@Name("indexName") String indexName,
                                                     @Name("query") String query, @Name("limit") long limit) {
    // Skip the query when the index does not exist
    if (!db.index().existsForNodes(indexName)) {
        log.debug("Index `%s` does not exist, skipping this query", indexName);
        return Stream.empty();
    }
    return db.index()
            .forNodes(indexName, FULL_INDEX_CONFIG)
            .query(new QueryContext(query).sortByScore().top((int) limit))
            .stream()
            .map(ChineseHit::new); // provider
}
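The query argument is a Lucene query string, so characters such as `:` and `*` carry syntax meaning (field separator, wildcard). When the search term comes from user input, it may help to escape it before building the string. The following is a minimal sketch under stated assumptions: LuceneQueryStrings and its methods are hypothetical helpers, not part of the plugin, and the escaped character set is the one documented for Lucene's classic query parser. Where the Lucene query parser classes are on the classpath, its own QueryParser.escape is likely the better choice. Whitespace inside a term is not handled here.

```java
// Hypothetical helper, not part of the plugin: builds Lucene query strings from raw user input.
final class LuceneQueryStrings {

    // Characters with special meaning in Lucene's classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefix every special character with a backslash so that it is matched literally.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds field:term with the term escaped (whitespace in the term is not handled).
    static String fieldTerm(String field, String rawTerm) {
        return field + ":" + escape(rawTerm);
    }

    // Joins clauses with AND, like the 'description:... AND year:1999' query shown earlier.
    static String and(String... clauses) {
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        String query = and(fieldTerm("description", "TCP/IP*"), fieldTerm("year", "1999"));
        System.out.println(query); // the '/' and '*' are now escaped, so '*' is no longer a wildcard
    }
}
```

The resulting string could then be passed as the second argument of zdr.index.chineseFulltextIndexSearch. Intentional wildcards, as in description:复联*, would have to be appended after escaping.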
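[Cross-label search] Entries added with addChineseFulltextIndex for nodes of different labels end up in the same index when the index name is the same, so by default chineseFulltextIndexSearch returns them together.
// Add a node with a label other than Loc, then run the search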
CREATE (n:LocProvince {name:'P'}) SET n.description='复联终章快上映了好激动,据说知识图谱与人工智能技术应用到了那部电影!' RETURN n
// Index the node (using the same index name as the existing index)
CALL zdr.index.addChineseFulltextIndex('IKAnalyzer', 'LocProvince', ['description','year']) YIELD message RETURN message
// Search for nodes by property
CALL zdr.index.chineseFulltextIndexSearch('IKAnalyzer', 'description:复联', 100) YIELD node,weight RETURN node
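V. Summary
With the approach to Chinese full-text indexing in Neo4j described above, the index is not updated automatically: it has to be rebuilt whenever node properties are modified or new nodes are added.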
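To see why re-indexing is needed, here is a toy in-memory model in plain Java. It is a sketch under loud assumptions, not the Neo4j index API: ToyManualIndex, reindex, and search are hypothetical names, and text is split on whitespace rather than segmented by IK. It mirrors the remove-then-add step of addChineseFulltextIndex and shows that a property change stays invisible to searches until reindex runs again.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a manually maintained index; hypothetical, not the Neo4j API.
final class ToyManualIndex {

    // token -> ids of the nodes whose indexed text contained that token
    private final Map<String, Set<Long>> postings = new HashMap<>();

    // Remove-then-add, like index.remove(node) followed by index.add(...) in the procedure.
    void reindex(long nodeId, String text) {
        for (Set<Long> ids : postings.values()) {
            ids.remove(nodeId); // drop stale entries for this node
        }
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new HashSet<>()).add(nodeId);
            }
        }
    }

    // Node ids whose text contained the token at the time of their last reindex.
    Set<Long> search(String token) {
        return new TreeSet<>(postings.getOrDefault(token, Set.of()));
    }

    public static void main(String[] args) {
        Map<Long, String> descriptions = new HashMap<>(); // stands in for node properties
        ToyManualIndex index = new ToyManualIndex();

        descriptions.put(1L, "knowledge graph");
        index.reindex(1L, descriptions.get(1L));

        descriptions.put(1L, "artificial intelligence"); // property changed, index not told
        System.out.println(index.search("graph"));        // still finds node 1 (stale)
        System.out.println(index.search("intelligence")); // finds nothing yet

        index.reindex(1L, descriptions.get(1L));          // rebuild the entries for node 1
        System.out.println(index.search("graph"));        // node 1 is gone
        System.out.println(index.search("intelligence")); // node 1 is found
    }
}
```

In the setup described in this article, the equivalent of reindex would be calling zdr.index.addChineseFulltextIndex again for the affected label after each batch of writes.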
For Neo4j's default index implementation, see neo4j-lucene-index.