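Bridging ES and Lucene applications: using ES queries, including complete ES queries, from a Lucene application
First off: it works, barely, but it is not perfect and comes with some extra usage conditions. This is also the last post in the "subtraction" series.
The original goal was only to extract the analysis (tokenization) part of ES query_string, and that step alone is already enough to connect ES with the rest of the Java and big-data ecosystem.
Because the goal was that narrow (query_string only), I chose the "subtraction" approach: trimming ES down to the parts I need.
Later I added support for combining it with non-analyzed queries such as term, terms, and range.
That worked, but the analyzed and non-analyzed queries were handled independently.
For non-analyzed queries, the JSON string is parsed block by block, each block is converted separately, and the results are then combined.
For example: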
{
  "terms": {
    "domain": [
      "www.github.com"
    ]
  }
}
{
  "range": {
    "date": {
      "gte": "2021-02-03T00:00:00",
      "lte": "2021-02-04T00:00:00"
    }
  }
}
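The block-by-block idea can be sketched in plain Java. This is only an illustration, not the code used in this series: ClauseSketch and its methods are hypothetical names, the JSON is assumed to be already parsed into field names and values, and each clause is rendered as text in the same style as the converted output shown later, rather than as a real Lucene Query object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Elasticsearch or Lucene.
final class ClauseSketch {

    // A terms block becomes one required clause: +field:(v1 v2 ...)
    static String terms(String field, List<String> values) {
        return "+" + field + ":(" + String.join(" ", values) + ")";
    }

    // A range block becomes +field:[lower TO upper].
    // gte/lte give inclusive brackets, gt/lt exclusive braces, "*" an open end.
    static String range(String field, Map<String, String> bounds) {
        String open = bounds.containsKey("gt") ? "{" : "[";
        String close = bounds.containsKey("lt") ? "}" : "]";
        String lower = bounds.getOrDefault("gte", bounds.getOrDefault("gt", "*"));
        String upper = bounds.getOrDefault("lte", bounds.getOrDefault("lt", "*"));
        return "+" + field + ":" + open + lower + " TO " + upper + close;
    }

    // Aggregation step: the independently converted clauses are joined.
    static String combine(List<String> clauses) {
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        Map<String, String> bounds = new LinkedHashMap<>();
        bounds.put("gte", "2021-02-03T00:00:00");
        bounds.put("lte", "2021-02-04T00:00:00");

        List<String> clauses = new ArrayList<>();
        clauses.add(terms("domain", Arrays.asList("www.github.com")));
        clauses.add(range("date", bounds));

        // prints: +domain:(www.github.com) +date:[2021-02-03T00:00:00 TO 2021-02-04T00:00:00]
        System.out.println(combine(clauses));
    }
}
```

Because each block is converted on its own, this style has no natural place for nesting or for clauses that need an analyzer, which is the limitation the rest of this post runs into.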
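ES queries, however, can mix analyzed and non-analyzed clauses and nest them layer upon layer, so I wanted to see whether a full ES query could be applied directly.
To be honest, I was not optimistic from the moment this requirement came up.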
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "domain": [
              "www.github.com"
            ]
          }
        },
        {
          "range": {
            "date": {
              "gte": "2021-02-03T00:00:00",
              "lte": "2021-02-04T00:00:00"
            }
          }
        }
      ]
    }
  }
}
I found the ES query parsing class
and its corresponding test class,
then extracted the test-case code.
Test case: no analysis required
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "domain": [
              "www.github.com"
            ]
          }
        },
        {
          "range": {
            "date": {
              "gte": "2021-02-03T00:00:00",
              "lte": "2021-02-04T00:00:00"
            }
          }
        }
      ]
    }
  }
}
For fields that need no analysis the conversion succeeds. The output is:
searchSourceBuilder query: {
  "bool" : {
    "must" : [
      {
        "terms" : {
          "domain" : [
            "www.github.com"
          ],
          "boost" : 1.0
        }
      },
      {
        "range" : {
          "date" : {
            "from" : "2021-02-03T00:00:00",
            "to" : "2021-02-04T00:00:00",
            "include_lower" : true,
            "include_upper" : true,
            "boost" : 1.0
          }
        }
      }
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}
searchSourceBuilder query: +domain:(www.github.com) +date:[2021-02-03T00:00:00 TO 2021-02-04T00:00:00]
Now add an analyzed field:
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyzer": "ik_smart",
            "query": "hello word",
            "fields": [
              "content",
              "title"
            ]
          }
        },
        {
          "terms": {
            "domain": [
              "www.github.com"
            ]
          }
        },
        {
          "range": {
            "date": {
              "gte": "2021-02-03T00:00:00",
              "lte": "2021-02-04T00:00:00"
            }
          }
        }
      ]
    }
  }
}
[cclient_test_index] QueryShardException[[query_string] analyzer [ik_smart] not found]
    at org.elasticsearch.index.query.QueryStringQueryBuilder.doToQuery(QueryStringQueryBuilder.java:1011)
    at org.elasticsearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:105)
    at org.elasticsearch.index.query.BoolQueryBuilder.addBooleanClauses(BoolQueryBuilder.java:415)
    at org.elasticsearch.index.query.BoolQueryBuilder.doToQuery(BoolQueryBuilder.java:383)
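At this point the problem is clear, and it is still the most fundamental one: ES loads plugins and analyzers through its config and plugin directories, defines each index's field formats through the mapping, and applies all of that when it runs a query.
I have neither plugins nor analyzers here, so there is no ik_smart.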
If I remove "analyzer": "ik_smart",
the query runs, but the output is wrong:
searchSourceBuilder query: +MatchNoDocsQuery("unmapped fields [null]") +domain:(www.github.com) +date:[2021-02-03T00:00:00 TO 2021-02-04T00:00:00]
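My own approach sets the fields and analyzer on QueryStringQueryParser from outside, then uses that self-built QueryStringQueryParser instance to parse the query string and produce the Query.
ES itself, however, goes through QueryStringQueryBuilder, which instantiates QueryStringQueryParser internally, and those instances know nothing about the externally supplied fields and analyzer.
So the QueryStringQueryParser instances inside QueryStringQueryBuilder have to be changed so that every one of them gets the fields and analyzer.
This picks up from the previous post, which worked with QueryStringQueryParser directly, so I would rather not modify QueryStringQueryBuilder.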
QueryStringQueryBuilder
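The requirements are getting closer and closer to native ES. Had that been the goal from the start, I would have chosen the "addition" approach early on.
I can no longer be bothered to modify QueryStringQueryBuilder and plan to switch to the addition route: simulating a real ES environment.
Still, let me finish what I started.
QueryStringQueryBuilder creates QueryStringQueryParser in too many places. The code below alone has 5 instantiations, and setting the values on each one would mean a lot of duplicated code.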
if (defaultField != null) {
    if (Regex.isMatchAllPattern(defaultField)) {
        queryParser = new QueryStringQueryParser(context, lenient == null ? true : lenient);
    } else {
        queryParser = new QueryStringQueryParser(context, defaultField, isLenient);
    }
} else if (fieldsAndWeights.size() > 0) {
    final Map<String, Float> resolvedFields = QueryParserHelper.resolveMappingFields(context, fieldsAndWeights);
    queryParser = new QueryStringQueryParser(context, resolvedFields, isLenient);
} else {
    List<String> defaultFields = context.defaultFields();
    if (context.getMapperService().allEnabled() == false &&
            defaultFields.size() == 1 && AllFieldMapper.NAME.equals(defaultFields.get(0))) {
        // For indices created before 6.0 with _all disabled
        defaultFields = Collections.singletonList("*");
    }
    boolean isAllField = defaultFields.size() == 1 && Regex.isMatchAllPattern(defaultFields.get(0));
    if (isAllField) {
        queryParser = new QueryStringQueryParser(context, lenient == null ? true : lenient);
    } else {
        final Map<String, Float> resolvedFields = QueryParserHelper.resolveMappingFields(context,
                QueryParserHelper.parseFieldsAndWeights(defaultFields));
        queryParser = new QueryStringQueryParser(context, resolvedFields, isLenient);
    }
}
So I considered putting it in the constructor: add static variables outForceAnalyzer and outMultiFields to QueryStringQueryParser that can be set from outside, and have every instance load them when it is constructed.
// Externally settable overrides; null means "leave the instance as constructed".
static Analyzer outForceAnalyzer = null;
static List<String> outMultiFields = null;

private QueryStringQueryParser(QueryShardContext context, String defaultField,
                               Map<String, Float> fieldsAndWeights,
                               boolean lenient, Analyzer analyzer) {
    super(defaultField, analyzer);
    this.context = context;
    this.fieldsAndWeights = Collections.unmodifiableMap(fieldsAndWeights);
    this.queryBuilder = new MultiMatchQuery(context);
    queryBuilder.setZeroTermsQuery(MatchQuery.ZeroTermsQuery.NULL);
    queryBuilder.setLenient(lenient);
    this.lenient = lenient;
    // Added: apply the external analyzer and field list to every new instance.
    if (outForceAnalyzer != null) {
        this.forceAnalyzer = outForceAnalyzer;
    }
    if (outMultiFields != null) {
        this.multiFields = outMultiFields;
    }
}
QueryStringQueryParser.outMultiFields = Arrays.asList("content", "title");
QueryStringQueryParser.outForceAnalyzer = new StandardAnalyzer();
try (XContentParser parser = queryParser.createParser(JsonXContent.jsonXContent, restContent)) {
    SearchSourceBuilder searchSourceBuilder = SearchSourceBuilder.fromXContent(parser);
    System.out.println("searchSourceBuilder: " + searchSourceBuilder);
    System.out.println("searchSourceBuilder query: " + searchSourceBuilder.query());
    System.out.println("searchSourceBuilder query: " + searchSourceBuilder.query().toQuery(qc));
}
searchSourceBuilder query: +((title:hello title:word) | (content:hello content:word)) +domain:(www.github.com) +date:[2021-02-03T00:00:00 TO 2021-02-04T00:00:00]
Verified: it works.
This also ties up the loose end from the previous post: generating a Query through QueryStringQueryBuilder.
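One caveat of the static-variable trick is that the overrides are shared by the whole JVM, so two callers that need different analyzers at the same time would interfere with each other. A minimal sketch of the pattern with a scoped set-and-restore helper follows. ParserSketch is a hypothetical stand-in, not the real QueryStringQueryParser, and the analyzer is represented by its name only so that the example runs without ES on the classpath.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a parser with several construction paths.
final class ParserSketch {
    // Process-wide overrides; null means "use what the constructor was given".
    static volatile List<String> outMultiFields = null;
    static volatile String outForceAnalyzer = null; // analyzer represented by name only

    final List<String> fields;
    final String analyzer;

    ParserSketch(String defaultField) {
        this(Collections.singletonList(defaultField), "default");
    }

    ParserSketch(List<String> fields) {
        this(fields, "default");
    }

    // All other constructors funnel through here, so the override lives in one place.
    private ParserSketch(List<String> fields, String analyzer) {
        this.fields = outMultiFields != null ? outMultiFields : fields;
        this.analyzer = outForceAnalyzer != null ? outForceAnalyzer : analyzer;
    }

    // Sets the overrides for one action and always restores the previous values.
    // This limits how long the overrides are visible; it does not make them thread-safe.
    static <T> T withOverrides(List<String> fields, String analyzer, Supplier<T> action) {
        List<String> oldFields = outMultiFields;
        String oldAnalyzer = outForceAnalyzer;
        outMultiFields = fields;
        outForceAnalyzer = analyzer;
        try {
            return action.get();
        } finally {
            outMultiFields = oldFields;
            outForceAnalyzer = oldAnalyzer;
        }
    }

    public static void main(String[] args) {
        ParserSketch overridden = withOverrides(
                Arrays.asList("content", "title"), "standard",
                () -> new ParserSketch("body"));
        ParserSketch plain = new ParserSketch("body");
        System.out.println(overridden.fields + " " + overridden.analyzer); // [content, title] standard
        System.out.println(plain.fields + " " + plain.analyzer);           // [body] default
    }
}
```

For single-threaded, one-off conversion the plain static assignment is enough; the helper only matters once different field lists or analyzers have to coexist in one process.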
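ES queries that contain no analyzed query_string clause can now be parsed directly.
ES queries that do contain an analyzed query_string clause only partly work, and with conditions:
1. If the analyzer is one that ships with ES by default, such as standard, nothing needs to change.
2. If the analyzer comes from an installed third-party plugin, such as https://github.com/medcl/elasticsearch-analysis-ik, as in
"query_string": { "analyzer": "ik_smart", "query": "hello word", "fields": [ "content", "title" ] }
then the "analyzer": "ik_smart" entry must be removed, otherwise the query fails with [cclient_test_index] QueryShardException[[query_string] analyzer [ik_smart] not found]
The reason has been given several times already: this approach has no part that loads third-party analysis plugins.
3. The fields to analyze and the analyzer must be set by hand. The example here uses new StandardAnalyzer(); in practice an analyzer instance from IK can be passed in from outside instead.
QueryStringQueryParser.outMultiFields = Arrays.asList("content", "title"); QueryStringQueryParser.outForceAnalyzer = new StandardAnalyzer();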
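Removing the analyzer entry by hand gets tedious when query_string clauses sit deep inside nested bool queries. A rough sketch of automating that step is shown here, under some loud assumptions: the query JSON has already been parsed into nested Map and List objects by whatever JSON library is at hand, AnalyzerStripSketch is a hypothetical helper, and the set of analyzers that count as available is supplied by the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Elasticsearch.
final class AnalyzerStripSketch {

    // Walks the parsed query and removes "analyzer" from every query_string
    // object whose analyzer is not in the available set. Returns the number removed.
    @SuppressWarnings("unchecked")
    static int stripUnknownAnalyzers(Object node, Set<String> available) {
        int removed = 0;
        if (node instanceof Map) {
            Map<String, Object> map = (Map<String, Object>) node;
            Object qs = map.get("query_string");
            if (qs instanceof Map) {
                Map<String, Object> body = (Map<String, Object>) qs;
                Object analyzer = body.get("analyzer");
                if (analyzer != null && !available.contains(analyzer)) {
                    body.remove("analyzer");
                    removed++;
                }
            }
            for (Object child : map.values()) {
                removed += stripUnknownAnalyzers(child, available);
            }
        } else if (node instanceof List) {
            for (Object child : (List<Object>) node) {
                removed += stripUnknownAnalyzers(child, available);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Object> queryString = new LinkedHashMap<>();
        queryString.put("analyzer", "ik_smart");
        queryString.put("query", "hello word");
        queryString.put("fields", new ArrayList<Object>(Arrays.asList("content", "title")));

        Map<String, Object> clause = new LinkedHashMap<>();
        clause.put("query_string", queryString);
        List<Object> must = new ArrayList<>();
        must.add(clause);
        Map<String, Object> bool = new LinkedHashMap<>();
        bool.put("must", must);
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("bool", bool);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("query", query);

        Set<String> available = new HashSet<>(Collections.singletonList("standard"));
        int removed = stripUnknownAnalyzers(root, available);
        System.out.println(removed + " " + queryString); // 1 {query=hello word, fields=[content, title]}
    }
}
```

After stripping, the analyzer still has to be supplied through the static overrides described in point 3, since nothing in the query names it any more.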
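That wraps up the walkthrough of this approach.
I will not take it any further,
because the demand is for ever closer ES compatibility, and the subtraction approach loses part of the information. A different route should be investigated.
The current approach is imperfect, but it is good enough for a subset of query scenarios.
It covers queries only, without ES aggregations; aggregations have to be implemented externally. The whole point of these experiments was to connect with other OLAP systems anyway, so aggregation can be handled by those, e.g. SQL engines, Druid, ClickHouse, Kylin, and the like.
Also, everything above is based on ES 6.8, while ES is already at 7.10.2/7.11.1. If there is a follow-up, it will target the newer version.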