Elasticsearch basics: concepts & usage
Formatted version: https://www.wolai.com/tKv51LVKTRAH11tirCMwgg
Analysis (tokenization)
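Standard Analyzer (standard tokenization, the default)
- English text is split on whitespace and on most punctuation and symbols (-!@$#%^&*()+= and so on), and tokens are lowercased; an underscore between letters or digits does not split a token
- Chinese text is split into single characters
- Special characters are never emitted as tokens: the text is split where they occur and the characters themselves are dropped
- Example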
POST _analyze
{
"analyzer": "standard",
"text": "logTag=request_out-test!gantanghao(kuohao,中文"
}
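Simple Analyzer (simple splitting)
Splits on any character that is not a letter; Chinese characters count as letters. Tokens are lowercased.
- Example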
POST _analyze
{
"analyzer": "simple",
"text": "logTag=request_out-test!gantanghao(kuohao,中文s1ss"
}
Whitespace Analyzer (whitespace splitting)
Splits on whitespace only.
- Example
POST _analyze
{
"analyzer": "whitespace",
"text": "logTag=request_out-test !gantanghao(kuohao,中文s1ss"
}
Inspecting the tokens an analyzer produces
POST _analyze
{
"analyzer": "simple",
"text": "logTag=request_out,中文测试"
}
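To predict what the _analyze calls above return without a cluster at hand, the two simpler analyzers can be approximated in a few lines of Python. This is only a rough sketch under the assumption that "letter" means Python's str.isalpha(); the function names are made up for illustration and the real analyzers remain the source of truth.

```python
def simple_analyze(text: str) -> list[str]:
    """Rough stand-in for the simple analyzer: split at non-letters, lowercase."""
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():  # letters of any script, including Chinese characters
            current.append(ch.lower())
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def whitespace_analyze(text: str) -> list[str]:
    """Rough stand-in for the whitespace analyzer: split at whitespace only."""
    return text.split()


sample = "logTag=request_out-test !gantanghao(kuohao,中文s1ss"
print(simple_analyze(sample))
print(whitespace_analyze(sample))
```

Note how the digit in "中文s1ss" splits the token for the simple analyzer, while the whitespace analyzer keeps everything after the space as one token.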
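Querying
Simple queries
match & match_phrase
match is a plain analyzed match: the query text is tokenized, and a document is returned as soon as any of its tokens matches a token of the query.
match_phrase is a positional match: the query tokens must appear in the document in the same order and at the same relative positions. The allowed gap between them is set by slop, which defaults to 0.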
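The difference between the two can be illustrated with a toy model in Python. It is a sketch only: the analyzer here is an assumption (lowercase, split on whitespace), match uses the default OR operator, and match_phrase is modelled with slop 0.

```python
def analyze(text: str) -> list[str]:
    # Toy analyzer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def match(query: str, field_value: str) -> bool:
    """match with the default OR operator: one shared token is enough."""
    doc_tokens = set(analyze(field_value))
    return any(tok in doc_tokens for tok in analyze(query))


def match_phrase(query: str, field_value: str) -> bool:
    """match_phrase with slop 0: query tokens must be adjacent and in order."""
    q, d = analyze(query), analyze(field_value)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


doc = "according to error message"
print(match("message error", doc))         # True
print(match_phrase("message error", doc))  # False
print(match_phrase("error message", doc))  # True
```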
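term
Exact match against the un-analyzed value. Avoid using term on text fields: if you do, the term is compared against each individual token of the analyzed field rather than against the original string.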
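Why term behaves surprisingly on text fields can be seen in a small Python model. This is a sketch under a simplifying assumption: a text field is indexed as lowercased, whitespace-split tokens, while a keyword field is indexed as the whole value. The helper names are hypothetical.

```python
def index_text(value: str) -> set[str]:
    # Toy analyzer (assumption): a text field stores lowercased tokens.
    return set(value.lower().split())


def index_keyword(value: str) -> set[str]:
    # A keyword field stores the whole value as a single un-analyzed term.
    return {value}


def term(query_value: str, indexed_terms: set[str]) -> bool:
    # term does not analyze its input; it looks for an identical indexed term.
    return query_value in indexed_terms


value = "Score New"
print(term("Score New", index_keyword(value)))  # True
print(term("Score New", index_text(value)))     # False
print(term("score", index_text(value)))         # True
```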
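Compound queries
must & filter
Both require the clause to match, but filter skips scoring and its results can be cached.
should
Roughly equivalent to OR in MySQL.
- When should sits at the same level as must or filter, it has no filtering effect by default: minimum_should_match defaults to 0, so the should clauses only influence scoring. To make them restrict the results, either set "minimum_should_match": 1 (the parameter is the number of should clauses that must match), or wrap the should clauses in a nested bool under must.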
//select _id,dtTime,appName,dateTime from plume_log_run_202205* where dtTime >= now-7d and dtTime < now and (appName = 'score-new' or appName = 'device-http' )
// Using minimum_should_match
GET plume_log_run_202205*/_search
{
"size": 100,
"from": 0,
"_source": [
"_id",
"dtTime",
"appName",
"dateTime"
],
"query": {
"bool": {
"filter": [
{
"range": {
"dtTime": {
"gte": "now-7d",
"lt": "now"
}
}
}
],
"should": [
{
"term": {
"appName": "score-new"
}
},
{
"match": {
"appName": "device-http"
}
}
],
"minimum_should_match": 1
}
},
"highlight": {
"fields": {
"content": {
"fragment_size": 2147483647
}
}
},
"sort": [
{
"dtTime": "desc"
}
]
}
// Nesting should inside must
GET plume_log_run_202205*/_search
{
"size": 100,
"from": 0,
"_source": [
"_id",
"dtTime",
"appName",
"dateTime"
],
"query": {
"bool": {
"filter": [
{
"range": {
"dtTime": {
"gte": "now-7d",
"lt": "now"
}
}
}
],
"must": {
"bool": {
"should": [
{
"term": {
"appName": "score-new"
}
},
{
"match": {
"appName": "device-http"
}
}
]
}
}
}
},
"highlight": {
"fields": {
"content": {
"fragment_size": 2147483647
}
}
},
"sort": [
{
"dtTime": "desc"
}
]
}
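The minimum_should_match default described above can be modelled in Python to check the intuition. This is a simplified sketch, not how Elasticsearch evaluates queries internally: it only understands term leaves, expects every clause to be written as a list, and the env field in the sample document is a made-up value.

```python
def term_matches(clause: dict, doc: dict) -> bool:
    # Only {"term": {field: value}} leaves are modelled here.
    field, value = next(iter(clause["term"].items()))
    return doc.get(field) == value


def bool_matches(bool_query: dict, doc: dict) -> bool:
    required = bool_query.get("must", []) + bool_query.get("filter", [])
    should = bool_query.get("should", [])
    excluded = bool_query.get("must_not", [])
    # Default: 1 when should stands alone, 0 when must or filter is present.
    default_msm = 1 if should and not required else 0
    msm = bool_query.get("minimum_should_match", default_msm)
    should_hits = sum(term_matches(c, doc) for c in should)
    return (
        all(term_matches(c, doc) for c in required)
        and not any(term_matches(c, doc) for c in excluded)
        and should_hits >= msm
    )


doc = {"env": "dev", "appName": "bms"}
query = {
    "filter": [{"term": {"env": "dev"}}],
    "should": [
        {"term": {"appName": "score-new"}},
        {"term": {"appName": "device-http"}},
    ],
}
print(bool_matches(query, doc))                                 # True
print(bool_matches({**query, "minimum_should_match": 1}, doc))  # False
```

The first call returns True even though neither appName matches, which is exactly the surprise the two requests above are written to avoid.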
must_not
The clause must not match.
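As an illustration, a request body with must_not can be assembled as a plain Python dict and printed as JSON for pasting into the console. This is a sketch that reuses the dtTime and appName fields from the earlier requests; the excluded value is only an example.

```python
import json

# Last 7 days of logs, excluding one application.
body = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"dtTime": {"gte": "now-7d", "lt": "now"}}}
            ],
            "must_not": [
                {"term": {"appName": "score-new"}}
            ],
        }
    }
}
print(json.dumps(body, indent=2))
```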
SQL queries
Direct queries
Response format documentation
Pagination documentation
Query syntax
- Example 1: plain query (only works with keyword fields)
POST /_sql?format=txt
{
"query": "SELECT appName,dtTime,dateTime FROM \"plume_log_run_20220527_*\" order by dtTime desc limit 50"
}
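- Example 2: match (full-text match); this request goes to _sql/translate, so it returns the generated DSL rather than hits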
POST /_sql/translate
{
"query": "select * from m_all_plume_log_run_202205 where match (content,'according to error message')"
}
- Example 3: query string + GROUP BY
POST /_sql?format=txt
{
"query": "select appName,count(*) from m_all_plume_log_run_202205 where query ('content:\"according to error message\"') group by appName"
}
Translating to DSL
- Example
POST /_sql/translate
{
"query": "select * from test1 where a='a' and b='b'"
}
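Personal summary
Where ordinary databases use SQL (Structured Query Language), Elasticsearch queries are written in a DSL (Domain Specific Language).
Rules:
- Compound clauses (must/filter/should/must_not) must be combined through bool; even a single clause has to be wrapped in bool
- The siblings of a compound clause (must/filter/should/must_not) must also be compound clauses; likewise, the siblings of a basic (leaf) query must be leaf queries
- Multiple leaf queries must be grouped inside a compound clause (must/filter/should/must_not); they cannot sit side by side directly under query or bool
- Prefer writing every compound clause (must/filter/should/must_not) as an array ([]), so later additions do not require restructuring
- Create test data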
PUT test1
{
"mappings": {
"properties": {
"a": {
"type": "keyword"
},
"b": {
"type": "keyword"
},
"c": {
"type": "keyword"
},
"d": {
"type": "keyword"
},
"e": {
"type": "keyword"
},
"f": {
"type": "keyword"
},
"dtTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
}
}
}
}
POST /test1/_bulk
{ "index":{} }
{"a":"a","dtTime":1653536394590,"b":"b","c":"c","d":"d","e":"e","f":"f"}
{ "index":{} }
{"a":"1","dtTime":1653536394590,"b":"b","c":"c","d":"d","e":"e","f":"f"}
{ "index":{} }
{"a":"1","dtTime":1653536394590,"b":"2","c":"c","d":"d","e":"e","f":"f"}
{ "index":{} }
{"a":"a","dtTime":1653536394590,"b":"3","c":"c","d":"d","e":"e","f":"f"}
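Put simply, converting SQL to DSL mostly means hoisting the keywords (AND, OR) up into bool clauses, for example:
- select * from test1 where a='a' and b='b'
# AND becomes must
# must has to live inside bool
# so the result is: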
GET test1/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"a": "a"
}
},
{
"term": {
"b": "b"
}
}
]
}
}
}
- select * from test1 where (a='a' or a='1') and b='b'
# first turn AND into must
# then turn OR into should
# should cannot be a direct child of must, so wrap it in a bool
GET test1/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"term": {
"a": {
"value": "a"
}
}
},
{
"term": {
"a": {
"value": "1"
}
}
}
]
}
},
{
"term": {
"b": "b"
}
}
]
}
}
}
- select * from test1 where (a='1' and b='2') or b='3'
# first turn OR into should
# inside should, use a bool to hold the must
GET test1/_search
{
"query": {
"bool": {
"should": [
{
"term": {
"b": {
"value": "3"
}
}
},
{
"bool": {
"must": [
{
"term": {
"a": {
"value": "1"
}
}
},
{
"term": {
"b": {
"value": "2"
}
}
}
]
}
}
]
}
}
}
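The hoisting rule used in the three conversions above can be written down as a small recursive function. This is a minimal sketch, not a SQL parser: it assumes the WHERE clause has already been turned into a nested tuple tree (a made-up representation) and handles only equality, AND and OR.

```python
import json


def to_dsl(expr: tuple) -> dict:
    """Convert a tiny expression tree into bool DSL.

    expr is ("eq", field, value), ("and", e1, e2, ...) or ("or", e1, e2, ...).
    """
    op, *args = expr
    if op == "eq":
        field, value = args
        return {"term": {field: value}}
    # AND becomes must, OR becomes should; both are always wrapped in bool.
    clause = {"and": "must", "or": "should"}[op]
    return {"bool": {clause: [to_dsl(arg) for arg in args]}}


# (a='a' or a='1') and b='b'
tree = ("and", ("or", ("eq", "a", "a"), ("eq", "a", "1")), ("eq", "b", "b"))
body = {"query": to_dsl(tree)}
print(json.dumps(body, indent=2))
```

Because every OR ends up in a bool that contains only should, minimum_should_match keeps its default of 1 there, so the OR semantics are preserved without extra settings.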
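Create
Creating an index
sort.field: index sorting, i.e. the fields by which documents are sorted on disk within each segment
number_of_shards: number of primary shards
number_of_replicas: number of replicas per primary shard
refresh_interval: index refresh interval; newly written documents become searchable only after the next refresh
dynamic_templates: mapping rules applied to dynamically added fields
- Example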
PUT test2
{
"settings": {
"index": {
"sort.field": [
"dtTime",
"seq"
],
"sort.order": [
"desc",
"desc"
]
},
"number_of_shards": 10,
"number_of_replicas": 0,
"refresh_interval": "30s"
},
"mappings": {
"dynamic_templates": [
{
"test_float": {
"match_mapping_type": "string",
"mapping": {
"norms": "false"
}
}
}
],
"properties": {
"appName": {
"type": "keyword"
},
"env": {
"type": "keyword"
},
"appNameWithEnv": {
"type": "keyword"
},
"logLevel": {
"type": "keyword"
},
"serverName": {
"type": "keyword"
},
"traceId": {
"type": "keyword"
},
"dtTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"seq": {
"type": "long"
}
}
}
}
Inserting data
Single document
- Example
POST /test1/_doc
{
"a": "2022-05-26 11:39:54.590",
"dtTime": 1653536394590,
"b": "INFO",
"c": "run(ClientWorker.java:522)",
"d": "bms-_-dev",
"e": "bms",
"f": "192.168.10.47"
}
Bulk insert
- Example
POST /test1/_bulk
{ "index":{} }
{"a":"a","dtTime":1653536394590,"b":"b","c":"c","d":"d","e":"e","f":"f"}
{ "index":{} }
{"a":"a","dtTime":1653536394590,"b":"b","c":"c","d":"d","e":"e","f":"f"}
Other settings
Disable automatic index creation
PUT _cluster/settings
{
"persistent": {
"action.auto_create_index": "false"
}
}