Elasticsearch的SQL查询
使用SQL查询ES有一定的局限性,没有原生的Query DSL那么强大,对于嵌套属性和某些函数的支持并不怎么好,但是平时用来查询下数据基本够用了。
在线sql转dsl:http://www.atotoa.com/
elasticsearch-sql插件
注:本人安装的是es 7.x(7.10.2),默认是支持SQL的,所以不用安装elasticsearch-sql插件。
1、下载 https://github.com/NLPchina/elasticsearch-sql
2、下载elasticsearch-sql,修改名字为sql。放在es解压目录的plugins文件中即可。重启es则sql安装完毕
支持的SQL语法
首先我们创建一个索引:
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"fields":{"keyword": {"type": "keyword","ignore_above": 256}} //可以对一个字段提供多种索引模式,同一个字段的值,一个分词,一个不分词
},
"author": {
"type": "text",
"fields":{"keyword": {"type": "keyword","ignore_above": 256}}
},
"release_date": {
"type": "date",
"format":"yyyy-MM-dd"
},
"page_count": {
"type": "integer"
}
}
}
}
这里要特别注意:如果要用到like这种模糊操作,但是type是text的,那么就还需要为该字段设置不分词的索引方式,即type=keyword,如上面的name和author字段。
批量插入数据(使用Postman)
PUT /book_info/_bulk
{ "index" : { "_id" : "101" } }
{ "name" : "hello","author":"tom","release_date":"2020-01-26","page_count": 500 }
{ "index" : { "_id" : "102" } }
{ "name" : "world","author":"Marry","release_date":"2018-01-26","page_count": 352 }
{ "index" : { "_id" : "103" } }
{ "name" : "test","author":"test","release_date":"2019-03-26","page_count": 436 }
{ "index" : { "_id" : "104" } }
{ "name" : "demo","author":"demo","release_date":"2020-06-26","page_count": 337 }
{ "index" : { "_id" : "105" } }
{ "name" : "zhangsan","author":"zhangsan","release_date":"2015-05-21","page_count": 562 }
{ "index" : { "_id" : "106" } }
{ "name" : "zhangsan02","author":"zhangsan","release_date":"2015-05-21","page_count": 562 }
#这里需要换行
语法格式
语法格式:/_sql?format=txt
format 返回格式,不设置则默认返回JSON (支持,csv,txt,json,yaml。
请求体body参数:
- query:要查询的sql参数,如:"select * from user"
- fetch_size:一个响应中要返回的最大行(或条目)数,默认是1000
- filter:可选的Elasticsearch查询DSL,以进行其他过滤。
- request_timeout:请求失败之前的超时,默认是90s
- page_timeout:分页请求失败之前的超时,默认是45s
- columnar:以列方式而不是基于行的方式返回结果。有效期为json,yaml,cbor和smile。默认是false。
- field_multi_value_leniency:当遇到一个字段的多个值(默认值)或宽大处理时抛出异常,并从列表中返回第一个值(不保证将要返回的值-通常以自然的升序方式第一个)。默认是false。
- index_include_frozen:是否在查询执行中包括冻结索引(默认值)。默认是false。
条件查询
1、单条件查询
POST /_sql?format=txt
{
"query": "SELECT * FROM book_info where name = 'hello'"
}
2、多条件查询
POST /_sql?format=txt
{
"query": "SELECT * FROM book_info where name = 'hello' and author='tom'"
}
排序+限制行数
POST /_sql?format=txt
{
"query": "SELECT * FROM book_info ORDER BY release_date DESC LIMIT 2"
}
排序+限制返回字段
POST /_sql?format=txt
{
"query": "SELECT name,release_date,page_count FROM book_info ORDER BY release_date DESC LIMIT 3"
}
时间比较查询
POST /_sql?format=txt
{
"query": "SELECT * FROM book_info WHERE release_date < '2020-01-01' LIMIT 3"
}
聚合查询
POST /_sql?format=txt
{
"query": "SELECT sum(page_count) FROM book_info"
}
POST /_sql?format=txt
{
"query": "SELECT sum(page_count),name FROM book_info group by name"
}
POST /_sql?format=txt
{
"query": "SELECT avg(page_count) FROM book_info"
}
POST /_sql?format=txt
{
"query": "SELECT min(page_count) FROM book_info"
}
POST /_sql?format=txt
{
"query": "SELECT max(page_count) FROM book_info"
}
count查询
POST /_sql?format=txt
{
"query": "SELECT count(*),count(distinct author) FROM book_info "
}
分组查询
POST /_sql?format=txt
{
"query": "SELECT author,count(*) as count FROM book_info group by author"
}
SQL转DSL
普通sql
POST /_sql/translate
{
"query": "SELECT name,author FROM book_info where name ='hello'",
"fetch_size": 1
}
右模糊
POST /_sql/translate
{
"query": "SELECT name,author FROM book_info where name like 'hello%'",
"fetch_size": 10
}
时间区间
POST /_sql/translate
{
"query": "SELECT name,author,release_date FROM book_info WHERE release_date >'2019-01-01' and release_date < '2020-01-01' LIMIT 3"
}
常用的SQL总结
在ES中使用SQL查询的语法与在数据库中使用基本一致,具体格式如下:
SELECT select_expr [, ...]
[ FROM table_name ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count ] ]
[ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ]
WHERE
可以使用WHERE语句设置查询条件,比如查询state字段为VA的记录,查询语句如下:
POST /_sql?format=txt
{
"query": "SELECT account_number,address,age,balance,state FROM account WHERE state='VA' LIMIT 10 "
}
GROUP BY
我们可以使用GROUP BY语句对数据进行分组,统计出分组记录数量,最大age和平均balance等信息,查询语句如下:
POST /_sql?format=txt
{
"query": "SELECT state,COUNT(*),MAX(age),AVG(balance) FROM account GROUP BY state LIMIT 10"
}
HAVING
我们可以使用HAVING语句对分组数据进行二次筛选,比如筛选分组记录数量大于15的信息,查询语句如下:
POST /_sql?format=txt
{
"query": "SELECT state,COUNT(*),MAX(age),AVG(balance) FROM account GROUP BY state HAVING COUNT(*)>15 LIMIT 10"
}
ORDER BY
我们可以使用ORDER BY语句对数据进行排序,比如按照balance字段从高到低排序,查询语句如下:
POST /_sql?format=txt
{
"query": "SELECT account_number,address,age,balance,state FROM account ORDER BY balance DESC LIMIT 10 "
}
DESCRIBE
我们可以使用DESCRIBE语句查看表(ES中为索引)中有哪些字段,比如查看account表的字段,查询语句如下:
POST /_sql?format=txt
{
"query": "DESCRIBE account"
}
SHOW TABLES
我们可以使用SHOW TABLES查看所有的表(ES中为索引)。
POST /_sql?format=txt
{
"query": "SHOW TABLES"
}
支持的函数
使用SQL查询ES中的数据,不仅可以使用一些SQL中的函数,还可以使用一些ES中特有的函数。
1、我们可以使用SHOW FUNCTIONS语句查看所有支持的函数,比如搜索所有带有DATE字段的函数可以使用如下语句:
POST /_sql?format=txt
{
"query": "SHOW FUNCTIONS LIKE '%DATE%'"
}
2、全文搜索函数是ES中特有的,当使用MATCH或QUERY函数时,会启用全文搜索功能,SCORE函数可以用来统计搜索评分。
(1)使用MATCH函数查询address中包含Street的记录。
POST /_sql?format=txt
{
"query": "SELECT account_number,address,age,balance,SCORE() FROM account WHERE MATCH(address,'Street') LIMIT 10"
}
(2)使用QUERY函数查询address中包含Street的记录。
POST /_sql?format=txt
{
"query": "SELECT account_number,address,age,balance,SCORE() FROM account WHERE QUERY('address:Street') LIMIT 10"
}