学习过es,但是每次学习,感觉都不同,今天重新做一次梳理。
gitee:https://gitee.com/juncaoit/fast
一:ELK介绍
1.说明
elasticsearch,logstash,kibana
2.介绍
elasticsearch:全文搜搜引擎,基于java。
logstash:具有实时传输的数据收集引擎,用来数据收集
kibana:提供分析与可视化。可以再es索引中查找,交互数据,生成各种维度的表格,图形
3.为什么使用
elasticsearch:
数据量庞大
搜索要求快,准,多维度的使用
logstash:
数据源丰富,数据库,日志,分散的数据都可以收集
kibana:
分析展示,展示数据的价值
二:单机安装es
1.上传
规划目录:
上传:
2.解压
tar -zxvf elasticsearch-7.6.1-linux-x86_64.tar.gz -C ../software/
3.配置jdk
删除不必要的jdk:
配置:
export JAVA_HOME=/opt/software/elasticsearch-7.6.1/jdk
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
在es的目录下,有jdk,版本是13的
4.添加用户
groupadd tedu
useradd tedu -g tedu
chown -R tedu:tedu elasticsearch-7.6.1/
5.启动
su tedu
./bin/elasticsearch
6.测试
curl "localhost:9200"
三:集群安装es
1.三台安装
2.修改配置文件config/
es01节点(后面只是name不同)
cluster.name: es-cluster
node.name: es01
path.data: /opt/software/elasticsearch-7.6.1/data
path.logs: /opt/software/elasticsearch-7.6.1/logs
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
network.host: 0.0.0.0
http.port: 9200
transport.tcp.port: 9300
discovery.seed_hosts: ["192.168.19.132", "192.168.19.133", "192.168.19.134"]
cluster.initial_master_nodes: ["192.168.19.132"]
http.cors.enabled: true
http.cors.allow-origin: "*"
3.出现的问题进行处理
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
【1】的处理:
vi /etc/security/limits.conf
添加:
* soft nofile 65536
* hard nofile 65536
切换成root,使用下面命令查询效果
ulimit -Hn
ulimit -Sn
【2】的处理
vi /etc/sysctl.conf
添加:
vm.max_map_count=262144
然后重启:
reboot
4.测试
四:head插件
1.安装nodejs
head插件的运行环境是node
wget https://nodejs.org/dist/v15.0.0/node-v15.0.0-linux-x64.tar.gz
2.解压
tar -zxvf node-v15.0.0-linux-x64.tar.gz -C ../software/
3.配置环境变量
export NODE_HOME=/opt/software/node-v15.0.0-linux-x64
export PATH=$NODE_HOME/bin:$PATH
验证:
4.下载head插件
git:https://github.com/mobz/elasticsearch-head
yum -y install git
git clone https://github.com/mobz/elasticsearch-head.git
git一直没有拉下来,下载了zip
解压:
unzip elasticsearch-head-master.zip -d ../software/
5.npm安装
在head插件文件夹的根目录下执行npm安装
npm install -g grunt
yum install bzip2
npm insall
6.配置head文件
GruntFile.js中97行配置,添加hostname
注意添加引号
7.head根目录下启动
grunt server
8.访问
192.168.19.132:9100
启动两个es节点,观察
、
五:kibana
1.解压
版本需要对应
tar -zxvf kibana-7.6.1-linux-x86_64.tar.gz -C ../software/
2.配置
config/kibana.yml
server.host: "192.168.19.132"
elasticsearch.hosts: ["http://192.168.19.132:9200"]
3.启动
bin/kibana --allow-root
4.访问
http://192.168.19.132:5601
六:重要概念
1.分片
单台存储是有限的,es可以将一个index的数据分为多个分片
2.rest方式
curl格式:curl -H 请求头 -d 请求体 -X POST 接口地址
例如:
新增一个索引文件,并且以漂亮格式展示响应
curl -X PUT http://localhost:9200/person/_doc/1?pretty -H "Content-type:application/json" -d '{"name":"laoshi"}'
效果:
head上:
七:索引管理
1.索引的创建
put请求,表示新增
# 新增索引
PUT /index01
2.插入文档
put请求
# 索引中添加文档数据
PUT /index01/_doc/1
{
"name":"tom"
}
3.查询文档
# 查询文档
GET /index01/_doc/1
4.更新文档
使用put再做一次相同的改变
# 更新
PUT /index01/_doc/1
{
"name":"tom2"
}
只有version有变化
5.删除文档
# 删除文档
DELETE /index01/_doc/1
6.删除索引
#删除索引
DELETE /index01
7.批量索引
减少网络往返
# 批量索引
PUT /index05/_bulk
{"index":{"_id":"1"}}
{"id":"1", "name":"雅典娜", "job":"html", "age":"38", "salary":20000, "gender":"female", "like":"牛奶"}
{"index":{"_id":"2"}}
{"id":"2", "name":"马云云", "job":"html", "age":"22", "salary":35000, "gender":"male", "like":"香蕉"}
{"index":{"_id":"3"}}
{"id":"1", "name":"强东", "job":"go", "age":"24", "salary":10000, "gender":"male", "like":"苹果"}
{"index":{"_id":"4"}}
{"id":"1", "name":"小马", "job":"python", "age":"18", "salary":50000, "gender":"male", "like":"李子"}
八:搜索功能
1.match_all
query查询类型
# 查询功能
GET /index05/_search
{
"query": {
"match_all": {}
}
}
2.term(词项)
词项,分词计算的基本单位
中文会被拆分,例如李老师,则是李,老,师
term查询,返回的文档包含了提供的确切词项的文档,如果没有包含,则不展示
#term查询
GET /index05/_search
{
"query": {
"term": {
"name": {
"value": "马"
}
}
}
}
效果:【符合预期】
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.7549127,
"hits" : [
{
"_index" : "index05",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.7549127,
"_source" : {
"id" : "1",
"name" : "小马",
"job" : "python",
"age" : "18",
"salary" : 50000,
"gender" : "male",
"like" : "李子"
}
},
{
"_index" : "index05",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.6407243,
"_source" : {
"id" : "2",
"name" : "马云云",
"job" : "html",
"age" : "22",
"salary" : 35000,
"gender" : "male",
"like" : "香蕉"
}
}
]
}
}
在看一个,term是马云
#term查询
GET /index05/_search
{
"query": {
"term": {
"name": {
"value": "马云"
}
}
}
}
效果:为空
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
3.boost加权
是的查询结果的文档评分最终会乘以boost的结果进行返回
在value下添加
主要是组合查询了,不同的条件添加不同的权重
#boost加权
GET /index05/_search
{
"query": {
"term": {
"name": {
"value": "马",
"boost": 2
}
}
}
}
效果:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.5098253,
"hits" : [
{
"_index" : "index05",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.5098253,
"_source" : {
"id" : "1",
"name" : "小马",
"job" : "python",
"age" : "18",
"salary" : 50000,
"gender" : "male",
"like" : "李子"
}
},
{
"_index" : "index05",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.2814486,
"_source" : {
"id" : "2",
"name" : "马云云",
"job" : "html",
"age" : "22",
"salary" : 35000,
"gender" : "male",
"like" : "香蕉"
}
}
]
}
}
4.range
返回一个范围内包含的文档
#range
GET /index05/_search
{
"query": {
"range": {
"salary": {
"gte": 10000,
"lte": 30000
}
}
}
}
5.exist
包含这个字段,则返回
文档的字段不存在的原因:
写入的索引字段值在json中是null或者[]
字段设置了“index”:false的映射导致不会写到索引中
字段设置了ignore_above,当超过长度不会写入索引
# exist
GET /index05/_search
{
"query": {
"exists": {
"field": "name"
}
}
}
6.match
先分词计算
这里可以发现与term不一样。通过分词之后的数据进行查询
默认是或的关系
#match
GET /index05/_search
{
"query": {
"match": {
"name": "马云"
}
}
}
效果:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 2.2080264,
"hits" : [
{
"_index" : "index05",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.2080264,
"_source" : {
"id" : "2",
"name" : "马云云",
"job" : "html",
"age" : "22",
"salary" : 35000,
"gender" : "male",
"like" : "香蕉"
}
},
{
"_index" : "index05",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.7549127,
"_source" : {
"id" : "1",
"name" : "小马",
"job" : "python",
"age" : "18",
"salary" : 50000,
"gender" : "male",
"like" : "李子"
}
}
]
}
}
调整逻辑关系
相关性更加明确
GET /index05/_search
{
"query": {
"match": {
"name":{
"query": "马云",
"operator": "and"
}
}
}
}
效果:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 2.2080264,
"hits" : [
{
"_index" : "index05",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.2080264,
"_source" : {
"id" : "2",
"name" : "马云云",
"job" : "html",
"age" : "22",
"salary" : 35000,
"gender" : "male",
"like" : "香蕉"
}
}
]
}
}
7.bool
定义多个子查询
GET /index05/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "马云"
}
},
{
"term": {
"job": {
"value": "python"
}
}
}
]
}
}
}
must与must_not结合使用的逻辑关系:
should:查询结果可能是也可能不是这个条件的子集,should和must同时使用,should的唯一作用就是影响最终的相关性的评分计算。
filter:查询结果必须是该条件的子集,但是满足filter子条件的结果要忽略评分,也就是其他的子条件的查询评分不会为filter的存在而变化
九:索引的映射设置
1.mapping
决定如何存储,如何生成存储,定义字段类型
存在静态映射与动态映射
2.动态映射
## 动态索引 PUT /index01/_doc/1 { "name":"小马", "job":"python", "age":"18", "salary":50000, "gender":"male", "like":"李子", "address":"北京市大兴区", "sorted":false, "emplotedTime":"2020-01-12", "location": { "lat":"41.43", "lon":"67.98" }, "ip":"192.168.5.102" } GET /index01/_mapping
效果:
{ "index01" : { "mappings" : { "properties" : { "address" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }, "age" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }, "emplotedTime" : { "type" : "date" }, "gender" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }, "ip" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }, "job" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }, "like" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }, "location" : { "properties" : { "lat" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }, "lon" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } } } }, "name" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }, "salary" : { "type" : "long" }, "sorted" : { "type" : "boolean" } } } } }
3.结构
字符串类型:
fields是一个可选的属性,它表示给当前字段的扩展属性,扩展了一个keyword。具备了text'的特点,也具备了keyword的特点
上面的可以查询到,因为分词;下面的反而搜不到,因为存的是一个整的词。
GET /index01/_search { "query": { "term": { "address": { "value": "北" } } } } GET /index01/_search { "query": { "term": { "address.keyword": { "value": "北" } } } }
4.整数类型
默认long
5.浮点类型
默认double
6.日期
默认对应是date,是因为几种类型被识别
7.对象
8.添加静态映射
#添加mapping PUT /index02 { "mappings": { "properties": { "email":{ "type": "keyword" } } } } GET /index02/_mapping PUT /index02/_doc/1 { "email":"1354488@qq.com", "name":"tom" } GET /index02/_doc/1 GET /index02/_mapping
效果:
不存在的则按照动态mapping生成。
{ "index02" : { "mappings" : { "properties" : { "email" : { "type" : "keyword" }, "name" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } } } } } }
9.索引之后,添加静态映射
# 后添加映射 PUT /index04 PUT /index04/_mapping { "properties":{ "name":{ "type":"text" } } } GET /index04/_mapping
十:分词器与热词设置
1.分词
主要有Tokenization与Normalization
Tokenization:将文本分成一小块一小块,称之为token
Mormalization:词条允许在单个术语上进行匹配,q允许精确匹配,还可以使用相关性查询
2.分词器
自带的分词器,standard analyzer
# 分词器测试 POST /_analyze { "text": ["王者荣耀"], "analyzer": "standard" }
效果:
{ "tokens" : [ { "token" : "王", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 }, { "token" : "者", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1 }, { "token" : "荣", "start_offset" : 2, "end_offset" : 3, "type" : "<IDEOGRAPHIC>", "position" : 2 }, { "token" : "耀", "start_offset" : 3, "end_offset" : 4, "type" : "<IDEOGRAPHIC>", "position" : 3 } ] }
3.ik分词器
上传
解压到es的plugins/ik
unzip elasticsearch-analysis-ik-7.6.1.zip -d ../software/elasticsearch-7.6.1/plugins/ik
启动es
没有上传ik的不用重启
是否加载
[es01] try load config from /opt/software/elasticsearch-7.6.1/plugins/ik/config/IKAnalyzer.cfg.xml
检验:
#ik POST /_analyze { "text": ["疑是银河落九天"], "analyzer": "ik_max_word" }
效果:
{ "tokens" : [ { "token" : "疑是银河落九天", "start_offset" : 0, "end_offset" : 7, "type" : "CN_WORD", "position" : 0 }, { "token" : "疑是", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 1 }, { "token" : "银河", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 2 }, { "token" : "落九天", "start_offset" : 4, "end_offset" : 7, "type" : "CN_WORD", "position" : 3 }, { "token" : "九天", "start_offset" : 5, "end_offset" : 7, "type" : "CN_WORD", "position" : 4 }, { "token" : "九", "start_offset" : 5, "end_offset" : 6, "type" : "TYPE_CNUM", "position" : 5 }, { "token" : "天", "start_offset" : 6, "end_offset" : 7, "type" : "COUNT", "position" : 6 } ] }
4.分词进步
#ik POST /_analyze { "text": ["王者荣耀"], "analyzer": "ik_max_word" }
效果:不认识王者荣耀四个字
{ "tokens" : [ { "token" : "王者", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0 }, { "token" : "荣耀", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 1 } ] }
5.本地ik词典的配置
在我们解压的ik分词器文件夹中/plugins/ik/config有一个xml配置文件可以指定词典使用。
非热加载的方式处理:
/opt/software/elasticsearch-7.6.1/plugins/ik/config vi my_main.dic
添加:
测试效果:
{ "tokens" : [ { "token" : "王者荣耀", "start_offset" : 0, "end_offset" : 4, "type" : "CN_WORD", "position" : 0 }, { "token" : "王者", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 1 }, { "token" : "荣耀", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 2 } ] }
6.本地ik词典的配置
下载安装tomcat
tar -zxvf apache-tomcat-9.0.64.tar.gz -C ../software/
然后进入root目录
/opt/software/apache-tomcat-9.0.64/webapps/ROOT
vi hot.dic
启动tomcat
# bin/startup.sh
访问:
在ik远程字典中进行配置,然后启动es
校验效果:
{ "tokens" : [ { "token" : "王者荣耀", "start_offset" : 0, "end_offset" : 4, "type" : "CN_WORD", "position" : 0 }, { "token" : "王者荣", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 1 }, { "token" : "王者", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 2 }, { "token" : "荣耀", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 3 } ] }
十一:java调用
1.pom
<dependency> <groupId>org.elasticsearch.client</groupId> <artifactId>elasticsearch-rest-high-level-client</artifactId> <version>7.6.2</version> </dependency> <dependency> <groupId>org.elasticsearch.client</groupId> <artifactId>elasticsearch-rest-client</artifactId> <version>7.6.2</version> </dependency> <dependency> <groupId>org.elasticsearch</groupId> <artifactId>elasticsearch</artifactId> <version>7.6.2</version> </dependency>
2.新增索引
/** * 新建索引 */ @Override public CreateIndexResponse createIndex(String index) throws IOException { CreateIndexRequest createRequest = new CreateIndexRequest(index); createRequest.settings(Settings.builder() .put("number_of_shards", "3") .put("number_of_replicas", "2")); CreateIndexResponse createIndexResponse = highLevelClient.indices().create(createRequest, RequestOptions.DEFAULT); return createIndexResponse; }
3.删除索引
/** * 删除索引 */ @Override public AcknowledgedResponse deleteIndex(String index) throws IOException { DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest(index); return highLevelClient.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT); }
4.新增文档
/** * 增加文档 */ @Override public boolean add(EsDto esDto, String index, String id){ // 执行 IndexRequest indexRequest = new IndexRequest(index).id(id).source(esDto.getJsonStr(), XContentType.JSON); try { IndexResponse response = highLevelClient.index(indexRequest, RequestOptions.DEFAULT); log.info("增加返回结果:{}", JSONObject.toJSON(response)); } catch (IOException e) { e.printStackTrace(); } return true; }
5.查询文档
/** * 查询文档 */ @Override public Map get(String index, String id){ GetRequest getRequest = new GetRequest(index, id); try { GetResponse response = highLevelClient.get(getRequest, RequestOptions.DEFAULT); return response.getSource(); } catch (IOException e) { e.printStackTrace(); } return Maps.newHashMap(); }
6.判断是否存在
/** * 是否存在文档 */ @Override public Boolean exist(String index, String id){ GetRequest getRequest = new GetRequest(index, id); try { boolean exists = highLevelClient.exists(getRequest, RequestOptions.DEFAULT); return exists; } catch (IOException e) { e.printStackTrace(); } return false; }
7.删除文档
/** * 删除文档 */ @Override public boolean delete(String index, String id){ DeleteRequest deleteRequest = new DeleteRequest(index, id); try { DeleteResponse response = highLevelClient.delete(deleteRequest, RequestOptions.DEFAULT); log.info("删除文档返回结果:{}", JSONObject.toJSON(response)); return true; } catch (IOException e) { e.printStackTrace(); } return false; }
8.更新文档
index必须存在
/** * 更新文档 */ @Override public boolean update(EsDto esDto, String index, String id){ UpdateRequest updateRequest = new UpdateRequest(index, id).doc(esDto.getJsonStr()); try { UpdateResponse response = highLevelClient.update(updateRequest, RequestOptions.DEFAULT); log.info("更新文档返回结果:{}", JSONObject.toJSON(response)); return true; } catch (IOException e) { e.printStackTrace(); } return false; }
9.批量新增
索引可以不存在
public void bulk(String index) throws IOException { BulkRequest request = new BulkRequest(); request.add(new IndexRequest(index).id("2") .source(XContentType.JSON, "age", "18", "address", "长江中下游")); request.add(new IndexRequest(index).id("3") .source(XContentType.JSON, "age", "20", "address", "长江下游")); highLevelClient.bulk(request, RequestOptions.DEFAULT); }
10.bool查询
/** * bool查询 */ @Override public List<String> searchBool(String index, String key, String value) { List<String> result = Lists.newArrayList(); BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery(); // query1 TermQueryBuilder query1 = QueryBuilders.termQuery(key, value); // query2 ExistsQueryBuilder query2 = QueryBuilders.existsQuery(key); // 组合 boolQueryBuilder.must(query1); boolQueryBuilder.must(query2); SearchResponse searchResponse = commonQuery(index, boolQueryBuilder, 0, 10); if(Objects.nonNull(searchResponse)){ // 解析 SearchHit[] hits = searchResponse.getHits().getHits(); List resultList = Lists.newArrayList(); for (SearchHit hit : hits){ String sourceAsString = hit.getSourceAsString(); result.add(sourceAsString); } return result; } return result; } /** * 公共的查询 */ private SearchResponse commonQuery(String index, QueryBuilder queryBuilder, int from, int size){ SearchRequest searchRequest = new SearchRequest(index); SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); searchSourceBuilder.query(queryBuilder); searchSourceBuilder.from(from); searchSourceBuilder.size(size); searchRequest.source(searchSourceBuilder); try { return highLevelClient.search(searchRequest, RequestOptions.DEFAULT); } catch (IOException e) { e.printStackTrace(); } return null; }
11.match查询
/** * match查询 */ @Override public SearchResponse searchMatch(String index, String key, String value){ // 组装 MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery(key, value); // 公共查询 return commonQuery(index, matchQueryBuilder, 0, 10); }
十二:聚合
1.stat
GET /index01/_search { "aggs": { "NAME": { "stats": { "field": "salary" } } } }
效果:
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : 1.0, "hits" : [ { "_index" : "index01", "_type" : "_doc", "_id" : "1", "_score" : 1.0, "_source" : { "name" : "小马", "job" : "python", "age" : "18", "salary" : 50000, "gender" : "male", "like" : "李子", "address" : "北京市大兴区", "sorted" : false, "emplotedTime" : "2020-01-12", "location" : { "lat" : "41.43", "lon" : "67.98" }, "ip" : "192.168.5.102" } } ] }, "aggregations" : { "NAME" : { "count" : 1, "min" : 50000.0, "max" : 50000.0, "avg" : 50000.0, "sum" : 50000.0 } } }
2.job中都会html的个数
桶的概念
GET /index05/_search { "aggs": { "NAME": { "terms": { "field": "job.keyword", "size": 10 } } } }
效果:
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 4, "relation" : "eq" }, "max_score" : 1.0, "hits" : [ { "_index" : "index05", "_type" : "_doc", "_id" : "1", "_score" : 1.0, "_source" : { "id" : "1", "name" : "雅典娜", "job" : "html", "age" : "38", "salary" : 20000, "gender" : "female", "like" : "牛奶" } }, { "_index" : "index05", "_type" : "_doc", "_id" : "2", "_score" : 1.0, "_source" : { "id" : "2", "name" : "马云云", "job" : "html", "age" : "22", "salary" : 35000, "gender" : "male", "like" : "香蕉" } }, { "_index" : "index05", "_type" : "_doc", "_id" : "3", "_score" : 1.0, "_source" : { "id" : "1", "name" : "强东", "job" : "go", "age" : "24", "salary" : 10000, "gender" : "male", "like" : "苹果" } }, { "_index" : "index05", "_type" : "_doc", "_id" : "4", "_score" : 1.0, "_source" : { "id" : "1", "name" : "小马", "job" : "python", "age" : "18", "salary" : 50000, "gender" : "male", "like" : "李子" } } ] }, "aggregations" : { "NAME" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "html", "doc_count" : 2 }, { "key" : "go", "doc_count" : 1 }, { "key" : "python", "doc_count" : 1 } ] } } }
3.子聚合
不同job下对like的喜欢不同做统计
GET /index05/_search { "size": 0, "aggs": { "job-info": { "terms": { "field": "job.keyword" }, "aggs": { "jon-like-info": { "terms": { "field": "like.keyword", "size": 10 } } } } } }
效果:
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 4, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "job-info" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "html", "doc_count" : 2, "jon-like-info" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "牛奶", "doc_count" : 1 }, { "key" : "香蕉", "doc_count" : 1 } ] } }, { "key" : "go", "doc_count" : 1, "jon-like-info" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "苹果", "doc_count" : 1 } ] } }, { "key" : "python", "doc_count" : 1, "jon-like-info" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "李子", "doc_count" : 1 } ] } } ] } } }
4.不同工种的薪资水平
GET /index05/_search { "size": 0, "aggs": { "job-info": { "terms": { "field": "job.keyword" }, "aggs": { "jon-salary": { "stats": { "field": "salary" } } } } } }
5.先查询,后sum
GET /index05/_search
{
"query": {
"match": {
"name": "马"
}
},
"aggs": {
"salary_sum": {
"sum": {
"field": "salary"
}
}
}
}
6.多层聚合
分桶的时候,才能一层层进行聚合
# 三層聚合
GET /index05/_search
{
"size": 0,
"aggs": {
"job-info": {
"terms": {
"field": "job.keyword"
},
"aggs": {
"jon-salary": {
"terms": {
"field": "gender.keyword"
},
"aggs": {
"NAME": {
"stats": {
"field": "salary"
}
}
}
}
}
}
}
}
7.top-hits
# top-hits
GET /index05/_search
{
"size": 0,
"aggs": {
"NAME": {
"top_hits": {
"size": 2,
"sort": [
{
"salary": {
"order": "desc"
}
}
],
"_source": {
"includes": ["id", "name"]
}
}
}
}
}
8.range
#rangge
GET /index05/_search
{
"size": 0,
"aggs": {
"range_info": {
"range": {
"field": "salary",
"ranges": [
{
"key": "D",
"from": 5000,
"to": 10000
},
{
"key": "C",
"from": 10000,
"to": 20000
}
]
}
}
}
}
9.直方图
# 直方图 GET /index05/_search { "size": 0, "aggs": { "NAME": { "histogram": { "field": "salary", "interval": 5000 } } } }
进一步做正真的直方图:
# 批量索引 PUT /index06/_bulk {"index":{"_id":"1"}} {"id":"1", "name":"雅典娜", "job":"html", "age":38, "salary":20000, "gender":"female", "like":"牛奶"} {"index":{"_id":"2"}} {"id":"2", "name":"马云云", "job":"html", "age":22, "salary":35000, "gender":"male", "like":"香蕉"} {"index":{"_id":"3"}} {"id":"1", "name":"强东", "job":"go", "age":24, "salary":10000, "gender":"male", "like":"苹果"} {"index":{"_id":"4"}} {"id":"1", "name":"小马", "job":"python", "age":18, "salary":50000, "gender":"male", "like":"李子"} {"index":{"_id":"5"}} {"id":"1", "name":"小军", "job":"java", "age":18, "salary":50000, "gender":"male", "like":"雪花"}
# 直方图 GET /index06/_search { "size": 0, "aggs": { "age-info": { "histogram": { "field": "age", "interval": 5 }, "aggs": { "salary-info": { "avg": { "field": "salary" } } } } } }
效果:
{ "took" : 2, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 5, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "age-info" : { "buckets" : [ { "key" : 15.0, "doc_count" : 2, "salary-info" : { "value" : 50000.0 } }, { "key" : 20.0, "doc_count" : 2, "salary-info" : { "value" : 22500.0 } }, { "key" : 25.0, "doc_count" : 0, "salary-info" : { "value" : null } }, { "key" : 30.0, "doc_count" : 0, "salary-info" : { "value" : null } }, { "key" : 35.0, "doc_count" : 1, "salary-info" : { "value" : 20000.0 } } ] } } }
10.min_bucket
找到最小的那个
#min_bucket GET /index05/_search { "size": 0, "aggs": { "job-info": { "terms": { "field": "job.keyword" }, "aggs": { "jon-salary": { "avg": { "field": "salary" } } } }, "min_bulk_info":{ "min_bucket": { "buckets_path": "job-info>jon-salary" } } } }
11.聚合总结
bucket
、
metric:
可以使用avg的单值聚合,也可以是stats多值聚合
pipeline:
十三:Logstash
1.解压
unzip logstash-7.6.1.zip -d ../software/
2.修改配置
vi jvm.options
注释:
3.测试
bin/logstash -e 'input {stdin{}} output {stdout{}}'
4.入门案例
将-e后面的内容写入到配置文件中
bin/logstash -f conf/demo1
案例二
bin/logstash -f conf/demo2
input{ exec{ command => 'ls' interval => 30 } } output{ stdout{} }
5.原理
6.input插件-exec
在入门案例中已经说明
7.input插件-file
监控文件中的新事件,相当于tail
input{ file{ path => '/opt/software/logstash-7.6.1/conf/tomcat.log' } } output{ stdout{} }
8.input插件-jdbc
有两种方式,后续有时间,再进行验证效果
9.output插件-stdout
上面已经说过,可以加编码格式
10.output组件-es
input{ stdin{} } output{ elasticsearch{ action => "index" hosts => ["192.168.19.132:9200"] index => "index111" } }
效果:
#logstash GET /index111/_search { "query": { "match_all": {} } }
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : 1.0, "hits" : [ { "_index" : "index111", "_type" : "_doc", "_id" : "LBJFg4IB0P-Q-DvAjHjt", "_score" : 1.0, "_source" : { "@timestamp" : "2022-08-09T15:42:29.192Z", "host" : "com.jun", "message" : "halou", "@version" : "1" } } ] } }
11.filter组件-grok插件
12.filter组件-grok插件-oniguruma语法
十四:kibana
1.可视化
含义: