Getting Started with Elasticsearch
1. What is Elasticsearch?
Elasticsearch is a Lucene-based, RESTful, distributed, real-time full-text search engine. Every field is indexed and searchable, and it can store, search, and analyze very large volumes of data quickly.
Full-text retrieval means building an index for every word, recording how many times and at which positions that word appears in each document. At query time, the search runs against this pre-built index and the matches are returned to the user, much like looking up a character in a dictionary via its index table.
2. Typical use cases for ES
(1) Search: fast retrieval of documents, products, news, and similar content.
(2) Log analysis: analyzing log data to understand how the business and its systems are performing.
(3) Real-time monitoring: watching system performance and data changes in real time to keep services healthy.
(4) Data analysis: supporting data scientists and analysts in extracting valuable information from data.
(5) Business intelligence: supporting data-driven decision making.
(6) Security: helping detect and prevent unauthorized access to or theft of data.
(7) Application development: powering search-based applications to improve the user experience.
3. ES compared with a relational database
Mapping: comparable to the schema/table design in MySQL (field types, lengths, and so on). In ES the mapping defines, for each field, whether it is analyzed (tokenized), whether it can be queried, and similar properties (see the sketch after this list).
Shards: comparable to horizontal table partitioning in MySQL; they expand capacity and increase throughput.
Replicas: copies of a shard's data that protect against data loss.
Allocation: the process of assigning shards (both primaries and replicas) to nodes, carried out by the master node.
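A minimal sketch of how these pieces are declared at index-creation time, using the 7.x Java high-level REST client that section 13 relies on. The index name person, the field types, and the local node address are assumptions for illustration, not the article's required setup:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentType;

public class CreateIndexDemo {
    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));

        // Index-level settings are the "partitioning" side of the comparison:
        // 3 primary shards (horizontal split) and 1 replica per primary.
        CreateIndexRequest request = new CreateIndexRequest("person");
        request.settings(Settings.builder()
                .put("index.number_of_shards", 3)
                .put("index.number_of_replicas", 1));

        // The mapping is the "schema" side: "name" is analyzed full text,
        // "age" is a numeric field that is not analyzed.
        request.mapping(
                "{ \"properties\": { " +
                        "\"name\": { \"type\": \"text\" }, " +
                        "\"age\":  { \"type\": \"integer\" } } }",
                XContentType.JSON);

        CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
        System.out.println("acknowledged: " + response.isAcknowledged());
        client.close();
    }
}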
4. Inverted index
An inverted index is structured the opposite way from a traditional (forward) index: it maps each term to the documents that contain it. Its advantage is that it can locate every document containing a keyword very quickly, and it supports complex search operations such as phrase search and wildcard search, which is why it is widely used in search engines, log analysis, recommendation systems, and similar applications.
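A toy Java sketch of the idea (not Lucene's actual data structures): each term maps to the documents that contain it and to the positions inside those documents, which is what makes keyword and phrase lookups fast:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy inverted index: for each term, record which documents contain it
// and at which word positions (positions are what enable phrase queries).
public class InvertedIndexDemo {
    public static void main(String[] args) {
        String[] docs = {
                "elasticsearch is a search engine",
                "a distributed search and analytics engine"
        };

        // term -> (docId -> positions of the term in that doc)
        Map<String, Map<Integer, List<Integer>>> index = new HashMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            String[] terms = docs[docId].split("\\s+");
            for (int pos = 0; pos < terms.length; pos++) {
                index.computeIfAbsent(terms[pos], t -> new HashMap<>())
                     .computeIfAbsent(docId, d -> new ArrayList<>())
                     .add(pos);
            }
        }

        // Looking up a term is a single map access, regardless of corpus size.
        System.out.println("postings for 'search': " + index.get("search"));
        // prints: {0=[3], 1=[2]} -> doc 0 at position 3, doc 1 at position 2
    }
}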
5. Can you explain how ES writes (indexes) a document?
An Elasticsearch cluster stores an index as primary shards plus replica shards.
A write always goes to the primary shard first, and the primary then replicates the change to its replica shards. The primary is not fixed: after a network problem, for example, the copy on Node1 that used to be primary may lose that role, and the copy on Node2 may be elected primary instead.
How does the client know which node holds the primary shard? See the steps below (a minimal client-side sketch follows them).
(1) The client sends a write request to some node, NodeX.
(2) Based on the document's routing information, NodeX forwards the request to the node that holds the primary shard.
(3) The primary shard applies the write, tells the replica shards to replicate it, and reports success back to NodeX.
(4) NodeX returns the result to the client.
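A minimal sketch of such a write with the 7.x high-level REST client; the node addresses node1/node2 and the index person are assumptions. Whichever node the client happens to reach plays the role of NodeX (the coordinating node) and forwards the request to the primary shard:

import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class WriteDemo {
    public static void main(String[] args) throws Exception {
        // The client may point at any node; that node becomes the coordinating node
        // and forwards the write to the node that currently holds the primary shard.
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("node1", 9200, "http"),
                        new HttpHost("node2", 9200, "http")));

        IndexRequest request = new IndexRequest("person")
                .id("1")
                .source("{\"name\":\"Smith\",\"age\":18}", XContentType.JSON);

        IndexResponse response = client.index(request, RequestOptions.DEFAULT);
        // The response reports how many shard copies (primary + replicas) acknowledged the write.
        System.out.println(response.getShardInfo().getSuccessful() + " shard copies acknowledged");
        client.close();
    }
}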
6. What is the flow of a search in ES?
A search is executed as a two-phase process called Query Then Fetch:
1. Query phase:
The client sends the request to a coordinating node, which broadcasts it to a primary or replica copy of every shard. Each shard runs the search locally and builds a priority queue of matching documents of size from + size. Each shard then returns the doc IDs and scores in its priority queue to the coordinating node, which merges, sorts, and paginates them to produce the final result set.
2. Fetch phase:
Based on the Query-phase result, the coordinating node fetches the actual document content for those doc IDs from the shards that hold them, and finally returns the assembled result to the client.
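A minimal paging sketch that shows where from + size enters the picture. The index name person and the field like follow the example data in section 13; everything else is an assumption for illustration:

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class PagingDemo {
    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));

        // Page 3 with 10 hits per page: from = 20, size = 10. In the Query phase
        // every shard must keep a priority queue of from + size = 30 candidates,
        // which is why very deep paging becomes expensive.
        SearchRequest request = new SearchRequest("person");
        request.source(new SearchSourceBuilder()
                .query(QueryBuilders.matchQuery("like", "hiking"))
                .from(20)
                .size(10));

        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // By this point the Fetch phase has already pulled the _source of the
        // winning doc IDs from the shards that own them.
        for (SearchHit hit : response.getHits().getHits()) {
            System.out.println(hit.getSourceAsString());
        }
        client.close();
    }
}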
7. How does ES keep reads and writes consistent?
(1) Optimistic concurrency control via version numbers can be used to make sure a newer version is never overwritten by an older one; the application layer handles the actual conflict resolution (see the sketch after this list);
(2) For writes, earlier versions of ES supported a consistency level of quorum/one/all, with quorum as the default, i.e. a write was only allowed when a majority of the shard copies were available. Even with a majority available, a write to a replica can still fail (because of the network, for example); that replica is then considered faulty and the shard is rebuilt on a different node.
(3) For reads, replication could be set to sync (the default), so the operation only returns after both the primary and the replica shards have completed it; with replication set to async, you can still read the latest version by setting the search request parameter _preference to primary so that the query goes to the primary shard.
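A minimal sketch of optimistic concurrency control with the 7.x high-level client. In 7.x the if_seq_no / if_primary_term pair plays the role that the plain version number played in older releases; the index name person and the document fields are assumptions:

import org.apache.http.HttpHost;
import org.elasticsearch.ElasticsearchStatusException;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class OptimisticLockDemo {
    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));

        // Read the document and remember which "version" we saw.
        GetResponse current = client.get(new GetRequest("person", "1"), RequestOptions.DEFAULT);

        // Write back only if nobody has changed the document in the meantime.
        IndexRequest update = new IndexRequest("person")
                .id("1")
                .source("{\"name\":\"Smith\",\"age\":19}", XContentType.JSON)
                .setIfSeqNo(current.getSeqNo())
                .setIfPrimaryTerm(current.getPrimaryTerm());
        try {
            client.index(update, RequestOptions.DEFAULT);
        } catch (ElasticsearchStatusException e) {
            // Conflict (HTTP 409): someone wrote a newer version first; the application
            // decides whether to re-read and retry or give up.
            System.out.println("write conflict: " + e.status());
        }
        client.close();
    }
}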
8. How documents are routed to shards
Every document written to Elasticsearch is stored in one specific primary shard, determined by the following formula:
shard = hash(routing) % number_of_primary_shards
The routing value defaults to the document's _id (a custom routing value can be supplied instead). This is also why the number of primary shards cannot be changed after an index is created: the formula would start resolving existing documents to different shards.
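A toy Java illustration of that rule; note that real Elasticsearch hashes the routing value with Murmur3, so hashCode() here is only a stand-in to keep the sketch self-contained:

// Toy version of the routing rule: the same _id always lands on the same shard,
// and the result changes if number_of_primary_shards changes.
public class RoutingDemo {
    static int shardFor(String routing, int numberOfPrimaryShards) {
        // ES actually uses a Murmur3 hash of the routing value; hashCode() is
        // used here only for the sake of a self-contained example.
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        System.out.println(shardFor("1", 5)); // document with _id "1" on a 5-shard index
        System.out.println(shardFor("1", 5)); // same id -> same shard, every time
        System.out.println(shardFor("1", 6)); // changing the shard count breaks the mapping
    }
}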
9. How does Elasticsearch update and delete documents?
Delete and update are both write operations, but documents in Elasticsearch are immutable, so they cannot be changed or removed in place. Instead, ES uses a .del file to record which documents have been deleted; every segment on disk has a corresponding .del file.
(1) For a delete, the document is not actually removed; it is only marked as deleted in the .del file. It still matches queries, but it is filtered out of the results.
(2) For an update, the old document is marked as deleted and a new document is created.
Every refresh of the memory buffer produces a new segment file, so with the default settings a segment is created every 1 s and segment files keep accumulating. A merge is therefore run periodically: each merge combines several segment files into one, physically drops the docs marked as deleted (they are not written into the new segment), writes the new segment to disk, writes a commit point listing all of the new segment files, opens the new segments for search, and finally deletes the old segment files.
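The 1 s refresh interval mentioned above is a per-index setting. A minimal sketch of changing it with the high-level client (the index name person is an assumption), trading search freshness for fewer, larger segments:

import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;

public class RefreshIntervalDemo {
    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));

        // Refresh every 30 s instead of the default 1 s: fewer, larger segments are
        // produced, at the cost of new documents taking up to 30 s to become searchable.
        UpdateSettingsRequest request = new UpdateSettingsRequest("person");
        request.settings(Settings.builder().put("index.refresh_interval", "30s"));

        AcknowledgedResponse response = client.indices().putSettings(request, RequestOptions.DEFAULT);
        System.out.println("acknowledged: " + response.isAcknowledged());
        client.close();
    }
}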
10. How is ES distributed?
Elasticsearch splits the stored data into shards that are distributed across different nodes, and each shard can have several replicas, which keeps the cluster highly available. The nodes in an ES cluster are peers; they elect a Master, and the Master maintains the cluster state and propagates it to the other nodes.
Does this make ES slow? No: only operations such as creating an index (or, in older versions, a type) go through the Master, while data writes follow a simple routing rule and can be sent to any node in the cluster, so the write load is spread across the whole cluster.
11. What is the difference between query and filter?
(1) query: a query clause not only matches documents but also computes a relevance score, which determines ranking;
(2) filter: a filter clause only decides whether a document satisfies the condition; it computes no score and does not affect ordering, and filter results can be cached, which improves performance (see the sketch below).
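A minimal sketch of the difference inside a single bool query (the index name person and the fields follow the example data in section 13): the must clause runs in query context and contributes to _score, while the filter clause runs in filter context, is not scored, and can be cached:

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class QueryVsFilterDemo {
    public static void main(String[] args) throws Exception {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));

        SearchRequest request = new SearchRequest("person");
        request.source(new SearchSourceBuilder().query(
                QueryBuilders.boolQuery()
                        // query context: contributes to _score (how well does "like" match "hiking"?)
                        .must(QueryBuilders.matchQuery("like", "hiking"))
                        // filter context: pure yes/no, no scoring, results can be cached
                        .filter(QueryBuilders.rangeQuery("age").gte(18).lte(30))));

        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        for (SearchHit hit : response.getHits().getHits()) {
            // The filter clause did not change this score; only the must clause did.
            System.out.println(hit.getScore() + " : " + hit.getSourceAsString());
        }
        client.close();
    }
}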
12. ELK architecture
There are two common ELK architectures: the classic ELK stack, and a variant that adds a message queue (Redis, Kafka, or RabbitMQ) and Nginx.
Classic ELK: suitable for development environments with small data volumes; data can be lost.
The classic stack consists of Filebeat + Logstash + Elasticsearch + Kibana (early ELK setups used only Logstash + Elasticsearch + Kibana).
Message queue + Nginx variant: suitable for production; it can handle large data volumes without losing data.
This architecture adds Redis, Kafka, or RabbitMQ as a message queue in front of Logstash, which buffers events and prevents message loss.
13. Using the Java API to access ES
Maven dependencies (pom.xml):
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.8.0</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.8.0</version>
</dependency>
Plain Java code for working with ES:
import org.apache.http.HttpHost;
import org.apache.lucene.index.Term;
import org.elasticsearch.action.DocWriteResponse;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.PutMappingRequest;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.Aggregation;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.Avg;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.sort.SortOrder;

import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Student {
    static RestHighLevelClient client;

    public static void main(String[] args) throws IOException {
        connect();
        System.out.println(client);
        // crud
        // indexStudent();
        // getStudent();
        // deleteStudent();
        // updateStudent();
        // search();
        // matchAll();
        // System.out.println("~~~~");
        // match();
        // System.out.println("~~~~");
        // matchPhrase();
        // boolSearch();
        // search1();
        // groupby();
        // bulkApi();
        // putMapping();
        // avgAggs();
        close();
    }

    /**
     * bulk API
     * single-document operations:
     * put /index1/type1/1
     * {
     *   "name":"nx"
     * }
     * delete /index1/type1/1
     * <p>
     * bulk operation:
     * post /index1/type1/_bulk
     * {"index":{"_id":"1"}}
     * {"name":"nx"}
     * {"index":{"_id":"2"}}
     * {"name":"nx1"}
     */
    private static void bulkApi() throws IOException {
        BulkRequest request = new BulkRequest();
        request.add(new IndexRequest("company", "employee", "1")
                .source("name", "Jack", "age", 38, "salary", 21000, "team", "a"))
                .add(new IndexRequest("company", "employee", "2")
                        .source("name", "Smith",
                                "age", 36,
                                "salary", 18000,
                                "team", "a"))
                .add(new IndexRequest("company", "employee", "3")
                        .source("name", "Kon",
                                "age", 29,
                                "salary", 17000,
                                "team", "a"))
                .add(new IndexRequest("company", "employee", "4")
                        .source("name", "Mark",
                                "age", 42,
                                "salary", 30000,
                                "team", "b"))
                .add(new IndexRequest("company", "employee", "5")
                        .source("name", "Lin",
                                "age", 37,
                                "salary", 28000,
                                "team", "b"))
                .add(new IndexRequest("company", "employee", "6")
                        .source("name", "Whon",
                                "age", 29,
                                "salary", 15000,
                                "team", "b"));
        BulkResponse responses = client.bulk(request, RequestOptions.DEFAULT);
        BulkItemResponse[] items = responses.getItems();
        for (BulkItemResponse response : items) {
            System.out.println(response);
        }
    }

    /**
     * Change the mapping of the field we group by:
     * fielddata = true
     */
    private static void putMapping() throws IOException {
        PutMappingRequest request = new PutMappingRequest("company");
        request.source(
                XContentFactory.jsonBuilder()
                        .startObject()
                        .startObject("properties")
                        .startObject("team")
                        .field("type", "text")
                        .field("fielddata", true)
                        .endObject()
                        .endObject()
                        .endObject()
        );
        AcknowledgedResponse resp = client.indices().putMapping(request, RequestOptions.DEFAULT);
        System.out.println(resp.isAcknowledged());
    }

    /**
     * Aggregation: avg
     */
    private static void avgAggs() throws IOException {
        SearchRequest request = new SearchRequest("company");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.rangeQuery("age").gte(35))
                .size(0)
                .aggregation(
                        AggregationBuilders
                                .terms("group_by_team")
                                .field("team")
                                .subAggregation(AggregationBuilders.avg("avg_salary").field("salary"))
                );
        request.source(sourceBuilder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        Terms group_by_team = response.getAggregations().get("group_by_team");
        List<? extends Terms.Bucket> buckets = group_by_team.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            Avg avg_salary = bucket.getAggregations().get("avg_salary");
            System.out.println("bucket: " + bucket.getKey() + " , count: " + bucket.getDocCount()
                    + " , avg: " + avg_salary.getValue());
        }
    }

    /**
     * Aggregation: group by
     */
    private static void groupby() throws IOException {
        SearchRequest request = new SearchRequest("person");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.matchAllQuery())
                .size(0)
                .aggregation(AggregationBuilders.terms("group_by_age").field("age"));
        request.source(sourceBuilder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        Terms group_by_age = response.getAggregations().get("group_by_age");
        List<? extends Terms.Bucket> buckets = group_by_age.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            System.out.println("bucket: " + bucket.getKey() + " , count: " + bucket.getDocCount());
        }
    }

    /**
     * sort \ from \ size \ _source \ highlight
     */
    private static void search1() throws IOException {
        SearchRequest request = new SearchRequest("person");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.matchQuery("like", "riding"))
                .sort("age", SortOrder.DESC)
                .from(0)
                .size(2)
                .fetchSource(new String[]{"name", "age", "address"}, new String[0])
                .highlighter(new HighlightBuilder().field("like"));
        request.source(sourceBuilder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        SearchHit[] hits = response.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSourceAsString());
            System.out.println("~~~~");
            System.out.println(hit.getHighlightFields().get("like").toString());
        }
    }

    /**
     * Compound (multi-condition) search
     */
    private static void boolSearch() throws IOException {
        SearchRequest request = new SearchRequest("person");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.boolQuery()
                .must(QueryBuilders.matchQuery("address", "beijing"))
                .mustNot(QueryBuilders.matchQuery("like", "swimming"))
                .should(QueryBuilders.matchQuery("age", 18))
                .should(QueryBuilders.matchQuery("age", 19))
                .should(QueryBuilders.matchQuery("age", 20))
                .filter(QueryBuilders.rangeQuery("age").gt(19).lte(20))
        );
        request.source(sourceBuilder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        SearchHit[] hits = response.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSourceAsString());
        }
    }

    /**
     * The most basic search
     */
    private static void search() throws IOException {
        SearchRequest request = new SearchRequest("person");
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        SearchHit[] hits = response.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSourceAsString());
        }
    }

    /**
     * match_all
     */
    private static void matchAll() throws IOException {
        SearchRequest request = new SearchRequest("person");
        request.source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        SearchHit[] hits = response.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSourceAsString());
        }
    }

    /**
     * match
     */
    private static void match() throws IOException {
        SearchRequest request = new SearchRequest("person");
        request.source(new SearchSourceBuilder().query(QueryBuilders.matchQuery("name", "Smith")));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        SearchHit[] hits = response.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSourceAsString());
        }
    }

    /**
     * match_phrase
     */
    private static void matchPhrase() throws IOException {
        SearchRequest request = new SearchRequest("person");
        request.source(new SearchSourceBuilder().query(QueryBuilders.matchPhraseQuery("like", "hiking basketball")));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        SearchHit[] hits = response.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSourceAsString());
        }
    }

    /**
     * Get (read)
     */
    private static void getStudent() throws IOException {
        GetRequest request = new GetRequest("person", "student", "1");
        GetResponse response = client.get(request, RequestOptions.DEFAULT);
        System.out.println(response.toString());
        Map<String, Object> map = response.getSource();
        System.out.println(map.get("name") + " , " + map.get("age") + " , " + map.get("address") + " , " + map.get("like"));
    }

    /**
     * Delete
     */
    private static void deleteStudent() throws IOException {
        DeleteRequest req = new DeleteRequest("person", "student", "1");
        DeleteResponse resp = client.delete(req, RequestOptions.DEFAULT);
        System.out.println(resp.toString());
    }

    /**
     * Update
     */
    private static void updateStudent() throws IOException {
        // PUT: full overwrite
        // IndexRequest request = new IndexRequest("person", "student", "1");
        // request.source("name","Smith", "age",18, "address","beijing changping","like","pingpang football basketball");
        // client.index(request,RequestOptions.DEFAULT);
        // partial update
        UpdateRequest request = new UpdateRequest("person", "student", "1");
        request.doc("name", "Smith");
        UpdateResponse re = client.update(request, RequestOptions.DEFAULT);
        System.out.println(re.toString());
    }


    /**
     * PUT /person/student/1
     * {
     *   "name":"Smth",
     *   "age":18,
     *   "address":"beijing changping",
     *   "like":"pingpang football basketball"
     * }
     */
    private static void indexStudent() throws IOException {
        // doc 1: key-value strings
        IndexRequest request = new IndexRequest("person", "student", "1");
        request.source("name", "Smth", "age", 18, "address", "beijing changping", "like", "pingpang football basketball");
        // // async
        // client.indexAsync(request, RequestOptions.DEFAULT, new ActionListener<IndexResponse>() {
        //     public void onResponse(IndexResponse response) {
        //         System.out.println(response.toString());
        //     }
        //     public void onFailure(Exception e) {
        //
        //     }
        // });
        // sync
        IndexResponse response = client.index(request, RequestOptions.DEFAULT);
        System.out.println(response.toString());
        // doc 2: XContentFactory
        IndexRequest request2 = new IndexRequest("person", "student", "2");
        request2.source(
                XContentFactory.jsonBuilder()
                        .startObject()
                        .field("name", "Lucy")
                        .field("age", 19)
                        .field("address", "beijing haidian")
                        .field("like", "swimming running hiking")
                        .endObject()
        );
        IndexResponse response2 = client.index(request2, RequestOptions.DEFAULT);
        System.out.println(response2.toString());

        // doc 3: map
        IndexRequest request3 = new IndexRequest("person", "student", "3");
        Map<String, Object> map = new HashMap<String, Object>();
        map.put("name", "Jack");
        map.put("age", 20);
        map.put("address", "beijing chaoyang");
        map.put("like", "riding hiking basketball");
        request3.source(map);
        IndexResponse response3 = client.index(request3, RequestOptions.DEFAULT);
        System.out.println(response3.toString());
        // {
        //   "name":"007",
        //   "age":20,
        //   "address":"hebei shijiazhuang",
        //   "like":"football swimming"
        // }

        // doc 4: JSON string + XContentType.JSON
        IndexRequest request4 = new IndexRequest("person", "student", "4");
        String jsonString = "{\"name\":\"007\",\"age\":20,\"address\":\"hebei shijiazhuang\",\"like\":\"football swimming\"}";
        request4.source(jsonString, XContentType.JSON);
        IndexResponse response4 = client.index(request4, RequestOptions.DEFAULT);
        System.out.println(response4.toString());
        // doc 5 and doc 6
        IndexRequest request5 = new IndexRequest("person", "student", "5");
        System.out.println(client.index(
                request5.source(
                        XContentFactory.jsonBuilder()
                                .startObject()
                                .field("name", "008")
                                .field("age", 19)
                                .field("address", "hebei baoding")
                                .field("like", "riding pingpang")
                                .endObject()
                ), RequestOptions.DEFAULT).toString());
        IndexRequest request6 = new IndexRequest("person", "student", "6");
        System.out.println(client.index(
                request6.source(
                        XContentFactory.jsonBuilder()
                                .startObject()
                                .field("name", "009")
                                .field("age", 21)
                                .field("address", "hebei langfang")
                                .field("like", "basketball riding hiking")
                                .endObject()
                ), RequestOptions.DEFAULT).toString());
    }

    /**
     * Create the client
     */
    private static void connect() {
        client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")
                        // new HttpHost("node2", 9200, "http"),
                        // new HttpHost("node3", 9200, "http")
                )
        );
    }

    /**
     * Close the client
     */
    private static void close() {
        if (client != null) {
            try {
                client.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}