Elasticsearch Quick Start
I. Introduction
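Elasticsearch ("ES" for short) is a distributed search and analytics engine. Logstash and Beats store the data they collect in ES. Kibana provides visualizations and a user-friendly interface for exploring and monitoring the data in ES and for building visual reports.
- Elasticsearch: the data warehouse, the space where data is stored
- Logstash/Beats: the warehouse's buyers and porters, which collect and sort data
- Kibana: the warehouse manager, which analyzes the data and then presents it
The ES data model: documents (Document) and indices (Index)
ES serializes data into JSON documents for storage. An index is an optimized collection of documents, and a document is a collection of fields (key-value pairs). Fields of a text type (text) are stored in an inverted index, which supports fast full-text search. Numeric (numeric) and geo (geo) fields are stored in BKD trees.
An inverted index lists every unique term, regardless of which document it appears in or how often, and for each term identifies all the documents that contain it.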
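To make the idea concrete, here is a minimal Java sketch of an inverted index. It is only an illustration and not how ES or Lucene implement it: the class name InvertedIndexSketch, the sample sentences and the naive tokenizer are all assumptions made for this example.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class InvertedIndexSketch {

    /** Maps each unique lower-cased term to the sorted IDs of the documents containing it. */
    static Map<String, SortedSet<Integer>> build(String[] docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            // Naive tokenizer: lower-case, then split on anything that is not a letter or digit.
            for (String term : docs[docId].toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (!term.isEmpty()) {
                    // A set records the document once, however often the term occurs in it.
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] docs = {"the quick brown fox", "the lazy dog", "quick quick dog"};
        Map<String, SortedSet<Integer>> index = build(docs);
        System.out.println(index.get("quick")); // [0, 2]
        System.out.println(index.get("dog"));   // [1, 2]
    }
}
```

Looking up a term is then a single map access, which is the property that makes full-text search over such a structure fast.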
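Schema-less
ES is flexible about the schema of the documents you write: by default you do not have to declare how many fields a document has or what their types are. Even if you have declared them, you can still store fields that were not declared. For example:
Suppose you want to store book information and have declared the properties id, name and price in advance. When writing a document, you can still include data for a description field.
A relational database would not allow this: you must model the data (define a schema) before storing anything, and writes are strictly constrained by that schema.
In ES, the counterpart of a schema is the mapping.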
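Search and analytics
- Search: structured queries through the REST API, written in Query DSL, a JSON-style domain-specific language for queries
- Analytics: aggregations that summarize the data, for example averages and medians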
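As a rough illustration of what such a summary looks like, the following Java sketch computes an average and a median over a few made-up prices. In ES these values would be computed on the server by aggregations over the matching documents; the class name AggregationSketch and the sample values are assumptions for this example only.

```java
import java.util.Arrays;

final class AggregationSketch {

    /** Arithmetic mean, or NaN for an empty input. */
    static double average(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Exact median, or NaN for an empty input. Works on a sorted copy so the input is untouched. */
    static double median(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        double[] prices = {10, 20, 30, 100};
        System.out.println(average(prices)); // 40.0
        System.out.println(median(prices));  // 25.0
    }
}
```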
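Scalability and resilience
ES is a distributed search and analytics engine. Replica copies on multiple nodes (and, across clusters, cross-cluster replication) provide disaster tolerance. Sharding splits an index's data into shards that are spread fairly evenly across the nodes of a cluster, which keeps any single node from being overloaded, so the service stays available as demand changes.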
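The way documents end up spread across shards can be sketched in a few lines of Java: hash a routing key (by default the document ID) and take it modulo the number of primary shards. This is a simplified illustration under stated assumptions: ES uses its own hash function, and String.hashCode() is only a stand-in here. The class name ShardRoutingSketch is made up for this example.

```java
import java.util.Arrays;

final class ShardRoutingSketch {

    /** Picks a shard number in [0, numberOfShards) for the given routing key. */
    static int shardFor(String routing, int numberOfShards) {
        // Stand-in hash: String.hashCode() is an assumption, not the hash ES uses.
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        int numberOfShards = 3;
        int[] docsPerShard = new int[numberOfShards];
        for (int id = 0; id < 9000; id++) {
            docsPerShard[shardFor(String.valueOf(id), numberOfShards)]++;
        }
        // Prints how many of the 9000 IDs landed on each shard.
        System.out.println(Arrays.toString(docsPerShard));
    }
}
```

Because the shard is derived from the key alone, the same document ID always maps to the same shard, which is also why the number of primary shards is fixed when the index is created.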
II. Installing Elasticsearch
Before installing
To make ES easier to work with, also install Kibana.
Docker must be installed before you begin.
1. Create the network
    docker network create elastic
2. Create the esdatadir directory
    mkdir esdatadir
    mkdir esdatadir/config
    touch esdatadir/config/elasticsearch.yml
    mkdir esdatadir/data
    mkdir esdatadir/logs
    mkdir esdatadir/plugins
    # Grant read/write permissions
    chmod -R 777 esdatadir
3. Edit elasticsearch.yml
    http.host: 0.0.0.0
    transport.host: 0.0.0.0
    cluster.name: "docker-cluster"
    node.name: es01
    http.cors.enabled: true
    http.cors.allow-origin: "*"
Installing Elasticsearch 7.17.1 with Docker
1. Pull the image
    docker pull docker.elastic.co/elasticsearch/elasticsearch:7.17.1
2. Run the container
Name the container es01.
    cd esdatadir
    docker run -id --name es01 \
      -p 9200:9200 \
      -p 9300:9300 \
      --net elastic \
      -v $PWD/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
      -v $PWD/data:/usr/share/elasticsearch/data \
      -v $PWD/logs:/usr/share/elasticsearch/logs \
      -v $PWD/plugins:/usr/share/elasticsearch/plugins \
      -e "discovery.type=single-node" \
      docker.elastic.co/elasticsearch/elasticsearch:7.17.1
    cd ../
3. Verify that it is running
Wait about 10 seconds for startup to finish, then run:
    # General form: curl -XGET http://ip:9200
    # Here the host IP is read from the enp0s3 interface; adjust the interface name for your machine
    curl -XGET "http://$(ifconfig enp0s3 | head -n2 | grep inet | awk '{print$2}'):9200"
The result looks roughly like this:
    {
      "name" : "ea912245d40f",
      "cluster_name" : "docker-cluster",
      "cluster_uuid" : "VpQjM1qHQyup2DUxdJu0mQ",
      "version" : {
        "number" : "7.17.1",
        "build_flavor" : "default",
        "build_type" : "docker",
        "build_hash" : "e5acb99f822233d62d6444ce45a4543dc1c8059a",
        "build_date" : "2022-02-23T22:20:54.153567231Z",
        "build_snapshot" : false,
        "lucene_version" : "8.11.1",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }
Installing elasticsearch-analysis-ik
1. Download the archive
    wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.17.1/elasticsearch-analysis-ik-7.17.1.zip
2. Unzip it
    mkdir ik
    unzip elasticsearch-analysis-ik-7.17.1.zip -d ik
3. Copy it into the plugins directory
    cp -r ik esdatadir/plugins/ik
4. Verify the copy
    docker exec -it es01 ls plugins/ik -alh
If the listing shows 8 main entries, the copy succeeded.
5. Restart the container
    docker restart es01
Installing Kibana 7.17.1 with Docker
1. Pull the image
    docker pull docker.elastic.co/kibana/kibana:7.17.1
2. Run the container
    docker run --name kib01 \
      --net elastic \
      -p 5601:5601 \
      -e "ELASTICSEARCH_HOSTS=http://es01:9200" \
      docker.elastic.co/kibana/kibana:7.17.1
Press Ctrl+C to exit.
To run it again, just use docker start kib01
3. Access Kibana
Open http://{ip/host}:5601 in a browser.
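III. Search
Searches can be made in two ways: through the REST API or through the Java Client.
Front-end UI components can access ES directly by calling the REST API. Back-end code can access ES through the Java Client, which under the hood makes its calls through a REST HTTP client.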
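Since both routes end in an HTTP call, a search can be sketched with nothing but the JDK's built-in HTTP classes. The following is a minimal sketch, not the Java Client's actual internals: it assumes Java 11 or later (for java.net.http), an ES node at http://localhost:9200, and a hypothetical class name RestSearchRequestSketch. It only builds the request; the comment at the end shows how it could be sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class RestSearchRequestSketch {

    /** Builds a POST {baseUrl}/{index}/_search request carrying the given Query DSL body. */
    static HttpRequest searchRequest(String baseUrl, String index, String queryJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                // The _search endpoint accepts POST as well as GET.
                .POST(HttpRequest.BodyPublishers.ofString(queryJson))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = searchRequest(
                "http://localhost:9200",            // assumed address of the ES node
                "my-index-000001",
                "{\"query\":{\"match_all\":{}}}");
        System.out.println(request.method() + " " + request.uri());
        // To actually send it against a running node:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```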
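3.1 REST API
In Kibana, follow the menu path "Management" -> "Dev Tools" -> "Console" to find the panel for calling the API. "Help" lists the keyboard shortcuts and explains how to send requests.
Working with indices
A create-index request can specify settings, field mappings and index aliases.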
    # Simple create
    PUT /my-index-000001

    # Simple delete
    DELETE /my-index-000001

    # Create an index with settings; static settings such as number_of_shards cannot be updated on a live index
    PUT /my-index-000001
    {
      "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 2
      }
    }

    # Get the settings
    GET /my-index-000001/_settings

    # Create an index with mappings; common types include text, long and so on
    PUT /test
    {
      "mappings": {
        "properties": {
          "field1": { "type": "text" }
        }
      }
    }

    # Get the mapping
    GET /test/_mapping

    # Create an index with aliases
    PUT /logs
    {
      "aliases": {
        "<logs_{now/M}>": {}
      }
    }

    # Get the aliases
    GET /logs/_alias
Updating a mapping
    PUT /my-index-000001/_mapping
    {
      "properties": {
        "email": { "type": "keyword" }
      }
    }
Getting the mapping of a field
    PUT /publications
    {
      "mappings": {
        "properties": {
          "id": { "type": "text" },
          "title": { "type": "text" },
          "abstract": { "type": "text" },
          "author": {
            "properties": {
              "id": { "type": "text" },
              "name": { "type": "text" }
            }
          }
        }
      }
    }

    GET /publications/_mapping/field/title
Working with a single document
    # Create a document (auto-generated ID)
    POST my-index-000001/_doc/
    {
      "@timestamp": "2099-11-15T13:12:00",
      "message": "GET /search HTTP/1.1 200 1070000",
      "user": { "id": "kimchy" }
    }

    # Save a document with a specified ID, variant 1
    PUT my-index-000001/_doc/1
    {
      "@timestamp": "2099-11-15T13:12:00",
      "message": "GET /search HTTP/1.1 200 1070000",
      "user": { "id": "kimchy" }
    }

    # Create a document with a specified ID, variant 2; succeeds only if the index has no document with that ID
    PUT my-index-000001/_create/2
    {
      "@timestamp": "2099-11-15T13:12:00",
      "message": "GET /search HTTP/1.1 200 1070000",
      "user": { "id": "kimchy" }
    }

    # Create a document with a specified ID, variant 3; succeeds only if the index has no document with that ID
    PUT my-index-000001/_doc/3?op_type=create
    {
      "@timestamp": "2099-11-15T13:12:00",
      "message": "GET /search HTTP/1.1 200 1070000",
      "user": { "id": "kimchy" }
    }

    # Fetch only the _source
    GET my-index-000001/_source/1

    # Fetch the whole document
    GET my-index-000001/_doc/1

    # Update a document
    PUT test/_doc/1
    {
      "counter" : 1,
      "tags" : ["red"]
    }

    ## counter += 4
    POST test/_update/1
    {
      "script" : {
        "source": "ctx._source.counter += params.count",
        "lang": "painless",
        "params" : { "count" : 4 }
      }
    }

    ## Append the element blue to tags
    POST test/_update/1
    {
      "script": {
        "source": "ctx._source.tags.add(params.tag)",
        "lang": "painless",
        "params": { "tag": "blue" }
      }
    }

    ## Conditionally remove an element from tags
    POST test/_update/1
    {
      "script": {
        "source": "if (ctx._source.tags.contains(params.tag)) { ctx._source.tags.remove(ctx._source.tags.indexOf(params.tag)) }",
        "lang": "painless",
        "params": { "tag": "blue" }
      }
    }

    # Add a field
    POST test/_update/1
    {
      "script" : "ctx._source.new_field = 'value_of_new_field'"
    }

    # Add a field with a partial document; an update that changes nothing is detected as a no-op
    POST test/_update/1
    {
      "doc": { "name": "new_name" }
    }

    # Remove a field
    POST test/_update/1
    {
      "script" : "ctx._source.remove('new_field')"
    }

    # Remove one subfield from an object field
    POST test/_update/1
    {
      "script": "ctx._source['my-object'].remove('my-subfield')"
    }

    # Run the script if the document exists; otherwise insert the upsert document
    POST test/_update/1
    {
      "script": {
        "source": "ctx._source.counter += params.count",
        "lang": "painless",
        "params": { "count": 4 }
      },
      "upsert": { "counter": 1 }
    }
Working with multiple documents
    # Multi-get
    GET /my-index-000001/_mget
    {
      "docs": [
        { "_type": "_doc", "_id": "1" },
        { "_type": "_doc", "_id": "2" }
      ]
    }

    GET /my-index-000001/_mget
    {
      "ids" : ["1", "2"]
    }

    # Bulk: different operations in one request
    POST _bulk
    { "index" : { "_index" : "test", "_id" : "1" } }
    { "field1" : "value1" }
    { "delete" : { "_index" : "test", "_id" : "2" } }
    { "create" : { "_index" : "test", "_id" : "3" } }
    { "field1" : "value3" }
    { "update" : {"_id" : "1", "_index" : "test"} }
    { "doc" : {"field2" : "value2"} }

    # Delete all documents matching a query
    POST my-index-000001/_delete_by_query?scroll_size=5000
    {
      "query": {
        "term": { "user.id": "kimchy" }
      }
    }

    # Update all documents matching a query
    POST my-index-000001/_update_by_query
    {
      "script": {
        "source": "ctx._source.count++",
        "lang": "painless"
      },
      "query": {
        "term": { "user.id": "kimchy" }
      }
    }
Search APIs
    # match_all search with pagination
    GET /my-index-000001/_search?from=0&size=20
    {
      "query": { "match_all": {} }
    }

    # term search with pagination
    GET /my-index-000001/_search?from=0&size=20
    {
      "query": {
        "term": { "user.id": "kimchy" }
      }
    }

    # match search
    GET /my-index-000001/_search
    {
      "query": {
        "match": {
          "user.id": { "query": "kimchy" }
        }
      }
    }

    # range search
    GET /my-index-000001/_search
    {
      "query": {
        "range": {
          "@timestamp": { "gte": "now-1d/d" }
        }
      }
    }

    # Sorting
    GET /my-index-000001/_search
    {
      "query": {
        "match": {
          "user.id": { "query": "kimchy" }
        }
      },
      "sort": { "_id": "desc" }
    }

    GET /my-index-000001/_search?sort=_id:desc
    {
      "query": {
        "match": {
          "user.id": { "query": "kimchy" }
        }
      }
    }

    # prefix search
    GET /my-index-000001/_search
    {
      "query": {
        "prefix": {
          "user.id": { "value": "ki" }
        }
      }
    }

    # bool search
    ## must: the clause must match
    ## must_not: the clause must not match (the complement of must)
    ## should: the clause may match; when must or filter clauses are present, a document is not excluded for missing it
    ## filter: the clause must match; unlike must, it does not affect the score
    GET _search
    {
      "query": {
        "bool": {
          "must": { "match_all": {} },
          "filter": {
            "term": { "count": 2 }
          }
        }
      }
    }
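The matching rules of the bool query can be mirrored in plain Java as a small predicate combinator. This is only a sketch of the hit/no-hit decision: scoring is ignored, should clauses are left out, documents are modelled as simple maps, and the names BoolQuerySketch, term and matches are invented for this example (Java 9+ for List.of and Map.of).

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class BoolQuerySketch {

    /** A term-like clause: the field must equal the given value exactly. */
    static Predicate<Map<String, Object>> term(String field, Object value) {
        return doc -> value.equals(doc.get(field));
    }

    /**
     * Decides whether a document is a hit: every must and filter clause has to match
     * and no must_not clause may match. should clauses are omitted on purpose, since
     * with must or filter present they only influence the score.
     */
    static boolean matches(Map<String, Object> doc,
                           List<Predicate<Map<String, Object>>> must,
                           List<Predicate<Map<String, Object>>> filter,
                           List<Predicate<Map<String, Object>>> mustNot) {
        return must.stream().allMatch(clause -> clause.test(doc))
                && filter.stream().allMatch(clause -> clause.test(doc))
                && mustNot.stream().noneMatch(clause -> clause.test(doc));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("count", 2, "user", "kimchy");
        // filter: count == 2, must_not: user == "nobody"
        System.out.println(matches(doc, List.of(), List.of(term("count", 2)),
                List.of(term("user", "nobody")))); // true
        // The same document is rejected once a must_not clause matches it
        System.out.println(matches(doc, List.of(), List.of(term("count", 2)),
                List.of(term("user", "kimchy")))); // false
    }
}
```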
3.2 Java Client
First steps with elasticsearch-java 7.17.1
Add the Maven dependencies
    <dependencies>
      <dependency>
        <groupId>co.elastic.clients</groupId>
        <artifactId>elasticsearch-java</artifactId>
        <version>7.17.1</version>
      </dependency>
      <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-core</artifactId>
        <version>2.12.3</version>
      </dependency>
      <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.12.3</version>
      </dependency>
      <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-annotations</artifactId>
        <version>2.12.3</version>
      </dependency>
      <dependency>
        <groupId>commons-logging</groupId>
        <artifactId>commons-logging</artifactId>
        <version>1.2</version>
      </dependency>
      <dependency>
        <groupId>jakarta.json</groupId>
        <artifactId>jakarta.json-api</artifactId>
        <version>2.0.1</version>
      </dependency>
    </dependencies>
Write the application code. It shows the Java client connecting to ES, checking whether the index products exists, and creating it if it does not. It then runs a series of searches: term, match, match all and so on.
    import co.elastic.clients.elasticsearch.ElasticsearchClient;
    import co.elastic.clients.elasticsearch._types.mapping.Property;
    import co.elastic.clients.elasticsearch.core.SearchResponse;
    import co.elastic.clients.elasticsearch.core.search.Hit;
    import co.elastic.clients.json.jackson.JacksonJsonpMapper;
    import co.elastic.clients.transport.ElasticsearchTransport;
    import co.elastic.clients.transport.endpoints.BooleanResponse;
    import co.elastic.clients.transport.rest_client.RestClientTransport;
    import com.fasterxml.jackson.core.JsonProcessingException;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.SerializationFeature;
    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;

    import java.io.IOException;
    import java.util.concurrent.TimeUnit;

    public class ESNativeClient7Application {

        private static final ObjectMapper OBJECT_MAPPER =
                new ObjectMapper().enable(SerializationFeature.INDENT_OUTPUT);

        public static void main(String[] args) throws IOException, InterruptedException {
            // Create the low-level client
            try (RestClient restClient = RestClient.builder(
                         new HttpHost("10.119.6.176", 9200)).build();
                 // Create the transport with a Jackson mapper
                 ElasticsearchTransport transport = new RestClientTransport(
                         restClient, new JacksonJsonpMapper())) {
                // And create the API client
                ElasticsearchClient client = new ElasticsearchClient(transport);

                // Create the index and one sample document if the index does not exist yet
                BooleanResponse resp = client.indices().exists(e -> e.index("products"));
                if (!resp.value()) {
                    client.indices().create(c -> c
                            .index("products")
                            .mappings(m -> m
                                    .properties("name", Property.of(o -> o
                                            .text(t -> t
                                                    .store(true)
                                                    .index(true)
                                                    .analyzer("ik_smart")))))
                            .settings(s -> s
                                    .numberOfShards("3")
                                    .numberOfReplicas("2"))
                            .aliases("<products{now/M}>", a -> a));
                    client.index(c -> c
                            .index("products")
                            .id("1")
                            .document(Product.builder().name("bicycle").build()));
                }

                // Search 1: term query
                SearchResponse<Product> search1 = client.search(s -> s
                        .index("products")
                        .query(q -> q
                                .term(t -> t
                                        .field("name")
                                        .value(v -> v.stringValue("bicycle")))),
                        Product.class);
                for (Hit<Product> hit : search1.hits().hits()) {
                    processProduct(hit.source());
                }

                // Search 2: match query
                SearchResponse<Product> search2 = client.search(s -> s
                        .index("products")
                        .query(q -> q
                                .match(m -> m
                                        .field("name")
                                        .query("bicycle"))),
                        Product.class);
                for (Hit<Product> hit : search2.hits().hits()) {
                    processProduct(hit.source());
                }

                // Search 3: match_all query
                SearchResponse<Product> search3 = client.search(s -> s
                        .index("products")
                        .query(q -> q.matchAll(v -> v.queryName("name"))),
                        Product.class);
                for (Hit<Product> hit : search3.hits().hits()) {
                    processProduct(hit.source());
                }

                // Search 4: prefix query
                SearchResponse<Product> search4 = client.search(s -> s
                        .index("products")
                        .query(q -> q
                                .prefix(p -> p
                                        .field("name")
                                        .value("bi"))),
                        Product.class);
                for (Hit<Product> hit : search4.hits().hits()) {
                    processProduct(hit.source());
                }

                // Search 5: bool query (must match_all, filter by term)
                SearchResponse<Product> search5 = client.search(s -> s
                        .index("products")
                        .query(q -> q
                                .bool(b -> b
                                        .must(m -> m
                                                .matchAll(v -> v))
                                        .filter(f -> f
                                                .term(t -> t
                                                        .field("name")
                                                        .value(v -> v.stringValue("bicycle")))))),
                        Product.class);
                for (Hit<Product> hit : search5.hits().hits()) {
                    processProduct(hit.source());
                }
                TimeUnit.SECONDS.sleep(1);
            }
        }

        // Print one hit as indented JSON
        private static void processProduct(Product source) throws JsonProcessingException {
            String jsonStr = OBJECT_MAPPER.writeValueAsString(source);
            System.out.println(jsonStr);
        }
    }
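It uses the Product entity:
    import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
    import com.fasterxml.jackson.annotation.JsonProperty;
    import lombok.AllArgsConstructor;
    import lombok.Builder;
    import lombok.Data;
    import lombok.NoArgsConstructor;

    @Data
    @Builder
    @AllArgsConstructor
    @NoArgsConstructor
    @JsonIgnoreProperties(ignoreUnknown = true)
    public class Product {
        @JsonProperty("name")
        private String name;
    }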
Integrating with Spring Boot
Create a Spring Boot Maven project at start.spring.io: choose version 2.6.4, JDK 8, and jar packaging.
Add the dependencies:
- spring-boot-starter-web
- spring-boot-configuration-processor
- spring-boot-starter-data-elasticsearch
- spring-boot-starter-test
- lombok
- joda-money 1.0.1
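Create the domain model
    import lombok.AllArgsConstructor;
    import lombok.Builder;
    import lombok.Data;
    import lombok.NoArgsConstructor;
    import org.joda.money.Money;
    import org.springframework.data.annotation.Id;
    import org.springframework.data.elasticsearch.annotations.Document;
    import org.springframework.data.elasticsearch.annotations.Field;
    import org.springframework.data.elasticsearch.annotations.FieldType;
    import org.springframework.data.elasticsearch.annotations.WriteTypeHint;

    @Data
    @Builder
    @AllArgsConstructor
    @NoArgsConstructor
    @Document(indexName = "products", writeTypeHint = WriteTypeHint.DEFAULT)
    public class Product {
        @Id
        private Long id;

        @Field(type = FieldType.Text, store = true, analyzer = "ik_smart")
        private String name;

        @Field(type = FieldType.Long, store = true)
        private Money price;
    }
The @Document annotation sets the index name, and @Field configures each field's mapping.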
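Create the repository
    import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

    public interface ProductRepository extends ElasticsearchRepository<Product, Long> {
        Product findByName(String name);
    }
ElasticsearchRepository is used much like a JPA repository: define an interface that extends it and declare whatever custom query methods the business needs. The naming rules are the same as in Spring Data JPA. A method name generally starts with find, followed by By and the fields to filter on; multiple fields are joined with And/Or, and each field may be followed by an operator such as Like, In or GreaterThan.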
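To show how such a method name breaks down into conditions, here is a toy Java parser. It is emphatically not Spring Data's real parser: it handles only find...By, And, and four operator suffixes, and the class name DerivedQueryNameSketch and the "Is" label for plain equality are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class DerivedQueryNameSketch {

    // Operator suffixes this toy recognises; the real framework supports many more.
    private static final String[] OPERATORS = {"GreaterThan", "LessThan", "Like", "In"};

    /** Splits e.g. "findByNameAndPriceGreaterThan" into ["name Is", "price GreaterThan"]. */
    static List<String> parse(String methodName) {
        int by = methodName.indexOf("By");
        if (!methodName.startsWith("find") || by < 0 || by + 2 >= methodName.length()) {
            throw new IllegalArgumentException("expected findBy<Property>...: " + methodName);
        }
        List<String> conditions = new ArrayList<>();
        // Split only where "And" is followed by an upper-case letter (start of the next property).
        for (String part : methodName.substring(by + 2).split("And(?=[A-Z])")) {
            String operator = "Is"; // no suffix means plain equality
            for (String candidate : OPERATORS) {
                if (part.endsWith(candidate) && part.length() > candidate.length()) {
                    operator = candidate;
                    part = part.substring(0, part.length() - candidate.length());
                    break;
                }
            }
            String field = Character.toLowerCase(part.charAt(0)) + part.substring(1);
            conditions.add(field + " " + operator);
        }
        return conditions;
    }

    public static void main(String[] args) {
        System.out.println(parse("findByName"));                    // [name Is]
        System.out.println(parse("findByNameAndPriceGreaterThan")); // [name Is, price GreaterThan]
    }
}
```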
Create the service
    import lombok.extern.slf4j.Slf4j;
    import org.springframework.stereotype.Service;

    import javax.annotation.Resource;
    import java.util.Optional;

    @Service
    @Slf4j
    public class ProductService {

        @Resource
        private ProductRepository productRepository;

        public Optional<Product> queryProductByName(String name) {
            Optional<Product> queriedProduct =
                    Optional.ofNullable(productRepository.findByName(name));
            queriedProduct.ifPresent(o -> {
                log.info("query product by repository: {}", o);
            });
            return queriedProduct;
        }

        public void deleteAll() {
            productRepository.deleteAll();
            log.info("index products deleted all");
        }

        public void save(Product product) {
            productRepository.save(product);
            log.info("repository save Product: {}", product);
        }
    }
ProductService implements the business logic on top of the repository's read and write operations; the logic here is fairly simple.
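Write the application context configuration
    import org.springframework.boot.CommandLineRunner;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.WebApplicationType;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.boot.autoconfigure.jackson.Jackson2ObjectMapperBuilderCustomizer;
    import org.springframework.boot.builder.SpringApplicationBuilder;
    import org.springframework.context.annotation.Bean;
    import org.springframework.data.elasticsearch.core.convert.ElasticsearchCustomConversions;
    import org.springframework.data.elasticsearch.repository.config.EnableElasticsearchRepositories;

    import java.util.Arrays;

    @SpringBootApplication
    @EnableElasticsearchRepositories
    public class ESSpringClientApplication {

        public static void main(String[] args) {
            SpringApplication app = new SpringApplicationBuilder()
                    .sources(ESSpringClientApplication.class)
                    .web(WebApplicationType.NONE)
                    .build();
            app.run(args);
        }

        // Indent JSON output of the shared ObjectMapper
        @Bean
        public Jackson2ObjectMapperBuilderCustomizer customizer() {
            return builder -> builder.indentOutput(true);
        }

        // Register the Money <-> Number converters
        @Bean
        public ElasticsearchCustomConversions elasticsearchCustomConversions() {
            return new ElasticsearchCustomConversions(
                    Arrays.asList(new NumberToMoney(), new MoneyToNumber()));
        }

        @Bean
        CommandLineRunner run() {
            return new ClientRunner();
        }
    }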
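Write the read/write converters for the Money type
    import org.joda.money.CurrencyUnit;
    import org.joda.money.Money;
    import org.springframework.core.convert.converter.Converter;
    import org.springframework.data.convert.ReadingConverter;
    import org.springframework.data.convert.WritingConverter;

    // Written to ES: Money becomes its amount in minor units
    @WritingConverter
    public class MoneyToNumber implements Converter<Money, Number> {
        @Override
        public Number convert(Money source) {
            long value = source.getAmountMinorLong();
            return value;
        }
    }

    // Read from ES: the minor-unit amount becomes a CNY Money again
    @ReadingConverter
    public class NumberToMoney implements Converter<Number, Money> {
        @Override
        public Money convert(Number source) {
            return Money.ofMinor(CurrencyUnit.of("CNY"), source.longValue());
        }
    }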
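The converters above store a price as a whole number of minor units. The arithmetic behind that can be sketched with the JDK alone; this sketch does not use joda-money, hard-codes a scale of 2 decimal places as an assumption for CNY, and the class name MinorUnitsSketch is hypothetical.

```java
import java.math.BigDecimal;

final class MinorUnitsSketch {

    // Assumption for this sketch: the currency has 2 decimal places, as CNY does.
    private static final int SCALE = 2;

    /** Converts a major-unit amount such as 120.00 into minor units such as 12000. */
    static long toMinor(BigDecimal amount) {
        // longValueExact throws ArithmeticException if a fraction of a minor unit remains.
        return amount.movePointRight(SCALE).longValueExact();
    }

    /** Converts minor units such as 12000 back into a major-unit amount such as 120.00. */
    static BigDecimal fromMinor(long minor) {
        return BigDecimal.valueOf(minor, SCALE);
    }

    public static void main(String[] args) {
        System.out.println(fromMinor(12000));                  // 120.00
        System.out.println(toMinor(new BigDecimal("120.00"))); // 12000
    }
}
```

Storing the integer form keeps prices exact and makes them sortable as a plain long field, which is what the price mapping of the domain model relies on.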
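The Jackson2ObjectMapperBuilderCustomizer bean enables indented output on the ObjectMapper, which ClientRunner later uses to print JSON.
The ElasticsearchCustomConversions bean registers the custom converters for the Money type: a Money is converted to a Number when it is written to ES, and a Number read from ES is converted back to a Money.
The CommandLineRunner bean runs the run() method it defines once the application has started.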
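    import com.fasterxml.jackson.core.JsonProcessingException;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import lombok.extern.slf4j.Slf4j;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.sort.SortBuilders;
    import org.elasticsearch.search.sort.SortOrder;
    import org.joda.money.CurrencyUnit;
    import org.joda.money.Money;
    import org.springframework.boot.CommandLineRunner;
    import org.springframework.data.domain.PageRequest;
    import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
    import org.springframework.data.elasticsearch.core.SearchHit;
    import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
    import org.springframework.data.elasticsearch.core.query.Criteria;
    import org.springframework.data.elasticsearch.core.query.CriteriaQuery;
    import org.springframework.data.elasticsearch.core.query.IndexQuery;
    import org.springframework.data.elasticsearch.core.query.IndexQueryBuilder;
    import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
    import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

    import javax.annotation.Resource;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    @Slf4j
    public class ClientRunner implements CommandLineRunner {

        @Resource
        private ElasticsearchRestTemplate elasticsearchRestTemplate;
        @Resource
        private ProductService productService;
        @Resource
        private ObjectMapper objectMapper;

        private static final String LINE_SEP = System.getProperty("line.separator");

        private ThreadPoolExecutor poolExecutor = new ThreadPoolExecutor(
                Runtime.getRuntime().availableProcessors() - 1,
                Runtime.getRuntime().availableProcessors(),
                1, TimeUnit.SECONDS, new ArrayBlockingQueue<>(100));
        private CountDownLatch cdl = new CountDownLatch(1);

        @Override
        public void run(String... args) throws Exception {
            productService.deleteAll();
            // Prepare the data
            Product product = Product.builder()
                    .id(1L)
                    .name("Bicycle")
                    .price(Money.ofMinor(CurrencyUnit.of("CNY"), 12000))
                    .build();
            Product product2 = Product.builder()
                    .id(2L)
                    .name("Motorcycle")
                    .price(Money.ofMinor(CurrencyUnit.of("CNY"), 300000))
                    .build();
            poolExecutor.execute(() -> {
                // [1] Save and query through the repository
                productService.save(product);
                productService.queryProductByName("Bicycle");
                // [2] Save through the template, then query with a CriteriaQuery
                saveProduct(product2);
                log.info("Product(id=2) exists: {}",
                        elasticsearchRestTemplate.exists("2", Product.class));
                Criteria criteria = new Criteria("name").is("Motorcycle");
                CriteriaQuery criteriaQuery = new CriteriaQuery(criteria);
                for (SearchHit<Product> hit : elasticsearchRestTemplate
                        .search(criteriaQuery, Product.class).getSearchHits()) {
                    processProduct(hit.getContent());
                }
                // [3] NativeSearchQuery: match_all, first page of 20, sorted by price ascending
                NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
                        .withQuery(QueryBuilders.matchAllQuery())
                        .withPageable(PageRequest.of(0, 20))
                        .withSorts(SortBuilders.fieldSort("price").order(SortOrder.ASC))
                        .build();
                for (SearchHit<Product> hit : elasticsearchRestTemplate
                        .search(nativeSearchQuery, Product.class).getSearchHits()) {
                    processProduct(hit.getContent());
                }
                cdl.countDown();
            });
            // Wait for the worker to finish (at most one minute), then exit
            cdl.await(1, TimeUnit.MINUTES);
            System.exit(0);
        }

        private void saveProduct(Product product) {
            IndexQuery idxQuery = new IndexQueryBuilder()
                    .withId(String.valueOf(product.getId()))
                    .withObject(product)
                    .build();
            elasticsearchRestTemplate.index(idxQuery, IndexCoordinates.of("products"));
            log.info("template save Product: {}", product);
            try {
                // Pause for a second before the document is queried
                TimeUnit.SECONDS.sleep(1);
            } catch (InterruptedException e) {
                log.error(e.getMessage());
                return;
            }
        }

        private void processProduct(Product content) {
            try {
                log.info("query data by template:{}{}", LINE_SEP,
                        objectMapper.writeValueAsString(content));
            } catch (JsonProcessingException e) {
                log.error(e.getMessage());
                return;
            }
        }
    }
Pay particular attention to CriteriaQuery and NativeSearchQuery; after the REST API examples earlier in this article they should be easy to follow.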