Elasticsearch Quick Start

1. Introduction

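Elasticsearch ("ES" for short) is a distributed search and analytics engine. Logstash and Beats store the data they collect in ES. Kibana provides a user-friendly way to explore, monitor, and visualize the data in ES, including visual reports.

  • Elasticsearch: the data warehouse, the space where data is stored
  • Logstash/Beats: the warehouse purchasers and porters, which collect and classify data
  • Kibana: the warehouse manager, which analyzes the data and then presents it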

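The ES data model: documents and indices

ES serializes data into JSON documents for storage. A document is a collection of fields (key-value pairs), and an index is an optimized collection of documents. Fields of a text type are stored in an inverted index, which supports fast full-text search, whereas numeric and geo fields are stored in BKD trees.
An inverted index lists every unique term that appears in any document, no matter which document it is in or how often it occurs, and identifies all the documents in which that term appears.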
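
As a rough illustration of the inverted index idea, the following Java sketch maps each unique term to the IDs of the documents that contain it. It is a toy under stated assumptions: the class name, the whitespace/punctuation tokenizer, and the sample sentences are made up for this example, and Lucene's real data structures are far more elaborate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: each unique term points to the sorted IDs of the documents containing it.
final class InvertedIndexSketch {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Lower-cases and splits on anything that is not a letter or digit.
    // Assumption: this stands in for a real analyzer such as ik_smart.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Records the document ID once per term, however often the term occurs in the text.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the IDs of all documents containing the term (empty if none).
    SortedSet<Integer> lookup(String term) {
        SortedSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Integer>emptySortedSet() : Collections.unmodifiableSortedSet(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "The quick brown fox");
        index.add(2, "The quick quick dog");
        index.add(3, "A lazy dog");
        System.out.println("quick -> " + index.lookup("quick")); // quick -> [1, 2]
        System.out.println("dog   -> " + index.lookup("dog"));   // dog   -> [2, 3]
        System.out.println("cat   -> " + index.lookup("cat"));   // cat   -> []
    }
}
```

A lookup is a single map access per term, which is why full-text search over text fields can stay fast as the number of documents grows.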

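Schema-less

The schema constraints on writing documents are flexible: you do not have to declare in advance how many fields a document has or what their types are. Even if you do declare them, you can still store fields that were not declared. For example:
to store book information you declare the properties id, name, and price in advance, yet at write time you can still write data for a description field.
A relational database would not allow this: you must model the data (define a schema) before storing anything, and writes are strictly constrained by it.

The ES counterpart of a schema is the mapping.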

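Search and analytics

  • Search: structured queries through the REST API, essentially written in a JSON-style domain-specific language (Query DSL)
  • Analytics: aggregation queries that summarize the data, such as averages, medians, and so on

Scalability and resilience

ES is a distributed search and analytics engine. Replica copies kept across multiple nodes (and clusters) provide disaster tolerance. Sharding spreads the data of an index fairly evenly over the nodes so that no single node is overloaded, and the cluster stays available as demand changes.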
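
To make the idea of spreading data over shards concrete, here is a minimal Java sketch of routing documents to shards. It assumes the simplified rule "shard = hash(routing value) mod number of primary shards", with the document ID as the routing value, and it uses String.hashCode() as a stand-in hash. ES uses its own hash function and a somewhat more involved formula, and the class and method names here are hypothetical.

```java
import java.util.Arrays;

// Simplified model of routing a document to one of the primary shards of an index.
final class ShardRoutingSketch {
    // Assumption: String.hashCode() stands in for the hash function ES really uses.
    // Math.floorMod keeps the result in [0, numberOfPrimaryShards) even for negative hashes.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int shards = 3;
        int[] docsPerShard = new int[shards];
        // Route 9000 documents with IDs "1".."9000" and count how many land on each shard.
        for (int id = 1; id <= 9000; id++) {
            docsPerShard[shardFor(String.valueOf(id), shards)]++;
        }
        System.out.println("documents per shard: " + Arrays.toString(docsPerShard));
    }
}
```

Because the same routing value always hashes to the same shard, a document can be found again without asking every shard, which is also why the number of primary shards cannot simply be changed after the index is created.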

2. Installing Elasticsearch

Before you install

To make ES easier to work with, install Kibana as well.

Docker must be installed beforehand.

1. Create a network

docker network create elastic

2. Create the esdatadir directory

mkdir esdatadir
mkdir esdatadir/config
touch esdatadir/config/elasticsearch.yml
mkdir esdatadir/data
mkdir esdatadir/logs
mkdir esdatadir/plugins
# Grant read/write permissions (wide open; suitable for a local test setup only)
chmod -R 777 esdatadir

3. Edit elasticsearch.yml

http.host: 0.0.0.0
transport.host: 0.0.0.0
cluster.name: "docker-cluster"
node.name: es01
http.cors.enabled: true
http.cors.allow-origin: "*"

Installing Elasticsearch 7.17.1 with Docker

1. Pull the image

docker pull docker.elastic.co/elasticsearch/elasticsearch:7.17.1

2. Run the container
Name the container es01.

cd esdatadir
docker run -id --name es01 \
-p 9200:9200 \
-p 9300:9300 \
--net elastic \
-v $PWD/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v $PWD/data:/usr/share/elasticsearch/data \
-v $PWD/logs:/usr/share/elasticsearch/logs \
-v $PWD/plugins:/usr/share/elasticsearch/plugins \
-e "discovery.type=single-node" \
docker.elastic.co/elasticsearch/elasticsearch:7.17.1
cd ../

3. Verify that it is running
Startup takes roughly 10 seconds; once it has finished, run:

#curl -XGET http://ip:9200
curl -XGET "http://$(ifconfig enp0s3 | head -n2 | grep inet | awk '{print$2}'):9200"

The response looks roughly like this:

{
"name" : "ea912245d40f",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "VpQjM1qHQyup2DUxdJu0mQ",
"version" : {
"number" : "7.17.1",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "e5acb99f822233d62d6444ce45a4543dc1c8059a",
"build_date" : "2022-02-23T22:20:54.153567231Z",
"build_snapshot" : false,
"lucene_version" : "8.11.1",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}

Installing elasticsearch-analysis-ik

1. Download the archive

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.17.1/elasticsearch-analysis-ik-7.17.1.zip

2. Unzip it

mkdir ik
unzip elasticsearch-analysis-ik-7.17.1.zip -d ik

3. Copy it into the plugins directory

cp -r ik esdatadir/plugins/ik

4. Verify the copy

docker exec -it es01 ls plugins/ik -alh

If the listing shows the 8 main entries, the copy succeeded.

5. Restart the container

docker restart es01

Installing Kibana 7.17.1 with Docker

1. Pull the image

docker pull docker.elastic.co/kibana/kibana:7.17.1

2. Run the container

docker run --name kib01 \
--net elastic \
-p 5601:5601 \
-e "ELASTICSEARCH_HOSTS=http://es01:9200" \
docker.elastic.co/kibana/kibana:7.17.1

Press Ctrl+C to exit.
To run it again, just use docker start kib01.

3. Open Kibana
Open http://{ip/host}:5601 in a browser.

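3. Search

Search is available in two ways: the REST API and the Java Client.
Front-end UI components can access ES directly by calling the REST API. Back-end code can access ES through the Java Client, which under the hood makes its calls through a REST HTTP client.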
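
Since every client ultimately sends JSON over HTTP, a search request can be pictured as nothing more than a URL plus a JSON body. The Java sketch below builds (but does not send) both pieces for a paginated term query. The helper names, the base URL http://localhost:9200, and the index and field names are assumptions chosen for illustration, and JSON escaping is deliberately left out.

```java
import java.net.URI;

// Builds, without sending, the two pieces of a search request against the REST API.
final class SearchRequestSketch {
    // Hypothetical helper: the URI of the _search endpoint with pagination parameters.
    static URI searchUri(String baseUrl, String index, int from, int size) {
        return URI.create(baseUrl + "/" + index + "/_search?from=" + from + "&size=" + size);
    }

    // Hypothetical helper: a Query DSL term query as a JSON string.
    // Escaping of field and value is omitted for brevity, so only pass plain text.
    static String termQueryBody(String field, String value) {
        return "{\"query\":{\"term\":{\"" + field + "\":\"" + value + "\"}}}";
    }

    public static void main(String[] args) {
        URI uri = searchUri("http://localhost:9200", "my-index-000001", 0, 20);
        String body = termQueryBody("user.id", "kimchy");
        // The search API accepts the body with POST as well as GET.
        System.out.println("POST " + uri);
        System.out.println("Content-Type: application/json");
        System.out.println();
        System.out.println(body);
    }
}
```

Any HTTP client can send the printed request; the Java Client adds typed builders and response mapping on top of the same exchange.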

3.1 REST API

In Kibana, follow the menu path “Management” -> “Dev Tools” -> “Console” to reach the panel for calling the API. “Help” lists the keyboard shortcuts and explains how to send requests.

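Index operations

The create index request: while creating an index you can specify its settings, the mappings of its fields, and aliases for the index.

# Simple create
PUT /my-index-000001
# Simple delete
DELETE /my-index-000001
# Create an index with settings; static settings (e.g. number_of_shards) cannot be updated on a live index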
PUT /my-index-000001
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
# Get the settings
GET /my-index-000001/_settings
# Create an index with mappings; common field types include text, long, and so on
PUT /test
{
"mappings": {
"properties": {
"field1": {
"type": "text"
}
}
}
}
# Get the mapping
GET /test/_mapping
# Create an index with aliases
PUT /logs
{
"aliases": {
"<logs_{now/M}>": {}
}
}
# Get the aliases
GET /logs/_alias

Updating a mapping

PUT /my-index-000001/_mapping
{
"properties": {
"email": {
"type": "keyword"
}
}
}

Getting the mapping of a single field

PUT /publications
{
"mappings": {
"properties": {
"id": { "type": "text" },
"title": { "type": "text" },
"abstract": { "type": "text" },
"author": {
"properties": {
"id": { "type": "text" },
"name": { "type": "text" }
}
}
}
}
}
GET /publications/_mapping/field/title

Single-document operations

# Create a document (auto-generated ID)
POST my-index-000001/_doc/
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
# Index a document with the specified ID 1
PUT my-index-000001/_doc/1
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
# Create a document with the specified ID 2; works only if the index has no document with that ID
PUT my-index-000001/_create/2
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
# Create a document with the specified ID 3; works only if the index has no document with that ID
PUT my-index-000001/_doc/3?op_type=create
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
# Return only the _source field
GET my-index-000001/_source/1
# Return the whole document
GET my-index-000001/_doc/1
# Update a document (first index one to work on)
PUT test/_doc/1
{
"counter" : 1,
"tags" : ["red"]
}
## counter += 4
POST test/_update/1
{
"script" : {
"source": "ctx._source.counter += params.count",
"lang": "painless",
"params" : {
"count" : 4
}
}
}
## Append the element blue to tags
POST test/_update/1
{
"script": {
"source": "ctx._source.tags.add(params.tag)",
"lang": "painless",
"params": {
"tag": "blue"
}
}
}
## Conditionally remove one element from tags
POST test/_update/1
{
"script": {
"source": "if (ctx._source.tags.contains(params.tag)) { ctx._source.tags.remove(ctx._source.tags.indexOf(params.tag)) }",
"lang": "painless",
"params": {
"tag": "blue"
}
}
}
# Add a field
POST test/_update/1
{
"script" : "ctx._source.new_field = 'value_of_new_field'"
}
# Add a field via a partial document; updates that change nothing are detected as no-ops
POST test/_update/1
{
"doc": {
"name": "new_name"
}
}
# Remove a field
POST test/_update/1
{
"script" : "ctx._source.remove('new_field')"
}
# Remove one subfield from an object field
POST test/_update/1
{
"script": "ctx._source['my-object'].remove('my-subfield')"
}
# Run the script if the document exists; otherwise index the upsert document
POST test/_update/1
{
"script": {
"source": "ctx._source.counter += params.count",
"lang": "painless",
"params": {
"count": 4
}
},
"upsert": {
"counter": 1
}
}

Multi-document operations

# Get multiple documents
GET /my-index-000001/_mget
{
"docs": [
{
"_type": "_doc",
"_id": "1"
},
{
"_type": "_doc",
"_id": "2"
}
]
}
GET /my-index-000001/_mget
{
"ids" : ["1", "2"]
}
# Different operations in one bulk request
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
# Delete all documents that match a query
POST my-index-000001/_delete_by_query?scroll_size=5000
{
"query": {
"term": {
"user.id": "kimchy"
}
}
}
# Update all documents that match a query
POST my-index-000001/_update_by_query
{
"script": {
"source": "ctx._source.count++",
"lang": "painless"
},
"query": {
"term": {
"user.id": "kimchy"
}
}
}

Search APIs

# Paginated match_all search
GET /my-index-000001/_search?from=0&size=20
{
"query": {
"match_all": {}
}
}
# Paginated term search
GET /my-index-000001/_search?from=0&size=20
{
"query": {
"term": {
"user.id": "kimchy"
}
}
}
# match search
GET /my-index-000001/_search
{
"query": {
"match": {
"user.id": {
"query": "kimchy"
}
}
}
}
# range search
GET /my-index-000001/_search
{
"query": {
"range": {
"@timestamp": {
"gte": "now-1d/d"
}
}
}
}
# Sorting
GET /my-index-000001/_search
{
"query": {
"match": {
"user.id": {
"query": "kimchy"
}
}
},
"sort": {
"_id": "desc"
}
}
GET /my-index-000001/_search?sort=_id:desc
{
"query": {
"match": {
"user.id": {
"query": "kimchy"
}
}
}
}
# prefix search
GET /my-index-000001/_search
{
"query": {
"prefix": {
"user.id": {
"value": "ki"
}
}
}
}
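# bool search
## must: the clause must match
## must_not: the clause must not match (the complement of must)
## should: the clause may match, and it is fine if it does not
## filter: the clause must match; unlike must, it does not contribute to the score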
GET _search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"term": {
"count": 2
}
}
}
}
}
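
The clause rules of the bool query can be mimicked with a small in-memory model. The Java sketch below is a toy, not how ES evaluates or scores queries: documents are plain maps, each clause is a predicate, and the "score" simply counts matching must and should clauses. One detail it does mirror is that should clauses are optional only when a must or filter clause is present; with should clauses alone, at least one has to match by default. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model of bool query semantics over in-memory documents.
final class BoolQuerySketch {
    // A clause is a yes/no test on a document (hypothetical type for this sketch).
    interface Clause extends Predicate<Map<String, Object>> {}

    final List<Clause> must = new ArrayList<>();
    final List<Clause> filter = new ArrayList<>();
    final List<Clause> mustNot = new ArrayList<>();
    final List<Clause> should = new ArrayList<>();

    // Returns -1 when the document does not match; otherwise a toy score of
    // one point per matching must or should clause.
    int score(Map<String, Object> doc) {
        int score = 0;
        for (Clause c : must) {
            if (!c.test(doc)) return -1;
            score++;
        }
        for (Clause c : filter) {
            if (!c.test(doc)) return -1; // required, but adds nothing to the score
        }
        for (Clause c : mustNot) {
            if (c.test(doc)) return -1;
        }
        int shouldHits = 0;
        for (Clause c : should) {
            if (c.test(doc)) shouldHits++;
        }
        // Without must or filter clauses, at least one should clause has to match.
        if (must.isEmpty() && filter.isEmpty() && !should.isEmpty() && shouldHits == 0) {
            return -1;
        }
        return score + shouldHits;
    }

    // Convenience factory for a one-field document.
    static Map<String, Object> doc(String field, Object value) {
        Map<String, Object> d = new HashMap<>();
        d.put(field, value);
        return d;
    }

    public static void main(String[] args) {
        BoolQuerySketch query = new BoolQuerySketch();
        query.must.add(d -> true);                                        // match_all
        query.filter.add(d -> Integer.valueOf(2).equals(d.get("count"))); // term: count = 2
        System.out.println(query.score(doc("count", 2))); // 1
        System.out.println(query.score(doc("count", 3))); // -1
    }
}
```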

3.2 Java Client

First steps with elasticsearch-java 7.17.1

Add the Maven dependencies

<dependencies>
<dependency>
<groupId>co.elastic.clients</groupId>
<artifactId>elasticsearch-java</artifactId>
<version>7.17.1</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.12.3</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.12.3</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.12.3</version>
</dependency>
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.2</version>
</dependency>
<dependency>
<groupId>jakarta.json</groupId>
<artifactId>jakarta.json-api</artifactId>
<version>2.0.1</version>
</dependency>
</dependencies>

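The application code below shows the Java client connecting to ES, checking whether the index products exists, and creating it if it does not. It then runs a series of searches step by step: term, match, match all, and so on.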

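import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.mapping.Property;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.endpoints.BooleanResponse;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class ESNativeClient7Application {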
private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper().enable(SerializationFeature.INDENT_OUTPUT);
public static void main(String[] args) throws IOException, InterruptedException {
// Create the low-level client
try (RestClient restClient = RestClient.builder(
new HttpHost("10.119.6.176", 9200)).build();
// Create the transport with a Jackson mapper
ElasticsearchTransport transport = new RestClientTransport(
restClient, new JacksonJsonpMapper())) {
// And create the API client
ElasticsearchClient client = new ElasticsearchClient(transport);
// Create Index
BooleanResponse resp = client.indices().exists(e -> e.index("products"));
if (!resp.value()) {
client.indices().create(c -> c
.index("products")
.mappings(m -> m
.properties("name", Property.of(o -> o
.text(t -> t
.store(true)
.index(true)
.analyzer("ik_smart"))
)
)
).settings(s -> s
.numberOfShards("3")
.numberOfReplicas("2")
).aliases("<products{now/M}>", a -> a)
);
client.index(c -> c
.index("products")
.id("1")
.document(Product.builder().name("bicycle")
.build()));
}
// Search
SearchResponse<Product> search1 = client.search(s -> s
.index("products")
.query(q -> q
.term(t -> t
.field("name")
.value(v -> v.stringValue("bicycle"))
)),
Product.class);
for (Hit<Product> hit : search1.hits().hits()) {
processProduct(hit.source());
}
SearchResponse<Product> search2 = client.search(s -> s
.index("products")
.query(q -> q
.match(m -> m
.field("name")
.query("bicycle")
)),
Product.class);
for (Hit<Product> hit : search2.hits().hits()) {
processProduct(hit.source());
}
SearchResponse<Product> search3 = client.search(s -> s
.index("products")
.query(q -> q.matchAll(v -> v.queryName("name"))),
Product.class);
for (Hit<Product> hit : search3.hits().hits()) {
processProduct(hit.source());
}
SearchResponse<Product> search4 = client.search(s -> s
.index("products")
.query(q -> q
.prefix(p -> p
.field("name")
.value("bi"))),
Product.class);
for (Hit<Product> hit : search4.hits().hits()) {
processProduct(hit.source());
}
SearchResponse<Product> search5 = client.search(s -> s
.index("products")
.query(q -> q
.bool(b -> b
.must(m -> m
.matchAll(v -> v))
.filter(f -> f
.term(t -> t
.field("name")
.value(v -> v.stringValue("bicycle")))))),
Product.class);
for (Hit<Product> hit : search5.hits().hits()) {
processProduct(hit.source());
}
TimeUnit.SECONDS.sleep(1);
}
}
private static void processProduct(Product source) throws JsonProcessingException {
String jsonStr = OBJECT_MAPPER.writeValueAsString(source);
System.out.println(jsonStr);
}
}

It uses the Product entity:

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@JsonIgnoreProperties(ignoreUnknown = true)
public class Product {
@JsonProperty("name")
private String name;
}

Integrating with Spring Boot

Create a Spring Boot Maven project at start.spring.io: choose version 2.6.4, JDK 8, and jar packaging.
Add the dependencies:

  • spring-boot-starter-web
  • spring-boot-configuration-processor
  • spring-boot-starter-data-elasticsearch
  • spring-boot-starter-test
  • lombok
  • joda-money 1.0.1

Create the domain model

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Document(indexName = "products", writeTypeHint = WriteTypeHint.DEFAULT)
public class Product {
@Id
private Long id;
@Field(type = FieldType.Text, store = true, analyzer = "ik_smart")
private String name;
@Field(type = FieldType.Long, store = true)
private Money price;
}

The @Document annotation configures the index name, and @Field configures the mapping.

Create the repository

public interface ProductRepository extends ElasticsearchRepository<Product, Long> {
Product findByName(String name);
}

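ElasticsearchRepository is used much like a JPA repository: define an interface that extends it and, as the business requires, declare custom query methods whose naming rules match those of Spring Data JPA. A method name typically starts with find, followed by By and the fields to filter on; multiple fields are joined with And/Or, and each field can be followed by an operator such as Like, In, GreaterThan, and so on.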
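
To show how such a method name breaks down into conditions, here is a toy Java sketch that splits a name like findByNameAndPriceGreaterThan into its field and operator parts. It only illustrates the naming convention and is not Spring Data's parser; the class, the short operator list, and the example method names are all hypothetical, and the sketch ignores whether fields are joined by And or Or.

```java
import java.util.ArrayList;
import java.util.List;

// Toy decomposition of a derived query method name into "field Operator" pairs.
final class DerivedQueryNameSketch {
    // A few of the operator keywords; the real list is considerably longer.
    private static final String[] OPERATORS = {"GreaterThan", "Like", "In"};

    static List<String> describe(String methodName) {
        String prefix = "findBy";
        if (!methodName.startsWith(prefix) || methodName.length() == prefix.length()) {
            throw new IllegalArgumentException("expected findBy<Field>...: " + methodName);
        }
        List<String> conditions = new ArrayList<>();
        // Split on And/Or only between a lower-case letter or digit and an upper-case letter,
        // so a field such as "OrderId" is not cut at its leading "Or".
        String[] parts = methodName.substring(prefix.length())
                .split("(?<=[a-z0-9])(?:And|Or)(?=[A-Z])");
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("empty condition in: " + methodName);
            }
            String operator = "Is"; // no keyword means plain equality
            for (String candidate : OPERATORS) {
                if (part.length() > candidate.length() && part.endsWith(candidate)) {
                    operator = candidate;
                    part = part.substring(0, part.length() - candidate.length());
                    break;
                }
            }
            conditions.add(Character.toLowerCase(part.charAt(0)) + part.substring(1) + " " + operator);
        }
        return conditions;
    }

    public static void main(String[] args) {
        System.out.println(describe("findByName"));
        System.out.println(describe("findByNameAndPriceGreaterThan"));
        System.out.println(describe("findByNameLikeOrIdIn"));
    }
}
```

In a real repository you only declare the method signature; Spring Data derives the query from the name at startup.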

Create the service

@Service
@Slf4j
public class ProductService {
@Resource
private ProductRepository productRepository;
public Optional<Product> queryProductByName(String name) {
Optional<Product> queriedProduct = Optional.ofNullable(productRepository.findByName(name));
queriedProduct.ifPresent(o -> {
log.info("query product by repository: {}", o);
});
return queriedProduct;
}
public void deleteAll() {
productRepository.deleteAll();
log.info("index products deleted all");
}
public void save(Product product) {
productRepository.save(product);
log.info("repository save Product: {}", product);
}
}

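ProductService implements the business code on top of the repository's read and write operations; the business logic here is fairly simple.

Write the application context configuration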

@SpringBootApplication
@EnableElasticsearchRepositories
public class ESSpringClientApplication {
public static void main(String[] args) {
SpringApplication app = new SpringApplicationBuilder()
.sources(ESSpringClientApplication.class)
.web(WebApplicationType.NONE)
.build();
app.run(args);
}
@Bean
public Jackson2ObjectMapperBuilderCustomizer customizer() {
return builder -> builder.indentOutput(true);
}
@Bean
public ElasticsearchCustomConversions elasticsearchCustomConversions() {
return new ElasticsearchCustomConversions(
Arrays.asList(new NumberToMoney(), new MoneyToNumber()));
}
@Bean
CommandLineRunner run() {
return new ClientRunner();
}
}

Write the read and write converters for the Money type

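import org.joda.money.CurrencyUnit;
import org.joda.money.Money;
import org.springframework.core.convert.converter.Converter;
import org.springframework.data.convert.ReadingConverter;
import org.springframework.data.convert.WritingConverter;

// Write side: store a Money as its amount in minor units (fen for CNY)
@WritingConverter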
public class MoneyToNumber implements Converter<Money, Number> {
@Override
public Number convert(Money source) {
long value = source.getAmountMinorLong();
return value;
}
}
@ReadingConverter
public class NumberToMoney implements Converter<Number, Money> {
@Override
public Money convert(Number source) {
return Money.ofMinor(CurrencyUnit.of("CNY"), source.longValue());
}
}
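
The converters above store a Money as a whole number of minor units and rebuild it on read. The arithmetic behind that round trip can be sketched with the JDK alone, as below. This is an illustration under assumptions, not what Joda-Money does internally: the class and method names are made up, and java.util.Currency supplies the number of fraction digits (2 for CNY).

```java
import java.math.BigDecimal;
import java.util.Currency;

// Stdlib-only illustration of the minor-unit round trip performed by the converters.
final class MinorUnitsSketch {
    // 120.00 CNY -> 12000: shift the decimal point right by the currency's fraction digits.
    // longValueExact throws ArithmeticException if a fraction of a minor unit would be lost.
    static long toMinor(BigDecimal amount, Currency currency) {
        return amount.movePointRight(currency.getDefaultFractionDigits()).longValueExact();
    }

    // 12000 -> 120.00 CNY: the inverse shift.
    static BigDecimal fromMinor(long minor, Currency currency) {
        return BigDecimal.valueOf(minor, currency.getDefaultFractionDigits());
    }

    public static void main(String[] args) {
        Currency cny = Currency.getInstance("CNY");
        long stored = toMinor(new BigDecimal("120.00"), cny);
        System.out.println(stored);                 // 12000
        System.out.println(fromMinor(stored, cny)); // 120.00
    }
}
```

Storing an integer number of minor units keeps the price field a plain long in ES, so range queries and sorting work on it without floating-point rounding.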

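The Jackson2ObjectMapperBuilderCustomizer bean enables indented output on the ObjectMapper, which ClientRunner later uses to print JSON.
The ElasticsearchCustomConversions bean registers the custom converters for the Money type: a Money is written to ES as a Number, and a Number read from ES is converted back into a Money.
The CommandLineRunner bean runs its run() method after the application has started.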

@Slf4j
public class ClientRunner implements CommandLineRunner {
@Resource
private ElasticsearchRestTemplate elasticsearchRestTemplate;
@Resource
private ProductService productService;
@Resource
private ObjectMapper objectMapper;
private static final String LINE_SEP = System.getProperty("line.separator");
private ThreadPoolExecutor poolExecutor = new ThreadPoolExecutor(Runtime.getRuntime().availableProcessors() - 1,
Runtime.getRuntime().availableProcessors(), 1, TimeUnit.SECONDS, new ArrayBlockingQueue<>(100));
private CountDownLatch cdl = new CountDownLatch(1);
@Override
public void run(String... args) throws Exception {
productService.deleteAll();
// Prepare the data
Product product = Product.builder()
.id(1L)
.name("Bicycle")
.price(Money.ofMinor(CurrencyUnit.of("CNY"), 12000))
.build();
Product product2 = Product.builder()
.id(2L)
.name("Motorcycle")
.price(Money.ofMinor(CurrencyUnit.of("CNY"), 300000))
.build();
poolExecutor.execute(() -> {
// [1]
productService.save(product);
productService.queryProductByName("Bicycle");
// [2]
saveProduct(product2);
log.info("Product(id=2) exists: {}", elasticsearchRestTemplate.exists("2", Product.class));
Criteria criteria = new Criteria("name").is("Motorcycle");
CriteriaQuery criteriaQuery = new CriteriaQuery(criteria);
for (SearchHit<Product> hit : elasticsearchRestTemplate.search(criteriaQuery, Product.class).getSearchHits()) {
processProduct(hit.getContent());
}
// [3]
NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
.withQuery(QueryBuilders.matchAllQuery())
.withPageable(PageRequest.of(0, 20))
.withSorts(SortBuilders.fieldSort("price").order(SortOrder.ASC))
.build();
for (SearchHit<Product> hit : elasticsearchRestTemplate.search(nativeSearchQuery, Product.class).getSearchHits()) {
processProduct(hit.getContent());
}
cdl.countDown();
});
cdl.await(1, TimeUnit.MINUTES);
System.exit(0);
}
private void saveProduct(Product product) {
IndexQuery idxQuery = new IndexQueryBuilder()
.withId(String.valueOf(product.getId()))
.withObject(product)
.build();
elasticsearchRestTemplate.index(idxQuery, IndexCoordinates.of("products"));
log.info("template save Product: {}", product);
try {
TimeUnit.SECONDS.sleep(1);
} catch (InterruptedException e) {
log.error(e.getMessage());
return;
}
}
private void processProduct(Product content) {
try {
log.info("query data by template:{}{}", LINE_SEP, objectMapper.writeValueAsString(content));
} catch (JsonProcessingException e) {
log.error(e.getMessage());
return;
}
}
}

Pay particular attention to CriteriaQuery and NativeSearchQuery; after the REST API examples above, they should be easy to follow.

posted @ 槎城侠客