Elasticsearch CRUD
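Introduction
Elasticsearch is a distributed, near-real-time search engine:
- Elasticsearch wraps the Java library Apache Lucene and adds an easy-to-use RESTful API plus advanced features such as distribution
- By default, Elasticsearch runs a refresh every second, which makes newly indexed documents searchable
Elasticsearch indexes documents with an inverted index, which is what enables full-text search.
MySQL's InnoDB engine indexes data with B+ trees and follows the leftmost-prefix rule, so a fuzzy match such as %key% cannot use the index. In that case, Elasticsearch can provide the full-text search.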
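To make the inverted-index idea concrete, here is a minimal Python sketch. It is a toy model, not how Lucene is implemented: `tokenize` is a simplified stand-in for a real analyzer, all function names are made up for illustration, and the lookup simply requires every query term to be present, with no relevance scoring.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Toy analyzer (assumption): lowercase, then split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every token of the query."""
    token_sets = [index.get(token, set()) for token in tokenize(query)]
    if not token_sets:
        return set()
    return set.intersection(*token_sets)


docs = {
    1: "happy birthday!",
    2: "Happy BirthDay My Friend!",
    3: "this is really cool",
}
index = build_inverted_index(docs)
print(sorted(search(index, "birthday")))      # [1, 2]
print(sorted(search(index, "happy friend")))  # [2]
```

The term-to-documents map answers "which documents contain this word" with a direct lookup, which is the question a B+ tree over whole strings cannot answer for a pattern like %key%.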
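Elasticsearch's RESTful API exchanges data as JSON.
Request scripts and import data: https://github.com/liaozibo-dev/elasticsearch-bulk-api-data
Acknowledgements
Thanks to the original author, Mr. Liu (刘老师), for publishing his Elasticsearch articles and videos:
- Original author's blog: Elastic:开发者上手指南 - Elastic 中国社区官方博客
- Original author's videos: elasticstack - Bilibili (B站)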
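Basic concepts
Elasticsearch basic concepts:
- Document: comparable to a row in MySQL. Documents are JSON. On insert, text fields are analyzed into terms and indexed; on search, the query is analyzed as well and results are returned by relevance
- Index: comparable to a database in MySQL
- Type: comparable to a table in MySQL. Since Elasticsearch 6.0 an index can hold only one type; types were deprecated in 7.0 and removed in 8.0, so what remains is _doc in the request path
Elasticsearch has two field types for strings, text and keyword;
text values are analyzed (split into terms);
keyword values are kept as a single whole and are not analyzed;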
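The difference between text and keyword can be sketched in a few lines of Python. This is an illustration only: `analyze` is a simplified stand-in for Elasticsearch's analyzers, and `index_terms` is a hypothetical helper, not an Elasticsearch API.

```python
import re


def analyze(value):
    """Toy analyzer (assumption): lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", value.lower())


def index_terms(value, field_type):
    """Return the terms this sketch would store for a field of the given type."""
    if field_type == "text":
        return analyze(value)  # analyzed into separate terms
    if field_type == "keyword":
        return [value]         # kept as one exact term
    raise ValueError(f"unsupported field type: {field_type}")


value = "Happy BirthDay My Friend!"
print(index_terms(value, "text"))     # ['happy', 'birthday', 'my', 'friend']
print(index_terms(value, "keyword"))  # ['Happy BirthDay My Friend!']
```

A search for birthday can match the text version through one of its terms, while the keyword version only matches the complete original string.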
Kibana Dev Tools
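Service addresses:
- Elasticsearch: https://localhost:9200/
- Kibana: http://localhost:5601/
- Kibana Dev Tools: http://localhost:5601/app/dev_tools#/console
Use Kibana Dev Tools to send requests to Elasticsearch.
Shortcut | Action |
---|---|
Ctrl/Cmd + I | Auto-indent |
Ctrl/Cmd + / | Open the documentation for the current request |
Ctrl/Cmd + Space | Autocomplete |
Ctrl/Cmd + Enter | Submit the request |
Ctrl/Cmd + Up/Down | Jump to the previous/next request |
Ctrl/Cmd + Alt + L | Collapse/expand the current request |
Ctrl/Cmd + Option + O | Collapse all other requests and expand the current one |
Ctrl/Cmd + L | Go to a line number |
The console can hold multiple requests; separate them with blank lines.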
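Checking Elasticsearch status
View Elasticsearch information:
GET /
View cluster health (the v parameter adds column headers):
GET /_cat/health?v
View node status:
GET /_cat/nodes?v
List all indices:
GET /_cat/indices?v
Click the wrench icon on the right to copy the request as a cURL command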
curl --cacert config\certs\http_ca.crt --ssl-no-revoke -u elastic:password -XGET https://localhost:9200/
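For readers who prefer a script to cURL, the same kind of request can be prepared with Python's standard library. This is a sketch under assumptions: `build_request` is a hypothetical helper, and the user name, password and certificate path are the placeholders from the cURL command above. The request is only built here; sending it requires a running cluster.

```python
import base64
import urllib.request


def build_request(base_url, path, user, password, method="GET"):
    """Build (but do not send) a request with HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        method=method,
        headers={"Authorization": f"Basic {token}"},
    )


# Placeholder credentials, as in the cURL example.
req = build_request("https://localhost:9200", "/_cat/health?v", "elastic", "password")
print(req.get_method(), req.full_url)

# Sending it needs a running cluster and its CA certificate, for example:
#   import ssl
#   ctx = ssl.create_default_context(cafile="config/certs/http_ca.crt")
#   with urllib.request.urlopen(req, context=ctx) as resp:
#       print(resp.read().decode("utf-8"))
```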
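Indices
Creating an index
Elasticsearch can create an index and its mapping automatically (a mapping is comparable to a MySQL table schema: it defines the fields an index contains and their types).
Documents are addressed through _doc. Since Elasticsearch 6.0 an index can have only one type, and from 7.0 on _doc in the path is the endpoint name rather than a user-defined type.
# The index is created automatically when a document is inserted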
PUT twitter/_doc/1
{
"user": "GB",
"uid": 1,
"city": "Beijing",
"province": "Beijing",
"country": "China"
}
# View the mapping
GET twitter/_mapping
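Multi-fields: one field can be indexed in several ways for different purposes.
For example, a field can be mapped as text so that it is analyzed, and also as keyword so that it is treated as a single unanalyzed value that supports aggregations and sorting.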
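Data type | Purpose |
---|---|
text | Full-text search strings |
keyword | Exact string matching and aggregations |
date, date_nanos | Strings formatted as dates, or numeric dates |
byte, short, integer, long | Integers |
boolean | Booleans |
float, double, half_float | Floating-point numbers |
object, nested | Hierarchical types |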
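As an illustration of the multi-field idea described above, the following Python snippet builds a mapping body in which one field is indexed both as text and as keyword. The field name city and the sub-field name raw are hypothetical choices for this sketch; the body is only printed, not sent.

```python
import json

# Hypothetical mapping: "city" is analyzed as text, while the sub-field
# "city.raw" keeps the whole value as a keyword for sorting and aggregations.
mapping = {
    "mappings": {
        "properties": {
            "city": {
                "type": "text",
                "fields": {
                    "raw": {"type": "keyword"}
                },
            }
        }
    }
}

# Body that could be sent with: PUT <index name>
print(json.dumps(mapping, indent=2))
```

With such a mapping, full-text queries would target city and exact matches, sorting or aggregations would target city.raw.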
# Create the index test manually, with a keyword field id and a text field message
PUT test
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"message": {
"type": "text"
}
}
}
}
# Add a new long field, age
PUT test/_mapping
{
"properties": {
"age": {
"type": "long"
}
}
}
Inspecting an index
# View the mapping
GET test/_mapping
# Check whether the index exists
HEAD twitter
Deleting an index
# Delete the index
DELETE twitter
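Closing an index
A closed index blocks read and write operations.
- A closed index adds almost no overhead to the cluster, apart from maintaining its metadata
- A closed index still occupies disk space, which can be substantial
# Close the index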
POST twitter/_close
# Reopen the index
POST twitter/_open
Documents
Refresh: makes document changes visible to search
Near real time: by default, Elasticsearch refreshes on a timer once per second
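The effect of a refresh can be modeled with a small Python class. This is a conceptual toy, not Elasticsearch's implementation: `ToyIndex` and its methods are invented for illustration, and the two dictionaries merely stand in for "written but not yet searchable" and "searchable" data.

```python
class ToyIndex:
    """Toy model of near-real-time search: writes are buffered until a refresh."""

    def __init__(self):
        self._buffer = {}      # written but not yet searchable
        self._searchable = {}  # visible to search

    def index(self, doc_id, doc, refresh=False):
        """Store a document; optionally refresh right away (like ?refresh=true)."""
        self._buffer[doc_id] = doc
        if refresh:
            self.refresh()

    def refresh(self):
        """Make every buffered document visible to search."""
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search(self, field, value):
        """Return ids of searchable documents whose field equals value."""
        return sorted(
            doc_id
            for doc_id, doc in self._searchable.items()
            if doc.get(field) == value
        )


idx = ToyIndex()
idx.index(1, {"user": "GB", "city": "Beijing"})
print(idx.search("city", "Beijing"))  # [] because no refresh has happened yet
idx.refresh()                         # what the periodic refresh would do
print(idx.search("city", "Beijing"))  # [1]
```

The model covers search only; in Elasticsearch, fetching a document by id is real-time by default and does not wait for a refresh.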
refresh
# A timer refreshes every second by default; this parameter triggers a refresh immediately
PUT twitter/_doc/1?refresh=true
{
"user": "GB",
"uid": 1,
"city": "Beijing",
"province": "Beijing",
"country": "China"
}
# refresh=wait_for acts like a synchronous call: it returns only after a refresh has made the document visible
PUT twitter/_doc/1?refresh=wait_for
{
"user": "GB",
"uid": 1,
"city": "Beijing",
"province": "Beijing",
"country": "China"
}
Creating documents
# Insert or update a document
# If the document already exists it is updated; otherwise it is inserted
PUT twitter/_doc/1
{
"user": "GB",
"uid": 1,
"city": "Shenzhen",
"province": "Guangdong",
"country": "China"
}
# Create a document; fails if the id already exists
POST twitter/_create/2
{
"user": "GB",
"uid": 2,
"city": "Shenzhen",
"province": "Guangdong",
"country": "China"
}
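# op_type=create creates the document and fails if the id already exists
# op_type=index creates the document, or replaces it if the id already exists
# When the URL does not specify an id explicitly, Elasticsearch assigns one such as n7XW3oMBLht_QWdmdH48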
POST twitter/_doc?op_type=create
{
"user": "双榆树-张三",
"message": "今儿天气不错啊,出去转转去",
"uid": 2,
"age": 20,
"city": "北京",
"province": "北京",
"country": "中国",
"address": "中国北京市海淀区",
"location": {
"lat": "39.970718",
"lon": "116.325747"
}
}
An update here means a full overwrite by default, not a partial update
# Auto-generated ids require POST; PUT cannot be used
POST my_index/_doc
{
"content": "this is really cool"
}
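Inserting with an auto-generated id is faster, because Elasticsearch does not have to check whether the document already exists to choose between an update and an insert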
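The difference between the index and create operations, and the reason auto-generated ids can skip the existence check, can be sketched with a dictionary-backed toy store. `ToyStore` is hypothetical, and the uuid-based id is an assumption made for the sketch; Elasticsearch generates its own ids in a different format.

```python
import uuid


class ToyStore:
    """Toy model of the index and create operations on a single index."""

    def __init__(self):
        self.docs = {}

    def index(self, doc, doc_id=None):
        """op_type=index: insert, or fully replace an existing document."""
        if doc_id is None:
            # Hypothetical id scheme; Elasticsearch generates its own ids.
            doc_id = uuid.uuid4().hex
            self.docs[doc_id] = dict(doc)  # fresh id, so no existence check
            return doc_id, "created"
        result = "updated" if doc_id in self.docs else "created"
        self.docs[doc_id] = dict(doc)
        return doc_id, result

    def create(self, doc, doc_id):
        """op_type=create: insert only; fail if the id already exists."""
        if doc_id in self.docs:
            raise KeyError(f"document {doc_id} already exists")
        self.docs[doc_id] = dict(doc)
        return doc_id, "created"


store = ToyStore()
print(store.index({"user": "GB", "city": "Beijing"}, doc_id="1"))   # ('1', 'created')
print(store.index({"user": "GB", "city": "Shenzhen"}, doc_id="1"))  # ('1', 'updated')
try:
    store.create({"user": "GB"}, doc_id="1")
except KeyError as exc:
    print("create failed:", exc)
auto_id, _ = store.index({"content": "this is really cool"})
print(auto_id in store.docs)  # True
```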
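Retrieving documents
# Get a document
GET twitter/_doc/1
# Get only the document source
GET twitter/_source/1
# Return only some fields of _source
GET twitter/_doc/1?_source=city,age,province
# Check whether the document exists
HEAD twitter/_doc/1
Updating documents
POST is typically used to create documents and PUT to update them
# Full overwrite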
PUT twitter/_doc/1
{
"user": "GB",
"uid": 1,
"city": "北京",
"province": "北京",
"country": "中国",
"location":{
"lat":"29.084661",
"lon":"111.335210"
}
}
# Partial update
POST twitter/_update/1
{
"doc": {
"city": "成都",
"province": "四川"
}
}
# Update by query
# Update the documents whose user is GB
POST twitter/_update_by_query
{
"query": {
"match": {
"user": "GB"
}
},
"script": {
"source": "ctx._source.city=params.city; ctx._source.province=params.province; ctx._source.country=params.country",
"lang": "painless",
"params": {
"city": "上海",
"province": "上海",
"country": "中国"
}
}
}
# Upsert: partially update the document if it exists; insert it otherwise
POST catalog/_update/3
{
"doc": {
"author": "Albert Paro",
"title": "Elasticsearch 5.0 Cookbook",
"description": "Elasticsearch 5.0 Cookbook Third Edition",
"price": "54.99"
},
"doc_as_upsert": true
}
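The three update styles in this section behave differently, which a short Python sketch can make explicit. The helpers `overwrite` and `partial_update` are hypothetical and operate on a plain dictionary; the merge here is shallow, a simplification of how Elasticsearch merges nested objects in a partial update.

```python
def overwrite(store, doc_id, doc):
    """PUT <index>/_doc/<id>: replace the whole document."""
    store[doc_id] = dict(doc)


def partial_update(store, doc_id, doc, doc_as_upsert=False):
    """POST <index>/_update/<id>: merge fields into an existing document."""
    if doc_id not in store:
        if not doc_as_upsert:
            raise KeyError(f"document {doc_id} is missing")
        store[doc_id] = dict(doc)  # insert the partial doc as a new document
        return
    store[doc_id].update(doc)      # shallow merge (simplification)


store = {"1": {"user": "GB", "city": "北京", "province": "北京"}}

partial_update(store, "1", {"city": "成都", "province": "四川"})
print(store["1"])  # user is kept; city and province change

overwrite(store, "1", {"city": "上海"})
print(store["1"])  # only city is left

partial_update(store, "3", {"title": "Elasticsearch 5.0 Cookbook"}, doc_as_upsert=True)
print(store["3"])  # created because doc_as_upsert is set
```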
Deleting documents
# Delete a document
DELETE twitter/_doc/1
# Delete by query
# Delete the documents whose city is 上海 (Shanghai)
POST twitter/_delete_by_query
{
"query": {
"match": {
"city": "上海"
}
}
}
Bulk operations
Bulk retrieval
# Retrieve several documents at once
GET _mget
{
"docs": [
{
"_index": "twitter",
"_id": 1
},
{
"_index": "twitter",
"_id": 2
}
]
}
# Retrieve several documents, returning only some fields
GET _mget
{
"docs": [
{
"_index": "twitter",
"_id": 1,
"_source": [
"age",
"city"
]
},
{
"_index": "twitter",
"_id": 2,
"_source": [
"age",
"city"
]
}
]
}
# Retrieve several documents, shorthand form
GET twitter/_mget
{
"ids": [1, 2]
}
# Search all documents
GET twitter/_search
# Count the documents
GET twitter/_count
Bulk insert
# Bulk insert or update
POST _bulk
{"index":{"_index":"twitter","_id":1}}
{"user":"双榆树-张三","message":"今儿天气不错啊,出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{"index":{"_index":"twitter","_id":2}}
{"user":"东城区-老刘","message":"出发,下一站云南!","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{"index":{"_index":"twitter","_id":3}}
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{"index":{"_index":"twitter","_id":4}}
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{"index":{"_index":"twitter","_id":5}}
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{"index":{"_index":"twitter","_id":6}}
{"user":"虹桥-老吴","message":"好友来了都今天我生日,好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}
It is best to keep each bulk request to between 1,000 and 5,000 documents, with a total payload of 5 MB to 15 MB
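One way to respect such limits is to split the documents into several bulk bodies before sending them. The sketch below is an assumption-laden illustration: `bulk_lines` and `chunk_bulk` are hypothetical helpers, the default limits are picked from the lower end of the range above, and only index actions are generated. Nothing is sent to Elasticsearch.

```python
import json


def bulk_lines(index_name, docs):
    """Yield (action_line, source_line) pairs for index actions."""
    for doc_id, doc in docs:
        action = {"index": {"_index": index_name, "_id": doc_id}}
        yield (
            json.dumps(action, ensure_ascii=False),
            json.dumps(doc, ensure_ascii=False),
        )


def chunk_bulk(index_name, docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON bodies that stay within the document and byte limits."""
    lines, size, count = [], 0, 0
    for action, source in bulk_lines(index_name, docs):
        pair = f"{action}\n{source}\n"  # every line ends with a newline
        pair_size = len(pair.encode("utf-8"))
        # Start a new body when adding this pair would exceed a limit.
        if count and (count >= max_docs or size + pair_size > max_bytes):
            yield "".join(lines)
            lines, size, count = [], 0, 0
        lines.append(pair)
        size += pair_size
        count += 1
    if lines:
        yield "".join(lines)


docs = [(i, {"user": f"user-{i}", "uid": i}) for i in range(1, 6)]
bodies = list(chunk_bulk("twitter", docs, max_docs=2))
print(len(bodies))  # 3 bodies holding 2, 2 and 1 documents
print(bodies[0], end="")
```

Each body could then be posted to _bulk with the Content-Type application/x-ndjson header.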
# Bulk insert; one failed action does not affect the others
# create fails if the id already exists
# index replaces the document if the id already exists
POST _bulk
{"create":{"_index":"twitter","_id":1}}
{"user":"双榆树-张三","message":"今儿天气不错啊,出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{"index":{"_index":"twitter","_id":2}}
{"user":"东城区-老刘","message":"出发,下一站云南!","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{"index":{"_index":"twitter","_id":3}}
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{"index":{"_index":"twitter","_id":4}}
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{"index":{"_index":"twitter","_id":5}}
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{"index":{"_index":"twitter","_id":6}}
{"user":"虹桥-老吴","message":"好友来了都今天我生日,好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}
Bulk delete
# Bulk delete
POST _bulk
{"delete":{"_index":"twitter","_id":1}}
{"delete":{"_index":"twitter","_id":2}}
Bulk update
# Bulk update
POST _bulk
{"update":{"_index":"twitter","_id":3}}
{"doc":{"city":"长沙"}}
Bulk-importing data
Test data: https://github.com/liaozibo-dev/elasticsearch-bulk-api-data/blob/master/es.json
Run the following command in Git Bash to import 1000 documents
curl --cacert config/certs/http_ca.crt -u elastic:password -s -H "Content-Type:application/x-ndjson" -XPOST https://localhost:9200/_bulk --data-binary @example/es.json
Confirm that 1000 documents were imported
GET bank_account/_count
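Besides counting documents, the _bulk response itself reports which actions failed, since a failed action does not stop the others. The following Python sketch shows one way to pull out the failures; `failed_items` is a hypothetical helper and the sample response is hand-written in the shape of a _bulk response, not real output.

```python
def failed_items(bulk_response):
    """Return (action, id, status, error type) for each failed bulk item."""
    failures = []
    if not bulk_response.get("errors"):
        return failures
    for item in bulk_response.get("items", []):
        # Each item has a single key naming the action: index, create, ...
        for action, result in item.items():
            if "error" in result:
                failures.append(
                    (action, result.get("_id"), result.get("status"),
                     result["error"].get("type"))
                )
    return failures


# Hand-written sample in the shape of a _bulk response (not real output).
sample = {
    "took": 5,
    "errors": True,
    "items": [
        {"create": {"_index": "twitter", "_id": "1", "status": 409,
                    "error": {"type": "version_conflict_engine_exception"}}},
        {"index": {"_index": "twitter", "_id": "2", "status": 200,
                   "result": "updated"}},
    ],
}
for failure in failed_items(sample):
    print(failure)
```

In this sample the create action fails because id 1 already exists, while the index action on id 2 still succeeds.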