Elasticsearch CRUD

Introduction

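Elasticsearch is a distributed, near-real-time search engine:

Elasticsearch wraps the Java library Apache Lucene, adding a simple RESTful API and advanced features such as distributed operation
By default, Elasticsearch refreshes newly indexed documents once per second, which is what makes them searchable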

Elasticsearch indexes documents with an inverted index, which is what enables full-text search.
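To make the idea concrete, here is a toy inverted index in Python. It is only an illustration, not how Lucene actually stores postings (Lucene also records term frequencies, positions and more); tokenize, build_inverted_index and search are hypothetical names, and the tokenizer simply lowercases and splits on non-word characters. The lookup goes from each term straight to the documents that contain it, so matching a word in the middle of a string is cheap.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-word characters: a crude stand-in for a real analyzer
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each term to the set of document ids that contain it
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Analyze the query the same way, then union the matching postings (OR semantics)
    hits: set[str] = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return hits


if __name__ == "__main__":
    messages = {"3": "happy birthday!", "4": "123,gogogo", "5": "Happy BirthDay My Friend!"}
    idx = build_inverted_index(messages)
    print(sorted(search(idx, "birthday")))  # ['3', '5']
```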

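MySQL's InnoDB engine indexes data with B+ trees and follows the leftmost-prefix rule, so the index cannot be used for a fuzzy match such as LIKE '%key%'. In that situation, Elasticsearch can provide full-text search instead.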

Elasticsearch's RESTful API exchanges data as JSON.

Request scripts and import data: https://github.com/liaozibo-dev/elasticsearch-bulk-api-data

Acknowledgements

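Thanks to the original author, Teacher Liu, for publishing Elasticsearch articles and videos:

Original author's blog: Elastic: Developer Getting-Started Guide - Elastic China Community Official Blog
Original author's videos: elasticstack - Bilibili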

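Basic concepts

Basic Elasticsearch concepts:

Document: roughly equivalent to a row in MySQL. Documents are JSON. On insert, the document's text fields are analyzed (tokenized) and indexed; at search time, the query input is analyzed the same way and results are returned ranked by relevance
Index: roughly equivalent to a database in MySQL
Type: roughly equivalent to a table in MySQL. Since Elasticsearch 6.0 an index can contain only one type, and later versions phase types out entirely; an index now has just the single _doc type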

Elasticsearch has two string types, text and keyword:
text values are analyzed (split into terms);
keyword values are treated as a single unit and are not analyzed.
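Here is a toy comparison of the two types. It assumes a simplified analyzer that lowercases and splits on non-word characters (the real standard analyzer is more involved), and match, text_terms and keyword_terms are hypothetical names. A match query analyzes its input the same way as the field. On a text field, "birthday" therefore finds "Happy BirthDay My Friend!", while on a keyword field only the exact original string matches.

```python
import re


def text_terms(value: str) -> set[str]:
    # "text" field: the value is analyzed into lowercase terms (rough approximation)
    return {t for t in re.split(r"\W+", value.lower()) if t}


def keyword_terms(value: str) -> set[str]:
    # "keyword" field: the whole value is a single term, case and punctuation kept
    return {value}


def match(field_value: str, query: str, field_type: str) -> bool:
    # The query input is analyzed with the field's own analysis, then terms are compared
    analyze = text_terms if field_type == "text" else keyword_terms
    return bool(analyze(field_value) & analyze(query))


if __name__ == "__main__":
    msg = "Happy BirthDay My Friend!"
    print(match(msg, "birthday", "text"))     # True
    print(match(msg, "birthday", "keyword"))  # False
    print(match(msg, msg, "keyword"))         # True
```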

Kibana Dev Tools

Service addresses:

Elasticsearch：https://localhost:9200/
Kibana：http://localhost:5601/
Kibana Dev Tools：http://localhost:5601/app/dev_tools#/console

Send requests to Elasticsearch from Kibana Dev Tools

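Shortcut	Action
`Ctrl/Cmd + I`	Auto-indent
`Ctrl/Cmd + /`	Open the documentation for the current request
`Ctrl/Cmd + Space`	Autocomplete
`Ctrl/Cmd + Enter`	Send the request
`Ctrl/Cmd + Up/Down`	Jump to the previous/next request
`Ctrl/Cmd + Alt + L`	Collapse/expand the current request
`Ctrl/Cmd + Option + O`	Collapse all other requests and expand the current one
`Ctrl/Cmd + L`	Go to a specific line number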

The console can hold multiple requests; separate them with blank lines

Check Elasticsearch status

View Elasticsearch information:

GET /

Check cluster health; the v parameter includes column headers in the output:

GET /_cat/health?v

View node status:

GET /_cat/nodes?v

List all indices:

GET /_cat/indices?v

Click the wrench icon to the right of a request to copy it as a cURL command

curl --cacert config\certs\http_ca.crt --ssl-no-revoke -u elastic:password -XGET https://localhost:9200/
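If you prefer calling the REST API from a script rather than from Kibana or curl, the same request can be sketched with Python's standard library. This sketch rests on several assumptions: the host, the elastic/password credentials and the CA certificate path are the placeholders from the curl command above, and build_request and send are hypothetical helper names. The Basic auth header is built by hand, which is what curl's -u option does.

```python
import base64
import json
import ssl
import urllib.request

ES_URL = "https://localhost:9200"     # address used throughout this article
CA_CERT = "config/certs/http_ca.crt"  # CA certificate path from the curl command


def build_request(method: str, path: str, user: str, password: str, body=None) -> urllib.request.Request:
    # Equivalent of curl -u user:password: an HTTP Basic Authorization header
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)


def send(req: urllib.request.Request) -> dict:
    # Equivalent of curl --cacert: trust the cluster's self-signed CA
    ctx = ssl.create_default_context(cafile=CA_CERT)
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.loads(resp.read())
```

For example, send(build_request("GET", "/_cluster/health", "elastic", "password")) should return the cluster health as a dict. The _cat APIs return plain text unless you add format=json, so they are a poor fit for json.loads.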

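Indices

Create an index

Elasticsearch can create indices and mappings automatically (a mapping is like a MySQL table schema: it defines the fields an index contains and their types)

Elasticsearch also creates a type named _doc; starting with Elasticsearch 6.0, an index can have only one type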

# Inserting a document automatically creates the index
PUT twitter/_doc/1
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}

# View the mapping
GET twitter/_mapping
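The mapping returned here was inferred from the document by dynamic mapping. The sketch below is a rough Python version of the default rules: whole numbers become long, floating-point numbers float, booleans boolean, JSON objects object fields (shown as nested properties), and strings text with a keyword sub-field capped by ignore_above 256. infer_mapping is a hypothetical helper. It ignores date detection (strings that look like dates become date fields), arrays and the other settings that affect dynamic mapping.

```python
def infer_mapping(doc: dict) -> dict:
    """Rough sketch of default dynamic mapping for a JSON document (no date detection, no arrays)."""
    props: dict = {}
    for name, value in doc.items():
        if isinstance(value, bool):  # test bool first: bool is a subclass of int in Python
            props[name] = {"type": "boolean"}
        elif isinstance(value, int):
            props[name] = {"type": "long"}
        elif isinstance(value, float):
            props[name] = {"type": "float"}
        elif isinstance(value, str):
            # Strings become text plus a keyword sub-field (a multi-field)
            props[name] = {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        elif isinstance(value, dict):
            props[name] = infer_mapping(value)  # object fields list their own properties
    return {"properties": props}


if __name__ == "__main__":
    import json

    doc = {"user": "GB", "uid": 1, "city": "Beijing", "location": {"lat": "39.970718", "lon": "116.325747"}}
    print(json.dumps(infer_mapping(doc), indent=2))
```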

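Multi-fields: a single field can be indexed in several different ways for different purposes.

For example, a field can be mapped as text so that it is analyzed for full-text search, and also given a keyword sub-field so that it is treated as one unanalyzed value that can be used for aggregations and sorting.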
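The snippet below sketches why the keyword sub-field matters. It pairs a hypothetical mapping (a text field message with a keyword sub-field, which queries refer to as message.keyword) with a toy terms aggregation. Bucketing by whole value is what a terms aggregation on message.keyword does. Bucketing by analyzed terms appears only for contrast: Elasticsearch rejects aggregations on text fields unless fielddata is enabled.

```python
import re
from collections import Counter

# Mapping body for a hypothetical "message" field with a keyword sub-field
MESSAGE_MAPPING = {
    "properties": {
        "message": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword"}},
        }
    }
}


def terms_buckets(values: list[str], use_keyword: bool) -> Counter:
    # keyword sub-field: one bucket per whole value; text: one bucket per analyzed term
    counts: Counter = Counter()
    for value in values:
        if use_keyword:
            counts[value] += 1
        else:
            counts.update(t for t in re.split(r"\W+", value.lower()) if t)
    return counts


if __name__ == "__main__":
    msgs = ["Happy BirthDay My Friend!", "happy birthday!", "happy birthday!"]
    print(terms_buckets(msgs, use_keyword=True))
    print(terms_buckets(msgs, use_keyword=False))
```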

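Data type	Purpose
text	Full-text search strings
keyword	Exact string matching and aggregations
date, date_nanos	Dates, as formatted date strings or numeric timestamps
byte, short, integer, long	Integers
boolean	Booleans
float, double, half_float	Floating-point numbers
object, nested	Hierarchical (nested JSON) data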

# Manually create the index test with a keyword field id and a text field message
PUT test
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "message": {
        "type": "text"
      }
    }
  }
}

# Add a new long field age to the existing mapping
PUT test/_mapping
{
  "properties": {
    "age": {
      "type": "long"
    }
  }
}
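Adding fields this way works, but changing the type of a field that already exists does not; that requires creating a new index and reindexing the data. The sketch below is a toy model of that rule. put_mapping is a hypothetical name, and real Elasticsearch allows a few parameter updates on existing fields that this model ignores.

```python
def put_mapping(existing: dict, addition: dict) -> dict:
    """Toy model of PUT <index>/_mapping: new fields are added, existing types cannot change."""
    merged = dict(existing["properties"])  # copy so the original mapping is untouched
    for name, spec in addition["properties"].items():
        old = merged.get(name)
        if old is not None and old.get("type") != spec.get("type"):
            raise ValueError(
                f"cannot change field [{name}] from [{old.get('type')}] to [{spec.get('type')}]"
            )
        merged.setdefault(name, spec)  # only genuinely new fields are added
    return {"properties": merged}


if __name__ == "__main__":
    test_mapping = {"properties": {"id": {"type": "keyword"}, "message": {"type": "text"}}}
    print(put_mapping(test_mapping, {"properties": {"age": {"type": "long"}}}))
```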

Inspect an index

# View the mapping
GET test/_mapping

# Check whether the index exists
HEAD twitter

Delete an index

# Delete the index
DELETE twitter

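Close an index

Closing an index blocks read and write operations on it.

A closed index does not maintain data structures for indexing or searching, so it adds little overhead to the cluster
A closed index still consumes a significant amount of disk space

# Close the index
POST twitter/_close

# Open the index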
POST twitter/_open

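Documents

refresh operation: makes document changes visible to search

Near real time: by default, Elasticsearch refreshes documents once per second on a timer

refresh

# A timer refreshes documents once per second by default; the refresh parameter forces an immediate refresh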
PUT twitter/_doc/1?refresh=true
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}

# refresh=wait_for works like a synchronous call: the request returns only after a refresh has made the document visible
PUT twitter/_doc/1?refresh=wait_for
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}
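The refresh behaviour can be pictured with a toy model: newly indexed documents sit in a buffer and become searchable only when a refresh publishes them. ToyShard and its methods are hypothetical and greatly simplified (real refreshes open new Lucene segments). refresh=True mirrors ?refresh=true, whereas ?refresh=wait_for would instead wait for the next scheduled refresh. The get method reflects the fact that GET by id is real-time by default, so it can see documents that search cannot find yet.

```python
class ToyShard:
    """Toy model of near-real-time search: new documents become searchable only after a refresh."""

    def __init__(self) -> None:
        self._buffer: dict = {}      # indexed, but not yet visible to search
        self._searchable: dict = {}  # visible to search

    def index(self, doc_id: str, source: dict, refresh: bool = False) -> None:
        self._buffer[doc_id] = source
        if refresh:  # mirrors ?refresh=true: force a refresh right away
            self.refresh()

    def refresh(self) -> None:
        # What the 1-second timer does: publish buffered changes to search
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search(self, field: str, value) -> list:
        return [doc_id for doc_id, src in self._searchable.items() if src.get(field) == value]

    def get(self, doc_id: str):
        # GET by id is real-time, so it also sees documents that are not yet refreshed
        if doc_id in self._buffer:
            return self._buffer[doc_id]
        return self._searchable.get(doc_id)
```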

Create documents

# Index a document (insert or overwrite)
# If the document already exists, it is replaced; otherwise, it is inserted
PUT twitter/_doc/1
{
  "user": "GB",
  "uid": 1,
  "city": "Shenzhen",
  "province": "Guangdong",
  "country": "China"
}

# Create a document; fails if the id already exists
POST twitter/_create/2
{
  "user": "GB",
  "uid": 2,
  "city": "Shenzhen",
  "province": "Guangdong",
  "country": "China"
}

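# op_type=create: create the document; fails if the id already exists
# op_type=index: create the document; overwrites it if the id already exists
# When the URL does not explicitly specify an id, Elasticsearch auto-generates one, such as n7XW3oMBLht_QWdmdH48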
POST twitter/_doc?op_type=create
{
  "user": "双榆树-张三",
  "message": "今儿天气不错啊，出去转转去",
  "uid": 2,
  "age": 20,
  "city": "北京",
  "province": "北京",
  "country": "中国",
  "address": "中国北京市海淀区",
  "location": {
    "lat": "39.970718",
    "lon": "116.325747"
  }
}

"Update" here means overwriting the whole document, not a partial update

# To auto-generate an id you must use POST; PUT does not work
POST my_index/_doc
{
  "content": "this is really cool"
}

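Inserting with auto-generated ids is faster, because Elasticsearch does not have to check whether a document with that id already exists to decide between an update and an insert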
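The differences between index, create and auto-generated ids can be modelled with a small in-memory class. ToyIndex, ConflictError and the method names are hypothetical. In Elasticsearch, a create on an existing id fails with HTTP 409 (version_conflict_engine_exception), each overwrite increments _version, and the response's result field reads created or updated. The 20-character random id below only imitates the shape of Elasticsearch's auto-generated ids.

```python
import uuid


class ConflictError(Exception):
    """Stands in for Elasticsearch's HTTP 409 version_conflict_engine_exception."""


class ToyIndex:
    """Toy in-memory index contrasting index, create and auto-generated ids."""

    def __init__(self) -> None:
        self.docs: dict = {}  # id -> (version, source)

    def index(self, doc_id: str, source: dict) -> str:
        # PUT <index>/_doc/<id> or op_type=index: insert, or overwrite the whole document
        version = self.docs[doc_id][0] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = (version, source)
        return "created" if version == 1 else "updated"

    def create(self, doc_id: str, source: dict) -> str:
        # _create/<id> or op_type=create: refuse to touch an existing id
        if doc_id in self.docs:
            raise ConflictError(doc_id)
        self.docs[doc_id] = (1, source)
        return "created"

    def index_auto_id(self, source: dict) -> str:
        # POST <index>/_doc: a fresh random id, so there is no existing document to look up
        doc_id = uuid.uuid4().hex[:20]
        self.docs[doc_id] = (1, source)
        return doc_id
```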

Get documents

# Get a document
GET twitter/_doc/1

# Get only the document's source
GET twitter/_source/1

# Return only some fields of _source
GET twitter/_doc/1?_source=city,age,province

# Check whether a document exists
HEAD twitter/_doc/1

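Update documents

By convention, POST is typically used to create documents and PUT to update them

# Full overwrite: the new body replaces the entire existing document
PUT twitter/_doc/1
{
  "user": "GB",
  "uid": 1,
  "city": "北京",
  "province": "北京",
  "country": "中国",
  "location": {
    "lat": "29.084661",
    "lon": "111.335210"
  }
}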

# Partial update: only the fields given in doc are changed
POST twitter/_update/1
{
  "doc": {
    "city": "成都",
    "province": "四川"
  }
}

# Update by query
# Update the documents whose user field matches GB
POST twitter/_update_by_query
{
  "query": {
    "match": {
      "user": "GB"
    }
  },
  "script": {
    "source": "ctx._source.city=params.city; ctx._source.province=params.province; ctx._source.country=params.country",
    "lang": "painless",
    "params": {
      "city": "上海",
      "province": "上海",
      "country": "中国"
    }
  }
}

# Upsert: partially update the document if it exists; insert it if it does not
POST catalog/_update/3
{
  "doc": {
    "author": "Albert Paro",
    "title": "Elasticsearch 5.0 Cookbook",
    "description": "Elasticsearch 5.0 Cookbook Third Edition",
    "price": "54.99"
  },
  "doc_as_upsert": true
}
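The update styles above differ in what happens to the fields you do not send. The sketch below models the _update endpoint on a plain dict. doc is merged into the existing source (nested objects are merged recursively), an update that changes nothing is reported as noop, and doc_as_upsert inserts doc when the id is missing. update and deep_merge are hypothetical helpers. A PUT overwrite, by contrast, would simply replace the stored dict.

```python
import copy


def deep_merge(current: dict, patch: dict) -> dict:
    # Partial update: nested objects are merged recursively, other values replaced
    result = copy.deepcopy(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def update(store: dict, doc_id: str, doc: dict, doc_as_upsert: bool = False) -> str:
    """Toy model of POST <index>/_update/<id> with a "doc" body."""
    if doc_id in store:
        merged = deep_merge(store[doc_id], doc)
        if merged == store[doc_id]:
            return "noop"  # nothing changed, so nothing is written
        store[doc_id] = merged
        return "updated"
    if doc_as_upsert:
        store[doc_id] = copy.deepcopy(doc)  # missing id: insert doc as the new document
        return "created"
    # Elasticsearch answers 404 (document_missing_exception) in this case
    raise KeyError(doc_id)
```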

Delete documents

# Delete a document
DELETE twitter/_doc/1

# Delete by query
# Delete the documents whose city matches 上海 (Shanghai)
POST twitter/_delete_by_query
{
  "query": {
    "match": {
      "city": "上海"
    }
  }
}

Bulk operations

Bulk get

# Get multiple documents
GET _mget
{
  "docs": [
    {
      "_index": "twitter",
      "_id": 1
    },
    {
      "_index": "twitter",
      "_id": 2
    }
  ]
}

# Get multiple documents, retrieving only some fields
GET _mget
{
  "docs": [
    {
      "_index": "twitter",
      "_id": 1,
      "_source": [
        "age",
        "city"
      ]
    },
    {
      "_index": "twitter",
      "_id": 2,
      "_source": [
        "age",
        "city"
      ]
    }
  ]
}

# Get multiple documents, shorthand form
GET twitter/_mget
{
  "ids": [1, 2]
}

# Search all documents (by default only the first 10 hits are returned)
GET twitter/_search

# Count the documents
GET twitter/_count

Bulk insert

# Bulk index (insert, or overwrite if the id already exists)
POST _bulk
{"index":{"_index":"twitter","_id":1}}
{"user":"双榆树-张三","message":"今儿天气不错啊，出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{"index":{"_index":"twitter","_id":2}}
{"user":"东城区-老刘","message":"出发，下一站云南！","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{"index":{"_index":"twitter","_id":3}}
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{"index":{"_index":"twitter","_id":4}}
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{"index":{"_index":"twitter","_id":5}}
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{"index":{"_index":"twitter","_id":6}}
{"user":"虹桥-老吴","message":"好友来了都今天我生日，好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}

As a rule of thumb, limit each bulk request to roughly 1,000 to 5,000 documents, with a total payload of about 5 MB to 15 MB
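The _bulk body is newline-delimited JSON: an action line followed (except for delete) by a source line, and the whole body must end with a newline. Here is a minimal sketch of building such a body and splitting documents into batches. bulk_body and chunked are hypothetical names, and the batch size of 1000 simply follows the guideline above; batching by byte size would work too.

```python
import json


def bulk_body(index: str, docs: dict, action: str = "index") -> str:
    # One action line plus one source line per document; the body must end with a newline
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({action: {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source, ensure_ascii=False))  # no indent, so one line per object
    return "\n".join(lines) + "\n"


def chunked(items: dict, size: int = 1000):
    # Yield batches of at most `size` documents so each _bulk request stays manageable
    batch: dict = {}
    for key, value in items.items():
        batch[key] = value
        if len(batch) == size:
            yield batch
            batch = {}
    if batch:
        yield batch
```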

# Bulk insert; a failed operation does not affect the others
# create: fails if the id already exists
# index: overwrites the document if the id already exists
POST _bulk
{"create":{"_index":"twitter","_id":1}}
{"user":"双榆树-张三","message":"今儿天气不错啊，出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{"index":{"_index":"twitter","_id":2}}
{"user":"东城区-老刘","message":"出发，下一站云南！","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{"index":{"_index":"twitter","_id":3}}
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{"index":{"_index":"twitter","_id":4}}
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{"index":{"_index":"twitter","_id":5}}
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{"index":{"_index":"twitter","_id":6}}
{"user":"虹桥-老吴","message":"好友来了都今天我生日，好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}
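Because one failed operation does not abort the rest, it is worth checking the response. The top-level errors flag is true if any item failed, and items contains one result per action, in request order, with an error object on each failure. failed_items is a hypothetical helper, and the sample response in the usage section is hand-written for illustration, not captured output.

```python
def failed_items(response: dict) -> list[tuple[str, str, int, str]]:
    """Collect (action, _id, status, error type) for every failed item in a _bulk response."""
    if not response.get("errors"):
        return []  # errors is false only when every item succeeded
    failures = []
    for item in response["items"]:
        # Each item has exactly one key: the action name (index, create, update or delete)
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((action, result["_id"], result["status"], result["error"]["type"]))
    return failures


if __name__ == "__main__":
    # Hand-written, abbreviated example of the response shape; not captured output
    sample = {
        "errors": True,
        "items": [
            {"create": {"_index": "twitter", "_id": "1", "status": 409,
                        "error": {"type": "version_conflict_engine_exception"}}},
            {"index": {"_index": "twitter", "_id": "2", "result": "updated", "status": 200}},
        ],
    }
    print(failed_items(sample))  # [('create', '1', 409, 'version_conflict_engine_exception')]
```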

Bulk delete

# Bulk delete
POST _bulk
{"delete":{"_index":"twitter","_id":1}}
{"delete":{"_index":"twitter","_id":2}}

Bulk update

# Bulk update (partial update of document 3)
POST _bulk
{"update":{"_index":"twitter","_id":3}}
{"doc":{"city":"长沙"}}

Bulk-import data

Test data: https://github.com/liaozibo-dev/elasticsearch-bulk-api-data/blob/master/es.json

Run the following command in Git Bash to import 1,000 documents

curl --cacert config/certs/http_ca.crt -u elastic:password -s -H "Content-Type:application/x-ndjson" -XPOST https://localhost:9200/_bulk --data-binary @example/es.json
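Before sending a large file with --data-binary, it can help to sanity-check it locally. This sketch counts the actions in an NDJSON bulk file and verifies the trailing newline that _bulk requires; curl's -d option would strip the newlines, which is why the command uses --data-binary. check_bulk_file is a hypothetical helper, and you would point it at the same example/es.json path used in the curl command. For this article's data, the counts should add up to the 1,000 documents mentioned above.

```python
import json


def check_bulk_file(path: str) -> dict:
    """Count actions in an NDJSON _bulk file and check that it ends with a newline."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    if not text.endswith("\n"):
        raise ValueError("a _bulk body must end with a newline")
    lines = (line for line in text.splitlines() if line.strip())
    counts: dict = {}
    for line in lines:
        action = next(iter(json.loads(line)))  # e.g. "index", "create", "update", "delete"
        counts[action] = counts.get(action, 0) + 1
        if action != "delete":
            # Every action except delete is followed by a source (or partial doc) line
            if next(lines, None) is None:
                raise ValueError(f"action {action!r} has no source line")
    return counts
```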

Confirm that 1,000 documents were imported

GET bank_account/_count

See also

Posted @ 2022-10-16 21:36 by 廖子博


