Elasticsearch CRUD

Introduction

Elasticsearch is a distributed, near-real-time search engine:

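  • Elasticsearch wraps the Java library Apache Lucene and adds a simple RESTful API plus advanced features such as distribution
  • By default, Elasticsearch refreshes newly indexed documents every second, which is what makes them searchable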

Elasticsearch indexes documents with an inverted index, which is what enables full-text search.
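To make the idea concrete, here is a toy inverted index. It is a minimal sketch, not how Lucene actually stores data: `tokenize` is a hypothetical, simplified analyzer (lowercase plus word splitting), and `build_inverted_index` and `search` are illustrative names. The messages are borrowed from the sample data used later in this post.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Simplified analyzer (an assumption): lowercase, then split into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict) -> dict:
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict, query: str) -> set:
    """Analyze the query the same way, then union the posting sets (OR semantics)."""
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return hits


docs = {
    "3": "happy birthday!",
    "5": "Happy BirthDay My Friend!",
    "4": "123,gogogo",
}
index = build_inverted_index(docs)
print(sorted(search(index, "Birthday")))  # ['3', '5']
```

Because both documents and queries go through the same analyzer, "Birthday" and "BirthDay" end up as the same term, and a lookup is a dictionary access instead of a scan over every document.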

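The MySQL InnoDB engine indexes data with B+ trees, which follow the leftmost-prefix rule: a fuzzy match with a leading wildcard such as LIKE '%key%' cannot use the index. In that situation, Elasticsearch can be used to provide full-text search instead.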
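The leftmost-prefix rule can be sketched with a sorted list standing in for the ordered keys of a B+ tree (an assumption made for brevity; a real B+ tree is a balanced, paged structure). A prefix pattern maps to one contiguous range found by binary search, while a leading wildcard leaves no range to seek to, so every key must be checked.

```python
import bisect

# Sorted keys stand in for the ordered leaf level of a B+ tree index.
keys = sorted(["beijing", "chengdu", "guangzhou", "shanghai", "shenzhen"])


def prefix_lookup(prefix: str) -> list:
    """LIKE 'prefix%': binary-search to the first candidate, then read a contiguous range."""
    start = bisect.bisect_left(keys, prefix)
    end = start
    while end < len(keys) and keys[end].startswith(prefix):
        end += 1
    return keys[start:end]


def infix_lookup(fragment: str) -> list:
    """LIKE '%fragment%': the ordering is useless, so every key is scanned."""
    return [k for k in keys if fragment in k]


print(prefix_lookup("sh"))   # ['shanghai', 'shenzhen']
print(infix_lookup("zhou"))  # ['guangzhou']
```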

Elasticsearch's RESTful API exchanges data as JSON.

Request scripts and import data: https://github.com/liaozibo-dev/elasticsearch-bulk-api-data

Acknowledgements

Thanks to the original author, 刘老师 (Teacher Liu), for publishing the Elasticsearch articles and videos this post is based on.

Basic Concepts

Basic Elasticsearch concepts:

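  • Document: the equivalent of a row in MySQL. Documents are JSON. When a document is inserted, its text fields are analyzed (split into terms) and indexed; at search time the query input is analyzed too, and matching documents are returned ranked by relevance
  • Index: roughly the equivalent of a database in MySQL
  • Type: roughly the equivalent of a table in MySQL. Elasticsearch 6.0 allows only one type per index, 7.0 deprecates types, and 8.0 removes them; documents are now addressed through the _doc endpoint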

Elasticsearch has two string types, text and keyword:

  • text values are analyzed, i.e. split into terms
  • keyword values are treated as one whole value and are not analyzed
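A quick way to see the difference is to compare the terms each type produces. This is a sketch under a stated assumption: `analyze_text` is a rough stand-in for the standard analyzer on English text only (the real one uses Unicode text segmentation and, for example, splits Chinese into single characters).

```python
import re


def analyze_text(value: str) -> list:
    """Rough stand-in for the standard analyzer on English text: lowercase word tokens."""
    return re.findall(r"\w+", value.lower())


def analyze_keyword(value: str) -> list:
    """keyword fields are not analyzed: the whole value is a single term."""
    return [value]


message = "Happy BirthDay My Friend!"
print(analyze_text(message))     # ['happy', 'birthday', 'my', 'friend']
print(analyze_keyword(message))  # ['Happy BirthDay My Friend!']

# A term lookup for "birthday" hits the text terms but not the keyword term,
# which only matches the exact original string.
print("birthday" in analyze_text(message))     # True
print("birthday" in analyze_keyword(message))  # False
```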

Kibana Dev Tools

Service URL:

Use Kibana Dev Tools to send requests to Elasticsearch.

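Shortcut                 Action
Ctrl/Cmd + I             Auto-indent
Ctrl/Cmd + /             Open the documentation for the current request
Ctrl/Cmd + Space         Autocomplete
Ctrl/Cmd + Enter         Submit the request
Ctrl/Cmd + Up/Down       Jump to the previous/next request
Ctrl/Cmd + Alt + L       Collapse/expand the current request
Ctrl/Cmd + Option + 0    Collapse all other requests and expand the current one
Ctrl/Cmd + L             Go to a line number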

The console can hold multiple requests; separate them with blank lines.

Checking Elasticsearch status

View basic Elasticsearch information:

GET /

View cluster health; the v parameter adds column headers:

GET /_cat/health?v

View node status:

GET /_cat/nodes?v

List all indices:

GET /_cat/indices?v

Click the wrench icon on the right of a request to copy it as a cURL command:

curl --cacert config\certs\http_ca.crt --ssl-no-revoke -u elastic:password -XGET https://localhost:9200/
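The same request can be built with Python's standard library, which shows what `-u` does on the wire: an HTTP Basic `Authorization` header. The URL, the elastic/password credentials and the CA path are the placeholders from the cURL command above; `build_request` and `send` are hypothetical helper names. `send` needs a running cluster, so the sketch only builds the request.

```python
import base64
import ssl
import urllib.request

# Placeholders copied from the cURL command above; adjust for your cluster.
ES_URL = "https://localhost:9200/"
CA_FILE = "config/certs/http_ca.crt"


def build_request(url: str, user: str, password: str) -> urllib.request.Request:
    """GET request with HTTP Basic auth, the equivalent of `curl -u user:password -XGET`."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, method="GET", headers={"Authorization": f"Basic {token}"})


def send(req: urllib.request.Request, cafile: str = CA_FILE) -> str:
    """Send the request, trusting the cluster's CA like `curl --cacert`.
    Python's ssl module does not check certificate revocation by default,
    so nothing here corresponds to curl's Windows-only --ssl-no-revoke."""
    ctx = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.read().decode("utf-8")


req = build_request(ES_URL, "elastic", "password")
# send(req) needs a running cluster, so it is not called here.
```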

Indices

Creating an index

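Elasticsearch can create an index and its mapping automatically (a mapping is analogous to a MySQL table schema: it defines the fields an index contains and their types).

Documents are stored under the type _doc; since Elasticsearch 6.0, an index can have only one type.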

# Inserting a document creates the index automatically
PUT twitter/_doc/1
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}
# View the mapping
GET twitter/_mapping
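What `GET twitter/_mapping` shows for an automatically created index can be approximated with the sketch below. It is a simplified model of default dynamic mapping under stated assumptions: it ignores date detection, arrays and nulls, and `infer_mapping` is a hypothetical name. Note how every string gets a text type plus a keyword sub-field, which is the multi-field pattern described next, and that lat/lon strings would not become a geo_point unless mapped explicitly.

```python
import json


def infer_mapping(doc: dict) -> dict:
    """Simplified model of default dynamic mapping (no date detection, no arrays):
    strings -> text with a keyword sub-field, ints -> long, floats -> float,
    bools -> boolean, JSON objects -> nested "properties"."""
    properties = {}
    for field, value in doc.items():
        if isinstance(value, bool):  # test bool first: bool is a subclass of int
            properties[field] = {"type": "boolean"}
        elif isinstance(value, int):
            properties[field] = {"type": "long"}
        elif isinstance(value, float):
            properties[field] = {"type": "float"}
        elif isinstance(value, str):
            properties[field] = {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        elif isinstance(value, dict):
            properties[field] = infer_mapping(value)  # object fields list their own properties
        # other value types (lists, None) are skipped in this sketch
    return {"properties": properties}


doc = {"user": "GB", "uid": 1, "city": "Beijing", "province": "Beijing", "country": "China"}
print(json.dumps(infer_mapping(doc)["properties"]["uid"]))  # {"type": "long"}
```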

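Multi-fields: a single field can be indexed in several ways, each serving a different purpose.

For example, a string field can be mapped as text so that it is analyzed for full-text search, and at the same time have a keyword sub-field that keeps the value whole (not analyzed) so that it can be used for aggregations and sorting.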

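Data type                    Purpose
text                         Full-text searchable strings
keyword                      Exact string matching and aggregations
date, date_nanos             Dates, as formatted date strings or numeric timestamps
byte, short, integer, long   Integers
boolean                      Booleans
float, double, half_float    Floating-point numbers
object, nested               Hierarchical (JSON object) data

# Manually create the index test, with a keyword field id and a text field message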
PUT test
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "message": {
        "type": "text"
      }
    }
  }
}
# Add a new field age of type long
PUT test/_mapping
{
  "properties": {
    "age": {
      "type": "long"
    }
  }
}
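Adding a field as above works, but Elasticsearch does not let you change the type of a field that is already mapped; that requires creating a new index and reindexing. The sketch below models only that rule for top-level fields (an assumption: real mapping updates can also adjust some parameters, which this ignores), and `merge_mapping` is a hypothetical name.

```python
def merge_mapping(existing: dict, update: dict) -> dict:
    """Simplified model of PUT <index>/_mapping on top-level properties:
    new fields are added; changing an existing field's type is rejected."""
    merged = dict(existing)
    for field, spec in update.items():
        old = merged.get(field)
        if old is not None and old.get("type") != spec.get("type"):
            raise ValueError(f"cannot change [{field}] from {old.get('type')} to {spec.get('type')}")
        merged[field] = spec
    return merged


test_props = {"id": {"type": "keyword"}, "message": {"type": "text"}}
test_props = merge_mapping(test_props, {"age": {"type": "long"}})  # new field: accepted
print(sorted(test_props))  # ['age', 'id', 'message']
try:
    merge_mapping(test_props, {"age": {"type": "text"}})  # type change: rejected
except ValueError as err:
    print(err)  # cannot change [age] from long to text
```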

Inspecting an index

# View the mapping
GET test/_mapping
# Check whether an index exists
HEAD twitter

Deleting an index

# Delete the index
DELETE twitter

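Closing an index

Closing an index blocks read and write operations on it.

  • A closed index imposes no maintenance overhead on the cluster
  • A closed index still occupies a significant amount of disk space

# Close the index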
POST twitter/_close
# Reopen the index
POST twitter/_open

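Documents

refresh: the operation that makes document changes visible to search.

Near real time: by default, Elasticsearch refreshes every second, so a change becomes searchable within about one second.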
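The effect of refresh can be sketched with a toy model where writes land in a buffer and only become searchable after `refresh()`. `ToyIndex` is hypothetical and ignores segments, shards and the translog; in real Elasticsearch, a GET by id is real-time by default and does not wait for a refresh, only search does.

```python
class ToyIndex:
    """Toy model of near-real-time search: writes go to a buffer and only
    become searchable after refresh()."""

    def __init__(self) -> None:
        self._buffer = {}      # written but not yet visible to search
        self._searchable = {}  # visible to search

    def index(self, doc_id: str, source: dict, refresh: bool = False) -> None:
        """Store a document; refresh=True mimics ?refresh=true."""
        self._buffer[doc_id] = source
        if refresh:
            self.refresh()

    def refresh(self) -> None:
        """Make every buffered write visible (Elasticsearch runs this on a timer)."""
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search(self, field: str, value) -> list:
        """Exact-value match over searchable documents only."""
        return [doc_id for doc_id, doc in self._searchable.items() if doc.get(field) == value]


idx = ToyIndex()
idx.index("1", {"user": "GB", "city": "Beijing"})
print(idx.search("user", "GB"))  # [] : indexed but not yet refreshed
idx.refresh()                     # what the 1-second timer would do
print(idx.search("user", "GB"))  # ['1']
idx.index("2", {"user": "GB"}, refresh=True)
print(idx.search("user", "GB"))  # ['1', '2']
```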

refresh

# A timer refreshes documents every second by default; the refresh parameter forces an immediate refresh
PUT twitter/_doc/1?refresh=true
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}
# refresh=wait_for behaves like a synchronous call: the request returns only after a refresh has made the document visible
PUT twitter/_doc/1?refresh=wait_for
{
  "user": "GB",
  "uid": 1,
  "city": "Beijing",
  "province": "Beijing",
  "country": "China"
}

Creating documents

# Index (insert or replace) a document
# If the document already exists it is replaced; otherwise it is inserted
PUT twitter/_doc/1
{
  "user": "GB",
  "uid": 1,
  "city": "Shenzhen",
  "province": "Guangdong",
  "country": "China"
}
# Create a document; fails if the id already exists
POST twitter/_create/2
{
  "user": "GB",
  "uid": 2,
  "city": "Shenzhen",
  "province": "Guangdong",
  "country": "China"
}
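# op_type=create: create the document; fails if the id already exists
# op_type=index: create the document; replaces it if the id already exists
# When the URL has no explicit id, Elasticsearch generates one automatically, e.g. n7XW3oMBLht_QWdmdH48
POST twitter/_doc?op_type=create
{
  "user": "双榆树-张三",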
  "message": "今儿天气不错啊,出去转转去",
  "uid": 2,
  "age": 20,
  "city": "北京",
  "province": "北京",
  "country": "中国",
  "address": "中国北京市海淀区",
  "location": {
    "lat": "39.970718",
    "lon": "116.325747"
  }
}

Here, "update" means replacing the entire document, not a partial update.

# An auto-generated id requires POST; PUT cannot be used
POST my_index/_doc
{
  "content": "this is really cool"
}

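Indexing with auto-generated ids is faster, because Elasticsearch does not need to check whether a document with that id already exists to decide between updating and inserting.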
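The index/create distinction can be sketched with a toy document store. `ToyDocStore` and `ConflictError` are hypothetical; the model has one index, no versions and no shards, and generates ids with `uuid4`, which is not Elasticsearch's own id scheme. It shows the two behaviors above: index replaces the whole document, and create refuses an existing id (Elasticsearch answers with HTTP 409 there).

```python
import uuid
from typing import Optional


class ConflictError(Exception):
    """Stands in for the HTTP 409 version_conflict_engine_exception."""


class ToyDocStore:
    """Toy model of index vs create semantics (one index, no versions, no shards)."""

    def __init__(self) -> None:
        self.docs = {}

    def index(self, source: dict, doc_id: Optional[str] = None) -> str:
        """op_type=index: insert, or replace the whole document if the id exists."""
        if doc_id is None:
            doc_id = uuid.uuid4().hex  # a fresh random id is practically never taken
        self.docs[doc_id] = dict(source)  # full overwrite: old fields are dropped
        return doc_id

    def create(self, source: dict, doc_id: Optional[str] = None) -> str:
        """op_type=create: like index, but refuse to touch an existing id."""
        if doc_id is not None and doc_id in self.docs:
            raise ConflictError(f"[{doc_id}]: document already exists")
        return self.index(source, doc_id)


store = ToyDocStore()
store.index({"user": "GB", "city": "Beijing"}, "1")
store.index({"user": "GB", "uid": 1}, "1")  # replaces: "city" is gone
print(store.docs["1"])                     # {'user': 'GB', 'uid': 1}
auto_id = store.create({"content": "this is really cool"})  # no id given
try:
    store.create({"user": "GB"}, "1")
except ConflictError as err:
    print("conflict:", err)               # conflict: [1]: document already exists
```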

Retrieving documents

# Get a document
GET twitter/_doc/1
# Get only the document source
GET twitter/_source/1
# Return only some _source fields
GET twitter/_doc/1?_source=city,age,province
# Check whether the document exists
HEAD twitter/_doc/1
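`?_source=city,age,province` trims the returned source to the listed fields; a field the document does not have is simply absent from the result. The sketch below models only top-level names (real _source filtering also accepts wildcards and dotted paths into objects), and `filter_source` is a hypothetical name.

```python
from typing import List


def filter_source(source: dict, fields: List[str]) -> dict:
    """Keep only the requested top-level fields, like ?_source=city,age,province.
    Requested fields that the document lacks are simply left out."""
    return {name: source[name] for name in fields if name in source}


doc1 = {"user": "GB", "uid": 1, "city": "Shenzhen", "province": "Guangdong", "country": "China"}
print(filter_source(doc1, "city,age,province".split(",")))
# {'city': 'Shenzhen', 'province': 'Guangdong'}
```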

Updating documents

Typically, POST is used to create documents and PUT to update them.

# Full update (replaces the whole document)
PUT twitter/_doc/1
{
  "user": "GB",
  "uid": 1,
  "city": "北京",
  "province": "北京",
  "country": "中国",
  "location": {
    "lat": "29.084661",
    "lon": "111.335210"
  }
}
# Partial update
POST twitter/_update/1
{
  "doc": {
    "city": "成都",
    "province": "四川"
  }
}
# Update by query
# Update the documents whose user field matches GB
POST twitter/_update_by_query
{
  "query": {
    "match": {
      "user": "GB"
    }
  },
  "script": {
    "source": "ctx._source.city=params.city; ctx._source.province=params.province; ctx._source.country=params.country",
    "lang": "painless",
    "params": {
      "city": "上海",
      "province": "上海",
      "country": "中国"
    }
  }
}
# upsert: if the document exists, apply a partial update; otherwise insert doc as a new document
POST catalog/_update/3
{
  "doc": {
    "author": "Albert Paro",
    "title": "Elasticsearch 5.0 Cookbook",
    "description": "Elasticsearch 5.0 Cookbook Third Edition",
    "price": "54.99"
  },
  "doc_as_upsert": true
}
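The difference between the full update and `_update` with `doc` is that the partial doc is merged into the existing source, with nested objects merged recursively; with `doc_as_upsert`, a missing document is created from the doc, and without it the request fails. A minimal sketch of those rules follows; `deep_merge` and `partial_update` are hypothetical names, and scripts, versioning and no-op detection are left out.

```python
from typing import Optional


def deep_merge(base: dict, patch: dict) -> dict:
    """Merge patch into a copy of base: nested objects merge recursively,
    any other value in patch overwrites the one in base."""
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def partial_update(existing: Optional[dict], doc: dict, doc_as_upsert: bool = False) -> dict:
    """Model of POST <index>/_update/<id> with a "doc" body."""
    if existing is None:
        if not doc_as_upsert:
            raise LookupError("document missing")  # Elasticsearch responds with 404 here
        return dict(doc)  # upsert: the partial doc becomes the new document
    return deep_merge(existing, doc)


doc1 = {"user": "GB", "city": "Beijing", "location": {"lat": "29.084661", "lon": "111.335210"}}
updated = partial_update(doc1, {"city": "Chengdu", "province": "Sichuan"})
print(updated["user"], updated["city"], updated["province"])  # GB Chengdu Sichuan

book = partial_update(None, {"title": "Elasticsearch 5.0 Cookbook", "price": "54.99"}, doc_as_upsert=True)
print(book["title"])  # Elasticsearch 5.0 Cookbook
```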

Deleting documents

# Delete a document
DELETE twitter/_doc/1
# Delete by query
# Delete the documents whose city field matches 上海 (Shanghai)
POST twitter/_delete_by_query
{
  "query": {
    "match": {
      "city": "上海"
    }
  }
}

Bulk operations

Bulk retrieval

# Retrieve multiple documents
GET _mget
{
  "docs": [
    {
      "_index": "twitter",
      "_id": 1
    },
    {
      "_index": "twitter",
      "_id": 2
    }
  ]
}
# Retrieve multiple documents, returning only some fields
GET _mget
{
  "docs": [
    {
      "_index": "twitter",
      "_id": 1,
      "_source": [
        "age",
        "city"
      ]
    },
    {
      "_index": "twitter",
      "_id": 2,
      "_source": [
        "age",
        "city"
      ]
    }
  ]
}
# Retrieve multiple documents by id (shorthand form)
GET twitter/_mget
{
  "ids": [1, 2]
}
# Search the index (returns the first 10 hits by default)
GET twitter/_search
# Count the documents
GET twitter/_count

Bulk insert

# Bulk index (insert or replace)
POST _bulk
{"index":{"_index":"twitter","_id":1}}
{"user":"双榆树-张三","message":"今儿天气不错啊,出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{"index":{"_index":"twitter","_id":2}}
{"user":"东城区-老刘","message":"出发,下一站云南!","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{"index":{"_index":"twitter","_id":3}}
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{"index":{"_index":"twitter","_id":4}}
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{"index":{"_index":"twitter","_id":5}}
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{"index":{"_index":"twitter","_id":6}}
{"user":"虹桥-老吴","message":"好友来了都今天我生日,好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}
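The _bulk body above is NDJSON: an action line, then (except for delete) a second line with the source or partial doc, and a newline after every line, including the last one. A sketch of building such a body; `bulk_body` and the tuple layout are hypothetical choices for illustration.

```python
import json
from typing import Iterable, Optional, Tuple

# (operation, index, id, second line); the second line is None for delete
Action = Tuple[str, str, str, Optional[dict]]


def bulk_body(actions: Iterable[Action]) -> str:
    """Build a _bulk NDJSON body: an action line, then (except for delete)
    a source or partial-doc line, with a newline after every line."""
    lines = []
    for op, index, doc_id, body in actions:
        lines.append(json.dumps({op: {"_index": index, "_id": doc_id}}, separators=(",", ":")))
        if op != "delete":  # delete actions carry no second line
            lines.append(json.dumps(body, ensure_ascii=False, separators=(",", ":")))
    return "\n".join(lines) + "\n"  # the body must end with a newline


payload = bulk_body([
    ("index", "twitter", "3", {"user": "GB", "message": "happy birthday!"}),
    ("update", "twitter", "3", {"doc": {"city": "Changsha"}}),
    ("delete", "twitter", "1", None),
])
print(payload, end="")
```

With `ensure_ascii=False`, Chinese values stay readable in the payload; encode it as UTF-8 before sending.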

As a rule of thumb, keep each bulk request to 1,000 to 5,000 documents, with a total payload between 5 MB and 15 MB.
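One way to respect that guideline on the client side is to split the actions into batches bounded by both document count and byte size. A sketch using the low ends of the ranges; `chunk_bulk`, the limits and the `n` field in the sample data are hypothetical.

```python
import json
from typing import Iterator, List, Tuple

MAX_DOCS = 1000               # low end of the 1,000 to 5,000 documents guideline
MAX_BYTES = 5 * 1024 * 1024   # low end of the 5 MB to 15 MB guideline


def chunk_bulk(pairs: List[Tuple[dict, dict]],
               max_docs: int = MAX_DOCS,
               max_bytes: int = MAX_BYTES) -> Iterator[str]:
    """Group (action, source) pairs into NDJSON bodies under both limits.
    A single entry larger than max_bytes is still sent on its own."""
    batch: List[str] = []
    size = 0
    for action, source in pairs:
        entry = json.dumps(action) + "\n" + json.dumps(source, ensure_ascii=False) + "\n"
        entry_size = len(entry.encode("utf-8"))  # the limit is about bytes, not characters
        if batch and (len(batch) >= max_docs or size + entry_size > max_bytes):
            yield "".join(batch)
            batch, size = [], 0
        batch.append(entry)
        size += entry_size
    if batch:
        yield "".join(batch)


pairs = [({"index": {"_index": "twitter", "_id": str(i)}}, {"n": i}) for i in range(2500)]
batches = list(chunk_bulk(pairs))
print([b.count("\n") // 2 for b in batches])  # [1000, 1000, 500]
```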

# Bulk insert; a failed item does not affect the other items
# create: fails if the id already exists
# index: replaces the document if the id already exists
POST _bulk
{"create":{"_index":"twitter","_id":1}}
{"user":"双榆树-张三","message":"今儿天气不错啊,出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{"index":{"_index":"twitter","_id":2}}
{"user":"东城区-老刘","message":"出发,下一站云南!","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{"index":{"_index":"twitter","_id":3}}
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{"index":{"_index":"twitter","_id":4}}
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{"index":{"_index":"twitter","_id":5}}
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{"index":{"_index":"twitter","_id":6}}
{"user":"虹桥-老吴","message":"好友来了都今天我生日,好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}
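Because a failed item does not fail the whole request, the response has to be checked item by item: the top-level `errors` flag says whether anything failed, and each entry in `items` carries its own `status` and, on failure, an `error` object. The sample below is an abridged, illustrative response (assumed shape for re-running the request when document 1 already exists, not captured output), and `failed_items` is a hypothetical helper.

```python
import json


def failed_items(response: dict) -> list:
    """Return (op, _id, status, error type) for each failed item of a _bulk response."""
    if not response.get("errors"):
        return []
    failures = []
    for item in response["items"]:
        for op, result in item.items():  # each item has one key: the operation name
            if "error" in result:
                failures.append((op, result["_id"], result["status"], result["error"]["type"]))
    return failures


# Abridged, illustrative response; values are not real output.
sample = json.loads("""
{"took": 12, "errors": true, "items": [
  {"create": {"_index": "twitter", "_id": "1", "status": 409,
              "error": {"type": "version_conflict_engine_exception", "reason": "document already exists"}}},
  {"index": {"_index": "twitter", "_id": "2", "result": "updated", "status": 200}}
]}
""")
print(failed_items(sample))  # [('create', '1', 409, 'version_conflict_engine_exception')]
```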

Bulk delete

# Bulk delete
POST _bulk
{"delete":{"_index":"twitter","_id":1}}
{"delete":{"_index":"twitter","_id":2}}

Bulk update

# Bulk update
POST _bulk
{"update":{"_index":"twitter","_id":3}}
{"doc":{"city":"长沙"}}

Bulk-importing data

Test data: https://github.com/liaozibo-dev/elasticsearch-bulk-api-data/blob/master/es.json

Run the following command in Git Bash to import 1,000 documents:

curl --cacert config/certs/http_ca.crt -u elastic:password -s -H "Content-Type:application/x-ndjson" -XPOST https://localhost:9200/_bulk --data-binary @example/es.json

Confirm that 1,000 documents were imported:

GET bank_account/_count

See also

posted @ 2022-10-16 21:36  廖子博