ES Getting Started 1: installation, running, basic operations, viewing cluster information, the alias mechanism, and a systemctl unit

References:

https://www.jianshu.com/p/60b242cbd8b4 (Elasticsearch: From Beginner to Expert)

https://blog.csdn.net/qq_35170267/article/details/105098769 (Basic Principles and Usage of ElasticSearch7)

I. Installation

Download:

https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html

Install:

https://www.elastic.co/guide/en/elasticsearch/reference/current/targz.html

# macOS package; on Linux download elasticsearch-7.13.2-linux-x86_64.tar.gz instead
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.13.2-darwin-x86_64.tar.gz
tar -xzf elasticsearch-7.13.2-darwin-x86_64.tar.gz
cd elasticsearch-7.13.2/

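Run:

./bin/elasticsearch    # start from the installation directory; add -d to run it as a daemon

# list all indices on the current node
curl -X GET 'http://localhost:9200/_cat/indices?v'
# show the mapping of every index (in 6.x and earlier this also lists each index's types)
curl 'localhost:9200/_mapping?pretty=true'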
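To check from a script that the node has actually come up (for example after starting it with -d), one option is to poll the cluster health API until the status is at least yellow. This is a minimal sketch assuming a node on localhost:9200 without security enabled; `ES_URL` and `wait_for_es` are hypothetical names, and the default of 30 attempts is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the node until the cluster status is yellow or green.
ES_URL="${ES_URL:-http://localhost:9200}"

wait_for_es() {
  local retries="${1:-30}" i
  for ((i = 1; i <= retries; i++)); do
    # wait_for_status makes ES wait up to 'timeout'; -f makes curl fail on HTTP errors
    # (a timed-out health request returns 408), -s hides the progress meter
    if curl -sf "$ES_URL/_cluster/health?wait_for_status=yellow&timeout=1s" >/dev/null; then
      echo "Elasticsearch is up after $i attempt(s)"
      return 0
    fi
    sleep 1
  done
  echo "Elasticsearch did not become ready" >&2
  return 1
}
```

Usage: `./bin/elasticsearch -d && wait_for_es 60`.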

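Configuration changes:

For security reasons Elasticsearch refuses to run as root; starting it as root fails with java.lang.RuntimeException: can not run elasticsearch as root. So create a dedicated user (on CentOS adduser also creates an es group) and give it ownership of the installation directory.

Solution:
adduser es
passwd es
chown -R es:es elasticsearch-7.13.2/
chmod -R 770 elasticsearch-7.13.2/
# then start Elasticsearch as that user
su es -c 'elasticsearch-7.13.2/bin/elasticsearch -d'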

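Elasticsearch also needs some CentOS resource limits raised. Edit them with sudo vi /etc/security/limits.conf (the new limits take effect the next time the es user logs in):

# append at the end of the file
es soft nofile 65535
es hard nofile 65536
es soft nproc 4096
es hard nproc 4096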

Also edit sudo vi /etc/security/limits.d/20-nproc.conf, which on CentOS 7 already sets a default soft nproc limit for all users:

es      soft    nproc     4096

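If startup then fails with "max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]", raise vm.max_map_count. It is the maximum number of memory-mapped areas a process may have (a count, not a size in bytes).

sudo vi /etc/sysctl.conf
# add the following line, then apply it with sudo sysctl -p
vm.max_map_count=655360

or, to change it only until the next reboot:
sudo sysctl -w vm.max_map_count=262144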
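These settings correspond to Elasticsearch bootstrap checks, which (once the node binds to a non-loopback address) require at least 65535 file descriptors, 4096 threads and a vm.max_map_count of 262144. Below is a minimal preflight sketch that compares the current values with those minimums; `check_min` and `es_preflight` are hypothetical helpers, and reading /proc assumes Linux.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check against the minimums of the ES bootstrap checks.

check_min() {
  local name="$1" actual="$2" required="$3"
  # ulimit may report "unlimited", which always satisfies the minimum
  if [ "$actual" = "unlimited" ] || [ "$actual" -ge "$required" ]; then
    echo "OK   $name=$actual (>= $required)"
    return 0
  fi
  echo "FAIL $name=$actual (need >= $required)"
  return 1
}

es_preflight() {
  local rc=0
  # limits are per user, so run this in a login shell of the es user
  check_min "nofile" "$(ulimit -n)" 65535 || rc=1
  check_min "nproc" "$(ulimit -u)" 4096 || rc=1
  check_min "vm.max_map_count" "$(cat /proc/sys/vm/max_map_count)" 262144 || rc=1
  return "$rc"
}
```

Usage: after `su - es`, run `es_preflight`; any FAIL line points at the setting that still needs changing.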

II. CRUD

# Create (PUT with an explicit id; if id 2 already exists, the document is replaced)
curl -H "Content-Type: application/json" \
  -XPUT 'http://localhost:9200/store/books/2' -d '{
  "title": "Elasticsearch Blueprints",
  "name" : {
    "first" : "Vineeth",
    "last" : "Mohan"
  },
  "publish_date":"2015-06-06",
  "price":"35.99"
}'
# Create or update (POST with an id behaves like PUT; POST /store/books without an id generates one)
curl -H "Content-Type: application/json" \
  -XPOST 'http://localhost:9200/store/books/2' -d '{
  "title": "Elasticsearch Blueprints",
  "name" : {
    "first" : "Vineeth",
    "last" : "Mohan"
  },
  "publish_date":"2015-06-06",
  "price":"35.99"
}'
# Delete
curl -XDELETE 'http://localhost:9200/store/books/2'
# Get
curl 'http://localhost:9200/store/books/2'
# store is the index, books is the type and 2 is the id.
# Custom types are deprecated in 7.x (use /store/_doc/2) and removed in 8.x.

Reference: mediacloud-awesome\Service\Data\Modules\Site\Controllers\Cli.php (how MediaCloud uses Elasticsearch)
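Besides plain PUT/POST, Elasticsearch has a create-only endpoint, a partial-update endpoint and the query DSL for searching. This is a sketch assuming Elasticsearch 7.x with typeless endpoints (`/store/_create/{id}`, `/store/_update/{id}`) on localhost:9200 and a store index that uses `_doc` rather than a custom type; the wrapper names are hypothetical, and values are inserted into the JSON without escaping, so keep them simple.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around three more document APIs (Elasticsearch 7.x, typeless).
ES_URL="${ES_URL:-http://localhost:9200}"

# Create only: returns 409 Conflict if a document with this id already exists
create_book() {
  curl -s -H "Content-Type: application/json" \
    -XPUT "$ES_URL/store/_create/$1" -d "$2"
}

# Partial update: merges the given fields into the existing document
update_book() {
  curl -s -H "Content-Type: application/json" \
    -XPOST "$ES_URL/store/_update/$1" -d "{\"doc\": $2}"
}

# Full-text search on the title field (the value is not JSON-escaped)
search_title() {
  curl -s -H "Content-Type: application/json" \
    -XGET "$ES_URL/store/_search" \
    -d "{\"query\": {\"match\": {\"title\": \"$1\"}}}"
}
```

Usage: `update_book 2 '{"price": "29.99"}'` changes only the price, and `search_title blueprints` finds the book because the title is analyzed (lowercased) at index and query time.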

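III. Basics

1. Since 6.x an index can hold only one type, much like a database that may contain only one table; types are deprecated in 7.x and removed in 8.x.

2. Every document has a unique id. PUT requires an id: /index/_doc/id. If no document with that id exists, one is created; otherwise it is replaced and its _version is incremented.

3. POST does not require an id: /index/_doc. With an id it behaves like PUT; without one, Elasticsearch generates the id, so a request without an id must use POST.

4. The mapping holds the constraints of all fields. New fields can be added, but the type of an existing field cannot be changed; to change it, create a new index with the desired mapping and reindex the data. https://www.cnblogs.com/jxd283465/p/11698972.html

5. When a write request names an index (and type) that does not exist yet, Elasticsearch creates it automatically (controlled by action.auto_create_index) and infers field types through dynamic mapping.

6. In 7.x, _doc can be used in the URL in place of any custom type.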
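Item 4 in practice: a sketch of creating an index with an explicit mapping, adding a field later, and moving to a new index with `_reindex` when a field type has to change. It assumes Elasticsearch 7.x on localhost:9200; the index names store_v1/store_v2, the isbn field and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating explicit mappings (Elasticsearch 7.x).
ES_URL="${ES_URL:-http://localhost:9200}"
JSON=(-H "Content-Type: application/json")

# 1. Create an index with an explicit mapping instead of relying on dynamic mapping
create_store_v1() {
  curl -s "${JSON[@]}" -XPUT "$ES_URL/store_v1" -d '{
    "mappings": {
      "properties": {
        "title":        {"type": "text"},
        "publish_date": {"type": "date"},
        "price":        {"type": "keyword"}
      }
    }
  }'
}

# 2. Adding a new field to an existing mapping is allowed
add_isbn_field() {
  curl -s "${JSON[@]}" -XPUT "$ES_URL/store_v1/_mapping" -d '{
    "properties": {"isbn": {"type": "keyword"}}
  }'
}

# 3. Changing the type of price is not: first create store_v2 the same way as
#    in step 1 but with "price": {"type": "float"}, then copy the documents
reindex_to_v2() {
  curl -s "${JSON[@]}" -XPOST "$ES_URL/_reindex" -d '{
    "source": {"index": "store_v1"},
    "dest":   {"index": "store_v2"}
  }'
}
```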

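IV. Basic principles

Inverted index

An inverted index records, for every term, which documents contain it. Suppose we need all documents that contain the word quick. Without an inverted index every document would have to be scanned; with one, the documents containing quick can be looked up directly.

In an inverted index the key is a term and the value is the list of IDs of all documents that contain it.

Elasticsearch normalizes the keys before storing them: a lowercase filter makes Quick and quick the same term, and a stemming analyzer (for example the english analyzer) also maps dogs and dog to the same term.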
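To make the key/value structure concrete, here is a toy inverted index in bash (4.0 or later, for associative arrays). It only lowercases and splits on non-letters, so Quick and quick collapse into one term while dogs and dog stay separate, which is the gap a stemming analyzer fills. It illustrates the idea only and is not how Lucene stores postings; `index_docs`, `lookup` and `POSTINGS` are hypothetical names.

```shell
#!/usr/bin/env bash
# Toy inverted index: term -> space-separated list of document ids.
declare -A POSTINGS

index_docs() {
  local id=0 doc term
  for doc in "$@"; do
    # normalize: lowercase, then turn every run of non-letters into a newline
    for term in $(printf '%s' "$doc" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alpha:]' '\n'); do
      # record each document id at most once per term
      case " ${POSTINGS[$term]} " in
        *" $id "*) ;;
        *) POSTINGS[$term]="${POSTINGS[$term]:+${POSTINGS[$term]} }$id" ;;
      esac
    done
    id=$((id + 1))
  done
}

# Look a term up with the same normalization: one lookup, no scan over documents
lookup() {
  local term
  term=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  echo "${POSTINGS[$term]}"
}
```

Usage: after `index_docs "The quick brown fox" "Quick brown dogs" "A lazy dog"`, `lookup Quick` prints `0 1`, while `lookup dog` prints only `2`.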

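Analyzers

An analyzer has three parts:

character filter: preprocesses the text before tokenization, e.g. stripping HTML tags or special symbols.
tokenizer: splits the text into tokens.
token filter: normalizes the tokens, e.g. lowercasing, removing stop words or adding synonyms.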
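The three stages can be wired together explicitly in a custom analyzer. This sketch assumes Elasticsearch 7.x on localhost:9200; the index name blog, the analyzer name my_html_analyzer and the helper functions are hypothetical, while html_strip, standard, lowercase and stop are built-in components.

```shell
#!/usr/bin/env bash
# Hypothetical index whose custom analyzer uses all three analyzer stages.
ES_URL="${ES_URL:-http://localhost:9200}"

create_blog_index() {
  curl -s -H "Content-Type: application/json" -XPUT "$ES_URL/blog" -d '{
    "settings": {
      "analysis": {
        "analyzer": {
          "my_html_analyzer": {
            "type": "custom",
            "char_filter": ["html_strip"],
            "tokenizer": "standard",
            "filter": ["lowercase", "stop"]
          }
        }
      }
    }
  }'
}

# Run text through the analyzer to see the resulting tokens
analyze_blog() {
  curl -s -H "Content-Type: application/json" -XPOST "$ES_URL/blog/_analyze" \
    -d "{\"analyzer\": \"my_html_analyzer\", \"text\": \"$1\"}"
}
```

Usage: `analyze_blog '<p>The QUICK fox</p>'` should return the tokens quick and fox: html_strip removes the tags, lowercase folds QUICK, and stop drops the.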

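Built-in analyzers

standard analyzer: splits on word boundaries and lowercases; stop-word removal is supported but disabled by default. Chinese text is split into single characters.

simple analyzer: splits on any non-letter character (so digits are dropped) and lowercases the tokens.

whitespace analyzer: splits on whitespace only, without lowercasing.

language analyzers: analyzers for specific languages, such as english.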
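The `_analyze` API shows what each built-in analyzer produces for the same input. A sketch assuming a node on localhost:9200; `compare_analyzers` is a hypothetical helper, and the `filter_path` parameter only trims the response down to the token strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one text through several built-in analyzers.
ES_URL="${ES_URL:-http://localhost:9200}"

compare_analyzers() {
  local text="$1" analyzer
  for analyzer in standard simple whitespace english; do
    printf '%-10s ' "$analyzer"
    # filter_path=tokens.token keeps only the token strings in the response
    curl -s -H "Content-Type: application/json" \
      -XPOST "$ES_URL/_analyze?filter_path=tokens.token" \
      -d "{\"analyzer\": \"$analyzer\", \"text\": \"$text\"}"
    echo
  done
}
```

Usage: with `compare_analyzers "The 2 QUICK Brown-Foxes jumped"`, whitespace should keep QUICK and Brown-Foxes unchanged, simple should drop 2, and english should additionally drop the and stem foxes to fox.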

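Installing a Chinese analyzer

To add a Chinese analyzer plugin to ES, first clone the IK analyzer source with git:

git clone git@github.com:medcl/elasticsearch-analysis-ik.git

Then build it with Maven. The plugin version must match your Elasticsearch version exactly, because Elasticsearch refuses to load a plugin built for a different version:

mvn clean install -Dmaven.test.skip=true

The build creates a target/releases folder containing an elasticsearch-analysis-ik zip archive.
Create the directory elasticsearch/plugins/ik, copy the archive into it, unzip it there and delete the zip file. Restart Elasticsearch to load the plugin.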
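After the restart, one way to confirm that the plugin loaded is `_cat/plugins` plus an `_analyze` call with the plugin's analyzers. A sketch assuming a node on localhost:9200; ik_smart and ik_max_word are the analyzer names IK registers, and `check_ik` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical check that the IK plugin is installed and working.
ES_URL="${ES_URL:-http://localhost:9200}"

check_ik() {
  # the plugin should be listed for every node
  curl -s "$ES_URL/_cat/plugins?v"
  # coarse-grained (ik_smart) vs. fine-grained (ik_max_word) segmentation
  local analyzer
  for analyzer in ik_smart ik_max_word; do
    curl -s -H "Content-Type: application/json" \
      -XPOST "$ES_URL/_analyze?filter_path=tokens.token" \
      -d "{\"analyzer\": \"$analyzer\", \"text\": \"$1\"}"
    echo
  done
}
```

Usage: `check_ik "中华人民共和国国歌"`; ik_max_word should return more, finer-grained tokens than ik_smart.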

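Viewing cluster information

Reference:

https://zhuanlan.zhihu.com/p/34727867 (3 ES tips)

Previously we read cluster information through the cluster APIs, but they return JSON, which is hard to scan and hard to remember. For a quick, readable view of the cluster's monitoring data and configuration, use the friendlier cat API: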

curl localhost:9200/_cat
=^.^=
/_cat/allocation # shard allocation and disk usage per node
/_cat/shards # shard information
/_cat/shards/{index}
/_cat/master # the elected master node
/_cat/nodes # cluster topology (nodes and their roles)
/_cat/tasks # currently running tasks
/_cat/indices # all indices
/_cat/indices/{index}
/_cat/segments # Lucene segment information, including shard layout
/_cat/segments/{index}
/_cat/count # document count for all indices
/_cat/count/{index}
/_cat/recovery # view of shard recovery progress
/_cat/recovery/{index}
/_cat/health # cluster health; add ?v to print the column headers
/_cat/pending_tasks # tasks waiting to be executed
/_cat/aliases # alias information
/_cat/aliases/{alias}
/_cat/thread_pool # cluster-wide thread pool statistics
/_cat/thread_pool/{thread_pools}
/_cat/plugins # installed plugins
/_cat/fielddata # heap memory used by fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs # custom node attributes
/_cat/repositories # snapshot repositories registered in the cluster
/_cat/snapshots/{repository} # snapshots in the given repository
/_cat/templates # existing index templates
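Most `_cat` endpoints accept the same query parameters: v (header row), h (column selection), s (sort order), format (json, yaml and others instead of a text table) and help (list the available columns). A sketch assuming a node on localhost:9200; `cat_api` is a hypothetical wrapper.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: cat_api <endpoint> [query string, default "v"]
ES_URL="${ES_URL:-http://localhost:9200}"

cat_api() {
  curl -s "$ES_URL/_cat/$1?${2:-v}"
}

# Examples:
#   cat_api health                                                  # with header row
#   cat_api indices 'v&h=index,docs.count,store.size&s=store.size:desc'
#   cat_api nodes 'format=json'                                     # machine-readable
#   cat_api indices 'help'                                          # list columns
```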

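Managing indices with aliases

First, index aliases.

An alias is a shortcut, or a kind of symbolic link, that can point to one or more indices. Aliases make index management easier, for example for searching, grouping data and switching indices.

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/_aliases' -d '
{
  "actions": [
      {"remove": {"index": "day1", "alias": "week2"}},
      {"add": {"index": "day1", "alias": "week1"}},
      {"add": {"index": "day2", "alias": "week1"}}
  ]
}'
The add and remove actions can also be sent in separate requests; within a single _aliases request all actions are applied atomically.

With the aliases above, week1 can be used to query the data of several daily indices at once:

Before: curl localhost:9200/day1,day2/_search
Now:    curl localhost:9200/week1/_search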
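The index-switching use case relies on that atomicity: clients keep querying one alias while it is moved from the old index to the new one in a single request. A sketch assuming a node on localhost:9200; the alias store, the indices store_v1/store_v2 and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a zero-downtime alias switch.
ES_URL="${ES_URL:-http://localhost:9200}"

# Move an alias from one index to another in one atomic _aliases call
switch_alias() {
  local alias="$1" old_index="$2" new_index="$3"
  curl -s -H "Content-Type: application/json" -XPOST "$ES_URL/_aliases" -d "{
    \"actions\": [
      {\"remove\": {\"index\": \"$old_index\", \"alias\": \"$alias\"}},
      {\"add\":    {\"index\": \"$new_index\", \"alias\": \"$alias\"}}
    ]
  }"
}

# Show which indices an alias currently points to
show_alias() {
  curl -s "$ES_URL/_alias/$1?pretty"
}
```

Usage: after reindexing store_v1 into store_v2, `switch_alias store store_v1 store_v2`; searches against /store/_search never see a moment without an index behind the alias.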

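Managing with systemctl

Save the unit as /etc/systemd/system/elasticsearch.service, adjusting User and the paths to your installation:

[Unit]
  Description=elasticsearch-server
  After=syslog.target network.target
  Wants=network.target

[Service]
  User=elasticsearch
  # with Type=simple systemd tracks the main process itself, so no PIDFile is needed
  Type=simple
  WorkingDirectory=/data/soft/elasticsearch-8.13.3
  ExecStart=/data/soft/elasticsearch-8.13.3/bin/elasticsearch
  ExecStop=/bin/kill -SIGINT $MAINPID
  # services started by systemd ignore /etc/security/limits.conf, so set the limits here
  LimitNOFILE=65535
  LimitNPROC=4096
  # no ExecReload: Elasticsearch does not reload its config on SIGHUP, and the JVM exits on it

[Install]
  WantedBy=multi-user.target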
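To register the unit, copy it into /etc/systemd/system, make systemd re-read its unit files and enable the service. A sketch assuming the unit was saved as ./elasticsearch.service; `install_es_service` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install, enable and start the elasticsearch unit.
install_es_service() {
  local unit_file="${1:-elasticsearch.service}"
  sudo install -m 644 "$unit_file" /etc/systemd/system/elasticsearch.service
  sudo systemctl daemon-reload                 # re-read unit files
  sudo systemctl enable --now elasticsearch    # start now and on every boot
  systemctl status elasticsearch --no-pager
}
```

Usage: run `install_es_service`, then follow the log with `journalctl -u elasticsearch -f`.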

posted @ 2020-11-06 11:03 指令跳动