Elasticsearch Basics on Linux (3)

I. Cluster

1. Building the cluster

1. Synchronize time across the nodes
2. Install the Java environment
3. Install ES
4. Configure ES
5. Start ES

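2. Cluster characteristics

1. Data written through any machine in the cluster is visible from every machine
2. Connecting a plugin (such as elasticsearch-head) to any one machine shows all three nodes
3. Data is distributed across multiple nodes automatically
4. If the node holding a primary shard goes down, a replica shard on another node is automatically promoted to primary
5. If the master node goes down, one of the remaining master-eligible nodes is automatically elected as the new master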

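3. Notes

1. A node's configuration does not need to list the IP of every node in the cluster; the local IP plus the IP of any one other cluster member is enough
	Config on 52:    discovery.zen.ping.unicast.hosts: ["10.0.0.51", "10.0.0.52"]
	Config on 53:    discovery.zen.ping.unicast.hosts: ["10.0.0.51", "10.0.0.53"]

2. The number of nodes required for a master election must be (number of master-eligible nodes / 2) + 1, using integer division; for three nodes that is 2
	discovery.zen.minimum_master_nodes: 2

3. ES defaults to 5 primary shards and 1 replica per index (the ES 6.x defaults). Once an index has been created the number of primary shards cannot be changed, but the number of replicas can

4. Shard colors while data is being allocated
	1) Purple: the shard is being relocated
	2) Yellow: the shard is being replicated

5. Failures in a three-node cluster
	1) Three nodes, no replicas: not a single machine may fail
	2) Three nodes, one replica: two machines may fail, but only one at a time, so that the cluster can rebuild the lost copies in between
	3) Three nodes, two replicas: any two machines may fail at any time
	Note: these cases describe whether the data survives; with discovery.zen.minimum_master_nodes: 2, a single surviving node cannot elect a master on its own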
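
The quorum rule from note 2 can be written down as a one-line calculation. This is only a sketch of the arithmetic; the function name `minimum_master_nodes` is made up for illustration and is not part of any ES client.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the election quorum: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    # Integer division mirrors the "cluster size / 2 + 1" rule from the notes.
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for nodes in (1, 2, 3, 5):
        print(f"{nodes} master-eligible node(s) -> "
              f"discovery.zen.minimum_master_nodes: {minimum_master_nodes(nodes)}")
```

For the three-node cluster used here the result is 2, which matches the value in the configuration line above.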

4. Cluster commands

1. Show the master node
GET _cat/master

2. Show cluster health
GET _cat/health

3. List indices
GET _cat/indices

4. List all nodes
GET _cat/nodes

5. List shards
GET _cat/shards
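
Outside of a console such as Kibana, the same `_cat` endpoints can be called over plain HTTP. The sketch below builds the URLs and fetches them with the standard library; the helper names `cat_url` and `fetch_cat` are hypothetical, and the host 10.0.0.51 with port 9200 is assumed from the rest of this post.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# The five _cat endpoints listed above.
CAT_ENDPOINTS = ("master", "health", "indices", "nodes", "shards")


def cat_url(host: str, endpoint: str, port: int = 9200, verbose: bool = True) -> str:
    """Build the URL for one of the _cat endpoints; v=true adds column headers."""
    if endpoint not in CAT_ENDPOINTS:
        raise ValueError(f"unknown _cat endpoint: {endpoint}")
    query = "?" + urlencode({"v": "true"}) if verbose else ""
    return f"http://{host}:{port}/_cat/{endpoint}{query}"


def fetch_cat(host: str, endpoint: str) -> str:
    """Fetch the plain-text table returned by a _cat endpoint (needs a reachable node)."""
    with urlopen(cat_url(host, endpoint), timeout=5) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only print the URLs here; call fetch_cat() against a live cluster.
    for name in CAT_ENDPOINTS:
        print(cat_url("10.0.0.51", name))
```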

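II. Modifying the cluster

1. Configuring the ES default shard and replica counts

Number of primary shards per index, default 5
#index.number_of_shards: 5

Number of replicas per index, default 1
#index.number_of_replicas: 1

Note: from ES 5.x onward, index-level settings like these can no longer be set in elasticsearch.yml; set them per index or through an index template instead.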

2. Changing the replica count of a specific index

PUT /index/_settings
{
  "number_of_replicas": 2
}

3. Changing the replica count of all indices

PUT _all/_settings
{
  "number_of_replicas": 2
}
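
The two settings calls above can also be issued from a script. The following is a minimal sketch using only the standard library; `replica_update_request` and `send` are hypothetical helper names, and the host and port are assumed from this post's examples.

```python
import json
from urllib.request import Request, urlopen


def replica_update_request(host: str, index: str, replicas: int, port: int = 9200) -> Request:
    """Build a PUT /<index>/_settings request; pass "_all" to target every index."""
    body = json.dumps({"number_of_replicas": replicas}).encode("utf-8")
    return Request(
        url=f"http://{host}:{port}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request: Request) -> dict:
    """Send the request and decode the JSON reply (needs a reachable cluster)."""
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    request = replica_update_request("10.0.0.51", "_all", 2)
    # Print instead of sending, so the sketch is safe to run anywhere.
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```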

4. Setting the shard and replica counts when creating an index

PUT /qiudao
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}

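# Notes:
1. More shards is not better; every shard consumes resources
2. Every shard holds open file handles
3. A query is routed by an algorithm to the nodes that hold the relevant shards; the fewer shards there are, the cheaper the query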
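
The routing algorithm mentioned in note 3 is, in simplified form, "hash the routing value (the document ID by default) and take it modulo the number of primary shards". The sketch below illustrates that idea and why the primary shard count cannot change after creation. It is a toy model: ES uses a Murmur3 hash internally, while this example substitutes `zlib.crc32`, and `shard_for` is a made-up name.

```python
import zlib


def shard_for(routing: str, number_of_shards: int) -> int:
    """Pick a shard as hash(routing) % number_of_shards (crc32 stands in for Murmur3)."""
    if number_of_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    digest = zlib.crc32(routing.encode("utf-8"))
    return digest % number_of_shards


if __name__ == "__main__":
    doc_ids = ["1", "2", "3", "4"]
    # The same IDs land on different shards once the shard count changes,
    # so existing documents could no longer be found at their old location.
    for shards in (3, 5):
        placement = {doc_id: shard_for(doc_id, shards) for doc_id in doc_ids}
        print(f"{shards} shards: {placement}")
```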

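5. How this is usually set in production

1. Talk to the developers
2. Look at how many nodes there will be
    2 nodes: the defaults are fine
    3 nodes: 2 replicas and 5 shards for important data; 1 replica and 5 shards for less important data
3. In the early stages, a good approach is to create shards at 1.5 to 3 times the number of nodes.
    For example: with 3 nodes, create at most 9 (3x3) shards.
4. Indices that store a lot of data can be given more shards; indices that store little data can be given fewer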
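
The 1.5x to 3x rule of thumb from item 3 is easy to turn into a helper. This is a sketch of the arithmetic only; the name `shard_count_range` is hypothetical, and rounding the lower bound up is an assumption, since the rule does not say how to round.

```python
import math


def shard_count_range(nodes: int, low: float = 1.5, high: float = 3.0) -> tuple:
    """Return (minimum, maximum) suggested primary shards for a node count."""
    if nodes < 1:
        raise ValueError("need at least one node")
    # Round the lower bound up and the upper bound down to stay inside the range.
    return math.ceil(nodes * low), math.floor(nodes * high)


if __name__ == "__main__":
    for nodes in (2, 3, 5):
        minimum, maximum = shard_count_range(nodes)
        print(f"{nodes} nodes: between {minimum} and {maximum} shards")
```

With 3 nodes the upper bound is 9, the same figure as in the example above.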

III. Monitoring the cluster

1. What to monitor

1. Cluster health
	GET _cat/health

2. All nodes
	GET _cat/nodes
	
# If either of these changes, the cluster has a fault

2. Monitoring with a script

[root@db01 ~]# vim es_cluster_status.py
#!/usr/bin/env python
#coding:utf-8
#Author:_DriverZeng_
#Date:2017.02.12

import json
import subprocess

clusterip = "10.0.0.51"
# Ask the cluster health API for the current status; curl -s hides the progress meter
url = "http://" + clusterip + ":9200/_cluster/health"
data = subprocess.check_output(["curl", "-s", "-XGET", url])
# Parse the JSON reply with json.loads rather than eval()
health = json.loads(data.decode("utf-8"))
status = health.get("status")
if status == "green":
    print("\033[1;32m Cluster is healthy \033[0m")
elif status == "yellow":
    print("\033[1;33m Replica shards are missing \033[0m")
else:
    print("\033[1;31m Primary shards are missing \033[0m")

[root@db01 ~]# python es_cluster_status.py
 Cluster is healthy
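
The script above looks only at the status color, while the monitoring list also calls for watching the node count. A possible extension is sketched below: it takes the parsed `_cluster/health` reply and reports both kinds of change. `find_problems` is a hypothetical name, and the sample dictionary is hand-written for illustration, not real cluster output; `status` and `number_of_nodes` are fields of the health reply.

```python
def find_problems(health: dict, expected_nodes: int) -> list:
    """Compare a parsed _cluster/health reply with the expected healthy state."""
    problems = []
    status = health.get("status")
    if status != "green":
        problems.append(f"cluster status is {status}")
    nodes = health.get("number_of_nodes")
    if nodes != expected_nodes:
        problems.append(f"expected {expected_nodes} nodes, found {nodes}")
    return problems


if __name__ == "__main__":
    # Hand-written sample: one of three nodes is gone and replicas are unassigned.
    sample = {"status": "yellow", "number_of_nodes": 2}
    for problem in find_problems(sample, expected_nodes=3):
        print(problem)
```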

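3. Monitoring plugin: X-Pack

IV. ES tuning

1. Limiting memory

1. The JVM heap should be at most 32 GB
2. Give half of the server's memory to ES, leaving the rest for the operating system's file cache
3. Start with a smaller heap and raise it gradually
4. When memory runs short
	1) Have the developers delete data
	2) Add nodes
	3) Upgrade the hardware
5. Disable swap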
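
Rules 1 and 2 combine into a simple calculation: half of the machine's RAM, capped at 32 GB. The sketch below shows that arithmetic and prints the matching JVM flags. The function names are hypothetical, and the 32 GB cap is taken from rule 1 as stated; adjust it if your environment calls for a lower ceiling.

```python
def recommended_heap_gb(ram_gb: int, cap_gb: int = 32) -> int:
    """Half of the server's RAM, but never more than the cap."""
    if ram_gb < 2:
        raise ValueError("need at least 2 GB of RAM for this rule to make sense")
    return min(ram_gb // 2, cap_gb)


def jvm_options(heap_gb: int) -> list:
    """Minimum and maximum heap are set to the same value."""
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


if __name__ == "__main__":
    for ram in (8, 16, 64, 128):
        heap = recommended_heap_gb(ram)
        print(f"{ram} GB RAM -> {' '.join(jvm_options(heap))}")
```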

2. File descriptors

1. Configure the file descriptor limits
[root@db02 ~]# vim /etc/security/limits.conf
* soft memlock unlimited
* hard memlock unlimited
* soft nofile 131072
* hard nofile 131072

2. Process limits for ordinary users (20-nproc.conf on CentOS 7, 90-nproc.conf on CentOS 6)
[root@db02 ~]# vim /etc/security/limits.d/20-nproc.conf 
*          soft    nproc     65535
root       soft    nproc     unlimited

[root@db02 ~]# vim /etc/security/limits.d/90-nproc.conf 
*          soft    nproc     65535
root       soft    nproc     unlimited
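
After editing the limits files and logging in again, it is worth confirming that the new values are actually in effect. The sketch below checks the open-file limit of the current process with Python's `resource` module (Unix only). `nofile_limit_ok` is a hypothetical name, and 131072 is the value configured above.

```python
import resource


def nofile_limit_ok(required: int = 131072) -> bool:
    """Return True if the soft open-file limit of this process meets the requirement."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "unlimited", which always satisfies the requirement.
    return soft == resource.RLIM_INFINITY or soft >= required


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"nofile soft={soft} hard={hard} sufficient={nofile_limit_ok()}")
```

Run it as the user that starts Elasticsearch, since limits are applied per login session.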

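3. Query optimization

1. For conditional queries, prefer term queries and cut down on range queries
2. When building an index, prefer terms with a high hit rate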
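
To make the difference in item 1 concrete, here is a sketch of the two request bodies side by side, both placed in filter context. The helper names are hypothetical; the `username` field comes from the elasticdump example later in this post, while the `age` field is invented for illustration.

```python
import json


def term_filter_query(field: str, value) -> dict:
    """Exact-match lookup on a single term."""
    return {"query": {"bool": {"filter": [{"term": {field: value}}]}}}


def range_filter_query(field: str, gte, lt) -> dict:
    """Range lookup; use it only when an exact match cannot express the condition."""
    return {"query": {"bool": {"filter": [{"range": {field: {"gte": gte, "lt": lt}}}]}}}


if __name__ == "__main__":
    print(json.dumps(term_filter_query("username", "admin"), indent=2))
    print(json.dumps(range_filter_query("age", 18, 30), indent=2))
```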

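V. Data backup and restore

0. Installing the npm environment

# Install npm (one node is enough; if nginx sits in front as a reverse proxy, it can be installed on every node)
[root@elkstack01 ~]# yum install -y npm
# Change to the directory where the head plugin source will be downloaded
[root@elkstack01 src]# cd /usr/local/
# Clone the code from GitHub
[root@elkstack01 local]# git clone https://github.com/mobz/elasticsearch-head.git
# After cloning, enter the elasticsearch-head directory
[root@elkstack01 local]# cd elasticsearch-head/
# Clear the npm cache
[root@elkstack01 elasticsearch-head]# npm cache clean -f
# Use npm to install the n module (the JS scripts of different projects may need different node versions, so a node version manager is needed)

1. Installing the backup tool

[root@db01 ~]# npm install elasticdump -g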

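2. Backup commands

Documentation: https://github.com/elasticsearch-dump/elasticsearch-dump

1) Backup parameters

--input: where the data comes from
--output: where the data is written
--type: the kind of data to export (settings, analyzer, data, mapping, alias, template)

2) Backing up to another ES cluster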

elasticdump \
  --input=http://10.0.0.51:9200/my_index \
  --output=http://100.10.0.51:9200/my_index \
  --type=analyzer
  
elasticdump \
  --input=http://10.0.0.51:9200/my_index \
  --output=http://100.10.0.51:9200/my_index \
  --type=mapping
  
elasticdump --input=http://10.0.0.51:9200/my_index --output=http://100.10.0.51:9200/my_index --type=data

elasticdump \
  --input=http://10.0.0.51:9200/my_index \
  --output=http://100.10.0.51:9200/my_index \
  --type=template

3) Backing up to local JSON files

elasticdump \
  --input=http://10.0.0.51:9200/student \
  --output=/tmp/student_mapping.json \
  --type=mapping
  
elasticdump \
  --input=http://10.0.0.51:9200/student \
  --output=/tmp/student_data.json \
  --type=data

4) Exporting to a compressed file

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=$ \
  | gzip > /data/my_index.json.gz

5) Backing up only the data that matches a query

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=query.json \
  --searchBody="{\"query\":{\"term\":{\"username\": \"admin\"}}}"
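
Hand-escaping the JSON in `--searchBody` is error-prone. One way around it, sketched below, is to build the command in Python and let `json.dumps` produce the query string. `build_dump_command` is a hypothetical helper; it uses only the `--input`, `--output`, `--type` and `--searchBody` flags shown above, and passing the command as a list avoids shell quoting entirely.

```python
import json


def build_dump_command(source: str, target: str, dump_type: str = "data",
                       search_body: dict = None) -> list:
    """Assemble an elasticdump argument list, serializing the query with json.dumps."""
    command = [
        "elasticdump",
        f"--input={source}",
        f"--output={target}",
        f"--type={dump_type}",
    ]
    if search_body is not None:
        command.append("--searchBody=" + json.dumps(search_body))
    return command


if __name__ == "__main__":
    command = build_dump_command(
        "http://production.es.com:9200/my_index",
        "query.json",
        search_body={"query": {"term": {"username": "admin"}}},
    )
    print(command)
    # To execute it: import subprocess; subprocess.run(command, check=True)
```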

3. Import commands

elasticdump \
  --input=./student_template.json \
  --output=http://10.0.0.51:9200 \
  --type=template
  
elasticdump \
  --input=./student_mapping.json \
  --output=http://10.0.0.51:9200 \
  --type=mapping
  
elasticdump \
  --input=./student_data.json \
  --output=http://10.0.0.51:9200 \
  --type=data
  
elasticdump \
  --input=./student_analyzer.json \
  --output=http://10.0.0.51:9200 \
  --type=analyzer

# When restoring, data that already exists in the target is overwritten

4. Backup script

#!/bin/bash
# Usage: pass the ES host to back up as the first argument
echo "Host to back up: ${1}"
# Indices to back up, one per line
index_name='
test
student
linux7
'
for index in $index_name
do
    echo "start output index ${index}"
    elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_alias.json --type=alias &> /dev/null
    elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_analyzer.json --type=analyzer &> /dev/null
    elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_data.json --type=data &> /dev/null
    elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_mapping.json --type=mapping &> /dev/null
    elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_template.json --type=template &> /dev/null
done

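5. Import script

#!/bin/bash
# Usage: pass the ES host to import into as the first argument
echo "Host to import into: ${1}"
# Indices to import, one per line
index_name='
test
student
linux7
'
for index in $index_name
do
    echo "start input index ${index}"
    # Template, analyzer settings and mapping go in before the data; the alias comes last
    elasticdump --input=/data/${index}_template.json --output=http://${1}:9200/${index} --type=template &> /dev/null
    elasticdump --input=/data/${index}_analyzer.json --output=http://${1}:9200/${index} --type=analyzer &> /dev/null
    elasticdump --input=/data/${index}_mapping.json --output=http://${1}:9200/${index} --type=mapping &> /dev/null
    elasticdump --input=/data/${index}_data.json --output=http://${1}:9200/${index} --type=data &> /dev/null
    elasticdump --input=/data/${index}_alias.json --output=http://${1}:9200/${index} --type=alias &> /dev/null
done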
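
The backup and import loops differ only in which side is the file and which side is the cluster, so both can be generated from one plan. The sketch below produces the command lists without running them; `plan_commands` and `DUMP_TYPES` are hypothetical names, the `/data` directory and file naming follow the scripts above, and the type order (definitions before data) is an assumption about what restores cleanly.

```python
# Order matters on restore: definitions first, then data, then aliases.
DUMP_TYPES = ("template", "analyzer", "mapping", "data", "alias")


def plan_commands(host: str, indices: list, restore: bool = False,
                  data_dir: str = "/data") -> list:
    """Return one elasticdump argument list per (index, type) pair."""
    commands = []
    for index in indices:
        for dump_type in DUMP_TYPES:
            remote = f"http://{host}:9200/{index}"
            local = f"{data_dir}/{index}_{dump_type}.json"
            # Backup reads from the cluster; restore reads from the file.
            source, target = (local, remote) if restore else (remote, local)
            commands.append([
                "elasticdump",
                f"--input={source}",
                f"--output={target}",
                f"--type={dump_type}",
            ])
    return commands


if __name__ == "__main__":
    for command in plan_commands("10.0.0.51", ["test", "student", "linux7"]):
        print(" ".join(command))
```

Printing the plan first makes it easy to review before handing each list to `subprocess.run`, and it avoids silently discarding errors the way `&> /dev/null` does.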

VI. Chinese analyzer: ik

1. Inserting data

POST /index/_doc/1
{"content":"美国留给伊拉克的是个烂摊子吗"}

POST /index/_doc/2
{"content":"公安部：各地校车将享最高路权"}

POST /index/_doc/3
{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}

POST /index/_doc/4
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}

2. Querying the data

POST /index/_search
{
  "query" : { "match" : { "content" : "中国" }},
  "highlight" : {
      "pre_tags" : ["<tag1>", "<tag2>"],
      "post_tags" : ["</tag1>", "</tag2>"],
      "fields" : {
          "content" : {}
      }
  }
}

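# In the results you get every document that contains either the character 中 or the character 国: the query term was split into single characters, which is why we need the ik Chinese analyzer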
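
The effect can be reproduced with a toy model in plain Python, using the four documents inserted above. This is not how ES analyzers are implemented: per-character matching here merely imitates a default analyzer splitting Chinese text into single characters, and a substring test stands in for an analyzer that keeps 中国 as one term. The function names are made up.

```python
DOCS = {
    1: "美国留给伊拉克的是个烂摊子吗",
    2: "公安部：各地校车将享最高路权",
    3: "中韩渔警冲突调查：韩警平均每天扣1艘中国渔船",
    4: "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首",
}


def match_per_character(query: str, documents: dict) -> list:
    """A document matches if it shares at least one character with the query."""
    query_chars = set(query)
    return [doc_id for doc_id, text in sorted(documents.items())
            if query_chars & set(text)]


def match_whole_word(query: str, documents: dict) -> list:
    """A document matches only if it contains the query as one unit."""
    return [doc_id for doc_id, text in sorted(documents.items())
            if query in text]


if __name__ == "__main__":
    print(match_per_character("中国", DOCS))  # [1, 3, 4]
    print(match_whole_word("中国", DOCS))     # [3, 4]
```

Document 1 only contains 美国, yet it matches in the per-character model because it shares the character 国 with the query.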

3. Configuring the Chinese analyzer

1) Installing the plugin

[root@db01 ~]# /usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip
[root@db02 ~]# /usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip
[root@db03 ~]# /usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip

# Unzip into the ES directory
[root@db01 ~]# unzip elasticsearch-analysis-ik-6.6.0.zip -d /etc/elasticsearch/

2) Creating the index and mapping

PUT /news

curl -XPOST http://localhost:9200/news/text/_mapping -H 'Content-Type:application/json' -d'
{
    "properties": {
         "content": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
        }
    }
}'
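
The mapping uses `ik_max_word` at index time and `ik_smart` at search time. To see how each one tokenizes a sentence, the `_analyze` API can be called with the analyzer name. The sketch below only builds the request; `analyze_request` is a hypothetical helper, localhost:9200 is assumed from the curl command above, and sending it requires a node with the ik plugin installed.

```python
import json
from urllib.request import Request


def analyze_request(text: str, analyzer: str,
                    host: str = "localhost", port: int = 9200) -> Request:
    """Build a POST /_analyze request for the given analyzer and text."""
    body = json.dumps({"analyzer": analyzer, "text": text},
                      ensure_ascii=False).encode("utf-8")
    return Request(
        url=f"http://{host}:{port}/_analyze",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    for name in ("ik_max_word", "ik_smart"):
        request = analyze_request("中国驻洛杉矶领事馆", name)
        print(request.get_method(), request.full_url, request.data.decode("utf-8"))
        # To send it: urllib.request.urlopen(request) against a live node.
```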

3) Editing the custom dictionary

[root@redis01 ~]# vim /etc/elasticsearch/analysis-ik/IKAnalyzer.cfg.xml 
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer 扩展配置</comment>
    <!--用户可以在这里配置自己的扩展字典 -->
    <entry key="ext_dict">/etc/elasticsearch/analysis-ik/my.dic</entry>
    
[root@redis01 ~]# vim /etc/elasticsearch/analysis-ik/my.dic 
中国

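[root@redis01 ~]# chown -R elasticsearch:elasticsearch /etc/elasticsearch/analysis-ik/my.dic
# Restart Elasticsearch on every node afterwards so that the plugin and the new dictionary are loaded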

4) Re-inserting the data

POST /news/text/1
{"content":"美国留给伊拉克的是个烂摊子吗"}

POST /news/text/2
{"content":"公安部：各地校车将享最高路权"}

POST /news/text/3
{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}

POST /news/text/4
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}

5) Querying the data again

POST /news/_search
{
  "query" : { "match" : { "content" : "中国" }},
  "highlight" : {
      "pre_tags" : ["<tag1>", "<tag2>"],
      "post_tags" : ["</tag1>", "</tag2>"],
      "fields" : {
          "content" : {}
      }
   }
}

posted @ 2020-08-13 14:24 王顺子