Elasticsearch Curator Tutorial
[TOC]
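In day-to-day work, keeping an Elasticsearch cluster running reliably means doing a lot of planned maintenance: periodically purging old data, merging segments, backing up and restoring, and so on. If we can program, most of these jobs can be done by calling the Elasticsearch API from whatever language fits our needs. But before reinventing the wheel, we should make sure nobody else has faced the same problem and that no general-purpose tool already does the job. The Elasticsearch ecosystem is quite mature, and Curator, a tool from elastic.co written in Python, already provides complete solutions for a wide range of operational scenarios. In most cases, Curator alone is enough for our routine maintenance.
Installing Curator
For the installation options, see the official documentation. If pip is already available on the server, installing with pip install is easy: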
```
pip install elasticsearch-curator
```
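However, many production servers do not have pip installed. Firewalls also often block direct access to https://packages.elastic.co. As a result, most of the installation methods described on the official site are not actually usable there.
The workaround is to download the complete binary package (RPM, or DEB on Debian-based systems) and install it directly on the server.
Downloads: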
- Elasticsearch Curator 5.2.0 Binary Package (DEB)
- Elasticsearch Curator 5.2.0 Binary Package for newer Debian 9 based systems (DEB)
- Elasticsearch Curator 5.2.0 RHEL/CentOS 6 Binary Package (RPM)
- Elasticsearch Curator 5.2.0 RHEL/CentOS 7 Binary Package (RPM)
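Curator's interfaces
Curator provides two interfaces: curator and curator_cli.
The curator_cli interface
I cover this interface first because it is handy for debugging. For real operational work, though, I still recommend curator.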
```
$ curator_cli --help
Usage: curator_cli [OPTIONS] COMMAND [ARGS]...

Options:
  --config PATH       Path to configuration file. Default: ~/.curator/curator.yml
  --host TEXT         Elasticsearch host.
  --url_prefix TEXT   Elasticsearch http url prefix.
  --port TEXT         Elasticsearch port.
  --use_ssl           Connect to Elasticsearch through SSL.
  --certificate TEXT  Path to certificate to use for SSL validation.
  --client-cert TEXT  Path to file containing SSL certificate for client auth.
  --client-key TEXT   Path to file containing SSL key for client auth.
  --ssl-no-validate   Do not validate SSL certificate
  --http_auth TEXT    Use Basic Authentication ex: user:pass
  --timeout INTEGER   Connection timeout in seconds.
  --master-only       Only operate on elected master node.
  --dry-run           Do not perform any changes.
  --loglevel TEXT     Log level
  --logfile TEXT      log file
  --logformat TEXT    Log output format [default|logstash|json].
  --version           Show the version and exit.
  --help              Show this message and exit.

Commands:
  allocation        Shard Routing Allocation
  close             Close indices
  delete_indices    Delete indices
  delete_snapshots  Delete snapshots
  forcemerge        forceMerge index/shard segments
  open              Open indices
  replicas          Change replica count
  show_indices      Show indices
  show_snapshots    Show snapshots
  snapshot          Snapshot indices
```
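Those are the basic command-line options. So why don't I recommend curator_cli for routine operations? First, this interface runs only one action per invocation. Second, writing complex filters on the command line is very unfriendly. In practice, curator_cli is mainly useful as a helper when drafting curator's action.yml, or for simple tests.
Examples
Listing all indices
```
curator_cli --host 10.33.4.160 --port 9200 show_indices --verbose
```
Output: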
```
.kibana open 54.9KB 6 1 1 2017-09-06T02:13:00Z
.monitoring-alerts-6 open 6.5KB 1 1 1 2017-09-06T02:14:01Z
.monitoring-es-6-2017.10.12 open 376.1MB 556576 1 1 2017-10-12T00:00:06Z
.monitoring-es-6-2017.10.13 open 76.8MB 96220 1 1 2017-10-13T00:00:08Z
.monitoring-kibana-6-2017.10.12 open 3.3MB 8638 1 1 2017-10-12T00:00:08Z
.monitoring-kibana-6-2017.10.13 open 1.3MB 3390 1 1 2017-10-13T00:00:09Z
.monitoring-logstash-6-2017.10.12 open 2.4MB 8211 1 1 2017-10-12T01:09:48Z
.monitoring-logstash-6-2017.10.13 open 1.1MB 3390 1 1 2017-10-13T00:00:08Z
.reporting-2017.09.17 open 376.9KB 2 5 1 2017-09-21T09:58:01Z
.triggered_watches open 9.2MB 19 1 1 2017-09-06T02:14:01Z
.watcher-history-3-2017.10.12 open 6.0MB 7200 1 1 2017-10-12T00:00:03Z
.watcher-history-3-2017.10.13 open 2.4MB 2830 1 1 2017-10-13T00:00:03Z
.watches open 23.6KB 4 1 1 2017-09-06T02:13:00Z
syslog-network-2017.10.11 open 26.1MB 109195 5 1 2017-10-13T02:20:58Z
syslog-network-2017.10.12 open 11.5KB 1 5 1 2017-10-12T20:11:28Z
syslog-platform-2017.10.11 open 1019.5MB 4004662 5 1 2017-10-13T02:36:11Z
syslog-platform-2017.10.12 open 16.0MB 61915 5 1 2017-10-12T03:17:38Z
syslog-platform-2017.10.13 open 20.8MB 90628 5 1 2017-10-12T23:52:10Z
watcher open 69.0KB 5 5 1 2017-09-21T02:23:10Z
watcher_alarms-2017.10.11 open 365.5KB 1 5 1 2017-10-11T08:00:06Z
```
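You may want to post-process this listing in a script. The sketch below splits each line into fields. The column meanings are inferred from the sample output above, not taken from documentation: name, state, size, document count, primary shards, replicas, creation time. `IndexInfo`, `parse_size` and `parse_line` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Size suffixes as they appear in the sample output (binary multiples assumed).
# Two-letter suffixes are listed first so 'KB' is never mistaken for a bare 'B'.
_UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4, "B": 1}


@dataclass
class IndexInfo:
    name: str
    state: str
    size_bytes: int
    docs: int
    primaries: int
    replicas: int
    created: datetime


def parse_size(text: str) -> int:
    """Convert a human-readable size such as '54.9KB' into an approximate byte count."""
    for unit, factor in _UNITS.items():
        if text.endswith(unit):
            return int(float(text[: -len(unit)]) * factor)
    raise ValueError(f"unrecognised size: {text!r}")


def parse_line(line: str) -> IndexInfo:
    """Parse one line of `show_indices --verbose` output (column order inferred from the sample)."""
    name, state, size, docs, pri, rep, created = line.split()
    return IndexInfo(
        name=name,
        state=state,
        size_bytes=parse_size(size),
        docs=int(docs),
        primaries=int(pri),
        replicas=int(rep),
        # The trailing 'Z' marks UTC.
        created=datetime.strptime(created, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc),
    )
```

For example, `[parse_line(l) for l in output.splitlines() if l.strip()]` gives a list you can sort by size or filter by state.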
close index
```
curator_cli --host 10.33.4.160 --port 9200 close --filter_list '[{"filtertype":"age","source":"creation_date","direction":"older","unit":"days","unit_count":1},{"filtertype":"pattern","kind":"prefix","value":"syslog-"}]'
2017-10-13 17:30:21,573 INFO Closing selected indices: ['syslog-platform-2017.10.12']
2017-10-13 17:30:21,713 INFO Singleton "close" action completed.
```
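The command above uses --filter_list to select every index that was created more than 1 day ago and whose name starts with syslog-. It then closes those indices. As the example shows, curator_cli commands are hard to read.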
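One way to tame the hard-to-read --filter_list is to build it from ordinary Python data. `json.dumps` and `shlex.quote` then handle the JSON and shell quoting. `build_close_command` is a hypothetical helper. With the arguments from the example above, it reproduces that exact command.

```python
import json
import shlex


def build_close_command(host: str, port: int, prefix: str, older_than_days: int) -> str:
    """Assemble a curator_cli 'close' command with an age + pattern filter list."""
    filters = [
        # Keep indices created more than `older_than_days` days ago...
        {"filtertype": "age", "source": "creation_date", "direction": "older",
         "unit": "days", "unit_count": older_than_days},
        # ...whose names start with the given prefix.
        {"filtertype": "pattern", "kind": "prefix", "value": prefix},
    ]
    # Compact separators mirror the hand-written JSON; shlex.quote protects it from the shell.
    filter_list = json.dumps(filters, separators=(",", ":"))
    return " ".join([
        "curator_cli", "--host", host, "--port", str(port),
        "close", "--filter_list", shlex.quote(filter_list),
    ])
```

Generating the JSON this way avoids unbalanced brackets and quoting mistakes when the filter list grows.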
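The curator interface
Invoking this interface is simple:
```
curator [--config CONFIG.YML] [--dry-run] ACTION_FILE.YML
```
Pass the configuration file with --config, followed by the action file. The --dry-run flag reports what would be done without changing anything. An action file can hold a whole series of actions, so all of our operations can live in one place. Compared with curator_cli, curator manages configuration and actions centrally. This makes settings easy to reuse and keeps the setup easier to maintain and read.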
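If you drive Curator from another script, one safe pattern is to build the argument list explicitly and run a --dry-run pass first. `curator_argv` and `run_curator` are hypothetical helper names. The example paths are the ones used in the crontab entry at the end of this article. `curator` must be on PATH for `run_curator` to work.

```python
import subprocess
from typing import List


def curator_argv(config: str, action_file: str, dry_run: bool = False) -> List[str]:
    """Build the argument list for: curator [--config CONFIG.YML] [--dry-run] ACTION_FILE.YML"""
    argv = ["curator", "--config", config]
    if dry_run:
        # --dry-run: log what would be done without changing the cluster.
        argv.append("--dry-run")
    argv.append(action_file)
    return argv


def run_curator(config: str, action_file: str, dry_run: bool = False) -> int:
    """Run curator and return its exit code (assumes curator is installed and on PATH)."""
    return subprocess.run(curator_argv(config, action_file, dry_run), check=False).returncode
```

For example, call `run_curator("/opt/curator/curator.yml", "/opt/curator/action.yml", dry_run=True)` and read the log before scheduling the real run.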
configuration
The configuration file is usually named curator.yml. Any name works, though, because you reference it with --config.
```yaml
---
# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
client:
  hosts:
    - 10.33.4.160
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile: /var/log/curator.log
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
```
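The configuration is intuitive, and the meaning of each parameter is clear. One thing to point out: if you don't want to set a parameter, simply leave its value empty. Don't write None, because None is read as the string "None" rather than a null value, as the comment at the top of the file warns.
Also, if logfile is left empty, logs go to stdout by default. Writing them to a file, as in the example above, is recommended.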
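You might generate curator.yml from a script, for example when managing several clusters. The easy mistake there is to emit Python's `None` literally. The sketch below uses a hypothetical `render_config` helper that writes empty values for missing settings instead. It is deliberately minimal. It handles only one level of keys plus simple lists, and it does not quote special YAML characters.

```python
from typing import Any, Dict, List


def _scalar(value: Any) -> str:
    """Render a scalar for curator.yml: empty for 'no value', never the string None."""
    if value is None:
        return ""  # leave the key empty; writing None would be read as the string "None"
    if isinstance(value, bool):
        return "True" if value else "False"
    return str(value)


def render_config(client: Dict[str, Any], logging_cfg: Dict[str, Any]) -> str:
    """Render the client/logging sections of a minimal curator.yml (flat keys and lists only)."""
    lines: List[str] = ["---"]
    for section, body in (("client", client), ("logging", logging_cfg)):
        lines.append(f"{section}:")
        for key, value in body.items():
            if isinstance(value, list):
                lines.append(f"  {key}:")
                lines.extend(f"    - {item}" for item in value)
            else:
                # rstrip() turns "key: " into "key:" when the value is empty.
                lines.append(f"  {key}: {_scalar(value)}".rstrip())
    return "\n".join(lines) + "\n"
```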
action
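Each action consists of three parts:
- action: which operation to perform
- options: optional settings for the operation
- filters: filter conditions that select the indices the action applies to
Available actions:
Compared with curator_cli, there are extra actions such as alias, restore and shrink: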
- alias
- allocation
- close
- cluster_routing
- create_index
- delete_indices
- delete_snapshots
- forcemerge
- index_settings
- open
- reindex
- replicas
- restore
- rollover
- shrink
- snapshot
options:
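There are many options, so I won't go through them one by one. Use the examples below to understand the few most important ones, and look up the rest in the official documentation: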
- allocation_type
- continue_if_exception
- count
- delay
- delete_after
- delete_aliases
- disable_action
- extra_settings
- ignore_empty_list
- ignore_unavailable
- include_aliases
- include_global_state
- indices
- key
- max_age
- max_docs
- max_num_segments
- max_wait
- migration_prefix
- migration_suffix
- name
- node_filters
- number_of_replicas
- number_of_shards
- partial
- post_allocation
- preserve_existing
- refresh
- remote_aws_key
- remote_aws_region
- remote_aws_secret_key
- remote_certificate
- remote_client_cert
- remote_client_key
- remote_filters
- remote_ssl_no_validate
- remote_url_prefix
- rename_pattern
- rename_replacement
- repository
- requests_per_second
- request_body
- retry_count
- retry_interval
- routing_type
- setting
- shrink_node
- shrink_prefix
- shrink_suffix
- slices
- skip_repo_fs_check
- timeout
- timeout_override
- value
- wait_for_active_shards
- wait_for_completion
- wait_interval
- warn_if_no_indices
filters
The most commonly used filtertypes are pattern and age:
- age
- alias
- allocated
- closed
- count
- forcemerged
- kibana
- none
- opened
- pattern
- period
- space
- state
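To make the two most common filtertypes concrete, here is a simplified re-implementation of a filter chain. It applies a pattern filter (kind: prefix), then an age filter with source: name and a timestring such as '%Y.%m.%d', as in the example below. It is not Curator's code. The helper names are hypothetical, and only %Y, %m, %d and %H are translated. All timestamps are treated as naive datetimes, whereas Curator does its own time-zone handling.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

# strftime tokens handled by this sketch, mapped to regex fragments.
_TOKENS = {"%Y": r"\d{4}", "%m": r"\d{2}", "%d": r"\d{2}", "%H": r"\d{2}"}


def index_date(name: str, timestring: str) -> Optional[datetime]:
    """Extract the date embedded in an index name, e.g. 'syslog-2017.10.11' with '%Y.%m.%d'."""
    pattern = re.escape(timestring)
    for token, fragment in _TOKENS.items():
        pattern = pattern.replace(re.escape(token), fragment)
    match = re.search(pattern, name)
    return datetime.strptime(match.group(0), timestring) if match else None


def select_indices(names: Iterable[str], prefix: str, timestring: str,
                   older_than_days: int, now: datetime) -> List[str]:
    """Approximate: pattern (kind: prefix), then age (source: name, direction: older, unit: days)."""
    cutoff = now - timedelta(days=older_than_days)
    selected: List[str] = []
    for name in names:
        if not name.startswith(prefix):  # pattern filter: keep only the prefix
            continue
        stamp = index_date(name, timestring)  # age filter: date parsed from the name
        if stamp is not None and stamp < cutoff:  # direction: older
            selected.append(name)
    return selected
```

Indices whose names contain no date matching the timestring are simply left out of the selection in this sketch.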
Example:
```yaml
---
# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
#
# Also remember that all examples have 'disable_action' set to True.  If you
# want to use this action as a template, be sure to set this to False after
# copying it.
actions:
  1:
    action: delete_indices
    description: >-
      Delete metric indices older than 3 days (based on index name), for
      .monitoring-es-6-
      .monitoring-kibana-6-
      .monitoring-logstash-6-
      .watcher-history-3-
      prefixed indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      # disable_action: True
    filters:
    - filtertype: pattern
      kind: regex
      value: '^(\.monitoring-(es|kibana|logstash)-6-|\.watcher-history-3-).*$'
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 3
  2:
    action: close
    description: >-
      Close indices older than 30 days (based on index name), for syslog-
      prefixed indices.
    options:
      ignore_empty_list: True
      delete_aliases: False
      # disable_action: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: syslog-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30
  3:
    action: forcemerge
    description: >-
      forceMerge syslog- prefixed indices older than 2 days (based on index
      name) to 2 segments per shard.  Delay 120 seconds between each
      forceMerge operation to allow the cluster to quiesce. Skip indices that
      have already been forcemerged to the minimum number of segments to avoid
      reprocessing.
    options:
      ignore_empty_list: True
      max_num_segments: 2
      delay: 120
      timeout_override:
      continue_if_exception: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: syslog-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 2
    - filtertype: forcemerged
      max_num_segments: 2
      exclude:
```
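Actions are defined in a YAML file, and indentation expresses their structure. The example defines 3 actions, which run one after another in the order of their numbers. These three tasks (1, 2, 3) do not depend on each other. If some of your actions do, make sure each action is listed after the ones it depends on.
The three tasks are deleting indices, closing expired indices and merging index segments.
Pay particular attention to the options. When a file contains multiple independent actions, be sure to set ignore_empty_list: True. This tells Curator to skip an action when its filters find no matching indices. If it is set to False, an empty match aborts the whole curator run. For example, if the first action finds no matching index, the later actions never run.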
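Here is a toy model of the behaviour just described. Actions run in order of their numeric keys. An empty selection is skipped when ignore_empty_list is true, and otherwise stops the whole run. The `Action` class, the `select` callables and `EmptyListAbort` are hypothetical stand-ins, not Curator internals.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class EmptyListAbort(Exception):
    """Raised when an action matches no indices and ignore_empty_list is False."""


@dataclass
class Action:
    name: str
    select: Callable[[List[str]], List[str]]  # stands in for the action's filters
    ignore_empty_list: bool = False


def run_actions(actions: Dict[int, Action], indices: List[str]) -> List[str]:
    """Run actions in ascending key order and return a log of what happened."""
    log: List[str] = []
    for key in sorted(actions):
        action = actions[key]
        matched = action.select(indices)
        if not matched:
            if action.ignore_empty_list:
                log.append(f"{key}:{action.name}: skipped (empty list)")
                continue
            # Without ignore_empty_list, an empty selection ends the whole run.
            raise EmptyListAbort(f"{key}:{action.name}: no matching indices")
        log.append(f"{key}:{action.name}: {len(matched)} index(es)")
    return log
```

With ignore_empty_list set everywhere, a day with nothing to delete still lets the close and forcemerge steps run.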
The official site has example action files for every action type, and they are worth a look.
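Running curator periodically with crontab
Curator is a command-line tool, but we need automated, periodic maintenance, so we also need a scheduler such as cron. Most Linux distributions ship with it. Edit the /etc/crontab file: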
```
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root

# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name  command to be executed
0 0 * * * root curator --config /opt/curator/curator.yml /opt/curator/action.yml
```
With this entry, curator runs once a day at 00:00 to delete indices, close indices and merge segments.
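In `0 0 * * *`, the first two fields are minute 0 and hour 0, and the remaining three are wildcards. Together they mean every day at midnight. The hypothetical helper below computes the next firing time for that fixed-time daily case only. It is not a general cron parser, and cron itself evaluates the schedule in the system's local time.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int = 0, minute: int = 0) -> datetime:
    """Next firing time strictly after `now` for a cron entry 'minute hour * * *'."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```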