Deleting Expired Data in Elasticsearch
I. Overview
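When Elasticsearch is used to collect and process logs, old data eventually becomes useless or of little value, and at that point the expired data needs to be cleaned up. This post covers two ways to do that.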
1. curator
About versions:
Installation:
https://www.elastic.co/guide/en/elasticsearch/client/curator/current/installation.html
I'm on Ubuntu, so I followed the APT repository instructions: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/apt-repository.html
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
vim /etc/apt/sources.list.d/curator.list
# add the following line to the file:
deb [arch=amd64] https://packages.elastic.co/curator/5/debian stable main
sudo apt-get update && sudo apt-get install elasticsearch-curator
I'm running elasticsearch-6.5.1, so I installed Curator 5.
Installation provides two commands, curator and curator_cli; only curator is used here.
Two configuration files are needed: a client config file and an action file.
mkdir {/etc/curator,/data/curator}
config:
# cat config_file.yml
client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout:
  master_only: true    # only run if the connected node is the elected master
logging:
  loglevel: INFO
  logfile: "/data/curator/action.log"    # directory created above
  logformat: default
action:
# cat action_file.yml
---
actions:
  1:
    action: delete_indices
    description: >-
      Delete apm-6.5.1-transaction-* and apm-6.5.1-span-* indices older than
      15 days (based on index name). Ignore the error if the filter does not
      result in an actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: regex
      value: '^apm-6.5.1-transaction-|^apm-6.5.1-span-'
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 15
      exclude:
  2:
    action: delete_indices
    description: >-
      Delete loadbalance-api- prefixed indices older than 20 days (based on
      index name). Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: loadbalance-api-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y-%m-%d'
      unit: days
      unit_count: 20
      exclude:
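To double-check an action's timestring and unit_count, it can help to print the index name that sits right at the age cutoff; names dated earlier are roughly what the age filter treats as older. This is a sketch with GNU date, not Curator's own logic: the helper name cutoff_index is hypothetical, and because Curator measures age from the current time rather than from midnight, the boundary day can differ by one.

```shell
#!/bin/bash
# cutoff_index (hypothetical helper): approximate Curator's age filter cutoff
# for source: name, unit: days. Prints PREFIX followed by the date UNIT_COUNT
# days before TODAY (UTC midnight), formatted with the action's timestring.
# Usage: cutoff_index PREFIX TIMESTRING UNIT_COUNT [TODAY as YYYY-MM-DD]
cutoff_index() {
    local prefix=$1 timestring=$2 days=$3 today=${4:-$(date -u +%Y-%m-%d)}
    local today_s cutoff_s
    today_s=$(date -u -d "$today" +%s)
    cutoff_s=$(( today_s - days * 86400 ))
    # A Curator timestring is a strftime pattern, so GNU date can format with it directly
    echo "${prefix}$(date -u -d "@${cutoff_s}" +"${timestring}")"
}
```

For example, cutoff_index loadbalance-api- '%Y-%m-%d' 20 2019-02-26 prints loadbalance-api-2019-02-06. If the printed name does not look like your real index names, the timestring is wrong.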
Another example: delete fluentd-k8s- indices older than 15 days, but keep the 2019.02.11 and 2019.02.12 indices by matching them with a regex filter that has exclude: true:
---
actions:
  1:
    action: delete_indices
    description: >-
      Delete fluentd-k8s- prefixed indices older than 15 days (based on index
      name), except the 2019.02.11 and 2019.02.12 indices. Ignore the error if
      the filter does not result in an actionable list of indices
      (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: regex
      value: 'fluentd-k8s-(2019.02.11|2019.02.12)$'
      exclude: true
    - filtertype: pattern
      kind: prefix
      value: fluentd-k8s-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 15
      exclude:
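The three filters are applied in order, each narrowing the list left by the previous one. The sketch below imitates that chain in bash on a list of index names, which is handy for checking the exclude regex before running Curator. The function would_delete and the sample names are hypothetical, and the age check is a midnight-based approximation, so it does not replace curator --dry-run.

```shell
#!/bin/bash
# would_delete (hypothetical helper): approximates the fluentd-k8s action's filter chain.
# Reads index names on stdin and prints the ones that survive all three filters.
# Usage: would_delete TODAY < names   (TODAY as YYYY-MM-DD, treated as UTC midnight)
would_delete() {
    local keep_re='fluentd-k8s-(2019.02.11|2019.02.12)$'   # value of filter 1
    local date_re='^[0-9]{4}\.[0-9]{2}\.[0-9]{2}$'
    local today_s cutoff_s name day
    today_s=$(date -u -d "$1" +%s)
    cutoff_s=$(( today_s - 15 * 86400 ))                   # unit: days, unit_count: 15
    while read -r name; do
        # filter 1: kind regex with exclude: true -> drop the days we want to keep
        [[ $name =~ $keep_re ]] && continue
        # filter 2: kind prefix -> keep only fluentd-k8s- indices
        [[ $name == fluentd-k8s-* ]] || continue
        # filter 3: age, source name, timestring '%Y.%m.%d', direction older
        day=${name: -10}
        [[ $day =~ $date_re ]] || continue
        if (( $(date -u -d "${day//./-}" +%s) < cutoff_s )); then
            echo "$name"
        fi
    done
}
```

For example, printf '%s\n' fluentd-k8s-2019.02.01 fluentd-k8s-2019.02.11 | would_delete 2019-02-26 prints only fluentd-k8s-2019.02.01, because 2019.02.11 is caught by the exclude regex.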
You can define multiple actions, each under its own number and each with its own cleanup policy. For details, see https://www.elastic.co/guide/en/elasticsearch/client/curator/5.6/actions.html
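Pay attention to the date format in your index names. Mine use two: %Y.%m.%d (the apm-6.5.1-* and fluentd-k8s-* indices) and %Y-%m-%d (the loadbalance-api-* indices).
The timestring must match the index name; otherwise that action's filters produce an empty list and nothing is deleted.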
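A quick way to check which timestring an index name needs is to look at its last 10 characters. The helper below (timestring_of, a hypothetical name) prints the Curator timestring matching a trailing YYYY.MM.DD or YYYY-MM-DD date, or nothing if neither matches. It only checks the shape of the suffix, not whether the date is valid.

```shell
#!/bin/bash
# timestring_of (hypothetical helper): print the Curator timestring that matches
# the last 10 characters of an index name, or nothing if neither format matches.
timestring_of() {
    local dot_re='^[0-9]{4}\.[0-9]{2}\.[0-9]{2}$'    # e.g. 2019.02.11
    local dash_re='^[0-9]{4}-[0-9]{2}-[0-9]{2}$'     # e.g. 2019-02-11
    local suffix=${1: -10}
    if [[ $suffix =~ $dot_re ]]; then
        echo '%Y.%m.%d'
    elif [[ $suffix =~ $dash_re ]]; then
        echo '%Y-%m-%d'
    fi
}
```

To label every index in the cluster, something like curl -s 'http://127.0.0.1:9200/_cat/indices?h=index' | while read -r i; do echo "$i $(timestring_of "$i")"; done should work. Indices that print no timestring will never be selected by an age filter with source: name.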
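Historical data that matters is first written to HDFS and only then deleted. Choose the retention period for each index pattern based on your servers' disk capacity and how important the logs are. Data you want to keep, such as Double 11 (Singles' Day) logs, can be listed in an exclude filter as in the fluentd-k8s example above,
or backed up with a snapshot. After that, all that's left is a cron job to run the cleanup.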
crontab -e
# m h dom mon dow: run the cleanup once a day at 01:00
0 1 * * * curator --config /etc/curator/config_file.yml /etc/curator/action_file.yml
2. Deleting with a script
# cat es-dele-indices.sh
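#!/bin/bash
# Delete daily Elasticsearch indices named ${searchIndex}-YYYY.MM.DD
# that are more than ${keepDays} days old.
searchIndex=fluentd-k8s
elastic_url=127.0.0.1
elastic_port=9200
keepDays=1   # delete indices more than this many days old

# Convert a date string to a UTC Unix timestamp (GNU date).
date2stamp(){
    date --utc --date "$1" +%s
}

# dateDiff [-s|-m|-h|-d] DATE1 DATE2: absolute difference in the given unit (default: days)
dateDiff(){
    case $1 in
        -s) sec=1; shift;;
        -m) sec=60; shift;;
        -h) sec=3600; shift;;
        -d) sec=86400; shift;;
        *) sec=86400;;   # no unit flag: default to days and keep $1 as the first date
    esac
    dte1=$(date2stamp "$1")
    dte2=$(date2stamp "$2")
    diffSec=$((dte2-dte1))
    if ((diffSec < 0)); then abs=-1; else abs=1; fi
    echo $((diffSec/sec*abs))
}

cond=$(date +%Y-%m-%d)   # today
# h=index makes _cat/indices print only the index names, one per line
for index in $(curl -s "http://${elastic_url}:${elastic_port}/_cat/indices?h=index" | grep -E "^${searchIndex}-20[0-9]{2}\.[0-1][0-9]\.[0-3][0-9]$"); do
    # the last 10 characters are YYYY.MM.DD; convert to YYYY-MM-DD for date
    date=$(echo "${index: -10}" | sed 's/\./-/g')
    diff=$(dateDiff -d "$date" "$cond")
    echo -n "${index} (${diff})"
    if [ "$diff" -gt "$keepDays" ]; then
        echo " / DELETE"
        curl -s -XDELETE "http://${elastic_url}:${elastic_port}/${index}?pretty"
    else
        echo ""
    fi
done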
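Because the script deletes as soon as it runs, a dry run first is a good idea. The sketch below applies roughly the same rule as the loop above (names of the form <prefix>-YYYY.MM.DD, deleted when more than N days old; the script uses 1) but only prints the matches. It reads names from stdin and takes "today" as an argument, so it can be checked without a cluster. list_expired and its arguments are hypothetical; unlike dateDiff it uses a signed age, so future-dated names are never listed.

```shell
#!/bin/bash
# list_expired (hypothetical helper): dry run of the deletion rule, prints "name (age)"
# for each matching index without contacting Elasticsearch.
# Usage: list_expired PREFIX KEEP_DAYS TODAY(YYYY-MM-DD) < index_names
list_expired() {
    local prefix=$1 keep=$2 today_s name day age
    local name_re="^${prefix}-20[0-9]{2}\.[0-1][0-9]\.[0-3][0-9]$"
    today_s=$(date --utc --date "$3" +%s)
    while read -r name; do
        [[ $name =~ $name_re ]] || continue
        day=${name: -10}                       # YYYY.MM.DD suffix
        # signed age in whole days; negative for dates after TODAY
        age=$(( (today_s - $(date --utc --date "${day//./-}" +%s)) / 86400 ))
        if (( age > keep )); then
            echo "$name ($age)"
        fi
    done
}
```

Against a live cluster it could be fed with curl -s 'http://127.0.0.1:9200/_cat/indices?h=index' | list_expired fluentd-k8s 1 "$(date +%Y-%m-%d)".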