Elasticsearch Batch Reindexing

Elasticsearch version: 7.16.2

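Elasticsearch is designed for search, and its support for modification is comparatively weak, especially for changes to the data structure (mappings). Unlike a relational database, changing the structure does not change the data already in the index. If the existing data must also pick up the modified structure and settings, the index has to be rebuilt (reindexed).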

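The general reindexing workflow is:

  • Create a new index index_new from the structure of the old index index_old, with the required configuration changes applied
  • Copy the data from index_old to index_new with reindex
  • Delete index_old
  • Add the alias index_old to index_new, so the application can keep using the name index_old to access the index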

Kibana Dev Tools

When there are only a few indices, you can use the Kibana Dev Tools console, which is interactive and user-friendly.

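# 1. Get the source index structure, as a backup or as the basis for the new index
GET my_index

# 2. Create the new index from the modified source structure; disabling refresh and replicas on the new index before syncing data is recommended
PUT my_index_alias
{
  "mappings": {
    //...new mappings
  },
  "settings": {
    //...new settings
    "index": {
      // Disable refresh and replicas on the new index to speed up the writes that follow
      "refresh_interval": "-1",
      "number_of_replicas": "0"
    }
  }
}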

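# 3. Run the reindex asynchronously to copy the data into the new index
# slices=auto parallelizes the work across slices, a larger source.size raises the number of documents per batch, and conflicts=proceed skips conflicting documents (in theory a fresh index has none)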
POST _reindex?slices=auto&wait_for_completion=false
{
  "source": {
    "index": "my_index",
    "size": 5000
  },
  "dest": {
    "index": "my_index_alias",
    "op_type": "create"
  },
  "conflicts": "proceed"
}

# 4. Check the task progress
## Query by the task ID returned by the previous step
GET /_tasks/CzIa7FVORqu6sRH1U0LUMw:2873350476
## List all reindex tasks
GET _tasks?detailed=true&actions=*reindex

# 5. Delete the old index once the reindex has completed
DELETE my_index

# 6. Add the old index name as an alias of the new index
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "my_index_alias",
        "alias": "my_index"
      }
    }
  ]
}

# 7. Re-enable replicas and refresh on the new index
PUT my_index_alias/_settings
{
  "index": {
    "refresh_interval": "1s",
    "number_of_replicas": "1"
  }
}
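
Steps 5 and 6 above leave a short window in which the name my_index resolves to nothing. As a hedged alternative, the _aliases API accepts several actions in one request and applies them atomically, and it has a remove_index action, so the delete and the alias can be sent together. The sketch below only builds that request body; the helper name alias_swap_body is made up for illustration, and the URL and credentials in the commented curl call are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: build an _aliases request body that deletes the old
# index and adds an alias with the same name to the new index in one request.
alias_swap_body() {
  local old_index="$1" new_index="$2"
  printf '{"actions":[{"remove_index":{"index":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}' \
    "$old_index" "$new_index" "$old_index"
}

# Example call (URL and credentials are placeholders, not real values):
# curl -ks -u "username:password" -X POST -H "Content-Type: application/json" \
#   "http://ip:port/_aliases" -d "$(alias_swap_body my_index my_index_alias)"
```

The same body can be pasted into Dev Tools after POST _aliases, replacing steps 5 and 6.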

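Batch Reindexing with a Script

In production, indices of the same kind are usually split into multiple indices by day or by type to simplify operations, but this makes reindexing more tedious.
In that case you can use the script below and pass in the index names in a loop to rebuild the indices in batch.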
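
The loop itself is not part of the script, so here is a minimal sketch of one way to drive it. It assumes the script below has been saved as reindex.sh and changed to read the index name from its first argument (for example index_old_name=$1); both the file name and the helper rebuild_all are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical wrapper: run a per-index rebuild command once for every index
# name given, keep going after a failure, and report failure at the end.
rebuild_all() {
  local rebuild_cmd="$1"
  shift
  local index failed=0
  for index in "$@"; do
    echo "=== rebuilding ${index} ==="
    if ! "$rebuild_cmd" "$index"; then
      echo "rebuild failed for ${index}" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every index was rebuilt
  [ "$failed" -eq 0 ]
}

# Example call (index names are placeholders):
# rebuild_all ./reindex.sh my_index_2022.01.01 my_index_2022.01.02
```

Continuing after a failed index is a design choice here; stopping at the first failure may be safer when the indices depend on each other.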

#!/bin/bash

es_url=http://ip:port
username=username
password=password
index_old_name=name_of_the_index_to_rebuild
index_new_name=${index_old_name}_new


function error_exit() {
  echo -e "\e[31m Operation failed \e[0m"
  exit 1
}

set -e
script_path=$(cd "$(dirname "$0")" && pwd)
if [[ -f "${script_path}/config.json" ]]; then
	echo "0. Found the new index config: ${script_path}/config.json"
	cat "${script_path}/config.json"
else
	echo "0. No new index config found; generating one from the old index at ${script_path}/config.json. Please edit it, then run the script again."
	# Write next to the script, which is where the check above looks for the file
	curl -ks -u "${username}:${password}" -X GET "${es_url}/${index_old_name}?pretty" > "${script_path}/config.json"
	exit 0
fi

echo "1. Creating the new index..."
result=$(curl -ks -u "${username}:${password}" -X PUT -H "Content-Type: application/json" "${es_url}/${index_new_name}" -d "@${script_path}/config.json")
echo "$result"
echo "$result" | grep -q '"acknowledged":true' || error_exit

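echo "2. Fetching the new index details and saving its initial replica count and refresh interval so they can be restored later..."
result=$(curl -ks -u "${username}:${password}" -X GET "${es_url}/${index_new_name}?pretty")
echo "$result"
# Fail if the response contains an error object
if echo "$result" | grep -q '"error"'; then error_exit; fi

# Keep each setting as a "key":"value" fragment with commas and whitespace stripped
duplicate=$(echo "$result" | grep '"number_of_replicas"' | sed 's/[[:space:],]//g' || true)
refresh=$(echo "$result" | grep '"refresh_interval"' | sed 's/[[:space:],]//g' || true)
# refresh_interval is only returned when it was set explicitly; null resets it to the default
refresh=${refresh:-'"refresh_interval":null'}
echo -e "${duplicate}\n${refresh}"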

echo "3. Disabling refresh and replicas before reindexing, to speed up writes to the new index and therefore the rebuild..."
result=$(curl -ks -u "${username}:${password}" -X PUT -H "Content-Type: application/json" "${es_url}/${index_new_name}/_settings" -d \
'{"index": {"refresh_interval": "-1","number_of_replicas": "0"}}')
echo "$result"
echo "$result" | grep -q '"acknowledged":true' || error_exit



echo "4. Starting the reindex..."
task=$(curl -ks -u "${username}:${password}" -X POST -H "Content-Type: application/json" "${es_url}/_reindex?slices=auto&wait_for_completion=false" -d \
'
{
  "source": {
    "index": "'"${index_old_name}"'",
    "size": 5000
  },
  "dest": {
    "index": "'"${index_new_name}"'",
    "op_type": "create"
  },
  "conflicts": "proceed"
}
')

echo "$task"
echo "$task" | grep -q '"task":' || error_exit
# The response looks like {"task":"<node_id>:<task_number>"}, so the ID is the 4th quote-delimited field
task_id=$(echo "$task" | awk -F '"' '{print $4}')
echo "task_id=$task_id"


while true
do
	sleep 1
	task_status=$(curl -ks -u "${username}:${password}" -X GET "${es_url}/_tasks/${task_id}?pretty")
	echo "$task_status"
	# With ?pretty the status contains a line such as: "completed" : true
	if echo "$task_status" | grep '"completed"' | grep -q true; then
		echo "$index_old_name -> $index_new_name reindex finished."
		break
	fi
	echo "$index_old_name -> $index_new_name reindexing..."
done


echo "5. Deleting the old index"
result=$(curl -ks -u "${username}:${password}" -X DELETE -H "Content-Type: application/json" "${es_url}/${index_old_name}")
echo "$result"
echo "$result" | grep -q '"acknowledged":true' || error_exit


echo "6. Adding the old index name as an alias of the new index"
result=$(curl -ks -u "${username}:${password}" -X POST -H "Content-Type: application/json" "${es_url}/_aliases" -d \
'{
  "actions": [
    {
      "add": {
        "index": "'"${index_new_name}"'",
        "alias": "'"${index_old_name}"'"
      }
    }
  ]
}'
)
echo "$result"
echo "$result" | grep -q '"acknowledged":true' || error_exit

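echo "7. Restoring the replica and refresh settings"
# Quote the saved fragments so whitespace in them is not word-split, and skip an empty fragment
restore_settings="${duplicate}"
if [[ -n "${refresh}" ]]; then restore_settings="${restore_settings:+${restore_settings},}${refresh}"; fi
result=$(curl -ks -u "${username}:${password}" -X PUT -H "Content-Type: application/json" "${es_url}/${index_new_name}/_settings" -d "{\"index\": {${restore_settings}}}")
echo "$result"
echo "$result" | grep -q '"acknowledged":true' || error_exit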
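
The script deletes the old index as soon as the task reports completion. As an optional safeguard before step 5, the document counts of the two indices could be compared first. The sketch below is one way to do that with the _count API; the helper names extract_count and counts_match are invented for illustration, and it assumes nothing writes to the old index during the rebuild. Because refresh is disabled on the new index at that point, a POST to its _refresh endpoint is needed before counting.

```shell
#!/bin/bash
# Hypothetical helper: pull the numeric "count" field out of a _count response.
extract_count() {
  echo "$1" | sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if both responses carry the same count.
counts_match() {
  local old_count new_count
  old_count=$(extract_count "$1")
  new_count=$(extract_count "$2")
  [ -n "$old_count" ] && [ "$old_count" = "$new_count" ]
}

# Example use inside the script, before step 5 (variables come from the script):
# curl -ks -u "${username}:${password}" -X POST "${es_url}/${index_new_name}/_refresh"
# old=$(curl -ks -u "${username}:${password}" "${es_url}/${index_old_name}/_count")
# new=$(curl -ks -u "${username}:${password}" "${es_url}/${index_new_name}/_count")
# counts_match "$old" "$new" || error_exit
```

A count match does not prove the documents are identical, but a mismatch is a cheap signal to stop before the old index is gone.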
posted @ coder_klong