elasticsearch 学习笔记之一 elasticdump

一、背景

对于数据量较小的数据进行备份还原，可以使用elasticdump工具。

数据量较小的情况：由于Elasticdump的工作方式是每次导入导出100条数据，因此它更适合数据量不是特别大的情况。对于大批量数据迁移，可能需要考虑其他的工具或方法。
迁移索引个数不多的情况：Elasticdump适合迁移索引个数不多的场景。每个索引的分片数量和副本数量需要单独进行迁移，或者在目标集群中提前创建好索引，然后再进行数据迁移。
无需跨集群配置的情况：相比于reindex跨集群操作，Elasticdump无需在ES集群的配置文件elasticsearch.yml中设置授权迁移访问地址（白名单）。

二、基本用法

1、支持的使用功能

可以一次性备份所有索引下的所有文档数据（--type=data）且可以还原所有文档数据且支持重复还原，备份数据的时候会碰到：scroll context达到最大值500导致备份失败。因此大数据量的备份不建议用该工具；还原的时候耗时也比较久。
可以一次性备份所有索引（--type=index），但不支持通过一个命令还原所有索引
可以通过通配符备份所有索引下的所有文档数据（--type=data），并一次性还原
可以通过通配符备份符合要求的索引（--type=index），但不支持通过通配符命令还原备份下的所有索引
可以备份单个索引下的所有文档数据（--type=data）并进行还原且支持重复还原
可以备份单个索引（--type=index），但是还原的时候必须将该索引删除才能还原

2、弊端

不支持一次性备份索引的所有数据，包括 index、data。对于一个索引只能分开备份 index 和 data；
索引index的数据（settting\mapping）要想还原，则必须先删除已经存在的索引，否则还原失败。目前单个mapping类型的数据还原也存在问题；
通过通配符备份下来的索引数据index，必须要一个一个指定索引进行还原，无法再次通过通配符的形式进行还原；
通配符和全量备份下来的index 或 data，只能放在一个文件中，不能按照索引单个文件存储；
全量文档数据备份的时候，会碰到scroll context达到最大值500导致备份失败的问题；
如果想同时备份索引A和索引B的index或者data，目前通过一个命令无法同时备份A和B；

3、语法

elasticdump --input SOURCE --output DESTINATION [OPTIONS]

核心选项
选项名	是否必配置	用法	说明	示例
--input	必配	--input=http://es用户名:es密码@ip:port 或 --input=http://es用户名:es密码@ip:port/索引或 --input=/opt/xxx.json	源地址没有配置索引的情况下表示将所有索引导出配置索引的情况下表示将指定索引指定数据导出	elasticdump --input=http://elastic:elasticsearch_422@10.19.223.119:9200 elasticdump --input=http://elastic:elasticsearch_422@10.19.223.119:9200/iotrm_event_event_acs\$0x00030403_2024-10-29
--input-index	选配	--input-index=all --input-index=index/type	未配置的情况下默认为all
--output	必配	--output=http://es用户名:es密码@ip:port 或 --output=http://es用户名:es密码@ip:port/索引或 --output=/opt/xxx.json	目标地址输入源是包含多个索引（type=index）的文件或地址时，output无法通过 http://es用户名:es密码@ip:port 这个形式还原或者通过通配符的形式还原 type=index数据，只能通过http://es用户名:es密码@ip:port/索引指定索引的方式一个一个索引数据还原	elasticdump --output=http://elastic:elasticsearch_422@10.19.223.119:9200 elasticdump --output=http://elastic:elasticsearch_422@10.19.223.119:9200/iotrm_event_event_acs\$0x00030403_2024-10-29
--output-index	选配	--output-index=all --output-index=index/type	未配置的情况下默认为all
可选选项
选项名	是否必配置	用法	说明	示例
--type	否	--type=data type 可选值为：index, settings, analyzer, data, mapping, policy, alias, template, component_template, index_template	--type 未配置的情况下，默认是data type值只能指定一个，因此不能通过一个命令导出多个数据，比如同时导出数据和索引定义 index的时候会把settings、mapping、alias的数据导出 What are we exporting? (default: data, options: [index, settings, analyzer, data, mapping, policy, alias, template, com ponent_template, index_template])
--big-int-fields	否		Specifies a comma-seperated list of fields that should be checked for big-int support (default '')
--bulkAction	否		Sets the operation type to be used when preparing the request body to be sent to elastic search. For more info - https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html (default: index, options: [index, update, delete, create)
--ca, --input-ca, --output-ca			CA certificate. Use --ca if source and destination are identical. Otherwise, use the one prefixed with --input or --output as needed.
--cert, --input-cert, --output-cert			Client certificate file. Use --cert if source and destination are identical. Otherwise, use the one prefixed with --input or --output as needed.
--csvConfigs			Set all fast-csv configurations A escaped JSON string or file can be supplied. File location must be prefixed with the @ symbol (default: null)
--csvCustomHeaders			A comma-seperated listed of values that will be used as headers for your data. This param must be used in conjunction with `csvRenameHeaders` (default : null)
--csvDelimiter			The delimiter that will separate columns. (default : ',')
--csvFirstRowAsHeaders			If set to true the first row will be treated as the headers. (default : true)
--csvHandleNestedData			Set to true to handle nested JSON/CSV data. NB : This is a very opinionated implementaton ! (default : false)
--csvIdColumn			Name of the column to extract the record identifier (id) from When exporting to CSV this column can be used to override the default id (@id) column name (default : null)
--csvIgnoreAutoColumns			Set to true to prevent the following columns @id, @index, @type from being written to the output file (default : false)
--csvIgnoreEmpty			Set to true to ignore empty rows. (default : false)
--csvIncludeEndRowDelimiter			Set to true to include a row delimiter at the end of the csv (default : false)
--csvIndexColumn			Name of the column to extract the record index from When exporting to CSV this column can be used to override the default index (@index) column name (default : null)
--csvLTrim			Set to true to left trim all columns. (default : false)
--csvMaxRows			If number is > 0 then only the specified number of rows will be parsed.(e.g. 100 would return th e first 100 rows of data) (default : 0)
--csvRTrim			Set to true to right trim all columns. (default : false)
--csvRenameHeaders			If you want the first line of the file to be removed and replaced by the one provided in the `cs vCustomHeaders` option (default : true)
--csvSkipLines			If number is > 0 the specified number of lines will be skipped. (default : 0)
--csvSkipRows			If number is > 0 then the specified number of parsed rows will be skipped NB: (If the first row is treated as headers, they aren't a part of the count) (default : 0)
--csvTrim			Set to true to trim all white space from columns. (default : false)
--csvTypeColumn			Name of the column to extract the record type from When exporting to CSV this column can be used to override the default type (@type) column name (default : null)
--csvWriteHeaders			Determines if headers should be written to the csv file. (default : true)
--customBackoff			Activate custom customBackoff function. (s3)
--debug			Display the elasticsearch commands being used (default: false)
--delete			Delete documents one-by-one from the input as they are moved. Will not delete the source index (default: false)
--delete-with-routing			Passes the routing query-param to the delete function used to route operations to a specific shard. (default: false)
--esCompress			if true, add an Accept-Encoding header to request compressed content encodings from the server ( if not already present) and decode supported content encodings in the response. Note: Automatic decoding of the response content is performed on the body data returned through request (both through the request stream and passed to the callback function) but is not performed on th e response stream (available from the response event) which is the unmodified http.IncomingMessage object which ma y contain compressed data. See example below.
--fileSize			supports file splitting. This value must be a string supported by the bytes module. The following abbreviations must be used to signify size in terms of units b for bytes kb for kilobytes mb for megabytes gb for gigabytes tb for terabytes e.g. 10mb / 1gb / 1tb Partitioning helps to alleviate overflow/out of memory exceptions by efficiently segmenting file s into smaller chunks that then can be merged if needs be.
--filterSystemTemplates			Whether to remove metrics-- and logs-- system templates (default: true])
--force-os-version			Forces the OpenSearch version used by elasticsearch-dump. (default: 7.10.2)
--fsCompress			gzip data before sending output to file. On import the command is used to inflate a gzipped file
--compressionLevel			The level of zlib compression to apply to responses. defaults to zlib.Z_DEFAULT_COMPRESSION
--handleVersion			Tells elastisearch transport to handle the `_version` field if present in the dataset (default : false)
--headers			Add custom headers to Elastisearch requests (helpful when your Elasticsearch instance sits behind a proxy) (default: '{"User-Agent": "elasticdump"}') Type/direction based headers are supported .i.e. input-headers/output-headers (these will only be added based on the current flow type input/output)
--help			This page
--ignore-errors			Will continue the read/write loop on write error (default: false)
--ignore-es-write-errors			Will continue the read/write loop on a write error from elasticsearch (default: true)
--inputSocksPort, --outputSocksPort			Socks5 host port
--inputSocksProxy, --outputSocksProxy			Socks5 host address
--inputTransport			Provide a custom js file to use as the input transport
--key, --input-key, --output-key			Private key file. Use --key if source and destination are identical. Otherwise, use the one prefixed with --input or --output as needed.
--limit			How many objects to move in batch per operation limit is approximate for file streams (default: 100)
--maxRows			supports file splitting. Files are split by the number of rows specified
--maxSockets			How many simultaneous HTTP requests can the process make? (default: 5 [node <= v0.10.x] / Infinity [node >= v0.11.x] )
--noRefresh			Disable input index refresh. Positive: 1. Much increased index speed 2. Much less hardware requirements Negative: 1. Recently added data may not be indexed Recommended using with big data indexing, where speed and system health is a higher priority than recently added data.
--offset			Integer containing the number of rows you wish to skip ahead from the input transport. When importing a large index, things can go wrong, be it connectivity, crashes, someone forgets to `screen`, etc. This allows you to start the dump again from the last known line written (as logged by the `offset` in the output). Please be advised that since no sorting is specified when the dump is initially created, there's no real way to guarantee that the skipped rows have already been written/parsed. This is more of an option for when you want to get as much data as possible in the index without concern for losing some rows in the process, similar to the `timeout` option. (default: 0)
--outputTransport			Provide a custom js file to use as the output transport
--overwrite			Overwrite output file if it exists (default: false)
--params			Add custom parameters to Elastisearch requests uri. Helpful when you for example want to use elasticsearch preference --input-params is a specific params extension that can be used when fetching data with the scrol l api --output-params is a specific params extension that can be used when indexing data with the bulk index api NB : These were added to avoid param pollution problems which occur when an input param is used in an output source (default: null)
--parseExtraFields			Comma-separated list of meta-fields to be parsed
--pass, --input-pass, --output-pass			Pass phrase for the private key. Use --pass if source and destination are identical. Otherwise, use the one prefixed with --input or --output as needed.
--quiet			Suppress all messages except for errors (default: false)
--retryAttempts			Integer indicating the number of times a request should be automatically re-attempted before fai ling when a connection fails with one of the following errors `ECONNRESET`, `ENOTFOUND`, `ESOCKETTIME DOUT`, ETIMEDOUT`, `ECONNREFUSED`, `EHOSTUNREACH`, `EPIPE`, `EAI_AGAIN` (default: 0)
--retryDelay			Integer indicating the back-off/break period between retry attempts (milliseconds) (default : 5000)
--retryDelayBase			The base number of milliseconds to use in the exponential backoff for operation retries.
--scroll-with-post			Use a HTTP POST method to perform scrolling instead of the default GET (default: false)
--scrollId			The last scroll Id returned from elasticsearch. This will allow dumps to be resumed used the last scroll Id & `scrollTime` has not expired.
--scrollTime			Time the nodes will hold the requested search in order. (default: 10m)
--searchBody			Preform a partial extract based on search results when ES is the input, default values are if ES > 5 `'{"query": { "match_all": {} }, "stored_fields": [""], "_source": true }'` else `'{"query": { "match_all": {} }, "fields": [""], "_source": true }'` [As of 6.68.0] If the searchBody is preceded by a @ symbol, elasticdump will perform a file look up in the location specified. NB: File must contain valid JSON
--searchBodyTemplate			A method/function which can be called to the searchBody doc.searchBody = { query: { match_all: {} }, stored_fields: [], _source: true }; May be used multiple times. Additionally, searchBodyTemplate may be performed by a module. See [searchBody Template](#search -template) below.
--searchWithTemplate			Enable to use Search Template when using --searchBody If using Search Template then searchBody has to consist of "id" field and "params" objects If "size" field is defined within Search Template, it will be overridden by --size parameter See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-template.html for further information (default: false)
--size			How many objects to retrieve (default: -1 -> no limit)
--skip-existing			Skips resource_already_exists_exception when enabled and exit with success (default: false)
--sourceOnly			Output only the json contained within the document _source Normal: {"_index":"","_type":"","_id":"", "_source":{SOURCE}} sourceOnly: {SOURCE} (default: false)
--support-big-int			Support big integer numbers
--templateRegex			Regex used to filter templates before passing to the output transport (default: ((metrics\|logs\|\..+)(-.+)?)
--timeout			Integer containing the number of milliseconds to wait for a request to respond before aborting the request. Passed directly to the request library. Mostly used when you don't care too much if you lose some data when importing but would rather have speed.
--tlsAuth			Enable TLS X509 client authentication
--toLog			When using a custom outputTransport, should log lines be appended to the output stream? (default: true, except for `$`)
--transform			A method/function which can be called to modify documents before writing to a destination. A global variable 'doc' is available. Example script for computing a new field 'f2' as doubled value of field 'f1': doc._source["f2"] = doc._source.f1 * 2; May be used multiple times. Additionally, transform may be performed by a module. See [Module Transform](#module-transform) below.
--versionType			Elasticsearch versioning types. Should be `internal`, `external`, `external_gte`, `force`. NB : Type validation is handled by the bulk endpoint and not by elasticsearch-dump
AWS独有配置项
--awsAccessKeyId and --awsSecretAccessKey			When using Amazon Elasticsearch Service protected by AWS Identity and Access Management (IAM), provide your Access Key ID and Secret Access Key. --sessionToken can also be optionally provided if using temporary credentials
--awsChain			Use [standard](https://aws.amazon.com/blogs/security/a-new-and-standardized-way-to-manage-creden tials-in-the-aws-sdks/) location and ordering for resolving credentials including environment variables, config files, EC2 and ECS metadata locations _Recommended option for use with AWS_
--awsIniFileName			Override the default aws ini file name when using --awsIniFileProfile Filename is relative to ~/.aws/ (default: config)
--awsIniFileProfile			Alternative to --awsAccessKeyId and --awsSecretAccessKey, loads credentials from a specified profile in aws ini file. For greater flexibility, consider using --awsChain and setting AWS_PROFILE and AWS_CONFIG_FILE environment variables to override defaults if needed
--awsRegion			Sets the AWS region that the signature will be generated for (default: calculated from hostname or host)
--awsService			Sets the AWS service that the signature will be generated for (default: calculated from hostname or host)
--awsUrlRegex			Overrides the default regular expression that is used to validate AWS urls that should be signed (default: ^https?:\/\/.\.amazonaws\.com.$)
--s3ACL			S3 ACL: private \| public-read \| public-read-write \| authenticated-read \| aws-exec-read \| bucket-owner-read \| bucket-owner-full-control [default private]
--s3AccessKeyId			AWS access key ID
--s3Compress			gzip data before sending to s3
--s3Configs			Set all s3 constructor configurations A escaped JSON string or file can be supplied. File location must be prefixed with the @ symbol (default: null)
--s3Endpoint			AWS endpoint that can be used for AWS compatible backends such as OpenStack Swift and OpenStack Ceph
--s3ForcePathStyle			Force path style URLs for S3 objects [default false]
--s3Options			Set all s3 parameters shown here https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html #createMultipartUpload-property A escaped JSON string or file can be supplied. File location must be prefixed with the @ symbol (default: null)
--s3Region			AWS region
--s3SSEKMSKeyId			KMS Id to be used with aws:kms uploads
--s3SSLEnabled			Use SSL to connect to AWS [default true]
--s3SecretAccessKey			AWS secret access key
--s3ServerSideEncryption			Enables encrypted uploads
--s3StorageClass			Set the Storage Class used for s3 (default: STANDARD)

4、简单示例

4.1 Backup and index to a gzip using stdout

elasticdump \
--input=http://production.es.com:9200/my_index \
--output=$ \
| gzip > /data/my_index.json.gz

4.2 Backup the results of a query to a file

elasticdump \
--input=http://production.es.com:9200/my_index \
--output=query.json \
--searchBody "{\"query\":{\"term\":{\"username\": \"admin\"}}}"

4.3 Import data from S3 into ES (using s3urls)

elasticdump \
--s3AccessKeyId "${access_key_id}" \
--s3SecretAccessKey "${access_key_secret}" \  
--input "s3://${bucket_name}/${file_name}.json" \  
--output=http://production.es.com:9200/my_index

4.4 Export ES data to S3 (using s3urls)

elasticdump \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --input=http://production.es.com:9200/my_index \
  --output "s3://${bucket_name}/${file_name}.json"

4.5 Import data from MINIO (s3 compatible) into ES (using s3urls)

elasticdump \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --input "s3://${bucket_name}/${file_name}.json" \
  --output=http://production.es.com:9200/my_index
  --s3ForcePathStyle true
  --s3Endpoint https://production.minio.co

4.6 Export ES data to MINIO (s3 compatible) (using s3urls)

elasticdump \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --input=http://production.es.com:9200/my_index \
  --output "s3://${bucket_name}/${file_name}.json"
  --s3ForcePathStyle true
  --s3Endpoint https://production.minio.co

4.7 Import data from CSV file into ES (using csvurls)

elasticdump \
 # csv:// prefix must be included to allow parsing of csv files

  # --input "csv://${file_path}.csv" \

  --input "csv:///data/cars.csv"
  --output=http://production.es.com:9200/my_index \
  --csvSkipRows 1    # used to skip parsed rows (this does not include the headers row)
  --csvDelimiter ";" # default csvDelimiter is ','

4.8 Backup the results of a query to a file

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=query.json \
  --searchBody="{\"query\":{\"term\":{\"username\": \"admin\"}}}"

4.9 Specify searchBody from a file

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=query.json \
  --searchBody=@/data/searchbody.json

4.10 Copy a single shard data

elasticdump \
  --input=http://es.com:9200/api \
  --output=http://es.com:9200/api2 \
  --input-params="{\"preference\":\"_shards:0\"}"

Learn more @ https://github.com/taskrabbit/elasticsearch-dump

三、安装

elasticdump 源码地址：https://github.com/elasticsearch-dump/elasticsearch-dump

1、安装 Nodejs

进入网站 https://nodejs.org/zh-cn/download/prebuilt-binaries 下载
选择linux x86 下载二进制成果物包（这个根据实际情况自行选择）

远程到 linux 服务器上，将下载的 node-v16.20.2-linux-x64.tar.xz node压缩包上传到 /opt/tmp上
cd /opt/tmp
tar -xf node-v16.20.2-linux-x64.tar.xz

将环境变量设置到当前的远程 shell 上，如果要永久生效，可以直接更改环境变量文件（在 /etc/profile 文件中添加环境变量，再使用 source /etc/profile 进行生效）
- export NODE_HOME=export NODE_HOME=/opt/tmp/node-v16.20.2-linux-x64
- export PATH=export PATH=$PATH:$NODE_HOME/bin
- 检查是否生效
  - node --version

- - npm --version

2、安装 npm

npm 已经集成在了nodejs 二进制成果物里面了，按照第1 章节去安装nodejs 的过程中，实际上已经将npm 安装好了，通过执行命令 npm --version 就可以看到安装的npm 版本号
npm github 地址为：https://github.com/npm/cli/releases/tag/v10.9.0

参考文章：

https://blog.csdn.net/qq_32894641/article/details/136388906

3、离线安装 elasticdump

如果 elasticdump 运行的服务器是无法连接网络的，此时需要先找一台可以连接上网络的服务器先在线安装 elasticdump ,然后通过 npm-pack-all 对其进行打包，再将打包后的 elasticdump 放置到目标服务器上，再通过 npm 安装。

在线安装 elasticdump 所在服务器也必须先按照 1、2 章节内容安装 nodejs 以及 npm.

3.1、安装 npm-pack-all（要求linux服务器能够连接上网络）

npm-pack-all是一个Node.js的工具，用于将项目中的依赖项打包成tgz（tarball）文件。这些tgz文件是npm包的分发格式，可以被其他开发者下载并安装到他们的项目中。

打包项目依赖：npm-pack-all可以遍历项目的node_modules目录，将其中的每个依赖项单独打包成一个tgz文件。这样，你就可以将这些文件分享给其他开发者，或者将它们上传到npm仓库中供其他人使用。
忽略指定依赖：npm-pack-all允许你指定要忽略的依赖项。这对于那些你不想分享或上传的依赖项非常有用。
自定义输出目录：你可以使用npm-pack-all将打包后的tgz文件输出到指定的目录，方便你进行管理和分发。

安装（前提条件 npm 已经安装）

设置镜像仓库

npm config set registry https://registry.npm.taobao.org

下载依赖有报错的话执行下这个

npm config set strict-ssl false

npm install -g npm-pack-all #全局安装 npm-pack-all
npm bin -g # 查看npm/node安装位置

3.2、通过 npm 在线安装 elasticdump

npm install -g elasticdump

该命令会安装最新的elasticdump版本

查看安装位置
- npm bin -g (默认是安装在 nodejs 的 bin 目录下)
- ll nodejs安装目录/bin，可以看到安装好的 elasticdump 工具，对应的实际路径在 nodejs目录/lib/node_modules/elasticdump 目录下

通过 npm-pack-all 打包 elasticdump 安装后的成果物
- cd nodejs安装目录/lib/node_modules/elasticdump
- npm-pack-all

可以看到打包后的压缩名为：elasticdump-6.114.0.tgz。在 nodejs安装目录/lib/node_modules/elasticdump 目录下

3.3 将elasticdump 安装到离线服务器上

将 elasticdump-6.114.0.tgz 上传到离线服务器上 /opt/tmp/node-v16.20.2-linux-x64/lib/ 下，lib 下存在目录 node_modules
cd /opt/tmp/node-v16.20.2-linux-x64/lib/
npm install /opt/tmp/elasticdump-6.114.0.tgz

npm root #查找 node_modules 所在位置。显然本次操作，node_moudles 就是在 /opt/tmp/node-v16.20.2-linux-x64/lib 下面

elasticdump 在 node_modules/elasticdump 下
设置环境变量
- export ELASTIC_DUMP_HOME=/opt/tmp/node-v16.20.2-linux-x64/lib/node_modules/elasticdump
- export PATH=$PATH:$ELASTIC_DUMP_HOME/bin
elasticdump --help #检查是否安装好可使用

四、实战

1、对所有索引下的数据进行备份

elasticdump --overwrite --input=http://elastic:elasticsearch_422@10.19.223.119:9200/ --output=/opt/tmp/wf/data/es_dump_all/data_all.json --limit=10000 --- --type参数未指定的时候，默认是data

示例显示导出了 2685006条数据

注意：使用该命令导出所有数据会遇到该问题 ---- 创建太多的 scroll contexts ,默认为500

虽然使用了scroll 分页方式进行了查询，但是分页查询太多，导致超过上限。。需要更改 search.max_open_scroll_context 配置值，默认是500

因此不建议使用该工具查询大数据量的数据。

2、对所有索引下的数据进行还原

elasticdump --overwrite --output=http://elastic:TZQSHMX8zPOaezaT@10.19.214.13:9200/ --input=/opt/tmp/wf/data/es_dump_all/data_all.json --limit=10000 --- --type参数未指定的时候，默认是data

示例显示还原了 2685005 数据耗费了37分钟左右

3、指定索引下的数据进行备份

导出备份 iotrm_event_event_acs$0x00030403_2024-10-29 索引下的所有数据，注意：这里$，在命令行上转为\$

elasticdump --overwrite --input=http://elastic:elasticsearch_422@10.19.223.119:9200/iotrm_event_event_acs\$0x00030403_2024-10-29 --output=/opt/tmp/wf/data/data_iotrm_event_event_acs\$0x00030403_2024-10-29.json --limit=10000 （--type不指定的情况下，默认导出data数据）

这里导出了 79684条数据

4、指定索引下的数据进行还原

将备份文件中对应的 iotrm_event_event_acs$0x00030403_2024-10-29 索引数据导入到elasticsearch软件中

elasticdump --input=/opt/tmp/wf/data/data_iotrm_event_event_acs\$0x00030403_2024-10-29.json --output=http://elastic:TZQSHMX8zPOaezaT@10.19.214.13:9200/data_iotrm_event_event_acs\$0x00030403_2024-10-29 --limit=10000 ---- 将iotrm_event_event_acs\$0x00030403_2024-10-29索引下的数据还原到另外一个不存在的索引data_iotrm_event_event_acs\$0x00030403_2024-10-29，此时会自动创建这个不存在的索引

使用elastichead可以看到导入的数据 79684 条

对该索引下的数据增删改数据之后（修改了oxcqTJMBXFMQjG9sdTdP 数据，删除了ID为SBcoTJMBXFMQjG9sdTZM/的数据，新增 docId 为1的数据），然后再导入，看下效果

由此可以得出结论一：如果只是数据的导出导入，那么在导出-导入期间变更的数据只有删改的数据会被还原掉，新增的数据仍然是在的，不会自动删除。

5、备份指定索引

6、还原指定索引

elasticdump --input=/opt/tmp/wf/data/data_iotrm_event_event_acs\$0x00030403_2024-10-29_index.json --output=http://elastic:TZQSHMX8zPOaezaT@10.19.214.13:9200/data_iotrm_event_event_acs\$0x00030403_2024-10-29 --limit=10000 --type=index （包含？）

还原从索引 iotrm_event_event_acs\$0x00030403_2024-10-29 导出的数据文件 data_iotrm_event_event_acs\$0x00030403_2024-10-29.json，在不指定索引名称的情况下，会发现被还原到了原始索引上（iotrm_event_event_acs\$0x00030403_2024-10-29），上述还原的新索引data_iotrm_event_event_acs\$0x00030403_2024-10-29无数据

还原从索引 iotrm_event_event_acs$0x00030403_2024-10-29 导出的数据文件 data_iotrm_event_event_acs$0x00030403_2024-10-29.json，将数据导入到索引 data_iotrm_event_event_acs$0x00030403_2024-10-29 中

7、指定多个索引数据以及索引进行备份

1）导出多个索引数据

elasticdump --overwrite --input=http://elastic:elasticsearch_422@10.19.223.119:9200/iotrm_event_event_acs* --output=/opt/tmp/wf/data/data_iotrm_event_event_acs_all.json --limit=10000 --type=data

这里将 input 中的指定索引部分使用了通配符的形式，表示导出iotrm_event_event_acs开头的所有索引；
导出的数据类型通过type指定了为data;

2）导出多个索引定义

elasticdump --overwrite --input=http://elastic:elasticsearch_422@10.19.223.119:9200/iotrm_event_event_acs* --output=/opt/tmp/wf/data/data_iotrm_event_event_acs_all.json --limit=10000 --type=index

8、还原多个索引数据以及索引

1）删除 iotrm_event_event_acs 开头的部分索引

http://10.19.214.13:9200/data_iotrm_event_event_acsxxx?pretty

2）还原索引

不指定索引的情况下进行索引还原似乎不可行

3）还原数据

elasticdump --output=http://elastic:TZQSHMX8zPOaezaT@10.19.214.13:9200 --input=/opt/tmp/wf/data/data_iotrm_event_event_acs_all_data.json --limit=10000 --type=data

9、备份所有索引

elasticdump --overwrite --input=http://elastic:elasticsearch_422@10.19.223.119:9200 --output=/opt/tmp/wf/data/all_index.json --limit=10000 --type=index

10、还原所有索引

elasticdump --output=http://elastic:TZQSHMX8zPOaezaT@10.19.214.13:9200 --input=/opt/tmp/wf/data/all_index.json --limit=10000 --type=index （不支持通过一个命令批量还原索引）

目前只能单个索引备份，然后从单个索引备份文件中还原索引，且要求该索引已经被删除

图上显示：索引存在的情况下，是还原索引失败的。

第二次操作成功是因为通过postman 将索引删除了。

五、参考资料

1、GitHub - elasticsearch-dump/elasticsearch-dump: Import and export tools for elasticsearch & opensearch

2、离线部署elasticdump_elasticdump离线安装-CSDN博客

posted @ 2024-12-06 16:28 夏之夜阅读(39) 评论(0) 编辑收藏举报

刷新页面返回顶部

夏之夜