esm Data Migration
1. References
2. Installation and Running
2.1 Download
On macOS, download darwin64.tar.gz:
mkdir -p /Users/yz/work/github/esm/
cd /Users/yz/work/github/esm/
tar -zxvf darwin64.tar.gz
cd /Users/yz/work/github/esm/bin/darwin64
./esm --help
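Before extracting, you can list the archive contents to confirm the layout matches the bin/darwin64 path used above (a quick sanity check, assuming the download sits in the current directory):

```shell
# List the archive contents without extracting; the esm binary should
# appear under bin/darwin64/.
tar -tzf darwin64.tar.gz
```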
3. Usage Scenarios
3.1 Command-line Options
Usage:
esm [OPTIONS]
Application Options:
-s, --source= source elasticsearch instance, ie: http://localhost:9200
-q, --query= query against source elasticsearch instance, filter data before migrate, ie: name:medcl
-d, --dest= destination elasticsearch instance, ie: http://localhost:9201
-m, --source_auth= basic auth of source elasticsearch instance, ie: user:pass
-n, --dest_auth= basic auth of target elasticsearch instance, ie: user:pass
-c, --count= number of documents at a time: ie "size" in the scroll request (10000)
--buffer_count= number of buffered documents in memory (1000000)
-w, --workers= concurrency number for bulk workers (1)
-b, --bulk_size= bulk size in MB (5)
-t, --time= scroll time (10m)
--sliced_scroll_size= size of sliced scroll, to make it work, the size should be > 1 (1)
-f, --force delete destination index before copying
-a, --all copy indexes starting with . and _
--copy_settings copy index settings from source
--copy_mappings copy index mappings from source
--shards= set a number of shards on newly created indexes
-x, --src_indexes= indexes name to copy,support regex and comma separated list (_all)
-y, --dest_index= indexes name to save, allow only one indexname, original indexname will be used if not
specified
-u, --type_override= override type name
--green wait for both hosts cluster status to be green before dump. otherwise yellow is okay
-v, --log= setting log level,options:trace,debug,info,warn,error (INFO)
-o, --output_file= output documents of source index into local file
-i, --input_file= indexing from local dump file
--input_file_type= the data type of input file, options: dump, json_line, json_array, log_line (dump)
--source_proxy= set proxy to source http connections, ie: http://127.0.0.1:8080
--dest_proxy= set proxy to target http connections, ie: http://127.0.0.1:8080
--refresh refresh after migration finished
--fields= filter source fields, comma separated, ie: col1,col2,col3,...
--rename= rename source fields, comma separated, ie: _type:type, name:myname
-l, --logstash_endpoint= target logstash tcp endpoint, ie: 127.0.0.1:5055
--secured_logstash_endpoint target logstash tcp endpoint was secured by TLS
--repeat_times= repeat the data from source N times to dest output, use align with parameter regenerate_id
to amplify the data size
-r, --regenerate_id regenerate id for documents, this will override the exist document id in data source
--compress use gzip to compress traffic
-p, --sleep= sleep N seconds after each bulk request (-1)
Help Options:
-h, --help Show this help message
3.2 Migrate index data to a local file
./esm -s http://127.0.0.1:9200 -x "yz_test" -m elastic:password -c 5000 --refresh -o=dump.bin --copy_mappings
./esm -s http://127.0.0.1:9200 -x "logging" -q "date:[1610467200000 TO 1610553600000]" -m elastic:password -c 5000 --refresh -o=2021-01-13-log.bin
./esm -s http://127.0.0.1:9200 -x "logging" -q "date:[1610467200000 TO 1610553600000]" -m elastic:password -c 5000 --refresh -o=2021-01-13-log.bin --copy_settings --copy_mappings
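The `-q` range filter above matches a `date` field stored as epoch milliseconds: 1610467200000 and 1610553600000 are the boundaries of 2021-01-13 in UTC+8. A sketch of deriving such boundaries, assuming GNU date (the BSD date shipped with macOS uses different flags):

```shell
# Day boundaries of 2021-01-13 (UTC+8) as epoch milliseconds,
# for use in the -q range query.
start=$(( $(date -u -d '2021-01-13T00:00:00+08:00' +%s) * 1000 ))
end=$((   $(date -u -d '2021-01-14T00:00:00+08:00' +%s) * 1000 ))
echo "date:[${start} TO ${end}]"
# date:[1610467200000 TO 1610553600000]
```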
3.3 Restore data from a local file to an index
./esm -d http://127.0.0.1:9200 -y "yz_test_recovery" -n elastic:password -c 5000 -b 5 --refresh -i=dump.bin
./esm -d http://127.0.0.1:9200 -y "log_recovery" -n elastic:password -c 5000 -b 5 --refresh -i=2021-01-13-log.bin
Limitations:
-y / --dest_index accepts only a single index, and that index must already exist
--copy_mappings does not help on restore: the mappings and settings of the newly created index must be specified again
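Given these limits, the destination index has to be created, with its settings and mappings, before running the restore. A minimal sketch with curl, assuming Elasticsearch 7+ (typeless mappings) and hypothetical field names; pull the real mapping from the source cluster first, e.g. via GET yz_test/_mapping:

```shell
# Pre-create the destination index. The fields below are placeholders,
# not the actual yz_test mapping -- replace them with your source mapping.
curl -u elastic:password -X PUT "http://127.0.0.1:9200/yz_test_recovery" \
  -H 'Content-Type: application/json' -d '
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 1 },
  "mappings": {
    "properties": {
      "name": { "type": "keyword" },
      "date": { "type": "date", "format": "epoch_millis" }
    }
  }
}'
```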