Elasticsearch 5.x Series, Part 5: Data Import and Export
1. First, a small bonus: a tool for exporting Elasticsearch data.
esm
GitHub source code:
https://github.com/medcl/esm
Prebuilt esm binaries can be downloaded from the releases page:
https://github.com/medcl/esm/releases
Let's look at how it is used in practice. Exporting an index to a local file:
./esm -s http://10.81.179.209:9200 -x "zebra_info_tmp" -w=5 -b=10 -c 10000 --refresh -o=dump.bin
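Explanation: this exports the zebra_info_tmp index from the cluster at 10.81.179.209 to the local machine and saves it as dump.bin. -c is the number of documents fetched per batch (the "size" of each scroll request). The esm README describes -w as the number of concurrent workers, -b as the bulk size in MB, and --refresh as refreshing the index once the migration finishes; see the project page for the full list of options.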
./esm -d http://172.16.232.242:9200 -y "zebra_info_tmp" -c 1000 -b 10 --refresh -i=dump.bin
Explanation: this loads the local dump.bin into the zebra_info_tmp index on the cluster at 172.16.232.242.
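As a quick sanity check after the round trip (a step the esm commands above do not perform themselves), you can compare the document count of the index on the source and destination clusters. This is a minimal sketch: compare_counts is a hypothetical helper name, and it assumes both clusters are reachable over plain HTTP without authentication.

```shell
# Hypothetical helper: compare the document count of one index on two clusters.
# Usage: compare_counts SRC_URL DEST_URL INDEX
compare_counts() {
  local src="$1" dest="$2" index="$3"
  local src_count dest_count
  # _cat/count with h=count prints only the count column; strip any whitespace.
  src_count=$(curl -s "${src}/_cat/count/${index}?h=count" | tr -d '[:space:]')
  dest_count=$(curl -s "${dest}/_cat/count/${index}?h=count" | tr -d '[:space:]')
  echo "source=${src_count} dest=${dest_count}"
  # Succeed only when a count came back and both counts are equal.
  [ -n "$src_count" ] && [ "$src_count" = "$dest_count" ]
}
```

For example, compare_counts http://10.81.179.209:9200 http://172.16.232.242:9200 zebra_info_tmp returns a non-zero exit status when the counts differ. The check is only meaningful if nothing else is writing to either index during the migration.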
2. What can we do when the tool is not an option?
For example, when the source data lives in Hive, we can use the Elasticsearch bulk API instead.
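2.1 First, shape the data as follows (JSON, one object per line: an action line naming the target index, type and _id, followed by the document itself):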
{"index":{"_index":"zebra_info_tmp","_type":"zebra_info","_id":"L1f47bbb97d239"}}
{"adcode":"230921","business_circle":"勃利县镇政府","city":"七台河市","citycode":"0464","district":"勃利县","extensions":{"avg_price":0,"good_comments":0,"lvl":0,"numbers":0,"other_type":null,"shops":0},"firstly_classification":"金融","formatted_address":"黑龙江省七台河市勃利县新华街道吉祥街5号","location":"45.746754887850216, 130.57131899190972","name":"平安易贷","province":"黑龙江省","secondary_classification":"银行","township":"新华街道","type_name":"金融"}
{"index":{"_index":"zebra_info_tmp","_type":"zebra_info","_id":"L15edb0517a1a1"}}
{"adcode":"350427","business_circle":"三明汽车北站","city":"三明市","citycode":"0598","district":"沙县","extensions":{"avg_price":0,"good_comments":0,"lvl":0,"numbers":0,"other_type":null,"shops":0},"firstly_classification":"金融","formatted_address":"福建省三明市沙县富口镇","location":"26.50277598187647, 117.67915191588664","name":"中国建设银行自助银行","province":"福建省","secondary_classification":"银行","township":"富口镇","type_name":"金融"}
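One way to produce these action/source pairs is sketched below with awk. It assumes the Hive table has already been exported as a tab-separated file whose first column is the document id and whose second column is the document as one-line JSON. How you produce that export from Hive is outside this sketch, and tsv_to_bulk is a hypothetical helper name.

```shell
# Hypothetical helper: turn a TSV export (column 1 = document id,
# rest of the line = one-line JSON document) into bulk-API NDJSON.
# Usage: tsv_to_bulk INPUT.tsv INDEX TYPE > OUTPUT.json
# Assumes ids contain no double quotes or backslashes.
tsv_to_bulk() {
  awk -F'\t' -v idx="$2" -v typ="$3" '
    NF >= 2 {
      # Action line: which index/type/_id the following line belongs to.
      printf "{\"index\":{\"_index\":\"%s\",\"_type\":\"%s\",\"_id\":\"%s\"}}\n", idx, typ, $1
      # Source line: everything after the first tab, unchanged.
      print substr($0, index($0, "\t") + 1)
    }' "$1"
}
```

For example: tsv_to_bulk zebra_info.tsv zebra_info_tmp zebra_info > bulk.json. The document is printed with print rather than printf, so characters such as % in the data are passed through untouched.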
2.2 Import the data into the Elasticsearch cluster with the bulk API.
Given my own limitations, this is the best approach I have found so far.
Here is the bulk API call:
curl -H 'Content-Type: application/json' "$1:9200/_bulk?pretty" --data-binary "@${JSON_SPILIT_PATH}/${file}"
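Here $1 is the IP address of any node in the Elasticsearch cluster, and the path after @ is the file containing the bulk JSON data. Use --data-binary rather than -d, because -d strips the newlines the bulk format depends on, and make sure the file ends with a newline. Elasticsearch 6.0 and later also require a Content-Type header (e.g. -H 'Content-Type: application/json').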
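The variables ${JSON_SPILIT_PATH} and ${file} suggest the bulk data is split into several files that are posted one at a time. A minimal sketch of that loop follows. It assumes the input is a well-formed action/source NDJSON file ending in a newline. split_bulk and post_bulk are hypothetical helper names, and the default of 5000 pairs per chunk is an arbitrary choice, not a recommendation from the Elasticsearch docs.

```shell
# Hypothetical helpers for loading a large bulk file in chunks.

# split_bulk FILE DIR [PAIRS]: split FILE into DIR/chunk_aaaa, chunk_aaab, ...
# Each chunk holds PAIRS action/source pairs (2*PAIRS lines), so no pair is
# ever cut in half. -a 4 allows up to 26^4 chunks.
split_bulk() {
  local file="$1" dir="$2" pairs="${3:-5000}"
  mkdir -p "$dir"
  split -a 4 -l $((pairs * 2)) "$file" "$dir/chunk_"
}

# post_bulk HOST DIR: send every chunk to HOST:9200/_bulk and report any
# chunk whose response contains "errors":true.
post_bulk() {
  local host="$1" dir="$2" f resp
  for f in "$dir"/chunk_*; do
    resp=$(curl -s -H 'Content-Type: application/json' \
      -XPOST "http://${host}:9200/_bulk" --data-binary "@${f}")
    case "$resp" in
      *'"errors":true'*) echo "bulk errors in ${f}" >&2 ;;
      *) echo "loaded ${f}" ;;
    esac
  done
}
```

Splitting on an even line count keeps each action line next to its document. Checking the response body matters because _bulk returns HTTP 200 even when individual items fail; the per-item failures are only visible through the "errors" flag and the "items" array.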