Importing Search Data into Elasticsearch with Spark

I'm getting more and more forgetful, so I'd better write down what I did!
ES and Spark versions:

spark-1.6.0-bin-hadoop2.6
Elasticsearch for Apache Hadoop 2.1.2

With other versions, writing index data may fail.
First, with ES running, copy the es-hadoop jar into Spark's lib directory and launch spark-shell with it:

```shell
cp elasticsearch-hadoop-2.1.2/dist/elasticsearch-spark* spark-1.6.0-bin-hadoop2.6/lib/
cd spark-1.6.0-bin-hadoop2.6/bin
./spark-shell --jars ../lib/elasticsearch-spark-1.2_2.10-2.1.2.jar
```
The spark-shell session looks like this:

```scala
import org.apache.spark.SparkConf
import org.elasticsearch.spark._

// Note: in spark-shell the SparkContext `sc` is already created, so a
// SparkConf built here is not actually applied to it; the defaults
// (localhost:9200) happen to match. In a standalone app, pass this conf
// to the SparkContext constructor instead.
val conf = new SparkConf()
conf.set("es.index.auto.create", "true")
conf.set("es.nodes", "127.0.0.1")

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")

// Write both maps as documents into the "spark" index, type "docs"
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
```
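The written documents can also be read back from the same shell with `esRDD`, which `org.elasticsearch.spark._` adds to the SparkContext. A minimal sketch (assumes the write above succeeded and ES is still running locally):

```scala
// Read the index back as an RDD of (documentId, fieldMap) pairs
val docs = sc.esRDD("spark/docs")

// Print each document's generated _id and its fields
docs.collect().foreach { case (id, fields) => println(s"$id -> $fields") }
```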
Then check the data in ES:

http://127.0.0.1:9200/spark/docs/_search?q=*

The result:
```json
{
  "took": 71,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "spark",
        "_type": "docs",
        "_id": "AVfhVqPBv9dlWdV2DcbH",
        "_score": 1.0,
        "_source": { "OTP": "Otopeni", "SFO": "San Fran" }
      },
      {
        "_index": "spark",
        "_type": "docs",
        "_id": "AVfhVqPOv9dlWdV2DcbI",
        "_score": 1.0,
        "_source": { "one": 1, "two": 2, "three": 3 }
      }
    ]
  }
}
```
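The same search can be run from the command line with curl, for example filtering on a specific field (assuming ES is still on the default local port 9200; the field and value here are from the documents written above):

```shell
# URI search on one field; &pretty makes ES format the JSON response
curl 'http://127.0.0.1:9200/spark/docs/_search?q=OTP:Otopeni&pretty'
```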
References:
https://www.elastic.co/guide/en/elasticsearch/hadoop/2.1/spark.html#spark-installation
http://spark.apache.org/docs/latest/programming-guide.html
http://chenlinux.com/2014/09/04/spark-to-elasticsearch/