Building spark-atlas-connector
I. Where to get the code
https://github.com/hortonworks-spark/spark-atlas-connector.git
After downloading, upload the code to the /opt/soft directory.
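For example, assuming git is available on the build host, the repository can be cloned straight into /opt/soft:

cd /opt/soft
git clone https://github.com/hortonworks-spark/spark-atlas-connector.git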
II. Build preparation
1. The pom file in this copy of the code has already been modified, so no further version changes are needed; build it as-is.
2. Summary of changes. The code was pulled directly from the open-source spark-atlas-connector project on GitHub; the changes are as follows:
(1) Updated the component version numbers in the pom file:
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <java.version>1.8</java.version>
    <spark.version>2.4.0-cdh6.1.1</spark.version>
    <atlas.version>2.1.0</atlas.version>
    <maven.version>3.6.3</maven.version>
    <scala.version>2.11.12</scala.version>
    <scala.binary.version>2.11</scala.binary.version>
    <kafka.version>2.2.1</kafka.version>
    <MaxPermGen>512m</MaxPermGen>
    <CodeCacheSize>512m</CodeCacheSize>
    <minJavaVersion>1.8</minJavaVersion>
    <maxJavaVersion>1.8</maxJavaVersion>
    <test.redirectToFile>true</test.redirectToFile>
    <scalatest.version>3.0.3</scalatest.version>
    <mockito.version>1.10.19</mockito.version>
    <integration.test.enabled>false</integration.test.enabled>
    <jersey.version>1.19</jersey.version>
    <scoverage.plugin.version>1.3.0</scoverage.plugin.version>
    <!-- jackson version pulled from atlas-intg -->
    <jackson.version>2.9.6</jackson.version>
</properties>
(2) Added the Aliyun mirror to the pom file:
<repositories>
    <repository>
        <id>aliyun</id>
        <name>aliyun</name>
        <url>https://maven.aliyun.com/repository/public</url>
    </repository>
</repositories>
(3) Updated the Maven version in the pom file:
<maven.version>3.6.3</maven.version>
(4) Commented out the following dependency:
<!--
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive-thriftserver_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>
-->
(5) Modified spark-atlas-connector/src/main/scala/com/hortonworks/spark/atlas/utils/SparkUtils.scala
Removed these imports:

import scala.util.control.NonFatal
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

Modified this method. Because the spark-hive-thriftserver dependency above is commented out, HiveThriftServer2 is no longer on the classpath, so currSessionUser now always falls back to currUser():

def currSessionUser(qe: QueryExecution): String = {
  currUser()
  /*
  val thriftServerListener = Option(HiveThriftServer2.listener)
  thriftServerListener match {
    case None => currUser()
  }
  */
}
III. Run the build
From the project root (/opt/soft/spark-atlas-connector), run:

mvn clean -DskipTests package -Pdist
IV. Check the result
cd /opt/soft/spark-atlas-connector/spark-atlas-connector-assembly/target
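If the build succeeded, listing this directory should show the assembly jar used in the usage steps below:

ls
# should include: spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar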
V. Usage
1. Upload spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar to the /opt/module/atlas/conf directory.
2. Distribute Atlas's conf directory to the CDH cluster.
Distribute the /opt/module/atlas/conf directory to the nodes of the CDH cluster so that Spark jobs can later point directly at this conf directory; a sketch of this step follows.
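A minimal sketch of the distribution step, assuming passwordless SSH, that /opt/module/atlas already exists on every node, and the hypothetical hostnames cdh-node1 through cdh-node3 (replace them with the actual CDH nodes):

for host in cdh-node1 cdh-node2 cdh-node3; do
  # copy the Atlas client config (and the SAC assembly jar placed in it) to each node
  scp -r /opt/module/atlas/conf ${host}:/opt/module/atlas/
done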
3. Example launch command:
spark-submit \
  --class com.yuange.spark.atlastest.StudentsAndTeachersTwo \
  --master yarn \
  --driver-java-options "-Datlas.conf=/opt/module/atlas/conf" \
  --jars /opt/module/atlas/conf/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker \
  /opt/program/spark/original-yuange-spark-1.0-SNAPSHOT.jar
Note: to enable the Atlas connector, just add the following parameters when launching any Spark job:
--driver-java-options "-Datlas.conf=/opt/module/atlas/conf" \
--jars /opt/module/atlas/conf/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
--conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
--conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
--conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker
Without the connector, the job's normal launch command would be:
spark-submit \
  --class com.yuange.spark.atlastest.StudentsAndTeachersTwo \
  --master yarn \
  /opt/program/spark/original-yuange-spark-1.0-SNAPSHOT.jar
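The job class itself is not shown in this post. For context, here is a minimal, hypothetical sketch of what a Hive-backed Spark SQL job such as StudentsAndTeachersTwo might look like (all table names are assumptions); a Hive-to-Hive INSERT ... SELECT like this is exactly the kind of operation the SparkAtlasEventTracker listener reports to Atlas as lineage:

import org.apache.spark.sql.SparkSession

object StudentsAndTeachersTwo {
  def main(args: Array[String]): Unit = {
    // Hive support is required for SAC to capture Hive table lineage
    val spark = SparkSession.builder()
      .appName("StudentsAndTeachersTwo")
      .enableHiveSupport()
      .getOrCreate()

    // hypothetical tables: students, teachers, student_teacher
    spark.sql(
      """INSERT INTO TABLE student_teacher
        |SELECT s.name AS student_name, t.name AS teacher_name
        |FROM students s JOIN teachers t ON s.teacher_id = t.id
        |""".stripMargin)

    spark.stop()
  }
}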