Building spark-atlas-connector

I. Source download

  https://github.com/hortonworks-spark/spark-atlas-connector.git

  After downloading, upload the project to the /opt/soft directory (or clone it directly on the server; see the sketch below).
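  If the build server itself has internet access, a minimal alternative is to clone the repository directly on the server (assumes git is installed):

cd /opt/soft
git clone https://github.com/hortonworks-spark/spark-atlas-connector.git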

II. Build preparation

1. The pom file in this copy of the code has already been modified, so no further version changes are needed; you can build it directly.

2. Change notes: the code was pulled directly from the spark-atlas-connector project on GitHub, with the following changes applied:

(1) Update the component version numbers in the pom file

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <java.version>1.8</java.version>
    <spark.version>2.4.0-cdh6.1.1</spark.version>
    <atlas.version>2.1.0</atlas.version>
    <maven.version>3.6.3</maven.version>
    <scala.version>2.11.12</scala.version>
    <scala.binary.version>2.11</scala.binary.version>
    <kafka.version>2.2.1</kafka.version>
    <MaxPermGen>512m</MaxPermGen>
    <CodeCacheSize>512m</CodeCacheSize>
    <minJavaVersion>1.8</minJavaVersion>
    <maxJavaVersion>1.8</maxJavaVersion>
    <test.redirectToFile>true</test.redirectToFile>
    <scalatest.version>3.0.3</scalatest.version>
    <mockito.version>1.10.19</mockito.version>
    <integration.test.enabled>false</integration.test.enabled>
    <jersey.version>1.19</jersey.version>
    <scoverage.plugin.version>1.3.0</scoverage.plugin.version>
    <!-- jackson version pulled from atlas-intg -->
    <jackson.version>2.9.6</jackson.version>
</properties>

(2) Add the Aliyun mirror repository to the pom file

<repositories>
    <repository>
      <id>aliyun</id>
      <name>aliyun</name>
      <url>https://maven.aliyun.com/repository/public</url>
    </repository>
</repositories>
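To sanity-check that Maven actually picks up the new repository, one option is to force a dependency re-resolution from the project root (purely optional; the normal build below exercises it as well):

cd /opt/soft/spark-atlas-connector
mvn -U dependency:resolve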

(3) Update the Maven version number in the pom file

<maven.version>3.6.3</maven.version>

(4) Comment out the following dependency:

<!--
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive-thriftserver_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>
-->

(5) Modify spark-atlas-connector/src/main/scala/com/hortonworks/spark/atlas/utils/SparkUtils.scala

  Removed (the HiveThriftServer2 import comes from the spark-hive-thriftserver dependency commented out in step (4)):

import scala.util.control.NonFatal
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

Modified:

def currSessionUser(qe: QueryExecution): String = {
  currUser()
  /*
  // ok , i accept your suggestion
  val thriftServerListener = Option(HiveThriftServer2.listener)
  thriftServerListener match {
    case None => currUser()
  }
  */
}

III. Run the build

  mvn clean -DskipTests package -Pdist

IV. Check the result

  cd /opt/soft/spark-atlas-connector/spark-atlas-connector-assembly/target
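  The assembly jar should now be in this directory; a quick check (the jar name matches the one used in the usage steps below):

ls spark-atlas-connector-assembly-*.jar
# expected: spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar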

V. Usage

1. Upload spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar to the /opt/module/atlas/conf directory

2. Distribute the Atlas conf directory to the CDH cluster

  Distribute the /opt/module/atlas/conf directory to every node in the CDH cluster, so that Spark jobs launched later can point straight at this conf directory; a distribution sketch follows.
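  A minimal scp-based sketch; the hostnames node1/node2/node3 are placeholders and must be replaced with your actual CDH node names:

# hypothetical hostnames; substitute the real CDH cluster nodes
for host in node1 node2 node3; do
  scp -r /opt/module/atlas/conf ${host}:/opt/module/atlas/
done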

3. Example launch command

spark-submit \
  --class com.yuange.spark.atlastest.StudentsAndTeachersTwo \
  --master yarn \
  --driver-java-options "-Datlas.conf=/opt/module/atlas/conf" \
  --jars /opt/module/atlas/conf/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker \
  /opt/program/spark/original-yuange-spark-1.0-SNAPSHOT.jar

Note: when launching a Spark job, just add the following options (two submit flags plus three --conf settings, shown below and then bundled into a reusable form):

--driver-java-options "-Datlas.conf=/opt/module/atlas/conf" \
--jars /opt/module/atlas/conf/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
--conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
--conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
--conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker
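To avoid repeating these options for every job, one convenient pattern is to keep them in a bash array (a sketch; the paths and class name are taken from the example above):

# reusable connector options, quoted correctly via a bash array
SAC_OPTS=(
  --driver-java-options "-Datlas.conf=/opt/module/atlas/conf"
  --jars /opt/module/atlas/conf/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker
)
spark-submit --class com.yuange.spark.atlastest.StudentsAndTeachersTwo --master yarn \
  "${SAC_OPTS[@]}" /opt/program/spark/original-yuange-spark-1.0-SNAPSHOT.jar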

For comparison, the launch command for the same job without the connector would simply be:

spark-submit \
  --class com.yuange.spark.atlastest.StudentsAndTeachersTwo \
  --master yarn \
  /opt/program/spark/original-yuange-spark-1.0-SNAPSHOT.jar

 
