Setting up an environment for modifying Spark source code
Problems encountered in the process:
- Compiling the Scala project with Maven fails with java.lang.NoClassDefFoundError: scala/reflect/internal/Trees; see the Maven compilation step below for the fix.
1. Create a Maven project sparkdemo-2.3.3
The creation steps are omitted.
2. Configure pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>sparkdemo-2.3.3</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <scala.binary.version>2.11</scala.binary.version>
        <spark.version>2.3.3</spark.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-launcher_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-yarn_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-network-common_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-catalyst_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-tags_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.8</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-compiler</artifactId>
            <version>2.11.8</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-reflect</artifactId>
            <version>2.11.8</version>
        </dependency>
        <dependency>
            <groupId>org.json4s</groupId>
            <artifactId>json4s-ast_2.11</artifactId>
            <version>3.5.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-yarn-server-nodemanager</artifactId>
            <version>2.6.5</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.4.6</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>2.11.8</scalaVersion>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Note: the Scala compiler plugin (scala-maven-plugin) must be added, otherwise the Scala sources will not be compiled.
3. Add Scala code and compile
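For example, a minimal Scala source file like the sketch below (the package, object name, and the SparkLauncher import are arbitrary choices made here only to exercise the toolchain) can be placed under src/main/scala to verify that Scala compiles and the Spark dependencies resolve:

// Minimal compile check; package and object names are placeholders.
package org.example

// SparkLauncher comes from the spark-launcher_2.11 dependency declared in pom.xml.
import org.apache.spark.launcher.SparkLauncher

object CompileCheck {
  def main(args: Array[String]): Unit = {
    // If this compiles, the scala-maven-plugin and the Spark dependencies are wired up correctly.
    println(s"Loaded ${classOf[SparkLauncher].getName} from the Spark dependencies")
  }
}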
If compilation fails with an error like the following:
[INFO]
[INFO] --- scala-maven-plugin:3.4.6:compile (default) @ sparkdemo-2.3.3 ---
[INFO] D:\07_code\java\workspace\sparkdemo-2.3.3\src\main\java:-1: info: compiling
[INFO] D:\07_code\java\workspace\sparkdemo-2.3.3\src\main\scala:-1: info: compiling
[INFO] Compiling 2 source files to D:\07_code\java\workspace\sparkdemo-2.3.3\target\classes at 1655537041073
[ERROR] java.lang.NoClassDefFoundError: scala/reflect/internal/Trees
[INFO] at java.lang.Class.getDeclaredMethods0(Native Method)
[INFO] at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
[INFO] at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
[INFO] at java.lang.Class.getMethod0(Class.java:3018)
[INFO] at java.lang.Class.getMethod(Class.java:1784)
[INFO] at scala_maven_executions.MainHelper.runMain(MainHelper.java:155)
[INFO] at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.687 s
[INFO] Finished at: 2022-06-18T15:24:01+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.4.6:compile (default) on project sparkdemo-2.3.3: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: -10000 (Exit value: -10000) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Modify the compile command and add the scala.home command-line parameter:
scala:compile compile -Dscala.home=D:\06_devptools\scala-2.11.8 -f pom.xml
The build now succeeds.
4. Locate the class to modify and copy its source
For example, to add logging to ExecutorRunnable in order to trace container startup: use Ctrl+N to find the class, download the sources, and copy the source file. Create the package org.apache.spark.deploy.yarn under the scala source directory and paste the copied file into it, which gives you an editable copy of the source. After modifying it, run the compile command; the corresponding class files appear under target, and you replace the matching classes in the original jar with them.
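As a rough sketch of the technique (not Spark's actual code), the snippet below shows a hypothetical helper object placed in Spark's own org.apache.spark.deploy.yarn package. Because it sits inside the Spark package tree, it compiles against the spark-yarn dependency and can reuse package-private Spark APIs such as the Logging trait; a copied ExecutorRunnable is edited in exactly the same way, for example by adding a logInfo call around the container launch commands:

package org.apache.spark.deploy.yarn

import org.apache.spark.internal.Logging

// Hypothetical helper, for illustration only: it lives in Spark's own package so it
// can reuse Spark-internal APIs, just like a copied-and-edited ExecutorRunnable would.
object ContainerStartupDebug extends Logging {
  def trace(commands: Seq[String]): Unit = {
    // The kind of extra log line one might add when troubleshooting container startup.
    logInfo(s"Executor container launch commands: ${commands.mkString(" ")}")
  }
}

Note that a Scala class usually compiles to several class files (ExecutorRunnable.class plus ExecutorRunnable$... companions); all of them under target\classes\org\apache\spark\deploy\yarn need to be replaced in the original jar.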