Running a Spark job from an IDE / Maven
From: http://stackoverflow.com/questions/26892389/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure-task-from-app
- Create a fat jar (one that includes all dependencies) with the Maven Shade Plugin. Example pom (replace the mainClass placeholder with your driver's fully qualified class name):
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.2</version>
    <configuration>
        <filters>
            <filter>
                <artifact>*:*</artifact>
                <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                </excludes>
            </filter>
        </filters>
    </configuration>
    <executions>
        <execution>
            <id>job-driver-jar</id>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <shadedArtifactAttached>true</shadedArtifactAttached>
                <shadedClassifierName>driver</shadedClassifierName>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <!-- Some care is required: http://doc.akka.io/docs/akka/snapshot/general/configuration.html -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>reference.conf</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>mainClass</mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
        <execution>
            <id>worker-library-jar</id>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <shadedArtifactAttached>true</shadedArtifactAttached>
                <shadedClassifierName>worker</shadedClassifierName>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
- Next, send the compiled jar to the cluster by listing it in the Spark config. A relative path is resolved against the driver's working directory, usually the project root:
SparkConf conf = new SparkConf().setAppName("appName").setMaster("spark://machineName:7077").setJars(new String[] {"target/appName-1.0-SNAPSHOT-driver.jar"});
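- Run mvn clean package to build the jar. It is written to your target folder, named like appName-1.0-SNAPSHOT-driver.jar (the -driver suffix comes from shadedClassifierName).
- Run the driver from your IDE, or with Maven, where className is the driver's fully qualified class name:
mvn exec:java -Dexec.mainClass="className"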
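This does not require spark-submit. Just remember to package the jar before each run: the executors load your code from the jar file on disk, so a stale jar means stale code on the cluster.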
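Because setJars only receives a path, a missing jar may surface later as task failures on the executors rather than at startup. A minimal sketch of a fail-fast check, assuming Maven's default name for an attached shaded artifact (target/<artifactId>-<version>-<classifier>.jar) and a working directory at the project root; DriverJarPath and its methods are hypothetical helpers, not part of Spark or Maven.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: locate the shaded driver jar and fail fast if it has not been built. */
public class DriverJarPath {

    /** Assumes Maven's default name for an attached artifact: target/<artifactId>-<version>-<classifier>.jar */
    static Path shadedJar(String artifactId, String version, String classifier) {
        return Paths.get("target", artifactId + "-" + version + "-" + classifier + ".jar");
    }

    /** Returns the jar's absolute path, or throws if the file does not exist yet. */
    static String requirePackaged(Path jar) {
        if (!Files.isRegularFile(jar)) {
            throw new IllegalStateException(
                    jar.toAbsolutePath() + " not found; run mvn clean package first");
        }
        return jar.toAbsolutePath().toString();
    }

    public static void main(String[] args) {
        // Relative paths resolve against the JVM's working directory (usually the project root).
        Path jar = shadedJar("appName", "1.0-SNAPSHOT", "driver");
        System.out.println("Would pass to setJars: " + requirePackaged(jar));
    }
}
```

In the driver, the result would be passed as setJars(new String[] {requirePackaged(jar)}). The check only confirms that the jar exists; it cannot tell whether the jar is older than your latest source changes, so the package step is still needed.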
If you don't want to hard-code the jar path, you can do this instead:
- In the config, write:
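SparkConf conf = new SparkConf()
        .setAppName("appName")
        .setMaster("spark://machineName:7077")
        // this.getClass() only works in instance code; inside a static main(), use YourDriver.class instead
        .setJars(JavaSparkContext.jarOfClass(this.getClass()));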
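- Build the fat jar (as above) and, after running the package command, start the driver with java rather than Maven (the driver jar's manifest carries mainClass, so java -jar can launch it):
java -jar target/appName-1.0-SNAPSHOT-driver.jar
Spark then ships the jar that the class was loaded from. This only works when the class really comes from a jar: when run from an IDE (classes in target/classes), jarOfClass returns an empty array and no jar is sent to the executors.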
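The lookup behind jarOfClass can be sketched with the JDK alone. This is a simplified model of what Spark's SparkContext.jarOfClass does: find the class's .class resource and, if its URL looks like jar:file:/path/app.jar!/pkg/Cls.class, keep /path/app.jar. JarLocator and its methods are hypothetical names, not Spark's API, and this sketch does not URL-decode the path.

```java
import java.net.URL;
import java.util.Optional;

/** Hypothetical helper modeled on Spark's SparkContext.jarOfClass (simplified). */
public class JarLocator {

    private static final String PREFIX = "jar:file:";

    /** "jar:file:/path/app.jar!/pkg/Cls.class" -> "/path/app.jar"; anything else -> empty. */
    static Optional<String> jarPathFromResourceUrl(String url) {
        int bang = url.indexOf('!');
        if (!url.startsWith(PREFIX) || bang < 0) {
            // e.g. "file:/.../target/classes/pkg/Cls.class" when the class was loaded from a directory
            return Optional.empty();
        }
        // The path is returned exactly as it appears in the URL (no %20 decoding).
        return Optional.of(url.substring(PREFIX.length(), bang));
    }

    /** Finds the jar the given class was loaded from, if it came from a jar file at all. */
    static Optional<String> jarOfClass(Class<?> cls) {
        URL resource = cls.getResource("/" + cls.getName().replace('.', '/') + ".class");
        return resource == null ? Optional.empty() : jarPathFromResourceUrl(resource.toString());
    }

    public static void main(String[] args) {
        // Prints Optional[/.../app.jar] under java -jar, Optional.empty when run from target/classes.
        System.out.println(jarOfClass(JarLocator.class));
    }
}
```

Run from an IDE, the resource URL starts with file: because it points into target/classes, so the lookup comes back empty; that is why this variant needs the java -jar launch.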