spark-submit command-line with --files
Submitting the Spark job
bin/spark-submit --name Test --class com.test.batch.modeltrainer.ModelTrainerMain \
--master local --files /tmp/myobject.ser --verbose /opt/test/lib/spark-test.jar
Reference in the program
import org.apache.spark.SparkFiles
val serFile = SparkFiles.get("myobject.ser")
Error message
Exception:
Exception in thread "main" java.lang.NullPointerException
at org.apache.spark.SparkFiles$.getRootDirectory(SparkFiles.scala:37)
at org.apache.spark.SparkFiles$.get(SparkFiles.scala:31)
at com.test.batch.modeltrainer.ModelTrainerMain$.main(ModelTrainerMain.scala:37)
at com.test.batch.modeltrainer.ModelTrainerMain.main(ModelTrainerMain.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
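Solution:
SparkEnv is an internal class that is only meant to be used within Spark. Outside of Spark, it will be null because there are no executors or driver to start an environment for. Similarly, SparkFiles is meant to be used internally (though its privacy settings should be modified to reflect that).
In other words, SparkFiles.get can only be called inside a running Spark application, on the driver or on the executors. The NullPointerException in getRootDirectory above is consistent with SparkEnv still being null at the point of the call, which is the case before a SparkContext has been created. Calling it inside an operator works: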
sc.parallelize(1 to 100).map { i => SparkFiles.get("my.file") }.collect()
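For context, here is a minimal sketch of a complete driver program built around the one-liner above. It assumes the job is launched with spark-submit and --files /tmp/myobject.ser as in the command at the top, so the master comes from the command line. The object name SparkFilesExample is hypothetical and not part of the original program.

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

// Hypothetical example object; submit it with:
//   bin/spark-submit --class SparkFilesExample --master local \
//     --files /tmp/myobject.ser <your-jar>
object SparkFilesExample {
  def main(args: Array[String]): Unit = {
    // Creating the SparkContext initializes SparkEnv. Calling
    // SparkFiles.get before this point is what can leave SparkEnv null.
    val conf = new SparkConf().setAppName("SparkFilesExample")
    val sc = new SparkContext(conf)
    try {
      // Resolve the local path of the distributed file inside a task,
      // so the lookup runs where the file was shipped.
      val paths = sc.parallelize(1 to 4, 2)
        .map(_ => SparkFiles.get("myobject.ser"))
        .distinct()
        .collect()
      paths.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The important part is the ordering: the SparkContext is created first, and SparkFiles.get is only called afterwards. Whether the file is also visible on the driver can depend on the deploy mode, so resolving the path inside a task is the safer choice.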