A few pitfalls of calling Spark remotely

Remote connection: the standalone master URL is spark://server_ip:7077. There are two ways to set it:
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

// Option 1: set the master directly on the builder
SparkSession spark = SparkSession
    .builder()
    .appName("JavaWordCount")
    .master("spark://10.9.2.155:7077")
    .getOrCreate();

// Option 2: build a SparkConf first and pass it to the builder
SparkConf sparkConf = new SparkConf()
        .setMaster("spark://10.9.2.155:7077")
//      .setJars(new String[]{"D:\\install\\spark-3.0.0-bin-hadoop2.7\\examples\\jars\\spark-examples_2.12-3.0.0.jar"})
        .setAppName("JavaSparkPi");

SparkSession spark = SparkSession
    .builder()
    .config(sparkConf)
    .getOrCreate();

Problem 1: the remote job seems stuck in an endless loop; the worker keeps launching executors that immediately exit with code 1

20/07/24 11:26:32 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20200724112629-0001, execId=0)
20/07/24 11:26:32 INFO Worker: Asked to launch executor app-20200724112629-0001/1 for JavaSparkPi
20/07/24 11:26:32 INFO SecurityManager: Changing view acls to: root
20/07/24 11:26:32 INFO SecurityManager: Changing modify acls to: root
20/07/24 11:26:32 INFO SecurityManager: Changing view acls groups to: 
20/07/24 11:26:32 INFO SecurityManager: Changing modify acls groups to: 
20/07/24 11:26:32 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
20/07/24 11:26:32 INFO ExecutorRunner: Launch command: "/usr/java/jdk1.8.0_211-amd64/bin/java" "-cp" "/usr/local/spark/conf/:/usr/local/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=65518" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@DESKTOP-BL0DQH5:65518" "--executor-id" "1" "--hostname" "10.9.2.155" "--cores" "4" "--app-id" "app-20200724112629-0001" "--worker-url" "spark://Worker@10.9.2.155:39511"
20/07/24 11:26:35 INFO Worker: Executor app-20200724112629-0001/1 finished with state EXITED message Command exited with code 1 exitStatus 1
20/07/24 11:26:35 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 1
20/07/24 11:26:35 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20200724112629-0001, execId=1)
20/07/24 11:26:35 INFO Worker: Asked to launch executor app-20200724112629-0001/2 for JavaSparkPi
... (the identical launch / exit-code-1 / relaunch cycle repeats for executors 2, 3, 4, 5, ...)

The root cause: the executor launch command in the worker log above contains --driver-url spark://CoarseGrainedScheduler@DESKTOP-BL0DQH5:65518, where DESKTOP-BL0DQH5 is the hostname of the dev machine running the driver. The remote worker cannot resolve that hostname, so every executor fails to connect back to the driver and exits with code 1, and the master keeps asking for replacements. You can usually spot the hostname in the driver's console log:

Bound SparkUI to 0.0.0.0, and started at http://DESKTOP-BL0DQH5:4040

The fix: on the remote server, map the driver machine's hostname to its IP in the hosts file (hosts entries are `ip hostname`):

vim /etc/hosts
# append: <driver_machine_ip>  DESKTOP-BL0DQH5
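If you cannot edit /etc/hosts on every worker, an alternative (a sketch, not from the original post) is to make the driver advertise an address the cluster can actually reach, via the standard spark.driver.host property; <dev_machine_ip> below is a placeholder for the dev machine's IP as seen from the cluster:

    SparkConf sparkConf = new SparkConf()
            .setMaster("spark://10.9.2.155:7077")
            .setAppName("JavaSparkPi")
            // Advertise a reachable address instead of the unresolvable
            // Windows hostname; <dev_machine_ip> is a placeholder.
            .set("spark.driver.host", "<dev_machine_ip>");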


Problem 2: tasks keep failing with java.lang.ClassCastException (SerializedLambda)

20/07/24 14:16:12 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.9.2.155, executor 0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2287)
	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1417)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2293)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

20/07/24 14:16:13 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, 10.9.2.155, executor 0, partition 0, PROCESS_LOCAL, 1007348 bytes)
20/07/24 14:16:13 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on 10.9.2.155, executor 0: java.lang.ClassCastException (cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 1]
... (tasks 0.x and 1.x are retried and each fails with the same ClassCastException, [duplicate 2] through [duplicate 6]) ...
20/07/24 14:16:13 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
20/07/24 14:16:13 INFO TaskSchedulerImpl: Cancelling stage 0
20/07/24 14:16:13 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
20/07/24 14:16:13 INFO TaskSchedulerImpl: Stage 0 was cancelled

Cause: in standalone mode the executors run your lambdas, but they never received your application jar, so deserializing the lambda on the executor fails with the SerializedLambda ClassCastException above. When calling a standalone cluster remotely, you must ship the jar to the workers yourself by adding this line to the SparkConf:

.setJars(new String[]{"xx_path/spark-examples_2.12-3.0.0.jar"})
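Putting both fixes together, a minimal sketch (the class name, jar path, and toy job are illustrative placeholders, not the original post's code; build your jar first, e.g. with mvn package, and point setJars at your own artifact rather than the Spark examples jar):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class RemoteLambdaCheck {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("spark://10.9.2.155:7077")
                .setAppName("RemoteLambdaCheck")
                // Ship this application's own jar so executors can resolve the
                // classes backing the lambda below. "xx_path/your-app.jar" is
                // a placeholder for your built artifact.
                .setJars(new String[]{"xx_path/your-app.jar"});

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Without setJars, deserializing this lambda on the executor fails
        // with the SerializedLambda ClassCastException shown above.
        int sum = jsc.parallelize(Arrays.asList(1, 2, 3, 4))
                .map(x -> x * 2)
                .reduce(Integer::sum);
        System.out.println("sum = " + sum);

        spark.stop();
    }
}

Remember to rebuild the jar after every code change; setJars ships whatever file is on disk when the driver starts.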
