SparkR: error report
> sc <- sparkR.init()
Re-using existing Spark Context. Please stop SparkR with sparkR.stop() or restart R to create a new Spark Context
> sqlContext <- sparkRSQL.init(sc)
> df <- createDataFrame(sqlContext, faithful)
17/03/01 15:05:56 INFO SparkContext: Starting job: collectPartitions at NativeMethodAccessorImpl.java:-2
17/03/01 15:05:56 INFO DAGScheduler: Got job 0 (collectPartitions at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/03/01 15:05:56 INFO DAGScheduler: Final stage: ResultStage 0 (collectPartitions at NativeMethodAccessorImpl.java:-2)
17/03/01 15:05:56 INFO DAGScheduler: Parents of final stage: List()
17/03/01 15:05:56 INFO DAGScheduler: Missing parents: List()
17/03/01 15:05:56 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at RRDD.scala:460), which has no missing parents
17/03/01 15:05:56 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1280.0 B, free 1280.0 B)
17/03/01 15:05:56 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 854.0 B, free 2.1 KB)
17/03/01 15:05:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.16.31.137:49150 (size: 854.0 B, free: 511.5 MB)
17/03/01 15:05:56 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/03/01 15:05:56 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at RRDD.scala:460)
17/03/01 15:05:56 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/03/01 15:05:56 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, test3, partition 0,PROCESS_LOCAL, 12976 bytes)
17/03/01 15:05:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on test3:50531 (size: 854.0 B, free: 511.5 MB)
17/03/01 15:05:56 INFO DAGScheduler: ResultStage 0 (collectPartitions at NativeMethodAccessorImpl.java:-2) finished in 0.396 s
17/03/01 15:05:56 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 389 ms on test3 (1/1)
17/03/01 15:05:56 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/03/01 15:05:56 INFO DAGScheduler: Job 0 finished: collectPartitions at NativeMethodAccessorImpl.java:-2, took 0.526915 s
> showDF(df)
17/03/01 15:06:02 INFO SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:-2
17/03/01 15:06:02 INFO DAGScheduler: Got job 1 (showString at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/03/01 15:06:02 INFO DAGScheduler: Final stage: ResultStage 1 (showString at NativeMethodAccessorImpl.java:-2)
17/03/01 15:06:02 INFO DAGScheduler: Parents of final stage: List()
17/03/01 15:06:02 INFO DAGScheduler: Missing parents: List()
17/03/01 15:06:02 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[4] at showString at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/03/01 15:06:02 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.7 KB, free 10.8 KB)
17/03/01 15:06:02 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.5 KB, free 14.4 KB)
17/03/01 15:06:02 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.16.31.137:49150 (size: 3.5 KB, free: 511.5 MB)
17/03/01 15:06:02 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/03/01 15:06:02 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[4] at showString at NativeMethodAccessorImpl.java:-2)
17/03/01 15:06:02 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/03/01 15:06:02 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, test2, partition 0,PROCESS_LOCAL, 12976 bytes)
17/03/01 15:06:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on test2:57552 (size: 3.5 KB, free: 511.5 MB)
17/03/01 15:06:04 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, test2): java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
    at org.apache.spark.api.r.RRDD$.createRProcess(RRDD.scala:413)
    at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:429)
    at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:187)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
    ... 20 more
17/03/01 15:06:04 INFO TaskSetManager: Starting task 0.1 in stage 1.0 (TID 2, test2, partition 0,PROCESS_LOCAL, 12976 bytes)
17/03/01 15:06:04 INFO TaskSetManager: Lost task 0.1 in stage 1.0 (TID 2) on executor test2: java.io.IOException (Cannot run program "Rscript": error=2, No such file or directory) [duplicate 1]
17/03/01 15:06:04 INFO TaskSetManager: Starting task 0.2 in stage 1.0 (TID 3, test3, partition 0,PROCESS_LOCAL, 12976 bytes)
17/03/01 15:06:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on test3:50531 (size: 3.5 KB, free: 511.5 MB)
17/03/01 15:06:04 INFO TaskSetManager: Lost task 0.2 in stage 1.0 (TID 3) on executor test3: java.io.IOException (Cannot run program "Rscript": error=2, No such file or directory) [duplicate 2]
17/03/01 15:06:04 INFO TaskSetManager: Starting task 0.3 in stage 1.0 (TID 4, test3, partition 0,PROCESS_LOCAL, 12976 bytes)
17/03/01 15:06:04 INFO TaskSetManager: Lost task 0.3 in stage 1.0 (TID 4) on executor test3: java.io.IOException (Cannot run program "Rscript": error=2, No such file or directory) [duplicate 3]
17/03/01 15:06:04 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
17/03/01 15:06:04 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/03/01 15:06:04 INFO TaskSchedulerImpl: Cancelling stage 1
17/03/01 15:06:04 INFO DAGScheduler: ResultStage 1 (showString at NativeMethodAccessorImpl.java:-2) failed in 2.007 s
17/03/01 15:06:04 INFO DAGScheduler: Job 1 failed: showString at NativeMethodAccessorImpl.java:-2, took 2.027519 s
17/03/01 15:06:04 ERROR RBackendHandler: showString on 15 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, test3): java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
    at org.apache.spark.api.r.RRDD$.createRProcess(RRDD.scala:413)
    at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:429)
    at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.R
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, test3): java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
The line above is the key part of the error.
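Because of this error, once you have created an object in SparkR whose class is
class(df)
[1] "DataFrame"
attr(,"package")
[1] "SparkR"
you can still inspect it with class, names, and show,
but showDF and head fail with the error above, i.e. the data cannot be read. As the stack trace shows (RRDD$.createRWorker), the failing calls run a Spark job whose tasks must start an R worker process on the executors.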
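Looking at the key error line, we can see that the other nodes (test2 and test3 in the log) have no Rscript that the executors can find.
The fix is to log in to each of the other machines and copy Rscript into /usr/bin. This assumes R itself is already installed on those machines.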
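The diagnosis above can be checked on each worker before and after applying the fix. The following is only a minimal sketch, assuming a POSIX shell on the workers; the function name check_program is hypothetical and not part of Spark or R.

```shell
#!/bin/sh
# Hypothetical helper: report whether a program can be found through PATH
# on the machine where this script runs.
check_program() {
    prog="$1"
    if location=$(command -v "$prog" 2>/dev/null); then
        echo "found: $location"
    else
        echo "missing: $prog"
        return 1
    fi
}

# Run this on each worker node (test2 and test3 in the log above).
check_program Rscript || echo "Rscript is not on PATH on this machine"
```

If this prints "missing: Rscript" on a worker, executors on that machine will most likely fail in the same way as in the log. Note that the PATH seen by an interactive login shell may differ from the one the Spark executor process uses, so a "found" result under a non-standard directory is not a guarantee.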
Alternatively, switch to a single node,
i.e. drop --master when starting SparkR:
sparkR --driver-class-path /data1/mysql-connector-java-5.1.18.jar