kalor


1. Spark Shell Test

The Spark Shell is well suited to quickly prototyping Spark programs, and it is also a good way to get familiar with Scala. Even without prior Scala knowledge you can still use it. The shell lets you interact with a Spark cluster and submit queries, which makes it convenient both for debugging and for newcomers to Spark.

 

Test Case 1:

[Spark@Master spark]$ MASTER=spark://Master:7077 bin/spark-shell    // connect to the cluster
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/12/01 11:11:03 INFO spark.SecurityManager: Changing view acls to: Spark,
14/12/01 11:11:03 INFO spark.SecurityManager: Changing modify acls to: Spark,
14/12/01 11:11:03 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
14/12/01 11:11:03 INFO spark.HttpServer: Starting HTTP Server
14/12/01 11:11:03 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 11:11:03 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:36942
14/12/01 11:11:03 INFO util.Utils: Successfully started service 'HTTP class server' on port 36942.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
14/12/01 11:11:10 INFO spark.SecurityManager: Changing view acls to: Spark,
14/12/01 11:11:10 INFO spark.SecurityManager: Changing modify acls to: Spark,
14/12/01 11:11:10 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
14/12/01 11:11:11 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/12/01 11:11:11 INFO Remoting: Starting remoting
14/12/01 11:11:11 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:45322]
14/12/01 11:11:11 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@Master:45322]
14/12/01 11:11:11 INFO util.Utils: Successfully started service 'sparkDriver' on port 45322.
14/12/01 11:11:11 INFO spark.SparkEnv: Registering MapOutputTracker
14/12/01 11:11:11 INFO spark.SparkEnv: Registering BlockManagerMaster
14/12/01 11:11:12 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141201111112-e9cc
14/12/01 11:11:12 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 52705.
14/12/01 11:11:12 INFO network.ConnectionManager: Bound socket to port 52705 with id = ConnectionManagerId(Master,52705)
14/12/01 11:11:12 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
14/12/01 11:11:12 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/12/01 11:11:12 INFO storage.BlockManagerMasterActor: Registering block manager Master:52705 with 267.3 MB RAM
14/12/01 11:11:12 INFO storage.BlockManagerMaster: Registered BlockManager
14/12/01 11:11:12 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-87ad77b3-40b1-4320-958f-b1d632f2b4f5
14/12/01 11:11:12 INFO spark.HttpServer: Starting HTTP Server
14/12/01 11:11:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 11:11:12 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:51107
14/12/01 11:11:12 INFO util.Utils: Successfully started service 'HTTP file server' on port 51107.
14/12/01 11:11:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 11:11:12 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/12/01 11:11:12 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
14/12/01 11:11:12 INFO ui.SparkUI: Started SparkUI at http://Master:4040
14/12/01 11:11:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/01 11:11:14 INFO client.AppClient$ClientActor: Connecting to master spark://Master:7077...
14/12/01 11:11:14 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
14/12/01 11:11:14 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala> 14/12/01 11:11:15 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141201111115-0000
14/12/01 11:11:15 INFO client.AppClient$ClientActor: Executor added: app-20141201111115-0000/0 on worker-20141201031041-Slave1-49261 (Slave1:49261) with 1 cores
14/12/01 11:11:15 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141201111115-0000/0 on hostPort Slave1:49261 with 1 cores, 512.0 MB RAM
14/12/01 11:11:15 INFO client.AppClient$ClientActor: Executor added: app-20141201111115-0000/1 on worker-20141201031041-Slave2-33833 (Slave2:33833) with 1 cores
14/12/01 11:11:15 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141201111115-0000/1 on hostPort Slave2:33833 with 1 cores, 512.0 MB RAM
14/12/01 11:11:15 INFO client.AppClient$ClientActor: Executor updated: app-20141201111115-0000/0 is now RUNNING
14/12/01 11:11:15 INFO client.AppClient$ClientActor: Executor updated: app-20141201111115-0000/1 is now RUNNING
14/12/01 11:11:19 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@Slave1:41369/user/Executor#-1591583962] with ID 0
14/12/01 11:11:19 INFO storage.BlockManagerMasterActor: Registering block manager Slave1:57062 with 267.3 MB RAM
14/12/01 11:11:19 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@Slave2:47569/user/Executor#-1622351454] with ID 1
14/12/01 11:11:20 INFO storage.BlockManagerMasterActor: Registering block manager Slave2:52207 with 267.3 MB RAM


scala> val file = sc.textFile("hdfs://Master:9000/data/test1")
14/12/01 11:12:12 INFO storage.MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=280248975
14/12/01 11:12:12 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)
14/12/01 11:12:12 INFO storage.MemoryStore: ensureFreeSpace(12910) called with curMem=163705, maxMem=280248975
14/12/01 11:12:12 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.6 KB, free 267.1 MB)
14/12/01 11:12:12 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Master:52705 (size: 12.6 KB, free: 267.3 MB)
14/12/01 11:12:12 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
file: org.apache.spark.rdd.RDD[String] = hdfs://Master:9000/data/test1 MappedRDD[1] at textFile at <console>:12

scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
14/12/01 11:12:43 INFO mapred.FileInputFormat: Total input paths to process : 1
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:14

scala> count.collect()
14/12/01 11:12:59 INFO spark.SparkContext: Starting job: collect at <console>:17
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Registering RDD 3 (map at <console>:14)
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Got job 0 (collect at <console>:17) with 2 output partitions (allowLocal=false)
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at <console>:17)
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at <console>:14), which has no missing parents
14/12/01 11:12:59 INFO storage.MemoryStore: ensureFreeSpace(3424) called with curMem=176615, maxMem=280248975
14/12/01 11:12:59 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 267.1 MB)
14/12/01 11:12:59 INFO storage.MemoryStore: ensureFreeSpace(2051) called with curMem=180039, maxMem=280248975
14/12/01 11:12:59 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 267.1 MB)
14/12/01 11:12:59 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Master:52705 (size: 2.0 KB, free: 267.3 MB)
14/12/01 11:12:59 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[3] at map at <console>:14)
14/12/01 11:12:59 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/12/01 11:12:59 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, Slave2, NODE_LOCAL, 1174 bytes)
14/12/01 11:12:59 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, Slave1, NODE_LOCAL, 1174 bytes)
14/12/01 11:13:00 INFO network.ConnectionManager: Accepted connection from [Slave1/192.168.8.30:43475]
14/12/01 11:13:00 INFO network.SendingConnection: Initiating connection to [Slave1/192.168.8.30:57062]
14/12/01 11:13:00 INFO network.ConnectionManager: Accepted connection from [Slave2/192.168.8.31:43976]
14/12/01 11:13:00 INFO network.SendingConnection: Connected to [Slave1/192.168.8.30:57062], 1 messages pending
14/12/01 11:13:00 INFO network.SendingConnection: Initiating connection to [Slave2/192.168.8.31:52207]
14/12/01 11:13:00 INFO network.SendingConnection: Connected to [Slave2/192.168.8.31:52207], 1 messages pending
14/12/01 11:13:00 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Slave1:57062 (size: 2.0 KB, free: 267.3 MB)
14/12/01 11:13:00 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Slave2:52207 (size: 2.0 KB, free: 267.3 MB)
14/12/01 11:13:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Slave1:57062 (size: 12.6 KB, free: 267.3 MB)
14/12/01 11:13:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Slave2:52207 (size: 12.6 KB, free: 267.3 MB)
14/12/01 11:13:07 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 8197 ms on Slave2 (1/2)
14/12/01 11:13:07 INFO scheduler.DAGScheduler: Stage 1 (map at <console>:14) finished in 8.626 s
14/12/01 11:13:07 INFO scheduler.DAGScheduler: looking for newly runnable stages
14/12/01 11:13:07 INFO scheduler.DAGScheduler: running: Set()
14/12/01 11:13:07 INFO scheduler.DAGScheduler: waiting: Set(Stage 0)
14/12/01 11:13:07 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 8585 ms on Slave1 (2/2)
14/12/01 11:13:07 INFO scheduler.DAGScheduler: failed: Set()
14/12/01 11:13:07 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
14/12/01 11:13:07 INFO scheduler.DAGScheduler: Missing parents for Stage 0: List()
14/12/01 11:13:07 INFO scheduler.DAGScheduler: Submitting Stage 0 (ShuffledRDD[4] at reduceByKey at <console>:14), which is now runnable
14/12/01 11:13:07 INFO storage.MemoryStore: ensureFreeSpace(2112) called with curMem=182090, maxMem=280248975
14/12/01 11:13:07 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.1 KB, free 267.1 MB)
14/12/01 11:13:07 INFO storage.MemoryStore: ensureFreeSpace(1327) called with curMem=184202, maxMem=280248975
14/12/01 11:13:07 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1327.0 B, free 267.1 MB)
14/12/01 11:13:07 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Master:52705 (size: 1327.0 B, free: 267.3 MB)
14/12/01 11:13:07 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
14/12/01 11:13:07 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (ShuffledRDD[4] at reduceByKey at <console>:14)
14/12/01 11:13:07 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/12/01 11:13:07 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, Slave2, PROCESS_LOCAL, 948 bytes)
14/12/01 11:13:07 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, Slave1, PROCESS_LOCAL, 948 bytes)
14/12/01 11:13:07 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Slave1:57062 (size: 1327.0 B, free: 267.3 MB)
14/12/01 11:13:07 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Slave2:52207 (size: 1327.0 B, free: 267.3 MB)
14/12/01 11:13:08 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to sparkExecutor@Slave1:36991
14/12/01 11:13:08 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 143 bytes
14/12/01 11:13:08 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to sparkExecutor@Slave2:50333
14/12/01 11:13:08 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 149 ms on Slave2 (1/2)
14/12/01 11:13:08 INFO scheduler.DAGScheduler: Stage 0 (collect at <console>:17) finished in 0.179 s
14/12/01 11:13:08 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 3) in 181 ms on Slave1 (2/2)
14/12/01 11:13:08 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
14/12/01 11:13:08 INFO spark.SparkContext: Job finished: collect at <console>:17, took 8.947687849 s
res0: Array[(String, Int)] = Array((spark,1), (hadoop,2), (hbase,1))

scala> 
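The pipeline above can be mimicked on plain Scala collections to see what each step does, no cluster needed. In the sketch below, `groupBy` followed by a per-key sum stands in for `reduceByKey`, and the two sample lines are hypothetical input standing in for the contents of /data/test1:

```scala
// Plain-Scala word count mirroring the spark-shell pipeline above.
object LocalWordCount {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))  // split each line into words
      .map(word => (word, 1)) // pair each word with a count of 1
      .groupBy(_._1)          // group pairs by word (the role reduceByKey plays)
      .map { case (word, pairs) => word -> pairs.map(_._2).sum } // sum counts per word

  def main(args: Array[String]): Unit =
    // prints the same counts as res0 above (map ordering may differ)
    println(wordCount(Seq("spark hadoop", "hbase hadoop")))
}
```

On a real RDD, `reduceByKey` additionally shuffles pairs across executors so each key lands on one node before summing, which is exactly the two-stage plan visible in the DAGScheduler log above.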

Test Case 2:

Run one of the example programs bundled with Spark:

[Spark@Master spark]$ bin/run-example org.apache.spark.examples.SparkPi 2 spark://192.168.8.29:7077
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/12/01 11:01:24 INFO spark.SecurityManager: Changing view acls to: Spark,
14/12/01 11:01:24 INFO spark.SecurityManager: Changing modify acls to: Spark,
14/12/01 11:01:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
14/12/01 11:01:24 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/12/01 11:01:25 INFO Remoting: Starting remoting
14/12/01 11:01:25 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:60670]
14/12/01 11:01:25 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@Master:60670]
14/12/01 11:01:25 INFO util.Utils: Successfully started service 'sparkDriver' on port 60670.
14/12/01 11:01:25 INFO spark.SparkEnv: Registering MapOutputTracker
14/12/01 11:01:25 INFO spark.SparkEnv: Registering BlockManagerMaster
14/12/01 11:01:25 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141201110125-9987
14/12/01 11:01:25 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 35768.
14/12/01 11:01:25 INFO network.ConnectionManager: Bound socket to port 35768 with id = ConnectionManagerId(Master,35768)
14/12/01 11:01:25 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
14/12/01 11:01:25 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/12/01 11:01:25 INFO storage.BlockManagerMasterActor: Registering block manager Master:35768 with 267.3 MB RAM
14/12/01 11:01:25 INFO storage.BlockManagerMaster: Registered BlockManager
14/12/01 11:01:25 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-68503776-9126-4e30-89a3-83a560210e14
14/12/01 11:01:25 INFO spark.HttpServer: Starting HTTP Server
14/12/01 11:01:25 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 11:01:25 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:33890
14/12/01 11:01:25 INFO util.Utils: Successfully started service 'HTTP file server' on port 33890.
14/12/01 11:01:26 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 11:01:26 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/12/01 11:01:26 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
14/12/01 11:01:26 INFO ui.SparkUI: Started SparkUI at http://Master:4040
14/12/01 11:01:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/01 11:01:27 INFO spark.SparkContext: Added JAR file:/home/Spark/husor/spark/lib/spark-examples-1.1.0-hadoop2.4.0.jar at http://Master:33890/jars/spark-examples-1.1.0-hadoop2.4.0.jar with timestamp 1417402887362
14/12/01 11:01:27 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@Master:60670/user/HeartbeatReceiver
14/12/01 11:01:27 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:35
14/12/01 11:01:27 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 2 output partitions (allowLocal=false)
14/12/01 11:01:27 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
14/12/01 11:01:27 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/12/01 11:01:27 INFO scheduler.DAGScheduler: Missing parents: List()
14/12/01 11:01:27 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkPi.scala:31), which has no missing parents
14/12/01 11:01:28 INFO storage.MemoryStore: ensureFreeSpace(1728) called with curMem=0, maxMem=280248975
14/12/01 11:01:28 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1728.0 B, free 267.3 MB)
14/12/01 11:01:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:31)
14/12/01 11:01:28 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/12/01 11:01:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1223 bytes)
14/12/01 11:01:28 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
14/12/01 11:01:28 INFO executor.Executor: Fetching http://Master:33890/jars/spark-examples-1.1.0-hadoop2.4.0.jar with timestamp 1417402887362
14/12/01 11:01:28 INFO util.Utils: Fetching http://Master:33890/jars/spark-examples-1.1.0-hadoop2.4.0.jar to /tmp/fetchFileTemp7489373377783107634.tmp
14/12/01 11:01:28 INFO executor.Executor: Adding file:/tmp/spark-ad7b4d7f-9793-406b-b3a9-21bd79fddf9f/spark-examples-1.1.0-hadoop2.4.0.jar to class loader
14/12/01 11:01:28 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 701 bytes result sent to driver
14/12/01 11:01:28 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1223 bytes)
14/12/01 11:01:28 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
14/12/01 11:01:29 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 701 bytes result sent to driver
14/12/01 11:01:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 765 ms on localhost (1/2)
14/12/01 11:01:29 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 0.936 s
14/12/01 11:01:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 177 ms on localhost (2/2)
14/12/01 11:01:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
14/12/01 11:01:29 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:35, took 1.3590325 s
Pi is roughly 3.13872
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
14/12/01 11:01:29 INFO ui.SparkUI: Stopped Spark web UI at http://Master:4040
14/12/01 11:01:29 INFO scheduler.DAGScheduler: Stopping DAGScheduler
14/12/01 11:01:30 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/12/01 11:01:30 INFO network.ConnectionManager: Selector thread was interrupted!
14/12/01 11:01:30 INFO network.ConnectionManager: ConnectionManager stopped
14/12/01 11:01:30 INFO storage.MemoryStore: MemoryStore cleared
14/12/01 11:01:30 INFO storage.BlockManager: BlockManager stopped
14/12/01 11:01:30 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
14/12/01 11:01:30 INFO spark.SparkContext: Successfully stopped SparkContext
14/12/01 11:01:30 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/12/01 11:01:30 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
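The SparkPi example arrives at its estimate by Monte Carlo sampling: scatter random points over the square [-1,1)×[-1,1) and compare the fraction falling inside the inscribed unit circle to π/4. A plain-Scala sketch of that idea (no Spark; the sample count and seed are arbitrary choices for illustration):

```scala
import scala.util.Random

// Monte Carlo estimate of pi: the fraction of random points in the
// square that land inside the unit circle approaches pi/4.
object LocalPi {
  def estimatePi(samples: Int, seed: Long = 42L): Double = {
    val rng = new Random(seed)
    val inside = Iterator.fill(samples) {
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      if (x * x + y * y <= 1) 1 else 0
    }.sum
    4.0 * inside / samples
  }

  def main(args: Array[String]): Unit =
    println(f"Pi is roughly ${estimatePi(1000000)}%.5f")
}
```

The cluster version simply splits the sampling across partitions and sums the per-partition hit counts with `reduce`, which is why the log shows a single stage with one task per slice.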

 

2. Write a Spark program with IntelliJ IDEA (Scala plugin), package it as a .jar, and submit it to the Spark cluster

The code of com.husor.Test.WordCount.scala is as follows:

package com.husor.Test

import org.apache.spark.{SparkContext,SparkConf}
import org.apache.spark.SparkContext._

/**
 * Created by huxiu on 2014/11/27.
 */
object WordCount {
  def main(args: Array[String]) {

    println("Test is starting......")

    if (args.length < 2) {
      System.err.println("Usage: WordCount <HDFS input file> <HDFS output dir>")
      System.exit(1)
    }

    //System.setProperty("hadoop.home.dir", "d:\\winutil\\")

    // setSparkHome expects the Spark installation path, so take it from the environment
    val conf = new SparkConf().setAppName("WordCount")
                              .setSparkHome(sys.env.getOrElse("SPARK_HOME", "."))

    val spark = new SparkContext(conf)

    //val spark = new SparkContext("local", "WordCount")

    val file = spark.textFile(args(0))

    // to print the counts to the console instead of writing to HDFS:
    //file.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).collect().foreach(println)

    val wordCounts = file.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_)
    wordCounts.saveAsTextFile(args(1))
    spark.stop()

    println("Test succeeded!")

  }
}
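The jar above is built from IntelliJ IDEA; for reference, an equivalent sbt build file is sketched below. The project name and layout are assumptions, the Spark and Scala versions match the cluster shown above, and marking spark-core as "provided" keeps it out of SparkTest.jar since the cluster supplies it at runtime:

```scala
// build.sbt — a minimal sketch for building the jar with `sbt package`
name := "SparkTest"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"
```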

 

The corresponding launch script runSpark.sh:

#!/bin/bash

set -x

spark-submit \
--class com.husor.Test.WordCount \
--master spark://Master:7077 \
--executor-memory 512m \
--total-executor-cores 1 \
/home/Spark/husor/spark/SparkTest.jar \
hdfs://Master:9000/data/test1 \
hdfs://Master:9000/user/huxiu/SparkWordCount

Make runSpark.sh executable (chmod +x runSpark.sh); the run then proceeds as follows:

[Spark@Master spark]$ ./runSpark.sh 
+ spark-submit --class com.husor.Test.WordCount --master spark://Master:7077 --executor-memory 512m --total-executor-cores 1 /home/Spark/husor/spark/SparkTest.jar hdfs://Master:9000/data/test1 hdfs://Master:9000/user/huxiu/SparkWordCount
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Test is starting......
14/12/01 12:10:50 INFO spark.SecurityManager: Changing view acls to: Spark,
14/12/01 12:10:50 INFO spark.SecurityManager: Changing modify acls to: Spark,
14/12/01 12:10:50 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
14/12/01 12:10:50 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/12/01 12:10:50 INFO Remoting: Starting remoting
14/12/01 12:10:51 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:37899]
14/12/01 12:10:51 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@Master:37899]
14/12/01 12:10:51 INFO util.Utils: Successfully started service 'sparkDriver' on port 37899.
14/12/01 12:10:51 INFO spark.SparkEnv: Registering MapOutputTracker
14/12/01 12:10:51 INFO spark.SparkEnv: Registering BlockManagerMaster
14/12/01 12:10:51 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141201121051-6189
14/12/01 12:10:51 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 34131.
14/12/01 12:10:51 INFO network.ConnectionManager: Bound socket to port 34131 with id = ConnectionManagerId(Master,34131)
14/12/01 12:10:51 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
14/12/01 12:10:51 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/12/01 12:10:51 INFO storage.BlockManagerMasterActor: Registering block manager Master:34131 with 267.3 MB RAM
14/12/01 12:10:51 INFO storage.BlockManagerMaster: Registered BlockManager
14/12/01 12:10:51 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-83b486ec-2237-4f71-be00-0418e485151f
14/12/01 12:10:51 INFO spark.HttpServer: Starting HTTP Server
14/12/01 12:10:51 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 12:10:51 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:34902
14/12/01 12:10:51 INFO util.Utils: Successfully started service 'HTTP file server' on port 34902.
14/12/01 12:10:51 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 12:10:51 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/12/01 12:10:51 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
14/12/01 12:10:51 INFO ui.SparkUI: Started SparkUI at http://Master:4040
14/12/01 12:10:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/01 12:10:52 INFO spark.SparkContext: Added JAR file:/home/Spark/husor/spark/SparkTest.jar at http://Master:34902/jars/SparkTest.jar with timestamp 1417407052941
14/12/01 12:10:53 INFO client.AppClient$ClientActor: Connecting to master spark://Master:7077...
14/12/01 12:10:53 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
14/12/01 12:10:53 INFO storage.MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=280248975
14/12/01 12:10:53 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)
14/12/01 12:10:53 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141201121053-0006
14/12/01 12:10:53 INFO client.AppClient$ClientActor: Executor added: app-20141201121053-0006/0 on worker-20141201031041-Slave1-49261 (Slave1:49261) with 1 cores
14/12/01 12:10:53 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141201121053-0006/0 on hostPort Slave1:49261 with 1 cores, 512.0 MB RAM
14/12/01 12:10:54 INFO client.AppClient$ClientActor: Executor updated: app-20141201121053-0006/0 is now RUNNING
14/12/01 12:10:54 INFO storage.MemoryStore: ensureFreeSpace(12910) called with curMem=163705, maxMem=280248975
14/12/01 12:10:54 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.6 KB, free 267.1 MB)
14/12/01 12:10:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Master:34131 (size: 12.6 KB, free: 267.3 MB)
14/12/01 12:10:54 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
14/12/01 12:10:54 INFO mapred.FileInputFormat: Total input paths to process : 1
14/12/01 12:10:55 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/12/01 12:10:55 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/12/01 12:10:55 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/12/01 12:10:55 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/12/01 12:10:55 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/12/01 12:10:55 INFO spark.SparkContext: Starting job: saveAsTextFile at WordCount.scala:35
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Registering RDD 3 (map at WordCount.scala:34)
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Got job 0 (saveAsTextFile at WordCount.scala:35) with 2 output partitions (allowLocal=false)
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Final stage: Stage 0(saveAsTextFile at WordCount.scala:35)
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at WordCount.scala:34), which has no missing parents
14/12/01 12:10:55 INFO storage.MemoryStore: ensureFreeSpace(3400) called with curMem=176615, maxMem=280248975
14/12/01 12:10:55 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 267.1 MB)
14/12/01 12:10:55 INFO storage.MemoryStore: ensureFreeSpace(2055) called with curMem=180015, maxMem=280248975
14/12/01 12:10:55 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 267.1 MB)
14/12/01 12:10:55 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Master:34131 (size: 2.0 KB, free: 267.3 MB)
14/12/01 12:10:55 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[3] at map at WordCount.scala:34)
14/12/01 12:10:55 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/12/01 12:10:57 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@Slave1:38410/user/Executor#898843507] with ID 0
14/12/01 12:10:57 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, Slave1, NODE_LOCAL, 1222 bytes)
14/12/01 12:10:57 INFO storage.BlockManagerMasterActor: Registering block manager Slave1:44906 with 267.3 MB RAM
14/12/01 12:10:58 INFO network.ConnectionManager: Accepted connection from [Slave1/192.168.8.30:43149]
14/12/01 12:10:58 INFO network.SendingConnection: Initiating connection to [Slave1/192.168.8.30:44906]
14/12/01 12:10:58 INFO network.SendingConnection: Connected to [Slave1/192.168.8.30:44906], 1 messages pending
14/12/01 12:10:58 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Slave1:44906 (size: 2.0 KB, free: 267.3 MB)
14/12/01 12:10:58 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Slave1:44906 (size: 12.6 KB, free: 267.3 MB)
14/12/01 12:10:59 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, Slave1, NODE_LOCAL, 1222 bytes)
14/12/01 12:11:00 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 159 ms on Slave1 (1/2)
14/12/01 12:11:00 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 2454 ms on Slave1 (2/2)
14/12/01 12:11:00 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
14/12/01 12:11:00 INFO scheduler.DAGScheduler: Stage 1 (map at WordCount.scala:34) finished in 4.444 s
14/12/01 12:11:00 INFO scheduler.DAGScheduler: looking for newly runnable stages
14/12/01 12:11:00 INFO scheduler.DAGScheduler: running: Set()
14/12/01 12:11:00 INFO scheduler.DAGScheduler: waiting: Set(Stage 0)
14/12/01 12:11:00 INFO scheduler.DAGScheduler: failed: Set()
14/12/01 12:11:00 INFO scheduler.DAGScheduler: Missing parents for Stage 0: List()
14/12/01 12:11:00 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[5] at saveAsTextFile at WordCount.scala:35), which is now runnable
14/12/01 12:11:00 INFO storage.MemoryStore: ensureFreeSpace(57552) called with curMem=182070, maxMem=280248975
14/12/01 12:11:00 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 56.2 KB, free 267.0 MB)
14/12/01 12:11:00 INFO storage.MemoryStore: ensureFreeSpace(19863) called with curMem=239622, maxMem=280248975
14/12/01 12:11:00 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 19.4 KB, free 267.0 MB)
14/12/01 12:11:00 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Master:34131 (size: 19.4 KB, free: 267.2 MB)
14/12/01 12:11:00 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
14/12/01 12:11:00 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[5] at saveAsTextFile at WordCount.scala:35)
14/12/01 12:11:00 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/12/01 12:11:00 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, Slave1, PROCESS_LOCAL, 996 bytes)
14/12/01 12:11:00 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Slave1:44906 (size: 19.4 KB, free: 267.2 MB)
14/12/01 12:11:00 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to sparkExecutor@Slave1:51850
14/12/01 12:11:00 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 133 bytes
14/12/01 12:11:00 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, Slave1, PROCESS_LOCAL, 996 bytes)
14/12/01 12:11:00 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 412 ms on Slave1 (1/2)
14/12/01 12:11:00 INFO scheduler.DAGScheduler: Stage 0 (saveAsTextFile at WordCount.scala:35) finished in 0.710 s
14/12/01 12:11:00 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 3) in 308 ms on Slave1 (2/2)
14/12/01 12:11:00 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
14/12/01 12:11:00 INFO spark.SparkContext: Job finished: saveAsTextFile at WordCount.scala:35, took 5.556490798 s
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
14/12/01 12:11:00 INFO ui.SparkUI: Stopped Spark web UI at http://Master:4040
14/12/01 12:11:00 INFO scheduler.DAGScheduler: Stopping DAGScheduler
14/12/01 12:11:00 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
14/12/01 12:11:00 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
14/12/01 12:11:01 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(Slave1,44906)
14/12/01 12:11:01 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(Slave1,44906)
14/12/01 12:11:01 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(Slave1,44906)
14/12/01 12:11:02 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/12/01 12:11:02 INFO network.ConnectionManager: Selector thread was interrupted!
14/12/01 12:11:02 INFO network.ConnectionManager: ConnectionManager stopped
14/12/01 12:11:02 INFO storage.MemoryStore: MemoryStore cleared
14/12/01 12:11:02 INFO storage.BlockManager: BlockManager stopped
14/12/01 12:11:02 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
14/12/01 12:11:02 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/12/01 12:11:02 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
14/12/01 12:11:02 INFO spark.SparkContext: Successfully stopped SparkContext
Test is Succeed!!!
14/12/01 12:11:02 INFO Remoting: Remoting shut down
14/12/01 12:11:02 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
[Spark@Master spark]$ hdfs dfs -cat /user/huxiu/SparkWordCount/part-00001
14/12/01 12:11:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(spark,1)
(hadoop,2)
(hbase,1)
[Spark@Master spark]$ hdfs dfs -ls /user/huxiu/SparkWordCount/
14/12/01 12:11:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   2 Spark huxiu          0 2014-12-01 12:11 /user/huxiu/SparkWordCount/_SUCCESS
-rw-r--r--   2 Spark huxiu          0 2014-12-01 12:11 /user/huxiu/SparkWordCount/part-00000
-rw-r--r--   2 Spark huxiu         31 2014-12-01 12:11 /user/huxiu/SparkWordCount/part-00001
[Spark@Master spark]$ hdfs dfs -cat /user/huxiu/SparkWordCount/part-00000
14/12/01 12:11:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

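The log lines above reference `map` at WordCount.scala:34 and `saveAsTextFile` at WordCount.scala:35, and the output files contain `(word, count)` pairs. A minimal sketch of what such a WordCount.scala might look like follows; the input path and the exact transformation chain are assumptions (the original source is not shown), though the output path and the final `println` match the log:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // App name is an assumption; the job in the log ran against spark://Master:7077
    val conf = new SparkConf().setAppName("SparkWordCount")
    val sc = new SparkContext(conf)

    // Input path is hypothetical -- any small text file on HDFS would do
    val lines = sc.textFile("hdfs://Master:9000/user/huxiu/wordcount.txt")

    // Split lines into words, map each word to (word, 1), then sum per key;
    // reduceByKey introduces the shuffle seen as "shuffle 0" in the log
    val counts = lines.flatMap(_.split("\\s+"))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)

    // Writes part-00000, part-00001 and _SUCCESS, as listed by `hdfs dfs -ls` above
    counts.saveAsTextFile("hdfs://Master:9000/user/huxiu/SparkWordCount")

    println("Test is Succeed!!!") // the line printed between the shutdown messages
    sc.stop()
  }
}
```

With two partitions, output such as `(hadoop,2)` in part-00001 is exactly what `saveAsTextFile` produces from the `(word, count)` tuples.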
Note:

While running, you may hit the exception `Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory`. Memory was definitely sufficient, yet the job still could not acquire resources. Checking the firewall revealed the cause: the machines only allowed access on port 80 and blocked everything else.

Solution:

Stop the firewall on every node (`service iptables stop`), then re-run the runSpark.sh script above on the Spark on YARN cluster.

posted on 2014-12-01 12:25 by kalor