Developing Spark Programs in Scala
When developing Spark programs, every tutorial seems to tell you to package the code into a jar and then submit it. Today I discovered that, in a standalone setup, the program can actually be run directly. With the code below, set the master, then choose Run As -> Scala Application in the IDE. Running this way saves the time spent packaging a jar.
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object WordCount {
  def main(args: Array[String]): Unit = {
    val logFile = "README.md" // Should be some file on your system
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    val file = sc.textFile(logFile, 2).cache()
    val counts = file.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
    counts.saveAsTextFile("result")
  }
}
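The same program can also be pointed at a standalone cluster instead of local mode. A minimal sketch, assuming a standalone master at spark://namenode1:7077 and a compiled application jar at target/wordcount.jar (both hypothetical values for your environment):

import org.apache.spark.{SparkConf, SparkContext}

object WordCountStandalone {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Simple Application")
      // Point at the standalone master instead of local mode (hypothetical URL).
      .setMaster("spark://namenode1:7077")
      // When the driver runs inside the IDE, the worker nodes still need the
      // application classes; setJars ships this jar to them (hypothetical path).
      .setJars(Seq("target/wordcount.jar"))
    val sc = new SparkContext(conf)
    // ... same word-count logic as above ...
    sc.stop()
  }
}

Note that with local[*] masters no jar is needed, since everything runs in the driver JVM; it is only when tasks execute on remote workers that the classes must be shipped.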
The console output is as follows.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/11/06 00:40:28 INFO SecurityManager: Changing view acls to: hduser,
14/11/06 00:40:28 INFO SecurityManager: Changing modify acls to: hduser,
14/11/06 00:40:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser, ); users with modify permissions: Set(hduser, )
14/11/06 00:40:30 INFO Slf4jLogger: Slf4jLogger started
14/11/06 00:40:30 INFO Remoting: Starting remoting
14/11/06 00:40:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@namenode1:36164]
14/11/06 00:40:31 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@namenode1:36164]
14/11/06 00:40:31 INFO Utils: Successfully started service 'sparkDriver' on port 36164.
14/11/06 00:40:31 INFO SparkEnv: Registering MapOutputTracker
14/11/06 00:40:31 INFO SparkEnv: Registering BlockManagerMaster
14/11/06 00:40:31 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20141106004031-6c1f
14/11/06 00:40:31 INFO Utils: Successfully started service 'Connection manager for block manager' on port 43311.
14/11/06 00:40:31 INFO ConnectionManager: Bound socket to port 43311 with id = ConnectionManagerId(namenode1,43311)
14/11/06 00:40:31 INFO MemoryStore: MemoryStore started with capacity 515.8 MB
14/11/06 00:40:31 INFO BlockManagerMaster: Trying to register BlockManager
14/11/06 00:40:31 INFO BlockManagerMasterActor: Registering block manager namenode1:43311 with 515.8 MB RAM
14/11/06 00:40:31 INFO BlockManagerMaster: Registered BlockManager
14/11/06 00:40:32 INFO HttpFileServer: HTTP File server directory is /tmp/spark-885d61b7-801f-4701-9676-abfe34983844
14/11/06 00:40:32 INFO HttpServer: Starting HTTP Server
14/11/06 00:40:32 INFO Utils: Successfully started service 'HTTP file server' on port 50899.
14/11/06 00:40:34 INFO Utils: Successfully started service 'SparkUI' on port 4040.
14/11/06 00:40:34 INFO SparkUI: Started SparkUI at http://namenode1:4040
14/11/06 00:40:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/06 00:40:36 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@namenode1:36164/user/HeartbeatReceiver
14/11/06 00:40:38 INFO MemoryStore: ensureFreeSpace(159118) called with curMem=0, maxMem=540821422
14/11/06 00:40:38 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 155.4 KB, free 515.6 MB)
14/11/06 00:40:38 INFO FileInputFormat: Total input paths to process : 1
14/11/06 00:40:39 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/11/06 00:40:39 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/11/06 00:40:39 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/11/06 00:40:39 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/11/06 00:40:39 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/11/06 00:40:39 INFO SparkContext: Starting job: saveAsTextFile at WordCount.scala:21
14/11/06 00:40:39 INFO DAGScheduler: Registering RDD 3 (map at WordCount.scala:18)
14/11/06 00:40:39 INFO DAGScheduler: Got job 0 (saveAsTextFile at WordCount.scala:21) with 2 output partitions (allowLocal=false)
14/11/06 00:40:39 INFO DAGScheduler: Final stage: Stage 0(saveAsTextFile at WordCount.scala:21)
14/11/06 00:40:39 INFO DAGScheduler: Parents of final stage: List(Stage 1)
14/11/06 00:40:39 INFO DAGScheduler: Missing parents: List(Stage 1)
14/11/06 00:40:40 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at WordCount.scala:18), which has no missing parents
14/11/06 00:40:40 INFO MemoryStore: ensureFreeSpace(3360) called with curMem=159118, maxMem=540821422
14/11/06 00:40:40 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 515.6 MB)
14/11/06 00:40:40 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[3] at map at WordCount.scala:18)
14/11/06 00:40:40 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/11/06 00:40:40 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, localhost, PROCESS_LOCAL, 1192 bytes)
14/11/06 00:40:40 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1192 bytes)
14/11/06 00:40:40 INFO Executor: Running task 0.0 in stage 1.0 (TID 0)
14/11/06 00:40:40 INFO Executor: Running task 1.0 in stage 1.0 (TID 1)
14/11/06 00:40:41 INFO CacheManager: Partition rdd_1_1 not found, computing it
14/11/06 00:40:41 INFO CacheManager: Partition rdd_1_0 not found, computing it
14/11/06 00:40:41 INFO HadoopRDD: Input split: file:/home/hduser/workspace/TestScala/README.md:0+2405
14/11/06 00:40:41 INFO HadoopRDD: Input split: file:/home/hduser/workspace/TestScala/README.md:2405+2406
14/11/06 00:40:41 INFO MemoryStore: ensureFreeSpace(7512) called with curMem=162478, maxMem=540821422
14/11/06 00:40:41 INFO MemoryStore: Block rdd_1_1 stored as values in memory (estimated size 7.3 KB, free 515.6 MB)
14/11/06 00:40:41 INFO MemoryStore: ensureFreeSpace(8352) called with curMem=169990, maxMem=540821422
14/11/06 00:40:41 INFO MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 8.2 KB, free 515.6 MB)
14/11/06 00:40:41 INFO BlockManagerInfo: Added rdd_1_1 in memory on namenode1:43311 (size: 7.3 KB, free: 515.8 MB)
14/11/06 00:40:41 INFO BlockManagerMaster: Updated info of block rdd_1_1
14/11/06 00:40:41 INFO BlockManagerInfo: Added rdd_1_0 in memory on namenode1:43311 (size: 8.2 KB, free: 515.8 MB)
14/11/06 00:40:41 INFO BlockManagerMaster: Updated info of block rdd_1_0
14/11/06 00:40:41 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 2433 bytes result sent to driver
14/11/06 00:40:41 INFO Executor: Finished task 1.0 in stage 1.0 (TID 1). 2433 bytes result sent to driver
14/11/06 00:40:41 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 1112 ms on localhost (1/2)
14/11/06 00:40:41 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 1175 ms on localhost (2/2)
14/11/06 00:40:41 INFO DAGScheduler: Stage 1 (map at WordCount.scala:18) finished in 1.204 s
14/11/06 00:40:41 INFO DAGScheduler: looking for newly runnable stages
14/11/06 00:40:41 INFO DAGScheduler: running: Set()
14/11/06 00:40:41 INFO DAGScheduler: waiting: Set(Stage 0)
14/11/06 00:40:41 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/11/06 00:40:41 INFO DAGScheduler: failed: Set()
14/11/06 00:40:41 INFO DAGScheduler: Missing parents for Stage 0: List()
14/11/06 00:40:41 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[5] at saveAsTextFile at WordCount.scala:21), which is now runnable
14/11/06 00:40:42 INFO MemoryStore: ensureFreeSpace(57496) called with curMem=178342, maxMem=540821422
14/11/06 00:40:42 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 56.1 KB, free 515.5 MB)
14/11/06 00:40:42 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[5] at saveAsTextFile at WordCount.scala:21)
14/11/06 00:40:42 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/11/06 00:40:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 948 bytes)
14/11/06 00:40:42 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 948 bytes)
14/11/06 00:40:42 INFO Executor: Running task 0.0 in stage 0.0 (TID 2)
14/11/06 00:40:42 INFO Executor: Running task 1.0 in stage 0.0 (TID 3)
14/11/06 00:40:42 INFO BlockManager: Removing broadcast 1
14/11/06 00:40:42 INFO BlockManager: Removing block broadcast_1
14/11/06 00:40:42 INFO MemoryStore: Block broadcast_1 of size 3360 dropped from memory (free 540588944)
14/11/06 00:40:42 INFO ContextCleaner: Cleaned broadcast 1
14/11/06 00:40:42 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/11/06 00:40:42 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/11/06 00:40:42 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 15 ms
14/11/06 00:40:42 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/11/06 00:40:42 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/11/06 00:40:42 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 17 ms
14/11/06 00:40:42 INFO FileOutputCommitter: Saved output of task 'attempt_201411060040_0000_m_000001_3' to file:/home/hduser/workspace/TestScala/result/_temporary/0/task_201411060040_0000_m_000001
14/11/06 00:40:42 INFO FileOutputCommitter: Saved output of task 'attempt_201411060040_0000_m_000000_2' to file:/home/hduser/workspace/TestScala/result/_temporary/0/task_201411060040_0000_m_000000
14/11/06 00:40:42 INFO SparkHadoopWriter: attempt_201411060040_0000_m_000000_2: Committed
14/11/06 00:40:42 INFO SparkHadoopWriter: attempt_201411060040_0000_m_000001_3: Committed
14/11/06 00:40:42 INFO Executor: Finished task 1.0 in stage 0.0 (TID 3). 826 bytes result sent to driver
14/11/06 00:40:42 INFO Executor: Finished task 0.0 in stage 0.0 (TID 2). 826 bytes result sent to driver
14/11/06 00:40:42 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 395 ms on localhost (1/2)
14/11/06 00:40:42 INFO DAGScheduler: Stage 0 (saveAsTextFile at WordCount.scala:21) finished in 0.387 s
14/11/06 00:40:42 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 3) in 395 ms on localhost (2/2)
14/11/06 00:40:42 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/11/06 00:40:42 INFO SparkContext: Job finished: saveAsTextFile at WordCount.scala:21, took 2.649520816 s