Spark Notes
[root@master bin]# ./spark-shell --master yarn-client
20/03/21 02:48:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/21 02:48:40 INFO spark.SecurityManager: Changing view acls to: root
20/03/21 02:48:40 INFO spark.SecurityManager: Changing modify acls to: root
20/03/21 02:48:40 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
20/03/21 02:48:40 INFO spark.HttpServer: Starting HTTP Server
20/03/21 02:48:41 INFO server.Server: jetty-8.y.z-SNAPSHOT
20/03/21 02:48:41 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50659
20/03/21 02:48:41 INFO util.Utils: Successfully started service 'HTTP class server' on port 50659.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
20/03/21 02:48:52 INFO spark.SparkContext: Running Spark version 1.6.0
20/03/21 02:48:52 INFO spark.SecurityManager: Changing view acls to: root
20/03/21 02:48:52 INFO spark.SecurityManager: Changing modify acls to: root
20/03/21 02:48:52 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
20/03/21 02:48:53 INFO util.Utils: Successfully started service 'sparkDriver' on port 38244.
20/03/21 02:48:54 INFO slf4j.Slf4jLogger: Slf4jLogger started
20/03/21 02:48:54 INFO Remoting: Starting remoting
20/03/21 02:48:55 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.90:44009]
20/03/21 02:48:55 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 44009.
20/03/21 02:48:55 INFO spark.SparkEnv: Registering MapOutputTracker
20/03/21 02:48:55 INFO spark.SparkEnv: Registering BlockManagerMaster
20/03/21 02:48:55 INFO storage.DiskBlockManager: Created local directory at /usr/local/src/spark-1.6.0-bin-hadoop2.6/blockmgr-bae586ec-9c91-4d9b-bbec-682c94e5fe14
20/03/21 02:48:55 INFO storage.MemoryStore: MemoryStore started with capacity 511.5 MB
20/03/21 02:48:56 INFO spark.SparkEnv: Registering OutputCommitCoordinator
20/03/21 02:48:56 INFO server.Server: jetty-8.y.z-SNAPSHOT
20/03/21 02:48:56 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
20/03/21 02:48:56 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
20/03/21 02:48:56 INFO ui.SparkUI: Started SparkUI at http://192.168.0.90:4040
20/03/21 02:48:57 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.0.90:8032
20/03/21 02:48:58 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
20/03/21 02:48:58 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
20/03/21 02:48:58 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
20/03/21 02:48:58 INFO yarn.Client: Setting up container launch context for our AM
20/03/21 02:48:58 INFO yarn.Client: Setting up the launch environment for our AM container
20/03/21 02:48:58 INFO yarn.Client: Preparing resources for our AM container
20/03/21 02:49:00 INFO yarn.Client: Uploading resource file:/usr/local/src/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar -> hdfs://master:9000/user/root/.sparkStaging/application_1584680108277_0036/spark-assembly-1.6.0-hadoop2.6.0.jar
scala> lines.flatMap(_.split(" "))
res25: List[String] = List(Preface, “The, Forsyte, Saga”, was, the, title, originally, destined, for, that, part, of, it, which, is, called, “The, Man, of, Property”;, and, to, adopt, it, for, the, collected, chronicles, of, the, Forsyte, family, has, indulged, the, Forsytean, tenacity, that, is, in, all, of, us., The, word, Saga, might, be, objected, to, on, the, ground, that, it, connotes, the, heroic, and, that, there, is, little, heroism, in, these, pages., But, it, is, used, with, a, suitable, irony;, and,, after, all,, this, long, tale,, though, it, may, deal, with, folk, in, frock, coats,, furbelows,, and, a, gilt-edged, period,, is, not, devoid, of, the, essential, heat, of, conflict., Discounting, for, the, gigantic, stature, and, blood-thirstiness, of, old, days,, as, they, ha...

scala> lines.flatMap(_.split(" ")).map((_,1)).groupBy(_._1).mapValues(_.size).toArray.sortWith(_._2 > _._2).slice(0, 10)
res26: Array[(String, Int)] = Array((the,5144), (of,3407), (to,2782), (and,2573), (a,2543), (he,2139), (his,1912), (was,1702), (in,1694), (had,1526))
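Note that `res25` is a `List[String]`, so this word count runs on plain Scala collections, not on an RDD. A self-contained sketch of the same top-10 pipeline, with two made-up sample lines standing in for the Forsyte Saga text loaded into `lines` above:

```scala
// Standalone word-count sketch mirroring the REPL chain above.
// The sample `lines` here are invented; the session used a book's text.
object WordCountSketch {
  def top10(lines: List[String]): Array[(String, Int)] =
    lines
      .flatMap(_.split(" "))               // split every line into words
      .map((_, 1))                         // pair each word with a count of 1
      .groupBy(_._1)                       // group the pairs by word
      .map { case (w, g) => (w, g.size) }  // group size = frequency (same as mapValues(_.size))
      .toArray
      .sortWith(_._2 > _._2)               // sort by frequency, descending
      .take(10)                            // keep the ten most frequent words

  def main(args: Array[String]): Unit = {
    val lines = List("the cat and the hat", "the cat sat")
    println(top10(lines).toList)           // "the" comes first with count 3
  }
}
```

On an RDD the same shape would use `reduceByKey(_ + _)` instead of `groupBy` + size, which avoids materializing the groups.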
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import spark.sql
import spark.sql

scala> val orders = sql("select * from badou.orders")
20/03/22 10:23:16 ERROR metastore.ObjectStore: Version information found in metastore differs 0.13.0 from expected schema version 1.2.0. Schema verififcation is disabled hive.metastore.schema.verification so setting version.
20/03/22 10:23:20 ERROR metastore.RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:891)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at com.sun.proxy.$Proxy17.create_database(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:644)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
	at com.sun.proxy.$Proxy18.createDatabase(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:306)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply$mcV$sp(HiveClientImpl.scala:309)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:309)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:309)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:280)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:269)
	at org.apache.spark.sql.hive.client.HiveClientImpl.createDatabase(HiveClientImpl.scala:308)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply$mcV$sp(HiveExternalCatalog.scala:99)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:72)
	at org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:98)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
	at org.apache.spark.sql.hive.HiveSessionCatalog.<init>(HiveSessionCatalog.scala:51)
	at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:49)
	at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
	at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
	at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
	at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
	at $line18.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:24)
	at $line18.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:29)
	at $line18.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:31)
	at $line18.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:33)
	at $line18.$read$$iw$$iw$$iw$$iw.<init>(<console>:35)
	at $line18.$read$$iw$$iw$$iw.<init>(<console>:37)
	at $line18.$read$$iw$$iw.<init>(<console>:39)
	at $line18.$read$$iw.<init>(<console>:41)
	at $line18.$read.<init>(<console>:43)
	at $line18.$read$.<init>(<console>:47)
	at $line18.$read$.<clinit>(<console>)
	at $line18.$eval$.$print$lzycompute(<console>:7)
	at $line18.$eval$.$print(<console>:6)
	at $line18.$eval.$print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
	at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
	at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
	at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
	at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
	at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
	at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:415)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:923)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
	at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
	at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
	at org.apache.spark.repl.Main$.doMain(Main.scala:68)
	at org.apache.spark.repl.Main$.main(Main.scala:51)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
orders: org.apache.spark.sql.DataFrame = [order_id: string, user_id: string ... 5 more fields]

scala> orders.show(10)
+--------+-------+--------+------------+---------+-----------------+----------------------+
|order_id|user_id|eval_set|order_number|order_dow|order_hour_of_day|days_since_prior_order|
+--------+-------+--------+------------+---------+-----------------+----------------------+
|order_id|user_id|eval_set|order_number|order_dow|order_hour_of_day|  days_since_prior_...|
| 2539329|      1|   prior|           1|        2|               08|                      |
| 2398795|      1|   prior|           2|        3|               07|                  15.0|
|  473747|      1|   prior|           3|        3|               12|                  21.0|
| 2254736|      1|   prior|           4|        4|               07|                  29.0|
|  431534|      1|   prior|           5|        4|               15|                  28.0|
| 3367565|      1|   prior|           6|        2|               07|                  19.0|
|  550135|      1|   prior|           7|        1|               09|                  20.0|
| 3108588|      1|   prior|           8|        1|               14|                  14.0|
| 2295261|      1|   prior|           9|        1|               16|                   0.0|
+--------+-------+--------+------------+---------+-----------------+----------------------+
only showing top 10 rows
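Note that the first data row of the `show(10)` output repeats the column names: the CSV header line was evidently loaded into `badou.orders` as an ordinary row, and it should be filtered out before any aggregation. In Spark 2.x that would look like `orders.filter(orders("order_id") =!= "order_id")`. A plain-Scala stand-in for the same idea (the sample rows here are invented, a small subset of the real table):

```scala
// Sketch: drop a CSV header line that was ingested as a data row.
// Assumes the first column is order_id and the header repeats that name.
object DropHeaderSketch {
  def dropHeader(rows: List[Array[String]]): List[Array[String]] =
    rows.filterNot(_.headOption.contains("order_id")) // header row repeats the column name

  def main(args: Array[String]): Unit = {
    val rows = List(
      Array("order_id", "user_id"), // header mistakenly loaded as data
      Array("2539329", "1"),
      Array("2398795", "1")
    )
    dropHeader(rows).foreach(r => println(r.mkString(",")))
  }
}
```

With a Hive table, the cleaner fix is to set `TBLPROPERTIES ("skip.header.line.count"="1")` when creating the table, so the header never becomes a row in the first place.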
########## Today's grind is so I won't have to keep grinding like this forever! ##########