Quick Offline Data Analysis with Spark SQL
Copy hive-site.xml into Spark's conf directory.
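A sketch of that copy (the Hive path here is only illustrative, adjust it to wherever your Hive is installed; either repeat the copy on each node or scp the edited file out afterwards):

cp /opt/modules/hive/conf/hive-site.xml /opt/modules/spark-2.2.0-bin/conf/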
Open the hive-site.xml file under Spark's conf directory.
Add the following configuration (I configured Spark this way on all three nodes).
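The original screenshot of the configuration isn't reproduced here; for a setup like this, where Spark talks to a running Hive metastore service, the property that usually goes in is hive.metastore.uris. Something along these lines, with the host and port shown only as an illustration (9083 is the metastore default):

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://bigdata-pro01.kfk.com:9083</value>
</property>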
Copy the MySQL connector jar from Hive into Spark.
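Judging by the Spark classpath shown in the logs later in this post, the jar that ends up under Spark's jars directory is mysql-connector-java-5.1.27-bin.jar, so the copy looks roughly like this (Hive's lib path is illustrative):

cp /opt/modules/hive/lib/mysql-connector-java-5.1.27-bin.jar /opt/modules/spark-2.2.0-bin/jars/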
Check the Hadoop settings in spark-env.sh.
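What we are checking is that spark-env.sh points Spark at the Hadoop configuration directory; on these nodes that is the same path that shows up at the end of the classpath in the logs below:

HADOOP_CONF_DIR=/opt/modules/hadoop-2.6.0/etc/hadoop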
Check whether HDFS is running.
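A quick way to check is to run jps on each node and look for the NameNode/DataNode processes; if they are not there, start HDFS from the NameNode (the Hadoop path assumes the installation shown in the logs below):

jps
/opt/modules/hadoop-2.6.0/sbin/start-dfs.sh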
Start the MySQL service.
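On CentOS 6, which is what these nodes run according to the os.version in the logs, that is typically:

sudo service mysqld start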
Start the Hive metastore service.
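From the Hive installation directory (path illustrative), the metastore service is started like this:

bin/hive --service metastore &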
Start Hive.
Create a database of your own.
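Judging by the query used later (select * from kfk.test), the database here is kfk:

create database if not exists kfk;
use kfk;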
Create a table:
create table if not exists test(userid string, username string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS textfile;
Now create a data file locally.
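This is the /opt/datas/kfk.txt loaded below. Its exact contents aren't reproduced in this post, but given the table definition it is simply two space-separated columns (userid username) per line, for example:

0001 spark
0002 hive
0003 hbase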
Load the data into the table:
load data local inpath "/opt/datas/kfk.txt" into table test;
Now start Spark.
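That is, start spark-shell and query the Hive table through it:

bin/spark-shell
scala> spark.sql("select * from kfk.test").show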
The data comes back, which shows that our Spark SQL and Hive integration is working.
[kfk@bigdata-pro01 ~]$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 98
Server version: 5.1.73 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| metastore          |
| mysql              |
| test               |
+--------------------+
4 rows in set (0.00 sec)

mysql> use test;
Database changed
mysql> show tables;
Empty set (0.00 sec)

mysql>
scala> val df=spark.sql("select * from kfk.test") 18/03/19 09:30:53 INFO SparkSqlParser: Parsing command: select * from kfk.test 18/03/19 09:30:53 INFO CatalystSqlParser: Parsing command: string 18/03/19 09:30:53 INFO CatalystSqlParser: Parsing command: string df: org.apache.spark.sql.DataFrame = [userid: string, username: string] scala> import java.util.Properties import java.util.Properties scala> val pro = new Properties() pro: java.util.Properties = {} scala> pro.setProperty("driver","com.mysql.jdbc.Driver") res1: Object = null scala> df.write.jdbc("jdbc:mysql://bigdata-pro01.kfk.com/test?user=root&password=root","spark1",pro) 18/03/19 09:55:31 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 242.0 KB, free 413.7 MB) 18/03/19 09:55:31 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.8 KB, free 413.7 MB) 18/03/19 09:55:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.86.151:34699 (size: 22.8 KB, free: 413.9 MB) 18/03/19 09:55:31 INFO SparkContext: Created broadcast 0 from 18/03/19 09:55:32 INFO FileInputFormat: Total input paths to process : 1 18/03/19 09:55:32 INFO SparkContext: Starting job: jdbc at <console>:29 18/03/19 09:55:32 INFO DAGScheduler: Got job 0 (jdbc at <console>:29) with 1 output partitions 18/03/19 09:55:32 INFO DAGScheduler: Final stage: ResultStage 0 (jdbc at <console>:29) 18/03/19 09:55:32 INFO DAGScheduler: Parents of final stage: List() 18/03/19 09:55:32 INFO DAGScheduler: Missing parents: List() 18/03/19 09:55:32 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[5] at jdbc at <console>:29), which has no missing parents 18/03/19 09:55:32 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 13.3 KB, free 413.7 MB) 18/03/19 09:55:32 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 7.2 KB, free 413.6 MB) 18/03/19 09:55:32 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.86.151:34699 (size: 7.2 KB, free: 413.9 MB) 18/03/19 09:55:32 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006 18/03/19 09:55:33 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[5] at jdbc at <console>:29) (first 15 tasks are for partitions Vector(0)) 18/03/19 09:55:33 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks 18/03/19 09:55:33 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 4868 bytes) 18/03/19 09:55:33 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) 18/03/19 09:55:33 INFO HadoopRDD: Input split: hdfs://ns/user/hive/warehouse/kfk.db/test/kfk.txt:0+45 18/03/19 09:55:35 INFO TransportClientFactory: Successfully created connection to /192.168.86.151:40256 after 93 ms (0 ms spent in bootstraps) 18/03/19 09:55:35 INFO CodeGenerator: Code generated in 1278.378936 ms 18/03/19 09:55:35 INFO CodeGenerator: Code generated in 63.186243 ms 18/03/19 09:55:35 INFO LazyStruct: Missing fields! Expected 2 fields but only got 1! Ignoring similar problems. 18/03/19 09:55:35 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 
1142 bytes result sent to driver 18/03/19 09:55:35 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2692 ms on localhost (executor driver) (1/1) 18/03/19 09:55:35 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 18/03/19 09:55:35 INFO DAGScheduler: ResultStage 0 (jdbc at <console>:29) finished in 2.764 s 18/03/19 09:55:35 INFO DAGScheduler: Job 0 finished: jdbc at <console>:29, took 3.546682 s scala>
In MySQL we can now see that an extra table has appeared.
Let's start spark-shell again.
The monitoring page below belongs to spark-sql.
We run the following operations in spark-sql.
Start spark-thriftserver.
The username and password to enter are those of the current machine's user.
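Concretely, the Thrift server is started from Spark's sbin directory and you then connect to it with beeline, supplying that OS user (here the kfk user) when prompted; port 10000 is the default:

sbin/start-thriftserver.sh
bin/beeline
beeline> !connect jdbc:hive2://bigdata-pro01.kfk.com:10000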
Start spark-shell.
val jdbcDF = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://bigdata-pro01.kfk.com:3306/test")
  .option("dbtable", "spark1")
  .option("user", "root")
  .option("password", "root")
  .load()
Because we need jars from Hive 1.2.1, we download a Hive 1.2.1 release and take the jars out of it.
This is the Hive 1.2.1 I downloaded; the download address is http://archive.apache.org/dist/hive/hive-1.2.1/
I have all the jars prepared.
Upload these jars to Spark's jars directory (do this on all three nodes).
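Judging by the Spark classpath in the logs further down, the jars that end up in /opt/modules/spark-2.2.0-bin/jars/ include hive-hbase-handler-1.2.1.jar (from the Hive 1.2.1 lib directory) together with the HBase client jars (hbase-client/common/server/protocol-0.98.6-cdh5.3.0.jar and htrace-core). A sketch of the copy, with the source paths illustrative:

cp apache-hive-1.2.1-bin/lib/hive-hbase-handler-1.2.1.jar /opt/modules/spark-2.2.0-bin/jars/
scp /opt/modules/spark-2.2.0-bin/jars/hive-hbase-handler-1.2.1.jar bigdata-pro02.kfk.com:/opt/modules/spark-2.2.0-bin/jars/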
Start our HBase.
Before starting it, first change the JDK version: we were previously on 1.7, and now we switch to 1.8 (change it on all three nodes).
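On these nodes the new JDK sits at /opt/modules/jdk1.8.0_60 (you can see java.home in the logs below), so the change amounts to pointing JAVA_HOME there, for example in /etc/profile or in the relevant *-env.sh files:

export JAVA_HOME=/opt/modules/jdk1.8.0_60
export PATH=$JAVA_HOME/bin:$PATH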
Now start Hive.
Start spark-shell.
You can see that it reports an error.
That's because our table has too much data.
So let's limit the number of rows.
scala> val df =spark.sql("select count(1) from weblogs").show 18/03/19 18:29:27 INFO SparkSqlParser: Parsing command: select count(1) from weblogs 18/03/19 18:29:28 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:28 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:28 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:28 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:28 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:28 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:28 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:31 INFO CodeGenerator: Code generated in 607.706437 ms 18/03/19 18:29:31 INFO CodeGenerator: Code generated in 72.215236 ms 18/03/19 18:29:31 INFO ContextCleaner: Cleaned accumulator 0 18/03/19 18:29:32 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 242.1 KB, free 413.7 MB) 18/03/19 18:29:32 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 413.7 MB) 18/03/19 18:29:32 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.86.151:42725 (size: 22.9 KB, free: 413.9 MB) 18/03/19 18:29:32 INFO SparkContext: Created broadcast 0 from 18/03/19 18:29:33 INFO HBaseStorageHandler: Configuring input job properties 18/03/19 18:29:33 INFO RecoverableZooKeeper: Process identifier=hconnection-0x490dfe25 connecting to ZooKeeper ensemble=bigdata-pro02.kfk.com:2181,bigdata-pro01.kfk.com:2181,bigdata-pro03.kfk.com:2181 18/03/19 18:29:33 INFO ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 18/03/19 18:29:33 INFO ZooKeeper: Client environment:host.name=bigdata-pro01.kfk.com 18/03/19 18:29:33 INFO ZooKeeper: Client environment:java.version=1.8.0_60 18/03/19 18:29:33 INFO ZooKeeper: Client environment:java.vendor=Oracle Corporation 18/03/19 18:29:33 INFO ZooKeeper: Client environment:java.home=/opt/modules/jdk1.8.0_60/jre 18/03/19 18:29:33 INFO ZooKeeper: Client 
environment:java.class.path=/opt/modules/spark-2.2.0-bin/conf/:/opt/modules/spark-2.2.0-bin/jars/htrace-core-3.0.4.jar:/opt/modules/spark-2.2.0-bin/jars/jpam-1.1.jar:/opt/modules/spark-2.2.0-bin/jars/mysql-connector-java-5.1.27-bin.jar:/opt/modules/spark-2.2.0-bin/jars/snappy-java-1.1.2.6.jar:/opt/modules/spark-2.2.0-bin/jars/commons-compress-1.4.1.jar:/opt/modules/spark-2.2.0-bin/jars/hbase-server-0.98.6-cdh5.3.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-sql_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-client-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/jetty-6.1.26.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-databind-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/javolution-5.5.1.jar:/opt/modules/spark-2.2.0-bin/jars/opencsv-2.3.jar:/opt/modules/spark-2.2.0-bin/jars/curator-framework-2.6.0.jar:/opt/modules/spark-2.2.0-bin/jars/commons-collections-3.2.2.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-mapreduce-client-jobclient-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/hk2-utils-2.4.0-b34.jar:/opt/modules/spark-2.2.0-bin/jars/metrics-graphite-3.1.2.jar:/opt/modules/spark-2.2.0-bin/jars/pmml-model-1.2.15.jar:/opt/modules/spark-2.2.0-bin/jars/compress-lzf-1.0.3.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-mapreduce-client-app-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-encoding-1.8.2.jar:/opt/modules/spark-2.2.0-bin/jars/xz-1.0.jar:/opt/modules/spark-2.2.0-bin/jars/datanucleus-core-3.2.10.jar:/opt/modules/spark-2.2.0-bin/jars/guice-servlet-3.0.jar:/opt/modules/spark-2.2.0-bin/jars/stax-api-1.0-2.jar:/opt/modules/spark-2.2.0-bin/jars/eigenbase-properties-1.1.5.jar:/opt/modules/spark-2.2.0-bin/jars/metrics-jvm-3.1.2.jar:/opt/modules/spark-2.2.0-bin/jars/stream-2.7.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-mllib-local_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/derby-10.12.1.1.jar:/opt/modules/spark-2.2.0-bin/jars/joda-time-2.9.3.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-common-1.8.2.jar:/opt/modules/spark-2.2.0-bin/jars/ivy-2.4.0.jar:/opt/modules/spark-2.2.0-bin/jars/slf4j-api-1.7.16.jar:/opt/modules/spark-2.2.0-bin/jars/jetty-util-6.1.26.jar:/opt/modules/spark-2.2.0-bin/jars/shapeless_2.11-2.3.2.jar:/opt/modules/spark-2.2.0-bin/jars/activation-1.1.1.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-module-scala_2.11-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/libthrift-0.9.3.jar:/opt/modules/spark-2.2.0-bin/jars/log4j-1.2.17.jar:/opt/modules/spark-2.2.0-bin/jars/antlr4-runtime-4.5.3.jar:/opt/modules/spark-2.2.0-bin/jars/chill-java-0.8.0.jar:/opt/modules/spark-2.2.0-bin/jars/snappy-0.2.jar:/opt/modules/spark-2.2.0-bin/jars/core-1.1.2.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-annotations-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-container-servlet-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/spark-network-shuffle_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-graphx_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/breeze_2.11-0.13.1.jar:/opt/modules/spark-2.2.0-bin/jars/scala-compiler-2.11.8.jar:/opt/modules/spark-2.2.0-bin/jars/aopalliance-1.0.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-mapreduce-client-common-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/aopalliance-repackaged-2.4.0-b34.jar:/opt/modules/spark-2.2.0-bin/jars/commons-beanutils-core-1.8.0.jar:/opt/modules/spark-2.2.0-bin/jars/jsr305-1.3.9.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-yarn-common-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/osgi-resource-locator-1.0.1.jar:/opt/modules/spark-2.2.0-bin/jars/univocity-parsers-2.2.1.jar:/opt/modules/spark-2.2.0-bin/jars/hive-exec-1.2.1.spark
2.jar:/opt/modules/spark-2.2.0-bin/jars/commons-crypto-1.0.0.jar:/opt/modules/spark-2.2.0-bin/jars/metrics-json-3.1.2.jar:/opt/modules/spark-2.2.0-bin/jars/minlog-1.3.0.jar:/opt/modules/spark-2.2.0-bin/jars/JavaEWAH-0.3.2.jar:/opt/modules/spark-2.2.0-bin/jars/json4s-jackson_2.11-3.2.11.jar:/opt/modules/spark-2.2.0-bin/jars/javax.ws.rs-api-2.0.1.jar:/opt/modules/spark-2.2.0-bin/jars/commons-dbcp-1.4.jar:/opt/modules/spark-2.2.0-bin/jars/slf4j-log4j12-1.7.16.jar:/opt/modules/spark-2.2.0-bin/jars/javax.inject-2.4.0-b34.jar:/opt/modules/spark-2.2.0-bin/jars/scala-xml_2.11-1.0.2.jar:/opt/modules/spark-2.2.0-bin/jars/commons-pool-1.5.4.jar:/opt/modules/spark-2.2.0-bin/jars/jaxb-api-2.2.2.jar:/opt/modules/spark-2.2.0-bin/jars/spark-network-common_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/gson-2.2.4.jar:/opt/modules/spark-2.2.0-bin/jars/protobuf-java-2.5.0.jar:/opt/modules/spark-2.2.0-bin/jars/objenesis-2.1.jar:/opt/modules/spark-2.2.0-bin/jars/hive-metastore-1.2.1.spark2.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-container-servlet-core-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/stax-api-1.0.1.jar:/opt/modules/spark-2.2.0-bin/jars/super-csv-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/metrics-core-3.1.2.jar:/opt/modules/spark-2.2.0-bin/jars/scala-parser-combinators_2.11-1.0.4.jar:/opt/modules/spark-2.2.0-bin/jars/apacheds-i18n-2.0.0-M15.jar:/opt/modules/spark-2.2.0-bin/jars/spire_2.11-0.13.0.jar:/opt/modules/spark-2.2.0-bin/jars/xbean-asm5-shaded-4.4.jar:/opt/modules/spark-2.2.0-bin/jars/httpclient-4.5.2.jar:/opt/modules/spark-2.2.0-bin/jars/hive-beeline-1.2.1.spark2.jar:/opt/modules/spark-2.2.0-bin/jars/janino-3.0.0.jar:/opt/modules/spark-2.2.0-bin/jars/commons-beanutils-1.7.0.jar:/opt/modules/spark-2.2.0-bin/jars/javax.annotation-api-1.2.jar:/opt/modules/spark-2.2.0-bin/jars/curator-recipes-2.6.0.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-core-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/paranamer-2.6.jar:/opt/modules/spark-2.2.0-bin/jars/hk2-locator-2.4.0-b34.jar:/opt/modules/spark-2.2.0-bin/jars/spark-hive_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/bonecp-0.8.0.RELEASE.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-column-1.8.2.jar:/opt/modules/spark-2.2.0-bin/jars/calcite-linq4j-1.2.0-incubating.jar:/opt/modules/spark-2.2.0-bin/jars/commons-cli-1.2.jar:/opt/modules/spark-2.2.0-bin/jars/javax.inject-1.jar:/opt/modules/spark-2.2.0-bin/jars/hbase-common-0.98.6-cdh5.3.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-tags_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/bcprov-jdk15on-1.51.jar:/opt/modules/spark-2.2.0-bin/jars/stringtemplate-3.2.1.jar:/opt/modules/spark-2.2.0-bin/jars/RoaringBitmap-0.5.11.jar:/opt/modules/spark-2.2.0-bin/jars/hbase-client-0.98.6-cdh5.3.0.jar:/opt/modules/spark-2.2.0-bin/jars/commons-codec-1.10.jar:/opt/modules/spark-2.2.0-bin/jars/hive-cli-1.2.1.spark2.jar:/opt/modules/spark-2.2.0-bin/jars/scala-reflect-2.11.8.jar:/opt/modules/spark-2.2.0-bin/jars/jline-2.12.1.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-core-asl-1.9.13.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-server-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/xercesImpl-2.9.1.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-format-2.3.1.jar:/opt/modules/spark-2.2.0-bin/jars/jdo-api-3.0.1.jar:/opt/modules/spark-2.2.0-bin/jars/commons-lang-2.6.jar:/opt/modules/spark-2.2.0-bin/jars/jta-1.1.jar:/opt/modules/spark-2.2.0-bin/jars/commons-httpclient-3.1.jar:/opt/modules/spark-2.2.0-bin/jars/pyrolite-4.13.jar:/opt/modules/spark-2.2.0-bin/jars/jul-to-slf4j-1.7.16.jar:/opt/modules/spark-2.2.0-bin/jars/api-u
til-1.0.0-M20.jar:/opt/modules/spark-2.2.0-bin/jars/hive-hbase-handler-1.2.1.jar:/opt/modules/spark-2.2.0-bin/jars/commons-math3-3.4.1.jar:/opt/modules/spark-2.2.0-bin/jars/jets3t-0.9.3.jar:/opt/modules/spark-2.2.0-bin/jars/spark-catalyst_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-jackson-1.8.2.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-annotations-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-yarn-server-web-proxy-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/spark-hive-thriftserver_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-hadoop-1.8.2.jar:/opt/modules/spark-2.2.0-bin/jars/apacheds-kerberos-codec-2.0.0-M15.jar:/opt/modules/spark-2.2.0-bin/jars/ST4-4.0.4.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-mapper-asl-1.9.13.jar:/opt/modules/spark-2.2.0-bin/jars/machinist_2.11-0.6.1.jar:/opt/modules/spark-2.2.0-bin/jars/spark-mllib_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/scala-library-2.11.8.jar:/opt/modules/spark-2.2.0-bin/jars/guava-14.0.1.jar:/opt/modules/spark-2.2.0-bin/jars/javassist-3.18.1-GA.jar:/opt/modules/spark-2.2.0-bin/jars/api-asn1-api-1.0.0-M20.jar:/opt/modules/spark-2.2.0-bin/jars/antlr-2.7.7.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-module-paranamer-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/curator-client-2.6.0.jar:/opt/modules/spark-2.2.0-bin/jars/arpack_combined_all-0.1.jar:/opt/modules/spark-2.2.0-bin/jars/datanucleus-api-jdo-3.2.6.jar:/opt/modules/spark-2.2.0-bin/jars/calcite-avatica-1.2.0-incubating.jar:/opt/modules/spark-2.2.0-bin/jars/avro-mapred-1.7.7-hadoop2.jar:/opt/modules/spark-2.2.0-bin/jars/hive-jdbc-1.2.1.spark2.jar:/opt/modules/spark-2.2.0-bin/jars/breeze-macros_2.11-0.13.1.jar:/opt/modules/spark-2.2.0-bin/jars/hbase-protocol-0.98.6-cdh5.3.0.jar:/opt/modules/spark-2.2.0-bin/jars/json4s-core_2.11-3.2.11.jar:/opt/modules/spark-2.2.0-bin/jars/spire-macros_2.11-0.13.0.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-client-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/mx4j-3.0.2.jar:/opt/modules/spark-2.2.0-bin/jars/py4j-0.10.4.jar:/opt/modules/spark-2.2.0-bin/jars/scalap-2.11.8.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-guava-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-media-jaxb-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/commons-configuration-1.6.jar:/opt/modules/spark-2.2.0-bin/jars/json4s-ast_2.11-3.2.11.jar:/opt/modules/spark-2.2.0-bin/jars/htrace-core-2.04.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-common-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/kryo-shaded-3.0.3.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-auth-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/commons-compiler-3.0.0.jar:/opt/modules/spark-2.2.0-bin/jars/jtransforms-2.4.0.jar:/opt/modules/spark-2.2.0-bin/jars/commons-net-2.2.jar:/opt/modules/spark-2.2.0-bin/jars/jcl-over-slf4j-1.7.16.jar:/opt/modules/spark-2.2.0-bin/jars/spark-launcher_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-core_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/antlr-runtime-3.4.jar:/opt/modules/spark-2.2.0-bin/jars/spark-repl_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-streaming_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/datanucleus-rdbms-3.2.9.jar:/opt/modules/spark-2.2.0-bin/jars/netty-3.9.9.Final.jar:/opt/modules/spark-2.2.0-bin/jars/lz4-1.3.0.jar:/opt/modules/spark-2.2.0-bin/jars/zookeeper-3.4.6.jar:/opt/modules/spark-2.2.0-bin/jars/java-xmlbuilder-1.0.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-common-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/netty-all-4.0.43.Final.jar:/opt/modules/spark-2.2.0-bin/jars/validation-api-1.1.0.Final.jar:/o
pt/modules/spark-2.2.0-bin/jars/hadoop-mapreduce-client-core-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/avro-ipc-1.7.7.jar:/opt/modules/spark-2.2.0-bin/jars/jodd-core-3.5.2.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-xc-1.9.13.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-yarn-client-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/guice-3.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-yarn_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-hadoop-bundle-1.6.0.jar:/opt/modules/spark-2.2.0-bin/jars/leveldbjni-all-1.8.jar:/opt/modules/spark-2.2.0-bin/jars/hk2-api-2.4.0-b34.jar:/opt/modules/spark-2.2.0-bin/jars/javax.servlet-api-3.1.0.jar:/opt/modules/spark-2.2.0-bin/jars/mysql-connector-java-5.1.27.jar:/opt/modules/spark-2.2.0-bin/jars/libfb303-0.9.3.jar:/opt/modules/spark-2.2.0-bin/jars/httpcore-4.4.4.jar:/opt/modules/spark-2.2.0-bin/jars/chill_2.11-0.8.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-sketch_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/commons-lang3-3.5.jar:/opt/modules/spark-2.2.0-bin/jars/mail-1.4.7.jar:/opt/modules/spark-2.2.0-bin/jars/apache-log4j-extras-1.2.17.jar:/opt/modules/spark-2.2.0-bin/jars/xmlenc-0.52.jar:/opt/modules/spark-2.2.0-bin/jars/avro-1.7.7.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-yarn-server-common-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-yarn-api-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-hdfs-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/pmml-schema-1.2.15.jar:/opt/modules/spark-2.2.0-bin/jars/calcite-core-1.2.0-incubating.jar:/opt/modules/spark-2.2.0-bin/jars/spark-unsafe_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/base64-2.3.8.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-jaxrs-1.9.13.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-mapreduce-client-shuffle-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/oro-2.0.8.jar:/opt/modules/spark-2.2.0-bin/jars/commons-digester-1.8.jar:/opt/modules/spark-2.2.0-bin/jars/commons-io-2.4.jar:/opt/modules/spark-2.2.0-bin/jars/commons-logging-1.1.3.jar:/opt/modules/spark-2.2.0-bin/jars/macro-compat_2.11-1.1.1.jar:/opt/modules/hadoop-2.6.0/etc/hadoop/ 18/03/19 18:29:33 INFO ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 18/03/19 18:29:33 INFO ZooKeeper: Client environment:java.io.tmpdir=/tmp 18/03/19 18:29:33 INFO ZooKeeper: Client environment:java.compiler=<NA> 18/03/19 18:29:33 INFO ZooKeeper: Client environment:os.name=Linux 18/03/19 18:29:33 INFO ZooKeeper: Client environment:os.arch=amd64 18/03/19 18:29:33 INFO ZooKeeper: Client environment:os.version=2.6.32-431.el6.x86_64 18/03/19 18:29:33 INFO ZooKeeper: Client environment:user.name=kfk 18/03/19 18:29:33 INFO ZooKeeper: Client environment:user.home=/home/kfk 18/03/19 18:29:33 INFO ZooKeeper: Client environment:user.dir=/opt/modules/spark-2.2.0-bin 18/03/19 18:29:33 INFO ZooKeeper: Initiating client connection, connectString=bigdata-pro02.kfk.com:2181,bigdata-pro01.kfk.com:2181,bigdata-pro03.kfk.com:2181 sessionTimeout=90000 watcher=hconnection-0x490dfe25, quorum=bigdata-pro02.kfk.com:2181,bigdata-pro01.kfk.com:2181,bigdata-pro03.kfk.com:2181, baseZNode=/hbase 18/03/19 18:29:33 INFO ClientCnxn: Opening socket connection to server bigdata-pro02.kfk.com/192.168.86.152:2181. 
Will not attempt to authenticate using SASL (unknown error) 18/03/19 18:29:33 INFO ClientCnxn: Socket connection established to bigdata-pro02.kfk.com/192.168.86.152:2181, initiating session 18/03/19 18:29:33 INFO ClientCnxn: Session establishment complete on server bigdata-pro02.kfk.com/192.168.86.152:2181, sessionid = 0x2623d3a0dea0018, negotiated timeout = 40000 18/03/19 18:29:33 INFO RegionSizeCalculator: Calculating region sizes for table "weblogs". 18/03/19 18:29:35 WARN TableInputFormatBase: Cannot resolve the host name for bigdata-pro03.kfk.com/192.168.86.153 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '153.86.168.192.in-addr.arpa' 18/03/19 18:29:35 INFO SparkContext: Starting job: show at <console>:23 18/03/19 18:29:35 INFO DAGScheduler: Registering RDD 5 (show at <console>:23) 18/03/19 18:29:35 INFO DAGScheduler: Got job 0 (show at <console>:23) with 1 output partitions 18/03/19 18:29:35 INFO DAGScheduler: Final stage: ResultStage 1 (show at <console>:23) 18/03/19 18:29:35 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0) 18/03/19 18:29:35 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0) 18/03/19 18:29:35 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[5] at show at <console>:23), which has no missing parents 18/03/19 18:29:36 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 18.9 KB, free 413.6 MB) 18/03/19 18:29:36 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 9.9 KB, free 413.6 MB) 18/03/19 18:29:36 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.86.151:42725 (size: 9.9 KB, free: 413.9 MB) 18/03/19 18:29:36 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006 18/03/19 18:29:36 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[5] at show at <console>:23) (first 15 tasks are for partitions Vector(0)) 18/03/19 18:29:36 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks 18/03/19 18:29:36 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 4894 bytes) 18/03/19 18:29:36 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) 18/03/19 18:29:36 INFO HadoopRDD: Input split: bigdata-pro03.kfk.com:, 18/03/19 18:29:36 INFO TableInputFormatBase: Input split length: 7 M bytes. 
18/03/19 18:29:37 INFO TransportClientFactory: Successfully created connection to /192.168.86.151:34328 after 117 ms (0 ms spent in bootstraps) 18/03/19 18:29:37 INFO CodeGenerator: Code generated in 377.251736 ms 18/03/19 18:29:37 ERROR ExecutorClassLoader: Failed to check existence of class HBase Counters on REPL class server at spark://192.168.86.151:34328/classes java.net.URISyntaxException: Illegal character in path at index 42: spark://192.168.86.151:34328/classes/HBase Counters.class at java.net.URI$Parser.fail(URI.java:2848) at java.net.URI$Parser.checkChars(URI.java:3021) at java.net.URI$Parser.parseHierarchical(URI.java:3105) at java.net.URI$Parser.parse(URI.java:3053) at java.net.URI.<init>(URI.java:588) at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:324) at org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:90) at org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57) at org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57) at org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:162) at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:80) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.util.ResourceBundle$Control.newBundle(ResourceBundle.java:2640) at java.util.ResourceBundle.loadBundle(ResourceBundle.java:1501) at java.util.ResourceBundle.findBundle(ResourceBundle.java:1465) at java.util.ResourceBundle.findBundle(ResourceBundle.java:1419) at java.util.ResourceBundle.findBundle(ResourceBundle.java:1419) at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1361) at java.util.ResourceBundle.getBundle(ResourceBundle.java:1082) at org.apache.hadoop.mapreduce.util.ResourceBundles.getBundle(ResourceBundles.java:37) at org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56) at org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterGroupName(ResourceBundles.java:77) at org.apache.hadoop.mapreduce.counters.CounterGroupFactory.newGroup(CounterGroupFactory.java:94) at org.apache.hadoop.mapreduce.counters.AbstractCounters.getGroup(AbstractCounters.java:226) at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:153) at org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl$DummyReporter.getCounter(TaskAttemptContextImpl.java:110) at org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.getCounter(TaskAttemptContextImpl.java:76) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:285) at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:273) at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:241) at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138) at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.next(HiveHBaseTableInputFormat.java:154) at 
org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.next(HiveHBaseTableInputFormat.java:113) at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:266) at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:203) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 18/03/19 18:29:37 ERROR ExecutorClassLoader: Failed to check existence of class HBase Counters_en on REPL class server at spark://192.168.86.151:34328/classes java.net.URISyntaxException: Illegal character in path at index 42: spark://192.168.86.151:34328/classes/HBase Counters_en.class at java.net.URI$Parser.fail(URI.java:2848) at java.net.URI$Parser.checkChars(URI.java:3021) at java.net.URI$Parser.parseHierarchical(URI.java:3105) at java.net.URI$Parser.parse(URI.java:3053) at java.net.URI.<init>(URI.java:588) at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:324) at org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:90) at org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57) at org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57) at org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:162) at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:80) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.util.ResourceBundle$Control.newBundle(ResourceBundle.java:2640) at java.util.ResourceBundle.loadBundle(ResourceBundle.java:1501) at java.util.ResourceBundle.findBundle(ResourceBundle.java:1465) at java.util.ResourceBundle.findBundle(ResourceBundle.java:1419) at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1361) at java.util.ResourceBundle.getBundle(ResourceBundle.java:1082) at org.apache.hadoop.mapreduce.util.ResourceBundles.getBundle(ResourceBundles.java:37) at org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56) at 
org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterGroupName(ResourceBundles.java:77) at org.apache.hadoop.mapreduce.counters.CounterGroupFactory.newGroup(CounterGroupFactory.java:94) at org.apache.hadoop.mapreduce.counters.AbstractCounters.getGroup(AbstractCounters.java:226) at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:153) at org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl$DummyReporter.getCounter(TaskAttemptContextImpl.java:110) at org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.getCounter(TaskAttemptContextImpl.java:76) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:285) at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:273) at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:241) at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138) at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.next(HiveHBaseTableInputFormat.java:154) at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.next(HiveHBaseTableInputFormat.java:113) at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:266) at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:203) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 18/03/19 18:29:37 ERROR ExecutorClassLoader: Failed to check existence of class HBase Counters_en_US on REPL class server at spark://192.168.86.151:34328/classes java.net.URISyntaxException: Illegal character in path at index 42: spark://192.168.86.151:34328/classes/HBase Counters_en_US.class at java.net.URI$Parser.fail(URI.java:2848) at java.net.URI$Parser.checkChars(URI.java:3021) at java.net.URI$Parser.parseHierarchical(URI.java:3105) at 
java.net.URI$Parser.parse(URI.java:3053) at java.net.URI.<init>(URI.java:588) at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:324) at org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:90) at org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57) at org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:57) at org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:162) at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:80) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.util.ResourceBundle$Control.newBundle(ResourceBundle.java:2640) at java.util.ResourceBundle.loadBundle(ResourceBundle.java:1501) at java.util.ResourceBundle.findBundle(ResourceBundle.java:1465) at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1361) at java.util.ResourceBundle.getBundle(ResourceBundle.java:1082) at org.apache.hadoop.mapreduce.util.ResourceBundles.getBundle(ResourceBundles.java:37) at org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56) at org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterGroupName(ResourceBundles.java:77) at org.apache.hadoop.mapreduce.counters.CounterGroupFactory.newGroup(CounterGroupFactory.java:94) at org.apache.hadoop.mapreduce.counters.AbstractCounters.getGroup(AbstractCounters.java:226) at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:153) at org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl$DummyReporter.getCounter(TaskAttemptContextImpl.java:110) at org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.getCounter(TaskAttemptContextImpl.java:76) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:285) at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:273) at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:241) at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138) at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.next(HiveHBaseTableInputFormat.java:154) at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.next(HiveHBaseTableInputFormat.java:113) at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:266) at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:203) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 18/03/19 18:29:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1633 bytes result sent to driver 18/03/19 18:29:38 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1714 ms on localhost (executor driver) (1/1) 18/03/19 18:29:38 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 18/03/19 18:29:38 INFO DAGScheduler: ShuffleMapStage 0 (show at <console>:23) finished in 1.772 s 18/03/19 18:29:38 INFO DAGScheduler: looking for newly runnable stages 18/03/19 18:29:38 INFO DAGScheduler: running: Set() 18/03/19 18:29:38 INFO DAGScheduler: waiting: Set(ResultStage 1) 18/03/19 18:29:38 INFO DAGScheduler: failed: Set() 18/03/19 18:29:38 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[8] at show at <console>:23), which has no missing parents 18/03/19 18:29:38 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 6.9 KB, free 413.6 MB) 18/03/19 18:29:38 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 3.7 KB, free 413.6 MB) 18/03/19 18:29:38 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.86.151:42725 (size: 3.7 KB, free: 413.9 MB) 18/03/19 18:29:38 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006 18/03/19 18:29:38 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[8] at show at <console>:23) (first 15 tasks are for partitions Vector(0)) 18/03/19 18:29:38 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks 18/03/19 18:29:38 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, executor driver, partition 0, ANY, 4726 bytes) 18/03/19 18:29:38 INFO Executor: Running task 0.0 in stage 1.0 (TID 1) 18/03/19 18:29:38 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks 18/03/19 18:29:38 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 24 ms 18/03/19 18:29:38 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 
1557 bytes result sent to driver 18/03/19 18:29:38 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 167 ms on localhost (executor driver) (1/1) 18/03/19 18:29:38 INFO DAGScheduler: ResultStage 1 (show at <console>:23) finished in 0.166 s 18/03/19 18:29:38 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 18/03/19 18:29:38 INFO DAGScheduler: Job 0 finished: show at <console>:23, took 2.618861 s 18/03/19 18:29:38 INFO CodeGenerator: Code generated in 53.284282 ms +--------+ |count(1)| +--------+ | 15814| +--------+ df: Unit = () scala> val df =spark.sql("select * from weblogs limit 10").show 18/03/19 18:29:53 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 192.168.86.151:42725 in memory (size: 3.7 KB, free: 413.9 MB) 18/03/19 18:29:53 INFO SparkSqlParser: Parsing command: select * from weblogs limit 10 18/03/19 18:29:53 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:53 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:53 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:53 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:53 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:53 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:53 INFO CatalystSqlParser: Parsing command: string 18/03/19 18:29:53 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 242.3 KB, free 413.4 MB) 18/03/19 18:29:53 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 23.0 KB, free 413.4 MB) 18/03/19 18:29:53 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.86.151:42725 (size: 23.0 KB, free: 413.9 MB) 18/03/19 18:29:53 INFO SparkContext: Created broadcast 3 from 18/03/19 18:29:54 INFO HBaseStorageHandler: Configuring input job properties 18/03/19 18:29:54 INFO RegionSizeCalculator: Calculating region sizes for table "weblogs". 
18/03/19 18:29:55 WARN TableInputFormatBase: Cannot resolve the host name for bigdata-pro03.kfk.com/192.168.86.153 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '153.86.168.192.in-addr.arpa' 18/03/19 18:29:55 INFO SparkContext: Starting job: show at <console>:23 18/03/19 18:29:55 INFO DAGScheduler: Got job 1 (show at <console>:23) with 1 output partitions 18/03/19 18:29:55 INFO DAGScheduler: Final stage: ResultStage 2 (show at <console>:23) 18/03/19 18:29:55 INFO DAGScheduler: Parents of final stage: List() 18/03/19 18:29:55 INFO DAGScheduler: Missing parents: List() 18/03/19 18:29:55 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[13] at show at <console>:23), which has no missing parents 18/03/19 18:29:55 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 15.8 KB, free 413.4 MB) 18/03/19 18:29:55 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 8.4 KB, free 413.4 MB) 18/03/19 18:29:55 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.86.151:42725 (size: 8.4 KB, free: 413.9 MB) 18/03/19 18:29:55 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006 18/03/19 18:29:55 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[13] at show at <console>:23) (first 15 tasks are for partitions Vector(0)) 18/03/19 18:29:55 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks 18/03/19 18:29:55 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, executor driver, partition 0, ANY, 4905 bytes) 18/03/19 18:29:55 INFO Executor: Running task 0.0 in stage 2.0 (TID 2) 18/03/19 18:29:55 INFO HadoopRDD: Input split: bigdata-pro03.kfk.com:, 18/03/19 18:29:55 INFO TableInputFormatBase: Input split length: 7 M bytes. 18/03/19 18:29:55 INFO CodeGenerator: Code generated in 94.66518 ms 18/03/19 18:29:55 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 
1685 bytes result sent to driver 18/03/19 18:29:55 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 368 ms on localhost (executor driver) (1/1) 18/03/19 18:29:55 INFO DAGScheduler: ResultStage 2 (show at <console>:23) finished in 0.361 s 18/03/19 18:29:55 INFO DAGScheduler: Job 1 finished: show at <console>:23, took 0.421794 s 18/03/19 18:29:55 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 18/03/19 18:29:55 INFO CodeGenerator: Code generated in 95.678505 ms +--------------------+--------+-------------------+----------+--------+--------+--------------------+ | id|datatime| userid|searchname|retorder|cliorder| cliurl| +--------------------+--------+-------------------+----------+--------+--------+--------------------+ |00127896594320375...|00:05:51|0012789659432037581| [蒙古国地图]| 1| 3|maps.blogtt.com/m...| |00127896594320375...|00:05:51|0012789659432037581| [蒙古国地图]| 1| 3|maps.blogtt.com/m...| |00143454165872647...|00:05:46|0014345416587264736| [ppg]| 3| 2|www.ppg.cn/yesppg...| |00143454165872647...|00:05:46|0014345416587264736| [ppg]| 3| 2|www.ppg.cn/yesppg...| |00143621727586595...|00:02:09|0014362172758659586| [明星合成]| 78| 24|scsdcsadwa.blog.s...| |00143621727586595...|00:02:09|0014362172758659586| [明星合成]| 78| 24|scsdcsadwa.blog.s...| |00143621727586595...|00:02:28|0014362172758659586| [明星合成]| 82| 26| av.avbox.us/| |00143621727586595...|00:02:28|0014362172758659586| [明星合成]| 82| 26| av.avbox.us/| |00143621727586595...|00:02:44|0014362172758659586| [明星合成]| 83| 27|csdfhnuop.blog.so...| |00143621727586595...|00:02:44|0014362172758659586| [明星合成]| 83| 27|csdfhnuop.blog.so...| +--------------------+--------+-------------------+----------+--------+--------+--------------------+ df: Unit = () scala> 18/03/19 18:33:57 INFO BlockManagerInfo: Removed broadcast_4_piece0 on 192.168.86.151:42725 in memory (size: 8.4 KB, free: 413.9 MB) 18/03/19 18:33:57 INFO BlockManagerInfo: Removed broadcast_3_piece0 on 192.168.86.151:42725 in memory (size: 23.0 KB, free: 413.9 MB) 18/03/19 18:33:57 INFO ContextCleaner: Cleaned accumulator 85
Now we start spark-shell in cluster mode.
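By cluster mode I mean pointing spark-shell at the standalone master instead of running everything locally (you can see in the log below that the task now runs on executor 0 on 192.168.86.153). Roughly, with the master URL as an assumption to adjust for your own cluster:

bin/spark-shell --master spark://bigdata-pro01.kfk.com:7077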
scala> spark.sql("select count(1) from weblogs").show 18/03/19 21:26:13 INFO SparkSqlParser: Parsing command: select count(1) from weblogs 18/03/19 21:26:15 INFO CatalystSqlParser: Parsing command: string 18/03/19 21:26:15 INFO CatalystSqlParser: Parsing command: string 18/03/19 21:26:15 INFO CatalystSqlParser: Parsing command: string 18/03/19 21:26:15 INFO CatalystSqlParser: Parsing command: string 18/03/19 21:26:15 INFO CatalystSqlParser: Parsing command: string 18/03/19 21:26:15 INFO CatalystSqlParser: Parsing command: string 18/03/19 21:26:15 INFO CatalystSqlParser: Parsing command: string 18/03/19 21:26:18 INFO ContextCleaner: Cleaned accumulator 0 18/03/19 21:26:18 INFO CodeGenerator: Code generated in 1043.849585 ms 18/03/19 21:26:18 INFO CodeGenerator: Code generated in 79.914587 ms 18/03/19 21:26:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 242.2 KB, free 413.7 MB) 18/03/19 21:26:20 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 413.7 MB) 18/03/19 21:26:20 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.86.151:35979 (size: 22.9 KB, free: 413.9 MB) 18/03/19 21:26:20 INFO SparkContext: Created broadcast 0 from 18/03/19 21:26:21 INFO HBaseStorageHandler: Configuring input job properties 18/03/19 21:26:21 INFO RecoverableZooKeeper: Process identifier=hconnection-0x25131637 connecting to ZooKeeper ensemble=bigdata-pro02.kfk.com:2181,bigdata-pro01.kfk.com:2181,bigdata-pro03.kfk.com:2181 18/03/19 21:26:21 INFO ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 18/03/19 21:26:21 INFO ZooKeeper: Client environment:host.name=bigdata-pro01.kfk.com 18/03/19 21:26:21 INFO ZooKeeper: Client environment:java.version=1.8.0_60 18/03/19 21:26:21 INFO ZooKeeper: Client environment:java.vendor=Oracle Corporation 18/03/19 21:26:21 INFO ZooKeeper: Client environment:java.home=/opt/modules/jdk1.8.0_60/jre 18/03/19 21:26:21 INFO ZooKeeper: Client 
environment:java.class.path=/opt/modules/spark-2.2.0-bin/conf/:/opt/modules/spark-2.2.0-bin/jars/htrace-core-3.0.4.jar:/opt/modules/spark-2.2.0-bin/jars/jpam-1.1.jar:/opt/modules/spark-2.2.0-bin/jars/mysql-connector-java-5.1.27-bin.jar:/opt/modules/spark-2.2.0-bin/jars/snappy-java-1.1.2.6.jar:/opt/modules/spark-2.2.0-bin/jars/commons-compress-1.4.1.jar:/opt/modules/spark-2.2.0-bin/jars/hbase-server-0.98.6-cdh5.3.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-sql_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-client-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/jetty-6.1.26.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-databind-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/javolution-5.5.1.jar:/opt/modules/spark-2.2.0-bin/jars/opencsv-2.3.jar:/opt/modules/spark-2.2.0-bin/jars/curator-framework-2.6.0.jar:/opt/modules/spark-2.2.0-bin/jars/commons-collections-3.2.2.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-mapreduce-client-jobclient-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/hk2-utils-2.4.0-b34.jar:/opt/modules/spark-2.2.0-bin/jars/metrics-graphite-3.1.2.jar:/opt/modules/spark-2.2.0-bin/jars/pmml-model-1.2.15.jar:/opt/modules/spark-2.2.0-bin/jars/compress-lzf-1.0.3.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-mapreduce-client-app-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-encoding-1.8.2.jar:/opt/modules/spark-2.2.0-bin/jars/xz-1.0.jar:/opt/modules/spark-2.2.0-bin/jars/datanucleus-core-3.2.10.jar:/opt/modules/spark-2.2.0-bin/jars/guice-servlet-3.0.jar:/opt/modules/spark-2.2.0-bin/jars/stax-api-1.0-2.jar:/opt/modules/spark-2.2.0-bin/jars/eigenbase-properties-1.1.5.jar:/opt/modules/spark-2.2.0-bin/jars/metrics-jvm-3.1.2.jar:/opt/modules/spark-2.2.0-bin/jars/stream-2.7.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-mllib-local_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/derby-10.12.1.1.jar:/opt/modules/spark-2.2.0-bin/jars/joda-time-2.9.3.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-common-1.8.2.jar:/opt/modules/spark-2.2.0-bin/jars/ivy-2.4.0.jar:/opt/modules/spark-2.2.0-bin/jars/slf4j-api-1.7.16.jar:/opt/modules/spark-2.2.0-bin/jars/jetty-util-6.1.26.jar:/opt/modules/spark-2.2.0-bin/jars/shapeless_2.11-2.3.2.jar:/opt/modules/spark-2.2.0-bin/jars/activation-1.1.1.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-module-scala_2.11-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/libthrift-0.9.3.jar:/opt/modules/spark-2.2.0-bin/jars/log4j-1.2.17.jar:/opt/modules/spark-2.2.0-bin/jars/antlr4-runtime-4.5.3.jar:/opt/modules/spark-2.2.0-bin/jars/chill-java-0.8.0.jar:/opt/modules/spark-2.2.0-bin/jars/snappy-0.2.jar:/opt/modules/spark-2.2.0-bin/jars/core-1.1.2.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-annotations-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-container-servlet-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/spark-network-shuffle_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-graphx_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/breeze_2.11-0.13.1.jar:/opt/modules/spark-2.2.0-bin/jars/scala-compiler-2.11.8.jar:/opt/modules/spark-2.2.0-bin/jars/aopalliance-1.0.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-mapreduce-client-common-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/aopalliance-repackaged-2.4.0-b34.jar:/opt/modules/spark-2.2.0-bin/jars/commons-beanutils-core-1.8.0.jar:/opt/modules/spark-2.2.0-bin/jars/jsr305-1.3.9.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-yarn-common-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/osgi-resource-locator-1.0.1.jar:/opt/modules/spark-2.2.0-bin/jars/univocity-parsers-2.2.1.jar:/opt/modules/spark-2.2.0-bin/jars/hive-exec-1.2.1.spark
2.jar:/opt/modules/spark-2.2.0-bin/jars/commons-crypto-1.0.0.jar:/opt/modules/spark-2.2.0-bin/jars/metrics-json-3.1.2.jar:/opt/modules/spark-2.2.0-bin/jars/minlog-1.3.0.jar:/opt/modules/spark-2.2.0-bin/jars/JavaEWAH-0.3.2.jar:/opt/modules/spark-2.2.0-bin/jars/json4s-jackson_2.11-3.2.11.jar:/opt/modules/spark-2.2.0-bin/jars/javax.ws.rs-api-2.0.1.jar:/opt/modules/spark-2.2.0-bin/jars/commons-dbcp-1.4.jar:/opt/modules/spark-2.2.0-bin/jars/slf4j-log4j12-1.7.16.jar:/opt/modules/spark-2.2.0-bin/jars/javax.inject-2.4.0-b34.jar:/opt/modules/spark-2.2.0-bin/jars/scala-xml_2.11-1.0.2.jar:/opt/modules/spark-2.2.0-bin/jars/commons-pool-1.5.4.jar:/opt/modules/spark-2.2.0-bin/jars/jaxb-api-2.2.2.jar:/opt/modules/spark-2.2.0-bin/jars/spark-network-common_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/gson-2.2.4.jar:/opt/modules/spark-2.2.0-bin/jars/protobuf-java-2.5.0.jar:/opt/modules/spark-2.2.0-bin/jars/objenesis-2.1.jar:/opt/modules/spark-2.2.0-bin/jars/hive-metastore-1.2.1.spark2.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-container-servlet-core-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/stax-api-1.0.1.jar:/opt/modules/spark-2.2.0-bin/jars/super-csv-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/metrics-core-3.1.2.jar:/opt/modules/spark-2.2.0-bin/jars/scala-parser-combinators_2.11-1.0.4.jar:/opt/modules/spark-2.2.0-bin/jars/apacheds-i18n-2.0.0-M15.jar:/opt/modules/spark-2.2.0-bin/jars/spire_2.11-0.13.0.jar:/opt/modules/spark-2.2.0-bin/jars/xbean-asm5-shaded-4.4.jar:/opt/modules/spark-2.2.0-bin/jars/httpclient-4.5.2.jar:/opt/modules/spark-2.2.0-bin/jars/hive-beeline-1.2.1.spark2.jar:/opt/modules/spark-2.2.0-bin/jars/janino-3.0.0.jar:/opt/modules/spark-2.2.0-bin/jars/commons-beanutils-1.7.0.jar:/opt/modules/spark-2.2.0-bin/jars/javax.annotation-api-1.2.jar:/opt/modules/spark-2.2.0-bin/jars/curator-recipes-2.6.0.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-core-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/paranamer-2.6.jar:/opt/modules/spark-2.2.0-bin/jars/hk2-locator-2.4.0-b34.jar:/opt/modules/spark-2.2.0-bin/jars/spark-hive_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/bonecp-0.8.0.RELEASE.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-column-1.8.2.jar:/opt/modules/spark-2.2.0-bin/jars/calcite-linq4j-1.2.0-incubating.jar:/opt/modules/spark-2.2.0-bin/jars/commons-cli-1.2.jar:/opt/modules/spark-2.2.0-bin/jars/javax.inject-1.jar:/opt/modules/spark-2.2.0-bin/jars/hbase-common-0.98.6-cdh5.3.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-tags_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/bcprov-jdk15on-1.51.jar:/opt/modules/spark-2.2.0-bin/jars/stringtemplate-3.2.1.jar:/opt/modules/spark-2.2.0-bin/jars/RoaringBitmap-0.5.11.jar:/opt/modules/spark-2.2.0-bin/jars/hbase-client-0.98.6-cdh5.3.0.jar:/opt/modules/spark-2.2.0-bin/jars/commons-codec-1.10.jar:/opt/modules/spark-2.2.0-bin/jars/hive-cli-1.2.1.spark2.jar:/opt/modules/spark-2.2.0-bin/jars/scala-reflect-2.11.8.jar:/opt/modules/spark-2.2.0-bin/jars/jline-2.12.1.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-core-asl-1.9.13.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-server-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/xercesImpl-2.9.1.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-format-2.3.1.jar:/opt/modules/spark-2.2.0-bin/jars/jdo-api-3.0.1.jar:/opt/modules/spark-2.2.0-bin/jars/commons-lang-2.6.jar:/opt/modules/spark-2.2.0-bin/jars/jta-1.1.jar:/opt/modules/spark-2.2.0-bin/jars/commons-httpclient-3.1.jar:/opt/modules/spark-2.2.0-bin/jars/pyrolite-4.13.jar:/opt/modules/spark-2.2.0-bin/jars/jul-to-slf4j-1.7.16.jar:/opt/modules/spark-2.2.0-bin/jars/api-u
til-1.0.0-M20.jar:/opt/modules/spark-2.2.0-bin/jars/hive-hbase-handler-1.2.1.jar:/opt/modules/spark-2.2.0-bin/jars/commons-math3-3.4.1.jar:/opt/modules/spark-2.2.0-bin/jars/jets3t-0.9.3.jar:/opt/modules/spark-2.2.0-bin/jars/spark-catalyst_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-jackson-1.8.2.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-annotations-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-yarn-server-web-proxy-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/spark-hive-thriftserver_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-hadoop-1.8.2.jar:/opt/modules/spark-2.2.0-bin/jars/apacheds-kerberos-codec-2.0.0-M15.jar:/opt/modules/spark-2.2.0-bin/jars/ST4-4.0.4.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-mapper-asl-1.9.13.jar:/opt/modules/spark-2.2.0-bin/jars/machinist_2.11-0.6.1.jar:/opt/modules/spark-2.2.0-bin/jars/spark-mllib_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/scala-library-2.11.8.jar:/opt/modules/spark-2.2.0-bin/jars/guava-14.0.1.jar:/opt/modules/spark-2.2.0-bin/jars/javassist-3.18.1-GA.jar:/opt/modules/spark-2.2.0-bin/jars/api-asn1-api-1.0.0-M20.jar:/opt/modules/spark-2.2.0-bin/jars/antlr-2.7.7.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-module-paranamer-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/curator-client-2.6.0.jar:/opt/modules/spark-2.2.0-bin/jars/arpack_combined_all-0.1.jar:/opt/modules/spark-2.2.0-bin/jars/datanucleus-api-jdo-3.2.6.jar:/opt/modules/spark-2.2.0-bin/jars/calcite-avatica-1.2.0-incubating.jar:/opt/modules/spark-2.2.0-bin/jars/avro-mapred-1.7.7-hadoop2.jar:/opt/modules/spark-2.2.0-bin/jars/hive-jdbc-1.2.1.spark2.jar:/opt/modules/spark-2.2.0-bin/jars/breeze-macros_2.11-0.13.1.jar:/opt/modules/spark-2.2.0-bin/jars/hbase-protocol-0.98.6-cdh5.3.0.jar:/opt/modules/spark-2.2.0-bin/jars/json4s-core_2.11-3.2.11.jar:/opt/modules/spark-2.2.0-bin/jars/spire-macros_2.11-0.13.0.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-client-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/mx4j-3.0.2.jar:/opt/modules/spark-2.2.0-bin/jars/py4j-0.10.4.jar:/opt/modules/spark-2.2.0-bin/jars/scalap-2.11.8.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-guava-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-media-jaxb-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/commons-configuration-1.6.jar:/opt/modules/spark-2.2.0-bin/jars/json4s-ast_2.11-3.2.11.jar:/opt/modules/spark-2.2.0-bin/jars/htrace-core-2.04.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-common-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/kryo-shaded-3.0.3.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-auth-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/commons-compiler-3.0.0.jar:/opt/modules/spark-2.2.0-bin/jars/jtransforms-2.4.0.jar:/opt/modules/spark-2.2.0-bin/jars/commons-net-2.2.jar:/opt/modules/spark-2.2.0-bin/jars/jcl-over-slf4j-1.7.16.jar:/opt/modules/spark-2.2.0-bin/jars/spark-launcher_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-core_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/antlr-runtime-3.4.jar:/opt/modules/spark-2.2.0-bin/jars/spark-repl_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-streaming_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/datanucleus-rdbms-3.2.9.jar:/opt/modules/spark-2.2.0-bin/jars/netty-3.9.9.Final.jar:/opt/modules/spark-2.2.0-bin/jars/lz4-1.3.0.jar:/opt/modules/spark-2.2.0-bin/jars/zookeeper-3.4.6.jar:/opt/modules/spark-2.2.0-bin/jars/java-xmlbuilder-1.0.jar:/opt/modules/spark-2.2.0-bin/jars/jersey-common-2.22.2.jar:/opt/modules/spark-2.2.0-bin/jars/netty-all-4.0.43.Final.jar:/opt/modules/spark-2.2.0-bin/jars/validation-api-1.1.0.Final.jar:/o
pt/modules/spark-2.2.0-bin/jars/hadoop-mapreduce-client-core-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/avro-ipc-1.7.7.jar:/opt/modules/spark-2.2.0-bin/jars/jodd-core-3.5.2.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-xc-1.9.13.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-yarn-client-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/guice-3.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-yarn_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/parquet-hadoop-bundle-1.6.0.jar:/opt/modules/spark-2.2.0-bin/jars/leveldbjni-all-1.8.jar:/opt/modules/spark-2.2.0-bin/jars/hk2-api-2.4.0-b34.jar:/opt/modules/spark-2.2.0-bin/jars/javax.servlet-api-3.1.0.jar:/opt/modules/spark-2.2.0-bin/jars/mysql-connector-java-5.1.27.jar:/opt/modules/spark-2.2.0-bin/jars/libfb303-0.9.3.jar:/opt/modules/spark-2.2.0-bin/jars/httpcore-4.4.4.jar:/opt/modules/spark-2.2.0-bin/jars/chill_2.11-0.8.0.jar:/opt/modules/spark-2.2.0-bin/jars/spark-sketch_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/commons-lang3-3.5.jar:/opt/modules/spark-2.2.0-bin/jars/mail-1.4.7.jar:/opt/modules/spark-2.2.0-bin/jars/apache-log4j-extras-1.2.17.jar:/opt/modules/spark-2.2.0-bin/jars/xmlenc-0.52.jar:/opt/modules/spark-2.2.0-bin/jars/avro-1.7.7.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-yarn-server-common-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-yarn-api-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-hdfs-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/pmml-schema-1.2.15.jar:/opt/modules/spark-2.2.0-bin/jars/calcite-core-1.2.0-incubating.jar:/opt/modules/spark-2.2.0-bin/jars/spark-unsafe_2.11-2.2.0.jar:/opt/modules/spark-2.2.0-bin/jars/base64-2.3.8.jar:/opt/modules/spark-2.2.0-bin/jars/jackson-jaxrs-1.9.13.jar:/opt/modules/spark-2.2.0-bin/jars/hadoop-mapreduce-client-shuffle-2.6.5.jar:/opt/modules/spark-2.2.0-bin/jars/oro-2.0.8.jar:/opt/modules/spark-2.2.0-bin/jars/commons-digester-1.8.jar:/opt/modules/spark-2.2.0-bin/jars/commons-io-2.4.jar:/opt/modules/spark-2.2.0-bin/jars/commons-logging-1.1.3.jar:/opt/modules/spark-2.2.0-bin/jars/macro-compat_2.11-1.1.1.jar:/opt/modules/hadoop-2.6.0/etc/hadoop/ 18/03/19 21:26:21 INFO ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 18/03/19 21:26:21 INFO ZooKeeper: Client environment:java.io.tmpdir=/tmp 18/03/19 21:26:21 INFO ZooKeeper: Client environment:java.compiler=<NA> 18/03/19 21:26:21 INFO ZooKeeper: Client environment:os.name=Linux 18/03/19 21:26:21 INFO ZooKeeper: Client environment:os.arch=amd64 18/03/19 21:26:21 INFO ZooKeeper: Client environment:os.version=2.6.32-431.el6.x86_64 18/03/19 21:26:21 INFO ZooKeeper: Client environment:user.name=kfk 18/03/19 21:26:21 INFO ZooKeeper: Client environment:user.home=/home/kfk 18/03/19 21:26:21 INFO ZooKeeper: Client environment:user.dir=/opt/modules/spark-2.2.0-bin 18/03/19 21:26:21 INFO ZooKeeper: Initiating client connection, connectString=bigdata-pro02.kfk.com:2181,bigdata-pro01.kfk.com:2181,bigdata-pro03.kfk.com:2181 sessionTimeout=90000 watcher=hconnection-0x25131637, quorum=bigdata-pro02.kfk.com:2181,bigdata-pro01.kfk.com:2181,bigdata-pro03.kfk.com:2181, baseZNode=/hbase 18/03/19 21:26:21 INFO ClientCnxn: Opening socket connection to server bigdata-pro01.kfk.com/192.168.86.151:2181. 
Will not attempt to authenticate using SASL (unknown error) 18/03/19 21:26:21 INFO ClientCnxn: Socket connection established to bigdata-pro01.kfk.com/192.168.86.151:2181, initiating session 18/03/19 21:26:22 INFO ClientCnxn: Session establishment complete on server bigdata-pro01.kfk.com/192.168.86.151:2181, sessionid = 0x1623bd7ca740014, negotiated timeout = 40000 18/03/19 21:26:22 INFO RegionSizeCalculator: Calculating region sizes for table "weblogs". 18/03/19 21:26:54 WARN TableInputFormatBase: Cannot resolve the host name for bigdata-pro03.kfk.com/192.168.86.153 because of javax.naming.CommunicationException: DNS error [Root exception is java.net.SocketTimeoutException: Receive timed out]; remaining name '153.86.168.192.in-addr.arpa' 18/03/19 21:26:54 INFO SparkContext: Starting job: show at <console>:24 18/03/19 21:26:54 INFO DAGScheduler: Registering RDD 5 (show at <console>:24) 18/03/19 21:26:54 INFO DAGScheduler: Got job 0 (show at <console>:24) with 1 output partitions 18/03/19 21:26:54 INFO DAGScheduler: Final stage: ResultStage 1 (show at <console>:24) 18/03/19 21:26:54 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0) 18/03/19 21:26:54 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0) 18/03/19 21:26:54 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[5] at show at <console>:24), which has no missing parents 18/03/19 21:26:54 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 18.9 KB, free 413.6 MB) 18/03/19 21:26:54 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 9.9 KB, free 413.6 MB) 18/03/19 21:26:54 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.86.151:35979 (size: 9.9 KB, free: 413.9 MB) 18/03/19 21:26:54 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006 18/03/19 21:26:55 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[5] at show at <console>:24) (first 15 tasks are for partitions Vector(0)) 18/03/19 21:26:55 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks 18/03/19 21:26:55 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.86.153, executor 0, partition 0, ANY, 4898 bytes) 18/03/19 21:26:56 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.86.153:43492 (size: 9.9 KB, free: 413.9 MB) 18/03/19 21:26:59 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.86.153:43492 (size: 22.9 KB, free: 413.9 MB) 18/03/19 21:27:12 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 17658 ms on 192.168.86.153 (executor 0) (1/1) 18/03/19 21:27:12 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 18/03/19 21:27:12 INFO DAGScheduler: ShuffleMapStage 0 (show at <console>:24) finished in 17.708 s 18/03/19 21:27:12 INFO DAGScheduler: looking for newly runnable stages 18/03/19 21:27:12 INFO DAGScheduler: running: Set() 18/03/19 21:27:12 INFO DAGScheduler: waiting: Set(ResultStage 1) 18/03/19 21:27:12 INFO DAGScheduler: failed: Set() 18/03/19 21:27:12 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[8] at show at <console>:24), which has no missing parents 18/03/19 21:27:12 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 6.9 KB, free 413.6 MB) 18/03/19 21:27:12 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 3.7 KB, free 413.6 MB) 18/03/19 21:27:12 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 
192.168.86.151:35979 (size: 3.7 KB, free: 413.9 MB) 18/03/19 21:27:12 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006 18/03/19 21:27:12 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[8] at show at <console>:24) (first 15 tasks are for partitions Vector(0)) 18/03/19 21:27:12 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks 18/03/19 21:27:12 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, 192.168.86.153, executor 0, partition 0, NODE_LOCAL, 4730 bytes) 18/03/19 21:27:13 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.86.153:43492 (size: 3.7 KB, free: 413.9 MB) 18/03/19 21:27:13 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 192.168.86.153:46380 18/03/19 21:27:13 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 145 bytes 18/03/19 21:27:13 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 688 ms on 192.168.86.153 (executor 0) (1/1) 18/03/19 21:27:13 INFO DAGScheduler: ResultStage 1 (show at <console>:24) finished in 0.695 s 18/03/19 21:27:13 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 18/03/19 21:27:13 INFO DAGScheduler: Job 0 finished: show at <console>:24, took 19.203658 s 18/03/19 21:27:13 INFO CodeGenerator: Code generated in 57.823649 ms +--------+ |count(1)| +--------+ | 15814| +--------+ scala>
Now let's check the row count in Hive to verify the result.
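For reference, a minimal sketch of this verification step, assuming weblogs is the Hive table (backed by HBase) that the Spark job above scanned; the count it returns should match the 15814 reported by Spark SQL. Note that a count over an HBase-backed table in Hive typically launches a MapReduce job, so it may take a moment:

hive> select count(1) from weblogs;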
You can see that the two counts are identical, which confirms that our Spark and HBase integration is working correctly.
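For completeness, the Spark side of the comparison can be reproduced from spark-shell roughly as follows (a sketch assuming the same weblogs table is exposed through the Hive metastore, which is what the "show at <console>:24" job in the log above corresponds to):

scala> spark.sql("select count(1) from weblogs").show()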