March 2015 Archive
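Summary: While testing today I found that a YARN node had become Unhealthy, and eventually traced the cause to insufficient disk space. Searching for files larger than 100M turned up a large number of copies of spark-assembly-1.4.0-SNAPSHOT-hadoop2.5.0-cdh5.3.1.jar, each over 170M in size; each time an application was submitted to y...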
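The summary mentions locating files larger than 100M to find what was filling the disk. A minimal sketch of such a search with GNU find, assuming a Linux host with GNU findutils; the function names find_large_files and count_assembly_jars and the search root argument are hypothetical, since the post does not show the exact command it used:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the original post does not show its exact commands.

# List regular files larger than 100 MiB under the given root directory.
# -size +100M compares the file size, rounded up to whole MiB, against 100.
# -xdev keeps the search on one filesystem; 2>/dev/null hides permission errors.
find_large_files() {
  find "$1" -xdev -type f -size +100M 2>/dev/null
}

# Count files whose names match the Spark assembly jar pattern from the summary.
count_assembly_jars() {
  find "$1" -xdev -type f -name 'spark-assembly-*.jar' 2>/dev/null | wc -l
}
```

For example, running find_large_files / on the affected node lists the large-file candidates, and count_assembly_jars / shows how many assembly jar copies have accumulated.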
Summary: CREATE TEMPORARY TABLE spark_tbls USING org.apache.spark.sql.jdbc OPTIONS (url 'jdbc:mysql://hadoop000:3306/hive?user=root&password=root', dbtable ...
Summary: When compiling Spark 1.3.0: export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m", then mvn clean package -DskipTests -Phadoop-2.4 -Dhadoop.versi...
Summary: Start hiveserver2 with: hiveserver2 --hiveconf hive.execution.engine=spark --hiveconf spark.master=yarn; then connect to hiveserver2 with beeline: beeline -u jdbc:hive2://hadoop000:10000 -n sp...
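Summary: The execution engines Hive currently supports are mr and tez, with mr as the default. The goal of Hive on Spark is to add a spark execution engine so that Hive can run on top of Spark. Before running a HiveQL script, specify the execution engine, spark.home and spark.master: set hive.execution.engine=...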
Summary: Building Spark from source and setting up the environment. Note that you must have a version of Spark which does not include the Hive jars. Building Spark: git clone https://github.com/apache/spark.git sp...
Summary: subtract: Return an RDD with the elements from `this` that are not in `other`. def subtract(other: RDD[T]): RDD[T] def subtract(other: RDD[T], numParti...
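RDD.subtract keeps every element of `this` that does not occur in `other`, and duplicates in `this` are kept (subtracting [2] from [1, 1, 2] yields [1, 1]). Spark itself is not shown here; as a purely local analogy, not Spark code, the same semantics can be sketched over two newline-separated files with grep, treating each line as one element. The function name subtract_lines is hypothetical:

```shell
#!/usr/bin/env bash
# Local analogy of RDD.subtract, not Spark code (subtract_lines is a hypothetical name).
# Prints the lines of file $1 that do not appear as a whole line in file $2.
# As with subtract, duplicate lines in $1 are kept; unlike Spark, input order is preserved.
subtract_lines() {
  # -v invert match, -x match whole lines, -F fixed strings, -f read patterns from file $2
  grep -vxFf "$2" "$1"
}
```

An empty second file contains no patterns, so every line of the first file is printed, just as subtracting an empty RDD leaves `this` unchanged.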
Summary: The emp and dept tables already exist in Hive: select * from emp; +--------+---------+------------+-------+-------------+---------+---------+---------+| empno | ename | job ...