[mac] Miscellaneous notes on installing Hadoop, Hive, HBase, and Spark
All components are from CDH 5.8.0.
Hadoop
Every time the machine is rebooted and Hadoop is started again, http://localhost:50070 cannot be reached, and jps shows that the NameNode is not running. The log at
~/opt/cdh5/hadoop-2.6.0-cdh5.8.0/logs/hadoop-fanhuan-namenode-fanhuandeMacBook-Pro.local.log
contains the error:
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /private/tmp/hadoop-fanhuan/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
Add a storage-directory property to core-site.xml (a sketch of both files follows the explanation below).
If Hadoop is configured in pseudo-distributed mode, it stores all of its data under /tmp by default, so that data is lost whenever the machine reboots and you have to rerun hadoop namenode -format. To avoid this, add a property named dfs.name.dir to hdfs-site.xml and point it at a directory of your choice; as long as it is not under /tmp, the metadata will no longer be lost on every restart.
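A minimal sketch of both snippets, assuming the data should live under /Users/fanhuan/opt/cdh5 (the exact paths are assumptions, not from the original notes).

core-site.xml:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/Users/fanhuan/opt/cdh5/hadoop-tmp</value>
</property>

hdfs-site.xml:

<property>
  <name>dfs.name.dir</name>
  <value>/Users/fanhuan/opt/cdh5/dfs/name</value>
</property>

After changing these, run hadoop namenode -format once more so the NameNode initializes the new directory.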
A warning appears at startup:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
You need to compile the Hadoop native libraries (or download prebuilt ones) and put them under $HADOOP_HOME/lib/native.
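A rough sketch of building the native libraries from the CDH Hadoop source (assumes Maven, cmake and protobuf 2.5.0 are installed; the source path and the exact output directory are assumptions and may differ by version):

cd ~/opt/cdh5/hadoop-2.6.0-cdh5.8.0-src        # assumed location of the source tree
mvn package -Pdist,native -DskipTests -Dtar    # build the dist profile with native code
cp hadoop-dist/target/hadoop-2.6.0-cdh5.8.0/lib/native/* $HADOOP_HOME/lib/native/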
Hive
Starting hive --service hwi reports an error:
ls: /Users/fanhuan/opt/cdh5/hive-1.1.0-cdh5.8.0/lib/hive-hwi-*.war: No such file or directory
Download the Hive source code, go into the hwi directory, and build the war package:
jar cfM hive-hwi-1.1.1.war -C web .
Copy the generated war into $HIVE_HOME/lib.
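Roughly the full sequence, assuming the CDH Hive source tree is unpacked under ~/opt/cdh5 (the source path is an assumption):

cd ~/opt/cdh5/hive-1.1.0-cdh5.8.0-src/hwi   # assumed location of the hwi module
jar cfM hive-hwi-1.1.1.war -C web .         # package the web/ directory into a war
cp hive-hwi-1.1.1.war $HIVE_HOME/lib/       # put the war where Hive expects it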
Then modify hive-site.xml to point Hive at the war (a sketch follows below).
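A sketch of the HWI settings in hive-site.xml; hive.hwi.war.file, hive.hwi.listen.host and hive.hwi.listen.port are the standard HWI properties, and the war file name is assumed to match the one copied above:

<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-1.1.1.war</value>
</property>
<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
</property>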
After that you can access http://localhost:9999/hwi.
If you get the following error:
Unable to find a javac compiler;
com.sun.tools.javac.Main is not on the classpath.
Perhaps JAVA_HOME does not point to the JDK.
It is currently set to
"/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre"
run the following command to fix it (HWI compiles JSP pages at runtime and needs the JDK's tools.jar on Hive's classpath):
ln -s $JAVA_HOME/lib/tools.jar $HIVE_HOME/lib/
HBase
Configure hbase-site.xml (a sketch follows below).
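A minimal standalone-mode sketch, assuming local directories for the HBase data and its ZooKeeper state (both paths are assumptions):

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///Users/fanhuan/opt/cdh5/hbase-data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/Users/fanhuan/opt/cdh5/zookeeper-data</value>
  </property>
</configuration>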
hbase shell
Modify hbase-env.sh to use the bundled ZooKeeper:
export HBASE_MANAGES_ZK=true
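To check that HBase comes up, start it and create a test table (a quick sanity check, not part of the original notes):

start-hbase.sh
hbase shell
# inside the shell:
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'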
Spark
pyspark reports an error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
    at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:117)
Modify conf/spark-env.sh and add the Hadoop classpath:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
Then another error appears:
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/Module

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/Versioned

Checking what the Hadoop classpath actually contains:
fanhuan@bogon:~$ hadoop classpath
/Users/fanhuan/opt/cdh5/hadoop-2.6.0-cdh5.8.0/etc/hadoop:
/Users/fanhuan/opt/cdh5/hadoop-2.6.0-cdh5.8.0/share/hadoop/common/lib/*:
/Users/fanhuan/opt/cdh5/hadoop-2.6.0-cdh5.8.0/share/hadoop/common/*:
/Users/fanhuan/opt/cdh5/hadoop-2.6.0-cdh5.8.0/share/hadoop/hdfs:
/Users/fanhuan/opt/cdh5/hadoop-2.6.0-cdh5.8.0/share/hadoop/hdfs/lib/*:
/Users/fanhuan/opt/cdh5/hadoop-2.6.0-cdh5.8.0/share/hadoop/hdfs/*:
/Users/fanhuan/opt/cdh5/hadoop-2.6.0-cdh5.8.0/share/hadoop/yarn/lib/*:
/Users/fanhuan/opt/cdh5/hadoop-2.6.0-cdh5.8.0/share/hadoop/yarn/*:
/Users/fanhuan/opt/cdh5/hadoop-2.6.0-cdh5.8.0/share/hadoop/mapreduce/lib/*:
/Users/fanhuan/opt/cdh5/hadoop-2.6.0-cdh5.8.0/share/hadoop/mapreduce/*:
/Users/fanhuan/opt/cdh5/hadoop-2.6.0-cdh5.8.0/contrib/capacity-scheduler/*.jar
jackson-core-2.2.3.jar and jackson-databind-2.2.3.jar are missing: they are not on the hadoop classpath, but they are present under share/hadoop/tools/lib. Adding that directory fixes it:
export SPARK_DIST_CLASSPATH=$HADOOP_HOME/share/hadoop/tools/lib/*:$(hadoop classpath)
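Before restarting pyspark, a quick way to confirm the jars are actually there (a sanity check, not from the original notes):

ls $HADOOP_HOME/share/hadoop/tools/lib/jackson-*.jar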