Common Ambari Installation Problems
References:
http://blog.csdn.net/xingxc111/article/details/70667574
http://blog.csdn.net/xfg0218/article/details/78067541
1.HostNotFoundException
- org.apache.ambari.server.HostNotFoundException: Host not found, hostname=
While deploying HDP on an operating system whose language was set to Chinese, a node reported the following errors:
Agent-side log:
- INFO 2016-04-05 10:31:30,106 hostname.py:89 - Read public hostname \'slavenode1.hdp\' using socket.getfqdn()
- ERROR 2016-04-05 10:31:30,111 main.py:309 - Fatal exception occurred:
- Traceback (most recent call last):
- File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 306, in <module>
- main(heartbeat_stop_callback)
- File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 297, in main
- ExitHelper.execute_cleanup()
- TypeError: unbound method execute_cleanup() must be called with ExitHelper instance as first argument (got nothing instead)
- ', None)
Server-side log:
- 01 四月 2016 20:19:53,480 ERROR [qtp-ambari-client-23] AbstractResourceProvider:280 - Caught AmbariException when creating a resource
- org.apache.ambari.server.HostNotFoundException: Host not found, hostname=
- at org.apache.ambari.server.state.cluster.ClustersImpl.getHost(ClustersImpl.java:370)
- at org.apache.ambari.server.state.ConfigHelper.getEffectiveDesiredTags(ConfigHelper.java:107)
- at org.apache.ambari.server.controller.AmbariManagementControllerImpl.findConfigurationTagsWithOverrides(AmbariManagementControllerImpl.java:1876)
- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
- at java.lang.reflect.Method.invoke(Method.java:497)
- at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:37)
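Solution:
The cause turned out to be that the CentOS 7 system language was set to Chinese, which Python handles poorly. Switching the system language to English fixes the problem. On a CentOS 7 host with a desktop, change Language to English directly in Settings.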
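On a host without a desktop, the same change can be made from a shell. The following is a minimal sketch, not part of the original notes: it assumes `localectl` is available (it ships with systemd on CentOS 7), and `is_english_locale` is a hypothetical helper name introduced only for illustration.

```shell
# Hypothetical helper: succeed when a LANG value is an English (or C/POSIX) locale.
is_english_locale() {
    case "$1" in
        en_*|C|C.*|POSIX) return 0 ;;
        *) return 1 ;;
    esac
}

# Only print a suggestion; changing the system locale is left to the operator.
if ! is_english_locale "${LANG:-}"; then
    echo "LANG is '${LANG:-unset}'; consider: localectl set-locale LANG=en_US.UTF-8"
fi
```

After changing the locale, log in again (or reboot) and restart ambari-agent so the new setting is picked up.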
2. Missing package: libtirpc-devel-0.2.4-0.6.el7.x86_64.rpm
- Problem description:
- Installing package hadoop_2_6_0_3_8-hdfs ('/usr/bin/yum -d 0 -e 0 -y install hadoop_2_6_0_3_8-hdfs') 2017-05-26 17:07:30,977 - Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop_2_6_0_3_8-hdfs' returned 1. Error: Package: hadoop_2_6_0_3_8-hdfs-2.7.3.2.6.0.3-8.x86_64 (HDP-2.6) Requires: libtirpc-devel You could try using --skip-broken to work around the problem You could try running: rpm -Va --nofiles --nodigest 2017-05-26 17:07:30,977 - Failed to install package hadoop_2_6_0_3_8-hdfs. Executing '/usr/bin/yum clean metadata' 2017-05-26 17:07:31,544 - Retrying to install package hadoop_2_6_0_3_8-hdfs after 30 seconds
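- Solution (el7 systems): download the libtirpc packages and install libtirpc-devel-0.2.4-0.6.el7.x86_64.rpm or libtirpc-devel-0.2.4-0.8.el7.i686.rpm on the server
- Solution (el6 systems): download the libtirpc packages and install libtirpc-0.2.1-13.el6.x86_64.rpm together with libtirpc-devel-0.2.1-13.el6.x86_64.rpm
3. snappy version too new
- Problem description (environment: HDP 2.6, Red Hat 7.2):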
- resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install snappy-devel' returned 1. Error: Package: snappy-devel-1.0.5-1.el6.x86_64 (HDP-UTILS-1.1.0.20)
- Requires: snappy(x86-64) = 1.0.5-1.el6
- Installed: snappy-1.1.0-3.el7.x86_64 (@anaconda/7.1)
- snappy(x86-64) = 1.1.0-3.el7
- Available: snappy-1.0.5-1.el6.x86_64 (HDP-UTILS-1.1.0.20)
- snappy(x86-64) = 1.0.5-1.el6
- Solution: remove the newer snappy and pin the version that HDP-UTILS requires:
- yum -y remove snappy
- yum -y install yum-plugin-versionlock
- echo 'snappy-1.0.5-1.el6.*' >> /etc/yum/pluginconf.d/versionlock.list
4. /usr/hdp/current/hadoop-client/conf doesn't exist (mostly seen when retrying after a failed installation)
- Problem description:
- resource_management.core.exceptions.Fail: Applying File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] failed, parent directory /usr/hdp/current/hadoop-client/conf doesn't exist
- Solution:
This happens because the conf directory under /etc/hadoop/ does not exist. Copy it over from a healthy machine:
scp -r <good machine>:/etc/hadoop/* /etc/hadoop
5. /usr/sbin/hst: line 321: install-activity-analyzer.sh: command not found
- Problem description:
- 2016-09-12 16:34:18,905 - User['activity_analyzer'] {'gid': 'hadoop', 'groups': [u'hdfs']} Deploying activity analyzer Command: /usr/sbin/hst activity-analyzer setup root:root '/etc/rc.d/init.d' Exit code: 127 Std Out: None Std Err: /usr/sbin/hst: line 321: install-activity-analyzer.sh: command not found Command failed after 1 tries
- Solution: remove smartsense-hst, then reinstall it:
yum remove smartsense-hst
rm -rf /var/log/smartsense/
6. YARN ResourceManager fails to start after re-enabling Kerberos: Couldn't set ACLs on parent ZNode: /yarn-leader-election
- Problem description:
- 2017-06-14 10:03:29,878 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService failed in state INITED; cause: java.io.IOException: Couldn't set ACLs on parent ZNode: /yarn-leader-election java.io.IOException: Couldn't set ACLs on parent ZNode: /yarn-leader-election at org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:351) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:103) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:152) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit
- Solution:
Log in to ZooKeeper:
[root@bigdata-nn-01 ~]$ su - hadoop
[hadoop@bigdata-nn-01 ~]$ zookeeper-client -server bigdata-nn-01.cars.com:2181
Note: the host given after -server must be the FQDN.
Delete /yarn-leader-election:
[zk: bigdata-nn-01.cars.com:2181(CONNECTED) 1] rmr /yarn-leader-election
7. Missing package python-argparse-1.2.1-2.1.el6.noarch.rpm (CentOS 6.5, Ambari 2.5.1.0)
- Problem description:
- Error: Package: hive2_2_6_1_0_129-2.1.0.2.6.1.0-129.noarch (HDP-2.6) Requires: python-argparse You could try using --skip-broken to work around the problem You could try running: rpm -Va --nofiles --nodigest 2017-08-16 09:46:05,216 - Failed to install package hive2_2_6_1_0_129. Executing '/usr/bin/yum clean metadata' 2017-08-16 09:46:05,400 - Retrying to install package hive2_2_6_1_0_129 after 30 secondsCommand failed after 1 tries
- Solution:
yum install python-argparse-1.2.1-2.1.el6.noarch.rpm -y
8. Ambari HST agent registration failure
- INFO 2017-09-21 10:52:33,435 security.py:178 - Server certificate not exists, downloading
- INFO 2017-09-21 10:52:33,435 security.py:191 - Downloading server cert from https://ambari-test1.com:9440/cert/ca/
- ERROR 2017-09-21 10:52:33,510 ServerAPI.py:137 - POST https://ambari-test1.com:9441/api/v1/register failed. (SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:579)'),)
This error occurs because the python-2.7.5-e58 build verifies SSL certificates by default. To work around it, edit /etc/python/cert-verification.cfg:
[https]
verify = disable
Alternatively, downgrade Python.
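To confirm which setting is currently in effect before and after the edit, the file can be read back. This is a small sketch under assumptions: `cert_verify_mode` is a hypothetical helper name, and it only understands the simple `[https]` / `verify = ...` layout shown above.

```shell
# Hypothetical helper: print the value of "verify" from the [https] section
# of a cert-verification.cfg style file (default path as used above).
cert_verify_mode() {
    cfg="${1:-/etc/python/cert-verification.cfg}"
    [ -r "$cfg" ] || return 0   # nothing to report if the file is absent
    awk -F'=' '
        /^\[/ { in_https = ($0 ~ /^\[https\][ \t]*$/) }
        in_https && $1 ~ /^[ \t]*verify[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2 }
    ' "$cfg"
}
```

Running `cert_verify_mode` with no argument prints the current value, for example `disable` once the change above has been applied.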
9. Error installing ambari-metrics-monitor
- Execution of '/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-monitor' returned 1. Error: Package: ambari-metrics-monitor-2.5.0.3-7.x86_64 (ambari-2.5.0.3)
- Requires: gcc
- Error: Package: ambari-metrics-monitor-2.5.0.3-7.x86_64 (ambari-2.5.0.3)
- Requires: python-devel
- You could try using --skip-broken to work around the problem
- ** Found 2 pre-existing rpmdb problem(s), 'yum check' output follows:
Install gcc, python-devel, and libtirpc-devel, for example: yum install python-devel.x86_64
10. Missing rpcbind
- File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
- raise ExecutionFailed(err_msg, code, out, err)
- resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/yum -d 0 -e 0 -y install rpcbind' returned 1.
- Error: Nothing to do
Install rpcbind.
11. Missing redhat-lsb
- File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
- raise ExecutionFailed(err_msg, code, out, err)
- resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/yum -d 0 -e 0 -y install zookeeper_2_6_0_3_8-server' returned 1. Error: Package: zookeeper_2_6_0_3_8-server-3.4.6.2.6.0.3-8.noarch (HDP-2.6)
- Requires: redhat-lsb
Install redhat-lsb.
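Items 2 and 9 to 11 all come down to a prerequisite package missing on the node, so it can save time to check for them before starting the install wizard. The sketch below is an illustration, not part of the original notes: `missing_packages` is a hypothetical helper, and the package list is simply the names mentioned in those items.

```shell
# Hypothetical helper: print every package from the argument list that
# "rpm -q" does not report as installed.
missing_packages() {
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Package names taken from items 2 and 9-11 above.
prereqs="gcc python-devel libtirpc-devel rpcbind redhat-lsb"

# Word splitting of $prereqs and $missing is intentional here.
missing=$(missing_packages $prereqs)
if [ -n "$missing" ]; then
    echo "Missing packages; consider: yum -y install" $missing
fi
```

The script only prints a suggested command. Whether yum can actually resolve the packages depends on the repositories configured on the node.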
12.Check /var/log/ambari-server/ambari-server.out
Running ambari-server start fails with:
- ERROR: Exiting with exit code -1.
- REASON: Ambari Server java process died with exitcode 255. Check /var/log/ambari-server/ambari-server.out for more information
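Solution:
This was a reinstallation, and the error appears after the database has been initialized with /etc/init.d/postgresql initdb. To recover:
First uninstall PostgreSQL with yum -y remove postgresql*
Then delete everything under /var/lib/pgsql/data
Then configure the PostgreSQL database again
Then run the installation again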
13./usr/hdp/current/hadoop-client/conf doesn't exist
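Installing HDFS and HBase fails with /usr/hdp/current/hadoop-client/conf doesn't exist.
When the /etc/hadoop/conf link exists:
/etc/hadoop/conf and /usr/hdp/current/hadoop-client/conf link to each other, which creates an infinite loop, so one of the two links has to be repointed:
cd /etc/hadoop
rm -rf conf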
ln -s /etc/hadoop/conf.backup /etc/hadoop/conf
HBase can hit the same problem; the fix is the same:
cd /etc/hbase
rm -rf conf
ln -s /etc/hbase/conf.backup /etc/hbase/conf
ZooKeeper can hit the same problem; the fix is the same:
cd /etc/zookeeper
rm -rf conf
ln -s /etc/zookeeper/conf.backup /etc/zookeeper/conf
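When the /etc/hadoop/conf link does not exist:
Comparing against a correctly configured host shows that two directories are missing, config.backup and 2.4.0.0-169. Copy these folders into /etc/hadoop.
Then recreate the conf link under /etc/hadoop: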
rm -rf conf
ln -s /usr/hdp/current/hadoop-client/conf conf
This resolves the problem.
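Before deleting anything, it helps to see which of the two cases applies. The sketch below is an assumption-labeled illustration: `link_state` is a hypothetical helper that relies on the fact that `[ -e ]` follows symlinks and therefore fails both for dangling links and for links that loop back on each other.

```shell
# Hypothetical helper: classify a path as "ok", "broken-link"
# (dangling or looping symlink), or "missing".
link_state() {
    if [ -e "$1" ]; then
        echo "ok"
    elif [ -L "$1" ]; then
        echo "broken-link"
    else
        echo "missing"
    fi
}

# Report the state and immediate target of both conf paths from this item.
for p in /etc/hadoop/conf /usr/hdp/current/hadoop-client/conf; do
    echo "$p: $(link_state "$p") -> $(readlink "$p" 2>/dev/null || echo 'not a symlink')"
done
```

Two `broken-link` results whose targets point at each other correspond to the loop case; `missing` for /etc/hadoop/conf corresponds to the second case. The same check can be pointed at the /etc/hbase and /etc/zookeeper paths.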
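14. Reinstalling ambari-server
Use a cleanup script to remove the old installation.
Note that after the removal the following system components have to be installed again: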
yum -y install ruby*
yum -y install redhat-lsb*
yum -y install snappy*
15.Failed to start ping port listener of: [Errno 98] Address already in use
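The port is still being held by a lingering process.
Solution:
A df command turned out to be hanging and never finished; it was still holding the port: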
[root@testserver1 ~]# netstat -lanp|grep 8670
tcp 0 0 0.0.0.0:8670 0.0.0.0:* LISTEN 2587/df
[root@testserver1 ~]# kill -9 2587
After killing the process, restart ambari-agent and the problem is resolved:
[root@testserver1 ~]# service ambari-agent restart
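The lookup step can be scripted so the PID does not have to be read off by eye. This is a sketch only: `listener_pid` is a hypothetical helper that parses `netstat -lanp` style lines in the column layout shown above.

```shell
# Hypothetical helper: read netstat -lanp style lines on stdin and print the
# PID of any LISTEN socket whose local address (4th column) ends in the port.
listener_pid() {
    port="$1"
    awk -v port="$port" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { split($7, a, "/"); print a[1] }
    '
}

# Sample line taken from the output above; prints 2587.
echo "tcp 0 0 0.0.0.0:8670 0.0.0.0:* LISTEN 2587/df" | listener_pid 8670
```

On a live host this would be used as `netstat -lanp | listener_pid 8670`. It is worth inspecting the process (for example with ps) before killing it.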
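16. Starting Hive fails with: unicodedecodeerror ambari in position 117
Checking /etc/sysconfig/i18n shows the following content:
LANG="zh_CN.UTF8"
The system character set had been set to Chinese. Changing it to the following resolves the problem: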
LANG="en_US.UTF-8"