Impala 1.2.4 Installation and Configuration
Impala 1.2.4 Installation Manual
Notes before installation:
1. For security reasons, we start and stop Impala with the cup account already used for Hive rather than creating a separate impala account; this is why several directory permissions and user-related configuration parameters are adjusted later in this document.
2. For performance reasons, the impala-state-store and impala-catalog services run on the Hadoop cluster's namenode, while impala-server and impala-shell run on each datanode; the namenode does not run impala-server.
3. Install the Impala packages as root, then change the owner of the relevant files to the cup account.
4. Starting and stopping the Impala services requires an account with root privileges.
5. The installation steps follow the official documentation:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/Installing-and-Using-Impala.html
Installing the Impala Packages
Download the required packages, choosing the version that matches your cluster (we run CDH4.2.1, so we chose Impala 1.2.4):
http://archive.cloudera.com/impala/redhat/6/x86_64/impala/
Install the following packages, in order, on the namenode of the Hadoop cluster:
rpm -ivh ./bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm
rpm -ivh ./impala-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-state-store-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-server-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-catalog-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-udf-devel-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-shell-1.2.4-1.p0.420.el6.x86_64.rpm
Note: Impala depends on the package bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm. It is missing from the 1.2.4 directory on the official site, so download it from the 1.2.3 (or another version's) directory.
Install the following packages, in order, on each of the other datanodes (with many datanodes you can script this; see the sketch after the list):
rpm -ivh ./bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm
rpm -ivh ./impala-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-server-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-catalog-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-udf-devel-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-shell-1.2.4-1.p0.420.el6.x86_64.rpm
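A minimal sketch of scripting the datanode installation over ssh, assuming password-less root ssh, a current directory containing only the datanode packages listed above, and hypothetical hostnames (replace cup-slave-12 and so on with your own):

# hypothetical datanode list; replace with your real hostnames
for host in cup-slave-11 cup-slave-12 cup-slave-13; do
    scp ./*.rpm root@${host}:/tmp/
    # installing everything in one rpm transaction lets rpm resolve the order
    ssh root@${host} 'rpm -ivh /tmp/bigtop-utils-*.rpm /tmp/impala-*.rpm'
done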
Check where the Impala files were installed:
[root@cup-slave-11 cup]# find / -name impala
/etc/alternatives/impala
/etc/impala
/etc/default/impala
/var/log/impala
/var/lib/alternatives/impala
/var/lib/impala
/var/run/impala
/usr/lib/impala
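Listing the installed packages on each node is a quick way to confirm nothing was missed:

rpm -qa | grep -i -e impala -e bigtop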
Impala Configuration
Add the following to hdfs-site.xml:
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.use.legacy.blockreader.local</name>
  <value>false</value>
</property>
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>750</value>
</property>
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>cup</value>
</property>
<property>
  <name>dfs.client.file-block-storage-locations.timeout</name>
  <value>3000</value>
</property>
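Short-circuit reads are a datanode-side feature, so the updated hdfs-site.xml must reach every datanode and HDFS must be restarted before the settings take effect. A hedged sketch, assuming a tarball install whose config lives in $HADOOP_HOME/conf (adjust the paths and the host list to your environment):

# push the new config to each datanode (hypothetical host list)
for host in cup-slave-11 cup-slave-12; do
    scp $HADOOP_HOME/conf/hdfs-site.xml cup@${host}:$HADOOP_HOME/conf/
done
# restart HDFS so the datanodes pick up the new settings
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh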
Add the configuration files:
The impalad configuration directory is given by the environment variable IMPALA_CONF_DIR and defaults to /etc/impala/conf. Copy your configured hive-site.xml, core-site.xml, hdfs-site.xml, and hbase-site.xml into /etc/impala/conf, for example as sketched below.
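A sketch of the copy, assuming the cluster's own config directories (the source paths are illustrative):

cp $HADOOP_HOME/conf/core-site.xml  /etc/impala/conf/
cp $HADOOP_HOME/conf/hdfs-site.xml  /etc/impala/conf/
cp $HIVE_HOME/conf/hive-site.xml    /etc/impala/conf/
cp $HBASE_HOME/conf/hbase-site.xml  /etc/impala/conf/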
Copy the Impala .so files into Hadoop's native library directory (skip this step if the files are already there):
cp /usr/lib/impala/lib/*.so* $HADOOP_HOME/lib/native/
Replace the datanucleus jars in /usr/lib/impala/lib with the corresponding files from $HIVE_HOME/lib, renaming them to the original names used in /usr/lib/impala/lib; otherwise impala-state-store and impala-catalog fail to start. See Errors 3 and 5 below (Error 3 includes a sketch of the swap).
Copy the mysql-connector-java.jar file from $HADOOP_HOME/lib into /usr/share/java, because Impala's catalogd needs it there, and the MySQL driver jar must be named exactly mysql-connector-java.jar, as /usr/bin/catalogd shows:
[root@cup-slave-11 native]# more /usr/bin/catalogd
#!/bin/bash
export IMPALA_BIN=${IMPALA_BIN:-/usr/lib/impala/sbin}
export IMPALA_HOME=${IMPALA_HOME:-/usr/lib/impala}
export HIVE_HOME=${HIVE_HOME:-/usr/lib/hive}
export HBASE_HOME=${HBASE_HOME:-/usr/lib/hbase}
export IMPALA_CONF_DIR=${IMPALA_CONF_DIR:-/etc/impala/conf}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/impala/conf}
export HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/impala/conf}
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-/etc/impala/conf}
export LIBHDFS_OPTS=${LIBHDFS_OPTS:--Djava.library.path=/usr/lib/impala/lib}
export MYSQL_CONNECTOR_JAR=${MYSQL_CONNECTOR_JAR:-/usr/share/java/mysql-connector-java.jar}
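The last export above is why the fixed name matters. A hedged example of the copy (in your $HADOOP_HOME/lib the source jar may carry a version suffix such as mysql-connector-java-5.1.x.jar; the destination name must be exact):

# if the glob matches more than one jar, pick the right one explicitly
cp $HADOOP_HOME/lib/mysql-connector-java*.jar /usr/share/java/mysql-connector-java.jar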
Edit the Impala configuration to match your environment:
[root@cup-master-1 ~]# vi /etc/default/impala
IMPALA_STATE_STORE_HOST=10.204.193.10
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala

IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} "
IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}"
IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore \
    -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -be_port=${IMPALA_BACKEND_PORT}"

ENABLE_CORE_DUMPS=false

# LIBHDFS_OPTS=-Djava.library.path=/usr/lib/impala/lib
MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar
IMPALA_BIN=/usr/lib/impala/sbin
IMPALA_HOME=/usr/lib/impala
HIVE_HOME=/home/cup/hive-0.10.0-cdh4.2.1
HBASE_HOME=/home/cup/hbase-0.94.2-cdh4.2.1
IMPALA_CONF_DIR=/etc/impala/conf
HADOOP_CONF_DIR=/etc/impala/conf
HIVE_CONF_DIR=/etc/impala/conf
HBASE_CONF_DIR=/etc/impala/conf
Edit the init scripts /etc/init.d/impala-state-store, /etc/init.d/impala-server, and /etc/init.d/impala-catalog to match your environment. Two user-related places need changing in each script (shown here for impala-catalog; a sed sketch follows the excerpt):
DAEMON="catalogd"
DESC="Impala Catalog Server"
EXEC_PATH="/usr/bin/catalogd"
SVC_USER="cup"   # editor's note: the default here is impala
DAEMON_FLAGS="${IMPALA_CATALOG_ARGS}"
CONF_DIR="/etc/impala/conf"
PIDFILE="/var/run/impala/catalogd-impala.pid"
LOCKDIR="/var/lock/subsys"
LOCKFILE="$LOCKDIR/catalogd"

install -d -m 0755 -o cup -g cup /var/run/impala 1>/dev/null 2>&1 || :
[ -d "$LOCKDIR" ] || install -d -m 0755 $LOCKDIR 1>/dev/null 2>&1 || :
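A sed sketch that makes both changes in all three scripts, assuming the stock scripts ship with SVC_USER="impala" and an install line using -o impala -g impala:

for f in /etc/init.d/impala-state-store /etc/init.d/impala-server /etc/init.d/impala-catalog; do
    sed -i 's/^SVC_USER="impala"/SVC_USER="cup"/' "$f"
    sed -i 's/-o impala -g impala/-o cup -g cup/' "$f"
done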
Create an impala directory on HDFS:
hadoop dfs -mkdir /user/impala
Create /var/run/hadoop-hdfs on every node, because the dfs.domain.socket.path parameter in hdfs-site.xml points into it:
[root@cup-slave-11 impala]# mkdir /var/run/hadoop-hdfs
Give ownership of /var/run/hadoop-hdfs and /var/log/impala to the cup user and cup group, or impala-server fails to start with Error 4:
chown -R cup:cup /var/log/impala
chown -R cup:cup /var/run/hadoop-hdfs
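A quick check that the ownership change took effect:

ls -ld /var/log/impala /var/run/hadoop-hdfs   # both should show cup cup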
Starting and Stopping the Impala Services
Start the state-store service on the namenode:
sudo service impala-state-store start
Start the catalog service on the namenode:
sudo service impala-catalog start
Start the impala-server service on each datanode:
sudo service impala-server start
Stop the state-store service on the namenode:
sudo service impala-state-store stop
Stop the catalog service on the namenode:
sudo service impala-catalog stop
Stop the impala-server service on each datanode:
sudo service impala-server stop
Note: occasionally a service start prints no obvious error yet the service has not actually come up. Check whether the logs under /var/log/impala contain "error" and, if so, investigate further.
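For example, a quick scan of the log directory:

grep -iR error /var/log/impala/ | head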
Verifying That Impala Works
Check that the statestore process is running on the namenode:
[cup@cup-master-1 ~]$ ps -ef | grep impala
cup    5522  45968  0 08:58 pts/25   00:00:00 grep impala
cup    8292      1  0 Mar27 ?        00:01:06 /usr/lib/impala/sbin/statestored -log_dir=/var/log/impala -state_store_port=24000
Check that the impala-server (impalad) process is running on each datanode:
[cup@cup-slave-11 ~]$ ps -ef | grep impala
cup   15630  15599  0 09:24 pts/0    00:00:00 grep impala
cup  112216      1  0 Mar27 ?        00:01:15 /usr/lib/impala/sbin/impalad -log_dir=/var/log/impala -state_store_port=24000 -use_statestore -state_store_host=10.204.193.10 -be_port=22000
Open the state store web page on the namenode; the default port is 25010.
Open the impalad web page on each datanode; the default port is 25000.
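If no browser can reach the cluster, a curl probe works just as well; the IP and hostname below are the ones used throughout this guide:

curl -s -o /dev/null http://10.204.193.10:25010/ && echo "statestore web UI is up"
curl -s -o /dev/null http://cup-slave-11:25000/ && echo "impalad web UI is up"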
Run SQL statements on a node where impala-shell is installed:
[cup@cup-slave-11 ~]$ impala-shell
Starting Impala Shell without Kerberos authentication
Connected to cup-slave-11:21000
Server version: impalad version 1.2.4 RELEASE (build ac29ae09d66c1244fe2ceb293083723226e66c1a)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.

Copyright (c) 2012 Cloudera, Inc. All rights reserved.

(Shell build version: Impala Shell v1.2.4 (ac29ae0) built on Wed Mar 5 07:05:40 PST 2014)
[cup-slave-11:21000] > show databases;
Query: show databases
+---------+
| name    |
+---------+
| cloudup |
| default |
| xhyt    |
+---------+
Returned 3 row(s) in 0.01s
[cup-slave-11:21000] > use cloudup;
Query: use cloudup
[cup-slave-11:21000] > select * from url_read_typ_rel limit 5;
Query: select * from url_read_typ_rel limit 5
+----------------------+---------+---------+---------+---------+--------+-----+
| urlhash              | rtidlv1 | rtyplv1 | rtidlv2 | rtyplv2 | isttim | url |
+----------------------+---------+---------+---------+---------+--------+-----+
| 2160609062987073557  | 3       | 股票    | NULL    |         | NULL   |     |
| 8059679893178527423  | 3       | 股票    | NULL    |         | NULL   |     |
| -404610021015528651  | 2       | 房产    | NULL    |         | NULL   |     |
| -6322366252916938780 | 5       | 教育    | NULL    |         | NULL   |     |
| -6821513749785855580 | 12      | 游戏    | NULL    |         | NULL   |     |
+----------------------+---------+---------+---------+---------+--------+-----+
Returned 5 row(s) in 0.61s
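impala-shell can also run a single statement non-interactively with its -i/-q options, which is handy for scripted smoke tests; for example:

impala-shell -i cup-slave-11:21000 -q "select count(*) from cloudup.url_read_typ_rel;"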
Common Errors
Error 1:
Starting or stopping the state-store prints an error:
[root@cup-master-1 ~]# service impala-state-store start
/etc/init.d/impala-state-store: line 35: /etc/default/hadoop: No such file or directory
Starting Impala State Store Server:                        [  OK  ]
Solution:
Several of Impala's startup scripts source /etc/default/hadoop, which does not exist in our environment. The message has no real effect and can be ignored.
Error 2:
Starting impala-server reports errors (the logs are under /var/log/impala):
ERROR: short-circuit local reads is disabled because
  - Impala cannot read or execute the parent directory of dfs.domain.socket.path
  - dfs.client.read.shortcircuit is not enabled.
ERROR: block location tracking is not properly enabled because
  - dfs.client.file-block-storage-locations.timeout is too low. It should be at least 3000.
Solution:
Make sure the following parameters are set in hdfs-site.xml (a way to check the values the client actually resolves is sketched after the list):
dfs.client.read.shortcircuit
dfs.domain.socket.path
dfs.datanode.hdfs-blocks-metadata.enabled
dfs.client.use.legacy.blockreader.local
dfs.datanode.data.dir.perm
dfs.block.local-path-access.user
dfs.client.file-block-storage-locations.timeout
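Depending on the Hadoop version, hdfs getconf can print the resolved value of each key (a hedged check; the -confKey option may be missing in very old releases):

hdfs getconf -confKey dfs.client.read.shortcircuit
hdfs getconf -confKey dfs.domain.socket.path
hdfs getconf -confKey dfs.client.file-block-storage-locations.timeout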
Error 3:
Starting impala-state-store reports:
java.lang.ClassNotFoundException: org.datanucleus.jdo.JDOPersistenceManagerFactory
    at com.cloudera.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:51)
    at com.cloudera.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:41)
    /* editor's note: some lines omitted */
Caused by: javax.jdo.JDOFatalUserException: Class datanucleus.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException: org.datanucleus.jdo.JDOPersistenceManagerFactory
Caused by: java.lang.ClassNotFoundException: org.datanucleus.jdo.JDOPersistenceManagerFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1155)
Solution:
The datanucleus jars under /usr/lib/impala/lib do not match the versions under $HIVE_HOME/lib. Replace the files in /usr/lib/impala/lib with the datanucleus files from $HIVE_HOME/lib, keeping the original /usr/lib/impala/lib file names (some configuration files reference those names explicitly).
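A sketch of the swap; the jar versions below are illustrative, so list both directories first and match the files up by hand:

ls /usr/lib/impala/lib/*datanucleus*   # the names impala expects (keep these)
ls $HIVE_HOME/lib/*datanucleus*        # the versions hive actually works with
# one illustrative replacement, keeping impala's file name:
cp $HIVE_HOME/lib/datanucleus-core-2.0.3.jar /usr/lib/impala/lib/datanucleus-core-3.2.1.jar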
Error 4:
If the two directories mentioned earlier (/var/run/hadoop-hdfs and /var/log/impala) are not owned by the user that runs Impala, startup reports:
[root@cup-slave-11 impala]# service impala-server start
/etc/init.d/impala-server: line 35: /etc/default/hadoop: No such file or directory
Starting Impala Server:                                    [  OK  ]
/bin/bash: /var/log/impala/impala-server.log: Permission denied
Solution:
Give ownership of /var/run/hadoop-hdfs and /var/log/impala to the cup user and cup group, and make sure the user and group configured in /etc/init.d/impala-state-store, /etc/init.d/impala-server, and /etc/init.d/impala-catalog are cup as well.
Error 5:
Starting impala-catalog reports:
E0327 16:02:46.283989 45718 Log4JLogger.java:115] Bundle "org.datanucleus.api.jdo" requires "org.datanucleus"
version "3.2.0.m4" but the resolved bundle has version "3.2.1" which is outside the expected range.
Solution:
As the message suggests, rename datanucleus-api-jdo-3.2.1.jar in /usr/lib/impala/lib to datanucleus-api-jdo-3.2.0.m4.jar, which resolves the problem.
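As a single command (the file names are exactly those from the error message above):

mv /usr/lib/impala/lib/datanucleus-api-jdo-3.2.1.jar /usr/lib/impala/lib/datanucleus-api-jdo-3.2.0.m4.jar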
----end
Original article: http://www.cnblogs.com/chenz/articles/3629698.html
Author: chenzheng
Contact: vinkeychen@gmail.com