Installing Hadoop + ZooKeeper HA
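Preparation: configure the network and hostname on every machine and turn off the firewall.
    service iptables stop      // stop the firewall now
    chkconfig iptables off     // keep it off after a reboot

1. Install Java and set the related variables (/etc/profile)
    #java
    export JAVA_HOME=/usr/java/jdk1.8.0_65
    export JRE_HOME=$JAVA_HOME/jre
    export PATH=$PATH:$JAVA_HOME/bin
    export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
  (CLASSPATH must start with a ".")
  Save and exit, then run
    source /etc/profile

2. Set up the hostname-to-address mappings (/etc/hosts)
    // hadoop.master is a namenode
    // hadoop.slaver1/hadoop.slaver2/hadoop.slaver3 are datanodes
    192.168.22.241 hadoop.master
    192.168.22.242 hadoop.slaver1
    192.168.22.243 hadoop.slaver2
    192.168.22.244 hadoop.slaver3

3. Create a user and set its password (logged in as root)
  1. useradd hadoop      (or any other user name)
  2. passwd hadoop       (press Enter, then type the password twice)
  3. su hadoop           (switch to the hadoop user)

4. Passwordless SSH login
  1. Install ssh. Look up the details online if needed; it is usually preinstalled.
  2. Create the .ssh directory in the home directory (as the hadoop user)
    mkdir ~/.ssh
  3. Generate the key pair (run on the namenode)
    ssh-keygen -t rsa
    Press Enter at every prompt. This creates id_rsa and id_rsa.pub under ~/.ssh; the former is the private key, the latter the public key.
  4. Copy id_rsa.pub into the .ssh directory of the same user on each slaver node
    scp ~/.ssh/id_rsa.pub <user>@<hostname>:<target file, including path>
  5. On each slaver node run
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  6. Set the permissions
    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys
  7. Note that login is not passwordless yet at this point. On the master node run ssh <slaver> once and answer the prompt (enter the password if asked); after that, login works without a password.

5. Install ZooKeeper (three machines: master, slaver1, slaver2)
  1. Download the package
  2. Unpack the package
    tar zxvf zookeeper-3.4.7.tar.gz
  3. Set the environment variables
    #zookeeper
    export ZOOKEEPER_HOME=/opt/zookeeper-3.4.7
    export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf
  Save and exit, then run
    source /etc/profile
  4. Edit the configuration file
    cp zoo_sample.cfg zoo.cfg
    vim zoo.cfg
    ####zoo.cfg####
    tickTime=2000
    initLimit=10
    syncLimit=5
    # remember to create this directory
    dataDir=/opt/zookeeper-3.4.7/tmp/zookeeper
    clientPort=2181
    server.1=hadoop.master:2888:3888
    server.2=hadoop.slaver1:2888:3888
    server.3=hadoop.slaver2:2888:3888
  Parameter notes:
    tickTime: the basic time unit used by ZooKeeper, in milliseconds.
    dataDir: the data directory. Any directory will do.
    dataLogDir: the log directory, which can also be any directory. If it is not set, the dataDir setting is used.
    clientPort: the port that listens for client connections.
    initLimit: a ZooKeeper cluster contains several servers; one is the leader and the rest are followers. initLimit is the maximum number of ticks a follower may take to connect to the leader and finish its initial sync.
    syncLimit: the maximum number of ticks allowed between a request and its reply when the leader and a follower exchange messages.
    server.X=A:B:C: X is a number identifying the server. A is the hostname or IP address of that server. B is the port the server uses to exchange messages with the leader. C is the port used for leader election.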
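Because initLimit and syncLimit are counted in ticks, the real timeouts depend on tickTime. The following is a minimal sketch, not part of ZooKeeper itself, that reads a zoo.cfg-style file and converts a limit to milliseconds. The function names zk_cfg_get and zk_timeout_ms are hypothetical helpers introduced only for illustration, and the sketch assumes each setting is written as a plain key=value line.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key from a "key=value" file.
# $1 = config file, $2 = key name
zk_cfg_get() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Hypothetical helper: convert a tick-based limit to milliseconds.
# $1 = config file, $2 = initLimit or syncLimit
zk_timeout_ms() {
    tick=$(zk_cfg_get "$1" tickTime)
    limit=$(zk_cfg_get "$1" "$2")
    # limit is a number of ticks, tickTime is milliseconds per tick
    echo $((tick * limit))
}
```

With the values shown above (tickTime=2000, initLimit=10, syncLimit=5), zk_timeout_ms <path to zoo.cfg> initLimit prints 20000 and zk_timeout_ms <path to zoo.cfg> syncLimit prints 10000, that is, 20 seconds for the initial sync and 10 seconds for a request/reply round trip.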
  5. Distribute to every node
    scp -r /opt/zookeeper-3.4.7 hadoop@<hostname>:/opt
  6. In the directory configured as dataDir, create a file named myid containing a single number that identifies the server
    cd /opt/zookeeper-3.4.7/tmp/zookeeper
    echo 1 > myid      // with the configuration above: 1 on master, 2 on slaver1, 3 on slaver2
  7. Common commands
    ####Start/stop/check zk####
    zkServer.sh start      // run once on every host in the ensemble
    zkServer.sh stop
    zkServer.sh status
    ####List/delete znodes####
    zkCli.sh
    ls /
    rmr /<znode name>

6. Install Hadoop (four machines: master, slaver1, slaver2, slaver3; the namenodes are master and slaver1)
  1. Download the package
  2. Unpack the package
  3. Set the environment variables
    #hadoop
    export HADOOP_HOME=/opt/hadoop-2.5.2
    export HADOOP_PREFIX=/opt/hadoop-2.5.2
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
    export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
    export CLASSPATH=.:$CLASSPATH:$HADOOP_HOME/bin
  Save and exit, then run
    source /etc/profile
  4. Edit the configuration files
    1. Create the required directories
      cd /opt/hadoop-2.5.2
      mkdir logs
      mkdir tmp
    2. Edit the relevant parameters in the configuration files (core-site.xml/hadoop-env.sh/hdfs-site.xml/log4j.properties/mapred-env.sh/mapred-site.xml/masters/slaves/yarn-env.sh/yarn-site.xml)
    ####core-site.xml####
    <configuration>
      <!-- Default filesystem: the HA nameservice ns1. It is a logical name, so it takes no port -->
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
      </property>
      <!-- I/O buffer size in bytes -->
      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
      </property>
      <!-- Hadoop temporary directory -->
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.5.2/tmp</value>
        <description>A base for other temporary directories.</description>
      </property>
      <!-- ZooKeeper addresses -->
      <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop.master:2181,hadoop.slaver1:2181,hadoop.slaver2:2181</value>
      </property>
    </configuration>
    ####hadoop-env.sh####
    export JAVA_HOME=/usr/java/jdk1.8.0_65
    export HADOOP_CLASSPATH=.:$HADOOP_CLASSPATH:$HADOOP_HOME/bin
    export CLASSPATH=.:$CLASSPATH:$HADOOP_HOME/bin
    ####hdfs-site.xml####
    <configuration>
      <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop.master:50070</value>
        <description>The address and the base port where the dfs namenode web ui will listen on.</description>
      </property>
      <!-- Note: with HA enabled the standby NameNode performs the checkpoints, so a SecondaryNameNode is not needed -->
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop.slaver1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file://${hadoop.tmp.dir}/dfs/namesecondary</value>
        <final>true</final>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.tmp.dir}/dfs/name</value>
        <final>true</final>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.tmp.dir}/dfs/data</value>
        <final>true</final>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.permissions</name>
        <value>false</value>
      </property>
      <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
      </property>
      <property>
        <name>dfs.namenode.hosts.exclude</name>
        <value>/opt/hadoop-2.5.2/other/excludes</value>
        <description>Names a file that contains a list of hosts that are not permitted to connect to the namenode. The full pathname of the file must be specified.
        If the value is empty, no hosts are excluded.</description>
      </property>
      <property>
        <name>dfs.namenode.hosts</name>
        <value>/opt/hadoop-2.5.2/etc/hadoop/slaves</value>
      </property>
      <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
      </property>
      <!-- HBase configuration -->
      <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>4096</value>
      </property>
      <!-- HA (ZooKeeper) configuration -->
      <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>hadoop.master:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>hadoop.slaver1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>hadoop.master:50070</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>hadoop.slaver1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.servicerpc-address.ns1.nn1</name>
        <value>hadoop.master:53310</value>
      </property>
      <property>
        <name>dfs.namenode.servicerpc-address.ns1.nn2</name>
        <value>hadoop.slaver1:53310</value>
      </property>
      <!-- Where each JournalNode stores its data on local disk -->
      <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/zookeeper-3.4.7/journal</value>
      </property>
      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop.master:8485;hadoop.slaver1:8485;hadoop.slaver2:8485/ns1</value>
      </property>
      <!-- Enable automatic NameNode failover -->
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
      <!-- Class that clients use to find the active NameNode after a failover -->
      <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <!-- ZooKeeper addresses -->
      <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop.master:2181,hadoop.slaver1:2181,hadoop.slaver2:2181</value>
      </property>
      <!-- Fencing methods; separate multiple methods with newlines, one method per line -->
      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
          sshfence
          shell(/bin/true)
        </value>
      </property>
      <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
      </property>
      <!-- Connection timeout for the sshfence method, in milliseconds -->
      <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
      </property>
    </configuration>
    ####log4j.properties####
    hadoop.root.logger=INFO,console
    hadoop.log.dir=/opt/hadoop-2.5.2/logs
    hadoop.log.file=hadoop.log
    ####mapred-env.sh####
    export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
    export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
    ####mapred-site.xml####
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.application.classpath</name>
        <value>
          /opt/hadoop-2.5.2/etc/hadoop,
          /opt/hadoop-2.5.2/share/hadoop/common/*,
          /opt/hadoop-2.5.2/share/hadoop/common/lib/*,
          /opt/hadoop-2.5.2/share/hadoop/hdfs/*,
          /opt/hadoop-2.5.2/share/hadoop/hdfs/lib/*,
          /opt/hadoop-2.5.2/share/hadoop/mapreduce/*,
          /opt/hadoop-2.5.2/share/hadoop/mapreduce/lib/*,
          /opt/hadoop-2.5.2/share/hadoop/yarn/*,
          /opt/hadoop-2.5.2/share/hadoop/yarn/lib/*
        </value>
      </property>
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop.master:10020</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop.master:19888</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/history/done</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/history/done_intermediate</value>
      </property>
    </configuration>
    ####masters####
    hadoop.slaver1      // hostname of the secondary namenode
    ####slaves####
    hadoop.slaver1
    hadoop.slaver2
    hadoop.slaver3
    ####yarn-env.sh####
    export JAVA_HOME=/usr/java/jdk1.8.0_65
    ####yarn-site.xml####
    <configuration>
      <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop.master:18040</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop.master:18030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop.master:18025</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop.master:18041</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop.master:8088</value>
      </property>
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/opt/hadoop-2.5.2/other/mynode</value>
      </property>
      <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/opt/hadoop-2.5.2/other/logs</value>
      </property>
      <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>10800</value>
      </property>
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/opt/hadoop-2.5.2/other/logs</value>
      </property>
      <property>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
      </property>
      <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>-1</value>
      </property>
      <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>-1</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <!-- ResourceManager HA (ZooKeeper) -->
      <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop.master</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop.slaver1</value>
      </property>
      <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop.master:2181,hadoop.slaver1:2181,hadoop.slaver2:2181</value>
      </property>
    </configuration>
  5. Distribute to every node
    scp \
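        -r /opt/hadoop-2.5.2 hadoop@<hostname>:/opt      // repeat for each of the other nodes
  6. First start
    6.1 Start zk
      zkServer.sh start      (run on every zk node)
    6.2 Start the journalnodes
      hadoop-daemon.sh start journalnode      (run on every zk node)
    6.3 Format the namenode
      hdfs namenode -format      (run on the namenode; hadoop namenode -format also works in Hadoop 2.x but is deprecated)
    6.4 Start the namenode
      hadoop-daemon.sh start namenode      (run on the namenode)
    6.5 Initialize the other namenode
      hdfs namenode -bootstrapStandby      (run on the second, standby namenode)
    6.6 Format zk
      hdfs zkfc -formatZK      (run on the namenode)
    6.7 Stop all services
      stop-all.sh
      Note that zk must then be stopped separately on every zk node:
      zkServer.sh stop
  7. Normal start
    1. Start zk
      zkServer.sh start      (run on every zk node)
    2. Start all services
      start-all.sh      // or run start-dfs.sh first and then start-yarn.sh
    3. Start the job history server
      mr-jobhistory-daemon.sh start historyserver      (running it on the namenode is enough)
    4. Start the standby resourcemanager
      yarn-daemon.sh start resourcemanager      // run on the standby node
    5. Start the standby namenode
      hadoop-daemon.sh start namenode      // run on the standby node
  8. Verification
    1. Verify with jps: check that the expected processes are running
    2. Verify through the web UIs
      hdfs      <hostname>:50070
      yarn      <hostname>:8088
      history   <hostname>:19888
      // <hostname> above is the hostname of the namenode (the one currently in the active state)
    3. Check the active state
      hdfs: check the web UI; a namenode is either active or standby
      yarn: check from the shell
        yarn rmadmin -getServiceState rm1      (or rm2)      // rm1/rm2 are the ids set in the configuration file
    4. Kill the currently active namenode and check whether the standby namenode takes over automatically
  9. Common commands
    ####Start/stop the YARN job history server####
    web: //namenode:19888      // namenode is the hostname of any node in the cluster
    mr-jobhistory-daemon.sh start historyserver      // run once on every host in the cluster
    mr-jobhistory-daemon.sh stop historyserver
    ####Start/stop/check zk####
    zkServer.sh start      // run once on every host in the ensemble
    zkServer.sh stop
    zkServer.sh status
    ####Start/stop/check yarn####
    yarn-daemon.sh start resourcemanager
    yarn-daemon.sh stop resourcemanager
    yarn-daemon.sh stop nodemanager
    yarn rmadmin -getServiceState rm2      // rm2 is the id set in the cluster configuration
    web: //namenode:8088      // namenode is the hostname of the active node
    ####Start/stop/check hadoop####
    hadoop-daemon.sh start namenode
    hadoop-daemon.sh stop namenode
    hadoop-daemon.sh stop datanode
    web: //namenode:50070      // namenode is the hostname of the active node
    ####Format the zkfc znode####
    hdfs zkfc -formatZK      // run on the namenode; note that the command is hdfs, not hadoop
    ####Start/stop zkfc####
    hadoop-daemon.sh start zkfc
    hadoop-daemon.sh stop zkfc
    ####List/kill jobs####
    hadoop job -list
    hadoop job -kill <job ID>      // note: the job ID, not the application ID
    ####Initialize the Journal Storage Directory####
    hdfs \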
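        namenode -initializeSharedEdits      // run when converting a non-HA cluster to HA; not needed if the cluster was HA from the start
    ####Initialize the namenodes####
    hdfs namenode -format      // run on the namenode
    hdfs namenode -bootstrapStandby      // run on the standby namenode; the first namenode must already be running
  10. Common errors
    1. Journal Storage Directory /opt/zookeeper-3.4.7/journal/ns1 not formatted
      Cause: Hadoop was originally deployed without HA, and the error appears after switching to HA.
      Fix:
        1. Delete the directory that dfs.journalnode.edits.dir points to in hdfs-site.xml
        2. hdfs namenode -initializeSharedEdits      (run on the namenode)
    2. The datanodes come up but the namenode does not
      Fix:
        1. Check that the relevant settings in the configuration files are correct
        2. Check that the environment variables are set correctly
        3. Check that the host mappings in /etc/hosts are correct
        4. Check whether the namenode was formatted a second time. If so, change the clusterID and namespaceID of each datanode to match the namenode; they are usually found under the tmp directory
        5. Restart hdfs
        6. If none of the above helps, stop the Hadoop services, delete every folder under the tmp directory (this destroys all HDFS data), format again, and restart the services
    3. Both namenodes come up, but both are in the standby state
      Fix:
        1. Check that zk is running on every zk node
        2. Format zkfc
          hdfs zkfc -formatZK
        3. Restart all services (including zk)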
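
The "both standby" case in the last item can be spotted from the shell before anyone opens the web UI. The following is a rough sketch under stated assumptions: classify_ha_state is a hypothetical helper introduced here for illustration, and it only interprets two state strings. The real states would come from hdfs haadmin -getServiceState, using the namenode ids nn1 and nn2 from hdfs-site.xml.

```shell
#!/bin/sh
# Hypothetical helper: interpret the HA states of two namenodes.
# $1 = state of the first namenode, $2 = state of the second namenode
classify_ha_state() {
    case "$1,$2" in
        active,standby|standby,active) echo "ok" ;;             # exactly one active
        standby,standby)               echo "both-standby" ;;   # check zk and zkfc
        active,active)                 echo "both-active" ;;    # fencing did not work
        *)                             echo "unknown" ;;        # a namenode is down or unreachable
    esac
}

# Usage on a running cluster (not executed here):
# classify_ha_state "$(hdfs haadmin -getServiceState nn1)" \
#                   "$(hdfs haadmin -getServiceState nn2)"
```

A result of both-standby points to the fix described above: make sure zk is running, format zkfc, and restart the services.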