Hadoop distributed installation
First, configure passwordless SSH login.
Seven nodes are prepared:
192.168.101.172 node1
192.168.101.206 node2
192.168.101.207 node3
192.168.101.215 node4
192.168.101.216 node5
192.168.101.217 node6
192.168.101.218 node7
Edit the hosts file on every machine as follows:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.101.172 node1
192.168.101.206 node2
192.168.101.207 node3
192.168.101.215 node4
192.168.101.216 node5
192.168.101.217 node6
192.168.101.218 node7
Change the hostname:
vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node2
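A CentOS 7 note (not covered in the original steps): /etc/sysconfig/network is no longer consulted for the hostname there; hostnamectl sets it instead.
hostnamectl set-hostname node2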
Disable the firewall:
centos6:
service iptables stop
centos7:
systemctl stop firewalld
systemctl disable firewalld
Create a hadoop user:
groupadd hadoop
useradd hadoop -g hadoop
mkdir /main/bigdata
chown -R hadoop:hadoop /main/bigdata/
passwd hadoop
Time synchronization (CentOS 6):
yum -y install ntp
ntpdate cn.pool.ntp.org
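For CentOS 7, a minimal sketch using chrony instead of ntpdate (an addition; package and service names are the stock ones):
yum -y install chrony
systemctl enable chronyd
systemctl start chronyd
chronyc sources    # check that the time sources are reachable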
Now configure passwordless login:
The core idea: if server B's authorized_keys contains server A's public key (the lock), then server A can log in to server B without a password.
First, generate a key pair on each of the 7 machines:
# install the ssh client
yum -y install openssh-clients
su hadoop
rm -rf ~/.ssh/*
ssh-keygen -t rsa    # press Enter through all the prompts
# then append the public key to the authorized_keys file
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Next, make one machine able to reach all of the others.
Reference: https://blog.csdn.net/ITYang_/article/details/70144395
First, run the following command on the other 6 machines to send each machine's public key to the authorized_keys on node1 (172):
#centos6
ssh-copy-id "-p 2121 192.168.101.172"
#centos7
ssh-copy-id -p 2121 hadoop@node1
# or, more explicitly (if the previous step generated a DSA key, you must point to id_dsa.pub here):
ssh-copy-id -i ~/.ssh/id_dsa.pub "-p 2121 192.168.101.172"
This sends every node's public key to the 172 node. On 172 (node1), run:
cat .ssh/authorized_keys
and you will see that the keys from the other nodes have all arrived.
Test:
now, on any other node, run
ssh node1 -p 2121
and you get into node1 without a password!
The next step is to distribute the .ssh/authorized_keys file so painstakingly collected on node1 back out to every node.
Run a shell script on the node1 machine:
yum install expect
Then:

#!/bin/bash
SERVERS="192.168.101.172 192.168.101.206 192.168.101.207 192.168.101.215 192.168.101.216 192.168.101.217 192.168.101.218"
PASSWORD=your_machine_password

# Copy this machine's key to the other nodes so that the scp step below can run without a password
auto_ssh_copy_id() {
    expect -c "set timeout -1;
        # with the default SSH port you can simply use: spawn ssh-copy-id $1;
        spawn ssh-copy-id \" -p 2121 $1 \";
        expect {
            *(yes/no)* {send -- yes\r;exp_continue;}
            *assword:* {send -- $2\r;exp_continue;}
            eof {exit 0;}
        }";
}

# Loop over all machines and copy the key
ssh_copy_id_to_all() {
    for SERVER in $SERVERS
    do
        auto_ssh_copy_id $SERVER $PASSWORD
    done
}

# Call the function above
ssh_copy_id_to_all

# Loop over all machine IPs and scp this machine's authorized_keys to them
# (SSH port 2121; drop -P if you use the default port). Change the hadoop user as needed.
for SERVER in $SERVERS
do
    scp -P 2121 ~/.ssh/authorized_keys hadoop@$SERVER:~/.ssh/
done

The ssh-copy-id syntax is different on CentOS 7, so the script becomes:

#!/bin/bash
SERVERS="node1 node2 node3 node4 node5 node6 node7 node8"
USERNAME=hadoop
PASSWORD=your_machine_password

# Copy this machine's key to the other nodes so that the scp step below can run without a password
auto_ssh_copy_id() {
    expect -c "set timeout -1;
        # with the default SSH port you can simply use: spawn ssh-copy-id $1;
        spawn ssh-copy-id -p 2121 $2@$1 ;
        expect {
            *(yes/no)* {send -- yes\r;exp_continue;}
            *assword:* {send -- $3\r;exp_continue;}
            eof {exit 0;}
        }";
}

# Loop over all machines and copy the key
ssh_copy_id_to_all() {
    for SERVER in $SERVERS
    do
        auto_ssh_copy_id $SERVER $USERNAME $PASSWORD
    done
}

# Call the function above
ssh_copy_id_to_all

# Loop over all machine IPs and scp this machine's authorized_keys to them
# (SSH port 2121; drop -P if you use the default port). Change the hadoop user as needed.
for SERVER in $SERVERS
do
    scp -P 2121 ~/.ssh/authorized_keys $USERNAME@$SERVER:~/.ssh/
done
In practice, though, something strange happened when using a non-root user:
A CentOS 6 side note:
After node1 had collected the keys from the other 6 nodes and ssh-copy-id'd back to them, passwordless SSH to those nodes still did not work; scp kept prompting for a password...
But once the other nodes deleted their local ~/.ssh/authorized_keys, it worked...
So I ran rm ~/.ssh/authorized_keys on the other 6 nodes, and after that the script on node1 ran without asking for passwords...
The root user did not have this problem...
Strange. If any reader works it out, please get in touch so we can learn from each other.
On node1, run ssh nodeX -p xxxx against each node and answer yes; this caches the host keys in known_hosts.
Then scp that file to the other machines:
#!/bin/bash
SERVERS="node1 node2 node3 node4 node5 node6 node7 node8"
for SERVER in $SERVERS
do
    scp -P 2121 ~/.ssh/known_hosts hadoop@$SERVER:~/.ssh/
done
This way the other machines are not asked the first-time yes question.
At this point, passwordless login between the machines is done.
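A quick sanity check (a sketch, assuming the hadoop user and SSH port 2121 used above): each command should print the remote hostname without any password prompt.
for h in node1 node2 node3 node4 node5 node6 node7
do
    ssh -p 2121 $h hostname
done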
Install Java 8:
cd /main
tar -zvxf soft/jdk-8u171-linux-x64.tar.gz -C .
vim /etc/profile
# add at the end:
export JAVA_HOME=/main/jdk1.8.0_171
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib
export PATH=:$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
rm -rf /usr/bin/java*
# take effect immediately
source /etc/profile
ZooKeeper installation and configuration is omitted: basically extract, adjust the paths, change the ports if you feel like it, and set myid.
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored; do not use /tmp for storage
dataDir=/main/zookeeper/data
dataLogDir=/main/zookeeper/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections
#maxClientCnxns=60
# autopurge settings; see the maintenance section of the administrator guide before enabling
#autopurge.snapRetainCount=3
#autopurge.purgeInterval=1
server.1=192.168.101.173:2888:3888
server.2=192.168.101.183:2888:3888
server.3=192.168.101.193:2888:3888
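For completeness, the myid step mentioned above boils down to writing each server's id into dataDir and starting the service; a sketch assuming ZooKeeper is unpacked at /main/zookeeper and the zoo.cfg above is in place:
# on 192.168.101.173 (server.1); write 2 and 3 on the other two nodes
mkdir -p /main/zookeeper/data /main/zookeeper/logs
echo 1 > /main/zookeeper/data/myid
/main/zookeeper/bin/zkServer.sh start
/main/zookeeper/bin/zkServer.sh status    # one node reports leader, the others follower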
Install Hadoop:
Hadoop goes on five nodes, node1-node5, with node1/node2 as masters and node3-node5 as slaves.
Do the configuration on node1:
cd /main
tar -zvxf soft/hadoop-2.8.4.tar.gz -C .
chown -R hadoop:hadoop /main/hadoop-2.8.4/
su hadoop
mkdir /main/hadoop-2.8.4/data
mkdir /main/hadoop-2.8.4/data/journal
mkdir /main/hadoop-2.8.4/data/tmp
mkdir /main/hadoop-2.8.4/data/hdfs
mkdir /main/hadoop-2.8.4/data/hdfs/namenode
mkdir /main/hadoop-2.8.4/data/hdfs/datanode
Next, edit the following files under /main/hadoop-2.8.4/etc/hadoop/:
hadoop-env.sh
mapred-env.sh
yarn-env.sh
and set JAVA_HOME in all three files:
export JAVA_HOME=/main/jdk1.8.0_171
At the end of hadoop-env.sh, also add the SSH port:
export HADOOP_SSH_OPTS="-p 2121"
Edit /main/hadoop-2.8.4/etc/hadoop/slaves:
node3
node4
node5
Edit /main/hadoop-2.8.4/etc/hadoop/core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- fs.defaultFS must be set to the HDFS logical service name (must match dfs.nameservices in hdfs-site.xml) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hjbdfs</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/main/hadoop-2.8.4/data/tmp</value>
    </property>
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>192.168.101.173:2181,192.168.101.183:2181,192.168.101.193:2181</value>
    </property>
</configuration>
More parameters: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
Edit hdfs-site.xml:
Note: if SSH is not on the default port 22, the sshfence entry must be written as sshfence([[username][:port]]), e.g. sshfence(hadoop:2121). Otherwise, when the active NameNode dies, sshfence cannot get into the other machine and automatic failover will not happen.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <!-- Logical name of the HDFS nameservice; use the hjbdfs set by fs.defaultFS in core-site.xml above -->
    <property>
        <name>dfs.nameservices</name>
        <value>hjbdfs</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
    <!-- The NameNodes behind the logical service name -->
    <property>
        <name>dfs.ha.namenodes.hjbdfs</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 (where nn1 runs) -->
    <property>
        <name>dfs.namenode.rpc-address.hjbdfs.nn1</name>
        <value>node1:8020</value>
    </property>
    <!-- HTTP address of nn1, for external access -->
    <property>
        <name>dfs.namenode.http-address.hjbdfs.nn1</name>
        <value>node1:50070</value>
    </property>
    <!-- RPC address of nn2 (where nn2 runs) -->
    <property>
        <name>dfs.namenode.rpc-address.hjbdfs.nn2</name>
        <value>node2:8020</value>
    </property>
    <!-- HTTP address of nn2, for external access -->
    <property>
        <name>dfs.namenode.http-address.hjbdfs.nn2</name>
        <value>node2:50070</value>
    </property>
    <!-- Where the NameNode metadata lives on the JournalNodes (usually co-located with ZooKeeper).
         A set of JournalNode URIs: the active NN writes edit logs to them and the standby NN reads and
         replays them into its in-memory directory tree. Multiple nodes are separated by semicolons,
         in the form qjournal://host1:port1;host2:port2;host3:port3/journalId -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node3:8485;node4:8485;node5:8485/hjbdf_journal</value>
    </property>
    <!-- Where the JournalNode stores its data on local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/main/hadoop-2.8.4/data/journal</value>
    </property>
    <!-- The class responsible for failover when the cluster fails over -->
    <property>
        <name>dfs.client.failover.proxy.provider.hjbdfs</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing, to avoid split-brain (two masters serving at once and leaving the system inconsistent).
         In HDFS HA the JournalNodes only allow one NameNode to write, so two active NameNodes cannot occur.
         This sets the fencing method for automatic failover; there are several options (see the official
         docs linked below). Here the remote-login-and-kill method is used. -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence(hadoop:2121)</value>
        <description>how to communicate in the switch process</description>
    </property>
    <!-- Passwordless SSH key, needed only when using the sshfence mechanism -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <!-- Timeout for the sshfence mechanism; like the above, not needed if you fence with a script -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <!-- Enable automatic failover; can be left out if you do not want it -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/main/hadoop-2.8.4/data/hdfs/datanode</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/main/hadoop-2.8.4/data/hdfs/namenode</value>
    </property>
</configuration>
References:
http://ju.outofmemory.cn/entry/95494
https://www.cnblogs.com/meiyuanbao/p/3545929.html
Official parameters: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
Rename mapred-site.xml.template to mapred-site.xml and edit it:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node1:19888</value>
    </property>
</configuration>
Configure yarn-site.xml:
<?xml version="1.0"?>
<!-- Licensed under the Apache License, Version 2.0. -->
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Site specific YARN configuration properties -->
    <!-- Enable ResourceManager HA (on by default) -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- Declare the two ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>rmcluster</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>node1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>node2</value>
    </property>
    <!-- ZooKeeper cluster address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>192.168.101.173:2181,192.168.101.183:2181,192.168.101.193:2181</value>
    </property>
    <!-- Enable automatic recovery so jobs survive an RM failure mid-run (default is false) -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <!-- Store the ResourceManager state in the ZooKeeper cluster (default is the FileSystem store) -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
</configuration>
With the basic configuration done, package it up:
tar -zcvf hadoop-2.8.4.ready.tar.gz /main/hadoop-2.8.4
Then scp the package to the other 4 nodes:
scp -P 2121 hadoop-2.8.4.ready.tar.gz hadoop@node2:/main
On the other 4 nodes, extract it and make sure the directory ends up as /main/hadoop-2.8.4.
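The copy-and-extract can be done in one loop; a sketch, assuming the tarball sits in the current directory on node1 and was created from /main/hadoop-2.8.4 as above (so extracting at / recreates that exact path):
for h in node2 node3 node4 node5
do
    scp -P 2121 hadoop-2.8.4.ready.tar.gz hadoop@$h:/main/
    ssh -p 2121 hadoop@$h "tar -zxf /main/hadoop-2.8.4.ready.tar.gz -C /"
done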
Then continue configuring the two masters, node1 and node2.
Add yarn.resourcemanager.ha.id to yarn-site.xml on node1 and node2 respectively:
It is the analogue of ZooKeeper's myid.
<!-- on node1 -->
<property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
</property>
<!-- on node2 -->
<property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm2</value>
</property>
Start the JournalNodes:
Start the JournalNode on the three slave nodes:
[hadoop@node5 hadoop-2.8.4]$ /main/hadoop-2.8.4/sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-journalnode-node5.out
[hadoop@node5 hadoop-2.8.4]$ jps
2272 Jps
2219 JournalNode
Start the NameNodes:
Format the NameNode on one master (the JournalNodes must be running before formatting):
Do not format a second time or format again on another node, otherwise the namespaceID of the NN and DN will differ and errors will follow!!!
/main/hadoop-2.8.4/bin/hdfs namenode -format
Start one NameNode:
[hadoop@node1 hadoop]$ /main/hadoop-2.8.4/sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-namenode-node1.out
[hadoop@node1 hadoop]$ jps
7536 Jps
7457 NameNode
[hadoop@node1 hadoop]$ ps -ef|grep namenode hadoop 7457 1 10 09:20 pts/4 00:00:08 /main/jdk1.8.0_171/bin/java -Dproc_namenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/main/hadoop-2.8.4/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/main/hadoop-2.8.4 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/main/hadoop-2.8.4/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/main/hadoop-2.8.4/logs -Dhadoop.log.file=hadoop-hadoop-namenode-node1.log -Dhadoop.home.dir=/main/hadoop-2.8.4 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/main/hadoop-2.8.4/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
At this point node1's HDFS web UI should be reachable, and its state shows standby:
http://192.168.101.172:50070/
Then, before starting, the other master first syncs the metadata from the first master's NameNode:
This mainly syncs the contents of data/hdfs/namenode/ (including the namespaceID); otherwise the two nodes would be inconsistent and errors would follow.
/main/hadoop-2.8.4/bin/hdfs namenode -bootstrapStandby
The log shows the fsimage file being downloaded successfully:
18/06/28 11:01:56 WARN common.Util: Path /main/hadoop-2.8.4/data/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration. 18/06/28 11:01:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable ===================================================== About to bootstrap Standby ID nn2 from: Nameservice ID: hjbdfs Other Namenode ID: nn1 Other NN's HTTP address: http://node1:50070 Other NN's IPC address: node1/192.168.101.172:8020 Namespace ID: 675265321 Block pool ID: BP-237410497-192.168.101.172-1530153904905 Cluster ID: CID-604da42a-d0a8-403b-b073-68c857c9b772 Layout version: -63 isUpgradeFinalized: true ===================================================== Re-format filesystem in Storage Directory /main/hadoop-2.8.4/data/hdfs/namenode ? (Y or N) Y 18/06/28 11:02:06 INFO common.Storage: Storage directory /main/hadoop-2.8.4/data/hdfs/namenode has been successfully formatted. 18/06/28 11:02:06 WARN common.Util: Path /main/hadoop-2.8.4/data/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration. 18/06/28 11:02:06 WARN common.Util: Path /main/hadoop-2.8.4/data/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration. 18/06/28 11:02:06 INFO namenode.FSEditLog: Edit logging is async:true 18/06/28 11:02:06 INFO namenode.TransferFsImage: Opening connection to http://node1:50070/imagetransfer?getimage=1&txid=0&storageInfo=-63:675265321:1530153904905:CID-604da42a-d0a8-403b-b073-68c857c9b772&bootstrapstandby=true 18/06/28 11:02:06 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds 18/06/28 11:02:06 INFO namenode.TransferFsImage: Transfer took 0.01s at 0.00 KB/s 18/06/28 11:02:06 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 323 bytes. 18/06/28 11:02:06 INFO util.ExitUtil: Exiting with status 0 18/06/28 11:02:06 INFO namenode.NameNode: SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down NameNode at node2/192.168.101.206 ************************************************************/
Then start it:
/main/hadoop-2.8.4/sbin/hadoop-daemon.sh start namenode
Its web UI also shows the standby state.
Manually forcing one node to become the active master is problematic: it throws an EOFException and brings the NameNode down:
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hdfs haadmin -transitionToActive nn1 --forcemanual
Use ZooKeeper to take over NameNode failover automatically:
First shut the whole cluster down (leave ZooKeeper running), then run bin/hdfs zkfc -formatZK to format ZKFC.
Shut down:
/main/hadoop-2.8.4/sbin/stop-dfs.sh
Format ZKFC:
/main/hadoop-2.8.4/bin/hdfs zkfc -formatZK
After it finishes, ZooKeeper has a new hadoop-ha znode.
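The znode can be inspected from the ZooKeeper CLI as well; a sketch, assuming a ZooKeeper install at /main/zookeeper:
/main/zookeeper/bin/zkCli.sh -server 192.168.101.173:2181
# inside the CLI:
ls /hadoop-ha        # should list [hjbdfs]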
Start the whole cluster:
/main/hadoop-2.8.4/sbin/start-dfs.sh
You can see that this command starts the NameNodes, DataNodes, and JournalNodes in turn, and then starts the ZKFCs, which register themselves:
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/sbin/start-dfs.sh
18/06/28 11:57:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [node1 node2]
node1: starting namenode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-namenode-node1.out
node2: starting namenode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-namenode-node2.out
node3: starting datanode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-datanode-node3.out
node4: starting datanode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-datanode-node4.out
node5: starting datanode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-datanode-node5.out
Starting journal nodes [node3 node4 node5]
node3: starting journalnode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-journalnode-node3.out
node4: starting journalnode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-journalnode-node4.out
node5: starting journalnode, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-journalnode-node5.out
18/06/28 11:58:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting ZK Failover Controllers on NN hosts [node1 node2]
node1: starting zkfc, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-zkfc-node1.out
node2: starting zkfc, logging to /main/hadoop-2.8.4/logs/hadoop-hadoop-zkfc-node2.out
ZooKeeper now has the service registered.
Contents of the two znodes in ZooKeeper:
[zk: localhost:2181(CONNECTED) 10] get /hadoop-ha/hjbdfs/ActiveBreadCrumb
hjbdfsnn2node2 �>(�>
cZxid = 0x30000000a
ctime = Thu Jun 28 11:49:36 CST 2018
mZxid = 0x30000000a
mtime = Thu Jun 28 11:49:36 CST 2018
pZxid = 0x30000000a
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 26
numChildren = 0
[zk: localhost:2181(CONNECTED) 16] get /hadoop-ha/hjbdfs/ActiveStandbyElectorLock
hjbdfsnn1node1 �>(�>
cZxid = 0x3000001a7
ctime = Thu Jun 28 11:54:13 CST 2018
mZxid = 0x3000001a7
mtime = Thu Jun 28 11:54:13 CST 2018
pZxid = 0x3000001a7
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x163f1ef62ad008c
dataLength = 26
numChildren = 0
The NameNodes at this point:
[hadoop@node1 hadoop-2.8.4]$ jps
14885 NameNode
15191 DFSZKFailoverController
15321 Jps
[hadoop@node2 hadoop-2.8.4]$ jps
18850 NameNode
19059 Jps
18952 DFSZKFailoverController
All 3 DataNodes look the same:
[hadoop@node3 hadoop-2.8.4]$ jps
5409 DataNode
5586 Jps
5507 JournalNode
Now one node shows as active and the other as standby.
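The same can be confirmed from the command line instead of the web UI:
/main/hadoop-2.8.4/bin/hdfs haadmin -getServiceState nn1
/main/hadoop-2.8.4/bin/hdfs haadmin -getServiceState nn2
# one prints active, the other standby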
The installation looks successful so far; next comes testing.
After a later restart the active node was node1, so start the ZooKeeper-driven automatic NameNode failover test:
Kill the NN process on the active node, node1:
[hadoop@node1 hadoop-2.8.4]$ jps
18850 NameNode
18952 DFSZKFailoverController
19103 Jps
The failover shows up in the hadoop-hadoop-zkfc-node2.log log:
2018-06-28 16:07:40,181 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ====== 2018-06-28 16:07:40,181 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(hadoop:2121) 2018-06-28 16:07:40,220 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to node1... 2018-06-28 16:07:40,222 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to node1 port 2121 2018-06-28 16:07:40,229 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established 2018-06-28 16:07:40,236 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_5.3 2018-06-28 16:07:40,236 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.54 2018-06-28 16:07:40,236 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256 2018-06-28 16:07:40,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckKexes: diffie-hellman-group14-sha1,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521 2018-06-28 16:07:40,725 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckSignatures: ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521 2018-06-28 16:07:40,729 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent 2018-06-28 16:07:40,729 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: ssh-rsa,ssh-dss 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96 2018-06-28 16:07:40,730 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: none,zlib@openssh.com 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: none,zlib@openssh.com 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: 
ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96 2018-06-28 16:07:40,731 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: none 2018-06-28 16:07:40,732 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: none 2018-06-28 16:07:40,732 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: 2018-06-28 16:07:40,732 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: 2018-06-28 16:07:40,732 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none 2018-06-28 16:07:40,732 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none 2018-06-28 16:07:40,769 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent 2018-06-28 16:07:40,770 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY 2018-06-28 16:07:40,804 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true 2018-06-28 16:07:40,811 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'node1' (RSA) to the list of known hosts. 2018-06-28 16:07:40,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent 2018-06-28 16:07:40,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received 2018-06-28 16:07:40,817 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent 2018-06-28 16:07:40,818 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received 2018-06-28 16:07:40,820 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password 2018-06-28 16:07:40,820 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic 2018-06-28 16:07:40,824 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password 2018-06-28 16:07:40,824 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey 2018-06-28 16:07:40,902 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentication succeeded (publickey). 2018-06-28 16:07:40,902 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connected to node1 2018-06-28 16:07:40,902 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Looking for process running on port 8020 2018-06-28 16:07:40,950 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Indeterminate response from trying to kill service. Verifying whether it is running using nc... 2018-06-28 16:07:40,966 WARN org.apache.hadoop.ha.SshFenceByTcpPort: nc -z node1 8020 via ssh: bash: nc: command not found 2018-06-28 16:07:40,968 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Verified that the service is down. 
2018-06-28 16:07:40,968 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from node1 port 2121 2018-06-28 16:07:40,972 INFO org.apache.hadoop.ha.NodeFencer: ====== Fencing successful by method org.apache.hadoop.ha.SshFenceByTcpPort(hadoop:2121) ====== 2018-06-28 16:07:40,972 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /hadoop-ha/hjbdfs/ActiveBreadCrumb to indicate that the local node is the most recent active... 2018-06-28 16:07:40,973 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Caught an exception, leaving main loop due to Socket closed 2018-06-28 16:07:40,979 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at node2/192.168.101.206:8020 active... 2018-06-28 16:07:41,805 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at node2/192.168.101.206:8020 to active state
Start YARN to test ResourceManager HA. On node1:
[hadoop@node1 sbin]$ /main/hadoop-2.8.4/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /main/hadoop-2.8.4/logs/yarn-hadoop-resourcemanager-node1.out
node3: starting nodemanager, logging to /main/hadoop-2.8.4/logs/yarn-hadoop-nodemanager-node3.out
node4: starting nodemanager, logging to /main/hadoop-2.8.4/logs/yarn-hadoop-nodemanager-node4.out
node5: starting nodemanager, logging to /main/hadoop-2.8.4/logs/yarn-hadoop-nodemanager-node5.out
Start the ResourceManager on node2:
[hadoop@node2 sbin]$ /main/hadoop-2.8.4/sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /main/hadoop-2.8.4/logs/yarn-hadoop-resourcemanager-node2.out
Hitting node2's ResourceManager address http://192.168.101.206:8088 in a browser automatically redirects to http://node1:8088, i.e. http://192.168.101.172:8088/cluster.
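The ResourceManager HA state can likewise be queried from the command line:
/main/hadoop-2.8.4/bin/yarn rmadmin -getServiceState rm1
/main/hadoop-2.8.4/bin/yarn rmadmin -getServiceState rm2
# one prints active, the other standby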
Test HDFS file upload/download/delete:
Create a file word_i_have_a_dream.txt containing the English text of Martin Luther King's speech.
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -put word_i_have_a_dream.txt /word.txt
18/06/28 16:39:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -ls /word.txt
18/06/28 16:39:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r--   3 hadoop supergroup 4805 2018-06-28 16:39 /word.txt
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -rm /word.txt
18/06/28 16:39:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Deleted /word.txt
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -ls /word.txt
18/06/28 16:40:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: `/word.txt': No such file or directory
Download uses the get command; for more see the official docs or other blog posts such as https://www.cnblogs.com/lzfhope/p/6952869.html.
Run the classic wordcount test:
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop jar /main/hadoop-2.8.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar wordcount /word.txt /wordoutput 18/06/28 16:50:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/06/28 16:50:36 INFO input.FileInputFormat: Total input files to process : 1 18/06/28 16:50:36 INFO mapreduce.JobSubmitter: number of splits:1 18/06/28 16:50:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1530173899165_0001 18/06/28 16:50:37 INFO impl.YarnClientImpl: Submitted application application_1530173899165_0001 18/06/28 16:50:37 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1530173899165_0001/ 18/06/28 16:50:37 INFO mapreduce.Job: Running job: job_1530173899165_0001 18/06/28 16:50:49 INFO mapreduce.Job: Job job_1530173899165_0001 running in uber mode : false 18/06/28 16:50:49 INFO mapreduce.Job: map 0% reduce 0% 18/06/28 16:50:58 INFO mapreduce.Job: map 100% reduce 0% 18/06/28 16:51:07 INFO mapreduce.Job: map 100% reduce 100% 18/06/28 16:51:08 INFO mapreduce.Job: Job job_1530173899165_0001 completed successfully 18/06/28 16:51:08 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=4659 FILE: Number of bytes written=330837 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4892 HDFS: Number of bytes written=3231 HDFS: Number of read operations=6 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Launched reduce tasks=1 Data-local map tasks=1 Total time spent by all maps in occupied slots (ms)=6187 Total time spent by all reduces in occupied slots (ms)=6411 Total time spent by all map tasks (ms)=6187 Total time spent by all reduce tasks (ms)=6411 Total vcore-milliseconds taken by all map tasks=6187 Total vcore-milliseconds taken by all reduce tasks=6411 Total megabyte-milliseconds taken by all map tasks=6335488 Total megabyte-milliseconds taken by all reduce tasks=6564864 Map-Reduce Framework Map input records=32 Map output records=874 Map output bytes=8256 Map output materialized bytes=4659 Input split bytes=87 Combine input records=874 Combine output records=359 Reduce input groups=359 Reduce shuffle bytes=4659 Reduce input records=359 Reduce output records=359 Spilled Records=718 Shuffled Maps =1 Failed Shuffles=0 Merged Map outputs=1 GC time elapsed (ms)=194 CPU time spent (ms)=1860 Physical memory (bytes) snapshot=444248064 Virtual memory (bytes) snapshot=4178436096 Total committed heap usage (bytes)=317718528 Shuffle Errors BAD_ID=0 CONNECTION=0 IO_ERROR=0 WRONG_LENGTH=0 WRONG_MAP=0 WRONG_REDUCE=0 File Input Format Counters Bytes Read=4805 File Output Format Counters Bytes Written=3231
You can see the output files generated under /wordoutput on HDFS:
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -ls /
18/06/28 16:52:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwx------   - hadoop supergroup    0 2018-06-28 16:50 /tmp
-rw-r--r--   3 hadoop supergroup 4805 2018-06-28 16:43 /word.txt
drwxr-xr-x   - hadoop supergroup    0 2018-06-28 16:51 /wordoutput
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -ls /wordoutput
18/06/28 16:52:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   3 hadoop supergroup    0 2018-06-28 16:51 /wordoutput/_SUCCESS
-rw-r--r--   3 hadoop supergroup 3231 2018-06-28 16:51 /wordoutput/part-r-00000
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -ls /wordoutput/_SUCCESS
18/06/28 16:53:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r--   3 hadoop supergroup    0 2018-06-28 16:51 /wordoutput/_SUCCESS
_SUCCESS is 0 bytes, so the content is in part-r-00000; pull it down to local and take a look:
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -ls /wordoutput/part-r-00000
18/06/28 16:54:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r--   3 hadoop supergroup 3231 2018-06-28 16:51 /wordoutput/part-r-00000
[hadoop@node1 hadoop-2.8.4]$ /main/hadoop-2.8.4/bin/hadoop fs -get /wordoutput/part-r-00000 word_success.txt
18/06/28 16:54:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@node1 hadoop-2.8.4]$ vim word_success.txt
It produced the word counts we wanted.
Next, try killing the NN master to test HA.
Notes:
If you get connection-refused errors, check that the other master's NameNode service is actually listening, that the firewall is off, and that its hosts file is consistent with the NameNode entries in hdfs-site.xml.
If the config files use hostnames, then every node must map the IPs of all cluster nodes in its hosts file, and each node's own hostname must map to its LAN IP, not to 127.0.0.1; otherwise its services bind to 127.0.0.1:xxxx and other nodes on the LAN cannot connect!!!
If a problem forces you to reformat the NameNode, clean out the old data on every node first, otherwise the old DataNodes' namespaceID will not match and they will fail to start properly:
rm -rf /main/hadoop-2.8.4/data/journal/*
rm -rf /main/hadoop-2.8.4/data/hdfs/namenode/*
rm -rf /main/hadoop-2.8.4/data/hdfs/datanode/*
rm -rf /main/hadoop-2.8.4/logs/*
Again: if SSH is not on the default port 22, sshfence must be written as sshfence([[username][:port]]), e.g. sshfence(hadoop:2121); otherwise, when the active NameNode dies, sshfence cannot get into the other machine and the standby cannot take over automatically.
If anything goes wrong during installation, the *.log files under /main/hadoop-2.8.4/logs have detailed logs.
HBase installation
HMaster has no single point of failure: HBase can run several HMasters, and ZooKeeper's master election guarantees that exactly one is active. So to make HBase highly available we simply start two HMasters and let ZooKeeper choose the active one.
Extract it to /main on all 5 machines:
tar -zvxf /main/soft/hbase-1.2.6.1-bin.tar.gz -C /main/
On top of the Hadoop setup, configure the HBASE_HOME environment variable and hbase-env.sh.
vim /etc/profile and set:
# java (set earlier for Hadoop)
export JAVA_HOME=/main/jdk1.8.0_171
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib
export PATH=:$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
# hbase
export HBASE_HOME=/main/hbase-1.2.6.1
export PATH=$HBASE_HOME/bin:$PATH
In HBase's hbase-env.sh, set JAVA_HOME, disable HBase's bundled ZooKeeper so our own ZooKeeper is used, and set the SSH port:
export JAVA_HOME=/main/jdk1.8.0_171
export HBASE_MANAGES_ZK=false
export HBASE_SSH_OPTS="-p 2121"
Configure hbase-site.xml.
The official site documents the configuration files:
http://abloz.com/hbase/book.html
https://hbase.apache.org/2.0/book.html#example_config
https://hbase.apache.org/2.0/book.html#config.files
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- The nameservice here must match the Hadoop HDFS service name, otherwise it will fail! -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hjbdfs/hbase</value>
        <description>The directory shared by RegionServers.</description>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.master.port</name>
        <value>16000</value>
        <description>The port the HBase Master should bind to.</description>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>192.168.101.173,192.168.101.183,192.168.101.193</value>
        <description>Comma separated list of servers in the ZooKeeper Quorum.
            For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
            By default this is set to localhost for local and pseudo-distributed modes of operation.
            For a fully-distributed setup, this should be set to a full list of ZooKeeper quorum servers.
            If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on.
        </description>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
        <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/main/zookeeper/data</value>
        <description>The data directory from the ZooKeeper config zoo.cfg, i.e. the directory where the snapshot is stored.</description>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>/main/hbase-1.2.6.1/hbase/tmp</value>
    </property>
</configuration>
Copy the Hadoop config files into the HBase conf directory to tie the two together:
[hadoop@node1 hbase-1.2.6.1]$ cp /main/hadoop-2.8.4/etc/hadoop/core-site.xml /main/hbase-1.2.6.1/conf/
[hadoop@node1 hbase-1.2.6.1]$ cp /main/hadoop-2.8.4/etc/hadoop/hdfs-site.xml /main/hbase-1.2.6.1/conf/
vim regionservers
node3
node4
node5
From the official docs:
regionservers: A plain-text file containing a list of hosts which should run a RegionServer in your HBase cluster. By default this file contains the single entry localhost. It should contain a list of hostnames or IP addresses, one per line, and should only contain localhost if each node in your cluster will run a RegionServer on its localhost interface.
vim backup-masters
On node1, write node2 here; that way, starting the cluster from node1 will also bring up node2 as the backup Master.
Likewise, write node1 into node2's backup-masters, so that the cluster can be operated from either node.
node2
From the official docs:
backup-masters: Not present by default. A plain-text file which lists hosts on which the Master should start a backup Master process, one host per line.
Start HMaster (either master node will do):
[hadoop@node2 bin]$ /main/hbase-1.2.6.1/bin/start-hbase.sh
starting master, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-master-node2.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
node5: starting regionserver, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-regionserver-node5.out
node3: starting regionserver, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-regionserver-node3.out
node4: starting regionserver, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-regionserver-node4.out
node3: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
node3: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
node4: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
node4: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
node5: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
node5: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
node1: starting master, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-master-node1.out
node1: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
node1: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
Running it on the first HMaster node:
[hadoop@node1 hbase-1.2.6.1]$ /main/hbase-1.2.6.1/bin/start-hbase.sh
starting master, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-master-node1.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
node3: starting regionserver, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-regionserver-node3.out
node4: starting regionserver, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-regionserver-node4.out
node5: starting regionserver, logging to /main/hbase-1.2.6.1/bin/../logs/hbase-hadoop-regionserver-node5.out
node3: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
node3: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
node4: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
node4: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
node5: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
node5: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
It SSHes out according to the config files and starts the RegionServers; if the local node itself is not a RegionServer, nothing else changes on it after startup.
Now go to the second HMaster node and manually start another HMaster.
Process changes on a RegionServer before and after the start:
[hadoop@node3 hbase-1.2.6.1]$ jps
7460 DataNode
8247 Jps
7562 JournalNode
7660 NodeManager
[hadoop@node3 hbase-1.2.6.1]$ vim conf/hbase-env.sh
[hadoop@node3 hbase-1.2.6.1]$ jps
7460 DataNode
8408 Jps
7562 JournalNode
8300 HRegionServer
7660 NodeManager
The backup master is node1:
The active master is node2:
Go to node2 and kill the process to test master failover:
[hadoop@node2 bin]$ jps
3809 Jps
1412 NameNode
3607 HMaster
1529 DFSZKFailoverController
[hadoop@node2 bin]$ kill 3607
[hadoop@node2 bin]$ jps
3891 Jps
1412 NameNode
1529 DFSZKFailoverController
node1 successfully became the active master:
The killed master can be started again with ./hbase-daemon.sh start master.
With all of the above installed, node1 and node2 show:
[hadoop@node1 hbase-1.2.6.1]$ jps
31458 NameNode
31779 DFSZKFailoverController
5768 Jps
5482 HMaster
31871 ResourceManager
node3-5:
[hadoop@node3 hbase-1.2.6.1]$ jps
9824 Jps
9616 HRegionServer
7460 DataNode
7562 JournalNode
7660 NodeManager
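A quick smoke test from the HBase shell (a sketch; the table and column family names are made up for the test):
/main/hbase-1.2.6.1/bin/hbase shell
# inside the shell:
status                                      # should show the 3 region servers
create 'smoke_test', 'cf'
put 'smoke_test', 'row1', 'cf:a', 'value1'
scan 'smoke_test'
disable 'smoke_test'
drop 'smoke_test'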
Spark installation:
On another cluster, Hadoop was installed in the same way as above:
192.168.210.114 node1
192.168.210.115 node2
192.168.210.116 node3
192.168.210.117 node4
192.168.210.134 node5
192.168.210.135 node6
192.168.210.136 node7
192.168.210.137 node8
Spark is installed on node1-node5. Spark depends on Hadoop,
so when downloading from http://spark.apache.org/downloads.html pick a build compatible with your Hadoop version.
You also need the matching Scala version: check http://spark.apache.org/docs/latest/ to see which Scala version it corresponds to.
So we downloaded Scala 2.11.11, extracted it to /main, and set PATH.
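For reference, the PATH setup might look like this in /etc/profile (SPARK_HOME is an assumption; the paths match this install):
export SCALA_HOME=/main/scala-2.11.11
export SPARK_HOME=/main/spark-2.3.1-bin-hadoop2.7
export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin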
After downloading spark-2.3.1-bin-hadoop2.7.tgz and extracting it to /main:
Edit spark-env.sh and set values to match your environment:
#!/usr/bin/env bash
# (stock spark-env.sh.template license header and option comments omitted)
export JAVA_HOME=/main/server/jdk1.8.0_11
export SCALA_HOME=/main/scala-2.11.11
export HADOOP_HOME=/main/hadoop-2.8.4
export HADOOP_CONF_DIR=/main/hadoop-2.8.4/etc/hadoop
export SPARK_WORKER_MEMORY=4g
export SPARK_EXECUTOR_MEMORY=4g
export SPARK_DRIVER_MEMORY=4G
export SPARK_WORKER_CORES=2
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=192.168.210.38:2181,192.168.210.58:2181,192.168.210.78:2181 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_SSH_OPTS="-p 2121"
vim slaves
node1
node2
node3
node4
node5
Then it can be started: first start all the slaves, then manually start the two masters.
start-all.sh starts the local node as a Master and then starts a Worker on every host listed in the slaves file.
[hadoop@node1 conf]$ /main/spark-2.3.1-bin-hadoop2.7/sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out
node2: starting org.apache.spark.deploy.worker.Worker, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node2.out
node1: starting org.apache.spark.deploy.worker.Worker, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node1.out
node3: starting org.apache.spark.deploy.worker.Worker, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node3.out
node4: starting org.apache.spark.deploy.worker.Worker, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node4.out
node5: starting org.apache.spark.deploy.worker.Worker, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node5.out
[hadoop@node1 conf]$ jps
11299 Master
11411 Worker
5864 NameNode
6184 DFSZKFailoverController
11802 Jps
6301 ResourceManager
6926 HMaster
Every other machine now has a Worker process. For HA, go to node2 and start another Spark Master:
[hadoop@node2 conf]$ jps
5536 Jps
2209 DFSZKFailoverController
2104 NameNode
2602 HMaster
5486 Worker
[hadoop@node2 conf]$ /main/spark-2.3.1-bin-hadoop2.7/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /main/spark-2.3.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node2.out
[hadoop@node2 conf]$ jps
5568 Master
2209 DFSZKFailoverController
2104 NameNode
2602 HMaster
5486 Worker
5631 Jps
Check the two masters in the web UI:
The other one is empty (it is the standby).
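A simple way to exercise the HA pair is to submit the bundled SparkPi example while listing both masters (a sketch; 7077 is the default standalone master port, since SPARK_MASTER_PORT is not set above):
/main/spark-2.3.1-bin-hadoop2.7/bin/spark-submit \
  --master spark://node1:7077,node2:7077 \
  --class org.apache.spark.examples.SparkPi \
  /main/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar 100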
Storm installation:
First, a borrowed architecture diagram:
Download apache-storm-1.2.2.tar.gz, extract it to /main on nodes node6-node8, and set:
export STORM_HOME=/main/apache-storm-1.2.2
vim storm.yaml as follows (identical on all 3 machines):
storm.zookeeper.servers:
    - "192.168.210.38"
    - "192.168.210.58"
    - "192.168.210.78"
storm.zookeeper.port: 2181
storm.local.dir: "/main/apache-storm-1.2.2/data"
nimbus.seeds: ["192.168.210.135"]
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
storm.health.check.dir: "healthchecks"
storm.health.check.timeout.ms: 5000
The directory named by storm.local.dir must be created in advance. The number of ports under supervisor.slots.ports determines how many workers each supervisor machine runs; every worker listens on its own port for tasks.
Create the directory:
mkdir /main/apache-storm-1.2.2/data
The control node runs nimbus (there can be more than one; see the nimbus.seeds: ["192.168.210.135"] setting).
Start the control node first:
# start nimbus
nohup /main/apache-storm-1.2.2/bin/storm nimbus &
# start the nimbus UI
nohup /main/apache-storm-1.2.2/bin/storm ui &
# start a supervisor
nohup /main/apache-storm-1.2.2/bin/storm supervisor &
Finally, start supervisors on the other two machines:
nohup /main/apache-storm-1.2.2/bin/storm supervisor &
The nimbus node's UI page shows information about all supervisors.
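Running topologies can also be listed from the command line on the nimbus node:
/main/apache-storm-1.2.2/bin/storm list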
Then download the Storm source from GitHub (or the src zip from the Apache Storm homepage) and switch to the matching branch, e.g. 1.1.x, 1.x, or 2.x. At the time 2.x was still a snapshot; since we installed 1.2.2, switch to the 1.x branch.
Especially with source downloaded from GitHub, be sure to run:
mvn clean install -DskipTests
Because the GitHub code may be newer than the released artifacts and the corresponding jars are not in the Maven repository, the command above builds the Storm dependencies into the local Maven repository.
PS: do not use a third-party Maven mirror for this step, or it is very likely to fail!
After switching to 1.x, the newest example build is storm-starter-1.2.3-SNAPSHOT; test whether it is compatible with the 1.2.2 installed on the servers.
Upload storm-starter-1.2.3-SNAPSHOT.jar to a server and run one of the demos:
[hadoop@node6 main]$ /main/apache-storm-1.2.2/bin/storm jar storm-starter-1.2.3-SNAPSHOT.jar org.apache.storm.starter.WordCountTopology word-count Running: /main/server/jdk1.8.0_11/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/main/apache-storm-1.2.2 -Dstorm.log.dir=/main/apache-storm-1.2.2/logs -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /main/apache-storm-1.2.2/*:/main/apache-storm-1.2.2/lib/*:/main/apache-storm-1.2.2/extlib/*:storm-starter-1.2.3-SNAPSHOT.jar:/main/apache-storm-1.2.2/conf:/main/apache-storm-1.2.2/bin -Dstorm.jar=storm-starter-1.2.3-SNAPSHOT.jar -Dstorm.dependency.jars= -Dstorm.dependency.artifacts={} org.apache.storm.starter.WordCountTopology word-count 955 [main] WARN o.a.s.u.Utils - STORM-VERSION new 1.2.2 old null 991 [main] INFO o.a.s.StormSubmitter - Generated ZooKeeper secret payload for MD5-digest: -7683379793985025786:-5178094576454792625 1122 [main] INFO o.a.s.u.NimbusClient - Found leader nimbus : node6:6627 1182 [main] INFO o.a.s.s.a.AuthUtils - Got AutoCreds [] 1190 [main] INFO o.a.s.u.NimbusClient - Found leader nimbus : node6:6627 1250 [main] INFO o.a.s.StormSubmitter - Uploading dependencies - jars... 1251 [main] INFO o.a.s.StormSubmitter - Uploading dependencies - artifacts... 1251 [main] INFO o.a.s.StormSubmitter - Dependency Blob keys - jars : [] / artifacts : [] 1289 [main] INFO o.a.s.StormSubmitter - Uploading topology jar storm-starter-1.2.3-SNAPSHOT.jar to assigned location: /main/apache-storm-1.2.2/data/nimbus/inbox/stormjar-5ed677a5-9af1-4b5e-8467-83f637e00506.jar Start uploading file 'storm-starter-1.2.3-SNAPSHOT.jar' to '/main/apache-storm-1.2.2/data/nimbus/inbox/stormjar-5ed677a5-9af1-4b5e-8467-83f637e00506.jar' (106526828 bytes) [==================================================] 106526828 / 106526828 File 'storm-starter-1.2.3-SNAPSHOT.jar' uploaded to '/main/apache-storm-1.2.2/data/nimbus/inbox/stormjar-5ed677a5-9af1-4b5e-8467-83f637e00506.jar' (106526828 bytes) 2876 [main] INFO o.a.s.StormSubmitter - Successfully uploaded topology jar to assigned location: /main/apache-storm-1.2.2/data/nimbus/inbox/stormjar-5ed677a5-9af1-4b5e-8467-83f637e00506.jar 2876 [main] INFO o.a.s.StormSubmitter - Submitting topology word-count in distributed mode with conf {"storm.zookeeper.topology.auth.scheme":"digest","storm.zookeeper.topology.auth.payload":"-7683379793985025786:-5178094576454792625","topology.workers":3,"topology.debug":true} 2876 [main] WARN o.a.s.u.Utils - STORM-VERSION new 1.2.2 old 1.2.2 4091 [main] INFO o.a.s.StormSubmitter - Finished submitting topology: word-count
It shows the topology was submitted; now look at the UI:
The topology can also be stopped:
[hadoop@node6 main]$ /main/apache-storm-1.2.2/bin/storm kill word-count Running: /main/server/jdk1.8.0_11/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/main/apache-storm-1.2.2 -Dstorm.log.dir=/main/apache-storm-1.2.2/logs -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /main/apache-storm-1.2.2/*:/main/apache-storm-1.2.2/lib/*:/main/apache-storm-1.2.2/extlib/*:/main/apache-storm-1.2.2/extlib-daemon/*:/main/apache-storm-1.2.2/conf:/main/apache-storm-1.2.2/bin org.apache.storm.command.kill_topology word-count