Avatar Hadoop installation walkthrough: a hot-standby solution for Hadoop/HDFS
Installing Avatar Hadoop is a trial of patience and willpower; I am recording the hard-won lessons here:
1. Basic configuration: /etc/hosts, firewall, passwordless SSH (a sketch follows);
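A minimal sketch of the basic setup, assuming the two NameNode machines sit at the 192.168.1.1/192.168.1.2 addresses used throughout this post; the hostnames are hypothetical placeholders and the commands target the CentOS/RHEL 6 tooling implied by the el6 ucarp package below:

# Map hostnames to IPs on every node (hostnames are placeholders)
cat >> /etc/hosts <<'EOF'
192.168.1.1 avatarnode0
192.168.1.2 avatarnode1
EOF

# Disable the firewall (el6-style init scripts)
service iptables stop
chkconfig iptables off

# Passwordless SSH from each node to the others
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id avatarnode1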
2. Floating IP configuration:
Install the ucarp-1.5.2-1.el6.rf.x86_64.rpm package;
Copy ucarp.sh, vip-down.sh, and vip-up.sh to the /etc directory on both the primary and standby machines, and make them executable:
ucarp.sh:
#!/bin/sh
# Advertise the virtual IP 192.168.1.204 (ucarp --addr) from this node's fixed IP,
# running the up/down scripts as the master role is gained or lost.
ucarp --interface=eth0 --srcip=192.168.1.1 --vhid=24 --pass=mypassword \
      --addr=192.168.1.204 \
      --upscript=/etc/vip-up.sh --downscript=/etc/vip-down.sh
vip-down.sh:
#!/bin/sh
/sbin/ip addr del 192.168.1.204/24 dev eth0
vip-up.sh:
#!/bin/sh
# Claim the virtual IP when this node becomes the master.
/sbin/ip addr add 192.168.1.204/24 dev eth0

# If a standby AvatarNode runs locally, save the namespace and promote it to primary.
AvatarNode=$(/xxx/jdk/bin/jps | grep "AvatarNode")
if [ -n "$AvatarNode" ]; then
  Standby=$(/xxx/hadoop/bin/hadoop org.apache.hadoop.hdfs.AvatarShell -showAvatar | grep "Standby")
  if [ -n "$Standby" ]; then
    /xxx/hadoop/bin/hadoop dfsadmin -saveNamespace
    /xxx/hadoop/bin/hadoop org.apache.hadoop.hdfs.AvatarShell -setAvatar primary
  fi
fi
In ucarp.sh, change the first IP address (--srcip) to the machine's own fixed IP and the second (--addr) to the floating IP; in vip-down.sh and vip-up.sh, change the IP address to the floating IP;
Run ucarp.sh on the primary and standby machines; whichever starts first becomes the master, the later one the backup. ucarp.sh runs indefinitely, so it must be started in the background with nohup;
Once the floating IP is set up, inspect it with "ip address show". You can also test failover: shut down the first machine or unplug its network cable; if pinging the virtual IP still succeeds, the configuration works (see the sketch below);
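A minimal sketch of bringing ucarp up and checking it, assuming the three scripts sit in /etc as described above:

# On both machines (the first one to start becomes the master):
chmod +x /etc/ucarp.sh /etc/vip-up.sh /etc/vip-down.sh
nohup /etc/ucarp.sh > /var/log/ucarp.log 2>&1 &

# On the current master, the floating IP should be attached to eth0:
ip address show eth0 | grep 192.168.1.204

# From a third machine, the virtual IP should keep answering across a failover:
ping 192.168.1.204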
3. NFS configuration:
Purpose: use NFS for hot backup of the fsimage and edit log.
On the standby machine, export over NFS the parent directory of the paths configured in the dfs.name.dir.shared0/1 properties of hdfs-site.xml: edit /etc/exports and add "/xxx/avatarshare *(rw,sync,no_root_squash)" (a sketch of the export follows);
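A minimal sketch of the export on the standby machine, again assuming el6-style service scripts; tighten the "*" wildcard for anything beyond a lab setup:

# Export the share to all hosts
echo '/xxx/avatarshare *(rw,sync,no_root_squash)' >> /etc/exports

# Re-export and make sure the NFS services are up
exportfs -r
service rpcbind start
service nfs start

# Verify the directory is exported
showmount -e localhost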
4. Node parameter configuration:
Primary node (AvatarNode0):
1). core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- special parameters for avatarnode -->
  <property>
    <name>fs.default.name0</name>
    <value>hdfs://0.0.0.0:9000</value>
  </property>
  <property>
    <name>fs.default.name1</name>
    <value>hdfs://192.168.1.2:9000</value>
  </property>
  <!-- special parameters for avatarnode -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.204:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
Here, fs.default.name0 is the address of the NameNode RPC server on AvatarNode0; its value hdfs://0.0.0.0:9000 lets clients reach this NameNode through either the physical IP or the virtual IP. fs.default.name1 is the address of the NameNode RPC server on AvatarNode1;
2).hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<!-- special parameters for avatarnode -->
<configuration>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.name.dir.shared0</name>
    <value>/xxx/avatarshare/share0/namenode</value>
  </property>
  <property>
    <name>dfs.name.edits.dir.shared0</name>
    <value>/xxx/avatarshare/share0/editlog</value>
  </property>
  <property>
    <name>dfs.http.address0</name>
    <value>0.0.0.0:50070</value>
  </property>
  <property>
    <name>dfs.name.dir.shared1</name>
    <value>/xxx/avatarshare/share1/namenode</value>
  </property>
  <property>
    <name>dfs.name.edits.dir.shared1</name>
    <value>/xxx/avatarshare/share1/editlog</value>
  </property>
  <property>
    <name>dfs.http.address1</name>
    <value>192.168.1.2:50070</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:50070</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/xxx/local/namenode</value>
  </property>
  <property>
    <name>dfs.name.edits.dir</name>
    <value>/xxx/local/editlog</value>
  </property>
</configuration>
Besides the standard Hadoop settings, this file carries the Avatar-specific properties dfs.name.dir.shared0, dfs.name.edits.dir.shared0, dfs.name.dir.shared1, and dfs.name.edits.dir.shared1: the HDFS image and edit-log directories for AvatarNode0 and for AvatarNode1, respectively. Note that all of these directories sit inside the NFS share. When the primary NameNode runs on AvatarNode0, it writes its logs to dfs.name.edits.dir.shared0 and the standby NameNode on AvatarNode1 reads them from there; conversely, when the primary runs on AvatarNode1, it writes to dfs.name.edits.dir.shared1 and the standby on AvatarNode0 reads those logs.
Standby node (AvatarNode1):
1).core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- special parameters for avatarnode -->
  <property>
    <name>fs.default.name0</name>
    <value>hdfs://192.168.1.1:9000</value>
  </property>
  <property>
    <name>fs.default.name1</name>
    <value>hdfs://0.0.0.0:9000</value>
  </property>
  <!-- special parameters for avatarnode -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.204:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
2).hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<!-- special parameters for avatarnode -->
<configuration>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.name.dir.shared0</name>
    <value>/xxx/avatarshare/share0/namenode</value>
  </property>
  <property>
    <name>dfs.name.edits.dir.shared0</name>
    <value>/xxx/avatarshare/share0/editlog</value>
  </property>
  <property>
    <name>dfs.http.address0</name>
    <value>192.168.1.1:50070</value>
  </property>
  <property>
    <name>dfs.name.dir.shared1</name>
    <value>/xxx/avatarshare/share1/namenode</value>
  </property>
  <property>
    <name>dfs.name.edits.dir.shared1</name>
    <value>/xxx/avatarshare/share1/editlog</value>
  </property>
  <property>
    <name>dfs.http.address1</name>
    <value>0.0.0.0:50070</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/xxx/local/namenode</value>
  </property>
  <property>
    <name>dfs.name.edits.dir</name>
    <value>/xxx/local/editlog</value>
  </property>
</configuration>
DataNode nodes:
1).core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- special parameters for avatarnode -->
  <property>
    <name>fs.default.name0</name>
    <value>hdfs://192.168.1.1:9000</value>
  </property>
  <property>
    <name>fs.default.name1</name>
    <value>hdfs://192.168.1.2:9000</value>
  </property>
  <!-- special parameters for avatarnode -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.204:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
2).hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<!-- special parameters for avatarnode -->
<configuration>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.http.address0</name>
    <value>192.168.1.1:50070</value>
  </property>
  <property>
    <name>dfs.http.address1</name>
    <value>192.168.1.2:50070</value>
  </property>
  <!-- special parameters for avatarnode -->
  <property>
    <name>dfs.http.address</name>
    <value>192.168.1.204:50070</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data0,/data1</value>
  </property>
</configuration>
5. Startup:
1). On the primary machine, run "mount -v -t nfs -o tcp,soft,retry=2,timeo=2,rsize=32768,wsize=32768 192.168.1.x:/xxx/avatarshare /xxx/avatarshare" to mount the standby node's persistence directory;
2). On the primary node, run "$HADOOP_HOME/bin/hadoop namenode -format"; then copy the namenode and editlog directories under /xxx/local into the NFS shared directories share0 and share1; finally run "$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.server.namenode.AvatarNode -zero -format" to format it as AvatarNode0;
3). On the standby node, run "$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.server.namenode.AvatarNode -one -format" to format it;
4). Start the primary node: "$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.server.namenode.AvatarNode -zero";
5). Start the standby node: "$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.server.namenode.AvatarNode -one -standby -sync";
6). Start the DataNode on each data node: "$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.server.datanode.AvatarDataNode"; see the verification sketch below;
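A small sanity-check sketch for after startup; AvatarShell and dfsadmin already appear above, and the rest is plain shell:

# The NFS mount from step 1) should still be live on the primary:
mount | grep avatarshare

# Ask each AvatarNode for its current role (Primary or Standby):
$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.AvatarShell -showAvatar

# DataNodes should be registered and report capacity via the virtual IP:
$HADOOP_HOME/bin/hadoop dfsadmin -report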
6. The process is grueling: I fumbled around for three days back then, and a version mismatch nearly broke me. Ways to approach the common problems:
1). The shared-path problem in hdfs-site.xml: test your own understanding of the mount layout, rewrite the 0/1 settings the way you think they should be, and eventually the arrangement will click;
2). The most maddening problem is formatting: I reformatted again and again to no avail, and with a mismatched version finally settled on a workaround: start the primary node first, then format the standby node and start it. Bear in mind this is strictly a last resort for when the steps above fail (a sketch follows after the next item);
3). Use "netstat -anp | grep myport" to check whether the service you are starting actually came up. A frequent symptom is the address getting bound on IPv6. Two fixes: either disable IPv6 entirely, or add "export HADOOP_OPTS=\"-Djava.net.preferIPv4Stack=true\"" to hadoop-env.sh so the JVM uses IPv4 (look closely: it is preferIPv4Stack, not preferlIPv4Stack; that extra "l" cost me the better part of a day and no small amount of sanity). See the sketch below;
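For the record, a sketch of the last-resort sequence from item 2), reusing the commands from section 5; whether it is safe for your data depends on your build, so treat it as a workaround rather than a procedure:

# Start the primary first, skipping the usual formatting order:
$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.server.namenode.AvatarNode -zero

# Then format the standby and bring it up:
$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.server.namenode.AvatarNode -one -format
$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.server.namenode.AvatarNode -one -standby -sync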
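And a sketch of the IPv6 diagnosis and fix from item 3); 9000 stands in for whichever port your service should be holding:

# A listener under "tcp6"/":::" instead of "tcp"/"0.0.0.0" means the JVM bound to IPv6:
netstat -anp | grep 9000

# Force IPv4 in hadoop-env.sh (mind the spelling: preferIPv4Stack):
echo 'export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"' >> $HADOOP_HOME/conf/hadoop-env.sh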
7. If you need help, send mail to lifeonhadoop@gmail.com. I will also dig up my earlier materials and keep refining this write-up, so that everyone can follow the steps and succeed on the first try!