Hadoop HA Mode - Deployment Steps

1. On the host (192.168.15.47), create the directory /data/zkdocker/bigdata/download/ha
2. Create the Hadoop HA configuration files: core-site.xml, hdfs-site.xml, hosts_ansible.ini, mapred-site.xml, yarn-site.xml
3. On the host (192.168.15.47), download the software packages into /data/zkdocker/bigdata/download:
    hadoop-2.7.7.tar.gz, jdk-8u201-linux-x64.tar.gz, zookeeper-3.4.14.tar.gz
4. On the host, use Docker to create 10 CentOS 7 containers:
    docker run -it --name=hadoop01 --hostname=hadoop01 -v /data/zkdocker/bigdata:/tmp centos7
    docker run -it --name=hadoop02 --hostname=hadoop02 -v /data/zkdocker/bigdata:/tmp centos7
    docker run -it --name=hadoop03 --hostname=hadoop03 -v /data/zkdocker/bigdata:/tmp centos7
    docker run -it --name=hadoop04 --hostname=hadoop04 -v /data/zkdocker/bigdata:/tmp centos7
    docker run -it --name=hadoop05 --hostname=hadoop05 -v /data/zkdocker/bigdata:/tmp centos7
    docker run -it --name=hadoop06 --hostname=hadoop06 -v /data/zkdocker/bigdata:/tmp centos7
    docker run -it --name=hadoop07 --hostname=hadoop07 -v /data/zkdocker/bigdata:/tmp centos7
    docker run -it --name=slave1 --hostname=slave1 -v /data/zkdocker/bigdata:/tmp centos7
    docker run -it --name=slave2 --hostname=slave2 -v /data/zkdocker/bigdata:/tmp centos7
    docker run -it --name=slave3 --hostname=slave3 -v /data/zkdocker/bigdata:/tmp centos7
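    The ten near-identical commands above can also be run as one shell loop (a convenience sketch; -d is added so each container starts detached instead of attaching an interactive shell the way the one-off commands do):

        for name in hadoop01 hadoop02 hadoop03 hadoop04 hadoop05 hadoop06 hadoop07 \
                    slave1 slave2 slave3; do
            docker run -itd --name=$name --hostname=$name -v /data/zkdocker/bigdata:/tmp centos7
        done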
5. To enter a container from the host: docker exec -it hadoop01 /bin/bash
6. Install SSH in every container:
    yum install which -y
    yum install -y openssl openssh-server openssh-clients
    mkdir /var/run/sshd/
    sed -i "s/UsePAM.*/UsePAM no/g" /etc/ssh/sshd_config
    ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key
    ssh-keygen -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key
    ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key
    /usr/sbin/sshd -D &
7. Create a hadoop user in every container
    useradd hadoop
    passwd hadoop   (password: hadoop)
8. Configure /etc/hosts in all 10 containers, mapping each container's IP to its hostname (a sample is sketched below):
    vi /etc/hosts
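    A sketch of the expected /etc/hosts entries; the container IPs below are hypothetical (read the real ones with hostname -i inside each container, or docker inspect on the host):

        172.17.0.2   hadoop01
        172.17.0.3   hadoop02
        172.17.0.4   hadoop03
        172.17.0.5   hadoop04
        172.17.0.6   hadoop05
        172.17.0.7   hadoop06
        172.17.0.8   hadoop07
        172.17.0.9   slave1
        172.17.0.10  slave2
        172.17.0.11  slave3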
9. Switch to the hadoop account
    su - hadoop
10. Set up passwordless SSH login:
    In every container, run: ssh-keygen -t rsa
                             ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop01
    On hadoop01 and hadoop02, run: ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop01
                                   ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop02
                                   and so on for the remaining hosts...
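    The per-host copies can be collapsed into a loop; run it once inside each container as hadoop (a sketch, assuming all ten hostnames resolve via step 8):

        for h in hadoop01 hadoop02 hadoop03 hadoop04 hadoop05 hadoop06 hadoop07 \
                 slave1 slave2 slave3; do
            ssh-copy-id -i ~/.ssh/id_rsa.pub $h
        done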
11. On the host, use Ansible to copy the XML files
    yum install -y ansible
    vi /etc/ansible/hosts   (a sample inventory is sketched after this step)
    ansible all -m ping
    su - hadoop
    ansible all -m copy -a "src=/data/zkdocker/bigdata/download/ha/core-site.xml dest=/home/hadoop/hadoop/etc/hadoop/ owner=hadoop mode=644"
    ansible all -m copy -a "src=/data/zkdocker/bigdata/download/ha/hdfs-site.xml dest=/home/hadoop/hadoop/etc/hadoop/ owner=hadoop mode=644"
    ansible all -m copy -a "src=/data/zkdocker/bigdata/download/ha/mapred-site.xml dest=/home/hadoop/hadoop/etc/hadoop/ owner=hadoop mode=644"
    ansible all -m copy -a "src=/data/zkdocker/bigdata/download/ha/yarn-site.xml dest=/home/hadoop/hadoop/etc/hadoop/ owner=hadoop mode=644"
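    What the inventory (/etc/ansible/hosts, or the hosts_ansible.ini created in step 2) might contain; the group name and the ansible_user variable are assumptions:

        [ha]
        hadoop[01:07] ansible_user=hadoop
        slave[1:3]    ansible_user=hadoop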
12. In every container, install Java and Hadoop as the hadoop user; on slave1-3 also install ZooKeeper
    mkdir /home/hadoop/javalib18
    cd /home/hadoop/javalib18
    tar -zxf /tmp/download/jdk-8u201-linux-x64.tar.gz
    mv jdk1.8.0_201/ jdk
    cd ..
    tar -zxf /tmp/download/hadoop-2.7.7.tar.gz
    mv hadoop-2.7.7/ hadoop
    tar -zxf /tmp/download/zookeeper-3.4.14.tar.gz
    mv zookeeper-3.4.14/ zookeeper
13. Configure ~/.bashrc in every container
    vi ~/.bashrc
    export JAVA_HOME=/home/hadoop/javalib18/jdk
    export ZOOKEEPER_HOME=/home/hadoop/zookeeper
    export HADOOP_HOME=/home/hadoop/hadoop
    export PATH=$PATH:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
14. Source the file to apply the settings
    source ~/.bashrc
15. Start ZooKeeper on slave1, slave2, and slave3 (a configuration sketch follows this step)
    zkServer.sh start
    zkServer.sh status
    Success check: two followers, one leader
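    The notes don't show the ZooKeeper configuration itself, yet step 15 needs a zoo.cfg and a myid file on each slave. A minimal sketch, assuming the install path from step 12 and a hypothetical data directory /home/hadoop/zookeeper/data:

        # on slave1, slave2 and slave3, as hadoop
        mkdir -p /home/hadoop/zookeeper/data
        cat > /home/hadoop/zookeeper/conf/zoo.cfg <<'EOF'
        tickTime=2000
        initLimit=10
        syncLimit=5
        dataDir=/home/hadoop/zookeeper/data
        clientPort=2181
        server.1=slave1:2888:3888
        server.2=slave2:2888:3888
        server.3=slave3:2888:3888
        EOF
        echo 1 > /home/hadoop/zookeeper/data/myid    # use 2 on slave2, 3 on slave3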
16. Start a journalnode on slave1, slave2, and slave3
    hadoop-daemon.sh start journalnode
    ps ajx|grep java|awk '{print $11}'|cut -d _ -f 2
    Success check: three journalnode processes
17. In the hadoop01 container, format the first namenode:
    hdfs namenode -format
    hadoop-daemon.sh start namenode
    Success check: one namenode process
18. In the hadoop02 container, sync the metadata from the first namenode
    hdfs namenode -bootstrapStandby
    hadoop-daemon.sh start namenode
    Success check: two namenode processes
19. Check the namenodes in the web UI
    http://hadoop01:50070  --> both still in standby state
    http://hadoop02:50070  --> both still in standby state
20. In the hadoop01 container, manually switch nn1 to active
    hdfs haadmin -transitionToActive nn1  --> fails: with automatic failover enabled, a manual transition is refused unless forced
    hdfs haadmin -transitionToActive --forcemanual nn1
    Success check: http://hadoop01:50070  --> active
                   http://hadoop02:50070  --> still standby

                   hdfs haadmin -getServiceState nn1  --> active
                   hdfs haadmin -getServiceState nn2  --> standby
21. Format the automatic-failover znode in ZooKeeper
    hdfs zkfc -formatZK
    On slave1, run zkCli.sh and check:
    ls /
    [zookeeper, hadoop-ha]
22. Start the cluster from the hadoop01 container
    start-dfs.sh

    Processes now running in each container:
    hadoop01    namenode zkfc
    hadoop02    namenode zkfc
    hadoop03
    hadoop04
    hadoop05    datanode
    hadoop06    datanode
    hadoop07    datanode
    slave1      journalnode zookeeper  
    slave2      journalnode zookeeper
    slave3      journalnode zookeeper
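    A quick way to confirm the table above from any node (a sketch that relies on the passwordless SSH from step 10; if the PATH from step 13 is not picked up over non-interactive SSH, call $JAVA_HOME/bin/jps instead):

        for h in hadoop01 hadoop02 hadoop03 hadoop04 hadoop05 hadoop06 hadoop07 \
                 slave1 slave2 slave3; do
            echo "== $h =="; ssh $h 'jps | grep -v Jps'
        done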
     

23. Verify high availability by killing nn1:
    In hadoop01, kill -9 the namenode's PID
    hdfs haadmin -getServiceState nn1 --> fails (nn1 is down)
    hdfs haadmin -getServiceState nn2 --> active
24. To bring nn1 back, sync from nn2 and start the namenode again:
    hdfs namenode -bootstrapStandby
    hadoop-daemon.sh start namenode

25. Use Ansible to establish mutual SSH trust in bulk (did not work as intended, no effect; as written it only installs the control node's public key on every host, not every host's key on every other host)
    playbook: pushssh.yaml
---
  - hosts: all
    user: hadoop
    tasks:
     - name: ssh-copy
       authorized_key: user=hadoop key="{{ lookup('file', '/home/hadoop/.ssh/id_rsa.pub') }}"

    Run: ansible-playbook pushssh.yaml

26. Start YARN HA
    In the hadoop03 container: start-yarn.sh
    In the hadoop04 container: start-yarn.sh
27. Check the ResourceManager states
    bin/yarn rmadmin -getServiceState rm1   --> active
    bin/yarn rmadmin -getServiceState rm2   --> standby

    Kill rm1 in the hadoop03 container
    bin/yarn rmadmin -getServiceState rm1   --> offline (request fails)
    bin/yarn rmadmin -getServiceState rm2   --> active

    Restart rm1
    In the hadoop03 container: sbin/yarn-daemon.sh start resourcemanager
    bin/yarn rmadmin -getServiceState rm1   --> standby
    bin/yarn rmadmin -getServiceState rm2   --> active
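    With both HDFS and YARN HA up, a sample MapReduce job makes a reasonable end-to-end check (a sketch using the examples jar that ships with Hadoop 2.7.7):

        yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 5 10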

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadoop/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>slave1:2181,slave2:2181,slave3:2181</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>hadoop01:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>hadoop01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>hadoop02:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>hadoop02:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://slave1:8485;slave2:8485;slave3:8485/ns1</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/hadoop/journaldata</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <!-- log aggregation -->
        <property>
                <name>yarn.log-aggregation-enable</name>
                <value>true</value>
        </property>
        <!-- job history log server -->
        <property>
                <name>yarn.log.server.url</name>
                <value>http://hadoop01:19888/jobhistory/logs/</value>
        </property>

        <property>
                <name>yarn.log-aggregation.retain-seconds</name>
                <value>86400</value>
        </property>

        <!-- enable ResourceManager HA -->
        <property>
                <name>yarn.resourcemanager.ha.enabled</name>
                <value>true</value>
        </property>

        <!-- declare the addresses of the two ResourceManagers -->
        <property>
                <name>yarn.resourcemanager.cluster-id</name>
                <value>cluster-yarn1</value>
        </property>

        <property>
                <name>yarn.resourcemanager.ha.rm-ids</name>
                <value>rm1,rm2</value>
        </property>

        <property>
                <name>yarn.resourcemanager.hostname.rm1</name>
                <value>hadoop03</value>
        </property>

        <property>
                <name>yarn.resourcemanager.hostname.rm2</name>
                <value>hadoop04</value>
        </property>

        <!-- specify the ZooKeeper quorum address -->
        <property>
                <name>yarn.resourcemanager.zk-address</name>
                <value>slave1:2181,slave2:2181,slave3:2181</value>
        </property>

        <!-- enable automatic recovery -->
        <property>
                <name>yarn.resourcemanager.recovery.enabled</name>
                <value>true</value>
        </property>

        <!-- store ResourceManager state in the ZooKeeper cluster -->
        <property>
                <name>yarn.resourcemanager.store.class</name>
                <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
        </property>
</configuration>

 

 
