Setting Up Hadoop

Overall architecture

  hadoop1 runs the NameNode, SecondaryNameNode, and ResourceManager; all three machines (hadoop1, node-0001, node-0002) run a DataNode and a NodeManager.

Official documentation

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

 

1. Software Download

  Java: version "1.8.0_141"

  Hadoop: https://hadoop.apache.org/

    Download hadoop-3.2.0.tar.gz

2. Environment Preparation

  The following steps are required on all three machines.

  1. Create a hadoop user.

  2. su - hadoop, generate an SSH key pair on the first machine, and append the public key to /home/hadoop/.ssh/authorized_keys on all three machines (a sketch follows below).
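
  A minimal sketch of the key setup, run as the hadoop user; ssh-copy-id is assumed to be available, and it appends the public key to the target's authorized_keys:

$ ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa    # generate a key pair with no passphrase
$ ssh-copy-id hadoop@hadoop1                  # copy the public key to every machine,
$ ssh-copy-id hadoop@node-0001                # including the local one
$ ssh-copy-id hadoop@node-0002
$ ssh hadoop@node-0001 hostname               # verify passwordless login works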

  3. Add the hosts to /etc/hosts.

$ cat /etc/hosts
172.17.5.204 hadoop1
172.17.6.162 node-0001
172.17.5.173 node-0002

  4. Set the Hadoop environment variables.

$ cat /etc/profile
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
$ source /etc/profile

  

3. Installation and Configuration

  Upload the Hadoop tarball to /opt on the first machine and extract it (a sketch follows below).
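
  A minimal extraction sketch; the tarball is assumed to sit in /opt, and the rename matches the HADOOP_HOME=/opt/hadoop set above:

$ cd /opt
$ tar -zxf hadoop-3.2.0.tar.gz         # unpacks into /opt/hadoop-3.2.0
$ mv hadoop-3.2.0 hadoop               # match HADOOP_HOME=/opt/hadoop
$ chown -R hadoop:hadoop /opt/hadoop   # as root, so the hadoop user owns the tree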

[hadoop@ecs-6735-0328886 hadoop]$ cd /opt/hadoop/etc/hadoop
[hadoop@ecs-6735-0328886 hadoop]$ ls
capacity-scheduler.xml            kms-log4j.properties
configuration.xsl                 kms-site.xml
container-executor.cfg            log4j.properties
core-site.xml                     mapred-env.cmd
hadoop-env.cmd                    mapred-env.sh
hadoop-env.sh                     mapred-queues.xml.template
hadoop-metrics2.properties        mapred-site.xml
hadoop-policy.xml                 shellprofile.d
hadoop-user-functions.sh.example  ssl-client.xml.example
hdfs-site.xml                     ssl-server.xml.example
httpfs-env.sh                     user_ec_policies.xml.template
httpfs-log4j.properties           workers
httpfs-signature.secret           yarn-env.cmd
httpfs-site.xml                   yarn-env.sh
kms-acls.xml                      yarnservice-log4j.properties
kms-env.sh                        yarn-site.xml

  Configure Java

$ cat hadoop-env.sh | grep JAVA_HOME
export JAVA_HOME="/opt/jdk"

  Verify

$ hadoop version
Hadoop 3.2.0
Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Compiled by sunilg on 2019-01-08T06:08Z
Compiled with protoc 2.5.0
From source with checksum d3f0795ed0d9dc378e2c785d3668f39
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-3.2.0.jar

  Configuration files

  core-site.xml

$ cat core-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop1:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/tmp</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
</configuration>

  hdfs-site.xml

$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop1:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
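
  start-yarn.sh in step 5 reads yarn-site.xml. A minimal sketch for this three-node layout; the keys are standard YARN properties, but the values below are assumptions for this cluster, not a confirmed config:

$ cat yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- run the ResourceManager on hadoop1, alongside the NameNode -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop1</value>
    </property>
    <!-- auxiliary shuffle service required by MapReduce jobs -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>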

  workers (the hosts that run a DataNode and NodeManager; hadoop1 is listed too, so the NameNode host also stores data)

$ cat workers 
hadoop1
node-0001
node-0002

 

4. Distribute the Configuration

  Copy the configured package to the other two machines:

$ scp -r /opt/hadoop hadoop@node-0001:/opt
$ scp -r /opt/hadoop hadoop@node-0002:/opt

 

5. Startup

  Format HDFS (run once, on the first machine; this initializes the dfs.namenode.name.dir directory):

$ hdfs namenode -format

  Start the daemons

$ cd /opt/hadoop/sbin
$ ./start-dfs.sh
Starting namenodes on [hadoop1]
Starting datanodes
Starting secondary namenodes [hadoop1]
$ ./start-yarn.sh 
Starting resourcemanager
Starting nodemanagers

  Check the processes (the first jps output is from hadoop1, the second from a worker node):

$ jps
721 ResourceManager
450 SecondaryNameNode
876 NodeManager
32524 NameNode
18730 DataNode
10814 Jps
$ jps
23232 DataNode
13222 NodeManager
5979 Jps
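
  Beyond jps, the NameNode itself can confirm that all three DataNodes registered; hdfs dfsadmin -report is a standard command:

$ hdfs dfsadmin -report | grep 'Live datanodes'   # expect: Live datanodes (3):

  The NameNode web UI at http://hadoop1:50070 (the dfs.namenode.http-address set above) shows the same information.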

  Shutdown (stop-all.sh still works but is deprecated in favor of stop-dfs.sh plus stop-yarn.sh):

$ stop-all.sh

 

6. Testing

  Test uploading a file to HDFS:

[hadoop@ecs-6735-0328886 ~]$ cat > cst_test << EOF
> hello world!!
> It's a lovely day!
> EOF
[hadoop@ecs-6735-0328886 ~]$ hdfs dfs -mkdir /test1
[hadoop@ecs-6735-0328886 ~]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2022-06-22 15:31 /test1
[hadoop@ecs-6735-0328886 ~]$ hdfs dfs -put ./cst_test /test1
[hadoop@ecs-6735-0328886 ~]$ 
[hadoop@ecs-6735-0328886 ~]$ hdfs dfs -ls /test1
Found 1 items
-rw-r--r--   3 hadoop supergroup          1 2022-06-22 15:32 /test1/cst_test
[hadoop@ecs-6735-0328886 ~]$ 
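
  Reading the file back completes the round trip; -cat and -get are standard hdfs dfs subcommands:

$ hdfs dfs -cat /test1/cst_test                   # should print the two lines written above
$ hdfs dfs -get /test1/cst_test ./cst_test.copy   # or copy it back to the local filesystem

  If mapreduce.framework.name is set to yarn in mapred-site.xml, the bundled examples jar under share/hadoop/mapreduce also gives YARN a quick smoke test.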

 
