Data Warehouse: Hadoop (1)
1. Install Hadoop: pseudo-distributed HDFS deployment
2. Common hadoop fs commands
3. Where to find the configuration files in the official docs
4. Notes on the JDK, SSH, and the hosts file
1. Install Hadoop: pseudo-distributed HDFS deployment
1.1 Create the user and directories
```shell
[root@aliyun ~]# useradd hadoop
[root@aliyun ~]# su - hadoop
[hadoop@aliyun ~]$ mkdir app software sourcecode log tmp data lib
[hadoop@aliyun ~]$ ll
total 28
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 app         # unpacked releases (symlink targets)
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 data        # data
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 lib         # third-party jars
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 log         # log files
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 software    # tarballs
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 sourcecode  # source code builds
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 tmp         # temporary files
```
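The same directory layout can be created in one pass; a minimal sketch (a temporary base directory stands in for /home/hadoop here so the script is self-contained):

```shell
# create the hadoop user's working directories (names as in the listing above)
BASE=$(mktemp -d)            # in practice this would be /home/hadoop
cd "$BASE"
mkdir -p app software sourcecode log tmp data lib
ls -d app data lib log software sourcecode tmp
```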
1.2 Download/upload the tarball
```shell
[hadoop@aliyun ~]$ cd software/
[hadoop@aliyun software]$ wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2.tar.gz
```
1.3 Unpack and create a symlink
```shell
[hadoop@aliyun software]$ tar -xzvf hadoop-2.6.0-cdh5.16.2.tar.gz -C ../app/
... ... ...
[hadoop@aliyun software]$ cd ../app/
[hadoop@aliyun app]$ ln -s hadoop-2.6.0-cdh5.16.2/ hadoop
[hadoop@aliyun app]$ ll
total 4
lrwxrwxrwx  1 hadoop hadoop   23 Nov 28 11:36 hadoop -> hadoop-2.6.0-cdh5.16.2/
drwxr-xr-x 14 hadoop hadoop 4096 Jun  3 19:11 hadoop-2.6.0-cdh5.16.2
```
1.4 Environment requirements: install the JDK
```shell
[root@aliyun ~]# mkdir /usr/java
[root@aliyun ~]# cd /usr/java
[root@aliyun java]# rz -E
[root@aliyun java]# tar -xzvf jdk-8u144-linux-x64.tar.gz
[root@aliyun java]# chown -R root:root jdk1.8.0_144/
[root@aliyun java]# ln -s jdk1.8.0_144/ jdk
[root@aliyun java]# ll
total 4
lrwxrwxrwx 1 root root   13 Nov 28 12:01 jdk -> jdk1.8.0_144/
drwxr-xr-x 8 root root 4096 Jul 22  2017 jdk1.8.0_144
[root@aliyun java]# vim /etc/profile
#env
export JAVA_HOME=/usr/java/jdk
export PATH=$JAVA_HOME/bin:$PATH
[root@aliyun java]# source /etc/profile
[root@aliyun java]# which java
/usr/java/jdk/bin/java
```
1.5 Set JAVA_HOME explicitly
```shell
[hadoop@aliyun hadoop]$ vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk
```
The hosts file must map the hostname to the machine's IP address:

```shell
[root@aliyun java]# cat /etc/hosts
127.0.0.1    localhost localhost.localdomain localhost4 localhost4.localdomain4
::1          localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.39.48 aliyun
```
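You can confirm a hosts entry is actually in effect by querying the system resolver; a quick sketch (it checks `localhost` so it works on any machine, but on the server above you would query `aliyun`):

```shell
# getent consults /etc/hosts (via nsswitch), so it reflects the mapping above
getent hosts localhost
```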
1.6 Configuration files
etc/hadoop/core-site.xml:

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://aliyun:9000</value>
    </property>
</configuration>
```

etc/hadoop/hdfs-site.xml:

```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```
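One optional addition worth considering here (not part of the original setup): by default Hadoop keeps its working data under /tmp, which can be wiped on reboot. A hedged sketch of pinning it to the tmp directory created in 1.1 (the path is an assumption based on that layout):

```xml
<!-- optional: keep HDFS working data out of /tmp
     (path assumed from the directory layout in section 1.1) -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
</property>
```

If you add this, do it before formatting the NameNode, since the format step writes metadata under this directory.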
1.7 Passwordless SSH trust
In the hadoop user's home directory, run:

```shell
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
[hadoop@aliyun ~]$ ssh aliyun date
Thu Nov 28 12:15:08 CST 2019
```
1.8 Hadoop environment variables
```shell
[hadoop@aliyun ~]$ vi .bashrc
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
[hadoop@aliyun ~]$ source .bashrc
[hadoop@aliyun ~]$ which hadoop
~/app/hadoop/bin/hadoop
```
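The mechanism can be exercised against a throwaway rc file to see that PATH really picks up the Hadoop bin and sbin directories; a minimal sketch (the HADOOP_HOME path mirrors the symlink from 1.3 and need not exist for the PATH change itself to take effect):

```shell
# append the exports to a temporary file and source it, as .bashrc would be
RC=$(mktemp)
cat >> "$RC" <<'EOF'
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
EOF
. "$RC"
# the Hadoop bin directories now lead the search path
echo "$PATH" | tr ':' '\n' | head -2
```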
1.9 Format the NameNode
```shell
[hadoop@aliyun ~]$ hdfs namenode -format
```

Look for the line `has been successfully formatted.` in the output to confirm the format succeeded.
1.10 First start
```shell
[hadoop@aliyun ~]$ start-dfs.sh
[hadoop@aliyun ~]$ jps
10804 SecondaryNameNode
10536 NameNode
10907 Jps
10654 DataNode
```
Pitfall: the first start asks you to type `yes` to confirm the trust relationship. The confirmed host keys are stored in the known_hosts file under ~/.ssh:
```shell
[hadoop@aliyun .ssh]$ cat known_hosts
aliyun,172.16.39.48 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBCjHBKn/7LF5sfbae1OLkK5QoWm11Xn8RZs1JTc7K8v4RFum1OKIjArocvRjLOYPsq5ezYo8TlBHTrAgeUcvkBM=
localhost ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBCjHBKn/7LF5sfbae1OLkK5QoWm11Xn8RZs1JTc7K8v4RFum1OKIjArocvRjLOYPsq5ezYo8TlBHTrAgeUcvkBM=
0.0.0.0 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBCjHBKn/7LF5sfbae1OLkK5QoWm11Xn8RZs1JTc7K8v4RFum1OKIjArocvRjLOYPsq5ezYo8TlBHTrAgeUcvkBM=
```
If starting Hadoop later keeps prompting for a password, the likely cause is that known_hosts still records the host's old key while the key pair has since been regenerated; deleting the file (or just the stale entries) fixes it.
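Rather than deleting the whole file, the stale entry can be removed selectively with `ssh-keygen -R`; a sketch against a throwaway known_hosts file (the key material below is fake, for illustration only):

```shell
# build a sample known_hosts with two entries
KH=$(mktemp)
cat > "$KH" <<'EOF'
aliyun,172.16.39.48 ecdsa-sha2-nistp256 AAAAfakekeyforaliyun
localhost ecdsa-sha2-nistp256 AAAAfakekeyforlocalhost
EOF
# -R removes every entry for the given host; -f points at the file to edit
ssh-keygen -R aliyun -f "$KH"
# only the localhost entry remains
cat "$KH"
```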
1.11 Making the DN and SNN also start on ruozedata001
NN: controlled by fs.defaultFS in core-site.xml
DN: the slaves file
SNN (2NN): hdfs-site.xml
```xml
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>aliyun:50090</value> <!-- note the port number; it differs between old and new versions -->
</property>
<property>
    <name>dfs.namenode.secondary.https-address</name>
    <value>aliyun:50091</value> <!-- note the port number; it differs between old and new versions -->
</property>
```
2. Common hadoop fs commands
```shell
hadoop fs -mkdir /dir              # create a directory
hadoop fs -put <local> <hdfs>      # upload a local file to HDFS
hadoop fs -get <hdfs> <local>      # download a file from HDFS
hadoop fs -cat <hdfs-file>         # print a file's contents
hadoop fs -rm <hdfs-file>          # delete a file
hadoop fs -ls <hdfs-path>          # list a directory
```
3. Where to find the configuration files in the official docs

The official Hadoop documentation lists every configurable parameter and its default value in core-default.xml and hdfs-default.xml, linked from the documentation index page for each release.
4. Notes on the JDK, SSH, and the hosts file
The JDK and SSH are prerequisites for running Hadoop.
The hosts file stores the mapping between hostnames and IP addresses.
These are my own study notes, continuously updated and corrected.