Installing and Configuring Hadoop 2.8.x on CentOS 7: JDK Setup, Passwordless SSH Login, and Running the Hadoop Java Examples

01_note_The origins and architecture of Hadoop; deploying a Hadoop cluster; the CDH family

Install the JDK from the tarball and configure environment variables

      tar -xzvf jdkxxx.tar.gz -C /usr/app/  (a custom app directory to hold installed software)

                java -version  (check the Java version currently on the system)

                rpm -qa | grep java  (list the installed Java packages and their dependencies)

                yum -y remove xxxx  (remove each package that grep found)

                Configure the environment variables in /etc/profile, then apply them with source /etc/profile
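                A minimal sketch of the /etc/profile additions, assuming the JDK landed in /usr/app/jdk1.8.0 (adjust to the actual extracted directory name):

                        export JAVA_HOME=/usr/app/jdk1.8.0
                        export PATH=$PATH:$JAVA_HOME/bin
                        export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar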

                Note:

         Disable the firewall to avoid unexpected errors (CentOS 7 uses systemctl, see below)

                                sudo service iptables stop  (CentOS 6 style; stops the service now)

                                sudo chkconfig iptables off  (disables iptables at boot; this is not a status check)

         chkconfig  (list the startup configuration of all services)

Passwordless access between hosts (mind the direction: copy the RSA public key from A to B, then A can access B without a password)

                ssh-keygen -t rsa

                scp .ssh/id_rsa.pub henry@10.0.0.12:/home/henry/.ssh/authorized_keys  (note: this overwrites any existing authorized_keys on the target)

                ssh-copy-id -i id_rsa.pub -p 22 henry@10.0.0.x  (appends the key to authorized_keys; repeat from several machines so all of them get passwordless access)

                id_rsa >>  private key

                id_rsa.pub >>  public key

                The local machine's own public key also needs to go into its authorized_keys

                Passwordless login by hostname and by IP may not be interchangeable; test both to see which succeeds
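                A minimal sketch of the self-authorization step and the permissions sshd expects (wrong modes make sshd silently keep asking for a password):

                        cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # authorize the local key, e.g. for ssh to localhost
                        chmod 700 ~/.ssh                                  # sshd rejects keys in group/world-writable dirs
                        chmod 600 ~/.ssh/authorized_keys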

                Note:

         scp remote copy: scp -r /usr/jdk1.8.0 henry@10.0.0.12:/usr/  (-r also copies directory contents)

         Watch out for Permission denied: writing as a normal user to a path you have no write permission on, such as /usr, fails.

         The fix is to write as root@xxxx, or to write under /home/user instead

         /etc/hosts maps hostnames to IP addresses (it must be configured on all master/slave hosts involved)
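         A sketch of the /etc/hosts entries for the test environment at the end of these notes; master.henry and the first two IPs appear elsewhere here, while the slave hostnames and the third IP are assumptions:

                10.0.0.11  master.henry
                10.0.0.12  slave01.henry
                10.0.0.13  slave02.henry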

 

Install and configure Hadoop

                Download hadoop-2.8.0.tar.gz and extract it to /home/henry

                cd etc/hadoop under the Hadoop home (the config directory in 2.x; 1.x uses conf/)

                References for 2.x:

         http://www.cnblogs.com/edisonchou/p/5929094.html

                                http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html  

                vi hadoop-env.sh to set JAVA_HOME

                vi core-site.xml (a minimal core-site.xml/hdfs-site.xml sketch follows the slaves step below)

                vi hdfs-site.xml

                vi yarn-site.xml

                vi mapred-site.xml (first create it from the template: cp mapred-site.xml.template mapred-site.xml)

                vi masters (set the namenode hostname; in 2.x you may only need to configure slaves)

                vi slaves (set the datanode hostnames; remember to remove localhost, otherwise the master itself also acts as a datanode)
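                The minimal sketch promised above, for this two-datanode setup; the hostname, port, and tmp path are assumptions rather than values recorded in these notes:

                        core-site.xml:
                        <configuration>
                          <property>
                            <name>fs.defaultFS</name>  <!-- 2.x property name; 1.x used fs.default.name -->
                            <value>hdfs://master.henry:9000</value>
                          </property>
                          <property>
                            <name>hadoop.tmp.dir</name>  <!-- keep HDFS data out of /tmp -->
                            <value>/home/henry/hadoop/tmp</value>
                          </property>
                        </configuration>

                        hdfs-site.xml:
                        <configuration>
                          <property>
                            <name>dfs.replication</name>  <!-- at most 2 with two datanodes -->
                            <value>2</value>
                          </property>
                        </configuration>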

                sudo vi /etc/profile to set HADOOP_HOME
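                A sketch of the HADOOP_HOME lines, assuming the tarball was extracted to /home/henry/hadoop-2.8.0 as above:

                        export HADOOP_HOME=/home/henry/hadoop-2.8.0
                        export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin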

                hadoop namenode -format  (deprecated in 2.x; use hdfs namenode -format)

                Try starting Hadoop

                                sbin/start-dfs.sh  (may ask you to type yes to continue; wait until you are back at the $ prompt)

                                sbin/start-yarn.sh

                Verify startup: /usr/jdkxxx/bin/jps  (lists Java-related processes; the JVM Process Status tool)
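                If everything came up, jps on the master typically prints something like the following (PIDs will differ; DataNode and NodeManager show up on the slaves instead):

                        2481 NameNode
                        2690 SecondaryNameNode
                        2847 ResourceManager
                        3120 Jps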

                Web UI: http://10.0.0.11:50070, or http://master.henry:50070 once the hosts file is configured

                Note:

         Remember to disable the firewall: systemctl stop firewalld.service (CentOS 7)

                                Disable the firewall at boot: systemctl disable firewalld.service (CentOS 7)

                                Check the firewall status: systemctl status firewalld.service (CentOS 7)

         env - list the current environment variable settings

 

Setting up a Hadoop 2.8.1 build environment - http://blog.csdn.net/bjtudujunlin/article/details/50728581

                yum install svn  (!!! the host machine's firewall/antivirus, e.g. Avast, may block some network requests)

                yum install autoconf automake libtool cmake  (dependency packages)

                download and install maven

                Download and build Google protoc (it must be compiled from source, with the install path set via ./configure --prefix=/usr/app/protoc; build steps sketched below)
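                A sketch of the protoc build, assuming protobuf 2.5.0 (the version the Hadoop 2.x build expects) downloaded as a source tarball:

                        tar -xzvf protobuf-2.5.0.tar.gz
                        cd protobuf-2.5.0
                        ./configure --prefix=/usr/app/protoc
                        make && make install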

                Configure /etc/profile (add Maven and protoc to PATH)

                mvn -v  (should run ok)

                protoc --version  (should run ok)

    

      Check out the source with svn and build Hadoop

      svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/  (hadoop trunk, or /common/tags/x.x.x for an older version)

                mvn package -DskipTests -Pdist,native,docs -Dtar  (run from the source root; -Dtar also produces a .tar install package)

                pom.xml defines where the build output is placed:

                <outputDirectory>hadoop-dist/target</outputDirectory>

Running a sample job in the Hadoop 1.2.1 test environment

                mkdir input

                echo "Hello world!" > test1.txt  (echo标准输出到屏幕,> 输出到文件)

                echo "Hello hadoop!" > test2.txt

                hadoop_home/bin/hadoop fs -ls  (list files on HDFS)

                hadoop_home/bin/hadoop fs -put ../input ./in  (on HDFS the relative path ./ means /user/henry)

                hadoop_home/bin/hadoop fs -ls

                hadoop_home/bin/hadoop fs -ls ./in/*

                hadoop_home/bin/hadoop fs -cat ./in/test1.txt  (if this works, the upload to HDFS succeeded)

                bin/hadoop jar hadoop-examples-1.2.1.jar wordcount in out  (Hadoop's built-in sample that counts words)

                hadoop_home/bin/hadoop fs -ls  (an out directory has been created)

                hadoop_home/bin/hadoop fs -ls ./out

                hadoop_home/bin/hadoop fs -cat ./out/part-xxx  (a MapReduce job has run successfully)

                Note:

                                (if you get the error org.apache.hadoop.mapred.SafeModeException: JobTracker is in safe mode, leave safe mode:)

                                hadoop dfsadmin -safemode leave

               

Running a sample job in the Hadoop 2.8.1 test environment

    Note:

         It seems a MapReduce sample must be run first (e.g. hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar pi 5 5) before the hadoop fs -xxx commands (-ls, -put, ...) work; the likely reason is that the HDFS home directory /user/henry does not exist until something creates it, and hdfs dfs -mkdir -p /user/henry works too

                                hadoop fs resolves relative paths against /user/henry

                                (if you get the error: could only be replicated to 0 nodes, instead of 1)

                                !!! It is the firewall again: systemctl disable firewalld.service only takes effect after a reboot, so run systemctl stop firewalld.service first

         http://localhost:50070/dfshealth.jsp  for 1.2.1

         http://localhost:50070/dfshealth.html for 2.x (lets you browse the HDFS file system)

                   http://localhost:50030/jobtracker.jsp  for 1.2.1

         hadoop fs -put (upload) / -get (download) / -rm / -rmr  (in 2.x the command form is hdfs dfs -xxx)

         hdfs dfsadmin -report  (basic HDFS statistics)
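         A sketch of the 2.x equivalent of the 1.2.1 word count run above, assuming user henry and commands run from the Hadoop 2.8.1 home directory:

                hdfs dfs -mkdir -p /user/henry                    # create the HDFS home directory explicitly
                hdfs dfs -put ../input in                         # upload the local input directory as "in"
                hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar wordcount in out
                hdfs dfs -cat out/part-r-00000                    # reducer output file name in 2.x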

    Current test environment

         master-namenode

         slave01-datanode1

         slave02-datanode2

posted @ 2017-09-28 10:56 手擀面炒饭