Hadoop (1): MapReduce demo

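The basic MapReduce programming model: split one large task into many small tasks, then combine their results.
MapReduce runs in two phases: the map phase splits the work; the reduce phase aggregates.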
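
The two phases can be illustrated without Hadoop at all. The following is a minimal in-memory sketch of a word count, not Hadoop's own implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this illustration. It shows the map step emitting (word, 1) pairs, a grouping step that collects and orders the pairs by key, and the reduce step summing each group.

```python
from collections import defaultdict


def map_phase(line):
    """Map: split one input line into (word, 1) pairs."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key and order the keys, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: aggregate all values that share one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(key, values) for key, values in shuffle(pairs)]


if __name__ == "__main__":
    # Hypothetical input, two lines of text.
    for word, count in word_count(["hello hadoop", "hello mapreduce"]):
        print(f"{word}\t{count}")
```

In a real job the map and reduce calls run in parallel on different machines; only the overall data flow is the same.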

Installing the Hadoop environment

Installation:
1. Unpack: tar -zxvf hadoop-2.7.3.tar.gz -C /root/training/
2. Set the environment variables: vi ~/.bash_profile
		HADOOP_HOME=/root/training/hadoop-2.7.3
		export HADOOP_HOME

		PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
		export PATH

	Apply the environment variables: source ~/.bash_profile

Section 1: Hadoop directory layout

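Section 2: Hadoop local mode
	1. Characteristics: there is no HDFS; this mode is only for testing MapReduce programs.
	2. Edit hadoop-env.sh: run echo $JAVA_HOME to find the JDK install path (xx), then replace export JAVA_HOME=${JAVA_HOME} with export JAVA_HOME=xx.

	   Change line 25 to: export JAVA_HOME=/usr/java/jdk8u202-b08 (in vi, press Esc and type :set number to show line numbers)

	3. Demo: $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
		Command: hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount ~/data/hadoop/input/test.txt ~/data/hadoop/output/wc
		Log: 19/09/16 10:45:00 INFO mapreduce.Job:  map 100% reduce 100%
		View the result:
			cd ~/data/hadoop/output/wc
			ls
			(the directory holds part-r-00000, the result set, and _SUCCESS, which marks the status of the job)
			more part-r-00000

		Note: MapReduce applies a default sort order; the output is sorted by key.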
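
The default ordering can be checked on the result file. The sketch below assumes that the wordcount example writes one `word<TAB>count` pair per line and that the text keys are ordered by their bytes, so uppercase letters come before lowercase ones. The helper names and the sample content are hypothetical.

```python
def parse_wordcount_output(text):
    """Parse 'word<TAB>count' lines into a list of (word, count) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        word, count = line.rsplit("\t", 1)
        rows.append((word, int(count)))
    return rows


def keys_are_sorted(rows):
    """True if the keys appear in ascending order of their UTF-8 bytes."""
    keys = [word.encode("utf-8") for word, _ in rows]
    return keys == sorted(keys)


# Hypothetical content of a part-r-00000 file.
sample = "Hadoop\t1\nhello\t2\nworld\t1\n"
rows = parse_wordcount_output(sample)
print(rows)
print("sorted by key:", keys_are_sorted(rows))
```

To check a real result, the text of part-r-00000 would be read from disk and passed to `parse_wordcount_output` in place of `sample`.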

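Section 3: Hadoop pseudo-distributed mode
	1. Characteristics: provides all Hadoop features by simulating a distributed environment on a single machine.
		(1) HDFS: master node: NameNode; data node: DataNode
		(2) YARN: the container framework that runs MapReduce programs
			master node: ResourceManager
			worker node: NodeManager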
						
	2. Steps:
	(1) hdfs-site.xml
		<!-- HDFS replication factor -->
		<property>
		  <name>dfs.replication</name>
		  <value>1</value>
		</property>

		<!-- Whether to enforce permission checks -->
		<property>
		  <name>dfs.permissions</name>
		  <value>false</value>
		</property>	

	(2) core-site.xml
		<!-- Address of the HDFS NameNode -->
		<property>
		  <name>fs.defaultFS</name>
		  <value>hdfs://192.168.88.11:9000</value>
		</property>

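		<!-- Base directory under which HDFS keeps its data (the NameNode and DataNode directories default to subdirectories of it) -->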
		<property>
		  <name>hadoop.tmp.dir</name>
		  <value>/root/training/hadoop-2.7.3/tmp</value>
		</property>		
		
		
	(3) mapred-site.xml
		<!-- Framework that runs MapReduce jobs -->
		<property>
		  <name>mapreduce.framework.name</name>
		  <value>yarn</value>
		</property>		
		
	(4) yarn-site.xml
		<!-- Address of the ResourceManager -->
		<property>
		  <name>yarn.resourcemanager.hostname</name>
		  <value>192.168.88.11</value>
		</property>

		<!-- How NodeManagers run tasks: the MapReduce shuffle auxiliary service -->
		<property>
		  <name>yarn.nodemanager.aux-services</name>
		  <value>mapreduce_shuffle</value>
		</property>		
		
	(5) Format the NameNode
		hdfs namenode -format
		Log: Storage directory /root/training/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.


	(6) Start: start-all.sh
		(*) HDFS: stores the data
		(*) YARN: runs the computation

	(7) Access:
		(*) Command line
		(*) Java API
		(*) Web console:
			HDFS: http://192.168.88.11:50070
			YARN: http://192.168.88.11:8088

At this point the services should be reachable from outside the host.
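
Whether the two web consoles actually accept connections can be tested from another machine before opening a browser. This is a minimal sketch using only the Python standard library; the helper name `port_is_open` is made up here, and the host and ports are the ones used in this article's configuration.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


if __name__ == "__main__":
    host = "192.168.88.11"  # address used in this article's configuration
    for name, port in [("HDFS web console", 50070), ("YARN web console", 8088)]:
        state = "reachable" if port_is_open(host, port) else "NOT reachable"
        print(f"{name} {host}:{port} is {state}")
```

A port reported as not reachable only says that the TCP connection failed; the cause may be the bind address, SELinux, or the firewall.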

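Troubleshooting: the web console pages cannot be reached at http://ip:port

Originally from (https://blog.csdn.net/hanwenshan123/article/details/78717782)

Problem 1: hdfs-site.xml settings

            Check the Java processes with jps: the Hadoop processes are running normally. (jps is a small tool shipped with the JDK that lists the current Java processes; the name can be read as short for Java Virtual Machine Process Status Tool.)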
            [root@node4 ~]# jps
                25059 SecondaryNameNode
                25347 ResourceManager
                25556 NodeManager
                24805 DataNode
                29269 Jps
                24633 NameNode
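            Checking the listening ports with netstat shows that every address in the Local Address column is either 127.0.0.1 or 0.0.0.0. A socket bound to 127.0.0.1 accepts connections only from the machine itself, so those services (9000 and 8088 below, for example) cannot be reached from outside; this is the key to the problem. A socket bound to 0.0.0.0 listens on all interfaces.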
                  [root@node4 ~]# netstat -ntlp
                        Active Internet connections (only servers)
                        Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
                        tcp        0      0 127.0.0.1:43759         0.0.0.0:*               LISTEN      24805/java          
                        tcp        0      0 0.0.0.0:50070        0.0.0.0:*               LISTEN      24633/java          
                        tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      12782/sshd          
                        tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      2325/master         
                        tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      24805/java          
                        tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      24805/java          
                        tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      24805/java          
                        tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      24633/java          
                        tcp        0      0 0.0.0.0:50090           0.0.0.0:*               LISTEN      25059/java          
                        tcp6       0      0 :::22                   :::*                    LISTEN      12782/sshd          
                        tcp6       0      0 127.0.0.1:8088          :::*                    LISTEN      25347/java          
                        tcp6       0      0 ::1:25                  :::*                    LISTEN      2325/master         
                        tcp6       0      0 :::13562                :::*                    LISTEN      25556/java          
                        tcp6       0      0 :::43451                :::*                    LISTEN      25556/java          
                        tcp6       0      0 127.0.0.1:8030          :::*                    LISTEN      25347/java          
                        tcp6       0      0 127.0.0.1:8031          :::*                    LISTEN      25347/java          
                        tcp6       0      0 127.0.0.1:8032          :::*                    LISTEN      25347/java          
                        tcp6       0      0 127.0.0.1:8033          :::*                    LISTEN      25347/java          
                        tcp6       0      0 :::8040                 :::*                    LISTEN      25556/java          
                        tcp6       0      0 :::8042                 :::*                    LISTEN      25556/java
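
Picking the loopback-only listeners out of such a listing by eye is error-prone. The sketch below does it mechanically, assuming the column layout of `netstat -ntlp` shown above; the function name `loopback_only_listeners` is made up for this illustration, and the sample rows are taken from the listing.

```python
def loopback_only_listeners(netstat_output):
    """Return (local address, program) pairs for LISTEN sockets bound to loopback."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        local = fields[3]
        host = local.rpartition(":")[0]  # everything before the last colon
        if host in ("127.0.0.1", "::1"):
            found.append((local, fields[6]))
    return found


# A few rows copied from the netstat output above.
sample = """\
tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      24633/java
tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      24633/java
tcp6       0      0 127.0.0.1:8088          :::*                    LISTEN      25347/java
tcp6       0      0 :::8042                 :::*                    LISTEN      25556/java
"""

for address, program in loopback_only_listeners(sample):
    print(address, program)
```

For these rows it reports ports 9000 and 8088; the sockets bound to 0.0.0.0 or :: are not listed because they listen on all interfaces.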
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml and add:
    <property>
        <name>dfs.namenode.http-address</name>
        <value>node4:50070</value>
    </property>
    or, using the IP address (the value is host:port, without a scheme):
     <property>
        <name>dfs.namenode.http-address</name>
        <value>192.168.88.11:50070</value>
    </property>
Restart HDFS, then check again with netstat -ntlp:
    [root@node4 ~]# netstat -ntlp
    Active Internet connections (only servers)
                        Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
                        tcp        0      0 127.0.0.1:43759         0.0.0.0:*               LISTEN      24805/java          
                        tcp        0      0 10.60.8.28:50070        0.0.0.0:*               LISTEN      24633/java          
                        tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      12782/sshd          
                        tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      2325/master         
                        tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      24805/java          
                        tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      24805/java          
                        tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      24805/java          
                        tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      24633/java          
                        tcp        0      0 0.0.0.0:50090           0.0.0.0:*               LISTEN      25059/java          
                        tcp6       0      0 :::22                   :::*                    LISTEN      12782/sshd          
                        tcp6       0      0 127.0.0.1:8088          :::*                    LISTEN      25347/java          
                        tcp6       0      0 ::1:25                  :::*                    LISTEN      2325/master         
                        tcp6       0      0 :::13562                :::*                    LISTEN      25556/java          
                        tcp6       0      0 :::43451                :::*                    LISTEN      25556/java          
                        tcp6       0      0 127.0.0.1:8030          :::*                    LISTEN      25347/java          
                        tcp6       0      0 127.0.0.1:8031          :::*                    LISTEN      25347/java          
                        tcp6       0      0 127.0.0.1:8032          :::*                    LISTEN      25347/java          
                        tcp6       0      0 127.0.0.1:8033          :::*                    LISTEN      25347/java          
                        tcp6       0      0 :::8040                 :::*                    LISTEN      25556/java          
                        tcp6       0      0 :::8042                 :::*                    LISTEN      25556/java

Problem 2: SELinux

In principle port 50070 should now be reachable, but it still is not. Checking SELinux shows that it is enabled.
- Check the SELinux status
            [root@node4 ~]# /usr/sbin/sestatus -v
                    SELinux status:                 enabled
                    SELinuxfs mount:                /sys/fs/selinux
                    SELinux root directory:         /etc/selinux
                    Loaded policy name:             targeted
                    Current mode:                   enforcing
                    Mode from config file:          enforcing
                    Policy MLS status:              enabled
                    Policy deny_unknown status:     allowed
                    Max kernel policy version:      28

                    Process contexts:
                    Current context:                unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
                    Init context:                   system_u:system_r:init_t:s0
                    /usr/sbin/sshd                  system_u:system_r:sshd_t:s0-s0:c0.c1023

                    File contexts:
                    Controlling terminal:           unconfined_u:object_r:user_devpts_t:s0
                    /etc/passwd                     system_u:object_r:passwd_file_t:s0
                    /etc/shadow                     system_u:object_r:shadow_t:s0
                    /bin/bash                       system_u:object_r:shell_exec_t:s0
                    /bin/login                      system_u:object_r:login_exec_t:s0
                    /bin/sh                         system_u:object_r:bin_t:s0 -> system_u:object_r:shell_exec_t:s0
                    /sbin/agetty                    system_u:object_r:getty_exec_t:s0
                    /sbin/init                      system_u:object_r:bin_t:s0 -> system_u:object_r:init_exec_t:s0
                    /usr/sbin/sshd                  system_u:object_r:sshd_exec_t:s0

Edit /etc/selinux/config, change SELINUX=enforcing to SELINUX=disabled, and reboot the server. Then try again. The SELinux status after the change:

    [root@node4 ~]# /usr/sbin/sestatus -v
    SELinux status:                 disabled

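Problem 3: firewall (opening the port)

After disabling SELinux the page is still unreachable, so check the firewall settings next (firewalld here, which manages the iptables rules):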

[root@node4 sbin]# firewall-cmd --state
    running
[root@node4 sbin]# firewall-cmd --get-service
    RH-Satellite-6 amanda-client amanda-k5-client bacula bacula-client bitcoin bitcoin-rpc bitcoin-testnet bitcoin-testnet-rpc ceph ceph-mon cfengine condor-collector ctdb dhcp dhcpv6 dhcpv6-client dns docker-registry dropbox-lansync elasticsearch freeipa-ldap freeipa-ldaps freeipa-replication freeipa-trust ftp ganglia-client ganglia-master high-availability http https imap imaps ipp ipp-client ipsec iscsi-target kadmin kerberos kibana klogin kpasswd kshell ldap ldaps libvirt libvirt-tls managesieve mdns mosh mountd ms-wbt mssql mysql nfs nrpe ntp openvpn ovirt-imageio ovirt-storageconsole ovirt-vmconsole pmcd pmproxy pmwebapi pmwebapis pop3 pop3s postgresql privoxy proxy-dhcp ptp pulseaudio puppetmaster quassel radius rpc-bind rsh rsyncd samba samba-client sane sip sips smtp smtp-submission smtps snmp snmptrap spideroak-lansync squid ssh synergy syslog syslog-tls telnet tftp tftp-client tinc tor-socks transmission-client vdsm vnc-server wbem-https xmpp-bosh xmpp-client xmpp-local xmpp-server

Allow port 50070 and reload the firewall:

[root@node4 sbin]# firewall-cmd --zone=public --add-port=50070/tcp --permanent
        success
[root@node4 sbin]# firewall-cmd --reload
        success

Result

Problem 4: YARN is unreachable on port 8088

Edit yarn-site.xml and add the following inside <configuration></configuration>:
     <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.88.11:8088</value>
     </property>
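
After several rounds of editing, it helps to confirm which values a site file really contains. The following is a small sketch that reads the `<property>` entries of a Hadoop-style configuration into a dictionary with the standard library XML parser; the function name `read_properties` is made up here, and the sample text repeats the yarn-site.xml values from this article.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the configuration."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Sample configuration using the values from this article.
sample = """<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.88.11</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.88.11:8088</value>
  </property>
</configuration>"""

props = read_properties(sample)
print(props["yarn.resourcemanager.webapp.address"])
```

For a file on disk, the text would be read first and passed in the same way; XML comments in the file are ignored by the parser.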

Fully distributed (cluster) mode

1. Copy the entire Hadoop installation directory to the other two machines

scp -r /home/xxxx/hadoop XXX@hadoop02:/home/xxxx/
scp -r /home/xxxx/hadoop XXX@hadoop03:/home/xxxx/

2. On the master, edit the slaves file so that it lists the hostnames of the worker nodes:

 hadoop01
 hadoop02
 hadoop03
posted @ 2019-09-18 09:24  cherishDouble