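Setting up a Hadoop 2.4.1 pseudo-distributed environment
Note: everything is installed as a regular (non-root) user, so first allow that user to run commands with sudo.
0. For the initial network configuration of the virtual machine, see: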
http://www.cnblogs.com/qlqwjy/p/7783253.html
1. Allow the hadoop user to run commands with sudo
visudo
or
vim /etc/sudoers
Add the second line below:
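The entry to add is not shown here. Judging from the `sudo -l` output in this step, which reports `(ALL) ALL` for the hadoop user, it was presumably a standard full-access entry. A minimal sketch under that assumption; the helper name `sudoers_entry` is made up for illustration:

```shell
# Hypothetical helper (not from the original post): print a sudoers entry
# that lets the given user run any command as any user via sudo.
sudoers_entry() {
    printf '%s ALL=(ALL) ALL\n' "$1"
}

# Print the entry for the hadoop user; add it in the file opened by visudo,
# directly below the existing entry for root.
sudoers_entry hadoop
```

Editing through `visudo` rather than opening the file directly is the safer choice, because it checks the syntax before saving.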
Log in as the hadoop user and check:
[hadoop@localhost java]$ sudo -l   # list the commands the current user may run with sudo
Matching Defaults entries for hadoop on this host:
    requiretty, !visiblepw, always_set_home, env_reset,
    env_keep="COLORS DISPLAY HOSTNAME HISTSIZE INPUTRC KDEDIR LS_COLORS",
    env_keep+="MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS LC_CTYPE",
    env_keep+="LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES",
    env_keep+="LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE",
    env_keep+="LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY",
    secure_path=/sbin\:/bin\:/usr/sbin\:/usr/bin
User hadoop may run the following commands on this host:
    (ALL) ALL
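------------------------Install the Hadoop runtime environment; switch to the hadoop user----------------------
I upload all files with sftp. I recommend installing the Git tools, which bundle ssh, sftp and the like. Pay attention to whether your Linux is 32-bit or 64-bit: I first installed the 64-bit JDK, but my Linux turned out to be 32-bit, so that JDK would not run.
Check the word size:
uname -a
or
getconf LONG_BIT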
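To avoid the mismatch described above, the word size can be mapped to the JDK archive to download. A small sketch; the function name `jdk_arch_suffix` is an assumption of this example, and the suffixes `i586` and `x64` follow the naming of Oracle's JDK 7 Linux tarballs (the 32-bit one is the file used in the next step):

```shell
# Hypothetical helper: map the word size reported by `getconf LONG_BIT`
# to the architecture suffix used in the JDK 7 Linux tarball names.
jdk_arch_suffix() {
    case "$1" in
        32) echo "i586" ;;
        64) echo "x64" ;;
        *)  echo "unknown"; return 1 ;;
    esac
}

# Fall back to "unknown" if getconf is not available on this machine.
bits=$(getconf LONG_BIT 2>/dev/null || echo unknown)
echo "Word size: ${bits}; matching archive: jdk-7u65-linux-$(jdk_arch_suffix "$bits").tar.gz"
```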
1. Install the JDK
(1) After uploading the archive to the server, extract it
sudo tar -zxvf ./jdk-7u65-linux-i586.tar.gz
(2) Check the installation directory:
[hadoop@localhost jdk1.7.0_65]$ pwd
/opt/java/jdk1.7.0_65
(3) Configure the environment variables:
[hadoop@localhost jdk1.7.0_65]$ tail -4 ~/.bashrc
export JAVA_HOME=/opt/java/jdk1.7.0_65
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${PATH}
Reload the environment variables:
[hadoop@localhost jdk1.7.0_65]$ source ~/.bashrc
(4) Run java or javac to test:
[hadoop@localhost jdk1.7.0_65]$ java -vsersion   # mistyped here; the correct option is -version
Unrecognized option: -vsersion
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
[hadoop@localhost jdk1.7.0_65]$ javac -version
javac 1.7.0_65
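The check in step (4) can also be scripted, which is handy before starting Hadoop from another shell session. A minimal sketch; `is_jdk_home` is a name invented for this example:

```shell
# Hypothetical helper: succeed only if the given directory looks like a JDK
# home, i.e. it is non-empty and contains executable bin/java and bin/javac.
is_jdk_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ] && [ -x "$1/bin/javac" ]
}

if is_jdk_home "${JAVA_HOME:-}"; then
    echo "JAVA_HOME looks valid: $JAVA_HOME"
else
    echo "JAVA_HOME is unset or does not point to a JDK"
fi
```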
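2. Install Hadoop 2.4.1
(1) Upload the archive to the server
sftp> put hadoop-2.4.1.tar.gz
(2) Extract it
sudo tar -zxvf ./hadoop-2.4.1.tar.gz
(3) List the extracted directory:
[hadoop@localhost hadoop-2.4.1]$ ls
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin share
The Java jar files live under the share directory. It also contains a docs directory, which is not needed at runtime and can be deleted.
bin holds the executables
etc holds Hadoop's configuration files
lib and libexec hold the native libraries and helper scripts
sbin holds Hadoop's administration scripts (start and stop scripts)
(4) Edit the configuration files. In Hadoop 2.x they are located in $HADOOP_HOME/etc/hadoop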
- Edit hadoop-env.sh (set the JDK environment variable)
# line 27
export JAVA_HOME=/opt/java/jdk1.7.0_65
- Edit core-site.xml (in this and the following XML files, the properties go inside the <configuration> element)
<!-- The file system schema (URI) used by Hadoop, i.e. the address of the HDFS master (NameNode) -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>
<!-- The directory where Hadoop stores the files it produces at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/hadoop-2.4.1/data/</value>
</property>
- Edit hdfs-site.xml (it overrides the defaults from hdfs-default.xml)
<!-- The number of HDFS replicas -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
- Edit mapred-site.xml (MapReduce)
First rename mapred-site.xml.template to mapred-site.xml; otherwise Hadoop will not read it.
[hadoop@localhost hadoop]$ sudo mv ./mapred-site.xml.template ./mapred-site.xml
Then edit it:
<!-- Run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
- Edit yarn-site.xml (YARN)
<!-- The address of the YARN master (ResourceManager) -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
</property>
<!-- How reducers fetch their data -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
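The fragments above show only the `<property>` elements. As an illustration of what a complete file looks like, the following sketch writes a full core-site.xml with the same two values wrapped in `<configuration>`. The function name `write_core_site` is hypothetical, and the other three XML files would follow the same pattern:

```shell
# Hypothetical helper: write a complete core-site.xml into the directory
# given as $1. The values mirror the ones used in this post.
write_core_site() {
    cat > "$1/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/hadoop-2.4.1/data/</value>
  </property>
</configuration>
EOF
}

# Usage (as a user who may write to the configuration directory):
#   write_core_site "$HADOOP_HOME/etc/hadoop"
```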
(5) Turn off the Linux firewall:
[root@localhost ~]# service iptables stop   # stop the firewall
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
[root@localhost ~]# ls
anaconda-ks.cfg install.log install.log.syslog
[root@localhost ~]# service iptables status   # check the iptables status
iptables: Firewall is not running.
3. Start and test Hadoop
(1) Preparation
- First add Hadoop to the environment variables so that the hadoop commands can be used from any directory:
export JAVA_HOME=/opt/java/jdk1.7.0_65
export HADOOP_HOME=/opt/hadoop/hadoop-2.4.1
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
- Format the NameNode (this initializes it)
hdfs namenode -format   # the older form, hadoop namenode -format, still works in 2.x but is deprecated
After the command runs, a dfs/name/current/ directory is created under the hadoop.tmp.dir configured above, containing four files:
[root@localhost data]# ll ./dfs/name/current/
total 16
-rw-r--r--. 1 root root 351 Apr 11 02:51 fsimage_0000000000000000000
-rw-r--r--. 1 root root  62 Apr 11 02:51 fsimage_0000000000000000000.md5
-rw-r--r--. 1 root root   2 Apr 11 02:51 seen_txid
-rw-r--r--. 1 root root 202 Apr 11 02:51 VERSION
(2) Start Hadoop (it is best to set up SSH key login first, otherwise you will be asked for the password several times; you can also write your own shell script that calls the two start scripts for HDFS and YARN)
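The SSH key login mentioned above comes down to generating a key pair and listing the public key in `authorized_keys` on the same machine. A sketch of the second half; the helper `authorize_key` is invented for this example, while the `ssh-keygen` call in the usage comment uses standard OpenSSH options:

```shell
# Hypothetical helper: append a public key to an authorized_keys file unless
# it is already listed, then restrict the file's permissions.
authorize_key() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    grep -qxF -e "$(cat "$pubkey_file")" "$auth_file" || cat "$pubkey_file" >> "$auth_file"
    chmod 600 "$auth_file"
}

# Usage on the Hadoop machine, as the user who starts the daemons:
#   ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa     # key pair without a passphrase
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
#   ssh localhost                                 # should no longer prompt for a password
```

The duplicate check keeps the file clean if the helper is run more than once.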
- Start HDFS
Start HDFS first. Go to the sbin directory of the Hadoop installation, /opt/hadoop/hadoop-2.4.1/sbin, and run:
./start-dfs.sh
Verify that it started:
[root@localhost sbin]# jps
664 SecondaryNameNode
803 Jps
500 DataNode
422 NameNode
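Explanation: the start script launches a DataNode on localhost because the slaves file under etc/hadoop in the installation directory lists the machines on which DataNodes are started.
For a real multi-node distributed cluster, add the other nodes to this file: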
[root@localhost hadoop]# cat ./slaves
localhost
- Start YARN
[root@localhost sbin]# ./start-yarn.sh
Check again:
[root@localhost sbin]# jps
1154 NodeManager
882 ResourceManager
664 SecondaryNameNode
500 DataNode
1257 Jps
422 NameNode
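Reading the `jps` listing by eye works, but a wrapper script that starts both services can also check it automatically. The sketch below parses `jps` output and reports which of the five daemons seen above are missing; the function name `missing_daemons` is an assumption of this example:

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# Hadoop daemon that is NOT listed. Prints nothing when all five are running.
missing_daemons() {
    running=$(awk '{ print $2 }')
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode
        echo "$running" | grep -qx "$daemon" || echo "$daemon"
    done
}

# Usage, assuming the Hadoop sbin directory is on PATH as configured earlier:
#   start-dfs.sh && start-yarn.sh
#   jps | missing_daemons
```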
(3) Test the HDFS and YARN services started above
http://192.168.2.136:50070 (HDFS web UI)
http://192.168.2.136:8088 (YARN ResourceManager web UI, where MapReduce jobs are tracked)
- Test HDFS
HDFS files can also be browsed through the web UI.
First, upload a file:
[root@localhost ~]# ll
total 60
-rw-------. 1 root root  2388 Sep  9  2013 anaconda-ks.cfg
-rw-r--r--. 1 root root 37667 Sep  9  2013 install.log
-rw-r--r--. 1 root root  9154 Sep  9  2013 install.log.syslog
[root@localhost ~]# hadoop fs -put install.log hdfs://localhost:9000/   # upload install.log from the current directory to the HDFS root
Looking at the web UI again, the uploaded file now appears in the listing.
Clicking the file also lets you download it.
Now delete install.log locally and download it back from HDFS:
[root@localhost ~]# rm -rf ./install.log   # delete the local file
[root@localhost ~]# ls
anaconda-ks.cfg install.log.syslog
[root@localhost ~]# hadoop fs -get hdfs://localhost:9000/install.log   # download the file from HDFS
[root@localhost ~]# ls
anaconda-ks.cfg install.log install.log.syslog
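The listing only shows that a file with the right name came back. To confirm that the content survived the round trip, the local original and the downloaded copy can be compared by checksum. A sketch; `same_content` and the `/tmp` target path are assumptions of this example:

```shell
# Hypothetical helper: succeed if two local files have identical content,
# judged by the checksum and size that POSIX cksum reports.
same_content() {
    [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}

# Usage with a file that is still present locally:
#   hadoop fs -put install.log.syslog /
#   hadoop fs -get /install.log.syslog /tmp/install.log.syslog.copy
#   same_content install.log.syslog /tmp/install.log.syslog.copy && echo "round trip OK"
```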
- Test MapReduce
Since we have not written a MapReduce program of our own, we use the example programs shipped with Hadoop: one that estimates the value of pi and one that counts word occurrences.
Go to Hadoop's mapreduce directory:
[root@localhost mapreduce]# pwd
/opt/hadoop/hadoop-2.4.1/share/hadoop/mapreduce
Example 1: the MapReduce program that estimates pi
[root@localhost mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar pi 5 5   # run the pi job with 5 map tasks and 5 samples per map
Number of Maps  = 5
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
18/04/11 03:54:52 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
18/04/11 03:54:53 INFO input.FileInputFormat: Total input paths to process : 5
18/04/11 03:54:53 INFO mapreduce.JobSubmitter: number of splits:5
18/04/11 03:54:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1523441540916_0001
18/04/11 03:54:56 INFO impl.YarnClientImpl: Submitted application application_1523441540916_0001
18/04/11 03:54:56 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1523441540916_0001/
18/04/11 03:54:56 INFO mapreduce.Job: Running job: job_1523441540916_0001
18/04/11 03:55:26 INFO mapreduce.Job: Job job_1523441540916_0001 running in uber mode : false
18/04/11 03:55:26 INFO mapreduce.Job: map 0% reduce 0%
18/04/11 03:57:27 INFO mapreduce.Job: map 40% reduce 0%
18/04/11 03:57:31 INFO mapreduce.Job: map 80% reduce 0%
18/04/11 03:57:32 INFO mapreduce.Job: map 100% reduce 0%
18/04/11 03:57:57 INFO mapreduce.Job: map 100% reduce 100%
18/04/11 03:57:58 INFO mapreduce.Job: Job job_1523441540916_0001 completed successfully
18/04/11 03:58:00 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=116
        FILE: Number of bytes written=559767
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1315
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=23
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=5
        Launched reduce tasks=1
        Data-local map tasks=5
        Total time spent by all maps in occupied slots (ms)=633857
        Total time spent by all reduces in occupied slots (ms)=17751
        Total time spent by all map tasks (ms)=633857
        Total time spent by all reduce tasks (ms)=17751
        Total vcore-seconds taken by all map tasks=633857
        Total vcore-seconds taken by all reduce tasks=17751
        Total megabyte-seconds taken by all map tasks=649069568
        Total megabyte-seconds taken by all reduce tasks=18177024
    Map-Reduce Framework
        Map input records=5
        Map output records=10
        Map output bytes=90
        Map output materialized bytes=140
        Input split bytes=725
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=140
        Reduce input records=10
        Reduce output records=0
        Spilled Records=20
        Shuffled Maps =5
        Failed Shuffles=0
        Merged Map outputs=5
        GC time elapsed (ms)=21046
        CPU time spent (ms)=17350
        Physical memory (bytes) snapshot=619728896
        Virtual memory (bytes) snapshot=2174615552
        Total committed heap usage (bytes)=622153728
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=590
    File Output Format Counters
        Bytes Written=97
Job Finished in 188.318 seconds
Estimated value of Pi is 3.68000000000000000000   # the result
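The estimate of 3.68 is far from 3.14159 because the run used only 5 x 5 = 25 sample points. As far as I understand the bundled example, it counts how many sample points in a square fall inside the inscribed circle and reports 4 x inside / total, so 3.68 would correspond to 23 of the 25 points landing inside. A sketch of that final arithmetic only, not of the job itself; the name `pi_estimate` is invented here:

```shell
# Hypothetical helper: area-ratio estimate of pi from the number of sample
# points inside the circle ($1) and the total number of samples ($2).
pi_estimate() {
    awk -v inside="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 4 * inside / total }'
}

pi_estimate 23 25   # 23 of 25 points inside reproduces the 3.68 of the run above
```

A larger run, for example `pi 10 1000` instead of `pi 5 5`, should land closer to the true value at the cost of a longer job.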
Example 2: a wordcount MapReduce job (given an English text, it counts how many times each word occurs)
(1) Create an English text file
[root@localhost mapreduce]# cat ./test.txt
hello lll hello kkk hello meinv hello
(2) To run the job, the file must first be uploaded to HDFS
First create a directory in HDFS (two ways of writing the path):
[root@localhost mapreduce]# hadoop fs -mkdir hdfs://localhost:9000/wordcount   # first way: full URI
[root@localhost mapreduce]# hadoop fs -mkdir /wordcount/input   # second way: / is relative to the HDFS root
The new directory is then visible in the HDFS web UI (the tmp and user directories there were created by the previous job).
Next, upload the text file above to the /wordcount/input/ directory in HDFS
[root@localhost mapreduce]# hadoop fs -put test.txt /wordcount/input
Check the directory in the web UI.
Run the wordcount program (MapReduce jobs are slow to start because several processes have to be launched):
The job counts the words in all files under /wordcount/input in HDFS and writes the result to /wordcount/output, where / is the HDFS root. The output directory must not exist before the job runs.
[root@localhost mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount /wordcount/input /wordcount/output
18/04/11 04:09:58 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
18/04/11 04:10:00 INFO input.FileInputFormat: Total input paths to process : 1
18/04/11 04:10:00 INFO mapreduce.JobSubmitter: number of splits:1
18/04/11 04:10:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1523441540916_0002
18/04/11 04:10:02 INFO impl.YarnClientImpl: Submitted application application_1523441540916_0002
18/04/11 04:10:02 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1523441540916_0002/
18/04/11 04:10:02 INFO mapreduce.Job: Running job: job_1523441540916_0002
18/04/11 04:10:22 INFO mapreduce.Job: Job job_1523441540916_0002 running in uber mode : false
18/04/11 04:10:22 INFO mapreduce.Job: map 0% reduce 0%
18/04/11 04:10:36 INFO mapreduce.Job: map 100% reduce 0%
18/04/11 04:10:48 INFO mapreduce.Job: map 100% reduce 100%
18/04/11 04:10:49 INFO mapreduce.Job: Job job_1523441540916_0002 completed successfully
18/04/11 04:10:50 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=50
        FILE: Number of bytes written=185961
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=150
        HDFS: Number of bytes written=28
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=11652
        Total time spent by all reduces in occupied slots (ms)=9304
        Total time spent by all map tasks (ms)=11652
        Total time spent by all reduce tasks (ms)=9304
        Total vcore-seconds taken by all map tasks=11652
        Total vcore-seconds taken by all reduce tasks=9304
        Total megabyte-seconds taken by all map tasks=11931648
        Total megabyte-seconds taken by all reduce tasks=9527296
    Map-Reduce Framework
        Map input records=4
        Map output records=7
        Map output bytes=66
        Map output materialized bytes=50
        Input split bytes=111
        Combine input records=7
        Combine output records=4
        Reduce input groups=4
        Reduce shuffle bytes=50
        Reduce input records=4
        Reduce output records=4
        Spilled Records=8
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=609
        CPU time spent (ms)=3400
        Physical memory (bytes) snapshot=218648576
        Virtual memory (bytes) snapshot=725839872
        Total committed heap usage (bytes)=137433088
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=39
    File Output Format Counter
List the files under /wordcount/output in HDFS:
[root@localhost mapreduce]# hadoop fs -ls /wordcount/output   # list the directory
Found 2 items
-rw-r--r-- 1 root supergroup 0 2018-04-11 04:10 /wordcount/output/_SUCCESS
-rw-r--r-- 1 root supergroup 28 2018-04-11 04:10 /wordcount/output/part-r-00000
View the result file:
[root@localhost mapreduce]# hadoop fs -cat /wordcount/output/part-r-00000
hello 4
kkk 1
lll 1
meinv 1
The result file can also be downloaded and viewed from the web UI.
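For a file this small, the result is easy to cross-check without a cluster. The following sketch counts whitespace-separated words locally and prints them sorted by word, which is the same shape as part-r-00000; `local_wordcount` is a name made up for this example and has nothing to do with Hadoop's own implementation:

```shell
# Hypothetical local equivalent of the wordcount job: read text on stdin and
# print "word<TAB>count" for each distinct word, sorted by word.
local_wordcount() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) printf "%s\t%d\n", w, count[w] }' | LC_ALL=C sort
}

# The words of test.txt from step (1)
printf 'hello lll hello kkk hello meinv hello\n' | local_wordcount
```

On the cluster, `hadoop fs -cat /wordcount/input/test.txt | local_wordcount` should give the same four counts as the job output.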