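Part one covered the overall procedure. Quite a few problems came up along the way, and this second part lists all of them:
1.1 Warning: native library cannot be loaded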
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
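The hadoop 2.5.1 release on the official site is already built for 64-bit operating systems, yet this warning still appears.
1.1.1 Testing the native library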
[root@cluster3 ~]# export HADOOP_ROOT_LOGGER=DEBUG,console
[root@cluster3 script]# hadoop fs -text /usr/local/script/hdfile1.txt
14/11/01 10:58:15 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: /usr/local/hadoop/hadoop-2.5.1/lib/native/libhadoop.so.1.0.0: /lib64/libc.so.6: version `GLIBC_2.12' not found (required by /usr/local/hadoop/hadoop-2.5.1/lib/native/libhadoop.so.1.0.0)
14/11/01 10:58:15 DEBUG util.NativeCodeLoader: java.library.path=/usr/local/hadoop/hadoop-2.5.1/lib/native
14/11/01 10:58:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/01 10:58:15 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Falling back to shell based
[root@cluster1 lib64]# ll /lib64/libc.so.6
lrwxrwxrwx 1 root root 11 Oct 31 17:27 /lib64/libc.so.6 -> libc-2.5.so
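The prebuilt library requires GLIBC_2.12, while /lib64/libc.so.6 on this machine points to libc-2.5.so, so at first glance glibc would have to be upgraded. (Recompiling hadoop on this machine is enough; glibc does not need to be upgraded.)
Compile the hadoop source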
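The comparison made above, between the glibc version the library asks for and the one that is installed, can be sketched as a pair of helpers. The function names are hypothetical and not from the original post. The sketch assumes a glibc-based system where `getconf GNU_LIBC_VERSION` is available.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Print each distinct GLIBC_x.y symbol version mentioned on stdin.
required_glibc() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sort -u
}

# Print the glibc version installed on this machine, e.g. "glibc 2.5".
installed_glibc() {
    getconf GNU_LIBC_VERSION
}

# Usage (commented out so that sourcing this file runs nothing):
#   export HADOOP_ROOT_LOGGER=DEBUG,console
#   hadoop fs -ls / 2>&1 | required_glibc
#   installed_glibc
```

If the required version is newer than the installed one, rebuilding the native library on the target machine links it against the local glibc. Hadoop's BUILDING.txt describes a native build with `mvn package -Pdist,native -DskipTests -Dtar`; check the file shipped with your source tree for the prerequisites.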
2. Configuring a local yum repository
Edit the yum configuration so that a local ISO serves as the yum repository.
Create the mount point and mount the disc:
mkdir /mnt/cdrom
mount /dev/cdrom /mnt/cdrom
Copy the contents to local disk:
cp -avf /mnt/cdrom /yum
Create the repo file:
vi /etc/yum.repos.d/CentOS-Local.repo
[Local]
name=Local Yum
baseurl=file:///yum/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
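Disable the default network repository by renaming its file:
# cd /etc/yum.repos.d/
# mv CentOS-Base.repo CentOS-Base.repo.bak
CentOS-Media.repo is the configuration file for the local yum repository. Back it up, then edit it:
# cp CentOS-Media.repo CentOS-Media.repo.bak
# vi CentOS-Media.repo
baseurl=file:///media/CentOS_6.3_Final/
enabled=1
Setting enabled=1 turns the repository on. Packages can then be installed from it:
[root@cluster3 yum.repos.d]# yum -y install gcc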
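The CentOS-Local.repo file described above can also be generated by a script instead of being typed into vi. The following is a minimal sketch. The function name `write_local_repo` is hypothetical, and the gpgkey file name is copied from the post, so adjust it to the key that exists for your release.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.
# Write a local-ISO repo definition to the file named by $1.
# $2 is the directory holding the copied ISO contents (default: /yum).
write_local_repo() {
    repo_file=$1
    repo_dir=${2:-/yum}
    cat > "$repo_file" <<EOF
[Local]
name=Local Yum
baseurl=file://$repo_dir/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
EOF
}

# Usage (commented out so that sourcing this file runs nothing):
#   write_local_repo /etc/yum.repos.d/CentOS-Local.repo /yum
#   yum clean all && yum repolist
```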
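3. Changing the hostname after cloning a virtual machine
Change the hostname:
Set HOSTNAME in /etc/sysconfig/network to [new hostname].
In /etc/hosts, replace [old hostname] with [new hostname].
Reboot the system.
Run hostname to check that the change took effect.
4. Test program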
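When several clones have to be renamed, the two file edits above can be scripted. This is a sketch under the assumption that the network file contains a `HOSTNAME=` line and that hostnames appear in /etc/hosts as whitespace-separated fields. The function name `rename_host` is hypothetical, and the file paths are passed in so the function can be tried on copies first.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.
# rename_host OLD NEW NETWORK_FILE HOSTS_FILE
rename_host() {
    old=$1; new=$2; network_file=$3; hosts_file=$4

    # Rewrite the HOSTNAME= line in the network file.
    sed "s/^HOSTNAME=.*/HOSTNAME=$new/" "$network_file" > "$network_file.tmp" &&
        cat "$network_file.tmp" > "$network_file" &&
        rm -f "$network_file.tmp" || return 1

    # Replace only whole fields equal to the old name in the hosts file.
    awk -v old="$old" -v new="$new" \
        '{ for (i = 1; i <= NF; i++) if ($i == old) $i = new; print }' \
        "$hosts_file" > "$hosts_file.tmp" &&
        cat "$hosts_file.tmp" > "$hosts_file" &&
        rm -f "$hosts_file.tmp"
}

# Usage (commented out so that sourcing this file runs nothing):
#   rename_host cluster1 cluster3 /etc/sysconfig/network /etc/hosts
#   reboot
```

Matching whole fields avoids rewriting a longer name that merely contains the old hostname as a substring.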
[root@cluster3 input]# hadoop dfs -mkdir /hadoop
[root@cluster3 input]# hadoop dfs -mkdir /hadoop/input
[root@cluster3 hadoop-2.5.1]# hadoop dfs -put /usr/local/hadoop/hadoop-2.5.1/test/text1.txt /hadoop/input
[root@cluster3 hadoop-2.5.1]# hadoop dfs -put /usr/local/hadoop/hadoop-2.5.1/test/text2.txt /hadoop/input
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
[root@cluster3 hadoop-2.5.1]# hadoop jar /usr/local/hadoop/hadoop-2.5.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount /hadoop/input/* /hadoop/output
14/11/06 15:44:51 INFO client.RMProxy: Connecting to ResourceManager at cluster3/192.168.220.63:8032
14/11/06 15:44:52 INFO input.FileInputFormat: Total input paths to process : 2
14/11/06 15:44:52 INFO mapreduce.JobSubmitter: number of splits:2
14/11/06 15:44:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1415259711375_0001
14/11/06 15:44:53 INFO impl.YarnClientImpl: Submitted application application_1415259711375_0001
14/11/06 15:44:53 INFO mapreduce.Job: The url to track the job: http://cluster3:8088/proxy/application_1415259711375_0001/
14/11/06 15:44:53 INFO mapreduce.Job: Running job: job_1415259711375_0001
14/11/06 15:45:04 INFO mapreduce.Job: Job job_1415259711375_0001 running in uber mode : false
14/11/06 15:45:04 INFO mapreduce.Job: map 0% reduce 0%
14/11/06 15:45:57 INFO mapreduce.Job: map 100% reduce 0%
14/11/06 15:46:17 INFO mapreduce.Job: map 100% reduce 100%
14/11/06 15:46:18 INFO mapreduce.Job: Job job_1415259711375_0001 completed successfully
14/11/06 15:46:18 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=55
        FILE: Number of bytes written=291499
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=241
        HDFS: Number of bytes written=25
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=106968
        Total time spent by all reduces in occupied slots (ms)=9679
        Total time spent by all map tasks (ms)=106968
        Total time spent by all reduce tasks (ms)=9679
        Total vcore-seconds taken by all map tasks=106968
        Total vcore-seconds taken by all reduce tasks=9679
        Total megabyte-seconds taken by all map tasks=109535232
        Total megabyte-seconds taken by all reduce tasks=9911296
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=41
        Map output materialized bytes=61
        Input split bytes=216
        Combine input records=4
        Combine output records=4
        Reduce input groups=3
        Reduce shuffle bytes=61
        Reduce input records=4
        Reduce output records=3
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=1085
        CPU time spent (ms)=3400
        Physical memory (bytes) snapshot=502984704
        Virtual memory (bytes) snapshot=2204106752
        Total committed heap usage (bytes)=257171456
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=25
    File Output Format Counters
        Bytes Written=25
[root@cluster3 hadoop-2.5.1]# hadoop dfs -ls /hadoop/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 2 items
drwxr-xr-x - root supergroup 0 2014-11-06 15:44 /hadoop/input
drwxr-xr-x - root supergroup 0 2014-11-06 15:46 /hadoop/output
[root@cluster3 hadoop-2.5.1]# hadoop dfs -cat /hadoop/output/part-r-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
hadoop 1
hello 2
world 1
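The DEPRECATED notice in the session above asks for `hdfs dfs` in place of `hadoop dfs`. The same test run can be sketched with the newer command as follows. The function names `run` and `wordcount_demo` are hypothetical, the jar path is the one from the post, and setting `DRY_RUN=1` only prints the commands so the sequence can be reviewed without a cluster.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Upload the given local files and run the wordcount example on them.
wordcount_demo() {
    examples_jar=/usr/local/hadoop/hadoop-2.5.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar
    run hdfs dfs -mkdir -p /hadoop/input || return 1
    for f in "$@"; do
        run hdfs dfs -put "$f" /hadoop/input || return 1
    done
    # The output directory must not exist yet, or the job refuses to start.
    run hadoop jar "$examples_jar" wordcount /hadoop/input /hadoop/output || return 1
    run hdfs dfs -cat /hadoop/output/part-r-00000
}

# Usage (commented out so that sourcing this file runs nothing):
#   DRY_RUN=1; wordcount_demo test/text1.txt test/text2.txt
```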
5. Connection failure
[root@cluster3 hadoop-2.5.1]# hadoop jar /usr/local/hadoop/hadoop-2.5.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount /hadoop/input/* /hadoop/output
14/11/06 11:28:15 INFO client.RMProxy: Connecting to ResourceManager at cluster3/192.168.220.63:8032
java.net.ConnectException: Call From cluster3/192.168.220.63 to cluster3:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Solution: the namenode was not running, so nothing was listening on cluster3:9000. Start the namenode.
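Before submitting a job it can help to confirm that the namenode really is running. The JDK's `jps` tool lists Java processes as "pid name" pairs, and the sketch below checks that listing for an exact name. The function name `daemon_listed` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.
# Succeed if the jps listing on stdin contains a process named exactly $1.
# An exact match keeps SecondaryNameNode from counting as NameNode.
daemon_listed() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Usage (commented out so that sourcing this file runs nothing):
#   jps | daemon_listed NameNode || echo "NameNode is not running"
```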
6. No datanode
14/11/06 09:39:10 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hadoop/input/text1.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1471)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2791)
Solution: hdfs namenode -format had been run several times, so the name and data directories have to be cleared by hand.
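A common reason why repeated formatting leaves no datanodes is that each format gives the namenode a new clusterID, while the datanode storage still records the old one. Under that assumption, the mismatch can be confirmed before deleting anything by comparing the `clusterID=` lines in the two VERSION files. The function names are hypothetical, and the location of the VERSION files depends on your dfs.namenode.name.dir and dfs.datanode.data.dir settings.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Print the clusterID recorded in a Hadoop storage VERSION file.
cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# Succeed when both VERSION files carry the same, non-empty clusterID.
same_cluster() {
    nn_id=$(cluster_id "$1")
    dn_id=$(cluster_id "$2")
    [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]
}

# Usage (commented out; NAME_DIR and DATA_DIR stand for your configured paths):
#   same_cluster "$NAME_DIR/current/VERSION" "$DATA_DIR/current/VERSION" ||
#       echo "clusterID mismatch: clear the data directory or re-format"
```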
7. Risk of data loss
2014-11-06 10:20:14,903 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
2014-11-06 10:20:14,903 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one namespace edits storage directory (dfs.namenode.edits.dir) configured. Beware of data loss due to lack of redundant storage directories!
Solution: give dfs.namenode.name.dir and dfs.datanode.data.dir several directories each, located on different physical disks or on NFS mounts. The two warnings concern the namenode directories; dfs.namenode.edits.dir defaults to the value of dfs.namenode.name.dir, so listing several name directories covers both.
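Both properties take a comma-separated list of directories. A quick way to see how many directories a setting currently holds is sketched below. The function name `count_dirs` is hypothetical, and the example paths in the comments are placeholders rather than values from the original setup.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.
# Count the non-empty, comma-separated entries in a directory-list value.
count_dirs() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

# Usage (commented out so that sourcing this file runs nothing):
#   count_dirs "$(hdfs getconf -confKey dfs.namenode.name.dir)"
#   count_dirs "file:///data1/dfs/name,file:///data2/dfs/name"
```

A result of 1 corresponds to the situation the warning describes.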
8. http://192.168.220.63:50070 cannot be reached, and NodeManager starts but disappears after a short while.
Stop the firewall service:
[root@cluster3 hadoop]# service iptables stop
Disable it at boot:
[root@cluster3 hadoop]# chkconfig iptables off
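To verify that the boot-time setting took effect, the output of `chkconfig --list iptables` can be checked: it prints one line with an on/off flag per runlevel. The sketch below reads that line from stdin. The function name `boot_disabled` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.
# Succeed when the chkconfig listing on stdin shows no runlevel set to "on".
# Note: empty input also succeeds, so check that the service is listed at all.
boot_disabled() {
    ! grep -q ':on'
}

# Usage (commented out so that sourcing this file runs nothing):
#   chkconfig --list iptables | boot_disabled && echo "iptables will not start at boot"
#   service iptables status
```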