Step 0: Installation and Startup
I. Setting up a Single Node Cluster:
http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-common/SingleCluster.html
1. Purpose: set up and configure a single-node cluster in order to get hands-on practice with MapReduce (MR) and HDFS.
2. Required software:
JDK
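SSH (ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons)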
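A quick way to confirm that both prerequisites are on the PATH before unpacking Hadoop. This is a minimal sketch; the function name check_prereqs is hypothetical. It only checks that the client commands exist and does not verify that the sshd server is running.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named command is on PATH.
# Prints "found: X" or "missing: X" per command; returns 1 if any is missing.
check_prereqs() {
  missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_prereqs java ssh
```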
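3. Preparation before starting:
Download a binary release, unpack it, and change into its top-level directory.
Only two settings in etc/hadoop/hadoop-env.sh need to be changed: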
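The download-and-unpack step can be scripted as below. This is a sketch under an assumption: it uses the Apache archive layout for old releases (archive.apache.org/dist/hadoop/common/hadoop-VERSION/hadoop-VERSION.tar.gz), and the helper name hadoop_tarball_url is hypothetical. A nearer mirror may be faster.

```shell
#!/bin/sh
# Hypothetical helper: print the assumed Apache-archive URL of a Hadoop
# binary tarball for the given version.
hadoop_tarball_url() {
  printf 'https://archive.apache.org/dist/hadoop/common/hadoop-%s/hadoop-%s.tar.gz\n' "$1" "$1"
}

# Usage (needs network access):
# ver=2.6.5
# curl -fLO "$(hadoop_tarball_url "$ver")"
# tar -xzf "hadoop-$ver.tar.gz"
# cd "hadoop-$ver"
```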
# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest
# Assuming your installation directory is /usr/local/hadoop
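# (where Hadoop is installed; this matters: when it was set incorrectly on my Ubuntu machine, hadoop commands failed with errors saying the required classes could not be found)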
export HADOOP_PREFIX=/usr/local/hadoop
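If you prefer to script the two edits rather than open an editor, here is a minimal sketch. The helper name set_export is hypothetical. It rewrites an existing uncommented "export NAME=..." line or appends one, and writes through a temporary file instead of sed -i, because the in-place flag behaves differently in GNU and BSD sed.

```shell
#!/bin/sh
# Hypothetical helper: ensure ENV_FILE contains "export NAME=VALUE".
# Replaces an existing "export NAME=" line, or appends one if none exists.
set_export() {
  env_file=$1
  var_name=$2
  var_value=$3
  tmp_file="$env_file.tmp.$$"
  if grep -q "^export $var_name=" "$env_file"; then
    # Rewrite only the matching export line; copy every other line as-is.
    awk -v n="$var_name" -v v="$var_value" '
      $0 ~ ("^export " n "=") { print "export " n "=" v; next }
      { print }' "$env_file" > "$tmp_file"
  else
    cp "$env_file" "$tmp_file"
    printf 'export %s=%s\n' "$var_name" "$var_value" >> "$tmp_file"
  fi
  mv "$tmp_file" "$env_file"
}

# Usage, from the Hadoop home directory:
# set_export etc/hadoop/hadoop-env.sh JAVA_HOME /usr/java/latest
# set_export etc/hadoop/hadoop-env.sh HADOOP_PREFIX /usr/local/hadoop
```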
Run: $ bin/hadoop (this prints the usage documentation for the hadoop script)
4. Hadoop supports three cluster modes:
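Since a wrong HADOOP_PREFIX only surfaces later as class-not-found errors, it can help to check the directory up front. The helper check_hadoop_prefix below is hypothetical; the layout it tests for (an executable bin/hadoop and a share/hadoop/common jar directory) is what the 2.x binary tarball unpacks to.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the directory looks like an unpacked
# Hadoop 2.x release, otherwise print the problem to stderr and return 1.
check_hadoop_prefix() {
  prefix=$1
  if [ ! -x "$prefix/bin/hadoop" ]; then
    echo "no executable bin/hadoop under $prefix" >&2
    return 1
  fi
  if [ ! -d "$prefix/share/hadoop/common" ]; then
    echo "no share/hadoop/common jar directory under $prefix" >&2
    return 1
  fi
  return 0
}

# Usage:
# check_hadoop_prefix "$HADOOP_PREFIX" && "$HADOOP_PREFIX/bin/hadoop" classpath
```

The hadoop classpath subcommand prints the jar path the scripts will use, which makes a wrong prefix easy to spot.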
Local (Standalone) Mode
Pseudo-Distributed Mode
Fully-Distributed Mode
II. Standalone Operation (local / single node)
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
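Example: copy the configuration files into an input directory, then find and count every match of the regular expression 'dfs[a-z.]+'. The results are written to the output directory.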
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar grep input output 'dfs[a-z.]+'
$ cat output/*
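To sanity-check the job's result on a small input, the same counting can be done with ordinary Unix tools. This is an illustration, not how Hadoop implements the grep example; it assumes a grep that supports -o (GNU and BSD both do), and the function name count_matches is hypothetical. It prints count<TAB>match lines sorted by count, descending, with ties broken by the matched text.

```shell
#!/bin/sh
# Hypothetical helper: count every match of an extended regex ($2) across
# all files in a directory ($1), printing "count<TAB>match" by count (desc).
count_matches() {
  cat "$1"/* |
    LC_ALL=C grep -oE "$2" |
    LC_ALL=C sort |
    uniq -c |
    LC_ALL=C sort -k1,1nr -k2,2 |
    awk '{ printf "%s\t%s\n", $1, $2 }'
}

# Usage, from the Hadoop home directory:
# count_matches input 'dfs[a-z.]+'
```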
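Note: the word-count example runs the same way. The job fails if the output directory already exists, so remove it first (e.g. $ rm -r output):
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount input output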
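The WordCount example splits each line on whitespace and writes word<TAB>count, ordered by word. A rough shell equivalent for small inputs is sketched below as a cross-check. The function name word_count is hypothetical, and tokenization only approximates Hadoop's (the [:space:] class here also includes vertical tab). LC_ALL=C sort gives byte order, which mirrors how Hadoop orders Text keys.

```shell
#!/bin/sh
# Hypothetical helper: count whitespace-separated words in the given files,
# printing "word<TAB>count" lines in byte (C locale) order.
word_count() {
  cat "$@" |
    tr -s '[:space:]' '\n' |
    grep -v '^$' |
    LC_ALL=C sort |
    LC_ALL=C uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Usage, from the Hadoop home directory:
# word_count input/*
```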
III. Pseudo-Distributed Mode