Installing Hadoop
1. Environment variables
export JAVA_HOME=/root/soft/jdk8
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export HADOOP_HOME=/root/soft/hdp312
export PATH=$PATH:$MAVEN_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
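As a quick sanity check after sourcing the exports above, you can count the PATH entries added under /root/soft (a sketch; the paths are the hypothetical install locations used in these notes):

```shell
# The three bin/sbin directories the exports above append to PATH
PATHS="/root/soft/jdk8/bin:/root/soft/hdp312/bin:/root/soft/hdp312/sbin"
# Split on ':' and count entries under the /root/soft install prefix
echo "$PATHS" | tr ':' '\n' | grep -c '^/root/soft'   # prints 3
```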
2. Three ways to run the cluster (standalone, pseudo-distributed, fully distributed)
3. Hands-on
When a job driver is launched with the hadoop command, the program uses the Configuration API to load the configuration files from the default configuration directory,
so you can pass -conf to override the defaults, e.g. the connection settings.
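The example drivers accept Hadoop's generic options (they run via ToolRunner), so -conf can merge an extra configuration file over the defaults. A sketch, using the jar and config paths assumed elsewhere in these notes:

```
$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep \
    -conf devconf/hadoop/core-site.xml input output 'dfs[a-z.]+'
```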
Standalone Operation
Copy the configuration directory etc/hadoop from the installation to a new directory devconf/hadoop/, then edit the copied files for local (standalone) operation:
core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value>
  </property>
</configuration>
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp etc/hadoop/*.xml input
export HADOOP_CONF_DIR=/root/soft/hdp312/devconf/hadoop/
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep input output 'dfs[a-z.]+'
$ cat output/*
Note: HADOOP_CONF_DIR must be set to /root/soft/hdp312/devconf/hadoop/ (as above) so the standalone configuration is picked up.
Pseudo-Distributed Operation
Point HADOOP_CONF_DIR at the pseudo-distributed configuration, then edit the following files:
etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
Note: add the DataNode hosts to the workers file.
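With the configuration in place, HDFS must be formatted and the daemons started before the hdfs dfs commands below will work. A sketch of the standard sequence (these are the stock Hadoop scripts, run from $HADOOP_HOME):

```
$ hdfs namenode -format
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh
$ jps    # should list NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
```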
hdfs dfs -mkdir /input
hdfs dfs -put etc/hadoop/*.xml /input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep /input output 'dfs[a-z.]+'   # runs the job in pseudo-distributed mode, on HDFS
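In this mode the results land on HDFS (under the user's home directory, since the output path is relative) rather than the local filesystem. A sketch of reading them and shutting the daemons down afterwards (stock Hadoop scripts):

```
$ hdfs dfs -cat output/*
$ sbin/stop-yarn.sh
$ sbin/stop-dfs.sh
```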