Installation prerequisites:

1. Install and configure a Java 1.8.0_141 environment

2. Add the address mapping for the master node

vim /etc/hosts

Append the following:

127.0.0.1 master

127.0.0.1 iZuf6hxhy307mpxxtvmtb3Z

iZuf6hxhy307mpxxtvmtb3Z is the hostname of my Alibaba Cloud server; mapping it prevents the exception: SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException
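
As a quick sanity check (my addition, not part of the original steps), make sure both names resolve before continuing:

# Both names should answer from 127.0.0.1
ping -c 1 master
ping -c 1 iZuf6hxhy307mpxxtvmtb3Z
# The machine's hostname should match the second entry
hostname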

Download, install, and configure Hadoop 2.7.5

Download:

wget -c http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz

Extract and install:

mkdir /opt/hadoop/
tar -zxvf hadoop-2.7.5.tar.gz -C /opt/hadoop/

Configure Hadoop standalone mode:

vim /etc/profile
# Append the following to /etc/profile
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH=$PATH:$HADOOP_HOME/lib
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Reload the configuration
source /etc/profile

# Set the JAVA_HOME environment variable in hadoop-env.sh:
vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# Append the following
export JAVA_HOME=/opt/jdk/jdk1.8.0_141/
# Reload hadoop-env.sh
source $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# Check the Hadoop version:
hadoop version
# If the version is printed correctly, the Hadoop standalone installation succeeded.
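
For a fuller smoke test of standalone mode (my addition, using the examples jar that ships with the 2.7.5 tarball), run the bundled grep example against the config files:

# Local-mode MapReduce job; ~/output must not exist beforehand
mkdir ~/input
cp $HADOOP_HOME/etc/hadoop/*.xml ~/input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar grep ~/input ~/output 'dfs[a-z.]+'
cat ~/output/*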

Configure the Hadoop native libraries

vim /etc/profile
# Append the following to /etc/profile, then reload
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
source /etc/profile
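
Hadoop ships a checknative tool that shows whether the native library is actually being picked up (a check I'd add at this point):

# "hadoop: true" means the native library loaded; false means the paths above are wrong
hadoop checknative -a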

Configure Hadoop pseudo-distributed mode:

vim $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://master:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>file:/opt/hadoop/hadoop-2.7.5/tmp</value>
                <description>A base for other temporary directories.</description>
        </property>
</configuration>
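
You can confirm the new setting is visible to Hadoop with hdfs getconf (my addition):

# Should print hdfs://master:9000
hdfs getconf -confKey fs.defaultFS
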
vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>master:50090</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/opt/hadoop/hadoop-2.7.5/hdfs/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/opt/hadoop/hadoop-2.7.5/hdfs/data</value>
    </property>
</configuration>
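
The tmp, name, and data directories referenced above can be created up front (the format step below also creates them, so this is just being explicit):

mkdir -p /opt/hadoop/hadoop-2.7.5/tmp
mkdir -p /opt/hadoop/hadoop-2.7.5/hdfs/name
mkdir -p /opt/hadoop/hadoop-2.7.5/hdfs/data
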
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
 <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
          <name>mapreduce.jobhistory.address</name>
          <value>master:10020</value>
  </property>
  <property>
          <name>mapreduce.jobhistory.webapp.address</name>
          <value>master:19888</value>
  </property>
</configuration>
vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>
</configuration>


After editing the configuration files, format the NameNode:

hdfs namenode -format
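
To confirm the format succeeded (a quick check I'd add here), the name directory configured above should now hold fresh metadata:

# Expect VERSION, seen_txid and an initial fsimage file
ls /opt/hadoop/hadoop-2.7.5/hdfs/name/current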

Start the Hadoop daemons

Start HDFS:

start-dfs.sh
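
At this point jps (bundled with the JDK) should list the HDFS daemons; this verification step is my addition:

# Expect NameNode, DataNode and SecondaryNameNode (plus Jps itself)
jps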

Start YARN:

start-yarn.sh


Start the JobHistoryServer:

mr-jobhistory-daemon.sh start historyserver
# Because mapred-site.xml configures the JobHistoryServer addresses, the history server must be started for Hadoop to work properly.
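
Once everything is up, jps should show all the daemons, and the stock web UIs (default Hadoop 2.x ports, worth noting here) give another quick check:

# Expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager and JobHistoryServer
jps
# Default web UIs: NameNode on 50070, ResourceManager on 8088, JobHistory on 19888
curl -I http://master:50070
curl -I http://master:8088
curl -I http://master:19888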

Download, install, and configure Spark

Download:

wget -c http://mirrors.hust.edu.cn/apache/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz

Extract and install:

mkdir /opt/spark/
tar -xvf spark-2.2.1-bin-hadoop2.7.tgz -C /opt/spark/

Configure Spark:

vim /etc/profile

export SPARK_HOME=/opt/spark/spark-2.2.1-bin-hadoop2.7/

export PATH=${SPARK_HOME}/bin:$PATH

source /etc/profile
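
To verify the Spark installation (a smoke test I'd add, using the examples that ship with Spark):

# Print the Spark version
spark-submit --version
# Run the bundled SparkPi example on the local machine
run-example SparkPi 10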

Configure PySpark:

vim /etc/profile

export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH

source /etc/profile

unzip $SPARK_HOME/python/lib/py4j-0.10.4-src.zip -d $SPARK_HOME/python

After that, you can use the pyspark module in Python with import pyspark.
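
A one-liner to confirm the import works (assuming the default python on PATH):

# Should print 2.2.1 without errors
python -c "import pyspark; print(pyspark.__version__)"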
