Big Data Cluster Setup

1. VirtualBox

  1. When shutting down a VM, choose the first option (save the machine state). This preserves the state of all running processes; powering the VM off directly kills those processes and can leave the environment broken.

2. CentOS 7

  1. Configure the network

  Use a bridged network adapter, and configure the host and the VM so that they can ping each other.

  vim /etc/sysconfig/network-scripts/ifcfg-enp0s3

  BOOTPROTO=static

  IPADDR=192.168.0.106        # must be on the same subnet as the host machine

  GATEWAY=192.168.0.1

  NETMASK=255.255.255.0

  ONBOOT=yes
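
  After saving the file, restart the network service so the static address takes effect (this step is not in the original notes); on CentOS 7:

  systemctl restart network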

  2. Change the hostname; disable the firewall and SELinux

  hostnamectl set-hostname spark1        (use spark2 and spark3 on the other two nodes)

  systemctl stop firewalld

  vi /etc/selinux/config

  SELINUX=disabled

  3. Edit the hosts file

  vi /etc/hosts

  192.168.0.106 spark1

  192.168.0.107 spark2

  192.168.0.108 spark3

  4. Configure passwordless SSH access

  ssh-keygen -t rsa

  touch /root/.ssh/authorized_keys

  cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys

  ssh-copy-id -i spark3
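
  The notes only show copying the key to spark3; at minimum spark1 needs passwordless access to spark2, spark3 and itself so the Hadoop/Spark start scripts work. A minimal sketch (run the same on every node if you want full mesh access):

  ssh-keygen -t rsa                        # accept the defaults
  ssh-copy-id -i ~/.ssh/id_rsa.pub spark1
  ssh-copy-id -i ~/.ssh/id_rsa.pub spark2
  ssh-copy-id -i ~/.ssh/id_rsa.pub spark3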

3. JDK 1.7

4. MobaXterm

5. Hadoop 2.4.1

  tar -zxvf hadoop-2.4.1.tar.gz

  mv hadoop-2.4.1 hadoop

  vim ~/.bashrc

  export HADOOP_HOME=/usr/local/hadoop
  export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

  source ~/.bashrc

  Edit the configuration files under hadoop's etc/hadoop directory.

  Edit core-site.xml

  

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://spark1:9000</value>
</property>
</configuration>

Edit hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
  <name>dfs.name.dir</name>
  <value>/usr/local/data/namenode</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/usr/local/data/datanode</value>
</property>
<property>
  <name>dfs.tmp.dir</name>
  <value>/usr/local/data/tmp</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
</configuration>

Edit mapred-site.xml (if the file does not exist, copy mapred-site.xml.template to mapred-site.xml first)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
</configuration>

Edit yarn-site.xml

<?xml version="1.0"?>
<configuration>

<!-- Site specific YARN configuration properties -->
<property>
 <name>yarn.resourcemanager.hostname</name>
 <value>spark1</value>
</property>
<property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
</property>
</configuration>

Edit slaves

spark1
spark2
spark3
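
The original notes do not show it, but before starting the cluster the hadoop directory, the /usr/local/data directory and ~/.bashrc presumably have to exist on spark2 and spark3 as well. A sketch, run from spark1:

  scp -r /usr/local/hadoop root@spark2:/usr/local/
  scp -r /usr/local/hadoop root@spark3:/usr/local/
  scp ~/.bashrc root@spark2:~/
  scp ~/.bashrc root@spark3:~/
  ssh spark2 mkdir -p /usr/local/data
  ssh spark3 mkdir -p /usr/local/data

Then run source ~/.bashrc on spark2 and spark3.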

Start the Hadoop cluster

  Format the NameNode (run this on spark1):

  hdfs namenode -format

  start-dfs.sh

  Install a JDK development package (it provides the jps command used for verification):

  yum install java-1.8.0-openjdk-devel.x86_64

  After startup, confirm with jps that:

  spark1 is running NameNode, DataNode, SecondaryNameNode

  spark2 is running DataNode

  spark3 is running DataNode

  http://spark1:50070/dfshealth.html#tab-overview should be reachable

  Start the YARN cluster

  start-yarn.sh

  spark1: ResourceManager, NodeManager

  spark2: NodeManager

  spark3: NodeManager

  http://spark1:8088/cluster should be reachable
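
  As an extra check (not in the original notes), the bundled example job can be submitted to YARN; the jar path below assumes the stock Hadoop 2.4.1 layout:

  hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 2 10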

6. Hive 0.13

  1. Install Hive

  tar -zxvf apache-hive-0.13-bin.tar.gz

  mv apache-hive-0.13-bin hive

  vim ~/.bashrc

  export HIVE_HOME=/usr/local/hive
  export PATH=$PATH:$HIVE_HOME/bin

  source ~/.bashrc

  2. Install mysql-server

  wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm

  rpm -ivh mysql-community-release-el7-5.noarch.rpm

  yum install -y mysql-server

  service mysqld start

  chkconfig mysqld on

  yum install -y mysql-connector-java

  cp /usr/share/java/mysql-connector-java.jar /usr/local/hive/lib/

  3. Log in to MySQL and create the Hive metastore database
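
  The SQL itself is missing from the notes (probably a screenshot in the original post). A minimal sketch, assuming the metastore database is called hive_metadata and a dedicated hive user with password hive:

  mysql -u root
  mysql> create database if not exists hive_metadata;
  mysql> grant all privileges on hive_metadata.* to 'hive'@'%' identified by 'hive';
  mysql> grant all privileges on hive_metadata.* to 'hive'@'localhost' identified by 'hive';
  mysql> flush privileges;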

  

 

   4. Configure Hive

    In the JDBC connection URL (javax.jdo.option.ConnectionURL in hive-site.xml), remove createDatabaseIfNotExist and add serverTimezone=Asia/Shanghai.
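
    The actual hive-site.xml edit is not shown in the notes; a minimal sketch of the metastore-related properties, assuming the hive_metadata database and hive user from the previous step (values are illustrative):

<configuration>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://spark1:3306/hive_metadata?serverTimezone=Asia/Shanghai</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>
</configuration>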

   

  

 

   To verify Hive, run hive and check that it drops into the Hive CLI, then:

  create table t(id int);

  select * from t;

  drop table t;

7. ZooKeeper 3.4.5

  tar -zxvf zookeeper-3.4.5.tar.gz

  mv zookeeper-3.4.5 zk

  Configure the environment variables:

  vim ~/.bashrc

  export JAVA_HOME=/usr/lib/jvm/jre
  export HADOOP_HOME=/usr/local/hadoop
  export HIVE_HOME=/usr/local/hive
  export ZOOKEEPER_HOME=/usr/local/zk
  export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin

  

  Edit the configuration file under zookeeper's conf directory:

   mv zoo_sample.cfg zoo.cfg

  vim zoo.cfg

  dataDir=/usr/local/zk/data

  server.0=spark1:2888:3888

  server.1=spark2:2888:3888

  server.2=spark3:2888:3888

  Create the data directory and the myid file (under /usr/local/zk):

  mkdir data

  vim data/myid        # the file contains a single line: 0

  Copy the zk directory and ~/.bashrc to spark2 and spark3, source ~/.bashrc there, and change myid to 1 and 2 respectively:

  scp -r zk/ root@spark3:/usr/local/

       scp ~/.bashrc root@spark3:~/

 

  Start ZooKeeper on each node:

  zkServer.sh start
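
  A quick check (not in the original notes): jps should show a QuorumPeerMain process on every node, and zkServer.sh status should report one leader and two followers.

  jps
  zkServer.sh status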

8. kafka_2.9.2-0.8.1

  Extract scala-2.11.4.tgz and rename the directory to scala.

  Configure the environment variables:

  export JAVA_HOME=/usr/lib/jvm/jre
  export HADOOP_HOME=/usr/local/hadoop
  export HIVE_HOME=/usr/local/hive
  export ZOOKEEPER_HOME=/usr/local/zk
  export SCALA_HOME=/usr/local/scala
  export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin

  Extract kafka_2.9.2-0.8.1.

  Edit the config file kafka/config/server.properties:

  zookeeper.connect=spark1:2181,spark2:2181,spark3:2181

  broker.id=0

  Unzip slf4j-1.7.6.zip and copy slf4j-nop-1.7.6.jar into kafka/libs.
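
  The notes do not show it, but since the test commands below talk to brokers on all three hosts, kafka presumably has to be copied to spark2 and spark3 as well, with a unique broker.id on each node (paths assume kafka lives in /usr/local/kafka like the other components):

  scp -r /usr/local/kafka root@spark2:/usr/local/
  scp -r /usr/local/kafka root@spark3:/usr/local/

  On spark2 set broker.id=1 and on spark3 set broker.id=2 in kafka/config/server.properties.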

  Start Kafka (on each node):

  nohup bin/kafka-server-start.sh config/server.properties &

  Test the cluster:

  bin/kafka-topics.sh --zookeeper 192.168.0.106:2181,192.168.0.107:2181,192.168.0.108:2181 --topic Test --replication-factor 1 --partitions 1 --create

   bin/kafka-console-producer.sh --broker-list spark1:9092,spark2:9092,spark3:9092 --topic Test

  bin/kafka-console-consumer.sh --zookeeper spark1:2181,spark2:2181,spark3:2181 --topic Test --from-beginning

9. Spark 1.3.0

  Upload spark-1.3.0-bin-hadoop2.4.tgz, extract it, and rename the directory to spark.

  Configure the environment variables:

  

export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin

  cd /usr/local/spark/conf

  cp spark-env.sh.template spark-env.sh

  vim spark-env.sh

export JAVA_HOME=/usr/lib/jvm/jre
export SCALA_HOME=/usr/local/scala
export SPARK_MASTER_IP=192.168.0.106
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

  cp slaves.template  slaves

  vim slaves

  spark1

  spark2

  spark3

  Copy the spark directory and ~/.bashrc to spark2 and spark3, then run source ~/.bashrc on each node.
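
  Spelled out (mirroring the zookeeper copy step above):

  scp -r /usr/local/spark root@spark2:/usr/local/
  scp -r /usr/local/spark root@spark3:/usr/local/
  scp ~/.bashrc root@spark2:~/
  scp ~/.bashrc root@spark3:~/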

Start Spark

  cd /usr/local/spark/sbin

  ./start-all.sh

Verify with jps: spark1 should be running Master and Worker, and spark2 / spark3 each a Worker.

http://spark1:8080/

Run spark-shell; it should drop into the Scala REPL.
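
A one-line sanity check you could run inside the shell (not from the original post):

  scala> sc.parallelize(1 to 1000).count()     // should return 1000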

 

 

 
