Spark, Part 1: Deployment

spark-shell

1. spark-shell in local mode

bin/spark-shell
or
bin/spark-shell \
--master local[2]
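
local[2] runs the driver and executors in a single JVM with two worker threads. As a quick smoke test, a short job can be piped into the shell; a minimal sketch:

bin/spark-shell --master local[2] <<'EOF'
// sum the integers 1..100; should print 5050.0
println(sc.parallelize(1 to 100).sum())
EOF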

2. spark-shell in standalone mode

/export/servers/spark/bin/spark-shell \
--master spark://node03:7077
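
Standalone mode assumes the cluster is already running. A minimal sketch for bringing it up, assuming node03 hosts the master and the workers are listed in conf/workers:

# run on the master node: starts the master plus all configured workers
/export/servers/spark/sbin/start-all.sh
# the master web UI is then available at http://node03:8080/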

3. spark-shell in standalone HA mode

/export/servers/spark/bin/spark-shell \
--master spark://node03:7077,node02:7077,node01:7077
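
Listing every master lets the client fail over to whichever master is currently elected ALIVE. Election is coordinated through ZooKeeper; a minimal conf/spark-env.sh sketch, where the ZooKeeper quorum addresses and the znode directory are assumptions:

# conf/spark-env.sh excerpt for standalone HA
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
-Dspark.deploy.zookeeper.url=node01:2181,node02:2181,node03:2181 \
-Dspark.deploy.zookeeper.dir=/spark-ha"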

spark-submit

1. spark-submit in local mode

SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master local[2] \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.12-3.2.2.jar \
10
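
The trailing 10 is the application argument: the number of partitions SparkPi spreads its sampling over. To check that the run succeeded, the result line can be filtered out of the log output, for example:

${SPARK_HOME}/bin/spark-submit \
--master local[2] \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.12-3.2.2.jar \
10 2>&1 | grep "Pi is roughly"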

2. spark-submit in standalone mode

SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master spark://node01:7077 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.12-3.2.2.jar \
3

3. spark-submit in standalone HA mode

SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master spark://node01:7077,node02:7077 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.12-3.2.2.jar \
10

4. spark-submit in YARN client mode

SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master yarn \
--deploy-mode client \
--driver-memory 512m \
--executor-memory 512m \
--num-executors 3 \
--executor-cores 1 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.12-3.2.2.jar \
10
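
In client mode the driver runs on the submitting machine, so the "Pi is roughly ..." line is printed directly to this console. The application can also be inspected from YARN itself:

# list applications currently known to YARN
yarn application -list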

5. spark-submit in YARN cluster mode

SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-memory 512m \
--executor-memory 512m \
--num-executors 3 \
--executor-cores 1 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.12-3.2.2.jar \
10
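
In cluster mode the driver runs inside a YARN container, so the result ends up in the container logs rather than on the submitting console. Assuming log aggregation is enabled, it can be retrieved after the run:

# <appId> is printed by spark-submit, in the form application_<timestamp>_<sequence>
yarn logs -applicationId <appId> | grep "Pi is roughly"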

spark-sql

1. spark-sql mode (Spark on Hive)

nohup /export/servers/hive/bin/hive --service metastore &

/export/servers/spark/bin/spark-sql \
--master local[2] \
--conf spark.sql.shuffle.partitions=2
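
Like the Hive CLI, spark-sql also accepts -e for a one-shot statement, which is convenient for scripting; for example:

/export/servers/spark/bin/spark-sql \
--master local[2] \
--conf spark.sql.shuffle.partitions=2 \
-e "show databases;"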

Start the Spark SQL Thrift Server. The Spark Thrift Server runs a Spark application as a long-lived service and exposes it through the beeline client and JDBC, just like the HiveServer2 service in Hive.

/export/servers/spark/sbin/start-thriftserver.sh \
--hiveconf hive.server2.thrift.port=10001 \
--hiveconf hive.server2.thrift.bind.host=node03 \
--master local[2]
  • Monitoring Web UI:
    http://node03:4040/jobs/
  • In real big-data analytics projects, Spark SQL is usually run by starting one ThriftServer with relatively generous resources (executor count, memory, CPU); individual users then start beeline clients, connect, and write SQL to analyze the data.
  • To stop the service:
/export/servers/spark/sbin/stop-thriftserver.sh
  • Connect to the ThriftServer with Spark SQL's beeline command-line client, started as follows:
/export/servers/spark/bin/beeline
!connect jdbc:hive2://node03:10001
root
123456
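
The same connection can be made non-interactively; a sketch reusing the host, port, and credentials above:

/export/servers/spark/bin/beeline \
-u jdbc:hive2://node03:10001 \
-n root -p 123456 \
-e "show databases;"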

Hive queries

  • Option 1: hive (the interactive command-line CLI)

  • Option 2.1: hive -e "create database mytest" (execute a single SQL statement)

  • Option 2.2: hive -f /export/server/hive.sql (execute a SQL script directly)

  • Option 3: beeline (start the metastore service first, then the hiveserver2 service; see the sketch after this list)
    This runs Hive as a service (much like starting a MySQL database server), listening on port 10000.
    It is an interactive command line; CDH builds of Hive recommend this approach, and the plain CLI is considered legacy.
    nohup /export/servers/hive/bin/hive --service metastore &
    nohup /export/servers/hive/bin/hive --service hiveserver2 &

  • JDBC/ODBC access, similar to JDBC/ODBC access in MySQL

  • The Spark SQL module grew out of the Hive framework, so all of the functionality (data-access methods) Hive provides is supported.
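
For Option 3, once metastore and hiveserver2 are up, beeline connects to HiveServer2 the same way it connects to the Spark ThriftServer, only on port 10000 (the host is assumed to be node03 here):

/export/servers/hive/bin/beeline
!connect jdbc:hive2://node03:10000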
