Spark: Part 1, Deployment
spark-shell
1. spark-shell in local mode
bin/spark-shell
or
bin/spark-shell \
--master local[2]
2. spark-shell in standalone mode
/export/servers/spark/bin/spark-shell \
--master spark://node03:7077
3. spark-shell in standalone HA mode
/export/servers/spark/bin/spark-shell \
--master spark://node03:7077,node02:7077,node01:7077
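In HA mode the shell tries each listed master and attaches to whichever is currently active. A quick way to see which master is alive (a sketch, assuming the default standalone master web UI port 8080 on these hosts):

```shell
# Query each master's web UI and report its status.
# The active master reports ALIVE; the others report STANDBY.
for host in node01 node02 node03; do
  status=$(curl -s "http://$host:8080" | grep -Eo 'ALIVE|STANDBY' | head -n 1)
  echo "$host: ${status:-unreachable}"
done
```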
spark-submit
1. spark-submit in local mode
SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master local[2] \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.12-3.2.2.jar \
10
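The trailing 10 is the number of partitions SparkPi splits the job into; each task throws random points at the unit square and counts how many land inside the quarter circle. The same estimate, sketched locally in awk with no Spark required:

```shell
# Monte Carlo estimate of pi -- the computation that SparkPi parallelizes.
awk 'BEGIN {
  srand(42)                                 # fixed seed for repeatability
  n = 200000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x * x + y * y <= 1.0) inside++      # point falls in the quarter circle
  }
  printf "pi ~= %.4f\n", 4.0 * inside / n   # area ratio * 4 approximates pi
}'
```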
2. spark-submit in standalone mode
SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master spark://node01:7077 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.12-3.2.2.jar \
3
3. spark-submit in standalone HA mode
SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master spark://node01:7077,node02:7077 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.12-3.2.2.jar \
10
4. spark-submit in YARN client mode
SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master yarn \
--deploy-mode client \
--driver-memory 512m \
--executor-memory 512m \
--num-executors 3 \
--executor-cores 1 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.12-3.2.2.jar \
10
5. spark-submit in YARN cluster mode
SPARK_HOME=/export/servers/spark
${SPARK_HOME}/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-memory 512m \
--executor-memory 512m \
--num-executors 3 \
--executor-cores 1 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.12-3.2.2.jar \
10
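In cluster mode the driver runs inside the YARN ApplicationMaster, so SparkPi's result is not printed on the submitting console; it ends up in the application logs. A sketch of retrieving it with the standard YARN CLI (the application ID below is illustrative):

```shell
# List Spark applications known to YARN, in any state.
yarn application -list -appStates ALL

# Fetch the aggregated logs (including the driver's stdout) for one run.
# Replace the application ID with the one printed by spark-submit.
yarn logs -applicationId application_1234567890123_0001
```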
spark-sql
1. spark-sql mode (Spark on Hive)
nohup /export/servers/hive/bin/hive --service metastore &
/export/servers/spark/bin/spark-sql \
--master local[2] \
--conf spark.sql.shuffle.partitions=2
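spark.sql.shuffle.partitions defaults to 200, which is far more than a local test needs, hence the value 2 here. The spark-sql CLI can also run a one-off statement non-interactively with -e (a sketch, assuming the same install path as above):

```shell
# Run a single statement instead of opening the interactive shell.
/export/servers/spark/bin/spark-sql \
  --master local[2] \
  --conf spark.sql.shuffle.partitions=2 \
  -e "SHOW DATABASES;"
```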
Starting the Spark SQL Thrift Server: the Spark Thrift Server runs a Spark application as a long-lived service, providing access through the beeline client and JDBC, just like the HiveServer2 service in Hive.
/export/servers/spark/sbin/start-thriftserver.sh \
--hiveconf hive.server2.thrift.port=10001 \
--hiveconf hive.server2.thrift.bind.host=node03 \
--master local[2]
- Monitoring web UI: http://node03:4040/jobs/
- In real big-data analysis projects, Spark SQL is typically used by starting a single ThriftServer service with generous resources (executor count, memory, and CPU); different users then start beeline clients to connect and write SQL statements to analyze data.
- To stop the Thrift Server:
/export/servers/spark/sbin/stop-thriftserver.sh
- Connect to the Thrift Server with Spark SQL's beeline client, started as follows:
/export/servers/spark/bin/beeline
!connect jdbc:hive2://node03:10001
root
123456
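Here root and 123456 are the username and password typed at the beeline prompt. The same connection can be made non-interactively (a sketch, using beeline's standard -u/-n/-p/-e options):

```shell
# One-shot beeline session against the Spark Thrift Server.
/export/servers/spark/bin/beeline \
  -u jdbc:hive2://node03:10001 \
  -n root -p 123456 \
  -e "SHOW DATABASES;"
```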
Hive queries
- Method 1: hive (interactive command-line CLI)
- Method 2.1: hive -e "create database mytest"
- Method 2.2: hive -f /export/server/hive.sql (run a SQL script directly)
- Method 3: beeline (start the metastore service first, then the hiveserver2 service)
This runs Hive as a service (much like starting a MySQL server), listening on port 10000.
It is an interactive command line; CDH distributions of Hive recommend this approach, as the plain CLI is deprecated.
nohup /export/servers/hive/bin/hive --service metastore &
nohup /export/servers/hive/bin/hive --service hiveserver2 &
- JDBC/ODBC access, similar to JDBC/ODBC access to MySQL
The Spark SQL module evolved from the Hive framework, so all of the features (data-access methods) that Hive provides are supported.