Post category: spark
Spark study notes
Summary: package com.shujia.spark.streaming import org.apache.spark.SparkConf import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream} import
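The DStream and ReceiverInputDStream imports in this preview suggest a receiver-based Spark Streaming job, most likely a socket word count. A minimal sketch under that assumption; the hostname master and port 8888 are placeholders, not the post's actual values:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Durations, StreamingContext}
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}

object Demo1StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // at least 2 local threads: one for the receiver, one for processing
    val conf = new SparkConf().setAppName("streaming").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Durations.seconds(5))
    val lines: ReceiverInputDStream[String] = ssc.socketTextStream("master", 8888)
    val counts: DStream[(String, Int)] = lines
      .flatMap(_.split(","))
      .map((_, 1))
      .reduceByKey(_ + _)
    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}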
Summary: package com.shujia.spark.streaming import org.apache.kafka.clients.consumer.ConsumerRecord import org.apache.kafka.common.serialization.StringDeserial
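The ConsumerRecord and StringDeserializer imports point at the spark-streaming-kafka-0-10 direct-stream integration. A minimal sketch assuming that API; the broker address, group id, and topic name are invented placeholders:

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Durations, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object Demo2KafkaSource {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Durations.seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "master:9092",        // assumed broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "spark_demo",                  // assumed consumer group
      "auto.offset.reset" -> "latest"
    )

    // direct stream: executors read partitions from Kafka without a receiver
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("test_topic"), kafkaParams))

    stream.map((record: ConsumerRecord[String, String]) => record.value())
      .flatMap(_.split(","))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}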
Summary: package com.shujia.spark.streaming import java.util import org.apache.spark.SparkConf import org.apache.spark.streaming.{Durations, StreamingContext}
Summary: /*package com.shujia.spark.streaming import kafka.serializer.StringDecoder import org.apache.spark.SparkConf import org.apache.spark.storage.StorageLe
Summary: package com.shujia.spark.streaming import org.apache.spark.SparkConf import org.apache.spark.rdd.RDD import org.apache.spark.sql.{DataFrame, SaveMode,
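This preview mixes streaming imports with DataFrame and SaveMode, which usually means writing each micro-batch out through Spark SQL. A hedged sketch of that foreachRDD pattern; the socket source and the output path are assumptions:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.streaming.{Durations, StreamingContext}

object Demo3StreamToSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streamToSql").master("local[2]").getOrCreate()
    import spark.implicits._
    val ssc = new StreamingContext(spark.sparkContext, Durations.seconds(5))

    val lines = ssc.socketTextStream("master", 8888)
    // convert each micro-batch RDD to a DataFrame and append it to the sink
    lines.foreachRDD((rdd: RDD[String]) => {
      val df = rdd.toDF("line")
      df.write.mode(SaveMode.Append).json("data/stream_out")
    })

    ssc.start()
    ssc.awaitTermination()
  }
}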
Summary: package com.shujia.spark.streaming import org.apache.spark.SparkConf import org.apache.spark.rdd.RDD import org.apache.spark.sql.{DataFrame, SaveMode,
Summary: package com.shujia.spark.streaming import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD import org.apache.spark.sql.SparkSession impor
Summary: package com.shujia.spark.streaming import org.apache.spark.{SparkConf, SparkContext} import org.apache.spark.sql.{DataFrame, SparkSession} import org.
Summary: package com.shujia.spark.streaming import org.apache.spark.SparkConf import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream} import
Summary: package com.shujia.spark.sql import org.apache.spark.sql.{DataFrame, SparkSession} object Demo8Stu { def main(args: Array[String]): Unit = { val spark
Summary: cache; checkpoint; the difference between groupByKey and reduceByKey; BlockManager; the MapReduce process; the five key properties of RDDs; RDD dependencies; the shuffle process; Spark setup; Spark runtime; spark-client; spark-cluster; resource scheduling and task submission
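One item in this list, the groupByKey vs reduceByKey distinction, is easy to show in code: reduceByKey combines values on the map side before the shuffle, while groupByKey ships every value across the network. A minimal sketch with invented sample data:

import org.apache.spark.{SparkConf, SparkContext}

object GroupVsReduce {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local"))
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1)))

    // map-side combine: only partial sums cross the shuffle
    val reduced = pairs.reduceByKey(_ + _)
    // no combine: every (key, value) pair is shuffled, then summed
    val grouped = pairs.groupByKey().mapValues(_.sum)

    reduced.collect().foreach(println)
    grouped.collect().foreach(println)
  }
}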
Summary: 1. Upload and extract the package, configure environment variables, and add the bin directory to PATH. 2. Edit the configuration files under conf: mv spark-env.sh.template spark-env.sh, then add: export SPARK_MASTER_IP=master export SPARK_MASTER_PORT=7077 export SPAR
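The preview cuts off partway through spark-env.sh. A sketch of what a typical standalone setup adds there; the worker core/memory values and the JDK path are assumptions, not the post's actual settings:

export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=2                 # cores each worker offers (assumed value)
export SPARK_WORKER_MEMORY=2g               # memory each worker offers (assumed value)
export JAVA_HOME=/usr/local/soft/jdk1.8.0   # adjust to the local JDK path (assumed)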
Summary: Ways to write spark-sql code: 1. Write the code in IDEA, package it, and upload it to the cluster to run; for production, submit it with spark-submit. 2. In the spark shell (REPL), use sqlContext; for testing and simple tasks: spark-shell --master yarn-client; cannot use yarn
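Concretely, the two ways above look roughly like this; the class and jar names are assumptions based on the package layout in these notes:

spark-submit --master yarn-client --class com.shujia.spark.sql.Demo5Submit spark-demo.jar
spark-shell --master yarn-client    # REPL with a ready-made sqlContext, for quick tests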
Summary: Text: big data computation happens in two phases. 1. Resource scheduling (yarn-client): 1. Submit the job via spark-submit. 2. Start the Driver locally: val sc = new SparkContext(conf). 3. The Driver sends a request to the RM to start the AM. 4. The RM allocates resources and starts the AM. 5. The AM requests resources from the RM to start Excut
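The step that kicks off this whole sequence is simply constructing the SparkContext in the Driver. A minimal sketch; in yarn-client mode the Driver runs on the machine where spark-submit was invoked, and the input path here is a placeholder:

import org.apache.spark.{SparkConf, SparkContext}

object DemoYarnClient {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("demo")
    // creating the SparkContext triggers the resource scheduling above:
    // the RM starts the AM, which then requests Executor containers
    val sc = new SparkContext(conf)
    sc.textFile("/data/words.txt")
      .flatMap(_.split(","))
      .map((_, 1))
      .reduceByKey(_ + _)
      .foreach(println)
    sc.stop()
  }
}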
Summary: package com.shujia.spark.sql import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession} object Demo6SparkOnHive { def main(args: Array[String]):
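Judging by the name Demo6SparkOnHive, this post covers querying Hive tables through SparkSession. A minimal sketch assuming enableHiveSupport; the students table is an assumption and may not match the post's actual data:

import org.apache.spark.sql.{DataFrame, SparkSession}

object Demo6SparkOnHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sparkOnHive")
      .enableHiveSupport()  // resolve tables through the Hive metastore
      .getOrCreate()
    val df: DataFrame = spark.sql("select clazz, count(1) as num from students group by clazz")
    df.show()
    spark.stop()
  }
}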
Summary: package com.shujia.spark.sql import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession} object Demo5Submit { def main(args: Array[String]): Unit
Summary: package com.shujia.spark.sql import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD import org.apache.spark.sql.{DataFrame, Row, SparkSe
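The Row import in this preview suggests building a DataFrame from an RDD[Row] with an explicit schema. A sketch of that pattern; the column names and sample rows are made up:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object Demo4RddToDF {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd2df").master("local").getOrCreate()
    val sc: SparkContext = spark.sparkContext

    val rows: RDD[Row] = sc.parallelize(Seq(
      Row("zhangsan", "clazz1"),
      Row("lisi", "clazz2")))

    // the schema gives each Row field a name and type
    val schema = StructType(Seq(
      StructField("name", StringType),
      StructField("clazz", StringType)))

    val df: DataFrame = spark.createDataFrame(rows, schema)
    df.show()
  }
}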
Summary: package com.shujia.spark.sql import org.apache.spark.sql._ import org.apache.spark.sql.expressions.Window object Demo3DataFrameApi { def main(args: Ar
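This preview imports org.apache.spark.sql.expressions.Window, so the post likely demonstrates window functions over a DataFrame. A sketch with invented sample data, taking the top score per class:

import org.apache.spark.sql._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object Demo3Window {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("window").master("local").getOrCreate()
    import spark.implicits._

    val scores = Seq(("zs", "c1", 90), ("ls", "c1", 80), ("ww", "c2", 95))
      .toDF("name", "clazz", "score")

    // number rows within each class, highest score first
    val w = Window.partitionBy("clazz").orderBy($"score".desc)
    scores.withColumn("rn", row_number().over(w))
      .filter($"rn" <= 1)
      .show()
  }
}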
Summary: package com.shujia.spark.sql import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession} object Demo2DataSource { def main(args: Array[String]): U
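A Demo2DataSource with SaveMode in the imports usually walks through reading one format and writing another. A hedged sketch; the paths, schema, and formats here are assumptions:

import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object Demo2DataSourceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("source").master("local").getOrCreate()

    // read a csv with an explicit schema, write it back out as parquet
    val df: DataFrame = spark.read
      .format("csv")
      .option("sep", ",")
      .schema("name STRING, age INT")
      .load("data/students.txt")

    df.write.mode(SaveMode.Overwrite).parquet("data/students_parquet")
  }
}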
Summary: package com.shujia.spark.sql import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession} object Demo1SparkSession { def main(args: Array[String]):
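And the starting point of the spark-sql series: creating a SparkSession and running SQL against a temp view. A minimal sketch; the json path and view name are placeholders:

import org.apache.spark.sql.{DataFrame, SparkSession}

object Demo1SparkSessionSketch {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .appName("Demo1SparkSession")
      .master("local")
      .getOrCreate()

    val df: DataFrame = spark.read.json("data/students.json")
    df.createOrReplaceTempView("students")
    spark.sql("select clazz, count(1) from students group by clazz").show()
    spark.stop()
  }
}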