608088 - 博客园

July 17, 2022

Summary: Overview. A ReplicaSet monitors pods: when a pod goes down, it starts a new one. A ReplicaSet identifies which pods to monitor by their labels.

posted @ 2022-07-17 15:38 608088
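
The label-based monitoring described in the ReplicaSet summary above can be illustrated with a toy model. This is only a sketch, not the Kubernetes API: the Pod class, the matches helper, and podsToStart are hypothetical names, and a real ReplicaSet controller does considerably more.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy model of ReplicaSet reconciliation (hypothetical, not the Kubernetes API).
class ReplicaSetSketch {
    static final class Pod {
        final String name;
        final Map<String, String> labels;
        final boolean running;

        Pod(String name, Map<String, String> labels, boolean running) {
            this.name = name;
            this.labels = labels;
            this.running = running;
        }
    }

    // A pod is monitored only if it carries every label in the selector.
    static boolean matches(Map<String, String> selector, Pod pod) {
        return pod.labels.entrySet().containsAll(selector.entrySet());
    }

    // Number of new pods needed to get back to the desired replica count.
    static int podsToStart(int desired, Map<String, String> selector, List<Pod> pods) {
        int running = 0;
        for (Pod pod : pods) {
            if (pod.running && matches(selector, pod)) {
                running++;
            }
        }
        return Math.max(0, desired - running);
    }

    static int demo() {
        Map<String, String> web = Collections.singletonMap("app", "web");
        Map<String, String> db = Collections.singletonMap("app", "db");
        List<Pod> pods = Arrays.asList(
                new Pod("web-1", web, true),
                new Pod("web-2", web, false),   // this pod is down
                new Pod("db-1", db, true));     // different label, not monitored
        return podsToStart(3, web, pods);
    }
}
```

In the demo only web-1 is both running and selected, so two replacements are needed to reach three replicas.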

July 15, 2022

Node IP, Cluster IP, Pod IP

Summary: Node IP: the IP address of the machine (physical or virtual) on which Kubernetes is installed. Commands: kubectl get nodes; kubectl describe node k8s -A | grep InternalIP. Cluster IP…

posted @ 2022-07-15 22:19 608088

Expose Services

Summary: Overview. Kubernetes NodePort, LoadBalancer, and Ingress are all different ways to route traffic from the internet to your services inside the Ku…

posted @ 2022-07-15 20:38 608088

May 29, 2022

Spark Source Code Series - EventLoop

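Summary: Conclusion. EventLoop is a scheduler. Internally it uses eventQueue, a LinkedBlockingDeque, to store pending tasks. The dispatcher thread eventThread runs an infinite loop that keeps checking eventQueue for new items. Tasks are submitted to the queue through the post method. EventLoop -> onRec…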

posted @ 2022-05-29 11:36 608088
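
The EventLoop design summarized above (a LinkedBlockingDeque drained by a single looping thread, with post as the entry point) can be sketched in plain Java. This is a simplified illustration rather than Spark's actual class: EventLoopSketch, EventLoopDemo, and the string events are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Simplified event loop: one queue, one consumer thread, one callback.
abstract class EventLoopSketch<E> {
    private final BlockingQueue<E> eventQueue = new LinkedBlockingDeque<>();
    private final Thread eventThread;
    private volatile boolean stopped = false;

    EventLoopSketch(String name) {
        eventThread = new Thread(this::run, name);
        eventThread.setDaemon(true);
    }

    // Body of eventThread: block on the queue and dispatch each event.
    private void run() {
        try {
            while (!stopped) {
                E event = eventQueue.take();   // waits until an event is available
                onReceive(event);
            }
        } catch (InterruptedException e) {
            // stop() interrupts a blocked take(); fall through and exit
        }
    }

    void start() {
        eventThread.start();
    }

    void stop() {
        stopped = true;
        eventThread.interrupt();
        try {
            eventThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Producer side: callable from any thread; the queue is unbounded.
    void post(E event) {
        eventQueue.add(event);
    }

    // Consumer side: always invoked on eventThread.
    protected abstract void onReceive(E event);
}

class EventLoopDemo {
    static List<String> runDemo() {
        List<String> received = Collections.synchronizedList(new ArrayList<String>());
        CountDownLatch done = new CountDownLatch(2);
        EventLoopSketch<String> loop = new EventLoopSketch<String>("demo-event-loop") {
            @Override
            protected void onReceive(String event) {
                received.add(event);
                done.countDown();
            }
        };
        loop.start();
        loop.post("JobSubmitted");      // illustrative event names
        loop.post("StageCompleted");
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        loop.stop();
        return new ArrayList<>(received);
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Because a single thread drains the queue, events are handled in the order they were posted.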

Spark Source Code Series - DAGScheduler Overview

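Summary: Conclusion. On the main thread, DAGScheduler submits the job to the EventLoop's blocking queue. Still on the main thread, DAGScheduler waits for the asynchronous job to finish. EventLoop calls back DAGScheduler's onReceive method, which splits the job into stages. Why is this not done directly on the main thread? Presumably because of the "producer-consumer" design, scheduling…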

posted @ 2022-05-29 11:27 608088
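
The submit-then-wait flow in the DAGScheduler overview above might look roughly like the following. It is a sketch under assumptions: SubmitAndWaitSketch and its JobSubmitted class are hypothetical, a CompletableFuture stands in for Spark's job waiter, and summing integers stands in for stage splitting and task execution.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingDeque;

// Producer-consumer sketch: the caller posts a job and blocks on its result.
class SubmitAndWaitSketch {
    // Hypothetical event: the input plus the future the caller waits on.
    static final class JobSubmitted {
        final int[] partitions;
        final CompletableFuture<Integer> waiter = new CompletableFuture<>();

        JobSubmitted(int[] partitions) {
            this.partitions = partitions;
        }
    }

    private final BlockingQueue<JobSubmitted> eventQueue = new LinkedBlockingDeque<>();
    private final Thread eventThread = new Thread(this::loop, "scheduler-event-loop");

    SubmitAndWaitSketch() {
        eventThread.setDaemon(true);
        eventThread.start();
    }

    // Consumer: runs on eventThread and handles one event at a time.
    private void loop() {
        try {
            while (true) {
                onReceive(eventQueue.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void onReceive(JobSubmitted job) {
        int sum = 0;
        for (int p : job.partitions) {
            sum += p;                   // placeholder for the real scheduling work
        }
        job.waiter.complete(sum);       // wakes up the thread blocked in runJob
    }

    // Producer: enqueue the job and return immediately with a waiter.
    CompletableFuture<Integer> submitJob(int[] partitions) {
        JobSubmitted job = new JobSubmitted(partitions);
        eventQueue.add(job);
        return job.waiter;
    }

    // Called on the main thread: submit, then block until the job finishes.
    int runJob(int[] partitions) {
        return submitJob(partitions).join();
    }
}
```

Splitting submitJob from runJob keeps the queue as the only hand-off point between the calling thread and the scheduling thread, which is the producer-consumer split the summary speculates about.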

Spark Source Code Series - DAGScheduler Execution

Summary: Conclusion. DAGScheduler → runJob def runJob[T, U]( val waiter = submitJob(rdd, func, partitions, callSite, resultHandler, properties) ... // wait for the job to finish ThreadU…

posted @ 2022-05-29 10:47 608088

Spark Source Code Series - DAGScheduler Triggering

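Summary: Conclusion. Action operators ultimately go through SparkContext.runJob, which triggers DAGScheduler.runJob. Action operators ultimately trigger sparkContext.runJob. The saveAsTextFile action triggers it: /* underlying call to SparkHadoopWriter.writ…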

posted @ 2022-05-29 08:01 608088

May 23, 2022

Spark Source Code Series - MapPartitionsRDD & ShuffledRDD

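Summary: Conclusion. Both cases use the wrapper pattern: the object passes itself in and a new object is created. Each call to a non-shuffle operator creates a MapPartitionsRDD; each call to a shuffle operator creates a ShuffledRDD. Non-shuffle: val words = lines.flatMap(_.split("\\s+"))…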

posted @ 2022-05-23 22:04 608088
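
The wrapper pattern noted in the summary above, where each operator call passes the current object into a newly created one, can be shown with a toy dataset type. MiniRdd, SourceRdd, and MappedRdd are hypothetical stand-ins, not Spark classes, and the sketch ignores partitions and shuffles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy stand-in for an RDD: elements are produced only when compute() is called.
abstract class MiniRdd<T> {
    abstract List<T> compute();

    // The transformation does no work; it wraps `this` in a new object.
    <U> MiniRdd<U> flatMap(Function<T, List<U>> f) {
        return new MappedRdd<>(this, f);
    }
}

// Leaf of the chain: holds the input data directly.
class SourceRdd<T> extends MiniRdd<T> {
    private final List<T> data;

    SourceRdd(List<T> data) {
        this.data = data;
    }

    @Override
    List<T> compute() {
        return data;
    }
}

// Wrapper: keeps a reference to its parent and the function to apply.
class MappedRdd<T, U> extends MiniRdd<U> {
    final MiniRdd<T> parent;
    private final Function<T, List<U>> f;

    MappedRdd(MiniRdd<T> parent, Function<T, List<U>> f) {
        this.parent = parent;
        this.f = f;
    }

    @Override
    List<U> compute() {
        List<U> out = new ArrayList<>();
        for (T item : parent.compute()) {   // pull from the wrapped parent
            out.addAll(f.apply(item));
        }
        return out;
    }
}

class WrapperDemo {
    static List<String> words() {
        MiniRdd<String> lines = new SourceRdd<>(Arrays.asList("a b", "c  d"));
        MiniRdd<String> words = lines.flatMap(line -> Arrays.asList(line.split("\\s+")));
        return words.compute();
    }
}
```

Each flatMap call returns a new wrapper whose parent field points at the previous object, so a chain of calls builds a linked lineage that is evaluated only when compute() runs.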

May 21, 2022

Spark Source Code Series - textFile -> inputSplit

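Summary: Understanding. goalSize: the estimated size of each partition (in bytes). splitSize: the actual partition size. If the estimated partition size is below 128 MB, the estimate is used; if it exceeds 128 MB, 128 MB is used. Source: public InputSplit[] getSplits(JobConf job, int numSplits) throws…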

posted @ 2022-05-21 23:42 608088
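
The goalSize and splitSize rule in the summary above comes down to a small piece of arithmetic. The sketch below mirrors the max/min formula used by Hadoop's FileInputFormat; the class name SplitSizeSketch, the 1-byte minimum size, and the 128 MB block size are assumptions chosen for the example.

```java
// Split-size arithmetic: the goal size is capped by the block size.
class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // Estimated bytes per partition for the requested number of splits.
    static long goalSize(long totalSize, int numSplits) {
        return totalSize / (numSplits == 0 ? 1 : numSplits);
    }

    // Actual split size: never above blockSize, never below minSize.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128 * MB;   // assumed block size
        long minSize = 1;            // assumed minimum split size

        // 100 MB file, 2 requested splits: goal is 50 MB, below the block size.
        long small = computeSplitSize(goalSize(100 * MB, 2), minSize, blockSize);
        // 1000 MB file, 2 requested splits: goal is 500 MB, capped at 128 MB.
        long large = computeSplitSize(goalSize(1000 * MB, 2), minSize, blockSize);

        System.out.println(small / MB + " MB, " + large / MB + " MB");
    }
}
```

With these inputs the first case keeps the 50 MB estimate and the second is capped at the 128 MB block size, matching the rule stated in the summary.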

May 4, 2022

Spark Source Code Series - YARN Cluster Handling

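Summary: Environment. Debugging requires an extra dependency. The previous post mentioned that SparkSubmit ultimately calls the start method of a YarnClusterApplication object. The YarnClusterApplication code lives in the spark-yarn dependency, so the code in this post needs the following pom entry. Debugging code and production code do not need this dependency. <depende…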

posted @ 2022-05-04 12:08 608088
