Summary: Spark Core official-site study review. Speed: disk 10x, memory 100x. Easy: code, interactive shell. Unified stack: batch, streaming, ML, graph. Deployment: Local...
posted @ 2019-05-07 22:39 BBBone Views (139) Comments (0) Recommendations (0)
Summary: Joining a DStream with an RDD. transform(func): Return a new DStream by applying a RDD-to-RDD function to every RDD of the source DStream. This can be used to do arbitrary RDD operations on the DStream. Blacklist filtering approach: take the access log...
posted @ 2019-05-07 22:33 BBBone Views (2279) Comments (0) Recommendations (1)
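To make the transform-based blacklist idea from the summary above concrete, here is a minimal sketch in Scala. It is not the post's own code: the log format (`date,name`), the blacklist entries, the socket host/port, and the names `BlacklistFilterSketch` and `dropBlacklisted` are all assumptions chosen for illustration. It assumes Spark Core and Spark Streaming are on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BlacklistFilterSketch {

  // Keep only log lines whose user name is NOT in the blacklist.
  // Assumption: each line looks like "date,name"; lines with fewer
  // than two comma-separated fields are dropped.
  def dropBlacklisted(logs: RDD[String],
                      blacklist: RDD[(String, Boolean)]): RDD[String] =
    logs
      .map(line => (line.split(","), line))
      .filter { case (fields, _) => fields.length >= 2 }
      .map { case (fields, line) => (fields(1).trim, line) } // (name, line)
      .leftOuterJoin(blacklist)                              // (name, (line, Option[flag]))
      .filter { case (_, (_, flag)) => !flag.getOrElse(false) }
      .map { case (_, (line, _)) => line }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("BlacklistFilterSketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Hypothetical static blacklist, held as a pair RDD of (name, true).
    val blacklist = ssc.sparkContext
      .parallelize(Seq("zs", "ls"))
      .map(name => (name, true))

    // Hypothetical source: text lines from a local socket.
    val lines = ssc.socketTextStream("localhost", 6789)

    // transform applies an RDD-to-RDD function to every batch RDD,
    // which is what allows joining the stream against a plain RDD.
    val allowed = lines.transform(rdd => dropBlacklisted(rdd, blacklist))
    allowed.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Keeping the join logic in a plain RDD-to-RDD function means it can be exercised on small in-memory RDDs without starting a streaming context.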
Summary: RDD source code walkthrough. 1. RDD.scala - Resilient Distributed Dataset (RDD). Resilient: reflected in the computation - the basic abstraction in Spark - Represents an immutable val RDDA == RDDB - partitioned collection of elements-...
posted @ 2019-05-07 22:18 BBBone Views (291) Comments (0) Recommendations (0)
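The "immutable" and "partitioned" properties listed in the summary above can be observed directly. The following is a small sketch, assuming Spark Core is on the classpath; the object name `RddBasicsSketch`, the helper `demo`, and the sample data are made up for illustration and do not come from the post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddBasicsSketch {

  // Returns (partition count of rddA, contents of rddA, contents of rddB).
  def demo(sc: SparkContext): (Int, Seq[Int], Seq[Int]) = {
    // Partitioned: the ten elements are split across four partitions.
    val rddA = sc.parallelize(1 to 10, numSlices = 4)

    // Immutable: map does not modify rddA, it returns a new RDD.
    val rddB = rddA.map(_ * 2)

    (rddA.getNumPartitions, rddA.collect().toSeq, rddB.collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("RddBasicsSketch"))
    val (partitions, a, b) = demo(sc)
    println(s"partitions of rddA: $partitions")
    println(s"rddA: ${a.mkString(",")}")
    println(s"rddB: ${b.mkString(",")}")
    sc.stop()
  }
}
```

After the `map`, `rddA` still holds the original values, which is the sense in which an RDD is immutable: transformations describe new datasets rather than changing existing ones.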
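Summary: The Maven-based build is the build of reference for Apache Spark. Building Spark with Maven requires Maven 3.5.4 and Java 8. Note that support for Java 7 was removed as of Spark 2.2.0. Packages: jdk-8u51-linux-x64.tar.gz, apache-maven-3.3.9-bin.tar.gz, spark-2.4.2.tgz, scala-2.11.8.tgz. 1. The settings in Maven need to be ... via...
posted @ 2019-05-07 21:39 BBBone Views (502) Comments (0) Recommendations (0)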
Summary: Configuration file: pom.xml 2.11.8 2.2.0 2.6.0-cdh5.7.0 cloudera cloudera https://repository.cloudera.com/artifactory/cloudera-repos/ org.scala-lang ...
posted @ 2019-05-07 19:10 BBBone Views (2035) Comments (0) Recommendations (0)