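Summary: Vertices: VertexRDD; Edges: EdgeRDD, Edge, EdgeDirection; Triplets: EdgeTriplet; Storage: PartitionStrategy. There are generally two ways to store a distributed graph: edge-cut or vertex-cut. GraphX uses vertex-cut and provides four storage (partitioning) strategies: EdgePartition2D, EdgePartition1D, Ra...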
posted @ 2015-11-26 14:33 sunflower627 Views (399) Comments (0) Recommendations (0)
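The edge-cut vs. vertex-cut distinction can be illustrated with a minimal pure-Python sketch. It is not GraphX code: `partition_1d`, `partition_2d`, `vertex_cut` and `replication_factor` are hypothetical helpers, loosely modelled on a 1D strategy (edges grouped by source vertex) and a 2D grid strategy, and they use plain modulo where GraphX applies its own hashing. In a vertex cut every edge lives in exactly one partition, and a vertex is copied to each partition that holds one of its edges.

```python
import math
from collections import defaultdict

# Hypothetical helpers for illustration only; these are NOT GraphX's PartitionStrategy classes.


def partition_1d(src: int, dst: int, num_parts: int) -> int:
    """1D-style assignment: all edges with the same source vertex share a partition."""
    return src % num_parts


def partition_2d(src: int, dst: int, num_parts: int) -> int:
    """2D-style assignment on a side x side grid (side = ceil(sqrt(num_parts))):
    the source picks the column, the destination picks the row."""
    side = math.isqrt(num_parts)
    if side * side < num_parts:
        side += 1
    col = src % side
    row = dst % side
    return (col * side + row) % num_parts


def vertex_cut(edges, num_parts, strategy):
    """Put each edge in exactly one partition; a vertex is replicated to every
    partition that holds at least one of its edges."""
    parts = defaultdict(list)
    replicas = defaultdict(set)  # vertex id -> partitions holding a copy of it
    for src, dst in edges:
        p = strategy(src, dst, num_parts)
        parts[p].append((src, dst))
        replicas[src].add(p)
        replicas[dst].add(p)
    return dict(parts), dict(replicas)


def replication_factor(replicas) -> float:
    """Average number of copies per vertex (1.0 means no vertex is replicated)."""
    return sum(len(ps) for ps in replicas.values()) / len(replicas)


# Usage: a star graph, hub vertex 0 connected to vertices 1..8, split into 4 partitions.
star = [(0, i) for i in range(1, 9)]
for name, strategy in [("1D", partition_1d), ("2D", partition_2d)]:
    parts, reps = vertex_cut(star, 4, strategy)
    sizes = {p: len(es) for p, es in sorted(parts.items())}
    print(name, sizes, round(replication_factor(reps), 3))
# 1D {0: 8} 1.0
# 2D {0: 4, 1: 4} 1.111
```

On the star graph the 1D assignment keeps all eight edges in one partition (no replicas, but no parallelism), while the 2D assignment spreads them over two partitions at the cost of replicating the hub vertex 0. This is the trade-off a partitioning strategy tunes: load balance versus the number of vertex copies that must be kept in sync.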
Summary: 1 Overview. GraphX is a new component in Spark for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a...
posted @ 2015-11-26 14:32 sunflower627 Views (366) Comments (0) Recommendations (0)
Summary: 1 Overview. Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data ...
posted @ 2015-11-26 14:31 sunflower627 Views (331) Comments (0) Recommendations (0)
Summary: 1 Overview. Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as dist...
posted @ 2015-11-26 14:31 sunflower627 Views (250) Comments (0) Recommendations (0)
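Summary: 1. RDD is short for Resilient Distributed Dataset. It is the basic abstraction (an abstract class) in Spark and contains the basic operations available on all RDDs: map, filter, and persist. Vocabulary: immutable (cannot be changed); implicit conversion; propagat...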
posted @ 2015-11-26 14:30 sunflower627 Views (295) Comments (0) Recommendations (0)
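The properties named in this summary (an immutable dataset, `map`/`filter` returning new datasets, and `persist` caching results) can be sketched with a toy class. `MiniRDD` below is hypothetical and is not Spark's API; it only assumes that a dataset is described by how to compute it from its parent (its lineage), and it evaluates transformations only when `collect()` is called, mirroring Spark's lazy evaluation.

```python
from typing import Callable, Iterable, List, Optional


class MiniRDD:
    """Hypothetical toy stand-in for an RDD (NOT Spark's API): an immutable dataset
    defined by a parent dataset plus a transformation, i.e. its lineage."""

    def __init__(self, compute: Callable[[], List], parent: Optional["MiniRDD"] = None):
        self._compute = compute  # how to (re)build this dataset from its lineage
        self._parent = parent    # kept only to record lineage
        self._cache: Optional[List] = None
        self._persisted = False
        self.compute_count = 0   # how many times this dataset was actually computed

    @staticmethod
    def parallelize(data: Iterable) -> "MiniRDD":
        items = list(data)
        return MiniRDD(lambda: list(items))

    def map(self, f) -> "MiniRDD":
        # Returns a NEW dataset; self is never modified (immutability).
        return MiniRDD(lambda: [f(x) for x in self.collect()], parent=self)

    def filter(self, pred) -> "MiniRDD":
        return MiniRDD(lambda: [x for x in self.collect() if pred(x)], parent=self)

    def persist(self) -> "MiniRDD":
        # Only marks the dataset; results are cached the first time it is computed.
        self._persisted = True
        return self

    def collect(self) -> List:
        if self._cache is not None:
            return list(self._cache)  # return a copy so callers cannot mutate the cache
        self.compute_count += 1
        result = self._compute()
        if self._persisted:
            self._cache = result
        return list(result)


nums = MiniRDD.parallelize(range(1, 6))
squares = nums.map(lambda x: x * x).persist()
evens = squares.filter(lambda x: x % 2 == 0)
print(nums.collect())         # [1, 2, 3, 4, 5]
print(evens.collect())        # [4, 16]
print(evens.collect())        # [4, 16]
print(squares.compute_count)  # 1
```

Because `map` and `filter` never modify `self`, `nums` still holds `[1, 2, 3, 4, 5]`; and because `squares` was persisted, the second `evens.collect()` reuses its cached result instead of recomputing it from `nums`.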