February 2019 Post Archive - DataNerd

Spark: The Definitive Book, Chapter 12 Notes

Summary: What Are the Low-Level APIs? There are two sets of low-level APIs: one for manipulating distributed data (RDDs), and another for distributing ...

posted @ 2019-02-28 11:24 DataNerd Views (146) Comments (0) Recommendations (0)

Spark: The Definitive Book, Chapter 11 Notes

Summary: Datasets are a strictly Java Virtual Machine (JVM) language feature that works only with Scala and Java. Using Datasets, you can define the object that ...

posted @ 2019-02-23 14:51 DataNerd Views (341) Comments (0) Recommendations (0)

Spark: The Definitive Book, Chapter 10 Notes

Summary: What Is SQL? Big Data and SQL: Apache Hive. Big Data and SQL: Spark SQL. The power of Spark SQL derives from several key facts: SQL analysts can now tak...

posted @ 2019-02-23 11:05 DataNerd Views (322) Comments (0) Recommendations (0)

Spark: The Definitive Book, Chapter 9 Notes

Summary: Spark core data sources: CSV, JSON, Parquet, ORC, JDBC/ODBC connections, plain text files. The Structure of the Data Sources API. Read API Structure. The core s...

posted @ 2019-02-23 09:58 DataNerd Views (453) Comments (0) Recommendations (0)

Spark: The Definitive Book, Chapter 8 Notes

Summary: Join Expressions. A join brings together two sets of data, the left and the right, by comparing the value of one or more keys of the left and right and ...

posted @ 2019-02-19 12:29 DataNerd Views (207) Comments (0) Recommendations (0)

Spark: The Definitive Book, Chapter 7 Notes

Summary: Types of grouping: The simplest grouping is to just summarize a complete DataFrame by performing an aggregation in a select statement. A “group by” allows you to ...

posted @ 2019-02-19 11:06 DataNerd Views (325) Comments (0) Recommendations (0)

Spark: The Definitive Book, Chapter 6 Notes

Summary: Where to Look for APIs. A DataFrame is essentially a Dataset of type Row, so it is worth checking org.apache.spark.sql.Dataset at https://spark.apache.org/docs/latest/api/scala/index.html regularly to discover API updates ...

posted @ 2019-02-16 12:40 DataNerd Views (374) Comments (0) Recommendations (0)

Spark: The Definitive Book, Chapter 5 Notes

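Summary: A DataFrame consists of a series of records, each of type Row. Columns represent computation expressions that can be performed on each individual record. The schema defines the name and data type of each column. Partitioning defines the physical distribution of the DataFrame or Dataset across the cluster. Schemas: you can let the data source define the schema (also called ...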

posted @ 2019-02-14 16:58 DataNerd Views (373) Comments (0) Recommendations (0)

Better late than never

Be easygoing and natural as a person; be steady and reliable in your work.
