DataNerd - cnblogs

May 21, 2020

Fixing Hive's java.io.IOException: cannot find dir = hdfs://127.0.0.1:8020/WebLog/Parsed/part-r-00000 in pathToPartitionInfo: [hdfs:/WebLog/Parsed]

Summary: Cause: the table's location was wrong. hdfs:///WebLog/Parsed may be parsed as hdfs:/WebLog/Parsed, so it should be changed to the fully qualified hdfs://127.0.0.1:8020/WebLog/Parsed.
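The mismatch can be reproduced with `java.net.URI` alone. This is a minimal sketch under stated assumptions: `HdfsLocationDemo`, `normalize`, and `under` are hypothetical helpers, and `under` is a deliberately simplified prefix check, not Hive's actual partition lookup. It shows that the empty authority in `hdfs:///WebLog/Parsed` parses as `null`, so rebuilding the location from its components yields `hdfs:/WebLog/Parsed`, which can never be a prefix of the fully qualified file path in the error message.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of Hive or Hadoop.
class HdfsLocationDemo {
    // Rebuild a location from its parsed components. An empty authority
    // ("hdfs:///...") parses as null, so it disappears from the rebuilt string.
    static String normalize(String location) {
        URI u = URI.create(location);
        try {
            return new URI(u.getScheme(), u.getAuthority(), u.getPath(),
                           u.getQuery(), u.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Hypothetical, simplified lookup: does a data file fall under a table location?
    static boolean under(String file, String tableLocation) {
        return file.startsWith(normalize(tableLocation) + "/");
    }

    public static void main(String[] args) {
        String file = "hdfs://127.0.0.1:8020/WebLog/Parsed/part-r-00000";
        String bad = "hdfs:///WebLog/Parsed";
        String good = "hdfs://127.0.0.1:8020/WebLog/Parsed";

        System.out.println(normalize(bad));    // hdfs:/WebLog/Parsed
        System.out.println(under(file, bad));  // false: no authority left to match
        System.out.println(under(file, good)); // true
    }
}
```

Spelling out the NameNode host and port in the table location makes the stored location and the file paths agree on scheme and authority.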

posted @ 2020-05-21 10:13 DataNerd

May 11, 2020

Fixing Hive's java.lang.NoSuchMethodError: org.apache.hadoop.util.Time.monotonicNowNanos()

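Summary: Running a statement in the Hive CLI reported the following error. Error analysis: in hadoop-env.sh under the Hadoop configuration directory I had added an entry involving HBase's lib directory, and that directory contains hadoop-common-2.5.1.jar, while my Hadoop version is 2.8.5. My guess is that the jar used when the Hive statement ran was hadoop-common-2…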
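A `NoSuchMethodError` is thrown at run time when code compiled against one version of a class finds that the loaded version no longer defines the method, which fits an older hadoop-common jar shadowing the expected one. One way to confirm which jar wins is to ask the JVM directly. The sketch below uses only standard reflection; `ClasspathProbe`, `locate`, and `hasNoArgMethod` are hypothetical names, and it assumes you run it with the same classpath that Hive uses.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Hadoop or Hive).
class ClasspathProbe {
    // True if the class is loadable and has a public no-argument method with this name.
    static boolean hasNoArgMethod(String className, String methodName) {
        try {
            Class.forName(className).getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    // Report the jar or directory a class was loaded from.
    // JDK bootstrap classes have no code source, so they get a placeholder.
    static String locate(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "<bootstrap or unknown>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        String cls = "org.apache.hadoop.util.Time";
        // Output depends on the classpath this is launched with.
        System.out.println(cls + " loaded from: " + locate(cls));
        System.out.println("monotonicNowNanos() present: " + hasNoArgMethod(cls, "monotonicNowNanos"));
    }
}
```

If the reported location points into HBase's lib directory rather than the Hadoop installation, the extra classpath entry in hadoop-env.sh is the likely culprit.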

posted @ 2020-05-11 00:18 DataNerd

March 4, 2019

Spark: The Definitive Guide, Chapter 14 Notes

Summary: In addition to the Resilient Distributed Dataset (RDD) interface, the second kind of low-level API in Spark is two types of “distributed shared variab…
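Spark's two distributed shared variables are broadcast variables (a read-only value shipped to every executor) and accumulators (values that tasks can only add to and that are read back on the driver). The sketch below is a plain-Java analogy on a parallel stream, not Spark's API: `SharedVariablesSketch`, `Result`, and `encode` are hypothetical, an immutable `Map` stands in for the broadcast value, and a `LongAdder` stands in for the accumulator.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

// Hypothetical plain-Java analogy; not Spark code.
class SharedVariablesSketch {
    static final class Result {
        final List<Integer> codes;
        final long unknown;
        Result(List<Integer> codes, long unknown) {
            this.codes = codes;
            this.unknown = unknown;
        }
    }

    static Result encode(List<String> words, Map<String, Integer> dictionary) {
        // "Broadcast": one read-only lookup table shared by every parallel task.
        Map<String, Integer> broadcast = Map.copyOf(dictionary);
        // "Accumulator": tasks only add to it; the caller reads the total afterwards.
        LongAdder unknownWords = new LongAdder();
        List<Integer> codes = words.parallelStream()
                .map(w -> {
                    Integer code = broadcast.get(w);
                    if (code == null) {
                        unknownWords.increment();
                    }
                    return code == null ? Integer.valueOf(-1) : code;
                })
                .collect(Collectors.toList());
        return new Result(codes, unknownWords.sum());
    }

    public static void main(String[] args) {
        Result r = encode(List.of("spark", "hive", "flink"), Map.of("spark", 1, "hive", 2));
        System.out.println(r.codes + " unknown=" + r.unknown); // [1, 2, -1] unknown=1
    }
}
```

One difference the analogy hides: Spark applies each task's accumulator update only once for updates made inside actions, while updates made inside transformations may be applied more than once if a task or stage is re-executed.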

posted @ 2019-03-04 10:36 DataNerd

Spark: The Definitive Guide, Chapter 13 Notes

Summary: This chapter covers the advanced RDD operations and focuses on key–value RDDs, a powerful abstraction for manipulating data. We also touch on some mor…
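Key–value operations such as `reduceByKey` merge all values that share a key. The plain-Java sketch below (a hypothetical `KeyValueSketch.wordCount`, not Spark code) mimics the classic word count: map each word to a `(word, 1)` pair, then merge pairs with the same key using `Integer::sum`. A `TreeMap` is used only so the printed order is stable.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical plain-Java analogy of a key-value "reduce by key"; not Spark code.
class KeyValueSketch {
    static Map<String, Integer> wordCount(List<String> words) {
        return words.stream()
                .map(w -> Map.entry(w, 1))              // (word, 1) pairs
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        Integer::sum,                    // merge values sharing a key
                        TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("spark", "hive", "spark"))); // {hive=1, spark=2}
    }
}
```

In Spark, `reduceByKey` also combines values within each partition before the shuffle, which is why it is generally preferred over `groupByKey` followed by a reduction.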

posted @ 2019-03-04 10:03 DataNerd

February 28, 2019

Spark: The Definitive Guide, Chapter 12 Notes

Summary: What Are the Low-Level APIs? There are two sets of low-level APIs: one for manipulating distributed data (RDDs), and another for distributing…

posted @ 2019-02-28 11:24 DataNerd

February 23, 2019

Spark: The Definitive Guide, Chapter 11 Notes

Summary: Datasets are a strictly Java Virtual Machine (JVM) language feature that works only with Scala and Java. Using Datasets, you can define the object that…
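The point of a Dataset is that each row is an object of a type you define, so field access and predicates are checked by the compiler rather than looked up by column name at run time. The plain-Java sketch below illustrates that idea outside Spark; `TypedRowsSketch`, the `Flight` class, its fields, and `sameCountry` are all hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical plain-Java illustration of typed rows; not Spark code.
class TypedRowsSketch {
    // Each "row" is a typed object with named, typed fields.
    static final class Flight {
        final String origin;
        final String destination;
        final long count;
        Flight(String origin, String destination, long count) {
            this.origin = origin;
            this.destination = destination;
            this.count = count;
        }
    }

    // The predicate is ordinary, compile-checked code over Flight fields.
    static List<Flight> sameCountry(List<Flight> flights) {
        return flights.stream()
                .filter(f -> f.origin.equals(f.destination))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Flight> flights = List.of(
                new Flight("US", "US", 10),
                new Flight("US", "RO", 3));
        System.out.println(sameCountry(flights).size()); // 1
    }
}
```

In Spark's Java API, turning such objects into a `Dataset` additionally requires an `Encoder` (for example `Encoders.bean`, which expects a JavaBean with getters and setters); in Scala, a case class plays the role of `Flight`.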

posted @ 2019-02-23 14:51 DataNerd

Spark: The Definitive Guide, Chapter 10 Notes

Summary: What Is SQL?; Big Data and SQL: Apache Hive; Big Data and SQL: Spark SQL. The power of Spark SQL derives from several key facts: SQL analysts can now tak…

posted @ 2019-02-23 11:05 DataNerd

Spark: The Definitive Guide, Chapter 9 Notes

Summary: Spark's core data sources: CSV, JSON, Parquet, ORC, JDBC/ODBC connections, and plain text files. The Structure of the Data Sources API; Read API Structure. The core s…

posted @ 2019-02-23 09:58 DataNerd

February 19, 2019

Spark: The Definitive Guide, Chapter 8 Notes

Summary: Join Expressions. A join brings together two sets of data, the left and the right, by comparing the value of one or more keys of the left and right and…
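An inner join keeps only the pairs of rows whose keys compare equal. One common way to evaluate such an equi-join is a hash join: build a hash table on one side, then probe it with every row of the other side. The sketch below is a plain-Java illustration of that technique on `(key, value)` pairs; `InnerJoinSketch`, `innerJoin`, and the `key:left:right` output format are hypothetical and are not how Spark represents rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java hash join; not Spark code.
class InnerJoinSketch {
    static List<String> innerJoin(List<Map.Entry<Integer, String>> left,
                                  List<Map.Entry<Integer, String>> right) {
        // Build phase: index every right-side value by its key.
        Map<Integer, List<String>> index = new HashMap<>();
        for (Map.Entry<Integer, String> r : right) {
            index.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        // Probe phase: each left row pairs with every right row sharing its key.
        // Left rows without a match are dropped (inner join semantics).
        List<String> joined = new ArrayList<>();
        for (Map.Entry<Integer, String> l : left) {
            for (String rightValue : index.getOrDefault(l.getKey(), List.of())) {
                joined.add(l.getKey() + ":" + l.getValue() + ":" + rightValue);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> left =
                List.of(Map.entry(1, "a"), Map.entry(2, "b"), Map.entry(3, "c"));
        List<Map.Entry<Integer, String>> right =
                List.of(Map.entry(1, "x"), Map.entry(1, "y"), Map.entry(3, "z"));
        System.out.println(innerJoin(left, right)); // [1:a:x, 1:a:y, 3:c:z]
    }
}
```

Spark itself chooses a physical strategy (for example a shuffle join or a broadcast join) based on the sizes of the two sides.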

posted @ 2019-02-19 12:29 DataNerd

Spark: The Definitive Guide, Chapter 7 Notes

Summary: Types of grouping: The simplest grouping is to just summarize a complete DataFrame by performing an aggregation in a select statement. A “group by” allows you to…
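The two simplest kinds of grouping differ in whether the data is split by key before aggregating. The plain-Java sketch below contrasts one aggregate over the whole collection with a per-key aggregate; `GroupingSketch`, the `Sale` class, and both methods are hypothetical. In Spark, the rough equivalents would be an aggregation inside `select` versus `groupBy(...).agg(...)`.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical plain-Java illustration of grouping; not Spark code.
class GroupingSketch {
    static final class Sale {
        final String country;
        final long quantity;
        Sale(String country, long quantity) {
            this.country = country;
            this.quantity = quantity;
        }
    }

    // No grouping: a single aggregate over the whole dataset.
    static long totalQuantity(List<Sale> sales) {
        return sales.stream().mapToLong((Sale s) -> s.quantity).sum();
    }

    // "Group by": split rows by key, then aggregate each group separately.
    static Map<String, Long> quantityByCountry(List<Sale> sales) {
        return sales.stream().collect(Collectors.groupingBy(
                (Sale s) -> s.country,
                TreeMap::new,
                Collectors.summingLong((Sale s) -> s.quantity)));
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(new Sale("UK", 2), new Sale("US", 3), new Sale("UK", 5));
        System.out.println(totalQuantity(sales));     // 10
        System.out.println(quantityByCountry(sales)); // {UK=7, US=3}
    }
}
```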

posted @ 2019-02-19 11:06 DataNerd
