Post archive for May 2019 - niutao

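StreamingQueryListener: the listener in Structured Streaming

Summary: Structured Streaming provides a class, StreamingQueryListener, for monitoring a stream's start, stop, and status updates. An implementation of StreamingQueryListener must implement three functions: abstract class StreamingQueryListener { import S…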

posted @ 2019-05-30 15:26 niutao Views (1588) Comments (0) Recommendations (0)

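On the shared state of Hive on Spark sessions

Summary: Spark SQL has a class, org.apache.spark.sql.internal.SharedState, which is responsible for: 1. managing the metadata location (warehousePath); 2. managing the query result cache (cacheManager); 3. monitoring execution state and metrics within the program (statusStore)…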

posted @ 2019-05-23 23:33 niutao Views (573) Comments (0) Recommendations (0)

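Notes on a monitoring page failure in Cloudera

Summary: After a CDH migration, all services started and ran normally and day-to-day operations were unaffected, but the monitoring charts on the web page had disappeared. The root cause in this situation is that the Host Monitor and Service Monitor services have failed. Fix: go to the /var/lib directory on the master node and delete: cloudera-host-monitor, clo…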

posted @ 2019-05-23 09:51 niutao Views (754) Comments (0) Recommendations (0)

Fixing a dynamic resource allocation error in Spark on YARN: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist

Summary: Components: cdh5.14.0; Spark is a self-compiled spark2.1.0-cdh5.14.0. Step 1: confirm that the following settings have been added to spark-defaults.conf: spark.shuffle.service.enabled true //enables the External Shuffle Service spar…
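The first step in this summary (confirming that spark-defaults.conf enables the external shuffle service) could be checked with a small script. The following is only a minimal sketch: the helper names and the sample file content are hypothetical, and it assumes the common whitespace-separated `key value` form with no trailing comments on a line.

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf content into a dict of key -> value.

    Assumes whitespace-separated "key value" lines; blank lines and
    lines starting with '#' are skipped.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        props[key] = value
    return props


def shuffle_service_enabled(text):
    """Return True if spark.shuffle.service.enabled is set to true."""
    props = parse_spark_defaults(text)
    # Spark treats the setting as false when it is absent.
    return props.get("spark.shuffle.service.enabled", "false").lower() == "true"


# Hypothetical file content, for illustration only.
sample = """
# spark-defaults.conf
spark.master yarn
spark.shuffle.service.enabled true
"""

print(shuffle_service_enabled(sample))  # prints True
```

In practice the text would be read from the cluster's actual spark-defaults.conf; the location of that file depends on the installation and is not given in the summary.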

posted @ 2019-05-09 10:41 niutao Views (1779) Comments (0) Recommendations (0)

Spark on YARN error: org.apache.hadoop.fs.FSDataInputStream

Summary: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream at org.apache.spark.deploy.SparkSubmitArguments.hand…

posted @ 2019-05-08 15:05 niutao Views (1841) Comments (0) Recommendations (0)

Notes on a data inconsistency problem when querying with newAPIHadoopRDD

posted @ 2019-05-07 12:46 niutao Views (1055) Comments (1) Recommendations (0)

Notes on a Spark on YARN error: java.lang.UnsatisfiedLinkError

Summary: The error looks roughly like this: Caused by: java.util.concurrent.ExecutionException: Boxed Error Caused by: java.lang.UnsatisfiedLinkError: /opt/cdh/hadoop-2.6.0-cdh5.14.0/tmp…

posted @ 2019-05-06 10:23 niutao Views (1432) Comments (0) Recommendations (0)

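Pitfalls encountered when working with a custom Spark SQL data source (HBase)

Summary: When building a custom Spark SQL data source, the schema of the Spark SQL table has to be reconciled with the schema of the HBase table. In Spark, to define a custom data source you can implement these 3 interfaces: BaseRelation represents an abstract data source, made up of rows of data with a known schema (a relational table). TableS…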

posted @ 2019-05-01 23:34 niutao Views (1644) Comments (1) Recommendations (1)
