Post Category - Big Data
Hadoop, Spark, etc.
Summary: Problem description: The Java API fails with java.io.IOException: Unable to find region for 2520192391014818087 in $TABLENAME ; ERROR Utils: Aborting task org.apache.hadoop.hbase.
Read more
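Summary: Problem description: Source-table data filled up the storage on the nodes of the HBase cluster, causing those nodes to refuse service. Approach: Identify the n tables that are unused and occupy the most space, then delete them through the hbase client. Fix steps: Check HDFS space usage: hdfs dfs -df -h; Confirm whether HBase tables account for most of the space: hdfs dfs -d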
Read more
Summary: Spark Shell. Example 1: Process Data from List. Example 2: Process Data from Local Text File. RDD transformations and actions can now be applied on the Exam
Read more
Summary: A Reducer receives (key, values) pairs, aggregates the values into the desired format, and writes the resulting (key, value) pairs back to HDFS. E.g. Input: (ter
Read more
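The Reducer summary above can be illustrated with a minimal sketch in plain Python. This is not the Hadoop Reducer API: the function name reduce_counts, the sample keys, and the choice of summing as the aggregation are all assumptions made for illustration, and the result is collected in a list rather than written to HDFS.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def reduce_counts(key: str, values: Iterable[int]) -> Iterator[Tuple[str, int]]:
    """Hypothetical reducer: collapse all values for one key into a single pair."""
    # Summing is an assumed aggregation; a real job could use any combining logic.
    yield key, sum(values)


# Assumed sample input: values already grouped by key, as a reducer would see them.
grouped: Dict[str, List[int]] = {"hadoop": [1, 1, 1], "spark": [1]}

# Call the reducer once per key, in sorted key order, and collect the output pairs.
output = [
    pair
    for key, values in sorted(grouped.items())
    for pair in reduce_counts(key, values)
]
```

Each key produces exactly one output pair here, which is the common case for counting-style jobs; a reducer is free to emit zero or several pairs per key.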
Summary: A Mapper maps input key/value pairs to intermediate key/value pairs. E.g. Input: (docID, doc); Output: (term, 1). Mapper Class Prototype: Special Data T
Read more
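The (docID, doc) to (term, 1) mapping described in the Mapper summary can be sketched in plain Python. This is not the Hadoop Mapper class: the function name map_terms, the whitespace tokenization, the lowercasing, and the sample document are assumptions chosen only to show the shape of the data flow.

```python
from typing import Iterator, Tuple


def map_terms(doc_id: str, doc: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit one (term, 1) pair per term in the document."""
    # doc_id is the input key; it is not needed to produce the (term, 1) pairs.
    # Splitting on whitespace and lowercasing are assumed tokenization rules.
    for term in doc.lower().split():
        yield term, 1


# Assumed sample input pair (docID, doc).
pairs = list(map_terms("doc1", "Hadoop and Spark and HBase"))
```

Repeated terms are emitted once per occurrence, so "and" appears twice in pairs; combining those duplicates is left to the reduce step.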