Summary:
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding ...
Summary:
The following page describes how to write a custom processor: https://www.nifi.rocks/developing-a-custom-apache-nifi-processor-json/ For the detailed development API, see http://nifi.apache.org/developer-guide.h ...
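Summary:
According to instructor Shishan, Spark performance should be optimized from the following angles, with shuffle tuning as the main focus. Below are a few good blog posts on tuning, for reference. Official tuning guide: https://spark.apache.org/docs/latest/tuning.html Serialization: https://stackoverflow.com/qu ...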
Summary:
Below is an example of submitting a Spark job: spark-submit --class HiveColNullRatioStats --master yarn --deploy-mode client --num-executors 3 --executor-memory 6G --executor-cores ...
Summary:
1. Full-width to half-width conversion
2. Use an HDFS file to register the function
hadoop fs -put -f full2half-1.0-SNAPSHOT.jar /home/hypers/lib
beeline -u jdbc:hive2://******:10000/ -n hdfs -p ...
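The full-width to half-width conversion named in this entry can be sketched in plain Java. This is a minimal illustration, not the contents of full2half-1.0-SNAPSHOT.jar: the class name `Full2Half` and the method `toHalfWidth` are hypothetical, and the sketch assumes only the standard Unicode mapping (U+FF01 to U+FF5E shifted down by 0xFEE0, and the ideographic space U+3000 mapped to an ASCII space). A real Hive UDF would wrap logic like this in a class from the Hive libraries, which is left out here so the example stays self-contained.

```java
// Hypothetical helper: converts full-width ASCII variants to half-width.
public class Full2Half {

    private static final char FULL_WIDTH_SPACE = 0x3000; // ideographic space
    private static final char FULL_WIDTH_START = 0xFF01; // full-width '!'
    private static final char FULL_WIDTH_END = 0xFF5E;   // full-width '~'
    private static final int OFFSET = 0xFEE0;            // distance to the ASCII range

    public static String toHalfWidth(String input) {
        if (input == null) {
            return null; // pass nulls through, as SQL functions usually do
        }
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == FULL_WIDTH_SPACE) {
                sb.append(' ');
            } else if (c >= FULL_WIDTH_START && c <= FULL_WIDTH_END) {
                sb.append((char) (c - OFFSET));
            } else {
                sb.append(c); // everything else is left untouched
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Full-width "ABC", ideographic space, full-width "123!"
        String sample = "\uFF21\uFF22\uFF23\u3000\uFF11\uFF12\uFF13\uFF01";
        System.out.println(toHalfWidth(sample)); // prints "ABC 123!"
    }
}
```

Characters outside the full-width ASCII range, such as CJK text or half-width katakana, are copied unchanged, so the function is safe to apply to mixed-language columns.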
Summary:
Storage formats commonly used in Hive today: STORED AS (TextFile|RCFile|SequenceFile|AVRO|ORC|Parquet) TEXTFILE, SEQUENCEFILE, RCFILE, ORC, PARQUET, AVRO Below is a detailed comparison: smallest storage footprint, highest query efficiency ...
Summary:
spark-submit --class WordCount \
  --master yarn-cluster \
  --num-executors 10 \
  --executor-memory 6G \
  --executor-cores 4 \
  --driver-memory 1G \
  / ...
Summary:
1. HBase to HBase: Mapper extends TableMapper, with the row key and Result as input. Reducer extends TableReducer. Driver
2. HBase to File: Mapper No Reducer Reducer Driver
3. File to ...
Summary:
CellCounter: Count cells in HBase table
completebulkload: Complete a bulk data load.
copytable: Export a table from local cluster to peer cluster
expo ...
Summary:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.*;
import...