Post category - mapreduce
Summary: http://blog.javachen.com/2014/06/24/tuning-in-mapreduce/ This post records MapReduce parameter tuning in Hadoop 2.x; it does not cover YARN tuning. Hadoop's default configuration files (using cdh5.0.1 as the example): core-default.xml, hdfs-de...
Summary:
public class TopK extends Configured implements Tool {

    public static class TopKMapper extends Mapper {

        public static final int K = 100;
        private TreeMap tm = new TreeMap();

        @Override
        protected void map(Object key, Text value, Context context) ...
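The excerpt above stops before the body of map, so the following is only a minimal sketch of the usual way a TreeMap is used to keep the K largest values, written in plain Java without the Hadoop classes. The class name TopKSketch and the method topK are hypothetical and do not come from the original post. One caveat of this pattern: equal values collapse into a single TreeMap key, so duplicates are counted once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; not the original post's mapper.
class TopKSketch {

    // Returns up to k largest distinct values, largest first.
    static List<Integer> topK(Iterable<Integer> values, int k) {
        // TreeMap keeps its keys sorted in ascending order.
        TreeMap<Integer, Integer> tm = new TreeMap<>();
        for (int v : values) {
            tm.put(v, v);
            // Once more than k entries are held, drop the smallest key.
            if (tm.size() > k) {
                tm.remove(tm.firstKey());
            }
        }
        return new ArrayList<>(tm.descendingKeySet());
    }

    public static void main(String[] args) {
        System.out.println(topK(Arrays.asList(5, 1, 9, 3, 7), 3));
    }
}
```

In a real mapper the same loop body would presumably run once per input record, with the surviving entries emitted after the last record; that part is an assumption, since the excerpt is truncated.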
Summary:
public class GroupComparator implements RawComparator {

    @Override
    public int compare(MyBinaryKey o1, MyBinaryKey o2) {
        return o1.toString().compareTo(o2.toString());
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Writabl...
Summary: From: http://blog.csdn.net/hezuoxiang/article/details/6878026 I wrote a MapReduce program in Java with a custom partitioner:
class indexPartition extends HashPartitioner {
    public int getPartition(Text key, Text value, int numReduceTasks) {
        Text tmp = new Text(key.toString().substring(0, key.toString().indexOf(":")));
        super.getP
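The partitioner above is cut off, but its visible part takes the text before the first ":" in the key and hands it to the parent hash partitioner. A minimal plain-Java sketch of that idea follows. The class name PrefixPartitionSketch is hypothetical, String.hashCode stands in for Text.hashCode (the two produce different values), and the guard for keys without a ":" is an addition that the excerpt does not show.

```java
// Hypothetical stand-alone sketch; does not use the Hadoop Partitioner API.
class PrefixPartitionSketch {

    // Maps a key such as "user42:click" to a partition based only on "user42".
    static int getPartition(String key, int numReduceTasks) {
        int idx = key.indexOf(':');
        // Assumption: a key with no ':' is partitioned on the whole key.
        // Calling substring(0, -1) here would throw instead.
        String prefix = (idx >= 0) ? key.substring(0, idx) : key;
        // Masking with Integer.MAX_VALUE clears the sign bit, so the result
        // of the modulo is never negative.
        return (prefix.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Keys sharing a prefix land in the same partition.
        System.out.println(getPartition("user42:click", 4) == getPartition("user42:view", 4));
    }
}
```

Partitioning on the prefix alone sends every key with the same prefix to the same reducer, which is presumably the goal of the original class.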
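Summary:
(1) scan.setCacheBlocks(false); set this when initializing the map task with TableMapReduceUtil.initTableMapperJob. None of the data scanned by this MR job is then placed in the cache. On one hand this saves the cost of swapping cache entries in and out, which makes the MR job itself more efficient. On the other hand, the data an MR job scans is usually read once or only rarely, so it does not need to displace what is already cached; the cache is better left holding data that is accessed repeatedly in normal use, which improves query performance.
(2) conf.setBoolean("mapred.map.tasks.speculative.execution", false); controls whether the speculative (backup) map task mechanism is enabled for the MR job.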
Summary: http://my.oschina.net/u/617085/blog/71740 The "Failed to set permissions of path" problem. Reference: https://issues.apache.org/jira/browse/HADOOP-8089 The error message is as follows: ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException Failed to set permissions of path:\usr\hadoop\tm