Hadoop Common Problems and Solutions (7)

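Possible causes of the "does not have a scheme" error when debugging Hive in Eclipse on Windows
1. The "hive.metastore.local" option in the Hive configuration file is set to false. Because this is a standalone setup, change it to true.
2. The HIVE_HOME environment variable is not set, or is set incorrectly.
3. "does not have a scheme" is very likely caused by "hive-default.xml" not being found. For a fix when hive-default.xml cannot be found while debugging Hive in Eclipse, see: http://bbs.hadoopor.com/thread-292-1-1.html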

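1. Chinese character encoding
Chinese text is parsed correctly from a URL, yet Hadoop still prints it as garbled characters? We once assumed that Hadoop did not support Chinese. After reading the source code, we found that Hadoop simply does not support writing Chinese output in GBK encoding.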

The code below is from TextOutputFormat.class. Hadoop's default output formats all extend FileOutputFormat, which has two subclasses: one writes a binary stream, and the other, TextOutputFormat, writes text.

public class TextOutputFormat<K, V> extends FileOutputFormat<K, V> {
  protected static class LineRecordWriter<K, V>
      implements RecordWriter<K, V> {
    private static final String utf8 = "UTF-8"; // hard-coded to UTF-8
    private static final byte[] newline;
    static {
      try {
        newline = "\n".getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }
    …
    public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
      this.out = out;
      try {
        this.keyValueSeparator = keyValueSeparator.getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }
    …
    private void writeObject(Object o) throws IOException {
      if (o instanceof Text) {
        Text to = (Text) o;
        // writes the raw UTF-8 bytes held by Text; this also needs to change
        out.write(to.getBytes(), 0, to.getLength());
      } else {
        out.write(o.toString().getBytes(utf8));
      }
    }
    …
}
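As the code shows, Hadoop's default output encoding is hard-coded to UTF-8. If the Chinese text was decoded correctly on input, setting the Linux client's character set to UTF-8 is enough to display it properly, because Hadoop wrote the Chinese text as UTF-8.
Many databases define their fields in GBK, so what if we want Hadoop to write Chinese in GBK for compatibility with the database?
We can define a new class: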
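To make the mismatch concrete, here is a minimal sketch that uses only the JDK and no Hadoop classes. It encodes a Chinese string as UTF-8, the way the writer above does, and then decodes those bytes with a viewer charset. The class name EncodingMismatchDemo and the helper viewAs are hypothetical names chosen for illustration, and the sketch assumes the JDK ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {
    // Assumes the running JDK provides the GBK charset (standard full JDKs do).
    static final Charset GBK = Charset.forName("GBK");

    /** Encodes the text as UTF-8, then decodes it the way a viewer using viewerCharset would. */
    static String viewAs(String original, Charset viewerCharset) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, viewerCharset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // the two characters of the word "Chinese"
        System.out.println("UTF-8 byte count: " + text.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("GBK byte count:   " + text.getBytes(GBK).length);
        System.out.println("intact when read as UTF-8: "
                + text.equals(viewAs(text, StandardCharsets.UTF_8)));
        System.out.println("intact when read as GBK:   "
                + text.equals(viewAs(text, GBK)));
    }
}
```

The text survives only when the reader uses the same charset the writer used, which is why a UTF-8 terminal shows Hadoop's output correctly while a GBK consumer does not.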
public class GbkOutputFormat<K, V> extends FileOutputFormat<K, V> {
  protected static class LineRecordWriter<K, V>
      implements RecordWriter<K, V> {
    // just use gbk here
    private static final String gbk = "gbk";
    private static final byte[] newline;
    static {
      try {
        newline = "\n".getBytes(gbk);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + gbk + " encoding");
      }
    }
    …
    public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
      this.out = out;
      try {
        this.keyValueSeparator = keyValueSeparator.getBytes(gbk);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + gbk + " encoding");
      }
    }
    …
    private void writeObject(Object o) throws IOException {
      // Text holds UTF-8 bytes, so do not write them directly;
      // always go through toString() and re-encode as GBK
      out.write(o.toString().getBytes(gbk));
    }
    …
}
Then add conf1.setOutputFormat(GbkOutputFormat.class) to the MapReduce code, and the Chinese output will be written in GBK.
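The re-encoding step at the heart of writeObject can be sketched without any Hadoop dependency. In the sketch below, a (buffer, length) pair stands in for what Text.getBytes() and Text.getLength() return, since the backing array of a Text may be longer than its valid content. The class name Utf8ToGbkSketch and the method transcode are hypothetical, and this is an illustration of the idea rather than the code Hadoop itself runs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8ToGbkSketch {
    // Assumes the running JDK provides the GBK charset.
    private static final Charset GBK = Charset.forName("GBK");

    /**
     * Re-encodes the first {@code length} bytes of a UTF-8 buffer as GBK.
     * Only the valid region is decoded; trailing bytes in the buffer are ignored.
     */
    static byte[] transcode(byte[] utf8Buffer, int length) {
        String decoded = new String(utf8Buffer, 0, length, StandardCharsets.UTF_8);
        return decoded.getBytes(GBK);
    }

    public static void main(String[] args) throws IOException {
        byte[] valid = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        // Simulate a backing array with unused bytes after the valid content.
        byte[] buffer = Arrays.copyOf(valid, valid.length + 4);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(transcode(buffer, valid.length));
        System.out.println("GBK bytes written: " + out.size());
    }
}
```

Calling o.toString() on a Text value performs the same UTF-8 decode, which is why the GBK writer can treat Text and non-Text values alike.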

2. An error thrown while running an otherwise normal MapReduce job

java.io.IOException: All datanodes xxx.xxx.xxx.xxx:xxx are bad. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

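Investigation showed that the cause was too many open files on the Linux machine. Running ulimit -n shows that the default open-file limit on Linux is 1024. Edit /etc/security/limits.conf and add a line such as: hadoop soft nofile 65535

Then rerun the program (preferably after making the same change on every datanode), and the problem is resolved.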
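Changes to limits.conf generally apply only to new login sessions, so it can help to confirm which limit a running JVM actually sees. The sketch below is one way to do that with the JDK's com.sun.management.UnixOperatingSystemMXBean. The class name FdLimitCheck is hypothetical, and the methods return -1 when the platform does not expose Unix file-descriptor counts.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

final class FdLimitCheck {
    /** Maximum file descriptors allowed for this JVM process, or -1 if unavailable. */
    static long maxFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    /** File descriptors currently open in this JVM process, or -1 if unavailable. */
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("open file descriptors: " + openFileDescriptors());
        System.out.println("max file descriptors:  " + maxFileDescriptors());
    }
}
```

If the reported maximum is still 1024 after editing limits.conf, the process was most likely started from a session that predates the change.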

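3. After running for a while, Hadoop cannot be stopped with stop-all.sh, which reports

no tasktracker to stop, no datanode to stop

The cause is that, when stopping, Hadoop relies on the recorded process IDs of the mapred and dfs daemons on the datanodes. By default these pid files are kept under /tmp, and Linux periodically deletes files in that directory (typically after about 7 days to a month). Once files such as hadoop-hadoop-jobtracker.pid and hadoop-hadoop-namenode.pid have been deleted, the namenode can no longer find the corresponding processes on the datanodes.

Setting export HADOOP_PID_DIR in the configuration file (hadoop-env.sh) to a directory outside /tmp solves this problem.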
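As a quick diagnostic, the sketch below lists the Hadoop pid files present in a directory. It assumes the pid files follow the hadoop-<user>-<daemon>.pid naming seen in the file names above, reads the HADOOP_PID_DIR environment variable, and falls back to /tmp when it is unset. The class name PidFileCheck and the method listPidFiles are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PidFileCheck {
    /** Returns the sorted names of files matching hadoop-*.pid directly inside pidDir. */
    static List<String> listPidFiles(Path pidDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(pidDir, "hadoop-*.pid")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: fall back to /tmp when HADOOP_PID_DIR is not set in this environment.
        String configured = System.getenv("HADOOP_PID_DIR");
        Path pidDir = Paths.get(configured != null ? configured : "/tmp");

        List<String> pidFiles = listPidFiles(pidDir);
        if (pidFiles.isEmpty()) {
            System.out.println("No Hadoop pid files in " + pidDir
                    + "; stop-all.sh would have nothing to stop.");
        } else {
            pidFiles.forEach(name -> System.out.println(pidDir.resolve(name)));
        }
    }
}
```

An empty result while the daemons are still running matches the symptom described in this item.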

posted @ 2012-04-12 19:47 张长胜
