异常信息:
java.net.UnknownHostException: unknown host: xxx-host
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:244)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1234)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:49)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:234)
症状:
1. 在提交Job机器上ping xxx-host 正常。
2. 在提交Job机器上hadoop fs -ls hdfs://xxx-host:9000/xxx-path 正常
3. Job能正确读取输入数据:“INFO input.FileInputFormat: Total input paths to process : 80”
4. 然后立马就开始报上述的异常信息。
解决方法:
这类问题集中在hosts文件上,包括集群中所有机器的hosts文件
1. 集群中有的机器没有配置xxx-host (我的问题就是这么解决的)
2. 可能由于编码问题,导致某个xxx-host失效