Idiot-maker

  :: 首页 :: 博问 :: 闪存 :: 新随笔 :: 联系 :: 订阅 订阅 :: 管理 ::

http://stackoverflow.com/questions/17265002/hadoop-no-filesystem-for-scheme-file

This is a typical case of the maven-assembly plugin breaking things.

Why this happened to us

Differents JARs (hadoop-commons for LocalFileSystem, hadoop-hdfs for DistributedFileSystem) each contain a different file called org.apache.hadoop.fs.FileSystem in their META-INFO/servicesdirectory. This file lists the canonical classnames of the filesystem implementations they want to declare (This is called a Service Provider Interface, see org.apache.hadoop.FileSystem line 2116).

When we use maven-assembly, it merges all our JARs into one, and all META-INFO/services/org.apache.hadoop.fs.FileSystem overwrite each-other. Only one of these files remains (the last one that was added). In this case, the Filesystem list from hadoop-commons overwrites the list from hadoop-hdfs, so DistributedFileSystem was no longer declared.

How we fixed it

After loading the hadoop configuration, but just before doing anything Filesystem-related, we call this:

    hadoopConfig.set("fs.hdfs.impl", 
        org.apache.hadoop.hdfs.DistributedFileSystem.class.getName()
    );
    hadoopConfig.set("fs.file.impl",
        org.apache.hadoop.fs.LocalFileSystem.class.getName()
    );
posted on 2015-04-24 09:44  NickyYe  阅读(1101)  评论(0编辑  收藏  举报