Hadoop Study Notes (4): Connecting Eclipse to a Remote Hadoop Cluster and Operating on HDFS

Connecting Eclipse on Windows to Hadoop running on Linux

Add the jar files to the project:

The three jars directly under D:\Hadoop2.6.4\hadoop-2.6.4\share\hadoop\hdfs, plus all jars under D:\Hadoop2.6.4\hadoop-2.6.4\share\hadoop\hdfs\lib

The three jars directly under D:\Hadoop2.6.4\hadoop-2.6.4\share\hadoop\common, plus all jars under D:\Hadoop2.6.4\hadoop-2.6.4\share\hadoop\common\lib

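Operating on HDFS with the Java API:

1. The FileSystem class:

FileSystem fs=FileSystem.get(conf): returns the file system named by fs.defaultFS in conf. When no cluster configuration has been loaded, that is the local file system.

FileSystem fst=FileSystem.get(new URI("hdfs://hadoop3:9000"), conf, "hadoop");: returns the distributed file system at that URI, accessed as the user hadoop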

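2. The Configuration object reads the configuration files:

conf.addResource("xxx.xml"); loads an additional configuration file from the classpath

The defaults come from the configuration files packaged inside the jars: core-default.xml, hdfs-default.xml (in hadoop-hdfs-2.6.4.jar under share\hadoop\hdfs; by default an uploaded file is kept as 3 replicas with a block size of 128 MB), yarn-default.xml, mapred-default.xml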
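
To make the two defaults above concrete, here is a small sketch of the arithmetic they imply: how many blocks a file of a given length occupies, and how many bytes the cluster stores once every block is replicated. The class BlockMath and its method names are made up for illustration and are not part of the Hadoop API; the only values taken from the notes are the 128 MB block size and the replication factor of 3.

```java
/** Illustrative arithmetic only; BlockMath is a hypothetical helper, not a Hadoop class. */
final class BlockMath {
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB, the default noted above
    static final int DEFAULT_REPLICATION = 3;                  // 3 replicas, the default noted above

    /** Number of blocks needed for a file: ceiling of fileLength / blockSize. */
    static long blockCount(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        return (fileLength + blockSize - 1) / blockSize;
    }

    /** Bytes of block data stored across the cluster (the last block is not padded). */
    static long rawBytesStored(long fileLength, int replication) {
        return fileLength * replication;
    }

    public static void main(String[] args) {
        long length = 300L * 1024 * 1024; // a 300 MB file
        System.out.println("blocks: " + blockCount(length, DEFAULT_BLOCK_SIZE));
        System.out.println("bytes stored: " + rawBytesStored(length, DEFAULT_REPLICATION));
    }
}
```

A 300 MB file therefore occupies three blocks (two full ones and a 44 MB remainder), which should match the block count reported by getBlockLocations() in the file listing code later in these notes.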

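Example: where an uploaded file is actually stored on the DataNode:

The file blk_1073741825 under /tmp/hadoop-hadoop/dfs/data/current/BP-533891505-192.168.25.133-1568809359662/current/finalized/subdir0/subdir0

(The default DataNode storage directory in HDFS is /tmp/hadoop-hadoop/dfs/data; it can be changed through dfs.datanode.data.dir.) This directory contains a lock file, in_use.lock, which stops a second DataNode process from using the same storage directory.

Each block of an uploaded file is stored on the DataNode as two files:

blk_1073741825 (the block data; 1073741825 is the block ID, whereas the BP-... directory name above is the block pool ID) blk_1073741825_1001.meta (metadata for that block, holding its checksums; 1001 is the generation stamp)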

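After a file is downloaded to Linux or Windows, a .crc file is created next to it. It holds checksums of the downloaded data, computed over fixed-size chunks, and is used to verify the file's integrity. In these notes' tests, changing the content at the end of the stored file left the downloaded file unchanged, while changing content in the middle made the download fail with a checksum error: Checksum error.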
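
The behaviour described above can be modelled with a short sketch of chunked checksums. This is not Hadoop's implementation: ChunkChecksum is a made-up class, it uses java.util.zip.CRC32, and it assumes a 512-byte chunk and a reader that only verifies the bytes covered by the recorded checksums. Under those assumptions, bytes added past the recorded length go unnoticed, while a change in the middle is caught.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

/** Simplified model of per-chunk checksums; a hypothetical helper, not Hadoop code. */
final class ChunkChecksum {
    /** Computes one CRC32 value per chunk over the first `length` bytes of `data`. */
    static long[] checksums(byte[] data, int length, int chunkSize) {
        if (length > data.length || chunkSize <= 0) {
            throw new IllegalArgumentException("length exceeds data or bad chunk size");
        }
        int chunks = (length + chunkSize - 1) / chunkSize;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int offset = i * chunkSize;
            int size = Math.min(chunkSize, length - offset);
            CRC32 crc = new CRC32();
            crc.update(data, offset, size);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Returns the index of the first chunk whose checksum differs, or -1 if all match. */
    static int firstCorruptChunk(byte[] data, int recordedLength, long[] recorded, int chunkSize) {
        long[] actual = checksums(data, recordedLength, chunkSize);
        for (int i = 0; i < recorded.length; i++) {
            if (actual[i] != recorded[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] original = new byte[2000];
        for (int i = 0; i < original.length; i++) original[i] = (byte) i;
        long[] recorded = checksums(original, original.length, 512);

        byte[] appended = Arrays.copyOf(original, 2010); // extra bytes past the recorded length
        byte[] edited = original.clone();
        edited[600] ^= 0x01;                             // one bit changed inside the second chunk

        System.out.println(firstCorruptChunk(appended, 2000, recorded, 512)); // -1: not detected
        System.out.println(firstCorruptChunk(edited, 2000, recorded, 512));   // 1: second chunk fails
    }
}
```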

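Upload a file: fst.copyFromLocalFile(src, dst); (a method of the FileSystem object; src is the local path, dst the HDFS path)

Download a file: fst.copyToLocalFile(src, dst); (src is the HDFS path, dst the local path)

Create a directory: Path p=new Path("/user/hadoop/test1"); fs.mkdirs(p);//missing parent directories are created as well

Delete a directory or file: fs.delete(p, false/true); //true deletes recursively; false does not, so deleting a non-empty directory fails

Check whether a file or directory exists: boolean ss=fs.exists(new Path("/tt")); System.out.print(ss); the path exists if the result is true

Rename a file: fs.rename(new Path("/piao.txt"),new Path ("/飘英文版.txt"));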

List files and their details:

RemoteIterator<LocatedFileStatus> lists= fs.listFiles(new Path("/"), false); //true lists recursively; false lists only the files directly in the given directory
//walk the iterator
while(lists.hasNext())
{
    LocatedFileStatus local= lists.next();
    System.out.print(local);
    System.out.print(local.getPath()+"\n"); //file path
    System.out.print(local.getLen()+"\n"); //file length in bytes
    System.out.print(local.getBlockSize()+"\n"); //block size of the file
    BlockLocation[] bls= local.getBlockLocations(); //block information for the file, returned as an array
    System.out.print(bls.length+"\n"); //number of blocks in the file
}

List directory details:

      FileStatus[]  fsts=fs.listStatus(new Path("/"));
      for(FileStatus f:fsts)
      {
          if(f.isDirectory())   //use f.isFile() to print files instead; FileStatus carries no block information
          {
              System.out.print(f+"\n");
          }
      }

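Operating on HDFS with IO streams:

//upload a file
    FileInputStream in=new FileInputStream(new File("D:\\piao.txt"));   //local source file
    FSDataOutputStream out= fs.create(new Path("/piao1.txt"));     //destination path in HDFS
    IOUtils.copyBytes(in, out, 4096, true);   //4096-byte buffer; true closes both streams when the copy finishes

//download a file
      FSDataInputStream in= fs.open(new Path("/piao1.txt"));   //source path in HDFS
      FileOutputStream out=new FileOutputStream(new File("D:\\piao1.txt"));   //local target; must be a file path, not a directory such as D:\
      IOUtils.copyBytes(in, out, 4096, true);
Note: the three-argument IOUtils.copyBytes(in, out, 4096) does not close the streams. Either close them yourself or pass true as the fourth argument, as above.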
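
For readers who want to see what a buffered stream copy involves, here is a minimal sketch using only java.io. It is not the Hadoop IOUtils source; StreamCopy and its methods are made-up names, and in-memory streams stand in for the local file and the HDFS stream so the example runs without a cluster. The 4096 is the same buffer size used above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Minimal buffered copy; a hypothetical stand-in to illustrate the idea, not Hadoop code. */
final class StreamCopy {
    /** Copies everything from in to out through a buffer; does not close either stream. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) { // -1 signals end of stream
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    /** Convenience wrapper: copies a byte array and closes both streams afterwards. */
    static byte[] copyAll(byte[] data, int bufferSize) {
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            copy(in, out, bufferSize);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10000]; // larger than one buffer, so the loop runs several times
        System.out.println(copyAll(data, 4096).length);
    }
}
```

The try-with-resources block in copyAll plays the role of the close flag: the streams are closed whether or not the copy succeeds.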

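Notes:

Common error 1: Permission denied: user=Administrator, access=WRITE, inode="/":hadoop:supergroup:drwxr-xr-x (a permission error)

Solution: the request is sent as the Windows user Administrator, while "/" is owned by hadoop with mode drwxr-xr-x, so only hadoop may write to it. One common fix is to connect as that user, as in the FileSystem.get(new URI("hdfs://hadoop3:9000"), conf, "hadoop") call shown earlier.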
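
To see why the error message above means "denied", the check can be sketched as a simplified owner/group/other test on the mode string. PermissionModel is a made-up class for illustration, not Hadoop code, and it ignores the superuser bypass and ACLs. The user names, group and mode are taken from the error message.

```java
import java.util.Collections;
import java.util.Set;

/** Simplified owner/group/other permission check; a hypothetical model, not Hadoop code. */
final class PermissionModel {
    enum Access { READ, WRITE, EXECUTE }

    /**
     * @param mode nine permission characters such as "rwxr-xr-x" (without the leading 'd')
     * @return true if the mode grants the requested access to the user
     */
    static boolean isAllowed(String user, Set<String> userGroups,
                             String owner, String group, String mode, Access access) {
        if (mode.length() != 9) {
            throw new IllegalArgumentException("mode must have 9 characters: " + mode);
        }
        int base;
        if (user.equals(owner)) {
            base = 0;                 // owner bits
        } else if (userGroups.contains(group)) {
            base = 3;                 // group bits
        } else {
            base = 6;                 // other bits
        }
        return mode.charAt(base + access.ordinal()) != '-';
    }

    public static void main(String[] args) {
        Set<String> noGroups = Collections.emptySet();
        // Values from the error message: inode "/" is hadoop:supergroup:drwxr-xr-x
        System.out.println(isAllowed("Administrator", noGroups, "hadoop", "supergroup",
                "rwxr-xr-x", Access.WRITE)); // false: "other" has r-x only
        System.out.println(isAllowed("hadoop", noGroups, "hadoop", "supergroup",
                "rwxr-xr-x", Access.WRITE)); // true: the owner has rwx
    }
}
```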

Posted on 2019-09-12 13:07 by 不愧下学
