Using hdfs-mount to connect HDFS to Greenplum

hdfs-mount

hdfs-mount is a tool that mounts HDFS as a local Linux file system. It is written in Go and depends on neither libdfs nor a Java virtual machine. It lets arbitrary applications and shell scripts access a remote HDFS as ordinary files and directories, efficiently and securely.

Official repository: microsoft/hdfs-mount — A tool to mount HDFS as a local Linux file system

Related post: Mounting HDFS with hdfs-mount

Download

git clone --recursive https://github.com/microsoft/hdfs-mount.git

Build

cd hdfs-mount
make

I first built this with Go 1.10; building with Go 1.15 failed, apparently because Go 1.11 introduced Go modules for dependency management, and newer toolchains treat the project itself as a module.

Starting with Go 1.11, Go modules were added to solve package dependency management. The tooling provides a replace directive, which handles package aliasing and also works around golang.org/x packages being unreachable. Modules are integrated into the native go mod command, but if your code lives under $GOPATH, module mode is off by default; enabling it is as simple as setting the environment variable export GO111MODULE=on.
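A minimal sketch of that workaround (this is my own illustration, not from the hdfs-mount docs — assumes Go >= 1.11):

```shell
# Module mode is off by default when the code lives inside $GOPATH:
export GO111MODULE=on
# A replace directive aliases an unreachable module to its GitHub mirror,
# e.g. (run inside the project; shown as a comment here since it needs go.mod):
#   go mod edit -replace golang.org/x/net=github.com/golang/net@latest
```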

Building with Go 1.10 then failed on the 0o777 octal literals in the code (that literal syntax was only added in Go 1.13); switching to Go 1.15 got past it.

Downloading dependencies

Packages under golang.org/x/... often fail to download (in principle a proxy would solve this, though proxychains didn't work for me); the usual workaround is to clone the mirror repositories on GitHub instead.

GOPATH is the environment variable Go uses for the workspace path (easy to get confused about). From the Makefile you can see:

export GOPATH=${PWD}/_gopath

For example, when go get golang.org/x/net/context fails, you can do:

mkdir -p ${GOPATH}/src/golang.org/x
cd ${GOPATH}/src/golang.org/x
git clone --depth=1 https://github.com/golang/net.git

The packages currently needed are:

git clone --depth=1 https://github.com/golang/net.git
git clone --depth=1 https://github.com/golang/protobuf.git
git clone --depth=1 https://github.com/golang/sys.git
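The three clones above can be scripted; this sketch follows the destination layout shown for golang.org/x/net (the destination for protobuf is an assumption — depending on the import paths actually used, it may instead belong under $GOPATH/src/github.com/golang/protobuf):

```shell
# Clone the GitHub mirrors into the source layout go expects.
GOPATH=${GOPATH:-$PWD/_gopath}
X_DIR=$GOPATH/src/golang.org/x
mkdir -p "$X_DIR"
for repo in net protobuf sys; do
  # echoed rather than executed here; drop the echo to actually clone
  echo git clone --depth=1 "https://github.com/golang/$repo.git" "$X_DIR/$repo"
done
```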

Usage

We normally operate as the hadoop user. Create the mount point, and, to let non-root users access it, uncomment user_allow_other in /etc/fuse.conf (see man fuse for the details):

mkdir -p /mnt/hdfs
sudo chown hadoop:hadoop /mnt/hdfs
sudo sed -i  's/^#\(user_allow_other\)/\1/' /etc/fuse.conf
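The sed one-liner simply un-comments the option; here is a throwaway demo of its effect on a copy, so it can be tried without touching /etc (the /tmp path is purely for illustration):

```shell
# Make a fake fuse.conf with the option commented out.
cat > /tmp/fuse.conf.demo <<'EOF'
# mount_max = 1000
#user_allow_other
EOF
# Same substitution as above: strip the leading '#' from user_allow_other.
sed -i 's/^#\(user_allow_other\)/\1/' /tmp/fuse.conf.demo
grep '^user_allow_other' /tmp/fuse.conf.demo   # prints the now-active option
```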

Example: mount the HDFS listening on this host's port 9000 at /mnt/hdfs/:

hdfs-mount -fuse.debug -logLevel 2  127.0.0.1:9000 /mnt/hdfs

Press Ctrl + C to issue the unmount; it completes once no other process is accessing anything under the mount point.
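If the foreground Ctrl + C route is unavailable (say, the terminal is gone), the standard FUSE tools can unmount instead. This helper is my own sketch, not part of hdfs-mount; the lazy-umount fallback assumes util-linux:

```shell
# Unmount a FUSE mount point; fall back to a lazy umount if it is busy.
unmount_hdfs() {
  local mnt=${1:?usage: unmount_hdfs <mountpoint>}
  fusermount -u "$mnt" 2>/dev/null || sudo umount -l "$mnt"
}
# usage: unmount_hdfs /mnt/hdfs
```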

Notes

If you hit an error like FUSE: -> [ID=0x12b] Create error=EIO: mkdir /var/hdfs-mount: permission denied, run:

sudo mkdir /var/hdfs-mount

If you hit an error like FUSE: -> [ID=0x12e] Open error=EIO: open /var/hdfs-mount/stage999669284: permission denied, run:

sudo chown hadoop:hadoop /var/hdfs-mount

You may also need chmod 777 /var/hdfs-mount (otherwise you may find you can create files inside the HDFS mount point but cannot write any content into them).
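Putting the three fixes above together as one setup sketch — the real path is /var/hdfs-mount (from the errors above); a temp path is used here so the commands run without root. On a real host, swap in /var/hdfs-mount, prefix with sudo, and also run the chown hadoop:hadoop step:

```shell
# One-shot setup of hdfs-mount's staging directory.
STAGE_DIR=${STAGE_DIR:-/tmp/hdfs-mount-stage}   # real path: /var/hdfs-mount
mkdir -p "$STAGE_DIR"
# Without this, files can be created in the mount but not written to:
chmod 777 "$STAGE_DIR"
```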

Also keep an eye on the permissions and ownership at every level of the other paths involved.

fio performance testing

Testing disk I/O performance with fio
fio overview
Using fio for IO performance testing

I tested performance on a laptop (SSD); the numbers are fairly mediocre.

hadoop@ubuntu:/mnt/hdfs/user/hadoop$ fio -filename=/mnt/hdfs/user/hadoop/test.file -direct=0 -iodepth 1 -thread -rw=read -bs=16k -size=4G  -numjobs=1 -runtime=60 -group_reporting -name=test_
test_: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 thread
test_: Laying out IO file (1 file / 4096MiB)
fio: native_fallocate call failed: Operation not supported
Jobs: 1 (f=1): [R(1)][15.7%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 06m:28s]       
test_: (groupid=0, jobs=1): err= 0: pid=7803: Fri Oct 23 21:10:46 2020
   read: IOPS=569, BW=9117KiB/s (9335kB/s)(640MiB/71870msec)
    clat (nsec): min=1880, max=34055M, avg=1754245.50, stdev=188244560.43
     lat (nsec): min=1905, max=34055M, avg=1754315.73, stdev=188244560.04
    clat percentiles (usec):
     |  1.00th=[      3],  5.00th=[      3], 10.00th=[      3],
     | 20.00th=[      3], 30.00th=[      4], 40.00th=[      4],
     | 50.00th=[      4], 60.00th=[      4], 70.00th=[      5],
     | 80.00th=[      6], 90.00th=[    229], 95.00th=[    314],
     | 99.00th=[    652], 99.50th=[    889], 99.90th=[   2540],
     | 99.95th=[   5211], 99.99th=[2298479]
   bw (  KiB/s): min= 5376, max=142592, per=100.00%, avg=93600.00, stdev=51755.78, samples=14
   iops        : min=  336, max= 8912, avg=5850.00, stdev=3234.74, samples=14
  lat (usec)   : 2=0.13%, 4=66.46%, 10=19.30%, 20=1.40%, 50=0.16%
  lat (usec)   : 100=0.02%, 250=4.11%, 500=6.46%, 750=1.24%, 1000=0.32%
  lat (msec)   : 2=0.26%, 4=0.06%, 10=0.05%, 100=0.01%, 2000=0.01%
  lat (msec)   : >=2000=0.01%
  cpu          : usr=0.09%, sys=0.38%, ctx=5337, majf=1, minf=4
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=40951,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=9117KiB/s (9335kB/s), 9117KiB/s-9117KiB/s (9335kB/s-9335kB/s), io=640MiB (671MB), run=71870-71870msec
hadoop@ubuntu:/mnt/hdfs/user/hadoop$ fio -filename=/mnt/hdfs/user/hadoop/test.file -direct=0 -iodepth 1 -thread -rw=write -bs=16k -size=4G  -numjobs=1 -runtime=60 -group_reporting -name=test_
test_: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 thread
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]      
test_: (groupid=0, jobs=1): err= 0: pid=7884: Fri Oct 23 21:15:24 2020
  write: IOPS=0, BW=4909B/s (4909B/s)(1040KiB/216938msec)
    clat (usec): min=50, max=215003k, avg=3336705.92, stdev=26664209.59
     lat (usec): min=50, max=215003k, avg=3336707.02, stdev=26664209.59
    clat percentiles (usec):
     |  1.00th=[      51],  5.00th=[      61], 10.00th=[     223],
     | 20.00th=[   31851], 30.00th=[   31851], 40.00th=[   31851],
     | 50.00th=[   32113], 60.00th=[   32113], 70.00th=[   32113],
     | 80.00th=[   35914], 90.00th=[   35914], 95.00th=[   40109],
     | 99.00th=[17112761], 99.50th=[17112761], 99.90th=[17112761],
     | 99.95th=[17112761], 99.99th=[17112761]
   bw (  KiB/s): min=  448, max=  640, per=100.00%, avg=511.75, stdev=86.78, samples=4
   iops        : min=   28, max=   40, avg=31.75, stdev= 5.56, samples=4
  lat (usec)   : 100=7.69%, 250=3.08%
  lat (msec)   : 10=1.54%, 50=86.15%, >=2000=1.54%
  cpu          : usr=0.00%, sys=0.05%, ctx=2443, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,65,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=4909B/s (4909B/s), 4909B/s-4909B/s (4909B/s-4909B/s), io=1040KiB (1065kB), run=216938-216938msec
hadoop@ubuntu:/mnt/hdfs/user/hadoop$ fio -filename=/mnt/hdfs/user/hadoop/test.file -direct=0 -iodepth 1 -thread -rw=randread -bs=16k -size=4G  -numjobs=1 -runtime=60 -group_reporting -name=test_
test_: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 thread
Jobs: 1 (f=1): [r(1)][0.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 40d:02h:46m:47s]
test_: (groupid=0, jobs=1): err= 0: pid=8306: Fri Oct 23 21:18:29 2020
   read: IOPS=0, BW=1381B/s (1381B/s)(160KiB/118618msec)
    clat (msec): min=3, max=85425, avg=11861.41, stdev=26347.28
     lat (msec): min=3, max=85425, avg=11861.41, stdev=26347.28
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    4], 10.00th=[    4], 20.00th=[    5],
     | 30.00th=[    6], 40.00th=[ 1011], 50.00th=[ 1011], 60.00th=[ 2022],
     | 70.00th=[ 4044], 80.00th=[ 9329], 90.00th=[15637], 95.00th=[17113],
     | 99.00th=[17113], 99.50th=[17113], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   31, max=   96, per=100.00%, avg=47.67, stdev=27.02, samples=6
   iops        : min=    1, max=    6, avg= 2.67, stdev= 1.97, samples=6
  lat (msec)   : 4=10.00%, 10=20.00%, 2000=20.00%, >=2000=50.00%
  cpu          : usr=0.00%, sys=0.00%, ctx=13, majf=0, minf=4
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=10,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=1381B/s (1381B/s), 1381B/s-1381B/s (1381B/s-1381B/s), io=160KiB (164kB), run=118618-118618msec
hadoop@ubuntu:/mnt/hdfs/user/hadoop$ fio -filename=/mnt/hdfs/user/hadoop/test.file -direct=0 -iodepth 1 -thread -rw=randwrite -bs=16k -size=4G  -numjobs=1 -runtime=60 -group_reporting -name=test_
test_: (g=0): rw=randwrite, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 thread
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]      
test_: (groupid=0, jobs=1): err= 0: pid=8384: Fri Oct 23 21:21:27 2020
  write: IOPS=0, BW=9023B/s (9023B/s)(1040KiB/118026msec)
    clat (usec): min=252, max=70636k, avg=1815754.02, stdev=8671513.94
     lat (usec): min=254, max=70636k, avg=1815755.22, stdev=8671514.08
    clat percentiles (usec):
     |  1.00th=[     253],  5.00th=[  263193], 10.00th=[  455082],
     | 20.00th=[  817890], 30.00th=[  817890], 40.00th=[  817890],
     | 50.00th=[  817890], 60.00th=[  817890], 70.00th=[  817890],
     | 80.00th=[  817890], 90.00th=[  817890], 95.00th=[  817890],
     | 99.00th=[17112761], 99.50th=[17112761], 99.90th=[17112761],
     | 99.95th=[17112761], 99.99th=[17112761]
   bw (  KiB/s): min=   31, max=   96, per=100.00%, avg=33.46, stdev= 9.13, samples=61
   iops        : min=    1, max=    6, avg= 1.98, stdev= 0.67, samples=61
  lat (usec)   : 500=1.54%
  lat (msec)   : 50=1.54%, 250=1.54%, 500=6.15%, 750=6.15%, 1000=81.54%
  lat (msec)   : >=2000=1.54%
  cpu          : usr=0.02%, sys=0.00%, ctx=665, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,65,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=9023B/s (9023B/s), 9023B/s-9023B/s (9023B/s-9023B/s), io=1040KiB (1065kB), run=118026-118026msec
hadoop@ubuntu:/mnt/hdfs/user/hadoop$ fio -filename=/mnt/hdfs/user/hadoop/test.file -direct=0 -iodepth 1 -thread -rw=readwrite -bs=16k -size=4G  -numjobs=1 -runtime=60 -group_reporting -name=test_
test_: (g=0): rw=rw, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 thread
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]      
test_: (groupid=0, jobs=1): err= 0: pid=8726: Fri Oct 23 21:29:25 2020
   read: IOPS=0, BW=2677B/s (2677B/s)(976KiB/373215msec)
    clat (usec): min=2, max=6145, avg=158.26, stdev=821.58
     lat (usec): min=2, max=6147, avg=158.69, stdev=821.77
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    4], 10.00th=[    4], 20.00th=[    4],
     | 30.00th=[    5], 40.00th=[    8], 50.00th=[    9], 60.00th=[   11],
     | 70.00th=[   11], 80.00th=[   12], 90.00th=[   19], 95.00th=[  239],
     | 99.00th=[ 6128], 99.50th=[ 6128], 99.90th=[ 6128], 99.95th=[ 6128],
     | 99.99th=[ 6128]
   bw (  KiB/s): min=   31, max=  159, per=100.00%, avg=60.84, stdev=29.66, samples=32
   iops        : min=    1, max=    9, avg= 3.66, stdev= 1.84, samples=32
  write: IOPS=0, BW=2853B/s (2853B/s)(1040KiB/373215msec)
    clat (usec): min=1102, max=325828k, avg=5741580.74, stdev=40322672.52
     lat (usec): min=1103, max=325828k, avg=5741581.47, stdev=40322672.60
    clat percentiles (usec):
     |  1.00th=[    1106],  5.00th=[  267387], 10.00th=[  455082],
     | 20.00th=[  817890], 30.00th=[  817890], 40.00th=[  817890],
     | 50.00th=[  817890], 60.00th=[  817890], 70.00th=[  817890],
     | 80.00th=[  817890], 90.00th=[  817890], 95.00th=[  817890],
     | 99.00th=[17112761], 99.50th=[17112761], 99.90th=[17112761],
     | 99.95th=[17112761], 99.99th=[17112761]
   bw (  KiB/s): min=   31, max=   95, per=100.00%, avg=33.41, stdev= 9.02, samples=61
   iops        : min=    1, max=    5, avg= 1.93, stdev= 0.60, samples=61
  lat (usec)   : 4=10.32%, 10=16.67%, 20=16.67%, 50=1.59%, 250=0.79%
  lat (msec)   : 2=2.38%, 10=0.79%, 50=0.79%, 250=0.79%, 500=3.17%
  lat (msec)   : 750=3.17%, 1000=42.06%, >=2000=0.79%
  cpu          : usr=0.01%, sys=0.00%, ctx=1922, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=61,65,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=2677B/s (2677B/s), 2677B/s-2677B/s (2677B/s-2677B/s), io=976KiB (999kB), run=373215-373215msec
  WRITE: bw=2853B/s (2853B/s), 2853B/s-2853B/s (2853B/s-2853B/s), io=1040KiB (1065kB), run=373215-373215msec
hadoop@ubuntu:/mnt/hdfs/user/hadoop$ fio -filename=/mnt/hdfs/user/hadoop/test.file -direct=0 -iodepth 1 -thread -rw=randrw -bs=16k -size=4G  -numjobs=1 -runtime=60 -group_reporting -name=test_
test_: (g=0): rw=randrw, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 thread
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]      
test_: (groupid=0, jobs=1): err= 0: pid=9259: Fri Oct 23 21:36:50 2020
   read: IOPS=0, BW=494B/s (494B/s)(160KiB/331264msec)
    clat (msec): min=5, max=277461, avg=33004.04, stdev=86821.44
     lat (msec): min=5, max=277461, avg=33004.04, stdev=86821.44
    clat percentiles (msec):
     |  1.00th=[    6],  5.00th=[    6], 10.00th=[    6], 20.00th=[    6],
     | 30.00th=[ 1003], 40.00th=[ 1011], 50.00th=[ 1011], 60.00th=[ 1011],
     | 70.00th=[ 2106], 80.00th=[ 5000], 90.00th=[17113], 95.00th=[17113],
     | 99.00th=[17113], 99.50th=[17113], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   32, max=   32, per=100.00%, avg=32.00, stdev= 0.00, samples=9
   iops        : min=    2, max=    2, avg= 2.00, stdev= 0.00, samples=9
  write: IOPS=0, BW=296B/s (296B/s)(96.0KiB/331264msec)
    clat (usec): min=1496, max=390563, avg=203656.63, stdev=155837.20
     lat (usec): min=1499, max=390566, avg=203658.48, stdev=155837.12
    clat percentiles (usec):
     |  1.00th=[  1500],  5.00th=[  1500], 10.00th=[  1500], 20.00th=[ 39584],
     | 30.00th=[ 39584], 40.00th=[200279], 50.00th=[200279], 60.00th=[263193],
     | 70.00th=[325059], 80.00th=[325059], 90.00th=[392168], 95.00th=[392168],
     | 99.00th=[392168], 99.50th=[392168], 99.90th=[392168], 99.95th=[392168],
     | 99.99th=[392168]
   bw (  KiB/s): min=   32, max=   64, per=100.00%, avg=48.00, stdev=18.48, samples=4
   iops        : min=    2, max=    4, avg= 3.00, stdev= 1.15, samples=4
  lat (msec)   : 2=6.25%, 10=12.50%, 50=6.25%, 250=6.25%, 500=18.75%
  lat (msec)   : 2000=25.00%, >=2000=25.00%
  cpu          : usr=0.00%, sys=0.00%, ctx=39, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=10,6,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=494B/s (494B/s), 494B/s-494B/s (494B/s-494B/s), io=160KiB (164kB), run=331264-331264msec
  WRITE: bw=296B/s (296B/s), 296B/s-296B/s (296B/s-296B/s), io=96.0KiB (98.3kB), run=331264-331264msec

Connecting to Greenplum

The idea is to use Greenplum tablespaces to place table storage files on the HDFS mount point. (My first attempt got this wrong: I pointed the MASTER_DIRECTORY and DATA_DIRECTORY parameters of the gpinitsystem initialization config at the mount instead, and then hit an error during initialization where the durable_link_or_rename function in fd.c could not create a link.)

Creating and Managing Tablespaces | Pivotal Greenplum Docs

CREATE TABLESPACE | Pivotal Greenplum Docs

hdfs dfs -mkdir -p /user/gpadmin
hdfs dfs -chown gpadmin:gpadmin /user/gpadmin
# hdfs dfs -chmod 777 /user/gpadmin

Example

CREATE TABLESPACE hdfs LOCATION '/mnt/hdfs/user/gpadmin';
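Once the tablespace exists, tables can be placed in it explicitly or by default; a minimal sketch (the table and column names are made up for illustration):

```sql
-- Put one table's storage files on the HDFS mount:
CREATE TABLE t_on_hdfs (id int, payload text) TABLESPACE hdfs;
-- Or make it the default tablespace for the session:
SET default_tablespace = hdfs;
```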

PS: Greenplum 6 removed the filespace concept.

posted @ 2020-10-21 16:04  Tifa_Best