Using hdfs-mount to connect HDFS to Greenplum
hdfs-mount
hdfs-mount is a tool that mounts HDFS as a local Linux file system. It is written in Go and does not depend on libhdfs or a Java virtual machine. It lets you mount a remote HDFS as a local Linux file system so that arbitrary applications or shell scripts can access HDFS as ordinary files and directories, efficiently and securely.
Official repository: microsoft/hdfs-mount: A tool to mount HDFS as a local Linux file system
Download
git clone --recursive https://github.com/microsoft/hdfs-mount.git
Build
cd hdfs-mount
make
I built this with Go 1.10; compiling with Go 1.15 reported errors, apparently because Go 1.11 introduced go modules for dependency management and the current project gets treated as a module.
Starting with Go 1.11, support for go modules was added to solve the package-dependency-management problem. The tooling provides a replace directive, which handles package aliasing and can also work around golang.org/x packages being undownloadable. Module support is integrated into the native go mod command, but if your codebase lives under $GOPATH, module mode is not enabled by default; enabling it is as simple as setting the environment variable export GO111MODULE=on.
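As a hedged sketch, the environment variable can be pinned explicitly before building (whether this checkout actually ships a go.mod file is an assumption to verify first):
# Minimal sketch: choose the build mode explicitly. The Makefile itself
# builds in GOPATH mode with GOPATH=${PWD}/_gopath.
export GO111MODULE=off   # keep the classic GOPATH-style build (safe without a go.mod)
# export GO111MODULE=on  # opt into modules; requires a go.mod in the repository
make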
Compiling with Go 1.10 also ran into an error about the 0o777 literal in the code (that octal syntax requires Go 1.13 or newer); switching to Go 1.15 got through.
Downloading the dependencies
Packages like golang.org/x/... will typically fail to download (in principle this could be solved with a proxy, though proxychains did not work for me); the usual workaround is to clone the mirror repositories on GitHub.
GOPATH is the environment variable Go uses for the workspace path (which can be confusing). Looking at the Makefile you can see:
export GOPATH=${PWD}/_gopath
For example, when go get golang.org/x/net/context fails, you can run:
mkdir -p ${GOPATH}/src/golang.org/x
cd ${GOPATH}/src/golang.org/x
git clone --depth=1 https://github.com/golang/net.git
The packages that currently need to be installed are the following (a consolidated sketch of where to place them comes after the list):
git clone --depth=1 https://github.com/golang/net.git
git clone --depth=1 https://github.com/golang/protobuf.git
git clone --depth=1 https://github.com/golang/sys.git
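# Consolidated sketch: fetch all three GitHub mirrors into the GOPATH layout
# the Makefile expects. The target directories are assumptions based on each
# package's canonical import path (golang.org/x/ for net and sys,
# github.com/golang/ for protobuf).
export GOPATH=${PWD}/_gopath
mkdir -p ${GOPATH}/src/golang.org/x ${GOPATH}/src/github.com/golang
(cd ${GOPATH}/src/golang.org/x && \
  git clone --depth=1 https://github.com/golang/net.git && \
  git clone --depth=1 https://github.com/golang/sys.git)
(cd ${GOPATH}/src/github.com/golang && \
  git clone --depth=1 https://github.com/golang/protobuf.git)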
Usage
We normally operate as the hadoop user. Create the mount point. To let non-root users use the mount as well, uncomment user_allow_other in /etc/fuse.conf; see man fuse for details:
mkdir -p /mnt/hdfs
sudo chown hadoop:hadoop /mnt/hdfs
sudo sed -i 's/^#\(user_allow_other\)/\1/' /etc/fuse.conf
Example: mount the HDFS served on port 9000 of the local machine at /mnt/hdfs/:
hdfs-mount -fuse.debug -logLevel 2 127.0.0.1:9000 /mnt/hdfs
Press Ctrl + C to send the unmount instruction. The unmount completes once no other process is accessing anything under the mount point.
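A few optional sanity checks, plus an alternative unmount path; note that fusermount -u is the generic FUSE unmount tool rather than anything specific to hdfs-mount:
df -h /mnt/hdfs               # the FUSE filesystem should show up here
ls /mnt/hdfs/user/hadoop      # HDFS paths appear as ordinary directories
fusermount -u /mnt/hdfs       # unmount when Ctrl + C is not convenient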
Caveats
When you hit an error like FUSE: -> [ID=0x12b] Create error=EIO: mkdir /var/hdfs-mount: permission denied, you need:
sudo mkdir /var/hdfs-mount
When you hit an error like FUSE: -> [ID=0x12e] Open error=EIO: open /var/hdfs-mount/stage999669284: permission denied, you need:
sudo chown hadoop:hadoop /var/hdfs-mount
You may also need chmod 777 /var/hdfs-mount (otherwise you may only be able to create files inside the HDFS mount point, but not write any content into them).
Also pay close attention to the permissions and ownership at every level of any other paths involved.
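The fixes above, collected into one sketch (assuming the mount runs as the hadoop user):
sudo mkdir -p /var/hdfs-mount              # staging directory used by hdfs-mount
sudo chown hadoop:hadoop /var/hdfs-mount   # give it to the mounting user
sudo chmod 777 /var/hdfs-mount             # only if writes into files still fail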
fio performance testing
References: "Using fio to test disk I/O performance", "Introduction to fio", "IO performance testing with fio"
I tested the performance on a laptop (SSD); it is fairly mediocre.
hadoop@ubuntu:/mnt/hdfs/user/hadoop$ fio -filename=/mnt/hdfs/user/hadoop/test.file -direct=0 -iodepth 1 -thread -rw=read -bs=16k -size=4G -numjobs=1 -runtime=60 -group_reporting -name=test_
test_: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 thread
test_: Laying out IO file (1 file / 4096MiB)
fio: native_fallocate call failed: Operation not supported
Jobs: 1 (f=1): [R(1)][15.7%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 06m:28s]
test_: (groupid=0, jobs=1): err= 0: pid=7803: Fri Oct 23 21:10:46 2020
read: IOPS=569, BW=9117KiB/s (9335kB/s)(640MiB/71870msec)
clat (nsec): min=1880, max=34055M, avg=1754245.50, stdev=188244560.43
lat (nsec): min=1905, max=34055M, avg=1754315.73, stdev=188244560.04
clat percentiles (usec):
| 1.00th=[ 3], 5.00th=[ 3], 10.00th=[ 3],
| 20.00th=[ 3], 30.00th=[ 4], 40.00th=[ 4],
| 50.00th=[ 4], 60.00th=[ 4], 70.00th=[ 5],
| 80.00th=[ 6], 90.00th=[ 229], 95.00th=[ 314],
| 99.00th=[ 652], 99.50th=[ 889], 99.90th=[ 2540],
| 99.95th=[ 5211], 99.99th=[2298479]
bw ( KiB/s): min= 5376, max=142592, per=100.00%, avg=93600.00, stdev=51755.78, samples=14
iops : min= 336, max= 8912, avg=5850.00, stdev=3234.74, samples=14
lat (usec) : 2=0.13%, 4=66.46%, 10=19.30%, 20=1.40%, 50=0.16%
lat (usec) : 100=0.02%, 250=4.11%, 500=6.46%, 750=1.24%, 1000=0.32%
lat (msec) : 2=0.26%, 4=0.06%, 10=0.05%, 100=0.01%, 2000=0.01%
lat (msec) : >=2000=0.01%
cpu : usr=0.09%, sys=0.38%, ctx=5337, majf=1, minf=4
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=40951,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=9117KiB/s (9335kB/s), 9117KiB/s-9117KiB/s (9335kB/s-9335kB/s), io=640MiB (671MB), run=71870-71870msec
hadoop@ubuntu:/mnt/hdfs/user/hadoop$ fio -filename=/mnt/hdfs/user/hadoop/test.file -direct=0 -iodepth 1 -thread -rw=write -bs=16k -size=4G -numjobs=1 -runtime=60 -group_reporting -name=test_
test_: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 thread
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
test_: (groupid=0, jobs=1): err= 0: pid=7884: Fri Oct 23 21:15:24 2020
write: IOPS=0, BW=4909B/s (4909B/s)(1040KiB/216938msec)
clat (usec): min=50, max=215003k, avg=3336705.92, stdev=26664209.59
lat (usec): min=50, max=215003k, avg=3336707.02, stdev=26664209.59
clat percentiles (usec):
| 1.00th=[ 51], 5.00th=[ 61], 10.00th=[ 223],
| 20.00th=[ 31851], 30.00th=[ 31851], 40.00th=[ 31851],
| 50.00th=[ 32113], 60.00th=[ 32113], 70.00th=[ 32113],
| 80.00th=[ 35914], 90.00th=[ 35914], 95.00th=[ 40109],
| 99.00th=[17112761], 99.50th=[17112761], 99.90th=[17112761],
| 99.95th=[17112761], 99.99th=[17112761]
bw ( KiB/s): min= 448, max= 640, per=100.00%, avg=511.75, stdev=86.78, samples=4
iops : min= 28, max= 40, avg=31.75, stdev= 5.56, samples=4
lat (usec) : 100=7.69%, 250=3.08%
lat (msec) : 10=1.54%, 50=86.15%, >=2000=1.54%
cpu : usr=0.00%, sys=0.05%, ctx=2443, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,65,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=4909B/s (4909B/s), 4909B/s-4909B/s (4909B/s-4909B/s), io=1040KiB (1065kB), run=216938-216938msec
hadoop@ubuntu:/mnt/hdfs/user/hadoop$ fio -filename=/mnt/hdfs/user/hadoop/test.file -direct=0 -iodepth 1 -thread -rw=randread -bs=16k -size=4G -numjobs=1 -runtime=60 -group_reporting -name=test_
test_: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 thread
Jobs: 1 (f=1): [r(1)][0.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 40d:02h:46m:47s]
test_: (groupid=0, jobs=1): err= 0: pid=8306: Fri Oct 23 21:18:29 2020
read: IOPS=0, BW=1381B/s (1381B/s)(160KiB/118618msec)
clat (msec): min=3, max=85425, avg=11861.41, stdev=26347.28
lat (msec): min=3, max=85425, avg=11861.41, stdev=26347.28
clat percentiles (msec):
| 1.00th=[ 4], 5.00th=[ 4], 10.00th=[ 4], 20.00th=[ 5],
| 30.00th=[ 6], 40.00th=[ 1011], 50.00th=[ 1011], 60.00th=[ 2022],
| 70.00th=[ 4044], 80.00th=[ 9329], 90.00th=[15637], 95.00th=[17113],
| 99.00th=[17113], 99.50th=[17113], 99.90th=[17113], 99.95th=[17113],
| 99.99th=[17113]
bw ( KiB/s): min= 31, max= 96, per=100.00%, avg=47.67, stdev=27.02, samples=6
iops : min= 1, max= 6, avg= 2.67, stdev= 1.97, samples=6
lat (msec) : 4=10.00%, 10=20.00%, 2000=20.00%, >=2000=50.00%
cpu : usr=0.00%, sys=0.00%, ctx=13, majf=0, minf=4
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=10,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=1381B/s (1381B/s), 1381B/s-1381B/s (1381B/s-1381B/s), io=160KiB (164kB), run=118618-118618msec
hadoop@ubuntu:/mnt/hdfs/user/hadoop$ fio -filename=/mnt/hdfs/user/hadoop/test.file -direct=0 -iodepth 1 -thread -rw=randwrite -bs=16k -size=4G -numjobs=1 -runtime=60 -group_reporting -name=test_
test_: (g=0): rw=randwrite, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 thread
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
test_: (groupid=0, jobs=1): err= 0: pid=8384: Fri Oct 23 21:21:27 2020
write: IOPS=0, BW=9023B/s (9023B/s)(1040KiB/118026msec)
clat (usec): min=252, max=70636k, avg=1815754.02, stdev=8671513.94
lat (usec): min=254, max=70636k, avg=1815755.22, stdev=8671514.08
clat percentiles (usec):
| 1.00th=[ 253], 5.00th=[ 263193], 10.00th=[ 455082],
| 20.00th=[ 817890], 30.00th=[ 817890], 40.00th=[ 817890],
| 50.00th=[ 817890], 60.00th=[ 817890], 70.00th=[ 817890],
| 80.00th=[ 817890], 90.00th=[ 817890], 95.00th=[ 817890],
| 99.00th=[17112761], 99.50th=[17112761], 99.90th=[17112761],
| 99.95th=[17112761], 99.99th=[17112761]
bw ( KiB/s): min= 31, max= 96, per=100.00%, avg=33.46, stdev= 9.13, samples=61
iops : min= 1, max= 6, avg= 1.98, stdev= 0.67, samples=61
lat (usec) : 500=1.54%
lat (msec) : 50=1.54%, 250=1.54%, 500=6.15%, 750=6.15%, 1000=81.54%
lat (msec) : >=2000=1.54%
cpu : usr=0.02%, sys=0.00%, ctx=665, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,65,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=9023B/s (9023B/s), 9023B/s-9023B/s (9023B/s-9023B/s), io=1040KiB (1065kB), run=118026-118026msec
hadoop@ubuntu:/mnt/hdfs/user/hadoop$ fio -filename=/mnt/hdfs/user/hadoop/test.file -direct=0 -iodepth 1 -thread -rw=readwrite -bs=16k -size=4G -numjobs=1 -runtime=60 -group_reporting -name=test_
test_: (g=0): rw=rw, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 thread
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
test_: (groupid=0, jobs=1): err= 0: pid=8726: Fri Oct 23 21:29:25 2020
read: IOPS=0, BW=2677B/s (2677B/s)(976KiB/373215msec)
clat (usec): min=2, max=6145, avg=158.26, stdev=821.58
lat (usec): min=2, max=6147, avg=158.69, stdev=821.77
clat percentiles (usec):
| 1.00th=[ 3], 5.00th=[ 4], 10.00th=[ 4], 20.00th=[ 4],
| 30.00th=[ 5], 40.00th=[ 8], 50.00th=[ 9], 60.00th=[ 11],
| 70.00th=[ 11], 80.00th=[ 12], 90.00th=[ 19], 95.00th=[ 239],
| 99.00th=[ 6128], 99.50th=[ 6128], 99.90th=[ 6128], 99.95th=[ 6128],
| 99.99th=[ 6128]
bw ( KiB/s): min= 31, max= 159, per=100.00%, avg=60.84, stdev=29.66, samples=32
iops : min= 1, max= 9, avg= 3.66, stdev= 1.84, samples=32
write: IOPS=0, BW=2853B/s (2853B/s)(1040KiB/373215msec)
clat (usec): min=1102, max=325828k, avg=5741580.74, stdev=40322672.52
lat (usec): min=1103, max=325828k, avg=5741581.47, stdev=40322672.60
clat percentiles (usec):
| 1.00th=[ 1106], 5.00th=[ 267387], 10.00th=[ 455082],
| 20.00th=[ 817890], 30.00th=[ 817890], 40.00th=[ 817890],
| 50.00th=[ 817890], 60.00th=[ 817890], 70.00th=[ 817890],
| 80.00th=[ 817890], 90.00th=[ 817890], 95.00th=[ 817890],
| 99.00th=[17112761], 99.50th=[17112761], 99.90th=[17112761],
| 99.95th=[17112761], 99.99th=[17112761]
bw ( KiB/s): min= 31, max= 95, per=100.00%, avg=33.41, stdev= 9.02, samples=61
iops : min= 1, max= 5, avg= 1.93, stdev= 0.60, samples=61
lat (usec) : 4=10.32%, 10=16.67%, 20=16.67%, 50=1.59%, 250=0.79%
lat (msec) : 2=2.38%, 10=0.79%, 50=0.79%, 250=0.79%, 500=3.17%
lat (msec) : 750=3.17%, 1000=42.06%, >=2000=0.79%
cpu : usr=0.01%, sys=0.00%, ctx=1922, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=61,65,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=2677B/s (2677B/s), 2677B/s-2677B/s (2677B/s-2677B/s), io=976KiB (999kB), run=373215-373215msec
WRITE: bw=2853B/s (2853B/s), 2853B/s-2853B/s (2853B/s-2853B/s), io=1040KiB (1065kB), run=373215-373215msec
hadoop@ubuntu:/mnt/hdfs/user/hadoop$ fio -filename=/mnt/hdfs/user/hadoop/test.file -direct=0 -iodepth 1 -thread -rw=randrw -bs=16k -size=4G -numjobs=1 -runtime=60 -group_reporting -name=test_
test_: (g=0): rw=randrw, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 thread
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
test_: (groupid=0, jobs=1): err= 0: pid=9259: Fri Oct 23 21:36:50 2020
read: IOPS=0, BW=494B/s (494B/s)(160KiB/331264msec)
clat (msec): min=5, max=277461, avg=33004.04, stdev=86821.44
lat (msec): min=5, max=277461, avg=33004.04, stdev=86821.44
clat percentiles (msec):
| 1.00th=[ 6], 5.00th=[ 6], 10.00th=[ 6], 20.00th=[ 6],
| 30.00th=[ 1003], 40.00th=[ 1011], 50.00th=[ 1011], 60.00th=[ 1011],
| 70.00th=[ 2106], 80.00th=[ 5000], 90.00th=[17113], 95.00th=[17113],
| 99.00th=[17113], 99.50th=[17113], 99.90th=[17113], 99.95th=[17113],
| 99.99th=[17113]
bw ( KiB/s): min= 32, max= 32, per=100.00%, avg=32.00, stdev= 0.00, samples=9
iops : min= 2, max= 2, avg= 2.00, stdev= 0.00, samples=9
write: IOPS=0, BW=296B/s (296B/s)(96.0KiB/331264msec)
clat (usec): min=1496, max=390563, avg=203656.63, stdev=155837.20
lat (usec): min=1499, max=390566, avg=203658.48, stdev=155837.12
clat percentiles (usec):
| 1.00th=[ 1500], 5.00th=[ 1500], 10.00th=[ 1500], 20.00th=[ 39584],
| 30.00th=[ 39584], 40.00th=[200279], 50.00th=[200279], 60.00th=[263193],
| 70.00th=[325059], 80.00th=[325059], 90.00th=[392168], 95.00th=[392168],
| 99.00th=[392168], 99.50th=[392168], 99.90th=[392168], 99.95th=[392168],
| 99.99th=[392168]
bw ( KiB/s): min= 32, max= 64, per=100.00%, avg=48.00, stdev=18.48, samples=4
iops : min= 2, max= 4, avg= 3.00, stdev= 1.15, samples=4
lat (msec) : 2=6.25%, 10=12.50%, 50=6.25%, 250=6.25%, 500=18.75%
lat (msec) : 2000=25.00%, >=2000=25.00%
cpu : usr=0.00%, sys=0.00%, ctx=39, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=10,6,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=494B/s (494B/s), 494B/s-494B/s (494B/s-494B/s), io=160KiB (164kB), run=331264-331264msec
WRITE: bw=296B/s (296B/s), 296B/s-296B/s (296B/s-296B/s), io=96.0KiB (98.3kB), run=331264-331264msec
Connecting to Greenplum
The key is to use Greenplum tablespaces to place table storage files on the HDFS mount point. (My initial, mistaken approach was to point the MASTER_DIRECTORY and DATA_DIRECTORY parameters in the configuration used by the gpinitsystem cluster-initialization command at the mount; that attempt also hit an error during initialization where the durable_link_or_rename function in fd.c could not create a link.)
Creating and Managing Tablespaces | Pivotal Greenplum Docs
CREATE TABLESPACE | Pivotal Greenplum Docs
hdfs dfs -mkdir -p /user/gpadmin
hdfs dfs -chown gpadmin:gpadmin /user/gpadmin
# hdfs dfs -chmod 777 /user/gpadmin
Example
CREATE TABLESPACE hdfs LOCATION '/mnt/hdfs/user/gpadmin';
P.S. Greenplum 6 removed the filespace concept.
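As a quick usage sketch (the database and table names here are hypothetical), a table can be placed in the new tablespace and its files should then land under the HDFS mount point:
psql -d testdb -c "CREATE TABLE t_on_hdfs (id int) TABLESPACE hdfs;"      # store the table in the hdfs tablespace
psql -d testdb -c "INSERT INTO t_on_hdfs SELECT generate_series(1,1000);" # write some rows
ls /mnt/hdfs/user/gpadmin                                                 # per-segment directories and relation files should appear here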