Troubleshooting: directories cannot be deleted when mounting MinIO with s3fs-fuse
Conclusion: some scenarios are not supported; the specific problem is detailed below.
1. Deploy MinIO
docker run -p 9100:9000 -p 9190:9090 --name minio -v /opt/minio/data:/data -e "MINIO_ROOT_USER=admin" -e "MINIO_ROOT_PASSWORD=minio123.com" quay.io/minio/minio server /data --console-address ":9090"
Log in to the MinIO console and create a bucket (named data here).
2. Install s3fs-fuse
Install it from your package manager, or build from source: https://github.com/s3fs-fuse/s3fs-fuse
Start s3fs to mount the MinIO bucket locally, e.g.:
/usr/bin/s3fs -d data /data -f -o url=http://191.168.3.132:9100,passwd_file=/etc/passwd-s3fs,endpoint=us-east-1,allow_other,use_cache=/tmp,max_stat_cache_size=1000,stat_cache_expire=900,retries=5,connect_timeout=10,use_path_request_style
Here /etc/passwd-s3fs holds the credentials, one line in the format:
username:password
i.e. the access key and secret key, so admin:minio123.com for the deployment above (s3fs also insists that this file be readable by its owner only, e.g. chmod 600).
3. Test upload and download
touch, echo, rm, cp and the like all work, and the uploaded files show up in the MinIO UI, so everything looks normal at first.
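As a baseline, it helps to confirm that plain object operations work against the same endpoint with s3fs out of the picture. Below is a minimal sketch, assuming the minio-go v7 SDK and reusing the endpoint, credentials and bucket from the steps above; the object key is made up for illustration:

package main

import (
    "bytes"
    "context"
    "fmt"
    "log"

    "github.com/minio/minio-go/v7"
    "github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
    // Same endpoint and root credentials as the docker command above.
    client, err := minio.New("191.168.3.132:9100", &minio.Options{
        Creds:  credentials.NewStaticV4("admin", "minio123.com", ""),
        Secure: false,
    })
    if err != nil {
        log.Fatal(err)
    }
    ctx := context.Background()

    // Upload a small object, roughly what "echo > file" does via s3fs.
    body := []byte("hello")
    if _, err = client.PutObject(ctx, "data", "test4/hello.txt",
        bytes.NewReader(body), int64(len(body)), minio.PutObjectOptions{}); err != nil {
        log.Fatal(err)
    }

    // Delete it again; plain (non-directory) objects delete cleanly.
    if err = client.RemoveObject(ctx, "data", "test4/hello.txt",
        minio.RemoveObjectOptions{}); err != nil {
        log.Fatal(err)
    }
    fmt.Println("plain object round-trip OK")
}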
4. Discovering that directories cannot be deleted
During verification I found that extracting an archive and then deleting it hangs for a long time before the delete fails. Steps to reproduce:
LAPTOP-TC4A0SCV:/data/test4# tar xzf s3fs-fuse-1.93.tar.gz
LAPTOP-TC4A0SCV:/data/test4# rm -rf s3fs-fuse-1.93/test/
rm: can't remove 's3fs-fuse-1.93/test': I/O error
The directory cannot be removed even though every file under it has already been deleted; only the directory itself is left. It can, however, be deleted through the MinIO UI, after which the local directory disappears as well.
5. Confirming whether s3fs is at fault
Since s3fs-fuse is a binary written in C++, I used Wireshark to capture its traffic. The capture of the delete shows a 500 error when removing the bin directory, with the DELETE retried 5 times (the client's configured retry count). The s3fs log reads:
2024-01-04T10:24:59.217Z [INF] curl_util.cpp:url_to_host(334): url is http://191.168.3.132:9100
2024-01-04T10:24:59.220Z [INF] curl.cpp:RequestPerform(2520): HTTP response code 204
2024-01-04T10:24:59.220Z [INF] cache.cpp:DelStat(596): delete stat cache entry[path=/test3/spark-3.5.0-bin-hadoop3/yarn/spark-3.5.0-yarn-shuffle.jar]
2024-01-04T10:24:59.220Z [INF] cache.cpp:DelSymlink(754): delete symbolic link cache entry[path=/test3/spark-3.5.0-bin-hadoop3/yarn/spark-3.5.0-yarn-shuffle.jar]
2024-01-04T10:24:59.220Z [INF] fdcache.cpp:DeleteCacheFile(139): [path=/test3/spark-3.5.0-bin-hadoop3/yarn/spark-3.5.0-yarn-shuffle.jar]
2024-01-04T10:24:59.228Z [INF] s3fs.cpp:s3fs_rmdir(1358): [path=/test3/spark-3.5.0-bin-hadoop3/yarn]
2024-01-04T10:24:59.229Z [INF] s3fs.cpp:list_bucket(3434): [path=/test3/spark-3.5.0-bin-hadoop3/yarn]
2024-01-04T10:24:59.229Z [INF] curl.cpp:ListBucketRequest(3749): [tpath=/test3/spark-3.5.0-bin-hadoop3/yarn]
2024-01-04T10:24:59.229Z [INF] curl_util.cpp:prepare_url(257): URL is http://191.168.3.132:9100/data?delimiter=/&max-keys=2&prefix=test3/spark-3.5.0-bin-hadoop3/yarn/
2024-01-04T10:24:59.229Z [INF] curl_util.cpp:prepare_url(290): URL changed is http://191.168.3.132:9100/data/?delimiter=/&max-keys=2&prefix=test3/spark-3.5.0-bin-hadoop3/yarn/
2024-01-04T10:24:59.229Z [INF] curl.cpp:insertV4Headers(2892): computing signature [GET] [/] [delimiter=/&max-keys=2&prefix=test3/spark-3.5.0-bin-hadoop3/yarn/] []
2024-01-04T10:24:59.229Z [INF] curl_util.cpp:url_to_host(334): url is http://191.168.3.132:9100
2024-01-04T10:24:59.231Z [INF] curl.cpp:RequestPerform(2520): HTTP response code 200
2024-01-04T10:24:59.232Z [INF] curl.cpp:DeleteRequest(2971): [tpath=/test3/spark-3.5.0-bin-hadoop3/yarn/]
2024-01-04T10:24:59.232Z [INF] curl_util.cpp:prepare_url(257): URL is http://191.168.3.132:9100/data/test3/spark-3.5.0-bin-hadoop3/yarn/
2024-01-04T10:24:59.232Z [INF] curl_util.cpp:prepare_url(290): URL changed is http://191.168.3.132:9100/data/test3/spark-3.5.0-bin-hadoop3/yarn/
2024-01-04T10:24:59.232Z [INF] curl.cpp:insertV4Headers(2892): computing signature [DELETE] [/test3/spark-3.5.0-bin-hadoop3/yarn/] [] []
2024-01-04T10:24:59.232Z [INF] curl_util.cpp:url_to_host(334): url is http://191.168.3.132:9100
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InternalError</Code><Message>We encountered an internal error, please try again.: cause(file is corrupted)</Message><Key>test3/spark-3.5.0-bin-hadoop3/yarn/</Key><BucketName>data</BucketName><Resource>/data/test3/spark-3.5.0-bin-hadoop3/yarn/</Resource><Region>us-east-1</Region><RequestId>17A71DFA043AB078</RequestId><HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId></Error>2024-01-04T10:24:59.235Z [INF] curl.cpp:RequestPerform(2590): HTTP response code 500 was returned, slowing down
2024-01-04T10:25:01.295Z [INF] curl.cpp:RequestPerform(2719): ### retrying...
The MinIO log shows:
API: DeleteObject(bucket=data, object=test3/spark-3.5.0-bin-hadoop3/yarn/)
Time: 10:26:23 UTC 01/04/2024
DeploymentID: 97c76f90-5749-4676-98d1-14441f0a340c
RequestID: 17A71E0352B75EB7
RemoteHost: 191.168.4.38
Host: 191.168.3.132:9100
UserAgent: s3fs/1.93 (commit hash unknown; OpenSSL)
Error: file is corrupted (cmd.StorageErr)
5: internal/logger/logger.go:259:logger.LogIf()
4: cmd/api-errors.go:2362:cmd.toAPIErrorCode()
3: cmd/api-errors.go:2387:cmd.toAPIError()
2: cmd/object-handlers.go:2720:cmd.objectAPIHandlers.DeleteObjectHandler()
1: net/http/server.go:2136:http.HandlerFunc.ServeHTTP()
So it looks like MinIO itself is unable to delete the directory object. The error response captured for the failing delete:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>InternalError</Code>
<Message>We encountered an internal error, please try again.: cause(file is corrupted)</Message>
<Key>test4/s3fs-fuse-1.93/test/</Key>
<BucketName>data</BucketName>
<Resource>/data/test4/s3fs-fuse-1.93/test/</Resource>
<Region>us-east-1</Region>
<RequestId>17A7550F007E8D1B</RequestId>
<HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId>
</Error>
I began to wonder whether the s3fs client was sending a bad parameter or calling the wrong API, and wanted to confirm whether s3fs could be ruled out.
6. Capturing s3fs-fuse's network requests
The capture has two parts: one for the extraction and one for the deletion. Watching the extraction, s3fs issues a PUT request right at the start to create the directory object:
PUT /data/test5/ HTTP/1.1
Host: 191.168.3.132:9100
User-Agent: s3fs/1.93 (commit hash unknown; OpenSSL)
Accept: */*
Authorization: AWS4-HMAC-SHA256 Credential=admin/20240105/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date;x-amz-meta-atime;x-amz-meta-ctime;x-amz-meta-gid;x-amz-meta-mode;x-amz-meta-mtime;x-amz-meta-uid, Signature=37c32c17df550e31c074e80aebddc2ef4c44d67ce1df0574c312b53733bc85cb
Content-Type: application/x-directory
x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date: 20240105T083218Z
x-amz-meta-atime: 1704443538.826241406
x-amz-meta-ctime: 1704443538.826241406
x-amz-meta-gid: 0
x-amz-meta-mode: 493
x-amz-meta-mtime: 1704443538.826241406
x-amz-meta-uid: 0
Content-Length: 0
After the extraction finishes, a second PUT is issued for the same directory:
PUT /data/test5/ HTTP/1.1
Host: 191.168.3.132:9100
User-Agent: s3fs/1.93 (commit hash unknown; OpenSSL)
Accept: */*
Authorization: AWS4-HMAC-SHA256 Credential=admin/20240105/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-copy-source;x-amz-date;x-amz-meta-atime;x-amz-meta-ctime;x-amz-meta-gid;x-amz-meta-mode;x-amz-meta-mtime;x-amz-meta-uid;x-amz-metadata-directive, Signature=1a463b97b0c171419791e6d619930618c80e4c37648937f7defaf129da64899d
Content-Type: application/x-directory
x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-copy-source: /data/test5/
x-amz-date: 20240105T065914Z
x-amz-meta-atime: 1704437377.849501720
x-amz-meta-ctime: 1704437954.440296576
x-amz-meta-gid: 0
x-amz-meta-mode: 16877
x-amz-meta-mtime: 1704437377.849501720
x-amz-meta-uid: 0
x-amz-metadata-directive: REPLACE
Content-Length: 0
Consulting the S3 API documentation: a plain PUT creates an object, while the second PUT carries x-amz-copy-source and x-amz-metadata-directive, making it a CopyObject request. But why would extraction trigger two requests? At the time I was stumped.
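In SDK terms the two captured PUTs map onto two different S3 operations. Here is a sketch of the second one, continuing with the client and ctx from the baseline example earlier and assuming minio-go v7, whose CopyObject emits x-amz-metadata-directive: REPLACE when ReplaceMetadata is set:

// The first PUT is an ordinary PutObject creating the zero-byte
// directory marker "test5/" with Content-Type application/x-directory.
// The second PUT is a server-side CopyObject whose source and
// destination are the same key, replacing only the metadata:
src := minio.CopySrcOptions{Bucket: "data", Object: "test5/"}
dst := minio.CopyDestOptions{
    Bucket: "data",
    Object: "test5/",
    // Values taken from the captured x-amz-meta-* headers.
    UserMetadata:    map[string]string{"mode": "16877", "uid": "0", "gid": "0"},
    ReplaceMetadata: true, // sends x-amz-metadata-directive: REPLACE
}
if _, err := client.CopyObject(ctx, dst, src); err != nil {
    log.Fatal(err)
}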
7. Reproducing the delete error
To find the cause I bit the bullet and read the s3fs-fuse source. Searching for the PUT request and the corresponding log lines showed that the second PUT is triggered by chmod; that is, extraction first creates the directory and then sets its permissions. To verify, I ran:
LAPTOP-TC4A0SCV:/data/test4# ll
total 278
-rw-r----- 1 root root 284353 Jan 5 09:39 s3fs-fuse-1.93.tar.gz
LAPTOP-TC4A0SCV:/data/test4# mkdir pp
LAPTOP-TC4A0SCV:/data/test4# ll
total 278
drwxr-xr-x 1 root root 0 Jan 5 14:20 pp
-rw-r----- 1 root root 284353 Jan 5 09:39 s3fs-fuse-1.93.tar.gz
LAPTOP-TC4A0SCV:/data/test4# chmod 777 pp
LAPTOP-TC4A0SCV:/data/test4# ll
total 278
drwxrwxrwx 1 root root 0 Jan 5 14:20 pp
-rw-r----- 1 root root 284353 Jan 5 09:39 s3fs-fuse-1.93.tar.gz
LAPTOP-TC4A0SCV:/data/test4# ll pp
total 0
LAPTOP-TC4A0SCV:/data/test4# rm -rf pp
rm: can't remove 'pp': I/O error
LAPTOP-TC4A0SCV:/data/test4#
Sure enough, the problem reproduced: creating a directory and then changing its permissions leaves a directory that cannot be deleted.
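The same sequence can be driven over the raw S3 API, without s3fs in the loop. The following is a sketch, not something verified in the original investigation: it assumes minio-go v7, the endpoint and credentials from earlier, and a made-up key test4/pp/; if the analysis above is right, the final delete should fail with the same InternalError.

package main

import (
    "bytes"
    "context"
    "fmt"
    "log"

    "github.com/minio/minio-go/v7"
    "github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
    client, err := minio.New("191.168.3.132:9100", &minio.Options{
        Creds:  credentials.NewStaticV4("admin", "minio123.com", ""),
        Secure: false,
    })
    if err != nil {
        log.Fatal(err)
    }
    ctx := context.Background()
    const bucket, key = "data", "test4/pp/"

    // Step 1, mkdir: create the zero-byte directory marker.
    if _, err = client.PutObject(ctx, bucket, key, bytes.NewReader(nil), 0,
        minio.PutObjectOptions{ContentType: "application/x-directory"}); err != nil {
        log.Fatal(err)
    }

    // Step 2, chmod: copy the marker onto itself, replacing its metadata.
    src := minio.CopySrcOptions{Bucket: bucket, Object: key}
    dst := minio.CopyDestOptions{
        Bucket:          bucket,
        Object:          key,
        UserMetadata:    map[string]string{"mode": "16877"},
        ReplaceMetadata: true,
    }
    if _, err = client.CopyObject(ctx, dst, src); err != nil {
        log.Fatal(err)
    }

    // Step 3, rm -rf: delete the marker. If the analysis above holds,
    // this is where the InternalError / "file is corrupted" comes back.
    if err = client.RemoveObject(ctx, bucket, key, minio.RemoveObjectOptions{}); err != nil {
        fmt.Println("delete failed:", err)
    }
}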
8. Analysis, without resolution
Comparing the PUT requests sent by mkdir and chmod, the only difference is that the latter adds request headers, chiefly x-amz-copy-source and x-amz-metadata-directive, which tell the server whether the request creates a new object, copies one, or updates one; x-amz-metadata-directive: REPLACE marks it as an update.
If s3fs-fuse implements the standard S3 API, could it be that MinIO's implementation of that API is incomplete? Reading the MinIO source, I found this in api-router.go:
// GetObject
router.Methods(http.MethodGet).Path("/{object:.+}").HandlerFunc(
collectAPIStats("getobject", maxClients(gz(httpTraceHdrs(api.GetObjectHandler)))))
// CopyObject
router.Methods(http.MethodPut).Path("/{object:.+}").HeadersRegexp(xhttp.AmzCopySource, ".*?(\\/|%2F).*?").HandlerFunc(
collectAPIStats("copyobject", maxClients(gz(httpTraceAll(api.CopyObjectHandler)))))
// PutObjectRetention
router.Methods(http.MethodPut).Path("/{object:.+}").HandlerFunc(
collectAPIStats("putobjectretention", maxClients(gz(httpTraceAll(api.PutObjectRetentionHandler))))).Queries("retention", "")
// PutObjectLegalHold
router.Methods(http.MethodPut).Path("/{object:.+}").HandlerFunc(
collectAPIStats("putobjectlegalhold", maxClients(gz(httpTraceAll(api.PutObjectLegalHoldHandler))))).Queries("legal-hold", "")
// PutObject with auto-extract support for zip
router.Methods(http.MethodPut).Path("/{object:.+}").HeadersRegexp(xhttp.AmzSnowballExtract, "true").HandlerFunc(
collectAPIStats("putobject", maxClients(gz(httpTraceHdrs(api.PutObjectExtractHandler)))))
// PutObject
router.Methods(http.MethodPut).Path("/{object:.+}").HandlerFunc(
collectAPIStats("putobject", maxClients(gz(httpTraceHdrs(api.PutObjectHandler)))))
// DeleteObject
router.Methods(http.MethodDelete).Path("/{object:.+}").HandlerFunc(
collectAPIStats("deleteobject", maxClients(gz(httpTraceAll(api.DeleteObjectHandler)))))
So CopyObject handles PUT requests whose headers include xhttp.AmzCopySource, while a plain PUT falls through to PutObject. I then stared at the api.CopyObjectHandler source for a while, understood nothing, and gave up.
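To at least make the routing concrete: this is a gorilla/mux-style router, which dispatches to the first registered route whose method, path and header regexps all match, so the chmod PUT, whose x-amz-copy-source value contains a slash, lands on CopyObjectHandler rather than PutObjectHandler. A small standalone sketch of that matching logic, assuming github.com/gorilla/mux (MinIO uses a fork of it), with the handlers stubbed out:

package main

import (
    "fmt"
    "net/http"

    "github.com/gorilla/mux"
)

func main() {
    router := mux.NewRouter()
    stub := func(http.ResponseWriter, *http.Request) {}

    // Same matching rules as the two registrations quoted above.
    router.Methods(http.MethodPut).Path("/{object:.+}").
        HeadersRegexp("X-Amz-Copy-Source", ".*?(\\/|%2F).*?").
        HandlerFunc(stub).Name("copyobject")
    router.Methods(http.MethodPut).Path("/{object:.+}").
        HandlerFunc(stub).Name("putobject")

    // The chmod PUT: its copy source "/data/test5/" contains a slash,
    // so the header regexp matches and the request routes to CopyObject.
    req, _ := http.NewRequest(http.MethodPut, "http://minio/test5/", nil)
    req.Header.Set("X-Amz-Copy-Source", "/data/test5/")
    var m mux.RouteMatch
    if router.Match(req, &m) {
        fmt.Println("routed to:", m.Route.GetName()) // prints "copyobject"
    }
}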
9. A last-ditch effort
Since I could not solve it myself, the next option was to ask the upstream projects for help, so I filed an issue with each of minio and s3fs-fuse, hoping for a lead:
- s3fs-fuse #2395: https://github.com/s3fs-fuse/s3fs-fuse/issues/2395
- minio #18739: https://github.com/minio/minio/issues/18739
After some waiting, MinIO replied: first pointing me to the s3fs-fuse project, then saying that layering a filesystem on top of object storage is discouraged and that the community should move away from this legacy approach:
We really discourage using filesystem on top of object storage. We wish community embraces and moves away from legacy.
The s3fs-fuse issue received no reply. My own reading is that the problem sits on the MinIO side, likely somewhere in its implementation of the S3 API, though I cannot read the MinIO source well enough to prove it. On the relationship between MinIO and the S3 API, the official description reads:
MinIO is a High Performance Object Storage released under GNU Affero General Public License v3.0. It is API compatible with Amazon S3 cloud storage service. Use MinIO to build high performance infrastructure for machine learning, analytics and application data workloads.
Since MinIO advertises S3 API compatibility, a failure like this points to a gap in that compatibility. Limited by my own ability, the analysis stops here.
Overall, this setup handles ordinary file management, such as upload, download, delete and read, but it cannot fully behave like an ordinary disk.
- Blog: https://www.cnblogs.com/flowerbirds
- GitHub: https://github.com/FlowerBirds