NFS client malfunction leaves df -h hanging: NFSD: client xx.xx.xx.xx testing state ID with incorrect client ID
Mar 8 16:10:00 HKT-SW6-E5-STG-1-55 kernel: [19956403.500413] nfsd4_validate_stateid: 21 callbacks suppressed
Mar 8 16:10:00 HKT-SW6-E5-STG-1-55 kernel: [19956403.500415] NFSD: client XX.XX.XX.XX testing state ID with incorrect client ID
Mar 8 16:10:00 HKT-SW6-E5-STG-1-55 kernel: [19956403.500819] NFSD: client XX.XX.XX.XX testing state ID with incorrect client ID
Mar 8 16:10:00 HKT-SW6-E5-STG-1-55 kernel: [19956403.708314] NFSD: client XX.XX.XX.XX testing state ID with incorrect client ID
Mar 8 16:10:00 HKT-SW6-E5-STG-1-55 kernel: [19956403.915963] NFSD: client XX.XX.XX.XX testing state ID with incorrect client ID
Mar 8 16:10:01 HKT-SW6-E5-STG-1-55 kernel: [19956404.123970] NFSD: client XX.XX.XX.XX testing state ID with incorrect client ID
Mar 8 16:10:01 HKT-SW6-E5-STG-1-55 kernel: [19956404.331941] NFSD: client XX.XX.XX.XX testing state ID with incorrect client ID
Mar 8 16:10:01 HKT-SW6-E5-STG-1-55 kernel: [19956404.332545] NFSD: client XX.XX.XX.XX testing state ID with incorrect client ID
Mar 8 16:10:01 HKT-SW6-E5-STG-1-55 kernel: [19956404.332893] NFSD: client XX.XX.XX.XX testing state ID with incorrect client ID
Mar 8 16:10:01 HKT-SW6-E5-STG-1-55 kernel: [19956404.333147] NFSD: client XX.XX.XX.XX testing state ID with incorrect client ID
Mar 8 16:10:01 HKT-SW6-E5-STG-1-55 kernel: [19956404.333436] NFSD: client XX.XX.XX.XX testing state ID with incorrect client ID
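The errors above come from the system log /var/log/syslog on the NFS server; the same messages also show up in the output of dmesg -T.
Read literally, the message means that the state ID the client is asking the server to test is invalid: the client ID embedded in it does not match the client ID the server currently holds for that client.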
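To see which clients trigger the warning and how often, the log lines can be tallied per client address. The following is a minimal sketch, not part of the original troubleshooting; the function name count_bad_stateid_clients is made up for illustration. Because the kernel rate-limits this message (note the "callbacks suppressed" line in the log above), the counts are a lower bound.

```shell
# Count "incorrect client ID" warnings per client address.
# Reads kernel log lines on stdin and prints "<count> <client>" per client.
# (count_bad_stateid_clients is a hypothetical helper name.)
count_bad_stateid_clients() {
    # Extract the address that follows "NFSD: client ", ignoring all other lines,
    # then group identical addresses and count them.
    sed -n 's/.*NFSD: client \([^ ]*\) testing state ID with incorrect client ID.*/\1/p' \
        | sort | uniq -c | awk '{ print $1, $2 }'
}
```

It can be fed from either source mentioned above, for example count_bad_stateid_clients < /var/log/syslog or dmesg -T | count_bad_stateid_clients.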
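# This is the answer GPT gave
Based on the information you provided, this appears to be a log message from NFSD (the Network File System daemon). It indicates that a client is testing a state ID using the wrong client ID. This may mean there is some communication or authentication problem between the client and the server. You could check the configuration on both sides and make sure they communicate using the correct client ID. Further investigation may be needed to resolve the problem.
I took the fix from other blog posts. None of them explained it clearly, but it did solve the problem.
My operating system is Ubuntu 20.04. The fix is a change to the NFS server's configuration file, which on Ubuntu is /etc/default/nfs-kernel-server.
1. Edit the NFS configuration file /etc/default/nfs-kernel-server (this is the default location on Ubuntu; CentOS may differ. The blogs I followed all seemed to use the CentOS location, so I list it here for reference: /etc/sysconfig/nfs)
# Add this line at the end of the file (-N 4 asks rpc.nfsd not to offer NFSv4)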
RPCNFSDARGS="-N 4"
2. Then restart the NFS service
sudo systemctl restart nfs-kernel-server.service
3. On the client, unmount the share and mount it again
sudo fusermount -uz /xx-xx
sudo mount ... or sudo mount -a
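After these steps it is worth checking which NFS versions are actually in play, since whether the service picks up the new line can depend on the distribution. The sketch below is an illustration under assumptions: the helper names nfsd_v4_enabled and mount_nfs_vers are made up, the server-side string is expected in the format of /proc/fs/nfsd/versions (for example "-2 +3 +4 +4.1 +4.2", where "+" means offered), and the client-side line is expected in the format of /proc/mounts.

```shell
# nfsd_v4_enabled VERSIONS: succeed if the given /proc/fs/nfsd/versions-style
# string still offers any NFSv4 version ("+4", "+4.1", ...).
# (hypothetical helper name)
nfsd_v4_enabled() {
    for tok in $1; do
        case "$tok" in
            +4|+4.*) return 0 ;;
        esac
    done
    return 1
}

# mount_nfs_vers LINE: print the value of the vers= mount option found in one
# /proc/mounts line, or nothing if the option is absent.
# (hypothetical helper name)
mount_nfs_vers() {
    printf '%s\n' "$1" | sed -n 's/.*[ ,]vers=\([0-9.]*\).*/\1/p'
}
```

On the server, something like nfsd_v4_enabled "$(sudo cat /proc/fs/nfsd/versions)" && echo "NFSv4 is still offered" shows whether the -N 4 setting took effect. On the client, mount_nfs_vers "$(grep ' /xx-xx ' /proc/mounts)" prints the version the remounted share negotiated.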
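After remounting, the problem was gone. I still have not worked out the root cause, but one command suggests that it is related to the NFS protocol version in use: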
nfsstat -s
I compared two machines.
Faulty machine
Server rpc stats:
calls badcalls badfmt badauth badclnt
219566150 0 0 0 0
Server nfs v3:
null getattr setattr lookup access
3 100% 0 0% 0 0% 0 0% 0 0%
readlink read write create mkdir
0 0% 0 0% 0 0% 0 0% 0 0%
symlink mknod remove rmdir rename
0 0% 0 0% 0 0% 0 0% 0 0%
link readdir readdirplus fsstat fsinfo
0 0% 0 0% 0 0% 0 0% 0 0%
pathconf commit
0 0% 0 0%
Server nfs v4:
null compound
5 0% 219566814 99%
Server nfs v4 operations:
op0-unused op1-unused op2-future access close
0 0% 0 0% 0 0% 25619252 3% 23241372 3%
commit create delegpurge delegreturn getattr
0 0% 0 0% 0 0% 4646785 0% 89930392 11%
getfh link lock lockt locku
13494359 1% 0 0% 0 0% 0 0% 0 0%
lookup lookup_root nverify open openattr
13696471 1% 0 0% 0 0% 23319996 3% 0 0%
open_conf open_dgrd putfh putpubfh putrootfh
0 0% 0 0% 219479982 28% 0 0% 43 0%
read readdir readlink remove rename
124653549 16% 33664 0% 0 0% 0 0% 0 0%
renew restorefh savefh secinfo setattr
0 0% 0 0% 0 0% 0 0% 0 0%
setcltid setcltidconf verify write rellockowner
0 0% 0 0% 0 0% 0 0% 0 0%
bc_ctl bind_conn exchange_id create_ses destroy_ses
0 0% 37 0% 34 0% 57 0% 30 0%
free_stateid getdirdeleg getdevinfo getdevlist layoutcommit
0 0% 0 0% 0 0% 0 0% 0 0%
layoutget layoutreturn secinfononam sequence set_ssv
0 0% 0 0% 9 0% 219569577 28% 0 0%
test_stateid want_deleg destroy_clid reclaim_comp allocate
77867 0% 0 0% 4 0% 31 0% 0 0%
copy copy_notify deallocate ioadvise layouterror
0 0% 0 0% 0 0% 0 0% 0 0%
layoutstats offloadcancel offloadstatus readplus seek
0 0% 0 0% 0 0% 0 0% 0 0%
write_same
0 0%
Machine without the error
Server rpc stats:
calls badcalls badfmt badauth badclnt
219511732 0 0 0 0
Server nfs v4:
null compound
0 0% 219512233100%
Server nfs v4 operations:
op0-unused op1-unused op2-future access close
0 0% 0 0% 0 0% 25553455 3% 23254275 3%
commit create delegpurge delegreturn getattr
0 0% 0 0% 0 0% 4649957 0% 89986873 11%
getfh link lock lockt locku
14525315 1% 0 0% 0 0% 0 0% 0 0%
lookup lookup_root nverify open openattr
14727437 1% 0 0% 0 0% 23251345 3% 0 0%
open_conf open_dgrd putfh putpubfh putrootfh
0 0% 0 0% 219506748 28% 0 0% 1 0%
read readdir readlink remove rename
124621979 16% 33706 0% 0 0% 0 0% 0 0%
renew restorefh savefh secinfo setattr
0 0% 0 0% 0 0% 0 0% 0 0%
setcltid setcltidconf verify write rellockowner
0 0% 0 0% 0 0% 0 0% 0 0%
bc_ctl bind_conn exchange_id create_ses destroy_ses
0 0% 0 0% 1 0% 2 0% 1 0%
free_stateid getdirdeleg getdevinfo getdevlist layoutcommit
0 0% 0 0% 0 0% 0 0% 0 0%
layoutget layoutreturn secinfononam sequence set_ssv
0 0% 0 0% 0 0% 219516707 28% 0 0%
test_stateid want_deleg destroy_clid reclaim_comp allocate
0 0% 0 0% 0 0% 1 0% 0 0%
copy copy_notify deallocate ioadvise layouterror
0 0% 0 0% 0 0% 0 0% 0 0%
layoutstats offloadcancel offloadstatus readplus seek
0 0% 0 0% 0 0% 0 0% 0 0%
write_same
0 0%
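The obvious difference is that the faulty machine has an extra "Server nfs v3" section, although it records only 3 null calls. The NFSv4 counters differ as well: the faulty machine shows 77867 test_stateid calls against 0 on the healthy one, and exchange_id / create_ses / destroy_ses at 34 / 57 / 30 against 1 / 2 / 1, which may indicate that a client kept re-establishing its session. After adding the configuration line this output did not change, yet the problem was solved. If anyone knows the actual cause, you are welcome to discuss it in the comments.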
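Comparing the two nfsstat -s dumps by eye is tedious, so a small helper can pull out a single NFSv4 operation counter. This is a sketch under assumptions: the name nfs4_op_count is made up, and it relies on the layout shown above, where each row of operation names is followed by a row of "count percent" pairs. It will misread a row in which nfsstat fuses a wide count with its percentage (as in "219512233100%").

```shell
# nfs4_op_count OP: read `nfsstat -s` output on stdin and print the call count
# of one NFSv4 operation, e.g. test_stateid. (hypothetical helper name)
nfs4_op_count() {
    awk -v op="$1" '
        # Only look inside the per-operation section of the NFSv4 statistics.
        /^Server nfs v4 operations:/ { in_ops = 1; next }
        # This is the value row right after the name row that contained OP:
        # each column holds two fields (count, percent), so take field 2*col-1.
        in_ops && col { print $(2 * col - 1); exit }
        # Name row: remember in which column OP appears, if at all.
        in_ops { for (i = 1; i <= NF; i++) if ($i == op) col = i }
    '
}
```

Running nfsstat -s | nfs4_op_count test_stateid on each server gives one number per machine; with the outputs pasted above that would be 77867 on the faulty machine and 0 on the other.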