Background
An OSD in the Ceph cluster suddenly went down, and storcli showed the disk as offline.
Troubleshooting steps
| $ sudo storcli64 /c0/eall/sall show |
| CLI Version = 007.2309.0000.0000 Sep 16, 2022 |
| Operating system = Linux 5.4.0-137-generic |
| Controller = 0 |
| Status = Failure |
| Description = Show Drive Information Failed. |
| |
| Detailed Status : |
| =============== |
| |
| -------------------------------- |
| Drive Status ErrCd ErrMsg |
| -------------------------------- |
| /c0/e0/s1 Success 0 - |
| /c0/e0/s2 Success 0 - |
| /c0/e0/s4 Success 0 - |
| /c0/e0/s5 Success 0 - |
| /c0/e0/s7 Success 0 - |
| /c0/e0/s8 Success 0 - |
| /c0/e0/s10 Success 0 - |
| /c0/e0/s11 Failure 46 - |
| /c0/e0/s14 Success 0 - |
| /c0/e0/s15 Success 0 - |
| -------------------------------- |
| |
| |
| |
| Drive Information : |
| ================= |
| |
| ---------------------------------------------------------------------------------- |
| EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp Type |
| ---------------------------------------------------------------------------------- |
| 0:1 7 Onln 1 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:2 14 Onln 2 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:4 13 Onln 3 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:5 4 Onln 4 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:7 3 Onln 5 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:8 8 Onln 6 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:10 9 Onln 7 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:11 11 Failed 8 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM U - |
| 0:14 1 Onln 0 138.766 GB SATA SSD N N 512B INTEL SSDSC2BB150G7 U - |
| 0:15 2 Onln 0 138.766 GB SATA SSD N N 512B INTEL SSDSC2BB150G7 U - |
| ---------------------------------------------------------------------------------- |
| |
| EID=Enclosure Device ID|Slt=Slot No|DID=Device ID|DG=DriveGroup |
| DHS=Dedicated Hot Spare|UGood=Unconfigured Good|GHS=Global Hotspare |
| UBad=Unconfigured Bad|Sntze=Sanitize|Onln=Online|Offln=Offline|Intf=Interface |
| Med=Media Type|SED=Self Encryptive Drive|PI=Protection Info |
| SeSz=Sector Size|Sp=Spun|U=Up|D=Down|T=Transition|F=Foreign |
| UGUnsp=UGood Unsupported|UGShld=UGood shielded|HSPShld=Hotspare shielded |
| CFShld=Configured shielded|Cpybck=CopyBack|CBShld=Copyback Shielded |
| UBUnsp=UBad Unsupported|Rbld=Rebuild |
Drive /c0/e0/s11 returns error code 46, and its state in the drive table is Failed.
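Spotting the one bad row by eye gets harder on hosts with many drives. The following is a minimal sketch, not part of the original write-up, that filters the drive table for rows whose State column is not `Onln`. The function name `list_non_online_drives` is hypothetical, and it assumes the raw storcli output (without the table pipes shown in this post) arrives on stdin. Hot spares and unconfigured drives would also be listed, since they are not `Onln` either.

```shell
# List drives whose State column is anything other than Onln.
# Reads the output of `storcli64 /cX/eall/sall show` on stdin.
# Drive rows are recognised by their first field, EID:Slt (for example 0:11).
list_non_online_drives() {
    awk '$1 ~ /^[0-9]+:[0-9]+$/ && $3 != "Onln" { print $1, $3 }'
}

# Usage on a live host (needs storcli64 and root):
#   sudo storcli64 /c0/eall/sall show | list_non_online_drives
```

Run against the output above, this would print only the `0:11` row together with its state.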
| |
| |
| CLI Version = 007.2309.0000.0000 Sep 16, 2022 |
| Operating system = Linux 5.4.0-137-generic |
| Controller = 0 |
| Status = Success |
| Description = None |
| |
| |
| Virtual Drives : |
| ============== |
| |
| --------------------------------------------------------------- |
| DG/VD TYPE State Access Consist Cache Cac sCC Size Name |
| --------------------------------------------------------------- |
| 0/0 RAID1 Optl RW Yes NRWTD - ON 138.766 GB |
| 1/1 RAID0 Optl RW Yes NRWTD - ON 7.276 TB |
| 2/2 RAID0 Optl RW Yes NRWTD - ON 7.276 TB |
| 3/3 RAID0 Optl RW Yes NRWTD - ON 7.276 TB |
| 4/4 RAID0 Optl RW Yes NRWTD - ON 7.276 TB |
| 5/5 RAID0 Optl RW Yes NRWTD - ON 7.276 TB |
| 6/6 RAID0 Optl RW Yes NRWTD - ON 7.276 TB |
| 7/7 RAID0 Optl RW Yes NRWTD - ON 7.276 TB |
| 8/8 RAID0 OfLn RW No NRWTD - ON 7.276 TB |
| --------------------------------------------------------------- |
| |
| VD=Virtual Drive| DG=Drive Group|Rec=Recovery |
| Cac=CacheCade|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded |
| Optl=Optimal|dflt=Default|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady |
| B=Blocked|Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack |
| AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled |
| Check Consistency |
At this point VD 8 is already offline (State = OfLn).
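The same check can be applied to the virtual drive table. This is a sketch under assumptions: the function name `list_non_optimal_vds` is hypothetical, and the post does not show which command produced the listing above (`storcli64 /c0/vall show` is a likely candidate, but that is a guess). The filter only relies on the `DG/VD` and `State` columns.

```shell
# List virtual drives whose State column is not Optl (optimal).
# Reads a storcli virtual drive listing on stdin.
# VD rows are recognised by their first field, DG/VD (for example 8/8).
list_non_optimal_vds() {
    awk '$1 ~ /^[0-9]+\/[0-9]+$/ && $3 != "Optl" { print $1, $3 }'
}

# Possible usage (the exact command is an assumption, see above):
#   sudo storcli64 /c0/vall show | list_non_optimal_vds
```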
| |
| |
| seqNum: 0x0000510a |
| Time: Sun Feb 12 18:11:01 2023 |
| |
| Code: 0x00000143 |
| Class: 3 |
| Locale: 0x21 |
| Event Description: Controller cache pinned for missing or offline VD 08/8 |
| Event Data: |
| =========== |
| Target Id: 8 |
| |
| |
| seqNum: 0x0000510b |
| Time: Sun Feb 12 18:11:01 2023 |
| |
| Code: 0x000000fc |
| Class: 3 |
| Locale: 0x01 |
| Event Description: VD 08/8 is now OFFLINE |
| Event Data: |
| =========== |
| Target Id: 8 |
| CLI Version = 007.2309.0000.0000 Sep 16, 2022 |
| Operating system = Linux 5.4.0-137-generic |
| Controller = 0 |
| Status = Success |
| Description = None |
| |
| Events = GETEVENTS |
| |
| Controller Properties : |
| ===================== |
| |
| ------------------------------------ |
| Ctrl Status Method Value |
| ------------------------------------ |
| 0 Success handleSuboption Events |
| ------------------------------------ |
Fault trigger: Controller cache pinned for missing or offline VD
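The controller event log is verbose, so it can help to reduce each event to one line of timestamp plus description. The sketch below is an illustration only: `summarize_events` is a hypothetical name, it assumes the log uses the `Time:` and `Event Description:` fields exactly as shown above, and the command that produced the log is not shown in this post (`storcli64 /c0 show events` is an assumption).

```shell
# Print one line per controller event: "<time>  <description>".
# Reads a storcli event log on stdin.
summarize_events() {
    awk '
        { sub(/^[ \t]+/, "") }                       # drop leading indentation
        /^Time: /              { sub(/^Time: /, ""); when = $0 }
        /^Event Description: / { sub(/^Event Description: /, ""); print when "  " $0 }
    '
}

# Possible usage (the exact command is an assumption, see above):
#   sudo storcli64 /c0 show events | summarize_events
```

With the two events above, both lines would carry the same timestamp, which shows the cache was pinned at the same moment the VD went offline.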
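Cause
The disk lost its connection for an unknown reason while the controller cache still held data that had not been flushed to it. The controller keeps that data as preserved (pinned) cache for the offline VD.
Solution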
| $ sudo storcli64 /c0 show preservedcache |
| CLI Version = 007.2309.0000.0000 Sep 16, 2022 |
| Operating system = Linux 5.4.0-137-generic |
| Controller = 0 |
| Status = Success |
| Description = None |
| |
| |
| -------------------- |
| VD Size State |
| -------------------- |
| 8 7.276 TB Offline |
| -------------------- |
| $ sudo storcli64 /c0/v8 delete preservedcache |
| CLI Version = 007.2309.0000.0000 Sep 16, 2022 |
| Operating system = Linux 5.4.0-137-generic |
| Controller = 0 |
| Status = Success |
| Description = Virtual Drive preserved Cache Data Cleared. |
| $ sudo storcli64 /c0/e0/s11 set online |
| CLI Version = 007.2309.0000.0000 Sep 16, 2022 |
| Operating system = Linux 5.4.0-137-generic |
| Controller = 0 |
| Status = Success |
| Description = Set Drive Online Succeeded. |
| $ sudo storcli64 /c0/eall/sall show |
| CLI Version = 007.2309.0000.0000 Sep 16, 2022 |
| Operating system = Linux 5.4.0-137-generic |
| Controller = 0 |
| Status = Success |
| Description = Show Drive Information Succeeded. |
| |
| |
| Drive Information : |
| ================= |
| |
| --------------------------------------------------------------------------------- |
| EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp Type |
| --------------------------------------------------------------------------------- |
| 0:1 7 Onln 1 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:2 14 Onln 2 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:4 13 Onln 3 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:5 4 Onln 4 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:7 3 Onln 5 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:8 8 Onln 6 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:10 9 Onln 7 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:11 11 Onln 8 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U - |
| 0:14 1 Onln 0 138.766 GB SATA SSD N N 512B INTEL SSDSC2BB150G7 U - |
| 0:15 2 Onln 0 138.766 GB SATA SSD N N 512B INTEL SSDSC2BB150G7 U - |
| --------------------------------------------------------------------------------- |
| |
| EID=Enclosure Device ID|Slt=Slot No|DID=Device ID|DG=DriveGroup |
| DHS=Dedicated Hot Spare|UGood=Unconfigured Good|GHS=Global Hotspare |
| UBad=Unconfigured Bad|Sntze=Sanitize|Onln=Online|Offln=Offline|Intf=Interface |
| Med=Media Type|SED=Self Encryptive Drive|PI=Protection Info |
| SeSz=Sector Size|Sp=Spun|U=Up|D=Down|T=Transition|F=Foreign |
| UGUnsp=UGood Unsupported|UGShld=UGood shielded|HSPShld=Hotspare shielded |
| CFShld=Configured shielded|Cpybck=CopyBack|CBShld=Copyback Shielded |
| UBUnsp=UBad Unsupported|Rbld=Rebuild |
| $ sudo systemctl reset-failed ceph-osd@71 |
| $ sudo systemctl restart ceph-osd@71 |
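The recovery commands above can be collected into one helper. This is a sketch, not a tested tool: the names `run` and `recover_offline_vd` and the `DRY_RUN` variable are hypothetical, and by default it only prints the commands so they can be reviewed first. Note that `delete preservedcache` discards the writes the controller was still holding for that VD, so it is worth confirming the affected VD with `show preservedcache` before running anything for real.

```shell
# Print a command when DRY_RUN is unset or 1; execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Replay the recovery steps from this post for one VD, drive and OSD.
# Arguments: controller index, VD index, drive path, OSD id
# (for the case in this post: 0 8 e0/s11 71).
# The && chain stops at the first command that exits non-zero.
recover_offline_vd() {
    ctrl=$1
    vd=$2
    drive=$3
    osd=$4
    run sudo storcli64 "/c${ctrl}/v${vd}" delete preservedcache &&
    run sudo storcli64 "/c${ctrl}/${drive}" set online &&
    run sudo systemctl reset-failed "ceph-osd@${osd}" &&
    run sudo systemctl restart "ceph-osd@${osd}"
}

# Dry run (prints the four commands):
#   recover_offline_vd 0 8 e0/s11 71
# Real run:
#   DRY_RUN=0 recover_offline_vd 0 8 e0/s11 71
```

Defaulting to a dry run is a deliberate choice here, since the first step is destructive and the arguments are easy to mistype.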