A small plot of land, tilled with care


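The disks in our Ceph cluster servers are all passed through directly to the OS. When a disk develops bad sectors and has to be replaced, the device order may not be contiguous, so it is not obvious which physical disk in the server is the faulty one. Pulling disks on a guess risks removing a healthy one, which would do more harm than good. This post briefly records an approach to locating the faulty disk.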

1. Approach to locating the disk

This applies to a faulty disk that is degraded but has not yet gone offline.

1.1 Find the faulty disk


[root@a01r1n06 ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  3.7T  0 disk
├─sda1   8:1    0    2M  0 part
├─sda2   8:2    0    1G  0 part /boot
├─sda3   8:3    0 62.5G  0 part [SWAP]
└─sda4   8:4    0  3.6T  0 part /
sdb      8:16   0 14.6T  0 disk /public/home/test
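
lsblk only lists the block devices; it does not show which one is failing. One way to narrow it down is to look at the SMART attributes usually associated with bad sectors. The sketch below assumes an ATA disk whose `smartctl -A` table uses the common attribute names Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable, with the raw value in the tenth column. The function name `bad_sector_attrs` is hypothetical, and SAS disks report different fields.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# sector-related attributes whose raw value (column 10) is non-zero.
bad_sector_attrs() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
        $10 + 0 > 0 { print $2 "=" $10 }
    '
}

# Example (needs root and smartmontools; device name taken from lsblk):
#   smartctl -A /dev/sdb | bad_sector_attrs
```

An empty result only means these three counters are zero; it does not prove the disk is healthy.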

1.2 Use smartctl to find the disk's serial number (SN)


[root@a01r1n06 ~]# smartctl --all /dev/sdad
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-862.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/sdad failed: No such device
[root@a01r1n06 ~]# smartctl --all /dev/sdb
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-862.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WUH721816ALE6L4
Serial Number:    2CJSSATJ  ## disk serial number
LU WWN Device Id: 5 000cca 2a1e6fb30
Add. Product Id:  202116
Firmware Version: PCGAW232
User Capacity:    16,000,900,661,248 bytes [16.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   Unknown(0x0ffc) (unknown minor revision code: 0x009c)
SATA Version is:  SATA >3.2 (0x1ff), 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Jul  6 14:50:20 2021 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		(  101) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
.....
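
When many disks have to be checked, reading the full `--all` output for each one is tedious. A minimal sketch for pulling out just the serial number is shown below. It assumes the information section contains a `Serial Number:` line as in the output above (SAS disks print `Serial number:`), and the function name `serial_of` is hypothetical.

```shell
# Hypothetical helper: read smartctl output on stdin and print only the
# value of the "Serial Number:" (ATA) or "Serial number:" (SAS) field.
serial_of() {
    awk -F': *' '/^Serial [Nn]umber:/ {
        gsub(/[[:space:]]+$/, "", $2)   # drop trailing whitespace
        print $2
        exit
    }'
}

# Example (needs root and smartmontools); -i prints only the information section:
#   smartctl -i /dev/sdb | serial_of
```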

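1.3 Use the RAID controller tool to find the server slot that matches the serial number

Installing StorCLI

For installation steps, see https://www.xxshell.com/2800.html

Finding the disk slot with StorCLI

Filter the StorCLI output by the disk SN to find its slot (the example uses the SN WKD26RCS):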


./storcli64 /call /eall /sall show all |grep -5 -i WKD26RCS
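
The grep above still leaves the slot to be read off by eye from the surrounding lines. As a rough sketch, the match can be turned into a drive path directly. This assumes the `show all` output has a heading line starting with `Drive /cX/eY/sZ` before each drive's attributes and an `SN = ...` line inside them; the layout may differ between StorCLI versions, so check it against real output first. The function name `find_slot_by_sn` is hypothetical.

```shell
# Hypothetical helper: read `storcli64 /call /eall /sall show all` output
# on stdin and print the drive path whose SN matches $1 (case-insensitive).
find_slot_by_sn() {
    awk -v sn="$1" '
        /^Drive \/c[0-9]+\// { path = $2 }            # remember the latest drive heading
        /^SN = / && toupper($3) == toupper(sn) {      # SN line belonging to that drive
            print path
            exit
        }
    '
}

# Example:
#   ./storcli64 /call /eall /sall show all | find_slot_by_sn WKD26RCS
```

The printed path has the form /c0/e0/s23, which is the form the locate command takes.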

1.4 Turn on the disk's locate LED and replace the disk

Run ./storcli64 /c0/e0/s23 start locate to turn on the locate LED.
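
The LED keeps blinking until it is switched off, which StorCLI does with the matching `stop locate` command. The wrapper below is a small sketch that checks the arguments before touching the controller. The function name `locate_led` and the `RUN` variable are hypothetical; by default it only prints the command it would run, and it assumes `./storcli64` is in the current directory as in the commands above.

```shell
# Hypothetical wrapper: print the StorCLI locate command, or execute it
# when RUN=1 is set in the environment.
locate_led() {
    action=$1   # "start" or "stop"
    drive=$2    # drive path such as /c0/e0/s23
    case $action in
        start|stop) ;;
        *) echo "usage: locate_led start|stop /cX/eY/sZ" >&2; return 2 ;;
    esac
    case $drive in
        /c[0-9]*/s[0-9]*) ;;
        *) echo "unexpected drive path: $drive" >&2; return 2 ;;
    esac
    if [ "${RUN:-0}" = 1 ]; then
        ./storcli64 "$drive" "$action" locate
    else
        echo "./storcli64 $drive $action locate"
    fi
}

# Example:
#   RUN=1 locate_led start /c0/e0/s23   # before pulling the disk
#   RUN=1 locate_led stop  /c0/e0/s23   # after the new disk is in place
```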

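2. Locating the disk through IPMI

If the server does not have StorCLI installed but does have an IPMI management interface, the disk can be located through that interface instead.

Find the entry with the matching disk serial number and click [Locate Physical Device].

Reference (original post): https://www.xxshell.com/2995.html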

Posted by 平复心态