
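The disks in our Ceph cluster servers are all configured in passthrough mode. When a disk develops bad sectors and has to be replaced, the device order does not necessarily match the physical slot order, so it is not obvious which physical disk in the server is the faulty one. Pulling disks by trial and error risks removing a healthy one, which does more harm than good. This post briefly records how to locate the faulty disk.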

1. Approach to locating the disk

This approach applies to a faulty disk that is degraded but not yet offline.

1.1 Find the faulty disk


[root@a01r1n06 ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  3.7T  0 disk
├─sda1   8:1    0    2M  0 part
├─sda2   8:2    0    1G  0 part /boot
├─sda3   8:3    0 62.5G  0 part [SWAP]
└─sda4   8:4    0  3.6T  0 part /
sdb      8:16   0 14.6T  0 disk /public/home/test
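The listing above shows the block devices but not which one is failing. One way to narrow it down is to count the I/O errors the kernel has logged per device. The sketch below is an assumption, not part of the original procedure: it expects kernel messages of the form "I/O error, dev sdb, sector N", and count_io_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print
# "<device> <error count>" for every device with logged I/O errors.
# Assumes messages shaped like: "... I/O error, dev sdb, sector 12345".
count_io_errors() {
    awk '/I\/O error, dev / {
        # The device name is the field right after the word "dev".
        for (i = 1; i < NF; i++) {
            if ($i == "dev") {
                d = $(i + 1)
                sub(/,$/, "", d)   # drop the trailing comma
                n[d]++
            }
        }
    }
    END { for (d in n) print d, n[d] }' | sort
}

# Usage on a real host:
#   dmesg | count_io_errors
```

A device that shows up here with a growing count is a reasonable candidate to check with smartctl in the next step.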

1.2 Use smartctl to find the disk's serial number (SN)


[root@a01r1n06 ~]# smartctl --all /dev/sdad
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-862.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/sdad failed: No such device
[root@a01r1n06 ~]# smartctl --all /dev/sdb
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-862.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WUH721816ALE6L4
Serial Number:    2CJSSATJ  ## disk serial number
LU WWN Device Id: 5 000cca 2a1e6fb30
Add. Product Id:  202116
Firmware Version: PCGAW232
User Capacity:    16,000,900,661,248 bytes [16.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   Unknown(0x0ffc) (unknown minor revision code: 0x009c)
SATA Version is:  SATA >3.2 (0x1ff), 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Jul  6 14:50:20 2021 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		(  101) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
.....
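Only the serial number is needed for the next step, so it can be pulled out of the smartctl output directly. A minimal sketch, assuming the "Serial Number:" line format shown above (SAS drives print "Serial number:", which is also matched); extract_serial is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read smartctl output on stdin and print only the
# value of the first "Serial Number:" (or "Serial number:") line.
extract_serial() {
    awk -F': *' '
        !found && /^Serial [Nn]umber:/ {
            sub(/[ \t]+$/, "", $2)   # strip trailing whitespace
            print $2
            found = 1
        }'
}

# Usage on a real host (the device path is an example):
#   smartctl -i /dev/sdb | extract_serial
```

smartctl -i prints just the information section, which is enough here and avoids the long attribute dump that --all produces.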

1.3 Use the RAID controller tool to find the server slot that matches the serial number

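Installing Storcli

For installation instructions, see https://www.xxshell.com/2800.html

Finding the disk slot with Storcli

Filter the output by the disk SN to find its slot (replace WKD26RCS below with the serial number obtained in step 1.2):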


./storcli64 /call /eall /sall show all |grep -5 -i WKD26RCS
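Reading the slot out of five lines of grep context is error-prone when many drives are listed. The sketch below prints the drive path for a given SN instead. It rests on an assumption about the output layout: each drive section starts with a line such as "Drive /c0/e8/s23 Device attributes :" and contains a line "SN = <serial>". slot_for_serial is a hypothetical helper name, and the layout should be checked against the storcli version in use.

```shell
#!/bin/sh
# Hypothetical helper: read `storcli64 /call /eall /sall show all` output
# on stdin and print the /cX/eY/sZ path of the drive whose SN matches $1.
# Assumed layout: "Drive /c0/e8/s23 ..." header lines, then "SN = <serial>".
slot_for_serial() {
    awk -v sn="$1" '
        # Remember the most recent drive path seen in a header line.
        $1 == "Drive" && $2 ~ /^\/c[0-9]+(\/e[0-9]+)?\/s[0-9]+$/ { path = $2 }
        # Case-insensitive match on the serial number, like grep -i.
        $1 == "SN" && $2 == "=" && toupper($3) == toupper(sn) { print path }
    '
}

# Usage on a real host:
#   ./storcli64 /call /eall /sall show all | slot_for_serial WKD26RCS
```

The printed path can be passed unchanged to the locate command in the next step.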

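1.4 Turn on the disk's locate LED and replace the disk

Turn on the locate LED with ./storcli64 /c0/e0/s23 start locate, where /c0/e0/s23 is the controller, enclosure and slot found in the previous step.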
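The LED keeps blinking until it is switched off again with the matching stop locate command. A small sketch that builds both commands so the drive path is typed only once; locate_cmd and the STORCLI variable are hypothetical names, and the function only prints the command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print the storcli command that starts or stops the
# locate LED for a drive path such as /c0/e0/s23. Nothing is executed.
locate_cmd() {
    action=$1
    drive=$2
    case "$action" in
        start|stop) ;;
        *)
            echo "usage: locate_cmd start|stop /cX/eY/sZ" >&2
            return 1
            ;;
    esac
    # STORCLI may point at the binary; defaults to ./storcli64 as used above.
    printf '%s %s %s locate\n' "${STORCLI:-./storcli64}" "$drive" "$action"
}

# Usage:
#   locate_cmd start /c0/e0/s23   # before pulling the disk
#   locate_cmd stop  /c0/e0/s23   # after the replacement is seated
```

Printing rather than executing is a deliberate choice here: a mistyped slot would light the LED on a healthy disk, which is exactly the mistake this procedure is meant to avoid.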

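2. Locating the disk through IPMI

If the server does not have Storcli installed but does have an IPMI system, the disk can be located through IPMI instead.

Find the entry with the matching disk serial number and click [Locate Physical Device].

Reference (original post): https://www.xxshell.com/2995.html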

posted on 2021-07-06 15:28 by 平复心态