Repairing incomplete PGs in Ceph
1. List the incomplete PGs
ceph health detail | grep incomplete
pg 2.ef is incomplete, acting [10,9,4]
pg 2.a9 is incomplete, acting [10,4,3]
pg 2.a7 is incomplete, acting [10,0,4]
pg 3.99 is incomplete, acting [10,1,4]
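When many PGs are incomplete, a small awk filter can turn this output into "<pgid> <acting set>" pairs for the later steps. This is a sketch: the function name incomplete_pgs is hypothetical. It only recognizes lines in the exact "pg <id> is incomplete, acting [...]" form shown above and skips other health lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<pgid> <acting>" for each incomplete PG on stdin.
# Expects lines shaped like: pg 2.ef is incomplete, acting [10,9,4]
incomplete_pgs() {
  awk '$1 == "pg" && $3 == "is" && $4 == "incomplete," {
         acting = substr($6, 2, length($6) - 2)   # "[10,9,4]" -> "10,9,4"
         print $2, acting
       }'
}

# Usage: ceph health detail | incomplete_pgs
```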
2. Save the pg query output of each incomplete PG
ceph pg 2.ef query > 2.ef.query
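A loop can save the query of every incomplete PG before anything is changed. This is a minimal sketch; save_pg_queries is a hypothetical name, and the usage line assumes the health output format shown in step 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write "ceph pg <pgid> query" to ./<pgid>.query for each PG.
save_pg_queries() {
  local pg
  for pg in "$@"; do
    if ceph pg "$pg" query > "$pg.query"; then
      echo "saved $pg.query"
    else
      echo "failed to query $pg" >&2
      return 1
    fi
  done
}

# Usage:
#   save_pg_queries $(ceph health detail | awk '$4 == "incomplete," {print $2}')
```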
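3. Check the size of the PG directory on each OSD in the acting set
du -sh /var/lib/ceph/osd/ceph-x/current/<pg.id>_head
If the PG directory size is 0 on every OSD in the acting set, look up probing_osds in the ceph pg 2.ef query output saved in step 2 to find other OSDs that may still hold a copy of the PG.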
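Because each OSD of the acting set may sit on a different host, the size check has to run on each host. The sketch below assumes the FileStore layout used throughout this note; OSD_ROOT, pg_dir_kb and report_pg_sizes are hypothetical names. It uses du -sk so the sizes are plain KiB numbers that can be compared. On some filesystems an empty directory still reports a few KiB.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; FileStore layout assumed:
#   $OSD_ROOT/ceph-<id>/current/<pgid>_head
OSD_ROOT="${OSD_ROOT:-/var/lib/ceph/osd}"

# Print the size in KiB of one PG directory on a local OSD (0 if it is absent).
pg_dir_kb() {
  local osd=$1 pg=$2
  local dir="$OSD_ROOT/ceph-$osd/current/${pg}_head"
  if [ -d "$dir" ]; then
    du -sk "$dir" | awk '{print $1}'
  else
    echo 0
  fi
}

# Print "<osd> <KiB>" for every listed OSD that lives on this host.
report_pg_sizes() {
  local pg=$1 osd
  shift
  for osd in "$@"; do
    [ -d "$OSD_ROOT/ceph-$osd" ] || continue   # OSD is on another host
    echo "$osd $(pg_dir_kb "$osd" "$pg")"
  done
}

# Usage on each host of the acting set [10,9,4]:
#   report_pg_sizes 2.ef 10 9 4
```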
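4. Export the problem PG from the OSD that holds the complete data. ("Complete" is a manual judgment here: take the copy with the most data. The example below assumes the copy on osd.10 is complete.) ceph-objectstore-tool needs exclusive access to the OSD's store, so osd.10 must be stopped before the export runs.
General form:
ceph-objectstore-tool --op export --pgid <pg.id> --data-path /mnt/old --journal-path /mnt/old/journal --file <pg.id>.export
Example for PG 2.ef on osd.10: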
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10/ --journal-path /var/lib/ceph/osd/ceph-10/journal --pgid 2.ef --op export --file 2.ef.export
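Picking the "most data" copy and running the export can be scripted once the "<osd> <KiB>" sizes from every host are collected. This is a sketch under the same FileStore paths as above. pick_source_osd, export_pg and the OBJECTSTORE_TOOL override (handy for a dry run) are hypothetical. The OSD must already be stopped.

```shell
#!/usr/bin/env bash
# Hypothetical override: point OBJECTSTORE_TOOL at echo or a stub for a dry run.
OBJECTSTORE_TOOL="${OBJECTSTORE_TOOL:-ceph-objectstore-tool}"

# Read "<osd> <KiB>" lines on stdin and print the OSD holding the largest copy.
pick_source_osd() {
  sort -k2,2nr | head -n 1 | awk '{print $1}'
}

# Export one PG from a stopped FileStore OSD to ./<pgid>.export.
export_pg() {
  local osd=$1 pg=$2
  local dir="/var/lib/ceph/osd/ceph-$osd"
  "$OBJECTSTORE_TOOL" --data-path "$dir/" --journal-path "$dir/journal" \
    --pgid "$pg" --op export --file "$pg.export"
}

# Usage:
#   src=$(printf '10 5120\n9 0\n4 2048\n' | pick_source_osd)   # -> 10
#   export_pg "$src" 2.ef
```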
5. Set the cluster flags (pause blocks all client reads and writes until it is unset)
ceph osd set noout
ceph osd set pause
ceph osd set norebalance
ceph osd set norecover
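Keeping the flags in one list makes it harder to forget one when they are cleared again in step 10. This is a minimal sketch; MAINT_FLAGS, set_maint_flags and unset_maint_flags are hypothetical names, and noout appears only once.

```shell
#!/usr/bin/env bash
# Hypothetical list of the flags set in step 5 and cleared in step 10.
MAINT_FLAGS="noout pause norebalance norecover"

# Set every flag; stop at the first failure.
set_maint_flags() {
  local f
  for f in $MAINT_FLAGS; do
    ceph osd set "$f" || return 1
  done
}

# Clear every flag; stop at the first failure.
unset_maint_flags() {
  local f
  for f in $MAINT_FLAGS; do
    ceph osd unset "$f" || return 1
  done
}
```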
6. Stop the affected OSDs
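How the OSDs are stopped depends on the init system. The sketch below assumes systemd-managed OSDs with units named ceph-osd@<id>; sysvinit or upstart hosts need their own commands. stop_osds is a hypothetical name, and it must run on the host that owns each OSD.

```shell
#!/usr/bin/env bash
# Assumes systemd units named ceph-osd@<id> (hypothetical helper name).
stop_osds() {
  local id
  for id in "$@"; do
    systemctl stop "ceph-osd@$id" || return 1
  done
}

# Usage on the host running osd.10: stop_osds 10
```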
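7. Move the problem PG directories off the affected OSDs. For example, the 2.ef directory on osd.9 has size 0. Move out every problem directory this way: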
mv /var/lib/ceph/osd/ceph-9/current/2.ef_* /home/admin/
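If copies of the same PG are moved off two OSDs on one host into the same backup directory, the second mv collides with the first. The sketch below avoids that by giving each OSD its own subdirectory. It assumes the FileStore layout above; OSD_ROOT and move_pg_dirs are hypothetical names.

```shell
#!/usr/bin/env bash
OSD_ROOT="${OSD_ROOT:-/var/lib/ceph/osd}"

# Move every <pgid>_* directory off a stopped FileStore OSD into
# <backup>/osd.<id>/ so copies from different OSDs never overwrite each other.
move_pg_dirs() {
  local osd=$1 pg=$2 dest="$3/osd.$1" d moved=0
  mkdir -p "$dest" || return 1
  for d in "$OSD_ROOT/ceph-$osd/current/$pg"_*; do
    [ -e "$d" ] || continue            # the glob matched nothing
    mv "$d" "$dest/" || return 1
    moved=$((moved + 1))
  done
  echo "moved $moved directories of $pg from osd.$osd to $dest"
}

# Usage: move_pg_dirs 9 2.ef /home/admin
```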
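8. Import the PG exported in step 4 into the target OSD (osd.4 in this example). First copy the export file to the host running that OSD (node-<N> below):
scp 2.ef.export node-<N>:/root/
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4/ --journal-path /var/lib/ceph/osd/ceph-4/journal --pgid 2.ef --op import --file /root/2.ef.export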
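A wrapper can refuse to run the import when the export file did not arrive or is empty, which is easy to miss after an scp. This is a sketch. import_pg and the OBJECTSTORE_TOOL override are hypothetical, and the OSD must be stopped.

```shell
#!/usr/bin/env bash
OBJECTSTORE_TOOL="${OBJECTSTORE_TOOL:-ceph-objectstore-tool}"

# Import <file> (default ./<pgid>.export) into a stopped FileStore OSD.
import_pg() {
  local osd=$1 pg=$2 file=${3:-$2.export}
  local dir="/var/lib/ceph/osd/ceph-$osd"
  [ -s "$file" ] || { echo "export file $file missing or empty" >&2; return 1; }
  "$OBJECTSTORE_TOOL" --data-path "$dir/" --journal-path "$dir/journal" \
    --pgid "$pg" --op import --file "$file"
}

# Usage on the host of osd.4: import_pg 4 2.ef /root/2.ef.export
```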
9. Mark the PG complete (on the primary OSD, the first one in the acting set, osd.10 for 2.ef)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10/ --journal-path /var/lib/ceph/osd/ceph-10/journal --pgid 2.ef --op mark-complete
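The primary can be read straight from the acting set listed in step 1. For a replicated pool it is normally the first OSD in the list. This is a sketch; primary_osd, mark_complete and the OBJECTSTORE_TOOL override are hypothetical names, and the primary must be stopped.

```shell
#!/usr/bin/env bash
OBJECTSTORE_TOOL="${OBJECTSTORE_TOOL:-ceph-objectstore-tool}"

# First OSD of an acting set string, e.g. "10,9,4" -> 10.
primary_osd() {
  echo "${1%%,*}"
}

# Mark a PG complete on its (stopped) primary FileStore OSD.
mark_complete() {
  local pg=$1 acting=$2 osd dir
  osd=$(primary_osd "$acting")
  dir="/var/lib/ceph/osd/ceph-$osd"
  "$OBJECTSTORE_TOOL" --data-path "$dir/" --journal-path "$dir/journal" \
    --pgid "$pg" --op mark-complete
}

# Usage on the host of osd.10: mark_complete 2.ef 10,9,4
```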
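10. Start the OSDs and clear the flags
ceph osd unset pause
ceph osd unset norebalance
ceph osd unset norecover
ceph osd unset noout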