Ceph Notes: How Indexing Works in the Ceph Object Gateway (RGW)

Librados is an excellent object storage library, but it cannot enumerate objects efficiently. The object gateway therefore maintains its own index to make object listing fast, along with some additional metadata.
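As a rough illustration (using the pool and bucket names from the example below), listing a bucket's contents through librados alone would mean walking the entire data pool, while the gateway answers the same listing from the per-bucket index it maintains:

rados -p default.rgw.buckets.data ls        # enumerates every object in the data pool, with no notion of buckets
radosgw-admin bucket list --bucket=ylytest  # served from the bucket index maintained by RGW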
1. View the information for an existing bucket

[root@ceph002 ~]# radosgw-admin bucket stats --bucket=ylytest
{
    "bucket": "ylytest",
    "pool": "default.rgw.buckets.data",
    "index_pool": "default.rgw.buckets.index",
    "id": "dbb0bfc1-db62-4f67-981c-793b5d7f30cc.114118.25",
    "marker": "dbb0bfc1-db62-4f67-981c-793b5d7f30cc.114118.25",
    "owner": "user001",
    "ver": "0#9",
    "master_ver": "0#0",
    "mtime": "2018-09-27 14:59:23.104848",
    "max_marker": "0#",
    "usage": {
        "rgw.main": {
            "size_kb": 1,
            "size_kb_actual": 4,
            "num_objects": 1
        }
    },
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}
The bucket's object listing is stored in a separate RADOS object whose name is .dir. followed by the bucket id. Index objects live in a dedicated index pool, default.rgw.buckets.index in this case, so the index object for ylytest should be .dir.dbb0bfc1-db62-4f67-981c-793b5d7f30cc.114118.25.
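If you would rather derive the index object name than search for it, one way (assuming jq is available on the admin node) is to pull the bucket id out of radosgw-admin and prepend .dir.:

radosgw-admin bucket stats --bucket=ylytest | jq -r '".dir." + .id'   # should print .dir.dbb0bfc1-db62-4f67-981c-793b5d7f30cc.114118.25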

2. Find the bucket index
[root@ceph002 ~]# rados -p default.rgw.buckets.index ls - | grep "114118.25"
.dir.dbb0bfc1-db62-4f67-981c-793b5d7f30cc.114118.25    #### the index object returned from the default.rgw.buckets.index pool


The index information is actually stored in Ceph's key/value database: every OSD has a local LevelDB key/value store. The index object itself is therefore just a placeholder that Ceph uses to locate the OSD key/value database holding the index data.
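A quick way to convince yourself that the index object is only a placeholder (same object name as above) is to stat it; it should report a data size of 0 bytes, because the listing lives entirely in the omap key/value store:

rados -p default.rgw.buckets.index stat .dir.dbb0bfc1-db62-4f67-981c-793b5d7f30cc.114118.25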

3. Inspect the contents of the key/value database, starting with the index keys (each key is simply an object name)
[root@ceph002 ~]# rados -p default.rgw.buckets.index listomapkeys .dir.dbb0bfc1-db62-4f67-981c-793b5d7f30cc.114118.25
myPublicKey.pem


4. Inspect the contents of the key/value database again, this time the index values
[root@ceph002 ~]# rados -p default.rgw.buckets.index listomapvals .dir.dbb0bfc1-db62-4f67-981c-793b5d7f30cc.114118.25
myPublicKey.pem
value (222 bytes) :
00000000 08 03 d8 00 00 00 0f 00 00 00 6d 79 50 75 62 6c |..........myPubl|
00000010 69 63 4b 65 79 2e 70 65 6d 29 01 00 00 00 00 00 |icKey.pem)......|
00000020 00 01 04 03 71 00 00 00 01 c3 01 00 00 00 00 00 |....q...........|
00000030 00 e5 7f ac 5b 7b b0 41 2d 21 00 00 00 35 38 61 |....[{.A-!...58a|
00000040 64 38 36 33 39 35 36 39 31 34 66 64 33 35 37 66 |d863956914fd357f|
00000050 36 39 64 30 39 30 62 35 63 65 32 66 38 00 07 00 |69d090b5ce2f8...|
00000060 00 00 75 73 65 72 30 30 31 07 00 00 00 75 73 65 |..user001....use|
00000070 72 30 30 31 19 00 00 00 61 70 70 6c 69 63 61 74 |r001....applicat|
00000080 69 6f 6e 2f 6f 63 74 65 74 2d 73 74 72 65 61 6d |ion/octet-stream|
00000090 00 c3 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000a0 00 01 01 04 00 00 00 0b 82 29 01 08 20 00 00 00 |.........).. ...|
000000b0 5f 33 49 36 31 75 67 66 77 32 5a 56 5f 75 54 55 |_3I61ugfw2ZV_uTU|
000000c0 39 5f 48 5f 44 53 54 6a 59 53 45 6b 2d 68 57 35 |9_H_DSTjYSEk-hW5|
000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |..............|
000000de


The index entry takes 222 bytes, and several recognizable fragments are visible in the hex dump above. Comparing the dump with the object metadata reported by radosgw-admin shows exactly what the index stores.
[root@ceph002 ~]# radosgw-admin bucket list --bucket=ylytest
[
    {
        "name": "myPublicKey.pem",
        "instance": "",
        "namespace": "",
        "owner": "user001",
        "owner_display_name": "user001",
        "size": 451,
        "mtime": "2018-09-27 06:59:49.759279Z",
        "etag": "58ad863956914fd357f69d090b5ce2f8\u0000",
        "content_type": "application\/octet-stream\u0000",
        "tag": "_3I61ugfw2ZV_uTU9_H_DSTjYSEk-hW5",
        "flags": 0
    }
]

We can see that the index contains at least the following fields:
name: the object name appears both as the omap key and inside the value, so if the keys were ever corrupted they could be rebuilt by scanning the index values.
owner
owner_display_name: kept for S3 compatibility
etag: the object's MD5, also kept for S3 compatibility; having to compute an MD5 for every object created does come at some cost to write performance.
tag
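If you prefer not to read the hex dump by hand, one option is to export the omap value to a file and let ceph-dencoder interpret it as an rgw_bucket_dir_entry (this assumes the installed ceph-dencoder was built with RGW support; ceph-dencoder list_types shows whether that type is available):

rados -p default.rgw.buckets.index getomapval .dir.dbb0bfc1-db62-4f67-981c-793b5d7f30cc.114118.25 myPublicKey.pem /tmp/entry.bin
ceph-dencoder type rgw_bucket_dir_entry import /tmp/entry.bin decode dump_json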

5. Find the key/value database (work out which OSDs hold the index object)
[root@ceph002 ~]# ceph osd map default.rgw.buckets.index default.rgw.buckets.index .dir.dbb0bfc1-db62-4f67-981c-793b5d7f30cc.114118.25
osdmap e102 pool 'default.rgw.buckets.index' (10) object '.dir.dbb0bfc1-db62-4f67-981c-793b5d7f30cc.114118.25/default.rgw.buckets.index' -> pg 10.249bcd4f (10.7) -> up ([5,4,6], p5) acting ([5,4,6], p5)
The key/value databases holding this index are on OSDs 5, 4, and 6; osd.5, the first in the list, is the primary.
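To confirm which host osd.5 lives on without reading the whole CRUSH tree, ceph osd find can be used; the next step reaches the same conclusion from ceph osd tree:

ceph osd find 5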

6. The LevelDB key/value store that contains the index
[root@ceph002 ~]# ceph osd tree
ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.44513 root default
-2 0.14838     host ceph002
 0 0.03709         osd.0         up  1.00000          1.00000
 3 0.03709         osd.3         up  1.00000          1.00000
 6 0.03709         osd.6         up  1.00000          1.00000
 8 0.03709         osd.8         up  1.00000          1.00000
-3 0.14838     host ceph003
 1 0.03709         osd.1         up  1.00000          1.00000
 4 0.03709         osd.4         up  1.00000          1.00000
 7 0.03709         osd.7         up  1.00000          1.00000
10 0.03709         osd.10        up  1.00000          1.00000
-4 0.14838     host ceph004
 2 0.03709         osd.2         up  1.00000          1.00000
 5 0.03709         osd.5         up  1.00000          1.00000
 9 0.03709         osd.9         up  1.00000          1.00000
11 0.03709         osd.11        up  1.00000          1.00000

[root@ceph004 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 500M 0 part /boot
└─sda2 8:2 0 39.5G 0 part
├─bclinux-root 253:0 0 35.5G 0 lvm /
└─bclinux-swap 253:1 0 4G 0 lvm [SWAP]
sdb 8:16 0 40G 0 disk
├─sdb1 8:17 0 38G 0 part /var/lib/ceph/osd/ceph-2
└─sdb2 8:18 0 2G 0 part
sdc 8:32 0 40G 0 disk
├─sdc1 8:33 0 38G 0 part /var/lib/ceph/osd/ceph-5
└─sdc2 8:34 0 2G 0 part
sdd 8:48 0 40G 0 disk
├─sdd1 8:49 0 38G 0 part /var/lib/ceph/osd/ceph-9
└─sdd2 8:50 0 2G 0 part
sde 8:64 0 40G 0 disk
├─sde1 8:65 0 38G 0 part /var/lib/ceph/osd/ceph-11
└─sde2 8:66 0 2G 0 part
[root@ceph004 ~]# cd /var/lib/ceph/osd/ceph-5
[root@ceph004 ceph-5]# ls
activate.monmap ceph_fsid fsid journal_uuid magic store_version systemd whoami
active current journal keyring ready superblock type
[root@ceph004 ceph-5]# cd current/omap/
[root@ceph004 omap]# ls
000016.sst 000018.sst 000021.sst 000022.log 000023.sst CURRENT LOCK LOG MANIFEST-000017
[root@ceph004 omap]# ls -lh
total 9.9M
-rw-r--r-- 1 ceph ceph 337K Sep 27 06:29 000016.sst
-rw-r--r-- 1 ceph ceph 479K Sep 27 09:38 000018.sst
-rw-r--r-- 1 ceph ceph 1.3M Sep 27 15:21 000021.sst
-rw-r--r-- 1 ceph ceph 7.0M Sep 28 11:13 000022.log
-rw-r--r-- 1 ceph ceph 1.3M Sep 28 01:11 000023.sst
-rw-r--r-- 1 ceph ceph 16 Sep 27 09:38 CURRENT
-rw-r--r-- 1 ceph ceph 0 Sep 26 09:26 LOCK
-rw-r--r-- 1 ceph ceph 57 Sep 26 09:26 LOG
-rw-r--r-- 1 ceph ceph 64K Sep 28 01:11 MANIFEST-000017
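If you want to peek into this LevelDB store directly, one option is ceph-kvstore-tool. This is only a sketch: the OSD must be stopped first, because LevelDB does not allow a second process to open the store, and depending on the Ceph release the store type argument (leveldb) may or may not be required:

systemctl stop ceph-osd@5                                             # release the LevelDB lock held by the running OSD
ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-5/current/omap list  # dump all keys, internal prefixes included
systemctl start ceph-osd@5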