
RGW Data Model Design

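Ceph is an open-source, unified distributed storage system. RADOS provides the underlying object storage service and consists of mon and osd daemons. The main entities RADOS operates on are pools, objects, and each object's xattrs and omap.
The RADOS gateway (RGW) is an object storage service built on top of RADOS. It exposes S3- and Swift-compatible RESTful APIs to clients.
Bucket and object (key) are the two main data models the RADOS gateway builds on RADOS. This article describes how the gateway implements buckets and keys.
bucket: a container that holds keys. It can be thought of as a directory, except that buckets cannot be nested.
key: also called an object. It represents one complete piece of data uploaded to the storage service.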

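The rest of this article walks through a series of real operations to show how buckets and keys are laid out.
The RADOS gateway also defines data structures such as account, zone, and region, but they are not the focus here and are not covered in detail.
To create a bucket and upload data through the gateway, you first need to create a user and obtain a pair of authentication keys (access_key, secret_key).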

gateway user

Create a user:

# radosgw-admin user create --uid=yankun --display-name=yankun
{
    "user_id": "yankun",
    "display_name": "yankun",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "auid": 0,
    "subusers": [],
    "keys": [
        {
            "user": "yankun",
            "access_key": "FLNOEBKYFT7R0VA2ZH03",
            "secret_key": "2a3O5epEHpnRw26Rb6tukdYJz6nQes6hCoO5fIM3"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    },
    "temp_url_keys": []
}

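Creating the user yields an access_key and a secret_key. The s3cmd client is then used to create buckets and upload data.
In the s3cmd configuration file, set access_key, secret_key, and the service address.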
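
As an illustration of the s3cmd configuration step, here is a minimal sketch that renders a `[default]` section with the commonly used s3cmd keys `access_key`, `secret_key`, `host_base`, `host_bucket`, and `use_https`. The function name `render_s3cfg` and the endpoint `rgw.example.com:7480` are hypothetical; the article does not give the service address, so substitute your own gateway address.

```python
import configparser
import io


def render_s3cfg(access_key: str, secret_key: str, endpoint: str) -> str:
    """Render a minimal s3cmd configuration as text."""
    # interpolation=None so that '%' in values is never treated specially.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["default"] = {
        "access_key": access_key,
        "secret_key": secret_key,
        "host_base": endpoint,    # host that s3cmd sends requests to
        "host_bucket": endpoint,  # no bucket placeholder: bucket goes in the path
        "use_https": "False",     # assumption: plain HTTP test gateway
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # The endpoint below is a placeholder, not taken from the article.
    print(render_s3cfg("FLNOEBKYFT7R0VA2ZH03",
                       "2a3O5epEHpnRw26Rb6tukdYJz6nQes6hCoO5fIM3",
                       "rgw.example.com:7480"))
```

The output can be saved as the s3cmd configuration file (typically `~/.s3cfg`).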

Buckets in RGW

Create buckets

# s3cmd mb s3://where_is_my_bucket
# s3cmd mb s3://where_is_my_bucket1

View bucket information

# radosgw-admin bucket stats --bucket=where_is_my_bucket
{
    "bucket": "where_is_my_bucket",
    "pool": ".rgw.buckets",
    "index_pool": ".rgw.buckets.index",
    "id": "default.5762326.25",
    "marker": "default.5762326.25",
    "owner": "yankun",
    "ver": "0#9",
    "master_ver": "0#0",
    "mtime": "2017-09-12 10:16:47.000000",
    "max_marker": "0#",
    "usage": {
        "rgw.main": {
            "size_kb": 4105961,
            "size_kb_actual": 4105964,
            "num_objects": 3
        }
    },
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}

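Bucket objects
The buckets a user creates are recorded in the omap of the object yankun.buckets in the .users.uid pool: each omap key is a bucket name, and the value is that bucket's information. For every user, the .users.uid pool holds two objects, {username} and {username}.buckets.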

# rados -p .users.uid listomapkeys yankun.buckets
where_is_my_bucket
where_is_my_bucket1
# rados -p .users.uid getomapval yankun.buckets  where_is_my_bucket  binary_where_is_my_bucket                  
Writing to binary_where_is_my_bucket

# ceph-dencoder type RGWBucketEnt import binary_where_is_my_bucket decode dump_json
{
    "bucket": {
        "name": "where_is_my_bucket",
        "pool": ".rgw.buckets",
        "data_extra_pool": ".rgw.buckets.extra",
        "index_pool": ".rgw.buckets.index",
        "marker": "default.5762326.25",
        "bucket_id": "default.5762326.25"
    },
    "size": 4204504056,
    "size_rounded": 4204507136,
    "mtime": 1505182607,
    "count": 3
}
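
The decoded RGWBucketEnt can be cross-checked against the earlier `radosgw-admin bucket stats` output. The sketch below is an observation about these particular numbers, not a statement about Ceph's source: it assumes `size_kb` is the byte count rounded up to whole kilobytes, and it converts the epoch `mtime` to UTC. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def bytes_to_kb_ceil(num_bytes: int) -> int:
    """Convert bytes to kilobytes, rounding up (assumed rounding rule)."""
    return (num_bytes + 1023) // 1024


def epoch_to_utc(ts: int) -> str:
    """Format a Unix timestamp as a UTC date-time string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(bytes_to_kb_ceil(4204504056))  # "size" from RGWBucketEnt
    print(bytes_to_kb_ceil(4204507136))  # "size_rounded" from RGWBucketEnt
    print(epoch_to_utc(1505182607))      # "mtime" from RGWBucketEnt
```

With the values above, the two sizes come out as 4105961 and 4105964, matching `size_kb` and `size_kb_actual` in the stats output. The timestamp is 2017-09-12 02:16:47 UTC, which is consistent with the 10:16:47 shown by `bucket stats` if that tool printed local time at UTC+8 (an assumption).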

The bucket's object in RADOS
For each bucket, one object is created in RADOS in the .rgw.buckets.index pool. It is named .dir.{bucket_id}.

# rados -p .rgw.buckets.index ls > .rgw.buckets.index                                                                                                                   
# grep default.5762326.25 .rgw.buckets.index 
.dir.default.5762326.25

Bucket metadata
A bucket's metadata is stored as a separate RADOS object in the .rgw pool, named .bucket.meta.{bucket_name}:{marker}.

# rados -p .rgw ls
where_is_my_bucket1
.bucket.meta.where_is_my_bucket1:default.5762326.26
where_is_my_bucket
.bucket.meta.where_is_my_bucket:default.5762326.25
# rados -p .rgw get .bucket.meta.where_is_my_bucket:default.5762326.25 binary.bucket.meta.where_is_my_bucket:default.5762326.25
# ceph-dencoder type RGWBucketInfo  import binary.bucket.meta.where_is_my_bucket:default.5762326.25  decode dump_json
{
    "bucket": {
        "name": "where_is_my_bucket",
        "pool": ".rgw.buckets",
        "data_extra_pool": ".rgw.buckets.extra",
        "index_pool": ".rgw.buckets.index",
        "marker": "default.5762326.25",
        "bucket_id": "default.5762326.25"
    },
    "creation_time": 1505182607,
    "owner": "yankun",
    "flags": 0,
    "region": "default",
    "placement_rule": "default-placement",
    "has_instance_obj": "true",
    "quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    },
    "num_shards": 0,
    "bi_shard_hash_type": 0
}
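
The bucket-related RADOS objects seen so far follow fixed naming patterns. The following minimal sketch derives them from a user ID and the bucket's name, ID, and marker. The function name and dictionary keys are hypothetical, and the pool names are the ones that appear in this cluster's output; other deployments may use different pools.

```python
def bucket_rados_objects(uid: str, bucket_name: str,
                         bucket_id: str, marker: str) -> dict:
    """Map each bucket-related role to its (pool, RADOS object name)."""
    return {
        # omap of this object lists the user's buckets
        "user_bucket_list": (".users.uid", f"{uid}.buckets"),
        # omap of this object lists the keys stored in the bucket
        "index": (".rgw.buckets.index", f".dir.{bucket_id}"),
        # bucket metadata (RGWBucketInfo); its xattrs carry the ACL
        "metadata": (".rgw", f".bucket.meta.{bucket_name}:{marker}"),
    }


if __name__ == "__main__":
    objs = bucket_rados_objects("yankun", "where_is_my_bucket",
                                "default.5762326.25", "default.5762326.25")
    for role, (pool, name) in objs.items():
        print(f"{role}: rados -p {pool} stat {name}")
```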

The bucket's ACL is stored in the xattrs of the .bucket.meta.{bucket_name}:{marker} object.

#  rados -p .rgw getxattr .bucket.meta.where_is_my_bucket:default.5762326.25  user.rgw.acl > binary.bucket.acl
# ceph-dencoder type RGWAccessControlPolicy  import binary.bucket.acl  decode dump_json
{
    "acl": {
        "acl_user_map": [
            {
                "user": "yankun",
                "acl": 15
            }
        ],
        "acl_group_map": [],
        "grant_map": [
            {
                "id": "yankun",
                "grant": {
                    "type": {
                        "type": 0
                    },
                    "id": "yankun",
                    "email": "",
                    "permission": {
                        "flags": 15
                    },
                    "name": "yankun",
                    "group": 0
                }
            }
        ]
    },
    "owner": {
        "id": "yankun",
        "display_name": "yankun"
    }
}
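
The dump above shows `acl` and `flags` values of 15. A minimal sketch of how such a bitmask can be decoded, assuming the RGW_PERM_* bit values from Ceph's rgw_acl.h (READ=0x01, WRITE=0x02, READ_ACP=0x04, WRITE_ACP=0x08), which are not shown in this article. The function name is hypothetical.

```python
# Permission bits, assumed to match Ceph's RGW_PERM_* constants.
RGW_PERMS = [
    (0x01, "READ"),
    (0x02, "WRITE"),
    (0x04, "READ_ACP"),
    (0x08, "WRITE_ACP"),
]


def decode_perm(flags: int) -> list:
    """Return the names of all permission bits set in flags."""
    return [name for bit, name in RGW_PERMS if flags & bit]


if __name__ == "__main__":
    print(decode_perm(15))
```

Under that assumption, 15 has all four bits set, which corresponds to full control for the owner yankun.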

Objects in RGW

An object can only be stored inside a bucket. Here a large file, where_is_my_object.txt, is created for upload to the bucket.
Create the large file

# dd if=/dev/zero of=./where_is_my_object.txt bs=2M count=1000
# du where_is_my_object.txt -h
2.0G    where_is_my_object.txt

Upload the large file to the bucket

# s3cmd put where_is_my_object.txt s3://where_is_my_bucket
upload: 'where_is_my_object.txt' -> 's3://where_is_my_bucket/where_is_my_object.txt'  [1 of 1]
 2097152000 of 2097152000   100% in  123s    16.24 MB/s  done

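Mapping between objects and buckets
The file was uploaded to the bucket where_is_my_bucket, whose ID is default.5762326.25. The relationship between the object and the bucket is kept in the omap of the .dir.{bucket_id} object.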

# rados -p .rgw.buckets.index listomapkeys .dir.default.5762326.25
where_is_my_object.txt

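Object naming format
An uploaded object is stored in RADOS as a single object or as several, depending on its size.
Object data is stored in the .rgw.buckets pool. If the uploaded data is larger than 512 KB, it is split into several RADOS objects: one head object (512 KB) and one or more tail objects (4 MB each by default). The head object is named {bucket_id}_{object_name}. For example, where_is_my_object.txt in the bucket where_is_my_bucket is stored in .rgw.buckets as default.5762326.25_where_is_my_object.txt.
Tail objects are named {bucket_id}__shadow_{prefix}{n}, where {prefix} is the prefix recorded in the head object's manifest and {n} is a sequence number starting from 1.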

# du default.5762326.25_where_is_my_object.txt 
512     default.5762326.25_where_is_my_object.txt
# du default.5762326.25__shadow_.h_oQhOgqDTmDZx2FUSm8zMTOlbhDQsq_99
4096    default.5762326.25__shadow_.h_oQhOgqDTmDZx2FUSm8zMTOlbhDQsq_99
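
The naming pattern visible in the two listings above can be sketched as two small helpers. The function names are hypothetical, and the sketch covers only simple keys like the ones used in this article. The `prefix` argument is the `prefix` value stored in the head object's `user.rgw.manifest` xattr, which already includes its leading dot and trailing underscore.

```python
def head_object_name(bucket_id: str, key: str) -> str:
    """RADOS name of the head object for a key in a bucket."""
    return f"{bucket_id}_{key}"


def tail_object_name(bucket_id: str, prefix: str, n: int) -> str:
    """RADOS name of the n-th tail (shadow) object, n counted from 1."""
    return f"{bucket_id}__shadow_{prefix}{n}"


if __name__ == "__main__":
    bucket_id = "default.5762326.25"
    print(head_object_name(bucket_id, "where_is_my_object.txt"))
    print(tail_object_name(bucket_id, ".h_oQhOgqDTmDZx2FUSm8zMTOlbhDQsq_", 99))
```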

Object metadata
An object's metadata is stored in the xattrs of its head object.

# rados -p .rgw.buckets listxattr default.5762326.25_where_is_my_object.txt
user.rgw.acl
user.rgw.content_type
user.rgw.etag
user.rgw.idtag
user.rgw.manifest
user.rgw.x-amz-date
user.rgw.x-amz-meta-s3cmd-attrs
user.rgw.x-amz-storage-class

The object's user.rgw.manifest attribute

# rados -p .rgw.buckets getxattr default.5762326.25_where_is_my_object.txt user.rgw.manifest > ./binary.default.5762326.25_where_is_my_object.txt.user.rgw.manifest
# ceph-dencoder type  RGWObjManifest import binary.default.5762326.25_where_is_my_object.txt.user.rgw.manifest  decode dump_json
{
    "objs": [],
    "obj_size": 2097152000,
    "explicit_objs": "false",
    "head_obj": {
        "bucket": {
            "name": "where_is_my_bucket",
            "pool": ".rgw.buckets",
            "data_extra_pool": ".rgw.buckets.extra",
            "index_pool": ".rgw.buckets.index",
            "marker": "default.5762326.25",
            "bucket_id": "default.5762326.25"
        },
        "key": "",
        "ns": "",
        "object": "where_is_my_object.txt",
        "instance": ""
    },
    "head_size": 524288,
    "max_head_size": 524288,
    "prefix": ".h_oQhOgqDTmDZx2FUSm8zMTOlbhDQsq_",
    "tail_bucket": {
        "name": "where_is_my_bucket",
        "pool": ".rgw.buckets",
        "data_extra_pool": ".rgw.buckets.extra",
        "index_pool": ".rgw.buckets.index",
        "marker": "default.5762326.25",
        "bucket_id": "default.5762326.25"
    },
    "rules": [
        {
            "key": 0,
            "val": {
                "start_part_num": 0,
                "start_ofs": 524288,
                "part_size": 0,
                "stripe_max_size": 4194304,
                "override_prefix": ""
            }
        }
    ]
}
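
The manifest fields are enough to work out how the object is striped. The sketch below is an interpretation of the fields rather than RGW's implementation: it assumes a single rule with `part_size` 0 (a non-multipart upload, as in the dump above), a head object holding the first `head_size` bytes, and tail stripes of at most `stripe_max_size` bytes starting at `start_ofs`. The function name is hypothetical.

```python
from typing import List, Tuple


def stripe_layout(manifest: dict) -> List[Tuple[int, int, int]]:
    """Return (index, offset, length) per RADOS object; index 0 is the head."""
    obj_size = manifest["obj_size"]
    head_size = min(manifest["head_size"], obj_size)
    rule = manifest["rules"][0]["val"]  # assumption: exactly one rule
    stripe = rule["stripe_max_size"]

    layout = [(0, 0, head_size)]
    ofs = rule["start_ofs"]
    n = 1  # tail objects are numbered from 1
    while ofs < obj_size:
        length = min(stripe, obj_size - ofs)
        layout.append((n, ofs, length))
        ofs += length
        n += 1
    return layout


if __name__ == "__main__":
    manifest = {
        "obj_size": 2097152000,
        "head_size": 524288,
        "rules": [{"key": 0, "val": {"start_ofs": 524288,
                                     "stripe_max_size": 4194304}}],
    }
    layout = stripe_layout(manifest)
    print(len(layout) - 1, "tail objects; last:", layout[-1])
```

For the manifest above, this gives one head entry plus 500 tail stripes, the last of which is 3670016 bytes long.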

Object ACL:

# rados -p .rgw.buckets getxattr default.5762326.25_where_is_my_object.txt  user.rgw.acl > binary.object.acl
# ceph-dencoder type RGWAccessControlPolicy  import binary.object.acl  decode dump_json
{
    "acl": {
        "acl_user_map": [
            {
                "user": "yankun",
                "acl": 15
            }
        ],
        "acl_group_map": [],
        "grant_map": [
            {
                "id": "yankun",
                "grant": {
                    "type": {
                        "type": 0
                    },
                    "id": "yankun",
                    "email": "",
                    "permission": {
                        "flags": 15
                    },
                    "name": "yankun",
                    "group": 0
                }
            }
        ]
    },
    "owner": {
        "id": "yankun",
        "display_name": "yankun"
    }
}

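Restoring data manually

Based on the object model described above, a complete object can be retrieved without going through the RADOS gateway.
Construct an object: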

location_object
# du -h location_object
9.8M    location_object

MD5 of the local object

# md5sum location_object 
24796d54d73d694168170135091f7eba  location_object

Upload the object to where_is_my_bucket

# s3cmd put location_object s3://where_is_my_bucket
upload: 'location_object' -> 's3://where_is_my_bucket/location_object'  [1 of 1]
 10200056 of 10200056   100% in    0s    77.72 MB/s
 10200056 of 10200056   100% in    4s     2.18 MB/s  done

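Object splitting
By design, this object is stored in RADOS as 4 objects: one head object and 3 tail objects. The head holds the first 524288 of the 10200056 bytes, and the remaining 9675768 bytes need three stripes of up to 4194304 bytes each.
Head object: default.5762326.25_location_object
Tail objects: default.5762326.25__shadow_{prefix}{1,2,3}, where {prefix} is the prefix field in the head object's manifest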

Head object

# rados -p .rgw.buckets ls | grep location
default.5762326.25_location_object

The object's prefix

# rados -p .rgw.buckets getxattr default.5762326.25_location_object user.rgw.manifest > ./binary.default.5762326.25_location_object.user.rgw.manifest
# ceph-dencoder type  RGWObjManifest import binary.default.5762326.25_location_object.user.rgw.manifest  decode dump_json
{
    "objs": [],
    "obj_size": 10200056,
    "explicit_objs": "false",
    "head_obj": {
        "bucket": {
            "name": "where_is_my_bucket",
            "pool": ".rgw.buckets",
            "data_extra_pool": ".rgw.buckets.extra",
            "index_pool": ".rgw.buckets.index",
            "marker": "default.5762326.25",
            "bucket_id": "default.5762326.25"
        },
        "key": "",
        "ns": "",
        "object": "location_object",
        "instance": ""
    },
    "head_size": 524288,
    "max_head_size": 524288,
    "prefix": ".Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_",
    "tail_bucket": {
        "name": "where_is_my_bucket",
        "pool": ".rgw.buckets",
        "data_extra_pool": ".rgw.buckets.extra",
        "index_pool": ".rgw.buckets.index",
        "marker": "default.5762326.25",
        "bucket_id": "default.5762326.25"
    },
    "rules": [
        {
            "key": 0,
            "val": {
                "start_part_num": 0,
                "start_ofs": 524288,
                "part_size": 0,
                "stripe_max_size": 4194304,
                "override_prefix": ""
            }
        }
    ]
}

The tail objects are therefore: default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_{1,2,3}
Fetching the split objects
Use rados to fetch each of the pieces:

# rados -p .rgw.buckets get  default.5762326.25_location_object ./location_head
# rados -p .rgw.buckets get  default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_1 ./default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_1
# rados -p .rgw.buckets get  default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_2 ./default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_2
# rados -p .rgw.buckets get  default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_3 ./default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_3

Concatenate the pieces

# cat location_head >  new_location_object
# cat default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_1  >>  new_location_object
# cat default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_2  >>  new_location_object
# cat default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_3  >>  new_location_object

MD5 of new_location_object

# md5sum new_location_object 
24796d54d73d694168170135091f7eba  new_location_object

Note: the object reassembled from the fetched pieces has the same MD5 as the original, so its content is unchanged.
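
The concatenate-and-verify steps can also be scripted. The following minimal sketch streams the part files in order into one output file and computes the MD5 along the way. The names `reassemble` and `self_check` are hypothetical, and `self_check` uses small synthetic data with scaled-down head and stripe sizes rather than a real cluster.

```python
import hashlib
import os
import tempfile


def reassemble(part_paths, out_path, chunk_size=1 << 20):
    """Concatenate part files in order into out_path; return the MD5 hex digest."""
    md5 = hashlib.md5()
    with open(out_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                while True:
                    chunk = part.read(chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
                    md5.update(chunk)
    return md5.hexdigest()


def self_check() -> bool:
    """Split synthetic data into a head and tails, reassemble, compare digests."""
    data = os.urandom(3000)
    head_size, stripe = 512, 1024  # scaled-down stand-ins for 512 KB / 4 MB
    pieces = [data[:head_size]] + [
        data[i:i + stripe] for i in range(head_size, len(data), stripe)
    ]
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, piece in enumerate(pieces):
            path = os.path.join(tmp, f"part_{i}")
            with open(path, "wb") as f:
                f.write(piece)
            paths.append(path)
        digest = reassemble(paths, os.path.join(tmp, "whole"))
    return digest == hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    print("round trip ok:", self_check())
```

Passing `location_head` followed by the three fetched shadow files, in order, to `reassemble` should produce a file whose digest can be compared against the `md5sum` of the original `location_object`.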