Backing Up Cold-Storage Data with Baidu Object Storage (BOS)
阿新 • Published: 2019-01-08
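I recently needed off-site disaster-recovery backups for cold-storage data. To cut local storage, maintenance, and staffing costs, I chose the comparatively cheap Baidu Object Storage (BOS) as the backup target; see the BOS product introduction for an overview of the service. To upload files quickly and in batches, I built a distributed multi-task upload solution on top of the BOS Python SDK. This post covers how to use the BOS Python SDK. (That is free advertising for BOS, so I expect a thank-you!)
1. Create a virtual environment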
# yum install python-virtualenv
# mkvirtualenv bos
# workon bos
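virtualenv builds isolated Python development environments so that different Python projects do not run into package conflicts. The mkvirtualenv and workon commands above, like the others listed below, come from virtualenvwrapper, which has to be installed and loaded in the shell alongside virtualenv. A brief summary of the commands: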
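List virtual environments:
workon or lsvirtualenv
Create a virtual environment:
mkvirtualenv [environment name]
Activate or switch to a virtual environment:
workon [environment name]
Delete a virtual environment:
rmvirtualenv [environment name]
Leave the virtual environment:
deactivate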
2. Install the BOS SDK
2.1 Download the BOS SDK package
wget http://sdk.bce.baidu.com/console-sdk/bce-python-sdk-0.8.8.zip
2.2 Unzip the package and run the install script from the extracted directory
unzip bce-python-sdk-0.8.8.zip
python setup.py install
3. Write the configuration file
bos_sample_conf.py:
import logging
import os
import sys
from baidubce.bce_client_configuration import BceClientConfiguration
from baidubce.auth.bce_credentials import BceCredentials

PROXY_HOST = 'localhost:8080'

# BOS endpoint and the access key pair (replace with your own values)
bos_host = "bj.bcebos.com"
access_key_id = "a6748c1334a44c2d8af60fcdf098b30d"
secret_access_key = "3d7621d35b0c426ea2c0dfdbfca45151"

# Log SDK activity to sample.log at DEBUG level
logger = logging.getLogger('baidubce.services.bos.bosclient')
fh = logging.FileHandler("sample.log")
fh.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(logging.DEBUG)
logger.addHandler(fh)

# Client configuration shared by the upload scripts
config = BceClientConfiguration(credentials=BceCredentials(access_key_id, secret_access_key), endpoint=bos_host)
The configuration file sets the upload host, the access key ID, and the secret access key, and builds the client configuration object config.
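Hard-coding the key pair in a file is convenient for a demo, but for a real backup job it may be safer to read it from the environment. The following is only a minimal sketch: the helper load_bos_settings and the variable names BOS_HOST, BOS_ACCESS_KEY_ID, and BOS_SECRET_ACCESS_KEY are assumptions made for illustration and are not defined by the SDK.

```python
import os


def load_bos_settings(environ=None):
    """Return (host, access_key_id, secret_access_key) read from environment variables.

    Hypothetical helper: the variable names are illustrative, not part of the SDK.
    """
    env = os.environ if environ is None else environ
    # Fall back to the same endpoint used in bos_sample_conf.py
    host = env.get("BOS_HOST", "bj.bcebos.com")
    try:
        access_key_id = env["BOS_ACCESS_KEY_ID"]
        secret_access_key = env["BOS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        raise RuntimeError("missing environment variable: %s" % missing)
    return host, access_key_id, secret_access_key
```

In bos_sample_conf.py the three returned values would then take the place of the literal bos_host, access_key_id, and secret_access_key.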
4. Uploading files
import os
import sys
import hashlib
import base64
from baidubce import exception
from baidubce.services import bos
from baidubce.services.bos import canned_acl
from baidubce.services.bos.bos_client import BosClient
import bos_sample_conf

# init a bos client
bos_client = BosClient(bos_sample_conf.config)

# init a bucket
bucket_name = 'wahaha'
if not bos_client.does_bucket_exist(bucket_name):
    bos_client.create_bucket(bucket_name)
    print("init bucket:%s success" % bucket_name)

# upload object from string
object_key = 'Happy Spring Festival'
content = 'this is the test string'
bos_client.put_object_from_string(bucket_name, object_key, content)

# put object from file
file_name = "/root/baidu_object_storage/test/file_to_be_upload"
response = bos_client.put_object_from_file(bucket_name, object_key + ' plus', file_name)
print("response.metadata.etag = " + response.metadata.etag)

# get object meta data
response = bos_client.get_object_meta_data(bucket_name, object_key + ' plus')
print("response object meta data:")
print(response)

# list objects in bucket
response = bos_client.list_objects(bucket_name)
for obj in response.contents:
    print('object.key = ' + obj.key)

# get bucket list
response = bos_client.list_buckets()
for bucket in response.buckets:
    print("bucket.name = " + bucket.name)

# get object
print(bos_client.get_object_as_string(bucket_name, object_key))

# get unfinished multipart upload task
print("get unfinished multipart upload task:")
for item in bos_client.list_all_multipart_uploads(bucket_name):
    print('item.upload_id = ' + item.upload_id)

# abort unfinished multipart upload task
print("abort unfinished multipart upload task")
for item in bos_client.list_all_multipart_uploads(bucket_name):
    bos_client.abort_multipart_upload(bucket_name, item.key.encode("utf-8"), upload_id=item.upload_id)

# list whatever multipart uploads remain after the abort
response = bos_client.list_multipart_uploads(bucket_name)
for item in response.uploads:
    print(item.upload_id)
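Since the goal is batch backup, the single-file calls above would typically be driven by a directory walk. The sketch below shows one way to do that. The helper iter_upload_tasks, the key layout (the path relative to the root, joined with "/"), and the example path /data/cold are assumptions for illustration; none of them come from the SDK.

```python
import os


def iter_upload_tasks(root_dir, key_prefix=""):
    """Yield (object_key, file_path) for every file under root_dir.

    Hypothetical helper: the object key is the path relative to root_dir,
    joined with '/' and prefixed with key_prefix.
    """
    for dir_path, dir_names, file_names in os.walk(root_dir):
        dir_names.sort()  # make the traversal order deterministic
        for file_name in sorted(file_names):
            file_path = os.path.join(dir_path, file_name)
            relative = os.path.relpath(file_path, root_dir)
            object_key = key_prefix + relative.replace(os.path.sep, "/")
            yield object_key, file_path


# Possible use with the client from the script above
# (needs the SDK and valid credentials, so it is left commented out):
# for object_key, file_path in iter_upload_tasks("/data/cold", "backup/"):
#     bos_client.put_object_from_file(bucket_name, object_key, file_path)
```

Keeping the walk separate from the upload call makes it straightforward to hand the resulting tasks to several workers.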
5. Uploading large files
import os
import sys
import hashlib
sys.path.append("../bos")
import bos_sample_conf
from baidubce import exception
from baidubce.services import bos
from baidubce.services.bos import canned_acl
from baidubce.services.bos.bos_client import BosClient

default_path = os.path.dirname(os.path.realpath(__file__))

# init a bos client
bos_client = BosClient(bos_sample_conf.config)

# init a bucket
bucket_name = 'wahaha'
if not bos_client.does_bucket_exist(bucket_name):
    bos_client.create_bucket(bucket_name)

# init object key
object_key = 'this is object_key of big file'

# check the file before initiating, so a missing file leaves no unfinished upload behind
file_name = default_path + os.path.sep + 'big_file'
if os.path.isfile(file_name):
    print("file_name = %s" % file_name)
else:
    sys.exit(-1)

# upload multipart object
upload_id = bos_client.initiate_multipart_upload(bucket_name, object_key).upload_id
print('upload_id = ' + upload_id)

# set the beginning of multipart
left_size = os.path.getsize(file_name)
# set the offset
offset = 0
part_number = 1
part_list = []
e_tag_str = ""
while left_size > 0:
    # set each part 50MB
    print("size left: %dMB" % (left_size // 1024 // 1024))
    part_size = 50 * 1024 * 1024
    if left_size < part_size:
        part_size = left_size
    response = bos_client.upload_part_from_file(
        bucket_name, object_key, upload_id, part_number, part_size, file_name, offset)
    left_size -= part_size
    offset += part_size
    part_list.append({
        "partNumber": part_number,
        "eTag": response.metadata.etag
    })
    e_tag_str += response.metadata.etag
    # report the part that was just uploaded, then move on to the next number
    print("%d %s" % (part_number, response.metadata.etag))
    part_number += 1
print("\n")

response = bos_client.complete_multipart_upload(bucket_name, object_key, upload_id, part_list)
print(response)

# compare the returned etag with "-" + md5 of the concatenated part etags
m = hashlib.md5()
m.update(e_tag_str.encode("utf-8"))
e_tag_str_to_md5 = "-" + m.hexdigest()
if e_tag_str_to_md5 == response.etag:
    print("e_tag match great!!!")
else:
    print("etag does not match, e_tag_str_to_md5 = %s" % e_tag_str_to_md5)
print("\n")
print(response.bucket)
print(response.key)
print(response.etag)
print(response.location)
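The offset and size bookkeeping in the upload loop can be checked in isolation, without touching BOS. The sketch below uses a hypothetical helper, plan_parts, that is not part of the SDK. It reproduces the same arithmetic: fixed-size parts numbered from 1, with a shorter final part.

```python
def plan_parts(file_size, part_size=50 * 1024 * 1024):
    """Return a list of (part_number, offset, size) tuples covering file_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    parts = []
    offset = 0
    part_number = 1  # part numbers start at 1, as in the upload loop
    while offset < file_size:
        size = min(part_size, file_size - offset)  # the last part may be shorter
        parts.append((part_number, offset, size))
        offset += size
        part_number += 1
    return parts


MB = 1024 * 1024
for part_number, offset, size in plan_parts(120 * MB):
    print("part %d: offset=%d size=%d" % (part_number, offset, size))
```

For a 120 MB file this yields two 50 MB parts followed by one 20 MB part. Each tuple corresponds to the part_number, offset, and part_size arguments of one upload_part_from_file call.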
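Notes:
1. For a small-file upload, the etag in the response is the MD5 of the file.
2. For a large-file upload, the etag returned when the multipart upload completes is derived from the part etags: concatenate the etag returned for each part, take the MD5 of that string, and prefix the result with "-" (rather abstract).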
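Both notes can be turned into local checks. The sketch below follows the rules exactly as stated in the notes. They are the author's observations rather than documented behavior, so treat the functions as illustrative; the names file_md5 and expected_multipart_etag are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """MD5 hex digest of a local file, read in chunks to bound memory use (note 1)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_multipart_etag(part_etags):
    """Apply note 2: '-' followed by the MD5 of the concatenated part etags."""
    joined = "".join(part_etags)
    return "-" + hashlib.md5(joined.encode("utf-8")).hexdigest()
```

After a backup run, file_md5 can be compared with the etag of a small-file upload, and expected_multipart_etag with the etag returned by complete_multipart_upload, given the part etags in upload order.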
Author: 憶之獨秀