
Multi-Tenant Storage with Amazon DynamoDB

Editor’s note: For the latest information, visit the .

By Tod Golding, Partner Solutions Architect at AWS

If you’re designing a true multi-tenant software as a service (SaaS) solution, you’re likely to devote a significant amount of time to selecting a strategy for effectively partitioning your system’s tenant data. On Amazon Web Services (AWS), your partitioning options mirror much of what you see in the wild. However, if you’re looking at using Amazon DynamoDB, you’ll find that the global, managed nature of this NoSQL database presents you with some new twists that will likely influence your approach.

Before we dig into the specifics of the DynamoDB options, let’s look at the traditional models that are generally applied to achieve tenant data partitioning. The list of partitioning solutions typically includes the following variations:

  • Separate database – each tenant has a fully isolated database with its own representation of the data
  • Shared database, separate schema – tenants all reside in the same database, but each tenant can have its own representation of the data
  • Shared everything – tenants all reside in the same database and all leverage a universal representation of the data

These options all have their strengths and weaknesses. If, for example, you’d like to support the ability for tenants to have their own data customizations, you might want to lean toward a model that supports separate schemas. If that’s not the case, you’ll likely prefer a more unified schema. Security and isolation requirements are also key factors that could shape your strategy. Ultimately, the specific needs of your solution will steer you toward one or more of these approaches. In some cases, where a system is decomposed into more granular services, you may see situations where multiple strategies are applied. The requirements of each service may dictate which flavor of partitioning best suits that service.

With this as a backdrop, let’s look at how these partitioning models map to the different partitioning approaches that are available with DynamoDB.

Linked Account Partitioning (Separate Database)

This model is by far the most extreme of the available options. Its focus is on providing each tenant with its own table namespace and footprint with DynamoDB. While this seems like a fairly basic goal, it is not easily achieved. DynamoDB does not have the notion of an instance or some distinct, named construct that can be used to partition a collection of tables. In fact, all the tables that are created by DynamoDB are global to a given region.

Given these scoping characteristics, the best option for achieving this level of isolation is to introduce separate linked AWS accounts for each tenant. To leverage this approach, you need to start by enabling the AWS Consolidated Billing feature. This option allows you to have a parent payer account that is then linked to any number of child accounts.

Once the linked account mechanism is established, you can then provision a separate linked account for each new tenant (shown in the following diagram). These tenants would then have distinct AWS account IDs and, in turn, have a scoped view of DynamoDB tables that are owned by that account.
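
If you were implementing this today, per-tenant account provisioning would typically flow through AWS Organizations, the successor to standalone consolidated billing. The following is a minimal sketch of what such a provisioning step might look like with boto3; the email addressing convention and tenant ID format are illustrative assumptions, not part of the original model.

```python
# Hypothetical sketch: creating a member (linked) account per tenant with
# AWS Organizations via boto3. Email scheme and tenant IDs are assumptions.
import time

import boto3

org = boto3.client("organizations")

def provision_tenant_account(tenant_id: str) -> str:
    """Kick off creation of a member account for a tenant; returns the request ID."""
    response = org.create_account(
        Email=f"tenants+{tenant_id}@example.com",  # assumed addressing scheme
        AccountName=f"tenant-{tenant_id}",
        IamUserAccessToBilling="DENY",
    )
    return response["CreateAccountStatus"]["Id"]

def wait_for_account(request_id: str) -> str:
    """Poll the creation status and return the new AWS account ID when ready."""
    while True:
        status = org.describe_create_account_status(
            CreateAccountRequestId=request_id
        )["CreateAccountStatus"]
        if status["State"] == "SUCCEEDED":
            return status["AccountId"]
        if status["State"] == "FAILED":
            raise RuntimeError(status.get("FailureReason", "unknown failure"))
        time.sleep(10)
```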

While this model has its advantages, it is often cumbersome to manage. It introduces a layer of complexity and automation to the tenant provisioning lifecycle. It also seems impractical and unwieldy for environments where there might be a large collection of tenants. Caveats aside, there are some nice benefits that are natural byproducts of this model. Having this hard line between accounts makes it a bit simpler to manage the scope and schema of each tenant’s data. It also provides a rather natural model for evaluating and metering a tenant’s usage of AWS resources.

Tenant Table Name Partitioning (Shared Database, Separate Schema)

The linked account model represents a more concrete separation of tenant data. A less invasive approach would be to introduce a table naming schema that adds a unique tenant context to each DynamoDB table. The following diagram represents a simplified version of this approach, prepending a tenant ID (T1, T2, and T3) to each table name to identify the tenant’s ownership of the table.

This model embraces all the freedoms that come with an isolated tenant scheme, allowing each tenant to have its own unique data representation. With this level of granularity, you’ll also find that this aligns your tenants with other AWS constructs. These include:

  • The ability to apply AWS Identity and Access Management (IAM) roles at the table level allows you to constrain table access to a given tenant role (see the policy sketch after this list).
  • Amazon CloudWatch metrics can be captured at the table level, simplifying the aggregation of tenant metrics for storage activity.
  • Provisioned throughput (read and write capacity) is applied at the table level, allowing you to create distinct scaling policies for each tenant.
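
As a rough illustration of the first point, a tenant role could be limited to the tables that carry its name prefix. The sketch below builds such a policy with boto3; the action list, prefix convention, and placeholder region/account values are assumptions for illustration.

```python
# Hypothetical sketch: an IAM policy restricting a tenant role to tables whose
# names start with that tenant's prefix (e.g. "T1_"). Values are placeholders.
import json

import boto3

iam = boto3.client("iam")

def create_tenant_table_policy(tenant_id: str, account_id: str, region: str = "us-east-1"):
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:Query",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                ],
                # Only tables whose names begin with the tenant prefix
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{tenant_id}_*",
            }
        ],
    }
    return iam.create_policy(
        PolicyName=f"{tenant_id}-dynamodb-access",
        PolicyDocument=json.dumps(policy_document),
    )
```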

Provisioning can also be somewhat simpler under this model, since each tenant’s tables can be created and managed independently.
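
A provisioning routine for this model might look something like the following sketch, which simply prepends the tenant ID to the logical table name before creating the table. The key schema and throughput settings are illustrative assumptions.

```python
# Hypothetical sketch: provisioning a tenant-scoped table by prepending the
# tenant ID to the logical table name. Key schema and capacity are assumed.
import boto3

dynamodb = boto3.client("dynamodb")

def create_tenant_table(tenant_id: str, logical_name: str):
    table_name = f"{tenant_id}_{logical_name}"  # e.g. "T1_Customer"
    return dynamodb.create_table(
        TableName=table_name,
        KeySchema=[{"AttributeName": "CustomerId", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "CustomerId", "AttributeType": "S"}],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )
```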

The downside of this model tends to be more on the operational and management side. Clearly, with this approach, your operational views of a tenant will require some awareness of the tenant table naming scheme in order to filter and present information in a tenant-centric context. The approach also adds a layer of indirection to any code you might have that is metering tenant consumption of DynamoDB resources.

Tenant Index Partitioning (Shared Everything)

Index-based partitioning is perhaps the most agile and common technique that is applied by SaaS developers. This approach places all the tenant data in the same table(s) and partitions it with a DynamoDB index. This is achieved by populating the hash key of an index with a tenant’s unique ID. This essentially means that the keys that would typically be your hash key (Customer ID, Account ID, etc.) are now represented as range keys.  The following example provides a simplified view of an index that introduces a tenant ID as a hash key. Here, the customer ID is now represented as a range key.
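
To make this concrete, here is a minimal sketch of tenant-scoped reads against a shared table, assuming the tenant ID is the table’s partition (hash) key and the customer ID is its sort (range) key. The table and attribute names are assumptions for illustration.

```python
# Hypothetical sketch: a shared table keyed by TenantId (hash) and
# CustomerId (range). Names are illustrative assumptions.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Customer")

def get_customers_for_tenant(tenant_id: str):
    """Return only the items that belong to the given tenant."""
    response = table.query(
        KeyConditionExpression=Key("TenantId").eq(tenant_id)
    )
    return response["Items"]

def get_customer(tenant_id: str, customer_id: str):
    """Fetch a single customer item, scoped to the tenant."""
    return table.get_item(
        Key={"TenantId": tenant_id, "CustomerId": customer_id}
    ).get("Item")
```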

This model, where the data for every tenant resides in a shared representation, simplifies many aspects of the multi-tenant model. It promotes a unified approach to managing and migrating the data for all tenants without requiring a table-by-table processing of the information. It also enables a simpler model for performing tenant-wide analytics of the data. This can be extremely helpful in assessing and profiling trends in the data.

Of course, there are also limitations with this model. Chief among these is the inability to have more granular, tenant-centric control over access, performance, and scaling. However, some may view this as an advantage since it allows you to have a more global set of policies that respond to the load of all tenants instead of absorbing the load of maintaining policies on a tenant-by-tenant basis. When you choose your partitioning approach, you’ll likely strike a balance between these tradeoffs.

Another consideration here is that this approach could be viewed as creating a single point of failure. Any problem with the shared table could affect the entire population of tenants.

Abstracting Client Access

Each technique outlined in this blog post requires some awareness of tenant context. Every attempt to access data for a tenant requires acquiring a unique tenant identifier and injecting that identifier into any requests to manage data in DynamoDB.

Of course, in most cases, end-users of the data should have no direct knowledge that their provider is a tenant of your service. Instead, the solution you build should introduce an abstraction layer that acquires and applies the tenant context to any DynamoDB interactions.

This data access layer will also enhance your ability to add security checks and business logic outside of your partitioning strategies, with minimal impact on end-users.
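
A minimal sketch of such a layer, assuming the shared-table model from the previous section, might look like this; the class, table, and attribute names are illustrative assumptions.

```python
# Hypothetical sketch of a thin data-access layer that resolves the tenant
# context once and injects it into every DynamoDB call, so application code
# never handles tenant IDs directly.
import boto3
from boto3.dynamodb.conditions import Key

class TenantDataAccess:
    def __init__(self, tenant_id: str, table_name: str = "Customer"):
        self._tenant_id = tenant_id
        self._table = boto3.resource("dynamodb").Table(table_name)

    def put(self, item: dict):
        # Stamp the tenant ID on every item before it is written.
        return self._table.put_item(Item={**item, "TenantId": self._tenant_id})

    def query_all(self):
        # Every read is automatically scoped to the current tenant.
        return self._table.query(
            KeyConditionExpression=Key("TenantId").eq(self._tenant_id)
        )["Items"]
```

Application code would only ever call methods on this class (for example, `TenantDataAccess(tenant_id).query_all()`), so changing the partitioning model later is largely contained to this one layer.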

Supporting Multiple Environments

As you think about partitioning, you may also need to consider how the presence of multiple environments (development, QA, production, etc.) might influence your approach. Each partitioning model we’ve discussed here would require an additional mechanism to associate tables with a given environment.

The strategy for addressing this problem varies based on the partitioning scheme you’ve adopted. The linked account model is the least affected, since the provisioning process will likely just create separate accounts for each environment. However, with table name and index-based partitioning, you’ll need to introduce an additional qualifier to your naming scheme that will identify the environment associated with each table.

The key takeaway is that you need to think about whether and how environments might also influence your entire build and deployment lifecycle. If you’re building for multiple environments, the context of those environments likely needs to be factored into your overall provisioning and naming scheme.

Microservice Considerations

With the shift toward microservice architectures, teams are decomposing their SaaS solutions into small, autonomous services. A key tenet of this architectural model is that each service must encapsulate, manage, and own its representation of data. This means that each service can leverage whichever partitioning approach best aligns with the requirements and performance characteristics of that service.

The other factor to consider is how microservices might influence the identity of your DynamoDB tables. With each service owning its own storage, the provisioning process needs assurance that the tables it’s creating for a given service are guaranteed to be unique. This typically translates into adding some notion of the service’s identity into the actual name of the table. A catalog manager service, for example, might have a table that is an amalgam of the tenant ID, the service name, and the logical table name. This may or may not be necessary, but it’s certainly another factor you’ll want to keep in mind as you think about the naming model you’ll use when tables are being provisioned.
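
One way to fold these qualifiers together—including the environment qualifier discussed in the previous section—is a single naming helper, sketched below under the assumption of a simple underscore-delimited convention covering environment, service, tenant, and logical table name.

```python
# Hypothetical sketch: one helper that qualifies a logical table name with the
# environment, the owning microservice, and the tenant, keeping table names
# unique across services and deployment stages. Delimiter and ordering are
# assumptions, not a prescribed convention.
def qualified_table_name(env: str, service: str, tenant_id: str, logical_name: str) -> str:
    """e.g. qualified_table_name('prod', 'catalog-manager', 'T1', 'Product')
    -> 'prod_catalog-manager_T1_Product'"""
    return "_".join([env, service, tenant_id, logical_name])
```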

Agility vs. Isolation

It’s important to note that there is no single preferred model for the solutions that are outlined in this blog post. Each model has its merits and applicability to different problem domains. That being said, it’s also important to consider agility when you’re building SaaS solutions. Agility is fundamental to the success of many SaaS organizations, and it’s essential that teams consider how each partitioning model might influence their ability to continually deploy and evolve both their applications and their business.

Each variation outlined here highlights some of the natural tension that exists in SaaS design. In picking a partitioning strategy, you must balance the simplicity and agility of a fully shared model with the security and variability offered by more isolated models.

The good news is that DynamoDB supports all the mechanisms you’ll need to implement each of the common partitioning models. As you dig deeper into DynamoDB, you’ll find that it actually aligns nicely with many of the core SaaS values. As a managed service, DynamoDB allows you to shift the burden of management, scale, and availability directly to AWS. The schemaless nature of DynamoDB also enables a level of flexibility and agility that is crucial to many SaaS organizations.

Kicking the Tires

The best way to really understand the merits of each of these partitioning models is to simply dig in and get your hands dirty. It’s important to examine the overall provisioning lifecycle of each partitioning approach and determine how and where it would fit into a broader build and deployment lifecycle. You’ll also want to look more carefully at how these partitioning models interact with AWS constructs. Each approach has nuances that can influence the experience you’ll get with the console, IAM roles, CloudWatch metrics, billing, and so on. Naturally, the fundamentals of how you’re isolating tenants and the requirements of your domain are also going to have a significant impact on the approach you choose.

Are you building SaaS on AWS? Check out the AWS SaaS Partner Program, an APN Program providing Technology Partners with support to build, launch, and grow SaaS solutions on AWS.
