Mar 24, 2021

DynamoDB Checklist


DynamoDB is the leading serverless database in the AWS suite of offerings. It provides an easy to configure, high-performance, NoSQL database with low operational overhead and almost endless scalability.

DynamoDB appeals to developers requiring a simple serverless database and those requiring the utmost in scalability. However, DynamoDB can be used effectively for almost any OLTP application regardless of scale.

This DynamoDB checklist is a collection of some of the more important lessons we’ve learned with our EmbedThis Ioto service, which has used single-table design patterns, NodeJS and the DynamoDB OneTable library in production over the past several years.

I hope you will consider these items in your DynamoDB projects and use them as a checklist to prompt your adoption of best practices.

Why Single Table?

First, a recap of what single-table designs are and why they are the preferred pattern for DynamoDB today.

In single-table designs, one database table serves the entire application and holds multiple different application entities. This design pattern offers greater performance by reducing the number of requests required to retrieve information and lowers operational overhead. It also greatly simplifies the changing and evolving of your DynamoDB designs by uncoupling the entity key fields and attributes from the physical table structure.

The recent rise of single-table designs is due to better understanding of how to model DynamoDB data in a single-table design and streamlined access libraries such as DynamoDB OneTable.


Preparation

[ ] Understand the DynamoDB data model if you are coming from an RDBMS world. Read the DynamoDB Book and DynamoDB Best Practices.
[ ] Learn about DynamoDB single-table design. Read: Data Modeling for Single Table Designs
[ ] Know DynamoDB capacity and limits (400KB item size, 10GB per hash key with LSIs, 4MB per transaction, 3000 RCUs / 1000 WCUs per partition, etc.) See DynamoDB Limits.
[ ] Understand how FilterExpressions actually work. You are charged for all data examined, not just for data returned. See Alex DeBrie’s take on Filter Expressions.
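
To make the filter expression point concrete, here is a rough sketch of how read cost follows the data examined rather than the data returned. The function name and the example item counts are hypothetical, and the estimate ignores pagination (each 1MB page is rounded up separately), so treat it as an approximation only.

```javascript
// Rough read-cost estimate for a Query: DynamoDB charges for the data it
// examines, before any FilterExpression is applied.
const RCU_BYTES = 4 * 1024; // one read unit covers up to 4 KB

function queryReadUnits(bytesExamined, { consistent = false } = {}) {
    // The sizes of all items examined are summed, then rounded up to 4 KB.
    const units = Math.ceil(bytesExamined / RCU_BYTES);
    // Eventually consistent reads cost half as much as strongly consistent ones.
    return consistent ? units : units / 2;
}

// Hypothetical example: 1,000 items of 2 KB match the key condition,
// but the filter keeps only 10 of them.
const examined = 1000 * 2 * 1024;
const returned = 10 * 2 * 1024;

console.log('charged for:', queryReadUnits(examined), 'read units');
console.log('returned data alone would cost:', queryReadUnits(returned), 'read units');
```

The gap between the two figures is the cost of filtering after the read instead of narrowing the key condition.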

Data Modeling

[ ] Before even thinking of writing code, ensure you have determined all your data entities and their relationships and have a documented Entity Relationship Diagram (ERD).
[ ] Enumerate and document all your known access patterns. Ensure you cover access for user interfaces, APIs, CRUD for all entities and don’t forget required maintenance operations. This should describe the entities and attributes queried and the required key fields as well as the attributes returned by the queries.
[ ] Uncouple logical keys from the physical key names. Overload the physical partition key by using key prefix labels coupled with other attribute values to create key values for entities. Your DynamoDB access library should provide support for this. (OneTable schema value).
[ ] Select your partition key values carefully to distribute load over all partitions and avoid hot keys.
[ ] Use the same partition key for related items that are required for a single access pattern. Differentiate items in the collection via the sort key.
[ ] An ideal primary key has a high cardinality partition key and a set of sort keys to support retrieving item collections of related data.
[ ] Select your sort key values to support multiple access patterns by using catenated sub-fields that can be queried using begins_with.
[ ] Use a local DynamoDB instance when creating and prototyping your DynamoDB design. When used with the OneTable CLI for migrations and test data, you can quickly iterate and test your design with prototype queries.
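
The key overloading and begins_with items above can be sketched as follows. The entity names, the "pk"/"sk" attribute names and the "#" separator are assumptions for illustration, and the parameters are in the plain-value form used by a document client rather than the low-level typed form.

```javascript
// Hypothetical key templates: logical keys are built from ordinary entity
// attributes and stored in the generic physical key attributes "pk" and "sk".
const keys = {
    Account: (a) => ({ pk: `account#${a.accountId}`, sk: 'account#' }),
    User: (u) => ({ pk: `account#${u.accountId}`, sk: `user#${u.email}` }),
    Invoice: (i) => ({ pk: `account#${i.accountId}`, sk: `invoice#${i.date}#${i.invoiceId}` }),
};

// Build Query parameters that fetch one slice of an account's item collection.
function queryCollection(tableName, accountId, skPrefix) {
    return {
        TableName: tableName,
        KeyConditionExpression: '#pk = :pk AND begins_with(#sk, :sk)',
        ExpressionAttributeNames: { '#pk': 'pk', '#sk': 'sk' },
        ExpressionAttributeValues: { ':pk': `account#${accountId}`, ':sk': skPrefix },
    };
}

// All invoices for account 42 in March 2021, using the date sub-field.
console.log(queryCollection('app', '42', 'invoice#2021-03'));
// All users for the same account, from the same partition.
console.log(queryCollection('app', '42', 'user#'));
```

Because every entity for an account shares one partition key, a single query with a different sort key prefix serves each access pattern.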

Indexes

[ ] Don’t store attribute values directly and uniquely in the partition or sort keys. Have separate attributes for all entity item values and copy/project into the keys. This will greatly ease migrations and evolving your database going forward. (The OneTable schema value provides a template or callback function to create the key value).
[ ] Only use UUIDs or generated partition keys if they are directly used by your access patterns. For example, use a UUID for an Account ID if you already have that ID via a prior query result such as authorizing a user that provides the account ID.
[ ] Use ULIDs or KSUIDs when you need a time-based sortable, unique sequential string.
[ ] Use sparse secondary indexes to efficiently subset your data in queries. You only pay for items that have values defined for the GSI.
[ ] Minimize projected attributes in GSIs to reduce additional storage costs. You can now have up to 20 GSIs on a table. With OneTable, use the map option to pack attributes into a single GSI data attribute. Use the follow option to transparently follow the GSI and fetch the complete item if required.
[ ] Don’t waste the sort key. In addition to implementing your access patterns, use it to sort the data in the most used sequence. Use a limit to fetch the first or last set of items.
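
As a minimal sketch of two items above (keeping real attributes separate from key values, and using a sparse index), the hypothetical Order entity below derives its keys from ordinary attributes and only sets the GSI key attributes for open orders. The attribute names "gs1pk" and "gs1sk" are assumptions, not a required convention.

```javascript
// Hypothetical Order entity: plain attributes hold the data, and the key
// attributes (pk, sk, gs1pk, gs1sk) are derived copies of them.
function toOrderItem(order) {
    const item = {
        pk: `customer#${order.customerId}`,
        sk: `order#${order.orderId}`,
        customerId: order.customerId,
        orderId: order.orderId,
        status: order.status,
        created: order.created,
    };
    // Sparse index: only open orders get the GSI key attributes, so only
    // those items are written to the index.
    if (order.status === 'open') {
        item.gs1pk = 'order#open';
        item.gs1sk = `${order.created}#${order.orderId}`;
    }
    return item;
}

console.log(toOrderItem({ customerId: 'c1', orderId: 'o1', status: 'open', created: '2021-03-24T10:00:00Z' }));
console.log(toOrderItem({ customerId: 'c1', orderId: 'o2', status: 'shipped', created: '2021-03-20T09:00:00Z' }));
```

A single index partition value such as 'order#open' suits modest volumes; a busier index would likely need the value spread over several partitions.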

Data Organization

[ ] Keep related data together. Use the primary partition key to group the item collection and use composite sort key values to define successively smaller sets in the collection.
[ ] Keep update data items small. Consider splitting an item if only a portion is updated frequently. You can use the same partition key and retrieve the parts using different sort keys.
[ ] Don’t denormalize highly mutable data as it can become a significant overhead to update all the places the data is replicated.
[ ] Map application attribute names to shorter abbreviations to minimize storage for very large data sets. (Use the OneTable schema map).
[ ] Add a type attribute to differentiate entities for which you need to batch process. For example: all Accounts. You can then index the type in a sparse GSI. (OneTable creates the type automatically).
[ ] Store dates in epoch or ISO 8601 format so they can be sorted.
[ ] Don’t store blobs in DynamoDB. S3 or EFS are more economical and scalable storage for large data items.
[ ] Use GZIP to compress other large items if not stored in S3.
[ ] Handle post-processing and data aggregation needs separately via DynamoDB streams. This may simplify your key structure by handling these use cases specially.
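
For the GZIP item above, a small sketch using Node's built-in zlib module. The helper names pack and unpack and the sample data are hypothetical; the compressed Buffer would be stored as a binary attribute.

```javascript
const zlib = require('node:zlib');

// Compress a large attribute value before writing it to an item.
function pack(value) {
    return zlib.gzipSync(Buffer.from(JSON.stringify(value), 'utf8'));
}

// Reverse the process after reading the item back.
function unpack(buffer) {
    return JSON.parse(zlib.gunzipSync(buffer).toString('utf8'));
}

// Hypothetical repetitive payload, which compresses well.
const report = { lines: Array.from({ length: 500 }, (_, i) => `reading ${i}: nominal`) };
const packed = pack(report);
console.log(JSON.stringify(report).length, 'bytes before,', packed.length, 'bytes after');
```

Compression trades CPU time for storage and capacity units, so it is mostly worthwhile for larger, repetitive values that are not needed in key or filter expressions.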

Libraries and Tooling

[ ] Use the AWS SDK for Javascript V3 instead of V2. It is modular, smaller, faster and supports native async and keep alive on connections.
[ ] If using the AWS V2 SDK, set AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 in your environment for faster network I/O by reusing TCP/IP connections.
[ ] Use a library that will accelerate your use of the DynamoDB API and automatically create the low-level DynamoDB API attribute names and values, projection expressions and filter expressions. See OneTable.

Coding

[ ] Always use a sanity limit on queries and especially on scans. This will help prevent nasty billing surprises due to queries and scans that fetch much more data than you anticipate.
[ ] Use a ProjectionExpression on returned data to reduce data transfer latency and costs. (Set params.fields in OneTable).
[ ] Use FilterExpressions rather than client-side filtering. The data is still read by DynamoDB, but your transfer costs will be lower. (params.where in OneTable).
[ ] Use transactions for atomic updates, but not for atomic math operations such as increment. Instead, use update expressions to implement an Atomic Counter. (Use params.add in OneTable update).
[ ] Use attribute_exists and attribute_not_exists on sort key attributes to test for preexisting items. (OneTable automates this with create and update operations).
[ ] Use transactions and a secondary item to implement and enforce uniqueness for non-key attributes. (See OneTable schema unique fields).
[ ] Use await/promises instead of callbacks in NodeJS. Enough said!
[ ] Use BatchGetItem/BatchWriteItem to retrieve/write items in parallel. Note that BatchWriteItem can put and delete items, but cannot update them in place.
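
The atomic counter and uniqueness items above could look roughly like this when written against the raw API. This is a sketch only: the function names, key layout and "unique#" marker prefix are assumptions, and the parameters use the plain-value form expected by a document client.

```javascript
// Atomic counter: a single UpdateItem with an ADD action, no transaction needed.
function incrementParams(tableName, key, attribute, amount = 1) {
    return {
        TableName: tableName,
        Key: key,
        UpdateExpression: 'ADD #attr :amount',
        ExpressionAttributeNames: { '#attr': attribute },
        ExpressionAttributeValues: { ':amount': amount },
        ReturnValues: 'UPDATED_NEW',
    };
}

// Uniqueness for a non-key attribute: write the entity and a marker item in
// one transaction. If either item already exists, the whole transaction fails.
function createUserParams(tableName, user) {
    const notExists = 'attribute_not_exists(sk)';
    return {
        TransactItems: [
            { Put: { TableName: tableName, ConditionExpression: notExists,
                Item: { pk: `user#${user.id}`, sk: 'user#', ...user } } },
            { Put: { TableName: tableName, ConditionExpression: notExists,
                Item: { pk: `unique#user#email#${user.email}`, sk: 'unique#' } } },
        ],
    };
}

console.log(incrementParams('app', { pk: 'account#42', sk: 'account#' }, 'logins'));
console.log(JSON.stringify(createUserParams('app', { id: 'u1', email: 'a@example.com' }), null, 2));
```

The marker item carries no data of its own; its key simply reserves the email value so a second user with the same email cannot be created.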

Debugging

[ ] Test and debug apps locally first with a local DynamoDB and single-step in a debugger to check your queries. You will sometimes be surprised at the results your queries are returning, especially when you are retrieving large item collections.

Performance

[ ] Avoid scan operations like the plague, as they read the entire table contents. By using a type attribute indexed via a KEYS_ONLY GSI, you may be able to use a query rather than a scan.
[ ] Consider disabling scan on dev systems via IAM. This will prevent less experienced developers from accidentally relying on costly scan operations.
[ ] Consider spreading and balancing your load by routing requests through SQS. This will eliminate peaks and make capacity planning easier.
[ ] Consider the Amazon DynamoDB Accelerator (DAX) for caching which can deliver up to a 10x performance improvement.
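
One way to disable scans via IAM, as suggested above, is an explicit deny statement attached to developer roles. The sketch below only builds the policy document; the helper name, region, account ID and table name are placeholders.

```javascript
// Build an IAM policy document that denies Scan on one table and its indexes.
function denyScanPolicy({ region, accountId, tableName }) {
    const tableArn = `arn:aws:dynamodb:${region}:${accountId}:table/${tableName}`;
    return {
        Version: '2012-10-17',
        Statement: [{
            Sid: 'DenyScan',
            Effect: 'Deny',
            Action: 'dynamodb:Scan',
            Resource: [tableArn, `${tableArn}/index/*`],
        }],
    };
}

// Placeholder values for illustration only.
const policy = denyScanPolicy({ region: 'us-east-1', accountId: '123456789012', tableName: 'app-dev' });
console.log(JSON.stringify(policy, null, 2));
```

An explicit deny overrides any allow, so the role keeps its other DynamoDB permissions while scans fail with an access error.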

Infrastructure

[ ] Create and manage database tables and indexes via scripts or code. See Infrastructure as code. Never use the console to create permanent infrastructure.
[ ] Perform all database provisioning and mutations via coded reversible, idempotent migrations.
[ ] Perform database migrations independently from code deploys.
[ ] Use VPC endpoints if required for secure access to private resources. This now incurs a much smaller cold start penalty.
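
As a sketch of defining the table in code rather than in the console, the function below returns CreateTable parameters for a single-table design with generic keys and one KEYS_ONLY index. The attribute and index names are assumptions; the same definition could equally live in a CloudFormation, CDK or Terraform template.

```javascript
// CreateTable parameters for a single-table design: generic pk/sk keys,
// one overloaded GSI, and on-demand billing.
function tableDefinition(tableName) {
    return {
        TableName: tableName,
        BillingMode: 'PAY_PER_REQUEST',
        AttributeDefinitions: [
            { AttributeName: 'pk', AttributeType: 'S' },
            { AttributeName: 'sk', AttributeType: 'S' },
            { AttributeName: 'gs1pk', AttributeType: 'S' },
            { AttributeName: 'gs1sk', AttributeType: 'S' },
        ],
        KeySchema: [
            { AttributeName: 'pk', KeyType: 'HASH' },
            { AttributeName: 'sk', KeyType: 'RANGE' },
        ],
        GlobalSecondaryIndexes: [{
            IndexName: 'gs1',
            KeySchema: [
                { AttributeName: 'gs1pk', KeyType: 'HASH' },
                { AttributeName: 'gs1sk', KeyType: 'RANGE' },
            ],
            Projection: { ProjectionType: 'KEYS_ONLY' },
        }],
    };
}

console.log(JSON.stringify(tableDefinition('app-dev'), null, 2));
```

Keeping the definition in a function means dev, test and production tables are created from the same source and differ only by name.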

Migrations

[ ] Use a migration management tool that makes using migrations easy and fast. It should keep a history of migrations and provide full forwards or backwards control of migrations. Use it for development and for production sites.
[ ] Subdivide migrations into small, stand-alone mutations to lessen the risk of bugs, data loss or corruption.
[ ] Create reversible migrations wherever possible. If something goes wrong, you need to quickly and easily revert the database.
[ ] Create idempotent migrations wherever possible. If a migration fails to complete for any reason, it can then be resumed. This also enables the use of canary migrations where say 5% of the data is migrated and tested before migrating the entire database.
[ ] Observe the make then break principle where migrations never leave the database in a partially upgraded or available state. This means adding new data or capabilities while maintaining prior data until after the migration is complete and new code is deployed and fully tested.
[ ] Before deploying a database migration, the currently deployed code must fully work with the database before and after the migration is deployed. This often means fallback code to handle new and old data entities, attributes and relationships.
[ ] Use migrations to perform maintenance tasks such as finding and removing orphaned items and old, unused attributes.
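
The reversible, idempotent and make-then-break items above can be illustrated with a toy migration. This is not the OneTable Migrate format; it is a hypothetical shape operating on an in-memory array that stands in for the table, renaming a "mail" attribute to "email".

```javascript
// Hypothetical migration: introduce "email" on User items, copied from "mail".
const migration = {
    version: '0.0.2',
    // up is idempotent: items that already have "email" are skipped, so a
    // failed run can be resumed. The old "mail" attribute is kept ("make"),
    // and would be removed by a later migration ("break").
    up(items) {
        for (const item of items) {
            if (item.type === 'User' && item.email === undefined && item.mail !== undefined) {
                item.email = item.mail;
            }
        }
    },
    // down reverses up and is also safe to run more than once.
    down(items) {
        for (const item of items) {
            if (item.type === 'User' && item.email !== undefined) {
                if (item.mail === undefined) item.mail = item.email;
                delete item.email;
            }
        }
    },
};

const table = [
    { pk: 'user#1', type: 'User', mail: 'a@example.com' },
    { pk: 'account#1', type: 'Account' },
];
migration.up(table);
migration.up(table); // running again changes nothing
console.log(table);
migration.down(table);
console.log(table);
```

Because old code still finds "mail" after the upgrade, the currently deployed code keeps working before and after the migration.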

Billing

[ ] Use on-demand pricing until DynamoDB costs become an issue. By then you’ll have history and valid traffic patterns to do your capacity planning and switch to provisioned capacity with auto scaling.
[ ] Remember on-demand can be up to 7x provisioned pricing, but many smaller sites have sufficient gaps in load to make on-demand cost effective.
[ ] Purchase reserved capacity if your provisioned load is predictable. A three year term is 76% cheaper.
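
A back-of-the-envelope comparison shows why gaps in load can make on-demand cheaper despite its higher unit price. The sketch below assumes writes of 1KB or less (one write unit each) and provisioned capacity sized for the peak with no auto scaling. The prices passed in are placeholders, not current AWS prices, so substitute the figures for your region.

```javascript
// Compare monthly write costs for on-demand and provisioned capacity.
function monthlyWriteCosts({ averageWritesPerSecond, peakWritesPerSecond,
    onDemandPerMillionWrites, provisionedPerWcuHour, hours = 730 }) {
    // On-demand: pay per request actually made.
    const millionsOfWrites = (averageWritesPerSecond * 3600 * hours) / 1e6;
    const onDemand = millionsOfWrites * onDemandPerMillionWrites;
    // Provisioned: pay for peak capacity every hour, whether used or not.
    const provisioned = peakWritesPerSecond * provisionedPerWcuHour * hours;
    return { onDemand, provisioned };
}

// Placeholder prices and a hypothetical spiky workload (10% average utilization).
const costs = monthlyWriteCosts({
    averageWritesPerSecond: 10,
    peakWritesPerSecond: 100,
    onDemandPerMillionWrites: 1.25,
    provisionedPerWcuHour: 0.00065,
});
console.log(costs.onDemand.toFixed(2), 'on-demand vs', costs.provisioned.toFixed(2), 'provisioned');
```

With these placeholder figures the unit price ratio is roughly 7x, so provisioned capacity only wins once average utilization of the provisioned peak rises above about one seventh.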

Cost Control

[ ] DynamoDB prices vary considerably across regions. Choose the cheapest region that is closest to your customers. AWS Pricing Calculator.
[ ] Monitor API Gateway pricing. For high traffic services, API Gateway costs can exceed your DynamoDB cost by 4-5 times. Consider migrating such services to an ALB.
[ ] Configure billing alarms using AWS CloudWatch.
[ ] Use short attribute names. Attribute names are part of the total cost of storage. Use OneTable mapped names for short storage attribute names.
[ ] Store dates in the compact Unix epoch format rather than ISO format. Epoch format is still sortable, but uses less space.
[ ] Don’t use strongly consistent reads unless really required. Consistent reads are more expensive and often slower.
[ ] Regularly delete orphaned items and unused attributes. As a NoSQL database, DynamoDB can easily accumulate orphaned items over time that are not used by any related items and unused attributes can remain as you evolve your data design. Have regular data migrations to clean these up. Consider the OneTable Migration CLI for these maintenance tasks.
[ ] Use KEYS_ONLY secondary indexes where possible to reduce the cost of the GSI. Use the OneTable follow option to transparently follow the GSI and fetch the complete item.
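
To see how attribute names add to storage cost, the sketch below approximates the stored size of an item whose values are all strings: the UTF-8 bytes of each name plus the UTF-8 bytes of each value. Numbers and other types are sized differently by DynamoDB, so this is an estimate for string attributes only, and the attribute names are made up for the comparison.

```javascript
// Approximate stored size of string attributes: name bytes plus value bytes.
function approxStringItemBytes(item) {
    let bytes = 0;
    for (const [name, value] of Object.entries(item)) {
        bytes += Buffer.byteLength(name, 'utf8') + Buffer.byteLength(String(value), 'utf8');
    }
    return bytes;
}

// The same data with long application names and short storage names.
const longNames = {
    accountIdentifier: 'a1',
    subscriptionStatus: 'active',
    lastUpdatedTimestamp: '2021-03-24T10:15:00.000Z',
};
const shortNames = { id: 'a1', st: 'active', up: '2021-03-24T10:15:00.000Z' };

console.log(approxStringItemBytes(longNames), 'bytes with long names');
console.log(approxStringItemBytes(shortNames), 'bytes with short names');
```

The saving per item is small, but it is paid on every item and every read, which is why name mapping matters mostly for very large data sets.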

Monitoring and Maintenance

[ ] Use CloudWatch Contributor Insights for DynamoDB to understand the most accessed keys and items in your database.
[ ] Enable point in time backups to help protect against catastrophic accidental writes and deletes on your production database.

This is a Journey

Thanks for considering these lessons. If you have suggestions to improve this checklist or correct any items I may have misrepresented, you can contact me (Michael O’Brien) on X at: @mobstream, or email and read our Blog.

Why OneTable?

DynamoDB is a great NoSQL database that comes with a steep learning curve. Folks migrating from SQL often have a hard time adjusting to the NoSQL paradigm and especially to DynamoDB which offers exceptional scalability but with a fairly low-level API.

However, the standard DynamoDB API requires a lot of boilerplate syntax and expressions. This is tedious to use and can unfortunately be error prone at times. Net/Net: it is not easy to write terse, clear, robust Dynamo code for one-table patterns.

The OneTable library addresses these concerns and provides a higher-level, more natural, “Javascripty” way to interact with DynamoDB without obscuring any of the power of DynamoDB itself.

You can read more in the detailed documentation at: GitHub OneTable or NPM OneTable.

EmbedThis with OneTable

At EmbedThis, we’ve used DynamoDB, OneTable and the OneTable CLI extensively with our EmbedThis Ioto IoT service. All data is stored in a single DynamoDB table and we extensively use single-table design patterns. We could not be more satisfied with our DynamoDB implementation. Our storage and database access costs are insanely low and access/response times are excellent.
