Bootstrap DynamoDB Tables Using CDK

I’ve been building a serverless API this week, and I’ve run into a database bootstrapping problem. Then I thought: perhaps I can use my IaC tool of choice to bootstrap it?

Background

Both the API logic and infrastructure are defined within a single CDK repo. I haven’t got much time to work on this, so I’m trying to move fast: fail fast and get to the desired result quickly.

I need my database to have some data in it. In this example I’m showing a single DynamoDB table, but in reality I’m working with half a dozen tables, with more being added as the API expands!

As a good DevOps Engineer, I don’t want to do anything manually.

How can I automatically and programmatically populate my DynamoDB tables?

Custom Resources

Unfortunately, there is no built-in CDK feature that lets me hand the table resource a JSON file. I’m going to have to build that feature myself.

The tool I’ll be using to create this feature is Custom Resources.

Custom resources have been a part of CloudFormation for yonks, and they should only be used as a last resort. Using one involves creating a Lambda to do what you couldn’t do in CloudFormation by itself, and then signalling CloudFormation on the success or failure of said Lambda.

Here’s an example:

import logging

import boto3  # AWS SDK, for whatever the custom resource actually needs to do
import cfnresponse  # helper that signals CloudFormation with the result

# The Lambda runtime already attaches a handler to the root logger, so
# logging.basicConfig() would be a no-op here. Set the level directly instead.
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    logger.info("Event: %s", event)

    try:
        request_type = event.get('RequestType')
        if request_type in ['Create', 'Update']:
            # Do Something using AWS SDK - like... Adding an item to a DynamoDB Table.
            responseData = {'message': "Done"}
            logger.info('Sending %s to cloudformation', responseData['message'])
            cfnresponse.send(event, context, cfnresponse.SUCCESS, responseData)
        elif request_type == 'Delete':
            # Do something different on delete
            responseData = {'message': "Deleted"}
            logger.info('Sending %s to cloudformation', responseData['message'])
            cfnresponse.send(event, context, cfnresponse.SUCCESS, responseData)
        else:
            raise Exception('Unknown operation: %s' % request_type)
    except Exception as e:
        # Always report failures, otherwise the stack hangs until it times out.
        logger.error("Exception: %s", e)
        cfnresponse.send(event, context, cfnresponse.FAILED, {"message": str(e)})

Wait, so I need to define this Lambda function every time I want to do one small custom thing in CloudFormation?

Yup.

Well, you can of course do multiple things in a single Lambda, but you get the idea.

That looks like a LOT of effort to simply populate a NoSQL database like DynamoDB. This solution is clunky. Can CDK save the day by providing Custom Resources in a simpler-to-use construct?

CDK Custom Resource Construct

Let’s start with a DynamoDB table:

self.products_db = dynamodb.Table(
    self,
    f"{project_name}-products-db",
    partition_key=dynamodb.Attribute(name="id", type=dynamodb.AttributeType.STRING),
    table_name=f"{project_name}-products-db",
    billing_mode=dynamodb.BillingMode.PAY_PER_REQUEST
)

Now let’s add the Custom Resource.

cr.AwsCustomResource(
    self,
    "init_products",
    # on_create defaults to on_update, so this call also runs on the first deploy.
    on_update=cr.AwsSdkCall(
        action="putItem",
        service="DynamoDB",
        physical_resource_id=cr.PhysicalResourceId.of(self.products_db.table_name + "_initialization"),
        parameters={
            "TableName": self.products_db.table_name,
            "Item": {
                "id": {"S": "PRODUCT#1234-5678-9012"},
                "name": {"S": "blanket"},
                "price": {"N": "123.00"},
                "stock": {"N": "5"}
            },
        }
    ),
    # Least privilege: the generated Lambda may only put items into this one table.
    policy=cr.AwsCustomResourcePolicy.from_statements(
        statements=[iam.PolicyStatement(
            effect=iam.Effect.ALLOW,
            actions=["dynamodb:PutItem"],
            resources=[self.products_db.table_arn]
        )]
    )
)

Bask in its beauty! I’ve been able to write a Custom Resource that adds an item to a DynamoDB table, without actually writing the Lambda! All I have to do is define the action I want to perform and its parameters. Wonderful! I’ve saved myself from the entire Lambda code block above, and just written the custom AWS SDK bit.

But… I’ve just added one item. That’s hardly bootstrapping. Let’s expand this to many items!

First, let’s download some DynamoDB data and put it into a list. Simply export your DynamoDB data using the AWS CLI:

aws dynamodb scan --table-name TABLE_NAME > export.json

I’ve minified the Items output for easier viewing.

{
    "Items": [
        {"stock":{"N":"2"},"id":{"S":"PRODUCT#1234-5678-9015"},"price":{"N":"199"},"name":{"S":"hiking boots"}},
        {"stock":{"N":"10"},"id":{"S":"PRODUCT#1234-5678-9018"},"price":{"N":"120"},"name":{"S":"sleeping mat"}},
        {"stock":{"N":"27"},"id":{"S":"PRODUCT#1234-5678-9016"},"price":{"N":"19.95"},"name":{"S":"water bottle"}},
        {"stock":{"N":"5"},"id":{"S":"PRODUCT#1234-5678-9019"},"price":{"N":"400.95"},"name":{"S":"tent"}},
        {"stock":{"N":"9"},"id":{"S":"PRODUCT#1234-5678-9013"},"price":{"N":"58"},"name":{"S":"head torch"}},
        {"stock":{"N":"5"},"price":{"N":"123"},"id":{"S":"PRODUCT#1234-5678-9012"},"name":{"S":"blanket"}},
        {"stock":{"N":"12"},"id":{"S":"PRODUCT#1234-5678-9014"},"price":{"N":"69.95"},"name":{"S":"pillow"}},
        {"stock":{"N":"30"},"id":{"S":"PRODUCT#1234-5678-9017"},"price":{"N":"9.95"},"name":{"S":"canned soup"}}
    ],
    "Count": 8,
    "ScannedCount": 8,
    "ConsumedCapacity": null
}

Then copy and paste the data from the export.json file into a list.

Easy. No data manipulation required - Direct Copy.

products_table = [
    {"id": {"S": "PRODUCT#1234-5678-9012"},"name": {"S": "blanket"},"price": {"N": "123.00"},"stock": {"N": "5"}},
    {"id": {"S": "PRODUCT#1234-5678-9013"},"name": {"S": "head torch"},"price": {"N": "58.00"},"stock": {"N": "9"}},
    {"id": {"S": "PRODUCT#1234-5678-9014"},"name": {"S": "pillow"},"price": {"N": "69.95"},"stock": {"N": "12"}},
    {"id": {"S": "PRODUCT#1234-5678-9015"},"name": {"S": "hiking boots"},"price": {"N": "199.00"},"stock": {"N": "2"}},
    {"id": {"S": "PRODUCT#1234-5678-9016"},"name": {"S": "water bottle"},"price": {"N": "19.95"},"stock": {"N": "27"}},
    {"id": {"S": "PRODUCT#1234-5678-9017"},"name": {"S": "canned soup"},"price": {"N": "9.95"},"stock": {"N": "30"}},
    {"id": {"S": "PRODUCT#1234-5678-9018"},"name": {"S": "sleeping mat"},"price": {"N": "120.00"},"stock": {"N": "10"}},
    {"id": {"S": "PRODUCT#1234-5678-9019"},"name": {"S": "tent"},"price": {"N": "400.95"},"stock": {"N": "5"}},
]
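If copy and paste gets tedious as the tables grow, the same list could probably be built straight from the export instead. This is a minimal sketch, assuming the scan output was saved as export.json as shown above; the helper name load_items is my own invention, not something CDK provides.

```python
import json
from pathlib import Path


def load_items(export_path):
    """Return the "Items" list from an `aws dynamodb scan` JSON export."""
    with Path(export_path).open(encoding="utf-8") as handle:
        export = json.load(handle)
    # The items are already in DynamoDB attribute-value format
    # ({"S": ...}, {"N": ...}), so they can be passed on unchanged.
    return export.get("Items", [])
```

With that in place, products_table = load_items("export.json") would stand in for the hand-written list.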

Now let’s make a method to ingest that data:

def batch_add_items(self, items):
    # BatchWriteItem accepts at most 25 put/delete requests per call.
    number_of_batches = math.ceil(len(items) / 25)
    for batch in range(number_of_batches):
        batch_items = []
        # Slice instead of pop() so the caller's list is left untouched and
        # each batch is genuinely capped at 25 items.
        for item in items[batch * 25:(batch + 1) * 25]:
            batch_items.append({"PutRequest": {"Item": item}})
        cr.AwsCustomResource(
            self,
            f"init_products_{batch}",
            on_update=cr.AwsSdkCall(
                action="batchWriteItem",
                service="DynamoDB",
                physical_resource_id=cr.PhysicalResourceId.of(self.products_db.table_name + "_initialization_" + str(batch)),
                parameters={
                    "RequestItems": {self.products_db.table_name: batch_items}
                }
            ),
            policy=cr.AwsCustomResourcePolicy.from_statements(
                statements=[iam.PolicyStatement(
                    effect=iam.Effect.ALLOW,
                    actions=["dynamodb:BatchWriteItem"],
                    resources=[self.products_db.table_arn]
                )]
            )
        )

This method simply breaks up the data into chunks of 25, the most that a single BatchWriteItem call accepts, and creates as many custom resources as necessary to populate the entire table. It’s a simple loop in a loop.
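To sanity-check the chunking on its own, without synthesizing a stack, the batching logic can be pulled out into a plain function. This is only a sketch: to_put_request_batches is a hypothetical helper name, and the sample items are made up for illustration.

```python
def to_put_request_batches(items, batch_size=25):
    """Split items into lists of PutRequest entries, at most batch_size each."""
    return [
        [{"PutRequest": {"Item": item}} for item in items[start:start + batch_size]]
        for start in range(0, len(items), batch_size)
    ]


# 60 made-up items should produce two full batches and one partial batch.
sample = [{"id": {"S": f"PRODUCT#{n}"}} for n in range(60)]
batches = to_put_request_batches(sample)
print([len(batch) for batch in batches])  # [25, 25, 10]
```

Because the function only slices, the input list is never modified, so the same data can be reused elsewhere in the stack.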

Populated DynamoDB Table

Let’s see this all together now:

from aws_cdk import (
    Stack,
    aws_dynamodb as dynamodb,
    aws_iam as iam,
    custom_resources as cr,
)
from constructs import Construct
from dynamodb_bootstrap import products_table
import math

class DevaxStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        project_name = self.node.try_get_context("project_name")

        self.products_db = dynamodb.Table(
            self,
            f"{project_name}-products-db",
            partition_key=dynamodb.Attribute(name="id", type=dynamodb.AttributeType.STRING),
            table_name=f"{project_name}-products-db",
            billing_mode=dynamodb.BillingMode.PAY_PER_REQUEST
        )

        self.batch_add_items(products_table)


    def batch_add_items(self, items):
        # BatchWriteItem accepts at most 25 put/delete requests per call.
        number_of_batches = math.ceil(len(items) / 25)
        for batch in range(number_of_batches):
            batch_items = []
            # Slice instead of pop() so the caller's list is left untouched and
            # each batch is genuinely capped at 25 items.
            for item in items[batch * 25:(batch + 1) * 25]:
                batch_items.append({"PutRequest": {"Item": item}})
            cr.AwsCustomResource(
                self,
                f"init_products_{batch}",
                on_update=cr.AwsSdkCall(
                    action="batchWriteItem",
                    service="DynamoDB",
                    physical_resource_id=cr.PhysicalResourceId.of(self.products_db.table_name + "_initialization_" + str(batch)),
                    parameters={
                        "RequestItems": {self.products_db.table_name: batch_items}
                    }
                ),
                policy=cr.AwsCustomResourcePolicy.from_statements(
                    statements=[iam.PolicyStatement(
                        effect=iam.Effect.ALLOW,
                        actions=["dynamodb:BatchWriteItem"],
                        resources=[self.products_db.table_arn]
                    )]
                )
            )

Conclusion

That’s it!

We can bootstrap to our heart’s content, or at least until we hit the CloudFormation template limits, since every item ends up embedded in the synthesized template. For truly enormous amounts of data, you’ll need to look at Data Pipeline. Perhaps I’ll write about that in the future?
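For a rough feel of how close a data set is to those limits, the seed data can be measured before deploying. This is a back-of-the-envelope sketch, not an exact calculation: the function names are hypothetical, the 1 MB figure is my assumption based on CloudFormation's published quota for templates uploaded via S3 (check the current quotas), and the real template will be larger than this estimate because of escaping and the other resources in the stack.

```python
import json

# Assumption: roughly 1 MB template body limit; verify against current quotas.
TEMPLATE_LIMIT_BYTES = 1_000_000


def estimate_seed_bytes(items):
    """Lower-bound estimate of the bytes the seed items add to the template."""
    return len(json.dumps(items, separators=(",", ":")).encode("utf-8"))


def fits_in_template(items, limit=TEMPLATE_LIMIT_BYTES, headroom=0.5):
    """True if the items use no more than `headroom` of the assumed limit."""
    return estimate_seed_bytes(items) <= limit * headroom
```

The headroom factor of 0.5 is arbitrary; it just leaves space for the table, the IAM policies, and the custom resource plumbing.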
