This pattern describes how to use serverless computing and infrastructure as code (IaC) to implement and administer a data lake on the Amazon Web Services (AWS) Cloud. This pattern is based on the serverless data lake framework (SDLF) workshop developed by AWS.
SDLF is a collection of reusable resources that accelerate the delivery of enterprise data lakes on the AWS Cloud and helps with faster deployment to production. It is used to implement the foundational structure of a data lake by following best practices.
SDLF implements a continuous integration / continuous deployment (CI/CD) process throughout the code and infrastructure deployment by using AWS services such as AWS CodePipeline, AWS CodeBuild, and AWS CodeCommit.
This pattern uses multiple AWS serverless services to simplify data lake management. These include Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB for storage, AWS Lambda and AWS Glue for computing, and Amazon CloudWatch Events, Amazon Simple Queue Service (Amazon SQS), and AWS Step Functions for orchestration.
AWS CloudFormation and AWS code services act as the IaC layer to provide reproducible and fast deployments with easy operations and administration.