Data integration is a critical element in building a data lake or a data warehouse: it enables data from different sources to be cleaned, harmonized, transformed, and finally loaded. When building a data warehouse, the bulk of the development effort goes into the data integration pipeline, making it one of the most important pillars of a data analytics ecosystem. An efficient, well-designed data integration pipeline is essential for making data available to, and trusted by, analytics consumers.
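As a simplified illustration of the clean, transform, and load flow described above, the following sketch shows a minimal AWS Glue ETL job. The database, table, and Amazon S3 path names are hypothetical placeholders, not values from this whitepaper.

```python
# Minimal AWS Glue ETL sketch: extract from the Data Catalog,
# harmonize the schema, and load curated output to Amazon S3.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract: read raw records cataloged in the AWS Glue Data Catalog.
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db",        # placeholder database name
    table_name="orders_raw",  # placeholder table name
)

# Transform: harmonize column names and cast string fields to typed columns.
cleaned = ApplyMapping.apply(
    frame=raw,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("amount", "string", "amount", "double"),
        ("order_date", "string", "order_date", "timestamp"),
    ],
)

# Load: write the curated data set to Amazon S3 in a columnar format.
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)
job.commit()
```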
In this whitepaper, we show you some of the considerations and best practices for the security and reliability of data pipelines built with AWS Glue.
To get the most out of reading this whitepaper, it helps to be familiar with AWS Glue, AWS Glue DataBrew, Amazon Simple Storage Service (Amazon S3), AWS Lambda, and AWS Step Functions.
For best practices around Operational Excellence for your data pipelines, refer to AWS Glue Best Practices: Building an Operationally Efficient Data Pipeline.
For best practices around Performance Efficiency and Cost Optimization for your data pipelines, refer to AWS Glue Best Practices: Building a Performant and Cost Optimized Data Pipeline.