AWS Reference Architecture

This architecture enables customers to build end-to-end modern data analytics platforms using AWS and Snowflake.

[Architecture diagram, AWS Cloud, callouts 1-7. Labels recovered from the figure:]
Data sources: Relational/Operational data, SQL/NoSQL DBs, Events/Streaming data, Devices, Social, Batch data, Media, Logs, File shares, SaaS Applications.
Data ingestion: AWS Database Migration Service, Amazon Kinesis, Amazon Managed Streaming for Apache Kafka, AWS IoT Core, AWS DataSync, AWS Glue, Amazon AppFlow.
Data storage, transform and govern: Data Lake, Amazon Simple Storage Service (Amazon S3), Snowflake, Raw data, Conformed data, Modelled data. ETL: AWS Glue, Amazon EMR, AWS Lambda. Orchestration: Amazon Managed Workflows for Apache Airflow (MWAA), AWS Step Functions.
Governance and lineage: AWS Lake Formation, AWS Identity and Access Management (IAM), AWS Security Token Service (AWS STS).
Data store / Data serving: Query data lake without loading data using external tables. Automated data ingestion with Snowpipe.
Data consumption: Amazon QuickSight, Amazon SageMaker, Applications and services.

Reviewed for technical accuracy March 11, 2022
© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.

1 Data is collected from multiple data sources across the enterprise, software as a service (SaaS) applications, edge devices, logs, streaming data, and social media networks.
2 Based on the type of data source, AWS Database Migration Service, AWS DataSync, Amazon Kinesis, Amazon Managed Streaming for Apache Kafka, AWS IoT Core, AWS Glue and Amazon AppFlow are used to ingest the data into the data lake in AWS.
3 Amazon S3 is used for fully managed, highly available and scalable data lake storage.
4 AWS Glue is used to extract, transform and ingest data across multiple data stores. Amazon EMR provides the cloud big data platform for processing vast amounts of data using open-source analytics frameworks. AWS Lambda and Amazon EC2 provide compute capability for data enrichment needs.
5 Amazon Managed Workflows for Apache Airflow (MWAA) or AWS Step Functions is used to orchestrate end-to-end data pipelines.
6 AWS Lake Formation makes it easy to build, secure and manage your data lake, providing a single place to enforce data classification and manage fine-grained access. AWS IAM and AWS STS provide the ability to manage access permissions and temporary credentials.
7 Snowflake is used as virtual data