This Guidance helps users prepare genomic, clinical, mutation, expression, and imaging data for large-scale analysis and run interactive queries against a data lake. It includes infrastructure as code (IaC) automation, continuous integration and continuous delivery (CI/CD) for rapid iteration, an ingestion pipeline that stores and transforms the data, and notebooks and dashboards for interactive analysis. It also demonstrates how genomic variant and annotation data are stored and queried using AWS HealthOmics, Amazon Athena, and Amazon SageMaker notebooks. This Guidance was built in collaboration with BioTeam.
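
As a rough illustration of the interactive-query step, the sketch below builds a request for Amazon Athena's `start_query_execution` API from a SageMaker notebook or any Python environment. It is a minimal sketch under stated assumptions, not the Guidance's own code: the database `genomics_db`, the table `variants`, the column `gene`, and the S3 output location are all hypothetical placeholders, and the actual schema created by the ingestion pipeline may differ.

```python
import re
from typing import Any, Dict

# Hypothetical placeholder; replace with a bucket you own.
OUTPUT_LOCATION = "s3://example-bucket/athena-results/"

_SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+$")


def build_variant_query(database: str, table: str, gene: str, limit: int = 10) -> Dict[str, Any]:
    """Build keyword arguments for Athena's start_query_execution call.

    The table layout (a `gene` column) is an assumption for illustration.
    """
    # Reject anything that is not a plain identifier or gene symbol,
    # because the values are interpolated directly into the SQL text.
    for value in (database, table, gene):
        if not _SAFE_NAME.match(value):
            raise ValueError(f"unsafe value: {value!r}")
    sql = (
        f'SELECT * FROM "{database}"."{table}" '
        f"WHERE gene = '{gene}' LIMIT {int(limit)}"
    )
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": OUTPUT_LOCATION},
    }


def run_query(request: Dict[str, Any]) -> str:
    """Submit the request to Athena and return the query execution ID."""
    import boto3  # imported lazily so the builder works without AWS access

    athena = boto3.client("athena")
    return athena.start_query_execution(**request)["QueryExecutionId"]


request = build_variant_query("genomics_db", "variants", "TP53", limit=5)
print(request["QueryString"])
```

Calling `run_query(request)` would submit the query, provided AWS credentials and the hypothetical database exist. Keeping the request builder separate from the submission makes the query text easy to inspect before anything is sent to Athena.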