This document provides a reference architecture that you can use to design the infrastructure to run a generative artificial intelligence (AI) application with retrieval-augmented generation (RAG). The intended audience for this document includes developers and administrators of generative AI applications and cloud architects. The document assumes a basic understanding of AI, machine learning (ML), and large language model (LLM) concepts. This document doesn't provide guidance about how to design and develop a generative AI application.