This document provides a reference architecture that you can use to design the infrastructure to run a generative AI application with retrieval-augmented generation (RAG) using Google Kubernetes Engine (GKE), Cloud SQL, and open source tools like Ray, Hugging Face, and LangChain. To help you experiment with this reference architecture, a sample application and Terraform configuration are provided in GitHub.
This document is for developers who want to rapidly build and deploy RAG-capable generative AI applications by using open source tools and models. It assumes that you have experience with using GKE and Cloud SQL and that you have a conceptual understanding of AI, machine learning (ML), and large language models (LLMs). This document doesn't provide guidance about how to design and develop a generative AI application.