This Guidance demonstrates how to build a real-time machine learning (ML) inference solution on AWS that can serve millions of requests per second. By hosting your ML model on Amazon Elastic Container Service (Amazon ECS) and routing requests to the ML server through a Network Load Balancer, you can achieve the low-latency, high-throughput inference commonly required in real-time and programmatic advertising. This Guidance provides an example of applying ML to ad request filtering, and shows how to build a client application that simulates high-throughput OpenRTB-based requests and sends them to the ML inference server.
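To make the request flow concrete, the sketch below builds a minimal OpenRTB 2.x-style bid request of the kind such a client might send to the inference endpoint. The field selection and endpoint URL are illustrative assumptions, not the exact payload or API used by this Guidance.

```python
import json
import uuid

def build_bid_request(ad_width=300, ad_height=250):
    """Build a minimal OpenRTB 2.x-style bid request (illustrative fields only)."""
    return {
        "id": str(uuid.uuid4()),  # unique auction ID
        "imp": [
            {
                "id": "1",  # impression ID within this request
                "banner": {"w": ad_width, "h": ad_height},
            }
        ],
        "device": {"ua": "Mozilla/5.0", "ip": "203.0.113.10"},
        "at": 2,  # second-price auction
    }

# Serialize the request; a load-test client would POST this payload to the
# ML inference server behind the Network Load Balancer. The endpoint name
# is a placeholder, not defined by this Guidance.
payload = json.dumps(build_bid_request()).encode("utf-8")
# e.g. urllib.request.urlopen("http://<nlb-dns-name>/predict", data=payload)
```

A real high-throughput client would issue these requests concurrently (for example with `asyncio` and a connection pool) to approach the request rates discussed above.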