
Guidance for Low-Latency High-Throughput Model Inference Using Amazon ECS

Project Overview

This Guidance demonstrates how to build a real-time machine learning (ML) inference solution on AWS that can serve millions of requests per second. By hosting the ML model on Amazon Elastic Container Service (Amazon ECS) and routing requests to the model server through a Network Load Balancer, you can meet the low-latency, high-throughput inference requirements common in real-time and programmatic advertising. The Guidance includes an example of applying ML to ad request filtering and shows how to build a client application that simulates high-throughput OpenRTB-based requests against the ML inference server.
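To give a flavor of what the simulated traffic looks like, the sketch below builds a minimal OpenRTB 2.x-style bid request. The field subset, dimensions, and values here are illustrative assumptions for a load-generating client, not the actual schema or client used in this Guidance.

```python
import json
import uuid


def build_bid_request(ad_width=300, ad_height=250):
    """Build a minimal OpenRTB 2.x-style bid request (illustrative subset of fields)."""
    return {
        "id": str(uuid.uuid4()),          # unique ID for this bid request
        "imp": [{
            "id": "1",                     # one ad impression being offered
            "banner": {"w": ad_width, "h": ad_height},
        }],
        "device": {
            "ua": "Mozilla/5.0",           # placeholder user agent
            "ip": "203.0.113.10",          # documentation-range IP
        },
        "at": 2,                           # auction type: second-price
    }


# A load generator would serialize many such requests and POST them to the
# Network Load Balancer endpoint fronting the ECS-hosted inference service.
payload = json.dumps(build_bid_request())
```

In a full client, this payload would be sent over a pool of persistent HTTP connections so that connection setup does not dominate the measured request latency.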
