
Guidance for Low Latency, High Throughput Inference using Efficient Compute on Amazon EKS

Project Overview

This Guidance demonstrates how to deploy a machine learning inference architecture on Amazon Elastic Kubernetes Service (Amazon EKS). It covers the baseline implementation requirements as well as techniques for packing thousands of unique PyTorch deep learning (DL) models into a scalable architecture. PyTorch is an open-source machine learning framework that can help accelerate your machine learning journey from prototyping to deployment. We also explore a mix of Amazon Elastic Compute Cloud (Amazon EC2) instance families to develop an optimal design using efficient compute (such as AWS Graviton and AWS Inferentia) that allows you to scale inference efficiently and cost effectively.
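To give a sense of the per-model workload this kind of design packs onto EKS nodes, the sketch below loads a TorchScript model and runs a single CPU forward pass, roughly what an inference pod on a CPU-based (for example, Graviton) node might do. This is a minimal illustration, not the Guidance's actual code; the model file name and input shape are hypothetical.

    # Minimal sketch: load a pre-compiled TorchScript model and run one
    # CPU inference pass. "model.pt" and the input shape are placeholders.
    import torch

    def load_model(path: str) -> torch.jit.ScriptModule:
        # Load the TorchScript artifact onto the CPU and switch to eval mode.
        model = torch.jit.load(path, map_location="cpu")
        model.eval()
        return model

    def predict(model: torch.jit.ScriptModule, batch: torch.Tensor) -> torch.Tensor:
        # Run a forward pass with autograd disabled (inference only).
        with torch.inference_mode():
            return model(batch)

    if __name__ == "__main__":
        model = load_model("model.pt")              # hypothetical artifact name
        dummy_input = torch.randn(1, 3, 224, 224)   # example image-sized batch
        output = predict(model, dummy_input)
        print(output.shape)

Packaging each model behind a small entry point like this is what makes it practical to schedule many independent models across a mix of EC2 instance families; accelerator-backed nodes (such as AWS Inferentia) would use a model compiled for that target instead of the plain CPU path shown here.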

To learn more about this project, connect with us.
