Guidance for Distributed Model Training on AWS

Project Overview

Project Detail

This Guidance helps customers with on-premises restrictions or existing Kubernetes investments implement a hybrid, distributed machine learning (ML) training architecture using either Amazon Elastic Kubernetes Service (Amazon EKS) with Kubeflow or Amazon SageMaker. Kubernetes is a widely adopted system for automating infrastructure deployment, resource scaling, and the management of containerized applications. The open-source community built Kubeflow as a layer on top of Kubernetes to make deploying end-to-end ML workflows simple, portable, and scalable. Because this architecture lets customers choose between the two approaches at runtime, they gain maximum control over their ML deployments. They can keep using open-source libraries in their deep learning training scripts while keeping those scripts compatible with both Kubernetes and SageMaker.
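As a rough illustration of how a single training script might stay compatible with both runtimes, the sketch below reads the distributed settings from whichever environment variables are present. It is not taken from the Guidance itself. It assumes the SageMaker training container exposes `SM_HOSTS` and `SM_CURRENT_HOST`, and that a Kubeflow training operator injects `MASTER_ADDR`, `RANK`, and `WORLD_SIZE`. The names `DistConfig` and `resolve_dist_config` are hypothetical, and ranks here are node-level only.

```python
import json
import os
from typing import Mapping, NamedTuple


class DistConfig(NamedTuple):
    """Node-level distributed settings (hypothetical helper type)."""
    backend: str      # "sagemaker", "kubernetes", or "local"
    rank: int         # index of this node
    world_size: int   # total number of nodes
    master_addr: str  # host that coordinates the job


def resolve_dist_config(env: Mapping[str, str] = os.environ) -> DistConfig:
    """Pick distributed settings based on which runtime launched the script."""
    # Assumption: SageMaker sets SM_HOSTS (a JSON list) and SM_CURRENT_HOST.
    if "SM_HOSTS" in env and "SM_CURRENT_HOST" in env:
        hosts = sorted(json.loads(env["SM_HOSTS"]))
        rank = hosts.index(env["SM_CURRENT_HOST"])
        return DistConfig("sagemaker", rank, len(hosts), hosts[0])

    # Assumption: the Kubeflow training operator sets MASTER_ADDR, RANK,
    # and WORLD_SIZE on each pod.
    if "MASTER_ADDR" in env:
        return DistConfig(
            "kubernetes",
            int(env.get("RANK", "0")),
            int(env.get("WORLD_SIZE", "1")),
            env["MASTER_ADDR"],
        )

    # Neither runtime detected: fall back to single-node settings.
    return DistConfig("local", 0, 1, "127.0.0.1")


if __name__ == "__main__":
    print(resolve_dist_config())
```

The resulting values could then be passed to whatever open-source distributed library the training script already uses, so the rest of the script does not need to know which platform launched it.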

https://aws.amazon.com/solutions/guidance/distributed-model-training-on-aws/?did=sl_card&trk=sl_card

To learn more about this project, connect with us.