
Scalable TensorFlow inference system

Project Overview


This reference architecture series describes how to design and deploy a high-performance online inference system for deep learning models using an NVIDIA® T4 GPU and Triton Inference Server. With this architecture, you can build a system that serves machine learning models and leverages GPU acceleration. Google Kubernetes Engine (GKE) lets you scale the system as the number of clients grows. You can improve throughput and reduce latency by applying the optimization techniques described in this series.
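As a minimal sketch of how a client would talk to such a system: Triton exposes the KServe v2 HTTP/REST inference protocol (POST /v2/models/&lt;model&gt;/infer with a JSON body). The helper below builds such a request body; the model input name, shape, and data here are hypothetical placeholders, not values from this project.

```python
import json

# Sketch of a request body for Triton's KServe v2 HTTP inference API.
# The input name "input_tensor" and the batch-of-one shape are assumed
# placeholders; a real model defines its own input names and shapes.
def build_infer_request(input_name, data, datatype="FP32"):
    """Build the JSON body for a single-input inference request."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(data)],  # one request, len(data) features
                "datatype": datatype,
                "data": data,
            }
        ]
    }

body = build_infer_request("input_tensor", [0.1, 0.2, 0.3])
print(json.dumps(body))
```

In practice this JSON would be POSTed to the Triton endpoint that GKE load-balances across replicas, which is what lets the system scale with client traffic.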

 

https://cloud.google.com/architecture/scalable-tensorflow-inference-system

