🍮 Edge#147: MLOps – Model Serving
plus an overview of the TensorFlow Serving paper and TorchServe
In this issue:
we explain what model serving is;
we explore the TensorFlow Serving paper;
we cover TorchServe, a super simple serving framework for PyTorch.
💡 ML Concept of the Day: Model Serving
Continuing with our MLOps series, we would like to focus on the serving of machine learning (ML) models. Model deployment/serving is arguably one of the most challenging aspects of an MLOps pipeline. This is partly because model serving architectures have little to do with data science and are more closely related to ML engineering. Some ML models take hours to run and require large computation pipelines, while others can execute in seconds on a mobile phone. A solid ML serving infrastructure should be able to adapt to this diversity of requirements across ML applications.
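At its core, model serving means exposing a trained model behind an interface (often an HTTP endpoint) so applications can request predictions. As a minimal sketch of the idea, the example below wraps a toy stand-in model in a JSON-over-HTTP prediction endpoint using only the Python standard library; the `predict` function, the `/predict` route, and the request schema are illustrative assumptions, not part of any specific serving framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder "model": a real serving system would load trained
# weights from a model registry or artifact store instead.
def predict(features):
    # Toy scoring logic: sum the input features.
    return {"score": sum(features)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, e.g. {"features": [1.0, 2.0]}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        result = predict(payload["features"])
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

def serve(port=8080):
    # Blocking call; run in a thread or process in practice.
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Production systems such as TensorFlow Serving and TorchServe build on this same request/response pattern but add the pieces a sketch like this omits: model versioning, batching, GPU scheduling, and monitoring.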
Many unique requirements can influence a model serving architecture. Over the last few years, we have seen four fundamental model serving patterns emerge in modern ML architectures: