👁 Edge#212: Inside the Masterful CLI Trainer, a low-code CV model development platform

Jul 28, 2022

On Thursdays, we deep dive into one of the freshest research papers or technology frameworks worth your attention. Our goal is to keep you up to date with new AI developments and introduce the platforms that tackle ML challenges.

💥 Deep Dive: Inside the Masterful CLI Trainer, a low-code Computer Vision model development platform

Deep learning has advanced the field of computer vision (CV) dramatically in the past decade. New research and innovative solutions come out every week, making it possible for millions of people to benefit from CV in areas like automotive safety systems, medical image analysis, security, manufacturing quality control, and geographic information systems. The applications for CV keep growing because, ultimately, computer vision has the potential to disrupt and enhance every task that currently relies on humans to process and interpret visual information. Yet building models that accurately recognize and classify images is still full of challenges. Recently, in Edge#194, we covered Masterful AI, a startup that tackles these challenges. After we published our analysis, the Masterful AI team released a new interface, the CLI Trainer, which deserves a deep dive of its own. But first, let’s discuss why building a CV model is still so hard today.

Challenges Building Computer Vision Models Today

As powerful as CV can be, until now there have been essentially only two ways to build CV models, each with significant drawbacks.

The first pathway is hiring ML experts and building custom software. This approach has major drawbacks:

The effort to recruit ML engineers. ML engineers are among the most in-demand and difficult-to-recruit engineers. Building even a small ML team might mean holding off on all CV development for a year while the team is being recruited.
Labeling large amounts of data. Labeling has rapidly diminishing returns, requiring larger and larger budgets to improve model accuracy. It can end up being a vicious cycle, requiring more and more of an engineering organization’s budget.
Extensive and time-consuming hand-tuning of models. This is tedious work that feels less like being a software engineer and more like being a human grid search algorithm.
The primitive nature of PyTorch/TensorFlow. ML developers spend an inordinate amount of time debugging long stack traces and wrestling with low-level issues like tensor shape mismatches. Although Python is a powerful high-level language, programming in PyTorch and TensorFlow more closely resembles programming in assembly.
Maintaining training pipeline code. Training pipeline code tends to be fragile and tedious to debug, and it becomes a codebase that no one wants to touch.

The other pathway is relying on cloud AutoML platforms. Although these platforms can be a good way to build a prototype, their limitations become clear once a team attempts a production application:

Subpar accuracy. Cloud AutoML platforms lack advanced algorithms and produce low-performing models. This kicks off a vicious cycle of requiring more labeled training data than necessary, increasing the total cost.
Limited developer control. There is essentially no way to customize the result, for example to trade off latency, throughput, and accuracy to match an application’s requirements.
No flexibility for inference. These platforms are designed for simplicity and only provide a cloud-hosted endpoint, making them unsuitable for edge use cases or applications that require higher levels of security.
Risk of end-of-life. Cloud providers make the bulk of their AI revenue from structured/tabular data platforms, since these align closely with the established industries of OLAP, data warehousing, and analytics. The result is underinvestment in AutoML solutions for computer vision.

A Better Approach to Building CV Models

Then-Google AI researcher Sam Wookey realized that companies without Google’s unlimited engineering and data labeling budgets needed a way to harness the power of deep-learning-based computer vision. He started Masterful AI to address the problems he saw in existing approaches to building CV models. To improve the user experience, the team recently launched the Masterful CLI Trainer, a low-code command-line tool for training models.

It differs from cloud-provider AutoML services in a key way: Masterful is designed to be data-centric. The focus is on extracting the most information from a customer’s data, both labeled and unlabeled. The input is data plus a short YAML configuration file; the output is a trained model saved to disk. Training runs on customer hardware (or customer-provisioned cloud instances), and the only hardware requirement is a standard GPU-accelerated Linux machine.

The packaging of the Masterful CLI Trainer represents a modern approach to building CV models. Many capabilities started out requiring developers to build their own tooling and eventually evolved into well-productized software components (e.g., databases, application servers, credit card processing, authentication, autoscaling). Masterful represents the same productization and simplification for one essential goal: an accurate CV model. Just as today’s developers would reach for MySQL, Stripe, Node.js, or Kubernetes before implementing something bespoke, the Masterful team’s ambition is to become developers’ first choice when building computer vision capabilities into their applications.

Not Just A YAML Wrapper Around an API

The YAML configuration file is descriptive, not prescriptive. It specifies only a small number of necessary metadata fields and asks the developer to define the objective for the platform, such as an accurate binary classification model. The YAML never grows longer than the following 19 lines (aside from comments, of course). The key design philosophy is that the YAML is not a thin wrapper mirroring entire TensorFlow/PyTorch modules: it truly represents a higher level of abstraction.

dataset:
  root_path: s3://masterful-public/datasets/cifar10
  splits: [train, val, test, unlabeled]
  label_map: label_map
  optimize: True
model:
  architecture: efficientnetb0_v1_small
  num_classes: 10
  input_shape: [32,32,3]
training:
  task: classification
  training_split: train
  validation_split: val
  unlabeled_split: unlabeled
output:
  formats: [saved_model, onnx]
  path: ~/model_output
evaluation:
  split: test
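To make the "descriptive" point concrete, the sketch below shows the kind of cross-field check such a config makes possible: the training and evaluation sections refer to splits by name, so a tool can confirm each name is declared under dataset.splits before any training starts. It is a minimal Python illustration, not Masterful's validation logic; check_config and its rules are hypothetical, and the config is written out as the nested dict a YAML parser (such as PyYAML's yaml.safe_load) would produce, so the example needs only the standard library.

```python
from typing import Any, Dict, List

# The YAML above, as the nested dict a YAML parser would return.
CONFIG: Dict[str, Any] = {
    "dataset": {
        "root_path": "s3://masterful-public/datasets/cifar10",
        "splits": ["train", "val", "test", "unlabeled"],
        "label_map": "label_map",
        "optimize": True,
    },
    "model": {
        "architecture": "efficientnetb0_v1_small",
        "num_classes": 10,
        "input_shape": [32, 32, 3],
    },
    "training": {
        "task": "classification",
        "training_split": "train",
        "validation_split": "val",
        "unlabeled_split": "unlabeled",
    },
    "output": {"formats": ["saved_model", "onnx"], "path": "~/model_output"},
    "evaluation": {"split": "test"},
}

REQUIRED_SECTIONS = ("dataset", "model", "training", "output", "evaluation")


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Hypothetical validator: return a list of problems (empty means OK).

    Masterful's actual checks are not described in the post.
    """
    problems: List[str] = []
    for section in REQUIRED_SECTIONS:
        if section not in cfg:
            problems.append(f"missing section: {section}")
    if problems:
        return problems

    # Every split referenced by name must be declared under dataset.splits.
    declared = set(cfg["dataset"]["splits"])
    referenced = {
        "training.training_split": cfg["training"].get("training_split"),
        "training.validation_split": cfg["training"].get("validation_split"),
        "training.unlabeled_split": cfg["training"].get("unlabeled_split"),
        "evaluation.split": cfg["evaluation"].get("split"),
    }
    for key, split in referenced.items():
        if split is not None and split not in declared:
            problems.append(f"{key} refers to undeclared split {split!r}")

    # Basic sanity checks on the model section.
    if cfg["model"]["num_classes"] < 1:
        problems.append("model.num_classes must be a positive integer")
    if len(cfg["model"]["input_shape"]) != 3:
        problems.append("model.input_shape should be [height, width, channels]")
    return problems


print(check_config(CONFIG))  # prints []
```

Pointing evaluation.split at a name that is not declared, say holdout, would make check_config return a single message naming that field.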

Power Under the Hood

Despite the simple interface, the platform is built on original, state-of-the-art algorithms that aim to extract maximum accuracy from the least amount of labeled data.

One of the key differentiating techniques is semi-supervised learning (SSL), which trains on unlabeled data alongside labeled data. A high-performance, robust implementation of SSL algorithms is complex and time-consuming to build, and Masterful excels at productizing SSL for CV. For the budget holder of a CV project, SSL can reduce labeling budgets by an order of magnitude. For developers, improving model accuracy no longer depends on approval from budget holders; it is entirely within the developer’s control: gather unlabeled data and kick off a new training run.
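Masterful's SSL algorithms are not described in this post, so the sketch below uses a well-known generic technique instead: confidence-thresholded pseudo-labeling, a building block of self-training methods such as FixMatch. A model trained on the labeled split predicts class probabilities for unlabeled images, and only confident predictions are kept as extra labels for the next training run. The function names, the 0.95 threshold, and the example probabilities are hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[Sequence[float]], threshold: float = 0.95
) -> List[Tuple[int, int]]:
    """Return (unlabeled_index, pseudo_label) pairs for confident predictions.

    probs[i] is the model's predicted class distribution for unlabeled example i.
    Predictions whose top probability falls below `threshold` are discarded.
    """
    selected: List[Tuple[int, int]] = []
    for i, dist in enumerate(probs):
        # argmax over classes, without external dependencies
        label = max(range(len(dist)), key=dist.__getitem__)
        if dist[label] >= threshold:
            selected.append((i, label))
    return selected


def expand_training_set(
    labeled: List[Tuple[Any, int]],
    unlabeled: Sequence[Any],
    probs: Sequence[Sequence[float]],
    threshold: float = 0.95,
) -> List[Tuple[Any, int]]:
    """Append confidently pseudo-labeled examples to the labeled set."""
    pseudo = [(unlabeled[i], label) for i, label in select_pseudo_labels(probs, threshold)]
    return labeled + pseudo


# Hypothetical model outputs for three unlabeled images over three classes.
probs = [[0.97, 0.02, 0.01], [0.50, 0.30, 0.20], [0.01, 0.01, 0.98]]
print(select_pseudo_labels(probs))  # prints [(0, 0), (2, 2)]
```

Lowering the threshold admits more unlabeled examples at the cost of noisier labels; production SSL methods add safeguards such as data augmentation and consistency losses that this sketch omits.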

Saving Developer Time

Another thing we liked about the Masterful CLI Trainer is that it saves developers the time spent experimenting with hyperparameters and wrangling the associated experiment-tracking tools. Masterful has a built-in set of experiments that automatically tune hyperparameters for maximum accuracy and training speed. Because these automated experiments are specific to CV rather than black-box, the product implements optimized, standardized hyperparameter tuning that is orders of magnitude faster than traditional black-box approaches.
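For contrast, here is what the traditional black-box approach looks like in its simplest form: an exhaustive grid search that treats each training run as an opaque function and tries every combination. The hyperparameter names, values, and the toy scoring function (a stand-in for a full training run followed by validation) are hypothetical; the sketch does not model Masterful's CV-specific tuning, which the post does not detail.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional, Tuple

Params = Dict[str, Any]


def grid_search(
    space: Dict[str, List[Any]], train_and_score: Callable[[Params], float]
) -> Tuple[Optional[Params], float, int]:
    """Exhaustive black-box search: one call to train_and_score per grid point."""
    names = list(space)
    best_params: Optional[Params] = None
    best_score = float("-inf")
    runs = 0
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)  # in practice, a full training run
        runs += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, runs


# Hypothetical search space: 3 x 3 x 2 = 18 full training runs.
SPACE = {
    "learning_rate": [1e-3, 3e-4, 1e-4],
    "batch_size": [32, 64, 128],
    "weight_decay": [0.0, 1e-4],
}


def toy_score(p: Params) -> float:
    # Hypothetical stand-in for validation accuracy; peaks at lr=3e-4, batch=64, wd=1e-4.
    return -(
        abs(p["learning_rate"] - 3e-4) * 1000
        + abs(p["batch_size"] - 64) / 64
        + abs(p["weight_decay"] - 1e-4) * 1000
    )


best, score, runs = grid_search(SPACE, toy_score)
print(best, runs)  # runs == 18
```

The run count is the product of the grid sizes, so each extra hyperparameter multiplies the number of full training runs; that cost is what CV-specific tuning aims to avoid.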

Finally, the Masterful CLI Trainer bakes in many best practices, such as high-speed data preprocessing to maximize GPU utilization, creating separate training and inference models, automatically evaluating performance on a hold-out set, and visualizing the results. This saves developers the time of writing and debugging boilerplate code.
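The first of those practices, overlapping data preprocessing with training so the GPU is not left idle waiting for input, can be sketched with a background thread feeding a bounded queue. This is a generic, standard-library illustration using a hypothetical prefetch helper, not Masterful's pipeline; in practice frameworks provide this directly (for example, tf.data's Dataset.prefetch or PyTorch's DataLoader worker processes).

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def prefetch(
    items: Iterable[T], preprocess: Callable[[T], U], buffer_size: int = 4
) -> Iterator[U]:
    """Hypothetical helper: yield preprocess(item) for each item, computing up to
    buffer_size results ahead on a background thread so the consumer
    (e.g. a training step) rarely waits for input."""
    buf: queue.Queue = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        try:
            for item in items:
                buf.put(("ok", preprocess(item)))  # blocks while the buffer is full
        except Exception as exc:  # forward errors to the consumer
            buf.put(("error", exc))
        finally:
            buf.put(("done", None))  # always mark the end of the stream

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        kind, payload = buf.get()
        if kind == "ok":
            yield payload
        elif kind == "error":
            raise payload
        else:
            break
    worker.join()


print(list(prefetch(range(5), lambda i: i * 2)))  # prints [0, 2, 4, 6, 8]
```

Because of CPython's global interpreter lock, a thread only helps when preprocessing spends its time in I/O or in native code that releases the lock; pure-Python work needs processes instead, which is one reason PyTorch's DataLoader uses worker processes.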

Conclusion

With its intuitive interface, the Masterful CLI Trainer is a low-code CV model development platform that delivers on AutoML’s promise of simplicity while producing production-ready results. It’s free for personal use and commercial evaluations at masterfulai.com. A demo can be requested by contacting learn@masterfulai.com.

TheSequence
