Edge#2: AutoML, AutoML-Zero and the spell of TransmogrifAI
This issue is dedicated to AutoML, one of the most popular topics in modern ML:
we explain the concept of AutoML;
we discuss AutoML-Zero, which proposes a method to expand the frontiers of AutoML models;
we speak about TransmogrifAI, an open-source framework that Salesforce.com used to build Einstein.
💡 ML Concept of the Day: What is AutoML?
Automated machine learning, or AutoML, is one of the most popular topics in modern ML but, like any overhyped topic, it is constantly subject to misinterpretation. The ideas behind AutoML have been around for years, but the concept was popularized by Google with the launch of its AutoML services in Google Cloud. The goal of AutoML is to use ML to automate the end-to-end development of ML programs. In that sense, an AutoML program receives a raw dataset as input and, ideally, produces a deployable ML model.
Building an ML model is a multistep process that requires domain knowledge, mathematical expertise, and computer science skills. Given a dataset and a target task as input, AutoML uses ML to automate tasks such as model selection, feature engineering, hyperparameter optimization, and many other stages of the ML lifecycle.
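To make the "raw dataset in, model out" idea concrete, here is a minimal sketch of the model-selection and hyperparameter-search part of AutoML, assuming a tiny one-feature regression problem. The three model families, the hyperparameter grid, and the function names (`auto_ml`, `fit_knn`, and so on) are illustrative assumptions. Real AutoML systems also automate feature engineering and use much smarter search strategies than this exhaustive loop.

```python
import random
from statistics import mean

def fit_mean(xs, ys, **_):
    # Baseline model: always predict the training mean.
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys, **_):
    # Ordinary least squares for a single feature: y = a * x + b.
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    b = my - a * mx
    return lambda x: a * x + b

def fit_knn(xs, ys, k=3):
    # k-nearest-neighbour regression: average the targets of the k closest points.
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

# Hypothetical search space: (model family, fit function, hyperparameter configs).
SEARCH_SPACE = [
    ("mean", fit_mean, [{}]),
    ("linear", fit_linear, [{}]),
    ("knn", fit_knn, [{"k": k} for k in (1, 3, 5)]),
]

def mse(model, xs, ys):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def auto_ml(xs, ys, val_fraction=0.25, seed=0):
    # Hold out part of the raw data for validation.
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_val = max(1, int(len(idx) * val_fraction))
    val, train = idx[:n_val], idx[n_val:]
    tx, ty = [xs[i] for i in train], [ys[i] for i in train]
    vx, vy = [xs[i] for i in val], [ys[i] for i in val]
    # Model selection and hyperparameter search: try every (family, config) pair.
    best = None
    for name, fit, configs in SEARCH_SPACE:
        for params in configs:
            score = mse(fit(tx, ty, **params), vx, vy)
            if best is None or score < best[0]:
                best = (score, name, params)
    _, name, params = best
    # Refit the winner on all the data to obtain the "deployable" model.
    fit = next(f for n, f, _ in SEARCH_SPACE if n == name)
    return name, params, fit(xs, ys, **params)

xs = [float(i) for i in range(20)]
ys = [2 * x + 1 for x in xs]
name, params, model = auto_ml(xs, ys)
print(name, params)  # → linear {}
```

Because the toy data is exactly linear, the validation error picks the linear family. On real data, the same loop would simply return whichever candidate generalizes best on the held-out split.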
AutoML is often confused with another important trend in self-service ML known as neural architecture search (NAS). Conceptually, NAS can be seen both as a subset and as an enabler of AutoML. While NAS focuses on automating the design of neural network architectures, AutoML tries to cover the complete lifecycle of ML applications. Advances in NAS have greatly helped catalyze the AutoML movement, and both concepts are likely to play an important role in making ML more mainstream.
🔎 ML Research You Should Know About: AutoML-Zero Builds ML Models Starting with Basic Math
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch was published in 2020 by Google Research. The paper proposes a method to expand the frontiers of AutoML models.
The objective: To be effective, existing AutoML methods depend on sophisticated, expert-designed building blocks and strongly constrained search spaces. AutoML-Zero instead attempts to discover ML algorithms starting from basic mathematical operations.
Why is it so important: AutoML-Zero offers a glimpse of what the future of ML could look like. The method can discover new ML algorithms with few restrictions on the search space. To some extent, AutoML-Zero can help rediscover ML from scratch.
Diving deeper: AutoML-Zero borrows its name from DeepMind's AlphaZero, which minimized human intervention. Similarly, AutoML-Zero aims to automate the creation of ML models by removing many of the traditional constraints on its search space. Most AutoML algorithms rely on carefully curated building blocks, such as sets of algorithms and hyperparameter configurations, that form their search space. Those building blocks not only take time to create but also limit the discovery of brand-new algorithms: you can't discover what you do not search for. AutoML-Zero takes a step towards addressing that challenge by searching over all aspects of an ML algorithm, such as the model structure and the learning strategy, while reducing human bias.
We start with basic math and discover ML algorithms. Hmmm, does that sound like evolution to you? Not surprisingly, Google relied on evolutionary algorithms as part of the core design of AutoML-Zero. Somewhat surprisingly, these algorithms were able to find high-performing ML algorithms for a given task and dataset in an enormous search space.
The use of evolutionary methods is central to AutoML-Zero. Each candidate ML algorithm is represented as a program made of three component functions:
Setup function: initializes the algorithm's variables.
Learn function: updates the variables used for prediction, based on a training example and its label.
Predict function: computes the prediction for an input example.
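The three-function representation can be illustrated with a toy interpreter. This is a minimal sketch, not Google's code: the instruction format, the register names (`x` for the input, `y` for the label, `p` for the prediction), and the three-operation instruction set are simplifying assumptions, whereas the paper uses scalar, vector and matrix memory and many more operations. The hand-written program below expresses linear regression trained with stochastic gradient descent purely as basic arithmetic: Setup runs once, then Predict and Learn run on each training example.

```python
from collections import defaultdict

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def run(program, mem):
    """Execute a list of instructions against a shared memory of scalar registers."""
    for instr in program:
        if instr[0] == "const":
            _, dst, value = instr          # ("const", dst, value): load a constant
            mem[dst] = value
        else:
            op, dst, a, b = instr          # (op, dst, src1, src2): dst = src1 op src2
            mem[dst] = OPS[op](mem[a], mem[b])

# Hypothetical hand-written program: linear regression trained with SGD.
SETUP = [("const", "w", 0.0), ("const", "b", 0.0), ("const", "lr", 0.1)]
PREDICT = [("mul", "t", "w", "x"), ("add", "p", "t", "b")]   # p = w * x + b
LEARN = [
    ("sub", "err", "y", "p"),   # err = y - p
    ("mul", "g", "lr", "err"),  # g = lr * err
    ("mul", "t", "g", "x"),     # t = lr * err * x
    ("add", "w", "w", "t"),     # w += lr * err * x
    ("add", "b", "b", "g"),     # b += lr * err
]

def evaluate(setup, predict, learn, train, valid):
    """Run Setup once, Predict then Learn on each training example, then score Predict."""
    mem = defaultdict(float)    # every register starts at 0.0
    run(setup, mem)
    for x, y in train:
        mem["x"] = x
        mem["y"] = 0.0          # our own convention: hide the label from Predict
        run(predict, mem)
        mem["y"] = y            # the label is only visible to Learn
        run(learn, mem)
    total = 0.0
    for x, y in valid:
        mem["x"] = x
        run(predict, mem)
        total += (mem["p"] - y) ** 2
    return total / len(valid)

xs = [i / 20 for i in range(20)]
train = [(x, 3 * x + 2) for x in xs] * 200            # 200 passes over the data
valid = [(x + 0.025, 3 * (x + 0.025) + 2) for x in xs[:5]]
print(evaluate(SETUP, PREDICT, LEARN, train, valid))   # close to zero
print(evaluate(SETUP, PREDICT, [], train, valid))      # no learning: large error
```

In AutoML-Zero, programs like these are not written by hand: the search starts from minimal programs and the evolutionary process assembles the instructions.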
With this representation in place, AutoML-Zero searches over programs with an evolutionary algorithm that repeats the following steps on a population of candidate algorithms:
Step 1: Remove the oldest algorithm.
Step 2: Choose a random subset of the remaining algorithms and select the best-performing one.
Step 3: Copy the best algorithm.
Step 4: Mutate the copy and add it to the population as its newest member.
This cycle is repeated many times, so the population gradually fills with better-performing ML algorithms for the given task.
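The four-step cycle above is a form of regularized (aging) evolution. Below is a minimal sketch, assuming a deliberately tiny problem: each candidate is a three-instruction Predict-style program over a few registers, and fitness is the negative mean squared error against a hypothetical target function. The operation set, population size, tournament size, and single-instruction mutation are our assumptions. AutoML-Zero evolves all three component functions and uses a richer set of mutations.

```python
import random
from collections import deque

OPS = ("add", "sub", "mul")
DESTS = ("r1", "r2", "p")          # writable registers; "p" holds the prediction
SOURCES = ("x", "r1", "r2", "p")   # readable registers; "x" holds the input

def random_instr(rng):
    return (rng.choice(OPS), rng.choice(DESTS), rng.choice(SOURCES), rng.choice(SOURCES))

def predict(program, x):
    mem = {"x": x, "r1": 0.0, "r2": 0.0, "p": 0.0}
    for op, dst, a, b in program:
        u, v = mem[a], mem[b]
        mem[dst] = u + v if op == "add" else (u - v if op == "sub" else u * v)
    return mem["p"]

def fitness(program, data):
    """Higher is better: the negative mean squared error on the data."""
    return -sum((predict(program, x) - y) ** 2 for x, y in data) / len(data)

def mutate(parent, rng):
    child = list(parent)                                   # Step 3: copy the parent
    child[rng.randrange(len(child))] = random_instr(rng)   # Step 4: mutate one instruction
    return child

def regularized_evolution(data, pop_size=20, tournament=5, cycles=2000, length=3, seed=0):
    rng = random.Random(seed)
    population = deque()                                   # oldest on the left
    for _ in range(pop_size):
        prog = [random_instr(rng) for _ in range(length)]
        population.append((prog, fitness(prog, data)))
    best = max(population, key=lambda item: item[1])
    for _ in range(cycles):
        population.popleft()                               # Step 1: remove the oldest
        sample = rng.sample(list(population), tournament)  # Step 2: random subset...
        parent = max(sample, key=lambda item: item[1])[0]  # ...and keep its best member
        child = mutate(parent, rng)                        # Steps 3 and 4
        entry = (child, fitness(child, data))
        population.append(entry)                           # the child becomes the newest member
        if entry[1] > best[1]:
            best = entry                                   # remember the best program seen so far
    return best

# Hypothetical target: y = x * x + x on inputs in [-1, 1].
data = [(x / 10, (x / 10) ** 2 + x / 10) for x in range(-10, 11)]
best_prog, best_fit = regularized_evolution(data)
print(best_fit, best_prog)
```

Removing the oldest member rather than the worst is what makes this evolution "regularized": even a strong program eventually ages out unless its descendants keep performing well. For this target, two instructions such as `("mul", "r1", "x", "x")` followed by `("add", "p", "r1", "x")` fit the data exactly; whether a given run reaches such a program depends on the seed and the number of cycles.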
One of the most fascinating things about AutoML-Zero is that, instead of using building blocks such as convolutional or max-pooling layers in the search space, it relies on basic mathematical operations. That enables the discovery of new ML models while minimizing the human subjectivity and bias involved in the process. In the paper, AutoML-Zero rediscovers techniques such as gradient descent and two-layer neural networks trained with backpropagation on its own. Isn't that cool?
🤖 ML Technology to Follow: TransmogrifAI is the AutoML Framework Powering Salesforce.com Einstein
Why should I know about this: Salesforce Einstein is one of the largest AutoML solutions in the world, powering hundreds of automated workflows for Salesforce customers. TransmogrifAI is the open-source framework Salesforce.com used to build Einstein.
What is it: TransmogrifAI is an open-source framework for automating the creation of ML models. The framework is written in Scala and runs on top of Apache Spark. From a design standpoint, TransmogrifAI is based on four fundamental principles:
Automation: TransmogrifAI includes many building blocks that automate tasks such as feature engineering and model selection.
Modularity: TransmogrifAI enforces a clear separation between ML workflows and data manipulation, ensuring that any model created with the framework is modular.
Compile-Time Safety: The models produced by TransmogrifAI are strongly typed, which allows many errors to be caught at compile time.
Transparency: The strongly typed nature of TransmogrifAI also enables transparency in terms of the inputs and outputs at any layer of an ML model.
The architecture of TransmogrifAI is based on a series of building blocks that abstract the creation of ML models. The main building blocks are features, stages, workflows, and readers.
TransmogrifAI features are essentially pointers to columns in a Spark DataFrame. More specifically, a feature definition includes its name, the type of data it contains, and lineage information about how it was derived.
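A minimal sketch of what such a feature definition might hold, written as a hypothetical Python class rather than TransmogrifAI's actual Scala API: a name, a value type, and parent features that record lineage. A plain list of dicts stands in for the Spark DataFrame, and the `Feature`, `lineage`, and `values` names are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Feature:
    """Hypothetical typed pointer to a column, plus how it was derived."""
    name: str
    ftype: type                          # type of the values in the column
    parents: Tuple["Feature", ...] = ()  # features this one was computed from
    stage: str = "raw"                   # name of the operation that produced it

    def lineage(self):
        # Walk back through the parents, listing "stage(name)" for every ancestor.
        steps = []
        for parent in self.parents:
            steps.extend(parent.lineage())
        steps.append(f"{self.stage}({self.name})")
        return steps

    def values(self, rows):
        # Read this feature's column from a table of dict rows, checking the declared type.
        column = [row[self.name] for row in rows]
        bad = [v for v in column if not isinstance(v, self.ftype)]
        if bad:
            raise TypeError(f"{self.name}: expected {self.ftype.__name__}, got {bad[0]!r}")
        return column

age = Feature("age", float)
fare = Feature("fare", float)
fare_per_age = Feature("fare_per_age", float, (fare, age), "divide")
print(fare_per_age.lineage())  # → ['raw(fare)', 'raw(age)', 'divide(fare_per_age)']
```

Python can only check the declared type at run time; in TransmogrifAI, the Scala type system performs this kind of check at compile time.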
In TransmogrifAI, manipulations performed on Features are called Stages. The current version of the framework includes two fundamental types of Stages: Transformers and Estimators.
Transformers specify functions for transforming one or more Features into one or more new Features. Transformers basically act as transformations applied to the Feature values of a single row of the input data.
Estimators specify algorithms that can be applied to one or more Features to produce Transformers, which in turn produce new Features. The important distinction is that Estimators can use information from entire columns, while Transformers only act within a single row.
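The row-versus-column distinction can be sketched in a few lines. This is a hypothetical Python illustration, not TransmogrifAI's Scala API. It follows the same Estimator/Transformer pattern as Spark ML, where fitting an Estimator yields a Transformer, but the class names and the standardization example are assumptions, and a list of dicts stands in for a DataFrame.

```python
from statistics import mean, pstdev

class Transformer:
    """Row-wise stage: computes a new value from the input values of one row."""
    def __init__(self, inputs, output, fn):
        self.inputs, self.output, self.fn = inputs, output, fn

    def transform(self, rows):
        # Each output value depends only on the same row's inputs.
        return [{**row, self.output: self.fn(*(row[c] for c in self.inputs))} for row in rows]

class StandardScalerEstimator:
    """Column-wide stage: learns statistics from a whole column, then returns a Transformer."""
    def __init__(self, column, output):
        self.column, self.output = column, output

    def fit(self, rows):
        values = [row[self.column] for row in rows]
        mu = mean(values)
        sigma = pstdev(values) or 1.0   # guard against a constant column
        return Transformer([self.column], self.output, lambda v: (v - mu) / sigma)

rows = [{"age": 20.0, "fare": 10.0}, {"age": 30.0, "fare": 20.0}, {"age": 40.0, "fare": 30.0}]
ratio = Transformer(["fare", "age"], "fare_per_age", lambda f, a: f / a)
scaler = StandardScalerEstimator("age", "age_z").fit(rows)
print(scaler.transform(ratio.transform(rows)))
```

The ratio transformer can compute each output from a single row, but the scaler cannot: it must see the whole `age` column to learn the mean and standard deviation before it can produce a row-wise Transformer.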
Workflows and Readers
After the features have been created, they can be materialized by adding the desired Features to a TransmogrifAI Workflow and feeding it a DataReader. Workflows are the TransmogrifAI component that controls the execution of an ML pipeline. DataReaders, in turn, define how data is loaded into the workflow: they load and process raw data to produce the DataFrame used by the workflow. Each DataReader is tied to a specific data source and to the type of raw data it loads.
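The division of labor between readers and workflows can be sketched as follows. Everything here is hypothetical and simplified (`CsvReader`, `Workflow`, `set_reader`, `ratio`, and `center` are illustrative names, not TransmogrifAI's API). The reader knows one source and one raw schema. The workflow runs each stage's fitting step over all rows, then applies the resulting function row by row to materialize the requested features.

```python
import csv
import io
from statistics import mean

class CsvReader:
    """Hypothetical DataReader: bound to one source (CSV text) and one raw schema."""
    def __init__(self, text, schema):
        self.text = text
        self.schema = schema  # column name -> converter for the raw string value

    def read(self):
        records = csv.DictReader(io.StringIO(self.text))
        return [{col: conv(rec[col]) for col, conv in self.schema.items()} for rec in records]

class Workflow:
    """Hypothetical workflow: executes stages in order to materialize result features."""
    def __init__(self, stages, result_features):
        self.stages = stages                  # list of (output feature name, fit function)
        self.result_features = result_features
        self.reader = None

    def set_reader(self, reader):
        self.reader = reader
        return self

    def train(self):
        rows = self.reader.read()             # the reader produces the initial table
        fitted = []
        for output, fit in self.stages:
            fn = fit(rows)                    # fitting may inspect every row (estimator-style)
            rows = [{**row, output: fn(row)} for row in rows]  # applying is row-wise
            fitted.append((output, fn))
        results = [{name: row[name] for name in self.result_features} for row in rows]
        return fitted, results

def ratio(num, den):
    # Transformer-style stage: needs nothing from other rows.
    return lambda rows: (lambda row: row[num] / row[den])

def center(column):
    # Estimator-style stage: learns the column mean, then returns a row-wise function.
    def fit(rows):
        m = mean(row[column] for row in rows)
        return lambda row: row[column] - m
    return fit

reader = CsvReader("age,fare\n20,10\n30,20\n40,30\n", {"age": float, "fare": float})
workflow = Workflow(
    [("fare_per_age", ratio("fare", "age")), ("age_centered", center("age"))],
    ["age_centered", "fare_per_age"],
).set_reader(reader)
fitted, results = workflow.train()
print(results)
```

Keeping the reader separate from the workflow means the same stages can be trained on a different source simply by swapping in another reader with the same raw schema.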
TransmogrifAI combines those building blocks in a simple programming model that lets data scientists produce fairly sophisticated ML models in a few lines of code. The integration with Apache Spark helps with operationalizing and scaling TransmogrifAI models using a well-established toolset. Finally, we can expect Salesforce to keep evolving TransmogrifAI as the main engine behind its Einstein ML services.
How can I use it: TransmogrifAI is free and open source, and is available at https://github.com/salesforce/TransmogrifAI. The framework has dependencies on Java and Spark.
TheSequence is a summary of groundbreaking ML research papers, engaging explanations of ML concepts, and exploration of new ML frameworks and platforms. TheSequence keeps you up to date with the news, trends, and technology developments in the AI field.
5 minutes of your time, 3 times a week – you will steadily become knowledgeable about everything happening in the AI space. Make it a gift for those who can benefit from it.