🕑 Edge#49: An Intro to Time-Series Forecasting

TheSequence is a convenient way to build and reinforce your knowledge about machine learning and AI

Dec 22, 2020

In this issue:

we provide an introduction to time-series forecasting models;
we discuss how Uber uses neural networks to forecast during extreme events;
we explore Uber’s M3 time-series platform.

Enjoy the learning!

💡 ML Concept of the Day: An Intro to Time-Series Forecasting

In the next series of TheSequence Edges, we would like to dive deep into the subject of time-series forecasting. Although it is considered one of the classic use cases for machine learning, time-series forecasting is surprisingly tricky to master. Part of the challenge is that it is one of those disciplines that spans everything from classical statistics to modern deep learning. As a result, the number and diversity of methods can be overwhelming. At the same time, it feels as if recent advancements in deep learning research haven’t done as much for time-series forecasting as they have for other disciplines, such as computer vision or natural language understanding. Nonetheless, time-series forecasting remains one of the most popular use cases for machine learning techniques.

How to understand time-series forecasting? Conceptually, a time-series forecasting model attempts to predict the value of a target variable for a given entity at a given time. Typically, entities represent logical groupings of temporal information, such as the orders in a stock order book or the measurements from a temperature sensor. The two most important dimensions for understanding a time-series forecasting model are the nature of the problem and the methods used. Even though there are many types of time-series problems, most of them fall into one of the following categories:

Univariate: Problems that model a single series of information over time.
Multivariate: Problems that model multiple, interrelated series of information over time.
Multi-step: Problems that attempt to forecast multiple steps into the future.
Multivariate, Multi-step: Problems that forecast multiple steps into the future for several interrelated series.
Classification: Problems that predict a discrete class given an input time-series.
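To make the univariate and multi-step categories more concrete, here is a minimal sketch of how a single series is commonly reframed as supervised learning examples with a sliding window. The function name make_windows, its parameters n_in and n_out, and the temperature readings are illustrative assumptions, not something taken from a particular library.

```python
def make_windows(series, n_in, n_out):
    """Split a univariate series into (input window, future targets) pairs."""
    pairs = []
    # The last valid start index leaves room for n_in inputs plus n_out targets.
    for start in range(len(series) - n_in - n_out + 1):
        inputs = series[start:start + n_in]
        targets = series[start + n_in:start + n_in + n_out]
        pairs.append((inputs, targets))
    return pairs


# Hypothetical hourly readings from a temperature sensor.
hourly_temps = [20.1, 20.4, 21.0, 21.7, 22.3, 22.0, 21.2]

# Use the last 3 readings to predict the next 2 (a multi-step problem).
windows = make_windows(hourly_temps, n_in=3, n_out=2)
for inputs, targets in windows:
    print(inputs, "->", targets)
```

Setting n_out=1 gives the single-step version of the problem, and a multivariate variant would simply carry several values per time step instead of one.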

From traditional statistics, time-series forecasting methods can be classified using the following categories:

Benchmark Forecasting: Simple methods such as the naïve forecast or the geometric random walk that serve as baselines and help build forecasting intuition. These methods are rarely used on their own in complex scenarios.
Exponential Smoothing Forecasting: Methods that smooth out the variability within a series by weighting recent observations more heavily than older ones. This group includes techniques such as simple exponential smoothing or Holt’s linear trend.
Autoregressive Forecasting: Methods such as the famous ARIMA or SARIMA that regress the current value on observations from previous time steps, combined with differencing and moving-average terms.
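The three families above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the autoregressive example is a bare AR(1) fit by least squares, which is the autoregressive building block of ARIMA and not a full ARIMA model, and the function names and the demand series are made up for the example.

```python
def naive_forecast(series):
    """Benchmark: the next value is assumed to equal the last observed value."""
    return series[-1]


def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after the last observation (0 < alpha <= 1).

    The final level is also the one-step-ahead forecast.
    """
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    return mean_y - phi * mean_x, phi


demand = [10.0, 12.0, 14.0, 16.0, 18.0]  # hypothetical, steadily growing series

naive = naive_forecast(demand)
smoothed = simple_exponential_smoothing(demand, alpha=0.5)
c, phi = fit_ar1(demand)
ar1 = c + phi * demand[-1]

print(naive, smoothed, ar1)
```

On a trending series like this one, the naïve and smoothed forecasts lag behind the trend while the autoregressive fit picks it up, which hints at why methods such as Holt’s linear trend add an explicit trend component.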

In recent years, deep neural networks have become one of the most effective mechanisms to apply to time-series forecasting problems. Techniques such as convolutional neural networks (CNNs), recurrent neural networks (RNNs) and even attention-based models are rapidly expanding the categories of time-series forecasting techniques. These days, it’s very common to find techniques such as long short-term memory networks (LSTMs) and CNNs attacking the same problems typically handled by models such as ARIMA or SARIMA. We will learn more about them in the next few editions of this newsletter.

🔎 ML Research You Should Know: How Uber Uses Time-Series Forecasting to Predict Extreme Events

In a paper titled "Time-series Extreme Event Forecasting with Neural Networks at Uber", researchers from Uber present a neural network architecture used to forecast during extreme events.

The objective: Extreme events and anomalies are part of most businesses. Uber’s paper illustrates the techniques they used to perform accurate forecasting of those events.

Why it is so important: The paper shows how a neural network architecture can enable time-series forecasting in high-variance conditions that deviate from the norm.

Diving deeper: Performing during extreme events is the ultimate test for time-series forecasting models. This is even more important for businesses like Uber, in which extreme events are more frequent than you might think. Pick your favorite: holidays, concerts, sports events, bad weather. All those events can affect user demand and rider availability to the point of disrupting the service. From that perspective, forecasting during extreme events at Uber is a key requirement rather than a luxury.

As you might suspect by now, Uber had to architect its own forecasting models, as most standard time-series forecasting packages fail to perform during anomalous conditions at the scale and frequency of the transportation giant. Most statistical methods require manual tuning to set extreme event parameters. Other methods facilitate the modeling of extreme events using exogenous variables, but they suffer from the curse of dimensionality and require frequent retraining. Faced with this reality, Uber turned its attention to long short-term memory (LSTM) networks as one of the most powerful architectures for time-series forecasting.

We discussed LSTMs in Edge#41 and Edge#43. Conceptually, LSTMs are a variation of recurrent neural networks that use memory cells to store past information, which makes them a good candidate for time-series forecasting. LSTMs can also handle large, multi-dimensional datasets. With an architecture decision in mind, Uber then assembled a large dataset of trips across multiple cities containing many extreme events. The dataset included many exogenous variables: weather conditions such as precipitation or wind speed, city-level information such as trips in progress at any given time within a specific geographic area, and local holidays or events. Extreme events are, by definition, infrequent, which makes them difficult to forecast. The first architecture implemented by Uber was a single LSTM which, to the frustration of the Uber engineering team, failed to outperform baseline time-series models after being trained on the aforementioned dataset.
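To make the memory-cell idea concrete, here is a minimal sketch of a single step of the standard LSTM cell, reduced to scalars so it runs in plain Python. It is not Uber’s model: the parameter layout and the hand-picked weights are assumptions chosen only to show how the forget and input gates let the cell keep past information.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps each gate name to a hypothetical (w_x, w_h, bias) triple.
    """
    def pre_activation(gate):
        w_x, w_h, bias = params[gate]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre_activation("forget"))        # how much old memory to keep
    i = sigmoid(pre_activation("input"))         # how much new information to write
    o = sigmoid(pre_activation("output"))        # how much memory to expose
    g = math.tanh(pre_activation("candidate"))   # candidate new information

    c = f * c_prev + i * g      # updated memory cell
    h = o * math.tanh(c)        # updated hidden state (the cell's output)
    return h, c


# Hand-picked weights (an assumption for illustration, not trained values):
# forget gate almost fully open, input gate almost fully closed.
params = {
    "forget": (0.0, 0.0, 20.0),
    "input": (0.0, 0.0, -20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 0.8
for x in [5.0, -3.0, 2.0]:
    h, c = lstm_step(x, h, c, params)

print(h, c)
```

With these gate settings the memory cell stays very close to its initial value no matter what inputs arrive, which is the mechanism that lets a trained LSTM carry information across many time steps.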

The failure of the basic LSTM method to forecast during extreme events should not come as a surprise. Training a single model per time-series for millions of metrics is impractical. Additionally, the single LSTM model repeatedly failed to adapt to out-of-sample data from extreme events, which led to poor performance. To address those challenges, Uber switched to an architecture with two LSTMs: one to model uncertainty via features and the other to produce the forecast. The first LSTM performs automated feature extraction, which is key to modeling complex events. The resulting feature vectors are then aggregated using an ensemble technique, and the aggregate is passed to the second LSTM to produce a forecast.

Image credit: Columbia.edu

The new architecture outperformed the single LSTM model by over 14% and the classical time-series models by over 25%. The architecture produced by Uber shows the potential of deep neural networks in complex time-series forecasting scenarios and may be applicable to many other domains.

🤖 ML Technology to Follow: M3 is the Platform Powering Time-Series at Uber

Why should I know about this: One of the most important challenges of time-series forecasting scenarios is how to capture and store time-series data. M3 was created and open-sourced by Uber to capture real-time metrics across its business operations. These metrics are then processed by many machine learning models.

What is it: Time-series data is a core element of the Uber experience across its different apps. As a result, time-series analysis is arguably even more relevant at Uber than at other types of large-scale businesses. Initially, Uber relied on traditional time-series stacks such as Graphite, Nagios, StatsD and Prometheus to power its time-series metrics. While that technology stack worked for a while, it was not able to keep up with Uber’s stratospheric growth, so by 2015 the company needed its own time-series infrastructure. That was the origin of M3, a scalable, low-latency time-series processing and storage platform that has become one of the most important building blocks of Uber’s forecasting architecture.

High scalability and low latency are key principles of the M3 architecture. At any given second, M3 processes 500 million metrics and persists another 20 million aggregated metrics. Extrapolated to a 24-hour cycle (520 million metrics per second times 86,400 seconds), that comes to around 45 trillion metrics per day, which is far beyond the performance of conventional time-series infrastructure. To handle that throughput, M3 relies on an architecture based on the following components:

M3DB: M3DB is a distributed time-series database that provides scalable storage and a reverse index of time-series. It is optimized as a cost-effective, reliable metrics store and index for both real-time and long-term retention.
M3 Query: M3 Query is a service that houses a distributed query engine for querying both real-time and historical metrics, supporting several different query languages. It is designed to support both low-latency real-time queries and queries that can take longer to execute, aggregating over much larger datasets, for analytical use cases.
M3 Aggregator: M3 Aggregator is a service that runs as a dedicated metrics aggregator and provides stream-based downsampling, based on dynamic rules stored in etcd (a consistent and highly available key-value store, also used as Kubernetes' backing store for all cluster data).
M3 Coordinator: M3 Coordinator is a service that coordinates reads and writes between upstream systems, such as Prometheus, and M3DB.
M3QL: A query language optimized for time-series data.
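The downsampling idea behind the M3 Aggregator can be illustrated with a small conceptual sketch. This is not M3’s implementation or API: it works on an in-memory batch instead of a stream, it does not model the rules stored in etcd, and the downsample function, its parameters and the sample points are all hypothetical.

```python
from collections import defaultdict


def downsample(points, window_seconds, aggregate=max):
    """Group (unix_timestamp, value) points into fixed windows and aggregate each.

    Returns a list of (window_start, aggregated_value) sorted by window start.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        window_start = ts - (ts % window_seconds)  # align to the window boundary
        buckets[window_start].append(value)
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]


# Hypothetical raw metric samples: (seconds since start, value).
raw = [(0, 3.0), (4, 5.0), (9, 4.0), (10, 7.0), (15, 6.0), (21, 1.0)]

rollup_max = downsample(raw, window_seconds=10)
rollup_mean = downsample(raw, window_seconds=10,
                         aggregate=lambda values: sum(values) / len(values))

print(rollup_max)
print(rollup_mean)
```

Rolling six raw points up into three keeps the shape of the signal while storing half the data, which is the basic trade-off that makes long-term retention of metrics affordable.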

The relationship between the core M3 components is shown in the following figure:

Image credit: M3DB

Among the components of Uber’s M3 architecture, M3DB has seen wide adoption within the time-series forecasting community. The time-series storage engine is written in Go and features native distributed storage as well as in-memory querying capabilities. All these features make it a powerful option for time-series forecasting architectures.

Following its release by Uber, M3 has been adopted by other technology giants such as LinkedIn and Walmart. This is certainly a technology to keep on your radar when thinking about building time-series forecasting solutions.

How can I use it: M3 is open source and available at https://m3db.io/

🧠 The Quiz

Now, to our regular quiz. After ten quizzes, we will reward the winners. The questions are the following:

Which type of time-series forecasting model uses several linear regression models to predict the value of one variable from previous observations?
Which neural network architecture did Uber rely on for forecasting during extreme events?


That was fun! Thank you. See you on Thursday 😉

TheSequence is a summary of groundbreaking ML research papers, engaging explanations of ML concepts, and exploration of new ML frameworks and platforms. TheSequence keeps you up to date with the news, trends, and technology developments in the AI field.

5 minutes of your time, 3 times a week – you will steadily become knowledgeable about everything happening in the AI space.
