Guest post: Introducing Low-Latency Streaming Pipelines for Real-Time ML
Real-time ML is valuable for any use case that requires fresh data, such as fraud detection, product recommendations, and pricing
In TheSequence, we like to experiment with different formats, and today we introduce TheSequence Guest Post. Here we give our partners space to explain in detail the machine learning (ML) challenges they help solve. In this post, Tecton's team talks about real-time ML and the common challenges of processing ML features from streaming sources, and introduces low-latency streaming pipelines for real-time ML. It's a useful explanatory article from the feature store pioneers; give it a read.
What is Real-Time ML?
Real-Time ML means that predictions are generated online, at low latency, using real-time data: new events from the data sources are reflected in the model's predictions in real time. Real-time ML is valuable for any use case that requires fresh data, such as fraud detection, product recommendations, and pricing. These use cases need predictions based on the events of the past few seconds, not just the past day.
Challenges of Streaming Pipelines for Real-Time ML
Streaming platforms like Apache Kafka and AWS Kinesis are ubiquitous and are the most common data sources for Real-Time ML use cases. They provide raw data with latencies on the order of milliseconds. However, streaming data is much more difficult to operationalize than batch data. It presents the following technical challenges:
Building streaming pipelines. Most data scientists lack the expertise to build streaming pipelines, which require production-grade code and specialized stream-processing tools like Spark Streaming, Apache Flink, or Kafka Streams. So data scientists hand off their features to highly skilled data engineers, who build custom streaming pipelines that reimplement the data scientist's feature logic. This process can add weeks or months to the lead time for deploying a new model.
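To make the hand-off concrete, here is a minimal sketch of the kind of pipeline a data engineer ends up owning, written with Spark Structured Streaming. The Kafka topic, event schema, and console sink are assumptions for the example rather than details from any real deployment:

```python
# Minimal sketch of a hand-built streaming feature pipeline (requires the
# spark-sql-kafka connector package at submit time).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("custom-feature-pipeline").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("timestamp", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker address
       .option("subscribe", "transactions")                # assumed topic name
       .load())

events = (raw
          .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# The feature logic the data scientist wrote in a notebook now has to be
# re-expressed with watermarks, windows, and stateful aggregation semantics.
features = (events
            .withWatermark("timestamp", "10 minutes")
            .groupBy("user_id", F.window("timestamp", "30 minutes"))
            .agg(F.sum("amount").alias("amount_sum_30m")))

# Checkpointing, monitoring, restarts, and writing to an online store are all
# the pipeline owner's responsibility; the console sink is only a stand-in.
query = (features.writeStream
         .outputMode("update")
         .format("console")
         .option("checkpointLocation", "/tmp/checkpoints/user_features")
         .start())
query.awaitTermination()
```

None of this is feature logic per se; it is operational scaffolding that has to be rebuilt and maintained for every new streaming feature.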
Combining batch and streaming data. The streaming pipelines continuously process fresh feature values, which can be served online for real-time inference. But we still need to process historical data to generate training datasets and to backfill feature values during cold starts. Without a backfill, it may take weeks or months of processing before the stream alone fully populates the online store.
The historical data often doesn't reside in the stream itself, because streaming platforms are typically configured to retain only a limited amount of data. So data engineers need to build batch pipelines that efficiently process large-scale data from offline sources (e.g., a data lake or data warehouse) that mirror the stream (event logs or materialized tables of historical values). This, in turn, introduces more complexity. How do we ensure parity between the streaming and batch transformation logic? How do we ensure time consistency between online and offline data? When these requirements aren't met, features suffer from training/serving skew, which ultimately reduces prediction accuracy.
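One common mitigation, sketched below under assumed paths and column names, is to write the transformation once as a function over a DataFrame and apply it to both the stream and its batch mirror. Even then, backfill orchestration and point-in-time correctness remain the team's problem:

```python
# Continuing the previous sketch: one shared transformation for both paths.
from pyspark.sql import DataFrame, functions as F

def transaction_features(events: DataFrame) -> DataFrame:
    """Identical logic for the stream (online path) and the event log (offline path)."""
    return (events
            .withWatermark("timestamp", "10 minutes")  # ignored by Spark in batch queries
            .groupBy("user_id", F.window("timestamp", "30 minutes"))
            .agg(F.sum("amount").alias("amount_sum_30m")))

# Online path: fresh feature values from the Kafka stream (`events` from the sketch above).
streaming_features = transaction_features(events)

# Offline path: training data and backfills from the mirrored event log in the lake.
historical_events = spark.read.parquet("s3://lake/transactions/")  # assumed mirror location
historical_features = transaction_features(historical_events)
```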
Processing time window aggregations. As explained in our blog "Real-Time Aggregation Features for Machine Learning (Part 1)," processing rolling time window aggregations for real-time predictions in production poses a difficult problem: how can you efficiently serve features that aggregate large numbers of raw events (thousands and more), at very high scale (thousands of QPS), at low serving latency (well under 100 ms), with high freshness (well under 1 s), and with high feature accuracy (e.g., a guaranteed rather than approximate time window length)? This is a very hard problem that data engineers have to solve for every aggregation feature that gets rolled out to production.
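For intuition on why this is hard, consider the naive serving-time approach below (the event-store API is hypothetical): every request re-reads and re-aggregates all raw events in the window, so thousands of events per key times thousands of requests per second means millions of events touched every second for a single feature.

```python
# Naive request-time aggregation, sketched only to make the cost concrete.
from datetime import datetime, timedelta

def amount_sum_30m(event_store, user_id: str, now: datetime) -> float:
    window_start = now - timedelta(minutes=30)
    # Every request pays for reading and summing all raw events in the window.
    events = event_store.read_events(user_id, start=window_start, end=now)  # hypothetical store API
    return sum(e["amount"] for e in events)
```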
Because of these challenges, many organizations choose to use only batch data for their features. Alternatively, some organizations decide to build custom streaming pipelines, often at the cost of adding weeks or months to deployment timelines and potentially limiting prediction accuracy due to training/serving skew.
Tecton Low-Latency Streaming Pipelines
Tecton provides powerful primitives that abstract away the complexity of building streaming pipelines. Specifically, Tecton:
Automates streaming ML pipelines. Tecton eliminates the need to build pipelines using stream processing engines like Apache Flink and Spark Streaming. Instead, data teams use simple Tecton primitives to define their transformation logic. Tecton automatically executes the transformation logic in a fully managed data pipeline, using an appropriate stream processor such as Spark Streaming. Tecton delivers sub-second freshness and ensures enterprise-grade uptime, latency, and throughput.
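As a rough illustration of this declarative style, the sketch below defines a streaming feature with a 30-minute windowed sum. The decorator and parameter names follow Tecton's public SDK but may differ across versions, and the data source, entity, and column names are placeholders:

```python
# Illustrative Tecton-style feature definition (SDK parameter names vary by version).
from datetime import datetime, timedelta
from tecton import stream_feature_view, Aggregation

# `transactions_stream` and `user` are assumed to be a stream source and an
# entity declared elsewhere in the feature repository.
@stream_feature_view(
    source=transactions_stream,
    entities=[user],
    mode="spark_sql",
    online=True,
    offline=True,
    feature_start_time=datetime(2022, 1, 1),
    aggregations=[
        Aggregation(column="amount", function="sum", time_window=timedelta(minutes=30)),
    ],
    aggregation_interval=timedelta(minutes=5),
)
def user_transaction_amount_30m(transactions):
    # Plain SQL feature logic; Tecton runs it on both the stream and its batch mirror.
    return f"SELECT user_id, amount, timestamp FROM {transactions}"
```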
Combines streaming and batch sources to backfill features. Streaming features need to be backfilled to generate training data and to populate the feature store during cold starts. Tecton orchestrates batch pipelines to generate the backfills. These pipelines use batch data sources that mirror the streams (event logs or materialized tables of historical values). Tecton applies the same transformation logic and ensures point-in-time consistency between the batch and streaming pipelines to eliminate training/serving skew.
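Point-in-time consistency is the key property here: for each training example, the backfilled feature value must be the one that would have been served at that moment, never a later one. Below is a minimal sketch of that join using illustrative pandas DataFrames rather than Tecton's API:

```python
# Point-in-time ("as of") join: each label gets the latest feature value at or
# before its own timestamp, so training data matches what serving would have seen.
import pandas as pd

labels = pd.DataFrame({
    "user_id": ["u1", "u1"],
    "event_ts": pd.to_datetime(["2023-05-01 12:00", "2023-05-01 12:45"]),
    "is_fraud": [0, 1],
}).sort_values("event_ts")

feature_values = pd.DataFrame({
    "user_id": ["u1", "u1", "u1"],
    "feature_ts": pd.to_datetime(["2023-05-01 11:30", "2023-05-01 12:30", "2023-05-01 13:00"]),
    "amount_sum_30m": [120.0, 300.0, 80.0],
}).sort_values("feature_ts")

training = pd.merge_asof(
    labels, feature_values,
    left_on="event_ts", right_on="feature_ts",
    by="user_id", direction="backward",  # never look into the future
)
```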
Provides efficient and scalable time window aggregations. As outlined in the blog entry "Real-Time Aggregation Features for Machine Learning (Part 2)," Tecton provides an optimized approach to processing time window aggregations, by far the most common feature type used in real-time ML applications. Older events are pre-processed and compacted into tiles, while the latest events are aggregated on demand at serving time. The optimized implementation combines the benefits of fresh feature values (sub-second), fast serving times (sub-10 ms), cost-efficient compute and memory, and support for backfills. Most importantly, these optimizations come out of the box.
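A simplified sketch of the tiling idea, illustrative only and not Tecton's actual implementation: older events are compacted into fixed-size partial aggregates, so a request only combines a handful of tiles with the raw events that arrived since the last tile closed.

```python
# Tile-based serving sketch; tile_store and event_store expose hypothetical APIs.
from datetime import datetime, timedelta

TILE_SIZE = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)

def serve_amount_sum_30m(tile_store, event_store, user_id: str, now: datetime) -> float:
    # Boundary of the most recent complete tile. For brevity this sketch lets the
    # window trail slightly past 30 minutes until the next tile closes; an exact
    # window would also trim raw events at the old edge.
    last_tile_end = EPOCH + ((now - EPOCH) // TILE_SIZE) * TILE_SIZE
    window_start = last_tile_end - timedelta(minutes=30)

    # Bulk of the window: a handful of pre-computed 5-minute partial sums.
    tile_sum = sum(
        t["amount_sum"]
        for t in tile_store.read_tiles(user_id, start=window_start, end=last_tile_end)  # hypothetical API
    )
    # Freshest part of the window: only the raw events newer than the last tile.
    recent_sum = sum(
        e["amount"]
        for e in event_store.read_events(user_id, start=last_tile_end, end=now)  # hypothetical API
    )
    return tile_sum + recent_sum
```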
With Tecton, ML teams already using streaming sources can now build and deploy models faster, increase prediction accuracy by eliminating training/serving skew, and reduce the load on engineering teams, as described in the Atlassian case study. ML teams new to streaming can now build a new class of real-time ML applications that require fresh feature values, such as fraud detection models.
Conventional streaming pipelines vs. Tecton low-latency streaming pipelines
The table below summarizes the problems typically encountered with custom streaming pipelines vs. the advantages of using Tecton.
Conclusion
Many organizations want to deploy Real-Time ML but struggle with its operational requirements. Processing real-time data into fresh features is often the hardest part of implementing Real-Time ML. Building custom streaming pipelines can add weeks or months to a projectโs delivery time.
At Tecton, we've been hard at work building the most complete feature store in the industry. With our new low-latency streaming pipelines, we automate the most difficult step in the transition to Real-Time ML: the processing of features from streaming data sources.
If you're building Real-Time ML models and want to learn more, let's discuss your project in more detail.