🔴⚪️ Edge#72: Tecton – The Enterprise Feature Store Built by the Creators of Uber's ML Platform

Mar 18, 2021

What’s New in AI is a deep dive into one of the freshest research papers or technology frameworks worth your attention. Our goal is to keep you up to date with new developments in AI in a way that complements the concepts we are debating in other editions of our newsletter.

💥 What’s New in AI: Tecton – Enterprise-Grade Feature Store Platform built by the same team that built Uber's Michelangelo

Feature stores are becoming one of the hottest topics in modern machine learning (ML). In recent years, we have seen dozens of well-capitalized startups, as well as ML incumbents like AWS, add feature store capabilities to their platforms. The origins of feature stores can be traced back to Uber’s Michelangelo platform, the backbone of Uber’s ML capabilities. Arguably, Michelangelo was the first documented ML platform to introduce a centralized catalog for managing the lifecycle and reuse of features across a large number of ML models. Inspired by the results of the first versions of Michelangelo’s feature store, part of the team ventured out on their own to build a platform that offers those capabilities to enterprises and startups embarking on their ML journey. That was the origin of Tecton, one of the most robust feature store platforms in the current ML ecosystem.

Feature store is both an overloaded and an oversimplified term in the context of ML solutions. At one extreme, some groups argue that any form of basic feature management in an ML architecture can be classified as a feature store. At the other extreme, many people mistakenly equate feature stores with a simple database that stores feature metadata and, quite often, question whether a separate platform is needed just for that use case. Both extremes are wrong. Granted, "feature store" might not be the best marketing term, but we should think of this type of platform not as a simple feature database but as the central backbone of the lifecycle management of ML models.

Understanding Feature Stores

Feature stores are a key component of MLOps infrastructure because a large share of the challenges in the lifecycle of ML models revolves around data and features. In any large ML team, data scientists spend most of their time extracting, selecting and transforming data into features, and then figuring out how to incorporate those features into production-ready ML models. In an ML pipeline, a feature store can be seen as the missing link between feature engineering and feature serving.

In its simplest form, a feature store provides a centralized catalog of production-ready features that can be easily incorporated into ML models. From a more functional perspective, we can outline five key capabilities that feature stores bring to any ML pipeline:

Feature Transformation: Orchestrating data transformations that process new data into features.
Feature Storage: Persisting and querying feature data so that it can be accessed by different ML models.
Feature Serving: Serving feature data to ML models.
Feature Monitoring: Monitoring feature metrics that can impact the performance of ML models.
Feature Registry: Providing an interface to interact with features and their metadata.
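To make the five capabilities concrete, here is a minimal in-memory sketch that maps each one to a few lines of Python. Everything in it is hypothetical and for illustration only: `FeatureDefinition`, `ToyFeatureStore`, and the `order_total_usd` feature are invented names, not Tecton's API. A real platform backs each piece with separate, durable systems.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class FeatureDefinition:
    """Hypothetical registry entry: a named feature plus the transformation that computes it."""
    name: str
    entity: str                         # key the feature is indexed by, e.g. "user_id"
    transform: Callable[[dict], float]  # Feature Transformation: raw record -> feature value
    description: str = ""


@dataclass
class ToyFeatureStore:
    """Hypothetical in-memory store; each capability is a dict or a method here."""
    registry: Dict[str, FeatureDefinition] = field(default_factory=dict)
    storage: Dict[str, Dict[str, float]] = field(default_factory=dict)  # feature -> entity -> value
    write_counts: Dict[str, int] = field(default_factory=dict)          # feature -> writes seen

    def register(self, feature: FeatureDefinition) -> None:
        # Feature Registry: one place to publish and look up definitions
        self.registry[feature.name] = feature

    def ingest(self, raw_event: dict) -> None:
        for name, feature in self.registry.items():
            if feature.entity not in raw_event:
                continue  # this event does not describe the feature's entity
            value = feature.transform(raw_event)  # Feature Transformation
            self.storage.setdefault(name, {})[raw_event[feature.entity]] = value  # Feature Storage
            self.write_counts[name] = self.write_counts.get(name, 0) + 1  # Feature Monitoring (trivial metric)

    def serve(self, entity_id: str, names: List[str]) -> Dict[str, Optional[float]]:
        # Feature Serving: latest value per requested feature, None if never computed
        return {n: self.storage.get(n, {}).get(entity_id) for n in names}


store = ToyFeatureStore()
store.register(FeatureDefinition("order_total_usd", "user_id", lambda e: e["amount_cents"] / 100))
store.ingest({"user_id": "u1", "amount_cents": 1250})
print(store.serve("u1", ["order_total_usd"]))  # {'order_total_usd': 12.5}
```

The point of the design is that the model-facing `serve` call never sees raw events. It only sees values produced by registered definitions, which is what makes features reusable across models.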

Feature stores subscribe to the mantra that well-managed features lead to better ML models. The growing recognition of the importance of feature management in modern ML solutions has triggered an explosion in the number of feature store platforms. In just a few years, feature stores have gone from a component of Uber’s Michelangelo architecture to a vibrant segment of the machine learning market. The innovation that started at Uber spread to feature store initiatives at other tech giants like LinkedIn, Spotify and Airbnb. Gradually, feature stores have evolved into a standalone component of the modern ML ecosystem. Among the new generation of companies leading feature store innovation, Tecton stands out as one of the most complete, enterprise-ready feature store platforms that can be incorporated into any ML infrastructure.

Tecton

Tecton is an enterprise-grade feature store platform. By enterprise-grade, we mean Tecton’s ability to deliver the core building blocks of a feature store for large-scale ML pipelines while maintaining high levels of reliability. From a feature lifecycle management standpoint, Tecton provides several key capabilities that are becoming essential in modern ML pipelines:

Build features based on real-time and/or batch data.
Serve features to production in a reliable and scalable manner.
Search and discover features across different ML solutions.
Monitor the performance of features over time.

Tecton delivers these core capabilities through a modern architecture that streamlines the lifecycle of features in production. Data scientists can use a rich web interface to search and discover features, and a simple programming model, including a Python SDK and a CLI, to incorporate those features into their ML systems. Behind the scenes, the Tecton platform manages the lifecycle of those features from creation to serving.

The Tecton platform provides robust functionality for each of the key feature store capabilities described in the previous section:

Feature Serving

Feature serving in the Tecton platform supports both online and offline feature access, for serving and training purposes respectively. Online serving is exposed through the Serving API, which is backed by Tecton’s low-latency feature database. Offline feature access for training is powered by the Tecton SDK. Although the two mechanisms deliver feature data differently, both preserve the consistency and state of the features, so training and serving work from consistent feature values.
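The difference between the two access paths is easiest to see on a single feature history. The sketch below is a generic illustration, not Tecton's implementation. The online path returns the latest value. The offline path performs a point-in-time lookup: for each training label, it returns the value that was current at the label's timestamp, so later values cannot leak into training data. The function names and the `history` layout are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: entity id -> list of (timestamp, value), sorted by timestamp
History = Dict[str, List[Tuple[int, float]]]


def online_lookup(history: History, entity_id: str) -> Optional[float]:
    """Online path: the most recent value, which is what a live model should see."""
    rows = history.get(entity_id)
    return rows[-1][1] if rows else None


def point_in_time_lookup(history: History, entity_id: str, as_of: int) -> Optional[float]:
    """Offline path: the value that was current at `as_of`, ignoring anything later."""
    rows = history.get(entity_id, [])
    timestamps = [ts for ts, _ in rows]
    idx = bisect.bisect_right(timestamps, as_of)  # number of rows with ts <= as_of
    return rows[idx - 1][1] if idx > 0 else None


def build_training_set(history: History, labels: List[Tuple[str, int, int]]):
    """Join each (entity_id, label_timestamp, label) with the feature value known at that time."""
    return [(eid, point_in_time_lookup(history, eid, ts), label) for eid, ts, label in labels]


history = {"u1": [(100, 1.0), (200, 3.0), (300, 5.0)]}
print(online_lookup(history, "u1"))              # 5.0
print(point_in_time_lookup(history, "u1", 250))  # 3.0
print(build_training_set(history, [("u1", 150, 0), ("u1", 50, 1)]))
```

A label at timestamp 50 gets `None` because no feature value existed yet. A naive "latest value" join would have attached 5.0, a value from the future.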

Feature Transformations

Tecton runs transformations to calculate features from raw datasets. The transformation engine can operate on batch datasets from data warehouses like Redshift or Snowflake and on streaming datasets from platforms like Kafka or Kinesis, and it can also perform on-demand calculations on real-time data. The Tecton platform manages the ongoing lifecycle of these transformations in order to calculate both current and historical feature values.
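The following sketch shows one feature computed in the three modes the paragraph lists. It is a generic illustration under stated assumptions, not Tecton's transformation API. The feature (a user's transaction count over the trailing hour), the one-hour window, and all function names are hypothetical. The streaming logic assumes each user's events arrive in timestamp order. The batch backfill replays history through the same streaming logic, which is one simple way to keep historical and current values consistent.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

WINDOW_SECONDS = 3600  # hypothetical feature: a user's transaction count over the trailing hour


class StreamingWindowCount:
    """Streaming mode: update the feature incrementally as each event arrives.
    Assumes events for a user arrive in timestamp order."""

    def __init__(self) -> None:
        self.recent: Dict[str, Deque[int]] = defaultdict(deque)

    def update(self, user_id: str, ts: int) -> int:
        window = self.recent[user_id]
        window.append(ts)
        while window[0] <= ts - WINDOW_SECONDS:  # evict events that fell out of the window
            window.popleft()
        return len(window)


def batch_backfill(events: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Batch mode: replay historical (user_id, ts) events through the same logic
    to produce (user_id, ts, count) rows for training."""
    counter = StreamingWindowCount()
    return [(user, ts, counter.update(user, ts)) for user, ts in sorted(events, key=lambda e: e[1])]


def on_demand_ratio(request_amount: float, avg_amount: float) -> float:
    """On-demand mode: combine a value only known at request time with a precomputed feature."""
    return request_amount / avg_amount if avg_amount else 0.0


rows = batch_backfill([("u1", 0), ("u1", 1800), ("u1", 4000), ("u2", 100)])
print(rows)  # [('u1', 0, 1), ('u2', 100, 1), ('u1', 1800, 2), ('u1', 4000, 2)]
print(on_demand_ratio(30.0, 20.0))  # 1.5
```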

Data Storage

Mirroring its serving model, Tecton provides offline and online stores to persist feature data. The offline store captures historical feature values over time and is mostly used for training. Tecton’s offline feature store is based on Amazon S3. The online feature store is optimized for low-latency data access and is based on Amazon DynamoDB.
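The two stores keep different shapes of the same data. The sketch below models them with plain Python containers: an append-only list stands in for the S3-based offline store, and a dict of latest values stands in for the DynamoDB-based online store. These are stand-ins only, with no AWS calls. The `materialize` helper and its conflict rule (never let an older, late-arriving value overwrite a newer one online) are assumptions made for illustration, not documented Tecton behavior.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, int, float]  # (entity_id, timestamp, value)


class OfflineStore:
    """Stand-in for the historical store (Amazon S3 in Tecton's design): append-only, keeps every value."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def append(self, rows: List[Row]) -> None:
        self.rows.extend(rows)


class OnlineStore:
    """Stand-in for the low-latency store (DynamoDB in Tecton's design): latest value per key only."""

    def __init__(self) -> None:
        self.latest: Dict[str, Tuple[int, float]] = {}

    def upsert(self, entity_id: str, ts: int, value: float) -> None:
        current = self.latest.get(entity_id)
        if current is None or ts >= current[0]:  # a late-arriving older value must not overwrite a newer one
            self.latest[entity_id] = (ts, value)

    def get(self, entity_id: str) -> Optional[float]:
        hit = self.latest.get(entity_id)
        return hit[1] if hit else None


def materialize(new_rows: List[Row], offline: OfflineStore, online: OnlineStore) -> None:
    """Hypothetical helper: write freshly computed feature values to both stores."""
    offline.append(new_rows)
    for entity_id, ts, value in new_rows:
        online.upsert(entity_id, ts, value)


offline, online = OfflineStore(), OnlineStore()
materialize([("u1", 100, 1.0), ("u1", 200, 2.0), ("u2", 150, 7.0)], offline, online)
materialize([("u1", 120, 9.0)], offline, online)  # late row: kept offline, ignored online
print(len(offline.rows), online.get("u1"), online.get("u2"))  # 4 2.0 7.0
```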

Operational Monitoring

Tecton provides native capabilities to monitor both the operational performance and the quality of features. For each aspect, Tecton exposes a set of key metrics that allow data scientists to monitor the behavior of the different features in an ML pipeline.
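The article does not list Tecton's specific metrics, so the sketch below shows two generic feature-quality checks commonly used for this kind of monitoring, not Tecton's own. The first is the null rate. The second is the Population Stability Index (PSI), which compares a feature's distribution in a recent sample against a baseline such as the training data. The function names and bin edges are illustrative. A value of roughly 0.25 is a commonly cited rule-of-thumb threshold for significant drift, not a universal constant.

```python
import bisect
import math
from typing import List, Optional, Sequence


def null_rate(values: Sequence[Optional[float]]) -> float:
    """Share of missing values; a sudden jump often points to a broken upstream source."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def bin_shares(xs: Sequence[float], edges: Sequence[float], eps: float) -> List[float]:
    """Fraction of samples in each bin defined by sorted `edges` (len(edges) + 1 bins)."""
    if not xs:
        raise ValueError("cannot compute bin shares of an empty sample")
    counts = [0] * (len(edges) + 1)
    for x in xs:
        counts[bisect.bisect_right(edges, x)] += 1
    # Floor empty bins at eps so the log term below stays finite
    return [max(c / len(xs), eps) for c in counts]


def psi(baseline: Sequence[float], recent: Sequence[float], edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index: sum over bins of (recent - baseline) * ln(recent / baseline)."""
    b = bin_shares(baseline, edges, eps)
    r = bin_shares(recent, edges, eps)
    return sum((ri - bi) * math.log(ri / bi) for bi, ri in zip(b, r))


print(null_rate([1.0, None, 3.0, None]))              # 0.5
print(psi([1, 2, 3, 4], [1, 2, 3, 4], [2.5]))         # 0.0
print(psi([1, 1, 1, 1], [3, 3, 3, 3], [2.5]) > 0.25)  # True
```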

Feature Registry

Tecton’s feature registry is a centralized catalog that serves as a single source of truth for the features used in an ML infrastructure. It is the entry point for data science teams to search, publish and interact with feature definitions.
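A "single source of truth" has a concrete consequence: each feature name maps to exactly one definition, and everything else is discovered through search. Here is a minimal sketch of that contract. The `FeatureSpec` fields, the owner and tag values, and the method names are hypothetical and are not Tecton's registry schema.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical registry entry; field names are illustrative, not Tecton's schema."""
    name: str
    owner: str
    description: str
    tags: FrozenSet[str] = frozenset()


class FeatureRegistry:
    def __init__(self) -> None:
        self._features: Dict[str, FeatureSpec] = {}

    def publish(self, spec: FeatureSpec) -> None:
        # Single source of truth: one definition per feature name
        if spec.name in self._features:
            raise ValueError(f"feature {spec.name!r} is already registered")
        self._features[spec.name] = spec

    def get(self, name: str) -> Optional[FeatureSpec]:
        return self._features.get(name)

    def search(self, text: str = "", tag: Optional[str] = None) -> List[str]:
        # Case-insensitive match on name or description, optionally filtered by tag
        needle = text.lower()
        return sorted(
            spec.name
            for spec in self._features.values()
            if (needle in spec.name.lower() or needle in spec.description.lower())
            and (tag is None or tag in spec.tags)
        )


registry = FeatureRegistry()
registry.publish(FeatureSpec("user_txn_count_1h", "risk-team", "Transactions per user in the last hour", frozenset({"fraud"})))
registry.publish(FeatureSpec("user_avg_order_value", "growth-team", "Average order value per user", frozenset({"marketing"})))
print(registry.search("user"))                    # ['user_avg_order_value', 'user_txn_count_1h']
print(registry.search("order", tag="marketing"))  # ['user_avg_order_value']
```

Rejecting duplicate names at publish time is the design choice doing the work here. Without it, two teams could silently ship diverging definitions of the same feature.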

Conclusion

Tecton has been one of the pioneers of the feature store space and has captured much of the early momentum in the market. The platform integrates with mainstream ML frameworks as well as a wide variety of databases and streaming platforms. As feature stores evolve, platforms like Tecton are likely to become an increasingly central component of MLOps pipelines. For now, feature stores are definitely a trend to pay attention to, and Tecton remains one of the most exciting platforms in this nascent space.

We are happy to support the Apply() conference presented by Tecton, a free virtual event on data engineering for applied machine learning. The lineup is great.

🧠 The Quiz

Every ten quizzes we reward two random people. Participate! The question is the following:

Which of the following statements is a more accurate description of Tecton’s feature storage architecture?

TheSequence