TheSequence

🎙 Jim Dowling/CEO Logical Clocks: The future of feature stores

TheSequence interviews ML practitioners to immerse you in the real world of machine learning and artificial intelligence

Jan 29, 2021

There is nothing more inspiring than learning from practitioners. The experience gained by researchers, engineers, and entrepreneurs doing real ML work can be a great source of insight and inspiration. We'd like to introduce TheSequence Chat: interviews that bring you closer to real ML practitioners. Please share these interviews if you find them enriching. No subscription is needed.


👤 Quick bio / Jim Dowling

Tell us a bit about yourself. What is your background and current role, and how did you get started in machine learning?

Jim Dowling (JD): I come from a research background. My PhD, back in 2004, was on middleware for distributed reinforcement learning. After my PhD, I worked at MySQL for a couple of years, then as a researcher at RISE (Research Institutes of Sweden) and as an Associate Professor at KTH. As part of my systems research, we built Hopsworks, an open-source data science platform that includes the first open-source feature store for machine learning.

🛠 ML Work

Feature stores have been gaining prominence in the last couple of years. Can you describe the value proposition of a feature store and why it is a necessary component of a machine learning pipeline?

JD: To serve models in production, you need to feed them with (often non-trivial) features. Those features are computed from input data, and the code that computes them should be the same for both training and serving. You should not re-implement feature engineering code for serving, because non-DRY feature engineering code increases the risk of subtle differences between the implementations, which introduce difficult-to-track-down bugs. One solution is to store computed features in a feature store and retrieve the same features for both training and serving. The feature store then becomes a centralized enterprise platform for managing data (features) for machine learning. Feature stores play the same role for ML that data warehouses play for analytics.
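
The idea of computing features once and reading the same values for training and serving can be illustrated with a minimal sketch. Everything in it is hypothetical: the `InMemoryFeatureStore` class, the `compute_features` function, and the feature names are invented for illustration and are not the Hopsworks API. It assumes features are keyed by an entity ID.

```python
from statistics import mean


def compute_features(transactions):
    """Single implementation of the feature logic (hypothetical features)."""
    amounts = [t["amount"] for t in transactions]
    return {
        "txn_count": len(amounts),
        "avg_amount": mean(amounts) if amounts else 0.0,
        "max_amount": max(amounts, default=0.0),
    }


class InMemoryFeatureStore:
    """Toy stand-in for a feature store: maps entity ID -> feature dict."""

    def __init__(self):
        self._rows = {}

    def write(self, entity_id, features):
        self._rows[entity_id] = dict(features)

    def read(self, entity_id, names):
        row = self._rows[entity_id]
        return [row[name] for name in names]


FEATURE_NAMES = ["txn_count", "avg_amount", "max_amount"]

# Hypothetical raw input data, keyed by customer ID.
raw = {
    "cust_1": [{"amount": 10.0}, {"amount": 30.0}],
    "cust_2": [{"amount": 5.0}],
}

# Features are computed once and stored.
store = InMemoryFeatureStore()
for customer_id, transactions in raw.items():
    store.write(customer_id, compute_features(transactions))

# Training reads stored features for many entities.
training_rows = [store.read(cid, FEATURE_NAMES) for cid in sorted(raw)]

# Serving reads the same stored features for one entity; no logic is re-implemented.
serving_vector = store.read("cust_1", FEATURE_NAMES)

print(training_rows)
print(serving_vector)
```

Because both paths read from the store, the serving vector for `cust_1` matches the row that entity contributed to the training data.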

What should be the three core capabilities of an enterprise-ready feature store?

JD:

  • (a) Feature stores should provide efficient access to large volumes of (potentially historical) features for training models on different data science platforms, and low-latency access to the latest feature values for model serving.

  • (b) Feature stores should be intuitive and easy to use for data scientists and data/ML engineers, for example by providing Python APIs that let them browse and understand available features, create training data, and create new features from either enterprise data sources or existing features.

  • (c) Features should be access-controlled, versioned (both schema and data versioning), governed, and easily discoverable.
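
As a rough illustration of capabilities (a) and (c), the sketch below keeps a historical log for training reads and a latest-value view for serving reads, with a schema version attached to the feature group. The `ToyFeatureGroup` class and its method names are assumptions made for this example; they do not correspond to any real feature store API, and a production system would back the two paths with separate storage engines.

```python
from collections import defaultdict


class ToyFeatureGroup:
    """Hypothetical feature group with an offline log and an online view."""

    def __init__(self, name, version):
        self.name = name
        self.version = version              # schema version (assumed integer)
        self._offline = defaultdict(list)   # entity_id -> [(event_time, features)]
        self._online = {}                   # entity_id -> (event_time, features)

    def insert(self, entity_id, event_time, features):
        # Offline: append every value so history is preserved for training.
        self._offline[entity_id].append((event_time, dict(features)))
        # Online: keep only the most recent value per entity.
        latest = self._online.get(entity_id)
        if latest is None or event_time >= latest[0]:
            self._online[entity_id] = (event_time, dict(features))

    def get_online(self, entity_id):
        """Serving path: one key lookup for the latest feature values."""
        return self._online[entity_id][1]

    def get_as_of(self, entity_id, event_time):
        """Training path: feature values as they were at `event_time`."""
        rows = [row for row in self._offline.get(entity_id, []) if row[0] <= event_time]
        if not rows:
            return None
        return max(rows, key=lambda row: row[0])[1]


fg = ToyFeatureGroup("customer_activity", version=1)
fg.insert("cust_1", 1, {"txn_count": 2})
fg.insert("cust_1", 5, {"txn_count": 7})

print(fg.get_online("cust_1"))    # latest values, for model serving
print(fg.get_as_of("cust_1", 3))  # historical values, for building training data
```

Keeping the full history is what allows a past training dataset to be reproduced; the online view trades that history for a constant-time lookup.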



In the long term, are feature stores a standalone product or a feature (interesting choice of words 😉) of broader ML platforms?

JD: I don't think we have even answered the question of whether data warehouses are just part of larger analytics pipelines yet. Feature stores are much newer and will be standalone products for the next couple of years. But ML pipelines will benefit hugely from end-to-end provenance for debugging, governance, and reproducing models. The feature store will need to be tightly integrated into those ML pipelines and into the platforms used to develop and operate them.

How do techniques like representation learning, which can learn features from a given dataset, influence the future of feature stores?

JD: I don't think they have a direct bearing on the system architecture of feature stores themselves. Feature stores already ingest 'base' features from which data scientists create many derived features. There may be value in automated feature engineering to reduce the manual effort of identifying and creating downstream features. However, deep learning shows us that a lot of feature engineering can be done in model training with appropriate model architectures, so I do not expect automated feature engineering to be the next big thing for feature stores.

Big technology platforms like AWS have recently entered the feature store space, which also includes well-funded startups like Tecton. How do you see the competitive landscape in the near future?

JD: The first feature stores, developed at Uber and Airbnb, used domain-specific languages (DSLs) to support feature engineering for constrained domains. Now, enterprise feature stores need to support a wider set of clients and use cases, and DSLs are not flexible enough. Python APIs are dominating, and most platforms are converging on a DataFrame API (Pandas and (Py)Spark) that we first introduced in Hopsworks. We expect that one or two dominant open-source feature stores (Hopsworks and Feast, maybe) will become more widely used as more models need to be put in production. We also expect there will be managed feature store platforms on every cloud provider this year. Currently, SageMaker Feature Store and Tecton are available on AWS. Hopsworks.ai is available on both AWS and Azure, and Google has announced that it will release a managed feature store soon. Databricks will also release a feature store in 2021.

💥 Miscellaneous: a set of rapid-fire questions

TensorFlow or PyTorch?

JD: It's not 2017 anymore. In 2021, they are practically the same. If I have to choose, TensorFlow, for its enterprise capabilities.

Favorite math paradox?

JD: 75% of people think they are smarter or more attractive than average.

Any book you would recommend to aspiring data scientists?

JD: Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron.

Does P equal NP?

JD: The systems research adage doesn't help much here: "don't guess, measure".

© 2023 Jesus Rodriguez