TheSequence

The Sequence Pulse: A Deep Look Into How Yelp Uses Jupyter Notebooks at Scale

An overview of the evolution and architecture of Yelp's JupyterHub

Aug 09, 2023
∙ Paid

Created Using Midjourney

Jupyter notebooks have become an omnipresent component of machine learning (ML) infrastructure. Today, I would like to take a deep dive into the JupyterHub ecosystem at Yelp, which might be one of the most complete publicly documented implementations I have seen in a while. I think it can serve as inspiration for many data science teams building agile, lightweight infrastructure for ML experimentation.

Yelp's technical ecosystem relies heavily on Apache Spark and JupyterHub for diverse use cases, including batch processing as well as interactive activities such as feature model building, ad-hoc data analysis, template sharing, onboarding material creation, visualizations, and sales reporting.

The Evolution

In the early stages, Jupyter usage at Yelp was limited to individual IPython notebooks. As usage grew, the challenges of managing these use cases at the organizational level became evident.

© 2025 Jesus Rodriguez