The Sequence Pulse: A Deep Look Into How Yelp Uses Jupyter Notebooks at Scale

An overview of the evolution and architecture of Yelp's JupyterHub

Aug 09, 2023


Jupyter notebooks have become an omnipresent component of machine learning (ML) infrastructures. Today, I would like to dive deep into the JupyterHub ecosystem at Yelp, which might be one of the most complete publicly documented implementations I have seen in a while. I think it will serve as inspiration to many data science teams building agile, lightweight infrastructures for ML experimentation.

Yelp's technical ecosystem relies heavily on Apache Spark and JupyterHub for diverse use-cases, including batch processing and interactive activities such as feature model building, ad-hoc data analysis, template sharing, on-boarding material creation, visualizations, and sales reporting.

The Evolution

In the early stages, Jupyter usage at Yelp was limited to IPython notebooks run by individual users. As usage grew, the challenges of managing use-cases at the organizational level became evident.
