The Sequence Pulse: A Deep Look Into How Yelp Uses Jupyter Notebooks at Scale
An overview of the evolution and architecture of Yelp's JupyterHub
Jupyter notebooks have become an omnipresent component of machine learning (ML) infrastructures. Today, I would like to take a deep dive into the JupyterHub ecosystem at Yelp, which might be one of the most complete publicly documented implementations I have seen in a while. I think it will serve as inspiration to many data science teams building agile, lightweight infrastructures for ML experimentation.
Yelp's technical ecosystem relies heavily on Apache Spark and JupyterHub for diverse use cases, including batch processing and interactive activities such as feature and model building, ad-hoc data analysis, template sharing, onboarding material creation, visualizations, and sales reporting.
The Evolution
In the early stages, Jupyter usage at Yelp was limited to IPython notebooks run by individual users. As usage grew, the challenges of managing use cases at the organizational level became evident.