🎙 Piero Molino on creating Ludwig and the Importance of Low-Code ML


Ludwig, the low-code ML stack pioneered by Uber AI, released its 0.4 version last week. We talked to Piero Molino, one of the minds behind the Ludwig project, about his background, the importance of low-code ML, his most ambitious ideas about it, and the breakthroughs needed to get there. Read on and share with your communities.


👤 Quick bio / Piero Molino

Tell us a bit about yourself: your background, your current role, and how you got started in machine learning (ML).

Piero Molino (PM): I always liked to be at the intersection of multiple fields; that’s where interesting ideas are born (at least the subset of my ideas that I find most interesting). I got into machine learning through recommender systems. I was really intrigued (and still am) by their power to nudge human decisions and behavior (for good or bad) and by the human-in-the-loop aspect that makes recommender systems one of the most interesting ML applications.

My background is in open domain question answering, which is kind of an intersection of NLP, machine learning (learning to rank in particular), and information retrieval. Over time I drifted more towards machine learning systems and abstractions, working at IBM Watson first, then co-founding Uber AI and creating Ludwig, and most recently being a staff research scientist at Stanford. 

I am currently a co-founder of Predibase, a new company in the ML space that I started with serial entrepreneurs, open-source maintainers, and ML industry veterans, and which I'm really excited about! (We are hiring, reach out: team@predibase.com)

🛠 ML Work  

You are famous for being one of the minds behind the Ludwig project, which was incubated at Uber and is now part of the Linux Foundation. What makes low-code ML such an important challenge?

PM: My take is that technologies change the world only when they become usable by people who aren’t able to build them. Cars would not have become ubiquitous if only mechanical engineers could drive them. So I believe low-code ML is one step in the direction of making ML more accessible and ubiquitous. 

The other aspect is speed: low-code ML can significantly reduce the time required for a machine learning project to go from the ideation phase to deployment in production, which creates the opportunity for lowering costs and for many more ML projects to be tackled. 

What role can methods such as neural architecture search, meta-learning, and AutoML play in the future of low-code ML?

PM: I believe one possible future of low-code ML is declarative. With Prof. Chris Ré, we recently published an opinion paper about the lessons we learned by building such declarative ML systems and what we believe the future will look like (to be published on ACM Queue). 

NAS, meta-learning, and AutoML play an important role in a declarative system by commoditizing model choice, which is an important part of the ML process today. They also help refocus attention on the data, its quality, and how it impacts learning systems.

One of the things that I enjoyed about Ludwig is the integration with other ML stacks. Is the role of a low-code ML stack to abstract the interactions with best-of-breed ML technologies? How do you select the best stacks to enable a low-code ML experience in the current super-fragmented ML market?

PM: I'm particularly interested in interfaces. If you choose the right level of abstraction, those choices of components in the stack become interchangeable implementation details. 

Think about SQL. The same query can (almost) be run in most relational databases, on Hive, on Apache Spark, and now on many NoSQL databases. Each of them has its different tradeoffs, but the interface is common and familiar, allowing users to switch easily to a solution that better suits their application needs. 

I believe this direction is where machine learning is heading. 

In Ludwig, you can see some steps in that direction: the declarative configuration users provide could, in theory, be implemented by several different pipelines that use many different models, components, and frameworks. 
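As an illustration of that point (an editorial sketch, not part of the interview and not Ludwig's implementation), the snippet below resolves one declarative configuration into two different pipelines. The `input_features` / `output_features` layout with `name` and `type` keys mirrors how Ludwig configurations are written; the `plan_pipeline` function, the two catalogs, and the component names in them are hypothetical and exist only for this example.

```python
# Declarative description of the task: what the data is, not how to model it.
# The input_features/output_features layout follows Ludwig's config style.
config = {
    "input_features": [
        {"name": "review", "type": "text"},
        {"name": "product_group", "type": "category"},
    ],
    "output_features": [
        {"name": "rating", "type": "category"},
    ],
}

# Hypothetical catalogs: two different ways to fulfil the same declaration.
LIGHTWEIGHT = {"text": "bag_of_words", "category": "one_hot"}
HEAVYWEIGHT = {"text": "transformer", "category": "learned_embedding"}


def plan_pipeline(config: dict, catalog: dict) -> dict:
    """Map every declared feature to a concrete component from the catalog."""
    plan = {"inputs": {}, "outputs": {}}
    for feature in config["input_features"]:
        plan["inputs"][feature["name"]] = catalog[feature["type"]]
    for feature in config["output_features"]:
        plan["outputs"][feature["name"]] = catalog[feature["type"]]
    return plan


# Same declaration, two interchangeable implementations.
light_plan = plan_pipeline(config, LIGHTWEIGHT)
heavy_plan = plan_pipeline(config, HEAVYWEIGHT)
print(light_plan)
print(heavy_plan)
```

The user-facing declaration never changes; only the catalog behind it does, which is the sense in which component choices become implementation details.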

In Ludwig, we made opinionated choices, based on our experience, about which components and technologies to use for some pieces of the stack, but in the recently released v0.4 we abstracted away some of these decisions. For instance, data preprocessing and postprocessing previously happened in Pandas, while now there's a new abstract concept of a backend that can support multiple DataFrame libraries, such as Dask, Modin, and potentially NVTabular, which provide different tradeoffs (in-memory single-machine performance vs. distributed performance vs. GPU performance).
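The backend idea can be sketched in plain Python (again an editorial illustration under stated assumptions, not Ludwig's actual backend API). The `Backend` protocol, `InMemoryBackend`, `ChunkedBackend`, and `min_max_scale` are hypothetical names: the first backend stands in for a single-machine DataFrame library, the second for a partitioned one, and the preprocessing step is written once against the shared interface.

```python
from typing import Callable, List, Protocol


class Backend(Protocol):
    """Hypothetical interface: the only thing preprocessing code depends on."""

    def map_column(self, values: List[float], fn: Callable[[float], float]) -> List[float]: ...


class InMemoryBackend:
    """Processes the whole column at once, like a single-machine DataFrame."""

    def map_column(self, values, fn):
        return [fn(v) for v in values]


class ChunkedBackend:
    """Processes the column partition by partition, like a distributed DataFrame."""

    def __init__(self, chunk_size: int = 2):
        self.chunk_size = chunk_size

    def map_column(self, values, fn):
        result = []
        for start in range(0, len(values), self.chunk_size):
            chunk = values[start:start + self.chunk_size]
            result.extend(fn(v) for v in chunk)
        return result


def min_max_scale(values: List[float], backend: Backend) -> List[float]:
    """Preprocessing step written once against the interface."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return backend.map_column(values, lambda v: (v - lo) / span)


column = [2.0, 4.0, 6.0, 10.0]
in_memory = min_max_scale(column, InMemoryBackend())
chunked = min_max_scale(column, ChunkedBackend(chunk_size=3))
print(in_memory)
print(chunked)
```

Both backends return the same scaled column; what differs is how the work is executed, which is where the performance tradeoffs mentioned above would show up.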

On the other hand, the market is so large and abstractions so new that covering every option is not feasible, but after this Cambrian explosion, I believe, we'll experience a phase of consolidation where interfaces will become more well-defined and standardized, which will favor interoperability. 

Ludwig seems to have specialized in TensorFlow 2. Is it practical to enable a consistent low-code ML experience across different deep learning frameworks such as TensorFlow, PyTorch, and MXNet?

PM: As we did for DataFrame libraries, we are approaching the idea of having multiple tensor computation backends. Other libraries, like Thinc, do a great job with that already. The issue is that although TensorFlow 2 and PyTorch have relatively similar interfaces (the object-oriented Keras model/layer and the PyTorch Module are pretty similar, for instance), the differences between them could end up doubling the resources and time needed to support parallel implementations (as Hugging Face Transformers does, for instance). The question we often ask ourselves among Ludwig maintainers is about the best use of our time. Is it adding an additional backend to support PyTorch (or MXNet, JAX, and other future technologies), or is it adding new features to the current implementation?

Right now we are exploring both directions. 

What are some of your most ambitious ideas about low-code ML and what important breakthroughs are needed to get there?

PM: I think the most ambitious idea in Ludwig is to have a unified solution that works across different modalities (tabular data, text, images, audio, time series, and more), also with multiple modalities at the same time and across different machine learning tasks. 

The current implementation shows that it is feasible, although more work is needed to cover more data types and machine learning tasks. 

Going beyond Ludwig, my ambition is to make ML systems a glass box, where experienced users can peek inside and change things, while people without a machine learning background can use them without caring about the implementation details. This glass box should work across datasets, tasks, domains, and modalities, and will ideally enable entire organizations to collaborate on ML projects. The final goal is to get to the point where anyone can use ML, even people who don't know what a convolution is and people who don't know how to code at all. This is the direction I'm following with Predibase.


💥 Miscellaneous – a set of rapid-fire questions  

Is the Turing Test still relevant? Is there a better alternative? 

PM: It depends, relevant to what? 

As a valuable metric of progress toward A(G)I, probably it has never been relevant. Passing it would tell us more about how easy it is to fool humans than how smart machines are. 

As a thought experiment for determining what it means to be intelligent (is it sufficient to fool a qualified majority into thinking an agent is intelligent to make them intelligent?), it is as relevant as when it was formulated, as is Searle's Chinese room argument and the derived conversation on the nature of intelligence.

Favorite math paradox? 

PM: Not sure you would count it as a math paradox, but my favorites are certainly Zeno's paradoxes. I discovered them independently as a kid when participating in the math olympiads. Then I learned about them when studying philosophy in high school. Finally, I discovered them again when reading “Gödel, Escher, Bach” by Douglas Hofstadter in college. By the way, that was the book that most likely kickstarted my interest in AI. So I would say Zeno's paradoxes have been a constant presence throughout my life. 

Any book you would recommend to aspiring data scientists?

PM: Many people can suggest practically useful books for aspiring data scientists, so I would instead suggest two books I found inspiring, albeit not directly applicable.

One is "Who Owns the Future?" by Jaron Lanier, which made me think deeply about the economic, ethical, and societal implications of my work.

The other one is "Why Greatness Cannot Be Planned" by my former coworkers and good friends Ken Stanley and Joel Lehman. It is a thought-provoking take on our society's obsession with measuring metrics and optimizing for goals that are ill-defined. It all started from an unexpected discovery in their work in artificial intelligence.

Is P equal to NP?

PM: The important question is: can we prove it either way?