The Sequence Chat: Emad Mostaque (Stability AI, Schelling AI) on Open and Decentralized AI
The co-founder and former CEO of Stability AI discusses his vision for decentralized AI and his new project, Schelling AI.
Bio:
Emad Mostaque is widely recognized as one of the leaders in the open-source generative AI movement. He is the former CEO of Stability AI, the company behind Stable Diffusion and numerous open-source generative AI models across different modalities. Stability AI attracted a community of hundreds of thousands of AI researchers and is actively pushing the boundaries of innovation in the field. Emad holds both a BA and an MA in mathematics and computer science from the University of Oxford, which he followed with a successful career as a hedge fund manager.
After leaving Stability AI, Emad decided to focus on the potential of decentralization. His new project, Schelling AI, combines generative AI and Web3 to enable transparency and trust in the world of foundation models.
🛠 ML Work
You recently stepped down as CEO of Stability AI and coined the phrase, "You can’t beat centralized AI with more centralized AI." I often think of AI as an increasingly centralizing force. Why is decentralized AI so important, and what are the key areas where it can contribute to the development of generative AI?
It was clear when setting up Stability AI that, absent a catalyst, the foundation model space would be dominated by a few very large players. In scaling the company we adopted a corporate and structural model similar to those large players, and you can see the arms race that ensues: giant funding rounds, pressure on hiring, involvement with global regulators and more.
The landscape has changed over the last few years, and it's clear that distributed and decentralised AI has its place: for privacy, security, governance and ensuring the benefits of this technology are widely shared.
Distributed AI is important because, by centralising AI and moving towards AGI, we create black boxes that are understood only in terms of inputs and outputs, with no insight into the inner workings or reasoning of the models.
Open-source models, however, mean the code is transparent and publicly available. A framework where everyone checks on everyone promotes accountability, trust and agency: everyone has a voice in how AI is created and evolves.
A large area of importance for decentralisation is facilitating research and distributing its benefits. Transparency in research builds trust and ensures security in the open.
Imagine the healthcare industries of every nation, each with open models and datasets. Giving everyone access to all medical literature would enable the development of unique models that are representative of each community and nationality. For example, a model specifically tailored to Bulgarian cancer research would have a far greater impact on the Bulgarian healthcare system than a generalised American cancer model. Open source means specialisation.
This could be said for every industry, not just healthcare, from finance to education.
Your new project, Schelling AI, is deeply rooted in Web3 and crypto economics. Can you share the vision for Schelling AI and your thoughts on the intersection of AI and Web3?
While I think web3/crypto perhaps deservedly has a bad reputation, many of the advances made in these areas will be directly applicable to our augmented intelligence future.
If we look at Bitcoin, it was the original incentive mechanism for large amounts of highly specialised compute: to give you an idea of scale, the total energy usage of the Bitcoin network is 160 TWh, versus around 350 TWh for all global data centres.
This is an example of an incentive and coordination system that got the job done, and a similar mechanism could potentially be used to provide the compute needed for open, global models for all, owned and governed as widely as possible.
Bitcoin is sometimes also noted as a Schelling Point, a focal point in game theory terms that enables coordination without communication. I think our future AI systems, comprising billions of robots and trillions of agents, will need coordination systems that build on this, from payments (agents are unlikely to have bank accounts) to verification of inputs and outputs and more.
These features echo the capabilities being built into second and third generation distributed ledgers, but I don’t think any of these systems are up to the task of coordinating and supporting AI in health, education, government or any important and regulated sector.
I think we have a real opportunity to design and build an open, distributed and resilient AI system for all, incorporating learnings from across the board. If we can do this in a way that is verifiable and trustworthy, then not only will we have solved many of the issues that plague web3, where the key is trust rather than decentralisation for decentralisation’s sake, but more importantly we may solve many of the issues that plague our existing systems, first by integrating with them and then by reimagining them.
Decentralized AI is not a new concept, yet it has never achieved significant adoption. One could argue that the massive scale of frontier models makes decentralized AI even more challenging now. Why do you believe this time will be different?
The first wave of AI models was based around scale, with relatively poor data being fed into giant supercomputers that papered over its low quality and achieved remarkable results.
We are now seeing the importance of the data put into models, with models trained on high-quality data beating larger ones on a fraction of the data, and performance increasingly splitting along data quality.
Decentralised AI training of full models will always lag behind centralised clusters due to communication overhead.
However, if base models are trained on these massive clusters, as LLaMA was, then they can be taken, customised and improved on a fraction of the compute. We have seen this with the explosion of community fine-tunes, combined and recombined to outperform the base models.
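To make this concrete, here is a minimal sketch of what a community fine-tune on a fraction of the compute can look like, using parameter-efficient LoRA adapters. It assumes the Hugging Face transformers, peft and datasets libraries; the base checkpoint name, dataset file and hyperparameters are illustrative placeholders rather than a prescribed recipe.

```python
# Minimal sketch: parameter-efficient fine-tuning (LoRA) of an open base model.
# Assumes the Hugging Face `transformers`, `peft` and `datasets` libraries; the
# model name, data file and hyperparameters below are illustrative placeholders.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

base = "meta-llama/Llama-2-7b-hf"  # any open base checkpoint (placeholder)
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding batches
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small low-rank adapters instead of all weights, so a community
# fine-tune can fit on a single consumer GPU.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of base weights

# Hypothetical community-curated domain data in a local JSONL file.
data = load_dataset("json", data_files="community_domain_data.jsonl")["train"]
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", per_device_train_batch_size=4,
                           num_train_epochs=1, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-out")  # a few MB of adapter weights, easy to share
```

Because only the small adapter weights are trained and shared, fine-tunes like this are cheap to produce, exchange and recombine, which is exactly the community dynamic described above.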
Decentralisation is also highly suitable for data augmentation (particularly asynchronous approaches), model tuning and optimisation, and many other areas.
However, the mission isn’t really to decentralise, but to distribute this technology to drive genuinely useful use cases.
I think it will eventually settle into a few large players providing base models as infrastructure, with swarms of people, and then agents, optimising those models and the underlying data rather than training from scratch.
How do you view the balance between the race towards massively large models, trillion-dollar GPU clusters, and the need for small, sovereign, decentralized models?
I think this is similar to the difference between highly specialised experts, which are the models the large organisations are trying to build, and the team of talented juniors you bring in, which are more like the smaller models being run locally and on the edge.
There is a concept called satisficing, where you reach a level that is good enough, and I think small language models have achieved that for many use cases, outperforming giant models from a generation or two ago while running on a smartphone or laptop.
We have seen from Gemma 27B and other models that you can also use large models to instruct and improve smaller ones, something Meta has done for the smaller LLaMA models too.
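One common way a large model "instructs" a smaller one is knowledge distillation, where the student learns to match the teacher's softened output distribution. The sketch below is a hedged, generic example in plain PyTorch, assuming Hugging Face-style causal language models whose outputs expose .loss and .logits; it is not the specific pipeline used for Gemma or LLaMA, and all names and hyperparameters are illustrative.

```python
# Minimal sketch of knowledge distillation: a small "student" model is trained to
# match the softened output distribution of a larger "teacher" model.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, optimizer, T=2.0, alpha=0.5):
    """One training step mixing the usual task loss with a KL term on soft logits."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(batch["input_ids"]).logits

    student_out = student(batch["input_ids"], labels=batch["labels"])
    hard_loss = student_out.loss  # standard next-token cross-entropy

    # KL divergence between temperature-softened distributions transfers the
    # teacher's knowledge about relative token probabilities.
    soft_loss = F.kl_div(
        F.log_softmax(student_out.logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    loss = alpha * hard_loss + (1 - alpha) * soft_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The temperature T smooths both distributions so the student can learn from the teacher's near-miss predictions, not just its top choice; alpha balances imitation against the ordinary training objective.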
I think the final landscape will incorporate all of these variations, and there won’t be just one type of model out there.
The release of Stable Diffusion completely changed my perspective on the open-source generative AI space. Since then, a lot has happened in the field. How do you see the long-term balance between open and closed models? Can open source really compete on a massive scale?
Closed models will always outperform open models as you can just take an open model and add private data to it (!).
Open has an advantage in spread, optimisation and similar areas over directed centralised models.
I think ultimately they are complementary though, equivalent to hiring your own graduates versus bringing in consultants. It is likely that open will end up being most used if it can keep up in performance terms. Even if it lags somewhat, models are rapidly becoming “good enough” to build around and the next leg of growth is likely to be on products and services to provide and implement this technology as a result.
One of the biggest challenges in open-source AI is the lack of funding, an area where crypto excels. How can crypto’s capital formation and token economics help develop open-source generative AI?
I think we perhaps catalysed large amounts of funding to go into open source AI at Stability AI (!).
While I think exponential compute is likely not needed for the next generation of models, lessons from crypto capital formation from a funding and distribution perspective are instructive.
As noted in question 2 above, Bitcoin has been a spectacular success in attracting and rewarding specialised compute and energy provision. It has many other issues, but it has become institutional and provides some insight into how incentive systems may be employed to provide the compute and funding we need to create genuine public AI infrastructure.
Our healthcare, education, government and similar AI systems should not run on black boxes and should not be controlled by an unelected few. Creating a mechanism to provide the compute and capital needed to build and maintain this infrastructure, which government initiatives are clearly unlikely to be able to keep up with, is imperative. It is difficult to see how to do this without building a new type of organisation based on prior lessons.
The intersection of Web3 and AI is intellectually fascinating but filled with technical and cultural gaps compared to the mainstream AI space, leading to the creation of many projects without real use cases. Which aspects of the lifecycle of generative AI applications can genuinely be decentralised with today’s Web3 stacks?
Digging into this area, I have moved away from decentralised AI towards thinking that distributed AI is where the value lies, particularly for the implementation of AI technology in important areas of society like health and education.
I am somewhat disillusioned by web3/crypto projects forgetting that the core mission is to build systems that can coordinate in a trust-minimised fashion for real-world use cases, rather than pursuing decentralisation for its own sake and relying on speculation.
If we look to the future as outlined in our how to think about AI piece, it is clear that generative AI has a role to play in the future of many areas of the public and private sector.
While this needs to be built on new coordination and alignment infrastructure, it is unclear whether any of the systems we have today are suitable for this.
Where projects today are strong is in supply aggregation (e.g. DePIN for distributed compute), research on governance (DAOs have made all the mistakes of democracy and more), and payments, which will be essential as the number of agents and robots increases.
When I think about decentralized AI, I gravitate towards trends such as small foundation models, decentralized inference, and other areas that are still in a relatively nascent state. What technical or research milestones should be achieved to unlock the potential of decentralised AI?
I think you are already seeing models good enough for a range of tasks on the edge, and innovative architectures to enable this.
I think firming up a baseline of model quality that people can build around, much as they continue to build around the original Stable Diffusion, is very important, as this opens up a range of potential mechanism designs. This includes distributed tuning and model/data optimisation and ablation capability.
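One of the simplest forms of distributed model optimisation is merging several community fine-tunes of the same base model by averaging their weights (a "model soup"). The sketch below is illustrative only: the checkpoint file names are placeholders, and it assumes the fine-tunes share an identical architecture and tensor shapes.

```python
# Minimal sketch: merging community fine-tunes of the same base model by simple
# weight averaging. Checkpoint paths are hypothetical placeholders.
import torch

def average_checkpoints(paths):
    """Load state_dicts from `paths` and return their element-wise average."""
    merged = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if merged is None:
            merged = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                merged[k] += v.float()
    return {k: v / len(paths) for k, v in merged.items()}

# Example: combine three hypothetical domain fine-tunes into one checkpoint.
merged_weights = average_checkpoints([
    "finetune_medical.pt",
    "finetune_legal.pt",
    "finetune_finance.pt",
])
torch.save(merged_weights, "merged_model.pt")
```

More sophisticated merging schemes weight the checkpoints or operate on task vectors, but even plain averaging shows how independently produced fine-tunes can be recombined without retraining from scratch.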
I think this is somewhat recursive as well: better base models that are predictable and keep improving can continually support the location and improvement of data, which in turn makes the models better.
What data and knowledge should go into a model in pre-training and post-training is probably the most important outstanding research question.
If we can figure this out then we can pull on not only the compute and support of the masses, but their expertise to increase the quality and diversity of the data that feeds our models and their ability to help us all.
💥 Miscellaneous – a set of rapid-fire questions
What is your favorite area of research outside of generative AI?
I have a particular interest in neurochemistry, from my ASD research, and in functional medicine, which I think will be completely transformed by AI.
How far can the LLM scaling laws take us? All the way to AGI?
I think AGI from scaling LLMs is unlikely. What we are seeing now is similar to cooking a poor-quality steak (massive datasets) for longer: it gets tender and nice, and the system exhibits increasing capability, but not necessarily generalised knowledge or capability as an individual model. When put together in a broader system, this does, of course, become more difficult to predict.
It could be that humans plus sufficiently advanced generative AI systems are the real ASI. Especially when we get BCI kicking off.
Describe the crypto-AI + Schelling AI world in five years.
An open, distributed, AI system that offers universal basic intelligence to everyone, is communally owned and governed and constantly improving with the objective function of human flourishing.
Who are your favorite mathematicians and computer scientists, and why?
I have a particular soft spot for Claude Shannon, whose wonderful work laid the foundations for the massive advances we have seen. Herb Simon is another favourite, bridging multiple disciplines, and he has been an inspiration for the design of Schelling AI in particular.