The Sequence Chat: ConsenSys's Lex Sokolin on Generative Art and Philosophical Principles of Generative AI
A conversation about the history, current state and foundations of generative art.
👤 Quick bio
Tell us a bit about yourself: your background, current role, and how you got started in AI and generative art?
Thanks for having me on here. In terms of my background, sometimes it feels like a pendulum swing between the rational and the creative. I am equally drawn to aesthetics and systems, sometimes at the same time.
On the analytical side, I have pursued work in economics, financial services, and the law through my work as the Chief Economist at Web3 company ConsenSys, various entrepreneurship experiences in the Fintech industry, and a JD/MBA at Columbia University. I’ve always enjoyed modeling and understanding complex social systems, and the interactions between people and the abstractions of their technologies. I think of operating and investing as a continuum to explore what people can build across economic, financial, and cultural networks. For more on this topic, I write a newsletter called the Fintech Blueprint accessible here https://lex.substack.com/.
On the other hand, I have cultivated a visual arts practice using new media since I was young. Whether it was the early days of Photoshop, or Flash, or Processing, I like playing with both the language of the medium, as well as the renderings that result. Recent developments across generative, neural, and glitch art, including their popularity on various NFT platforms, have been absolutely amazing for me, bringing together economics and creativity in very novel ways. You can see my views on this topic at
and also at https://www.lexsokolin.com/artist-statement
For social media, check out https://twitter.com/LexSokolin and https://www.linkedin.com/in/alexeysokolin.
🛠 ML Work
In the last two years, we have seen an explosion of generative models for image, audio, 3D, video. How has this level of innovation impacted the generative AI space?
I go back to the concept of the Uncanny Valley. We have had an enormous volume of CGI and various renderings of images over the last two decades. Artists have been trying to make things photo-realistic in movies and video games, but (1) the images were imperfect and (2) the skill to create them was prohibitive. In fact, the more people chased perfection, the more off-putting the images felt. I think a similar thing can be said of robot conversation – early attempts felt like talking to a chattering metallic machine with a rubber mask on. You could see the gears, and the fact that those gears attempted to look human was genuinely unnerving and creepy.
Neural networks started to show magic in the early to mid 2010s, but you could still easily see the artifacts and distortions of the math required to generate the images. Things looked like computational styles, or mathematical transformations, rather than something that would pass for reality. In this way, getting through the Uncanny Valley is akin to passing the conversational Turing Test.
AI began to pass visual, hearing, and written tests better than its human counterparts. Its probabilistic sensing of media could be inverted to hallucinate and generate media. We are now in the moment where such media are, for all intents and purposes, indistinguishable for most people from “the truth”. Yes, there are mangled hands and repeating face patterns and dream-like text, but we now have our entire imagination indexed as a search engine.
As for industry, we can say that the skill required to create media at a professional level has dropped dramatically. This will lead to an over-supply of media, a sort of asymptotic infinity towards the end of our Web2 era of free doom-scrolling content. As supply goes up, price will continue to drop, and there will be further asymmetric concentration of value in a few companies. The only way out of this is to generate digital scarcity for digital objects, and re-introduce human economies around hallucinated media. But we are several steps away from this, in my view.
Diffusion models have become the state-of-the-art technique for image generation. What is it about diffusion techniques that makes them better suited for generative AI scenarios than previous methods?
I used to think of AI as a counterpart to a human brain. Once we have mapped an entire human brain, in an Accelerando fashion, then we can copy/paste that intelligence and scale up our processing. But it feels more like AI has been recreating human senses at the scale of the population, of humanity. We see how neural networks used to ingest some local data set about cats and that was sufficient to train that network to see cats. Now, the entire container of digitized human knowledge is pumped into a mystery box, which structures that information into abstractions we cannot touch or understand.
If anything, with earlier deep learning we could trace the abstractions as Shakespearean letters turn into words, then words into sentences, then sentences into paragraphs, then paragraphs into new books. But the new models have an order of magnitude more data, and the abstractions and clumps pass away from our intuition, several levels up to the clouds. So I think of it as interacting with a modeled sense, a disconnected digital feeling averaged out across all human experience.
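For intuition on why diffusion models answer this question so well, here is a minimal sketch (not from the interview) of the DDPM-style forward process: training teaches a network to undo small amounts of Gaussian noise, and generation runs that denoising chain in reverse from pure noise. The schedule values and shapes below are illustrative assumptions, not any production configuration.

```python
import numpy as np

# Minimal sketch of the DDPM-style forward (noising) process that diffusion
# models learn to reverse. Schedule values and shapes are illustrative only.

T = 1000                                  # number of diffusion timesteps
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)           # cumulative products for closed-form sampling

def add_noise(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise

# Training pairs: the network sees x_t and the timestep t and is trained to
# predict the added noise; generation then runs the chain in reverse,
# starting from pure Gaussian noise.
rng = np.random.default_rng(0)
x0 = np.zeros((64, 64, 3))                # a stand-in "image"
x_t, eps = add_noise(x0, t=500, rng=rng)
```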
Text-to-(image, video, 3D, audio) models have become one of the dominant forms in the generative AI landscape. What is the impact of text guidance on generative art?
Generative Art used to mean that you use a programming language like Processing to discover mathematical algorithms which deterministically design beautiful patterns. Those things might be fractals, or constructivist abstractions, or some other balanced recursive aesthetic. The key was in being very precise with specifying rules through programming.
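For readers unfamiliar with that tradition, here is a minimal sketch, in Python rather than Processing, of what "precise rules, deterministic output" means in practice: a recursive subdivision whose pattern follows entirely from the stated rule. The specific rule and parameters are illustrative assumptions, not a real piece.

```python
import matplotlib.pyplot as plt

# Deterministic, rule-based generative art in the Processing tradition:
# the output is fixed entirely by the rule and its parameters, no model
# and no randomness involved.

def draw_squares(ax, x, y, size, depth):
    """Recursively draw nested squares; the rule alone produces the pattern."""
    if depth == 0 or size < 1:
        return
    ax.add_patch(plt.Rectangle((x, y), size, size, fill=False, linewidth=0.5))
    half = size / 2
    # Recurse into the four quadrants with a fixed halving rule.
    for dx, dy in [(0, 0), (half, 0), (0, half), (half, half)]:
        draw_squares(ax, x + dx, y + dy, half, depth - 1)

fig, ax = plt.subplots(figsize=(6, 6))
draw_squares(ax, 0, 0, 512, depth=6)
ax.set_xlim(0, 512)
ax.set_ylim(0, 512)
ax.set_aspect("equal")
ax.axis("off")
plt.savefig("recursive_squares.png", dpi=150)
```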
The new generative AI art landscape is far more akin to photography, or found objects. People are having interesting experiences struggling with ownership, intellectual property, and the artistic view in general when using these new technologies. It is too easy to make something, and it feels as if nothing was actually made.
The role of the natural language model is to create the navigation, the view finder, by which you traverse the latent space of all visual data. It is technical and difficult to move around a visual generative model without such a device. But if you are able to translate the intent of your language into the numbers which move around another mathematical layer, then it feels like your intent matches the outputs you receive from the model. It is a magical feeling, and endlessly addictive.
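As a rough sketch of that "viewfinder" idea, the snippet below uses the Hugging Face diffusers API to show how a prompt is encoded into numbers that condition each denoising step as it moves through the model's latent space. The model id and parameter values are assumptions for illustration, not a recommendation from the interview, and a GPU is assumed.

```python
import torch
from diffusers import StableDiffusionPipeline

# The text encoder inside the pipeline turns the prompt into embeddings,
# and those numbers steer every denoising step through the latent space.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a constructivist abstraction of a city at dusk"
# guidance_scale controls how strongly the text embedding steers the image;
# higher values follow the prompt more literally.
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("constructivist_city.png")
```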
However, it is not structured or deterministic in the sense of algorithmic art, for example, because the model with which you are interacting can interpret your inputs in wildly different ways, as seen in Stable Diffusion. There are still unknown effects on industry structure. Will we have a single visual model style to rule them all, like Midjourney, which satisfies most users with its simplicity and stylization? Or will we have a splintering of models and styles, as if they were filters on Instagram? And who will consume this content at the end? Perhaps it is the trap of Narcissus, staring forever at our own reflection in the AI drip feed of the phone.
Generative AI models have recently pioneered techniques such as inpainting, outpainting, blending and many others that are pushing the boundaries of visual effects. What are some of the most exciting techniques you have seen applied to generative art?
I remember seeing a generative AI paper in 2014 or so, and thinking that it was impossible to commercialize. Now, there is a new step forward every week. Video game worlds are rendered in Minecraft blocks, and then styled and made alive through diffusion models. Videos are in their beginning stages of being consumable. Music and NPC text are coming around. All these primitives will add up to supporting a spontaneous, personalized metaverse experience, regardless of Zuckerberg’s early failures. Each one of us can and will carry a secret world, and visual effects are an unbounded part of this future.
What I worry more about, however, is the difference between art and illustration. As much as I love the clever and beautiful visual artifacts that people are creating and sharing, something tells me that we are losing that key distinction. Making cool-looking things to tell a story is illustration. Embellishing the visuals with “hyper detail” and “cinematic” flourish is illustration. What is the art in this movement? I believe photography went through a similar challenge in creating the tooling and then figuring out how it could be used to make art. Such language develops more slowly than the technology to render.
The release of Stable Diffusion was sort of a “Sputnik moment” in generative art, sparking a tremendous level of innovation. How do you see the balance between open source vs. proprietary models when it comes to generative art?
There are two dimensions I am worried about here: (1) the closing / opening of the model itself, and whether the manufacturers of the AI engine try to close down access to its use and re-use, and (2) the ability of people to own and transact around the outputs of the models in a way that advantages human dignity.
The generation of these models is a race, and spoils will accrue to the fastest movers at scale. Once that race is over, the technology will be available to all, and its protections will diminish and the economic profits will be gone. To that end, while it may feel hyper-competitive in the moment, I think the long term outcome will be large open-source models that are tied to the evolution of humanity’s data exhaust. The Internet will *think*.
The other problem seems more dangerous. If we again opt into infinite content and zero cost, nothing good will happen to society – dopamine addiction will continue to rise, people will opt into robot friendships and relationships over messy human ones, and so on. To that end, I hope we find economic models for these AI-produced digital goods that look more like functional market economies, and less like infinite streams.
💥 Miscellaneous – a set of rapid-fire questions
What are the biggest challenges, next frontiers for generative AI methods?
I am excited to see generative AI meaningfully adopted in media and entertainment, rather than as a brainstorming tool. Once picture-perfect AI is available to all on cheap compute, I would expect more “art” oriented usages of the AI to emerge. In particular, ideas around glitching and deconstructing AI imagery are very interesting to me.
Favorite generative art model? Is Midjourney ahead of everyone else in terms of artistic capabilities?
I personally use Midjourney, because it is optimized for consumers and is fast and easy. I think different models are likely to succeed for mainstream users versus pro-sumers or professional users.
Are we navigating towards a world in which we end up with a dozen foundation generative computer vision models that are used to create the vast majority of digital art in the world? What are the probability, risks and potential alternatives to that future?
I think we will end up with an oligopoly of AI conversational interfaces, which become deeply functional like operating systems. The OpenAI plug-in strategy is very powerful, and could kick off a race in terms of economic competition that largely benefits a single AI owner. I hope that the open source community is able to fork many of these benefits, and then create decentralized ownership and governance models that allow people to maintain their dignity (i.e., rights) as well as manageable financial models.
How do you think about intellectual property and artists’ rights when it comes to generative art?
Art is separate from rendering and illustration. The creative commons has been a boon for the Internet and digital media, and I hope that the tooling we are building now remains largely in that commons. However, artists need economic models for their craft. The answer to that question comes in the form of digital ownership, with the earliest examples being NFTs on computational blockchains. This is the only answer I have seen as to how artists can crowdfund from their communities by selling authentic art, even when infinite copies and remixes float around in the world. Perhaps we can tie a royalty to a Web3 mechanism that allows for art to be integrated into an AI learning set, but frankly this feels like a weak mechanism for a mammoth problem.
What are the value and current limitations of non-fungible tokens when it comes to generative art?
NFTs prove authenticity and provenance, and allow for real commerce to occur around digital objects. Generative art can be special in that it is manufactured with the participation of the purchaser / minter, drawing the consumer into the creative process. I like the idea of “authentic” mints being a valuable experience with a tangible price. The limitation is that adoption of this particular market structure and shape of NFTs is still very low in the general population. We need to move from novelty to standard, in the way that plastic records have been discarded in favor of digital music files.
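As an illustration of that minter participation, many generative NFT projects (Art Blocks being the best-known pattern) seed a deterministic algorithm with a hash produced at mint time, so the collector's transaction fixes which output of the system they own. The sketch below is a hypothetical, simplified version of that flow; the hash, palette, and rules are invented for illustration and do not correspond to any real project.

```python
import hashlib
import random

# Hypothetical Art Blocks-style flow: a hash produced at mint time seeds a
# deterministic algorithm, so the collector's mint fixes which output of the
# generative system they own. Hash, palette, and rules are illustrative only.

mint_hash = hashlib.sha256(b"example mint transaction").hexdigest()
rng = random.Random(int(mint_hash, 16))   # the mint uniquely seeds the piece

# A trivial stand-in for the artist's algorithm: a palette and a grid of
# weights chosen deterministically from the seed.
palette = [rng.choice(["#0b1d51", "#f2a65a", "#e5e5e5"]) for _ in range(16)]
grid = [[rng.random() for _ in range(8)] for _ in range(8)]

print(mint_hash[:10], palette[:3])        # same mint hash always yields the same piece
```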