The Sequence Chat: Small Specialists vs. Large Generalist Models and What if NVIDIA Becomes Sun Microsystems
A controversial debate and a crazy thesis.
Massively large transformer models have dominated the generative AI revolution over the last few years. The emergent properties that surface at scale in large foundation models are nothing short of magical. At the same time, running pretraining, fine-tuning, and inference workloads on large models is cost-prohibitive for most organizations. As a result, we have seen the emergence of smaller models that are more specialized in a given domain. The rise of smaller models also challenges the dependency on NVIDIA GPUs, as these models are perfectly able to run on commodity hardware.
Today, I would like to explore two complex theses:
The debate between small specialist and large generalist models.
What happens to NVIDIA in a world in which new AI architectures do not depend on its GPU topologies?