Edge 385: The Two Big Schools for Building Autonomous Agents
Language or computer-vision based agents?
In this Issue:
Building LLM-based vs. computer-vision-based autonomous agents.
Adept AI’s Fuyu-8B, which powers its agent platform.
Microsoft’s Autogen framework for building collaborative agents.
💡 ML Concept of the Day: The Two Big Schools for Building Autonomous Agents
In our series about autonomous agents, today we would like to explore the two fundamental schools used to implement these AI systems. We typically associate autonomous agents with LLMs, but competing techniques based on computer vision (CV) models have been gaining quite a bit of traction. CV-based autonomous agents focus on recording actions on a user’s computer and replicating those actions with models that can understand pixel-level positions and mouse actions. You can think of this approach as the next generation of robotic process automation (RPA) models.
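To make the record-and-replay idea concrete, here is a minimal Python sketch. It is only an illustration: the `Action`, `Recorder`, and `replay` names are hypothetical and do not come from any specific agent framework, and the pixel coordinates are hard-coded. In a real CV-based agent, a vision model would be expected to infer those coordinates from the screen rather than reuse fixed ones.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Action:
    """One recorded UI event (hypothetical schema, for illustration only)."""
    kind: str                   # "click" or "type"
    x: int = 0                  # pixel column of the event, if relevant
    y: int = 0                  # pixel row of the event, if relevant
    text: Optional[str] = None  # typed text, if relevant


class Recorder:
    """Collects the user's actions into an ordered trace."""

    def __init__(self) -> None:
        self.trace: List[Action] = []

    def click(self, x: int, y: int) -> None:
        self.trace.append(Action("click", x, y))

    def type_text(self, text: str) -> None:
        self.trace.append(Action("type", text=text))


def replay(trace: List[Action], execute: Callable[[Action], None]) -> None:
    """Replays a trace in order by handing each action to an executor.

    `execute` is an assumption: it stands in for whatever actually drives
    the mouse and keyboard in a real system.
    """
    for action in trace:
        execute(action)


# Record a short interaction.
recorder = Recorder()
recorder.click(120, 340)
recorder.type_text("quarterly report")
recorder.click(480, 340)

# Replay it against a stand-in executor that only logs what it receives.
log: List[Action] = []
replay(recorder.trace, log.append)
```

Classic RPA tools replay a fixed trace much like this one. The CV-based school aims to go further by having the model decide where to click from the pixels it currently sees, which is what would let it cope with layouts that shift between runs.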
In general, when exploring the landscape of frameworks for building autonomous agents, you are likely to be presented with two fundamental choices: