Edge 385: The Two Big Schools for Building Autonomous Agents

Language- or computer vision-based agents?

Apr 09, 2024


In this Issue:

Building LLM-based vs. computer vision-based autonomous agents.
Adept AI’s Fuyu-8B, which powers its agent platform.
Microsoft’s AutoGen framework for building collaborative agents.

💡 ML Concept of the Day: The Two Big Schools for Building Autonomous Agents

In our series about autonomous agents, today we explore the two fundamental schools for implementing these AI systems. We typically associate autonomous agents with LLMs, but competing techniques based on computer vision (CV) models have been gaining considerable traction. CV-based autonomous agents focus on recording the actions a user takes on their computer and replicating those actions with models that understand pixel-level positions and mouse actions. You can think of this approach as the next generation of robotic process automation (RPA).
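To make the record-and-replicate idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular agent platform works: `UIAction`, `Recorder`, `FakeScreen`, and `replay` are hypothetical names, the screen is a simulated stand-in for a real desktop, and the replay uses fixed pixel coordinates where a real CV-based agent would rely on a vision model to locate its targets.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class UIAction:
    """One recorded user action (hypothetical schema for illustration)."""
    kind: str          # "click" or "type"
    x: int = 0         # pixel column, used by "click"
    y: int = 0         # pixel row, used by "click"
    text: str = ""     # characters typed, used by "type"


@dataclass
class Recorder:
    """Collects user actions into an ordered trace."""
    trace: List[UIAction] = field(default_factory=list)

    def click(self, x: int, y: int) -> None:
        self.trace.append(UIAction("click", x=x, y=y))

    def type_text(self, text: str) -> None:
        self.trace.append(UIAction("type", text=text))


class FakeScreen:
    """Simulated desktop: widgets are rectangles (left, top, right, bottom)."""

    def __init__(self, widgets: Dict[str, Tuple[int, int, int, int]]) -> None:
        self.widgets = widgets
        self.focused: Optional[str] = None
        self.values: Dict[str, str] = {}

    def click(self, x: int, y: int) -> None:
        # Focus whichever widget contains the clicked pixel, if any.
        self.focused = None
        for name, (left, top, right, bottom) in self.widgets.items():
            if left <= x < right and top <= y < bottom:
                self.focused = name
                break

    def type_text(self, text: str) -> None:
        # Typed text goes to the focused widget; it is dropped otherwise.
        if self.focused is not None:
            self.values[self.focused] = self.values.get(self.focused, "") + text


def replay(trace: List[UIAction], screen: FakeScreen) -> None:
    """Replicate a recorded trace, action by action, on the given screen."""
    for action in trace:
        if action.kind == "click":
            screen.click(action.x, action.y)
        elif action.kind == "type":
            screen.type_text(action.text)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")


# Record a short login-style interaction, then replay it.
recorder = Recorder()
recorder.click(120, 45)
recorder.type_text("alice")
recorder.click(120, 85)
recorder.type_text("s3cret")

screen = FakeScreen({
    "username": (100, 30, 300, 60),
    "password": (100, 70, 300, 100),
})
replay(recorder.trace, screen)
print(screen.values)
```

The fixed coordinates are what make classic RPA scripts brittle: if a widget moves, the replay clicks the wrong place. The appeal of the CV-based school is that a model reading the screen can, in principle, replace those hard-coded positions.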

In general, when exploring the landscape of frameworks for building autonomous agents, you are likely to be presented with two fundamental choices:

TheSequence
