TheSequence

Edge 385: The Two Big Schools for Building Autonomous Agents

Language or computer-vision based agents?

Apr 09, 2024

In this Issue:

  1. Building LLM-based vs. computer vision-based autonomous agents.

  2. Adept AI’s Fuyu-8B which powers its agent platform.

  3. Microsoft’s Autogen framework for building collaborative agents.

💡 ML Concept of the Day: The Two Big Schools for Building Autonomous Agents

In our series about autonomous agents, today we would like to explore the two fundamental schools for implementing these AI systems. We typically associate autonomous agents with LLMs, but there are competing techniques based on computer vision (CV) models that have been gaining quite a bit of traction. CV-based autonomous agents focus on recording actions on a user’s computer and replicating those actions with models that can understand pixel-level positions and mouse actions. You can think of this approach as the next generation of robotic process automation (RPA) models.
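To make the record-and-replay idea more concrete, here is a minimal Python sketch. It is not taken from any specific agent framework: `UIAction`, `record_demo`, `replay`, `locate`, and `perform` are all hypothetical names introduced for illustration. The `locate` callable stands in for a vision model that re-finds a click target on the current screen, and `perform` stands in for whatever actually drives the mouse and keyboard.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class UIAction:
    """One recorded user action (hypothetical schema)."""
    kind: str          # "click" or "type"
    x: int = 0         # pixel column, used by clicks
    y: int = 0         # pixel row, used by clicks
    text: str = ""     # typed text, used by "type" actions


def record_demo() -> List[UIAction]:
    """Return a hard-coded trace standing in for a real screen recording."""
    return [
        UIAction("click", x=120, y=340),
        UIAction("type", text="hello"),
        UIAction("click", x=400, y=520),
    ]


def replay(
    trace: List[UIAction],
    locate: Callable[[UIAction], Tuple[int, int]],
    perform: Callable[[UIAction], None],
) -> List[UIAction]:
    """Replay a trace, letting `locate` correct click coordinates first."""
    executed = []
    for action in trace:
        if action.kind == "click":
            # Assumption: a CV model would re-detect the target here.
            x, y = locate(action)
            action = UIAction("click", x=x, y=y)
        perform(action)
        executed.append(action)
    return executed


if __name__ == "__main__":
    # Pretend the window moved 10 pixels to the right since recording.
    shifted = lambda a: (a.x + 10, a.y)
    replay(record_demo(), shifted, print)
```

Keeping `locate` and `perform` as plain callables is a design choice made for this sketch only: it allows the replay loop to run without a real screen or model attached.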

In general, when exploring the landscape of frameworks for building autonomous agents, you are likely to be presented with two fundamental choices:

© 2025 Jesus Rodriguez