Author: @vibedilettante
Source: Graph-Based Exploration for ARC-AGI-3 Interactive Reasoning Tasks (Arxiv: 2512.24156)
We’re accustomed to believing that the path to AGI (Artificial General Intelligence) lies in scaling up neural network parameters and context window sizes. A new preprint by researchers Rudakov, Shok, and Cooley challenges this notion: on the ARC-AGI-3 benchmark, a training-free algorithmic approach achieved results that many state-of-the-art large language models (LLMs) could not match.
Let’s examine how their Graph-Based Exploration method works and why it secured third place on the leaderboard.
Problem: LLMs Can’t Explore Effectively
ARC-AGI-3 is a set of interactive tasks in which an agent must uncover the hidden logic of a level (e.g., “move all blue blocks to the left”) by interacting with it. Large language models often fail here because the environment demands systematic exploration, not mere next-token prediction.
Solution Architecture: State Graph + Visual Salience
Instead of relying solely on neural network intuition, the authors proposed framing the solution process as a graph traversal.
1. Object-Oriented Perception
The agent does not process raw pixels. It segments the game field (up to 30×30 cells) into objects: connected components of the same color. This lets the system operate on meaningful entities rather than a bag of points.
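A minimal sketch of this segmentation step, assuming a 4-connected flood fill over a grid of color ids (the paper does not specify the connectivity or how background cells are treated, so skipping color 0 is my assumption):

```python
from collections import deque

def segment_objects(grid):
    """Split a color grid into objects: 4-connected same-color components.

    `grid` is a list of lists of integer color ids. Cells with color 0 are
    treated as background and skipped (an assumption, not from the paper).
    Returns a list of (color, set_of_(row, col)_cells) pairs.
    """
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or grid[r][c] == 0:
                continue
            color = grid[r][c]
            cells = set()
            queue = deque([(r, c)])
            seen[r][c] = True
            # Breadth-first flood fill over same-colored neighbors.
            while queue:
                y, x = queue.popleft()
                cells.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and grid[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append((color, cells))
    return objects
```

On a 30×30 field this runs in linear time in the number of cells, so perception stays cheap even when called after every action.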
2. Transition Graph
The system builds a map of its actions:
- Nodes: Unique states of the game field.
- Edges: Actions (clicks, moves) that transition the game from one state to another.
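The node/edge structure above can be sketched as a small class; the naming and the state-hashing scheme here are my own, since the paper's data structures aren't spelled out:

```python
class TransitionGraph:
    """Sketch of a state-transition graph for an interactive grid game.

    Nodes are game states (hashed into a key); edges record which action
    moved the agent from one state to another.
    """

    def __init__(self):
        self.edges = {}   # state_key -> {action: next_state_key}
        self.states = {}  # state_key -> original grid, kept for inspection

    @staticmethod
    def key(grid):
        # Grids arrive as lists of lists; tuples make them hashable.
        return tuple(tuple(row) for row in grid)

    def add_transition(self, state, action, next_state):
        s, t = self.key(state), self.key(next_state)
        self.states.setdefault(s, state)
        self.states.setdefault(t, next_state)
        self.edges.setdefault(s, {})[action] = t

    def unexplored(self, state, all_actions):
        """Actions not yet tried from this state -- the exploration frontier."""
        tried = self.edges.get(self.key(state), {})
        return [a for a in all_actions if a not in tried]
```

Because states are deduplicated by key, revisiting a known configuration costs nothing extra, and the `unexplored` frontier is what the salience queue below would prioritize.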
3. Intelligent Queue (Visual Salience)
To avoid blindly exploring millions of possibilities (which would cripple performance), the agent uses a visual-salience heuristic. Priority is given to actions involving:
- Objects that changed after the last move.
- Objects that visually stand out against the background.
- States that lead to unexplored parts of the graph.
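A toy version of such a prioritized queue, assuming a simple hand-rolled scoring rule (the paper's actual salience formula is not given; here "changed since the last move" dominates, with object size as a visual-prominence tiebreaker):

```python
import heapq

def salience(prev_grid, grid, cells):
    """Hypothetical salience score for one object (set of (row, col) cells).

    Objects that changed after the last move score highest; larger objects
    get a small bonus. Both weights are illustrative, not from the paper.
    """
    changed = any(prev_grid[r][c] != grid[r][c] for r, c in cells)
    return (2.0 if changed else 0.0) + 0.01 * len(cells)

class ActionQueue:
    """Max-priority queue of candidate actions.

    heapq is a min-heap, so priorities are negated; a running counter
    breaks ties and keeps heap comparisons away from the action payload.
    """

    def __init__(self):
        self._heap = []
        self._counter = 0

    def push(self, priority, action):
        heapq.heappush(self._heap, (-priority, self._counter, action))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

Usage: score each object's click action with `salience` and push it; `pop` then returns the action on the most recently changed, most prominent object first.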
Results: Algorithms Are Back in Play
The agent was tested on the private evaluation set of the ARC-AGI-3 competition. The results are impressive:
- Third place: Had this agent participated in the official leaderboard at the time of publication, it would have ranked third.
- 30 out of 52: The agent solved a median of 30 of the 52 tasks, surpassing many stochastic LLM-based approaches.
Conclusion
The Graph-Based Exploration work makes a compelling case for neuro-symbolic systems. For tasks requiring reasoning and planning, we still need rigid logical structures. Pure neural networks may generate ideas, but verifying those ideas and building a path to the solution is better done with graphs.
P.S. Colleagues, do you think it makes sense to use LLMs for the initial image analysis (segmentation) and then build a rigid graph on top of those results? Or should the “vision” itself be algorithmic? I’m eager to hear your thoughts in the comments of my Telegram channel https://t.me/vibedilettante
