
AI | about 5 hours ago
ARC-AGI-3 drops frontier models below 1% on interactive reasoning tasks humans ace
ARC-AGI-3 tests AI agents in interactive turn-based environments with no instructions or stated goals. Frontier models score below 1%. Humans score 100%. The new benchmark from the ARC Prize Foundation reveals a massive gap in agentic reasoning.
By Kai Nakamura
#agentic-ai #ARC-AGI #benchmarks