According to Cointelegraph, Apple researchers have identified significant reasoning difficulties in leading AI models, underscoring that the pursuit of artificial general intelligence (AGI) remains a complex challenge. Despite recent advances in large language models (LLMs) such as OpenAI's ChatGPT and Anthropic's Claude, the researchers argue in a June paper titled "The Illusion of Thinking" that the fundamental capabilities and limitations of these models are still not fully understood. They note that current evaluations focus heavily on mathematical and coding benchmarks and prioritize final-answer accuracy without adequately assessing how the models actually reason.
Apple's research runs counter to the widespread belief that AGI is imminent. To probe the reasoning capabilities of AI, the researchers designed a set of puzzle games to test both "thinking" and "non-thinking" versions of models such as Claude Sonnet, OpenAI's o3-mini and o1, and DeepSeek's R1 and V3. They found that frontier large reasoning models (LRMs) suffer a sharp drop in accuracy as tasks become more complex and fail to generalize their reasoning effectively. This cuts against expectations for AGI: the models struggle with exact computation, reason inconsistently, and fail to apply explicit algorithms across different puzzles.
The study also found that AI chatbots often "overthink": they reach a correct answer early, then drift into incorrect reasoning. The researchers conclude that LRMs mimic reasoning patterns without truly internalizing or generalizing them, falling short of AGI-level reasoning. These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be hitting fundamental barriers to generalizable reasoning.
AGI is widely regarded as the ultimate goal of AI development: machines that can think and reason on par with humans. In January, OpenAI CEO Sam Altman expressed confidence in the company's progress toward building AGI, saying it was closer than ever before. Similarly, Anthropic CEO Dario Amodei predicted that AI could surpass human capabilities within the next few years, potentially by 2026 or 2027. Despite these optimistic projections, the Apple findings underscore the challenges that remain in the race to develop AGI.