AI Advances, But Still Fails to Understand Human Interactions

While artificial intelligence (AI) continues to evolve, it remains challenged by the complexities of human social interactions.

A recent study in the United States sheds light on this limitation, revealing that while AI can efficiently recognise objects and faces in still images, it struggles to describe and interpret social dynamics in moving scenes.

Led by Leyla Isik, a professor of cognitive science at Johns Hopkins University, the research aimed to assess how AI models understand social behaviour.

To achieve this, the team conducted a large-scale experiment involving over 350 AI models, each specialising in video, image, or language processing.

The models were shown three-second video clips depicting various social situations, and human participants rated the intensity of the interactions in the same clips on a scale from 1 to 5.

The goal was to compare how humans and AI interpret these scenarios, providing valuable insights into the current limitations of AI in understanding the nuances of social interactions.
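As a rough illustration of what such a comparison can look like, the sketch below scores agreement with a Pearson correlation. This is an assumption made for illustration: the article does not say which statistic the researchers used, and every rating and model score in the example is invented, not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_rating_per_clip(ratings):
    """Average the 1-to-5 ratings across raters for each clip."""
    n_raters = len(ratings)
    return [sum(clip) / n_raters for clip in zip(*ratings)]


# Invented data: one row per human rater, one column per video clip.
human_ratings = [
    [1, 4, 5, 2, 3],
    [1, 5, 5, 2, 3],
    [2, 4, 5, 1, 3],
    [1, 4, 4, 2, 4],
]
# Invented scores from a hypothetical model for the same five clips.
model_scores = [3, 3, 4, 3, 2]

# Human consistency: correlate the average ratings of two halves of the raters.
half_a = mean_rating_per_clip(human_ratings[:2])
half_b = mean_rating_per_clip(human_ratings[2:])
human_consistency = pearson(half_a, half_b)

# Model agreement: correlate model scores with the average human rating.
human_mean = mean_rating_per_clip(human_ratings)
model_agreement = pearson(model_scores, human_mean)

print(f"human-human consistency: {human_consistency:.2f}")
print(f"model-human agreement:   {model_agreement:.2f}")
```

In this toy data the two halves of the human raters agree closely with each other, while the hypothetical model tracks the human average only weakly. That is the general shape of the gap the study reports, not a reproduction of its numbers.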

The Critical Gap in Modern AI Models

The human participants in the study demonstrated remarkable consistency in their assessments, reflecting a deep and shared understanding of social interactions.

In contrast, AI struggled to replicate these judgments.

Video-specialised models were particularly ineffective at interpreting the scenes. Models based on still images, even when provided with multiple video excerpts, had difficulty determining whether the characters were engaged in communication.

Language models performed slightly better, especially when given human-written descriptions, but still fell far short of human-level comprehension.

For Isik, this inability of AI to grasp human social dynamics presents a significant barrier to its effective integration into real-world applications.

The study's lead author explains in a news release:

"AI for a self-driving car, for example, would need to recognise the intentions, goals, and actions of human drivers and pedestrians. You would want it to know which way a pedestrian is about to start walking, or whether two people are in conversation versus about to cross the street. Any time you want an AI to interact with humans, you want it to be able to recognise what people are doing. I think this [study] sheds light on the fact that these systems can't right now."

The researchers suggest that this gap may stem from the design of AI neural networks, which are primarily modelled after brain regions responsible for processing static images.

In contrast, dynamic social scenes require engagement from different brain areas, creating a structural mismatch that could explain what the researchers describe as "a blind spot" in AI development.

Study coauthor Kathy Garcia noted:

"Real life isn't static. We need AI to understand the story that is unfolding in a scene."

Ultimately, the study underscores a profound divide between human and AI perception of dynamic social scenarios.

Despite its computational power and capacity for processing vast amounts of data, AI remains unable to fully comprehend the nuanced, implicit intentions underlying human social interactions.