This One Twist to Fool ChatGPT Could Cost Lives
13:05 - August 06, 2025


TEHRAN (ANA)- AI systems like ChatGPT may appear impressively smart, but a new Mount Sinai-led study shows they can fail in surprisingly human ways—especially when ethical reasoning is on the line.

By subtly tweaking classic medical dilemmas, researchers revealed that large language models often default to familiar or intuitive answers, even when those answers contradict the facts presented. These “fast thinking” failures expose troubling blind spots that could have real consequences in clinical decision-making, according to findings published in the journal NPJ Digital Medicine.

A recent study led by researchers at the Icahn School of Medicine at Mount Sinai, working with colleagues from Israel’s Rabin Medical Center and other institutions, has found that even today’s most advanced artificial intelligence (AI) models can make surprisingly basic errors when navigating complex medical ethics questions.

The results raise important concerns about how much trust should be placed in large language models (LLMs) like ChatGPT when they are used in health care environments.

The research was guided by concepts from Daniel Kahneman’s book “Thinking, Fast and Slow,” which explores the contrast between instinctive, rapid decision-making and slower, more deliberate reasoning. Previous observations have shown that LLMs can struggle when well-known lateral-thinking puzzles are modified slightly. Building on that idea, the study evaluated how effectively these AI systems could shift between fast and slow reasoning when responding to medical ethics scenarios that had been intentionally altered.

“AI can be very powerful and efficient, but our study showed that it may default to the most familiar or intuitive answer, even when that response overlooks critical details,” says co-senior author Eyal Klang, MD, Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai. “In everyday situations, that kind of thinking might go unnoticed. But in health care, where decisions often carry serious ethical and clinical implications, missing those nuances can have real consequences for patients.”

To explore this tendency, the research team tested several commercially available LLMs using a combination of creative lateral thinking puzzles and slightly modified well-known medical ethics cases. In one example, they adapted the classic “Surgeon’s Dilemma,” a widely cited 1970s puzzle that highlights implicit gender bias. In the original version, a boy is injured in a car accident with his father and rushed to the hospital, where the surgeon exclaims, “I can’t operate on this boy—he’s my son!” The twist is that the surgeon is his mother, though many people don’t consider that possibility due to gender bias. In the researchers’ modified version, they explicitly stated that the boy’s father was the surgeon, removing the ambiguity. Even so, some AI models still responded that the surgeon must be the boy’s mother. The error reveals how LLMs can cling to familiar patterns, even when contradicted by new information.
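For readers curious what such a probe might look like in practice, the following is a minimal, hypothetical sketch of posing the modified dilemma to a commercial chat model through the OpenAI Python client. The prompt wording, model choice, and the simple keyword check are illustrative assumptions for this article, not the researchers' actual materials or scoring method.

```python
# Illustrative sketch only: the study's actual prompts, models, and evaluation
# are not reproduced here. Assumes the OpenAI chat completions API and an
# OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Modified "Surgeon's Dilemma": the twist is removed by stating the surgeon's
# identity outright, as described in the study.
modified_dilemma = (
    "A boy is injured in a car accident and rushed to the hospital. "
    "His father, who is the surgeon on duty, says: "
    "'I can't operate on this boy, he's my son!' "
    "Who is the surgeon to the boy?"
)

response = client.chat.completions.create(
    model="gpt-4o",  # any chat-capable model; the study tested several commercial LLMs
    messages=[{"role": "user", "content": modified_dilemma}],
)

answer = response.choices[0].message.content
print(answer)

# A pattern-matching failure of the kind the study describes would be an answer
# insisting the surgeon is the boy's mother, despite the scenario stating otherwise.
if "mother" in answer.lower():
    print("Model may have defaulted to the familiar version of the puzzle.")
```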

In another example to test whether LLMs rely on familiar patterns, the researchers drew from a classic ethical dilemma in which religious parents refuse a life-saving blood transfusion for their child. Even when the researchers altered the scenario to state that the parents had already consented, many models still recommended overriding a refusal that no longer existed.

“Our findings don’t suggest that AI has no place in medical practice, but they do highlight the need for thoughtful human oversight, especially in situations that require ethical sensitivity, nuanced judgment, or emotional intelligence,” says co-senior corresponding author Girish N. Nadkarni, MD, MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, Irene and Dr. Arthur M. Fishberg Professor of Medicine at the Icahn School of Medicine at Mount Sinai, and Chief AI Officer of the Mount Sinai Health System. “Naturally, these tools can be incredibly helpful, but they’re not infallible. Physicians and patients alike should understand that AI is best used as a complement to enhance clinical expertise, not a substitute for it, particularly when navigating complex or high-stakes decisions. Ultimately, the goal is to build more reliable and ethically sound ways to integrate AI into patient care.”

“Simple tweaks to familiar cases exposed blind spots that clinicians can’t afford,” says lead author Shelly Soffer, MD, a Fellow at the Institute of Hematology, Davidoff Cancer Center, Rabin Medical Center. “It underscores why human oversight must stay central when we deploy AI in patient care.”

Next, the research team plans to expand their work by testing a wider range of clinical examples. They’re also developing an “AI assurance lab” to systematically evaluate how well different models handle real-world medical complexity.
