What happens when AI makes clinical decisions — and humans don't question them?
Domain
Healthcare, clinical AI, decision-making
Problem
Human-machine teaming, automation bias, professional skill degradation
Methods
Ethnography, depth interviews, literature synthesis
Deploying AI in clinical settings is assumed to improve outcomes, but the evidence says otherwise. Through two years of ethnographic fieldwork with radiologists and clinical AI practitioners, we investigated why AI in high-stakes medical settings is actively degrading human expertise rather than augmenting it. We identified five design factors that determine whether AI helps or harms professional judgment.
The Problem
AI systems in healthcare are getting better at reading medical images. So the obvious conclusion is that deploying AI in clinical settings should improve outcomes. Except the evidence says otherwise. In a study of 222,000 women across 430,000 mammograms over four years, introducing AI-assisted diagnosis increased the rate of biopsies by nearly 20 per cent — but made no significant improvement to cancer detection rates. A second study of 320,000 women found that radiologists' diagnostic performance was actually worse when working with AI than without it. AI, in these cases, actively degraded the skill and judgment of the clinicians who used it. This is the problem we set out to understand — and to help developers design a way past.
Who we worked with
Through ethnographic fieldwork, the project team worked with radiologists, radiography practitioners and educators, and a radiation therapist across a range of professional experience:
- 8 radiologists interviewed in depth, from junior trainees to senior consultants
- 5 radiologists in ongoing collaborative discussion throughout the research
- 1 radiation therapist and 5 radiography practitioners and educators
- 2 participants who use AI-based diagnostic systems as part of their daily clinical practice
How we did it
- 01 Ethnographic fieldwork: observing how clinicians actually use AI in practice, not how they describe using it.
- 02 In-depth interviews: exploring how AI outputs influenced clinical judgment, attention, confidence, and decision-making.
- 03 Ongoing collaborative discussion: tracking how practitioners' relationships with AI changed over time.
- 04 Literature synthesis: across human-centred AI, explainable AI, human-in-the-loop systems, and human-automation teaming.
- 05 Conceptual framework development: translating findings into five design factors that determine whether AI helps or harms human expertise.
What we found
Finding 01
AI is replacing human judgment, not augmenting it
Automation-induced complacency in high-stakes settings
When clinicians are given an AI output, they tend to anchor on it. They look where the AI tells them to look and trust confident predictions even when those predictions are incomplete. We describe this as automation-induced complacency: the steady erosion of active, vigilant human attention when a machine appears to have already done the thinking. In mammography, this showed up in the studies described above as more biopsies without a significant improvement in cancer detection, and as worse diagnostic performance when radiologists worked with AI.
If you are building AI for clinical or professional settings
We bring human-centred research methods to AI products in development, helping teams understand not just whether users can operate their system but how sustained use affects human skill, judgment, and trust.
If you are deploying AI in professional or clinical settings
We can design and run research with your teams to understand how AI tools are changing professional practice inside your organisation, and what changes would make those effects more positive.
Finding 02
AI systems are not designed with learning in mind
Optimised for better answers, not better questions
We identified a structural flaw in how most AI for professional settings is built: the system is optimised to give better answers, not to help the human develop better questions. This produces teams that are less capable than either the human or the machine operating alone. What we propose instead is genuine human-machine teaming — systems in which AI and humans work in dialogue, where both parties get better over time.
If you are building AI for professional or knowledge-work settings
We design research programmes that study how users develop — or do not — through sustained AI use, and translate those findings into interaction design principles that build products worth returning to.
If you are deploying AI tools for knowledge workers or clinicians
We can design monitoring frameworks and ongoing research programmes that track how AI tools are affecting professional capability inside your organisation, and surface early signals before problems become entrenched.
Finding 03
Five things that determine whether AI helps humans or harms them
A framework for high-stakes AI design
We identified five factors that any AI system must get right to genuinely augment rather than erode human expertise:
- Human-machine dialogue: does the AI invite questioning, or just present conclusions?
- Labelling and attention: where does the AI direct human attention, and is that direction transparent?
- Problem framing: who gets to define what the AI is solving for?
- Biases, values and affect: what biases does the AI carry, and how do those interact with human biases?
- Ethics, agency and human choice: does the system preserve meaningful human agency, or structurally undermine it?
If you are deploying AI in high-stakes settings
We can help you build the evidence base — through research with your staff and the people they serve — that demonstrates you have taken these questions seriously.
If you are building AI for any high-stakes setting
We design research programmes structured around exactly these questions, using methods built for the complexity of AI's effect on human judgment.
“Rather than designing algorithms in isolation, we investigate how to design new forms of system that amplify human performance and overall system performance.”
— Project team
Working with Forth Story
If you are building AI for professional or clinical settings and want research that goes beyond technical performance to understand what your system does to the humans who use it — get in touch. We worked with a wide project team to deliver this work.
