
What happens when AI makes clinical decisions — and humans don't question them?

Domain

Healthcare, clinical AI, decision-making

Problem

Human-machine teaming, automation bias, professional skill degradation

Methods

Ethnography, in-depth interviews, literature synthesis

Deploying AI in clinical settings is widely assumed to improve outcomes, but the evidence does not support that assumption. Through two years of ethnographic fieldwork with radiologists and clinical AI practitioners, we investigated why AI in high-stakes medical settings is degrading human expertise rather than augmenting it, and identified five design factors that determine whether AI helps or harms professional judgment.

The Problem

AI systems in healthcare are getting better at reading medical images, so the obvious conclusion is that deploying AI in clinical settings should improve outcomes. The evidence says otherwise. In a study of 222,000 women across 430,000 mammograms over four years, introducing AI-assisted diagnosis increased the rate of biopsies by nearly 20 per cent but made no significant improvement to cancer detection rates. A second study of 320,000 women found that radiologists' diagnostic performance was actually worse when working with AI than without it. In these cases, AI actively degraded the skill and judgment of the clinicians who used it. This is the problem we set out to understand, and to help developers design their way past.

Who we worked with

Through ethnographic fieldwork, the project team worked with radiologists, radiation therapists and radiography practitioners across a range of professional experience: 8 radiologists interviewed in depth, from junior trainees to senior consultants; 5 radiologists in ongoing collaborative discussion throughout the research; 1 radiation therapist and 5 radiography practitioners and educators; and 2 participants who use AI-based diagnostic systems as part of their daily clinical practice.

How we did it

  1. Ethnographic fieldwork

    Observing how clinicians actually use AI in practice, not how they describe using it.

  2. In-depth interviews

    Exploring how AI outputs influenced clinical judgment, attention, confidence, and decision-making.

  3. Ongoing collaborative discussion

    Tracking how practitioners' relationships with AI changed over time.

  4. Literature synthesis

    Drawing on research in human-centred AI, explainable AI, human-in-the-loop systems, and human-automation teaming.

  5. Conceptual framework development

    Translating findings into five design factors that determine whether AI helps or harms human expertise.

What we found

Finding 01

AI is replacing human judgment, not augmenting it

Automation-induced complacency in high-stakes settings

When clinicians are given an AI output, they tend to anchor on it. They look where the AI tells them to look, and they trust confident predictions even when those predictions are incomplete. We describe this as automation-induced complacency: the steady erosion of active, vigilant human attention when a machine appears to have already done the thinking. In mammography, this showed up in large clinical studies as an increase in unnecessary biopsies and an increase in missed cancers.

If you are building AI for clinical or professional settings

We bring human-centred research methods to AI products in development, helping teams understand not just whether users can operate their system but how sustained use affects human skill, judgment, and trust.

If you are deploying AI in professional or clinical settings

We can design and run research with your teams to understand how AI tools are changing professional practice inside your organisation, and what changes would make those effects more positive.

Finding 02

AI systems are not designed with learning in mind

Optimised for better answers, not better questions

We identified a structural flaw in how most AI for professional settings is built: the system is optimised to give better answers, not to help the human ask better questions. The result is human-AI teams that can be less capable than either the human or the machine operating alone. What we propose instead is genuine human-machine teaming: systems in which AI and humans work in dialogue, and both parties get better over time.

If you are building AI for professional or knowledge-work settings

We design research programmes that study whether and how users develop through sustained AI use, and translate those findings into interaction design principles for products worth returning to.

If you are deploying AI tools for knowledge workers or clinicians

We can design monitoring frameworks and ongoing research programmes that track how AI tools are affecting professional capability inside your organisation, and surface early signals before problems become entrenched.

Finding 03

Five things that determine whether AI helps humans or harms them

A framework for high-stakes AI design

We identified five factors that any AI system must get right to genuinely augment, rather than erode, human expertise:

  1. Human-machine dialogue: does the AI invite questioning, or just present conclusions?
  2. Labelling and attention: where does the AI direct human attention, and is that direction transparent?
  3. Problem framing: who gets to define what the AI is solving for?
  4. Biases, values and affect: what biases does the AI carry, and how do those interact with human biases?
  5. Ethics, agency and human choice: does the system preserve meaningful human agency, or structurally undermine it?

If you are deploying AI in high-stakes settings

We can help you build the evidence base, through research with your staff and the people they serve, that shows you have taken these questions seriously.

If you are building AI for any high-stakes setting

We design research programmes structured around exactly these questions, using methods built for the complexity of AI's effect on human judgment.

Rather than designing algorithms in isolation, we investigate how to design new kinds of systems that amplify both human performance and overall system performance.

Project team

We worked with a wide project team to deliver this work.

Working with Forth Story

If you are building AI for professional or clinical settings and want research that goes beyond technical performance to understand what your system does to the humans who use it, get in touch.