CTI Research Series

Clinician-in-the-Loop AI Is a Safety Model, Not a Slogan

Dr. Brendan O'Brien

"Human in the loop" has become one of those phrases that sounds reassuring and means almost nothing. Vendors put it on their websites. Investor decks include it as a bullet point. Marketing material implies that as long as a clinician is somewhere in the system, the system is safe.

I want to push back on that, because I think the framing has drifted away from what it actually means in clinical practice. Clinician-in-the-loop is not a slogan. It is a safety model. And like any safety model, it either works because it is designed properly, or it fails quietly because it isn't.

This article is about what clinician-in-the-loop should mean in the context of clinical AI documentation, why it matters, and what to look for if you are evaluating a tool for your own practice or your service.

What clinician-in-the-loop is supposed to mean

At its most basic, clinician-in-the-loop describes a system in which an AI proposes outputs and a clinician reviews and approves them before they have clinical effect.

That is fine as a one-line definition, but it immediately raises four questions, and the answers to them are what separate a real safety model from a slogan.

  • Where in the workflow is the clinician?
  • What are they actually reviewing?
  • What are they empowered to do when something is wrong?
  • What does the system remember about the review?

If the answers to these four questions are vague, the clinician is not really in the loop. They are downstream of the loop, signing off.

Where the clinician sits in the workflow matters

There is a meaningful difference between clinicians being asked to review AI output before it leaves the system and clinicians being asked to review it after it has already affected the record, the patient, or the care plan.

A genuine clinician-in-the-loop design treats clinician review as the gate. Nothing leaves the encounter in a clinically meaningful form until the clinician has read it and approved it. The clinical note is not finalised, the letter is not sent, the patient summary is not released, and the action list is not committed.

A weaker design treats clinician review as a courtesy. The system has already produced something, perhaps already labelled it as the clinical record, and the clinician's review is more a chance to amend than a chance to authorise.

In real specialist practice, the difference is not subtle. The first design respects that the clinician is the author of the record. The second design implicitly treats the AI as the author and the clinician as a quality controller. That is not a defensible position when the record carries the clinician's accountability.
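
To make the distinction concrete, here is a minimal sketch, in Python, of what a gate design looks like in code. The names are hypothetical and belong to no particular product; the point is that release is structurally impossible without a named clinician's approval.

    from dataclasses import dataclass
    from enum import Enum, auto

    class DraftState(Enum):
        AI_DRAFT = auto()       # generated, not yet seen by the clinician
        UNDER_REVIEW = auto()   # open in front of the clinician
        APPROVED = auto()       # explicitly approved by a named clinician

    @dataclass
    class ClinicalDraft:
        text: str
        state: DraftState = DraftState.AI_DRAFT
        approved_by: str | None = None

    def release(draft: ClinicalDraft) -> str:
        # The gate: nothing leaves the encounter until a named clinician
        # has approved it. There is no bypass path.
        if draft.state is not DraftState.APPROVED or draft.approved_by is None:
            raise PermissionError("draft has not been approved by a clinician")
        return draft.text

In a courtesy design, release would happen first and review would be a flag set afterwards. The difference is visible in a single function.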

What the clinician should be reviewing

Not every piece of an AI-drafted document carries the same clinical weight. A useful safety model builds that difference into the design itself.

In specialist documentation, certain facts are protected, meaning they should not be allowed to enter the record without explicit clinician confirmation. From my own practice, I would put the following on that list.

Diagnoses. A diagnosis carries downstream consequences for billing, insurance, future investigations and patient understanding. An AI proposing a diagnosis is helpful. An AI committing a diagnosis to the record without confirmation is unsafe.

Laterality. Left or right. The cost of getting this wrong is enormous and well documented.

Dates. Onset of symptoms, prior surgery dates, intended operative dates, follow-up intervals. Numerical errors in dates propagate into letters, lists and patient understanding.

Medications and doses. Wrong drug, wrong dose, wrong frequency, missed allergy. Each of these is an established source of harm.

Allergies. A missed or transposed allergy is a recurring source of avoidable injury.

Operation names and side. Operative records have to be unambiguous about what was done and on which side. A system that drafts operative descriptions without explicit clinician confirmation is not safe.

Risks discussed and consented to. The legal weight of consent depends on what was actually discussed and what the patient understood. An AI summarising "risks were discussed" is not a substitute for a properly recorded conversation.

Negations. "No red flags" and "red flags absent" carry enormous weight. An AI dropping a "no" through a transcription error is one of the more dangerous failure modes I worry about.

Statements of uncertainty. "Possible," "likely," and "consistent with" are not interchangeable. A system that flattens uncertainty into confident language is producing a less honest record than the clinician intended.

Follow-up plans. Whether the patient is being seen again, when, by whom, and for what. These are obligations. A wrong follow-up plan is a failure of duty of care.

A real safety model treats these as protected facts. They are highlighted in the draft, require explicit clinician confirmation, and are flagged in the audit trail when changed. A slogan-level model treats them as ordinary text and trusts that the clinician will spot any errors during review. They will sometimes. Not always.
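
As a sketch of what "treated as protected" might mean in a data model (hypothetical names, in Python, assuming protected facts are extracted as spans from the draft), the key property is that approval is blocked until every protected fact has been explicitly confirmed:

    from dataclasses import dataclass, field
    from enum import Enum, auto

    class FactKind(Enum):
        DIAGNOSIS = auto()
        LATERALITY = auto()
        DATE = auto()
        MEDICATION = auto()
        ALLERGY = auto()
        OPERATION = auto()
        CONSENT_RISK = auto()
        NEGATION = auto()
        UNCERTAINTY = auto()
        FOLLOW_UP = auto()

    @dataclass
    class ProtectedFact:
        kind: FactKind
        text: str                # the span as drafted by the AI
        confirmed: bool = False  # flipped only by an explicit clinician action

    @dataclass
    class Draft:
        body: str
        protected: list[ProtectedFact] = field(default_factory=list)

    def ready_for_approval(draft: Draft) -> bool:
        # Approval is blocked while any protected fact is unconfirmed,
        # rather than trusting the clinician to spot it in the prose.
        return all(f.confirmed for f in draft.protected)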

What the clinician needs to be able to do

Reviewing is not the same as approving. A clinician reviewing AI output needs more than a "looks fine, sign here" interface.

They need to be able to correct, with the original AI draft preserved alongside the corrected version. This is essential both for quality improvement of the system and for the audit trail.

They need to be able to add. AI-drafted records are often missing the clinician's nuance: the look on a patient's face, the unspoken hesitation, the reason a particular option was set aside. Authorship-grade review means writing into the record, not just editing what the AI gave.

They need to be able to reject. If the draft is wrong enough, the clinician should be able to discard and start over without that being clinically painful or politically awkward inside their team.

They need to be able to see what was protected and confirmed, and what wasn't. This is important not just for safety but for trust over time. A clinician who can see exactly what the AI did and what they themselves added builds calibrated confidence in the tool. A clinician who can't, doesn't.

If review feels like rubber-stamping, the safety model has already failed. The clinician will rubber-stamp, because they are running late and the next patient is waiting.
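
One way to satisfy all four requirements is an append-only review log, in which the AI draft is immutable and every correction, addition, or rejection is recorded as a separate event. A minimal sketch, again with hypothetical names:

    from dataclasses import dataclass
    from datetime import datetime, timezone
    from typing import Literal

    Action = Literal["correct", "add", "reject"]

    @dataclass(frozen=True)
    class ReviewEvent:
        action: Action
        clinician: str
        when: datetime
        ai_text: str         # what the AI drafted (empty for an addition)
        clinician_text: str  # what the clinician wrote (empty for a rejection)

    def correct(log: list[ReviewEvent], clinician: str,
                ai_text: str, clinician_text: str) -> None:
        # The AI draft is never overwritten. Corrections are appended as
        # events, so the original and the correction both survive.
        log.append(ReviewEvent("correct", clinician,
                               datetime.now(timezone.utc),
                               ai_text, clinician_text))

Additions and rejections follow the same append-only pattern, which is what makes the audit trail described in the next section possible.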

What the system needs to remember

A real clinician-in-the-loop system keeps an honest record of three things.

What the AI proposed. What the clinician changed. What was finally approved.

This audit trail is the unglamorous backbone of clinical AI safety. Without it, errors disappear into the record and become indistinguishable from clinician statements. With it, errors can be traced, patterns can be identified, the system can be improved, and the clinician's accountability can be properly bounded. They own what they approved. The system can be assessed on what it drafted.

This is also the layer that medico-legal review will actually depend on. If a record written with AI assistance is questioned years later, the question will be: what did the clinician approve, and does the audit trail confirm that approval? A system that cannot answer that question precisely is a system that is exposing the clinician.
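
A sketch of the record such a system would keep, and the question it can then answer. This is not a real schema, just the shape the three-part trail implies:

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass(frozen=True)
    class AuditRecord:
        encounter_id: str
        ai_proposed: str              # stored verbatim, never edited in place
        clinician_changes: list[str]  # the ordered review events, as text
        final_approved: str
        approved_by: str
        approved_at: datetime

    def medico_legal_answer(record: AuditRecord) -> dict[str, str]:
        # Years later: what did the clinician approve, and does the
        # trail confirm it? A readable record answers both directly,
        # without the vendor as interpreter.
        return {
            "approved_text": record.final_approved,
            "approved_by": record.approved_by,
            "approved_at": record.approved_at.isoformat(),
            "differs_from_ai_draft": str(record.ai_proposed != record.final_approved),
        }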

Why this is a design problem, not a checkbox

A great deal of the variation between clinical AI tools comes down to whether clinician-in-the-loop has been designed in from the start or bolted on at the end.

Designed-in looks like:

  • protected facts treated specially throughout the data model
  • clinician review embedded as the gate, not a step
  • drafts and approvals stored separately and durably
  • workflow shaped around the way clinicians actually work: quickly, between patients, with limited tolerance for friction
  • audit trails that can be read by a clinician, an auditor, or a court without a vendor as interpreter

Bolted-on looks like:

  • a "review" screen at the end with no special handling of high-risk content
  • drafts and approvals collapsed into a single document
  • a signature button that performs no real validation
  • audit trails locked inside the vendor's system or absent altogether
  • workflows that punish careful review by costing the clinician more time, encouraging them to skim

Most clinicians can spot the difference within five minutes of using a tool. The question is whether procurement processes and governance reviews can spot it before the tool is rolled out.

The medico-legal stance

There is a question I am asked privately more often than publicly: who is liable when an AI documentation tool is wrong?

The honest answer, in current practice, is that the clinician who approved the record is liable. That is true legally. It should also be true ethically, because the alternative, that responsibility lives with a vendor or a model, would mean clinicians cannot defend their own records.

What makes this defensible, however, is whether the clinician was given the conditions to actually review. If they were shown a clear draft, with high-risk content highlighted, and they confirmed and approved knowingly, the record is theirs. If they were handed a wall of generated text under time pressure, with the protected facts given no special handling, the system has set them up to fail.

Clinician-in-the-loop, designed properly, is partly a system for protecting clinicians from the risks of the AI they are using. That is a reasonable thing to expect from a tool that wants a place in real practice.

What I want clinicians to ask

If you are evaluating a clinical AI tool, here are the questions I would ask before signing anything:

  • Does the tool treat certain facts, including diagnoses, laterality, medications, allergies, operative side and follow-up, as protected and require explicit clinician confirmation?
  • Can I see the AI's draft and my changes side by side, and can I export both?
  • Is there an audit trail that records what was generated, what I changed, and what I approved?
  • Does the workflow allow me to review properly without making me significantly slower than I am now?
  • If the AI is uncertain, does it tell me, or does it produce confident text either way?
  • Whose name appears on the record when it leaves the encounter, and who is therefore accountable?

These are not unreasonable questions. They are the questions a hospital governance team should be asking on behalf of every clinician using the tool.

Closing

Clinician-in-the-loop is one of those phrases that has to earn its meaning every time it is used. It is not a feature you can claim. It is a safety model you have to design.

The tools that take it seriously will look different from the tools that don't. They will be slower to build, more careful with high-risk content, and more honest about what they don't know. They will also be the tools that survive contact with real clinical practice without producing avoidable harm.

That is the standard I am holding Regenemm Voice to. Not because it is fashionable, but because the alternative is a generation of clinical AI tools that quietly transfer risk onto clinicians and call it a partnership.

Related Regenemm workflow

If you are responsible for clinical safety, governance, or AI procurement in a healthcare setting, Regenemm Voice is being designed around clinician-reviewed documentation as a core safety model, not a label.

Explore clinician-reviewed AI documentation


Brendan O'Brien is Founder of Regenemm Healthcare and a practising neurosurgeon.
