CTI Research Series

Why Clinical Documentation Is Not Just Transcription

Dr. Brendan O'Brien

There is a quiet assumption running through a lot of healthcare AI right now: that if you can transcribe a consultation accurately, you have largely solved clinical documentation.

You haven't.

You have solved one piece of one step of a much larger task.

I want to explain, plainly, the difference between transcription and clinical documentation, because the gap between them is where most current AI scribe tools quietly fail, and it is where I think the next generation of clinical software has to do better.

What transcription actually is

Transcription is turning speech into text. It is a real engineering achievement. Modern automatic speech recognition is genuinely impressive, especially with medical vocabularies, but the job it does is bounded.

A transcript gives you words in order. It tells you who spoke, more or less. It tells you when in the encounter something was said. If the system is good, it tells you accurately. If the system is excellent, it handles accents, medical terminology, overlapping speech and background noise.

Even when transcription is excellent, it gives you a record of the encounter as audio rendered into text. Nothing more.

Clinical documentation, by contrast, is something else entirely.

What clinical documentation actually is

A clinical record is not a transcript. It is an interpretation of an encounter, structured for a purpose, written for an audience, and carrying a clinician's professional accountability behind it.

Several things have to happen in a clinical record that do not happen in a transcript.

The patient's account has to be separated from the clinician's interpretation. When a patient says "my back has been killing me for years," that is history. When I write "chronic mechanical lower back pain, gradually worsening over several years, with no red flags identified on history," that is a clinical impression. Those are different layers of information, and conflating them is dangerous: it strips the clinician's reasoning out of the record.
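The layering can be made concrete. Here is a minimal sketch, assuming a hypothetical two-layer record; none of these class or field names come from any real system:

```python
from dataclasses import dataclass

# Hypothetical illustration: keep the patient's own words and the
# clinician's interpretation as separate, clearly labelled layers,
# rather than merging them into one block of prose.

@dataclass(frozen=True)
class HistoryItem:
    """Something the patient reported, in (near-)verbatim form."""
    reported: str

@dataclass(frozen=True)
class Impression:
    """The clinician's interpretation, with its provenance."""
    conclusion: str          # e.g. "chronic mechanical lower back pain"
    based_on: list           # the HistoryItem(s) the conclusion rests on

history = HistoryItem(reported="my back has been killing me for years")
impression = Impression(
    conclusion="chronic mechanical lower back pain, no red flags on history",
    based_on=[history],
)

# The two layers stay distinct: a reviewer can always trace which
# reported facts a conclusion rests on.
assert impression.based_on[0] is history
```

The point of the sketch is only the separation: the reported fact and the concluded impression are different objects, and one carries a pointer to the other rather than absorbing it.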

Findings have to be structured, not narrative. A clinical record needs to capture the examination in a way another clinician can act on: power, reflexes, sensation, gait, range of motion. Not as prose, but as discrete observations.

Reasoning has to be recorded, not just conclusions. The decision to operate, to defer, to investigate further, to refer, to safety-net: each of these decisions has a logic behind it. That logic belongs in the record.

Outputs are plural, not singular. A single encounter generates a clinical note, often a referrer letter, often a patient summary, often an entry in a follow-up system, sometimes a procedural record, sometimes a medication change. They are not the same document.

A transcript can be the raw material for some of this. It cannot be the whole thing. The leap from transcript to clinical record is interpretive, and interpretation is what clinicians are for.

Where transcription-only tools fall short

If you give a clinician a transcript and call it documentation, three problems appear immediately, and I have seen all three in real practice.

The first problem is that the clinician still has to do the documentation. They now have a long verbatim text and the same blank note they had before. The cognitive load has not been reduced. It has been moved.

The second problem is that the structure has been lost. A consultation, when you transcribe it, sounds like a conversation because it is one. The history loops back, the examination is interrupted by questions, the plan emerges in fragments. None of that is the shape of a clinical record. The clinician has to reconstruct the structure from the prose.

The third problem is that the patient's words and the clinician's reasoning get mashed together. "Back killing me" is in the same paragraph as the impression and the plan. Without effort, the record loses the distinction between what was reported and what was concluded. That is exactly the distinction medico-legal review depends on.

A scribe-style AI tool that solves only transcription gives the clinician a faster typewriter, not a clinical documentation system. That is useful. It is also nowhere near enough.

What good clinical documentation contains

When I think about what a specialist consultation note should actually contain, the list is longer than people expect.

It contains a structured presenting complaint and history of presenting complaint, distinguishing patient-reported facts from clinician interpretation. It contains a relevant past medical and surgical history, focused for the consultation rather than copy-pasted. It contains medications and allergies, validated rather than inherited. It contains a focused examination written in clinical shorthand a colleague can scan. It contains imaging or investigation findings reviewed and interpreted, not merely summarised. It contains a clear clinical impression. It contains options considered and the recommended plan. It contains risks, alternatives and the patient's understanding. It contains follow-up actions with timing.

It contains safety-netting instructions where relevant.

That is one document, and it has to be useful at three or four different reading speeds: read by the patient's GP in 60 seconds, by a covering colleague in five minutes, by a medico-legal reviewer over an hour, and by the clinician themselves at the next visit.

Now do that for the patient summary, the referrer letter, and the action list.

Now do it twenty times a day.

This is the actual job. Transcription is a side door into the job.

Why specialists feel this most acutely

Generalist consultations are demanding, but specialist consultations have a particular structure that often makes the documentation problem more acute.

Specialist consultations integrate symptoms, examination, imaging, prior management, and a narrow but high-stakes decision space: operate or not, escalate or not, this medication at this dose or that one. The record has to support that decision and be re-readable by anyone who picks the patient up later, often in a different setting, sometimes years later.

Specialist consultations also produce more variable outputs. A spine consultation may need an operative plan for the patient, a letter to the referring GP, a letter to the radiologist, an instruction to the practice for follow-up imaging, and a summary the patient can show their family. Each of those needs different language for a different reader.

A transcription-only tool treats all five of these as the same problem. They aren't. They share a source, the encounter, but they require different framing, different vocabulary, and different signal-to-noise.

This is one of the reasons I keep coming back to a multi-output design. The encounter is one event. The documentation it generates is plural.

What I look for in a documentation tool

When I evaluate any AI documentation tool, and I have looked at most of them, I am not impressed by transcription accuracy alone. I assume that is a baseline. The questions I ask are different.

  • Does the tool produce more than one output from a single encounter, shaped for different readers?
  • Does it preserve the difference between patient-reported and clinician-concluded information?
  • Does it make clinician review feel like authorship, not box-ticking?
  • Does it allow correction without losing the original AI draft, so I can see what the system gave me and what I changed?
  • Does it produce records that are referenceable, exportable and durable, not stuck inside a vendor's ecosystem?
  • Does it support the structures clinicians actually use, such as SOAP-style notes, structured examination, operative notes and referral letters?
  • Does it know when to say "I am not sure"?
  • Does it adapt to the way my specialty actually works, or does it impose a generic note format that I have to fight every time?

An honest "uncertain" in a draft is more useful than a confident hallucination. The cost of fighting a tool's defaults, multiplied across a clinic day, can quickly exceed the cost of typing the note from scratch.

These are not abstract preferences. Each one of them is a hinge between a tool that helps me and a tool that gives me more work, and the gap between those two outcomes is what I think about every time I sit down to design Regenemm Voice.

What Regenemm Voice is being designed to do

Regenemm Voice is being built around the idea that the consultation is the source event and clinical documentation is a structured, plural, reviewable artefact.

The ambition is straightforward. From a single specialist consultation, the system should be able to produce a clinical note, a referrer letter, a patient summary and a follow-up action list. The clinician should review, correct and approve. The system should keep an audit trail of what it drafted and what was changed. The patient should be able to receive a clear summary in language they can understand. The GP should receive a letter written for a GP, not a generic dump.

That is a different design problem from "transcribe the room well." It assumes from the beginning that:

  • the clinician is the author of the record
  • the AI is a drafting tool, not an oracle
  • a single encounter has multiple legitimate outputs
  • every output has to be safe to send to the audience it is intended for

I am not interested in building a faster typewriter. I am interested in building something that actually reduces the documentation burden in a way that is safe, honest and durable.

A practical comparison

It is worth being concrete about the difference between a transcription-first tool and a documentation-first tool.

A transcription-first tool captures words, produces a single output, and asks the clinician to convert the transcript into a record. A documentation-first tool captures the encounter, produces multiple structured outputs, separates patient-reported information from clinician interpretation, lets the clinician review and approve, and keeps an honest audit trail.

A transcription-first tool can be wrong about a medication and the clinician may not notice until weeks later. A documentation-first tool flags the medication as a fact that needs explicit confirmation before it leaves the encounter.
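One way to picture that flagging behaviour is as a gate: a high-risk fact such as a medication cannot leave the encounter unconfirmed. A minimal, hypothetical sketch (no real tool's mechanism is being described):

```python
from dataclasses import dataclass

# Hypothetical illustration: a medication extracted from the encounter
# must be explicitly confirmed before the record can be finalised.

@dataclass
class Fact:
    kind: str                # e.g. "medication"
    value: str               # e.g. "methotrexate 10 mg weekly"
    confirmed: bool = False

def finalise(facts: list[Fact]) -> str:
    """Refuse to finalise while any medication fact is unconfirmed."""
    unconfirmed = [f for f in facts if f.kind == "medication" and not f.confirmed]
    if unconfirmed:
        names = ", ".join(f.value for f in unconfirmed)
        raise ValueError(f"unconfirmed medication(s): {names}")
    return "record finalised"

facts = [Fact(kind="medication", value="methotrexate 10 mg weekly")]

# First attempt is blocked: the clinician has not confirmed the medication.
try:
    finalise(facts)
    blocked = False
except ValueError:
    blocked = True

# After explicit confirmation, the record can be finalised.
facts[0].confirmed = True
result = finalise(facts)
```

The gate is deliberately crude; the point is only that confirmation is a blocking step in the workflow, not a note in the margin.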

A transcription-first tool produces a wall of text. A documentation-first tool produces a clinical note that another clinician can act on inside thirty seconds.

The difference is not subtle. It is the difference between a recording and a record.

Closing

If you are looking at AI clinical documentation tools, please do not stop at transcription accuracy. Ask what the tool produces, who it produces it for, who is named as the author when the record leaves your system, what happens when the AI is wrong, and what the system remembers about what it drafted versus what you approved.

Documentation is not transcription. Treating them as the same is how good clinicians end up with worse records than the ones they used to type by hand.

That is the problem we are working on. Properly. Slowly. With the seriousness it deserves.

Related Regenemm workflow

If you are evaluating AI clinical documentation tools for specialist practice, Regenemm Voice is being built for the way real specialist work is actually documented: structured, multi-output, clinician-authored and auditable.

See Regenemm Voice's approach to AI clinical documentation


Brendan O'Brien is Founder of Regenemm Healthcare and a practising neurosurgeon.
