What Neurosurgical Consultations Teach Us About Clinical AI
I want to say something up front: Regenemm Voice is not a neurosurgery tool. It is being built for specialist clinical practice across subtypes and disciplines, including family practice. Neurosurgery is one of many specialties it is designed to support.
But I have spent over two decades practising neurosurgery, and that practice has shaped how I think about clinical documentation. I have come to believe that neurosurgical consultations are an unusually demanding stress test for any clinical AI system. The features that make neurosurgery hard to document, including high-stakes decisions, dense imaging, integrated examination, irreversibility and medico-legal weight, are features that exist in milder form across most specialist practice.
If a documentation system can handle a neurosurgical consultation honestly and safely, the same architecture can serve the rest of medicine. If it cannot, it should at least be honest that it is not built for the specialist end of the spectrum.
This article is about what I have learnt from the way neurosurgical consultations document themselves, and what that teaches me about how to design clinical AI properly.
Are neurosurgical consultations unusually demanding?
Not uniquely. They are demanding, but demanding in ways that are typical of many multifaceted clinical problems.
A neurosurgical consultation is rarely simple. The patient has usually been seen by one or more clinicians before they get to me. They arrive with imaging from one or two centres, sometimes more. They have symptoms that are often difficult to localise without a careful history and examination. They are frequently in pain, frequently anxious, and frequently aware that the conversation could end with a recommendation for surgery on their spine or brain.
What a neurosurgical consultation is very good at is stress-testing and tuning an AI healthcare assistant.
In one consultation I am usually trying to do the following at the same time:
- Take a careful history that distinguishes structural from functional problems, neurological from musculoskeletal, urgent from chronic.
- Perform a focused neurological examination: power, tone, reflexes, sensation, coordination and gait, in a way that is reproducible and recordable.
- Review imaging, often more than one study, and form a view that integrates the imaging with the patient sitting in front of me, who may not match the imaging in either direction.
- Form a clinical impression.
- Decide whether surgery is appropriate, whether non-surgical management is appropriate, whether further investigation is needed, or whether the patient should be referred elsewhere.
- Have a frank conversation about risks. Real risks. Numerical where possible. Honest about the uncertainty.
- Confirm the patient's understanding and the agreed plan.
- Generate the documentation that has to flow from all of this: the clinical note, the letter, the patient summary, the operative plan if surgery is on the table, and the follow-up actions.
That is a lot of integrated work in one hour, and the cost of getting any of it wrong is high.
The features that can make neurosurgical documentation hard
Several features of neurosurgical practice push back hard on naive approaches to clinical AI.
Imaging is central. A neurosurgical consultation is half clinical encounter, half imaging review. The record has to integrate findings the AI has no direct access to: what I saw on the MRI, the discrepancy between the imaging and the patient's symptoms, the choice of one level of pathology over another based on examination. A transcript of the encounter does not capture imaging interpretation, and an AI that pretends it does is dangerous.
The examination is rich. Neurological examination findings are not narrative. They are structured observations: power graded by muscle group, reflexes graded by limb, sensory findings mapped to dermatomes, gait described against specific tests. A documentation system that flattens this into prose loses the structure that other clinicians depend on to make sense of the record.
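To make the contrast concrete, here is a minimal, hypothetical sketch of what "structured, not prose" could mean for a neurological examination. The class names, fields and example values are mine, invented for illustration, not taken from any real product:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: power graded per muscle group (MRC 0-5 scale),
# reflexes per limb, sensory findings per dermatome, kept as structured
# data rather than flattened into a narrative paragraph.
@dataclass
class MotorFinding:
    muscle_group: str   # e.g. "right ankle dorsiflexion"
    mrc_grade: int      # Medical Research Council scale, 0-5

@dataclass
class NeuroExam:
    motor: list[MotorFinding] = field(default_factory=list)
    reflexes: dict[str, str] = field(default_factory=dict)  # limb reflex -> grade
    sensory: dict[str, str] = field(default_factory=dict)   # dermatome -> finding

exam = NeuroExam(
    motor=[MotorFinding("right ankle dorsiflexion", 4)],
    reflexes={"right ankle jerk": "diminished"},
    sensory={"L5": "reduced light touch"},
)
# Structured findings stay machine-scannable; prose can always be rendered
# from them, but recovering structure from a narrative paragraph is lossy.
```

A colleague scanning this record can find the dorsiflexion grade in one glance; recovering it from a paragraph of prose requires reading the whole thing.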
Decisions are often irreversible. A spinal fusion is not a reversible event. A craniotomy is not a reversible event. The record of the discussion that led to the decision has to be unambiguous about what was discussed, what was understood, and what was agreed. There is little tolerance for sloppy documentation in this space.
Laterality is constant. Almost every neurosurgical record involves a side. Left or right. The cost of getting laterality wrong is well known and well documented. A documentation system that does not protect laterality as a fact requiring explicit confirmation is not safe for neurosurgery.
Risks are specific and consequential. "Risks were discussed" is not adequate documentation in this setting. The record needs to reflect that the risks of paralysis, infection, dural tear, recurrence, failure to relieve symptoms, and the alternatives, including doing nothing, were each considered and discussed. The patient's understanding of these risks is the foundation of consent.
Letters matter. A neurosurgical letter to a referring GP or to a colleague is not a courtesy. It is the way the patient's care is coordinated for months or years afterwards. A weak letter quietly degrades the patient's care across the entire downstream system.
These features are not unique to neurosurgery. They appear in cardiothoracic surgery, oncology, vascular surgery, obstetrics and any specialty where decisions are weighty and the documentation has to support those decisions years after they were made. Neurosurgery is just where they cluster most densely.
Where naive clinical AI fails in this setting
I have looked at most of the AI documentation tools currently on the market. Most of them, when I imagine using them for a real spine consultation, fail in similar ways.
They produce one output when I need several. The same encounter has to generate a clinical note, a patient summary, a referrer letter and sometimes an operative plan. A single generic note doesn't cover the work.
They flatten structure into prose. The neurological examination becomes a paragraph of narrative rather than a set of structured findings. Other clinicians cannot scan it.
They mash patient-reported information together with my interpretation. "Worsening leg pain" sits in the same paragraph as "L4-5 disc protrusion with right L5 nerve root compression," with no distinction between what the patient said and what I concluded.
They are confidently wrong about laterality and other protected facts. I have seen drafts produce a perfect-looking note with the wrong side, and I have seen drafts confidently invent a follow-up plan that was never agreed.
They do not separate "risks discussed" from "risks understood by the patient and agreed to," and they do not record uncertainty honestly when uncertainty was part of the conversation.
They produce no useful patient summary. The output that would actually help my patients, a clear, plain-language description of the diagnosis, the plan and the safety net, is the output most often missing.
They keep no honest audit trail. I cannot see what the AI gave me and what I changed. The record I approve becomes mine, in full, with no trace of what was machine-generated.
This is not a minor list of complaints. It is a set of structural mismatches between how generic AI scribes are designed and what specialist documentation actually requires.
What specialist-grade AI documentation needs to do
Neurosurgical practice has helped me articulate a clearer version of what I think specialist clinical AI needs to do.
It needs to treat the consultation as a source event with multiple legitimate outputs. One encounter, several documents, each shaped for its actual reader.
It needs to preserve structure in clinical findings. Examinations, imaging summaries, medication lists and operative plans should be structured data, not prose.
It needs to separate patient-reported information from clinician interpretation in a way the record makes visible. This is not just a UX preference. It is the foundation of safe, defensible documentation.
It needs to protect high-stakes facts. Laterality, diagnosis, medication, allergy, operation name, follow-up, negation and uncertainty are facts the system should refuse to commit to the record without explicit clinician confirmation.
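One way to picture "refuse to commit without confirmation" is a fact type that carries an explicit sign-off flag. This is a minimal, hypothetical sketch; the names (`ProtectedFact`, `commit_to_record`) and the workflow are assumptions of mine, not a description of how any shipping system works:

```python
from dataclasses import dataclass

# Hypothetical sketch: a protected fact carries its value plus an explicit
# confirmation flag; committing to the record fails while any flag is unset.
@dataclass
class ProtectedFact:
    name: str           # e.g. "laterality"
    value: str          # e.g. "left"
    confirmed: bool = False

class UnconfirmedFactError(Exception):
    pass

def commit_to_record(facts: list[ProtectedFact]) -> dict[str, str]:
    """Return committed facts, raising if any protected fact is unconfirmed."""
    unconfirmed = [f.name for f in facts if not f.confirmed]
    if unconfirmed:
        raise UnconfirmedFactError(f"awaiting clinician confirmation: {unconfirmed}")
    return {f.name: f.value for f in facts}

facts = [
    ProtectedFact("laterality", "left"),
    ProtectedFact("operation", "L4/5 decompression"),
]
try:
    commit_to_record(facts)   # raises: nothing has been confirmed yet
except UnconfirmedFactError:
    pass
for f in facts:
    f.confirmed = True        # explicit clinician sign-off, fact by fact
record = commit_to_record(facts)  # now succeeds
```

The design point is that the unsafe path is not merely discouraged; it is structurally impossible to take silently.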
It needs to support a real review workflow. The clinician must be able to see drafts, correct them, add to them and reject them, with the original AI version preserved alongside the approved version.
It needs to produce a usable patient summary, not as an afterthought but as a peer output of the clinical note.
It needs to keep an honest audit trail of what was generated, what was changed and what was approved.
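The review workflow and audit trail above can be sketched together as an append-only event log in which the machine draft is never overwritten. Again, this is a hypothetical illustration under my own assumptions, not an account of any real implementation:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch: every version of the document is an immutable event;
# clinician edits and approval are recorded alongside the AI draft, never
# in place of it.
@dataclass(frozen=True)
class AuditEvent:
    action: str   # "generated" | "edited" | "approved"
    text: str
    at: str       # ISO-8601 UTC timestamp

@dataclass
class DocumentRecord:
    events: list[AuditEvent] = field(default_factory=list)

    def _log(self, action: str, text: str) -> None:
        self.events.append(
            AuditEvent(action, text, datetime.now(timezone.utc).isoformat())
        )

    def generate(self, draft: str) -> None:
        self._log("generated", draft)      # the machine draft, preserved

    def edit(self, revised: str) -> None:
        self._log("edited", revised)       # clinician correction or addition

    def approve(self) -> str:
        final = self.events[-1].text
        self._log("approved", final)       # what the clinician signed off
        return final

doc = DocumentRecord()
doc.generate("Plan: left L4/5 decompression.")
doc.edit("Plan: LEFT L4/5 decompression; risks of dural tear and infection discussed.")
approved = doc.approve()
# The AI original and the approved version both remain visible in doc.events.
```

Because events are frozen and only appended, "what the AI gave me" and "what I changed" can always be reconstructed, which is exactly what a defensible record requires.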
It needs to be designed for the realities of specialist clinics: high cognitive load, limited time per patient, low tolerance for friction and high consequences for error.
These are demanding criteria. They are also, in my view, the right criteria. A system that meets them in neurosurgery will meet them in most other specialist contexts. A system that does not meet them in neurosurgery is one that other specialties should also be cautious about.
What this teaches us about clinical AI more broadly
The temptation, in healthcare AI, is to start with the easy specialties and the easy encounters and worry about the hard ones later. I think this gets the design problem the wrong way around.
If you start with high-stakes specialist consultations as your design constraint, you build in the discipline that lower-stakes encounters also benefit from. Structured findings. Protected facts. Clinician-authored records. Multiple outputs. Honest audit trails. None of these become less useful in a primary care or outpatient setting; they just become less obviously life-and-death.
If you start with the easiest encounters, you optimise for "good enough for most" and then have to retrofit the safety architecture later. That is the path that produced most of today's clinical AI scribes, and it is why many of them feel under-engineered for serious specialist use.
This is one of the reasons I keep saying that Regenemm Voice is not a neurosurgery tool. It is a tool whose design choices are stress-tested by neurosurgery, with the intention of generalising properly to the rest of specialist practice. The complexity of neurosurgical consultations is not a niche use case. It is a useful design constraint.
Closing
If you are a clinician working in a specialist field (neurosurgery, orthopaedics, cardiology, oncology, surgery of any sort) and you are looking at AI documentation tools, the question I would ask is not "is this tool tuned for my specialty?" Most of them aren't, and tuning is the easy part anyway.
The question I would ask is: is this tool built around the things that specialist documentation has to get right: structure, protected facts, clinician authorship, multiple outputs and an honest audit trail?
If the answer is no, the tool is not safe for your practice, regardless of how good the transcription sounds.
If the answer is yes, the tool is at least starting from a design that respects what you actually do.
Neurosurgery taught me what those criteria are. The longer I practise, the more convinced I become that those criteria belong to specialist clinical AI generally, not just to neurosurgery. That is the bar I am building Regenemm Voice to meet.
Related Regenemm workflow
If you practise in a specialist field where documentation has to get the imaging, the examination, the laterality and the plan right every time, Regenemm Voice is being designed for exactly that level of clinical seriousness, with neurosurgery as one of its early proving grounds.
See specialist documentation workflows in Regenemm Voice
Brendan O'Brien is Founder of Regenemm Healthcare and a practising neurosurgeon.