🧩 Bridging Human Intelligence and AI Power: How Clinical Reasoning Frameworks Are Informing Medical AI
Executive Summary
The next wave of clinical AI won’t just answer; it will reason with us. By aligning models to the structured frameworks clinicians already use (e.g., NILDOOCARP, NIHSS, stroke pathways) and instrumenting them with mechanistic interpretability (monosemantic features, circuit tracing, and feature attribution, all introduced in Paper No. 1), we can turn AI from a black box into a transparent clinical partner. This paper shows how to encode clinical frameworks into the AI contract, make explanations portable via FHIR, and run operations with guardrails, telemetry, and change control, illustrated through an acute ischemic stroke (large‑vessel occlusion) scenario.
🎯 The Master Clinician Challenge
The problem: Pattern‑matching AI often performs impressively yet struggles to explain how it reached a recommendation, reducing trust and learnability.
The opportunity: Master clinicians don’t rely on vibes (or on vibe coding); they use codified frameworks that make complex decisions teachable and auditable. If we map model internals to these frameworks and expose structured reasoning, we get:
- Consistent triage across clinicians and shifts
- Faster decisions with documented rationale (and counterfactuals)
- Education at the point of care, not just answers
- Audit‑ready artifacts for quality and regulation
🧭 From Frameworks to Features: How We Align Model Reasoning
Clinician frameworks (examples):
- NILDOOCARP — Nature, Intensity, Location, Duration, Onset, Offset, Concomitant, Aggravating, Relieving, Precipitate (symptom characterization across systems)
  * SOCRATES - Site, Onset, Character, Radiation, Associated Symptoms, Timing, Exacerbating factors, Severity (widely used in ED and prehospital care)
  * OPQRST - Onset, Provocation, Quality, Region/Radiation, Severity, Time
- NIHSS (stroke severity) — standardized sub‑scores for language, motor, gaze, etc.
- Stroke LVO pathway — NCCT → CTA → CTP; IV thrombolysis ± mechanical thrombectomy
- Pain frameworks (SOCRATES / OPQRST)
Mechanistic interpretability pillars (model‑side), as introduced in Paper No. 1:
- Monosemanticity — push toward one feature ≈ one concept (e.g., “Left M1 occlusion present”, “Receptive aphasia”, “NIHSS ≥ 15”).
- Circuit analysis — approximate how features link up into a decision pathway (evidence → intermediate hypotheses → recommendation).
- Feature attribution — quantify which features drove the answer (normalized contribution indices, not raw probability).
Together, these let us anchor the model’s reasoning to the same scaffolds clinicians use, so everyone can see, critique, and improve the logic.
🧩 The Interpretability Stack (what we expect the AI to show)
📊 Real‑World Impact: Emergency Stroke Activation (LVO: large vessel occlusion)
Scenario (fictional for illustration): A 54‑year‑old woman with sudden‑onset slurred speech, receptive > expressive dysphasia, and right arm/leg weakness. Last known well (LKW) 10:12. ECG shows atrial fibrillation (previously undiagnosed). NCCT: no hemorrhage. CTA: thrombus at the left ICA terminus extending into proximal M1. CTP: core/penumbra mismatch. Candidate for IV thrombolysis and mechanical thrombectomy.
Traditional pain points
- Inconsistent capture of LKW, symptom detail, and NIHSS sub‑scores
- Variable imaging sequencing and notifications → door‑to‑groin delays
- Rationale buried in free‑text; hard to audit or teach from
Framework‑enhanced AI (example output)
- Onset/LKW: 10:12; onset‑to‑door 24 min → thrombolysis window
- Deficits: Right hemiparesis, receptive dysphasia, mild dysarthria → left hemispheric cortical signs
- NIHSS: 17 (language + motor + gaze)
- Etiology risk: Atrial fibrillation → cardioembolic likely
- Imaging: NCCT no ICH; CTA L ICA→M1 occlusion; CTP mismatch positive
- Eligibility: Glucose/INR/platelets acceptable; no exclusions surfaced
Attribution (normalized indices, −1..+1)
- CTA LVO (+0.40) · NIHSS severity (+0.25) · CTP mismatch (+0.20) · LKW window (+0.10) · Normal glucose (−0.05 mimic‑lowering)
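The indices above are consistent with a simple L1 normalization, in which each signed attribution score is divided by the sum of absolute scores so that the magnitudes sum to 1. The sketch below illustrates that idea only; the raw scores and the `normalize_contributions` helper are hypothetical, and the paper does not state which normalization the explain service actually uses.

```python
from typing import Dict


def normalize_contributions(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale signed attribution scores so their absolute values sum to 1."""
    total = sum(abs(v) for v in raw.values())
    if total == 0:
        # No factor carried any signal; return zeros rather than divide by zero.
        return {name: 0.0 for name in raw}
    return {name: round(v / total, 2) for name, v in raw.items()}


# Hypothetical raw attribution scores (arbitrary units) for the stroke vignette.
raw_scores = {
    "CTA LVO": 8.0,
    "NIHSS severity": 5.0,
    "CTP mismatch": 4.0,
    "LKW window": 2.0,
    "Normal glucose": -1.0,
}

indices = normalize_contributions(raw_scores)
```

Because the sign survives normalization, a factor that lowers the likelihood of the recommendation (here, normal glucose making a mimic less likely) stays negative, and the index remains distinct from any probability.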
Counterfactuals
- If CTA showed no LVO → thrombolysis only (if eligible); no thrombectomy
- If NIHSS < 6 (milder deficits) → reconsider benefit/risk for thrombectomy
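One way to make counterfactuals like these testable is to re-run the same decision rule with exactly one input changed. The sketch below does this with a toy rule that encodes only the two branches listed here; `StrokeFindings`, `recommend`, and the use of NIHSS < 6 as a hard cut-off are illustrative assumptions, not a clinical protocol.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class StrokeFindings:
    lvo_on_cta: bool
    nihss: int
    thrombolysis_eligible: bool


def recommend(f: StrokeFindings) -> List[str]:
    """Toy rule mirroring the two counterfactual branches in the text."""
    actions: List[str] = []
    if f.thrombolysis_eligible:
        actions.append("IV thrombolysis")
    if f.lvo_on_cta:
        if f.nihss < 6:
            actions.append("Thrombectomy: reconsider benefit/risk")
        else:
            actions.append("Thrombectomy activation")
    return actions


baseline = StrokeFindings(lvo_on_cta=True, nihss=17, thrombolysis_eligible=True)

# Each counterfactual changes exactly one input and re-runs the same rule.
no_lvo = recommend(replace(baseline, lvo_on_cta=False))
mild = recommend(replace(baseline, nihss=4))
```

Comparing `no_lvo` and `mild` against `recommend(baseline)` shows which single finding flips which part of the plan, which is the property a counterfactual harness would verify.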
Result (target KPIs)
- Consistent stroke code activation
- Improved door‑to‑CT, door‑to‑needle, door‑to‑groin times
- Audit‑ready explanations for quality, training, and review
🧬 Multi‑Modal Integration (what’s fused)
- History & vitals: LKW, risk factors, anticoag status, BP/glucose
- Exam: NIHSS sub‑scores (Observation resources)
- Imaging: NCCT/CTA/CTP findings (ImagingStudy references)
- Labs: INR, platelets, glucose (LOINC‑coded)
- Policy: Site‑specific inclusion/exclusion checks
All evidence is linked in a FHIR Composition for portability and auditing.
🛠 How Framework Mapping Actually Works
🔬 Circuit tracing (conceptual)
Input: "54F, sudden dysarthria, receptive>expressive aphasia, R hemiparesis; LKW 10:12; ECG AF; NCCT -ICH; CTA L ICA→M1; CTP mismatch"
1) Deficit recognition → features: aphasia, R hemiparesis
2) Syndrome & lateralization → left MCA cortical syndrome
3) Etiology prior → AF cardioembolic risk ↑
4) Imaging sequence → NCCT→CTA→CTP orchestration
5) LVO confirmation → L ICA→M1 occlusion; mismatch present
6) Eligibility check → no hemorrhage; labs within thresholds
7) Action → IV thrombolysis (if eligible) + thrombectomy activation
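The trace above can be held as a small dependency graph so that the evidence behind any step is queryable. The following is a minimal sketch using Python's standard `graphlib`; the node names paraphrase the steps, and the edge set (including leaving the AF node unconnected to the action) is an assumption made for illustration rather than the model's actual circuit.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set

# Each key lists the nodes it depends on (its evidence or upstream hypotheses).
rationale: Dict[str, Set[str]] = {
    "Left MCA cortical syndrome": {"Receptive aphasia", "Right hemiparesis"},
    "Cardioembolic risk": {"AF on ECG"},
    "Imaging sequence": {"Left MCA cortical syndrome"},
    "LVO confirmed": {"Imaging sequence"},
    "Eligibility check": {"LVO confirmed"},
    "Action": {"Eligibility check"},
}

# A valid presentation order: every node appears after the nodes it depends on.
order = list(TopologicalSorter(rationale).static_order())


def upstream(node: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Collect every node that the given node transitively depends on."""
    seen: Set[str] = set()
    stack = list(graph.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen
```

Calling `upstream("Action", rationale)` returns the evidence and hypotheses a clinician would need to accept before accepting the recommendation, and it makes visible any finding that was recorded but never fed the decision.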
🧩 Three cornerstones in practice
- Monosemanticity: Target features that map cleanly to clinical concepts Examples: “L M1 occlusion present”, “Receptive aphasia”, “NIHSS ≥ 15”, “CTP mismatch positive”.
- Circuit analysis: Present the rationale DAG clinicians expect—evidence → hypotheses → recommendation—and make branches explicit.
- Feature attribution: Use normalized contribution indices with uncertainty bands; keep calibrated confidence separate from contributions.
📦 Contracts, Not Code (what the system guarantees)
Minimal API (excerpt)
paths:
  /api/explain:
    post:
      requestBody:
        content:
          application/json:
            # ExplainRequest schema is omitted from this excerpt
            schema: { $ref: '#/components/schemas/ExplainRequest' }
      responses:
        '200':
          description: Explanation for the submitted encounter
          content:
            application/json:
              schema: { $ref: '#/components/schemas/Explanation' }
components:
  schemas:
    Explanation:
      type: object
      properties:
        encounterId: { type: string }
        modelVersion: { type: string }
        explainAlgoVersion: { type: string }
        calibratedConfidence: { type: number, minimum: 0, maximum: 1 }
        contributions:
          type: array
          items:
            $ref: '#/components/schemas/Contribution'
        explanationRef: { type: string }  # FHIR Composition id/url
    Contribution:
      type: object
      properties:
        factor: { type: string }    # "L M1 occlusion (CTA)"
        modality: { type: string }  # text|image|lab|signal
        index: { type: number, minimum: -1, maximum: 1 }  # normalized
Why this matters: It gives clinicians and IT a stable contract. The heavy lifting (SAEs, circuit tests) can iterate behind this interface.
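A consumer of this contract can check a response before rendering it. The sketch below is a standard-library validator for the `Explanation` shape; it assumes every listed property is required and that the modality comment describes a closed set, neither of which the excerpt states, and the function and constant names are hypothetical.

```python
from typing import Any, Dict, List

# Assumption: the schema comment "text|image|lab|signal" is a closed set.
ALLOWED_MODALITIES = {"text", "image", "lab", "signal"}
STRING_FIELDS = ("encounterId", "modelVersion", "explainAlgoVersion", "explanationRef")


def _number_in(value: Any, low: float, high: float) -> bool:
    """True if value is a real number (not a bool) within [low, high]."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and low <= value <= high


def validate_explanation(payload: Dict[str, Any]) -> List[str]:
    """Return contract violations; an empty list means the payload conforms."""
    errors: List[str] = []
    for name in STRING_FIELDS:
        if not isinstance(payload.get(name), str):
            errors.append(f"{name}: expected string")
    if not _number_in(payload.get("calibratedConfidence"), 0, 1):
        errors.append("calibratedConfidence: expected number in [0, 1]")
    contributions = payload.get("contributions")
    if not isinstance(contributions, list):
        return errors + ["contributions: expected array"]
    for i, item in enumerate(contributions):
        if not isinstance(item, dict):
            errors.append(f"contributions[{i}]: expected object")
            continue
        if not isinstance(item.get("factor"), str):
            errors.append(f"contributions[{i}].factor: expected string")
        modality = item.get("modality")
        if not isinstance(modality, str) or modality not in ALLOWED_MODALITIES:
            errors.append(f"contributions[{i}].modality: unexpected value")
        if not _number_in(item.get("index"), -1, 1):
            errors.append(f"contributions[{i}].index: expected number in [-1, 1]")
    return errors


example = {
    "encounterId": "enc-001",  # hypothetical identifier
    "modelVersion": "regenemm-gpt5t-2025-08",
    "explainAlgoVersion": "1.4.0",
    "calibratedConfidence": 0.86,
    "contributions": [
        {"factor": "L M1 occlusion (CTA)", "modality": "image", "index": 0.40},
    ],
    "explanationRef": "Composition/ai-expl-stroke-abc",
}
```

Returning a list of violations rather than raising on the first one lets a dashboard or test harness report every contract breach in a single pass.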
▶️ End‑to‑End Flow (Stroke code to thrombectomy)
sequenceDiagram
participant ED as ED Triage
participant Explain as Explain Service
participant CT as CT/CTA/CTP
participant Neuro as Neuro & Stroke Team
participant IR as Neuro‑IR Suite
participant FHIR as FHIR Store
ED->>Explain: POST /api/explain (LKW, deficits, vitals, ECG)
Explain-->>ED: { contributions[], calibratedConfidence, explanationRef }
ED->>CT: Stroke code — NCCT→CTA→CTP
CT-->>Explain: Imaging findings (no ICH; L ICA→M1; mismatch+)
Explain-->>Neuro: Rationale graph + eligibility check (counterfactuals)
Neuro->>FHIR: Create Composition + Provenance + AuditEvent
Neuro->>ED: Start IV thrombolysis (if eligible)
Neuro->>IR: Activate thrombectomy pathway
IR-->>FHIR: Procedure events (groin puncture, passes, reperfusion time)
🔐 Observability & Auditability (what we log)
- Explanation coverage: % outputs with L0–L2 artifacts
- Faithfulness suite: sanity checks, deletion curves, counterfactual reliability
- Equity dashboards: AUROC/PPV/calibration by subgroup (sex, age, language)
- Operational SLIs: latency (P95 edge), adoption, override reasons, drift MTTR
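Two of these measures reduce to short calculations. The sketch below computes explanation coverage as a percentage and P95 latency by the nearest-rank method; the helper names, the boolean "has artifacts" flag, and the sample values are hypothetical, and a production dashboard may define the percentile differently.

```python
import math
from typing import Sequence


def explanation_coverage(has_artifacts: Sequence[bool]) -> float:
    """Percentage of outputs that shipped with explanation artifacts."""
    if not has_artifacts:
        return 0.0
    return 100.0 * sum(has_artifacts) / len(has_artifacts)


def percentile_nearest_rank(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample: four outputs, three with artifacts; latencies in milliseconds.
coverage = explanation_coverage([True, True, False, True])
p95_ms = percentile_nearest_rank([120, 80, 95], 95)
```

Nearest-rank always returns an observed latency rather than an interpolated one, which keeps the reported figure traceable to a real request.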
FHIR artifacts (trimmed)
{
"resourceType": "Composition",
"status": "final",
"type": { "coding": [{ "system": "http://loinc.org", "code": "60591-5" }] },
"subject": { "reference": "Patient/Stroke54F" },
"title": "AI Reasoning Summary — Acute LVO (L ICA→M1)",
"author": [{ "reference": "Device/regenemm-explain-svc" }],
"section": [
{ "title": "Evidence Anchors", "text": { "status": "generated",
"div": "<p>NIHSS=17; CTA: L ICA→M1; CTP mismatch; ECG AF; LKW 10:12</p>" } },
{ "title": "Top Contributing Factors", "text": { "status": "generated",
"div": "<p>CTA LVO (+0.40), NIHSS (+0.25), CTP mismatch (+0.20), LKW (+0.10)</p>" } },
{ "title": "Counterfactuals", "text": { "status": "generated",
"div": "<p>If CTA no LVO → thrombolysis only (if eligible); no thrombectomy.</p>" } }
]
}
Everything links to Provenance (model/explainer versions) and AuditEvent (who saw what, when), aligning to IHE BALP.
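A Provenance resource that points back at the Composition might be assembled as follows. This is a minimal sketch: `build_provenance` is a hypothetical helper, and carrying the model and explainer versions in the `urn:regenemm:ledger` extension mirrors the pattern used on the Composition in Appendix C rather than a confirmed design for Provenance.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_provenance(
    composition_ref: str,
    device_ref: str,
    model_version: str,
    explain_version: str,
    recorded: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build a minimal FHIR Provenance dict targeting an explanation Composition."""
    recorded = recorded or datetime.now(timezone.utc)
    return {
        "resourceType": "Provenance",
        "target": [{"reference": composition_ref}],
        "recorded": recorded.isoformat(timespec="seconds"),
        "agent": [{"who": {"reference": device_ref}}],
        # Assumption: versions travel in the same ledger extension as Appendix C.
        "extension": [{
            "url": "urn:regenemm:ledger",
            "valueString": f"model={model_version};explain={explain_version}",
        }],
    }


provenance = build_provenance(
    "Composition/ai-expl-stroke-abc",
    "Device/regenemm-explain-svc",
    "regenemm-gpt5t-2025-08",
    "1.4.0",
    recorded=datetime(2025, 8, 23, 10, 22, tzinfo=timezone.utc),
)
```

Passing `recorded` explicitly keeps the function deterministic in tests; in live use it defaults to the current UTC time.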
⚙️ Guardrails (fail‑safes you can trust)
- Input sanity (schema, unit bounds, mandatory LKW/NIHSS fields)
- Policy safety (no autonomous orders; explicit consent and role awareness)
- Clinical guardrails (contraindications, anticoag flags → DetectedIssue)
- Grounding & RAG (citations budget; de‑dup; jurisdiction tags AU/EU/US)
- Uncertainty & escalation (defer below threshold; reversible suggestions)
- Post‑market monitoring (drift alerts; rehearse rollback; PCCP change logs)
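Two of these guardrails, input sanity and uncertainty-based deferral, can be sketched as a single gate in front of the suggestion. The field names, the glucose plausibility bounds, and the 0.70 deferral threshold below are hypothetical placeholders a site would set for itself; only the NIHSS range of 0 to 42 comes from the scale's definition.

```python
from typing import Any, Dict, List

NIHSS_RANGE = (0, 42)                # total score range of the NIHSS
GLUCOSE_MMOL_L_RANGE = (1.0, 50.0)   # hypothetical plausibility bounds, not clinical thresholds
DEFER_BELOW = 0.70                   # hypothetical calibrated-confidence threshold


def input_problems(ctx: Dict[str, Any]) -> List[str]:
    """List schema and unit-bound problems in the patient context."""
    problems: List[str] = []
    for name in ("lkw", "nihss"):
        if ctx.get(name) is None:
            problems.append(f"missing mandatory field: {name}")
    nihss = ctx.get("nihss")
    if nihss is not None and not NIHSS_RANGE[0] <= nihss <= NIHSS_RANGE[1]:
        problems.append("nihss out of range")
    glucose = ctx.get("glucose_mmol_l")
    if glucose is not None and not GLUCOSE_MMOL_L_RANGE[0] <= glucose <= GLUCOSE_MMOL_L_RANGE[1]:
        problems.append("glucose_mmol_l outside plausibility bounds")
    return problems


def gate(ctx: Dict[str, Any], calibrated_confidence: float) -> Dict[str, Any]:
    """Reject bad input, defer on low confidence, otherwise allow a suggestion."""
    problems = input_problems(ctx)
    if problems:
        return {"status": "rejected", "reasons": problems}
    if calibrated_confidence < DEFER_BELOW:
        return {"status": "deferred", "reasons": ["confidence below threshold"]}
    return {"status": "suggestion", "reasons": []}


vignette = {"lkw": "10:12", "nihss": 17, "glucose_mmol_l": 6.1}  # glucose value is illustrative
```

The gate only ever returns a suggestion status, never an order, which keeps it consistent with the "no autonomous orders" policy above.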
🛣 Implementation Roadmap (90 days to pilot)
Weeks 1–2 — Charter & data fitness
- Select 2 high‑stakes flows (e.g., Stroke LVO; Sepsis EWS)
- Define halt conditions (coverage/faithfulness/equity thresholds)
- Build Patient Context Pack capture (NILDOOCARP/NIHSS; LKW; labs)
Weeks 3–5 — Explain pipeline
- /api/explain live; SAE/probe training on open model (feature lexicon)
- FHIR mappers (Composition/Provenance/AuditEvent)
- Faithfulness suite + counterfactual harness
Weeks 6–7 — Shadow mode
- Run explanations with no action; measure lead‑time deltas; equity baselines
Weeks 8–10 — Controlled activation
- Limited live use in one unit; weekly safety huddle; dashboards online
Weeks 11–13 — PCCP & scale
- Canary updates; rollback drill; exec report; scale decision
🎓 Master Teacher Mode (education at point of care)
- Socratic prompts fill missing NILDOOCARP/NIHSS slots
- Rationale graph visible in teaching view; branches must be defended
- Counterfactual drills (“What would change your plan?”)
- Progress tracking (agreement with attending rationale, not just answers)
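The first item, prompting for missing framework slots, amounts to a set difference between the framework's slots and what has been captured. The sketch below shows that for NILDOOCARP; the prompt wording and helper names are hypothetical, and a real teaching view would presumably phrase each question per slot.

```python
from typing import Any, Dict, List

NILDOOCARP_SLOTS = [
    "nature", "intensity", "location", "duration", "onset",
    "offset", "concomitant", "aggravating", "relieving", "precipitate",
]


def missing_slots(entry: Dict[str, Any]) -> List[str]:
    """Slots that are absent or empty, in framework order."""
    return [slot for slot in NILDOOCARP_SLOTS if not entry.get(slot)]


def socratic_prompts(entry: Dict[str, Any]) -> List[str]:
    """One open question per missing slot, so the learner supplies the answer."""
    return [f"What have you established about '{slot}'?" for slot in missing_slots(entry)]


# Partial history from the stroke vignette; the values are illustrative.
partial = {
    "nature": "slurred speech and right-sided weakness",
    "onset": "sudden, LKW 10:12",
    "location": "right arm and leg",
    "concomitant": [],
}

prompts = socratic_prompts(partial)
```

An empty list counts as missing here, so a learner who has not yet asked about concomitant symptoms is prompted rather than assumed to have found none.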
✅ Checklists (clip into your SOP)
Clinical service lead
- Pick pathways, targets, and halt conditions
- Approve guardrail policies and consent flow
- Convene weekly Discordance Board
Model/Platform steward
- Publish OpenAPI; set FHIR mappers live
- Dashboards: coverage, faithfulness, equity, latency
- PCCP cadence (canary → promote/rollback)
Compliance & security
- AU Core FHIR profile validation
- Provenance + AuditEvent active; PHI redaction verified
- Data retention & access reviews
📚 Resources (selected)
- Mechanistic interpretability (sparse autoencoders, feature discovery)
- Clinical AI evaluation guidelines (SPIRIT‑AI / CONSORT‑AI / DECIDE‑AI)
- FHIR Composition, Provenance, AuditEvent, DetectedIssue (auditability)
- NIST AI Risk Management Framework (governance & risk terminology)
Full bibliography and annotated links are provided in Paper No. 1. This paper focuses on framework alignment and operations.
🌟 The Bottom Line
Framework‑guided, guardrailed, interpretable AI turns opaque automation into a transparent clinical partner. When reasoning is aligned to clinician scaffolds, portable via FHIR, and measured through faithful tests, we gain speed, safety, education, and trust—without ceding human authority.
Companion repo available on written request (OpenAPI spec, mappers, test vignettes, and dashboard scaffolds).
Appendices (for engineers & architects)
Note: Keep the main paper code‑light. These appendices provide ready‑to‑use patterns. The production repo (with tests) is available on written request.
Appendix A — iOS (Swift): Capture NILDOOCARP → FHIR Observation
import Foundation

// One NILDOOCARP symptom characterization captured at the point of care.
struct NildoocarpEntry: Codable {
    var nature: String, intensity: String, location: String, duration: String
    var onset: String, offset: String?
    var concomitant: [String], aggravating: [String], relieving: [String]
    var precipitate: String?
    var recordedAt: Date, authorPractitionerId: String
}

// Serializes an entry as a FHIR Observation with one component per answered slot.
func toFhirObservation(_ entry: NildoocarpEntry, patientId: String) throws -> Data {
    let slots: [(label: String, value: String?)] = [
        ("Nature", entry.nature),
        ("Intensity", entry.intensity),
        ("Location", entry.location),
        ("Duration", entry.duration),
        ("Onset", entry.onset),
        ("Offset", entry.offset),
        ("Concomitant", entry.concomitant.joined(separator: "; ")),
        ("Aggravating", entry.aggravating.joined(separator: "; ")),
        ("Relieving", entry.relieving.joined(separator: "; ")),
        ("Precipitate", entry.precipitate)
    ]
    // Skip unanswered slots: FHIR string values must not be empty.
    let components: [[String: Any]] = slots.compactMap { slot -> [String: Any]? in
        guard let value = slot.value, !value.isEmpty else { return nil }
        return ["code": ["text": slot.label], "valueString": value]
    }
    let obs: [String: Any] = [
        "resourceType": "Observation",
        "status": "final",
        "code": ["text": "Symptom characterization (NILDOOCARP)"],
        "subject": ["reference": "Patient/\(patientId)"],
        "effectiveDateTime": ISO8601DateFormatter().string(from: entry.recordedAt),
        "component": components
    ]
    return try JSONSerialization.data(withJSONObject: obs, options: [.prettyPrinted])
}
Appendix B — Python (FastAPI): Guardrailed /api/explain
Skeleton
from datetime import datetime, timezone
from typing import List
import uuid

from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="Regenemm Explain")


class Contribution(BaseModel):
    factor: str
    modality: str  # text|image|lab|signal
    index: float = Field(..., ge=-1.0, le=1.0)  # normalized contribution index


class Explanation(BaseModel):
    encounterId: str
    modelVersion: str
    explainAlgoVersion: str
    calibratedConfidence: float = Field(..., ge=0.0, le=1.0)
    contributions: List[Contribution]
    explanationRef: str
    createdAt: datetime


@app.post("/api/explain", response_model=Explanation)
def explain(_: dict) -> Explanation:
    # Stub: ignores the request body and returns fixed values for the stroke vignette.
    feats = [
        Contribution(factor="CTA: L ICA→M1 occlusion", modality="image", index=0.40),
        Contribution(factor="NIHSS severity (17)", modality="text", index=0.25),
        Contribution(factor="CTP: mismatch positive", modality="image", index=0.20),
        Contribution(factor="LKW within window", modality="text", index=0.10),
        Contribution(factor="Glucose normal", modality="lab", index=-0.05),
    ]
    return Explanation(
        encounterId=str(uuid.uuid4()),
        modelVersion="regenemm-gpt5t-2025-08",
        explainAlgoVersion="1.4.0",
        calibratedConfidence=0.86,  # post-hoc calibrated (e.g., isotonic)
        contributions=feats,
        explanationRef="Composition/ai-expl-stroke-abc",
        createdAt=datetime.now(timezone.utc),  # timezone-aware; utcnow() is deprecated
    )
Production: Replace the stub with your attribution aggregator + faithfulness suite; write Composition/Provenance/AuditEvent upon success.
Appendix C — FHIR Composition (stroke LVO, trimmed)
{
"resourceType": "Composition",
"status": "final",
"type": { "coding": [{ "system": "http://loinc.org", "code": "60591-5" }] },
"title": "AI Reasoning Summary — Acute LVO (L ICA→M1)",
"subject": { "reference": "Patient/Stroke54F" },
"author": [{ "reference": "Device/regenemm-explain-svc" }],
"date": "2025-08-23T10:22:00Z",
"extension": [{
"url": "urn:regenemm:ledger",
"valueString": "model=regenemm-gpt5t-2025-08;weights=sha256:…;explain=1.4.0;data=2025Q3"
}],
"section": [
{ "title": "Evidence Anchors",
"text": { "status": "generated",
"div": "<p>LKW 10:12; NIHSS=17; ECG AF; NCCT -ICH; CTA L ICA→M1; CTP mismatch+</p>" } },
{ "title": "Top Contributing Factors",
"text": { "status": "generated",
"div": "<p>CTA LVO (+0.40), NIHSS (+0.25), CTP mismatch (+0.20), LKW (+0.10)</p>" } },
{ "title": "Counterfactuals",
"text": { "status": "generated",
"div": "<p>If CTA no LVO → thrombolysis only (if eligible); thrombectomy not indicated.</p>" } },
{ "title": "Inference Policy",
"text": { "status": "generated",
"div": "<p>Human-in-the-loop; no autonomous orders; uncertainty required.</p>" } }
]
}
Appendix D — Additional Diagrams (Mermaid)
Stroke pathway rationale (DAG)
graph TD
A["Receptive>Expressive Dysphasia"] --> H1[Left MCA Cortical Syndrome]
B[Right Hemiparesis] --> H1
C[AF on ECG] --> E1[Cardioembolic risk ↑]
H1 --> I1[NCCT → CTA → CTP]
I1 --> I2[CTA: L ICA→M1]
I1 --> I3[CTP: Mismatch+]
I2 --> R1[Thrombectomy Activation]
I3 --> R1
H1 --> R0["IV Thrombolysis (if eligible)"]
Eight checkpoints (C0–C7)
flowchart LR
C0[C0 · Charter] --> C1[C1 · Data Fitness]
C1 --> C2[C2 · Analytical Validity]
C2 --> C3[C3 · Faithfulness]
C3 --> C4["C4 · Clinical Utility (Shadow)"]
C4 --> C5[C5 · Controlled Activation]
C5 --> C6[C6 · Post‑Market Monitoring]
C6 --> C7[C7 · PCCP / Change Control]
Appendix E — Stroke Metrics (definitions)
- Door‑to‑CT: arrival → first NCCT slice
- Door‑to‑Needle: arrival → start of IV thrombolysis
- Door‑to‑Groin: arrival → arterial puncture for thrombectomy
- First‑pass reperfusion: time to first successful pass (eTICI)
- Explanation coverage: % stroke decisions with L0–L2 artifacts
- Counterfactual reliability: % counterfactual claims verified in sandbox tests
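The interval metrics above are differences between arrival and a later event timestamp. A minimal sketch follows; the event keys and `stroke_intervals` helper are hypothetical, the arrival time is derived from the vignette (LKW 10:12 plus 24 minutes onset-to-door), and the imaging, needle, and puncture times are invented purely to exercise the calculation.

```python
from datetime import datetime
from typing import Dict

# Metric name -> key of the event that ends the interval (both hypothetical names).
INTERVAL_ENDPOINTS = {
    "door_to_ct": "first_ncct_slice",
    "door_to_needle": "thrombolysis_start",
    "door_to_groin": "arterial_puncture",
}


def minutes_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 60


def stroke_intervals(events: Dict[str, datetime]) -> Dict[str, float]:
    """Compute each door-to-X interval in minutes for the events that were recorded."""
    arrival = events["arrival"]
    return {
        metric: minutes_between(arrival, events[key])
        for metric, key in INTERVAL_ENDPOINTS.items()
        if key in events
    }


day = dict(year=2025, month=8, day=23)
events = {
    "arrival": datetime(hour=10, minute=36, **day),             # LKW 10:12 + 24 min
    "first_ncct_slice": datetime(hour=10, minute=48, **day),    # illustrative
    "thrombolysis_start": datetime(hour=11, minute=6, **day),   # illustrative
    "arterial_puncture": datetime(hour=11, minute=41, **day),   # illustrative
}

intervals = stroke_intervals(events)
```

Events that were never recorded are simply absent from the result, so a missing door-to-groin value is distinguishable from a zero-minute one.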
Regenemm Healthcare, White Paper No. 2 (August 2025). This document builds on Paper No. 1 and provides a practical playbook for aligning AI to clinical frameworks with portable, faithful explanations.
version: "1.0.0"
date: "August 2025"
Authors:
- "Dr Brendan O'Brien from CTI / Regenemm Healthcare"
- "GPT‑5 Thinking (perspective contributor), Claude Sonnet 4"
tags: ["Clinical Reasoning", "Interpretability", "Mechanistic Interpretability", "FHIR", "Healthcare AI", "LLMOps", "Education"]
notes: Fictional clinical cases for design illustration only; not medical advice. Explanatory code lives in the Appendices; a companion repo is available on written request.