Neurology and Neurological Surgery
Stroke AI saves lives. Algorithms that detect large vessel occlusions cut door-to-groin time by 30-50 minutes, improving functional outcomes in randomized trials. But psychiatric AI has failed spectacularly: suicide prediction algorithms flagged thousands at high risk with positive predictive values below 1%, creating alert fatigue and false reassurance. Neurology AI excels at time-sensitive imaging interpretation but struggles with diagnosis, prognosis, and anything involving the complexity of human behavior.
After reading this chapter, you will be able to:
- Evaluate AI systems for acute stroke detection and triage, including large vessel occlusion (LVO) and intracranial hemorrhage (ICH) algorithms
- Critically assess AI applications in neuroimaging, including brain MRI lesion detection, tumor segmentation, and neurodegenerative imaging
- Understand automated seizure detection systems and their role in epilepsy management
- Analyze AI tools for neurodegenerative disease diagnosis and progression monitoring (Parkinson’s, ALS, Alzheimer’s)
- Recognize the profound limitations of psychiatric AI, including failed suicide prediction algorithms
- Apply evidence-based frameworks for evaluating neurology AI before clinical adoption
- Navigate the ethical complexities unique to neurological and psychiatric AI applications
Introduction: The Promise and Peril of Neurological AI
The human brain is the most complex structure in the known universe. Three pounds of tissue containing 86 billion neurons, each forming thousands of synaptic connections, generating thought, emotion, movement, memory, and consciousness itself.
Neurology and psychiatry attempt to diagnose and treat disorders of this incomprehensibly complex organ using a combination of:
- Imaging (CT, MRI, PET) that shows structure but not function
- Physiologic testing (EEG, EMG, evoked potentials) that measures electrical activity
- Clinical examination (mental status, cranial nerves, motor, sensory, coordination, gait)
- Patient-reported symptoms (headache character, mood symptoms, cognitive complaints)
This combination of objective data and subjective assessment creates both remarkable opportunities and profound limitations for AI in neurology.
Where neurology AI succeeds: Acute stroke imaging analysis, where large vessel occlusions create clear imaging signatures and every minute saved prevents brain damage.
Where neurology AI struggles: Diagnosing Parkinson’s disease, where the exam requires detecting subtle bradykinesia, rigidity, and postural instability that vary moment-to-moment.
Where psychiatric AI fails catastrophically: Predicting suicide risk, where algorithms trained on thousands of patients achieve impressive AUCs while generating >99% false positives that cause alert fatigue and dangerous false reassurance.
This chapter examines what works, what doesn’t, and why neurological AI demands even more clinical skepticism than other specialties.
Part 1: Stroke AI, Medicine’s Clearest AI Success Story
Large Vessel Occlusion Detection: When Minutes Matter
Acute ischemic stroke from large vessel occlusion (LVO) is a neurological emergency. Each minute of vessel occlusion destroys 1.9 million neurons. Mechanical thrombectomy can restore blood flow and prevent devastating disability, but only if patients reach comprehensive stroke centers quickly.
The problem AI solves: Many stroke patients present to community hospitals without thrombectomy capability. Identifying LVO patients who need immediate transfer is time-critical but requires expert interpretation of CT angiography that may not be immediately available at 2 AM.
The AI solution: Automated LVO detection analyzes CT angiography in seconds, flags potential LVO cases, and directly alerts the interventional neuroradiology team at receiving hospitals, bypassing the usual radiology reading queue.
Viz.ai LVO Detection
FDA clearance: 2018 (first stroke AI cleared by FDA)
Performance:
- Sensitivity: 90-95% for M1/M2 occlusions
- Specificity: 85-90%
- Analysis time: <5 minutes from CT completion to alert
- Negative predictive value: 98-99% (very few missed LVOs)
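How these numbers play out depends on how common LVO is among the patients actually scanned, which varies by site. A minimal sketch of the conversion from sensitivity/specificity to predictive values, assuming an illustrative 10% LVO prevalence among code-stroke CTAs (the prevalence is a hypothetical input, not a vendor figure):

```python
# Illustrative only: converts published sensitivity/specificity into PPV/NPV
# for an assumed LVO prevalence among code-stroke CTAs. The 10% prevalence
# is a hypothetical example, not a vendor-reported figure.

def predictive_values(sensitivity, specificity, prevalence):
    tp = sensitivity * prevalence              # true positives per patient screened
    fn = (1 - sensitivity) * prevalence        # missed LVOs
    fp = (1 - specificity) * (1 - prevalence)  # false alarms
    tn = specificity * (1 - prevalence)        # correctly cleared scans
    return tp / (tp + fp), tn / (tn + fn)      # (PPV, NPV)

ppv, npv = predictive_values(sensitivity=0.92, specificity=0.87, prevalence=0.10)
print(f"PPV: {ppv:.1%}, NPV: {npv:.1%}")
# With these mid-range inputs, PPV is ~44% and NPV ~99%: alerts still need
# human confirmation, but a negative result rarely hides an LVO, consistent
# with the 98-99% NPV quoted above.
```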
Clinical validation: Rodrigues et al. (2022) conducted the largest real-world evaluation of Viz.ai's LVO detection in an integrated hub-and-spoke stroke network:
- Study design: Multicenter retrospective analysis across integrated stroke network
- Diagnostic performance: High specificity and moderately high sensitivity for ICA and proximal MCA occlusions on CTA
- Clinical impact: Streamlined code stroke workflows by enabling direct notification to neurointerventionalists
How it works:
1. Patient with suspected stroke gets CT angiography at community hospital
2. Viz.ai analyzes imaging automatically (integrated with PACS)
3. If LVO detected, Viz.ai sends HIPAA-compliant alerts to:
   - Neurointerventionalist's smartphone (with images)
   - Neurologist on call
   - Stroke coordinator
   - Receiving hospital ED
4. Transfer arranged while local team evaluates patient
5. Patient arrives at comprehensive stroke center with cath lab team already mobilized
Current deployment: Over 1,000 hospitals in the U.S., analyzing >200,000 stroke CT scans annually.
Why this works:
- Clear imaging signature: LVO on CTA is a visible, well-defined pattern
- Time-sensitive: Every minute saved prevents brain damage
- Actionable: Positive detection triggers a specific intervention (thrombectomy)
- Integrated workflow: Alerts go directly to decision-makers
- Validated in RCTs: Multiple studies show improved outcomes
Intracranial Hemorrhage Detection: Prioritizing the Radiology Worklist
CT scans of the head are among the most common imaging studies ordered in emergency departments. Most are normal. But the 5-10% showing intracranial hemorrhage require urgent neurosurgical consultation.
The problem: Radiology worklists are first-come-first-served. A critical ICH scan may sit in the queue behind 20 normal CTs while the radiologist reads chronologically. By the time the radiologist sees the bleed, the patient has been herniating for 30 minutes.
The AI solution: Automated ICH detection algorithms analyze every head CT in seconds, flag those with suspected hemorrhage, and move them to the top of the worklist, ensuring critical cases are read first.
Aidoc ICH Detection
FDA clearance: 2018
Performance:
- Sensitivity: 95.7% for any ICH
- Specificity: 95.0%
- False positive rate: 5% (acceptable for triage tool)
- Analysis time: <60 seconds
Types of ICH detected:
- Epidural hematoma
- Subdural hematoma
- Subarachnoid hemorrhage
- Intraparenchymal hemorrhage
- Intraventricular hemorrhage
Clinical benefit: In a prospective implementation study, Arbabshirani et al. (2018) integrated a deep learning ICH detector into routine outpatient head CT workflow:
- Time to radiologist interpretation: AI-reprioritized studies reached interpretation in ~19 minutes vs. >8 hours for standard “routine” studies
- Mechanism: Algorithm automatically escalated routine head CTs to “stat” priority when ICH detected
- Key limitation: Radiologists must still review all scans; AI serves as triage tool, not replacement
How it works:
1. Head CT ordered in ED and sent to PACS
2. Aidoc analyzes every slice in real-time
3. If ICH suspected:
   - Case moved to top of radiology worklist
   - Alert sent to radiologist
   - Alert sent to ED physician
   - Optional alert to neurosurgery (if institution configures it)
4. Radiologist reviews flagged case immediately
5. Radiologist confirms or rejects AI finding
Current deployment: Deployed in 1,000+ hospitals globally, analyzing millions of head CTs annually.
Why this works:
- Clear imaging pattern: Acute blood on CT is hyperdense and easy to detect algorithmically
- Triage, not diagnosis: The algorithm doesn't replace the radiologist; it prioritizes the worklist
- High sensitivity acceptable: 95.7% sensitivity with 5% false positives is fine for triage (the radiologist reviews all cases anyway)
- Workflow integration: Smooth PACS integration, no extra clicks required
- Clinical validation: Multiple studies show earlier detection
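The same arithmetic explains the triage workload downstream. A minimal sketch of expected alerts per 1,000 ED head CTs, assuming an illustrative 7% ICH prevalence (a mid-point of the 5-10% range above) and the published sensitivity and specificity:

```python
# Illustrative workload estimate for an ICH triage tool per 1,000 head CTs.
# The 7% prevalence is an assumed mid-point of the 5-10% range in the text.

scans = 1000
prevalence = 0.07
sensitivity = 0.957
specificity = 0.95

ich_cases = scans * prevalence                           # ~70 true bleeds
true_alerts = ich_cases * sensitivity                    # ~67 correctly flagged
missed = ich_cases - true_alerts                         # ~3 bleeds not flagged
false_alerts = (scans - ich_cases) * (1 - specificity)   # ~46 false alarms

print(f"Flagged: {true_alerts + false_alerts:.0f} "
      f"({true_alerts:.0f} true, {false_alerts:.0f} false); missed: {missed:.0f}")
# Roughly 110-115 flags per 1,000 scans, about 40% of them false positives --
# acceptable for worklist prioritization because the radiologist still reads
# every scan, but a reminder that a handful of bleeds will not be flagged.
```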
Implementation Reality: When Accurate Stroke AI Still Fails
LVO detection algorithms achieve 90-95% sensitivity. So why do some hospitals fail to see outcome improvements?
The implementation failures:
Alert fatigue: If LVO alerts go to a generic pager that also receives lab results, medication alerts, and bed assignments, interventionalists may ignore them
Workflow fragmentation: Algorithm detects LVO at community hospital, but transfer process still requires:
- ED physician calling transfer center
- Transfer center calling receiving hospital
- Receiving hospital calling interventionalist
- Interventionalist reviewing images
- Each step adds 10-15 minutes
Lack of trust: Interventionalists who don't trust the algorithm may wait for the official radiology read, defeating the purpose of early AI detection
False positives in wrong population: Algorithms trained on stroke patients perform poorly when applied to all head CTs (many false positives from artifacts, old infarcts, masses)
Equity gaps unknown: Most LVO algorithms trained predominantly on white populations; performance in Black and Hispanic patients not well-studied (though preliminary data suggests equivalent performance)
A real implementation failure: One academic medical center deployed Viz.ai but routed alerts to the stroke coordinator's office desk phone rather than a mobile device. On nights and weekends, alerts went to voicemail. Six months after deployment, door-to-groin times were unchanged because no one saw the alerts in real time.
The lesson: Even the best AI is worthless without proper workflow integration.
Part 2: Neuroimaging AI Beyond Stroke
Brain Tumor Segmentation: Where AI Actually Saves Time
Radiation oncology treatment planning for brain tumors requires meticulous delineation of:
- Gross tumor volume (GTV)
- Clinical target volume (CTV)
- Organs at risk (optic nerves, brainstem, hippocampi)
Manual segmentation takes 6-8 hours per patient. Automated AI segmentation reduces this to 30-60 minutes with radiologist review.
Commercial and research systems:
- BrainLab Elements (FDA-cleared): Automated glioblastoma segmentation
- Quantib ND (FDA-cleared): Brain metastases detection and volumetry
- DeepMind (research only): World-class segmentation published in Nature but not commercially deployed
Performance:
- Dice coefficient: 0.85-0.90 (measures overlap between AI and expert segmentation)
- Time savings: 80-90% reduction in segmentation time
- Consistency: Reduces inter-observer variability in target volume delineation
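The Dice coefficient quoted above is simply the overlap between the AI contour and the expert contour, normalized by their combined size. A minimal sketch of the computation on binary segmentation masks (illustrative code, not any vendor's implementation):

```python
import numpy as np

def dice_coefficient(mask_ai: np.ndarray, mask_expert: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of the same shape."""
    a = mask_ai.astype(bool)
    b = mask_expert.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy example: two slightly offset square "tumor" masks on a 10x10 slice
ai = np.zeros((10, 10), dtype=bool)
ai[2:7, 2:7] = True
expert = np.zeros((10, 10), dtype=bool)
expert[3:8, 3:8] = True
print(f"Dice: {dice_coefficient(ai, expert):.2f}")  # 0.64 for this toy case
```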
Clinical use: Radiation oncologists review and edit AI-generated contours rather than manually tracing every structure. This is a genuine time-saver that improves workflow without compromising quality.
Limitations:
- Complex cases require extensive editing: Tumors with necrosis, hemorrhage, or prior surgery confuse algorithms
- Post-treatment changes: Pseudoprogression vs. true progression remains challenging for AI
- Rare tumor types: Algorithms trained on glioblastoma perform poorly on meningiomas, metastases, lymphomas
Multiple Sclerosis Lesion Detection: Promise and Pitfalls
MS diagnosis and monitoring rely on detecting white matter lesions on brain MRI. Lesion burden and new lesion formation guide treatment decisions.
AI applications:
- Automated lesion counting
- Lesion volume quantification
- Detection of new/enlarging lesions compared to prior scans
- Prediction of disease progression
Performance: Variable and scanner-dependent. Commowick et al. (2018) compared 60 MS lesion detection algorithms across 15 medical centers:
- Sensitivity: Ranged from 45% to 92% depending on algorithm and scanner
- False positives: Many algorithms flagged normal periventricular white matter as lesions
- Scanner dependence: Algorithms trained on Siemens MRI performed poorly on GE scanners
Why MS lesion AI is harder than stroke:
- Lesion heterogeneity: MS lesions vary in size (2mm to 3cm), location, and appearance
- Look-alikes: Small vessel ischemic disease, migraine, normal aging all produce white matter hyperintensities
- Scanner variability: MRI protocols differ across institutions; algorithms don't generalize well
Current clinical use: Research settings and pharmaceutical clinical trials (where standardized protocols and centralized reading reduce scanner variability). Not yet ready for routine clinical care.
Alzheimer’s Disease Neuroimaging: Prediction Without Treatment
AI can predict conversion from mild cognitive impairment (MCI) to Alzheimer's dementia with AUC 0.80-0.85 using:
- Hippocampal volume measurement
- Entorhinal cortex thickness
- Amyloid PET standardized uptake value ratios
- FDG-PET glucose metabolism patterns
The clinical problem: These predictions don’t change management. We lack disease-modifying treatments for Alzheimer’s. Knowing that a patient with MCI will progress to dementia in 3 years doesn’t help them. It just causes anxiety.
Ethical concerns:
- Prognostic disclosure: Should we tell patients they'll develop dementia when we can't prevent it?
- Insurance discrimination: Will Alzheimer's risk predictions affect life insurance and long-term care insurance?
- Clinical trial recruitment: This is the main current use, enriching trials with high-risk patients
Until we have effective disease-modifying therapies for Alzheimer's, these prognostic algorithms remain research tools, not clinical tools. The main legitimate use? Enriching clinical trials with high-risk patients.
Part 3: Seizure Detection and Epilepsy AI
Automated Seizure Detection from EEG
Long-term EEG monitoring generates 24-72 hours of continuous data per patient. Neurologists review this data looking for seizures, interictal epileptiform discharges, and background abnormalities.
The time burden:
- 1 hour of EEG recording = ~10 minutes of expert review
- 72-hour EEG = 12 hours of neurologist time
- Most EEG shows no seizures (neurologists search for rare events in vast normal data)
AI solution: Automated seizure detection algorithms analyze EEG continuously, flag suspected seizures, and present condensed summaries to neurologists for review.
Persyst Seizure Detection: FDA-cleared automated EEG analysis system.
Performance:
- Sensitivity for generalized tonic-clonic seizures: 92%
- Sensitivity for complex partial seizures: 76%
- False positive rate: 0.5-1.0 false detections per hour
- Time savings: Reduces neurologist review time by 60-70%
How neurologists use it:
1. Review AI-flagged events first (likely seizures)
2. Quickly scroll through unflagged periods looking for missed events
3. Total review time: 3-4 hours instead of 12 hours for a 72-hour EEG
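Those review-time figures follow from the workload the algorithm leaves behind. A minimal sketch, using the false-positive rate quoted above and an assumed patient with three true seizures over the recording (the seizure count is hypothetical):

```python
# Rough estimate of AI-assisted review workload for a 72-hour EEG, using the
# 0.5-1.0 false detections/hour quoted above and the ~10 min/hour figure for
# unaided expert review. The 3 true seizures are an assumed example.

hours = 72
fp_per_hour_low, fp_per_hour_high = 0.5, 1.0
true_seizures = 3

flagged_low = true_seizures + fp_per_hour_low * hours    # 39 events
flagged_high = true_seizures + fp_per_hour_high * hours  # 75 events
unaided_review_hours = hours * 10 / 60                   # 12 hours page-by-page

print(f"Events to adjudicate: {flagged_low:.0f}-{flagged_high:.0f}")
print(f"Unaided review: {unaided_review_hours:.0f} h; AI-assisted: ~3-4 h")
# The neurologist still adjudicates dozens of flagged events and spot-checks
# the unflagged record, which is why review time drops to hours, not minutes.
```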
Limitations:
- Misses subtle seizures: Focal seizures without clear rhythmic activity often missed
- False positives from artifacts: Chewing, movement, electrode problems cause false alarms
- ICU EEG challenging: Critically ill patients on sedation with frequent interventions generate artifacts
Persyst and similar FDA-cleared systems genuinely save neurologist time, reducing a 12-hour EEG review to 3-4 hours without replacing expert interpretation. This is a useful clinical tool that works.
Wearable Seizure Detectors: High Sensitivity, High False Positive Rate
Smartwatch-based seizure detectors (Empatica Embrace, Nightwatch) use accelerometry and autonomic signals (heart rate, skin conductance) to detect generalized tonic-clonic seizures.
Use case: Preventing sudden unexpected death in epilepsy (SUDEP) by alerting caregivers when seizure occurs, particularly during sleep.
Performance:
- Sensitivity for generalized tonic-clonic seizures: 90-95%
- False positive rate: 1-5 false alarms per month
- Focal seizures: Often undetected (no convulsive movements)
Clinical reality:
- High-risk epilepsy patients (frequent convulsive seizures, intellectual disability, living alone) benefit from alerts
- Low-risk patients (well-controlled focal epilepsy) find false alarms burdensome
- Not a replacement for supervision: Algorithms detect seizures but can't intervene
High-risk epilepsy patients benefit from wearable seizure detectors, particularly for SUDEP prevention during sleep. But the 1-5 false alarms per month mean this is an adjunct for high-risk patients, not a general screening tool for everyone with epilepsy.
Part 4: Neurodegenerative Disease AI
Parkinson’s Disease: Objective Motor Assessment
Parkinson’s disease diagnosis and monitoring rely on subjective clinical assessment of bradykinesia, rigidity, tremor, and gait. Symptom severity fluctuates throughout the day (medication on/off states).
AI applications:
- Smartphone tapping tests: Measure finger tapping speed and rhythm
- Smartwatch tremor detection: Accelerometry detects tremor frequency and amplitude
- Voice analysis: Detect hypophonia and monotone speech
- Gait analysis: Computer vision from smartphone video analyzes stride length and arm swing
Performance: Modest. Multiple smartphone-based PD motor assessments show only moderate correlation with MDS-UPDRS motor scores and limited ability to distinguish PD from other movement disorders:
- Correlation with MDS-UPDRS motor scores: Moderate (r values typically 0.5-0.7 across studies)
- Distinguishing PD from healthy controls: Reasonable performance in research settings
- Distinguishing PD from other movement disorders: Poor, limiting diagnostic utility
Why this doesn't work well yet:
- Bradykinesia is nuanced: Requires observing finger tapping, hand movements, and leg agility; smartphones capture only finger tapping
- Medication state confounds: Patients tested 1 hour post-dose look different than 4 hours post-dose
- Non-motor symptoms ignored: Cognitive impairment, autonomic dysfunction, psychiatric symptoms not measured
Current use: Research setting (clinical trials tracking motor progression). Not diagnostic.
ALS Progression Prediction: Research Tool, Not Clinical Tool
Amyotrophic lateral sclerosis (ALS) causes progressive motor neuron degeneration with highly variable progression rates. Some patients survive 10+ years; others decline within 12 months.
AI applications: ML models predict functional decline (ALSFRS-R score) using baseline demographics, genetics, respiratory function, and EMG data.
Performance: Kueffner et al. (2015) achieved an AUC of 0.70-0.75 for predicting rapid vs. slow progression.
Why this doesn't matter clinically:
- We lack disease-modifying treatments: Riluzole and edaravone provide minimal benefit; knowing the prognosis doesn't change management
- Clinical trial enrichment is the actual use case: selecting rapid progressors for trials so treatment effects can be detected faster
- Individual predictions are unreliable: an AUC of 0.75 at the population level still leaves wide uncertainty for any individual patient
Ethical concerns: Should we tell ALS patients they’ll likely die within 18 months when predictions are uncertain and we can’t prevent it? Most neurologists say no.
Part 5: The Psychiatric AI Disaster
Why Psychiatric AI Fails: The Fundamental Problem
Psychiatric diagnosis relies on:
- Patient self-report of symptoms (low reliability)
- Clinician assessment of behavior and affect (subjective, low inter-rater reliability)
- Absence of biomarkers (no blood test for depression, no scan for schizophrenia)
- Heterogeneous presentations (10 patients with depression may have 10 different symptom patterns)
This makes psychiatry fundamentally resistant to algorithmic approaches.
The Suicide Prediction Algorithm Failures
Vanderbilt University Suicide Risk Algorithm (2017-2020):
The promise: Predict suicide risk from EHR data (diagnoses, medications, ED visits, hospitalizations) and flag high-risk patients for intervention.
The reality:
- 5,000+ patients flagged as "high risk" over 3 years
- Actual suicides among flagged patients: 31
- Positive predictive value: 0.6% (99.4% false positives)
- Clinical response: Alert fatigue. Clinicians stopped responding to flags after being overwhelmed by false alarms
Why it failed:
- Base rate problem: Suicide is rare (even in high-risk populations, <1% attempt per year); any screening test yields massive false positives
- Unpredictability: Most suicides occur in people with no prior psychiatric contact; EHR-based algorithms can't detect them
- False reassurance: "Low risk" predictions gave clinicians false confidence, potentially missing suicidal patients who didn't fit the algorithmic profile
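The base rate problem is worth making concrete. A minimal sketch, assuming a hypothetical algorithm with 90% sensitivity and 90% specificity (far better than any published model) applied to a population with a 0.5% annual attempt rate (both numbers are illustrative assumptions):

```python
# Why even a "good" suicide risk model produces mostly false positives.
# The 90%/90% sensitivity/specificity are hypothetical and generous; the
# 0.5% annual attempt rate is an illustrative base rate, not a measured one.

population = 100_000
base_rate = 0.005        # 0.5% attempt per year
sensitivity = 0.90
specificity = 0.90

attempts = population * base_rate                             # 500
flagged_true = attempts * sensitivity                         # 450
flagged_false = (population - attempts) * (1 - specificity)   # 9,950
ppv = flagged_true / (flagged_true + flagged_false)

print(f"Flagged: {flagged_true + flagged_false:,.0f}; PPV: {ppv:.1%}")
# ~10,400 people flagged with a PPV around 4% -- and real-world EHR models
# perform far worse, which is how a deployment like Vanderbilt's ends up
# with a PPV of 0.6%.
```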
Facebook/Meta Suicide Prevention AI (2017-2022):
The promise: Detect suicidal content in posts and live videos; alert human reviewers to contact users in crisis.
The reality:
- Announced with great fanfare in 2017 as "AI saving lives"
- Quietly discontinued in 2022 after internal evaluations showed minimal benefit
- No published data on how many suicides were prevented
- Privacy advocates raised concerns about surveillance and consent
The lesson: Suicide is inherently unpredictable. Algorithms that promise to identify who will attempt suicide based on digital data create dangerous false confidence.
Depression Diagnosis from Digital Phenotyping: Privacy Nightmare
The concept: Passively monitor smartphone use (typing speed, app usage, GPS movement patterns, voice call frequency) to detect depression without patient self-report.
The problems:
1. Consent: Is continuous monitoring with periodic algorithm-generated diagnoses truly informed consent?
2. Privacy: Smartphone data reveals intimate details of life (where you go, who you talk to, what you search)
3. Accuracy: Correlation between "reduced movement" and depression doesn't mean the algorithm can diagnose depression (could be physical illness, weather, life circumstances)
4. Equity: Algorithms trained on white populations may interpret cultural differences in communication or movement as pathology
Current status: Research-stage only. Multiple academic studies, zero validated clinical applications.
Ethical consensus: Most bioethicists and psychiatrists agree: Digital phenotyping for psychiatric diagnosis raises profound ethical concerns that haven’t been resolved.
Part 6: Equity in Neurological AI
The Underappreciated Problem
Stroke, MS, Parkinson’s, and Alzheimer’s all show different prevalence, presentation, and prognosis across racial and ethnic groups:
- Stroke: Black Americans have 2x stroke incidence of white Americans, different stroke subtypes
- MS: More common in white populations; Black patients with MS have more aggressive disease
- Alzheimer’s: Higher prevalence in Black and Hispanic Americans, often diagnosed later
- Parkinson’s: Lower prevalence in Black populations, different symptom profiles
Yet most neurology AI algorithms are trained on: - Predominantly white populations - Tertiary academic medical centers - North American and European datasets
Consequences:
LVO detection algorithms:
- Preliminary data suggest equivalent performance across races
- But comprehensive equity studies have not been published for most commercial systems
- Ask vendors: "What is sensitivity/specificity stratified by race?"

MS lesion detection:
- Trained mostly on white Scandinavian and North American populations (where MS prevalence is highest)
- Performance in Black and Hispanic MS patients unknown

Alzheimer's neuroimaging:
- Hippocampal volume norms based on white populations
- Normative values may not transfer to Black Americans; algorithms may misclassify

What neurologists should do:
1. Ask vendors for race-stratified performance data
2. Validate algorithms locally on your patient population
3. Monitor for algorithmic errors by race/ethnicity
4. Don't assume "overall accuracy" applies to all patients
Part 7: Implementation Framework
Before Adopting Neurology AI
Questions to ask vendors:
- “Where is the peer-reviewed study showing this algorithm improves patient outcomes?”
- LVO detection has this evidence (Rodrigues et al. 2022, Devlin et al. 2022)
- Most other neurology AI doesn’t
- Demand JAMA Neurology, Lancet Neurology, Stroke, Neurology, Frontiers in Neurology publications
- “What is the algorithm’s performance in patients like mine?”
- Academic medical center algorithms may fail in community hospitals
- Pediatric algorithms don’t work in adults
- Request validation data from similar patient populations
- “What is the false positive rate, and how will we manage false alarms?”
- ICH detection: 5% false positive rate = 50 false alarms per 1,000 head CTs
- Who triages these? What’s the workflow?
- “How does this integrate with our PACS/EHR/radiology workflow?”
- Demand live demonstration in your specific environment
- Poor integration = alert fatigue = missed critical cases
- “What happens when the algorithm fails?”
- All algorithms miss some cases
- LVO algorithms miss 5-10% of occlusions
- You need to know which cases are likely to be missed (posterior circulation, tandem occlusions, distal occlusions)
- “Can we validate locally before full deployment?”
- Retrospective validation on 500-1,000 prior cases
- Compare algorithm performance to actual outcomes in your population
- “What are the equity implications?”
- Request race/ethnicity-stratified performance metrics
- If vendor doesn’t have this data, algorithm wasn’t validated equitably
- “Who is liable if the algorithm misses a critical finding?”
- Read the vendor contract carefully
- Most disclaim liability
- Physicians remain responsible for all interpretations
- “What is the cost, and what’s the evidence of cost-effectiveness?”
- LVO detection: ~$50,000-100,000/year for medium-sized hospital
- Cost-effectiveness shown for LVO/ICH detection
- Most other neurology AI lacks cost-effectiveness data
- “Can you provide references from neurologists who use this tool?”
- Talk to actual users
- Ask about false positives, workflow disruptions, whether they’d recommend it
Red Flags (Walk Away If You See These)
- Claims to “diagnose” psychiatric conditions from digital data (no validated systems exist)
- Suicide risk prediction without published PPV data (all existing models have PPV <5%)
- No external validation studies (validated only in development cohort)
- Vendor refuses to share peer-reviewed publications (“proprietary algorithm”)
- Black box psychiatric AI (explainability is essential for consent and ethics)
Part 8: Cost-Benefit Reality
What Does Neurology AI Cost?
Stroke AI (LVO/ICH detection):
- Viz.ai: ~$50,000-100,000/year depending on hospital size
- RapidAI: Similar pricing
- Aidoc ICH: ~$40,000-80,000/year
Brain tumor segmentation:
- BrainLab Elements: Bundled into treatment planning system (~$10,000-20,000/year)

EEG seizure detection:
- Persyst: ~$15,000-30,000/year

MS lesion detection (research only):
- Not commercially available for routine use

Parkinson's/ALS apps:
- Mostly research tools, not commercial products
Do These Tools Save Money?
LVO detection: YES
- Cost per stroke CT screened: ~$50-100
- Benefit: 43-minute reduction in door-to-groin time → better functional outcomes → less long-term disability cost
- Cost-effectiveness: Multiple studies show LVO AI saves money by reducing disability and rehabilitation costs

ICH detection: PROBABLY
- Cost per head CT screened: ~$30-50
- Benefit: Earlier neurosurgical evaluation → faster intervention for surgical candidates
- Cost-effectiveness: Not formally studied but likely cost-effective given the low per-case cost

Brain tumor segmentation: YES
- Saves 6-7 hours of radiation oncologist/dosimetrist time per case
- At $200/hour labor cost, that is $1,200-1,400 saved per case
- Algorithm cost: ~$20-40 per case
- Clear cost savings (see the sketch below)

Seizure detection: MAYBE
- Saves neurologist time (12 hours → 4 hours for 72-hour EEG review)
- But doesn't change patient outcomes
- Cost-effectiveness depends on neurologist salary and EEG volume

MS/Alzheimer's/Parkinson's AI: NO
- Research tools that don't change clinical management
- No cost savings or outcome benefits demonstrated
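The brain tumor segmentation arithmetic referenced above is easy to check. A minimal back-of-the-envelope sketch using the per-case figures quoted in this section; the annual case volume is an illustrative assumption, not data from any specific center:

```python
# Back-of-the-envelope ROI for AI brain tumor segmentation, using the
# per-case figures quoted above. The 200-case annual volume is an assumed
# example, not data from any specific department.

cases_per_year = 200          # hypothetical departmental volume
hours_saved_per_case = 6.5    # midpoint of the 6-7 hours quoted above
labor_rate = 200              # $/hour for oncologist/dosimetrist time
algorithm_cost_per_case = 30  # midpoint of the $20-40 quoted above

gross_savings = cases_per_year * hours_saved_per_case * labor_rate
algorithm_cost = cases_per_year * algorithm_cost_per_case
net_savings = gross_savings - algorithm_cost

print(f"Gross savings: ${gross_savings:,.0f}; "
      f"algorithm cost: ${algorithm_cost:,.0f}; net: ${net_savings:,.0f}")
# ~$260,000 in labor saved vs. ~$6,000 in per-case fees -- which is why
# segmentation is one of the few neurology AI tools with a clear ROI,
# provided the quoted time savings hold up in local practice.
```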
Part 9: The Future of Neurology AI
What’s Coming in the Next 5 Years
Likely to reach clinical use:
1. Expanded stroke AI: Posterior circulation stroke detection, stroke mimics identification
2. Automated EEG reporting: Summary reports for routine EEGs (not just seizure detection)
3. Neurosurgical planning AI: Tumor resection planning, deep brain stimulation targeting
4. Gait analysis from smartphone video: Parkinson's and ataxia monitoring
Promising but uncertain:
1. Alzheimer's blood biomarkers + AI: Combining plasma p-tau217 with MRI and cognitive testing for early diagnosis
2. Seizure forecasting: Predicting when a seizure will occur hours in advance (still research stage)
3. Precision psychiatry: Matching patients to antidepressants based on genetics + symptoms (early trials)
Overhyped and unlikely:
1. Autonomous psychiatric diagnosis from digital phenotyping
2. Suicide prediction from social media
3. AI replacing the neurological examination
The rate-limiting factor is not algorithmic accuracy; it is the lack of prospective RCTs showing improved patient outcomes and of ethical frameworks for psychiatric AI.
Professional Society Guidelines on AI in Neurology
The American Academy of Neurology has endorsed the “AMA Principles for Augmented Intelligence Development, Deployment, and Use” (2023), establishing foundational guidance for AI in neurological practice.
AAN Annual Meeting AI Sessions (2024-2025):
At the 2024 AAN Annual Meeting, the session “Artificial Intelligence (AI) and the Neurologist: New Horizons” covered:
- Types of AI and machine learning relevant to neurology
- Present and potential clinical applications
- Benefits and challenges AI creates for neurologists
At AAN 2025, researchers discussed:
- AI-driven behavioral analysis for Alzheimer's disease progression modeling
- Machine learning for identifying patients at high risk for hematoma expansion
AI Applications Addressed by Neurology Societies
Stroke:
- LVO detection algorithms (Viz.ai, RapidAI) integrated into stroke systems of care
- ASPECTS scoring automation
- Perfusion imaging analysis

Epilepsy:
- Automated EEG seizure detection
- Long-term monitoring pattern recognition
- Seizure prediction research

Neurodegenerative Disease:
- Imaging biomarkers for Alzheimer's disease
- Parkinson's disease tremor analysis
- ALS progression modeling
American Clinical Neurophysiology Society (ACNS)
ACNS provides guidance on AI in neurophysiology:
- Standards for automated EEG interpretation
- Validation requirements for seizure detection algorithms
- Integration with continuous EEG monitoring systems
Implementation Note: ACNS emphasizes that automated EEG analysis should augment, not replace, neurophysiologist interpretation, particularly for critical findings.
International League Against Epilepsy (ILAE)
ILAE has addressed AI applications in epilepsy care:
- Wearable device seizure detection validation
- AI-assisted presurgical evaluation
- Automated seizure diary analysis
Key Recommendation: AI-detected seizures from wearable devices require correlation with clinical events and should not independently drive medication changes without physician review.
Endorsed Decision Support Tools
Validated tools integrated into neurological practice include:
- NIH Stroke Scale: Standardized severity assessment
- ASPECTS: CT-based stroke scoring
- ABCD2: TIA stroke risk stratification
These represent the foundation for algorithmic decision support in neurology, with AI-enhanced versions under development.
Key Takeaways
10 Principles for Neurology AI
Stroke LVO/ICH detection saves lives: These are among medicine’s clearest AI success stories; embrace them
Time-sensitive imaging AI works; diagnostic AI doesn’t: Algorithms excel at detecting what exists, struggle with what it means
Psychiatric AI has failed repeatedly: Suicide prediction, depression diagnosis from digital data. Don’t deploy these clinically
Prognostic algorithms need disease-modifying treatments: Predicting Alzheimer’s or ALS progression doesn’t help without effective therapies
Seizure detection saves neurologist time: Automated EEG review is useful adjunct, not replacement
Equity data is missing: Most algorithms lack race-stratified performance metrics; validate locally
Integration determines success: Even accurate LVO detection fails with poor workflow integration
False positives cause alert fatigue: 5% false positive rate sounds low until you get 50 false alarms per 1,000 scans
Demand RCT evidence: Technical accuracy ≠ improved patient outcomes
Neurological exam remains irreplaceable: AI assists with imaging and data interpretation but can’t replace clinical judgment
Clinical Scenario: Evaluating a Neurology AI Tool
Scenario: Your Neurology Department Is Considering MS Lesion Detection AI
The pitch: A vendor demonstrates AI that detects and quantifies MS lesions on brain MRI. They show you:
- Sensitivity 94% for lesion detection
- "Reduces radiologist reading time by 50%"
- Automatic comparison to prior scans to detect new lesions
- Cost: $75,000/year
The department chair asks for your recommendation.
Questions to Ask:
- “What peer-reviewed publications validate this algorithm?”
- Look for JAMA Neurology, Multiple Sclerosis Journal, Neurology publications
- Commowick et al. (2018) study showed MS algorithms are scanner-dependent
- “How does this algorithm perform on our specific MRI scanner?”
- Most MS AI trained on Siemens scanners
- Performance often degrades on GE or Philips scanners
- Request validation data on your exact scanner model and protocol
- “What is the false positive rate for lesion detection?”
- Small vessel ischemic disease, migraine, normal aging create white matter hyperintensities
- How does algorithm distinguish MS lesions from mimics?
- “Does this algorithm change clinical management?”
- MS diagnosis still requires McDonald criteria (clinical + imaging + CSF/evoked potentials)
- Lesion burden doesn’t directly guide treatment decisions in most cases
- If algorithm doesn’t change what you do, why pay $75,000/year?
- “How will this integrate with our radiology workflow?”
- Does it require manual upload of prior scans?
- Does it work with our PACS?
- Who reviews the AI output, radiologist or neurologist?
- “What happens with atypical presentations?”
- Tumefactive MS (large lesions mimicking tumors)
- Posterior fossa lesions (often missed by algorithms)
- Infratentorial disease
- “Can we pilot on 100 MS patients before committing to $75,000/year?”
- Compare algorithm lesion counts to expert neuroradiologist
- Measure actual time savings
- Assess false positive burden
Red Flags in This Scenario:
“Reduces reading time by 50%” without time-motion study data: Unverified claim
No external validation studies: If algorithm was validated only at vendor’s institution, performance at your hospital uncertain
Sensitivity 94% without specificity data: Useless. High sensitivity with low specificity = many false positives
Scanner-agnostic claims: MS lesion AI is notoriously scanner-dependent; claims of universal performance are suspicious
No discussion of clinical utility: Detecting lesions doesn’t equal improving patient outcomes
Check Your Understanding
Scenario 1: The LVO Alert at 2 AM
Clinical situation: You’re the interventional neuroradiologist on call. At 2:15 AM, you receive a Viz.ai mobile alert: 68-year-old man, right-sided weakness, NIHSS 14, CT angiography shows left M1 occlusion. Patient is at community hospital 30 minutes away by helicopter.
You open the Viz.ai app and review the images. The M1 occlusion is clear. But you also notice the patient had a large left MCA territory stroke 2 years ago (visible on non-contrast CT as encephalomalacia).
Question 1: Do you mobilize the cath lab team for thrombectomy?
Answer: Yes, but with additional information gathering.
Reasoning:
- Acute M1 occlusion is a thrombectomy indication regardless of prior stroke history
- Prior stroke in the same territory doesn't automatically exclude thrombectomy, but does require considering:
  - What is baseline disability? (If mRS 5 before this stroke, thrombectomy less likely to benefit)
  - What is the new deficit vs. baseline? (Need to talk to patient's family or primary physician)
  - What is the stroke onset time? (Still within the window?)
What to do:
1. Accept the transfer (don't delay getting the patient to a comprehensive stroke center)
2. Call the community hospital ED while the patient is en route:
   - What was baseline functional status?
   - What is the time last known normal?
   - Any contraindications to thrombectomy (anticoagulation, recent surgery)?
3. Review imaging carefully when the patient arrives:
   - Confirm acute occlusion (not chronic occlusion from prior stroke)
   - Assess collateral status
   - Check ASPECTS score for ischemic core
4. Make the final decision based on:
   - Baseline function (if mRS 0-2, proceed)
   - Deficit is new and severe (NIHSS 14 indicates major deficit)
   - Time window (if <6 hours from onset, proceed; if 6-24 hours, check perfusion imaging)
Bottom line: LVO alerts are not autonomous treatment decisions. They accelerate evaluation and transfer, but clinical judgment remains essential.
Prior stroke in same territory is NOT an absolute contraindication. Many patients with prior stroke in one territory can have new stroke in another branch and benefit from thrombectomy.
The LVO algorithm did its job: Detected acute occlusion and got patient to you faster. Your job is clinical decision-making.
Scenario 2: The Suicide Risk Algorithm
Clinical situation: Your hospital deployed a suicide risk prediction algorithm that analyzes EHR data. You’re seeing a 32-year-old woman in primary care for diabetes follow-up. She mentions feeling “a bit down lately” and having trouble sleeping.
You check the EHR: The suicide risk algorithm has flagged her as “LOW RISK” (10th percentile).
Question 2: Do you skip the detailed depression and suicidality screening because the algorithm says low risk?
Answer: Absolutely not. The algorithm is irrelevant to your clinical assessment.
Reasoning:
Why suicide risk algorithms fail:
1. Base rate problem: Even "high risk" predictions have PPV <5% (i.e., >95% false positives)
2. "Low risk" is dangerous false reassurance: Most suicides occur in people predicted to be low risk
3. Algorithms can't detect acute stressors: EHR data from last week doesn't capture what happened this morning (job loss, relationship breakup, eviction notice)
4. Clinical presentation matters: A patient saying "I'm feeling down" is a red flag requiring exploration, regardless of algorithmic score
What you should do:
1. Ignore the algorithm completely
2. Perform standard depression screening:
   - PHQ-9 questionnaire
   - Direct questions about suicidal ideation: "Have you thought about hurting yourself?"
   - Risk factors: Prior attempts, family history, access to means, substance use
3. Don't document reliance on the algorithm: If the patient later completes suicide, an EHR note saying "I didn't ask about suicide because the algorithm said low risk" is medicolegally indefensible
4. Advocate for removing the algorithm: Suicide risk algorithms create false confidence and should not be clinically deployed
Why this matters:
- The patient is telling you she's depressed ("feeling down, trouble sleeping")
- Believe the patient, not the algorithm
- Standard of care requires assessing suicidality when a patient reports depressive symptoms
- No algorithm absolves you of clinical responsibility
Bottom line: Suicide risk algorithms are worse than useless. They’re dangerous. They create false reassurance that discourages proper clinical assessment.
If your hospital has deployed one, ignore it and advocate loudly for its removal.
Scenario 3: The MS Lesion Count Discrepancy
Clinical situation: You’re evaluating a 29-year-old woman with optic neuritis and a single brain MRI white matter lesion. McDonald criteria require ≥2 lesions for MS diagnosis. The radiologist’s report says “1 lesion in left periventricular white matter.”
However, the MS lesion detection AI (which your hospital recently deployed) flags 4 lesions total: the one the radiologist saw, plus 3 additional small (3-4mm) lesions in different locations.
Question 3: Do you diagnose MS based on the AI lesion count of 4?
Answer: No. Review the MRI yourself with a neuroradiologist before making MS diagnosis.
Reasoning:
Why lesion count discrepancies happen:
1. AI false positives: Small vessel ischemia, perivascular spaces, and artifacts are often flagged as MS lesions
2. Size threshold differences: Radiologists may not report 2-3mm lesions; AI flags everything
3. Lesion location matters: McDonald criteria require lesions in specific locations (periventricular, cortical, infratentorial, spinal cord); AI may count lesions in non-diagnostic locations
What you should do:
- Review the MRI images yourself:
- Look at the 3 additional lesions the AI detected
- Are they in McDonald criteria locations?
- Do they look like MS lesions (ovoid, perpendicular to ventricles, Dawson fingers)?
- Or do they look like small vessel ischemic disease, perivascular spaces, artifacts?
- Get neuroradiology over-read:
- “AI detected 4 lesions; report lists 1. Can you review and clarify?”
- Experienced neuroradiologist can distinguish MS plaques from mimics
- Consider additional testing:
- Spinal cord MRI (may show additional lesions supporting MS diagnosis)
- CSF analysis for oligoclonal bands
- Evoked potentials
- Wait for second clinical event (if not urgent to start treatment)
- Don’t diagnose MS based solely on AI lesion count:
- MS diagnosis has major implications (lifelong immunosuppression, insurance, disability, prognosis)
- Requires high confidence, not algorithmic suggestion
Why this matters:
- False positive MS diagnosis causes harm: Unnecessary treatment with expensive immunosuppressants that have side effects
- McDonald criteria exist for a reason: They require clinical + imaging evidence to prevent overdiagnosis
- AI lesion detection is a tool, not a diagnosis: It flags possible lesions for expert review, not autonomous diagnostic conclusions
A real scenario: A patient was diagnosed with MS based on AI lesion counts, started on natalizumab (PML risk), then re-reviewed by MS specialist who determined lesions were migraine-related white matter changes. Patient stopped unnecessary immunosuppression but had spent 6 months on risky medication.
Bottom line: Use MS lesion AI to help detect possible lesions, but never to diagnose MS without expert confirmation.
When AI and human expert disagree, trust the human expert (assuming they’ve reviewed the AI findings).