Neurology and Neurological Surgery
Stroke AI saves lives. Algorithms that detect large vessel occlusions cut door-to-groin time by 30-50 minutes, improving functional outcomes in randomized trials. But psychiatric AI has failed spectacularly: suicide prediction algorithms flagged thousands at high risk with positive predictive values below 1%, creating alert fatigue and false reassurance. Neurology AI excels at time-sensitive imaging interpretation but struggles with diagnosis, prognosis, and anything involving the complexity of human behavior.
After reading this chapter, you will be able to:
- Evaluate AI systems for acute stroke detection and triage, including large vessel occlusion (LVO) and intracranial hemorrhage (ICH) algorithms
- Critically assess AI applications in neuroimaging, including brain MRI lesion detection, tumor segmentation, and neurodegenerative imaging
- Understand automated seizure detection systems and their role in epilepsy management
- Analyze AI tools for neurodegenerative disease diagnosis and progression monitoring (Parkinson’s, ALS, Alzheimer’s)
- Recognize the profound limitations of psychiatric AI, including failed suicide prediction algorithms
- Apply evidence-based frameworks for evaluating neurology AI before clinical adoption
- Navigate the ethical complexities unique to neurological and psychiatric AI applications
Introduction: The Promise and Peril of Neurological AI
The human brain is the most complex structure in the known universe. Three pounds of tissue containing 86 billion neurons, each forming thousands of synaptic connections, generating thought, emotion, movement, memory, and consciousness itself.
Neurology and psychiatry attempt to diagnose and treat disorders of this incomprehensibly complex organ using a combination of:
- Imaging (CT, MRI, PET) that shows structure but not function
- Physiologic testing (EEG, EMG, evoked potentials) that measures electrical activity
- Clinical examination (mental status, cranial nerves, motor, sensory, coordination, gait)
- Patient-reported symptoms (headache character, mood symptoms, cognitive complaints)
This combination of objective data and subjective assessment creates both remarkable opportunities and profound limitations for AI in neurology.
Where neurology AI succeeds: Acute stroke imaging analysis, where large vessel occlusions create clear imaging signatures and every minute saved prevents brain damage.
Where neurology AI struggles: Diagnosing Parkinson’s disease, where the exam requires detecting subtle bradykinesia, rigidity, and postural instability that vary moment-to-moment.
Where psychiatric AI fails catastrophically: Predicting suicide risk, where algorithms trained on thousands of patients achieve impressive AUCs while generating >99% false positives that cause alert fatigue and dangerous false reassurance.
This chapter examines what works, what doesn’t, and why neurological AI demands even more clinical skepticism than other specialties.
Part 1: Stroke AI, Medicine’s Clearest AI Success Story
Large Vessel Occlusion Detection: When Minutes Matter
Acute ischemic stroke from large vessel occlusion (LVO) is a neurological emergency. Each minute of vessel occlusion destroys 1.9 million neurons. Mechanical thrombectomy can restore blood flow and prevent devastating disability, but only if patients reach comprehensive stroke centers quickly.
The problem AI solves: Many stroke patients present to community hospitals without thrombectomy capability. Identifying LVO patients who need immediate transfer is time-critical but requires expert interpretation of CT angiography that may not be immediately available at 2 AM.
The AI solution: Automated LVO detection analyzes CT angiography in seconds, flags potential LVO cases, and directly alerts the interventional neuroradiology team at receiving hospitals, bypassing the usual radiology reading queue.
Viz.ai LVO Detection
FDA clearance: 2018 (first stroke AI cleared by FDA)
Performance:
- Sensitivity: 90-95% for M1/M2 occlusions
- Specificity: 85-90%
- Analysis time: <5 minutes from CT completion to alert
- Negative predictive value: 98-99% (very few missed LVOs)
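How these numbers play out depends on how common LVO is among the patients actually scanned, which varies by site. A minimal sketch of the conversion from sensitivity/specificity to predictive values, assuming an illustrative 10% LVO prevalence among code-stroke CTAs (the prevalence is a hypothetical input, not a vendor figure):

```python
# Illustrative only: converts published sensitivity/specificity into PPV/NPV
# for an assumed LVO prevalence among code-stroke CTAs. The 10% prevalence
# is a hypothetical example, not a vendor-reported figure.

def predictive_values(sensitivity, specificity, prevalence):
    tp = sensitivity * prevalence              # true positives per patient screened
    fn = (1 - sensitivity) * prevalence        # missed LVOs
    fp = (1 - specificity) * (1 - prevalence)  # false alarms
    tn = specificity * (1 - prevalence)        # correctly cleared scans
    return tp / (tp + fp), tn / (tn + fn)      # (PPV, NPV)

ppv, npv = predictive_values(sensitivity=0.92, specificity=0.87, prevalence=0.10)
print(f"PPV: {ppv:.1%}, NPV: {npv:.1%}")
# With these mid-range inputs, PPV is ~44% and NPV ~99%: alerts still need
# human confirmation, but a negative result rarely hides an LVO, consistent
# with the 98-99% NPV quoted above.
```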
Clinical validation: Rodrigues et al. (2022) conducted the largest real-world evaluation of Viz.ai's LVO detection in an integrated hub-and-spoke stroke network:
- Study design: Multicenter retrospective analysis across integrated stroke network
- Diagnostic performance: High specificity and moderately high sensitivity for ICA and proximal MCA occlusions on CTA
- Clinical impact: Streamlined code stroke workflows by enabling direct notification to neurointerventionalists
How it works:
1. Patient with suspected stroke gets CT angiography at community hospital
2. Viz.ai analyzes imaging automatically (integrated with PACS)
3. If LVO detected, Viz.ai sends HIPAA-compliant alerts to:
   - Neurointerventionalist's smartphone (with images)
   - Neurologist on call
   - Stroke coordinator
   - Receiving hospital ED
4. Transfer arranged while local team evaluates patient
5. Patient arrives at comprehensive stroke center with cath lab team already mobilized
Current deployment: Over 1,000 hospitals in the U.S., analyzing >200,000 stroke CT scans annually.
Why this works:
- Clear imaging signature: LVO on CTA is a visible, well-defined pattern
- Time-sensitive: Every minute saved prevents brain damage
- Actionable: Positive detection triggers a specific intervention (thrombectomy)
- Integrated workflow: Alerts go directly to decision-makers
- Validated in RCTs: Multiple studies show improved outcomes
Intracranial Hemorrhage Detection: Prioritizing the Radiology Worklist
CT scans of the head are among the most common imaging studies ordered in emergency departments. Most are normal. But the 5-10% showing intracranial hemorrhage require urgent neurosurgical consultation.
The problem: Radiology worklists are first-come-first-served. A critical ICH scan may sit in the queue behind 20 normal CTs while the radiologist reads chronologically. By the time the radiologist sees the bleed, the patient has been herniating for 30 minutes.
The AI solution: Automated ICH detection algorithms analyze every head CT in seconds, flag those with suspected hemorrhage, and move them to the top of the worklist, ensuring critical cases are read first.
Aidoc ICH Detection
FDA clearance: 2018
Performance:
- Sensitivity: 95.7% for any ICH
- Specificity: 95.0%
- False positive rate: 5% (acceptable for triage tool)
- Analysis time: <60 seconds
Types of ICH detected:
- Epidural hematoma
- Subdural hematoma
- Subarachnoid hemorrhage
- Intraparenchymal hemorrhage
- Intraventricular hemorrhage
Clinical benefit: In a prospective implementation study, Arbabshirani et al. (2018) integrated a deep learning ICH detector into routine outpatient head CT workflow:
- Time to radiologist interpretation: AI-reprioritized studies reached interpretation in ~19 minutes vs. >8 hours for standard “routine” studies
- Mechanism: Algorithm automatically escalated routine head CTs to “stat” priority when ICH detected
- Key limitation: Radiologists must still review all scans; AI serves as triage tool, not replacement
How it works:
1. Head CT ordered in ED and sent to PACS
2. Aidoc analyzes every slice in real-time
3. If ICH suspected:
   - Case moved to top of radiology worklist
   - Alert sent to radiologist
   - Alert sent to ED physician
   - Optional alert to neurosurgery (if institution configures it)
4. Radiologist reviews flagged case immediately
5. Radiologist confirms or rejects AI finding
Current deployment: Deployed in 1,000+ hospitals globally, analyzing millions of head CTs annually.
Why this works:
- Clear imaging pattern: Acute blood on CT is hyperdense and easy to detect algorithmically
- Triage, not diagnosis: The algorithm doesn't replace the radiologist; it prioritizes the worklist
- High sensitivity acceptable: 95.7% sensitivity with 5% false positives is fine for triage (the radiologist reviews all cases anyway)
- Workflow integration: Smooth PACS integration, no extra clicks required
- Clinical validation: Multiple studies show earlier detection
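The same arithmetic explains the triage workload downstream. A minimal sketch of expected alerts per 1,000 ED head CTs, assuming an illustrative 7% ICH prevalence (a mid-point of the 5-10% range above) and the published sensitivity and specificity:

```python
# Illustrative workload estimate for an ICH triage tool per 1,000 head CTs.
# The 7% prevalence is an assumed mid-point of the 5-10% range in the text.

scans = 1000
prevalence = 0.07
sensitivity = 0.957
specificity = 0.95

ich_cases = scans * prevalence                           # ~70 true bleeds
true_alerts = ich_cases * sensitivity                    # ~67 correctly flagged
missed = ich_cases - true_alerts                         # ~3 bleeds not flagged
false_alerts = (scans - ich_cases) * (1 - specificity)   # ~46 false alarms

print(f"Flagged: {true_alerts + false_alerts:.0f} "
      f"({true_alerts:.0f} true, {false_alerts:.0f} false); missed: {missed:.0f}")
# Roughly 110-115 flags per 1,000 scans, about 40% of them false positives --
# acceptable for worklist prioritization because the radiologist still reads
# every scan, but a reminder that a handful of bleeds will not be flagged.
```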
Implementation Reality: When Accurate Stroke AI Still Fails
LVO detection algorithms achieve 90-95% sensitivity. So why do some hospitals fail to see outcome improvements?
The implementation failures:
Alert fatigue: If LVO alerts go to a generic pager that also receives lab results, medication alerts, and bed assignments, interventionalists may ignore them
Workflow fragmentation: Algorithm detects LVO at community hospital, but transfer process still requires:
- ED physician calling transfer center
- Transfer center calling receiving hospital
- Receiving hospital calling interventionalist
- Interventionalist reviewing images
- Each step adds 10-15 minutes
Lack of trust: Interventionalists who don't trust the algorithm may wait for the official radiology read, defeating the purpose of early AI detection
False positives in wrong population: Algorithms trained on stroke patients perform poorly when applied to all head CTs (many false positives from artifacts, old infarcts, masses)
Equity gaps unknown: Most LVO algorithms trained predominantly on white populations; performance in Black and Hispanic patients not well-studied (though preliminary data suggests equivalent performance)
A real implementation failure: One academic medical center deployed Viz.ai but routed alerts to the stroke coordinator's office desk phone rather than a mobile device. On nights and weekends, alerts went to voicemail. Six months after deployment, door-to-groin times were unchanged because no one saw the alerts in real time.
The lesson: Even the best AI is worthless without proper workflow integration.
Part 2: Neuroimaging AI Beyond Stroke
Brain Tumor Segmentation: Where AI Actually Saves Time
Radiation oncology treatment planning for brain tumors requires meticulous delineation of:
- Gross tumor volume (GTV)
- Clinical target volume (CTV)
- Organs at risk (optic nerves, brainstem, hippocampi)
Manual segmentation takes 6-8 hours per patient. Automated AI segmentation reduces this to 30-60 minutes with radiologist review.
Commercial and research systems:
- BrainLab Elements (FDA-cleared): Automated glioblastoma segmentation
- Quantib ND (FDA-cleared): Brain metastases detection and volumetry
- DeepMind (research only): World-class segmentation published in Nature but not commercially deployed
Performance:
- Dice coefficient: 0.85-0.90 (measures overlap between AI and expert segmentation)
- Time savings: 80-90% reduction in segmentation time
- Consistency: Reduces inter-observer variability in target volume delineation
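The Dice coefficient quoted above is simply the overlap between the AI contour and the expert contour, normalized by their combined size. A minimal sketch of the computation on binary segmentation masks (illustrative code, not any vendor's implementation):

```python
import numpy as np

def dice_coefficient(mask_ai: np.ndarray, mask_expert: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of the same shape."""
    a = mask_ai.astype(bool)
    b = mask_expert.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy example: two slightly offset square "tumor" masks on a 10x10 slice
ai = np.zeros((10, 10), dtype=bool)
ai[2:7, 2:7] = True
expert = np.zeros((10, 10), dtype=bool)
expert[3:8, 3:8] = True
print(f"Dice: {dice_coefficient(ai, expert):.2f}")  # 0.64 for this toy case
```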
Clinical use: Radiation oncologists review and edit AI-generated contours rather than manually tracing every structure. This is a genuine time-saver that improves workflow without compromising quality.
Limitations:
- Complex cases require extensive editing: Tumors with necrosis, hemorrhage, or prior surgery confuse algorithms
- Post-treatment changes: Pseudoprogression vs. true progression remains challenging for AI
- Rare tumor types: Algorithms trained on glioblastoma perform poorly on meningiomas, metastases, lymphomas
Multiple Sclerosis Lesion Detection: Promise and Pitfalls
MS diagnosis and monitoring rely on detecting white matter lesions on brain MRI. Lesion burden and new lesion formation guide treatment decisions.
AI applications:
- Automated lesion counting
- Lesion volume quantification
- Detection of new/enlarging lesions compared to prior scans
- Prediction of disease progression
Performance: Variable and scanner-dependent. Commowick et al. (2018) compared 60 MS lesion detection algorithms across 15 medical centers:
- Sensitivity: Ranged from 45% to 92% depending on algorithm and scanner
- False positives: Many algorithms flagged normal periventricular white matter as lesions
- Scanner dependence: Algorithms trained on Siemens MRI performed poorly on GE scanners
Why MS lesion AI is harder than stroke:
- Lesion heterogeneity: MS lesions vary in size (2mm to 3cm), location, and appearance
- Look-alikes: Small vessel ischemic disease, migraine, normal aging all produce white matter hyperintensities
- Scanner variability: MRI protocols differ across institutions; algorithms don't generalize well
Current clinical use: Research settings and pharmaceutical clinical trials (where standardized protocols and centralized reading reduce scanner variability). Not yet ready for routine clinical care.
Alzheimer’s Disease Neuroimaging: Prediction Without Treatment
AI can predict conversion from mild cognitive impairment (MCI) to Alzheimer's dementia with AUC 0.80-0.85 using:
- Hippocampal volume measurement
- Entorhinal cortex thickness
- Amyloid PET standardized uptake value ratios
- FDG-PET glucose metabolism patterns
The clinical problem: These predictions don’t change management. We lack disease-modifying treatments for Alzheimer’s. Knowing that a patient with MCI will progress to dementia in 3 years doesn’t help them. It just causes anxiety.
Ethical concerns:
- Prognostic disclosure: Should we tell patients they'll develop dementia when we can't prevent it?
- Insurance discrimination: Will Alzheimer's risk predictions affect life insurance and long-term care insurance?
- Clinical trial recruitment: This is the main current use, enriching trials with high-risk patients
Until we have effective disease-modifying therapies for Alzheimer's, these prognostic algorithms remain research tools, not clinical tools. The main legitimate use? Enriching clinical trials with high-risk patients.
Part 3: Seizure Detection and Epilepsy AI
Automated Seizure Detection from EEG
Long-term EEG monitoring generates 24-72 hours of continuous data per patient. Neurologists review this data looking for seizures, interictal epileptiform discharges, and background abnormalities.
The time burden:
- 1 hour of EEG recording = ~10 minutes of expert review
- 72-hour EEG = 12 hours of neurologist time
- Most EEG shows no seizures (neurologists search for rare events in vast normal data)
AI solution: Automated seizure detection algorithms analyze EEG continuously, flag suspected seizures, and present condensed summaries to neurologists for review.
Persyst Seizure Detection: FDA-cleared automated EEG analysis system.
Performance:
- Sensitivity for generalized tonic-clonic seizures: 92%
- Sensitivity for complex partial seizures: 76%
- False positive rate: 0.5-1.0 false detections per hour
- Time savings: Reduces neurologist review time by 60-70%
How neurologists use it:
1. Review AI-flagged events first (likely seizures)
2. Quickly scroll through unflagged periods looking for missed events
3. Total review time: 3-4 hours instead of 12 hours for a 72-hour EEG
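Those review-time figures follow from the workload the algorithm leaves behind. A minimal sketch, using the false-positive rate quoted above and an assumed patient with three true seizures over the recording (the seizure count is hypothetical):

```python
# Rough estimate of AI-assisted review workload for a 72-hour EEG, using the
# 0.5-1.0 false detections/hour quoted above and the ~10 min/hour figure for
# unaided expert review. The 3 true seizures are an assumed example.

hours = 72
fp_per_hour_low, fp_per_hour_high = 0.5, 1.0
true_seizures = 3

flagged_low = true_seizures + fp_per_hour_low * hours    # 39 events
flagged_high = true_seizures + fp_per_hour_high * hours  # 75 events
unaided_review_hours = hours * 10 / 60                   # 12 hours page-by-page

print(f"Events to adjudicate: {flagged_low:.0f}-{flagged_high:.0f}")
print(f"Unaided review: {unaided_review_hours:.0f} h; AI-assisted: ~3-4 h")
# The neurologist still adjudicates dozens of flagged events and spot-checks
# the unflagged record, which is why review time drops to hours, not minutes.
```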
Limitations:
- Misses subtle seizures: Focal seizures without clear rhythmic activity often missed
- False positives from artifacts: Chewing, movement, electrode problems cause false alarms
- ICU EEG challenging: Critically ill patients on sedation with frequent interventions generate artifacts
Persyst and similar FDA-cleared systems genuinely save neurologist time, reducing a 12-hour EEG review to 3-4 hours without replacing expert interpretation. This is a useful clinical tool that works.
Wearable Seizure Detectors: High Sensitivity, High False Positive Rate
Smartwatch-based seizure detectors (Empatica Embrace, Nightwatch) use accelerometry and autonomic signals (heart rate, skin conductance) to detect generalized tonic-clonic seizures.
Use case: Preventing sudden unexpected death in epilepsy (SUDEP) by alerting caregivers when seizure occurs, particularly during sleep.
Performance:
- Sensitivity for generalized tonic-clonic seizures: 90-95%
- False positive rate: 1-5 false alarms per month
- Focal seizures: Often undetected (no convulsive movements)
Clinical reality:
- High-risk epilepsy patients (frequent convulsive seizures, intellectual disability, living alone) benefit from alerts
- Low-risk patients (well-controlled focal epilepsy) find false alarms burdensome
- Not a replacement for supervision: Algorithms detect seizures but can't intervene
High-risk epilepsy patients benefit from wearable seizure detectors, particularly for SUDEP prevention during sleep. But the 1-5 false alarms per month mean this is an adjunct for high-risk patients, not a general screening tool for everyone with epilepsy.
Part 4: Neurodegenerative Disease AI
Parkinson’s Disease: Objective Motor Assessment
Parkinson’s disease diagnosis and monitoring rely on subjective clinical assessment of bradykinesia, rigidity, tremor, and gait. Symptom severity fluctuates throughout the day (medication on/off states).
AI applications:
- Smartphone tapping tests: Measure finger tapping speed and rhythm
- Smartwatch tremor detection: Accelerometry detects tremor frequency and amplitude
- Voice analysis: Detect hypophonia and monotone speech
- Gait analysis: Computer vision from smartphone video analyzes stride length and arm swing
Performance: Modest. Multiple smartphone-based PD motor assessments show only moderate correlation with MDS-UPDRS motor scores and limited ability to distinguish PD from other movement disorders:
- Correlation with MDS-UPDRS motor scores: Moderate (r values typically 0.5-0.7 across studies)
- Distinguishing PD from healthy controls: Reasonable performance in research settings
- Distinguishing PD from other movement disorders: Poor, limiting diagnostic utility
Why this doesn't work well yet:
- Bradykinesia is nuanced: Requires observing finger tapping, hand movements, and leg agility; smartphones capture only finger tapping
- Medication state confounds: Patients tested 1 hour post-dose look different than 4 hours post-dose
- Non-motor symptoms ignored: Cognitive impairment, autonomic dysfunction, psychiatric symptoms not measured
Current use: Research setting (clinical trials tracking motor progression). Not diagnostic.
ALS Progression Prediction: Research Tool, Not Clinical Tool
Amyotrophic lateral sclerosis (ALS) causes progressive motor neuron degeneration with highly variable progression rates. Some patients survive 10+ years; others decline within 12 months.
AI applications: ML models predict functional decline (ALSFRS-R score) using baseline demographics, genetics, respiratory function, and EMG data.
Performance: Kueffner et al. (2015) achieved an AUC of 0.70-0.75 for predicting rapid vs. slow progression.
Why this doesn't matter clinically:
- We lack disease-modifying treatments: Riluzole and edaravone provide minimal benefit; knowing the prognosis doesn't change management
- Clinical trial enrichment is the actual use case: selecting rapid progressors for trials so treatment effects can be detected faster
- Individual predictions are unreliable: an AUC of 0.75 at the population level still leaves wide uncertainty for any individual patient
Ethical concerns: Should we tell ALS patients they’ll likely die within 18 months when predictions are uncertain and we can’t prevent it? Most neurologists say no.
Part 5: The Psychiatric AI Disaster
Why Psychiatric AI Fails: The Fundamental Problem
Psychiatric diagnosis relies on:
- Patient self-report of symptoms (low reliability)
- Clinician assessment of behavior and affect (subjective, low inter-rater reliability)
- Absence of biomarkers (no blood test for depression, no scan for schizophrenia)
- Heterogeneous presentations (10 patients with depression may have 10 different symptom patterns)
This makes psychiatry fundamentally resistant to algorithmic approaches.
The Suicide Prediction Algorithm Failures
Vanderbilt University Suicide Risk Algorithm (2017-2020):
The promise: Predict suicide risk from EHR data (diagnoses, medications, ED visits, hospitalizations) and flag high-risk patients for intervention.
The reality:
- 5,000+ patients flagged as "high risk" over 3 years
- Actual suicides among flagged patients: 31
- Positive predictive value: 0.6% (99.4% false positives)
- Clinical response: Alert fatigue. Clinicians stopped responding to flags after being overwhelmed by false alarms
Why it failed:
- Base rate problem: Suicide is rare (even in high-risk populations, <1% attempt per year); any screening test yields massive false positives
- Unpredictability: Most suicides occur in people with no prior psychiatric contact; EHR-based algorithms can't detect them
- False reassurance: "Low risk" predictions gave clinicians false confidence, potentially missing suicidal patients who didn't fit the algorithmic profile
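The base rate problem is worth making concrete. A minimal sketch, assuming a hypothetical algorithm with 90% sensitivity and 90% specificity (far better than any published model) applied to a population with a 0.5% annual attempt rate (both numbers are illustrative assumptions):

```python
# Why even a "good" suicide risk model produces mostly false positives.
# The 90%/90% sensitivity/specificity are hypothetical and generous; the
# 0.5% annual attempt rate is an illustrative base rate, not a measured one.

population = 100_000
base_rate = 0.005        # 0.5% attempt per year
sensitivity = 0.90
specificity = 0.90

attempts = population * base_rate                             # 500
flagged_true = attempts * sensitivity                         # 450
flagged_false = (population - attempts) * (1 - specificity)   # 9,950
ppv = flagged_true / (flagged_true + flagged_false)

print(f"Flagged: {flagged_true + flagged_false:,.0f}; PPV: {ppv:.1%}")
# ~10,400 people flagged with a PPV around 4% -- and real-world EHR models
# perform far worse, which is how a deployment like Vanderbilt's ends up
# with a PPV of 0.6%.
```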
Facebook/Meta Suicide Prevention AI (2017-2022):
The promise: Detect suicidal content in posts and live videos; alert human reviewers to contact users in crisis.
The reality:
- Announced with great fanfare in 2017 as "AI saving lives"
- Quietly discontinued in 2022 after internal evaluations showed minimal benefit
- No published data on how many suicides were prevented
- Privacy advocates raised concerns about surveillance and consent
The lesson: Suicide is inherently unpredictable. Algorithms that promise to identify who will attempt suicide based on digital data create dangerous false confidence.
Depression Diagnosis from Digital Phenotyping: Privacy Nightmare
The concept: Passively monitor smartphone use (typing speed, app usage, GPS movement patterns, voice call frequency) to detect depression without patient self-report.
The problems:
1. Consent: Is continuous monitoring with periodic algorithm-generated diagnoses truly informed consent?
2. Privacy: Smartphone data reveals intimate details of life (where you go, who you talk to, what you search)
3. Accuracy: Correlation between "reduced movement" and depression doesn't mean the algorithm can diagnose depression (could be physical illness, weather, life circumstances)
4. Equity: Algorithms trained on white populations may interpret cultural differences in communication or movement as pathology
Current status: Research-stage only. Multiple academic studies, zero validated clinical applications.
Ethical consensus: Most bioethicists and psychiatrists agree: Digital phenotyping for psychiatric diagnosis raises profound ethical concerns that haven’t been resolved.
Part 6: Equity in Neurological AI
The Underappreciated Problem
Stroke, MS, Parkinson’s, and Alzheimer’s all show different prevalence, presentation, and prognosis across racial and ethnic groups:
- Stroke: Black Americans have 2x stroke incidence of white Americans, different stroke subtypes
- MS: More common in white populations; Black patients with MS have more aggressive disease
- Alzheimer’s: Higher prevalence in Black and Hispanic Americans, often diagnosed later
- Parkinson’s: Lower prevalence in Black populations, different symptom profiles
Yet most neurology AI algorithms are trained on: - Predominantly white populations - Tertiary academic medical centers - North American and European datasets
Consequences:
LVO detection algorithms:
- Preliminary data suggest equivalent performance across races
- But comprehensive equity studies have not been published for most commercial systems
- Ask vendors: "What is sensitivity/specificity stratified by race?"

MS lesion detection:
- Trained mostly on white Scandinavian and North American populations (where MS prevalence is highest)
- Performance in Black and Hispanic MS patients unknown

Alzheimer's neuroimaging:
- Hippocampal volume norms based on white populations
- Normative values may not transfer to Black Americans; algorithms may misclassify

What neurologists should do:
1. Ask vendors for race-stratified performance data
2. Validate algorithms locally on your patient population
3. Monitor for algorithmic errors by race/ethnicity
4. Don't assume "overall accuracy" applies to all patients
Part 7: Implementation Framework
Before Adopting Neurology AI
Questions to ask vendors:
- “Where is the peer-reviewed study showing this algorithm improves patient outcomes?”
- LVO detection has this evidence (Rodrigues et al. 2022, Devlin et al. 2022)
- Most other neurology AI doesn’t
- Demand JAMA Neurology, Lancet Neurology, Stroke, Neurology, Frontiers in Neurology publications
- “What is the algorithm’s performance in patients like mine?”
- Academic medical center algorithms may fail in community hospitals
- Pediatric algorithms don’t work in adults
- Request validation data from similar patient populations
- “What is the false positive rate, and how will we manage false alarms?”
- ICH detection: 5% false positive rate = 50 false alarms per 1,000 head CTs
- Who triages these? What’s the workflow?
- “How does this integrate with our PACS/EHR/radiology workflow?”
- Demand live demonstration in your specific environment
- Poor integration = alert fatigue = missed critical cases
- “What happens when the algorithm fails?”
- All algorithms miss some cases
- LVO algorithms miss 5-10% of occlusions
- You need to know which cases are likely to be missed (posterior circulation, tandem occlusions, distal occlusions)
- “Can we validate locally before full deployment?”
- Retrospective validation on 500-1,000 prior cases
- Compare algorithm performance to actual outcomes in your population
- “What are the equity implications?”
- Request race/ethnicity-stratified performance metrics
- If vendor doesn’t have this data, algorithm wasn’t validated equitably
- “Who is liable if the algorithm misses a critical finding?”
- Read the vendor contract carefully
- Most disclaim liability
- Physicians remain responsible for all interpretations
- “What is the cost, and what’s the evidence of cost-effectiveness?”
- LVO detection: ~$50,000-100,000/year for medium-sized hospital
- Cost-effectiveness shown for LVO/ICH detection
- Most other neurology AI lacks cost-effectiveness data
- “Can you provide references from neurologists who use this tool?”
- Talk to actual users
- Ask about false positives, workflow disruptions, whether they’d recommend it
Red Flags (Walk Away If You See These)
- Claims to “diagnose” psychiatric conditions from digital data (no validated systems exist)
- Suicide risk prediction without published PPV data (all existing models have PPV <5%)
- No external validation studies (validated only in development cohort)
- Vendor refuses to share peer-reviewed publications (“proprietary algorithm”)
- Black box psychiatric AI (explainability is essential for consent and ethics)
Part 8: Cost-Benefit Reality
What Does Neurology AI Cost?
Stroke AI (LVO/ICH detection):
- Viz.ai: ~$50,000-100,000/year depending on hospital size
- RapidAI: Similar pricing
- Aidoc ICH: ~$40,000-80,000/year
Brain tumor segmentation:
- BrainLab Elements: Bundled into treatment planning system (~$10,000-20,000/year)

EEG seizure detection:
- Persyst: ~$15,000-30,000/year

MS lesion detection (research only):
- Not commercially available for routine use

Parkinson's/ALS apps:
- Mostly research tools, not commercial products
Do These Tools Save Money?
LVO detection: YES
- Cost per stroke CT screened: ~$50-100
- Benefit: 43-minute reduction in door-to-groin time → better functional outcomes → less long-term disability cost
- Cost-effectiveness: Multiple studies show LVO AI saves money by reducing disability and rehabilitation costs

ICH detection: PROBABLY
- Cost per head CT screened: ~$30-50
- Benefit: Earlier neurosurgical evaluation → faster intervention for surgical candidates
- Cost-effectiveness: Not formally studied but likely cost-effective given the low per-case cost

Brain tumor segmentation: YES
- Saves 6-7 hours of radiation oncologist/dosimetrist time per case
- At $200/hour labor cost, that is $1,200-1,400 saved per case
- Algorithm cost: ~$20-40 per case
- Clear cost savings (see the sketch below)

Seizure detection: MAYBE
- Saves neurologist time (12 hours → 4 hours for 72-hour EEG review)
- But doesn't change patient outcomes
- Cost-effectiveness depends on neurologist salary and EEG volume

MS/Alzheimer's/Parkinson's AI: NO
- Research tools that don't change clinical management
- No cost savings or outcome benefits demonstrated
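The brain tumor segmentation arithmetic referenced above is easy to check. A minimal back-of-the-envelope sketch using the per-case figures quoted in this section; the annual case volume is an illustrative assumption, not data from any specific center:

```python
# Back-of-the-envelope ROI for AI brain tumor segmentation, using the
# per-case figures quoted above. The 200-case annual volume is an assumed
# example, not data from any specific department.

cases_per_year = 200          # hypothetical departmental volume
hours_saved_per_case = 6.5    # midpoint of the 6-7 hours quoted above
labor_rate = 200              # $/hour for oncologist/dosimetrist time
algorithm_cost_per_case = 30  # midpoint of the $20-40 quoted above

gross_savings = cases_per_year * hours_saved_per_case * labor_rate
algorithm_cost = cases_per_year * algorithm_cost_per_case
net_savings = gross_savings - algorithm_cost

print(f"Gross savings: ${gross_savings:,.0f}; "
      f"algorithm cost: ${algorithm_cost:,.0f}; net: ${net_savings:,.0f}")
# ~$260,000 in labor saved vs. ~$6,000 in per-case fees -- which is why
# segmentation is one of the few neurology AI tools with a clear ROI,
# provided the quoted time savings hold up in local practice.
```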
Part 9: The Future of Neurology AI
What’s Coming in the Next 5 Years
Likely to reach clinical use:
1. Expanded stroke AI: Posterior circulation stroke detection, stroke mimics identification
2. Automated EEG reporting: Summary reports for routine EEGs (not just seizure detection)
3. Neurosurgical planning AI: Tumor resection planning, deep brain stimulation targeting
4. Gait analysis from smartphone video: Parkinson's and ataxia monitoring
Promising but uncertain:
1. Alzheimer's blood biomarkers + AI: Combining plasma p-tau217 with MRI and cognitive testing for early diagnosis
2. Seizure forecasting: Predicting when a seizure will occur hours in advance (still research stage)
3. Precision psychiatry: Matching patients to antidepressants based on genetics + symptoms (early trials)
Overhyped and unlikely:
1. Autonomous psychiatric diagnosis from digital phenotyping
2. Suicide prediction from social media
3. AI replacing the neurological examination
The rate-limiting factor is not algorithmic accuracy; it is the lack of prospective RCTs showing improved patient outcomes and of ethical frameworks for psychiatric AI.
Professional Society Guidelines on AI in Neurology
The American Academy of Neurology has endorsed the “AMA Principles for Augmented Intelligence Development, Deployment, and Use” (2023), establishing foundational guidance for AI in neurological practice.
AAN Annual Meeting AI Sessions (2024-2025):
At the 2024 AAN Annual Meeting, the session “Artificial Intelligence (AI) and the Neurologist: New Horizons” covered:
- Types of AI and machine learning relevant to neurology
- Present and potential clinical applications
- Benefits and challenges AI creates for neurologists
At AAN 2025, researchers discussed:
- AI-driven behavioral analysis for Alzheimer's disease progression modeling
- Machine learning for identifying patients at high risk for hematoma expansion
AI Applications Addressed by Neurology Societies
Stroke:
- LVO detection algorithms (Viz.ai, RapidAI) integrated into stroke systems of care
- ASPECTS scoring automation
- Perfusion imaging analysis

Epilepsy:
- Automated EEG seizure detection
- Long-term monitoring pattern recognition
- Seizure prediction research

Neurodegenerative Disease:
- Imaging biomarkers for Alzheimer's disease
- Parkinson's disease tremor analysis
- ALS progression modeling
American Clinical Neurophysiology Society (ACNS)
ACNS provides guidance on AI in neurophysiology:
- Standards for automated EEG interpretation
- Validation requirements for seizure detection algorithms
- Integration with continuous EEG monitoring systems
Implementation Note: ACNS emphasizes that automated EEG analysis should augment, not replace, neurophysiologist interpretation, particularly for critical findings.
International League Against Epilepsy (ILAE)
ILAE has addressed AI applications in epilepsy care:
- Wearable device seizure detection validation
- AI-assisted presurgical evaluation
- Automated seizure diary analysis
Key Recommendation: AI-detected seizures from wearable devices require correlation with clinical events and should not independently drive medication changes without physician review.
Endorsed Decision Support Tools
Validated tools integrated into neurological practice include:
- NIH Stroke Scale: Standardized severity assessment
- ASPECTS: CT-based stroke scoring
- ABCD2: TIA stroke risk stratification
These represent the foundation for algorithmic decision support in neurology, with AI-enhanced versions under development.
Key Takeaways
10 Principles for Neurology AI
Stroke LVO/ICH detection saves lives: These are among medicine’s clearest AI success stories; embrace them
Time-sensitive imaging AI works; diagnostic AI doesn’t: Algorithms excel at detecting what exists, struggle with what it means
Psychiatric AI has failed repeatedly: Suicide prediction, depression diagnosis from digital data. Don’t deploy these clinically
Prognostic algorithms need disease-modifying treatments: Predicting Alzheimer’s or ALS progression doesn’t help without effective therapies
Seizure detection saves neurologist time: Automated EEG review is useful adjunct, not replacement
Equity data is missing: Most algorithms lack race-stratified performance metrics; validate locally
Integration determines success: Even accurate LVO detection fails with poor workflow integration
False positives cause alert fatigue: 5% false positive rate sounds low until you get 50 false alarms per 1,000 scans
Demand RCT evidence: Technical accuracy ≠ improved patient outcomes
Neurological exam remains irreplaceable: AI assists with imaging and data interpretation but can’t replace clinical judgment
Clinical Scenario: Evaluating a Neurology AI Tool
Scenario: Your Neurology Department Is Considering MS Lesion Detection AI
The pitch: A vendor demonstrates AI that detects and quantifies MS lesions on brain MRI. They show you:
- Sensitivity 94% for lesion detection
- "Reduces radiologist reading time by 50%"
- Automatic comparison to prior scans to detect new lesions
- Cost: $75,000/year
The department chair asks for your recommendation.
Questions to Ask:
- “What peer-reviewed publications validate this algorithm?”
- Look for JAMA Neurology, Multiple Sclerosis Journal, Neurology publications
- Commowick et al. (2018) study showed MS algorithms are scanner-dependent
- “How does this algorithm perform on our specific MRI scanner?”
- Most MS AI trained on Siemens scanners
- Performance often degrades on GE or Philips scanners
- Request validation data on your exact scanner model and protocol
- “What is the false positive rate for lesion detection?”
- Small vessel ischemic disease, migraine, normal aging create white matter hyperintensities
- How does algorithm distinguish MS lesions from mimics?
- “Does this algorithm change clinical management?”
- MS diagnosis still requires McDonald criteria (clinical + imaging + CSF/evoked potentials)
- Lesion burden doesn’t directly guide treatment decisions in most cases
- If algorithm doesn’t change what you do, why pay $75,000/year?
- “How will this integrate with our radiology workflow?”
- Does it require manual upload of prior scans?
- Does it work with our PACS?
- Who reviews the AI output, radiologist or neurologist?
- “What happens with atypical presentations?”
- Tumefactive MS (large lesions mimicking tumors)
- Posterior fossa lesions (often missed by algorithms)
- Infratentorial disease
- “Can we pilot on 100 MS patients before committing to $75,000/year?”
- Compare algorithm lesion counts to expert neuroradiologist
- Measure actual time savings
- Assess false positive burden
Red Flags in This Scenario:
“Reduces reading time by 50%” without time-motion study data: Unverified claim
No external validation studies: If algorithm was validated only at vendor’s institution, performance at your hospital uncertain
Sensitivity 94% without specificity data: Useless. High sensitivity with low specificity = many false positives
Scanner-agnostic claims: MS lesion AI is notoriously scanner-dependent; claims of universal performance are suspicious
No discussion of clinical utility: Detecting lesions doesn’t equal improving patient outcomes
Check Your Understanding
Scenario 1: The LVO Alert at 2 AM
Clinical situation: You’re the interventional neuroradiologist on call. At 2:15 AM, you receive a Viz.ai mobile alert: 68-year-old man, right-sided weakness, NIHSS 14, CT angiography shows left M1 occlusion. Patient is at community hospital 30 minutes away by helicopter.
You open the Viz.ai app and review the images. The M1 occlusion is clear. But you also notice the patient had a large left MCA territory stroke 2 years ago (visible on non-contrast CT as encephalomalacia).
Question 1: Do you mobilize the cath lab team for thrombectomy?
Answer: Yes, but with additional information gathering.
Reasoning:
- Acute M1 occlusion is a thrombectomy indication regardless of prior stroke history
- Prior stroke in the same territory doesn't automatically exclude thrombectomy, but does require considering:
  - What is baseline disability? (If mRS 5 before this stroke, thrombectomy less likely to benefit)
  - What is the new deficit vs. baseline? (Need to talk to patient's family or primary physician)
  - What is the stroke onset time? (Still within the window?)
What to do:
1. Accept the transfer (don't delay getting the patient to a comprehensive stroke center)
2. Call the community hospital ED while the patient is en route:
   - What was baseline functional status?
   - What is the time last known normal?
   - Any contraindications to thrombectomy (anticoagulation, recent surgery)?
3. Review imaging carefully when the patient arrives:
   - Confirm acute occlusion (not chronic occlusion from prior stroke)
   - Assess collateral status
   - Check ASPECTS score for ischemic core
4. Make the final decision based on:
   - Baseline function (if mRS 0-2, proceed)
   - Deficit is new and severe (NIHSS 14 indicates major deficit)
   - Time window (if <6 hours from onset, proceed; if 6-24 hours, check perfusion imaging)
Bottom line: LVO alerts are not autonomous treatment decisions. They accelerate evaluation and transfer, but clinical judgment remains essential.
Prior stroke in same territory is NOT an absolute contraindication. Many patients with prior stroke in one territory can have new stroke in another branch and benefit from thrombectomy.
The LVO algorithm did its job: Detected acute occlusion and got patient to you faster. Your job is clinical decision-making.
Scenario 2: The Suicide Risk Algorithm
Clinical situation: Your hospital deployed a suicide risk prediction algorithm that analyzes EHR data. You’re seeing a 32-year-old woman in primary care for diabetes follow-up. She mentions feeling “a bit down lately” and having trouble sleeping.
You check the EHR: The suicide risk algorithm has flagged her as “LOW RISK” (10th percentile).
Question 2: Do you skip the detailed depression and suicidality screening because the algorithm says low risk?
Answer: Absolutely not. The algorithm is irrelevant to your clinical assessment.
Reasoning:
Why suicide risk algorithms fail:
1. Base rate problem: Even "high risk" predictions have PPV <5% (i.e., >95% false positives)
2. "Low risk" is dangerous false reassurance: Most suicides occur in people predicted to be low risk
3. Algorithms can't detect acute stressors: EHR data from last week doesn't capture what happened this morning (job loss, relationship breakup, eviction notice)
4. Clinical presentation matters: A patient saying "I'm feeling down" is a red flag requiring exploration, regardless of algorithmic score
What you should do:
1. Ignore the algorithm completely
2. Perform standard depression screening:
   - PHQ-9 questionnaire
   - Direct questions about suicidal ideation: "Have you thought about hurting yourself?"
   - Risk factors: Prior attempts, family history, access to means, substance use
3. Don't document reliance on the algorithm: If the patient later completes suicide, an EHR note saying "I didn't ask about suicide because the algorithm said low risk" is medicolegally indefensible
4. Advocate for removing the algorithm: Suicide risk algorithms create false confidence and should not be clinically deployed
Why this matters:
- The patient is telling you she's depressed ("feeling down, trouble sleeping")
- Believe the patient, not the algorithm
- Standard of care requires assessing suicidality when a patient reports depressive symptoms
- No algorithm absolves you of clinical responsibility
Bottom line: Suicide risk algorithms are worse than useless. They’re dangerous. They create false reassurance that discourages proper clinical assessment.
If your hospital has deployed one, ignore it and advocate loudly for its removal.
Scenario 3: The MS Lesion Count Discrepancy
Clinical situation: You’re evaluating a 29-year-old woman with optic neuritis and a single brain MRI white matter lesion. McDonald criteria require ≥2 lesions for MS diagnosis. The radiologist’s report says “1 lesion in left periventricular white matter.”
However, the MS lesion detection AI (which your hospital recently deployed) flags 4 lesions total: the one the radiologist saw, plus 3 additional small (3-4mm) lesions in different locations.
Question 3: Do you diagnose MS based on the AI lesion count of 4?
Answer: No. Review the MRI yourself with a neuroradiologist before making MS diagnosis.
Reasoning:
Why lesion count discrepancies happen:
1. AI false positives: Small vessel ischemia, perivascular spaces, and artifacts are often flagged as MS lesions
2. Size threshold differences: Radiologists may not report 2-3mm lesions; AI flags everything
3. Lesion location matters: McDonald criteria require lesions in specific locations (periventricular, cortical, infratentorial, spinal cord); AI may count lesions in non-diagnostic locations
What you should do:
- Review the MRI images yourself:
- Look at the 3 additional lesions the AI detected
- Are they in McDonald criteria locations?
- Do they look like MS lesions (ovoid, perpendicular to ventricles, Dawson fingers)?
- Or do they look like small vessel ischemic disease, perivascular spaces, artifacts?
- Get neuroradiology over-read:
- “AI detected 4 lesions; report lists 1. Can you review and clarify?”
- Experienced neuroradiologist can distinguish MS plaques from mimics
- Consider additional testing:
- Spinal cord MRI (may show additional lesions supporting MS diagnosis)
- CSF analysis for oligoclonal bands
- Evoked potentials
- Wait for second clinical event (if not urgent to start treatment)
- Don’t diagnose MS based solely on AI lesion count:
- MS diagnosis has major implications (lifelong immunosuppression, insurance, disability, prognosis)
- Requires high confidence, not algorithmic suggestion
Why this matters:
- False positive MS diagnosis causes harm: Unnecessary treatment with expensive immunosuppressants that have side effects
- McDonald criteria exist for a reason: They require clinical + imaging evidence to prevent overdiagnosis
- AI lesion detection is a tool, not a diagnosis: It flags possible lesions for expert review, not autonomous diagnostic conclusions
A real scenario: A patient was diagnosed with MS based on AI lesion counts, started on natalizumab (PML risk), then re-reviewed by MS specialist who determined lesions were migraine-related white matter changes. Patient stopped unnecessary immunosuppression but had spent 6 months on risky medication.
Bottom line: Use MS lesion AI to help detect possible lesions, but never to diagnose MS without expert confirmation.
When AI and human expert disagree, trust the human expert (assuming they’ve reviewed the AI findings).