Cardiology and Cardiothoracic Surgery
Cardiology has used AI longer than most physicians realize. Automated ECG interpretation algorithms have analyzed billions of heartbeats since the 1990s. Today’s AI can detect hidden patterns in normal-appearing ECGs: low ejection fraction, hyperkalemia, even biological age. But alongside these validated tools, unproven heart failure prediction models generate 80% false positives, and smartwatch AFib detection creates clinical dilemmas that evidence-based guidelines don’t address. This chapter separates what actually works from what doesn’t in cardiovascular AI.
After reading this chapter, you will be able to:
- Evaluate AI systems for ECG interpretation and arrhythmia detection, including FDA-cleared algorithms and emerging applications
- Critically assess AI applications in echocardiography, cardiac MRI, and coronary CT angiography
- Understand heart failure prediction models and their clinical limitations, including false positive rates
- Analyze wearable device AI for atrial fibrillation detection and cardiovascular monitoring
- Recognize major failures in cardiovascular AI, including IBM Watson’s cardiology applications
- Apply evidence-based frameworks for evaluating cardiology AI tools before clinical adoption
- Navigate medico-legal implications of AI-assisted cardiovascular decision-making
Introduction: Cardiovascular AI’s Promise and Pitfalls
Cardiology generates more structured data than perhaps any other medical specialty. Every heartbeat produces electrical signals. Every cardiac cycle can be imaged with ultrasound, MRI, or CT. Decades of epidemiologic studies have linked cardiovascular biomarkers to outcomes in millions of patients.
This data richness makes cardiology theoretically ideal for AI applications. And indeed, AI has been used in cardiology longer than most physicians realize. Automated ECG interpretation algorithms have been FDA-cleared since the 1990s, analyzing billions of ECGs over three decades.
But the history of cardiovascular AI includes spectacular failures alongside well-validated successes. IBM Watson’s cardiology applications produced unsafe recommendations. Proprietary heart failure prediction models achieve impressive AUC scores while generating 80% false positives. Smartwatch AFib detection creates clinical management dilemmas that evidence-based guidelines don’t address.
This chapter examines what actually works, what has failed, and how to evaluate cardiovascular AI tools critically before clinical adoption.
Part 1: ECG Interpretation AI
The 30-Year History You Didn’t Know About
If you’ve ordered an ECG in the past two decades, AI has already interpreted it. The automated interpretation printed at the top of every ECG (“Sinus rhythm,” “Acute anterior STEMI,” “Left ventricular hypertrophy”) comes from algorithms developed in the 1980s-1990s and refined over millions of ECGs.
These aren’t new AI tools. They’re three-decade-old expert systems and pattern recognition algorithms that have become so ubiquitous that we forget they’re algorithmic at all.
Performance of traditional ECG algorithms:
- STEMI detection: sensitivity 80-90%, specificity 95-98% (Willems et al., 2009)
- Atrial fibrillation: sensitivity >95%, specificity >98%
- Left ventricular hypertrophy: sensitivity 60-70%, specificity 85-95%
These algorithms work. They’re validated. Guidelines from the American College of Cardiology support their use. They’ve become standard of care.
But they have limitations:
- Sensitivity-specificity tradeoffs: STEMI algorithms optimized for sensitivity (to avoid missing MIs) produce false positives that experienced clinicians routinely override
- Population-specific performance: algorithms trained on predominantly white populations show worse performance for LVH detection in Black patients (Sokolow-Lyon criteria)
- Over-reading and under-reading: automated interpretations sometimes flag “abnormal ECG” for clinically insignificant findings while missing subtle ST changes that experienced cardiologists detect
The clinical lesson: Even 30-year-old, well-validated ECG AI requires physician review. Blindly accepting automated interpretations causes errors.
Implementation Reality: Why Accurate Algorithms Still Miss STEMIs
ECG algorithms achieve >90% sensitivity for STEMI detection. So why do hospitals still miss STEMIs?
The implementation failures:
1. Alert fatigue: false positive STEMI alerts (especially in patients with old infarcts, LBBB, or LVH) cause providers to ignore or delay response to true positives
2. EHR integration problems: STEMI alerts buried in EHR notifications alongside medication warnings and “patient census updated” messages
3. Workflow design failures: ECG interpretations printed on paper that doesn’t trigger emergency response protocols
4. Over-reliance on AI: providers skip careful ECG review because “the computer would have caught it”

A 2018 study of STEMI detection in 12 U.S. hospitals found (Khera et al., 2018):
- 23% of STEMIs were missed initially despite the ECG algorithm correctly identifying them
- Median delay to the catheterization lab was 47 minutes longer for algorithm-detected but clinician-missed STEMIs
- Cause: clinicians didn’t see or act on algorithm alerts due to poor integration
The lesson: Implementation > accuracy. A 95% accurate algorithm is worthless if clinicians don’t see or trust its alerts.
Part 2: Cardiac Imaging AI
Echocardiography: Where AI Actually Helps
Echocardiography is operator-dependent, time-consuming, and plagued by inter-observer variability. EF measurements by different sonographers on the same patient can vary by ±15%.
AI-assisted echocardiography addresses these problems:
FDA-cleared automated echo analysis systems:
- Caption Health (acquired by GE): autonomous EF calculation from parasternal and apical views, FDA-cleared in 2020
- Ultromics (UK): automated strain analysis for coronary artery disease detection
- Bay Labs Echo IQ: autonomous EF measurement and view optimization

Performance:
- Inter-observer variability for EF measurement reduced from ±15% to ±5% (Omar et al., 2023)
- Time savings: 5-10 minutes per study for standard views and measurements
- Accuracy: correlation with expert cardiologist readings r = 0.93-0.96

What AI does well in echo:
1. Automated endocardial border detection: traces the LV cavity more consistently than manual tracing
2. View optimization: guides the sonographer to acquire standard views correctly
3. Quantitative measurements: chamber volumes, wall thickness, and valve areas measured more reproducibly than with manual calipers
4. Strain analysis: automated global longitudinal strain calculation (time-consuming when done manually)

What AI doesn’t do well yet:
- Complex valve pathology: AI struggles with multiple jets, eccentric regurgitation, and prosthetic valves
- Technically difficult studies: poor acoustic windows, obesity, COPD (AI can’t compensate for fundamentally inadequate images)
- Novel findings: AI detects what it was trained to detect; it won’t identify rare pathology
Current clinical use: Growing adoption in community hospitals and primary care clinics, where access to expert echo readers is limited. Academic medical centers use AI for efficiency (automated measurements) but rely on cardiologist over-reads for complex cases.
Equity concerns: Most echo AI systems trained predominantly on white populations. Performance in Black patients (who have higher rates of hypertensive heart disease with different remodeling patterns) not well-studied.
Cardiac MRI and CT: Technical Excellence, Clinical Validation Pending
AI for cardiac MRI and CT shows impressive technical performance but lacks the decades of clinical validation that ECG algorithms have.
Applications:
- Automated segmentation: LV/RV/atrial volume calculation from cine MRI
- Perfusion defect detection: stress MRI ischemia analysis
- Coronary CT angiography analysis: stenosis grading, plaque characterization, FFR-CT
- Calcium scoring: automated Agatston score calculation

Performance: technical accuracy rivals expert readers (correlation r = 0.90-0.95), but several problems remain:
1. No outcome studies: do these algorithms improve patient outcomes? Unknown.
2. Vendor lock-in: most algorithms are proprietary, embedded in scanner software, and can’t be independently validated
3. Overdiagnosis risk: highly sensitive algorithms may detect “abnormalities” of uncertain clinical significance
4. Cost: CT-FFR costs $1,500-2,000 per study; its clinical benefit over standard CCTA is uncertain
Clinical bottom line: Use AI-assisted cardiac MRI/CT for efficiency (automated measurements save radiologist time), but don’t change clinical management based on AI findings without expert review.
Part 3: Heart Failure Prediction and the 80% False Positive Problem
Heart failure readmission prediction is a classic AI overpromise story.
The pitch: “Our proprietary machine learning algorithm predicts 30-day HF readmission with AUC 0.85! Identify high-risk patients for intensive case management!”
The reality: An AUC of 0.85 sounds impressive. But at 20% HF readmission prevalence, achieving clinically useful sensitivity (e.g., 80% to catch most readmissions) means that roughly 75-80% of flagged patients will be false positives.
The math:
- Population: 1,000 HF discharges
- Actual readmissions: 200 (20% rate)
- Algorithm at 80% sensitivity: detects 160/200 true positives
- But it also flags 600 false positives (from the 800 patients who won’t be readmitted)
- Result: 760 patients flagged, of whom 160 (21%) actually readmit

Why this matters: Intensive case management costs $500-1,000 per patient. Applying it to all 760 flagged patients, only 160 of whom will actually readmit, costs $380,000-760,000 per year. Many of those readmissions are unpreventable (sudden cardiac death, acute MI, etc.).
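A minimal Python sketch of this arithmetic, using only the illustrative numbers above (the counts and cost range are the chapter’s worked example, not any vendor’s figures):

```python
# Back-of-envelope check of the HF readmission example above.
discharges = 1_000
prevalence = 0.20            # 30-day HF readmission rate
sensitivity = 0.80           # fraction of true readmissions the model flags
false_positives = 600        # flagged patients who will not readmit (per the text)

true_readmissions = int(discharges * prevalence)        # 200
true_positives = int(true_readmissions * sensitivity)   # 160
flagged = true_positives + false_positives              # 760
ppv = true_positives / flagged                          # ~0.21

low, high = 500, 1_000       # intensive case management cost per patient ($)
print(f"Flagged: {flagged}; will actually readmit: {true_positives} (PPV {ppv:.0%})")
print(f"Program cost: ${flagged * low:,} to ${flagged * high:,} per year")
```

Swapping in your own discharge volume, readmission rate, and the vendor’s claimed operating point makes it easy to see whether the flagged list is actionable before any contract is signed.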
What are these algorithms actually detecting? Angraal et al. (2020) analyzed HF readmission prediction models and found that most achieve their AUC through demographic and comorbidity proxies: age, CKD, COPD, and prior admissions.
In other words, the algorithm isn’t discovering novel insights. It’s learning that 85-year-old patients with CKD stage 4, COPD, and three prior HF admissions are high-risk. You didn’t need machine learning to know that.
Are any HF prediction models clinically useful? CardioMEMS (an implantable pulmonary artery pressure sensor) reduced HF hospitalizations by 37% in a randomized trial (Abraham et al., 2016). But this is a device that enables early intervention based on hemodynamic data, not a prediction algorithm based on EHR data.
Clinical bottom line: Be skeptical of HF readmission prediction algorithms. Ask:
1. “What is the false positive rate at the sensitivity threshold you recommend?”
2. “What interventions will we apply to algorithm-flagged patients, and what’s the evidence those interventions prevent readmissions?”
3. “How does this algorithm perform in our specific patient population?” (Most are validated only in their development cohort.)
Part 4: Wearable Device AI and the Asymptomatic AFib Dilemma
Apple Heart Study: 419,297 Participants, 84% PPV, Massive Clinical Uncertainty
The Apple Heart Study (Perez et al., 2019) was the largest prospective study of wearable AFib detection.
Study design:
- 419,297 participants wore an Apple Watch with photoplethysmography (PPG)-based irregular pulse detection
- When the algorithm detected an irregular pulse, the participant received an ECG patch to confirm AFib
- Primary outcome: PPV of the algorithm (what percentage of alerts represented true AFib)
Results:
- 2,161 participants (0.52%) received irregular pulse notifications
- 450 of them returned ECG patches
- 153 of those patches (34%) showed AFib
- The PPV of an individual irregular-pulse notification, judged against simultaneous ECG patch recordings, was 84% (better than expected for a screening test)

But here’s the clinical problem:
- Even among the 0.52% who were notified, only about a third of those who wore a confirmatory patch had AFib documented
- Most were asymptomatic
- Most had paroxysmal AFib (brief episodes)
- Clinical question: should asymptomatic paroxysmal AFib detected by a smartwatch be treated with anticoagulation?
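To keep the denominators straight, here is a small sketch of the study funnel using the figures quoted above; the final percentage is a lower bound, since most notified participants never returned a patch:

```python
# Apple Heart Study funnel, using the figures quoted above.
enrolled = 419_297
notified = 2_161          # irregular-pulse notifications
patches_returned = 450
afib_on_patch = 153

print(f"Notified: {notified / enrolled:.2%} of participants")               # ~0.52%
print(f"AFib on returned patches: {afib_on_patch / patches_returned:.0%}")  # ~34%
# Lower bound only: most notified participants never returned a patch.
print(f"Patch-confirmed AFib: {afib_on_patch / enrolled:.3%} of all participants")
```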
CHA2DS2-VASc doesn’t answer this: the score was developed for AFib detected clinically or on ECG, not for asymptomatic device-detected episodes. Stroke risk for smartwatch-detected paroxysmal AFib is uncertain.
Ongoing trials:
- HEARTLINE: Apple Watch AFib detection for stroke prevention (results pending)
- GUARD-AF: impact of early detection on outcomes
Current clinical management: No consensus. Some cardiologists anticoagulate all AFib regardless of how detected. Others require symptoms or prolonged episodes (>24 hours). Many patients end up in a clinical gray zone.
The lesson: Technology often runs ahead of evidence. We can detect things we don’t know how to manage.
Part 5: The IBM Watson Cardiology Disaster
What Went Wrong With Watson for Oncology (and Its Cardiology Applications)
IBM Watson for Oncology promised “AI-powered treatment recommendations” based on analysis of medical literature and clinical guidelines. It was deployed in oncology, but IBM also developed Watson applications for cardiology and other specialties.
What happened:
- Watson produced unsafe treatment recommendations contradicting evidence-based guidelines (Ross and Swetlitz, 2018)
- It recommended chemotherapy for patients unlikely to benefit
- It suggested medications with dangerous drug interactions
- It was never validated in randomized controlled trials
- No peer-reviewed evidence of clinical benefit was ever published
Why did hospitals buy it? Aggressive marketing, partnerships with major academic centers (Memorial Sloan Kettering), and promises of “AI-assisted decision-making” that sounded impressive to hospital executives.
Why did it fail?
1. Training data problem: Watson was trained on expert preferences (what MSK oncologists recommended), not evidence (what RCTs showed worked)
2. No clinical validation: deployed without prospective trials showing benefit
3. Black box: physicians couldn’t understand why Watson made its recommendations, eroding trust
4. Overpromising: marketed as “thinking like a doctor” when it was really echoing MSK treatment patterns

Watson cardiology applications: IBM developed Watson-based tools for:
- HF treatment optimization
- Cardiovascular risk prediction
- Medication management in complex cardiac patients
None were validated in prospective trials. All were withdrawn by 2019 after oncology failures.
The lessons for cardiovascular AI:
1. Demand RCT evidence: if a vendor can’t show published outcome studies, don’t deploy their tool
2. Beware proprietary algorithms: black boxes hide methodological flaws
3. Marketing ≠ evidence: partnerships with prestigious institutions don’t prove clinical benefit
4. Physician judgment remains essential: no AI should make autonomous treatment recommendations
Part 6: Equity in Cardiovascular AI
The Pulse Oximetry Problem Extends to ECG and Echo
In 2020, researchers discovered that pulse oximeters systematically overestimate oxygen saturation in Black patients, leading to delayed treatment for hypoxemia (Sjoding et al., 2020).
Similar equity problems exist in cardiovascular AI:
ECG algorithms:
- Sokolow-Lyon criteria for LVH (embedded in automated ECG algorithms) have lower sensitivity in Black patients
- QTc prolongation thresholds don’t account for race-specific differences in baseline QT intervals
- Most ECG AI is trained on predominantly white populations from tertiary care centers

Echo AI:
- Black patients have different LV remodeling patterns (more concentric hypertrophy vs. eccentric dilatation)
- AI trained on predominantly white populations may miscategorize Black patients’ LV geometry
- Race-specific performance metrics are rarely reported in FDA submissions

Smartwatch AFib detection:
- PPG accuracy varies with skin tone (melanin affects light absorption)
- In the Apple Heart Study, 88% of participants were white; performance in other populations is uncertain

What can you do?
1. Ask vendors for race-stratified performance metrics: if they can’t provide them, the algorithm wasn’t validated equitably
2. Validate locally: test algorithm performance in your patient population before widespread deployment
3. Monitor outcomes by race: track algorithm errors, false positives, and false negatives by race/ethnicity
4. Maintain clinical skepticism: AI is a tool, not truth. Your clinical judgment remains essential.
Part 7: Implementation Framework
Before Adopting Cardiovascular AI Tools
Questions to ask vendors:
- “Where is the peer-reviewed publication showing this algorithm improves patient outcomes?”
- If the answer is “we have internal validation data,” that’s insufficient
- Demand New England Journal of Medicine, JAMA, Circulation, Lancet publications
- “What is the algorithm’s performance stratified by race, age, and sex?”
- If vendor doesn’t have this data, the algorithm wasn’t validated equitably
- Don’t accept overall performance metrics
- “What is the false positive rate at your recommended sensitivity threshold?”
- AUC alone is meaningless
- Need to understand PPV/NPV at clinically relevant operating points
- “How does the algorithm integrate with our EHR and clinical workflow?”
- Poor integration causes alert fatigue and missed diagnoses
- Demand live demonstrations in your specific EHR environment
- “What happens when the algorithm fails? What are the failure modes?”
- All algorithms fail sometimes
- You need to understand when and how to recognize failures
- “Can we validate this algorithm on our patient population before deployment?”
- Local validation is essential
- Algorithm trained at Mayo Clinic may not work at your community hospital
- “What is the cost per analysis, and what’s the evidence of cost-effectiveness?”
- CT-FFR costs $1,500-2,000 per study
- Has it been shown to improve outcomes or reduce costs compared to standard care?
- “Who is liable if the algorithm produces an incorrect result that harms a patient?”
- Most vendor contracts disclaim liability
- Liability falls on the ordering physician
- “Can you provide references from cardiologists at hospitals similar to ours who use this tool?”
- Talk to actual users, not marketing testimonials
- Ask about problems, workflow disruptions, false positives
- “Is the algorithm FDA-cleared? If yes, through what pathway?”
- 510(k) clearance ≠ clinical validation
- 510(k) requires only “substantial equivalence” to existing device
Red Flags (Walk Away If You See These)
- Vendor refuses to share peer-reviewed publications (“Our algorithm is proprietary”)
- No external validation studies (validated only on development cohort)
- Performance metrics not stratified by demographics (equity not assessed)
- Black box with no explainability (can’t understand why algorithm made recommendation)
- Vendor claims algorithm is “better than cardiologists” without RCT evidence
Part 8: Cost-Benefit Reality
What Does Cardiovascular AI Actually Cost?
ECG AI:
- Most automated ECG interpretation: included in the ECG machine purchase (no marginal cost)
- Mayo Clinic AI-ECG for low EF screening: not yet commercially available (research tool)

Echo AI:
- Caption Health: ~$1,000/month subscription plus per-study fees
- Ultromics: ~$50-100 per study
- Value proposition: saves sonographer/cardiologist time, reduces variability

Cardiac MRI/CT AI:
- CT-FFR (HeartFlow): $1,500-2,000 per study
- Automated MRI segmentation: bundled into scanner software

Wearable AFib detection:
- Apple Watch: $400-800 (consumer device, not a medical expense)
- ECG patch for confirmation: $150-300

HF prediction algorithms:
- Mostly proprietary, bundled into population health contracts
- Cost-effectiveness difficult to assess without published studies
Do These Tools Save Money?
Theoretically, yes:
- Earlier detection of low EF → initiate GDMT → prevent HF progression → fewer hospitalizations
- Automated echo measurements → save cardiologist time → increase throughput

In practice, uncertain:
- No published cost-effectiveness analyses exist for most cardiovascular AI tools
- Mayo Clinic AI-ECG: no data on whether earlier EF detection reduces downstream costs
- CT-FFR: cost-effectiveness vs. invasive FFR has been shown, but vs. standard care it remains unclear (Hlatky et al., 2015)

The implementation cost no one talks about:
- IT integration: $50,000-200,000 depending on complexity
- Workflow redesign: cardiologist/administrator time
- Training: sonographer/tech/physician education
- Maintenance: software updates, troubleshooting
- Alert management: triaging false positives

A realistic scenario:
- A hospital purchases echo AI for $50,000/year
- It saves 10 minutes per study × 5,000 studies/year = 833 hours
- At a $200/hour cardiologist cost, that is roughly $166,600 saved
- Assuming automation doesn’t reduce quality or increase errors
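A minimal sketch of that break-even arithmetic; the license cost, time savings, and hourly rate are the scenario’s assumptions, not real pricing:

```python
# Break-even sketch for the hypothetical echo-AI purchase described above.
annual_license = 50_000            # $/year (scenario assumption)
minutes_saved_per_study = 10
studies_per_year = 5_000
cardiologist_cost_per_hour = 200   # $/hour (scenario assumption)

hours_saved = minutes_saved_per_study * studies_per_year / 60
gross_savings = hours_saved * cardiologist_cost_per_hour
net_savings = gross_savings - annual_license

print(f"Hours saved: {hours_saved:.0f}")
print(f"Gross savings: ${gross_savings:,.0f}; net of license: ${net_savings:,.0f}")
# Assumes the freed time is redeployed productively and that automation
# does not reduce quality, add rework, or increase error rates.
```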
The question: Are those assumptions valid? We don’t have good data.
Part 9: The Future of Cardiovascular AI
What’s Coming in the Next 5 Years
Likely to reach clinical use:
1. Expanded ECG AI applications: detection of pulmonary hypertension, aortic stenosis, and HCM from the ECG
2. Wearable integration with the EHR: smartwatch data flowing into medical records with clinical decision support
3. Automated echo AI in primary care: point-of-care echo by non-cardiologists with AI guidance
4. Predictive models for sudden cardiac death: risk stratification for ICD placement beyond EF alone

Promising but uncertain:
1. AI-guided medication optimization: automated titration of GDMT in HF
2. Real-time procedural guidance: AI-assisted PCI, ablation, and structural interventions
3. Precision medicine for CAD: genetic + imaging + clinical data to personalize revascularization decisions

Overhyped and unlikely:
1. Autonomous cardiovascular diagnosis: AI replacing cardiologist clinical judgment
2. Smartwatch-only AFib management: anticoagulation decisions without ECG confirmation
The rate-limiting step is not algorithmic accuracy; it is prospective randomized trials showing improved outcomes.
Most cardiovascular AI has impressive technical performance. What we lack is evidence that deploying these tools actually helps patients live longer or better.
Professional Society Guidelines on AI in Cardiology
The 2023 ESC and 2025 ACC/AHA guidelines have not yet provided specific recommendations for clinical use of artificial intelligence, highlighting a significant evidence gap. However, recent guidelines acknowledge AI’s emerging role:
2024 ACC/AHA Perioperative Guidelines: “Incorporation of artificial intelligence and machine-learning may improve risk assessment, but future studies are needed to evaluate risk-reduction strategies.”
Key Observations from Recent Guidelines:
- Prospective RCTs are needed to confirm AI’s efficacy and cost-effectiveness
- Deep learning algorithms show promise in outperforming clinicians and conventional software for ECG diagnosis
- AI enables accurate quantitative and qualitative plaque evaluation with coronary CT angiography and OCT
- Multicenter validation and standardization are essential before guideline integration
AHA Scientific Sessions AI Highlights (2024)
At AHA 2024, AI in cardiology featured prominently:
AI-ECHO and PanEcho Studies:
- Machine learning algorithms trained on millions of echocardiographic images
- Promise for automating and improving diagnostic accuracy
- Potential to streamline imaging for large patient populations
Endorsed Risk Calculators
The ACC/AHA endorse several validated risk calculators that incorporate statistical modeling:
- ASCVD Risk Estimator Plus: 10-year and lifetime cardiovascular risk
- Pooled Cohort Equations: Primary prevention statin therapy decisions
- CHA2DS2-VASc: Stroke risk in atrial fibrillation
- HAS-BLED: Bleeding risk with anticoagulation
These represent validated, guideline-integrated predictive tools that precede modern AI but establish the framework for algorithmic clinical decision support.
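One reason these endorsed tools are easy to trust is that they are fully transparent; the entire CHA2DS2-VASc calculation fits in a few lines. A minimal sketch using the standard point assignments (the function and argument names are illustrative, not from any library):

```python
def cha2ds2_vasc(age: int, female: bool, chf: bool, hypertension: bool,
                 diabetes: bool, stroke_or_tia: bool, vascular_disease: bool) -> int:
    """Standard CHA2DS2-VASc point assignments."""
    score = 0
    score += 1 if chf else 0                               # C: congestive heart failure
    score += 1 if hypertension else 0                      # H: hypertension
    score += 2 if age >= 75 else (1 if age >= 65 else 0)   # A2 / A: age bands
    score += 1 if diabetes else 0                          # D: diabetes mellitus
    score += 2 if stroke_or_tia else 0                     # S2: prior stroke/TIA/thromboembolism
    score += 1 if vascular_disease else 0                  # V: vascular disease
    score += 1 if female else 0                            # Sc: sex category (female)
    return score

# Example: a 52-year-old man whose only risk factor is hypertension scores 1.
print(cha2ds2_vasc(age=52, female=False, chf=False, hypertension=True,
                   diabetes=False, stroke_or_tia=False, vascular_disease=False))
```

Contrast this with a proprietary readmission model: every input and weight is visible, so clinicians can audit exactly why a patient received a given score.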
European Society of Cardiology (ESC)
ESC has engaged with AI through:
- Digital Health Committee guidance on AI validation
- Position papers on wearable device data integration
- Framework for evaluating AI-enhanced diagnostic tools
Implementation Principle: ESC emphasizes that AI tools must demonstrate clinical utility beyond improved accuracy metrics, including impact on patient outcomes, workflow efficiency, and cost-effectiveness.
Heart Rhythm Society (HRS)
HRS has addressed AI in the context of:
- Automated ECG interpretation algorithms
- Wearable device-detected arrhythmias
- AI-assisted electrophysiology mapping
Clinical Guidance: HRS notes that AI-detected arrhythmias from consumer devices require clinical confirmation and that management pathways for AI-flagged findings remain under development.
Key Takeaways
10 Principles for Cardiovascular AI
1. ECG algorithms work, but require physician review: decades of validation, but false positives and equity gaps exist
2. Hidden ECG patterns are real, not hype: Mayo Clinic AI-ECG detection of low EF from normal-appearing ECGs is validated science
3. Echo AI reduces variability: automated measurements are more reproducible than manual ones, but can’t replace expert interpretation for complex cases
4. Wearable AFib detection creates clinical dilemmas: high sensitivity, but management of asymptomatic paroxysmal AFib is uncertain
5. HF prediction models have 75-80% false positive rates: an AUC of 0.85 sounds great until you calculate the PPV
6. Demand prospective outcome trials: technical accuracy ≠ clinical benefit
7. Equity gaps are substantial: most algorithms are trained on predominantly white populations
8. Implementation > accuracy: poor EHR integration causes missed diagnoses despite accurate algorithms
9. IBM Watson failed; learn from it: no RCT evidence = don’t deploy, no matter how prestigious the vendor
10. You remain responsible: AI assists, but all clinical decisions and their consequences are yours
Clinical Scenario: Vendor Evaluation
Scenario: Your Cardiology Department Is Considering Purchasing an AI Tool
The pitch: A vendor demonstrates an AI tool that predicts 30-day cardiovascular mortality risk for hospitalized cardiology patients. They show you:
- AUC 0.92 in internal validation
- “Outperforms traditional risk scores”
- Integration with your EHR
- Cost: $150,000/year
The department chair asks for your recommendation.
Questions to Ask Before Recommending Purchase:
- “What peer-reviewed publications support this algorithm?”
- Look for Circulation, JACC, JAMA Cardiology publications
- Internal validation white papers are insufficient
- “What is the positive predictive value at clinically useful sensitivity thresholds?”
- If 30-day mortality is 3%, even an AUC of 0.92 may yield a terrible PPV (a worked sketch follows this question list)
- Ask for sensitivity/specificity table at multiple thresholds
- “How does this algorithm perform in patient populations similar to ours?”
- Algorithm validated at academic medical center may fail at community hospital
- Request performance stratified by age, race, sex, comorbidities
- “What interventions will we apply to high-risk patients identified by this algorithm?”
- If the answer is “closer monitoring,” what’s the evidence that it prevents deaths?
- Many high-risk patients die despite optimal care
- “What are this algorithm’s failure modes?”
- Does it underestimate risk in young patients? Overestimate in elderly?
- What clinical situations does it handle poorly?
- “Can we pilot this on 500 patients before committing to $150,000/year?”
- Local validation essential
- Compare algorithm predictions to actual outcomes in your population
- “Who is liable if a patient predicted low-risk by the algorithm dies unexpectedly?”
- Read the vendor contract carefully
- Most disclaim all liability
- “What is the cost-effectiveness compared to existing risk stratification?”
- How many lives saved per $150,000 spent?
- Any published cost-effectiveness analyses?
- “How will this integrate with nursing workflow? Who triages the high-risk alerts?”
- Implementation costs often exceed purchase price
- Alert fatigue is real
- “Can I speak with cardiologists at 3 other hospitals who use this tool?”
- Get real user experiences, not marketing testimonials
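As promised in the PPV question above, a short sketch of why a high AUC can still yield a poor PPV at 3% 30-day mortality. The sensitivity/specificity pairs are hypothetical operating points; an AUC of 0.92 does not determine any single pair:

```python
# Why a high AUC can still mean a poor PPV at low prevalence.
prevalence = 0.03   # 30-day mortality in the scenario

for sensitivity, specificity in [(0.90, 0.80), (0.85, 0.85), (0.80, 0.90)]:
    tp = sensitivity * prevalence              # true positives per patient screened
    fp = (1 - specificity) * (1 - prevalence)  # false positives per patient screened
    ppv = tp / (tp + fp)
    print(f"sens {sensitivity:.0%}, spec {specificity:.0%} -> PPV {ppv:.0%}")
# At 3% prevalence these operating points yield PPVs of roughly 12-20%:
# most patients flagged as high mortality risk will not die within 30 days.
```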
Red Flags in This Scenario:
- AUC 0.92 reported without PPV/NPV: useless without knowing the false positive rate
- “Outperforms traditional risk scores”: were the comparisons done in the same patient cohort? Published?
- No mention of prospective validation: if the algorithm hasn’t been tested prospectively, it’s experimental
- High annual cost without cost-effectiveness data: $150K/year is substantial; where’s the ROI evidence?
- Vendor can’t explain what the algorithm learned: black box = red flag
Check Your Understanding
Scenario 1: The AI-Detected Low EF
Clinical situation: A 58-year-old woman with hypertension presents to primary care for annual physical. ECG ordered as part of routine screening shows normal sinus rhythm, normal intervals, no ST/T changes. However, the ECG machine’s AI algorithm flags: “Low ejection fraction predicted. Recommend echocardiogram.”
Patient is asymptomatic. No dyspnea, no edema, no chest pain. Physical exam normal. You’ve never seen this AI alert before.
Question 1: Do you order the echocardiogram based on this AI prediction?
Answer: Yes, order the echocardiogram.
Reasoning: The Mayo Clinic AI-ECG for low EF detection has been prospectively validated and published in Nature Medicine. The algorithm has 86.3% sensitivity and 85.7% specificity for detecting EF ≤35% (Attia et al., 2019).

Key points:
- This is validated technology, not experimental AI
- An echocardiogram is a low-risk test with potentially high yield (early detection of reduced EF enables initiation of GDMT)
- The ECG appearing normal doesn’t invalidate the algorithm; the whole point is that AI detects hidden patterns that cardiologists can’t see
- Many patients with asymptomatic reduced EF benefit from early ACE inhibitor/beta-blocker therapy

However:
- Counsel the patient that this is a screening test and may be a false positive
- Explain that AI detected subtle ECG patterns suggesting possible heart dysfunction
- Don’t alarm the patient unnecessarily before the echo confirms
If echo confirms reduced EF: Initiate GDMT (ACE-I, beta-blocker, consider SGLT2i)
If echo normal: Reassure patient, document that AI alert was false positive
Bottom line: AI-ECG low EF screening has sufficient validation to act on, especially when the confirmatory test (an echocardiogram) is low-risk.
Scenario 2: The Apple Watch AFib Alert
Clinical situation: A 52-year-old man with hypertension (CHA2DS2-VASc score of 1) presents with his Apple Watch showing irregular pulse notifications. He received 3 alerts over the past week, all while asymptomatic. No palpitations, no dyspnea, no dizziness.

You order an ECG in the office: normal sinus rhythm. You order a 24-hour Holter: it shows a 2-hour episode of atrial fibrillation at 3 AM (patient asleep, asymptomatic).
Question 2: Do you start anticoagulation for asymptomatic, device-detected paroxysmal AFib?
Answer: Unclear. This is a genuine clinical gray zone.
Arguments FOR anticoagulation:
- CHA2DS2-VASc ≥1 in male patients generally indicates anticoagulation benefit
- AFib is AFib regardless of how it is detected; the stroke mechanism (atrial stasis → thrombus → embolism) doesn’t require symptoms
- Subclinical AFib detected by pacemakers has been associated with increased stroke risk (though those episodes were typically >24 hours)
- The Apple Heart Study showed 84% PPV for AFib detection; this is real AFib, not artifact

Arguments AGAINST anticoagulation:
- CHA2DS2-VASc was derived from symptomatic AFib populations; its applicability to device-detected asymptomatic AFib is uncertain
- Paroxysmal AFib (2-hour episodes) may carry a lower stroke risk than persistent AFib
- Bleeding risk with anticoagulation (1-2% major bleeding per year) may outweigh the benefit in a very low-risk patient
- There is no RCT evidence that treating device-detected AFib reduces stroke risk
Ongoing trials:
- HEARTLINE: Apple Watch AFib detection for stroke prevention (results pending)
- GUARD-AF: impact of early detection and treatment

Current practice:
- Reasonable approach 1: anticoagulate based on CHA2DS2-VASc ≥1 (guideline-concordant)
- Reasonable approach 2: extended monitoring (30-day patch) to assess AFib burden; anticoagulate if >6-24 hours/day (expert opinion threshold)
- Reasonable approach 3: shared decision-making with the patient about the uncertain benefit

What I would do: Discuss with the patient:
- “You have real AFib, detected by your watch and confirmed on the Holter monitor”
- “The stroke risk is uncertain because you’re asymptomatic and the episodes are brief”
- “Standard guidelines would recommend blood thinners for a CHA2DS2-VASc score ≥1”
- “But those guidelines weren’t designed for smartwatch-detected AFib”
- “We’re waiting for research studies to clarify this, but they’re not done yet”
- “Options: start apixaban now, or extend monitoring to see how much AFib you’re having”
Bottom line: This is cutting-edge medicine where technology has outpaced evidence. Either approach (anticoagulate or monitor) is defensible. Document your reasoning carefully.
Scenario 3: The Proprietary HF Readmission Model
Clinical situation: Your hospital’s population health team purchased a proprietary AI tool that predicts 30-day HF readmission risk. The vendor claims AUC 0.87. The tool flags 35% of HF discharges as “high-risk.”
Your case management team asks: Should we apply intensive post-discharge interventions (home visits, daily phone calls, nurse case management) to all algorithm-flagged patients?
Cost of intensive intervention: $800 per patient. Your hospital discharges 400 HF patients/year.
Question 3: Do you implement the algorithm-driven intervention program?
Answer: No, not without further analysis.
Problems with this scenario:
1. Roughly half of the flagged patients will be false positives:
- Baseline HF readmission rate: ~20%
- The algorithm flags 35% of patients (140 of 400 discharges)
- At 80% sensitivity, it will detect ~64 of the 80 actual readmissions (true positives)
- But it will also flag ~76 patients who won’t readmit (false positives)
- Only 64/140 = 46% of flagged patients will actually readmit
2. Cost-effectiveness is questionable:
- 140 patients × $800 = $112,000 annual cost
- To break even, you need to prevent readmissions that would have cost more than $112,000
- Average HF readmission cost: ~$10,000
- So you need to prevent more than 11 readmissions (14% of the actual readmissions)
- Many HF readmissions are unpreventable (sudden cardiac death, acute MI, progression despite optimal therapy)

3. The intervention evidence is weak:
- What is the evidence that home visits plus phone calls prevent HF readmissions?
- Some studies show benefit, others don’t
- Even in positive studies, the NNT is typically 20-30 patients to prevent one readmission

4. Algorithm transparency is absent:
- What features is the algorithm using?
- If it is primarily age plus comorbidities (as most HF models are), you could achieve similar performance with a simpler rule: “flag all patients >80 with CKD and COPD”
- Paying for a proprietary algorithm to learn what you already know is wasteful
What to do instead:
- Request vendor provide:
- Peer-reviewed publication of algorithm validation
- Performance stratified by demographics
- PPV/NPV at multiple sensitivity thresholds
- Feature importance (what is the algorithm learning?)
- Pilot study:
- Apply algorithm to 100 consecutive HF discharges
- Track: How many flagged? How many actually readmit?
- Calculate: PPV in your population (may differ from vendor’s validation)
- Evaluate intervention evidence:
- Systematic review of transitional care interventions for HF
- What actually works? (Hint: Early post-discharge cardiology follow-up, medication reconciliation, patient education)
- Consider simpler approach:
- Apply intensive interventions to all HF discharges (not algorithm-selected subset)
- If intervention costs $800 and prevents even 5% of readmissions, it’s cost-effective for entire population
- Simpler than algorithmic triage
Bottom line: Proprietary algorithms with impressive AUCs often provide minimal value over clinical judgment. Demand evidence of clinical benefit and cost-effectiveness before implementation.
The algorithm isn’t necessarily wrong. It’s just not clear it adds value over existing approaches.