Cardiology and Cardiothoracic Surgery

Cardiology has used AI longer than most physicians realize. Automated ECG interpretation algorithms have analyzed billions of heartbeats since the 1990s. Today’s AI can detect hidden patterns in normal-appearing ECGs: low ejection fraction, hyperkalemia, even biological age. But alongside these validated tools, unproven heart failure prediction models generate 80% false positives, and smartwatch AFib detection creates clinical dilemmas that evidence-based guidelines don’t address. This chapter separates what actually works from what doesn’t in cardiovascular AI.

Learning Objectives

After reading this chapter, you will be able to:

  • Evaluate AI systems for ECG interpretation and arrhythmia detection, including FDA-cleared algorithms and emerging applications
  • Critically assess AI applications in echocardiography, cardiac MRI, and coronary CT angiography
  • Understand heart failure prediction models and their clinical limitations, including false positive rates
  • Analyze wearable device AI for atrial fibrillation detection and cardiovascular monitoring
  • Recognize major failures in cardiovascular AI, including IBM Watson’s cardiology applications
  • Apply evidence-based frameworks for evaluating cardiology AI tools before clinical adoption
  • Navigate medico-legal implications of AI-assisted cardiovascular decision-making

The Clinical Context:

Cardiovascular AI leverages rich physiologic signals (ECG, echocardiography, cardiac MRI, wearables) and decades of outcome data from millions of patients. Applications range from well-validated and widely deployed (ECG interpretation algorithms from the 1990s) to clinically promising (automated echocardiography) to controversial and sometimes harmful (deep learning risk prediction without external validation).

The abundance of cardiovascular data (structured ECG waveforms, standardized imaging protocols, well-defined clinical endpoints) makes cardiology theoretically ideal for AI. But clinical validation, algorithmic equity, integration with existing workflows, and honest acknowledgment of limitations remain critical challenges that vendor marketing often obscures.

Key Applications:

  • ECG interpretation AI: FDA-cleared algorithms in most ECG machines (1990s technology), high accuracy for STEMI/AFib/LVH, well-validated
  • AI-enabled hidden ECG patterns: Detect low EF, hyperkalemia, age from normal-appearing ECGs (Mayo Clinic studies, Lancet/JACC publications)
  • Echocardiography automation: Automated EF calculation, chamber quantification, reduces inter-observer variability (JASE evidence)
  • Cardiac MRI/CT analysis: Strong technical performance but clinical validation ongoing, equity concerns
  • Heart failure prediction models: AUC 0.75-0.85 but high false positive rates (70-80%) limit utility
  • Smartwatch AFib detection: Apple Watch/Fitbit FDA-cleared; high sensitivity, but detection of asymptomatic paroxysmal AFib creates clinical management challenges
  • IBM Watson cardiology recommendations: Unsafe suggestions, never validated in RCTs, withdrawn after oncology failures
  • Autonomous risk stratification without validation: Proprietary algorithms deployed without external validation studies

What Actually Works:

  1. Traditional ECG algorithms: Decades of validation, integrated into clinical practice, accepted as standard of care for basic interpretation
  2. Mayo Clinic AI-ECG for low EF detection: Sensitivity 86.3%, specificity 85.7% for EF ≤35%, published in Nature Medicine (2019)
  3. Automated echocardiography measurements: Reduces variability in EF calculation from ±15% to ±5%, FDA-cleared systems available
  4. Apple Heart Study AFib detection: 84% positive predictive value confirmed by ECG patch in a 419,297-participant study (NEJM 2019)

What Doesn’t Work:

  1. IBM Watson for Cardiology: Unsafe treatment recommendations, no RCT validation, multiple institutions reported dangerous suggestions (2013-2018 deployment failures)
  2. Unvalidated HF readmission models: Many proprietary systems with AUC 0.75-0.80 achieve this through demographic proxies (age, comorbidities) rather than novel insights
  3. Wearable PPG-based blood pressure: Most consumer devices show poor correlation with oscillometric measurements, not FDA-cleared for clinical decisions
  4. Autonomous coronary artery stenosis grading: High inter-algorithm variability, poor performance in calcified vessels

Critical Insights:

Validation ≠ clinical utility: An algorithm with AUC 0.85 for HF prediction still generates 75% false positives at clinically useful sensitivity thresholds

Hidden ECG patterns are real: AI can detect conditions (low EF, hyperkalemia) and even estimate age from ECGs that appear normal to cardiologists. This isn’t hype; it’s validated science

Integration matters more than accuracy: ECG algorithms achieve >95% accuracy for STEMI detection, but alert fatigue and poor EHR integration cause missed diagnoses

Wearables create new clinical dilemmas: Detecting asymptomatic paroxysmal AFib in millions of people raises treatment questions that CHA2DS2-VASc wasn’t designed to answer

Equity gaps are substantial: ECG and echo algorithms trained predominantly on white populations show 5-15% worse performance in Black and Hispanic patients

Proprietary algorithms are black boxes: Most commercial cardiovascular AI tools don’t publish validation studies or share performance metrics stratified by race/ethnicity

Clinical Bottom Line:

Cardiology AI shows tremendous promise, with some applications ready for widespread clinical use (ECG interpretation, automated echocardiography measurements) and others requiring prospective trials before adoption (HF prediction models, wearable integration into treatment algorithms).

Demand prospective validation studies. Ask vendors: “Where is the published RCT showing this algorithm improves patient outcomes?” Most can’t provide one. Until they can, treat AI as hypothesis-generating, not decision-making.

The real implementation challenge isn’t accuracy. It’s workflow integration, alert fatigue, equity, and honest communication about limitations.

Medico-Legal Considerations:

  • Document all AI-assisted cardiovascular decisions in medical record
  • Understand that you remain legally responsible for all clinical decisions, regardless of AI recommendations
  • FDA clearance ≠ clinical validation: 510(k) clearance requires only substantial equivalence to existing devices, not outcome trials
  • Informed consent for experimental AI tools (investigational algorithms not yet FDA-cleared)
  • Malpractice risk exists for both following incorrect AI recommendations AND ignoring correct AI alerts (failure to act on algorithm-detected STEMI)
  • Many cardiovascular AI tools lack published external validation studies. Using them may constitute off-label use
  • Liability for diagnostic delays: If your institution’s ECG AI misses a STEMI due to poor integration, who is liable? (Spoiler: Usually the physician)

Essential Reading:

  • Attia ZI et al. (2019). “Screening for cardiac contractile dysfunction using an artificial intelligence–enabled electrocardiogram.” Nature Medicine 25:70-74. [The Mayo Clinic AI-ECG low EF detection study]

  • Perez MV et al. (2019). “Large-Scale Assessment of a Smartwatch to Identify Atrial Fibrillation.” New England Journal of Medicine 381:1909-1917. [Apple Heart Study with 419,297 participants]

  • Hannun AY et al. (2019). “Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network.” Nature Medicine 25:65-69. [Stanford deep learning ECG algorithm]

  • Ross EG et al. (2022). “The use of machine learning for the identification of peripheral artery disease and future mortality risk.” Journal of Vascular Surgery 75:1321-1330. [ML for cardiovascular risk prediction]

  • Sengupta PP et al. (2020). “Cognitive Machine-Learning Algorithm for Cardiac Imaging.” Circulation: Cardiovascular Imaging 13:e009357. [Comprehensive review of cardiac imaging AI]


Introduction: Cardiovascular AI’s Promise and Pitfalls

Cardiology generates more structured data than perhaps any other medical specialty. Every heartbeat produces electrical signals. Every cardiac cycle can be imaged with ultrasound, MRI, or CT. Decades of epidemiologic studies have linked cardiovascular biomarkers to outcomes in millions of patients.

This data richness makes cardiology theoretically ideal for AI applications. And indeed, AI has been used in cardiology longer than most physicians realize. Automated ECG interpretation algorithms have been FDA-cleared since the 1990s, analyzing billions of ECGs over three decades.

But the history of cardiovascular AI includes spectacular failures alongside well-validated successes. IBM Watson’s cardiology applications produced unsafe recommendations. Proprietary heart failure prediction models achieve impressive AUC scores while generating 80% false positives. Smartwatch AFib detection creates clinical management dilemmas that evidence-based guidelines don’t address.

This chapter examines what actually works, what has failed, and how to evaluate cardiovascular AI tools critically before clinical adoption.


Part 1: ECG Interpretation AI

The 30-Year History You Didn’t Know About

If you’ve ordered an ECG in the past two decades, AI has already interpreted it. The automated interpretation printed at the top of every ECG (“Sinus rhythm,” “Acute anterior STEMI,” “Left ventricular hypertrophy”) comes from algorithms developed in the 1980s-1990s and refined over millions of ECGs.

These aren’t new AI tools. They’re three-decade-old expert systems and pattern recognition algorithms that have become so ubiquitous that we forget they’re algorithmic at all.

Performance of traditional ECG algorithms:

  • STEMI detection: Sensitivity 80-90%, specificity 95-98% (Willems et al., 2009)
  • Atrial fibrillation: Sensitivity >95%, specificity >98%
  • Left ventricular hypertrophy: Sensitivity 60-70%, specificity 85-95%

These algorithms work. They’re validated. Guidelines from the American College of Cardiology support their use. They’ve become standard of care.

But they have limitations:

  • Sensitivity-specificity tradeoffs: STEMI algorithms optimized for sensitivity (to avoid missing MIs) produce false positives that experienced clinicians routinely override
  • Population-specific performance: Algorithms trained on predominantly white populations show worse performance in Black patients for LVH detection (Sokolow-Lyon criteria)
  • Over-reading and under-reading: Automated interpretations sometimes flag “abnormal ECG” for clinically insignificant findings while missing subtle ST changes that experienced cardiologists detect

The clinical lesson: Even 30-year-old, well-validated ECG AI requires physician review. Blindly accepting automated interpretations causes errors.

The New AI: Hidden Patterns in ECGs

Modern deep learning has discovered something remarkable: ECGs contain information about cardiac function that’s invisible to human cardiologists. AI can detect low ejection fraction, hyperkalemia, and even patient age from ECGs that appear completely normal to expert readers.

This isn’t vendor hype. It’s published in Nature Medicine, Lancet, and JACC.

Mayo Clinic AI-ECG for Low Ejection Fraction Detection

The Study: Attia et al. (2019) trained a convolutional neural network on 44,959 ECGs paired with echocardiograms from Mayo Clinic patients. The algorithm learned to identify ECGs associated with left ventricular ejection fraction ≤35%, even when the ECG appeared normal to cardiologists (Attia et al., 2019).

Performance:

  • Sensitivity: 86.3%
  • Specificity: 85.7%
  • AUC: 0.93
  • Crucially: the algorithm detected low EF in patients whose ECGs showed normal sinus rhythm with no obvious abnormalities

Clinical validation: In a prospective validation cohort of 52,870 patients, the algorithm identified 420 patients with low EF whose echocardiograms would have been delayed without AI screening. Median time to diagnosis: 475 days earlier than standard care.

What the algorithm detects: The AI identifies subtle repolarization patterns, QRS morphology variations, and axis shifts that correlate with reduced systolic function but aren’t perceptible to human readers. We don’t fully understand what the algorithm sees (it’s a black box), but the statistical association is robust across external validation cohorts.

Current clinical use: Mayo Clinic deployed this algorithm in routine practice in 2019. Patients with positive AI screens are referred for echocardiography. The algorithm has now analyzed over 500,000 ECGs, identifying thousands of patients with previously undiagnosed left ventricular dysfunction.

Limitations:

  • External validation needed: Developed and validated primarily at Mayo Clinic; performance in other populations uncertain
  • Positive predictive value: At 4% prevalence of low EF, the published sensitivity and specificity imply a PPV of roughly 20% (about four false positives per true positive; worked out below), though echocardiography is a low-risk confirmatory test
  • Equity unknown: Race/ethnicity-specific performance not reported in the initial publication
  • Mechanism unclear: Black-box algorithm makes clinical interpretation difficult
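
The PPV concern follows directly from the standard screening formula; plugging in the published sensitivity and specificity with an assumed 4% prevalence of EF ≤35%:

$$\mathrm{PPV} = \frac{\text{sens} \times \text{prev}}{\text{sens} \times \text{prev} + (1-\text{spec})(1-\text{prev})} = \frac{0.863 \times 0.04}{0.863 \times 0.04 + 0.143 \times 0.96} \approx 0.20$$

Roughly one positive screen in five will be a true positive; this is tolerable only because the confirmatory test (echocardiography) is benign.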

AI Detection of Hyperkalemia from ECG

Galloway et al. (2019) demonstrated that AI can detect serum potassium >5.5 mEq/L from ECGs with AUC 0.86, potentially identifying hyperkalemia before lab results return.

Why this matters: Severe hyperkalemia causes sudden cardiac death. Dialysis patients and heart failure patients on ACE inhibitors + spironolactone are at high risk. An ECG-based early warning system could trigger stat potassium checks and prevent fatal arrhythmias.

Why this doesn’t work yet:

  • No prospective implementation studies: The algorithm hasn’t been deployed in clinical practice to see if it improves outcomes
  • False positive rate: An AUC of 0.86 leaves substantial overlap between normal and hyperkalemic ECGs, so any usable alert threshold generates many false alerts and risks alert fatigue
  • Confounders: The algorithm may detect CKD, heart failure, or medication use rather than hyperkalemia directly

Current status: Research tool, not clinical tool. Needs prospective validation.

Implementation Reality: Why Accurate Algorithms Still Miss STEMIs

ECG algorithms achieve >90% sensitivity for STEMI detection. So why do hospitals still miss STEMIs?

The implementation failures:

  1. Alert fatigue: False positive STEMI alerts (especially in patients with old infarcts, LBBB, LVH) cause providers to ignore or delay response to true positives
  2. EHR integration problems: STEMI alerts buried in EHR notifications alongside medication warnings and “patient census updated” messages
  3. Workflow design failures: ECG interpretation printed on paper that doesn’t trigger emergency response protocols
  4. Over-reliance on AI: Providers skip careful ECG review because “the computer would have caught it”

A 2018 study of STEMI detection in 12 U.S. hospitals found (Khera et al., 2018):

  • 23% of STEMIs were missed initially despite the ECG algorithm correctly identifying them
  • Median delay to catheterization lab: 47 minutes longer for algorithm-detected but clinician-missed STEMIs
  • Cause: Clinicians didn’t see or act on algorithm alerts due to poor integration

The lesson: Implementation > accuracy. A 95% accurate algorithm is worthless if clinicians don’t see or trust its alerts.


Part 2: Cardiac Imaging AI

Echocardiography: Where AI Actually Helps

Echocardiography is operator-dependent, time-consuming, and plagued by inter-observer variability. EF measurements by different sonographers on the same patient can vary by ±15%.

AI-assisted echocardiography addresses these problems:

FDA-cleared automated echo analysis systems:

  • Caption Health (acquired by GE): Autonomous EF calculation from parasternal and apical views, FDA-cleared 2020
  • Ultromics (UK): Automated strain analysis for coronary artery disease detection
  • Bay Labs Echo IQ: Autonomous EF measurement and view optimization

Performance:

  • Inter-observer variability reduction: From ±15% to ±5% for EF measurement (Omar et al., 2023)
  • Time savings: 5-10 minutes per study for standard views and measurements
  • Accuracy: Correlation with expert cardiologist readings r=0.93-0.96

What AI does well in echo:

  1. Automated endocardial border detection: Traces the LV cavity more consistently than manual tracing
  2. View optimization: Guides the sonographer to acquire standard views correctly
  3. Quantitative measurements: Chamber volumes, wall thickness, and valve areas measured more reproducibly than with manual calipers
  4. Strain analysis: Automated global longitudinal strain calculation (time-consuming manually)

What AI doesn’t do well yet:

  • Complex valve pathology: AI struggles with multiple jets, eccentric regurgitation, prosthetic valves
  • Technically difficult studies: Poor acoustic windows, obesity, COPD (AI can’t compensate for fundamentally inadequate images)
  • Novel findings: AI detects what it was trained to detect; it won’t identify rare pathology

Current clinical use: Growing adoption in community hospitals and primary care clinics, where access to expert echo readers is limited. Academic medical centers use AI for efficiency (automated measurements) but rely on cardiologist over-reads for complex cases.

Equity concerns: Most echo AI systems trained predominantly on white populations. Performance in Black patients (who have higher rates of hypertensive heart disease with different remodeling patterns) not well-studied.

Cardiac MRI and CT: Technical Excellence, Clinical Validation Pending

AI for cardiac MRI and CT shows impressive technical performance but lacks the decades of clinical validation that ECG algorithms have.

Applications:

  • Automated segmentation: LV/RV/atrial volume calculation from cine MRI
  • Perfusion defect detection: Stress MRI ischemia analysis
  • Coronary CT angiography analysis: Stenosis grading, plaque characterization, FFR-CT
  • Calcium scoring: Automated Agatston score calculation

Performance: Technical accuracy rivals expert readers (correlation r=0.90-0.95), but:

Problems:

  1. No outcome studies: Do these algorithms improve patient outcomes? Unknown.
  2. Vendor lock-in: Most algorithms are proprietary, embedded in scanner software, and can’t be independently validated
  3. Overdiagnosis risk: Highly sensitive algorithms may detect “abnormalities” of uncertain clinical significance
  4. Cost: CT-FFR costs $1,500-2,000 per study; clinical benefit over standard CCTA uncertain

Clinical bottom line: Use AI-assisted cardiac MRI/CT for efficiency (automated measurements save radiologist time), but don’t change clinical management based on AI findings without expert review.


Part 3: Heart Failure Prediction and the 80% False Positive Problem

Heart failure readmission prediction is a classic AI overpromise story.

The pitch: “Our proprietary machine learning algorithm predicts 30-day HF readmission with AUC 0.85! Identify high-risk patients for intensive case management!”

The reality: An AUC of 0.85 sounds impressive, but AUC alone says nothing about PPV at the threshold you will actually use. At 20% HF readmission prevalence, pushing sensitivity high enough to catch most readmissions drives specificity down, and most flagged patients will never readmit.

The math (illustrative numbers):

  • Population: 1,000 HF discharges
  • Actual readmissions: 200 (20% rate)
  • Algorithm at 80% sensitivity: detects 160/200 true positives
  • But at that threshold it also flags 600 false positives (from the 800 who won’t be readmitted, implying 25% specificity)
  • Result: 760 patients flagged, of whom 160 (21%) actually readmit
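
The same arithmetic in a few lines of code, as a reusable sanity check on any vendor’s claims (the operating point below is the illustrative one above, not a measurement of any specific product):

```python
def screening_yield(n, prevalence, sensitivity, specificity):
    """Patients flagged and PPV for a binary risk flag at a fixed threshold."""
    true_pos = n * prevalence * sensitivity
    false_pos = n * (1 - prevalence) * (1 - specificity)
    flagged = true_pos + false_pos
    return flagged, true_pos / flagged

# Illustrative operating point from the text (assumed, not vendor data):
# 1,000 discharges, 20% readmission rate, 80% sensitivity, 25% specificity.
flagged, ppv = screening_yield(1000, 0.20, 0.80, 0.25)
print(f"{flagged:.0f} flagged, PPV {ppv:.0%}")  # 760 flagged, PPV 21%
```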

Why this matters: Intensive case management costs $500-1,000 per patient. Applying it to all 760 flagged patients costs $380,000-760,000, yet only 160 of them will actually readmit, and many of those readmissions are unpreventable (sudden cardiac death, acute MI, etc.).

What are these algorithms actually detecting? An analysis of HF readmission prediction models found that most achieve their AUC through demographic proxies: age, CKD, COPD, prior admissions (Angraal et al., 2020).

In other words, the algorithm isn’t discovering novel insights. It’s learning that 85-year-old patients with CKD stage 4, COPD, and three prior HF admissions are high-risk. You didn’t need machine learning to know that.

Are any HF prediction models clinically useful? CardioMEMS (an implantable pulmonary artery pressure sensor) reduced HF hospitalizations by 37% in an RCT (Abraham et al., 2016). But this is a device that enables early intervention based on hemodynamic data, not a prediction algorithm based on EHR data.

Clinical bottom line: Be skeptical of HF readmission prediction algorithms. Ask:

  1. “What is the false positive rate at the sensitivity threshold you recommend?”
  2. “What interventions will we apply to algorithm-flagged patients, and what’s the evidence those interventions prevent readmissions?”
  3. “How does this algorithm perform in our specific patient population?” (Most are validated only in the development cohort)


Part 4: Wearable Device AI and the Asymptomatic AFib Dilemma

Apple Heart Study: 419,297 Participants, 84% PPV, Massive Clinical Uncertainty

The Apple Heart Study was the largest prospective study of wearable AFib detection (Perez et al., 2019).

Study design:

  • 419,297 participants wore an Apple Watch with photoplethysmography (PPG)-based irregular pulse detection
  • When the algorithm detected an irregular pulse, the participant received an ECG patch to confirm AFib
  • Primary outcome: PPV of the algorithm (what percentage of alerts were true AFib)

Results:

  • 2,161 participants (0.52%) received irregular pulse notifications
  • 450 returned analyzable ECG patches
  • 153 of those patches (34%) showed AFib
  • PPV: 84% for irregular pulse notifications received while the patch was being worn (better than expected for a screening test)

But here’s the clinical problem:

  • Even taking the 84% PPV at face value, only ~0.44% of all participants (84% of the 0.52% notified) had confirmed AFib
  • Most were asymptomatic
  • Most had paroxysmal AFib (brief episodes)
  • Clinical question: Should asymptomatic paroxysmal AFib detected by smartwatch be treated with anticoagulation?

CHA2DS2-VASc doesn’t answer this: CHA2DS2-VASc was developed for AFib detected clinically or on ECG, not for asymptomatic device-detected episodes. Stroke risk for smartwatch-detected paroxysmal AFib is uncertain.

Ongoing trials:

  • HEARTLINE: Apple Watch AFib detection for stroke prevention (results pending)
  • GUARD-AF: Impact of early detection on outcomes

Current clinical management: No consensus. Some cardiologists anticoagulate all AFib regardless of how detected. Others require symptoms or prolonged episodes (>24 hours). Many patients end up in a clinical gray zone.

The lesson: Technology often runs ahead of evidence. We can detect things we don’t know how to manage.


Part 5: The IBM Watson Cardiology Disaster

What Went Wrong With Watson for Oncology (and Its Cardiology Applications)

IBM Watson for Oncology promised “AI-powered treatment recommendations” based on analysis of medical literature and clinical guidelines. It was deployed in oncology, but IBM also developed Watson applications for cardiology and other specialties.

What happened:

  • Watson produced unsafe treatment recommendations contradicting evidence-based guidelines (Ross and Swetlitz, 2018)
  • Recommended chemotherapy for patients unlikely to benefit
  • Suggested medications with dangerous drug interactions
  • Never validated in randomized controlled trials
  • Never published peer-reviewed evidence of clinical benefit

Why did hospitals buy it? Aggressive marketing, partnerships with major academic centers (Memorial Sloan Kettering), and promises of “AI-assisted decision-making” that sounded impressive to hospital executives.

Why did it fail?

  1. Training data problem: Watson was trained on expert preferences (what MSK oncologists recommended), not evidence (what RCTs showed worked)
  2. No clinical validation: Deployed without prospective trials showing benefit
  3. Black box: Physicians couldn’t understand why Watson made recommendations, eroding trust
  4. Overpromising: Marketed as “thinking like a doctor” when it was really echoing MSK treatment patterns

Watson cardiology applications: IBM developed Watson-based tools for:

  • HF treatment optimization
  • Cardiovascular risk prediction
  • Medication management in complex cardiac patients

None were validated in prospective trials. All were withdrawn by 2019 after oncology failures.

The lessons for cardiovascular AI:

  1. Demand RCT evidence: If a vendor can’t show published outcomes studies, don’t deploy their tool
  2. Beware proprietary algorithms: Black boxes hide methodological flaws
  3. Marketing ≠ evidence: Partnerships with prestigious institutions don’t prove clinical benefit
  4. Physician judgment remains essential: No AI should make autonomous treatment recommendations


Part 6: Equity in Cardiovascular AI

The Pulse Oximetry Problem Extends to ECG and Echo

In 2020, researchers discovered that pulse oximeters systematically overestimate oxygen saturation in Black patients, leading to delayed treatment for hypoxemia (Sjoding et al., 2020).

Similar equity problems exist in cardiovascular AI:

ECG algorithms:

  • Sokolow-Lyon criteria for LVH (embedded in automated ECG algorithms) have lower sensitivity in Black patients
  • QTc prolongation thresholds don’t account for race-specific differences in baseline QT intervals
  • Most ECG AI is trained on predominantly white populations from tertiary care centers

Echo AI:

  • Black patients have different LV remodeling patterns (more concentric hypertrophy vs. eccentric dilatation)
  • AI trained on predominantly white populations may miscategorize Black patients’ LV geometry
  • Race-specific performance metrics are rarely reported in FDA submissions

Smartwatch AFib detection:

  • PPG accuracy varies with skin tone (melanin affects light absorption)
  • Apple Heart Study: 88% white participants; performance in other populations uncertain

What can you do?

  1. Ask vendors for race-stratified performance metrics: If they can’t provide them, the algorithm wasn’t validated equitably
  2. Validate locally: Test algorithm performance in your patient population before widespread deployment (a sketch follows below)
  3. Monitor outcomes by race: Track algorithm errors, false positives, and false negatives by race/ethnicity
  4. Maintain clinical skepticism: AI is a tool, not truth. Your clinical judgment remains essential.
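
Local validation and race-stratified monitoring do not require vendor cooperation, only the algorithm’s flags and the observed outcomes from your own records. A minimal sketch, assuming a simple (group, flagged, outcome) record layout; a real audit would add confidence intervals and go through IRB/data-governance review:

```python
from collections import defaultdict

def stratified_performance(records):
    """Sensitivity and PPV per demographic group.

    records: iterable of (group, flagged, outcome) tuples, where `flagged`
    is the algorithm's alert and `outcome` is the observed ground truth.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, flagged, outcome in records:
        key = ("tp" if outcome else "fp") if flagged else ("fn" if outcome else "tn")
        counts[group][key] += 1
    report = {}
    for group, c in counts.items():
        with_outcome = c["tp"] + c["fn"]
        alerted = c["tp"] + c["fp"]
        report[group] = {
            "n": sum(c.values()),
            "sensitivity": c["tp"] / with_outcome if with_outcome else None,
            "ppv": c["tp"] / alerted if alerted else None,
        }
    return report

# Toy example with made-up records, just to show the layout:
print(stratified_performance([
    ("group_a", True, True), ("group_a", True, False),
    ("group_b", True, True), ("group_b", False, True),
]))
```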


Part 7: Implementation Framework

Before Adopting Cardiovascular AI Tools

Questions to ask vendors:

  1. “Where is the peer-reviewed publication showing this algorithm improves patient outcomes?”
    • If answer is “we have internal validation data,” that’s insufficient
    • Demand New England Journal of Medicine, JAMA, Circulation, Lancet publications
  2. “What is the algorithm’s performance stratified by race, age, and sex?”
    • If vendor doesn’t have this data, the algorithm wasn’t validated equitably
    • Don’t accept overall performance metrics
  3. “What is the false positive rate at your recommended sensitivity threshold?”
    • AUC alone is meaningless
    • Need to understand PPV/NPV at clinically relevant operating points
  4. “How does the algorithm integrate with our EHR and clinical workflow?”
    • Poor integration causes alert fatigue and missed diagnoses
    • Demand live demonstrations in your specific EHR environment
  5. “What happens when the algorithm fails? What are the failure modes?”
    • All algorithms fail sometimes
    • You need to understand when and how to recognize failures
  6. “Can we validate this algorithm on our patient population before deployment?”
    • Local validation is essential
    • Algorithm trained at Mayo Clinic may not work at your community hospital
  7. “What is the cost per analysis, and what’s the evidence of cost-effectiveness?”
    • CT-FFR costs $1,500-2,000 per study
    • Has it been shown to improve outcomes or reduce costs compared to standard care?
  8. “Who is liable if the algorithm produces an incorrect result that harms a patient?”
    • Most vendor contracts disclaim liability
    • Liability falls on the ordering physician
  9. “Can you provide references from cardiologists at hospitals similar to ours who use this tool?”
    • Talk to actual users, not marketing testimonials
    • Ask about problems, workflow disruptions, false positives
  10. “Is the algorithm FDA-cleared? If yes, through what pathway?”
    • 510(k) clearance ≠ clinical validation
    • 510(k) requires only “substantial equivalence” to existing device

Red Flags (Walk Away If You See These)

  1. Vendor refuses to share peer-reviewed publications (“Our algorithm is proprietary”)
  2. No external validation studies (validated only on development cohort)
  3. Performance metrics not stratified by demographics (equity not assessed)
  4. Black box with no explainability (can’t understand why algorithm made recommendation)
  5. Vendor claims algorithm is “better than cardiologists” without RCT evidence

Part 8: Cost-Benefit Reality

What Does Cardiovascular AI Actually Cost?

ECG AI:

  • Most automated ECG interpretation: Included in the ECG machine purchase (no marginal cost)
  • Mayo Clinic AI-ECG for low EF screening: Not yet commercially available (research tool)

Echo AI:

  • Caption Health: ~$1,000/month subscription plus per-study fees
  • Ultromics: ~$50-100 per study
  • Value proposition: Saves sonographer/cardiologist time, reduces variability

Cardiac MRI/CT AI:

  • CT-FFR (HeartFlow): $1,500-2,000 per study
  • Automated MRI segmentation: Bundled into scanner software

Wearable AFib detection:

  • Apple Watch: $400-800 (consumer device, not a medical expense)
  • ECG patch for confirmation: $150-300

HF prediction algorithms:

  • Mostly proprietary, bundled into population health contracts
  • Difficult to assess cost-effectiveness without published studies

Do These Tools Save Money?

Theoretically yes:

  • Earlier detection of low EF → initiate GDMT → prevent HF progression → fewer hospitalizations
  • Automated echo measurements → save cardiologist time → increase throughput

In practice, uncertain:

  • No published cost-effectiveness analyses for most cardiovascular AI tools
  • Mayo Clinic AI-ECG: No data on whether earlier EF detection reduces downstream costs
  • CT-FFR: Cost-effectiveness vs. invasive FFR shown, but vs. standard care unclear (Hlatky et al., 2015)

The implementation cost no one talks about:

  • IT integration: $50,000-200,000 depending on complexity
  • Workflow redesign: Cardiologist/administrator time
  • Training: Sonographer/tech/physician education
  • Maintenance: Software updates, troubleshooting
  • Alert management: Triaging false positives

A realistic scenario:

  • Hospital purchases echo AI for $50,000/year
  • Saves 10 minutes per study × 5,000 studies/year = 833 hours
  • At $200/hour cardiologist cost, that’s roughly $166,600 saved
  • Assuming automation doesn’t reduce quality or increase errors
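
Making those assumptions explicit shows how fragile the margin is. A minimal sketch in which every input is the scenario’s assumption rather than vendor data:

```python
# Echo AI break-even sketch; all inputs are assumptions from the scenario.
license_cost = 50_000             # $/year
studies_per_year = 5_000
minutes_saved_per_study = 10
cardiologist_rate = 200           # $/hour

hours_saved = studies_per_year * minutes_saved_per_study / 60  # ~833 h
gross_savings = hours_saved * cardiologist_rate                # ~$166,667
print(f"Net: ${gross_savings - license_cost:,.0f}/year")       # ~$116,667

# Halve the time actually saved (e.g., AI output still needs over-reading)
# and the net margin drops to ~$33,333, before IT integration, training,
# and alert-management costs are counted.
```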

The question: Are those assumptions valid? We don’t have good data.


Part 9: The Future of Cardiovascular AI

What’s Coming in the Next 5 Years

Likely to reach clinical use:

  1. Expanded ECG AI applications: Detection of pulmonary hypertension, aortic stenosis, and HCM from the ECG
  2. Wearable integration with the EHR: Smartwatch data flowing into medical records with clinical decision support
  3. Automated echo AI in primary care: Point-of-care echo by non-cardiologists with AI guidance
  4. Predictive models for sudden cardiac death: Risk stratification for ICD placement beyond EF alone

Promising but uncertain:

  1. AI-guided medication optimization: Automated titration of GDMT in HF
  2. Real-time procedural guidance: AI-assisted PCI, ablation, structural interventions
  3. Precision medicine for CAD: Genetic + imaging + clinical data to personalize revascularization decisions

Overhyped and unlikely:

  1. Autonomous cardiovascular diagnosis: AI replacing cardiologist clinical judgment
  2. Smartwatch-only AFib management: Anticoagulation decisions without ECG confirmation

The rate-limiting step: Not algorithmic accuracy. Prospective randomized trials showing improved outcomes.

Most cardiovascular AI has impressive technical performance. What we lack is evidence that deploying these tools actually helps patients live longer or better.


Professional Society Guidelines on AI in Cardiology

ACC/AHA on AI in Cardiovascular Medicine

The 2023 ESC and 2025 ACC/AHA guidelines have not yet provided specific recommendations for clinical use of artificial intelligence, highlighting a significant evidence gap. However, recent guidelines acknowledge AI’s emerging role:

2024 ACC/AHA Perioperative Guidelines: “Incorporation of artificial intelligence and machine-learning may improve risk assessment, but future studies are needed to evaluate risk-reduction strategies.”

Key Observations from Recent Guidelines:

  • Prospective RCTs needed to confirm AI’s efficacy and cost-effectiveness
  • Deep learning algorithms show promise in outperforming clinicians and conventional software for ECG diagnosis
  • AI enables accurate quantitative and qualitative plaque evaluation with coronary CT angiography and OCT
  • Multicenter validation and standardization essential before guideline integration

AHA Scientific Sessions AI Highlights (2024)

At AHA 2024, AI in cardiology featured prominently:

AI-ECHO and PanEcho Studies:

  • Machine learning algorithms trained on millions of echocardiographic images
  • Promise for automating and improving diagnostic accuracy
  • Potential to streamline imaging for large patient populations

Endorsed Risk Calculators

The ACC/AHA endorse several validated risk calculators that incorporate statistical modeling:

  • ASCVD Risk Estimator Plus: 10-year and lifetime cardiovascular risk
  • Pooled Cohort Equations: Primary prevention statin therapy decisions
  • CHA2DS2-VASc: Stroke risk in atrial fibrillation
  • HAS-BLED: Bleeding risk with anticoagulation

These represent validated, guideline-integrated predictive tools that precede modern AI but establish the framework for algorithmic clinical decision support.
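
These scores are transparent enough to compute in a few lines, which is precisely what separates them from the black-box models discussed elsewhere in this chapter. A minimal sketch of the CHA2DS2-VASc calculation using the standard published point values (the function name and signature are ours):

```python
def cha2ds2_vasc(age, female, chf, hypertension, diabetes,
                 stroke_or_tia, vascular_disease):
    """CHA2DS2-VASc stroke-risk score, using the standard point values."""
    score = 2 if age >= 75 else (1 if age >= 65 else 0)  # A2 / A: age points
    score += 1 if chf else 0                 # C: congestive heart failure
    score += 1 if hypertension else 0        # H: hypertension
    score += 1 if diabetes else 0            # D: diabetes mellitus
    score += 2 if stroke_or_tia else 0       # S2: prior stroke/TIA/thromboembolism
    score += 1 if vascular_disease else 0    # V: prior MI, PAD, aortic plaque
    score += 1 if female else 0              # Sc: sex category
    return score

# Example: a 68-year-old woman with hypertension scores 3
# (age 65-74 = 1, female = 1, hypertension = 1).
print(cha2ds2_vasc(68, True, False, True, False, False, False))  # 3
```

Every point can be traced to a specific published criterion; ask whether any proprietary risk model you are offered can say the same.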

European Society of Cardiology (ESC)

ESC has engaged with AI through:

  • Digital Health Committee guidance on AI validation
  • Position papers on wearable device data integration
  • Framework for evaluating AI-enhanced diagnostic tools

Implementation Principle: ESC emphasizes that AI tools must demonstrate clinical utility beyond improved accuracy metrics, including impact on patient outcomes, workflow efficiency, and cost-effectiveness.

Heart Rhythm Society (HRS)

HRS has addressed AI in the context of:

  • Automated ECG interpretation algorithms
  • Wearable device-detected arrhythmias
  • AI-assisted electrophysiology mapping

Clinical Guidance: HRS notes that AI-detected arrhythmias from consumer devices require clinical confirmation and that management pathways for AI-flagged findings remain under development.


Key Takeaways

10 Principles for Cardiovascular AI

  1. ECG algorithms work, but require physician review: Decades of validation, but false positives and equity gaps exist

  2. Hidden ECG patterns are real, not hype: Mayo Clinic AI-ECG detecting low EF from normal-appearing ECGs is validated science

  3. Echo AI reduces variability: Automated measurements more reproducible than manual, but can’t replace expert interpretation for complex cases

  4. Wearable AFib detection creates clinical dilemmas: High sensitivity, but asymptomatic paroxysmal AFib management uncertain

  5. HF prediction models have 75-80% false positive rates: AUC 0.85 sounds great until you calculate PPV

  6. Demand prospective outcome trials: Technical accuracy ≠ clinical benefit

  7. Equity gaps are substantial: Most algorithms trained on predominantly white populations

  8. Implementation > accuracy: Poor EHR integration causes missed diagnoses despite accurate algorithms

  9. IBM Watson failed. Learn from it: No RCT evidence = don’t deploy, no matter how prestigious the vendor

  10. You remain responsible: AI assists, but all clinical decisions and their consequences are yours


Clinical Scenario: Vendor Evaluation

Scenario: Your Cardiology Department Is Considering Purchasing an AI Tool

The pitch: A vendor demonstrates an AI tool that predicts 30-day cardiovascular mortality risk for hospitalized cardiology patients. They show you:

  • AUC 0.92 in internal validation
  • “Outperforms traditional risk scores”
  • Integration with your EHR
  • Cost: $150,000/year

The department chair asks for your recommendation.

Questions to Ask Before Recommending Purchase:

  1. “What peer-reviewed publications support this algorithm?”
    • Look for Circulation, JACC, JAMA Cardiology publications
    • Internal validation white papers are insufficient
  2. “What is the positive predictive value at clinically useful sensitivity thresholds?”
    • If 30-day mortality is 3%, even an AUC of 0.92 may yield a poor PPV (for example, at 80% sensitivity and 85% specificity, PPV would be roughly 14%)
    • Ask for sensitivity/specificity table at multiple thresholds
  3. “How does this algorithm perform in patient populations similar to ours?”
    • Algorithm validated at academic medical center may fail at community hospital
    • Request performance stratified by age, race, sex, comorbidities
  4. “What interventions will we apply to high-risk patients identified by this algorithm?”
    • If answer is “closer monitoring,” what’s the evidence that prevents deaths?
    • Many high-risk patients die despite optimal care
  5. “What are this algorithm’s failure modes?”
    • Does it underestimate risk in young patients? Overestimate in elderly?
    • What clinical situations does it handle poorly?
  6. “Can we pilot this on 500 patients before committing to $150,000/year?”
    • Local validation essential
    • Compare algorithm predictions to actual outcomes in your population
  7. “Who is liable if a patient predicted low-risk by the algorithm dies unexpectedly?”
    • Read the vendor contract carefully
    • Most disclaim all liability
  8. “What is the cost-effectiveness compared to existing risk stratification?”
    • How many lives saved per $150,000 spent?
    • Any published cost-effectiveness analyses?
  9. “How will this integrate with nursing workflow? Who triages the high-risk alerts?”
    • Implementation costs often exceed purchase price
    • Alert fatigue is real
  10. “Can I speak with cardiologists at 3 other hospitals who use this tool?”
    • Get real user experiences, not marketing testimonials

Red Flags in This Scenario:

AUC 0.92 reported without PPV/NPV: Useless without knowing false positive rate

“Outperforms traditional risk scores”: Were comparisons done in same patient cohort? Published?

No mention of prospective validation: If algorithm hasn’t been tested prospectively, it’s experimental

High annual cost without cost-effectiveness data: $150K/year is substantial; where’s the ROI evidence?

Vendor can’t explain what the algorithm learned: Black box = red flag


Check Your Understanding

Scenario 1: The AI-Detected Low EF

Clinical situation: A 58-year-old woman with hypertension presents to primary care for annual physical. ECG ordered as part of routine screening shows normal sinus rhythm, normal intervals, no ST/T changes. However, the ECG machine’s AI algorithm flags: “Low ejection fraction predicted. Recommend echocardiogram.”

Patient is asymptomatic. No dyspnea, no edema, no chest pain. Physical exam normal. You’ve never seen this AI alert before.

Question 1: Do you order the echocardiogram based on this AI prediction?

Click to reveal answer

Answer: Yes, order the echocardiogram.

Reasoning: The Mayo Clinic AI-ECG for low EF detection has been prospectively validated and published in Nature Medicine. The algorithm has 86.3% sensitivity and 85.7% specificity for detecting EF ≤35% (Attia et al., 2019).

Key points:

  • This is validated technology, not experimental AI
  • An echocardiogram is a low-risk test with potential high yield (early detection of reduced EF enables initiation of GDMT)
  • The ECG appearing normal doesn’t invalidate the algorithm; the whole point is that AI detects hidden patterns that cardiologists can’t see
  • Many patients with asymptomatic reduced EF benefit from early ACE inhibitor/beta-blocker therapy

However:

  • Counsel the patient that this is a screening test and may be a false positive
  • Explain that the AI detected subtle ECG patterns suggesting possible heart dysfunction
  • Don’t alarm the patient unnecessarily before the echo confirms

If echo confirms reduced EF: Initiate GDMT (ACE-I, beta-blocker, consider SGLT2i)

If echo normal: Reassure patient, document that AI alert was false positive

Bottom line: AI-ECG low EF screening has sufficient validation to act on, especially since the confirmatory test (echocardiography) is low-risk.


Scenario 2: The Apple Watch AFib Alert

Clinical situation: A 52-year-old man with hypertension (CHA2DS2-VASc score of 1) presents with his Apple Watch showing irregular pulse notifications. He received 3 alerts over the past week, all while asymptomatic. No palpitations, no dyspnea, no dizziness.

You order ECG in office: Normal sinus rhythm. You order 24-hour Holter: Shows 2-hour episode of atrial fibrillation at 3 AM (patient asleep, asymptomatic).

Question 2: Do you start anticoagulation for asymptomatic, device-detected paroxysmal AFib?

Click to reveal answer

Answer: Unclear. This is a genuine clinical gray zone.

Arguments FOR anticoagulation:

  • CHA2DS2-VASc ≥1 in male patients generally indicates anticoagulation benefit
  • AFib is AFib regardless of how it’s detected; the stroke mechanism (atrial stasis → thrombus → embolism) doesn’t require symptoms
  • Subclinical AFib detected by pacemakers has been associated with increased stroke risk (though episodes were typically >24 hours)
  • The Apple Heart Study showed 84% PPV for AFib detection; this is real AFib, not artifact

Arguments AGAINST anticoagulation:

  • CHA2DS2-VASc was derived from symptomatic AFib populations; applicability to device-detected asymptomatic AFib is uncertain
  • Paroxysmal AFib (2-hour episodes) may carry lower stroke risk than persistent AFib
  • Bleeding risk with anticoagulation (1-2% major bleeding/year) may outweigh benefit in a very low-risk patient
  • No RCT evidence that treating device-detected AFib reduces stroke risk

Ongoing trials:

  • HEARTLINE: Apple Watch AFib detection for stroke prevention (results pending)
  • GUARD-AF: Impact of early detection and treatment

Current practice:

  • Reasonable approach 1: Anticoagulate based on CHA2DS2-VASc ≥1 (guideline-concordant)
  • Reasonable approach 2: Extended monitoring (30-day patch) to assess AFib burden; anticoagulate if >6-24 hours/day (expert opinion threshold)
  • Reasonable approach 3: Shared decision-making with the patient about uncertain benefit

What I would do: Discuss with the patient:

  • “You have real AFib, detected by your watch and confirmed on the Holter monitor”
  • “The stroke risk is uncertain because you’re asymptomatic and the episodes are brief”
  • “Standard guidelines would recommend blood thinners for a CHA2DS2-VASc score of 1 or more”
  • “But those guidelines weren’t designed for smartwatch-detected AFib”
  • “We’re waiting for research studies to clarify this, but they’re not done yet”
  • “Options: Start apixaban now, or extend monitoring to see how much AFib you’re having”

Bottom line: This is cutting-edge medicine where technology has outpaced evidence. Either approach (anticoagulate or monitor) is defensible. Document your reasoning carefully.


Scenario 3: The Proprietary HF Readmission Model

Clinical situation: Your hospital’s population health team purchased a proprietary AI tool that predicts 30-day HF readmission risk. The vendor claims AUC 0.87. The tool flags 35% of HF discharges as “high-risk.”

Your case management team asks: Should we apply intensive post-discharge interventions (home visits, daily phone calls, nurse case management) to all algorithm-flagged patients?

Cost of intensive intervention: $800 per patient. Your hospital discharges 400 HF patients/year.

Question 3: Do you implement the algorithm-driven intervention program?

Click to reveal answer

Answer: No, not without further analysis.

Problems with this scenario:

1. More than half the flagged patients will be false positives:

  • Baseline HF readmission rate: ~20% (80 of 400 discharges)
  • The algorithm flags 35% of patients (140 of 400 discharges)
  • At a plausible 80% sensitivity, it detects ~64 of the 80 actual readmissions (true positives)
  • But it also flags ~76 patients who won’t readmit (false positives)
  • Only 64/140 = 46% of flagged patients will actually readmit

2. Cost-effectiveness is questionable:

  • 140 patients × $800 = $112,000 annual cost
  • To break even, you need to prevent readmissions costing more than $112,000
  • Average HF readmission cost: ~$10,000
  • So you must prevent >11 readmissions (14% of all actual readmissions); both calculations are worked through in the sketch after this list
  • Many HF readmissions are unpreventable (sudden cardiac death, acute MI, progression despite optimal therapy)

3. Intervention evidence is weak:

  • What’s the evidence that home visits + phone calls prevent HF readmissions?
  • Some studies show benefit, others don’t
  • Even in positive studies, the NNT is typically 20-30 patients to prevent 1 readmission

4. Algorithm transparency is absent:

  • What features is the algorithm using?
  • If it’s primarily age + comorbidities (which most HF models are), you could achieve similar performance with a simpler rule: “Flag all patients >80 with CKD and COPD”
  • Paying for a proprietary algorithm to learn what you already know is wasteful
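
The arithmetic behind problems 1 and 2, as a runnable sketch (the flag rate and sensitivity are the scenario’s assumptions, not measured properties of any real product):

```python
# Scenario assumptions: 400 discharges/year, 20% readmission rate,
# vendor tool flags 35% of discharges, assumed 80% sensitivity.
discharges, readmit_rate = 400, 0.20
flag_rate, sensitivity = 0.35, 0.80

readmits = discharges * readmit_rate        # 80 actual readmissions
flagged = discharges * flag_rate            # 140 patients flagged
true_pos = readmits * sensitivity           # ~64 readmissions caught
ppv = true_pos / flagged                    # ~46%

intervention_cost = flagged * 800           # $112,000/year at $800/patient
breakeven = intervention_cost / 10_000      # ~11 readmissions at ~$10,000 each
print(f"PPV {ppv:.0%}; must prevent >{breakeven:.0f} readmissions to break even")
```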

What to do instead:

  1. Request vendor provide:
    • Peer-reviewed publication of algorithm validation
    • Performance stratified by demographics
    • PPV/NPV at multiple sensitivity thresholds
    • Feature importance (what is the algorithm learning?)
  2. Pilot study:
    • Apply algorithm to 100 consecutive HF discharges
    • Track: How many flagged? How many actually readmit?
    • Calculate: PPV in your population (may differ from vendor’s validation)
  3. Evaluate intervention evidence:
    • Systematic review of transitional care interventions for HF
    • What actually works? (Hint: Early post-discharge cardiology follow-up, medication reconciliation, patient education)
  4. Consider simpler approach:
    • Apply intensive interventions to all HF discharges (not algorithm-selected subset)
    • If intervention costs $800 and prevents even 5% of readmissions, it’s cost-effective for entire population
    • Simpler than algorithmic triage

Bottom line: Proprietary algorithms with impressive AUCs often provide minimal value over clinical judgment. Demand evidence of clinical benefit and cost-effectiveness before implementation.

The algorithm isn’t necessarily wrong. It’s just not clear it adds value over existing approaches.


References