Internal Medicine and Hospital Medicine

Internal medicine AI confronts messy reality. The typical hospital patient has five or more comorbidities and ten medications, and is cared for by rotating teams in time-pressured workflows interrupted by constant alerts. Unlike radiology’s standardized images or pathology’s discrete specimens, hospital medicine generates heterogeneous data under constraints that make AI deployment particularly challenging. This chapter examines which AI tools actually work in real hospital environments.

Learning Objectives

After reading this chapter, you will be able to:

  • Evaluate AI-powered early warning systems for clinical deterioration
  • Critically assess readmission prediction models and their limitations
  • Understand AI applications in chronic disease management
  • Navigate EHR-integrated clinical decision support systems
  • Recognize failure modes specific to hospital AI implementations
  • Apply evidence-based frameworks for selecting hospital AI tools

The Clinical Context:

Internal medicine and hospital medicine face unique AI challenges: complex multimorbid patients, fragmented care across multiple teams, time-pressured decision-making, and workflow constraints that make implementation far more difficult than in procedure-based specialties.

Key Applications:

  • Hospital early warning systems: Predict deterioration, cardiac arrest, sepsis (Epic Deterioration Index, WAVE Clinical Platform)
  • Readmission risk prediction: Identify high-risk patients for enhanced discharge planning (AI-enhanced versions of the HOSPITAL score and LACE index)
  • Sepsis prediction: Controversial; Epic sepsis model had major implementation issues
  • Chronic disease management: Diabetes, heart failure, COPD monitoring with AI-enhanced alerts
  • Medication safety: Drug-drug interactions, deprescribing recommendations, personalized dosing
  • Autonomous treatment recommendations: Mostly experimental; liability and safety concerns

What Actually Works:

  1. Epic Deterioration Index (EDI): Predicts clinical deterioration 6-12 hours before traditional measures, but with high false positive rates (70-80%)
  2. Readmission models: Good prediction (C-statistic 0.65-0.75), but interventions to reduce readmissions remain elusive
  3. Heart failure remote monitoring: AI analysis of implantable device data reduces hospitalizations by 30-40% in selected patients

What Doesn’t Work (Yet):

  1. Epic Sepsis Model: Deployed widely, then found to have only 33% sensitivity (missed 67% of sepsis cases) and 12% PPV (88% false alerts) (Wong et al., 2021)
  2. Autonomous discharge planning: Too many uncontrolled variables; requires physician oversight
  3. AI-automated consult recommendations: Workflow disruption outweighs benefits

Critical Insights:

Alert fatigue is the primary failure mode. Hospital systems generate 100+ alerts per patient per day; AI adds to this burden without solving it.

External validation often fails. Hospital AI trained at one institution performs poorly at others due to EHR differences, patient populations, and workflows.

Implementation > accuracy. A 75% accurate model that integrates smoothly beats a 90% accurate model that disrupts workflow.

Liability remains with physicians. AI recommendations don’t change malpractice responsibility.

Clinical Bottom Line:

Hospital AI shows promise but remains immature compared to radiology or pathology AI. Most applications require active physician oversight. Start with well-validated deterioration prediction and readmission models. Demand prospective local validation before deployment. Monitor continuously for alert fatigue and model drift. Do not deploy autonomous treatment recommendations without extensive pilot testing and clear clinical ownership.

Medico-Legal Considerations:

  • Document all AI-assisted decisions in clinical notes
  • Understand your institution’s AI liability policy
  • Know FDA clearance status of deployed tools
  • Maintain clinical override capability for all AI recommendations
  • Track alert fatigue metrics to prevent desensitization

Introduction: The Complexity Challenge

Hospital medicine operates under unique constraints that make AI integration particularly challenging:

The perfect storm for AI failure:

  1. High-complexity patients: the average hospitalized patient has 5+ comorbidities, 10+ medications, and multiple consultants
  2. Fragmented care: rotating attendings, cross-covering residents, multiple shifts, discontinuity across transitions
  3. Time pressure: 12-20 patients per hospitalist, interrupted workflows, limited time to interrogate AI models
  4. EHR workflow constraints: AI alerts compete with 100+ other daily alerts
  5. Heterogeneous data: vital signs every 4 hours, sporadic labs, unstructured clinical notes

These constraints explain why hospital AI deployment has been slower and more problematic than in specialties with standardized, high-volume, single-modal data (radiology, pathology, dermatology).

This chapter focuses on what actually works in real hospital environments, not what works in retrospective datasets.

Part 1: Patient Deterioration Prediction

The Clinical Problem

Hospitalized patients deteriorate gradually before cardiac arrest, respiratory failure, or septic shock. Traditional early warning scores (Modified Early Warning Score [MEWS], National Early Warning Score [NEWS]) use threshold-based rules that miss early subtle changes.

Could AI detect deterioration earlier and more accurately?

Epic Deterioration Index (EDI)

What it is: A machine learning model embedded in the Epic EHR that continuously recalculates deterioration risk from:

  • Vital signs (heart rate, BP, respiratory rate, temperature, SpO2)
  • Lab values
  • Medications administered
  • Demographics and comorbidities
  • Prior risk scores

How it’s deployed:

  • Risk score 0-100 displayed in the Epic flowsheet
  • Updates every 15 minutes as new data arrive
  • Threshold crossings trigger alerts to the Rapid Response Team (RRT)
  • Used at 150+ hospitals in Epic networks

Evidence:

Successes:

  • Retrospective studies show high discrimination (C-statistic 0.76-0.82) for predicting cardiac arrest, ICU transfer, or death within 24 hours (Singh et al., 2018)
  • Detects deterioration 6-12 hours earlier than traditional NEWS scores
  • Implementation at the University of Michigan was associated with a 35% reduction in cardiac arrests outside the ICU (Green et al., 2019)

Limitations:

  • High false positive rate (70-80%): for every true deterioration, 3-4 false alerts
  • Alert fatigue leads to desensitization; nurses and residents begin ignoring alerts
  • External validation shows substantial performance drops at other hospitals
  • Requires an active response protocol (RRT activation) to be effective
  • No proven mortality benefit in RCTs: process improvement without outcome improvement

The Epic Sepsis Model Disaster:

A cautionary tale for hospital AI deployment. Epic’s sepsis prediction model was:

  • Trained on retrospective data from hundreds of hospitals
  • Deployed widely across Epic networks (2016-2020)
  • Used to trigger sepsis bundles (fluids, antibiotics, lactate measurement)

What went wrong (Wong et al., 2021):

  1. Terrible sensitivity (33%): missed 67% of actual sepsis cases
  2. Overwhelming false positives: 88% of alerts were false
  3. Alert fatigue: clinicians stopped responding to alerts
  4. Delayed care: some institutions relied on the model instead of clinical judgment
  5. Legal liability: families of patients with missed sepsis sued
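
The arithmetic behind those percentages is worth making concrete. A minimal sketch, using the reported sensitivity and PPV on a hypothetical cohort (the cohort size and prevalence are illustrative assumptions, not figures from the study):

```python
# Illustrative arithmetic only: hypothetical cohort of 10,000 admissions with an
# assumed 7% sepsis prevalence; sensitivity and PPV as reported by Wong et al., 2021.
n_admissions = 10_000
prevalence = 0.07      # assumption for illustration
sensitivity = 0.33     # reported
ppv = 0.12             # reported

sepsis_cases = n_admissions * prevalence           # 700 true sepsis cases
true_positives = sepsis_cases * sensitivity        # 231 caught by the model
missed = sepsis_cases - true_positives             # 469 missed (67%)
total_alerts = true_positives / ppv                # 1,925 alerts fired
false_alerts = total_alerts - true_positives       # 1,694 false alarms (88%)

print(f"Missed sepsis cases: {missed:.0f} of {sepsis_cases:.0f}")
print(f"False alerts: {false_alerts:.0f} of {total_alerts:.0f}")
```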

Why it failed:

  • Model optimized for specificity (reducing false positives) at the expense of sensitivity
  • Sepsis definitions vary across institutions (“Sepsis-2” vs. “Sepsis-3” criteria)
  • Training data quality issues (mislabeled cases, selection bias)
  • Implemented without prospective validation at each site
  • No feedback loop for model updating

Lessons learned:

  • Require prospective validation before deployment
  • Monitor real-world performance continuously
  • Maintain clinical override capability
  • Track alert fatigue metrics
  • Don’t deploy widely without site-specific testing

Alternative Early Warning Systems

WAVE Clinical Platform:

  • Continuous monitoring of vital signs plus waveform data (ECG, plethysmography)
  • Predicts deterioration 6+ hours in advance
  • FDA-cleared Class II medical device
  • Used in some academic medical centers
  • Better specificity than the EDI, but requires continuous monitoring hardware

Rothman Index:

  • Proprietary algorithm from PeraHealth
  • Uses nursing assessments plus labs and vitals
  • Trend-based rather than threshold-based
  • Some evidence for predicting deterioration (Rothman et al., 2013)
  • Integration challenges with non-Epic EHRs

Implementation Framework for Deterioration Prediction

Before deploying:

  1. Prospective validation at your institution
    • Run model silently for 3-6 months
    • Compare predictions to actual outcomes
    • Calculate sensitivity, specificity, PPV, and NPV for your patient population (see the computation sketch after this list)
    • Determine appropriate alert thresholds
  2. Establish clear response protocols
    • Who responds to alerts? (RRT, primary team, nurse manager)
    • What triggers immediate response vs. enhanced monitoring?
    • How to document AI-triggered evaluations?
    • Feedback mechanism when alerts are false positives
  3. Monitor alert fatigue
    • Track alert frequency per nurse shift
    • Measure time from alert to response
    • Survey frontline staff quarterly
    • Adjust thresholds if >60% are false positives
  4. Plan for model updating
    • How often will model be retrained?
    • Who monitors for model drift?
    • Process for incorporating local data
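
Item 1’s metrics are simple to compute once the silent pilot links each alert (or non-alert) to the observed outcome. A minimal sketch, assuming you can export paired alert/outcome flags per patient-day (the data format here is hypothetical):

```python
# Minimal sketch: operating characteristics from a silent-pilot log.
# Assumes paired booleans (alert_fired, deteriorated) per patient-day.

def operating_characteristics(pairs):
    """pairs: iterable of (alert_fired: bool, deteriorated: bool) tuples."""
    tp = sum(1 for a, d in pairs if a and d)
    fp = sum(1 for a, d in pairs if a and not d)
    fn = sum(1 for a, d in pairs if not a and d)
    tn = sum(1 for a, d in pairs if not a and not d)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Toy data: 3 true alerts, 7 false alerts, 1 missed event, 89 quiet patient-days
log = [(True, True)] * 3 + [(True, False)] * 7 + [(False, True)] + [(False, False)] * 89
print(operating_characteristics(log))  # ppv = 0.3: 70% of alerts are false
```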

Red flags:

  • Vendor won’t provide local validation data
  • Alert thresholds can’t be adjusted for your population
  • No clear workflow integration plan
  • Model can’t be turned off if performance deteriorates

Part 2: Readmission Risk Prediction

The Clinical Problem

30-day hospital readmissions cost Medicare $17 billion annually. CMS penalizes hospitals with high readmission rates. Targeting high-risk patients for enhanced discharge planning and follow-up theoretically reduces readmissions.

Traditional approach: LACE index (Length of stay, Acuity, Comorbidities, Emergency visits) or HOSPITAL score
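
For orientation, the traditional LACE calculation is simple enough to sketch in a few lines. The point values below are the commonly cited ones from van Walraven et al. (2010); confirm your institution’s version before relying on it:

```python
def lace_score(los_days: int, emergent: bool, charlson: int, ed_visits_6mo: int) -> int:
    """Traditional LACE readmission score (point values per van Walraven et al., 2010)."""
    if los_days < 1:
        l = 0
    elif los_days <= 3:
        l = los_days                        # 1-3 days -> 1-3 points
    elif los_days <= 6:
        l = 4
    elif los_days <= 13:
        l = 5
    else:
        l = 7
    a = 3 if emergent else 0                # acute/emergent admission
    c = charlson if charlson <= 3 else 5    # Charlson comorbidity index, capped
    e = min(ed_visits_6mo, 4)               # ED visits in prior 6 months, capped
    return l + a + c + e                    # range 0-19; ~10+ often treated as high risk

# Example: 5-day emergent stay, Charlson 4, two recent ED visits -> 4 + 3 + 5 + 2 = 14
print(lace_score(5, True, 4, 2))
```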

AI enhancement: Add hundreds of variables from EHR to improve prediction accuracy

Evidence for AI-Enhanced Readmission Prediction

Major studies:

Rajkomar et al. (2018), a Google/UCSF/Stanford/Chicago collaboration:

  • Deep learning on EHR data from 216,000 hospitalizations
  • Predicted readmissions with C-statistic 0.75-0.76
  • Used 100,000+ variables (vs. ~10 in traditional scores)
  • Better than traditional models, but only modestly (0.75 vs. 0.70)

Key findings:

  • Modest accuracy improvement over simple models
  • No evidence that improved predictions lead to fewer readmissions
  • Poor model interpretability makes it hard to know why a patient is high-risk
  • External validation showed performance drops at different institutions

The intervention problem:

Even perfect prediction doesn’t reduce readmissions if we lack effective interventions. Meta-analyses show:

  • Care transitions programs: small benefit (2-3% absolute reduction)
  • Post-discharge phone calls: no consistent benefit
  • Medication reconciliation: prevents adverse drug events but not readmissions
  • Home visits: expensive, modest benefit in heart failure only

Bottom line: AI predicts readmissions reasonably well, but we still don’t know how to prevent them effectively.

Practical Use of Readmission Models

What works:

  1. Risk stratification for care transitions programs
    • Target the highest-risk 10% of patients for intensive discharge planning
    • Assign to care transition nurses
    • Ensure a 7-day follow-up appointment
  2. Identifying modifiable risk factors
    • Polypharmacy (>10 medications)
    • Lack of primary care
    • Uncontrolled symptoms at discharge
    • Poor health literacy
  3. Documentation for value-based care
    • Support risk-adjusted quality metrics
    • Identify patients for bundled payment programs

What doesn’t work:

  • Predicting readmissions to avoid admitting patients (unethical, and illegal in many cases)
  • Automatic discharge delays for high-risk patients (no evidence of benefit)
  • Generic “readmission reduction interventions” without individualization

Part 3: Chronic Disease Management

Diabetes Management AI

Continuous glucose monitoring (CGM) + AI:

Closed-loop insulin systems (“artificial pancreas”):

  • Medtronic 670G, Tandem Control-IQ, Omnipod 5
  • FDA-cleared hybrid closed-loop systems
  • Algorithms adjust basal insulin based on CGM data
  • Evidence: improve time-in-range by 10-15% and reduce hypoglycemia (Brown et al., 2019)
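
The control logic inside commercial hybrid closed-loop systems is proprietary and far more sophisticated than anything shown here; the toy sketch below only conveys the shape of the problem (CGM reading in, basal-rate adjustment out) and has no clinical validity:

```python
# Toy illustration only: a proportional basal adjustment from a CGM reading.
# All numbers (target, gain, suspend threshold) are arbitrary, not clinical values.
def adjust_basal(cgm_mg_dl: float, baseline_rate_u_per_h: float) -> float:
    """Scale the basal rate with distance from a 120 mg/dL target."""
    if cgm_mg_dl < 70:
        return 0.0                           # suspend delivery on hypoglycemia
    gain = 0.01                              # arbitrary illustrative gain
    factor = max(0.0, 1.0 + gain * (cgm_mg_dl - 120.0))
    return round(baseline_rate_u_per_h * factor, 2)

for glucose in (60, 120, 200):
    print(glucose, "mg/dL ->", adjust_basal(glucose, 1.0), "U/h")
```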

For hospitalized patients:

  • AI-enhanced insulin dosing protocols
  • Hypoglycemia prediction from CGM trends
  • Some evidence for reducing severe hypoglycemia in the ICU (Chase et al., 2018)
  • Not yet widely deployed: most hospitals use traditional sliding-scale or protocol-driven dosing

Outpatient diabetes AI:

  • Pattern recognition in CGM data (meal response, exercise, sleep impact)
  • Predictive hypoglycemia alerts (Dexcom G6 “Urgent Low Soon”)
  • Insulin dose titration recommendations
  • Evidence: modest HbA1c improvements (0.3-0.5% vs. standard CGM) (Weisman et al., 2017)

Heart Failure Remote Monitoring

Implantable device data + AI:

CardioMEMS system (Abbott):

  • Implanted pulmonary artery pressure sensor
  • Daily measurements transmitted wirelessly
  • Algorithms detect early decompensation from pressure trends
  • Alerts clinicians before symptoms develop

Evidence from the CHAMPION trial (Abraham et al., 2011):

  • 37% reduction in heart failure hospitalizations
  • Benefit sustained over 5+ years
  • FDA-approved and covered by CMS
  • Cost-effective (~$20,000 device vs. $40,000 per HF hospitalization)

Other remote monitoring:

  • Weight and symptom apps: mixed evidence, often abandoned by patients
  • Wearable sensors (Apple Watch, Fitbit): investigational
  • ECG patches: promising for arrhythmia detection, unclear for HF

Why CardioMEMS works where home monitoring often doesn’t:

  • Objective physiologic data (PA pressure) rather than subjective symptoms
  • No patient adherence required (automatic transmission)
  • A clear clinical action (adjust diuretics) rather than a vague “see your doctor”
  • Proven RCT evidence before widespread deployment

COPD Exacerbation Prediction

Approaches:

  • Daily symptom questionnaires with AI pattern recognition
  • Spirometry plus machine learning
  • Wearable sensors (activity, respiratory rate, oxygen saturation)

Evidence:

  • Most studies are small, single-center, and retrospective
  • Modest prediction accuracy (C-statistic 0.65-0.75)
  • High false positive rates: every cold triggers an “impending exacerbation” alert
  • No RCT evidence that prediction prevents exacerbations or hospitalizations

Why COPD AI lags behind HF:

  • Exacerbations are more heterogeneous (infectious vs. non-infectious, cardiac vs. pulmonary)
  • Patient adherence to monitoring is poor
  • No clear “rescue intervention” analogous to diuretic adjustment in HF
  • Many exacerbations resolve spontaneously without intervention

Part 4: Medication Safety and Management

AI-Enhanced Drug-Drug Interaction (DDI) Checking

The problem with traditional DDI alerts:

  • 90-95% override rate due to excessive, irrelevant alerts (Phansalkar et al., 2010)
  • Alert fatigue leads to dangerous overrides (missing critical interactions)
  • Rule-based systems don’t consider clinical context

AI enhancements:

  • Machine learning to predict which DDI alerts are clinically significant
  • Context-aware filtering that considers dose, duration, and patient factors (see the sketch below)
  • Phenotype-based risk stratification
  • Natural language processing of clinical notes to identify relevant contraindications
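
A hypothetical sketch of what context-aware filtering can look like: a conventional rule fires the interaction, and a second layer decides whether the alert is worth a clinician’s attention. The severity tiers and thresholds below are illustrative placeholders, not a real ruleset:

```python
from dataclasses import dataclass

@dataclass
class DdiAlert:
    drug_a: str
    drug_b: str
    severity: str          # "contraindicated", "major", or "moderate"
    dose_fraction: float   # current dose as a fraction of typical maximum
    duration_days: int     # planned overlap of the two drugs

def should_fire(alert: DdiAlert) -> bool:
    """Suppress only low-stakes alerts; never filter absolute contraindications."""
    if alert.severity in ("contraindicated", "major"):
        return True
    # Moderate interactions: suppress short, low-dose overlaps (illustrative rule)
    return alert.dose_fraction > 0.5 or alert.duration_days > 7

print(should_fire(DdiAlert("drug_x", "drug_y", "moderate", 0.3, 3)))  # False: suppressed
```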

Evidence:

  • Some reduction in alert burden (30-50%) while maintaining safety (Jung et al., 2021)
  • No RCT evidence yet for improved patient outcomes
  • Implementation challenges with legacy EHR systems

Deprescribing Recommendations

AI to identify inappropriate polypharmacy:

  • Screen for Beers Criteria medications in the elderly
  • Identify duplicate therapies
  • Detect medications without a clear indication
  • Suggest deprescribing based on life expectancy and goals of care

Example tools:

  • MedSafer (Canada): deprescribing decision support
  • TRIM (Tool to Reduce Inappropriate Medication): ML-based
  • Epic-embedded alerts for high-risk medications

Evidence:

  • Reduces inappropriate prescriptions by 15-25% when paired with pharmacist review (Farrell et al., 2021)
  • No consistent mortality benefit
  • Patient and family education is critical for acceptance

Personalized Dosing

Pharmacokinetic/pharmacodynamic (PK/PD) modeling + AI:

Promising areas:

  1. Vancomycin dosing: predicting trough levels and adjusting for renal function (a toy PK sketch follows this list)
  2. Warfarin dosing: integrating genetic variants (VKORC1, CYP2C9) with clinical factors
  3. Chemotherapy: BSA-based dosing vs. AI-optimized dosing for toxicity reduction
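
To see why much of this reduces to pharmacokinetics rather than machine learning (as the reality check below notes), here is a minimal one-compartment sketch of a steady-state vancomycin trough. It treats each dose as an IV bolus and uses rough population estimates (Vd ≈ 0.7 L/kg; clearance ≈ 0.75 × creatinine clearance, both assumptions); real dosing software models the infusion and Bayesian-updates from measured levels. Illustration only, not for clinical use:

```python
import math

def predicted_trough(dose_mg: float, tau_h: float, weight_kg: float, crcl_ml_min: float) -> float:
    """Steady-state trough (mg/L) from a one-compartment IV-bolus model (illustrative)."""
    vd_l = 0.7 * weight_kg                     # population volume of distribution
    cl_l_h = 0.75 * crcl_ml_min * 60 / 1000    # clearance estimate, mL/min -> L/h
    ke = cl_l_h / vd_l                         # first-order elimination rate constant
    peak = dose_mg / vd_l                      # post-dose concentration (bolus assumption)
    return peak * math.exp(-ke * tau_h) / (1 - math.exp(-ke * tau_h))

# Example: 1,000 mg every 12 h, 70 kg patient, CrCl 80 mL/min -> ~14 mg/L
print(f"{predicted_trough(1000, 12, 70, 80):.1f} mg/L")
```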

Reality check:

  • Therapeutic drug monitoring still requires clinical judgment
  • Most “AI dosing” is just better PK modeling, not true machine learning
  • Limited availability of genetic testing constrains personalized dosing
  • Cost-benefit is unclear for most medications

Part 5: Diagnostic Decision Support

Differential Diagnosis Generation

Traditional tools:

  • Isabel DDx: symptom and finding input → differential diagnosis list
  • DXplain (MGH): Bayesian inference
  • UpToDate “Clinical Decision Support”

AI-enhanced tools:

  • Google Health studies on differential diagnosis from clinical vignettes (not publicly available)
  • Babylon Health symptom checker (UK-based telemedicine)
  • Ada Health (app-based symptom assessment)

Evidence:

  • Most perform poorly compared with experienced physicians (Semigran et al., 2015)
  • Useful as learning tools for junior residents
  • NOT safe for autonomous diagnosis: physician oversight required
  • Medicolegal risk if relied upon exclusively

When these tools fail:

  • Rare diseases (absent from training data)
  • Atypical presentations
  • Multiple simultaneous problems (multimorbidity)
  • Social determinants not captured in structured data

Lab Result Interpretation AI

Current capabilities:

  • Flagging abnormal results (traditional rule-based logic, not true AI)
  • Suggesting follow-up testing based on patterns
  • Predicting lab values, e.g., tomorrow’s creatinine from the trend (a toy sketch follows this list)
  • Identifying critical results requiring immediate action
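
The creatinine example is simple enough to sketch: fit a line through recent values and extrapolate one day ahead. Real systems use far richer features; the values here are hypothetical:

```python
from statistics import linear_regression  # Python 3.10+

days = [0, 1, 2, 3]
creatinine = [1.1, 1.3, 1.6, 2.0]          # mg/dL, a rising trend

slope, intercept = linear_regression(days, creatinine)
tomorrow = slope * 4 + intercept
print(f"Predicted day-4 creatinine: {tomorrow:.2f} mg/dL")  # 2.25 mg/dL
```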

Challenges:

  • Reference ranges vary by lab, population, and clinical context
  • What’s “normal” for one patient may be abnormal for another
  • Trend analysis is more valuable than single values
  • Overreliance leads to overordering of tests

Part 6: Implementation Challenges Specific to Hospital Medicine

EHR Integration Complexity

Why hospital AI deployment is harder than radiology AI:

How the two settings compare:

  • Data format: standardized DICOM in radiology vs. heterogeneous HL7, FHIR, and proprietary formats in hospital medicine
  • Workflow: single-modality review vs. multiple interruptions across fragmented workflows
  • Decision timeframe: minutes to hours vs. seconds to minutes
  • Alert volume: 5-10 per shift vs. 100+ per shift
  • Teams involved: radiologist plus ordering physician vs. primary team, consultants, nursing, and pharmacy
  • Liability: clear (the radiologist reads) vs. diffuse (who owns the AI alert?)

Real-world EHR challenges:

  1. Alert fatigue: physicians already override 90% of traditional alerts
  2. Data quality: missing vitals, delayed lab entry, copy-paste notes
  3. Workflow disruption: no time to investigate why the AI flagged a patient as high-risk
  4. Handoff communication: the day team’s AI alerts get lost in sign-out
  5. Institutional variation: the same model performs differently across hospitals

The Alert Fatigue Crisis

Quantifying the problem:

  • The average hospitalized patient generates 100-700 alerts per day (Sendelbach & Funk, 2013)
  • 85-99% are false positives or clinically irrelevant
  • Nurses silence alarms without assessment (desensitization)
  • Associated with adverse events when real alarms are missed

AI’s contribution:

  • Potential: reduce false alerts through intelligent filtering
  • Reality: often adds to the alert burden without solving the root problem

Solutions:

  1. Tiered alert system: critical vs. warning vs. informational (see the sketch below)
  2. Intelligent alert grouping: combine related alerts
  3. Automatic alert resolution: silence alerts when the trigger resolves
  4. Customizable thresholds: adjust for patient and unit baselines
  5. Regular alert audits: disable low-value alerts
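
A minimal sketch of solution 1, a tiered policy mapping a risk score to a graded response; the cutoffs and actions are hypothetical placeholders, not validated thresholds:

```python
def triage_alert(risk_score: float) -> str:
    """Map a 0-100 deterioration risk score to a response tier (illustrative cutoffs)."""
    if risk_score >= 85:
        return "CRITICAL: page rapid response team"
    if risk_score >= 65:
        return "WARNING: bedside nurse assessment within 30 minutes"
    if risk_score >= 45:
        return "INFO: flag for next scheduled rounds"
    return "NONE: no action"

for score in (92, 70, 50, 20):
    print(score, "->", triage_alert(score))
```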

Handoff and Team Communication

AI model handoff problems:

  • The morning deterioration score doesn’t carry over to the night team
  • Consult teams don’t see the hospitalist team’s AI alerts
  • Readmission models run at discharge, too late for intervention

Proposed solutions:

  • Integrate AI alerts into structured handoff tools (I-PASS)
  • Display model scores prominently in EHR summary views
  • Alert both the primary team and consultants for relevant findings

Liability for AI-Assisted Decisions

Who is liable when AI makes a wrong recommendation?

Current legal framework (U.S.):

  • Physicians remain liable for all clinical decisions
  • “The AI told me to” is not a malpractice defense
  • Independent clinical judgment must be used to override when appropriate
  • Rationale must be documented when overriding AI recommendations

Hospital liability:

  • Institutions are liable for deploying untested or unvalidated AI
  • Must have a governance structure for AI oversight
  • Must provide training on appropriate use
  • Vicarious liability for resident and fellow errors involving AI

Vendor liability:

  • Generally shielded by the “decision support” designation
  • FDA Class II devices carry a higher liability standard
  • Breach of warranty if performance is misrepresented

Risk mitigation:

  1. Use only FDA-cleared tools for high-stakes decisions
  2. Document all AI-influenced decisions
  3. Maintain clinical override capability
  4. Validate locally before deployment
  5. Monitor performance continuously

Part 7: Cost-Benefit Analysis

Does Hospital AI Save Money?

Theoretical cost savings:

  • Prevent 1 cardiac arrest → save ~$100,000 (ICU stay, complications)
  • Prevent 1 readmission → save ~$15,000 (penalty avoidance plus costs)
  • Reduce length of stay by 0.5 days → save ~$2,000 per patient

Actual financial reality:

Epic Deterioration Index:

  • License cost: $50,000-250,000 annually (scaled to bed count)
  • RRT activation cost: $500-1,000 per activation
  • False positive RRT activations: 70-80%
  • Net cost per true positive: $15,000-25,000
  • Cost-effective only if it prevents deterioration or reduces ICU days

Readmission prediction models:

  • Care transitions programs cost $500-1,500 per high-risk patient
  • Readmission reduction: 2-3 absolute percentage points
  • Number needed to treat: 30-50
  • Cost per readmission prevented: $15,000-75,000
  • May not be cost-effective if readmissions are already below the CMS penalty threshold

CardioMEMS (heart failure):

  • Device plus implantation: ~$20,000
  • Monitoring service: $1,000-2,000 per year
  • Average HF hospitalization: $15,000-40,000
  • Cost-effective if it prevents 1+ hospitalization over 3 years (which it does)
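
The arithmetic behind these estimates is worth making explicit. A minimal sketch using midpoint figures from the ranges quoted above:

```python
# Readmission program: number needed to treat = 1 / absolute risk reduction
arr = 0.025                                  # 2.5 percentage-point reduction (midpoint)
nnt = 1 / arr                                # 40 patients enrolled per readmission prevented
program_cost = 1_000                         # dollars per high-risk patient (midpoint)
print(f"NNT {nnt:.0f}; cost per readmission prevented ${nnt * program_cost:,.0f}")

# CardioMEMS: break-even point in prevented hospitalizations over 3 years
device_total = 20_000 + 3 * 1_500            # device/implant + 3 years of monitoring
hf_admission = 25_000                        # midpoint of the $15,000-40,000 range
print(f"Break-even at {device_total / hf_admission:.1f} prevented admissions")
```

At the midpoints, a care transitions program costs about $40,000 per readmission prevented, and CardioMEMS breaks even at roughly one prevented hospitalization over three years, consistent with the trial evidence above.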

Bottom line: Hospital AI may improve care quality but often doesn’t save money, due to:

  • High implementation costs
  • Low intervention effectiveness even with good prediction
  • False positives consuming resources

Part 8: The Future of Hospital AI

Promising Emerging Applications

1. Natural Language Processing for Clinical Notes

  • Auto-complete discharge summaries
  • Extract relevant information for handoffs
  • Identify documentation gaps
  • Status: Experimental, some vendor pilots

2. Computer Vision for Patient Monitoring

  • Fall detection from room cameras
  • Delirium assessment from facial expressions and movement
  • Pressure ulcer risk from posture analysis
  • Status: Investigational, with privacy concerns

3. Reinforcement Learning for Treatment Optimization

  • Optimal fluid management in sepsis
  • Mechanical ventilator weaning protocols
  • Antibiotic stewardship decision support
  • Status: Research phase, not ready for clinical deployment

4. LLM Integration

  • ChatGPT-style interfaces for clinical questions
  • Automated medical-necessity documentation
  • Patient education materials generation
  • Status: Active area of vendor development (see Chapter 23)

What’s Not Coming (Despite the Hype)

Fully autonomous hospital AI. Too many uncontrolled variables, too much liability.

AI replacing hospitalists. Hospital medicine requires nuanced clinical judgment, patient communication, care coordination.

Perfect readmission prediction. Social determinants and patient behavior unpredictable.

Zero alert fatigue. Adding AI without removing low-value traditional alerts just shifts the problem.

Professional Society Guidelines on AI in Internal Medicine

ACP Position on AI in Healthcare (2024)

The American College of Physicians published “Artificial Intelligence in the Provision of Health Care” in Annals of Internal Medicine (June 2024), outlining 10 recommendations (Daneshvar et al., 2024):

Core Principles (selected):

  1. Augmented, not replaced decision-making: AI-enabled technology should be limited to a supportive role. ACP prefers the term “augmented intelligence” since tools should assist clinicians, not replace them.

  2. Transparency required: AI tools must be developed, tested, and used transparently while prioritizing privacy, clinical safety, and effectiveness.

  3. Health equity priority: AI should actively work to reduce, not exacerbate, health disparities.

  4. Federal oversight needed: Coordinated federal strategy involving governmental and non-governmental regulatory entities for AI oversight.

  5. Medical education integration: Training on AI in medicine should be provided at all levels. Physicians must be able to use technology AND make appropriate clinical decisions independently if AI becomes unavailable.

  6. Patient and clinician awareness: Patients, physicians, and other clinicians should be informed when AI tools are being used in treatment and decision-making.

  7. Reduce clinician burden: AI should be utilized to lower cognitive burden (patient intake, scheduling, prior authorization).

  8. Environmental consideration: Efforts to quantify and mitigate environmental impacts of AI should continue.

ACP AI Resource Hub:

ACP maintains an AI Resource Hub with curated resources including:

  • Generative AI for Internal Medicine Physicians: Self-paced primer covering LLM capabilities, terminology, and clinical use cases
  • AI-Powered Patient Simulation Tools: Practice motivational interviewing with virtual patients (alcohol use, obesity management)
  • DynaMedex with Dyna AI: Clinical decision support with AI-surfaced, evidence-based information (free for ACP members)
  • Annals of Internal Medicine AI Publications: Including the comprehensive “Large Language Models in Medicine: The Potentials and Pitfalls” narrative review (Omiye et al., 2024)

Society of Hospital Medicine (SHM)

SHM has engaged with AI through educational programming and position development. Key areas of focus include:

  • AI applications for sepsis prediction and early warning systems
  • Clinical decision support in inpatient settings
  • Documentation and coding assistance
  • Integration of AI alerts into hospitalist workflow

Implementation guidance: SHM emphasizes that AI tools should integrate with existing EHR workflows and not create additional alert burden for hospitalists already managing complex information environments.

AMA Principles for Augmented Intelligence (Endorsed by Multiple Societies)

The American Medical Association’s “Principles for Augmented Intelligence Development, Deployment, and Use” (2023) has been endorsed by multiple internal medicine societies. Key principles:

  1. AI should be designed to enhance physician decision-making
  2. Transparency in AI development and validation
  3. Physician authority over AI recommendations
  4. Protection of patient data and privacy
  5. Mitigation of algorithmic bias

Key Takeaways

  1. Start with well-validated tools. Epic Deterioration Index and HOSPITAL score have the most evidence.

  2. Demand local validation. External validation studies consistently show performance drops at new institutions.

  3. Have clear response protocols. AI predictions are worthless without clinical action plans.

  4. Monitor for alert fatigue. Track override rates, response times, clinician satisfaction.

  5. Be skeptical of autonomous recommendations. Treatment decisions require physician oversight.

  6. Understand the liability landscape. You remain responsible regardless of AI recommendations.

  7. Focus on implementation, not just accuracy. Workflow integration matters more than C-statistic improvements.

  8. Expect model drift. Hospital populations change, requiring periodic retraining.

  9. Learn from sepsis model failures. Prospective validation prevents harm.

  10. Cost-benefit is often unfavorable. AI may improve quality without reducing costs.

Clinical Scenario: Evaluating a Deterioration Prediction Tool

Your hospital is considering deploying WAVE Clinical Platform for early deterioration detection.

Questions to ask vendor:

  1. What is the sensitivity/specificity at your recommended threshold?
  2. Can we run a silent pilot for prospective validation at our institution?
  3. What is the false positive rate, and how will we manage alert fatigue?
  4. How does the model integrate with our Epic EHR?
  5. Who is responsible for responding to alerts: RRT, primary team, or both?
  6. What training is provided for frontline staff?
  7. How often is the model retrained, and with what data?
  8. What is the total cost of ownership (license + hardware + monitoring)?
  9. What is the FDA clearance status?
  10. Can you provide references from similar hospitals?

Red flags:

  • Vendor refuses prospective validation
  • Can’t provide false positive rates from comparable institutions
  • No clear workflow integration plan
  • Alert thresholds can’t be adjusted for your population
  • A “trust us, it works everywhere” attitude

Further Reading

Essential articles:

  • Wong, A. et al. (2021). External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients. JAMA Internal Medicine. doi:10.1001/jamainternmed.2021.2626
  • Rajkomar, A. et al. (2018). Scalable and accurate deep learning with electronic health records. NPJ Digital Medicine. doi:10.1038/s41746-018-0029-1
  • Bates, D.W. et al. (2014). Ten Commandments for Effective Clinical Decision Support. JAMIA.
  • Omiye, J.A. et al. (2024). Large Language Models in Medicine: The Potentials and Pitfalls: A Narrative Review. Annals of Internal Medicine, 177(2):210-220. doi:10.7326/M23-2772
  • Soleymanjahi, S. et al. (2024). Artificial Intelligence–Assisted Colonoscopy for Polyp Detection: A Systematic Review and Meta-analysis. Annals of Internal Medicine, 177:1652-1663. doi:10.7326/ANNALS-24-00981 [Meta-analysis of 44 RCTs showing AI-assisted colonoscopy increases adenoma detection rate (44.7% vs 36.7%) but also increases resection of nonneoplastic polyps]

Organizational resources:

  • ACP AI Resource Hub: Curated AI resources, courses, and clinical tools for internists
  • ACP Policy Position Paper on AI: Official ACP recommendations for AI in healthcare
  • Society of Hospital Medicine: Position statement on AI
  • CMS: Hospital readmissions reduction program data

For deeper dives:

  • See Chapter 16 (Evaluating AI Clinical Decision Support)
  • See Chapter 19 (Clinical AI Safety)
  • See Chapter 20 (Integration into Clinical Workflow)
  • See Chapter 21 (Medical Liability)

Check Your Understanding

Clinical Scenario 1: Your hospital deploys Epic Deterioration Index. After 3 months, you notice nurses frequently ignore the alerts. What should you investigate?

Answer:

This is classic alert fatigue. Investigate:

  1. False positive rate: What percentage of high-risk alerts led to actual deterioration? If >70% are false, threshold may be too sensitive.

  2. Alert volume: How many alerts per nurse per shift? If >15-20, likely overwhelming.

  3. Response burden: Does every alert require RRT activation, or is there a tiered response?

  4. Competing alerts: How many total alerts are nurses managing? EDI may be adding to existing burden.

  5. Training adequacy: Do nurses understand what the score means and when to escalate?

Solutions:

  • Raise the alert threshold (fewer alerts, higher specificity)
  • Implement a tiered response (high risk → RRT, medium risk → enhanced monitoring)
  • Remove low-value traditional alerts to make room for AI alerts
  • Provide refresher training on the score’s clinical significance

Bottom line: The most accurate model is worthless if frontline staff ignore it due to alert fatigue.

Clinical Scenario 2: A 78-year-old with CHF, COPD, CKD, and diabetes is flagged as high readmission risk (85th percentile). What interventions are evidence-based?

Answer:

Despite high predicted risk, evidence-based interventions to reduce readmissions are limited:

Definitely do:

  • Medication reconciliation (prevents adverse drug events)
  • Schedule follow-up within 7 days (reduces ED visits)
  • Ensure the patient has a primary care physician
  • Optimize chronic disease management before discharge
  • Assess and address health literacy and social barriers

Probably helpful:

  • Assign a care transitions nurse for a post-discharge phone call
  • Refer to disease management programs (CHF clinic, diabetes educator)
  • Consider CardioMEMS for advanced HF (NYHA III-IV)

Not evidence-based:

  • Delaying discharge solely due to high readmission risk
  • Generic “readmission prevention programs” without individualization
  • Prophylactic antibiotics or other medications
  • Mandatory home health (unless clinically indicated)

Key insight: Prediction is good, but interventions remain limited. Focus on addressing the modifiable risk factors specific to this patient.

Clinical Scenario 3: The Epic sepsis model alerts you about possible sepsis in a postoperative patient with borderline tachycardia (HR 105) and a normal lactate. The patient looks well, is eating, and has no source of infection. What do you do?

Answer:

This is likely a false positive alert. Appropriate response:

  1. Perform clinical assessment: Does patient meet SIRS criteria? Is there a suspected source of infection?

  2. Context matters: Postoperative tachycardia is common and often not sepsis.

  3. Don’t reflexively start sepsis bundle: IV fluids, broad-spectrum antibiotics not indicated without clinical suspicion.

  4. Document your reasoning: “Epic sepsis alert reviewed. Patient does not meet clinical criteria for sepsis. HR likely postoperative/pain-related. Will monitor. Discussed with attending.”

  5. Provide feedback: Report false positive to informatics team to adjust model threshold.

What NOT to do:

  • Ignore the alert without assessment
  • Start antibiotics “just to be safe”
  • Order an excessive workup (blood cultures, imaging) without clinical indication

Lesson: AI alerts require clinical context. Blindly following alerts is as dangerous as blindly ignoring them.

References