Internal Medicine and Hospital Medicine
Internal medicine AI confronts messy reality. Hospital patients commonly carry five or more comorbidities and ten or more medications, and their care is fragmented across rotating teams and time-pressured workflows interrupted by constant alerts. Unlike radiology’s standardized images or pathology’s discrete specimens, hospital medicine generates heterogeneous data under constraints that make AI deployment particularly challenging. This chapter examines which AI tools actually work in real hospital environments.
After reading this chapter, you will be able to:
- Evaluate AI-powered early warning systems for clinical deterioration
- Critically assess readmission prediction models and their limitations
- Understand AI applications in chronic disease management
- Navigate EHR-integrated clinical decision support systems
- Recognize failure modes specific to hospital AI implementations
- Apply evidence-based frameworks for selecting hospital AI tools
Introduction: The Complexity Challenge
Hospital medicine operates under unique constraints that make AI integration particularly challenging:
The perfect storm for AI failure:
1. High-complexity patients: the average hospitalized patient has 5+ comorbidities, 10+ medications, and multiple consultants
2. Fragmented care: rotating attendings, cross-covering residents, multiple shifts, discontinuity across transitions
3. Time pressure: 12-20 patients per hospitalist, interrupted workflows, limited time for AI model interrogation
4. EHR workflow constraints: AI alerts compete with 100+ other daily alerts
5. Heterogeneous data: vital signs every 4 hours, sporadic labs, unstructured clinical notes
These constraints explain why hospital AI deployment has been slower and more problematic than in specialties with standardized, high-volume, single-modal data (radiology, pathology, dermatology).
This chapter focuses on what actually works in real hospital environments, not what works in retrospective datasets.
Part 1: Patient Deterioration Prediction
The Clinical Problem
Hospitalized patients deteriorate gradually before cardiac arrest, respiratory failure, or septic shock. Traditional early warning scores (Modified Early Warning Score [MEWS], National Early Warning Score [NEWS]) use threshold-based rules that miss early subtle changes.
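Before turning to AI, it helps to see what a threshold-based score actually does. Below is a minimal Python sketch of a NEWS2-style aggregate score; the cut-points follow the published NEWS2 bands as recalled here and should be verified against the official chart, so treat the sketch as illustrative only.

```python
def news_score(rr, spo2, on_o2, temp, sbp, hr, alert):
    """Illustrative NEWS2-style aggregate score (verify bands against the official chart)."""
    score = 0
    # Respiratory rate (breaths/min)
    score += 3 if rr <= 8 else 1 if rr <= 11 else 0 if rr <= 20 else 2 if rr <= 24 else 3
    # Oxygen saturation (%)
    score += 3 if spo2 <= 91 else 2 if spo2 <= 93 else 1 if spo2 <= 95 else 0
    # Supplemental oxygen
    score += 2 if on_o2 else 0
    # Temperature (deg C)
    score += 3 if temp <= 35.0 else 1 if temp <= 36.0 else 0 if temp <= 38.0 else 1 if temp <= 39.0 else 2
    # Systolic blood pressure (mmHg)
    score += 3 if sbp <= 90 else 2 if sbp <= 100 else 1 if sbp <= 110 else 0 if sbp <= 219 else 3
    # Heart rate (beats/min)
    score += 3 if hr <= 40 else 1 if hr <= 50 else 0 if hr <= 90 else 1 if hr <= 110 else 2 if hr <= 130 else 3
    # Level of consciousness (AVPU): anything other than Alert scores 3
    score += 0 if alert else 3
    return score

# Several parameters must already be abnormal before the total crosses an
# escalation threshold (e.g., >= 5) -- the "late detection" problem.
print(news_score(rr=22, spo2=94, on_o2=False, temp=38.4, sbp=105, hr=112, alert=True))  # 7
```

Because each parameter contributes points only after it crosses a fixed band boundary, the aggregate score tends to move late. That lag is the gap AI-based deterioration models try to close.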
Could AI detect deterioration earlier and more accurately?
Epic Deterioration Index (EDI)
What it is: A machine learning model embedded in the Epic EHR that continuously calculates deterioration risk using:
- Vital signs (heart rate, BP, respiratory rate, temperature, SpO2)
- Lab values
- Medications administered
- Demographics and comorbidities
- Prior risk scores
How it’s deployed:
- Risk score 0-100 displayed in the Epic flowsheet
- Updates every 15 minutes with new data
- Thresholds trigger alerts to the Rapid Response Team (RRT)
- Used at 150+ hospitals in Epic networks
Evidence:
Successes:
- Retrospective studies show high discrimination (C-statistic 0.76-0.82) for predicting cardiac arrest, ICU transfer, or death within 24 hours (Singh et al., 2018)
- Detects deterioration 6-12 hours earlier than traditional NEWS scores
- Implementation at the University of Michigan showed a 35% reduction in cardiac arrests outside the ICU (Green et al., 2019)
Limitations:
- High false positive rate (70-80%): for every true deterioration, 3-4 false alerts
- Alert fatigue leads to desensitization; nurses and residents begin ignoring alerts
- External validation shows performance drops significantly at different hospitals
- Requires an active response protocol (RRT activation) to be effective
- No proven mortality benefit in RCTs: process improvement without outcome improvement
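A quick back-of-the-envelope calculation shows why a model with good discrimination still produces mostly false alarms when deterioration is rare. The sensitivity, specificity, and prevalence values below are illustrative assumptions, not published EDI operating points.

```python
# Why a "good" model still generates 3-4 false alerts per true deterioration.
# Illustrative operating point and prevalence -- not published EDI figures.
sensitivity = 0.80   # assumed fraction of deteriorations flagged
specificity = 0.93   # assumed fraction of stable patients not flagged
prevalence  = 0.02   # assumed 24-hour deterioration rate on a general ward

true_pos  = sensitivity * prevalence
false_pos = (1 - specificity) * (1 - prevalence)
ppv = true_pos / (true_pos + false_pos)

print(f"PPV = {ppv:.2f}")                                   # ~0.19
print(f"False alerts per true alert = {false_pos / true_pos:.1f}")  # ~4.3
```

With a 2% event rate, even this optimistic operating point yields a positive predictive value under 20%, which is exactly the 3-4 false alerts per true alert described above.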
The Epic Sepsis Model Disaster:
A cautionary tale for hospital AI deployment. Epic’s sepsis prediction model was:
- Trained on retrospective data from hundreds of hospitals
- Deployed widely across Epic networks (2016-2020)
- Used to trigger sepsis bundles (fluids, antibiotics, lactate measurement)
What went wrong (Wong et al., 2021):
1. Terrible sensitivity (33%): missed 67% of actual sepsis cases
2. Overwhelming false positives: 88% of alerts were false
3. Alert fatigue: clinicians stopped responding to alerts
4. Delayed care: some institutions relied on the model instead of clinical judgment
5. Legal liability: patients with missed sepsis; families sued
Why it failed:
- Model optimized for specificity (reducing false positives) at the expense of sensitivity
- Sepsis definitions vary across institutions (“Sepsis-2” vs. “Sepsis-3” criteria)
- Training data quality issues (mislabeled cases, selection bias)
- Implemented without prospective validation at each site
- No feedback loop for model updating
Lessons learned:
- Require prospective validation before deployment
- Monitor real-world performance continuously
- Maintain clinical override capability
- Track alert fatigue metrics
- Don’t deploy widely without site-specific testing
Alternative Early Warning Systems
WAVE Clinical Platform:
- Continuous monitoring of vital signs plus waveform data (ECG, plethysmography)
- Predicts deterioration 6+ hours in advance
- FDA-cleared Class II medical device
- Used in some academic medical centers
- Better specificity than EDI but requires continuous monitoring hardware
Rothman Index:
- Proprietary algorithm from PeraHealth
- Uses nursing assessments plus labs and vitals
- Trend-based rather than threshold-based
- Some evidence for predicting deterioration (Rothman et al., 2013)
- Integration challenges with non-Epic EHRs
Implementation Framework for Deterioration Prediction
Before deploying:
- Prospective validation at your institution
- Run model silently for 3-6 months
- Compare predictions to actual outcomes
- Calculate sensitivity, specificity, PPV, and NPV for your patient population (a worked sketch follows this checklist)
- Determine appropriate alert thresholds
- Establish clear response protocols
- Who responds to alerts? (RRT, primary team, nurse manager)
- What triggers immediate response vs. enhanced monitoring?
- How to document AI-triggered evaluations?
- Feedback mechanism when alerts are false positives
- Monitor alert fatigue
- Track alert frequency per nurse shift
- Measure time from alert to response
- Survey frontline staff quarterly
- Adjust thresholds if >60% are false positives
- Plan for model updating
- How often will model be retrained?
- Who monitors for model drift?
- Process for incorporating local data
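The confusion-matrix arithmetic referenced in the checklist above is simple enough to run on an export from a silent pilot. A minimal sketch, assuming a CSV with one row per patient-day containing the model's flag and the observed outcome (the file name and column names are hypothetical):

```python
import csv

# Tally a silent-pilot export: one row per patient-day.
# "alert_fired" and "deteriorated_24h" are hypothetical column names.
tp = fp = fn = tn = 0
with open("silent_pilot.csv", newline="") as f:
    for row in csv.DictReader(f):
        flagged = row["alert_fired"] == "1"
        event = row["deteriorated_24h"] == "1"
        if flagged and event:       tp += 1
        elif flagged and not event: fp += 1
        elif not flagged and event: fn += 1
        else:                       tn += 1

sensitivity = tp / (tp + fn) if tp + fn else float("nan")
specificity = tn / (tn + fp) if tn + fp else float("nan")
ppv = tp / (tp + fp) if tp + fp else float("nan")
npv = tn / (tn + fn) if tn + fn else float("nan")
print(f"Sens {sensitivity:.2f}  Spec {specificity:.2f}  PPV {ppv:.2f}  NPV {npv:.2f}")
```

Run the same tally by unit and by shift: thresholds that look acceptable hospital-wide can still be unusable on the ward that receives most of the alerts.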
Red flags:
- Vendor won’t provide local validation data
- Can’t adjust alert thresholds for your population
- No clear workflow integration plan
- Can’t turn off the model if performance deteriorates
Part 2: Readmission Risk Prediction
The Clinical Problem
30-day hospital readmissions cost Medicare $17 billion annually. CMS penalizes hospitals with high readmission rates. Targeting high-risk patients for enhanced discharge planning and follow-up theoretically reduces readmissions.
Traditional approach: LACE index (Length of stay, Acuity, Comorbidities, Emergency visits) or HOSPITAL score
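For context on what AI is trying to beat, the LACE index can be computed in a few lines. The point assignments below follow the commonly published scheme; treat them as illustrative and confirm against the original publication before any operational use.

```python
def lace_score(los_days, emergent_admission, charlson_index, ed_visits_6mo):
    """Illustrative LACE index; confirm point bands against the original publication."""
    # L: length of stay (days)
    if los_days < 1:     l = 0
    elif los_days == 1:  l = 1
    elif los_days == 2:  l = 2
    elif los_days == 3:  l = 3
    elif los_days <= 6:  l = 4
    elif los_days <= 13: l = 5
    else:                l = 7
    # A: acuity of admission (emergent vs. elective)
    a = 3 if emergent_admission else 0
    # C: Charlson comorbidity index, capped at 5 points
    c = charlson_index if charlson_index <= 3 else 5
    # E: emergency department visits in the prior 6 months, capped at 4
    e = min(ed_visits_6mo, 4)
    return l + a + c + e

# Example: 5-day emergent stay, Charlson 4, two prior ED visits -> 14
print(lace_score(5, True, 4, 2))
```

The AI models discussed next replace these four hand-picked variables with hundreds to thousands of EHR features, for the modest discrimination gains described below.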
AI enhancement: Add hundreds of variables from EHR to improve prediction accuracy
Evidence for AI-Enhanced Readmission Prediction
Major studies:
Rajkomar et al. (2018), a Google/UCSF/Stanford/Chicago collaboration:
- Deep learning on EHR data from 216,000 hospitalizations
- Predicted readmissions with C-statistic 0.75-0.76
- Used 100,000+ variables (vs. roughly 10 in traditional scores)
- Better than traditional models, but only modestly (0.75 vs. 0.70)
Key findings:
- Modest accuracy improvement over simple models
- No evidence that improved predictions lead to fewer readmissions
- Poor model interpretability, making it hard to know why someone is high-risk
- External validation showed a performance drop at different institutions
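The quoted C-statistic gap (0.75 vs. 0.70) is easier to interpret with the definition in hand: for a binary outcome, the C-statistic is the area under the ROC curve, i.e., the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen patient who was not readmitted. A minimal sketch with made-up probabilities and outcomes:

```python
from sklearn.metrics import roc_auc_score

# Made-up predicted readmission probabilities and observed 30-day outcomes.
y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.10, 0.55, 0.61, 0.15, 0.25, 0.65, 0.08, 0.72, 0.40, 0.26]

# Prints roughly 0.76 for this toy data.
print(f"C-statistic: {roc_auc_score(y_true, y_prob):.2f}")
```

A five-point gain in this metric reorders some patients in the risk ranking; whether that matters depends entirely on whether anyone can act on the ranking, which is the intervention problem discussed next.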
The intervention problem:
Even perfect prediction doesn’t reduce readmissions if we lack effective interventions. Meta-analyses show:
- Care transitions programs: small benefit (2-3% absolute reduction)
- Post-discharge phone calls: no consistent benefit
- Medication reconciliation: prevents adverse drug events but not readmissions
- Home visits: expensive, modest benefit in heart failure only
Bottom line: AI predicts readmissions reasonably well, but we still don’t know how to prevent them effectively.
Practical Use of Readmission Models
What works:
1. Risk stratification for care transitions programs
   - Target the highest-risk 10% of patients for intensive discharge planning
   - Assign to care transition nurses
   - Ensure a 7-day follow-up appointment
2. Identifying modifiable risk factors
   - Polypharmacy (>10 medications)
   - Lack of primary care
   - Uncontrolled symptoms at discharge
   - Poor health literacy
3. Documentation for value-based care
   - Support risk-adjusted quality metrics
   - Identify patients for bundled payment programs
What doesn’t work:
- Predicting readmissions to avoid admitting patients (unethical, and illegal in many cases)
- Automatic discharge delays for high-risk patients (no evidence of benefit)
- Generic “readmission reduction interventions” without individualization
Part 3: Chronic Disease Management
Diabetes Management AI
Continuous glucose monitoring (CGM) + AI:
Closed-loop insulin systems (“artificial pancreas”):
- Medtronic 670G, Tandem Control-IQ, Omnipod 5
- FDA-cleared hybrid closed-loop systems
- AI algorithms adjust basal insulin based on CGM data
- Evidence: improve time-in-range by 10-15%, reduce hypoglycemia (Brown et al., 2019)
For hospitalized patients:
- AI-enhanced insulin dosing protocols
- Predicts hypoglycemia from CGM trends
- Some evidence for reducing severe hypoglycemia in the ICU (Chase et al., 2018)
- Not yet widely deployed: most hospitals use traditional sliding-scale or protocol-driven dosing
Outpatient diabetes AI:
- Pattern recognition in CGM data (meal response, exercise, sleep impact)
- Predictive alerts for hypoglycemia (Dexcom G6 “Urgent Low Soon”)
- Insulin dose titration recommendations
- Evidence: modest HbA1c improvements (0.3-0.5% vs. standard CGM) (Weisman et al., 2017)
Heart Failure Remote Monitoring
Implantable device data + AI:
CardioMEMS system (Abbott):
- Implanted pulmonary artery pressure sensor
- Daily measurements transmitted wirelessly
- AI algorithms detect early decompensation (pressure trends)
- Alerts clinicians before symptoms develop
Evidence from the CHAMPION trial (Abraham et al., 2011):
- 37% reduction in heart failure hospitalizations
- Benefit sustained over 5+ years
- FDA-approved, covered by CMS
- Cost-effective (~$20,000 device vs. ~$40,000 per HF hospitalization)
Other remote monitoring:
- Weight and symptom apps (mixed evidence, many abandoned by patients)
- Wearable sensors (Apple Watch, Fitbit): investigational
- EKG patches: promising for arrhythmia detection, unclear for HF
Why CardioMEMS works but home monitoring often doesn’t:
- Objective physiologic data (PA pressure) vs. subjective symptoms
- No patient adherence required (automatic transmission)
- Clear clinical action (adjust diuretics) vs. a vague “see your doctor”
- Proven RCT evidence before widespread deployment
COPD Exacerbation Prediction
Approaches:
- Daily symptom questionnaires plus AI pattern recognition
- Spirometry plus machine learning
- Wearable sensors (activity, respiratory rate, oxygen saturation)
Evidence:
- Most studies are small, single-center, and retrospective
- Prediction accuracy C-statistic 0.65-0.75 (modest)
- High false positive rates: every cold triggers an “impending exacerbation” alert
- No RCT evidence that prediction prevents exacerbations or hospitalizations
Why COPD AI lags behind HF:
- Exacerbations are more heterogeneous (infectious vs. non-infectious, cardiac vs. pulmonary)
- Patient adherence to monitoring is poor
- No clear “rescue intervention” like diuretic adjustment in HF
- Many exacerbations resolve spontaneously without intervention
Part 4: Medication Safety and Management
AI-Enhanced Drug-Drug Interaction (DDI) Checking
The problem with traditional DDI alerts:
- 90-95% override rate due to excessive, irrelevant alerts (Phansalkar et al., 2010)
- Alert fatigue leads to dangerous overrides (missing critical interactions)
- Rule-based systems don’t consider clinical context
AI enhancements:
- Machine learning predicts which DDI alerts are clinically significant
- Context-aware filtering (considers dose, duration, patient factors)
- Phenotype-based risk stratification
- Natural language processing of clinical notes to identify relevant contraindications
Evidence:
- Some reduction in alert burden (30-50%) while maintaining safety (Jung et al., 2021)
- No RCT evidence yet for improved patient outcomes
- Implementation challenges with legacy EHR systems
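A minimal sketch of what context-aware filtering can look like: a plain rule-based interaction hit is escalated, downgraded, or suppressed based on duration and recent labs. The specific interaction, thresholds, and field names are illustrative assumptions, not a validated ruleset; in practice a machine learning classifier trained on override outcomes could replace the hand-written thresholds.

```python
from dataclasses import dataclass

@dataclass
class OrderContext:
    """Hypothetical context pulled from the EHR at order entry."""
    drug_a: str
    drug_b: str
    duration_days: int
    latest_potassium: float   # mmol/L
    latest_egfr: float        # mL/min/1.73 m2

def ddi_alert_tier(ctx: OrderContext) -> str:
    """Downgrade a rule-based hit using clinical context (illustrative logic only)."""
    pair = {ctx.drug_a, ctx.drug_b}
    if pair == {"spironolactone", "lisinopril"}:
        # Classic hyperkalemia interaction: escalate only when labs suggest real risk.
        if ctx.latest_potassium >= 5.0 or ctx.latest_egfr < 30:
            return "interruptive"   # hard-stop alert
        if ctx.duration_days > 3:
            return "passive"        # non-interruptive banner
        return "suppressed"         # short course, reassuring labs: no alert
    return "interruptive"           # unknown pairs fall back to the legacy rule

print(ddi_alert_tier(OrderContext("spironolactone", "lisinopril", 2, 4.1, 72)))  # suppressed
```

The design point is that the interaction knowledge stays in the rule base; the context layer only decides how loudly to present it, which is where most of the alert-burden reduction comes from.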
Deprescribing Recommendations
AI to identify inappropriate polypharmacy:
- Screen for Beers Criteria medications in the elderly
- Identify duplicate therapies
- Detect medications without a clear indication
- Suggest deprescribing based on life expectancy and goals of care
Example tools:
- MedSafer (Canada): deprescribing decision support
- TRIM (Tool to Reduce Inappropriate Medication): ML-based
- Epic-embedded alerts for high-risk medications
Evidence:
- Reduces inappropriate prescriptions by 15-25% when paired with pharmacist review (Farrell et al., 2021)
- No consistent mortality benefit
- Patient and family education is critical for acceptance
Personalized Dosing
Pharmacokinetic/pharmacodynamic (PK/PD) modeling + AI:
Promising areas:
1. Vancomycin dosing: AI predicts trough levels and adjusts dosing for renal function
2. Warfarin dosing: integrates genetic variants (VKORC1, CYP2C9) plus clinical factors
3. Chemotherapy: BSA-based dosing vs. AI-optimized dosing for toxicity reduction
Reality check:
- Therapeutic drug monitoring still requires clinical judgment
- Most “AI dosing” is just better PK models, not true machine learning
- Limited availability of genetic testing constrains adoption of personalized dosing
- Cost-benefit unclear for most medications
Part 5: Diagnostic Decision Support
Differential Diagnosis Generation
Traditional tools:
- Isabel DDx: symptom and finding input → differential diagnosis list
- DXplain (MGH): Bayesian inference
- UpToDate “Clinical Decision Support”
AI-enhanced tools:
- Google Health studies on differential diagnosis from clinical vignettes (not publicly available)
- Babylon Health symptom checker (UK-based, telemedicine)
- Ada Health (app-based symptom assessment)
Evidence:
- Most perform poorly compared with experienced physicians (Semigran et al., 2015)
- Useful for junior residents as learning tools
- NOT safe for autonomous diagnosis: require physician oversight
- Medicolegal risk if relied upon exclusively
When these tools fail:
- Rare diseases (not in training data)
- Atypical presentations
- Multiple simultaneous problems (multimorbidity)
- Social determinants not captured in structured data
Lab Result Interpretation AI
Current capabilities:
- Flag abnormal results (traditional rule-based, not true AI)
- Suggest follow-up testing based on patterns
- Predict lab values (e.g., predict tomorrow’s creatinine from the trend)
- Identify critical results requiring immediate action
Challenges:
- Reference ranges vary by lab, population, and clinical context
- What’s “normal” for one patient may be abnormal for another
- Trend analysis is more valuable than single values
- Overreliance leads to overordering tests
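As a concrete illustration of why the trend carries more information than any single value, here is a minimal sketch that extrapolates the next creatinine from a linear fit of recent results. The data are made up and the model is deliberately naive; real systems use richer models and more clinical context.

```python
import numpy as np

# Recent creatinine values (mg/dL) at 24-hour intervals; illustrative data.
days = np.array([0, 1, 2, 3])
creatinine = np.array([0.7, 0.8, 0.95, 1.1])

# Fit a straight line and extrapolate one day ahead.
slope, intercept = np.polyfit(days, creatinine, 1)
predicted_tomorrow = slope * 4 + intercept
print(f"Predicted day-4 creatinine: {predicted_tomorrow:.2f} mg/dL")  # ~1.23

# Each observed value sits within a typical reference range, so no
# rule-based "abnormal" flag fires -- but the steady rise (~0.14 mg/dL/day)
# is the actionable signal.
```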
Part 6: Implementation Challenges Specific to Hospital Medicine
EHR Integration Complexity
Why hospital AI deployment is harder than radiology AI:
| Challenge | Radiology AI | Hospital Medicine AI |
|---|---|---|
| Data format | Standardized (DICOM) | Heterogeneous (HL7, FHIR, proprietary) |
| Workflow | Single modality review | Multiple interruptions, fragmented |
| Decision timeframe | Minutes to hours | Seconds to minutes |
| Alert volume | 5-10 per shift | 100+ per shift |
| Teams involved | Radiologist + ordering MD | Primary team + consultants + nursing + pharmacy |
| Liability | Clear (radiologist reads) | Diffuse (who owns AI alert?) |
Real-world EHR challenges:
1. Alert fatigue: physicians already override 90% of traditional alerts
2. Data quality: missing vitals, delayed lab entry, copy-paste notes
3. Workflow disruption: no time to investigate why AI flagged a patient as high-risk
4. Handoff communication: the day team’s AI alerts are lost in sign-out
5. Institutional variation: the same model performs differently across hospitals
The Alert Fatigue Crisis
Quantifying the problem:
- The average hospitalized patient generates 100-700 alerts per day (Sendelbach & Funk, 2013)
- 85-99% are false positives or clinically irrelevant
- Nurses silence alarms without assessment (desensitization)
- Associated with adverse events when real alarms are missed
AI’s contribution:
- Potential: reduce false alerts through intelligent filtering
- Reality: often adds to alert burden without solving the root problem
Solutions:
1. Tiered alert system: critical vs. warning vs. informational
2. Intelligent alert grouping: combine related alerts
3. Automatic alert resolution: silence alerts when the trigger resolves
4. Customizable thresholds: adjust for patient and unit baseline
5. Regular alert audits: disable low-value alerts
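The regular alert audits above are mostly log aggregation. A minimal sketch, assuming an exported alert log with hypothetical column names (one row per alert, with disposition and time to acknowledgment):

```python
import pandas as pd

# Hypothetical alert-log export; column names are illustrative.
alerts = pd.DataFrame({
    "unit":           ["7W", "7W", "7W", "ICU", "ICU", "7W"],
    "shift_id":       ["d1", "d1", "n1", "d1", "d1", "n1"],
    "overridden":     [True, True, False, False, True, True],
    "minutes_to_ack": [42, 55, 6, 3, 30, 61],
})

summary = alerts.groupby("unit").agg(
    alerts_per_shift=("shift_id", lambda s: len(s) / s.nunique()),
    override_rate=("overridden", "mean"),
    median_min_to_ack=("minutes_to_ack", "median"),
)
print(summary)
```

Units with rising override rates and lengthening time-to-acknowledgment are the early warning sign of alert fatigue, and the place to retire low-value alerts first.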
Handoff and Team Communication
AI model handoff problems:
- The morning deterioration model score doesn’t carry over to the night team
- Consult teams don’t see the hospitalist team’s AI alerts
- The readmission model runs at discharge, too late for intervention
Proposed solutions:
- Integrate AI alerts into structured handoff tools (I-PASS)
- Display model scores prominently in EHR summary views
- Alert the primary team AND consultants for relevant findings
Liability for AI-Assisted Decisions
Who is liable when AI makes a wrong recommendation?
Current legal framework (U.S.):
- Physicians remain liable for all clinical decisions
- “The AI told me to” is not a malpractice defense
- Must use independent clinical judgment and override when appropriate
- Must document the rationale when overriding AI recommendations
Hospital liability:
- Institutions are liable for deploying untested or unvalidated AI
- Must have a governance structure for AI oversight
- Must provide training on appropriate use
- Vicarious liability for resident and fellow errors involving AI
Vendor liability:
- Generally shielded by the “decision support” designation
- FDA Class II devices carry a higher liability standard
- Breach of warranty if performance is misrepresented
Risk mitigation:
1. Use only FDA-cleared tools for high-stakes decisions
2. Document all AI-influenced decisions
3. Maintain clinical override capability
4. Validate locally before deployment
5. Monitor performance continuously
Part 7: Cost-Benefit Analysis
Does Hospital AI Save Money?
Theoretical cost savings:
- Prevent 1 cardiac arrest → save ~$100,000 (ICU stay, complications)
- Prevent 1 readmission → save ~$15,000 (penalty avoidance plus costs)
- Reduce hospital length of stay by 0.5 days → save ~$2,000 per patient
Actual financial reality:
Epic Deterioration Index:
- License cost: $50,000-250,000 annually (scaled to bed size)
- RRT activation cost: $500-1,000 per activation
- False positive RRT activations: 70-80%
- Net cost per true positive: $15,000-25,000
- Cost-effective only if it prevents deterioration or reduces ICU days
Readmission prediction models:
- Care transitions programs cost $500-1,500 per high-risk patient
- Readmission reduction: 2-3 absolute percentage points
- Number needed to treat: 30-50
- Cost per readmission prevented: $15,000-75,000
- May not be cost-effective if readmissions are below the CMS penalty threshold
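The readmission figures above follow directly from two inputs: per-patient program cost and absolute risk reduction. A quick sketch of the arithmetic, using mid-range values from this section:

```python
# Cost per readmission prevented = program cost per patient x number needed to treat.
program_cost_per_patient = 1_000      # mid-range care-transitions cost ($)
absolute_risk_reduction  = 0.025      # 2.5 percentage-point reduction

nnt = 1 / absolute_risk_reduction                     # ~40 patients per readmission prevented
cost_per_prevented = program_cost_per_patient * nnt   # ~$40,000

print(f"NNT = {nnt:.0f}, cost per readmission prevented = ${cost_per_prevented:,.0f}")
# Compare against the ~$15,000 cost of an average readmission (plus CMS penalty
# exposure) to judge whether the program pays for itself.
```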
CardioMEMS (heart failure):
- Device plus implant cost: ~$20,000
- Monitoring service: $1,000-2,000 per year
- Average HF hospitalization cost: $15,000-40,000
- Cost-effective if it prevents 1+ hospitalization over 3 years (which it does)
Bottom line: Hospital AI may improve care quality but often doesn’t save money, due to:
- High implementation costs
- Low intervention effectiveness even with good prediction
- False positives consuming resources
Part 8: The Future of Hospital AI
Promising Emerging Applications
1. Natural Language Processing for Clinical Notes
- Auto-complete discharge summaries
- Extract relevant information for handoffs
- Identify documentation gaps
- Status: experimental, some vendor pilots
2. Computer Vision for Patient Monitoring
- Fall detection from room cameras
- Delirium assessment from facial expressions and movement
- Pressure ulcer risk from posture analysis
- Status: investigational, privacy concerns
3. Reinforcement Learning for Treatment Optimization
- Optimal fluid management in sepsis
- Mechanical ventilator weaning protocols
- Antibiotic stewardship decision support
- Status: research phase, not ready for clinical deployment
4. LLM Integration
- ChatGPT-style interfaces for clinical questions
- Automated medical necessity documentation
- Patient education materials generation
- Status: active area of vendor development; see Chapter 23
What’s Not Coming (Despite the Hype)
Fully autonomous hospital AI. Too many uncontrolled variables, too much liability.
AI replacing hospitalists. Hospital medicine requires nuanced clinical judgment, patient communication, care coordination.
Perfect readmission prediction. Social determinants and patient behavior unpredictable.
Zero alert fatigue. Adding AI without removing low-value traditional alerts just shifts the problem.
Professional Society Guidelines on AI in Internal Medicine
The American College of Physicians published “Artificial Intelligence in the Provision of Health Care” in Annals of Internal Medicine (June 2024), outlining 10 recommendations (Daneshvar et al., 2024):
Core Principles:
Augmented, not replaced decision-making: AI-enabled technology should be limited to a supportive role. ACP prefers the term “augmented intelligence” since tools should assist clinicians, not replace them.
Transparency required: AI tools must be developed, tested, and used transparently while prioritizing privacy, clinical safety, and effectiveness.
Health equity priority: AI should actively work to reduce, not exacerbate, health disparities.
Federal oversight needed: Coordinated federal strategy involving governmental and non-governmental regulatory entities for AI oversight.
Medical education integration: Training on AI in medicine should be provided at all levels. Physicians must be able to use technology AND make appropriate clinical decisions independently if AI becomes unavailable.
Patient and clinician awareness: Patients, physicians, and other clinicians should be informed when AI tools are being used in treatment and decision-making.
Reduce clinician burden: AI should be utilized to lower cognitive burden (patient intake, scheduling, prior authorization).
Environmental consideration: Efforts to quantify and mitigate environmental impacts of AI should continue.
ACP AI Resource Hub:
ACP maintains an AI Resource Hub with curated resources including:
- Generative AI for Internal Medicine Physicians: Self-paced primer covering LLM capabilities, terminology, and clinical use cases
- AI-Powered Patient Simulation Tools: Practice motivational interviewing with virtual patients (alcohol use, obesity management)
- DynaMedex with Dyna AI: Clinical decision support with AI-surfaced, evidence-based information (free for ACP members)
- Annals of Internal Medicine AI Publications: Including the comprehensive “Large Language Models in Medicine: The Potentials and Pitfalls” narrative review (Omiye et al., 2024)
Society of Hospital Medicine (SHM)
SHM has engaged with AI through educational programming and position development. Key areas of focus include:
- AI applications for sepsis prediction and early warning systems
- Clinical decision support in inpatient settings
- Documentation and coding assistance
- Integration of AI alerts into hospitalist workflow
Implementation guidance: SHM emphasizes that AI tools should integrate with existing EHR workflows and not create additional alert burden for hospitalists already managing complex information environments.
AMA Principles for Augmented Intelligence (Endorsed by Multiple Societies)
The American Medical Association’s “Principles for Augmented Intelligence Development, Deployment, and Use” (2023) has been endorsed by multiple internal medicine societies. Key principles:
- AI should be designed to enhance physician decision-making
- Transparency in AI development and validation
- Physician authority over AI recommendations
- Protection of patient data and privacy
- Mitigation of algorithmic bias
Key Takeaways
Start with well-validated tools. Epic Deterioration Index and HOSPITAL score have the most evidence.
Demand local validation. External validation studies consistently show performance drops at new institutions.
Have clear response protocols. AI predictions worthless without clinical action plans.
Monitor for alert fatigue. Track override rates, response times, clinician satisfaction.
Be skeptical of autonomous recommendations. Treatment decisions require physician oversight.
Understand the liability landscape. You remain responsible regardless of AI recommendations.
Focus on implementation, not just accuracy. Workflow integration matters more than C-statistic improvements.
Expect model drift. Hospital populations change, requiring periodic retraining.
Learn from sepsis model failures. Prospective validation prevents harm.
Cost-benefit is often unfavorable. AI may improve quality without reducing costs.
Clinical Scenario: Evaluating a Deterioration Prediction Tool
Your hospital is considering deploying WAVE Clinical Platform for early deterioration detection.
Questions to ask vendor:
- What is the sensitivity/specificity at your recommended threshold?
- Can we run a silent pilot for prospective validation at our institution?
- What is the false positive rate, and how will we manage alert fatigue?
- How does the model integrate with our Epic EHR?
- Who is responsible for responding to alerts: RRT, primary team, or both?
- What training is provided for frontline staff?
- How often is the model retrained, and with what data?
- What is the total cost of ownership (license + hardware + monitoring)?
- What is the FDA clearance status?
- Can you provide references from similar hospitals?
Red flags:
- Vendor refuses prospective validation
- Can’t provide false positive rates from comparable institutions
- No clear workflow integration plan
- Can’t adjust alert thresholds for your population
- A “trust us, it works everywhere” attitude
Further Reading
Essential articles:
- Wong, A. et al. (2021). External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients. JAMA Internal Medicine. doi:10.1001/jamainternmed.2021.2626
- Rajkomar, A. et al. (2018). Scalable and accurate deep learning with electronic health records. NPJ Digital Medicine. doi:10.1038/s41746-018-0029-1
- Bates, D.W. et al. (2003). Ten Commandments for Effective Clinical Decision Support: Making the Practice of Evidence-Based Medicine a Reality. Journal of the American Medical Informatics Association, 10(6):523-530.
- Omiye, J.A. et al. (2024). Large Language Models in Medicine: The Potentials and Pitfalls: A Narrative Review. Annals of Internal Medicine, 177(2):210-220. doi:10.7326/M23-2772
- Soleymanjahi, S. et al. (2024). Artificial Intelligence–Assisted Colonoscopy for Polyp Detection: A Systematic Review and Meta-analysis. Annals of Internal Medicine, 177:1652-1663. doi:10.7326/ANNALS-24-00981 [Meta-analysis of 44 RCTs showing AI-assisted colonoscopy increases adenoma detection rate (44.7% vs 36.7%) but also increases resection of nonneoplastic polyps]
Organizational resources:
- ACP AI Resource Hub: Curated AI resources, courses, and clinical tools for internists
- ACP Policy Position Paper on AI: Official ACP recommendations for AI in healthcare
- Society of Hospital Medicine: Position statement on AI
- CMS: Hospital readmissions reduction program data
For deeper dives:
- See Chapter 16 (Evaluating AI Clinical Decision Support)
- See Chapter 19 (Clinical AI Safety)
- See Chapter 20 (Integration into Clinical Workflow)
- See Chapter 21 (Medical Liability)
Check Your Understanding
Clinical Scenario 1: Your hospital deploys Epic Deterioration Index. After 3 months, you notice nurses frequently ignore the alerts. What should you investigate?
Answer:
This is classic alert fatigue. Investigate:
False positive rate: What percentage of high-risk alerts led to actual deterioration? If >70% are false, threshold may be too sensitive.
Alert volume: How many alerts per nurse per shift? If >15-20, likely overwhelming.
Response burden: Does every alert require RRT activation, or is there a tiered response?
Competing alerts: How many total alerts are nurses managing? EDI may be adding to existing burden.
Training adequacy: Do nurses understand what the score means and when to escalate?
Solutions:
- Adjust the alert threshold higher (fewer alerts, higher specificity)
- Implement a tiered response (high risk → RRT, medium risk → enhanced monitoring)
- Remove low-value traditional alerts to make room for AI alerts
- Provide refresher training on clinical significance
Bottom line: The most accurate model is worthless if frontline staff ignore it because of alert fatigue.

Clinical Scenario 2: A 78-year-old with CHF, COPD, CKD, and diabetes is flagged as high readmission risk (85th percentile). What interventions are evidence-based?
Answer:
Despite high predicted risk, evidence-based interventions to reduce readmissions are limited:
Definitely do:
- Medication reconciliation (prevents adverse drug events)
- Schedule follow-up within 7 days (reduces ED visits)
- Ensure the patient has a primary care physician
- Optimize chronic disease management before discharge
- Assess and address health literacy and social barriers
Probably helpful:
- Assign to a care transitions nurse for a post-discharge phone call
- Refer to disease management programs (CHF clinic, diabetes educator)
- Consider CardioMEMS if advanced HF (NYHA III-IV)
Not evidence-based:
- Delaying discharge solely because of high readmission risk
- A generic “readmission prevention program” without individualization
- Prophylactic antibiotics or other medications
- Mandatory home health (unless clinically indicated)
Key insight: Prediction is good, but interventions remain limited. Focus on addressing the modifiable risk factors specific to this patient.

Clinical Scenario 3: The Epic sepsis model alerts you about possible sepsis in a postoperative patient with borderline tachycardia (HR 105) and a normal lactate. The patient looks well, is eating, and has no source of infection. What do you do?
Answer:
This is likely a false positive alert. Appropriate response:
Perform clinical assessment: Does patient meet SIRS criteria? Is there a suspected source of infection?
Context matters: Postoperative tachycardia is common and often not sepsis.
Don’t reflexively start sepsis bundle: IV fluids, broad-spectrum antibiotics not indicated without clinical suspicion.
Document your reasoning: “Epic sepsis alert reviewed. Patient does not meet clinical criteria for sepsis. HR likely postoperative/pain-related. Will monitor. Discussed with attending.”
Provide feedback: Report false positive to informatics team to adjust model threshold.
What NOT to do:
- Ignore the alert without assessment
- Start antibiotics “just to be safe”
- Order an excessive workup (blood cultures, imaging) without clinical indication
Lesson: AI alerts require clinical context. Blindly following alerts is as dangerous as blindly ignoring them.