Healthcare Policy and AI Governance

The FDA has authorized over 1,300 AI medical devices, approximately 96% via the 510(k) substantial-equivalence pathway, typically without new prospective clinical trials (FDA AI-Enabled Medical Devices). Epic’s widely deployed sepsis prediction model claimed 85% sensitivity in internal validation, but external testing revealed actual sensitivity of 33%, missing two-thirds of sepsis cases. Regulatory frameworks built for static medical devices struggle to govern AI systems that evolve through continuous learning. The EU AI Act classifies most clinical AI as high-risk requiring rigorous oversight, while U.S. regulation remains permissive. This divergence shapes what AI physicians can access and who bears liability when algorithms fail.

Learning Objectives

After reading this chapter, you will be able to:

  • Understand the evolving FDA regulatory framework for AI/ML-based medical devices
  • Evaluate international regulatory approaches (EU AI Act, WHO guidelines)
  • Recognize reimbursement challenges and evolving payment models for AI
  • Assess institutional governance frameworks for safe AI deployment
  • Navigate liability, accountability, and legal frameworks for medical AI
  • Implement hospital-level AI governance policies

The Regulatory Challenge:

Traditional medical device regulation assumes static products. A pacemaker cleared by FDA in 2020 is the same device in 2025. AI systems challenge this fundamentally: they learn, adapt, and evolve through retraining and algorithm updates. Regulators worldwide are adapting 20th-century frameworks to 21st-century learning systems.

Key Regulatory Developments:

  • FDA (U.S.): Over 1,300 AI/ML devices authorized as of late 2025, with approximately 96% cleared via the less rigorous 510(k) pathway (FDA AI-Enabled Medical Devices). The FDA’s Predetermined Change Control Plan (PCCP) framework allows pre-approved algorithm updates (FDA PCCP Guidance, 2024)
  • EU AI Act (2024): First comprehensive AI regulation globally. Medical AI classified as “high-risk” requiring transparency, human oversight, robustness testing, and bias audits (EU AI Act)
  • WHO: Published global guidance on AI ethics (2021, 2024), but guidelines are aspirational, not binding (WHO Ethics and Governance of AI for Health, 2021)
  • Reimbursement as forcing function: EHRs scaled after Meaningful Use ($27B in incentives). Telehealth scaled after CMS reimbursed it. Clinical AI will follow the same pattern or won’t scale. Only two AI applications exceed 10,000 CPT claims despite 1,300+ FDA clearances (Wu et al., NEJM AI, 2024). IDx-DR (CPT 92229) is a rare Medicare-covered exception
  • Operational AI economics: Administrative activities consume an estimated $950 billion annually in U.S. healthcare. NBER estimates AI adoption could save $200–360 billion annually, but achieving returns requires redesigning workflows around AI, not retrofitting AI into existing processes (Sahni et al., NBER, 2023)
  • 2025 Federal Policy Shift: Executive Order 14179 (January 2025) rescinded Biden-era AI oversight frameworks, prioritizing innovation over precaution. December 2025 EO established federal preemption of state AI laws. Federal regulatory appetite for new AI requirements has decreased; institutional governance becomes more critical.

What Works and What Doesn’t:

The Epic sepsis model case illustrates the oversight gap: deployed widely as proprietary, EHR-embedded decision support that never underwent FDA premarket review, the model showed sensitivity of only 33% (not the 85% claimed) in external validation at Michigan Medicine, with 67% of sepsis cases never triggering an alert (Wong et al., JAMA Intern Med, 2021). No regulatory action followed.

The Clinical Bottom Line:

  • FDA clearance ≠ clinical validation. Demand prospective, external validation before adopting AI
  • If your institution lacks a multidisciplinary AI oversight committee, advocate for one
  • “AI told me to” is not a valid malpractice defense. Document AI recommendations and your rationale for following or overriding them

Introduction

Medicine operates within complex regulatory and policy frameworks: FDA device approvals, CMS reimbursement decisions, state medical board oversight, institutional protocols, and malpractice liability standards. These structures emerged over decades to protect patients from unsafe drugs, devices, and practices. They assume products are static: a drug approved in 2020 is chemically identical in 2025.

AI challenges this assumption fundamentally. Machine learning systems evolve through retraining on new data, algorithm updates, and performance drift as patient populations change. How should regulators approve systems that change continuously? Who is liable when AI errs: developers who built it, hospitals that deployed it, or physicians who followed its recommendations?

The stakes are high:

  • Patient safety: Poorly regulated AI can harm thousands before problems are detected
  • Innovation: Over-regulation may stifle beneficial AI development
  • Equity: Biased regulatory frameworks may entrench disparities
  • Legal liability: Unclear accountability creates defensive medicine

Part 1: The Major Policy Failure: Epic Sepsis Model

The Case Study

What was promised: Epic’s sepsis prediction model (embedded in the EHR) would detect sepsis 6-12 hours before clinical recognition. The vendor claimed 85% sensitivity based on internal validation. As proprietary, EHR-embedded clinical decision support, the model was deployed widely without FDA premarket review.

What happened:

In 2021, Wong et al. published an external validation study in JAMA Internal Medicine testing the Epic sepsis model on 27,697 patients at Michigan Medicine (Wong et al., 2021):

  • Sensitivity: 33% (not 85% claimed)
  • 67% of sepsis cases never triggered an alert at any point
  • Positive predictive value: 12% (88% of alerts were false positives)
  • Area under the curve: 0.63 (poor discrimination)
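
The reported figures map onto standard confusion-matrix arithmetic. The sketch below uses illustrative counts chosen only to reproduce the published sensitivity and PPV approximately; they are not the study’s actual cell counts.

```python
# Illustrative confusion-matrix arithmetic for the reported Epic sepsis metrics.
# Counts are hypothetical, scaled to match ~33% sensitivity and ~12% PPV.
true_positives = 33    # sepsis cases that triggered an alert
false_negatives = 67   # sepsis cases that never alerted (67% missed)
false_positives = 242  # alerts fired on patients without sepsis

sensitivity = true_positives / (true_positives + false_negatives)  # ~0.33
ppv = true_positives / (true_positives + false_positives)          # ~0.12

print(f"Sensitivity: {sensitivity:.0%}")   # ~33% of sepsis cases detected
print(f"PPV: {ppv:.0%}")                   # ~12% of alerts are true sepsis
```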

Why regulatory oversight failed to prevent this:

  1. Retrospective validation only: the vendor’s internal validation relied on retrospective chart review, not prospective deployment
  2. Look-ahead bias: Training data included labs and vitals ordered after clinicians suspected sepsis, so the model learned to detect suspicion, not actual sepsis
  3. No external validation requirement: no regulator required testing at independent hospitals before deployment
  4. CDS exemption: EHR-embedded decision support of this kind largely falls outside FDA premarket review, so no clinical trial evidence was required before deployment

Regulatory response: No recall, warning letter, or enforcement action followed; the model remained in widespread clinical use.

Lessons:

  • Regulatory status ≠ clinical validation
  • Retrospective studies mislead due to look-ahead bias and confounding
  • External validation at independent institutions is essential
  • Post-market surveillance is inadequate

Part 2: FDA Regulation of AI/ML Medical Devices

Regulatory Pathways

510(k) Clearance (Substantial Equivalence):

  • Device is “substantially equivalent” to a predicate device already on market
  • Fastest, least burdensome pathway (median 151 days review time) (MDPI Biomedicines, 2024)
  • 97% of AI devices cleared via 510(k) pathway (2024) (MedTech Dive, 2024)

Premarket Approval (PMA):

  • Most rigorous review, requiring clinical trials demonstrating safety and effectiveness
  • Reserved for the highest-risk (Class III) devices

De Novo Classification:

  • For novel low- to moderate-risk devices with no predicate (median review time 372 days)
  • Establishes a new device type that can serve as a predicate for future 510(k) submissions
  • Example: IDx-DR diabetic retinopathy screening, the first autonomous AI diagnostic, received De Novo authorization in April 2018 (FDA De Novo Decision Summary)

Current State

Examples of FDA-cleared AI:

Category | Examples
Radiology | CAD: intracranial hemorrhage (Aidoc, Viz.ai), pulmonary embolism, lung nodules
Cardiology | ECG AFib detection (Apple Watch, AliveCor), echocardiogram EF estimation
Ophthalmology | IDx-DR/LumineticsCore diabetic retinopathy screening
Clinical Decision Support | Sepsis prediction, deterioration algorithms

Predetermined Change Control Plans (PCCP)

Traditional devices are “locked” after approval. The FDA’s PCCP framework addresses this for AI systems that need continuous updates (FDA PCCP Guidance, 2024):

What PCCP allows:

  • Manufacturer specifies anticipated changes (retraining, performance improvements)
  • FDA reviews and approves plan upfront
  • Specified changes proceed without new submissions

Components required:

  1. Description of modifications: Itemization of proposed changes with justifications
  2. Modification protocol: Methods for developing, validating, and implementing changes
  3. Impact assessment: Benefits, risks, and mitigations

Final guidance issued December 2024 broadened scope to all AI-enabled devices, not just ML-enabled devices.
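
For governance teams that track PCCP-covered products, the three required elements can be captured as a simple structured record. The sketch below is illustrative only; the field names and example content are hypothetical, and the FDA guidance specifies content, not a schema.

```python
# Hypothetical internal record of a vendor's Predetermined Change Control Plan.
pccp_record = {
    "device": "Example early-warning model",            # hypothetical product
    "description_of_modifications": [
        "Quarterly retraining on site-local data",
        "Threshold recalibration to maintain alert rate",
    ],
    "modification_protocol": {
        "data_management": "Curated retraining set with label audit",
        "validation": "Hold-out test set; sensitivity and PPV must not degrade",
        "implementation": "Staged rollout with silent-mode comparison",
    },
    "impact_assessment": {
        "benefits": "Maintains calibration as case mix drifts",
        "risks": "Silent performance regression between audits",
        "mitigations": "Automated monthly performance dashboard",
    },
}
```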

General Wellness Products: What Escapes FDA Oversight

Not all health-related software and devices require FDA clearance. The FDA’s General Wellness guidance (updated January 2026) defines products that fall outside medical device regulation entirely (FDA General Wellness Guidance, 2026).

The two-factor test:

A product qualifies as a general wellness product (not a medical device) if it meets BOTH criteria:

  1. Intended for general wellness use only: Claims relate to maintaining or encouraging a healthy lifestyle (weight management, physical fitness, relaxation, sleep management, mental acuity) OR relate healthy lifestyle choices to reducing risk of chronic diseases where this association is well-established
  2. Low risk: Not invasive, not implanted, does not involve technology posing safety risks without regulatory controls (lasers, radiation)
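
A minimal decision-logic sketch of the two-factor test above, for illustration only: the real determination turns on intended use and marketing claims in context, and the function and its inputs are hypothetical.

```python
def is_general_wellness_product(
    claims_limited_to_wellness: bool,   # factor 1: wellness-only intended use
    invasive: bool,
    implanted: bool,
    uses_risky_technology: bool,        # e.g., lasers or radiation without controls
) -> bool:
    """Illustrative encoding of the FDA two-factor general wellness test."""
    low_risk = not (invasive or implanted or uses_risky_technology)  # factor 2
    return claims_limited_to_wellness and low_risk

# PPG wrist sensor marketed only for "recovery" trends
print(is_general_wellness_product(True, False, False, False))   # True -> wellness product
# Same hardware marketed to "monitor hypertension" (disease claim)
print(is_general_wellness_product(False, False, False, False))  # False -> medical device
```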

January 2026 update on physiologic sensing:

The updated guidance explicitly addresses non-invasive optical sensing for physiologic parameters, directly relevant to consumer wearables:

Products using optical sensing (photoplethysmography) to estimate blood pressure, oxygen saturation, blood glucose, or heart rate variability may qualify as general wellness products when outputs are intended solely for wellness uses, provided they:

  • Are non-invasive and not implanted
  • Are not intended for diagnosis, treatment, or management of disease
  • Do not claim clinical equivalence to FDA-cleared devices
  • Do not prompt specific clinical actions or medical management
  • Do not include clinical thresholds or diagnostic alerts
  • Have validated values if displaying physiologic measurements

What makes a product NOT a wellness device:

Characteristic | Example | Regulatory Status
Disease diagnosis claims | “Detects atrial fibrillation” | Medical device, requires FDA clearance
Treatment guidance | “Adjust insulin based on glucose reading” | Medical device
Clinical equivalence claims | “Medical-grade blood pressure” | Medical device
Diagnostic thresholds/alerts | “Heart rate dangerous, seek care now” | Medical device
Invasive measurement | Microneedle glucose sensor | Medical device (even if wellness claims)

Illustrative examples from FDA guidance:

Product | Classification
Wrist-worn activity tracker with heart rate, sleep, blood pressure for “recovery assessment” | Wellness, no FDA (if validated values, no disease claims)
Same device claiming to “monitor hypertension” | Medical device, FDA required (disease management claim)
Pulse oximeter for “monitoring during hiking” | Wellness, no FDA
Pulse oximeter for “detecting hypoxemia” | Medical device, FDA required (diagnostic claim)
App playing music for “relaxation and stress management” | Wellness, no FDA
App claiming to “treat anxiety disorder” | Medical device, FDA required (treatment claim)

What this means for physicians:

Consumer wearables operating under wellness exemptions have no FDA validation requirements. When patients present data from Oura, Whoop, Apple Watch (non-FDA features), or similar devices:

  • Treat outputs as informational, not diagnostic. These devices may display blood pressure, SpO2, or glucose estimates without demonstrating clinical accuracy
  • Validation status varies by feature. Apple Watch ECG and irregular rhythm notification ARE FDA-cleared; estimated blood pressure is NOT
  • “Validated values” requirement is self-certified. The guidance requires manufacturers to validate physiologic values, but FDA does not review this validation for wellness products
  • Marketing language matters. The same hardware can be a wellness product or medical device depending on how it’s marketed and what claims are made

January 2026 Clinical Decision Support Software Changes

On January 6, 2026, the FDA announced significant changes to its Clinical Decision Support (CDS) software guidance, substantially loosening oversight for AI tools that provide diagnostic or treatment recommendations (FDA CDS Guidance, 2026).

Key policy changes:

Change | Previous Policy | January 2026 Policy
Single recommendation | Software providing one recommendation = medical device | Exempt if recommendation is the only “clinically appropriate” option
Time-critical decisions | Automatic exclusion from CDS exemption | Repositioned as a factor, not an automatic trigger
SaMD Clinical Evaluation | Guidance document in effect | Withdrawn January 7, 2026

What this means practically:

Generative AI tools providing diagnostic suggestions or performing history-taking may now reach clinics without FDA review if they meet exemption criteria under the updated guidance. The FDA explicitly stated that software “simply providing information like ChatGPT or Google” would not require FDA regulation.

“Clinically appropriate” is undefined. The FDA declined to define what counts as “clinically appropriate,” leaving manufacturers to determine when a single recommendation is justified. This creates room for aggressive interpretation driven by commercial pressure (Epstein Becker Green analysis).

Expert concerns:

  • Authority shift: “The risk is not that AI replaces clinicians outright, but that authority subtly shifts, with recommendations acquiring an aura of objectivity that exceeds their evidentiary foundation” (KevinMD analysis)
  • Cognitive offloading: Time-pressed physicians may not review AI logic, particularly when outputs appear reasonable
  • Validation gap: Withdrawal of SaMD Clinical Evaluation guidance creates uncertainty on how to validate AI systems

What this means for physicians:

The January 2026 changes increase the importance of institutional governance and independent clinical validation. AI tools reaching your practice may not have undergone FDA safety review. This shifts responsibility for validation to health systems and individual physicians. The guidance explicitly preserves FDA authority over software that “substitutes for clinical judgment” or analyzes medical images for diagnostic recommendations, but the line between “providing information” and “substituting for judgment” remains unclear.

Challenges and Needed Reforms

Problem | Evidence | Needed Reform
No prospective validation required | Epic sepsis model deployed on retrospective validation alone, failed prospectively | Mandate prospective validation for high-risk AI
Inadequate post-market surveillance | FDA relies on voluntary adverse event reporting | Require quarterly performance reports
Generalizability not assessed | AI approved on one population may fail in others | Require demographic subgroup analysis
Transparency vs. trade secrets | Physicians cannot validate black-box AI | Mandate disclosure of training data demographics

2025 Federal AI Policy Shift

The federal approach to AI regulation changed significantly in 2025. On January 20, 2025, President Trump rescinded Executive Order 14110 (“Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence”), the Biden administration’s framework emphasizing AI safety and risk mitigation (Federal Register, 2025). Three days later, Executive Order 14179 (“Removing Barriers to American Leadership in Artificial Intelligence”) established a deregulatory framework prioritizing innovation and global competitiveness over precautionary oversight.

The July 2025 America’s AI Action Plan proposes regulatory sandboxes where FDA and other agencies could allow rapid deployment and testing of AI tools with streamlined oversight. On December 11, 2025, a subsequent Executive Order (“Ensuring a National Policy Framework for Artificial Intelligence”) established federal preemption of state AI laws, directing the Attorney General to challenge state regulations that conflict with federal policy (White House, December 2025).

What this means for physicians: The direction of federal policy favors faster AI deployment with fewer pre-market requirements. State-level AI protections (like Colorado’s AI Act) face federal preemption challenges. This increases the importance of institutional governance and independent clinical validation, since federal regulatory scrutiny may decrease. The FDA’s existing medical device framework remains in place, but the broader policy environment signals reduced appetite for new AI-specific oversight requirements.

State Regulatory Sandboxes

While federal policy shifts toward deregulation, several states have created “regulatory sandboxes” for healthcare AI, enabling controlled testing of innovations that would otherwise face regulatory barriers. These programs provide temporary relief from specific state requirements while maintaining safety monitoring.

Utah: First State-Approved AI Prescribing Pilot

In January 2026, Utah became the first state to approve autonomous AI participation in prescription decision-making. The Utah Department of Commerce’s Office of Artificial Intelligence Policy (established 2024) authorized a partnership with Doctronic, an AI health platform, to handle routine prescription renewals for patients with chronic conditions (Utah Department of Commerce, January 2026).

How the pilot works:

Component | Details
Scope | Routine refills for 190 chronic condition medications
Exclusions | Pain management, ADHD medications, injectables
Safety threshold | First 250 prescriptions per medication class require physician review before AI operates independently
Override authority | Physicians retain ability to override all AI decisions
Reported concordance | 99.2% agreement between AI treatment plans and physician decisions in testing (Deseret News, January 2026)
Cost | $4 per renewal initially
Duration | 12-month demonstration agreement

Rationale: Medication non-adherence costs an estimated $100-289 billion annually in avoidable U.S. healthcare spending (Cutler et al., BMJ Open, 2018). Approximately 78% of prescription activity involves routine refills rather than new prescriptions (Optum, 2017). Administrative delays in renewals contribute to gaps in medication adherence, particularly for chronic conditions requiring continuous therapy.

Professional response: The American Medical Association has expressed caution. AMA CEO Dr. John Whyte stated that “without physician input [AI] also poses serious risks to patients and physicians alike” (Becker’s Hospital Review, January 2026).

States with AI Regulatory Sandbox Programs:

State | Status | Key Features
Utah | Operational (2024) | Office of AI Policy with regulatory mitigation authority; healthcare AI pilots including mental health (ElizaChat), dental (Dentacor), prescription renewals (Doctronic)
Texas | Enacted (2024) | 36-month testing periods; quarterly reporting; AG prohibited from prosecuting participants for waived regulations (Texas DIR)
Delaware | Enacted (2025) | Focus on biotech, healthcare, corporate governance; supervised testing environment
Arizona | Operational (2019) | Original fintech sandbox expanded to include AI applications
Wyoming | Developing | Legislation under review for AI sandbox with temporary regulatory exemptions

Federal sandbox proposals: Senator Ted Cruz (R-TX) introduced the SANDBOX Act (September 2025), which would mandate OSTP to create a federal regulatory sandbox program allowing companies to request waivers from federal regulations for up to 10 years (Fierce Healthcare, 2025).

What this means for physicians: State sandboxes create variation in what AI systems can legally do across jurisdictions. A system operating autonomously in Utah may require physician oversight in other states. Physicians practicing in sandbox states should understand the specific regulatory relief granted, the safety monitoring requirements, and their own liability exposure when AI operates with reduced oversight.


Part 3: International Regulatory Approaches

EU AI Act (2024)

The EU AI Act is the world’s first comprehensive AI regulation, entering into force August 1, 2024 (European Parliament, 2024).

Risk-based categorization:

Risk Level | Requirements | Medical AI Examples
Unacceptable (banned) | Prohibited outright (e.g., social scoring, subliminal manipulation) | Not applicable to medical AI
High risk | Strict obligations | All medical AI for diagnosis, treatment, or triage
Limited risk | Transparency requirements | Medical chatbots, symptom checkers
Minimal risk | No specific obligations | Not applicable to medical AI

High-risk medical AI requirements (npj Digital Medicine, 2024):

  1. Transparency: Disclose training data sources, demographics, limitations
  2. Human oversight: Physicians must retain decision authority and override
  3. Robustness testing: Independent validation across diverse populations
  4. Bias audits: Performance stratified by demographics
  5. Post-market monitoring: Continuous performance tracking, adverse event reporting within 15 days

Compliance timeline: Medical devices qualifying as high-risk AI systems have until August 2, 2027 for full compliance.

Impact: Estimated compliance costs of €500K-€2M per AI system. Small startups may struggle to comply, potentially consolidating the market around large companies.

WHO Guidelines (2021, 2024)

WHO published Ethics and Governance of Artificial Intelligence for Health in June 2021 with six principles (WHO, 2021):

  1. Protect human autonomy: Patients and providers maintain decision-making authority
  2. Promote human well-being and safety: AI must benefit patients, minimize harm
  3. Ensure transparency and explainability: Stakeholders understand AI logic and limitations
  4. Foster responsibility and accountability: Clear assignment of responsibility when AI errs
  5. Ensure inclusiveness and equity: AI accessible to diverse populations, mitigate bias
  6. Promote responsive and sustainable AI: Long-term monitoring, adaptation to changing contexts

In 2024, WHO published additional guidance on large multi-modal models (WHO, 2024), addressing risks specific to generative AI in healthcare:

  • Hallucinations: LMMs generate confident but false medical information
  • Outdated training data: Models trained on historical data produce obsolete recommendations
  • Bias amplification: Training data from high-income countries encodes perspectives that may not generalize globally
  • Liability gaps: The AI value chain (developer → provider → deployer) creates uncertainty about accountability when harm occurs

WHO’s 2024 guidance proposes liability frameworks including presumption of causality (shifting burden of proof to deployers), strict liability considerations, and no-fault compensation funds (see Liability chapter).

Limitation: WHO guidelines are aspirational, not enforceable. Countries adopt them voluntarily.

Other Regions

Region | Approach | Key Characteristics
Canada (Health Canada) | Collaborative | Developing adaptive licensing with FDA, UK MHRA
UK (MHRA) | Innovation-friendly | Post-Brexit independent framework
Japan (PMDA) | Conservative | Extensive clinical data required
China (NMPA) | Rapid approval | Data localization requirements limit international collaboration

International Governance and Multilateral Coordination

National regulations alone cannot govern AI systems that operate across borders. A foundation model developed in the U.S., fine-tuned in the EU, and deployed in hospitals across 50 countries presents governance challenges no single jurisdiction can address.

The WHO 2024 LMM guidance emphasizes the need for international coordination (WHO, 2024):

Networked multilateralism: Effective AI governance requires coordination across UN agencies, international financial institutions, regional organizations, civil society, and the private sector. No single body has authority over the global AI ecosystem.

Inclusive rule-making: AI governance must be shaped by all countries, not only high-income nations and the technology companies headquartered there. Rules developed without low- and middle-income country input risk encoding biases that harm those populations.

Cross-border accountability: Companies developing foundation models must be accountable regardless of where they are incorporated. Current frameworks allow regulatory arbitrage: locating operations in permissive jurisdictions while selling globally.

Current challenges:

Gap | Description
No international AI treaty | Unlike nuclear, chemical, or biological domains, no binding international agreement governs AI development or deployment
Voluntary commitments lack enforcement | Corporate pledges on AI safety (e.g., Frontier AI Forum) have no accountability mechanisms
Regulatory arbitrage | Companies can base operations in jurisdictions with minimal oversight
Fragmented standards | No harmonized requirements for safety testing, transparency, or post-market surveillance

Emerging coordination mechanisms:

  • UN High-Level Advisory Body on AI: Recommendations published September 2024, but non-binding
  • G7 Hiroshima AI Process: Voluntary code of conduct for foundation model developers
  • OECD AI Principles: Adopted by 46 countries, but no enforcement mechanism
  • Bilateral agreements: U.S.-EU Trade and Technology Council addresses AI but lacks specificity on medical applications

What this means for physicians: AI systems you use may be developed, trained, and updated by entities outside any jurisdiction’s effective control. Institutional governance and vendor due diligence become critical when regulatory frameworks are fragmented.


Part 4: Reimbursement as the Adoption Forcing Function

The Historical Pattern: Government Incentives Drive Adoption

Regulatory approval is necessary but insufficient. Reimbursement drives clinical deployment. If payers do not cover AI, providers will not use it. The evidence for this comes from two major technology adoption cycles in U.S. healthcare.

Meaningful Use and EHR Adoption

Before 2009, electronic health record adoption was minimal. The Health Information Technology for Economic and Clinical Health (HITECH) Act allocated approximately $27-35 billion in Medicare and Medicaid incentive payments to drive EHR adoption (Blumenthal, NEJM, 2011). The program offered eligible physicians up to $44,000 (Medicare) or $63,750 (Medicaid) in incentives, with penalties of 1-5% Medicare payment reductions for non-adopters beginning in 2015 (Marcotte et al., Arch Intern Med, 2012).

The results were unambiguous. Annual EHR adoption rates among eligible hospitals increased from 3.2% pre-HITECH (2008-2010) to 14.2% post-HITECH (2011-2015), a difference-in-differences of 7.9 percentage points compared to ineligible hospitals (Adler-Milstein & Jha, Health Affairs, 2017). By 2017, 86% of office-based physicians and 96% of non-federal acute care hospitals had adopted EHRs (AHA News, 2017).
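
The 7.9-percentage-point figure is simple difference-in-differences arithmetic; in the sketch below the comparison-group change is back-calculated from the reported numbers rather than quoted from the study.

```python
# Difference-in-differences arithmetic implied by the adoption figures above.
# The ~3.1 pp ineligible-hospital change is back-calculated for illustration,
# not quoted from Adler-Milstein & Jha (2017).
eligible_pre, eligible_post = 3.2, 14.2   # annual EHR adoption rate among eligible hospitals, %
reported_did = 7.9                        # reported difference-in-differences, percentage points

eligible_change = eligible_post - eligible_pre               # 11.0 pp
implied_ineligible_change = eligible_change - reported_did   # ~3.1 pp

print(f"Eligible-hospital change:  {eligible_change:.1f} pp")
print(f"Implied ineligible change: {implied_ineligible_change:.1f} pp")
```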

Telehealth and CMS Reimbursement

Telehealth followed the same pattern. Before COVID-19, Medicare telehealth coverage was restricted to rural areas for specific services. When CMS waived geographic restrictions and reimbursed telehealth at parity with in-person visits during the public health emergency, adoption exploded. Many flexibilities have since been made permanent: behavioral health telehealth parity, FQHC and RHC authorization, and removal of frequency limits for inpatient and nursing facility visits (HHS Telehealth Policy, 2025).

Why Government Must Lead

Private payers follow Medicare’s lead. Research demonstrates that a $1.00 increase in Medicare fees increases corresponding private prices by $1.16 (Clemens & Gottlieb, J Polit Econ, 2017). Most commercial insurers benchmark payment levels to CMS’s Resource-Based Relative Value Scale (RBRVS). Approximately 50% of private payer coverage decisions align with Medicare national coverage determinations (JR Associates, 2024).

This creates a coordination problem. No individual commercial payer wants to be first to cover clinical AI without evidence of adoption elsewhere. Medicare, as the largest payer covering 65 million beneficiaries, provides the market signal that unlocks private investment.

The Capital and Talent Argument

Without clear reimbursement pathways, capital and talent flow elsewhere. Healthcare AI attracted $18 billion in venture investment in 2025, accounting for 46% of all healthcare investment (Silicon Valley Bank, 2026). Yet AI-enabled startups captured 62% of VC dollars because investors prioritize companies targeting operational efficiency over clinical decision support (Rock Health, 2025). The reason: operational AI has clearer revenue models. Clinical AI faces reimbursement uncertainty.

The consequence is predictable. Top engineering talent that could work on clinical AI instead joins companies offering equity in firms with clear paths to revenue. As one industry observer noted: “For founders, payers are the new patients.”

Current Landscape: Adoption Remains Nascent

A 2024 analysis of 11 billion CPT claims (2018-2023) found that only two AI applications have exceeded 10,000 claims: coronary artery disease assessment (67,306 claims via CPT 0501T-0504T) and diabetic retinopathy screening (15,097 claims via CPT 92229) (Wu et al., NEJM AI, 2024). Despite over 1,300 FDA-cleared AI devices, clinical utilization remains concentrated in affluent, metropolitan, academic medical centers.

AI Category | Reimbursement Status | CPT Code(s)
Diabetic retinopathy (IDx-DR/LumineticsCore) | Medicare coverage; national rate $45.36, private median $127.81 | 92229
Coronary artery disease (HeartFlow) | Category III codes | 0501T-0504T
Radiology AI (most) | No separate payment; cost absorbed into radiologist fee | None
Clinical decision support | No payment; hospital self-funds | None
Ambient documentation | Physician subscription ($1,000-1,500/month) | None

Success Story: IDx-DR Diabetic Retinopathy Screening

IDx-DR (now LumineticsCore) represents the rare AI reimbursement success (FDA De Novo, 2018; CMS, 2022):

Milestone | Details
FDA authorization | April 2018, De Novo pathway (first autonomous AI diagnostic)
CPT code | 92229, established 2021 (imaging with AI interpretation without physician review)
Medicare coverage | Yes, with CMS establishing national payment in 2022
Payment | $45.36 (Medicare); $127.81 median (private)

Why it succeeded:

  • Clear clinical benefit: Diabetic retinopathy screening reduces blindness
  • Solves access problem: Primary care can screen without ophthalmologist referral
  • Cost-effective: Cheaper than specialist visit
  • Prospective validation: Randomized trial evidence supported De Novo authorization

The Policy Gap: What Clinical AI Needs

Digital therapeutics and clinical AI still lack a formal Medicare benefit category. The pending Access to Prescription Digital Therapeutics Act (H.R.1458) would create a defined reimbursement class similar to pharmaceuticals (Odelle Technology, 2025). Until enacted, most clinical AI relies on Category III CPT codes (temporary, no guaranteed payment), bundled into existing procedure fees, or hospital self-funding.

Starting January 2026, CMS will formalize new AI and augmented intelligence codes for machine-assisted diagnosis, interpretation, and clinical decision support (Avalere Health, 2025). Whether these codes receive adequate valuation and coverage remains uncertain.

Emerging Payment Models

Fee-for-service does not incentivize AI adoption. Value-based models align incentives:

Value-based care contracts: Providers share risk with payers. AI that reduces hospitalizations and complications directly benefits providers financially.

Bundled payments: Single payment for entire episode of care. AI costs included in bundle; providers incentivized to use cost-effective AI that improves outcomes without separate line-item billing.

Outcomes-based contracts with vendors: Hospital pays vendor based on AI performance, not upfront license. Aligns incentives to reduce false positives and demonstrate clinical value.

The Path Forward

Clinical AI will follow the EHR and telehealth adoption pattern or it will not scale. The policy mechanisms that worked before can work again:

  1. Dedicated CPT codes with adequate valuation for validated clinical AI applications
  2. Medicare coverage decisions that establish precedent for private payers
  3. MACRA/APM integration that rewards AI-enabled quality improvement
  4. Penalty structures for non-adoption of proven AI (as Meaningful Use penalized EHR non-adopters)

The technology exists. The validation methodologies exist. The missing ingredient is the same one that was missing for EHRs in 2008 and telehealth in 2019: a clear signal from government payers that adoption will be rewarded.


Part 5: Operational AI and Healthcare Economics

The Untapped Opportunity

While regulatory and reimbursement discussions focus on clinical decision support, operational and administrative activities consume a larger share of healthcare spending. Workforce staffing, care coordination, billing, claims processing, scheduling, and customer service contributed an estimated $950 billion in U.S. healthcare costs in 2019 (Sahni et al., McKinsey, 2021). This represents approximately 25% of total healthcare spending.

The National Bureau of Economic Research estimates that wider AI adoption could generate savings of 5–10% of U.S. healthcare spending, approximately $200–360 billion annually in 2019 dollars (Sahni et al., NBER Working Paper 30857, 2023). These estimates focus on AI-enabled use cases using current technology, attainable within five years, that do not sacrifice quality or access.

Why operational AI attracts capital: As noted above, AI-enabled startups captured 62% of healthcare venture funding in 2025, with investors favoring operational applications because their revenue models are clearer (Rock Health, 2025). Clinical AI faces reimbursement uncertainty. Operational AI can demonstrate ROI through reduced labor costs, improved throughput, and decreased waste without requiring payer coverage decisions.

The Productivity Paradox

Paradoxically, new clinical technologies have historically increased rather than reduced overall healthcare spending. The “productivity J-curve” (Brynjolfsson et al., AEJ: Macroeconomics, 2021) describes a related pattern: general purpose technologies initially depress measured productivity before generating gains. The explanation is that technology alone does not reduce costs; organizations must redesign workflows, structures, and culture around it.

The implication for health systems: Rather than retrofitting AI into existing clinical workflows, effective implementation requires redesigning processes around AI capabilities. For example, follow-up appointment frequency is typically left to individual physician preference with high variability and little evidence guiding these decisions. AI-based risk stratification could reallocate appointment frequency based on patient need, substantially increasing capacity and reducing wait times. One study found that reducing follow-up frequency by a single visit per year could save $1.9 billion nationally (Ganguli et al., JAMA, 2015).

Such redesign requires institutional willingness to change practice patterns, not merely add AI to existing processes.

Cross-Industry Lessons

Other industries have adopted operational AI with documented returns on investment (Wong et al., npj Health Syst, 2026):

Sector | Application | Healthcare Parallel
Retail | Inventory forecasting, demand prediction | Hospital capacity forecasting, supply chain optimization
Aviation | Predictive maintenance, weather disruption modeling | Biomedical equipment maintenance, OR scheduling
Logistics | Route optimization | Emergency response, patient transport
Financial services | Customer advisory chatbots, query resolution | Patient communication, prior authorization

Key insight: These industries achieved returns by pairing technology investment with workflow redesign. UPS’s route optimization system reportedly saves $300–400 million annually on fuel costs, but required restructuring delivery operations around the algorithm’s recommendations.

Barriers to Operational AI in Healthcare

Several factors explain why healthcare has lagged other sectors in operational AI adoption:

  1. Fragmented data: Healthcare data is siloed across EHRs, claims systems, and departmental applications. AI requires integrated data access.
  2. Risk tolerance: Aviation and finance are also safety-critical, yet healthcare remains more risk-averse about algorithmic decision-making.
  3. Regulatory uncertainty: Most operational AI escapes FDA oversight, but institutions lack guidance on governance requirements.
  4. Workforce concerns: Staff may view operational AI as threatening rather than enabling.
  5. Misaligned incentives: Fee-for-service rewards volume, not efficiency. Value-based contracts better align incentives with operational improvement.

The Learning Health System Framework

Effective AI integration requires continuous evaluation, not one-time deployment. The learning health system model provides a framework for pairing operational goals with evidence generation (IOM, 2012):

  1. Identify operational gap (e.g., OR utilization, scheduling efficiency, documentation burden)
  2. Deploy AI intervention with prospective measurement plan
  3. Evaluate outcomes against pre-specified metrics
  4. Iterate or terminate based on evidence

This approach addresses a critical gap: a recent study of U.S. hospitals found that only 61% performed any local performance evaluation of AI models prior to deployment (Nong et al., Health Affairs, 2025). Many health systems lack the expertise or infrastructure to validate AI performance or assess investment value.

National collaboratives can help: The Coalition for Health AI (CHAI), the Health AI Partnership, and the AMA’s Center for Digital Health and AI enable resource-limited health systems to leverage peer expertise, troubleshoot common problems, and disseminate findings on operational AI tools.

Strategic Implications for Health Systems

Action | Rationale
Tie AI initiatives to measurable value | Avoid the “productivity paradox” by requiring ROI demonstration before scaling
Redesign workflows around AI | Retrofitting AI into existing processes yields minimal benefit
Integrate AI operations with research | Learning health system model generates evidence while improving operations
Build distributed AI literacy | Frontline staff must understand AI capabilities to identify opportunities
Start with operational AI | Clearer ROI, less regulatory complexity than clinical decision support

Part 6: Institutional Governance

Why Hospital-Level Governance Matters

FDA clearance does not guarantee that:

  • AI performs well on your patient population, workflows, and EHR
  • AI is cost-effective for YOUR budget
  • Physicians will use AI appropriately
  • Patients will not be harmed

Institutional governance fills gaps left by regulation.

Essential Governance Components

1. Clinical AI Governance Committee

Minimum composition:

  • Chair: CMIO or CMO
  • Physicians from specialties using AI
  • Chief Nursing Officer representative
  • CIO or IT director
  • Legal counsel with medical malpractice and AI expertise
  • Chief Quality/Patient Safety Officer
  • Health equity lead
  • Bioethicist
  • Patient advocate

Responsibilities: Pre-procurement review, pilot approval, deployment oversight, adverse event investigation, policy development, bias auditing.

2. Validation Before Deployment

Do not assume vendor validation generalizes to your hospital.

Phase | Duration | Purpose
Silent mode | 2-4 weeks | AI generates outputs not shown to clinicians; verify technical stability
Shadow mode | 4-8 weeks | AI outputs shown as “informational only”; gather physician feedback
Active pilot | 3-6 months | Limited deployment with pre-defined success criteria

Success criteria should include:

  • Technical: Sensitivity, specificity, PPV thresholds
  • Clinical: Primary outcome improvement vs. baseline
  • User: Physician satisfaction, response rate
  • Safety: Zero preventable patient harm
  • Equity: No performance disparities >10% across demographics
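
Success criteria are easiest to enforce when they are pre-specified in machine-checkable form before the pilot begins. The sketch below is illustrative; all thresholds and observed values are placeholders, not recommended targets.

```python
# Illustrative pre-specified go/no-go evaluation for an active pilot.
minimums = {"sensitivity": 0.80, "ppv": 0.30, "physician_satisfaction": 0.70}
maximums = {"preventable_harm_events": 0, "max_subgroup_gap": 0.10}

observed = {
    "sensitivity": 0.84, "ppv": 0.27, "physician_satisfaction": 0.75,
    "preventable_harm_events": 0, "max_subgroup_gap": 0.06,
}

failures = [name for name, floor in minimums.items() if observed[name] < floor]
failures += [name for name, ceiling in maximums.items() if observed[name] > ceiling]

print("Proceed to deployment" if not failures else f"Hold pilot: unmet criteria {failures}")
```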

3. Bias Monitoring

Quarterly audits measuring AI performance across:

  • Race/ethnicity
  • Age
  • Sex
  • Insurance status
  • Language

If performance difference >10%: investigate, mitigate, or deactivate.
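
A minimal sketch of the quarterly audit logic: compute the same metric per subgroup and flag any gap above the 10% threshold. The subgroup labels and example records are hypothetical; a real audit would pull from the deployment log.

```python
from collections import defaultdict

# Hypothetical deployment-log records: (subgroup, alert_fired, condition_confirmed)
records = [
    ("group_a", True, True), ("group_a", False, True), ("group_a", True, False),
    ("group_b", True, True), ("group_b", False, True), ("group_b", False, True),
]

# Per-subgroup sensitivity: alerts fired among confirmed cases
counts = defaultdict(lambda: [0, 0])  # subgroup -> [true positives, confirmed cases]
for subgroup, alert_fired, confirmed in records:
    if confirmed:
        counts[subgroup][1] += 1
        counts[subgroup][0] += int(alert_fired)

sensitivity = {group: tp / total for group, (tp, total) in counts.items()}
gap = max(sensitivity.values()) - min(sensitivity.values())

print(sensitivity)   # e.g., {'group_a': 0.5, 'group_b': 0.33...}
if gap > 0.10:       # institutional threshold from the policy above
    print(f"Disparity {gap:.0%} exceeds 10% -> investigate, mitigate, or deactivate")
```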

4. Vendor Contracts

Require:

  • Hospital retains ownership of patient data
  • Performance guarantees with termination rights if thresholds not met
  • Disclosure of training data demographics, validation studies, limitations
  • Vendor indemnification for AI errors and data breaches
  • HIPAA Business Associate Agreement

5. Value Alignment Assessment

Beyond technical performance and bias, institutions should assess what values AI systems embed. The RAISE consortium’s “Values In the Model” (VIM) framework proposes that AI systems disclose how they navigate value-laden trade-offs: intervention vs. conservative management, patient autonomy vs. paternalism, individual benefit vs. resource constraints (Goldberg et al., NEJM AI, 2026).

When evaluating AI systems, governance committees should ask:

  • What optimization target was this system trained on?
  • How does it handle scenarios where reasonable experts disagree?
  • Does behavior differ between fee-for-service and capitated contexts?

See Ethics chapter: Value Alignment Frameworks for detailed guidance on assessing embedded values.

Enterprise AI Lifecycle Frameworks

Beyond committee composition and validation protocols, institutions need structured frameworks for how AI solutions progress from concept to deployment to monitoring. Stanford Medicine’s experience provides a model.

RAIL (Responsible AI Lifecycle):

Stanford Health Care established the Responsible AI Lifecycle (RAIL) framework in 2023 to codify institutional workflows for AI solution development (Shah et al., 2026):

Stage | Key Activities
Proposal | Use case definition, risk tiering, stakeholder alignment
Development | Model building, integration with clinical data, prompt engineering
Validation | FURM assessment (see below), truth set curation, benchmark testing
Pilot | Controlled deployment with defined success criteria
Monitoring | System integrity, performance, and impact tracking

FURM (Fair Useful Reliable Models):

The FURM framework specifies required assessments before AI deployment:

  • Fair: Performance tested across demographic subgroups; disparities documented and mitigated
  • Useful: Clear clinical or operational benefit demonstrated; workflows redesigned for integration
  • Reliable: Consistent performance across settings; failure modes identified and documented

Why structured frameworks matter:

Most health systems lack the expertise or infrastructure to validate AI performance or assess investment value (see Part 5). Structured frameworks convert ad hoc adoption decisions into systematic processes.

Implementation lesson from Stanford ChatEHR:

Stanford’s approach required embedding data science teams within IT organizations, providing direct access to personnel maintaining network security, EHR integrations, and cloud resources. This integration enabled what began as a sandbox (April 2023) to scale to health-system-wide deployment (September 2025) in approximately 2.5 years. The “build-from-within” strategy provides institutional agency: model-agnostic infrastructure that matches clinical tasks to appropriate LLMs, institutional data governance, and custom monitoring aligned to organizational priorities.

For institutions without Stanford’s resources:

National collaboratives provide pathways for resource-limited health systems:

  • Coalition for Health AI (CHAI): Responsible AI Guide (RAIG) with developer/implementer accountability frameworks (see CHAI section)
  • Health AI Partnership: Peer expertise sharing and troubleshooting
  • AMA Center for Digital Health and AI: Policy guidance and educational resources

Part 7: Policy Recommendations from Expert Bodies

AMA Principles for Augmented Intelligence (2018)

The American Medical Association prefers “augmented intelligence” to emphasize physician judgment is augmented, not replaced (AMA AI Principles, 2018).

Six principles:

  1. AI should augment, not replace, the physician-patient relationship
  2. AI must be developed and deployed with transparency
  3. AI must meet rigorous standards of effectiveness
  4. AI must mitigate bias and promote health equity
  5. AI must protect patient privacy and data security
  6. Physicians must be educated on AI

WHO Framework (2021)

Six principles: protect human autonomy, promote well-being and safety, ensure transparency, foster accountability, ensure equity, promote sustainability (WHO, 2021).

Key Professional Society Positions

Society | Key Recommendations
American College of Radiology | AI-LAB accreditation program for vendor transparency; validate AI at your institution before clinical use
American Heart Association | Cardiovascular AI must be validated on populations where deployed
College of American Pathologists | Pathologists must review all AI-flagged cases; no autonomous AI diagnosis

Coalition for Health AI (CHAI) and Joint Commission Partnership

The Coalition for Health AI (CHAI) represents a shift from principle-based guidance to use-case-specific implementation frameworks. In September 2025, CHAI partnered with the Joint Commission to release the first installment of practical guidance for responsible AI deployment (Joint Commission, 2025).

Why this matters: The Joint Commission accredits over 22,000 healthcare organizations. A voluntary AI certification program based on CHAI playbooks is planned, potentially creating de facto standards for institutional AI governance.

Use-Case-Specific Work Groups:

CHAI’s approach differs from generic AI principles by providing guidance tailored to specific clinical applications (CHAI Use Cases):

Use Case | Focus
Clinical Decision Support (LLM + RAG) | Scope definition, escalation rules for when AI defers to humans, continuous monitoring, evidence traceability
EHR Information Retrieval | Grounding retrieved information, verification in real-world contexts, handling fragmented patient data
Prior Authorization Criteria Matching | Explainability of match/non-match decisions, human review triggers, preventing “denial drift”
Direct-to-Consumer Health Chatbots | Accessibility (5th-6th grade reading level, multilingual), error handling, authoritative source grounding, citations

Developer vs. Implementer Accountability:

The CHAI Responsible AI Guide (RAIG) distinguishes between “Developer Teams” (data scientists, engineers who build AI solutions) and “Implementer Teams” (providers, IT staff, leadership who deploy them). Each stage of the AI lifecycle specifies which team bears primary responsibility (CHAI RAIG):

Stage | Developer Responsibility | Implementer Responsibility
Define Problem & Plan | Collaborate on technical feasibility | Define business requirements, clinical context
Design | Model architecture, training approach | Workflow integration design
Engineer | Build, train, validate solution | Provide real-world data, clinical input
Assess | Performance metrics, bias testing | Local validation, population fit assessment
Pilot | Technical support, iteration | Controlled deployment, clinician feedback
Deploy & Monitor | Ongoing maintenance, updates | Adverse event tracking, governance reporting

Governance Structure Recommendations:

CHAI guidance emphasizes:

  • Written AI policies: Establish explicit governance with technically experienced leadership
  • Transparency to patients: Disclosures and educational tools about AI use
  • Data protection: Minimum necessary data principles, audit rights in vendor agreements
  • Quality monitoring: Regular validation, performance dashboards
  • Bias assessment: Audit whether AI was developed with datasets representative of served populations
  • Blinded reporting: Cross-institutional learning from AI-related events

Limitations to Acknowledge:

CHAI guidance is process-oriented but lacks quantitative thresholds. The guidance emphasizes “regularly monitor” and “regularly validate” without defining:

  • What performance floor triggers intervention?
  • What retrieval accuracy makes EHR summarization safe?
  • How many false denials cross from efficiency to patient harm?

Without quantitative thresholds, “continuous monitoring” risks becoming “monitor until something bad happens, then determine what the threshold should have been.” This gap is significant for institutions seeking actionable standards.


Conclusion

AI regulation and policy are evolving rapidly. The frameworks designed for static products do not fit dynamic, learning systems that update continuously. Challenges include unclear evidence standards, insufficient post-market surveillance, reimbursement barriers, unsettled liability, and fragmented international regulations.

Key principles for physician-centered AI policy:

  1. Patient safety first: Prospective validation, external testing
  2. Evidence-based regulation: Demand prospective trials for high-risk AI
  3. Transparent accountability: Clear liability when AI errs
  4. Equity mandatory: Performance tested across demographics; biased AI not deployed
  5. Physician autonomy preserved: AI supports, never replaces judgment
  6. Reimbursement aligned with value: Pay for AI that improves outcomes

What physicians must do:

Individually: Demand evidence, validate AI locally, document AI use meticulously, report errors, maintain clinical independence.

Institutionally: Establish AI governance committees, implement bias audits, create accountability frameworks, provide training.

Professionally: Engage specialty societies, lobby for evidence-based regulation and reimbursement, publish validation studies.

The future of AI in medicine will be shaped by the choices made today: the regulations demanded, the reimbursement models advocated for, the governance structures built, and the standards held.