Healthcare Policy and AI Governance
The FDA has authorized over 1,300 AI medical devices, approximately 96% via the 510(k) substantial-equivalence pathway, typically without new prospective clinical trials (FDA AI-Enabled Medical Devices). Epic’s widely deployed sepsis prediction model, which reached hospitals without FDA review at all, claimed 85% sensitivity in internal validation, but external testing revealed actual sensitivity of 33%, missing two-thirds of sepsis cases. Regulatory frameworks built for static medical devices struggle to govern AI systems that evolve through continuous learning. The EU AI Act classifies most clinical AI as high-risk requiring rigorous oversight, while U.S. regulation remains permissive. This divergence shapes what AI physicians can access and who bears liability when algorithms fail.
After reading this chapter, you will be able to:
- Understand the evolving FDA regulatory framework for AI/ML-based medical devices
- Evaluate international regulatory approaches (EU AI Act, WHO guidelines)
- Recognize reimbursement challenges and evolving payment models for AI
- Assess institutional governance frameworks for safe AI deployment
- Navigate liability, accountability, and legal frameworks for medical AI
- Implement hospital-level AI governance policies
Introduction
Medicine operates within complex regulatory and policy frameworks: FDA device approvals, CMS reimbursement decisions, state medical board oversight, institutional protocols, and malpractice liability standards. These structures emerged over decades to protect patients from unsafe drugs, devices, and practices. They assume products are static: a drug approved in 2020 is chemically identical in 2025.
AI challenges this assumption fundamentally. Machine learning systems evolve through retraining on new data, algorithm updates, and performance drift as patient populations change. How should regulators approve systems that change continuously? Who is liable when AI errs: developers who built it, hospitals that deployed it, or physicians who followed its recommendations?
The stakes are high:
- Patient safety: Poorly regulated AI can harm thousands before problems are detected
- Innovation: Over-regulation may stifle beneficial AI development
- Equity: Biased regulatory frameworks may entrench disparities
- Legal liability: Unclear accountability creates defensive medicine
Part 1: A Major Policy Failure: The Epic Sepsis Model
The Case Study
What was promised: Epic’s sepsis prediction model (embedded in the EHR) would detect sepsis 6-12 hours before clinical recognition. The vendor claimed 85% sensitivity based on internal validation. The model reached hundreds of hospitals as EHR-embedded clinical decision support, without FDA premarket review.
What happened:
In 2021, Wong et al. published an external validation study in JAMA Internal Medicine testing the Epic sepsis model on 27,697 patients at Michigan Medicine (Wong et al., 2021):
- Sensitivity: 33% (not 85% claimed)
- 67% of sepsis cases never triggered an alert at any point
- Positive predictive value: 12% (88% false positive rate among alerts)
- Area under the curve: 0.63 (poor discrimination)
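These headline numbers follow directly from the confusion matrix. The sketch below is illustrative only: the counts are hypothetical, chosen to roughly reproduce the published sensitivity and PPV rather than the study’s actual data. It shows how the metrics are computed and why a 12% PPV translates into roughly eight alerts for every true sepsis case.

```python
# Illustrative only: hypothetical counts chosen to roughly reproduce the published
# sensitivity (33%) and PPV (12%); these are NOT the study's actual confusion matrix.
def sensitivity(tp, fn):
    """Fraction of true sepsis cases that triggered an alert."""
    return tp / (tp + fn)

def ppv(tp, fp):
    """Fraction of alerts that corresponded to true sepsis cases."""
    return tp / (tp + fp)

# Hypothetical cohort: 10,000 encounters, 7% sepsis prevalence
sepsis_cases = 700
true_positives = 231        # sepsis cases that triggered an alert
false_negatives = sepsis_cases - true_positives
false_positives = 1_694     # alerts on patients who never developed sepsis

print(f"Sensitivity: {sensitivity(true_positives, false_negatives):.0%}")  # ~33%
print(f"PPV: {ppv(true_positives, false_positives):.0%}")                  # ~12%
print(f"Alerts per true case: {(true_positives + false_positives) / true_positives:.1f}")  # ~8.3
```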
Why regulatory oversight failed to prevent this:
- Retrospective validation only: The vendor’s performance claims rested on retrospective chart review, not prospective deployment
- Look-ahead bias: Training data included labs and vitals ordered after clinicians suspected sepsis, so the model learned to detect suspicion, not actual sepsis
- No external validation requirement: No regulator required testing at independent hospitals before widespread deployment
- CDS exemption: As proprietary decision support embedded in the EHR, the model reached hundreds of hospitals without FDA premarket review or clinical trial evidence
Regulatory response: No recall, no warning letter, no enforcement action followed. The model remained in widespread use.
Lessons:
- Vendor validation ≠ clinical validation
- Retrospective studies mislead due to look-ahead bias and confounding
- External validation at independent institutions is essential
- Post-market surveillance is inadequate
Part 2: FDA Regulation of AI/ML Medical Devices
Regulatory Pathways
510(k) Clearance (Substantial Equivalence):
- Device is “substantially equivalent” to a predicate device already on market
- Fastest, least burdensome pathway (median 151 days review time) (MDPI Biomedicines, 2024)
- 97% of AI devices cleared via 510(k) pathway (2024) (MedTech Dive, 2024)
Premarket Approval (PMA):
- Most rigorous review, requiring clinical trials demonstrating safety and effectiveness
- Reserved for the highest-risk (Class III) devices
De Novo Classification:
- Novel low-to-moderate-risk device type with no predicate (median 372 days review time)
- Establishes a new regulatory pathway for similar future devices
- Example: IDx-DR diabetic retinopathy screening, the first autonomous AI diagnostic, received De Novo authorization in April 2018 (FDA De Novo Decision Summary)
Current State (2024-2025)
By the numbers:
- Over 1,300 AI/ML medical devices authorized as of late 2025 (FDA AI-Enabled Medical Devices)
- 168 devices cleared in 2024 alone, with 94.6% via 510(k) (MDPI Biomedicines, 2024)
- Radiology dominates: 74.4% of 2024 clearances were imaging-related
Examples of FDA-cleared AI:
| Category | Examples |
|---|---|
| Radiology CAD | Intracranial hemorrhage (Aidoc, Viz.ai), pulmonary embolism, lung nodules |
| Cardiology | ECG AFib detection (Apple Watch, AliveCor), echocardiogram EF estimation |
| Ophthalmology | IDx-DR/LumineticsCore diabetic retinopathy screening |
| Clinical Decision Support | Sepsis prediction, deterioration algorithms |
Predetermined Change Control Plans (PCCP)
Traditional devices are “locked” after approval. The FDA’s PCCP framework addresses this for AI systems that need continuous updates (FDA PCCP Guidance, 2024):
What PCCP allows:
- Manufacturer specifies anticipated changes (retraining, performance improvements)
- FDA reviews and approves plan upfront
- Specified changes proceed without new submissions
Components required:
- Description of modifications: Itemization of proposed changes with justifications
- Modification protocol: Methods for developing, validating, and implementing changes
- Impact assessment: Benefits, risks, and mitigations
Final guidance issued December 2024 broadened scope to all AI-enabled devices, not just ML-enabled devices.
General Wellness Products: What Escapes FDA Oversight
Not all health-related software and devices require FDA clearance. The FDA’s General Wellness guidance (updated January 2026) defines products that fall outside medical device regulation entirely (FDA General Wellness Guidance, 2026).
The two-factor test:
A product qualifies as a general wellness product (not a medical device) if it meets BOTH criteria:
- Intended for general wellness use only: Claims relate to maintaining or encouraging a healthy lifestyle (weight management, physical fitness, relaxation, sleep management, mental acuity) OR relate healthy lifestyle choices to reducing risk of chronic diseases where this association is well-established
- Low risk: Not invasive, not implanted, does not involve technology posing safety risks without regulatory controls (lasers, radiation)
January 2026 update on physiologic sensing:
The updated guidance explicitly addresses non-invasive optical sensing for physiologic parameters, directly relevant to consumer wearables:
Products using optical sensing (photoplethysmography) to estimate blood pressure, oxygen saturation, blood glucose, or heart rate variability may qualify as general wellness products when outputs are intended solely for wellness uses, provided they:
- Are non-invasive and not implanted
- Are not intended for diagnosis, treatment, or management of disease
- Do not claim clinical equivalence to FDA-cleared devices
- Do not prompt specific clinical actions or medical management
- Do not include clinical thresholds or diagnostic alerts
- Have validated values if displaying physiologic measurements
What makes a product NOT a wellness device:
| Characteristic | Example | Regulatory Status |
|---|---|---|
| Disease diagnosis claims | “Detects atrial fibrillation” | Medical device, requires FDA clearance |
| Treatment guidance | “Adjust insulin based on glucose reading” | Medical device |
| Clinical equivalence claims | “Medical-grade blood pressure” | Medical device |
| Diagnostic thresholds/alerts | “Heart rate dangerous, seek care now” | Medical device |
| Invasive measurement | Microneedle glucose sensor | Medical device (even if wellness claims) |
Illustrative examples from FDA guidance:
| Product | Wellness (No FDA) | Medical Device (FDA Required) |
|---|---|---|
| Wrist-worn activity tracker with heart rate, sleep, blood pressure for “recovery assessment” | Yes (if validated values, no disease claims) | |
| Same device claiming to “monitor hypertension” | | Yes (disease management claim) |
| Pulse oximeter for “monitoring during hiking” | Yes | |
| Pulse oximeter for “detecting hypoxemia” | | Yes (diagnostic claim) |
| App playing music for “relaxation and stress management” | Yes | |
| App claiming to “treat anxiety disorder” | | Yes (treatment claim) |
What this means for physicians:
Consumer wearables operating under wellness exemptions have no FDA validation requirements. When patients present data from Oura, Whoop, Apple Watch (non-FDA features), or similar devices:
- Treat outputs as informational, not diagnostic. These devices may display blood pressure, SpO2, or glucose estimates without demonstrating clinical accuracy
- Validation status varies by feature. Apple Watch ECG and irregular rhythm notification ARE FDA-cleared; estimated blood pressure is NOT
- “Validated values” requirement is self-certified. The guidance requires manufacturers to validate physiologic values, but FDA does not review this validation for wellness products
- Marketing language matters. The same hardware can be a wellness product or medical device depending on how it’s marketed and what claims are made
January 2026 Clinical Decision Support Software Changes
On January 6, 2026, the FDA announced significant changes to its Clinical Decision Support (CDS) software guidance, substantially loosening oversight for AI tools that provide diagnostic or treatment recommendations (FDA CDS Guidance, 2026).
Key policy changes:
| Change | Previous Policy | January 2026 Policy |
|---|---|---|
| Single recommendation | Software providing one recommendation = medical device | Exempt if recommendation is the only “clinically appropriate” option |
| Time-critical decisions | Automatic exclusion from CDS exemption | Repositioned as factor, not automatic trigger |
| SaMD Clinical Evaluation | Guidance document in effect | Withdrawn January 7, 2026 |
What this means practically:
Generative AI tools providing diagnostic suggestions or performing history-taking may now reach clinics without FDA review if they meet exemption criteria under the updated guidance. The FDA explicitly stated that software “simply providing information like ChatGPT or Google” would not require FDA regulation.
“Clinically appropriate” is undefined. The FDA declined to define what counts as “clinically appropriate,” leaving manufacturers to determine when a single recommendation is justified. This creates room for aggressive interpretation driven by commercial pressure (Epstein Becker Green analysis).
Expert concerns:
- Authority shift: “The risk is not that AI replaces clinicians outright, but that authority subtly shifts, with recommendations acquiring an aura of objectivity that exceeds their evidentiary foundation” (KevinMD analysis)
- Cognitive offloading: Time-pressed physicians may not review AI logic, particularly when outputs appear reasonable
- Validation gap: Withdrawal of SaMD Clinical Evaluation guidance creates uncertainty on how to validate AI systems
What this means for physicians:
The January 2026 changes increase the importance of institutional governance and independent clinical validation. AI tools reaching your practice may not have undergone FDA safety review. This shifts responsibility for validation to health systems and individual physicians. The guidance explicitly preserves FDA authority over software that “substitutes for clinical judgment” or analyzes medical images for diagnostic recommendations, but the line between “providing information” and “substituting for judgment” remains unclear.
Challenges and Needed Reforms
| Problem | Evidence | Needed Reform |
|---|---|---|
| No prospective validation required | Epic sepsis model reached wide use on retrospective validation alone, then failed external testing | Mandate prospective validation for high-risk AI |
| Inadequate post-market surveillance | FDA relies on voluntary adverse event reporting | Require quarterly performance reports |
| Generalizability not assessed | AI approved on one population may fail in others | Require demographic subgroup analysis |
| Transparency vs. trade secrets | Physicians cannot validate black-box AI | Mandate disclosure of training data demographics |
2025 Federal AI Policy Shift
The federal approach to AI regulation changed significantly in 2025. On January 20, 2025, President Trump rescinded Executive Order 14110 (“Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence”), the Biden administration’s framework emphasizing AI safety and risk mitigation (Federal Register, 2025). Three days later, Executive Order 14179 (“Removing Barriers to American Leadership in Artificial Intelligence”) established a deregulatory framework prioritizing innovation and global competitiveness over precautionary oversight.
The July 2025 America’s AI Action Plan proposes regulatory sandboxes where FDA and other agencies could allow rapid deployment and testing of AI tools with streamlined oversight. On December 11, 2025, a subsequent Executive Order (“Ensuring a National Policy Framework for Artificial Intelligence”) established federal preemption of state AI laws, directing the Attorney General to challenge state regulations that conflict with federal policy (White House, December 2025).
What this means for physicians: The direction of federal policy favors faster AI deployment with fewer pre-market requirements. State-level AI protections (like Colorado’s AI Act) face federal preemption challenges. This increases the importance of institutional governance and independent clinical validation, since federal regulatory scrutiny may decrease. The FDA’s existing medical device framework remains in place, but the broader policy environment signals reduced appetite for new AI-specific oversight requirements.
State Regulatory Sandboxes
While federal policy shifts toward deregulation, several states have created “regulatory sandboxes” for healthcare AI, enabling controlled testing of innovations that would otherwise face regulatory barriers. These programs provide temporary relief from specific state requirements while maintaining safety monitoring.
Utah: First State-Approved AI Prescribing Pilot
In January 2026, Utah became the first state to approve autonomous AI participation in prescription decision-making. The Utah Department of Commerce’s Office of Artificial Intelligence Policy (established 2024) authorized a partnership with Doctronic, an AI health platform, to handle routine prescription renewals for patients with chronic conditions (Utah Department of Commerce, January 2026).
How the pilot works:
| Component | Details |
|---|---|
| Scope | Routine refills for 190 chronic condition medications |
| Exclusions | Pain management, ADHD medications, injectables |
| Safety threshold | First 250 prescriptions per medication class require physician review before AI operates independently |
| Override authority | Physicians retain ability to override all AI decisions |
| Reported concordance | 99.2% agreement between AI treatment plans and physician decisions in testing (Deseret News, January 2026) |
| Cost | $4 per renewal initially |
| Duration | 12-month demonstration agreement |
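As a purely hypothetical illustration (not Doctronic’s actual system; the medication-class names and routing function are invented for this sketch), the per-class safety threshold described in the table might be enforced along these lines:

```python
# Hypothetical sketch of the pilot's per-class safety threshold; NOT Doctronic's
# actual implementation. Class names and the routing function are invented.
from collections import defaultdict

REVIEW_THRESHOLD = 250                         # per-medication-class review requirement
EXCLUDED_CLASSES = {"pain_management", "adhd_medications", "injectables"}

reviewed_count = defaultdict(int)              # renewals already physician-reviewed, by class

def route_renewal(medication_class: str) -> str:
    """Return who decides this renewal under the sandbox rules."""
    if medication_class in EXCLUDED_CLASSES:
        return "physician_only"                # outside pilot scope entirely
    if reviewed_count[medication_class] < REVIEW_THRESHOLD:
        reviewed_count[medication_class] += 1
        return "ai_with_physician_review"      # safety-threshold phase
    return "ai_autonomous_with_override"       # physicians retain override authority

print(route_renewal("statins"))                # ai_with_physician_review (first 250)
print(route_renewal("pain_management"))        # physician_only (excluded class)
```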
Rationale: Medication non-adherence costs an estimated $100-289 billion annually in avoidable U.S. healthcare spending (Cutler et al., BMJ Open, 2018). Approximately 78% of prescription activity involves routine refills rather than new prescriptions (Optum, 2017). Administrative delays in renewals contribute to gaps in medication adherence, particularly for chronic conditions requiring continuous therapy.
Professional response: The American Medical Association has expressed caution. AMA CEO Dr. John Whyte stated that “without physician input [AI] also poses serious risks to patients and physicians alike” (Becker’s Hospital Review, January 2026).
States with AI Regulatory Sandbox Programs:
| State | Status | Key Features |
|---|---|---|
| Utah | Operational (2024) | Office of AI Policy with regulatory mitigation authority; healthcare AI pilots including mental health (ElizaChat), dental (Dentacor), prescription renewals (Doctronic) |
| Texas | Enacted (2024) | 36-month testing periods; quarterly reporting; AG prohibited from prosecuting participants for waived regulations (Texas DIR) |
| Delaware | Enacted (2025) | Focus on biotech, healthcare, corporate governance; supervised testing environment |
| Arizona | Operational (2019) | Original fintech sandbox expanded to include AI applications |
| Wyoming | Developing | Legislation under review for AI sandbox with temporary regulatory exemptions |
Federal sandbox proposals: Senator Ted Cruz (R-TX) introduced the SANDBOX Act (September 2025), which would direct the Office of Science and Technology Policy (OSTP) to create a federal regulatory sandbox program allowing companies to request waivers from federal regulations for up to 10 years (Fierce Healthcare, 2025).
What this means for physicians: State sandboxes create variation in what AI systems can legally do across jurisdictions. A system operating autonomously in Utah may require physician oversight in other states. Physicians practicing in sandbox states should understand the specific regulatory relief granted, the safety monitoring requirements, and their own liability exposure when AI operates with reduced oversight.
Part 3: International Regulatory Approaches
EU AI Act (2024)
The EU AI Act is the world’s first comprehensive AI regulation, entering into force August 1, 2024 (European Parliament, 2024).
Risk-based categorization:
| Risk Level | Requirements | Medical AI Examples |
|---|---|---|
| Unacceptable (banned) | Prohibited outright (e.g., social scoring, subliminal manipulation) | Not applicable to medical AI |
| High risk | Strict obligations | All medical AI for diagnosis, treatment, or triage |
| Limited risk | Transparency requirements | Medical chatbots, symptom checkers |
| Minimal risk | No specific obligations | Not applicable to medical AI |
High-risk medical AI requirements (npj Digital Medicine, 2024):
- Transparency: Disclose training data sources, demographics, limitations
- Human oversight: Physicians must retain decision authority and override
- Robustness testing: Independent validation across diverse populations
- Bias audits: Performance stratified by demographics
- Post-market monitoring: Continuous performance tracking, adverse event reporting within 15 days
Compliance timeline: Medical devices qualifying as high-risk AI systems have until August 2, 2027 for full compliance.
Impact: Estimated compliance cost of €500K-€2M per AI system. Small startups may struggle, potentially consolidating market toward large companies.
WHO Guidelines (2021, 2024)
WHO published Ethics and Governance of Artificial Intelligence for Health in June 2021 with six principles (WHO, 2021):
- Protect human autonomy: Patients and providers maintain decision-making authority
- Promote human well-being and safety: AI must benefit patients, minimize harm
- Ensure transparency and explainability: Stakeholders understand AI logic and limitations
- Foster responsibility and accountability: Clear assignment of responsibility when AI errs
- Ensure inclusiveness and equity: AI accessible to diverse populations, mitigate bias
- Promote responsive and sustainable AI: Long-term monitoring, adaptation to changing contexts
In 2024, WHO published additional guidance on large multi-modal models (WHO, 2024), addressing risks specific to generative AI in healthcare:
- Hallucinations: LMMs generate confident but false medical information
- Outdated training data: Models trained on historical data produce obsolete recommendations
- Bias amplification: Training data from high-income countries encodes perspectives that may not generalize globally
- Liability gaps: The AI value chain (developer → provider → deployer) creates uncertainty about accountability when harm occurs
WHO’s LMM guidance proposes liability frameworks including presumption of causality (shifting the burden of proof to deployers), strict liability considerations, and no-fault compensation funds (see Liability chapter).
Limitation: WHO guidelines are aspirational, not enforceable. Countries adopt them voluntarily.
Other Regions
| Region | Approach | Key Characteristics |
|---|---|---|
| Canada (Health Canada) | Collaborative | Developing adaptive licensing with FDA, UK MHRA |
| UK (MHRA) | Innovation-friendly | Post-Brexit independent framework |
| Japan (PMDA) | Conservative | Extensive clinical data required |
| China (NMPA) | Rapid approval | Data localization requirements limit international collaboration |
International Governance and Multilateral Coordination
National regulations alone cannot govern AI systems that operate across borders. A foundation model developed in the U.S., fine-tuned in the EU, and deployed in hospitals across 50 countries presents governance challenges no single jurisdiction can address.
The WHO 2024 LMM guidance emphasizes the need for international coordination (WHO, 2024):
Networked multilateralism: Effective AI governance requires coordination across UN agencies, international financial institutions, regional organizations, civil society, and the private sector. No single body has authority over the global AI ecosystem.
Inclusive rule-making: AI governance must be shaped by all countries, not only high-income nations and the technology companies headquartered there. Rules developed without low- and middle-income country input risk encoding biases that harm those populations.
Cross-border accountability: Companies developing foundation models must be accountable regardless of where they are incorporated. Current frameworks allow regulatory arbitrage: locating operations in permissive jurisdictions while selling globally.
Current challenges:
| Gap | Description |
|---|---|
| No international AI treaty | Unlike nuclear, chemical, or biological domains, no binding international agreement governs AI development or deployment |
| Voluntary commitments lack enforcement | Corporate pledges on AI safety (e.g., the Frontier Model Forum) have no accountability mechanisms |
| Regulatory arbitrage | Companies can base operations in jurisdictions with minimal oversight |
| Fragmented standards | No harmonized requirements for safety testing, transparency, or post-market surveillance |
Emerging coordination mechanisms:
- UN High-Level Advisory Body on AI: Recommendations published September 2024, but non-binding
- G7 Hiroshima AI Process: Voluntary code of conduct for foundation model developers
- OECD AI Principles: Adopted by 46 countries, but no enforcement mechanism
- Bilateral agreements: U.S.-EU Trade and Technology Council addresses AI but lacks specificity on medical applications
What this means for physicians: AI systems you use may be developed, trained, and updated by entities outside any jurisdiction’s effective control. Institutional governance and vendor due diligence become critical when regulatory frameworks are fragmented.
Part 4: Reimbursement as the Adoption Forcing Function
The Historical Pattern: Government Incentives Drive Adoption
Regulatory approval is necessary but insufficient. Reimbursement drives clinical deployment. If payers do not cover AI, providers will not use it. The evidence for this comes from two major technology adoption cycles in U.S. healthcare.
Meaningful Use and EHR Adoption
Before 2009, electronic health record adoption was minimal. The Health Information Technology for Economic and Clinical Health (HITECH) Act allocated approximately $27-35 billion in Medicare and Medicaid incentive payments to drive EHR adoption (Blumenthal, NEJM, 2011). The program offered eligible physicians up to $44,000 (Medicare) or $63,750 (Medicaid) in incentives, with penalties of 1-5% Medicare payment reductions for non-adopters beginning in 2015 (Marcotte et al., Arch Intern Med, 2012).
The results were unambiguous. Annual EHR adoption rates among eligible hospitals increased from 3.2% pre-HITECH (2008-2010) to 14.2% post-HITECH (2011-2015), a difference-in-differences of 7.9 percentage points compared to ineligible hospitals (Adler-Milstein & Jha, Health Affairs, 2017). By 2017, 86% of office-based physicians and 96% of non-federal acute care hospitals had adopted EHRs (AHA News, 2017).
Telehealth and CMS Reimbursement
Telehealth followed the same pattern. Before COVID-19, Medicare telehealth coverage was restricted to rural areas for specific services. When CMS waived geographic restrictions and reimbursed telehealth at parity with in-person visits during the public health emergency, adoption exploded. Many flexibilities have since been made permanent: behavioral health telehealth parity, FQHC and RHC authorization, and removal of frequency limits for inpatient and nursing facility visits (HHS Telehealth Policy, 2025).
Why Government Must Lead
Private payers follow Medicare’s lead. Research demonstrates that a $1.00 increase in Medicare fees increases corresponding private prices by $1.16 (Clemens & Gottlieb, J Polit Econ, 2017). Most commercial insurers benchmark payment levels to CMS’s Resource-Based Relative Value Scale (RBRVS). Approximately 50% of private payer coverage decisions align with Medicare national coverage determinations (JR Associates, 2024).
This creates a coordination problem. No individual commercial payer wants to be first to cover clinical AI without evidence of adoption elsewhere. Medicare, as the largest payer covering 65 million beneficiaries, provides the market signal that unlocks private investment.
The Capital and Talent Argument
Without clear reimbursement pathways, capital and talent flow elsewhere. Healthcare AI attracted $18 billion in venture investment in 2025, accounting for 46% of all healthcare venture funding (Silicon Valley Bank, 2026); by another count, AI-enabled startups captured 62% of venture dollars in the sector (Rock Health, 2025). Investors favor companies targeting operational efficiency over clinical decision support because operational AI has clearer revenue models, while clinical AI faces reimbursement uncertainty.
The consequence is predictable. Top engineering talent that could work on clinical AI instead joins companies offering equity in firms with clear paths to revenue. As one industry observer noted: “For founders, payers are the new patients.”
Current Landscape: Adoption Remains Nascent
A 2024 analysis of 11 billion CPT claims (2018-2023) found that only two AI applications have exceeded 10,000 claims: coronary artery disease assessment (67,306 claims via CPT 0501T-0504T) and diabetic retinopathy screening (15,097 claims via CPT 92229) (Wu et al., NEJM AI, 2024). Despite over 1,300 FDA-cleared AI devices, clinical utilization remains concentrated in affluent, metropolitan, academic medical centers.
| AI Category | Reimbursement Status | CPT Code(s) |
|---|---|---|
| Diabetic retinopathy (IDx-DR/LumineticsCore) | Medicare coverage; national rate $45.36, private median $127.81 | 92229 |
| Coronary artery disease (HeartFlow) | Category III codes | 0501T-0504T |
| Radiology AI (most) | No separate payment; cost absorbed into radiologist fee | None |
| Clinical decision support | No payment; hospital self-funds | None |
| Ambient documentation | Physician subscription ($1,000-1,500/month) | None |
Success Story: IDx-DR Diabetic Retinopathy Screening
IDx-DR (now LumineticsCore) represents the rare AI reimbursement success (FDA De Novo, 2018; CMS, 2022):
| Milestone | Details |
|---|---|
| FDA authorization | April 2018, De Novo pathway (first autonomous AI diagnostic) |
| CPT code | 92229 established 2021 (imaging with AI interpretation without physician review) |
| Medicare coverage | Yes, with CMS establishing national payment in 2022 |
| Payment | $45.36 (Medicare); $127.81 median (private) |
Why it succeeded:
- Clear clinical benefit: Diabetic retinopathy screening reduces blindness
- Solves access problem: Primary care can screen without ophthalmologist referral
- Cost-effective: Cheaper than specialist visit
- Prospective validation: Randomized trial evidence supported De Novo authorization
The Policy Gap: What Clinical AI Needs
Digital therapeutics and clinical AI still lack a formal Medicare benefit category. The pending Access to Prescription Digital Therapeutics Act (H.R.1458) would create a defined reimbursement class similar to pharmaceuticals (Odelle Technology, 2025). Until enacted, most clinical AI relies on Category III CPT codes (temporary, no guaranteed payment), bundled into existing procedure fees, or hospital self-funding.
Starting January 2026, CMS will formalize new AI and augmented intelligence codes for machine-assisted diagnosis, interpretation, and clinical decision support (Avalere Health, 2025). Whether these codes receive adequate valuation and coverage remains uncertain.
Emerging Payment Models
Fee-for-service does not incentivize AI adoption. Value-based models align incentives:
Value-based care contracts: Providers share risk with payers. AI that reduces hospitalizations and complications directly benefits providers financially.
Bundled payments: Single payment for entire episode of care. AI costs included in bundle; providers incentivized to use cost-effective AI that improves outcomes without separate line-item billing.
Outcomes-based contracts with vendors: Hospital pays vendor based on AI performance, not upfront license. Aligns incentives to reduce false positives and demonstrate clinical value.
The Path Forward
Clinical AI will follow the EHR and telehealth adoption pattern or it will not scale. The policy mechanisms that worked before can work again:
- Dedicated CPT codes with adequate valuation for validated clinical AI applications
- Medicare coverage decisions that establish precedent for private payers
- MACRA/APM integration that rewards AI-enabled quality improvement
- Penalty structures for non-adoption of proven AI (as Meaningful Use penalized EHR non-adopters)
The technology exists. The validation methodologies exist. The missing ingredient is the same one that was missing for EHRs in 2008 and telehealth in 2019: a clear signal from government payers that adoption will be rewarded.
Part 5: Operational AI and Healthcare Economics
The Untapped Opportunity
While regulatory and reimbursement discussions focus on clinical decision support, operational and administrative activities consume a larger share of healthcare spending. Workforce staffing, care coordination, billing, claims processing, scheduling, and customer service contributed an estimated $950 billion in U.S. healthcare costs in 2019 (Sahni et al., McKinsey, 2021). This represents approximately 25% of total healthcare spending.
The National Bureau of Economic Research estimates that wider AI adoption could generate savings of 5–10% of U.S. healthcare spending, approximately $200–360 billion annually in 2019 dollars (Sahni et al., NBER Working Paper 30857, 2023). These estimates focus on AI-enabled use cases using current technology, attainable within five years, that do not sacrifice quality or access.
Why operational AI attracts capital: As noted above, AI-enabled startups captured 62% of healthcare venture dollars in 2025, with investment skewing toward operational applications because revenue models are clearer (Rock Health, 2025). Clinical AI faces reimbursement uncertainty. Operational AI can demonstrate ROI through reduced labor costs, improved throughput, and decreased waste without requiring payer coverage decisions.
The Productivity Paradox
Paradoxically, new clinical technologies have historically increased overall healthcare spending rather than reducing it. The “productivity J-curve” (Brynjolfsson et al., AEJ: Macroeconomics, 2021) describes a pattern in which general purpose technologies initially depress measured productivity before generating gains. The explanation: technology alone does not reduce costs. Organizations must redesign workflows, structures, and culture around the technology.
The implication for health systems: Rather than retrofitting AI into existing clinical workflows, effective implementation requires redesigning processes around AI capabilities. For example, follow-up appointment frequency is typically left to individual physician preference with high variability and little evidence guiding these decisions. AI-based risk stratification could reallocate appointment frequency based on patient need, substantially increasing capacity and reducing wait times. One study found that reducing follow-up frequency by a single visit per year could save $1.9 billion nationally (Ganguli et al., JAMA, 2015).
Such redesign requires institutional willingness to change practice patterns, not merely add AI to existing processes.
Cross-Industry Lessons
Other industries have adopted operational AI with documented returns on investment (Wong et al., npj Health Syst, 2026):
| Sector | Application | Healthcare Parallel |
|---|---|---|
| Retail | Inventory forecasting, demand prediction | Hospital capacity forecasting, supply chain optimization |
| Aviation | Predictive maintenance, weather disruption modeling | Biomedical equipment maintenance, OR scheduling |
| Logistics | Route optimization | Emergency response, patient transport |
| Financial services | Customer advisory chatbots, query resolution | Patient communication, prior authorization |
Key insight: These industries achieved returns by pairing technology investment with workflow redesign. UPS’s route optimization system reportedly saves $300–400 million annually on fuel costs, but required restructuring delivery operations around the algorithm’s recommendations.
Barriers to Operational AI in Healthcare
Several factors explain why healthcare has lagged other sectors in operational AI adoption:
- Fragmented data: Healthcare data is siloed across EHRs, claims systems, and departmental applications. AI requires integrated data access.
- Risk tolerance: Aviation and finance are safety-critical, yet healthcare is more risk-averse about algorithmic decision-making.
- Regulatory uncertainty: Most operational AI escapes FDA oversight, but institutions lack guidance on governance requirements.
- Workforce concerns: Staff may view operational AI as threatening rather than enabling.
- Misaligned incentives: Fee-for-service rewards volume, not efficiency. Value-based contracts better align incentives with operational improvement.
The Learning Health System Framework
Effective AI integration requires continuous evaluation, not one-time deployment. The learning health system model provides a framework for pairing operational goals with evidence generation (IOM, 2012):
- Identify operational gap (e.g., OR utilization, scheduling efficiency, documentation burden)
- Deploy AI intervention with prospective measurement plan
- Evaluate outcomes against pre-specified metrics
- Iterate or terminate based on evidence
This approach addresses a critical gap: a national hospital survey found that only 61% of U.S. hospitals performed any local performance evaluation of AI models prior to deployment (Nong et al., Health Affairs, 2025). Many health systems lack the expertise or infrastructure to validate AI performance or assess investment value.
National collaboratives can help: The Coalition for Health AI (CHAI), the Health AI Partnership, and the AMA’s Center for Digital Health and AI enable resource-limited health systems to leverage peer expertise, troubleshoot common problems, and disseminate findings on operational AI tools.
Strategic Implications for Health Systems
| Action | Rationale |
|---|---|
| Tie AI initiatives to measurable value | Avoid “productivity paradox” by requiring ROI demonstration before scaling |
| Redesign workflows around AI | Retrofitting AI into existing processes yields minimal benefit |
| Integrate AI operations with research | Learning health system model generates evidence while improving operations |
| Build distributed AI literacy | Frontline staff must understand AI capabilities to identify opportunities |
| Start with operational AI | Clearer ROI, less regulatory complexity than clinical decision support |
Part 6: Institutional Governance
Why Hospital-Level Governance Matters
FDA clearance does not guarantee that:
- The AI performs well with your patient population, workflows, and EHR
- The AI is cost-effective for your budget
- Physicians will use the AI appropriately
- Patients will not be harmed
Institutional governance fills gaps left by regulation.
Essential Governance Components
1. Clinical AI Governance Committee
Minimum composition:
- Chair: CMIO or CMO
- Physicians from specialties using AI
- Chief Nursing Officer representative
- CIO or IT director
- Legal counsel with medical malpractice and AI expertise
- Chief Quality/Patient Safety Officer
- Health equity lead
- Bioethicist
- Patient advocate
Responsibilities: Pre-procurement review, pilot approval, deployment oversight, adverse event investigation, policy development, bias auditing.
2. Validation Before Deployment
Do not assume vendor validation generalizes to your hospital.
| Phase | Duration | Purpose |
|---|---|---|
| Silent mode | 2-4 weeks | AI generates outputs not shown to clinicians; verify technical stability |
| Shadow mode | 4-8 weeks | AI outputs shown as “informational only”; gather physician feedback |
| Active pilot | 3-6 months | Limited deployment with pre-defined success criteria |
Success criteria should include (see the evaluation sketch after this list):
- Technical: Sensitivity, specificity, PPV thresholds
- Clinical: Primary outcome improvement vs. baseline
- User: Physician satisfaction, response rate
- Safety: Zero preventable patient harm
- Equity: No performance disparities >10% across demographics
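A minimal sketch of how a governance committee might check measured pilot results against pre-specified criteria follows; the metric names and thresholds are illustrative examples an institution would set for itself before the pilot, not mandated values.

```python
# Minimal sketch (not a vendor or regulatory tool) of checking measured pilot results
# against pre-specified success criteria; metric names and thresholds are illustrative.
SUCCESS_CRITERIA = {
    "sensitivity":              ("min", 0.80),
    "ppv":                      ("min", 0.30),
    "physician_response_rate":  ("min", 0.70),
    "preventable_harm_events":  ("max", 0),
    "max_subgroup_gap":         ("max", 0.10),  # equity: no >10% performance disparity
}

def evaluate_pilot(measured: dict) -> bool:
    """Print pass/fail per criterion; return True only if every criterion passes."""
    all_pass = True
    for metric, (direction, threshold) in SUCCESS_CRITERIA.items():
        value = measured[metric]
        ok = value >= threshold if direction == "min" else value <= threshold
        all_pass = all_pass and ok
        print(f"{metric:>25}: {value}  [{'PASS' if ok else 'FAIL'}]")
    return all_pass

# Example results from a hypothetical 3-month active pilot
results = {"sensitivity": 0.84, "ppv": 0.22, "physician_response_rate": 0.75,
           "preventable_harm_events": 0, "max_subgroup_gap": 0.07}
print("Proceed to full deployment:", evaluate_pilot(results))  # False: PPV below threshold
```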
3. Bias Monitoring
Quarterly audits measuring AI performance across:
- Race/ethnicity
- Age
- Sex
- Insurance status
- Language
If a performance difference exceeds 10%: investigate, mitigate, or deactivate (see the sketch below).
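A minimal sketch of such an audit, using invented data and the 10% policy threshold above (not a validated auditing tool):

```python
# Minimal sketch of a quarterly bias audit on invented data: compute alert sensitivity
# per demographic subgroup and flag the model when the gap exceeds the 10% threshold.
from collections import defaultdict

def subgroup_sensitivity(records):
    """records: iterable of (subgroup, alert_fired, condition_present) tuples."""
    tp, fn = defaultdict(int), defaultdict(int)
    for group, alert_fired, condition_present in records:
        if condition_present:          # sensitivity is computed among true cases only
            if alert_fired:
                tp[group] += 1
            else:
                fn[group] += 1
    groups = set(tp) | set(fn)
    return {g: tp[g] / (tp[g] + fn[g]) for g in groups}

# Invented audit records: (subgroup, alert fired?, condition actually present?)
audit = ([("White", True, True)] * 80 + [("White", False, True)] * 20 +
         [("Black", True, True)] * 62 + [("Black", False, True)] * 38)

rates = subgroup_sensitivity(audit)
gap = max(rates.values()) - min(rates.values())
print(rates)                           # e.g. {'White': 0.8, 'Black': 0.62}
print(f"Sensitivity gap: {gap:.0%}")   # 18%
if gap > 0.10:
    print("Exceeds 10% threshold: investigate, mitigate, or deactivate")
```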
4. Vendor Contracts
Require:
- Hospital retains ownership of patient data
- Performance guarantees with termination rights if thresholds not met
- Disclosure of training data demographics, validation studies, limitations
- Vendor indemnification for AI errors and data breaches
- HIPAA Business Associate Agreement
5. Value Alignment Assessment
Beyond technical performance and bias, institutions should assess what values AI systems embed. The RAISE consortium’s “Values In the Model” (VIM) framework proposes that AI systems disclose how they navigate value-laden trade-offs: intervention vs. conservative management, patient autonomy vs. paternalism, individual benefit vs. resource constraints (Goldberg et al., NEJM AI, 2026).
When evaluating AI systems, governance committees should ask:
- What optimization target was this system trained on?
- How does it handle scenarios where reasonable experts disagree?
- Does behavior differ between fee-for-service and capitated contexts?
See Ethics chapter: Value Alignment Frameworks for detailed guidance on assessing embedded values.
Enterprise AI Lifecycle Frameworks
Beyond committee composition and validation protocols, institutions need structured frameworks for how AI solutions progress from concept to deployment to monitoring. Stanford Medicine’s experience provides a model.
RAIL (Responsible AI Lifecycle):
Stanford Health Care established the Responsible AI Lifecycle (RAIL) framework in 2023 to codify institutional workflows for AI solution development (Shah et al., 2026):
| Stage | Key Activities |
|---|---|
| Proposal | Use case definition, risk tiering, stakeholder alignment |
| Development | Model building, integration with clinical data, prompt engineering |
| Validation | FURM assessment (see below), truth set curation, benchmark testing |
| Pilot | Controlled deployment with defined success criteria |
| Monitoring | System integrity, performance, and impact tracking |
FURM (Fair Useful Reliable Models):
The FURM framework specifies required assessments before AI deployment:
- Fair: Performance tested across demographic subgroups; disparities documented and mitigated
- Useful: Clear clinical or operational benefit demonstrated; workflows redesigned for integration
- Reliable: Consistent performance across settings; failure modes identified and documented
Why structured frameworks matter:
As noted in Part 5, most health systems lack the expertise or infrastructure to validate AI performance or assess investment value, and only 61% of U.S. hospitals performed any local performance evaluation of AI models prior to deployment (Nong et al., Health Affairs, 2025). Structured frameworks convert ad hoc adoption decisions into systematic processes.
Implementation lesson from Stanford ChatEHR:
Stanford’s approach required embedding data science teams within IT organizations, providing direct access to personnel maintaining network security, EHR integrations, and cloud resources. This integration enabled what began as a sandbox (April 2023) to scale to health-system-wide deployment (September 2025) in approximately 2.5 years. The “build-from-within” strategy provides institutional agency: model-agnostic infrastructure that matches clinical tasks to appropriate LLMs, institutional data governance, and custom monitoring aligned to organizational priorities.
For institutions without Stanford’s resources:
National collaboratives provide pathways for resource-limited health systems:
- Coalition for Health AI (CHAI): Responsible AI Guide (RAIG) with developer/implementer accountability frameworks (see CHAI section)
- Health AI Partnership: Peer expertise sharing and troubleshooting
- AMA Center for Digital Health and AI: Policy guidance and educational resources
Part 7: Liability and Legal Frameworks
Liability Scenarios
Scenario 1: Physician follows AI recommendation, patient harmed
Example: Radiology AI flags lung nodule as “benign.” Radiologist concurs. Six months later, nodule diagnosed as cancer.
- Likely outcome: Physician liable if jury finds failure to exercise independent judgment
- Lesson: AI use does not absolve physician responsibility. Document rationale beyond “AI said benign”
Scenario 2: Physician overrides AI, patient harmed
Example: Sepsis AI alerts high risk (8.5/10). Physician evaluates, finds normal vitals and labs, documents rationale, discharges. Patient returns in septic shock.
- Likely outcome: Physician NOT liable if override was reasonable and documented
- Lesson: AI override is acceptable when clinically justified. Documentation is essential.
Scenario 3: Systematic AI error harms multiple patients
Example: ECG AI systematically underestimates QT interval. Multiple patients given QT-prolonging medications develop arrhythmias.
- Potential defendants: Manufacturer (product liability), hospital (negligent deployment), physicians (negligent use)
- Likely outcome: Shared liability with jury determining percentage fault
Unsettled Legal Questions
- Black-box algorithms: How to prove negligence when AI logic is inscrutable?
- Continuously learning AI: Who is liable for harms from updated algorithms?
- Off-label AI use: Who is liable when a physician uses AI outside its approved indications?
- Training data bias: What recourse exists when AI systematically harms certain demographic groups?
The Evolving Standard of Care: When NOT Using AI Becomes Negligence
Current state (2024): Failure to use AI is generally NOT considered negligence. Standard of care remains defined by human physician practice.
Emerging risk (2025+): Failure to use proven AI tools may soon carry liability.
High-risk areas where this transition is happening:
| Application | Evidence Base | Liability Risk Trajectory |
|---|---|---|
| LVO stroke detection (Viz.ai) | Multiple studies showing 30-60 min faster treatment | HIGH: Not using may soon be negligent given mortality impact |
| ICH detection (Aidoc, others) | Strong validation, adopted as standard at major centers | MEDIUM-HIGH: Becoming expected at facilities with radiology AI |
| Diabetic retinopathy screening (IDx-DR) | FDA-authorized autonomous AI, prospective validation | MEDIUM: May become standard for primary care with diabetic patients |
| Sepsis prediction | Epic sepsis model failures demonstrate tools are not yet reliable | LOW: Little risk of negligence for non-use given poor validation |
The legal argument is simple: if a proven AI tool catches strokes faster and a hospital chooses not to deploy it, the plaintiff’s attorney will ask, “Why did you choose to let my client’s mother die when technology existed to save her?”
Defensive recommendations:
- Document your rationale if your institution decides NOT to deploy well-validated AI tools
- Track when AI becomes standard: Monitor specialty society guidelines for AI adoption recommendations
- Institutional governance matters: Ensure AI adoption decisions are made by committees, not individuals, to distribute liability
- Stay current: What is “optional” today may be “expected” in 2-3 years for proven applications
Risk Reduction for Physicians
- Document everything: Record AI recommendations, your reasoning, whether you followed or overrode
- Understand AI limitations: Know validation populations, sensitivity/specificity, failure modes
- Maintain clinical independence: AI is decision support, not decision maker
- Obtain informed consent when appropriate: For high-stakes AI decisions, discuss with patients
- Report AI errors: If AI makes systematic errors, report to Quality/Safety and AI governance
Part 8: Policy Recommendations from Expert Bodies
AMA Principles for Augmented Intelligence (2018)
The American Medical Association prefers “augmented intelligence” to emphasize physician judgment is augmented, not replaced (AMA AI Principles, 2018).
Six principles:
- AI should augment, not replace, the physician-patient relationship
- AI must be developed and deployed with transparency
- AI must meet rigorous standards of effectiveness
- AI must mitigate bias and promote health equity
- AI must protect patient privacy and data security
- Physicians must be educated on AI
WHO Framework (2021)
Six principles: protect human autonomy, promote well-being and safety, ensure transparency, foster accountability, ensure equity, promote sustainability (WHO, 2021).
Key Professional Society Positions
| Society | Key Recommendations |
|---|---|
| American College of Radiology | AI-LAB accreditation program for vendor transparency; validate AI at your institution before clinical use |
| American Heart Association | Cardiovascular AI must be validated on populations where deployed |
| College of American Pathologists | Pathologists must review all AI-flagged cases; no autonomous AI diagnosis |
Coalition for Health AI (CHAI) and Joint Commission Partnership
The Coalition for Health AI (CHAI) represents a shift from principle-based guidance to use-case-specific implementation frameworks. In September 2025, CHAI partnered with the Joint Commission to release the first installment of practical guidance for responsible AI deployment (Joint Commission, 2025).
Why this matters: The Joint Commission accredits over 22,000 healthcare organizations. A voluntary AI certification program based on CHAI playbooks is planned, potentially creating de facto standards for institutional AI governance.
Use-Case-Specific Work Groups:
CHAI’s approach differs from generic AI principles by providing guidance tailored to specific clinical applications (CHAI Use Cases):
| Use Case | Focus |
|---|---|
| Clinical Decision Support (LLM + RAG) | Scope definition, escalation rules for when AI defers to humans, continuous monitoring, evidence traceability |
| EHR Information Retrieval | Grounding retrieved information, verification in real-world contexts, handling fragmented patient data |
| Prior Authorization Criteria Matching | Explainability of match/non-match decisions, human review triggers, preventing “denial drift” |
| Direct-to-Consumer Health Chatbots | Accessibility (5th-6th grade reading level, multilingual), error handling, authoritative source grounding, citations |
Developer vs. Implementer Accountability:
The CHAI Responsible AI Guide (RAIG) distinguishes between “Developer Teams” (data scientists, engineers who build AI solutions) and “Implementer Teams” (providers, IT staff, leadership who deploy them). Each stage of the AI lifecycle specifies which team bears primary responsibility (CHAI RAIG):
| Stage | Developer Responsibility | Implementer Responsibility |
|---|---|---|
| Define Problem & Plan | Collaborate on technical feasibility | Define business requirements, clinical context |
| Design | Model architecture, training approach | Workflow integration design |
| Engineer | Build, train, validate solution | Provide real-world data, clinical input |
| Assess | Performance metrics, bias testing | Local validation, population fit assessment |
| Pilot | Technical support, iteration | Controlled deployment, clinician feedback |
| Deploy & Monitor | Ongoing maintenance, updates | Adverse event tracking, governance reporting |
Governance Structure Recommendations:
CHAI guidance emphasizes:
- Written AI policies: Establish explicit governance with technically experienced leadership
- Transparency to patients: Disclosures and educational tools about AI use
- Data protection: Minimum necessary data principles, audit rights in vendor agreements
- Quality monitoring: Regular validation, performance dashboards
- Bias assessment: Audit whether AI was developed with datasets representative of served populations
- Blinded reporting: Cross-institutional learning from AI-related events
Limitations to Acknowledge:
CHAI guidance is process-oriented but lacks quantitative thresholds. The guidance emphasizes “regularly monitor” and “regularly validate” without defining:
- What performance floor triggers intervention?
- What retrieval accuracy makes EHR summarization safe?
- How many false denials cross from efficiency to patient harm?
Without quantitative thresholds, “continuous monitoring” risks becoming “monitor until something bad happens, then determine what the threshold should have been.” This gap is significant for institutions seeking actionable standards.
Conclusion
AI regulation and policy are evolving rapidly. The frameworks designed for static products do not fit dynamic, learning systems that update continuously. Challenges include unclear evidence standards, insufficient post-market surveillance, reimbursement barriers, unsettled liability, and fragmented international regulations.
Key principles for physician-centered AI policy:
- Patient safety first: Prospective validation, external testing
- Evidence-based regulation: Demand prospective trials for high-risk AI
- Transparent accountability: Clear liability when AI errs
- Equity mandatory: Performance tested across demographics; biased AI not deployed
- Physician autonomy preserved: AI supports, never replaces judgment
- Reimbursement aligned with value: Pay for AI that improves outcomes
What physicians must do:
Individually: Demand evidence, validate AI locally, document AI use meticulously, report errors, maintain clinical independence.
Institutionally: Establish AI governance committees, implement bias audits, create accountability frameworks, provide training.
Professionally: Engage specialty societies, lobby for evidence-based regulation and reimbursement, publish validation studies.
The future of AI in medicine will be shaped by the choices made today: the regulations demanded, the reimbursement models advocated for, the governance structures built, and the standards held.