Healthcare Policy and AI Governance
The FDA has authorized over 1,300 AI medical devices, approximately 96% via the 510(k) substantial-equivalence pathway, typically without new prospective clinical trials (FDA AI-Enabled Medical Devices). Epic’s widely deployed sepsis prediction model, which reached hospitals without FDA review at all, claimed 85% sensitivity in internal validation, but external testing revealed actual sensitivity of 33%, missing two-thirds of sepsis cases. Regulatory frameworks built for static medical devices struggle to govern AI systems that evolve through continuous learning. The EU AI Act classifies most clinical AI as high-risk requiring rigorous oversight, while U.S. regulation remains permissive. This divergence shapes what AI physicians can access and who bears liability when algorithms fail.
After reading this chapter, you will be able to:
- Understand the evolving FDA regulatory framework for AI/ML-based medical devices
- Evaluate international regulatory approaches (EU AI Act, WHO guidelines)
- Recognize reimbursement challenges and evolving payment models for AI
- Assess institutional governance frameworks for safe AI deployment
- Navigate liability, accountability, and legal frameworks for medical AI
- Implement hospital-level AI governance policies
Introduction
Medicine operates within complex regulatory and policy frameworks: FDA device approvals, CMS reimbursement decisions, state medical board oversight, institutional protocols, and malpractice liability standards. These structures emerged over decades to protect patients from unsafe drugs, devices, and practices. They assume products are static: a drug approved in 2020 is chemically identical in 2025.
AI challenges this assumption fundamentally. Machine learning systems evolve through retraining on new data, algorithm updates, and performance drift as patient populations change. How should regulators approve systems that change continuously? Who is liable when AI errs: developers who built it, hospitals that deployed it, or physicians who followed its recommendations?
The stakes are high:
- Patient safety: Poorly regulated AI can harm thousands before problems are detected
- Innovation: Over-regulation may stifle beneficial AI development
- Equity: Biased regulatory frameworks may entrench disparities
- Legal liability: Unclear accountability creates defensive medicine
Part 1: A Major Policy Failure: The Epic Sepsis Model
The Case Study
What was promised: Epic’s sepsis prediction model (embedded in the EHR) would detect sepsis 6-12 hours before clinical recognition. The vendor claimed 85% sensitivity based on internal validation. The model reached hundreds of hospitals as EHR-embedded clinical decision support, without FDA premarket review.
What happened:
In 2021, Wong et al. published an external validation study in JAMA Internal Medicine testing the Epic sepsis model on 27,697 patients at Michigan Medicine (Wong et al., 2021):
- Sensitivity: 33% (not 85% claimed)
- 67% of sepsis cases never triggered an alert at any point
- Positive predictive value: 12% (88% false positive rate among alerts)
- Area under the curve: 0.63 (poor discrimination)
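These headline numbers follow directly from the confusion matrix. The sketch below is illustrative only: the counts are hypothetical, chosen to roughly reproduce the published sensitivity and PPV rather than the study’s actual data. It shows how the metrics are computed and why a 12% PPV translates into roughly eight alerts for every true sepsis case.

```python
# Illustrative only: hypothetical counts chosen to roughly reproduce the published
# sensitivity (33%) and PPV (12%); these are NOT the study's actual confusion matrix.
def sensitivity(tp, fn):
    """Fraction of true sepsis cases that triggered an alert."""
    return tp / (tp + fn)

def ppv(tp, fp):
    """Fraction of alerts that corresponded to true sepsis cases."""
    return tp / (tp + fp)

# Hypothetical cohort: 10,000 encounters, 7% sepsis prevalence
sepsis_cases = 700
true_positives = 231        # sepsis cases that triggered an alert
false_negatives = sepsis_cases - true_positives
false_positives = 1_694     # alerts on patients who never developed sepsis

print(f"Sensitivity: {sensitivity(true_positives, false_negatives):.0%}")  # ~33%
print(f"PPV: {ppv(true_positives, false_positives):.0%}")                  # ~12%
print(f"Alerts per true case: {(true_positives + false_positives) / true_positives:.1f}")  # ~8.3
```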
Why regulatory oversight failed to prevent this:
- Retrospective validation only: The vendor’s performance claims rested on retrospective chart review, not prospective deployment
- Look-ahead bias: Training data included labs and vitals ordered after clinicians suspected sepsis, so the model learned to detect suspicion, not actual sepsis
- No external validation requirement: No regulator required testing at independent hospitals before widespread deployment
- CDS exemption: As proprietary decision support embedded in the EHR, the model reached hundreds of hospitals without FDA premarket review or clinical trial evidence
Regulatory response: No recall, no warning letter, no enforcement action followed. The model remained in widespread use.
Lessons:
- Vendor validation ≠ clinical validation
- Retrospective studies mislead due to look-ahead bias and confounding
- External validation at independent institutions is essential
- Post-market surveillance is inadequate
Part 2: FDA Regulation of AI/ML Medical Devices
Regulatory Pathways
510(k) Clearance (Substantial Equivalence):
- Device is “substantially equivalent” to a predicate device already on market
- Fastest, least burdensome pathway (median 151 days review time) (MDPI Biomedicines, 2024)
- 97% of AI devices cleared via 510(k) pathway (2024) (MedTech Dive, 2024)
Premarket Approval (PMA):
- Most rigorous review, requiring clinical trials demonstrating safety and effectiveness
- Reserved for the highest-risk (Class III) devices
De Novo Classification:
- Novel low-to-moderate-risk device type with no predicate (median 372 days review time)
- Establishes a new regulatory pathway for similar future devices
- Example: IDx-DR diabetic retinopathy screening, the first autonomous AI diagnostic, received De Novo authorization in April 2018 (FDA De Novo Decision Summary)
Current State (2024-2025)
By the numbers:
- Over 1,300 AI/ML medical devices authorized as of late 2025 (FDA AI-Enabled Medical Devices)
- 168 devices cleared in 2024 alone, with 94.6% via 510(k) (MDPI Biomedicines, 2024)
- Radiology dominates: 74.4% of 2024 clearances were imaging-related
Examples of FDA-cleared AI:
| Category | Examples |
|---|---|
| Radiology CAD | Intracranial hemorrhage (Aidoc, Viz.ai), pulmonary embolism, lung nodules |
| Cardiology | ECG AFib detection (Apple Watch, AliveCor), echocardiogram EF estimation |
| Ophthalmology | IDx-DR/LumineticsCore diabetic retinopathy screening |
| Clinical Decision Support | Sepsis prediction, deterioration algorithms |
Predetermined Change Control Plans (PCCP)
Traditional devices are “locked” after approval. The FDA’s PCCP framework addresses this for AI systems that need continuous updates (FDA PCCP Guidance, 2024):
What PCCP allows:
- Manufacturer specifies anticipated changes (retraining, performance improvements)
- FDA reviews and approves plan upfront
- Specified changes proceed without new submissions
Components required:
- Description of modifications: Itemization of proposed changes with justifications
- Modification protocol: Methods for developing, validating, and implementing changes
- Impact assessment: Benefits, risks, and mitigations
Final guidance issued December 2024 broadened scope to all AI-enabled devices, not just ML-enabled devices.
General Wellness Products: What Escapes FDA Oversight
Not all health-related software and devices require FDA clearance. The FDA’s General Wellness guidance (updated January 2026) defines products that fall outside medical device regulation entirely (FDA General Wellness Guidance, 2026).
The two-factor test:
A product qualifies as a general wellness product (not a medical device) if it meets BOTH criteria:
- Intended for general wellness use only: Claims relate to maintaining or encouraging a healthy lifestyle (weight management, physical fitness, relaxation, sleep management, mental acuity) OR relate healthy lifestyle choices to reducing risk of chronic diseases where this association is well-established
- Low risk: Not invasive, not implanted, does not involve technology posing safety risks without regulatory controls (lasers, radiation)
January 2026 update on physiologic sensing:
The updated guidance explicitly addresses non-invasive optical sensing for physiologic parameters, directly relevant to consumer wearables:
Products using optical sensing (photoplethysmography) to estimate blood pressure, oxygen saturation, blood glucose, or heart rate variability may qualify as general wellness products when outputs are intended solely for wellness uses, provided they:
- Are non-invasive and not implanted
- Are not intended for diagnosis, treatment, or management of disease
- Do not claim clinical equivalence to FDA-cleared devices
- Do not prompt specific clinical actions or medical management
- Do not include clinical thresholds or diagnostic alerts
- Have validated values if displaying physiologic measurements
What makes a product NOT a wellness device:
| Characteristic | Example | Regulatory Status |
|---|---|---|
| Disease diagnosis claims | “Detects atrial fibrillation” | Medical device, requires FDA clearance |
| Treatment guidance | “Adjust insulin based on glucose reading” | Medical device |
| Clinical equivalence claims | “Medical-grade blood pressure” | Medical device |
| Diagnostic thresholds/alerts | “Heart rate dangerous, seek care now” | Medical device |
| Invasive measurement | Microneedle glucose sensor | Medical device (even if wellness claims) |
Illustrative examples from FDA guidance:
| Product | Wellness (No FDA) | Medical Device (FDA Required) |
|---|---|---|
| Wrist-worn activity tracker with heart rate, sleep, blood pressure for “recovery assessment” | Yes (if validated values, no disease claims) | |
| Same device claiming to “monitor hypertension” | | Yes (disease management claim) |
| Pulse oximeter for “monitoring during hiking” | Yes | |
| Pulse oximeter for “detecting hypoxemia” | | Yes (diagnostic claim) |
| App playing music for “relaxation and stress management” | Yes | |
| App claiming to “treat anxiety disorder” | | Yes (treatment claim) |
What this means for physicians:
Consumer wearables operating under wellness exemptions have no FDA validation requirements. When patients present data from Oura, Whoop, Apple Watch (non-FDA features), or similar devices:
- Treat outputs as informational, not diagnostic. These devices may display blood pressure, SpO2, or glucose estimates without demonstrating clinical accuracy
- Validation status varies by feature. Apple Watch ECG and irregular rhythm notification ARE FDA-cleared; estimated blood pressure is NOT
- “Validated values” requirement is self-certified. The guidance requires manufacturers to validate physiologic values, but FDA does not review this validation for wellness products
- Marketing language matters. The same hardware can be a wellness product or medical device depending on how it’s marketed and what claims are made
January 2026 Clinical Decision Support Software Changes
On January 6, 2026, the FDA announced significant changes to its Clinical Decision Support (CDS) software guidance, substantially loosening oversight for AI tools that provide diagnostic or treatment recommendations (FDA CDS Guidance, 2026).
Key policy changes:
| Change | Previous Policy | January 2026 Policy |
|---|---|---|
| Single recommendation | Software providing one recommendation = medical device | Exempt if recommendation is the only “clinically appropriate” option |
| Time-critical decisions | Automatic exclusion from CDS exemption | Repositioned as factor, not automatic trigger |
| SaMD Clinical Evaluation | Guidance document in effect | Withdrawn January 7, 2026 |
What this means practically:
Generative AI tools providing diagnostic suggestions or performing history-taking may now reach clinics without FDA review if they meet exemption criteria under the updated guidance. The FDA explicitly stated that software “simply providing information like ChatGPT or Google” would not require FDA regulation.
“Clinically appropriate” is undefined. The FDA declined to define what counts as “clinically appropriate,” leaving manufacturers to determine when a single recommendation is justified. This creates room for aggressive interpretation driven by commercial pressure (Epstein Becker Green analysis).
Expert concerns:
- Authority shift: “The risk is not that AI replaces clinicians outright, but that authority subtly shifts, with recommendations acquiring an aura of objectivity that exceeds their evidentiary foundation” (KevinMD analysis)
- Cognitive offloading: Time-pressed physicians may not review AI logic, particularly when outputs appear reasonable
- Validation gap: Withdrawal of SaMD Clinical Evaluation guidance creates uncertainty on how to validate AI systems
What this means for physicians:
The January 2026 changes increase the importance of institutional governance and independent clinical validation. AI tools reaching your practice may not have undergone FDA safety review. This shifts responsibility for validation to health systems and individual physicians. The guidance explicitly preserves FDA authority over software that “substitutes for clinical judgment” or analyzes medical images for diagnostic recommendations, but the line between “providing information” and “substituting for judgment” remains unclear.
Challenges and Needed Reforms
| Problem | Evidence | Needed Reform |
|---|---|---|
| No prospective validation required | Epic sepsis model reached wide use on retrospective validation alone, then failed external testing | Mandate prospective validation for high-risk AI |
| Inadequate post-market surveillance | FDA relies on voluntary adverse event reporting | Require quarterly performance reports |
| Generalizability not assessed | AI approved on one population may fail in others | Require demographic subgroup analysis |
| Transparency vs. trade secrets | Physicians cannot validate black-box AI | Mandate disclosure of training data demographics |
2025 Federal AI Policy Shift
The federal approach to AI regulation changed significantly in 2025. On January 20, 2025, President Trump rescinded Executive Order 14110 (“Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence”), the Biden administration’s framework emphasizing AI safety and risk mitigation (Federal Register, 2025). Three days later, Executive Order 14179 (“Removing Barriers to American Leadership in Artificial Intelligence”) established a deregulatory framework prioritizing innovation and global competitiveness over precautionary oversight.
The July 2025 America’s AI Action Plan proposes regulatory sandboxes where FDA and other agencies could allow rapid deployment and testing of AI tools with streamlined oversight. On December 11, 2025, a subsequent Executive Order (“Ensuring a National Policy Framework for Artificial Intelligence”) established federal preemption of state AI laws, directing the Attorney General to challenge state regulations that conflict with federal policy (White House, December 2025).
What this means for physicians: The direction of federal policy favors faster AI deployment with fewer pre-market requirements. State-level AI protections (like Colorado’s AI Act) face federal preemption challenges. This increases the importance of institutional governance and independent clinical validation, since federal regulatory scrutiny may decrease. The FDA’s existing medical device framework remains in place, but the broader policy environment signals reduced appetite for new AI-specific oversight requirements.
State Regulatory Sandboxes
While federal policy shifts toward deregulation, several states have created “regulatory sandboxes” for healthcare AI, enabling controlled testing of innovations that would otherwise face regulatory barriers. These programs provide temporary relief from specific state requirements while maintaining safety monitoring.
Utah: First State-Approved AI Prescribing Pilot
In January 2026, Utah became the first state to approve autonomous AI participation in prescription decision-making. The Utah Department of Commerce’s Office of Artificial Intelligence Policy (established 2024) authorized a partnership with Doctronic, an AI health platform, to handle routine prescription renewals for patients with chronic conditions (Utah Department of Commerce, January 2026).
How the pilot works:
| Component | Details |
|---|---|
| Scope | Routine refills for 190 chronic condition medications |
| Exclusions | Pain management, ADHD medications, injectables |
| Safety threshold | First 250 prescriptions per medication class require physician review before AI operates independently |
| Override authority | Physicians retain ability to override all AI decisions |
| Reported concordance | 99.2% agreement between AI treatment plans and physician decisions in testing (Deseret News, January 2026) |
| Cost | $4 per renewal initially |
| Duration | 12-month demonstration agreement |
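As a purely hypothetical illustration (not Doctronic’s actual system; the medication-class names and routing function are invented for this sketch), the per-class safety threshold described in the table might be enforced along these lines:

```python
# Hypothetical sketch of the pilot's per-class safety threshold; NOT Doctronic's
# actual implementation. Class names and the routing function are invented.
from collections import defaultdict

REVIEW_THRESHOLD = 250                         # per-medication-class review requirement
EXCLUDED_CLASSES = {"pain_management", "adhd_medications", "injectables"}

reviewed_count = defaultdict(int)              # renewals already physician-reviewed, by class

def route_renewal(medication_class: str) -> str:
    """Return who decides this renewal under the sandbox rules."""
    if medication_class in EXCLUDED_CLASSES:
        return "physician_only"                # outside pilot scope entirely
    if reviewed_count[medication_class] < REVIEW_THRESHOLD:
        reviewed_count[medication_class] += 1
        return "ai_with_physician_review"      # safety-threshold phase
    return "ai_autonomous_with_override"       # physicians retain override authority

print(route_renewal("statins"))                # ai_with_physician_review (first 250)
print(route_renewal("pain_management"))        # physician_only (excluded class)
```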
Rationale: Medication non-adherence costs an estimated $100-289 billion annually in avoidable U.S. healthcare spending (Cutler et al., BMJ Open, 2018). Approximately 78% of prescription activity involves routine refills rather than new prescriptions (Optum, 2017). Administrative delays in renewals contribute to gaps in medication adherence, particularly for chronic conditions requiring continuous therapy.
Professional response: The American Medical Association has expressed caution. AMA CEO Dr. John Whyte stated that “without physician input [AI] also poses serious risks to patients and physicians alike” (Becker’s Hospital Review, January 2026).
States with AI Regulatory Sandbox Programs:
| State | Status | Key Features |
|---|---|---|
| Utah | Operational (2024) | Office of AI Policy with regulatory mitigation authority; healthcare AI pilots including mental health (ElizaChat), dental (Dentacor), prescription renewals (Doctronic) |
| Texas | Enacted (2024) | 36-month testing periods; quarterly reporting; AG prohibited from prosecuting participants for waived regulations (Texas DIR) |
| Delaware | Enacted (2025) | Focus on biotech, healthcare, corporate governance; supervised testing environment |
| Arizona | Operational (2019) | Original fintech sandbox expanded to include AI applications |
| Wyoming | Developing | Legislation under review for AI sandbox with temporary regulatory exemptions |
Federal sandbox proposals: Senator Ted Cruz (R-TX) introduced the SANDBOX Act (September 2025), which would direct the Office of Science and Technology Policy (OSTP) to create a federal regulatory sandbox program allowing companies to request waivers from federal regulations for up to 10 years (Fierce Healthcare, 2025).
What this means for physicians: State sandboxes create variation in what AI systems can legally do across jurisdictions. A system operating autonomously in Utah may require physician oversight in other states. Physicians practicing in sandbox states should understand the specific regulatory relief granted, the safety monitoring requirements, and their own liability exposure when AI operates with reduced oversight.
Part 3: International Regulatory Approaches
EU AI Act (2024)
The EU AI Act is the world’s first comprehensive AI regulation, entering into force August 1, 2024 (European Parliament, 2024).
Risk-based categorization:
| Risk Level | Requirements | Medical AI Examples |
|---|---|---|
| Unacceptable (banned) | Prohibited outright (e.g., social scoring, subliminal manipulation) | Not applicable to medical AI |
| High risk | Strict obligations | All medical AI for diagnosis, treatment, or triage |
| Limited risk | Transparency requirements | Medical chatbots, symptom checkers |
| Minimal risk | No specific obligations | Not applicable to medical AI |
High-risk medical AI requirements (npj Digital Medicine, 2024):
- Transparency: Disclose training data sources, demographics, limitations
- Human oversight: Physicians must retain decision authority and override
- Robustness testing: Independent validation across diverse populations
- Bias audits: Performance stratified by demographics
- Post-market monitoring: Continuous performance tracking, adverse event reporting within 15 days
Compliance timeline: Medical devices qualifying as high-risk AI systems have until August 2, 2027 for full compliance.
Impact: Estimated compliance cost of €500K-€2M per AI system. Small startups may struggle, potentially consolidating market toward large companies.
WHO Guidelines (2021, 2024)
WHO published Ethics and Governance of Artificial Intelligence for Health in June 2021 with six principles (WHO, 2021):
- Protect human autonomy: Patients and providers maintain decision-making authority
- Promote human well-being and safety: AI must benefit patients, minimize harm
- Ensure transparency and explainability: Stakeholders understand AI logic and limitations
- Foster responsibility and accountability: Clear assignment of responsibility when AI errs
- Ensure inclusiveness and equity: AI accessible to diverse populations, mitigate bias
- Promote responsive and sustainable AI: Long-term monitoring, adaptation to changing contexts
In 2024, WHO published additional guidance on large multi-modal models (WHO, 2024), addressing risks specific to generative AI in healthcare:
- Hallucinations: LMMs generate confident but false medical information
- Outdated training data: Models trained on historical data produce obsolete recommendations
- Bias amplification: Training data from high-income countries encodes perspectives that may not generalize globally
- Liability gaps: The AI value chain (developer → provider → deployer) creates uncertainty about accountability when harm occurs
WHO’s LMM guidance proposes liability frameworks including presumption of causality (shifting the burden of proof to deployers), strict liability considerations, and no-fault compensation funds (see Liability chapter).
Limitation: WHO guidelines are aspirational, not enforceable. Countries adopt them voluntarily.
Other Regions
| Region | Approach | Key Characteristics |
|---|---|---|
| Canada (Health Canada) | Collaborative | Developing adaptive licensing with FDA, UK MHRA |
| UK (MHRA) | Innovation-friendly | Post-Brexit independent framework |
| Japan (PMDA) | Conservative | Extensive clinical data required |
| China (NMPA) | Rapid approval | Data localization requirements limit international collaboration |
International Governance and Multilateral Coordination
National regulations alone cannot govern AI systems that operate across borders. A foundation model developed in the U.S., fine-tuned in the EU, and deployed in hospitals across 50 countries presents governance challenges no single jurisdiction can address.
The WHO 2024 LMM guidance emphasizes the need for international coordination (WHO, 2024):
Networked multilateralism: Effective AI governance requires coordination across UN agencies, international financial institutions, regional organizations, civil society, and the private sector. No single body has authority over the global AI ecosystem.
Inclusive rule-making: AI governance must be shaped by all countries, not only high-income nations and the technology companies headquartered there. Rules developed without low- and middle-income country input risk encoding biases that harm those populations.
Cross-border accountability: Companies developing foundation models must be accountable regardless of where they are incorporated. Current frameworks allow regulatory arbitrage: locating operations in permissive jurisdictions while selling globally.
Current challenges:
| Gap | Description |
|---|---|
| No international AI treaty | Unlike nuclear, chemical, or biological domains, no binding international agreement governs AI development or deployment |
| Voluntary commitments lack enforcement | Corporate pledges on AI safety (e.g., the Frontier Model Forum) have no accountability mechanisms |
| Regulatory arbitrage | Companies can base operations in jurisdictions with minimal oversight |
| Fragmented standards | No harmonized requirements for safety testing, transparency, or post-market surveillance |
Emerging coordination mechanisms:
- UN High-Level Advisory Body on AI: Recommendations published September 2024, but non-binding
- G7 Hiroshima AI Process: Voluntary code of conduct for foundation model developers
- OECD AI Principles: Adopted by 46 countries, but no enforcement mechanism
- Bilateral agreements: U.S.-EU Trade and Technology Council addresses AI but lacks specificity on medical applications
What this means for physicians: AI systems you use may be developed, trained, and updated by entities outside any jurisdiction’s effective control. Institutional governance and vendor due diligence become critical when regulatory frameworks are fragmented.
Part 4: Reimbursement as the Adoption Forcing Function
The Historical Pattern: Government Incentives Drive Adoption
Regulatory approval is necessary but insufficient. Reimbursement drives clinical deployment. If payers do not cover AI, providers will not use it. The evidence for this comes from two major technology adoption cycles in U.S. healthcare.
Meaningful Use and EHR Adoption
Before 2009, electronic health record adoption was minimal. The Health Information Technology for Economic and Clinical Health (HITECH) Act allocated approximately $27-35 billion in Medicare and Medicaid incentive payments to drive EHR adoption (Blumenthal, NEJM, 2011). The program offered eligible physicians up to $44,000 (Medicare) or $63,750 (Medicaid) in incentives, with penalties of 1-5% Medicare payment reductions for non-adopters beginning in 2015 (Marcotte et al., Arch Intern Med, 2012).
The results were unambiguous. Annual EHR adoption rates among eligible hospitals increased from 3.2% pre-HITECH (2008-2010) to 14.2% post-HITECH (2011-2015), a difference-in-differences of 7.9 percentage points compared to ineligible hospitals (Adler-Milstein & Jha, Health Affairs, 2017). By 2017, 86% of office-based physicians and 96% of non-federal acute care hospitals had adopted EHRs (AHA News, 2017).
Telehealth and CMS Reimbursement
Telehealth followed the same pattern. Before COVID-19, Medicare telehealth coverage was restricted to rural areas for specific services. When CMS waived geographic restrictions and reimbursed telehealth at parity with in-person visits during the public health emergency, adoption exploded. Many flexibilities have since been made permanent: behavioral health telehealth parity, FQHC and RHC authorization, and removal of frequency limits for inpatient and nursing facility visits (HHS Telehealth Policy, 2025).
Why Government Must Lead
Private payers follow Medicare’s lead. Research demonstrates that a $1.00 increase in Medicare fees increases corresponding private prices by $1.16 (Clemens & Gottlieb, J Polit Econ, 2017). Most commercial insurers benchmark payment levels to CMS’s Resource-Based Relative Value Scale (RBRVS). Approximately 50% of private payer coverage decisions align with Medicare national coverage determinations (JR Associates, 2024).
This creates a coordination problem. No individual commercial payer wants to be first to cover clinical AI without evidence of adoption elsewhere. Medicare, as the largest payer covering 65 million beneficiaries, provides the market signal that unlocks private investment.
The Capital and Talent Argument
Without clear reimbursement pathways, capital and talent flow elsewhere. Healthcare AI attracted $18 billion in venture investment in 2025, accounting for 46% of all healthcare venture funding (Silicon Valley Bank, 2026); by another count, AI-enabled startups captured 62% of venture dollars in the sector (Rock Health, 2025). Investors favor companies targeting operational efficiency over clinical decision support because operational AI has clearer revenue models, while clinical AI faces reimbursement uncertainty.
The consequence is predictable. Top engineering talent that could work on clinical AI instead joins companies offering equity in firms with clear paths to revenue. As one industry observer noted: “For founders, payers are the new patients.”
Current Landscape: Adoption Remains Nascent
A 2024 analysis of 11 billion CPT claims (2018-2023) found that only two AI applications have exceeded 10,000 claims: coronary artery disease assessment (67,306 claims via CPT 0501T-0504T) and diabetic retinopathy screening (15,097 claims via CPT 92229) (Wu et al., NEJM AI, 2024). Despite over 1,300 FDA-cleared AI devices, clinical utilization remains concentrated in affluent, metropolitan, academic medical centers.
| AI Category | Reimbursement Status | CPT Code(s) |
|---|---|---|
| Diabetic retinopathy (IDx-DR/LumineticsCore) | Medicare coverage; national rate $45.36, private median $127.81 | 92229 |
| Coronary artery disease (HeartFlow) | Category III codes | 0501T-0504T |
| Radiology AI (most) | No separate payment; cost absorbed into radiologist fee | None |
| Clinical decision support | No payment; hospital self-funds | None |
| Ambient documentation | Physician subscription ($1,000-1,500/month) | None |
Success Story: IDx-DR Diabetic Retinopathy Screening
IDx-DR (now LumineticsCore) represents the rare AI reimbursement success (FDA De Novo, 2018; CMS, 2022):
| Milestone | Details |
|---|---|
| FDA authorization | April 2018, De Novo pathway (first autonomous AI diagnostic) |
| CPT code | 92229 established 2021 (imaging with AI interpretation without physician review) |
| Medicare coverage | Yes, with CMS establishing national payment in 2022 |
| Payment | $45.36 (Medicare); $127.81 median (private) |
Why it succeeded:
- Clear clinical benefit: Diabetic retinopathy screening reduces blindness
- Solves access problem: Primary care can screen without ophthalmologist referral
- Cost-effective: Cheaper than specialist visit
- Prospective validation: Randomized trial evidence supported De Novo authorization
The Policy Gap: What Clinical AI Needs
Digital therapeutics and clinical AI still lack a formal Medicare benefit category. The pending Access to Prescription Digital Therapeutics Act (H.R.1458) would create a defined reimbursement class similar to pharmaceuticals (Odelle Technology, 2025). Until enacted, most clinical AI relies on Category III CPT codes (temporary, no guaranteed payment), bundled into existing procedure fees, or hospital self-funding.
Starting January 2026, CMS will formalize new AI and augmented intelligence codes for machine-assisted diagnosis, interpretation, and clinical decision support (Avalere Health, 2025). Whether these codes receive adequate valuation and coverage remains uncertain.
Emerging Payment Models
Fee-for-service does not incentivize AI adoption. Value-based models align incentives:
Value-based care contracts: Providers share risk with payers. AI that reduces hospitalizations and complications directly benefits providers financially.
Bundled payments: Single payment for entire episode of care. AI costs included in bundle; providers incentivized to use cost-effective AI that improves outcomes without separate line-item billing.
Outcomes-based contracts with vendors: Hospital pays vendor based on AI performance, not upfront license. Aligns incentives to reduce false positives and demonstrate clinical value.
The Path Forward
Clinical AI will follow the EHR and telehealth adoption pattern or it will not scale. The policy mechanisms that worked before can work again:
- Dedicated CPT codes with adequate valuation for validated clinical AI applications
- Medicare coverage decisions that establish precedent for private payers
- MACRA/APM integration that rewards AI-enabled quality improvement
- Penalty structures for non-adoption of proven AI (as Meaningful Use penalized EHR non-adopters)
The technology exists. The validation methodologies exist. The missing ingredient is the same one that was missing for EHRs in 2008 and telehealth in 2019: a clear signal from government payers that adoption will be rewarded.
Part 5: Operational AI and Healthcare Economics
The Untapped Opportunity
While regulatory and reimbursement discussions focus on clinical decision support, operational and administrative activities consume a larger share of healthcare spending. Workforce staffing, care coordination, billing, claims processing, scheduling, and customer service contributed an estimated $950 billion in U.S. healthcare costs in 2019 (Sahni et al., McKinsey, 2021). This represents approximately 25% of total healthcare spending.
The National Bureau of Economic Research estimates that wider AI adoption could generate savings of 5–10% of U.S. healthcare spending, approximately $200–360 billion annually in 2019 dollars (Sahni et al., NBER Working Paper 30857, 2023). These estimates focus on AI-enabled use cases using current technology, attainable within five years, that do not sacrifice quality or access.
Why operational AI attracts capital: As noted above, AI-enabled startups captured 62% of healthcare venture dollars in 2025, with investment skewing toward operational applications because revenue models are clearer (Rock Health, 2025). Clinical AI faces reimbursement uncertainty. Operational AI can demonstrate ROI through reduced labor costs, improved throughput, and decreased waste without requiring payer coverage decisions.
The Productivity Paradox
Paradoxically, new clinical technologies have historically increased overall healthcare spending rather than reducing it. The “productivity J-curve” (Brynjolfsson et al., AEJ: Macroeconomics, 2021) describes a pattern in which general purpose technologies initially depress measured productivity before generating gains. The explanation: technology alone does not reduce costs. Organizations must redesign workflows, structures, and culture around the technology.
The implication for health systems: Rather than retrofitting AI into existing clinical workflows, effective implementation requires redesigning processes around AI capabilities. For example, follow-up appointment frequency is typically left to individual physician preference with high variability and little evidence guiding these decisions. AI-based risk stratification could reallocate appointment frequency based on patient need, substantially increasing capacity and reducing wait times. One study found that reducing follow-up frequency by a single visit per year could save $1.9 billion nationally (Ganguli et al., JAMA, 2015).
Such redesign requires institutional willingness to change practice patterns, not merely add AI to existing processes.
Cross-Industry Lessons
Other industries have adopted operational AI with documented returns on investment (Wong et al., npj Health Syst, 2026):
| Sector | Application | Healthcare Parallel |
|---|---|---|
| Retail | Inventory forecasting, demand prediction | Hospital capacity forecasting, supply chain optimization |
| Aviation | Predictive maintenance, weather disruption modeling | Biomedical equipment maintenance, OR scheduling |
| Logistics | Route optimization | Emergency response, patient transport |
| Financial services | Customer advisory chatbots, query resolution | Patient communication, prior authorization |
Key insight: These industries achieved returns by pairing technology investment with workflow redesign. UPS’s route optimization system reportedly saves $300–400 million annually on fuel costs, but required restructuring delivery operations around the algorithm’s recommendations.
Barriers to Operational AI in Healthcare
Several factors explain why healthcare has lagged other sectors in operational AI adoption:
- Fragmented data: Healthcare data is siloed across EHRs, claims systems, and departmental applications. AI requires integrated data access.
- Risk tolerance: Aviation and finance are safety-critical, yet healthcare is more risk-averse about algorithmic decision-making.
- Regulatory uncertainty: Most operational AI escapes FDA oversight, but institutions lack guidance on governance requirements.
- Workforce concerns: Staff may view operational AI as threatening rather than enabling.
- Misaligned incentives: Fee-for-service rewards volume, not efficiency. Value-based contracts better align incentives with operational improvement.
The Learning Health System Framework
Effective AI integration requires continuous evaluation, not one-time deployment. The learning health system model provides a framework for pairing operational goals with evidence generation (IOM, 2012):
- Identify operational gap (e.g., OR utilization, scheduling efficiency, documentation burden)
- Deploy AI intervention with prospective measurement plan
- Evaluate outcomes against pre-specified metrics
- Iterate or terminate based on evidence
This approach addresses a critical gap: a national hospital survey found that only 61% of U.S. hospitals performed any local performance evaluation of AI models prior to deployment (Nong et al., Health Affairs, 2025). Many health systems lack the expertise or infrastructure to validate AI performance or assess investment value.
National collaboratives can help: The Coalition for Health AI (CHAI), the Health AI Partnership, and the AMA’s Center for Digital Health and AI enable resource-limited health systems to leverage peer expertise, troubleshoot common problems, and disseminate findings on operational AI tools.
Strategic Implications for Health Systems
| Action | Rationale |
|---|---|
| Tie AI initiatives to measurable value | Avoid “productivity paradox” by requiring ROI demonstration before scaling |
| Redesign workflows around AI | Retrofitting AI into existing processes yields minimal benefit |
| Integrate AI operations with research | Learning health system model generates evidence while improving operations |
| Build distributed AI literacy | Frontline staff must understand AI capabilities to identify opportunities |
| Start with operational AI | Clearer ROI, less regulatory complexity than clinical decision support |
Part 6: Institutional Governance
Why Hospital-Level Governance Matters
FDA clearance does not guarantee that:
- The AI performs well with your patient population, workflows, and EHR
- The AI is cost-effective for your budget
- Physicians will use the AI appropriately
- Patients will not be harmed
Institutional governance fills gaps left by regulation.
Essential Governance Components
1. Clinical AI Governance Committee
Minimum composition:
- Chair: CMIO or CMO
- Physicians from specialties using AI
- Chief Nursing Officer representative
- CIO or IT director
- Legal counsel with medical malpractice and AI expertise
- Chief Quality/Patient Safety Officer
- Health equity lead
- Bioethicist
- Patient advocate
Responsibilities: Pre-procurement review, pilot approval, deployment oversight, adverse event investigation, policy development, bias auditing.
2. Validation Before Deployment
Do not assume vendor validation generalizes to your hospital.
| Phase | Duration | Purpose |
|---|---|---|
| Silent mode | 2-4 weeks | AI generates outputs not shown to clinicians; verify technical stability |
| Shadow mode | 4-8 weeks | AI outputs shown as “informational only”; gather physician feedback |
| Active pilot | 3-6 months | Limited deployment with pre-defined success criteria |
Success criteria should include (see the evaluation sketch after this list):
- Technical: Sensitivity, specificity, PPV thresholds
- Clinical: Primary outcome improvement vs. baseline
- User: Physician satisfaction, response rate
- Safety: Zero preventable patient harm
- Equity: No performance disparities >10% across demographics
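A minimal sketch of how a governance committee might check measured pilot results against pre-specified criteria follows; the metric names and thresholds are illustrative examples an institution would set for itself before the pilot, not mandated values.

```python
# Minimal sketch (not a vendor or regulatory tool) of checking measured pilot results
# against pre-specified success criteria; metric names and thresholds are illustrative.
SUCCESS_CRITERIA = {
    "sensitivity":              ("min", 0.80),
    "ppv":                      ("min", 0.30),
    "physician_response_rate":  ("min", 0.70),
    "preventable_harm_events":  ("max", 0),
    "max_subgroup_gap":         ("max", 0.10),  # equity: no >10% performance disparity
}

def evaluate_pilot(measured: dict) -> bool:
    """Print pass/fail per criterion; return True only if every criterion passes."""
    all_pass = True
    for metric, (direction, threshold) in SUCCESS_CRITERIA.items():
        value = measured[metric]
        ok = value >= threshold if direction == "min" else value <= threshold
        all_pass = all_pass and ok
        print(f"{metric:>25}: {value}  [{'PASS' if ok else 'FAIL'}]")
    return all_pass

# Example results from a hypothetical 3-month active pilot
results = {"sensitivity": 0.84, "ppv": 0.22, "physician_response_rate": 0.75,
           "preventable_harm_events": 0, "max_subgroup_gap": 0.07}
print("Proceed to full deployment:", evaluate_pilot(results))  # False: PPV below threshold
```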
3. Bias Monitoring
Quarterly audits measuring AI performance across:
- Race/ethnicity
- Age
- Sex
- Insurance status
- Language
If a performance difference exceeds 10%: investigate, mitigate, or deactivate (see the sketch below).
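A minimal sketch of such an audit, using invented data and the 10% policy threshold above (not a validated auditing tool):

```python
# Minimal sketch of a quarterly bias audit on invented data: compute alert sensitivity
# per demographic subgroup and flag the model when the gap exceeds the 10% threshold.
from collections import defaultdict

def subgroup_sensitivity(records):
    """records: iterable of (subgroup, alert_fired, condition_present) tuples."""
    tp, fn = defaultdict(int), defaultdict(int)
    for group, alert_fired, condition_present in records:
        if condition_present:          # sensitivity is computed among true cases only
            if alert_fired:
                tp[group] += 1
            else:
                fn[group] += 1
    groups = set(tp) | set(fn)
    return {g: tp[g] / (tp[g] + fn[g]) for g in groups}

# Invented audit records: (subgroup, alert fired?, condition actually present?)
audit = ([("White", True, True)] * 80 + [("White", False, True)] * 20 +
         [("Black", True, True)] * 62 + [("Black", False, True)] * 38)

rates = subgroup_sensitivity(audit)
gap = max(rates.values()) - min(rates.values())
print(rates)                           # e.g. {'White': 0.8, 'Black': 0.62}
print(f"Sensitivity gap: {gap:.0%}")   # 18%
if gap > 0.10:
    print("Exceeds 10% threshold: investigate, mitigate, or deactivate")
```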
4. Vendor Contracts
Require:
- Hospital retains ownership of patient data
- Performance guarantees with termination rights if thresholds not met
- Disclosure of training data demographics, validation studies, limitations
- Vendor indemnification for AI errors and data breaches
- HIPAA Business Associate Agreement
5. Value Alignment Assessment
Beyond technical performance and bias, institutions should assess what values AI systems embed. The RAISE consortium’s “Values In the Model” (VIM) framework proposes that AI systems disclose how they navigate value-laden trade-offs: intervention vs. conservative management, patient autonomy vs. paternalism, individual benefit vs. resource constraints (Goldberg et al., NEJM AI, 2026).
When evaluating AI systems, governance committees should ask:
- What optimization target was this system trained on?
- How does it handle scenarios where reasonable experts disagree?
- Does behavior differ between fee-for-service and capitated contexts?
See Ethics chapter: Value Alignment Frameworks for detailed guidance on assessing embedded values.
Enterprise AI Lifecycle Frameworks
Beyond committee composition and validation protocols, institutions need structured frameworks for how AI solutions progress from concept to deployment to monitoring. Stanford Medicine’s experience provides a model.
RAIL (Responsible AI Lifecycle):
Stanford Health Care established the Responsible AI Lifecycle (RAIL) framework in 2023 to codify institutional workflows for AI solution development (Shah et al., 2026):
| Stage | Key Activities |
|---|---|
| Proposal | Use case definition, risk tiering, stakeholder alignment |
| Development | Model building, integration with clinical data, prompt engineering |
| Validation | FURM assessment (see below), truth set curation, benchmark testing |
| Pilot | Controlled deployment with defined success criteria |
| Monitoring | System integrity, performance, and impact tracking |
FURM (Fair Useful Reliable Models):
The FURM framework specifies required assessments before AI deployment:
- Fair: Performance tested across demographic subgroups; disparities documented and mitigated
- Useful: Clear clinical or operational benefit demonstrated; workflows redesigned for integration
- Reliable: Consistent performance across settings; failure modes identified and documented
Why structured frameworks matter:
As noted in Part 5, most health systems lack the expertise or infrastructure to validate AI performance or assess investment value, and only 61% of U.S. hospitals performed any local performance evaluation of AI models prior to deployment (Nong et al., Health Affairs, 2025). Structured frameworks convert ad hoc adoption decisions into systematic processes.
Implementation lesson from Stanford ChatEHR:
Stanford’s approach required embedding data science teams within IT organizations, providing direct access to personnel maintaining network security, EHR integrations, and cloud resources. This integration enabled what began as a sandbox (April 2023) to scale to health-system-wide deployment (September 2025) in approximately 2.5 years. The “build-from-within” strategy provides institutional agency: model-agnostic infrastructure that matches clinical tasks to appropriate LLMs, institutional data governance, and custom monitoring aligned to organizational priorities.
For institutions without Stanford’s resources:
National collaboratives provide pathways for resource-limited health systems:
- Coalition for Health AI (CHAI): Responsible AI Guide (RAIG) with developer/implementer accountability frameworks (see CHAI section)
- Health AI Partnership: Peer expertise sharing and troubleshooting
- AMA Center for Digital Health and AI: Policy guidance and educational resources
Part 7: Liability and Legal Frameworks
Liability Scenarios
Scenario 1: Physician follows AI recommendation, patient harmed
Example: Radiology AI flags lung nodule as “benign.” Radiologist concurs. Six months later, nodule diagnosed as cancer.
- Likely outcome: Physician liable if jury finds failure to exercise independent judgment
- Lesson: AI use does not absolve physician responsibility. Document rationale beyond “AI said benign”
Scenario 2: Physician overrides AI, patient harmed
Example: Sepsis AI alerts high risk (8.5/10). Physician evaluates, finds normal vitals and labs, documents rationale, discharges. Patient returns in septic shock.
- Likely outcome: Physician NOT liable if override was reasonable and documented
- Lesson: AI override is acceptable when clinically justified. Documentation is essential.
Scenario 3: Systematic AI error harms multiple patients
Example: ECG AI systematically underestimates QT interval. Multiple patients given QT-prolonging medications develop arrhythmias.
- Potential defendants: Manufacturer (product liability), hospital (negligent deployment), physicians (negligent use)
- Likely outcome: Shared liability with jury determining percentage fault
Unsettled Legal Questions
- Black-box algorithms: How to prove negligence when AI logic is inscrutable?
- Continuously learning AI: Who is liable for harms from updated algorithms?
- Off-label AI use: Who is liable when a physician uses AI outside its approved indications?
- Training data bias: What recourse exists when AI systematically harms certain demographic groups?
The Evolving Standard of Care: When NOT Using AI Becomes Negligence
Current state (2024): Failure to use AI is generally NOT considered negligence. Standard of care remains defined by human physician practice.
Emerging risk (2025+): Failure to use proven AI tools may soon carry liability.
High-risk areas where this transition is happening:
| Application | Evidence Base | Liability Risk Trajectory |
|---|---|---|
| LVO stroke detection (Viz.ai) | Multiple studies showing 30-60 min faster treatment | HIGH: Not using may soon be negligent given mortality impact |
| ICH detection (Aidoc, others) | Strong validation, adopted as standard at major centers | MEDIUM-HIGH: Becoming expected at facilities with radiology AI |
| Diabetic retinopathy screening (IDx-DR) | FDA-authorized autonomous AI, prospective validation | MEDIUM: May become standard for primary care with diabetic patients |
| Sepsis prediction | Epic sepsis model failures demonstrate tools are not yet reliable | LOW: Little risk of negligence for non-use given poor validation |
The legal argument is simple: if a proven AI tool catches strokes faster and a hospital chooses not to deploy it, the plaintiff’s attorney will ask, “Why did you choose to let my client’s mother die when technology existed to save her?”
Defensive recommendations:
- Document your rationale if your institution decides NOT to deploy well-validated AI tools
- Track when AI becomes standard: Monitor specialty society guidelines for AI adoption recommendations
- Institutional governance matters: Ensure AI adoption decisions are made by committees, not individuals, to distribute liability
- Stay current: What is “optional” today may be “expected” in 2-3 years for proven applications
Risk Reduction for Physicians
- Document everything: Record AI recommendations, your reasoning, whether you followed or overrode
- Understand AI limitations: Know validation populations, sensitivity/specificity, failure modes
- Maintain clinical independence: AI is decision support, not decision maker
- Obtain informed consent when appropriate: For high-stakes AI decisions, discuss with patients
- Report AI errors: If AI makes systematic errors, report to Quality/Safety and AI governance
Part 8: Policy Recommendations from Expert Bodies
AMA Principles for Augmented Intelligence (2018)
The American Medical Association prefers “augmented intelligence” to emphasize physician judgment is augmented, not replaced (AMA AI Principles, 2018).
Six principles:
- AI should augment, not replace, the physician-patient relationship
- AI must be developed and deployed with transparency
- AI must meet rigorous standards of effectiveness
- AI must mitigate bias and promote health equity
- AI must protect patient privacy and data security
- Physicians must be educated on AI
WHO Framework (2021)
Six principles: protect human autonomy, promote well-being and safety, ensure transparency, foster accountability, ensure equity, promote sustainability (WHO, 2021).
Key Professional Society Positions
| Society | Key Recommendations |
|---|---|
| American College of Radiology | AI-LAB accreditation program for vendor transparency; validate AI at your institution before clinical use |
| American Heart Association | Cardiovascular AI must be validated on populations where deployed |
| College of American Pathologists | Pathologists must review all AI-flagged cases; no autonomous AI diagnosis |
Coalition for Health AI (CHAI) and Joint Commission Partnership
The Coalition for Health AI (CHAI) represents a shift from principle-based guidance to use-case-specific implementation frameworks. In September 2025, CHAI partnered with the Joint Commission to release the first installment of practical guidance for responsible AI deployment (Joint Commission, 2025).
Why this matters: The Joint Commission accredits over 22,000 healthcare organizations. A voluntary AI certification program based on CHAI playbooks is planned, potentially creating de facto standards for institutional AI governance.
Use-Case-Specific Work Groups:
CHAI’s approach differs from generic AI principles by providing guidance tailored to specific clinical applications (CHAI Use Cases):
| Use Case | Focus |
|---|---|
| Clinical Decision Support (LLM + RAG) | Scope definition, escalation rules for when AI defers to humans, continuous monitoring, evidence traceability |
| EHR Information Retrieval | Grounding retrieved information, verification in real-world contexts, handling fragmented patient data |
| Prior Authorization Criteria Matching | Explainability of match/non-match decisions, human review triggers, preventing “denial drift” |
| Direct-to-Consumer Health Chatbots | Accessibility (5th-6th grade reading level, multilingual), error handling, authoritative source grounding, citations |
Developer vs. Implementer Accountability:
The CHAI Responsible AI Guide (RAIG) distinguishes between “Developer Teams” (data scientists, engineers who build AI solutions) and “Implementer Teams” (providers, IT staff, leadership who deploy them). Each stage of the AI lifecycle specifies which team bears primary responsibility (CHAI RAIG):
| Stage | Developer Responsibility | Implementer Responsibility |
|---|---|---|
| Define Problem & Plan | Collaborate on technical feasibility | Define business requirements, clinical context |
| Design | Model architecture, training approach | Workflow integration design |
| Engineer | Build, train, validate solution | Provide real-world data, clinical input |
| Assess | Performance metrics, bias testing | Local validation, population fit assessment |
| Pilot | Technical support, iteration | Controlled deployment, clinician feedback |
| Deploy & Monitor | Ongoing maintenance, updates | Adverse event tracking, governance reporting |
Governance Structure Recommendations:
CHAI guidance emphasizes:
- Written AI policies: Establish explicit governance with technically experienced leadership
- Transparency to patients: Disclosures and educational tools about AI use
- Data protection: Minimum necessary data principles, audit rights in vendor agreements
- Quality monitoring: Regular validation, performance dashboards
- Bias assessment: Audit whether AI was developed with datasets representative of served populations
- Blinded reporting: Cross-institutional learning from AI-related events
Limitations to Acknowledge:
CHAI guidance is process-oriented but lacks quantitative thresholds. The guidance emphasizes “regularly monitor” and “regularly validate” without defining:
- What performance floor triggers intervention?
- What retrieval accuracy makes EHR summarization safe?
- How many false denials cross from efficiency to patient harm?
Without quantitative thresholds, “continuous monitoring” risks becoming “monitor until something bad happens, then determine what the threshold should have been.” This gap is significant for institutions seeking actionable standards.
Conclusion
AI regulation and policy are evolving rapidly. The frameworks designed for static products do not fit dynamic, learning systems that update continuously. Challenges include unclear evidence standards, insufficient post-market surveillance, reimbursement barriers, unsettled liability, and fragmented international regulations.
Key principles for physician-centered AI policy:
- Patient safety first: Prospective validation, external testing
- Evidence-based regulation: Demand prospective trials for high-risk AI
- Transparent accountability: Clear liability when AI errs
- Equity mandatory: Performance tested across demographics; biased AI not deployed
- Physician autonomy preserved: AI supports, never replaces judgment
- Reimbursement aligned with value: Pay for AI that improves outcomes
What physicians must do:
Individually: Demand evidence, validate AI locally, document AI use meticulously, report errors, maintain clinical independence.
Institutionally: Establish AI governance committees, implement bias audits, create accountability frameworks, provide training.
Professionally: Engage specialty societies, lobby for evidence-based regulation and reimbursement, publish validation studies.
The future of AI in medicine will be shaped by the choices made today: the regulations demanded, the reimbursement models advocated for, the governance structures built, and the standards held.