Medical Ethics, Bias, and Health Equity

A commercial risk prediction algorithm applied to millions of patients systematically under-prioritized Black patients for care, even though they were sicker than white patients with the same risk scores. The algorithm wasn’t programmed with racial bias; it learned bias from healthcare cost data that reflected existing disparities. This chapter examines how AI can amplify inequity, and what physicians must demand to prevent it.

Learning Objectives

After reading this chapter, you will be able to:

  • Apply bioethical principles (autonomy, beneficence, non-maleficence, justice) to medical AI
  • Recognize and mitigate algorithmic bias and health disparities
  • Navigate informed consent challenges with AI-assisted care
  • Assess fairness and equity implications of AI systems
  • Understand professional responsibilities when using AI
  • Apply ethical frameworks for AI development and deployment

Core Ethical Principles for Medical AI:

1. Beneficence and Non-Maleficence

  • AI must improve outcomes, not just efficiency (Topol, 2019)
  • Rigorous validation required before deployment (Nagendran et al., 2020)
  • Ongoing monitoring for harms essential (performance drift, unexpected failures)
  • First, do no harm. This applies to algorithms as much as treatments

2. Justice and Equity

  • AI must not worsen health disparities (Obermeyer et al., 2019)
  • Training data must represent diverse populations (Daneshjou et al., 2022)
  • Access to beneficial AI should be equitable, not just for well-resourced institutions
  • Algorithmic fairness requires intentional design, not just technical accuracy

3. Algorithmic Bias is Pervasive

  • Documented across radiology, dermatology, risk prediction, and resource allocation
  • Often reflects biases in training data and healthcare systems (Gichoya et al., 2022)
  • Can amplify existing health disparities if not actively mitigated
  • Requires diverse development teams and inclusive validation studies

Landmark Case Studies:

The Obermeyer Study (Science, 2019): - Commercial algorithm used healthcare costs as proxy for health needs - Black patients systematically under-prioritized despite being sicker - Algorithm affected millions of patients across the U.S. - Demonstrated how seemingly neutral technical choices encode racial bias (Obermeyer et al., 2019)

Dermatology AI Bias: - Skin cancer detection AI trained predominantly on light skin tones - Performance degradation on dark skin (higher false negatives) (Daneshjou et al., 2022) - Risk: Delayed cancer diagnosis in underrepresented populations

IDx-DR Inclusive Validation: - Diabetic retinopathy AI validated across diverse racial/ethnic groups - Prospective trial deliberately enrolled representative population - Demonstrated equitable performance across subgroups (Abràmoff et al., 2018) - Model for inclusive AI development

Professional Responsibilities: - Physicians remain ethically and legally responsible for AI-assisted decisions (Char et al., 2020) - Cannot delegate responsibility to algorithms (“the AI said so” is not a defense) - Must understand AI limitations and failure modes before clinical use - Advocacy for patients over algorithmic recommendations when appropriate - Duty to report and address bias when observed

Clinical Bottom Line: Medical AI is not ethically neutral technology. Every AI system embeds values, makes trade-offs, and can perpetuate or mitigate inequities. Physicians have a professional obligation to ensure AI serves all patients equitably, respects autonomy, and improves (not just automates) care.

Introduction

Medical AI is often framed as a purely technical challenge: train algorithms on medical data, validate performance, deploy in clinical settings. But this framing obscures profound ethical questions:

  • Who benefits from medical AI? Patients at well-resourced academic centers, or also those in rural clinics and safety-net hospitals?
  • Who is harmed when AI fails? Disproportionately, underrepresented populations excluded from training data.
  • Who decides when AI is “good enough”? Developers optimizing for overall accuracy, or patients for whom the algorithm performs poorly?
  • How does AI change the physician-patient relationship? Does it enhance clinical judgment or erode physician autonomy and accountability?

These are not hypothetical concerns. As this chapter documents, medical AI has already perpetuated racial bias (Obermeyer et al., 2019), performed inequitably across skin tones (Daneshjou et al., 2022), and raised fundamental questions about informed consent and physician responsibility (Char et al., 2020).

The ethical deployment of medical AI requires more than technical validation. It demands intentional attention to fairness, equity, transparency, and accountability, principles central to medical professionalism but often absent from algorithmic development.

This chapter applies traditional bioethical frameworks to medical AI, examines documented cases of bias and inequity, and provides practical guidance for physicians navigating ethical challenges in AI-augmented practice.

Bioethical Principles Applied to Medical AI

The four principles of biomedical ethics (autonomy, beneficence, non-maleficence, and justice) provide a framework for evaluating medical AI (Char et al., 2020).

Beneficence: AI Must Benefit Patients

The Principle: Medical interventions should improve patient outcomes. “Do good.”

Challenges in Medical AI:

1. Efficiency ≠ Benefit: Many AI systems improve workflow efficiency (faster reads, reduced documentation burden) but don’t clearly improve patient outcomes. Efficiency benefits institutions and physicians; do they benefit patients?

2. Surrogate Outcomes: AI is often validated on surrogate outcomes (detection accuracy, sensitivity, specificity) rather than clinical outcomes (mortality, morbidity, quality of life). High accuracy doesn’t guarantee patient benefit (Nagendran et al., 2020).

3. Unintended Consequences: AI can have hidden downsides: alert fatigue, deskilling of physicians, over-diagnosis from ultra-sensitive algorithms.

Evidence Standard for Beneficence:

Medical AI should meet the same evidence standard as medications or procedures:

  • Preclinical validation: Retrospective accuracy studies (analogous to Phase I/II trials)
  • Clinical validation: Prospective studies demonstrating clinical benefit (analogous to Phase III trials)
  • Post-market surveillance: Ongoing monitoring for harms and performance drift (analogous to Phase IV)

Critical Question: Has this AI been shown to improve patient outcomes in prospective studies, or only to achieve high accuracy in retrospective datasets?

Most medical AI has only retrospective validation. This is insufficient for claiming beneficence (Topol, 2019).

Example: IDx-DR (Diabetic Retinopathy Screening)

Demonstrated Beneficence: - Prospective trial showed AI enabled screening in primary care clinics where ophthalmologists unavailable - Increased screening rates for underserved populations - Detected vision-threatening retinopathy that would have otherwise been missed - Clear patient benefit: prevented vision loss through earlier detection (Abràmoff et al., 2018)

Example: Epic Sepsis Model

Failed to Demonstrate Beneficence: - Retrospective studies suggested high accuracy - External validation showed poor real-world performance (Wong et al., 2021) - High false positive rate caused alert fatigue - No evidence deployment improved sepsis outcomes - Efficiency goal (early sepsis detection) did not translate to patient benefit

Non-Maleficence: First, Do No Harm

The Principle: Medical interventions should not harm patients. When harm is unavoidable (e.g., chemotherapy side effects), benefits must outweigh harms.

Harms from Medical AI:

1. Direct Harms: - False negatives: Missed diagnoses (cancer, fractures, strokes) leading to delayed treatment - False positives: Unnecessary testing, procedures, anxiety, overtreatment - Incorrect treatment recommendations: AI suggesting wrong medication, wrong dose, contraindicated therapy

2. Indirect Harms: - Alert fatigue: Too many AI warnings causing physicians to ignore all alerts, including true positives - Deskilling: Over-reliance on AI eroding clinical skills (radiologists losing ability to detect findings without AI) - Delayed care: AI-driven workflows introducing bottlenecks (e.g., waiting for AI report before human review)

3. Equity Harms: - Algorithmic bias: AI performing worse for underrepresented groups, causing disparate harm (Obermeyer et al., 2019; Daneshjou et al., 2022) - Access disparities: Beneficial AI only available to wealthy institutions, widening quality gaps

4. Psychological Harms: - Loss of trust: Patients losing confidence in physicians who defer to algorithms - Dehumanization: Care feeling automated and impersonal

Risk Mitigation Strategies:

Preventing AI Harm in Clinical Practice

Before Deployment: 1. Demand prospective validation in populations similar to yours (Nagendran et al., 2020) 2. Assess subgroup performance: How does AI perform for your patient demographics? 3. Understand failure modes: How and why does the AI fail? 4. Ensure human oversight: AI should augment, not replace, physician judgment

During Use: 5. Monitor real-world performance: Does AI perform as expected in your setting? 6. Track harms: Capture false negatives, alert fatigue, workflow disruptions 7. Maintain clinical skills: Don’t let AI erode your ability to practice without it 8. Preserve physician final authority: Algorithms recommend; physicians decide

After Adverse Events: 9. Root cause analysis: Was AI contributory? How can recurrence be prevented? 10. Reporting mechanisms: Report AI failures to vendors, FDA (if applicable), and institutional quality/safety teams

Precautionary Principle: When AI evidence is uncertain, err on the side of caution. Burden of proof for safety and efficacy rests with those deploying AI, not with patients who may be harmed (Vayena & Blasimme, 2018).

Justice and Health Equity

The Principle: Medical resources should be distributed fairly. Benefits and burdens should not fall disproportionately on particular groups.

Justice Challenges in Medical AI:

1. Training Data Bias: AI trained predominantly on data from well-resourced academic centers, affluent populations, and racial/ethnic majorities performs worse for underrepresented groups (Gichoya et al., 2022).

2. Access Disparities: - Advanced AI often deployed first at wealthy institutions - Rural, safety-net, and under-resourced hospitals lack infrastructure for AI - Widens existing quality gaps between haves and have-nots

3. Algorithmic Amplification of Bias: AI can amplify existing healthcare disparities: - If training data reflects biased medical practice (e.g., Black patients receiving less pain medication), AI learns and perpetuates that bias - If outcome proxies are biased (e.g., healthcare costs correlating with access, not need), AI recommendations will be biased (Obermeyer et al., 2019)

4. Representation in AI Development: - Lack of diversity in AI development teams can lead to blind spots about bias and harm - Lack of diversity in leadership means equity considerations may be deprioritized

Algorithmic Bias: Documented Cases and Lessons

Algorithmic bias in medicine is not theoretical. It’s documented, measurable, and consequential.

Case 1: Racial Bias in Healthcare Resource Allocation

The Obermeyer Study (Obermeyer et al., 2019, Science):

Background: - Commercial algorithm used by healthcare systems to allocate care management resources - Predicted which patients would benefit from extra support (care coordination, disease management) - Used healthcare costs as proxy for healthcare needs

The Bias: - At any given risk score, Black patients were significantly sicker than white patients - Algorithm systematically under-predicted risk for Black patients - Result: Black patients needed to be much sicker than white patients to receive same level of care

Why It Happened: - Healthcare costs reflect access to care, not just medical need - Black patients face barriers to accessing care (insurance, transportation, discrimination), leading to lower healthcare spending despite higher illness burden - Algorithm learned that “lower spending = lower need,” perpetuating inequity

Impact: - Affected millions of patients across U.S. healthcare systems - Reduced number of Black patients flagged for high-risk care management programs by >50% - Vendors corrected algorithm after publication, but similar bias likely exists in other systems

Lessons: - Choice of outcome variable matters: “Cost” and “need” are not the same - Historical biases propagate: If training data reflects biased systems, AI learns bias - Disparities can be subtle: Algorithm didn’t explicitly use race, but outcomes were racially biased - External auditing essential: Bias was discovered by independent researchers, not algorithm developers

Case 2: Dermatology AI Bias Across Skin Tones

Multiple Studies (Daneshjou et al., 2022; Esteva et al., 2017):

Background: - AI for skin cancer detection trained predominantly on images of light skin - Dermatology training datasets severely underrepresent darker skin tones (Fitzpatrick IV-VI)

The Bias: - AI performance degrades on darker skin tones - Higher false negative rates for melanoma in Black and Latino patients - Risk: Delayed cancer diagnosis in precisely the populations with worse melanoma outcomes

Why It Happened: - Training data reflect existing disparities: dermatology textbooks and databases predominantly feature light skin - Developers didn’t intentionally exclude dark skin. They used available data - Performance testing often doesn’t stratify by skin tone, hiding the problem

Lessons: - Representation in training data is critical - Performance must be evaluated across relevant subgroups (not just overall accuracy) - “Color-blind” algorithms are not fair algorithms: ignoring race/ethnicity can perpetuate disparities - Field-specific challenges: Dermatology must actively recruit diverse image datasets

Case 3: Pulse Oximetry Bias

Recent Findings (Sjoding et al., 2020):

Background: - Pulse oximeters measure oxygen saturation non-invasively - Critical for managing COVID-19, sepsis, respiratory failure - Generally considered accurate and unbiased

The Bias: - Pulse oximeters overestimate oxygen saturation in Black patients (compared to arterial blood gas) - Black patients more likely to have hidden hypoxemia (low oxygen despite “normal” pulse ox) - May delay recognition of deterioration and treatment escalation

Why It Happened: - Medical devices calibrated and validated primarily on white participants - Skin pigmentation affects light absorption, but devices not adjusted for this - Decades of use before bias recognized

Lessons: - Bias exists even in established, widely-used technologies - Real-world performance monitoring essential (not just initial validation) - Equity requires intentionality: assuming fairness is insufficient - Simple technologies can have complex bias (not just AI-specific problem)

Mitigating Bias and Promoting Equity

Addressing algorithmic bias requires action across the AI lifecycle: development, validation, deployment, and monitoring.

During AI Development

1. Diverse and Representative Training Data: - Ensure training datasets include adequate representation of all relevant demographic groups - Stratify by race, ethnicity, sex, age, geography, socioeconomic status - Partner with diverse healthcare institutions (not just academic centers)

2. Diverse Development Teams: - Include clinicians who care for underserved populations - Involve ethicists, health equity experts, community representatives - Diversity in team increases likelihood of identifying potential biases early

3. Equity as a Design Goal: - Define fairness explicitly: Equal performance across groups? Equal access? Equal benefit? - Different fairness definitions involve trade-offs. Be transparent about choices (Parikh et al., 2019) - Test for bias proactively (don’t assume fairness)

4. Choice of Outcome Variables: - Scrutinize proxies for actual outcomes (cost ≠ need, admissions ≠ severity) - Consider how historical disparities may bias outcome definitions - Validate that proxy measures don’t encode existing inequities

During Validation

5. Subgroup Analysis is Mandatory: - Report algorithm performance stratified by race, ethnicity, sex, age, insurance status - Don’t hide subgroup disparities in overall accuracy metrics - Identify where performance is acceptable vs. inadequate (Nagendran et al., 2020)

6. External Validation in Diverse Populations: - Validate in institutions serving underrepresented populations - Test across geographic, socioeconomic, and practice setting diversity - Don’t assume generalizability. Prove it (Vabalas et al., 2019)

7. Assess Calibration, Not Just Discrimination: - Discrimination (AUC-ROC) measures ability to rank risk - Calibration measures whether predicted probabilities match actual outcomes - Calibration often differs across subgroups even when discrimination appears similar - Miscalibration can lead to disparate treatment

8. Consider Intersectionality: - Bias often compounds across multiple identities (e.g., Black + female, or rural + elderly) - Test performance in intersectional subgroups, not just single demographic categories
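
To make the checks in items 5, 7, and 8 concrete, the sketch below computes discrimination (AUC) and a simple calibration gap (mean predicted vs. observed event rate) for each subgroup, including intersectional groups. It is illustrative only; the column names (y_true, y_prob, race, sex) and the minimum group size are hypothetical placeholders, not tied to any specific product or dataset.

```python
# Illustrative subgroup fairness check: discrimination and calibration per group.
# Column names (y_true, y_prob, race, sex) are hypothetical placeholders.
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_report(df: pd.DataFrame, group_cols: list[str]) -> pd.DataFrame:
    """AUC plus calibration-in-the-large (mean predicted vs. observed event rate)
    for each subgroup; pass multiple columns to audit intersectional groups."""
    rows = []
    for name, g in df.groupby(group_cols):
        if g["y_true"].nunique() < 2 or len(g) < 50:  # skip tiny or single-class groups
            continue
        rows.append({
            "group": name,
            "n": len(g),
            "auc": roc_auc_score(g["y_true"], g["y_prob"]),  # discrimination
            "observed_rate": g["y_true"].mean(),
            "predicted_rate": g["y_prob"].mean(),
        })
    out = pd.DataFrame(rows)
    out["calibration_gap"] = out["predicted_rate"] - out["observed_rate"]
    return out

# Usage (hypothetical validation dataframe):
# subgroup_report(validation_df, ["race"])          # single-axis subgroups
# subgroup_report(validation_df, ["race", "sex"])   # intersectional subgroups
```

A large calibration_gap in one subgroup alongside an acceptable overall AUC is exactly the pattern that overall accuracy metrics hide.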

During Deployment

9. Transparent Communication: - Disclose known limitations and subgroup performance differences - Don’t hide uncertainties or deployment risks (Char et al., 2020) - If AI performs worse for certain groups, inform clinicians and patients

10. Human Oversight and Physician Autonomy: - AI should inform, not dictate, decisions - Physicians must have ability to override algorithms when clinical judgment differs - Preserve physician accountability (can’t blame algorithm for bad decisions)

11. Equitable Access: - Ensure beneficial AI available across practice settings (not just wealthy institutions) - Consider cost, infrastructure requirements, and workflow fit - Avoid creating two-tiered healthcare: AI-augmented for the wealthy, standard care for everyone else

During Ongoing Monitoring

12. Continuous Performance Monitoring: - Track real-world performance across demographic subgroups - Monitor for performance drift over time (models degrade as clinical practice evolves) - Establish thresholds for acceptable performance and trigger investigations when crossed (Gichoya et al., 2022)
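
One way to operationalize item 12 is a scheduled job that recomputes subgroup performance on recent data and flags degradation beyond a preset threshold. The sketch below assumes hypothetical column names (month, race, y_true, y_prob) and an arbitrary 0.05 AUC-drop trigger; real thresholds should be set by local governance.

```python
# Illustrative drift monitor: flag subgroups whose recent AUC falls below baseline.
# Column names and the 0.05 threshold are hypothetical and must be set locally.
import pandas as pd
from sklearn.metrics import roc_auc_score

AUC_DROP_THRESHOLD = 0.05  # investigate if AUC falls >0.05 below the validation baseline

def monthly_subgroup_auc(df: pd.DataFrame) -> pd.DataFrame:
    """AUC per (month, race) cell, skipping cells too small to score reliably."""
    records = []
    for (month, race), g in df.groupby(["month", "race"]):
        if g["y_true"].nunique() == 2 and len(g) >= 100:
            records.append({"month": month, "race": race,
                            "auc": roc_auc_score(g["y_true"], g["y_prob"])})
    return pd.DataFrame(records)

def flag_drift(monthly: pd.DataFrame, baseline: pd.DataFrame) -> pd.DataFrame:
    """Compare each month/subgroup AUC to that subgroup's validation baseline
    (a dataframe with columns: race, auc) and return the degraded cells."""
    merged = monthly.merge(baseline, on="race", suffixes=("", "_baseline"))
    merged["degraded"] = merged["auc_baseline"] - merged["auc"] > AUC_DROP_THRESHOLD
    return merged[merged["degraded"]]
```

Flagged cells should trigger the investigation and reporting pathways described in items 13 and 14, not silent recalibration.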

13. Adverse Event Reporting: - Create mechanisms for clinicians and patients to report suspected AI harms - Investigate patterns suggesting bias (e.g., more false negatives in particular group) - Share lessons learned across institutions

14. Regular Bias Audits: - Independent audits of AI fairness (not just self-reporting by vendors) - Involve community stakeholders and equity experts - Make audit results public (transparency increases accountability)

15. Willingness to Decommission: - If AI is found to perpetuate bias and cannot be corrected, stop using it - Don’t continue harmful AI just because it’s already deployed - Patient welfare > sunk costs

Professional Responsibilities in the AI Era

Medical AI doesn’t eliminate physician responsibility. It increases it.

Physician as Steward of AI

1. Understand Before Using: - Don’t use AI as a black box: “I don’t know how it works, but it’s accurate” - Understand (in general terms): What data does AI use? How was it trained? What are known limitations? - Know how AI was validated and in which populations

2. Maintain Independent Judgment: - AI recommendations are inputs to decision-making, not final decisions - Physician must independently assess patient, formulate differential, consider AI output in clinical context - Avoid automation bias (uncritically accepting AI recommendations)

The Dual Duty Tension

Physicians face an emerging ethical tension: the duty to use beneficial technology versus the duty to maintain independent clinical judgment. As AI tools demonstrate value, failing to use them may harm patients. Yet over-reliance on AI erodes the independent judgment that allows physicians to catch AI errors (Mello & Guha, 2024).

Navigating the tension:

  • Neither rejection nor uncritical adoption is ethically defensible. Physicians must thoughtfully integrate AI while maintaining diagnostic skill.
  • Structured collaboration patterns (see Chapter 20) help balance efficiency with skill preservation.
  • This tension creates legal exposure from both directions. See Chapter 21 for detailed analysis of dual liability risk.

The ethical path requires ongoing calibration: adopt AI where evidence supports benefit, verify outputs through independent reasoning, and maintain the foundational skills to practice when AI is unavailable or wrong.

3. Recognize and Override When Appropriate: - If AI recommendation conflicts with clinical judgment, investigate - Don’t defer to algorithm when patient-specific factors (not captured by AI) are relevant - Document reasoning when overriding AI

4. Protect Patient Interests: - Advocate for patients over algorithmic efficiency - If AI-driven workflow harms patient (delays, errors, loss of personalized care), escalate concerns - Professional obligation to prioritize patient welfare over institutional AI investments

Physician as Advocate for Equity

5. Demand Evidence of Fairness: - Ask: “How does this AI perform for my patient population?” - Refuse to use AI with known bias unless no alternative exists and bias is disclosed to patients

6. Monitor for Disparate Impact: - If you suspect AI performs worse for certain patients, document and report - Advocate for inclusive validation and bias mitigation

7. Ensure Equitable Access: - Support policies ensuring beneficial AI available to all patients (not just those at elite institutions)

Physician as Learner

8. Stay Current: - Medical AI evolves rapidly. Yesterday’s evidence may be outdated - Engage with medical AI literature (not just vendor claims) - Participate in institutional AI governance and education

9. Teach Others: - Educate trainees, colleagues, and patients about AI capabilities and limitations - Model appropriate AI use (thoughtful integration, not blind adherence)

Ethical Frameworks for Institutional AI Governance

Healthcare institutions deploying AI need structured governance to ensure ethical use (Char et al., 2020).

Core Components of AI Governance

1. AI Ethics Committee: - Multidisciplinary: clinicians, ethicists, informaticists, legal, community representatives - Reviews proposed AI deployments for ethical concerns - Authority to approve, modify, or reject AI adoption

2. Equity Impact Assessment: - Required before deploying AI - Analyzes potential disparate impact on vulnerable populations - Identifies mitigation strategies

3. Transparency Requirements: - AI systems must be documented: purpose, training data, validation studies, known limitations - Performance data (including subgroup performance) made available to clinicians - Patients informed about AI use (tiered consent approach)

4. Ongoing Monitoring: - Real-world performance tracking (overall and subgroup) - Adverse event reporting mechanisms - Regular bias audits

5. Accountability: - Clear designation of responsibility (who is accountable when AI fails?) - Physician retains ultimate authority and responsibility - Vendors held accountable for undisclosed risks or misrepresented performance

6. Sunset Provisions: - AI deployment isn’t permanent. Reevaluate periodically - Decommission AI that performs poorly, perpetuates bias, or becomes outdated

Conclusion: Toward Ethical Medical AI

Medical AI holds enormous promise: more accurate diagnosis, personalized treatment, equitable access to specialist expertise. But realizing this promise requires more than technical innovation. It demands ethical intentionality.

The history of medical AI thus far includes both successes (IDx-DR enabling diabetic retinopathy screening in underserved communities) and failures (algorithms perpetuating racial bias in resource allocation). The difference lies not in the technology itself, but in the values and priorities of those who develop, validate, deploy, and use it (Obermeyer et al., 2019; Char et al., 2020).

Ethical medical AI requires:

  1. Centering equity: Diverse training data, subgroup validation, bias mitigation, equitable access
  2. Respecting autonomy: Informed consent, transparency, patient opt-out options
  3. Demonstrating benefit: Prospective validation of clinical outcomes, not just retrospective accuracy
  4. Preventing harm: Rigorous testing, ongoing monitoring, physician oversight, accountability mechanisms
  5. Physician stewardship: Understanding AI, maintaining judgment, protecting patient interests

The goal is not AI-free medicine (that ship has sailed), nor is it uncritical AI adoption. The goal is AI that embodies the values of medicine: commitment to patients above all, respect for human dignity, pursuit of equity, and fierce protection of the vulnerable.

Physicians, by virtue of their clinical expertise, ethical training, and patient advocacy role, are uniquely positioned to ensure AI serves these ends. That responsibility cannot be delegated to algorithms.


Check Your Understanding

Test your ethical reasoning with these real-world scenarios involving AI ethics failures, algorithmic bias, and informed consent violations. Each scenario explores the complex interplay between medical technology, health equity, and professional responsibility.

Scenario 1: Risk Stratification Algorithm Perpetuates Racial Bias

You’re a family medicine physician and medical director at a large, urban safety-net hospital serving predominantly Black and Latino patients (72% of patient population). Your health system recently implemented a commercial “high-risk patient identification algorithm” integrated into the EHR to identify patients who would benefit from intensive care management programs.

Algorithm background: - Vendor: Major health IT company with widely-deployed algorithms - Purpose: Predict which patients will have high healthcare utilization in next 12 months - Intended use: Identify patients for enrollment in care management programs (nurse case managers, care coordinators, close follow-up) - Training data: 3 million patients from 50+ health systems nationwide - Reported performance: AUC 0.86 for predicting healthcare utilization - FDA status: Not regulated (administrative use, not diagnostic)

Implementation at your hospital: - Care management program capacity: 500 patients (limited by staffing) - Patient selection: Top 500 highest-risk scores enrolled automatically - Algorithm generates risk scores for all 15,000 patients in practice - Score range: 0-100 (higher = higher predicted utilization, higher priority for care management)

Month 3 - You notice troubling patterns:

Case 1 - Your patient excluded from care management: Patient: 68-year-old Black woman with uncontrolled diabetes (HbA1c 10.2%), CKD stage 4, heart failure, recent hospitalization - Algorithm risk score: 58/100 - Not enrolled in care management (below threshold) - Your clinical assessment: High-risk patient, would greatly benefit from care management

Case 2 - Lower-acuity white patient enrolled: Patient: 64-year-old white man with well-controlled diabetes (HbA1c 6.8%), hypertension, hyperlipidemia, no recent hospitalizations - Algorithm risk score: 73/100 - Enrolled in care management program - Your clinical assessment: Moderate-risk patient, already well-managed in primary care

You raise concerns to administration: “I’m seeing sicker Black patients excluded from care management while healthier white patients are enrolled. Something seems wrong with this algorithm.”

Administration response: “The algorithm doesn’t use race as an input variable. It’s race-neutral. We’re just following the scores.”

You conduct informal analysis: - Review your patient panel (n=347 patients) - Compare algorithm risk scores to your clinical assessment of patient complexity - Finding: For the same level of clinical complexity (# of chronic conditions, hospitalizations, ED visits), Black patients consistently score 10-15 points lower than white patients

You request formal investigation:

Hospital data analytics team retrospective analysis (6 months of data):

Overall findings: - 15,000 patients with algorithm risk scores - Top 500 (enrolled in care management): 31% Black/Latino, 69% white - Your patient population: 72% Black/Latino, 28% white - Dramatic underrepresentation of Black/Latino patients in care management program

Subgroup analysis by race/ethnicity:

For patients with identical clinical profiles (same # of chronic conditions, same prior-year hospitalizations, same ED visits):

Race/Ethnicity   Average Risk Score   % Enrolled in Care Mgmt (out of top 500)
White            67.2                 69% (345/500)
Black            53.8                 23% (115/500)
Latino           55.1                 8% (40/500)

Finding: Black and Latino patients score roughly 12-13 points lower than white patients with identical clinical complexity

Investigation into algorithm design:

Data analytics team reviews vendor documentation:

Algorithm’s prediction target: “Total healthcare costs in next 12 months”

Why costs, not clinical need? - Vendor’s rationale: “Costs are objective, consistently measured across all health systems, and available in billing data” - Used as proxy for healthcare needs

The problem - Cost ≠ Need:

Further analysis at your hospital reveals:

Race/Ethnicity   Avg Healthcare Costs   Avg # Chronic Conditions   Avg # Hospitalizations
White            $18,450                3.2                        0.8
Black            $12,100                4.1                        1.2
Latino           $10,800                3.9                        1.1

Critical finding: Black and Latino patients have higher clinical complexity (more chronic conditions, more hospitalizations) but lower healthcare costs than white patients

Why costs are lower despite higher clinical needs:

  • Structural barriers to care access: lack of insurance (uninsured rate 18% among Black/Latino patients vs. 6% among white patients), transportation barriers (42% vs. 81% have cars), and work-schedule inflexibility (inability to take time off for appointments)
  • Healthcare system discrimination: longer waits for specialist appointments, lower likelihood of receiving expensive treatments (cardiac cath, joint replacement, advanced imaging), and a higher threshold for hospital admission
  • Patient factors: medical mistrust (historical and ongoing discrimination) and cost concerns (avoiding care due to inability to pay)

Result: Black and Latino patients receive less care (lower costs) despite being sicker → Algorithm learns “lower costs = lower need” → Systematically under-prioritizes Black/Latino patients for care management

The ethical violation: This is exactly the scenario described in the Obermeyer Science paper (Obermeyer et al., 2019) occurring at your institution

Questions for Analysis:

1. What ethical principles were violated by this algorithm’s deployment?

Principle #1: Justice and Health Equity (MAJOR VIOLATION)

Definition: Healthcare resources should be distributed fairly based on medical need, not on race or socioeconomic status

Violation: - Care management resources (500 slots) distributed based on algorithm that systematically under-prioritizes Black and Latino patients - Sicker minority patients excluded while healthier white patients enrolled - Algorithm perpetuates and amplifies existing healthcare disparities - 72% Black/Latino patient population → only 31% representation in care management program

Harm: - Black/Latino patients denied care management despite higher clinical need - Existing health disparities widened (patients who need help most receive least) - Violation of Title VI Civil Rights Act (disparate impact based on race)

Principle #2: Beneficence (Doing Good)

Violation: - Algorithm supposed to identify patients who would most benefit from care management - Instead, identifies patients with highest historical costs (not highest needs) - Result: Care management resources allocated suboptimally - Wrong patients enrolled (lower-need white patients) while high-need Black/Latino patients excluded

Harm: - Care management program fails to achieve intended benefit (reducing hospitalizations, improving outcomes for high-risk patients) - Resources wasted on lower-need patients - Higher-need patients experience preventable complications, hospitalizations

Principle #3: Non-Maleficence (First, Do No Harm)

Violation: - Algorithm causes direct harm to Black/Latino patients by excluding them from beneficial intervention - Creates systematic discrimination in resource allocation - Harm is predictable and preventable (vendor and hospital should have detected bias before deployment)

Harm: - Black/Latino patients with uncontrolled chronic conditions, recent hospitalizations, poor medication adherence excluded from care management - Predictable consequences: preventable readmissions, disease progression, complications, reduced quality of life - Harm disproportionately affects vulnerable populations already facing health disparities

Principle #4: Transparency and Informed Consent (Partial Violation)

Violation: - Patients unaware algorithm determines their access to care management program - No disclosure that algorithm uses cost (not clinical need) as proxy for risk - No informed consent for use of algorithmic resource allocation - Patients have no opportunity to appeal algorithmic exclusion

2. Who is responsible and liable for the algorithmic bias?

This case involves DISTRIBUTED ETHICAL AND LEGAL RESPONSIBILITY:

Hospital/Health System (Primary Responsibility):

Ethical failures:

  • Failed to evaluate the algorithm for bias before deployment: did not request subgroup performance data from the vendor, did not perform local retrospective validation stratified by race/ethnicity, and did not question the use of “cost” as a proxy for “need”
  • Deployed the algorithm without an equity impact assessment: no analysis of potential disparate impact on vulnerable populations; no involvement of the ethics committee, community representatives, or health equity experts
  • Failed to monitor for bias post-deployment: it took 3 months and a physician complaint to investigate racial disparities, and there was no systematic tracking of care management enrollment by race/ethnicity
  • Abdicated responsibility to the algorithm: the administration’s response, “We’re just following the scores,” treats algorithmic decisions as neutral and absolves physicians of any responsibility to question them

Legal liability:

  • Title VI Civil Rights Act violation (disparate impact based on race): federal law prohibits discrimination in programs receiving federal funds (Medicaid, Medicare), and under the disparate impact standard a policy that appears neutral but has a discriminatory effect is unlawful; this algorithm has a clear disparate impact (72% Black/Latino patient population, 31% care management enrollment)
  • Corporate negligence: the hospital has a duty to implement reasonable safeguards before deploying algorithms that affect patient care, and failure to assess for bias before deployment is negligent
  • Potential Americans with Disabilities Act (ADA) violation: patients with chronic diseases (diabetes, heart failure, CKD) may qualify as disabled under the ADA, and an algorithm that systematically excludes disabled Black/Latino patients from a beneficial program may constitute disability discrimination

Likely outcome:

  • Department of Health and Human Services Office for Civil Rights (OCR) investigation likely if a complaint is filed; potential findings include a pattern of discrimination and violation of Title VI
  • Possible remedies: algorithm discontinued, care management program redesigned, compensatory services for affected patients, mandatory bias training, ongoing monitoring
  • Civil lawsuit possible (class action by affected Black/Latino patients), with settlement terms such as institutional reform, financial compensation, and independent oversight

AI Vendor (Secondary Responsibility):

Ethical failures:

  • Poor choice of outcome variable: used “cost” as a proxy for “need” without adequately considering how costs reflect access (not just illness severity), and failed to recognize that historical healthcare costs encode racial disparities
  • Inadequate bias testing: did not report subgroup performance by race/ethnicity in validation studies; may have tested for bias but not disclosed results (publication bias toward positive findings)
  • Misrepresented the algorithm as “race-neutral”: marketing materials likely claimed the algorithm doesn’t use race and is therefore fair, but ignoring race does not make an algorithm fair when the outcome variable (cost) is itself racialized

Legal liability:

  • Product liability (difficult to establish but possible): the algorithm was a defective product (it performed a discriminatory function), and the vendor failed to warn about limitations and potential for bias
  • False advertising / misrepresentation: if the vendor claimed the algorithm was validated and unbiased without adequate testing
  • Likely vendor defenses: the hospital was responsible for evaluating the algorithm before deployment; the algorithm performed as designed (it predicted costs accurately); the bias results from healthcare system disparities, not an algorithm flaw

Likely outcome: - Vendor liability difficult to establish legally - More likely: Reputational harm, regulatory scrutiny, loss of customers - Vendor may modify algorithm (e.g., use clinical complexity scores instead of costs) after public exposure - Similar to Obermeyer case: Algorithm modified after Science publication, but only after years of discriminatory deployment

Physician/Medical Director (YOU) - Ethical Responsibility:

Ethical duties:

  • Recognize and report bias: you fulfilled this duty by noticing the pattern and raising concerns; many physicians would not (too busy, trusting the algorithm, not tracking outcomes by race)
  • Advocate for patients: you appropriately advocated for the excluded Black/Latino patients and pushed for a formal investigation despite administration resistance
  • Protect vulnerable populations: a duty to ensure healthcare resources are distributed equitably and to question algorithms that produce unjust outcomes

You acted ethically and appropriately. No liability for physician who identifies and reports bias.

3. How should the hospital have evaluated this algorithm BEFORE deployment?

Pre-Deployment Equity Impact Assessment:

Step 1: Critical evaluation of algorithm design

Questions to ask vendor: - “What is your algorithm predicting?” (Answer: Healthcare costs) - “Why use costs as proxy for healthcare needs?” (Probe assumptions) - “Have you validated that costs correlate with clinical need across all racial/ethnic groups?” - “Could structural barriers to care (insurance, transportation, discrimination) affect costs independent of medical need?”

Red flags: - Vendor uses cost, utilization, or access-based metrics as proxies for clinical need - These metrics encode existing disparities - If vendor cannot provide race-stratified validation showing cost=need across all groups → DON’T DEPLOY

Step 2: Request subgroup performance data from vendor

Demand: - Algorithm performance stratified by race, ethnicity, age, sex, insurance status, ZIP code - For this algorithm: Correlation between predicted risk scores and actual clinical complexity (# chronic conditions, disease severity) BY RACE - If vendor refuses or lacks this data → RED FLAG, DON’T DEPLOY

Step 3: Local retrospective validation

Test on YOUR patient population BEFORE deployment: - Select 1,000-2,000 patients from your health system - Run algorithm to generate risk scores - Have clinical team independently assess patient complexity (# chronic conditions, hospitalizations, disease severity) - Critical analysis: Do risk scores correlate with clinical complexity EQUALLY across racial/ethnic groups?

For this algorithm, local validation would reveal: - Black/Latino patients score roughly 12-13 points lower than white patients with identical clinical profiles - Algorithm systematically under-predicts risk for minority patients - This finding should STOP deployment
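
A minimal sketch of this Step 3 check, assuming a local extract with hypothetical column names (risk_score, n_chronic_conditions, n_hospitalizations, race): within strata of identical clinical profiles, compare each group’s mean vendor score to the reference group’s.

```python
# Illustrative pre-deployment check: do vendor risk scores track clinical
# complexity equally across racial/ethnic groups? Column names are hypothetical.
import pandas as pd

def score_gap_by_race(df: pd.DataFrame, reference: str = "White") -> pd.Series:
    """Average difference in mean risk score vs. the reference group,
    computed within strata of identical clinical profiles."""
    strata = ["n_chronic_conditions", "n_hospitalizations"]
    means = (df.groupby(strata + ["race"])["risk_score"]
               .mean()
               .unstack("race"))                      # rows: clinical strata, columns: race
    gaps = means.subtract(means[reference], axis=0)   # negative = scored below reference
    return gaps.mean()                                # average gap per group across strata

# A result on the order of Black ≈ -13 and Latino ≈ -12 would reproduce the
# disparity shown in the table above and should stop deployment.
```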

Step 4: Equity impact assessment

Required before deploying any algorithm that allocates scarce resources:

Questions to answer:

  1. Who benefits? Which patients will be enrolled in care management based on algorithm scores?
  2. Who is harmed? Which patients will be excluded?
  3. Disparate impact analysis: Does enrollment by race/ethnicity match your patient population demographics?
  4. Access barriers: Could the algorithm reflect barriers to care rather than medical need?
  5. Fairness definition: What does “fair” allocation mean? (Equal access? Proportional to need? Maximize overall benefit?)
  6. Alternatives: Are there less biased ways to identify high-risk patients? (Clinical complexity scores, physician referral, patient self-referral)

Require multidisciplinary review: - Clinicians who care for vulnerable populations - Ethicists - Health equity experts - Community representatives (Black/Latino patient advocates) - Legal counsel (Title VI compliance)

Decision criteria: - If equity impact assessment reveals disparate impact → Don’t deploy until bias mitigated - If bias cannot be mitigated → Don’t deploy, use alternative method

Step 5: Pilot testing with close monitoring

If algorithm passes equity impact assessment: - Deploy to limited pilot (e.g., 100 patients) - Track enrollment by race/ethnicity weekly - Compare to patient population demographics - Investigate any disparities immediately - Expand only if pilot demonstrates equitable allocation

For this algorithm: - Pilot would have revealed 69% white enrollment despite 28% white patient population - Pilot should have stopped deployment before full rollout

4. How should the algorithm be redesigned to eliminate bias?

Option 1: Change outcome variable from “cost” to “clinical complexity”

Instead of predicting: “Healthcare costs in next 12 months”

Predict: “Clinical complexity score” or “Risk of poor outcomes”

Clinical complexity score inputs: - Number of chronic conditions - Disease severity (HbA1c for diabetes, ejection fraction for heart failure, GFR for CKD) - Prior hospitalizations (count, not just cost) - ED visits (count) - Medication non-adherence (fills vs. prescribed) - Social determinants of health (housing instability, food insecurity, transportation barriers)

Key difference: Clinical complexity and social needs are DIRECT measures of who would benefit from care management, not proxies like cost

Validation: Test that clinical complexity scores correlate with care management benefit EQUALLY across racial/ethnic groups
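A purely illustrative sketch of such a composite, built from the inputs listed above with equal weights; the column names, scaling, and weights are hypothetical and would require clinical validation (including the subgroup validation just described) before any real use.

```python
# Hypothetical clinical complexity composite; weights and scaling are illustrative only.
import pandas as pd

def clinical_complexity_score(df: pd.DataFrame) -> pd.Series:
    """Equal-weight composite of need-based inputs, scaled 0-100.
    Expects hypothetical columns: n_chronic_conditions, severity_index,
    n_hospitalizations, n_ed_visits, nonadherence_rate, sdoh_flags."""
    components = ["n_chronic_conditions", "severity_index", "n_hospitalizations",
                  "n_ed_visits", "nonadherence_rate", "sdoh_flags"]
    cols = df[components]
    scaled = (cols - cols.min()) / (cols.max() - cols.min())  # min-max scale each input to 0-1
    return 100 * scaled.mean(axis=1)  # equal weights; a validated model would tune these
```

The point of the sketch is the choice of inputs (direct measures of clinical need and social risk), not the particular weighting.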

Option 2: Explicitly adjust for known disparities

Acknowledge that Black/Latino patients face barriers to care access: - Add adjustment factor to scores for patients from underserved populations - Example: If Black patient scores 54 but clinical complexity suggests score should be 67, apply +13 point adjustment - Controversial but may be necessary to achieve equitable allocation

Option 3: Abandon algorithmic allocation entirely

Use physician referral or patient self-referral: - Physicians identify patients who would benefit from care management based on clinical judgment - Patients can self-refer if they feel overwhelmed managing their conditions - Advantages: Incorporates clinical nuance, patient autonomy, social factors that algorithms miss - Disadvantages: Physician bias possible, may miss patients who don’t seek help

Hybrid approach: - Algorithm provides preliminary risk scores (redesigned to use clinical complexity, not cost) - Physicians review scores and can nominate additional patients based on clinical judgment - Ensures algorithm supports (not replaces) clinical decision-making

Option 4: Use algorithm but monitor for bias rigorously

If algorithm is used: - Monthly monitoring: Track care management enrollment by race/ethnicity, insurance status, ZIP code - Trigger for investigation: If enrollment demographics deviate >10% from patient population → immediate investigation and recalibration - Transparency: Inform patients how care management enrollment decisions are made - Appeal process: Patients and physicians can appeal if excluded from care management
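
A minimal sketch of the monthly enrollment audit and the >10% deviation trigger described above, assuming a hypothetical panel dataframe with a race_ethnicity column and a boolean enrolled flag:

```python
# Illustrative monthly disparate-impact audit; column names and the 10% trigger
# are hypothetical and should be set by local governance.
import pandas as pd

DEVIATION_TRIGGER = 0.10  # investigate if enrollment share deviates >10 points from panel share

def enrollment_audit(panel: pd.DataFrame) -> pd.DataFrame:
    """Compare care-management enrollment shares to overall panel demographics."""
    panel_share = panel["race_ethnicity"].value_counts(normalize=True)
    enrolled_share = (panel.loc[panel["enrolled"], "race_ethnicity"]
                           .value_counts(normalize=True))
    audit = pd.DataFrame({"panel_share": panel_share,
                          "enrolled_share": enrolled_share}).fillna(0.0)
    audit["deviation"] = audit["enrolled_share"] - audit["panel_share"]
    audit["investigate"] = audit["deviation"].abs() > DEVIATION_TRIGGER
    return audit

# In the scenario above this flags immediately: Black/Latino panel share 0.72 vs.
# enrolled share 0.31 gives a deviation of -0.41, far beyond the 0.10 trigger.
```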

5. Key lessons for physicians on algorithmic bias and health equity:

Lesson #1: “Race-Neutral” Algorithms Can Be Racially Biased - Algorithm doesn’t need to use race as input variable to produce racially biased outputs - If algorithm uses variables that correlate with race (cost, ZIP code, insurance), it can perpetuate disparities - Always ask: “Does this algorithm perform equally across racial/ethnic groups?”

Lesson #2: Proxies for Outcomes Often Encode Bias - Cost ≠ need (reflects access, not just severity) - Utilization ≠ need (reflects access) - Even seemingly objective measures (lab values, vital signs) can be biased if collection rates differ by race

Lesson #3: Validation Must Include Subgroup Analysis - Overall accuracy metrics hide disparities - Demand race-stratified performance data BEFORE deployment - If vendor lacks or refuses to provide → RED FLAG

Lesson #4: Equity Requires Intentional Design - Algorithms optimized for overall accuracy will often perform worse for minority groups - Equity must be explicit design goal (not assumed) - Diverse development teams, inclusive validation, ongoing bias monitoring required

Lesson #5: Physicians Have Duty to Identify and Report Bias - Don’t assume algorithms are neutral or fair - Track outcomes by race/ethnicity in your practice - Speak up when you notice disparities - Professional obligation to protect vulnerable patients

Lesson #6: “Just Following the Algorithm” Is Not an Ethical Defense - Physicians retain professional responsibility for patient care - Cannot delegate ethical judgment to algorithms - Duty to question and override algorithms that produce unjust outcomes

Lesson #7: Health Equity Impact Assessment Should Be Mandatory - Before deploying any algorithm that allocates resources or affects clinical decisions - Multidisciplinary review including community representatives - Ongoing monitoring, not just pre-deployment assessment

Lesson #8: Algorithmic Bias Is a Civil Rights Issue - Title VI prohibits discrimination in federally-funded programs - Disparate impact liability applies even if discrimination unintentional - Hospitals can face federal investigation and legal action for biased algorithms

Scenario 2: Informed Consent Failure with Experimental AI

You’re a surgical oncologist at an academic medical center. Your hospital recently partnered with a medical AI startup to pilot an experimental algorithm that predicts post-operative complications for cancer surgery patients. The algorithm uses machine learning to analyze preoperative data (demographics, labs, imaging, tumor characteristics) and generates a personalized risk score for complications (surgical site infection, pneumonia, cardiac events, mortality).

Study details: - Goal: Validate algorithm in real-world clinical setting, collect data to improve model - Design: Prospective observational study; all colorectal cancer surgery patients offered enrollment - Intervention: Patients in study receive personalized risk report pre-surgery, shared with surgical team - Control: No control group; all enrolled patients receive AI risk prediction - IRB status: Approved as minimal risk study (observational, no change to standard surgical care) - Informed consent: Written consent obtained by research coordinator

Your involvement: - Principal investigator for surgical oncology patients - Responsible for explaining study to patients and obtaining informed consent when research coordinator unavailable

Month 2 - Complex patient enrolled in study:

Patient: 71-year-old Black man with stage III colon cancer - Medical history: Hypertension, type 2 diabetes, obesity (BMI 34), former smoker - Social history: Retired postal worker, married, supportive family, lives in rural area 90 miles from hospital - Baseline functional status: Independent, active, good quality of life - Cancer: Locally advanced but resectable; standard treatment is surgical resection followed by adjuvant chemotherapy - Prognosis with surgery: Good chance of cure (60-70% 5-year survival) - Prognosis without surgery: Progression, metastasis, death within 2-3 years

Study enrollment conversation (conducted by research coordinator, you present but silent):

Research coordinator: “Dr. Johnson’s team is conducting a research study using artificial intelligence to predict surgical complications. We’d like to invite you to participate. If you enroll, a computer algorithm will analyze your medical information and provide a personalized risk estimate for complications after surgery. This information will be shared with you and your surgical team. Participation is voluntary and won’t change your surgical care. Do you have any questions?”

Patient: “Will this computer thing help me avoid complications?”

Research coordinator: “It provides information about your risk, which may help the surgical team plan your care.”

Patient: “Okay, sounds good. Where do I sign?”

[Patient signs consent form, study enrollment complete]

Pre-operative AI risk report generated:

Algorithm output: - Predicted 30-day post-op complication risk: 68% (HIGH RISK) - Predicted 30-day mortality risk: 12% (HIGH RISK) - Factors driving high risk: Age 71, BMI 34, diabetes, Black race, ASA class III

Breakdown compared to average patient: - Average patient undergoing similar surgery: 24% complication risk, 2% mortality risk - This patient: 2.8× higher complication risk, 6× higher mortality risk

The report flags: “HIGH RISK PATIENT - Consider enhanced perioperative monitoring, ICU admission post-op, cardiology consultation”

You review the report and meet with patient to discuss:

You: “The AI system we’re using for the study has generated your personalized risk report. It estimates you have a 68% chance of complications and 12% chance of death within 30 days after surgery. These numbers are higher than average.”

Patient’s reaction: Visible distress, tearful, hands shaking

Patient: “12% chance of death? I didn’t know surgery was that dangerous. My wife and I were planning a trip after I recover. Should I even do this surgery?”

You (attempting to provide reassurance): “These are just predictions from a computer model. I’ve done hundreds of these surgeries. Your risk is higher than some patients because of your age and diabetes, but I think surgery is still your best option. Without surgery, the cancer will spread.”

Patient: “But the computer says 68% complications. That’s more than half. What kind of complications?”

You: “Infection, pneumonia, heart problems. We’ll monitor you closely. The AI is supposed to help us prepare.”

Patient (hesitant, uncertain): “I need to think about this. Can I talk to my wife?”

[Patient leaves, clearly shaken by risk numbers]

Next day - Patient returns with wife:

Patient: “We’ve decided not to do surgery. The risks are too high. 12% mortality is too scary. We’ll try other treatments.”

You: “Other treatments won’t cure your cancer. Surgery gives you the best chance.”

Wife: “But the computer says he has a 68% chance of complications. That’s not acceptable. We’re not doing it.”

You spend 45 minutes explaining: Cancer prognosis without surgery, benefits of surgery, how to interpret risk percentages, limitations of AI predictions

Patient remains undecided, leaves to “think more”

2 weeks later - Patient misses surgery date:

You call patient at home:

Patient: “We decided not to do the surgery. We’re going to try alternative medicine and diet changes. We researched online and found stories about people beating cancer naturally. The surgery risks are too high.”

You (frustrated, concerned): “I strongly advise surgery. This is a curable cancer if we operate now. Delaying surgery means the cancer will grow and spread. Alternative medicine won’t cure stage III colon cancer.”

Patient: “The computer said 68% complications. I’ve made my decision.”

[Patient declines surgery, opts out of further oncology care at your hospital]

6 months later - Patient presents to emergency department at different hospital:

Status: Cancer progression, now stage IV with liver metastases, bowel obstruction requiring emergency surgery - Emergency surgery performed: Palliative colostomy, no curative resection possible (metastatic disease) - Prognosis: Incurable, median survival 12-18 months with chemotherapy

You learn about outcome when reviewing study data: - Patient was enrolled in AI study - Declined curative surgery after seeing AI risk prediction - Now has incurable metastatic disease

You feel ethical distress: Did the AI risk prediction and inadequate informed consent contribute to this patient’s refusal of life-saving surgery?

Questions for Analysis:

1. What informed consent failures occurred in this case?

Failure #1: Inadequate Explanation of AI Uncertainty and Limitations

What patient should have been told: - “This AI has been trained on thousands of patients, but it’s NOT been specifically validated for patients like you” - “The algorithm’s predictions have uncertainty. Actual risk could be higher or lower” - “We don’t know how accurate this AI is for Black patients because the training dataset had few Black patients” - “Your individual circumstances may differ from algorithm’s predictions”

What patient was actually told: - Research coordinator: Vague description of “computer algorithm analyzes medical information” - You: Presented AI risk numbers (68%, 12%) without adequate context about uncertainty - Patient left with impression AI predictions were precise and definitive

Harm: Patient made life-altering decision (refuse surgery) based on misunderstood AI predictions

Failure #2: No Discussion of AI Training Data Representativeness

Critical missing information: - “This AI was trained predominantly on patients from academic medical centers, mostly white patients” - “We don’t know if it performs accurately for Black patients like you because there were very few in the training dataset” - “The algorithm flagged your race as a risk factor, but this might reflect bias in training data rather than true medical risk”

Why this matters: - Patient is Black, likely underrepresented in training data - Algorithm’s 68% complication risk and 12% mortality risk may be OVERESTIMATED due to training data bias - Similar to case in Scenario 1: Algorithms trained on biased data produce biased predictions

What should have been disclosed: - Uncertainty about algorithm accuracy for this patient’s demographic profile - Possibility that risk estimates are inflated due to training data bias - Strong recommendation: Exercise extreme caution interpreting AI predictions for underrepresented patients

Harm: Patient made decision based on potentially biased, inaccurate risk estimates

Failure #3: Inadequate Explanation of How AI Risk Compares to Physician Judgment

What patient should have been told: - “As your surgeon, based on 20 years of experience doing this operation, I believe your risk is elevated but manageable” - “The AI says 68% complication risk, but my clinical judgment is that risk is closer to 35-40%” - “AI uses statistical averages; I know your specific circumstances better” - “I’ve successfully operated on many patients with your risk profile”

What patient was told: - You presented AI numbers without clear comparison to your own clinical assessment - Patient left thinking: “Computer says 68%, doctor says surgery best option” → Conflicting messages without clarity about which to trust

Harm: Patient unclear about whose assessment (AI vs. physician) was more trustworthy; defaulted to trusting “objective” computer over physician’s clinical judgment

Failure #4: No Discussion of Alternative Interpretation Frameworks

AI presented risk as: “68% complication risk, 12% mortality risk”

Alternative framing: - “88% chance you’ll survive surgery” (focusing on survival probability) - “Most complications are minor and treatable” (contextualizing “complication”) - “Risk of death from NOT having surgery is 100% within 2-3 years” (framing relative risk) - “Your baseline health is good despite elevated risk” (emphasizing patient strengths)

Patient received: Worst-case framing without context

Failure #5: Inadequate Assessment of Patient’s Understanding

Research coordinator obtained consent, but didn’t assess: - Does patient understand AI is experimental and uncertain? - Does patient understand AI provides probabilities, not certainties? - Does patient understand how to weigh AI risk estimates against alternative risks (death from cancer without surgery)? - Does patient have health literacy to interpret percentages and risk?

You later attempted to explain, but patient had already formed strong negative impression from initial AI risk report

Failure #6: No Opportunity for Patient to Opt Out of Receiving AI Risk Prediction

Consent process: - Patient asked: “Do you consent to participate in AI study?” - Patient consented (thought it would “help”) - No option to participate in study data collection but NOT receive AI risk report

Ethical problem: - AI risk report was potentially harmful (induced fear, led to refusal of life-saving surgery) - Patient had no opportunity to decline receiving risk report - Consent to “participate in research” should have included separate consent to “receive AI prediction results”

Analogy: Genetic testing studies allow patients to opt out of receiving results; this study should have too

2. What ethical principles were violated?

Autonomy Violation (Inadequate Informed Consent): - Patient’s decision was NOT truly informed (lacked critical information about AI uncertainty, limitations, bias) - Consent process was pro forma (check-box) rather than substantive - Patient couldn’t meaningfully weigh AI risks against cancer risks without adequate explanation - Result: Patient’s autonomous choice was based on inadequate, misleading information

Non-Maleficence Violation (First, Do No Harm): - AI risk report caused direct harm: Patient declined life-saving surgery - Foreseeable harm: Risk predictions can induce fear and lead to refusal of beneficial treatments - Duty to minimize harm: Should have recognized risk of harm from AI predictions, mitigated through better consent and framing - Result: Patient progression from curable stage III to incurable stage IV cancer

Beneficence Violation (Duty to Benefit Patient): - Study’s primary goal: Validate AI algorithm, collect data to improve model (research goal) - Secondary goal: Help patients make informed decisions (clinical goal) - Hierarchy failure: Research goal prioritized over patient’s clinical benefit - Patient received AI prediction that harmed decision-making; questionable whether patient benefited from study participation - Result: Patient harmed by study participation (refused surgery after seeing AI prediction)

Justice Violation (Equitable Treatment): - Black patients underrepresented in AI training data → Algorithm potentially less accurate for Black patients → Disproportionate harm to minority patients who receive biased predictions - This patient: Black man potentially received inflated risk estimates due to training data bias - Experimental AI with uncertain accuracy deployed on vulnerable populations without adequate safeguards

3. Who is liable for the patient’s progression to incurable cancer?

This is a complex case of shared liability and ethical failure:

Institutional Review Board (IRB) - Significant Responsibility:

Failures: - Approved study as “minimal risk” without adequate consideration of harms from AI predictions - Risk assessment focused on physical harm from surgery (no different than standard care) - Failed to consider psychological and decision-making harms from AI risk predictions - Risk predictions can induce fear, anxiety, refusal of beneficial treatments - NOT minimal risk for patients receiving potentially inaccurate, fear-inducing predictions - Inadequate informed consent process - Consent form likely didn’t adequately explain AI uncertainty, limitations, training data bias - No assessment of whether patients understood how to interpret probabilistic risk estimates - No provision for patients to opt out of receiving AI results while still participating in data collection - No safeguards for vulnerable populations - Black patients underrepresented in training data - No special protections or enhanced consent for minority patients receiving AI predictions

Corrective actions: - IRB should have required enhanced informed consent process - Should have required researcher to assess patient understanding before presenting AI results - Should have included option to opt out of receiving AI predictions - Should have required closer monitoring of study harms (patient refusal rates, psychological distress)

Principal Investigator (YOU) - Significant Responsibility:

Failures: - Inadequate informed consent when presenting AI risk results - Presented numbers (68%, 12%) without sufficient context about uncertainty - Didn’t explain AI training data limitations or potential bias for Black patients - Didn’t clearly state your own clinical judgment conflicted with AI prediction - Failed to recognize patient distress and intervene appropriately - Patient was “visibly distressed, tearful, hands shaking” after seeing AI report - Should have immediately provided counseling, additional explanation, possibly referral to psychology/social work - Instead, patient left shaken and made hasty decision to decline surgery - Inadequate follow-up after patient declined surgery - Called patient once, but didn’t pursue aggressive follow-up given life-threatening consequences of refusing surgery - Could have offered in-person meeting, brought in other surgeons for second opinion, involved patient’s primary care doctor - Protocol design flaw - Study designed to give AI predictions to all patients without control group receiving standard counseling - No way to assess if AI predictions helped or harmed decision-making

Legal liability: - Informed consent case: Patient could argue decision to decline surgery was based on inadequate informed consent about AI limitations - Standard: Would reasonable patient have declined surgery if adequately informed about AI uncertainty and training data bias? - Likely outcome: Settlement possible ($500K-$1.5M) for failure to obtain adequate informed consent and failure to prevent foreseeable harm

Research Coordinator - Some Responsibility:

Failures: - Consent conversation was cursory (patient asked one question, signed immediately) - No assessment of patient’s understanding - Presented study in overly positive light (“may help surgical team plan your care”) without balancing with potential harms

But: Research coordinator likely followed IRB-approved consent script; primary responsibility lies with PI and IRB

AI Startup Vendor - Some Responsibility:

Failures: - Algorithm likely not adequately validated on diverse patient populations before deploying in clinical study - May have overstated algorithm’s accuracy or applicability to minority patients - Should have warned that predictions for underrepresented groups (Black patients) have higher uncertainty

But: Vendor provided algorithm for research study; hospital/PI responsible for appropriate use and informed consent

Patient - No Liability, But Contributed to Outcome: - Patient made autonomous decision to decline surgery based on AI risk estimates - Patient didn’t fully understand AI limitations (but that’s informed consent failure, not patient’s fault) - Patient’s decision was rational given information provided (“68% complications too risky”)

4. How should informed consent have been conducted?

Enhanced Informed Consent Process for Experimental AI:

Step 1: Pre-consent educational session (before presenting AI results)

Research coordinator and surgeon together explain:

“We’re conducting research on artificial intelligence to predict surgical complications. If you participate, a computer algorithm will analyze your medical information and generate a risk estimate. Before you decide whether to participate, I want to explain what AI predictions mean and don’t mean:

What AI does well: - Analyzes large amounts of data quickly - Identifies patterns humans might miss - Provides statistical estimates based on thousands of similar patients

What AI doesn’t do well: - Doesn’t know YOU specifically (doesn’t account for your unique circumstances, motivation, family support) - Makes predictions with uncertainty (actual risk could be higher or lower) - May be less accurate for patients from groups underrepresented in training data

Limitations of THIS specific AI: - This algorithm is experimental. We’re testing how well it works - It was trained mostly on patients at academic medical centers, with fewer Black, Latino, and rural patients - We don’t know for certain how accurate it is for patients like you - It provides probabilities, not certainties

How you should use AI predictions: - AI provides one piece of information among many - Dr. Johnson’s clinical judgment based on 20 years of experience is equally (or more) important - We’ll discuss AI results together and help you interpret them - You can ask questions, challenge the AI, and make your own decision

Participation is voluntary: - You can decline to participate - You can participate in the study (allowing us to collect data) but opt out of receiving AI risk predictions - You can receive AI predictions but also get a second opinion from another surgeon - You can withdraw from the study at any time

Do you have questions before deciding whether to participate?”

Step 2: Assess patient’s understanding before enrolling

Ask patient: - “Can you explain in your own words what the AI does?” - “What does it mean if the AI says you have a 60% risk of complications?” - “If the AI prediction differs from Dr. Johnson’s judgment, whose assessment would you trust more?” - “What will you do if the AI prediction makes you worried about surgery?”

If patient doesn’t understand → More education before enrollment

Step 3: Present AI results with context and framing

If patient enrolls and receives AI risk prediction:

You (surgeon) meet with patient BEFORE showing numbers:

“The AI has generated your personalized risk estimate. Before I show you the numbers, I want to give you my clinical assessment. I’ve reviewed your medical history, your cancer, and your overall health. Based on my experience, I believe you’re a good candidate for surgery. Your risks are elevated compared to younger, healthier patients, but I’ve successfully operated on many patients with similar or higher risk. I believe surgery gives you the best chance for cure.”

Then show AI report:

“The AI predicts you have a 68% risk of complications and 12% risk of death within 30 days. These numbers are higher than average, and I want to explain what they mean and don’t mean:

What ‘complication’ means: - Includes minor complications (wound infection, treatable with antibiotics) - Includes moderate complications (pneumonia, requiring antibiotics and extra hospital days) - Includes serious complications (heart attack, requiring ICU care) - Most complications are treatable and don’t affect long-term outcomes

What ‘12% mortality risk’ means: - 88% chance you’ll survive surgery - This is a statistical average from thousands of patients - Your individual risk depends on factors AI can’t fully capture (your determination, family support, our surgical team’s experience)

Why these numbers might be OVERESTIMATES for you: - AI was trained mostly on white patients at academic centers - Fewer Black patients in training data → algorithm might be less accurate for you - Algorithm flagged your race as a risk factor, but this may reflect bias in training data rather than true medical risk

My clinical judgment: - I believe your true risk is lower than AI predicts, closer to 35-40% complication risk, 4-6% mortality risk - I’ve operated on patients with your risk profile successfully - Surgery gives you 60-70% chance of CURE from cancer - Without surgery, cancer will progress → 100% chance of death within 2-3 years

Comparing risks: - Surgery: 88% chance of survival, chance of cure - No surgery: 100% chance cancer spreads and becomes incurable

From my perspective, surgery is clearly your best option despite elevated risks. But this is your decision. What questions do you have?”

Step 4: Shared decision-making with time to process

Don’t rush decision: - “This is a lot of information. Take time to process it.” - “Talk to your wife, your family, your primary care doctor.” - “We can schedule another appointment to discuss further before surgery.”

Offer second opinion: - “If you’d like another surgeon’s perspective, I can arrange that.” - “Would you like to speak with patients who’ve had similar surgeries?”

Assess ongoing understanding: - At follow-up appointment: “Can you tell me your understanding of the risks and benefits of surgery?” - “What factors are most important to you in making this decision?” - “Do you have concerns we haven’t addressed?”

Step 5: Document informed consent process

In medical record: - “Patient enrolled in AI surgical risk prediction study. Extensive counseling provided about AI limitations, uncertainty, potential bias for underrepresented populations.” - “AI predicted 68% complication risk, 12% mortality risk. I explained these may be overestimates due to training data limitations. My clinical assessment: 35-40% complication risk, 4-6% mortality risk.” - “Patient expressed understanding of risks and benefits. Patient chose to proceed with surgery [or declined surgery]. Patient’s decision was informed, voluntary, and made after careful consideration.”

5. Key lessons for physicians on informed consent with AI:

Lesson #1: Experimental AI Requires Enhanced Informed Consent - Standard consent insufficient when AI is unvalidated, uncertain, or potentially biased - Patients must understand AI limitations, not just capabilities - IRBs should require enhanced consent for AI studies with decision-making implications

Lesson #2: AI Predictions Are Not Certainties - Probabilistic estimates have uncertainty - Patients often misinterpret percentages (68% sounds like “will definitely happen”) - Frame AI predictions clearly: “statistical average” not “your certain outcome”

Lesson #3: Disclose Training Data Limitations and Bias - If AI trained predominantly on non-representative populations, inform patients - “This AI may be less accurate for patients like you” is critical disclosure - Especially important for minority patients given pervasive AI training data bias

Lesson #4: Physician Judgment Must Contextualize AI Predictions - Don’t present AI numbers in isolation - Clearly state your clinical assessment and how it compares to AI - If you disagree with AI → explain why, help patient weigh both perspectives

Lesson #5: Assess Patient Understanding, Don’t Assume It - Use teach-back method: “Can you explain in your own words?” - Patients with limited health literacy may need more support interpreting AI predictions - Don’t rush consent process

Lesson #6: Consider Psychological Harms from AI Predictions - Risk predictions can induce fear, anxiety, refusal of beneficial treatments - IRBs and researchers must consider these harms in risk assessment - “Minimal risk” determination incorrect if AI predictions likely to alter patient decisions

Lesson #7: Offer Option to Opt Out of Receiving AI Predictions - Some patients may want AI to inform their care; others may find predictions unhelpful or distressing - Ethical to participate in research data collection without receiving AI results - Respect patient autonomy in how they use (or don’t use) AI information

Lesson #8: Follow Up Aggressively When Patients Refuse Beneficial Treatment After AI Predictions - If AI prediction leads to refusal of life-saving treatment → intervene immediately - Offer additional counseling, second opinions, time to reconsider - Don’t accept patient’s refusal as final after single conversation

Scenario 3: Automation Bias Leads to Missed Diagnosis

You’re an emergency medicine physician at a busy urban Level I trauma center. Your ED recently implemented an AI-powered chest X-ray interpretation system (FDA-cleared, widely deployed) that analyzes chest X-rays in real-time and flags abnormalities. The system is marketed as a “second reader” to reduce missed findings and improve diagnostic accuracy.

AI system details: - Vendor: Major medical imaging AI company - FDA 510(k) clearance: Yes - Reported sensitivity: 94% for pneumonia, 92% for pneumothorax, 91% for pulmonary edema - Deployment: Integrated with PACS (Picture Archiving and Communication System) - Workflow: Radiologist and ED physician both see AI outputs overlaid on images - Marketing claim: “Reduces missed diagnoses by 30%”

Implementation at your ED: - Go-live 6 months ago - All chest X-rays automatically analyzed by AI - AI generates color-coded overlays on images (red boxes around abnormalities) - ED physicians see AI analysis before or concurrently with radiologist preliminary read - Training: 2-hour online module + 1 in-person session on AI interface

Night shift - High-volume night (typical): - Volume: 47 patients in ED, 12 waiting to be seen - Staffing: You + 2 other ED physicians, 1 resident - Acuity: 3 critical (trauma, MI, stroke), 8 urgent, 36 non-urgent

11:30 PM - New patient arrival:

Patient: 34-year-old woman, presents with chest pain and shortness of breath - Chief complaint: “Chest pain and hard to breathe for 2 days” - History: Started 2 days ago, gradual onset, worse with deep breath, no radiation, no associated symptoms - Past medical history: No significant medical history, takes oral contraceptives - Social history: Non-smoker, works as teacher, no recent travel - Vital signs: Temp 98.4°F, HR 102, BP 118/74, RR 22, O2 sat 94% on room air (mildly decreased)

Your assessment (busy, high patient volume, trying to move quickly): - Differential diagnosis: Pneumonia, pleurisy, musculoskeletal pain, anxiety - Plan: Chest X-ray, consider CBC, D-dimer if concerned for pulmonary embolism

Chest X-ray ordered, completed at 11:45 PM

AI analysis (appears on PACS within 2 minutes): - AI output: “No acute abnormalities detected” - No red boxes or highlighted regions on X-ray - Confidence score: 87% (high confidence)

You review chest X-ray on PACS: - Your viewing time: 18 seconds (time-stamped in EHR) - Glance at image on workstation while simultaneously checking lab results for another patient - See AI analysis first: “No acute abnormalities detected” - Your interpretation: Quickly scan image, no obvious infiltrates, no large pneumothorax, heart size normal - Your conclusion: “Agree with AI, chest X-ray looks okay”

Physical exam: - Lungs: Clear to auscultation bilaterally (patient has some difficulty taking deep breaths due to pain, limiting exam) - Heart: RRR, no murmurs - Chest wall: No tenderness to palpation

Your diagnosis: Viral pleurisy vs. musculoskeletal chest pain

Plan: - D-dimer ordered “just to be safe” given oral contraceptive use - Reassure patient, NSAIDs for pain, return if symptoms worsen

D-dimer result: 1,247 ng/mL (normal <500, elevated)

Your interpretation: - D-dimer elevated but non-specific (can be elevated in many conditions) - Chest X-ray negative, patient clinically stable - Decision: Low suspicion for PE, don’t pursue CT angiography (CTA chest) - Reasoning: “Chest X-ray negative, probably viral pleurisy with some inflammation causing elevated D-dimer”

Discharge plan: - NSAIDs (ibuprofen 600mg Q6H) - Return precautions: If shortness of breath worsens, chest pain worsens, or if you develop fever - Primary care follow-up in 1 week - Patient discharged at 1:15 AM

Next day, 3 PM - Patient returns to ED:

Patient presentation: - Chest pain much worse, severe shortness of breath, light-headed - Vital signs: Temp 98.9°F, HR 128, BP 96/62, RR 32, O2 sat 88% on room air (significantly decreased) - Exam: Tachycardic, tachypneic, diaphoretic, in moderate distress

Day shift ED physician (different physician) management: - High suspicion for pulmonary embolism given tachycardia, hypoxia, pleuritic chest pain, oral contraceptive use, elevated D-dimer from prior visit - CTA chest ordered STAT

CTA chest result: - Bilateral pulmonary emboli (moderate-large clot burden in right and left main pulmonary arteries) - Right heart strain on imaging (RV dilation)

Patient management: - Admitted to ICU - Anticoagulation (heparin drip) - Supplemental oxygen - Hemodynamically stable after treatment, improves over 48 hours - Outcome: Survives, discharged on anticoagulation, no long-term sequelae

Root cause analysis triggered:

Quality and safety team reviews case:

Key question: Why was PE not diagnosed on initial ED visit despite pleuritic chest pain, tachycardia, mild hypoxia, oral contraceptive use, and elevated D-dimer?

Chest X-ray re-review by attending radiologist (not on call during initial visit):

Retrospective interpretation: - Subtle findings present on original X-ray: - Hampton’s hump (small wedge-shaped opacity in right lung base, consistent with pulmonary infarction from PE) - Westermark sign (focal oligemia in right mid-lung field, suggesting decreased vascularity from clot) - Enlarged right descending pulmonary artery (suggestive of pulmonary hypertension from PE)

Radiologist conclusion: “In retrospect, chest X-ray shows subtle but present findings suggestive of pulmonary embolism. These findings are easily missed on initial interpretation, especially in high-volume overnight setting, but should have raised suspicion for PE and prompted CTA.”

AI algorithm re-review:

Quality team runs original chest X-ray through AI system again: - AI output (consistent with original): “No acute abnormalities detected” - AI confidence: 87% - AI MISSED the subtle PE findings (Hampton’s hump, Westermark sign, enlarged PA)

Investigation: Why did AI miss PE findings?

AI vendor response: - “Our algorithm is trained to detect pneumonia, pneumothorax, pulmonary edema, masses, and fractures” - “Pulmonary embolism findings were not included in primary training objectives” - “Subtle PE findings (Hampton’s hump, Westermark sign) are rare in training dataset” - AI was not designed or validated to detect PE, despite being marketed as comprehensive chest X-ray analysis tool

Your cognitive process reconstructed (through interview during RCA):

Cognitive bias identified: Automation bias

You explain: - “I was very busy that night, 47 patients in ED, multiple critical cases” - “I looked at the chest X-ray, saw the AI said ‘no acute abnormalities,’ and agreed” - “I trusted the AI as a second reader. Thought if AI didn’t flag anything, I could move on quickly” - “In retrospect, I should have looked more carefully at the right lung base and pulmonary arteries” - “The AI output influenced my interpretation. I was looking to confirm ‘no abnormality’ rather than actively searching for findings”

Automation bias: Tendency to over-rely on automated systems and under-value contradictory information from other sources (clinical presentation, elevated D-dimer)

RCA findings:

Contributing factors to missed PE diagnosis: 1. Automation bias: Your over-reliance on AI “no abnormalities” output reduced your scrutiny of X-ray 2. High patient volume: Busy ED, limited time to carefully review each X-ray 3. AI system limitations: AI not trained to detect PE findings, but this was NOT disclosed to ED physicians 4. Inadequate training: Training module didn’t explain AI limitations, failure modes, or situations where AI might miss findings 5. Subtle findings: Hampton’s hump and Westermark sign are subtle, easily missed even without AI

RCA conclusion: “The missed diagnosis was multifactorial. The ED physician exhibited automation bias, over-relying on AI output. However, the AI system’s failure to detect PE findings (and lack of transparency about this limitation) contributed to the diagnostic error. The hospital bears some responsibility for deploying AI without adequately informing physicians of its limitations.”

Questions for Analysis:

1. What is automation bias and how did it contribute to the missed diagnosis?

Automation Bias Definition:

Automation bias is the propensity for humans to favor suggestions from automated systems and to ignore contradictory information from other sources, even when the automated system is incorrect (Goddard et al., 2012).

Two types of automation bias: 1. Errors of commission: Acting on incorrect automated advice (false positive) 2. Errors of omission: Failing to detect problems because automated system didn’t alert (false negative) ← This case

How automation bias occurred in this case:

Normal cognitive process WITHOUT AI: - ED physician orders chest X-ray - Physician carefully reviews X-ray, actively searching for abnormalities - Considers differential diagnosis: pneumonia, pneumothorax, PE, pleurisy - Weighs X-ray findings against clinical presentation (pleuritic pain, tachycardia, hypoxia, OCP use, elevated D-dimer) - Conclusion: Elevated D-dimer + pleuritic pain + hypoxia → high suspicion for PE → order CTA

Actual cognitive process WITH AI: - ED physician orders chest X-ray - AI analysis appears first: “No acute abnormalities detected” - Physician glances at X-ray (18 seconds viewing time) - Anchoring on AI output: Physician looks to CONFIRM AI assessment rather than independently search for findings - Satisficing behavior: “AI says normal, looks okay to me, I can move on” - Reduced scrutiny: Subtle PE findings (Hampton’s hump, Westermark sign) overlooked - Dismissal of contradictory information: Elevated D-dimer not pursued because “chest X-ray negative” - Conclusion: Viral pleurisy, discharge with NSAIDs

Psychological mechanisms underlying automation bias:

1. Trust in technology: - Belief that AI is more accurate/reliable than human judgment - FDA-cleared, “94% sensitive,” marketed as “reduces missed diagnoses” - Physician reasoning: “If AI doesn’t see it, it’s probably not there”

2. Cognitive offloading: - High workload environment (47 patients, busy night shift) - AI provides mental shortcut: “AI analyzed image thoroughly, I can focus on other tasks” - Trade-off: Efficiency (faster X-ray review) vs. accuracy (missed subtle findings)

3. Confirmation bias: - Once AI says “no abnormalities,” physician looks to confirm (not challenge) that assessment - Subtle findings dismissed as artifacts or normal variants - Contradictory data (elevated D-dimer) re-interpreted to fit “normal X-ray” narrative

4. Diffusion of responsibility: - AI as “second reader” creates perception of shared responsibility - Implicit reasoning: “Two readers (me + AI) both say normal → must be normal” - Reality: AI missed findings, physician also missed → both wrong, no safety net

Key lesson: Automation bias is NOT physician laziness or incompetence. It’s a predictable cognitive phenomenon that occurs even in well-trained, experienced clinicians when using automated decision support systems.

2. What evaluation and implementation failures allowed this case to occur?

Failure #1: Inadequate Transparency About AI Limitations

What ED physicians should have been told BEFORE deployment: - “This AI is trained to detect pneumonia, pneumothorax, pulmonary edema, masses, and fractures” - “AI is NOT trained to detect pulmonary embolism findings (Hampton’s hump, Westermark sign, enlarged pulmonary arteries)” - “If clinical suspicion for PE, don’t rely on AI. Independently search for PE findings and order CTA if indicated” - “AI sensitivity for subtle findings is lower than for obvious abnormalities”

What ED physicians were actually told: - “AI detects abnormalities on chest X-ray with 92-94% sensitivity” - General statement, no mention of specific conditions AI does/doesn’t detect - Physicians assumed AI was comprehensive (detects all pathology)

Harm: Physician trusted AI “no abnormalities” output without knowing AI wasn’t designed to detect PE

Failure #2: Inadequate Training on Automation Bias

What training should have included: - “Automation bias is real. Clinicians tend to over-rely on AI, especially when busy” - “To avoid automation bias: Review image BEFORE looking at AI output, form your own impression, then use AI as check” - “Don’t use AI as shortcut in high-stakes cases (chest pain, dyspnea, trauma)” - “If clinical presentation doesn’t match AI output → trust your clinical judgment, not AI”

What training actually included: - How to use AI interface (technical training) - Examples of AI detecting pneumonia, pneumothorax (success cases) - No discussion of cognitive biases, failure modes, or when NOT to trust AI

Failure #3: Poor Workflow Design

Current workflow: AI output appears on PACS immediately, often BEFORE physician reviews image

Problem: Physician sees AI assessment first → anchoring bias

Better workflow: - Physician reviews image FIRST, documents preliminary interpretation - THEN views AI output as “second reader” - Compare physician interpretation to AI - Resolve discrepancies (Why did I see finding and AI didn’t? Why did AI flag finding I didn’t see?)

This workflow reduces automation bias by preserving physician’s independent assessment

Failure #4: No Clinical Decision Support for PE Risk Stratification

AI provided: Image interpretation only (“no abnormalities”)

AI didn’t provide: Clinical risk stratification integrating imaging + clinical data

What would have helped: - EHR-based alert: “Patient has elevated D-dimer + oral contraceptive use + pleuritic chest pain + hypoxia → Wells Score suggests moderate-high PE probability → Consider CTA” - This integrated clinical decision support might have overcome negative chest X-ray interpretation

Failure #5: No Real-Time Monitoring for Automation Bias

Hospital should have tracked: - Physician viewing times for X-rays before vs. after AI deployment - Diagnostic error rates (missed PE, missed pneumothorax, etc.) before vs. after AI - Correlation between AI false negatives and physician misses (Are physicians missing cases AI misses?)

This monitoring would have identified automation bias pattern BEFORE adverse events occurred

Failure #6: Vendor Misrepresentation

Vendor marketing: “Comprehensive chest X-ray analysis” with “92-94% sensitivity”

Reality: AI trained on specific findings (pneumonia, pneumothorax, edema), NOT comprehensive

Misrepresentation: Led physicians to believe AI detected all chest X-ray pathology

Vendor should have: - Clearly disclosed training objectives (what AI detects vs. doesn’t detect) - Provided performance data specific to PE findings (sensitivity for Hampton’s hump, Westermark sign) - Warned that AI is not validated for PE detection

3. Who is liable for the missed PE diagnosis?

ED Physician (YOU) - Primary Liability:

Standard of care: - ED physician evaluating a chest pain + dyspnea patient has a duty to consider and rule out life-threatening diagnoses (MI, PE, pneumothorax, aortic dissection) - Patient had multiple PE risk factors: oral contraceptives, pleuritic chest pain, tachycardia, hypoxia, elevated D-dimer - Wells Score (PE probability): 3 points (PE at least as likely as the alternative diagnoses, given the risk factors above) + 1.5 points (heart rate >100) = 4.5 points → “PE likely” on the two-level Wells scheme (oral contraceptive use is an additional risk factor, though not itself a Wells criterion) → CTA indicated; see the worked calculation below - Your management: Did not calculate a Wells Score, did not order CTA despite elevated D-dimer
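To make the scoring explicit, here is a minimal worked sketch of the two-level Wells calculation for this patient. The criterion weights are the standard published values; the patient dictionary and helper function are illustrative, not part of any EHR or study protocol.

```python
# Worked example: two-level Wells score for PE for the patient in this case.
# Criterion weights are the standard published values; the patient data and
# helper names are illustrative, not from any real system.

WELLS_CRITERIA = {
    "clinical_signs_of_dvt": 3.0,
    "pe_most_likely_or_equally_likely": 3.0,
    "heart_rate_over_100": 1.5,
    "immobilization_or_recent_surgery": 1.5,
    "previous_pe_or_dvt": 1.5,
    "hemoptysis": 1.0,
    "active_malignancy": 1.0,
}

def wells_score(findings: dict) -> float:
    """Sum the weights of the criteria marked present."""
    return sum(WELLS_CRITERIA[name] for name, present in findings.items() if present)

# This patient: HR 102; pleuritic pain, hypoxia, and OCP use make PE at least
# as likely as the alternatives; no DVT signs, immobilization, prior VTE,
# hemoptysis, or malignancy documented. (OCP use is a risk factor but is not
# itself a Wells criterion.)
patient = {
    "clinical_signs_of_dvt": False,
    "pe_most_likely_or_equally_likely": True,
    "heart_rate_over_100": True,
    "immobilization_or_recent_surgery": False,
    "previous_pe_or_dvt": False,
    "hemoptysis": False,
    "active_malignancy": False,
}

score = wells_score(patient)   # 3.0 + 1.5 = 4.5
verdict = "PE likely -> CTA indicated" if score > 4.0 else "PE unlikely -> D-dimer pathway"
print(f"Wells score: {score} ({verdict})")
```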

Plaintiff’s argument: - Deviation from standard of care: Patient met criteria for CTA (“PE likely” Wells score plus elevated D-dimer), but you didn’t order it - Automation bias: You over-relied on the AI “no abnormalities” output, reducing your independent clinical judgment - Inadequate X-ray review: 18-second viewing time insufficient for careful evaluation - Proximate cause: Missed PE diagnosis → delayed treatment → patient returned about 14 hours later in worse condition (hypoxia, tachycardia, right heart strain) → could have suffered a fatal PE

Defense arguments: - Busy ED: 47 patients, high acuity, reasonable to rely on AI as “second reader” for efficiency - Subtle findings: Hampton’s hump and Westermark sign are easily missed even by experienced radiologists - AI false negative: AI said “no abnormalities” → reasonable reliance on FDA-cleared technology - Good outcome: Patient ultimately diagnosed, treated, survived without permanent harm

Counter to defense: - Standard of care: Physician responsible for independent clinical judgment regardless of AI output - AI is adjunct, not replacement: Physician can’t delegate diagnostic responsibility to algorithm - Clinical presentation trumps imaging: Patient’s risk factors and elevated D-dimer warranted CTA even if chest X-ray appeared normal - Automation bias is foreseeable: Hospital trained you to use AI, should have warned about over-reliance

Likely outcome: - Settlement probable ($200K-$600K range) - Lower settlement because patient survived without permanent harm - Expert testimony will acknowledge automation bias but emphasize physician’s independent duty

Hospital - Secondary Liability (Vicarious + Corporate Negligence):

Corporate negligence arguments: - Inadequate AI training: Didn’t warn physicians about AI limitations (no PE detection) - Poor workflow design: AI output shown before physician review → predictable automation bias - Failed to monitor for automation bias: No tracking of physician viewing times, diagnostic error rates post-AI deployment - Created dangerous environment: Deployed AI that increased automation bias risk without adequate safeguards

Vicarious liability: - Hospital liable for your actions as employee

Likely outcome: - Hospital shares liability - Settlement includes hospital + physician (malpractice insurance covers)

AI Vendor - Difficult to Establish Liability:

Product liability arguments: - Failure to warn: Didn’t adequately disclose AI wasn’t trained for PE detection - Misrepresentation: Marketed as “comprehensive chest X-ray analysis” but wasn’t comprehensive - Design defect: AI should have included PE findings in training objectives

Vendor defenses: - FDA clearance: Regulatory approval based on submitted validation data (pneumonia, pneumothorax, edema) - Intended use: AI marketed as adjunct/second reader, not replacement for physician judgment - No direct patient relationship: Vendor sells to hospital; physician responsible for appropriate use

Likely outcome: - Vendor liability very difficult to establish - Strong defense: FDA clearance, user responsibility - Might face regulatory scrutiny if FDA investigates misrepresentation claims

4. How can automation bias be mitigated in clinical practice?

Strategy #1: Workflow Design to Preserve Independent Judgment

Problem: Seeing AI output first creates anchoring bias

Solution - “Independent First” Workflow: 1. Physician reviews image FIRST without AI output visible 2. Physician documents preliminary interpretation in EHR (“Preliminary read: No acute abnormalities” or “Preliminary read: Possible opacity RLL, DDx pneumonia vs. infarct”) 3. THEN physician views AI output 4. Compare physician interpretation to AI: - Agreement: Move forward - Disagreement: Investigate (“Why did I see finding and AI didn’t? Why did AI flag finding I didn’t see?”)

Benefit: Forces independent assessment before AI influence
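One way to make the “independent first” ordering above binding rather than advisory is to gate the AI overlay in software. Below is a minimal sketch, assuming a hypothetical PACS integration layer; the class and function names (Study, release_ai_overlay, etc.) are illustrative, not a real vendor API.

```python
# Minimal sketch of an "independent first" gate: the AI overlay is released
# only after the ordering physician documents a preliminary read. All names
# (Study, release_ai_overlay, etc.) are illustrative, not a real PACS API.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Study:
    study_id: str
    ai_result: str                        # e.g. "No acute abnormalities detected"
    preliminary_read: Optional[str] = None
    discrepancy_note: Optional[str] = None

def document_preliminary_read(study: Study, read: str) -> None:
    """Physician records an independent impression before seeing AI output."""
    study.preliminary_read = read

def release_ai_overlay(study: Study) -> str:
    """Reveal the AI result only after an independent read exists."""
    if study.preliminary_read is None:
        raise PermissionError("Document a preliminary read before viewing the AI output.")
    if study.preliminary_read.strip().lower() != study.ai_result.strip().lower():
        study.discrepancy_note = (
            f"Discrepancy: physician read '{study.preliminary_read}' vs "
            f"AI '{study.ai_result}' -- reconcile before disposition."
        )
    return study.ai_result

study = Study("CXR-001", ai_result="No acute abnormalities detected")
document_preliminary_read(study, "Possible subtle opacity, right lung base")
release_ai_overlay(study)
print(study.discrepancy_note)
```

The design point is that the physician’s impression is captured before the AI output can anchor it, and any disagreement is recorded for reconciliation rather than silently resolved in the AI’s favor.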

Strategy #2: Training on Automation Bias Recognition

Include in all AI training: - Definition and examples of automation bias - Cognitive psychology: Why humans over-rely on automation - Real-world cases where automation bias led to errors (e.g., aviation, radiology) - Self-assessment: “Do I spend less time reviewing images when AI says ‘normal’?” (Probably yes → recognize this tendency)

Simulation training: - Present cases where AI is wrong (false positives, false negatives) - Practice detecting AI errors - Build cognitive habit: “Always question the AI”

Strategy #3: Transparency About AI Limitations

Require vendors to provide: - Specific training objectives (AI detects pneumonia, pneumothorax, edema but NOT PE, aortic dissection, subtle masses) - Failure modes (“AI struggles with subtle findings, overlapping structures, unusual presentations”) - Performance data by subgroup (performance in obese patients, elderly, portable X-rays)

Require hospitals to communicate: - Share limitations with all users in clear, concise format (one-page reference guide: “What AI Does/Doesn’t Detect”) - Periodic refresher training on limitations
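The one-page reference guide could also be maintained as a small machine-readable “scope card” surfaced next to the AI overlay. The sketch below is illustrative only; the finding lists and caution text are examples, not the actual vendor’s validated scope.

```python
# Illustrative "scope card" for a chest X-ray AI: what it was trained to
# detect vs. what it was not. The contents are examples, not the actual
# vendor's validated scope.

CXR_AI_SCOPE = {
    "detects": ["pneumonia", "pneumothorax", "pulmonary edema", "masses", "rib fractures"],
    "not_validated_for": [
        "pulmonary embolism findings (Hampton's hump, Westermark sign, enlarged PA)",
        "aortic dissection",
        "subtle findings outside the stated training objectives",
    ],
    "caution": "If clinical suspicion for PE or dissection, do not rely on this AI.",
}

def scope_banner() -> str:
    """One-line reminder suitable for display next to the AI overlay."""
    return ("AI scope -- detects: " + ", ".join(CXR_AI_SCOPE["detects"])
            + ". NOT validated for: " + "; ".join(CXR_AI_SCOPE["not_validated_for"])
            + ". " + CXR_AI_SCOPE["caution"])

print(scope_banner())
```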

Strategy #4: Clinical Decision Support Integration

Don’t use AI imaging interpretation in isolation

Integrate with clinical risk scores: - Patient with chest pain + dyspnea → EHR calculates Wells Score automatically - If Wells Score high + D-dimer elevated → Alert: “HIGH PE PROBABILITY - Consider CTA even if chest X-ray negative”

Benefit: Clinical context overrides negative imaging when appropriate
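A minimal sketch of such a rule follows; the thresholds, field names, and alert text are illustrative assumptions rather than a validated clinical rule set.

```python
# Minimal sketch of an EHR rule that escalates PE workup when clinical risk
# and D-dimer point away from a reassuring chest X-ray read. Thresholds and
# field names are illustrative assumptions, not a validated rule set.

from typing import Optional

def pe_workup_alert(wells_score: float, d_dimer_ng_ml: float,
                    cxr_ai_result: str) -> Optional[str]:
    """Return an alert when CTA should be considered despite a 'negative'
    chest X-ray / AI read; otherwise None."""
    pe_likely = wells_score > 4.0              # two-level Wells cutoff
    d_dimer_elevated = d_dimer_ng_ml >= 500    # common assay threshold
    if pe_likely or (wells_score >= 2.0 and d_dimer_elevated):
        return ("ELEVATED PE PROBABILITY: consider CTA chest even though the "
                f"chest X-ray AI reported '{cxr_ai_result}'.")
    return None

alert = pe_workup_alert(wells_score=4.5, d_dimer_ng_ml=1247,
                        cxr_ai_result="No acute abnormalities detected")
if alert:
    print(alert)
```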

Strategy #5: Real-Time Monitoring and Feedback

Track metrics that reveal automation bias: - Physician viewing times before vs. after AI deployment (Are physicians spending less time reviewing images? → automation bias) - Diagnostic concordance: How often physician agrees with AI? (>95% agreement → possible automation bias) - Error rates: Are physicians missing cases AI misses? (Correlation between AI false negatives and physician misses → automation bias)

Feedback to physicians: - Quarterly reports: “Your average X-ray viewing time: 15 seconds (department average: 22 seconds). Consider whether you’re adequately reviewing images independently.” - Case reviews when AI wrong and physician agreed: “Let’s discuss this case where both you and AI missed the finding. How can we avoid this in the future?”
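Below is a minimal sketch of the kind of query a quality team could run over PACS/EHR audit logs to generate that feedback; the log schema and records are hypothetical.

```python
# Minimal sketch: surveillance metrics that can surface automation bias in
# PACS/EHR audit logs. The log schema and records are hypothetical examples.

from statistics import mean

# Each record: (physician_id, viewing_seconds, physician_read, ai_read)
audit_log = [
    ("dr_a", 18, "normal", "normal"),
    ("dr_a", 15, "normal", "normal"),
    ("dr_a", 65, "opacity RLL", "normal"),
    ("dr_b", 41, "normal", "normal"),
]

def metrics_for(physician_id: str) -> tuple:
    records = [r for r in audit_log if r[0] == physician_id]
    avg_viewing = mean(r[1] for r in records)
    ai_concordance = mean(1.0 if r[2] == r[3] else 0.0 for r in records)
    return avg_viewing, ai_concordance

avg_s, agree = metrics_for("dr_a")
print(f"dr_a: average viewing time {avg_s:.0f}s, concordance with AI {agree:.0%}")
# Flag for review when viewing time falls well below the department average or
# concordance with the AI approaches 100% (possible over-reliance).
```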

Strategy #6: Promote Healthy Skepticism

Culture change: - “AI is tool, not truth” (AI makes suggestions, physicians decide) - “Trust but verify” (Use AI but always check independently) - “When in doubt, ignore AI and trust your clinical judgment”

Avoid language that promotes automation bias: - Instead of “The AI read this as normal” (treating the AI output as the final word), say “The AI didn’t flag any abnormalities, but I’ll review the image independently”

Strategy #7: Limit AI Use in High-Stakes, High-Workload Settings

When automation bias risk is highest: - Busy ED overnight shifts (fatigue, high patient volume) - High-stakes cases (chest pain, trauma, altered mental status)

Consider: - Double-read by human: High-stakes cases get attending radiologist read in real-time (not just AI + ED physician) - Delayed AI output: In very high-stakes cases, don’t show AI output until physician has documented preliminary interpretation

5. Key lessons on automation bias and AI:

Lesson #1: Automation Bias is Predictable and Preventable - Well-documented cognitive bias across multiple domains (aviation, radiology, anesthesia) - Occurs even in experienced, well-trained clinicians - Can be mitigated through workflow design, training, and monitoring

Lesson #2: AI as “Second Reader” Can Paradoxically Reduce Diagnostic Accuracy - Intended benefit: AI catches findings physician misses - Actual harm: Physician misses findings because AI didn’t flag them (over-reliance) - Net effect depends on balance: Does AI catch more than it causes physicians to miss?

Lesson #3: Physicians Remain Fully Responsible for AI-Assisted Decisions - Legal standard: Physician liable for missed diagnosis even when AI wrong - “AI said it was normal” is NOT malpractice defense - Independent clinical judgment required regardless of AI output

Lesson #4: AI Limitations Must Be Transparently Disclosed - Vendors should clearly state what AI does/doesn’t detect - Hospitals should communicate limitations to users - “Comprehensive” or “94% sensitive” marketing without specifics is misleading

Lesson #5: Workflow Design Matters - Showing AI output BEFORE physician review increases automation bias - “Independent first” workflow preserves physician judgment - Small design changes have large impact on cognitive biases

Lesson #6: Training on AI Should Include Failure Modes and Cognitive Biases - Technical training insufficient (how to use AI interface) - Must include: When AI fails, how to recognize over-reliance, strategies to maintain independent judgment - Simulation training with AI errors valuable

Lesson #7: Busy, High-Workload Environments Amplify Automation Bias - Fatigue, time pressure, cognitive overload increase reliance on automation - Extra safeguards needed for overnight ED, high-volume clinics - Don’t assume AI will compensate for physician fatigue (may make it worse)

Lesson #8: Monitoring for Automation Bias Should Be Standard Practice - Track viewing times, diagnostic concordance, error patterns - Identify physicians at high risk for automation bias - Provide feedback and additional training


References