19 Medical Ethics, Bias, and Health Equity
AI in medicine raises profound ethical questions around algorithmic bias, health equity, informed consent, autonomy, and the changing physician-patient relationship. This chapter examines ethical frameworks for responsible AI deployment in healthcare. You will learn to:
- Apply bioethical principles (autonomy, beneficence, non-maleficence, justice) to medical AI
 - Recognize and mitigate algorithmic bias and health disparities
 - Navigate informed consent challenges with AI-assisted care
 - Assess fairness and equity implications of AI systems
 - Understand professional responsibilities when using AI
 - Apply ethical frameworks for AI development and deployment
 
Essential for all physicians and healthcare leaders.
19.1 Introduction
Medical AI is often framed as a purely technical challenge: train algorithms on medical data, validate performance, deploy in clinical settings. But this framing obscures profound ethical questions:
- Who benefits from medical AI? Patients at well-resourced academic centers, or also those in rural clinics and safety-net hospitals?
 - Who is harmed when AI fails? Disproportionately, underrepresented populations excluded from training data.
 - Who decides when AI is “good enough”? Developers optimizing for overall accuracy, or patients for whom the algorithm performs poorly?
 - How does AI change the physician-patient relationship? Does it enhance clinical judgment or erode physician autonomy and accountability?
 
These are not hypothetical concerns. As this chapter documents, medical AI has already perpetuated racial bias (Obermeyer et al. 2019), performed inequitably across skin tones (Daneshjou et al. 2022), and raised fundamental questions about informed consent and physician responsibility (Char, Shah, and Magnus 2018).
The ethical deployment of medical AI requires more than technical validation. It demands intentional attention to fairness, equity, transparency, and accountability—principles central to medical professionalism but often absent from algorithmic development.
This chapter applies traditional bioethical frameworks to medical AI, examines documented cases of bias and inequity, and provides practical guidance for physicians navigating ethical challenges in AI-augmented practice.
19.2 Bioethical Principles Applied to Medical AI
The four principles of biomedical ethics—autonomy, beneficence, non-maleficence, and justice—provide a framework for evaluating medical AI (Char, Shah, and Magnus 2018).
19.2.1 Autonomy and Informed Consent
The Principle: Patients have the right to make informed decisions about their care, including whether AI influences diagnosis or treatment recommendations.
Challenges in Medical AI:
1. Complexity and Opacity: Modern AI (especially deep learning) operates as “black boxes”—even developers cannot fully explain why a model makes specific predictions. How can patients provide informed consent to something no one fully understands?
2. Unclear Risks: Traditional medical interventions have well-characterized risk profiles (e.g., “1% risk of infection”). AI risks are less clear: What is the risk of a false negative? How does the algorithm perform for patients like me (my race, age, comorbidities)?
3. Lack of Alternatives: If AI is integrated into standard workflows (e.g., radiology interpretation, sepsis screening), can patients meaningfully opt out? What alternatives exist?
4. Hidden AI: Many AI systems operate silently in the background (EHR-based risk scores, automated lab flagging). Patients may be unaware AI influenced their care.
Ethical Framework for Informed Consent (Char, Shah, and Magnus 2018):
Always Inform When: - AI directly influences diagnosis or treatment decisions - AI performance varies across patient subgroups (race, age, sex) - AI is experimental or not yet widely validated - Reasonable alternatives exist
Transparency Not Required (But Permitted) When: - AI provides purely administrative functions (scheduling, billing) - AI assists but doesn’t replace physician judgment - AI is fully validated and performs equitably across populations
Key Principle: If a reasonable patient would want to know AI is involved, inform them.
Practical Approaches:
For High-Stakes AI (e.g., cancer detection, surgical risk prediction): > “In evaluating your imaging, I use an AI system that assists in detecting abnormalities. This system has been validated in large studies but is not perfect. I review all AI findings independently and make final diagnostic and treatment recommendations based on my clinical judgment. Do you have questions about how this system works or how it might affect your care?”
For Lower-Stakes AI (e.g., diabetic retinopathy screening): > “This retinal camera uses an FDA-cleared AI system to detect diabetic retinopathy. The system has been tested in large studies and performs accurately. If the AI detects anything concerning, we’ll refer you to an ophthalmologist for confirmation and treatment.”
Documentation: Note AI use in medical record: “Diagnostic decision supported by [AI system name]. Independent physician review confirms/modifies AI findings.”
19.2.2 Beneficence: AI Must Benefit Patients
The Principle: Medical interventions should improve patient outcomes. “Do good.”
Challenges in Medical AI:
1. Efficiency ≠ Benefit: Many AI systems improve workflow efficiency (faster reads, reduced documentation burden) but don’t clearly improve patient outcomes. Efficiency benefits institutions and physicians; does it also benefit patients?
2. Surrogate Outcomes: AI is often validated on surrogate outcomes (detection accuracy, sensitivity, specificity) rather than clinical outcomes (mortality, morbidity, quality of life). High accuracy doesn’t guarantee patient benefit (Nagendran et al. 2020).
3. Unintended Consequences: AI can have hidden downsides: alert fatigue, deskilling of physicians, over-diagnosis from ultra-sensitive algorithms.
Evidence Standard for Beneficence:
Medical AI should meet the same evidence standard as medications or procedures:
- Preclinical validation: Retrospective accuracy studies (analogous to Phase I/II trials)
 - Clinical validation: Prospective studies demonstrating clinical benefit (analogous to Phase III trials)
 - Post-market surveillance: Ongoing monitoring for harms and performance drift (analogous to Phase IV)
 
Critical Question: Has this AI been shown to improve patient outcomes in prospective studies, or only to achieve high accuracy in retrospective datasets?
Most medical AI has only retrospective validation. This is insufficient for claiming beneficence (Topol 2019).
Example: IDx-DR (Diabetic Retinopathy Screening)
✅ Demonstrated Beneficence: - Prospective trial showed AI enabled screening in primary care clinics where ophthalmologists were unavailable - Increased screening rates for underserved populations - Detected vision-threatening retinopathy that would otherwise have been missed - Clear patient benefit: prevented vision loss through earlier detection (Abràmoff et al. 2018)
Example: Epic Sepsis Model
❌ Failed to Demonstrate Beneficence: - Retrospective studies suggested high accuracy - External validation showed poor real-world performance (Wong et al. 2021) - High false positive rate caused alert fatigue - No evidence deployment improved sepsis outcomes - Efficiency goal (early sepsis detection) did not translate to patient benefit
19.2.3 Non-Maleficence: First, Do No Harm
The Principle: Medical interventions should not harm patients. When harm is unavoidable (e.g., chemotherapy side effects), benefits must outweigh harms.
Harms from Medical AI:
1. Direct Harms: - False negatives: Missed diagnoses (cancer, fractures, strokes) leading to delayed treatment - False positives: Unnecessary testing, procedures, anxiety, overtreatment - Incorrect treatment recommendations: AI suggesting wrong medication, wrong dose, contraindicated therapy
2. Indirect Harms: - Alert fatigue: Too many AI warnings causing physicians to ignore all alerts, including true positives - Deskilling: Over-reliance on AI eroding clinical skills (radiologists losing ability to detect findings without AI) - Delayed care: AI-driven workflows introducing bottlenecks (e.g., waiting for AI report before human review)
3. Equity Harms: - Algorithmic bias: AI performing worse for underrepresented groups, causing disparate harm (Obermeyer et al. 2019; Daneshjou et al. 2022) - Access disparities: Beneficial AI only available to wealthy institutions, widening quality gaps
4. Psychological Harms: - Loss of trust: Patients losing confidence in physicians who defer to algorithms - Dehumanization: Care feeling automated and impersonal
Risk Mitigation Strategies:
Before Deployment: 1. Demand prospective validation in populations similar to yours (Nagendran et al. 2020) 2. Assess subgroup performance: How does AI perform for your patient demographics? 3. Understand failure modes: How and why does the AI fail? 4. Ensure human oversight: AI should augment, not replace, physician judgment
During Use: 5. Monitor real-world performance: Does AI perform as expected in your setting? 6. Track harms: Capture false negatives, alert fatigue, workflow disruptions 7. Maintain clinical skills: Don’t let AI erode your ability to practice without it 8. Preserve physician final authority: Algorithms recommend; physicians decide
After Adverse Events: 9. Root cause analysis: Was AI contributory? How can recurrence be prevented? 10. Reporting mechanisms: Report AI failures to vendors, FDA (if applicable), and institutional quality/safety teams
Precautionary Principle: When AI evidence is uncertain, err on the side of caution. The burden of proof for safety and efficacy rests with those deploying AI, not with patients who may be harmed (Vayena, Blasimme, and Cohen 2018).
19.2.4 Justice and Health Equity
The Principle: Medical resources should be distributed fairly. Benefits and burdens should not fall disproportionately on particular groups.
Justice Challenges in Medical AI:
1. Training Data Bias: AI trained predominantly on data from well-resourced academic centers, affluent populations, and racial/ethnic majorities performs worse for underrepresented groups (Gichoya et al. 2022).
2. Access Disparities: - Advanced AI often deployed first at wealthy institutions - Rural, safety-net, and under-resourced hospitals lack infrastructure for AI - Widens existing quality gaps between haves and have-nots
3. Algorithmic Amplification of Bias: AI can amplify existing healthcare disparities: - If training data reflects biased medical practice (e.g., Black patients receiving less pain medication), AI learns and perpetuates that bias - If outcome proxies are biased (e.g., healthcare costs correlating with access, not need), AI recommendations will be biased (Obermeyer et al. 2019)
4. Representation in AI Development: - Lack of diversity in AI development teams can lead to blind spots about bias and harm - Lack of diversity in leadership means equity considerations may be deprioritized
19.3 Algorithmic Bias: Documented Cases and Lessons
Algorithmic bias in medicine is not theoretical—it’s documented, measurable, and consequential.
19.3.1 Case 1: Racial Bias in Healthcare Resource Allocation
The Obermeyer Study (Science, 2019) (Obermeyer et al. 2019):
Background: - Commercial algorithm used by healthcare systems to allocate care management resources - Predicted which patients would benefit from extra support (care coordination, disease management) - Used healthcare costs as proxy for healthcare needs
The Bias: - At any given risk score, Black patients were significantly sicker than white patients - Algorithm systematically under-predicted risk for Black patients - Result: Black patients needed to be much sicker than white patients to receive same level of care
Why It Happened: - Healthcare costs reflect access to care, not just medical need - Black patients face barriers to accessing care (insurance, transportation, discrimination), leading to lower healthcare spending despite higher illness burden - Algorithm learned that “lower spending = lower need,” perpetuating inequity
Impact: - Affected millions of patients across U.S. healthcare systems - Reduced number of Black patients flagged for high-risk care management programs by >50% - Vendors corrected algorithm after publication, but similar bias likely exists in other systems
Lessons: - Choice of outcome variable matters: “Cost” and “need” are not the same - Historical biases propagate: If training data reflects biased systems, AI learns bias - Disparities can be subtle: Algorithm didn’t explicitly use race, but outcomes were racially biased - External auditing essential: Bias was discovered by independent researchers, not algorithm developers
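To make the mechanism concrete, the sketch below simulates a population in which two groups have identical illness burden but one faces access barriers that suppress healthcare spending. The data, group labels, and coefficients are entirely synthetic assumptions for illustration (this is not a reconstruction of the actual commercial algorithm), but the pattern it produces mirrors the published finding: a model that predicts cost accurately still under-serves the group whose costs understate their needs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 20_000

# Synthetic population: two groups with identical underlying illness burden.
group = rng.integers(0, 2, n)          # 0 = group A, 1 = group B (hypothetical labels)
illness = rng.gamma(2.0, 1.0, n)       # true medical need (never seen by the model)

# Access barriers suppress spending for group B at any given level of illness.
access = np.where(group == 1, 0.6, 1.0)
prior_cost = illness * access + rng.normal(0, 0.1, n)   # feature: last year's spending
future_cost = illness * access + rng.normal(0, 0.1, n)  # label: next year's spending (the proxy)

# The model learns to predict cost -- and it does so accurately.
model = LinearRegression().fit(prior_cost.reshape(-1, 1), future_cost)
risk_score = model.predict(prior_cost.reshape(-1, 1))

# Enroll the top 10% of scores in a care-management program.
flagged = risk_score >= np.quantile(risk_score, 0.90)

for g, name in [(0, "group A"), (1, "group B")]:
    mask = group == g
    print(f"{name}: share flagged = {flagged[mask].mean():.1%}, "
          f"mean illness among flagged = {illness[mask & flagged].mean():.2f}")
# Group B is flagged far less often, and those who are flagged are substantially
# sicker than flagged group A patients -- the pattern documented by Obermeyer et al.
```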
19.3.2 Case 2: Dermatology AI Bias Across Skin Tones
Multiple Studies (Daneshjou et al. 2022; Esteva et al. 2017):
Background: - AI for skin cancer detection trained predominantly on images of light skin - Dermatology training datasets severely underrepresent darker skin tones (Fitzpatrick IV-VI)
The Bias: - AI performance degrades on darker skin tones - Higher false negative rates for melanoma in Black and Latino patients - Risk: Delayed cancer diagnosis in precisely the populations with worse melanoma outcomes
Why It Happened: - Training data reflect existing disparities: dermatology textbooks and databases predominantly feature light skin - Developers didn’t intentionally exclude dark skin—they used available data - Performance testing often doesn’t stratify by skin tone, hiding the problem
Lessons: - Representation in training data is critical - Performance must be evaluated across relevant subgroups (not just overall accuracy) - “Color-blind” algorithms are not fair algorithms—ignoring race/ethnicity can perpetuate disparities - Field-specific challenges: Dermatology must actively recruit diverse image datasets
19.3.3 Case 3: Pulse Oximetry Bias
Recent Findings (Sjoding et al. 2020):
Background: - Pulse oximeters measure oxygen saturation non-invasively - Critical for managing COVID-19, sepsis, respiratory failure - Generally considered accurate and unbiased
The Bias: - Pulse oximeters overestimate oxygen saturation in Black patients (compared to arterial blood gas) - Black patients more likely to have hidden hypoxemia (low oxygen despite “normal” pulse ox) - May delay recognition of deterioration and treatment escalation
Why It Happened: - Medical devices calibrated and validated primarily on white participants - Skin pigmentation affects light absorption, but devices not adjusted for this - Decades of use before bias recognized
Lessons: - Bias exists even in established, widely-used technologies - Real-world performance monitoring essential (not just initial validation) - Equity requires intentionality—assuming fairness is insufficient - Simple technologies can have complex bias (not just AI-specific problem)
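Institutions can look for this pattern in their own data. The sketch below is a minimal audit, assuming a hypothetical file of paired, near-simultaneous SpO2 and arterial SaO2 measurements with the column names shown; it computes the rate of occult hypoxemia (arterial saturation below 88% despite a pulse-oximeter reading of 92–96%, the definition used by Sjoding et al.) stratified by group.

```python
import pandas as pd

# Hypothetical dataset of paired, near-simultaneous measurements.
# Columns 'spo2' (pulse oximeter), 'sao2' (arterial blood gas), and 'race' are assumed.
pairs = pd.read_csv("paired_oximetry_measurements.csv")   # placeholder file name

# Occult hypoxemia: SaO2 < 88% despite a "reassuring" SpO2 of 92-96%.
reassuring = pairs["spo2"].between(92, 96)
occult = reassuring & (pairs["sao2"] < 88)

summary = (
    pairs.assign(occult_hypoxemia=occult)
         .loc[reassuring]
         .groupby("race")["occult_hypoxemia"]
         .agg(rate="mean", n_pairs="size")
)
print(summary)
# A materially higher rate in any group is a signal to investigate device bias
# and to lower the threshold for confirmatory arterial blood gas testing.
```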
19.4 Mitigating Bias and Promoting Equity
Addressing algorithmic bias requires action across the AI lifecycle: development, validation, deployment, and monitoring.
19.4.1 During AI Development
1. Diverse and Representative Training Data: - Ensure training datasets include adequate representation of all relevant demographic groups - Stratify by race, ethnicity, sex, age, geography, socioeconomic status - Partner with diverse healthcare institutions (not just academic centers)
2. Diverse Development Teams: - Include clinicians who care for underserved populations - Involve ethicists, health equity experts, community representatives - Diversity in team increases likelihood of identifying potential biases early
3. Equity as a Design Goal: - Define fairness explicitly: Equal performance across groups? Equal access? Equal benefit? - Different fairness definitions involve trade-offs—be transparent about choices (Parikh, Teeple, and Navathe 2019); a short sketch contrasting common definitions appears after this list - Test for bias proactively (don’t assume fairness)
4. Choice of Outcome Variables: - Scrutinize proxies for actual outcomes (cost ≠ need, admissions ≠ severity) - Consider how historical disparities may bias outcome definitions - Validate that proxy measures don’t encode existing inequities
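As flagged above, “fairness” must be defined before it can be tested, and the common definitions genuinely conflict. The sketch below, using a toy example with hypothetical labels, computes three quantities that different fairness definitions would equalize; which one to equalize is an ethical choice, not a technical one.

```python
import numpy as np

def fairness_report(y_true, y_pred, group):
    """Contrast common fairness criteria on binary predictions (0/1 arrays)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        m = group == g
        rates[g] = {
            # Demographic parity: how often is the group flagged at all?
            "flag_rate": y_pred[m].mean(),
            # Equal opportunity: among truly positive patients, how often are they caught?
            "tpr": y_pred[m & (y_true == 1)].mean(),
            # Among truly negative patients, how often are they flagged anyway?
            "fpr": y_pred[m & (y_true == 0)].mean(),
        }
    return rates

# Hypothetical toy data: 1 = "flag for intervention".
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 0, 0]
group  = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(fairness_report(y_true, y_pred, group))
# Equalizing flag rates, true positive rates, and false positive rates at the same
# time is generally impossible when base rates differ across groups.
```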
19.4.2 During Validation
5. Subgroup Analysis is Mandatory: - Report algorithm performance stratified by race, ethnicity, sex, age, insurance status - Don’t hide subgroup disparities in overall accuracy metrics - Identify where performance is acceptable vs. inadequate (Nagendran et al. 2020)
6. External Validation in Diverse Populations: - Validate in institutions serving underrepresented populations - Test across geographic, socioeconomic, and practice setting diversity - Don’t assume generalizability—prove it (Vabalas et al. 2019)
7. Assess Calibration, Not Just Discrimination: - Discrimination (AUC-ROC) measures ability to rank risk - Calibration measures whether predicted probabilities match actual outcomes - Calibration often differs across subgroups even when discrimination appears similar - Miscalibration can lead to disparate treatment
8. Consider Intersectionality: - Bias often compounds across multiple identities (e.g., Black + female, or rural + elderly) - Test performance in intersectional subgroups, not just single demographic categories
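A minimal sketch of what stratified validation can look like in practice follows, assuming a hypothetical validation dataset with the column names shown. It reports discrimination (AUROC) and a simple calibration check (observed event rate versus mean predicted risk) for single-axis and intersectional subgroups.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical validation set: columns 'outcome' (0/1), 'risk' (predicted probability),
# and demographic columns such as 'race' and 'sex' are assumptions for illustration.
df = pd.read_csv("validation_cohort.csv")          # placeholder file name

def subgroup_metrics(d: pd.DataFrame) -> pd.Series:
    both_classes = d["outcome"].nunique() == 2
    return pd.Series({
        "n": len(d),
        # Discrimination: ability to rank; undefined if a subgroup has only one class.
        "auroc": roc_auc_score(d["outcome"], d["risk"]) if both_classes else float("nan"),
        # Calibration-in-the-large: these two values should roughly match.
        "observed_rate": d["outcome"].mean(),
        "mean_predicted": d["risk"].mean(),
    })

# Single-axis subgroups, then an intersectional breakdown.
print(df.groupby("race").apply(subgroup_metrics))
print(df.groupby(["race", "sex"]).apply(subgroup_metrics))
# Similar AUROCs can coexist with very different observed vs. predicted rates: a model
# can rank patients equally well in every group yet still systematically over- or
# under-estimate risk for some of them.
```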
19.4.3 During Deployment
9. Transparent Communication: - Disclose known limitations and subgroup performance differences - Don’t hide uncertainties or deployment risks (Char, Shah, and Magnus 2018) - If AI performs worse for certain groups, inform clinicians and patients
10. Human Oversight and Physician Autonomy: - AI should inform, not dictate, decisions - Physicians must retain the ability to override algorithms when clinical judgment differs - Preserve physician accountability (the algorithm cannot be blamed for bad decisions)
11. Equitable Access: - Ensure beneficial AI available across practice settings (not just wealthy institutions) - Consider cost, infrastructure requirements, and workflow fit - Avoid creating two-tiered healthcare: AI-augmented for the wealthy, standard care for everyone else
19.4.4 During Ongoing Monitoring
12. Continuous Performance Monitoring: - Track real-world performance across demographic subgroups - Monitor for performance drift over time (models degrade as clinical practice evolves) - Establish thresholds for acceptable performance and trigger investigations when crossed (Gichoya et al. 2022); a minimal monitoring sketch appears at the end of this list
13. Adverse Event Reporting: - Create mechanisms for clinicians and patients to report suspected AI harms - Investigate patterns suggesting bias (e.g., more false negatives in particular group) - Share lessons learned across institutions
14. Regular Bias Audits: - Independent audits of AI fairness (not just self-reporting by vendors) - Involve community stakeholders and equity experts - Make audit results public (transparency increases accountability)
15. Willingness to Decommission: - If AI is found to perpetuate bias and cannot be corrected, stop using it - Don’t continue harmful AI just because it’s already deployed - Patient welfare > sunk costs
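The monitoring sketch referenced under item 12 appears below. It assumes a hypothetical prediction log with the column names shown and a sensitivity floor agreed on before deployment; the point is that thresholds and escalation paths are defined in advance, not improvised after harm occurs.

```python
import pandas as pd
from sklearn.metrics import recall_score

# Hypothetical prediction log: columns 'month', 'group', 'outcome' (0/1),
# and 'alert_fired' (0/1) are assumed names for illustration.
log = pd.read_csv("ai_prediction_log.csv", parse_dates=["month"])  # placeholder file name

SENSITIVITY_FLOOR = 0.80   # threshold agreed on before deployment

# Monthly sensitivity (recall) of the AI alert, per demographic subgroup.
monthly = (
    log.groupby(["month", "group"])
       .apply(lambda d: recall_score(d["outcome"], d["alert_fired"], zero_division=0))
       .rename("sensitivity")
       .reset_index()
)

breaches = monthly[monthly["sensitivity"] < SENSITIVITY_FLOOR]
if not breaches.empty:
    # In practice this would notify the AI governance/quality team, not just print.
    print("Sensitivity below the agreed floor -- investigate:")
    print(breaches.to_string(index=False))
```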
19.5 Informed Consent in the Age of Medical AI
Traditional informed consent assumes patients can understand their condition, treatment options, risks, and benefits. AI challenges these assumptions.
19.5.1 Challenges to Informed Consent
1. Complexity: Explaining logistic regression is difficult; explaining deep neural networks with millions of parameters is essentially impossible. How can patients consent to something they can’t understand?
2. Uncertainty: AI risks are often unknown: “We don’t know how often this algorithm fails in patients like you because it hasn’t been studied in your demographic group.” Can consent be truly informed when key information is missing?
3. Voluntariness: If AI is integrated into standard workflows, can patients decline? If opting out means foregoing evidence-based care, is consent truly voluntary?
4. Hidden AI: Many AI systems operate invisibly: risk scores auto-calculated in EHRs, imaging findings flagged by algorithms, lab values interpreted by AI. Patients often unaware AI influenced their care.
19.5.2 Practical Approaches to AI Consent
Tiered Consent Framework (Vayena, Blasimme, and Cohen 2018):
Tier 1: Explicit Informed Consent Required
When: - Experimental or investigational AI - AI with known subgroup performance disparities - AI that directly determines treatment (not just advises) - High-stakes decisions (cancer diagnosis, surgical risk prediction)
How: - Document AI use in consent forms - Explain in lay terms what AI does - Disclose known limitations and failure modes - Offer alternatives if available - Allow opt-out
Tier 2: Transparent Notification (But Not Explicit Consent)
When: - AI is well-validated and performs equitably - AI assists physician but doesn’t replace judgment - Physician retains final authority over decisions - Reasonable alternatives exist
How: - Inform patients that AI is used - Explain its role in the clinical workflow - Reassure patients that a physician retains oversight - Answer questions if asked
Tier 3: General Disclosure (No Specific Notification)
When: - AI provides administrative functions (scheduling, billing) - AI doesn’t influence clinical decisions - AI performs purely background tasks (imaging enhancement, data standardization)
How: - General institutional disclosure (e.g., on website, in patient handbook) - No specific per-encounter notification needed
Practical Example (Radiology AI):
Poor Consent: > “We use advanced technology to interpret your scans. Sign here.”
Better Consent: > “Your chest X-ray will be analyzed by both a radiologist and an AI system that detects abnormalities. The AI has been trained on thousands of X-rays and helps ensure nothing is missed. However, the radiologist is responsible for the final interpretation and will review all AI findings. The AI works well but isn’t perfect—it sometimes flags normal findings or misses subtle abnormalities. Do you have questions about this process?”
Best Consent (High-Stakes, Experimental): > “We’re offering participation in a study using AI to predict surgical complications. The AI analyzes your medical record and provides a risk estimate. This is experimental—the AI hasn’t been widely validated, and we don’t know how accurate it is for patients with your specific characteristics. If you participate, the AI’s predictions will be shared with you and your surgical team to inform decision-making, but you and your surgeon will make the final decision about surgery. You can decline to participate and still receive standard preoperative evaluation. Would you like to participate?”
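For governance teams that want to operationalize the tiered framework above, the sketch below encodes its decision logic literally, using hypothetical field names. It is an illustration of the rules as stated in this chapter, not a substitute for case-by-case ethics review.

```python
from dataclasses import dataclass

@dataclass
class AISystem:
    # Hypothetical attributes mirroring the criteria in the tiered framework above.
    investigational: bool            # experimental / not yet widely validated
    known_subgroup_disparities: bool
    determines_treatment: bool       # directly determines care rather than advising
    high_stakes: bool                # e.g., cancer diagnosis, surgical risk prediction
    influences_clinical_decisions: bool

def consent_tier(ai: AISystem) -> int:
    """Return 1 (explicit consent), 2 (transparent notification), or 3 (general disclosure)."""
    if (ai.investigational or ai.known_subgroup_disparities
            or ai.determines_treatment or ai.high_stakes):
        return 1
    if ai.influences_clinical_decisions:
        return 2
    return 3   # administrative or background functions only

# Example: a well-validated triage aid that advises but does not decide.
print(consent_tier(AISystem(False, False, False, False, True)))  # -> 2
```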
19.5.3 Special Considerations for Vulnerable Populations
Pediatrics: - Parental consent + child assent (age-appropriate) - Consider developmental capacity to understand AI - Protect children from harms of poorly validated pediatric AI (most AI trained on adults)
Cognitive Impairment: - Surrogate decision-makers may not understand AI - Simplify explanations without oversimplifying risks - Document consent process carefully
Language Barriers: - Provide consent information in patient’s preferred language - Use professional interpreters (not family members) for complex AI discussions - Ensure cultural appropriateness of consent process
Low Health Literacy: - Use plain language, avoid jargon - Visual aids (diagrams, infographics) can help - Teach-back method: “Can you tell me in your own words how this AI will be used?”
19.6 Professional Responsibilities in the AI Era
Medical AI doesn’t eliminate physician responsibility—it increases it.
19.6.1 Physician as Steward of AI
1. Understand Before Using: - Don’t use AI as a black box: “I don’t know how it works, but it’s accurate” - Understand (in general terms): What data does AI use? How was it trained? What are known limitations? - Know how AI was validated and in which populations
2. Maintain Independent Judgment: - AI recommendations are inputs to decision-making, not final decisions - Physician must independently assess patient, formulate differential, consider AI output in clinical context - Avoid automation bias (uncritically accepting AI recommendations)
3. Recognize and Override When Appropriate: - If AI recommendation conflicts with clinical judgment, investigate - Don’t defer to algorithm when patient-specific factors (not captured by AI) are relevant - Document reasoning when overriding AI
4. Protect Patient Interests: - Advocate for patients over algorithmic efficiency - If AI-driven workflow harms patient (delays, errors, loss of personalized care), escalate concerns - Professional obligation to prioritize patient welfare over institutional AI investments
19.6.2 Physician as Advocate for Equity
5. Demand Evidence of Fairness: - Ask: “How does this AI perform for my patient population?” - Refuse to use AI with known bias unless no alternative exists and bias is disclosed to patients
6. Monitor for Disparate Impact: - If you suspect AI performs worse for certain patients, document and report - Advocate for inclusive validation and bias mitigation
7. Ensure Equitable Access: - Support policies ensuring beneficial AI available to all patients (not just those at elite institutions)
19.6.3 Physician as Learner
8. Stay Current: - Medical AI evolves rapidly—yesterday’s evidence may be outdated - Engage with medical AI literature (not just vendor claims) - Participate in institutional AI governance and education
9. Teach Others: - Educate trainees, colleagues, and patients about AI capabilities and limitations - Model appropriate AI use (thoughtful integration, not blind adherence)
19.7 Ethical Frameworks for Institutional AI Governance
Healthcare institutions deploying AI need structured governance to ensure ethical use (Char, Shah, and Magnus 2018).
19.7.1 Core Components of AI Governance
1. AI Ethics Committee: - Multidisciplinary: clinicians, ethicists, informaticists, legal, community representatives - Reviews proposed AI deployments for ethical concerns - Authority to approve, modify, or reject AI adoption
2. Equity Impact Assessment: - Required before deploying AI - Analyzes potential disparate impact on vulnerable populations - Identifies mitigation strategies
3. Transparency Requirements: - AI systems must be documented: purpose, training data, validation studies, known limitations - Performance data (including subgroup performance) made available to clinicians - Patients informed about AI use (tiered consent approach)
4. Ongoing Monitoring: - Real-world performance tracking (overall and subgroup) - Adverse event reporting mechanisms - Regular bias audits
5. Accountability: - Clear designation of responsibility (who is accountable when AI fails?) - Physician retains ultimate authority and responsibility - Vendors held accountable for undisclosed risks or misrepresented performance
6. Sunset Provisions: - AI deployment isn’t permanent—reevaluate periodically - Decommission AI that performs poorly, perpetuates bias, or becomes outdated
19.8 Conclusion: Toward Ethical Medical AI
Medical AI holds enormous promise: more accurate diagnosis, personalized treatment, equitable access to specialist expertise. But realizing this promise requires more than technical innovation—it demands ethical intentionality.
The history of medical AI thus far includes both successes (IDx-DR enabling diabetic retinopathy screening in underserved communities) and failures (algorithms perpetuating racial bias in resource allocation). The difference lies not in the technology itself, but in the values and priorities of those who develop, validate, deploy, and use it (Obermeyer et al. 2019; Char, Shah, and Magnus 2018).
Ethical medical AI requires:
- Centering equity: Diverse training data, subgroup validation, bias mitigation, equitable access
 - Respecting autonomy: Informed consent, transparency, patient opt-out options
 - Demonstrating benefit: Prospective validation of clinical outcomes, not just retrospective accuracy
 - Preventing harm: Rigorous testing, ongoing monitoring, physician oversight, accountability mechanisms
 - Physician stewardship: Understanding AI, maintaining judgment, protecting patient interests
 
The goal is not AI-free medicine (that ship has sailed), nor is it uncritical AI adoption. The goal is AI that embodies the values of medicine: commitment to patients above all, respect for human dignity, pursuit of equity, and fierce protection of the vulnerable.
Physicians—by virtue of their clinical expertise, ethical training, and patient advocacy role—are uniquely positioned to ensure AI serves these ends. That responsibility cannot be delegated to algorithms.