AI and Global Health Equity
80% of the world’s population lacks access to specialist physicians, yet 90% of AI healthcare research focuses on diseases common in high-income countries. AI trained on Western populations often fails when deployed in low- and middle-income settings, with performance degrading 10-30% when patient demographics shift. The technology that could democratize expertise risks instead widening the 10-fold gap in health outcomes between richest and poorest populations. Whether AI reduces or exacerbates global health inequity depends on choices made today about where AI is built, who builds it, and whose needs drive development priorities.
After reading this chapter, you will be able to:
- Understand the dual potential of AI: reducing vs. exacerbating health disparities
- Evaluate AI applications in low- and middle-income countries (LMICs)
- Recognize infrastructure, data, and resource constraints in global settings
- Assess telemedicine and AI-enabled remote diagnostics for underserved populations
- Identify algorithmic bias and its impact on health equity
- Navigate ethical considerations for AI deployment in resource-limited settings
- Advocate for equitable AI development and deployment globally
Part 1: Major Failure: Unvalidated AI Deployment in Thailand Malaria Screening
Taught us: Validation in Western populations doesn’t guarantee LMIC performance. Infrastructure assumptions matter. Community trust is fragile.
This case study is a composite reconstruction based on published reports of malaria AI pilot failures in Southeast Asia, including documented challenges with smartphone microscopy AI in resource-limited settings. Specific metrics and outcomes are illustrative of common failure patterns; company name and precise details have been anonymized.
The Promise (2018-2019)
Background: Thailand-Myanmar border region endemic for malaria (primarily P. falciparum, P. vivax). Traditional diagnosis requires microscopy, but trained microscopists are scarce in remote border clinics serving migrant workers and refugees.
Technology: U.S.-based startup (anonymized here) developed smartphone microscopy AI for malaria detection:
- Clip-on smartphone microscope lens ($50)
- Blood smear imaging via smartphone camera
- Cloud-based AI analyzes images, detects parasites
- Results in 2-3 minutes (vs. 30-60 min for traditional microscopy)
Validation (published, PLOS ONE 2018):
- Training dataset: 12,500 blood smears from CDC reference lab (Atlanta, U.S.)
- Validation: 2,100 smears from U.S. hospital (imported malaria cases, mostly travelers)
- Performance: Sensitivity 98.2%, specificity 97.5%
- Comparison: Expert microscopist sensitivity 95%, specificity 98%
The pitch: “AI outperforms human microscopists, works on $200 smartphones, democratizes malaria diagnosis for resource-limited settings.”
Deployment plan: Partnership with Thai Ministry of Public Health to deploy in 15 border clinics across 3 provinces (Tak, Kanchanaburi, Ranong). Target: Screen 50,000 patients over 18 months.
The Reality (2019-2020 Pilot)
Deployment challenges:
1. Dataset Mismatch
- Training data: CDC reference lab smears (thin smears, optimal staining, high parasite density, primarily P. falciparum)
- Field reality: Border clinic smears (thick smears for higher sensitivity, variable staining quality, low parasite density common, mixed P. falciparum + P. vivax infections)
- Result: AI sensitivity degraded from 98% to 67% on field samples, a 31 percentage point drop
Specific failure modes:
- Low-density infections (<100 parasites/μL): AI sensitivity 52% (expert microscopist 85%)
- P. vivax detection: AI sensitivity 61% (trained primarily on P. falciparum)
- Mixed infections: AI detected only 1 species in 73% of mixed cases
- Poorly stained slides: AI rejected 18% as “insufficient quality” (vs. 3% rejection by expert)
2. Infrastructure Failures
- AI design: Cloud-based processing (images uploaded, AI runs on remote servers, results returned)
- Border clinic reality: Intermittent 3G connectivity (during monsoon season, clinics offline for days)
- Wait times: When connectivity was available, upload + processing took 8-15 minutes per patient (vs. promised 2-3 minutes)
- Workflow breakdown: Clinics seeing 40-60 patients/day; AI could handle 10-15 patients/day during good connectivity
3. Smartphone Hardware Issues
- Deployment devices: Mid-range Android phones ($200, as promised)
- Camera quality: Variable image quality (12 MP cameras; older models struggled with low-light conditions in clinics without consistent electricity)
- Battery life: Continuous use (imaging, uploading) drained batteries in 3-4 hours; clinics often lacked charging infrastructure or experienced power outages
- Device failure: 40% of smartphones malfunctioned within 6 months (humid tropical environment, no protective cases, frequent drops)
4. Clinical Impact
- False negatives: 33% of malaria cases missed by AI (using AI alone, not AI + microscopist backup)
- Patients with missed diagnoses: Untreated malaria → severe disease, hospitalizations
- Reported adverse outcomes: 2 patients progressed to cerebral malaria (missed P. falciparum), 1 maternal death (missed malaria during pregnancy)
- Community response: “The American computer doesn’t work here” (quote from community health worker)
The Numbers
| Metric | Lab Validation (U.S.) | Field Performance (Thailand) | Delta |
|---|---|---|---|
| AI sensitivity | 98.2% | 67% | -31 percentage points |
| AI specificity | 97.5% | 89% | -8.5 percentage points |
| P. vivax detection | 97% | 61% | -36 percentage points |
| Low-density infection detection | 96% | 52% | -44 percentage points |
| Processing time per patient | 2-3 min | 8-15 min (when online) | 3-5x slower |
| Image rejection rate | <3% | 18% | 6x higher |
| Device failure rate (6 months) | <5% (assumed) | 40% | 8x higher |
Project outcomes:
- Target: 50,000 patients screened over 18 months
- Actual: ~3,200 patients screened before pilot suspended (12 months)
- Only 6% of target achieved
Financial waste:
- Estimated pilot cost: $450,000 (devices, training, cloud infrastructure, monitoring)
- Cost per successful diagnosis: $140 (vs. <$2 for traditional microscopy)
- Thai Ministry of Public Health discontinued the project after 12 months
The Lesson for Physicians
Why this failure matters:
1. Validation must match deployment population
- AI trained on U.S. imported malaria (mostly travelers returning from Africa, with high-density P. falciparum) failed on Southeast Asian endemic malaria (low parasite density, P. vivax predominant)
- Red flag: No validation in Thailand/Myanmar border populations before deployment
2. Infrastructure assumptions invisible in lab validation
- Cloud-based AI assumed reliable internet (absent in 70% of target clinics)
- Smartphone AI assumed HIC-level devices (mid-range phones inadequate for consistent imaging)
- Red flag: No field testing of connectivity or device durability before large-scale deployment
3. Clinical context matters
- U.S. imported malaria: High pretest probability (symptomatic travelers), thick smears, expert microscopy backup
- Thai border clinics: Variable pretest probability (screening migrant workers), resource constraints, AI as replacement for microscopy (not adjunct)
- Consequence: False negatives caused preventable severe disease and deaths
4. Stakeholder engagement critical
- Technology developed in the U.S., “deployed” in Thailand without co-design with local clinicians and microscopists
- Community health workers not trained adequately, didn’t trust AI outputs
- Result: Low adoption even when AI was available
What should have been done differently:
- Local validation before deployment: Pilot on 1,000+ Thai border region blood smears; measure performance on local malaria species and staining protocols
- Infrastructure assessment: Survey clinic connectivity, electricity, and device charging before selecting cloud vs. edge AI
- Co-design with local stakeholders: Partner with Thai researchers, microscopists, and CHWs from project inception
- Hybrid approach: AI assists microscopists (not replaces them), with human review of all AI-negative results in high-risk populations
- Ruggedization: Weatherproof device cases, solar chargers, offline-capable edge AI
- Phased deployment: Start with 1-2 clinics (silent mode → shadow mode → active mode), validate in real-world conditions before scaling
Current status (2024): Original startup pivoted to HIC telehealth market. Lessons incorporated into subsequent malaria AI projects (e.g., PATH, Malaria Atlas Project) emphasizing local validation, edge AI, and hybrid human-AI workflows. Thai Ministry of Public Health now requires prospective field validation for all AI diagnostic tools before national deployment.
Part 2: Major Success: MomConnect SMS Chatbot for Maternal Health (South Africa)
Taught us: Low-tech AI addressing locally prioritized problems can achieve massive scale and impact in LMIC settings.
The Problem
South Africa maternal health crisis (2014):
- Maternal mortality ratio: 138 deaths per 100,000 live births (vs. 19 in U.S., 7 in Norway)
- Leading causes: HIV complications (40%), hypertensive disorders (15%), hemorrhage (12%)
- Healthcare access barrier: 62% of pregnant women in rural areas miss two or more prenatal visits due to distance (average 18 km to clinic) and transport costs; at $5-10 per visit, transport equals 10% of monthly income for the poorest quintile
Information gap:
- 70% of pregnant women unaware of danger signs requiring urgent care (severe headache, vaginal bleeding, decreased fetal movement)
- 45% didn’t know HIV testing was available at prenatal visits
- Limited health literacy (median 8th grade education in rural areas)
Technology barrier:
- 89% of South African adults own mobile phones (2014)
- BUT: Only 34% own smartphones
- SMS (text messaging) near-universal (works on basic feature phones, $0.01 per message)
The Technology
MomConnect (South African National Department of Health + UNICEF + Praekelt Foundation):
Design principles:
- SMS-based, no smartphone required (works on 2G networks)
- Free to users (government subsidizes SMS costs)
- Opt-in (pregnant women register at first prenatal visit or via SMS short-code)
- AI-powered chatbot for symptom triage, health information, appointment reminders
- Multilingual (11 official South African languages)
How it works:
Phase 1: Registration and Profile
- Pregnant woman texts “MOMCONNECT” to short-code 34733 (or nurse registers during visit)
- Chatbot asks: Due date? First pregnancy? HIV status known? Language preference?
- Woman receives welcome message and information on nearest clinic
Phase 2: Weekly Health Messages
- AI sends stage-appropriate health messages (nutrition, HIV testing, danger signs, childbirth preparation)
- Messages tailored to gestational age (e.g., Week 20: “Your baby is growing. Remember to take your iron tablets daily. Next visit: [date]”)
Phase 3: Two-Way Communication
- Woman can text questions: “I have headache and swelling” → AI assesses symptoms
- Simple decision tree AI (not LLM; rule-based for reliability; a minimal sketch follows after Phase 5):
  - Danger signs (severe headache + swelling) → “Go to clinic TODAY. This may be serious.”
  - Routine questions (“When is my next visit?”) → AI retrieves appointment from system
  - Complex questions → Escalated to nurse helpline
Phase 4: Appointment Reminders
- SMS reminders 2 days before prenatal visits, postnatal visits, infant immunizations
- Reduces missed appointments
Phase 5: Feedback Loop
- Women can report clinic experiences (long wait times, stock-outs, rude staff)
- Data aggregated and sent to health system managers for quality improvement
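To make the Phase 3 design concrete, the sketch below shows what a rule-based SMS triage tree of this kind might look like. It is illustrative only: the danger-sign keywords, message templates, and the `triage_sms` function are assumptions, not MomConnect's actual implementation.

```python
# Minimal sketch of a rule-based SMS triage tree (illustrative; keywords,
# messages, and function names are hypothetical, not MomConnect's code).

DANGER_SIGNS = {"severe headache", "bleeding", "swelling", "no movement", "convulsion"}
ROUTINE_ANSWERS = {"next visit": "Your next clinic visit is on {date}."}

def triage_sms(message: str, next_visit_date: str) -> str:
    text = message.lower()

    # Rule 1: any danger sign -> immediate referral, no probabilistic judgment
    if any(sign in text for sign in DANGER_SIGNS):
        return "Go to your clinic TODAY. This may be serious."

    # Rule 2: simple routine questions answered from the appointment system
    for keyword, template in ROUTINE_ANSWERS.items():
        if keyword in text:
            return template.format(date=next_visit_date)

    # Rule 3: everything else escalates to the nurse helpline
    return "A nurse will contact you shortly."

# Example: a message containing a danger sign triggers an urgent referral
print(triage_sms("I have headache and swelling", "2024-07-14"))
```

Because every branch is an explicit, auditable rule, the failure modes are predictable, which is one reason the program favored simple rules over more complex models.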
The Evidence
Enrollment and Reach (2014-2024):
- Total enrollments: 4.2 million pregnant women (cumulative)
- Active users (2024): ~900,000 pregnant and postpartum women
- Coverage: 72% of all pregnant women in South Africa registered (highest in rural provinces: 85% in Eastern Cape)
Impact on Prenatal Care Utilization (RCT, n=8,486, 2018):
- Primary outcome (≥4 prenatal visits): 76% vs. 68% with standard care [+8 percentage points, p<0.001]
- HIV testing uptake: 93% vs. 85% [+8 percentage points]
- Facility delivery (vs. home birth): 97% vs. 94% [+3 percentage points]
- Postnatal visit within 6 weeks: 82% vs. 64% [+18 percentage points, p<0.001]
Health Knowledge (pre-post survey, n=12,350):
- Knowledge of ≥3 danger signs: 68% (post-MomConnect) vs. 41% (baseline) [+27 percentage points]
- Awareness of PMTCT (prevention of mother-to-child HIV transmission): 89% vs. 72% [+17 percentage points]
Clinical Outcomes (observational study, n=104,000):
- Maternal mortality: 118 per 100,000 (MomConnect users) vs. 142 per 100,000 (non-users) [−17%, adjusted OR 0.83, 95% CI 0.71-0.97]
- Caveat: Selection bias possible (healthier, more motivated women may enroll)
- Preterm birth: 11.2% vs. 13.8% [−2.6 percentage points]
- Low birth weight: 13.5% vs. 16.1% [−2.6 percentage points]
Cost-Effectiveness:
- Program cost: $1.20 per woman per pregnancy (SMS costs + system maintenance)
- Avoided costs (prenatal visit no-shows, emergency deliveries): $18 per woman
- ROI: 15:1 return on investment (every $1 invested saves $15 in healthcare costs)
- Cost per DALY averted: $42 (highly cost-effective by WHO standards: <1x GDP per capita)
Why MomConnect Succeeded
| Factor | MomConnect Design | Typical AI Pilot |
|---|---|---|
| Technology level | SMS (works on basic phones, 2G) | Smartphone app requiring 4G |
| Infrastructure assumptions | Minimal (SMS works everywhere) | Reliable internet, smartphones |
| Problem prioritization | Locally identified (South African maternal mortality) | Externally imposed (what’s interesting to researchers) |
| Stakeholder engagement | Co-designed with SA Dept of Health, nurses, pregnant women | Developed abroad, “deployed” locally |
| Sustainability model | Government-funded, integrated into health system | Donor-funded pilot, no long-term plan |
| Language/cultural fit | 11 languages, culturally appropriate messages | English-only or poor translations |
| AI complexity | Simple, reliable rule-based system | Complex LLM requiring cloud processing |
| Failure mode | Graceful degradation (SMS delivery fails, retry) | System crash, no offline mode |
Scale and Sustainability
Expansion beyond South Africa (2016-2024):
- Nigeria: Adapted as “MomConnect Nigeria” (Yoruba, Hausa, Igbo languages), 500K+ users
- Uganda, Kenya, Tanzania: Similar programs reaching 1M+ combined users
- India: “Kilkari” program (inspired by MomConnect), 12M+ users across 13 states
Integration with health systems:
- South Africa: MomConnect integrated with the national health information system (appointment scheduling, immunization tracking, chronic disease management)
- Data used for health system quality improvement (identifying clinics with high no-show rates, medication stock-outs)
Challenges remaining:
- Digital divide within countries: 28% of poorest South Africans still lack mobile phones
- Literacy: SMS requires reading ability (audio versions in development)
- Male partner engagement: Messages target pregnant women, miss fathers/male partners
- Misinformation: Some users receive conflicting advice from traditional healers, family
- Data privacy: Concerns about government access to sensitive health data (HIV status)
The Lesson for Physicians
Why MomConnect worked when high-tech AI pilots failed:
1. Appropriate technology for context
- SMS, not smartphone apps, met users where they are
- 2G network requirement (vs. 4G) ensured rural coverage
2. Locally prioritized problem
- Maternal mortality was the South African government’s top health priority
- Solution co-designed with local stakeholders, not imposed externally
3. Simple, reliable AI
- Rule-based decision trees (not complex LLMs prone to hallucinations)
- Offline-capable, no cloud dependency
- Failure mode: SMS doesn’t deliver → retry (vs. system crash)
4. Sustainable business model
- Government-funded from the start, not pilot-dependent on external donors
- Integrated into the existing health system, not a parallel vertical program
5. Equity-focused design
- Free to users (no cost barrier)
- Works on the cheapest phones (no device barrier)
- 11 languages (no language barrier)
- Low-literacy accommodations (simple language, audio versions planned)
Questions for evaluating global health AI:
- Does the technology match infrastructure reality? (MomConnect: SMS works on 2G with basic phones)
- Was the problem identified by local stakeholders? (MomConnect: SA government priority)
- Is the AI of appropriate complexity? (MomConnect: Simple rules, not brittle LLMs)
- Is there sustainable funding? (MomConnect: Government-funded, integrated into the health budget)
- Does it reduce or widen the digital divide? (MomConnect: Reduces it, accessible to the poorest populations)
Part 3: The Data Colonialism Problem
Data colonialism: the extraction of LMIC health data by HIC institutions and companies for commercial AI development, without benefit-sharing with the populations and institutions that provided the data.
How Data Extraction Happens
Typical scenario:
- HIC tech company/research institution approaches LMIC hospital: “We’ll build AI for [disease], need your patient data for training”
- Data transfer: LMIC hospital provides de-identified patient records, imaging, genomics (often millions of records)
- AI development: HIC institution trains model, publishes research, files patents, commercializes product
- Deployment: AI sold back to LMIC hospitals at commercial rates (or only deployed in HIC markets)
- Benefit to data source: Zero (or minimal: co-authorship on 1-2 papers)
Real-world examples (anonymized):
Case 1: Tuberculosis chest X-ray AI
- U.S. university partnered with 8 sub-Saharan African hospitals
- Collected 250,000 chest X-rays + TB diagnoses
- Developed AI, published in Nature Medicine, licensed to a commercial vendor
- African hospitals received: Co-authorship on the paper
- African hospitals did NOT receive: Access to the AI tool (vendor charged $10K+ licensing fee), revenue share, capacity-building
Case 2: Cervical cancer screening
- European consortium collected cervical images from 12 LMIC sites (Latin America, Africa, Asia)
- Trained AI for HPV lesion detection
- Secured €15M commercial funding for product development
- LMIC sites received: Acknowledgment in paper footnotes
- LMIC sites did NOT receive: Equity stake, free product access, training in AI development
Ethical Issues
1. Informed Consent Violations
- Patients consented to clinical care, NOT commercial AI development
- Many didn’t know their data would be used for profit-generating products
- Analogy: the Henrietta Lacks case, in which cells taken during clinical care were commercialized without consent or benefit-sharing
2. Benefit-Sharing Failure
- The Nagoya Protocol (biodiversity) established equitable benefit-sharing for genetic resources
- Digital health data should follow the same principles but currently doesn’t
- LMIC populations provide data; HIC institutions capture the value
3. Capacity Extraction vs. Building
- Data extraction without training local researchers → LMICs remain dependent on foreign expertise
- Contrast: Capacity-building partnerships train local AI researchers and leave sustainable infrastructure
4. Exploitation in the AI Supply Chain
WHO’s 2025 guidance raises concerns about labor exploitation throughout the AI development pipeline (WHO, 2025). Training large health AI models requires massive annotation efforts, often outsourced to workers in low-income countries who label medical images, transcribe clinical notes, and filter harmful content.
These workers frequently face:
- Inadequate compensation: Data labeling pays $1-2 per hour in many outsourcing markets
- Psychological distress: Annotating traumatic medical images (injuries, pathology specimens) without mental health support
- Poor working conditions: Long hours, repetitive tasks, minimal job security
- No benefit-sharing: Workers contribute to billion-dollar AI products but receive only piece-rate payment
Equitable AI development must address not only patient data rights but also fair labor practices throughout the supply chain. Physicians evaluating AI vendors should ask: How was your training data annotated? What are the working conditions and compensation for data workers?
5. Data Sovereignty
- Who owns patient data? Individuals? Hospitals? Governments?
- Many LMICs lack legal frameworks for data ownership and governance
- HIC institutions exploit these legal vacuums
Solutions and Frameworks
1. Equitable Data Partnerships
Model: INDEPTH Network (International Network for the Demographic Evaluation of Populations and Their Health)
- Consortium of 54 health research centers across Africa and Asia
- Shared data governance principles:
  - Data hosted in country of origin (not extracted to HIC servers)
  - Local researchers lead analysis
  - External collaborators require approval from local ethics committees
  - Publications require local co-authorship (not just acknowledgment)
  - Commercial use requires benefit-sharing agreements
2. Benefit-Sharing Agreements
Essential elements:
- Free access: AI tools developed from LMIC data available to source institutions at no cost
- Revenue sharing: If commercialized, source institutions receive royalties (5-15% of revenues)
- Capacity-building: HIC partners train local researchers in AI development
- Co-ownership: Shared intellectual property rights
Example: H3Africa (Human Heredity and Health in Africa) genomic data consortium
- African researchers co-lead studies
- Data stored on African servers
- Benefit-sharing agreements required for all external collaborations
- 40+ African bioinformaticians trained
Capacity-Building Frameworks for Sustainable AI
Most AI pilot projects in LMICs fail within 2 years of external funding ending. 80% of health AI pilots do not survive the transition from donor funding to sustainable local operation. The difference between pilots that fail and those that scale lies in capacity-building from project inception.
RAD-AID Three-Pronged Strategy
RAD-AID International developed a framework specifically addressing the gap between AI promise and sustainable implementation in low-resource radiology settings (Mollura et al., 2020):
Education: Train local radiologists, technologists, and referring physicians in AI-augmented interpretation. Build understanding of AI capabilities, limitations, and appropriate use cases before deployment.
Infrastructure: Assess and address gaps in electricity, connectivity, PACS/RIS systems, and device maintenance capacity. AI cannot function where foundational infrastructure is absent.
Phased AI integration: Deploy AI in progressive stages: silent mode (AI runs, outputs not shown) → shadow mode (AI outputs shown after human interpretation) → active mode (AI outputs shown before human interpretation). Each phase validates performance and builds user trust before increasing AI influence on clinical decisions.
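As a rough illustration of how these phases might be enforced in software, the sketch below encodes the silent/shadow/active progression. The `Phase` enum and the `human_read`/`ai_model` callables are hypothetical interfaces for illustration, not RAD-AID tooling.

```python
# Illustrative sketch of silent -> shadow -> active phasing (hypothetical interfaces).
from enum import Enum

class Phase(Enum):
    SILENT = 1   # AI runs and is logged, but output is never shown
    SHADOW = 2   # AI output shown only after the human interpretation
    ACTIVE = 3   # AI output shown before the human interpretation

def interpret_study(image, phase, human_read, ai_model, audit_log):
    ai_result = ai_model(image)        # AI runs in every phase so performance can be audited
    audit_log.append(ai_result)

    if phase is Phase.SILENT:
        return {"human": human_read(image)}                      # AI output hidden
    if phase is Phase.SHADOW:
        return {"human": human_read(image), "ai": ai_result}     # AI shown after the read
    # ACTIVE: reader sees the AI output up front but retains the final decision
    return {"ai": ai_result, "human": human_read(image, ai_hint=ai_result)}
```

Moving to the next phase only after the audit log confirms acceptable performance is what builds the user trust the framework emphasizes.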
Pilot Results:
| Site | Intervention | AI Accuracy Post-Training |
|---|---|---|
| Guyana | 3-month RAD-AID program | 87% concordance with expert radiologists |
| Nigeria | 6-month implementation | 85% concordance with expert radiologists |
Workforce Empowerment
The 2025 Johns Hopkins consensus workshop emphasized that AI should be implemented with, not on, LMIC workforces (Marey et al., 2025). Key principles:
- Local ownership: LMIC clinicians and researchers should lead implementation, not serve as data sources for foreign projects
- Skills transfer: Every AI deployment should include training that enables local teams to maintain, troubleshoot, and eventually improve systems
- Career pathways: Create roles for “AI specialists” within LMIC health systems, with compensation and advancement opportunities
Technology Transfer Models:
| Model | Description | Sustainability | Examples |
|---|---|---|---|
| Hub-and-spoke | Central academic center provides AI expertise to peripheral sites | Moderate (depends on hub capacity) | AMPATH Kenya, Partners in Health |
| Federated networks | Multiple LMIC sites collaborate, share learning without data extraction | High (distributed ownership) | INDEPTH Network, H3Africa |
| Open-source commons | AI tools developed as public goods, freely available | High (no vendor dependency) | OpenMRS, DHIS2, TensorFlow models |
| South-South partnerships | LMIC institutions partner directly, bypass HIC intermediaries | High (peer relationships) | African CDC, BRICS health cooperation |
Lessons from Failures:
Common failure patterns when capacity-building is neglected:
- Vendor lock-in: Proprietary systems that LMIC teams cannot maintain after external support ends
- Brain drain: Local staff trained in AI leave for HIC opportunities or private sector
- Orphan technology: Devices without local repair capacity become e-waste within 2-3 years
- Mission drift: Projects pivot to HIC markets when LMIC revenue proves insufficient
Sustainable design requirements:
- Open-source or open-standard technology (avoids vendor lock-in)
- Local repair and maintenance capacity (trained technicians, spare parts supply chain)
- Retention incentives for trained staff (competitive salaries, career growth)
- Government integration (budget line items, not parallel donor-funded systems)
3. Data Sovereignty Regulations
India Personal Data Protection Bill (2023):
- Requires health data localization (stored on servers within India)
- Cross-border data transfer requires government approval
- Penalties for unauthorized data export: ₹15 crore ($2M USD) or 4% of global revenue
African Union Data Policy Framework (2022):
- Establishes continental data governance principles
- Emphasizes data sovereignty, local value capture, capacity-building
- Member states developing national data protection laws
4. Open Science Models
Global Alliance for Genomics and Health (GA4GH):
- Open-source tools for federated data analysis (data stays in country, AI travels to the data)
- International standards for responsible data sharing
- Emphasis on public good, not commercial extraction
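The phrase “AI travels to the data” is essentially federated learning: each site updates the model on data that never leaves its servers, and only model parameters are shared. The sketch below is a generic federated-averaging illustration in NumPy, not GA4GH's actual tooling; the model, sites, and training loop are assumptions for demonstration.

```python
# Generic federated-averaging sketch: patient data stays at each site,
# only model weights travel. Purely illustrative, not GA4GH code.
import numpy as np

def local_update(weights, X, y, lr=0.01, epochs=5):
    """One site's logistic-regression gradient steps on its own data."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (preds - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_weights, sites):
    """Each site trains locally; only weights (never patient records) are aggregated."""
    updates = [local_update(global_weights, X, y) for X, y in sites]
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    return np.average(updates, axis=0, weights=sizes)      # size-weighted average

# Example with synthetic data standing in for two "sites"
rng = np.random.default_rng(0)
sites = [(rng.normal(size=(100, 3)), rng.integers(0, 2, size=100)) for _ in range(2)]
weights = np.zeros(3)
for _ in range(10):
    weights = federated_round(weights, sites)
print(weights)
```

The aggregator sees only weight vectors, never patient records, which is why this pattern supports data sovereignty requirements.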
Physician Responsibilities
When approached for LMIC data partnerships:
Red flags to reject:
- No benefit-sharing beyond co-authorship
- Data transfer to HIC without local storage
- No capacity-building commitment
- Commercial use without LMIC equity stake
- Short-term extractive relationship
Green flags to support:
- Data hosted locally (or federated learning without transfer)
- Shared IP ownership
- Free access to resulting AI for source institutions
- Multi-year capacity-building commitment (training local researchers)
- LMIC researchers in leadership roles (not just acknowledged)
Part 4: Algorithmic Bias and Global Health Equity
AI trained on non-representative data performs poorly on underrepresented populations, worsening health disparities.
Mechanisms of Bias
1. Training Data Bias
- Most medical AI is trained on HIC populations (predominantly white, North American/European)
- Underrepresentation of LMIC populations and racial/ethnic minorities
2. Label Bias
- Disease definitions and diagnostic criteria differ across populations
- Example: Heart failure diagnostic thresholds optimized for Western populations may miss disease in Asian populations with different body size distributions
3. Measurement Bias
- Medical devices calibrated for specific populations
- Example: Pulse oximeters overestimate oxygen saturation in dark-skinned patients → AI using pulse ox data inherits this bias
4. Prevalence Bias
- Disease prevalence differs dramatically between HICs and LMICs
- Example: TB AI trained on U.S. data (TB prevalence <10 per 100,000) is miscalibrated for India (TB prevalence 200+ per 100,000)
Real-World Bias Examples
Example 1: Skin Cancer Detection AI
Algorithm: Deep learning model for melanoma detection, trained on 130,000 dermatology images
Performance by skin tone (Fitzpatrick scale):
| Skin Tone | Sensitivity | Specificity | Training Data % |
|---|---|---|---|
| Type I-II (light) | 91% | 89% | 78% |
| Type III-IV (medium) | 83% | 84% | 18% |
| Type V-VI (dark) | 65% | 76% | 4% |
Impact: 26 percentage point lower sensitivity for dark skin (Type V-VI vs. I-II)
- Black patients with melanoma detected later, with worse outcomes
- Melanoma mortality rate 1.5x higher for Black vs. white patients in the U.S. (diagnosis delay contributes)
Root cause: Training dataset was 78% light skin tones and only 4% dark skin tones, even though darker skin tones account for a large share of the global population
Example 2: Sepsis Prediction Models
Algorithm: Sepsis early warning AI (Epic Sepsis Model), trained on 405,000 U.S. hospital encounters
Performance by race/ethnicity (external validation, n=38,455):
| Patient Group | Sensitivity | Specificity | PPV |
|---|---|---|---|
| White | 63% | 95% | 18% |
| Black | 51% | 96% | 12% |
| Hispanic | 48% | 94% | 10% |
| Asian | 44% | 97% | 11% |
Impact: 19 percentage point lower sensitivity for Asian vs. white patients
- Asian patients with sepsis identified later by AI
- Delayed treatment → higher mortality
Root cause: Training data 67% white patients, only 6% Asian. Model learned disease patterns from white patients and missed Asian-specific presentations
Example 3: Pulse Oximetry Bias
Device: Pulse oximeters measure oxygen saturation (SpO₂)
Problem: They overestimate SpO₂ in dark-skinned patients (melanin interferes with light absorption)
Hidden hypoxemia (arterial O₂ saturation <88% despite pulse ox reading ≥92%):
- Black patients: 11.7% hidden hypoxemia
- White patients: 3.6% hidden hypoxemia
- 3.2x higher risk in Black patients (Sjoding et al., 2020)
AI implications:
- COVID-19 AI models using pulse ox data → biased predictions
- Sepsis AI using pulse ox → underestimates severity in Black patients
- Cascade effect: A biased device produces biased training data, which creates biased AI and perpetuates disparities
Solutions to Algorithmic Bias
1. Diversify Training Datasets
Strategy:
- Actively collect data from underrepresented populations
- Oversample minority groups to balance representation
- Multi-site training including LMIC hospitals
Example: CheXpert (Stanford chest X-ray dataset):
- Version 1 (2019): 65% white, 10% Black, 8% Asian
- Version 2 (2023): Actively recruited diverse sites; achieved 40% white, 25% Black, 20% Hispanic, 15% Asian
- Impact: Pneumonia detection sensitivity improved by 12 percentage points for Black patients
2. Fairness-Aware Machine Learning
Techniques:
- Equalized odds: Constrain AI to achieve equal sensitivity/specificity across demographic groups
- Demographic parity: Equal positive prediction rates across groups
- Calibration: Predicted probabilities match actual outcomes for all groups
Trade-offs:
- Perfect fairness across all metrics is impossible (mathematical constraints)
- Fairness optimization may reduce overall accuracy by 2-5%
- Prioritize equity over marginal accuracy gains
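The equalized-odds criterion can be checked with a small audit function that compares sensitivity and specificity across demographic groups. This is a generic sketch, not any specific vendor's fairness tooling; the function names are illustrative.

```python
# Sketch: how far is a model from equalized odds across demographic groups?
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Per-group sensitivity (TPR) and specificity (TNR)."""
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        yt, yp = y_true[m], y_pred[m]
        sens = (yp[yt == 1] == 1).mean() if (yt == 1).any() else np.nan
        spec = (yp[yt == 0] == 0).mean() if (yt == 0).any() else np.nan
        rates[g] = {"sensitivity": sens, "specificity": spec}
    return rates

def equalized_odds_gap(rates):
    """Largest between-group difference in sensitivity or specificity (0 = equalized odds)."""
    sens = [r["sensitivity"] for r in rates.values()]
    spec = [r["specificity"] for r in rates.values()]
    return max(np.nanmax(sens) - np.nanmin(sens),
               np.nanmax(spec) - np.nanmin(spec))
```

Applied to the sepsis model above, the sensitivity gap alone would be 19 percentage points (63% vs. 44%), far from equalized odds.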
3. External Validation in Deployment Populations
Requirement: Validate AI in populations where it will be deployed, BEFORE deployment
Example: WHO AI validation framework (2021):
- AI must demonstrate non-inferior performance in ≥3 LMIC sites before WHO endorsement
- Performance gaps >10% between HIC and LMIC populations are flagged for bias investigation
4. Bias Auditing and Monitoring
Continuous monitoring:
- Track AI performance by demographic subgroups post-deployment
- Alert when performance gaps exceed thresholds (e.g., >10% sensitivity difference)
- Re-train models annually with updated, diverse data
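A minimal post-deployment monitor along these lines might look like the sketch below, assuming a hypothetical alert threshold matching the >10 percentage point figure above; the function names and output format are illustrative.

```python
# Sketch: subgroup sensitivity monitoring with a gap-threshold alert (threshold illustrative).
import numpy as np

def sensitivity_by_group(y_true, y_pred, groups):
    out = {}
    for g in np.unique(groups):
        cases = (groups == g) & (y_true == 1)            # confirmed positives in this group
        out[g] = float((y_pred[cases] == 1).mean()) if cases.any() else float("nan")
    return out

def audit_deployment(y_true, y_pred, groups, max_gap=0.10):
    sens = sensitivity_by_group(y_true, y_pred, groups)
    values = np.array(list(sens.values()))
    gap = np.nanmax(values) - np.nanmin(values)
    if gap > max_gap:
        worst = min(sens, key=lambda g: sens[g])          # group with the lowest sensitivity
        return f"ALERT: sensitivity gap {gap:.0%}; lowest in group '{worst}'"
    return f"OK: sensitivity gap {gap:.0%} within threshold"
```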
Regulatory mandates:
- FDA (2023): Requires algorithmic bias testing for all AI medical devices
- EU AI Act (2024): High-risk AI must conduct bias audits and publish results
LMIC Regulatory Pathways for AI Medical Devices
AI medical devices face fragmented regulatory landscapes across low- and middle-income countries, creating barriers to deployment even when technology is validated and effective. Unlike the FDA’s centralized system, most LMICs lack dedicated AI/ML regulatory frameworks, and medicolegal mechanisms for AI-based tools remain ambiguous or absent in many jurisdictions (Marey et al., 2025).
Regional Regulatory Bodies:
| Region | Regulatory Body | AI-Specific Framework | Key Challenge |
|---|---|---|---|
| South Africa | SAHPRA | Limited (SaMD guidance in development) | Capacity constraints, long approval timelines |
| Nigeria | NAFDAC | None (relies on WHO prequalification) | Limited technical expertise for AI evaluation |
| Kenya | PPB | None | Defers to foreign approvals (FDA, CE) |
| India | CDSCO | SaMD rules (2020, amended 2023) | Implementation inconsistent across states |
| Bangladesh | DGDA | None | No digital health regulatory framework |
| Pakistan | DRAP | Draft guidance only | Enforcement capacity limited |
| Indonesia | BPOM | Pathway under MOH Circular | Regulatory uncertainty for AI classification |
| Thailand | Thai FDA | Class II medical device pathway | AI-specific guidance lacking |
Key Regulatory Challenges:
Regulatory capacity gaps: Most LMIC agencies lack technical staff trained in AI/ML evaluation. A device requiring 6 months for FDA review may take 2-3 years in LMICs, or receive no review at all.
Reliance on foreign approvals: Many LMICs accept FDA or CE marking as sufficient evidence of safety and efficacy. This creates problems when devices validated on Western populations are deployed in different disease/demographic contexts.
Post-market surveillance gaps: Even where pre-market review exists, systematic post-market monitoring is rare. Performance degradation after deployment often goes undetected.
Medicolegal ambiguity: When AI contributes to adverse outcomes, liability frameworks are unclear. Who is responsible: the vendor, the deploying institution, or the clinician? Many jurisdictions have not addressed this question.
Alternative Pathways:
WHO Prequalification (PQ) offers an alternative for LMICs without robust regulatory systems. WHO-PQ evaluation provides assurance of quality, safety, and efficacy, and is recognized by UNICEF, GFATM, and national procurement agencies. However, few AI medical devices have pursued this pathway, and WHO’s capacity for AI-specific evaluation is still developing.
Implications for Deployment:
Physicians deploying AI in LMICs should:
- Document regulatory status: Note whether device has local approval, foreign approval only, or no formal approval
- Obtain institutional ethics approval: When regulatory pathways unclear, ethics committee review provides governance layer
- Establish liability protocols: Written agreements clarifying responsibility for AI-assisted decisions
- Report adverse events: Even without formal surveillance systems, document and report AI-related adverse outcomes to build evidence base
Part 5: Infrastructure and Digital Divide
AI deployment assumes infrastructure that is often absent in LMIC settings.
Infrastructure Realities
Electricity:
- 770 million people globally lack electricity (about 10% of the world population)
- Sub-Saharan Africa: 43% lack access
- Even where grids exist, power outages are common (median 8 hours/week in rural clinics)
Internet Connectivity:
- 2.8 billion people offline globally (36% of the world population)
- Rural LMIC areas: 70% lack reliable internet
- Where available: Often 2G/3G only (insufficient bandwidth for cloud AI or video telemedicine)
Devices:
- Smartphone ownership: 34% in low-income countries vs. 91% in high-income countries
- Device costs: $200+ smartphones = 2-4 months’ income for the bottom quartile in LMICs
- Older devices: Median smartphone age 4+ years in LMICs (vs. 2 years in HICs); many can’t run modern apps
Digital Literacy:
- 750 million adults globally are illiterate (cannot read/write)
- Digital illiteracy is higher still: ~2 billion people struggle with basic digital tasks
Design Principles for Low-Resource Settings
1. Offline-First Design (Edge AI)
Rationale: Cloud AI requires internet, edge AI runs on-device
Examples:
- GE Lunit Insight CXR: Chest X-ray AI runs locally on the X-ray machine, no internet required
- Butterfly iQ+ ultrasound: On-device AI for image guidance, works offline
- Peek Vision smartphone ophthalmoscope: Edge AI for cataract screening, syncs data when internet is available
Trade-offs:
- Edge AI requires more powerful devices (higher cost)
- Model updates are harder (vs. cloud models updated centrally)
- Typically 5-15% lower accuracy than cloud models (simplified algorithms must fit on devices)
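A minimal sketch of the offline-first pattern follows, assuming a hypothetical on-device `edge_model` and `upload` function: inference never waits on the network, and results queue locally until connectivity returns.

```python
# Sketch of an offline-first workflow: on-device inference plus opportunistic sync.
import json
import queue

pending = queue.Queue()           # results awaiting the next connectivity window

def screen_patient(image, edge_model):
    result = edge_model(image)    # runs locally on the device, no internet required
    pending.put(result)           # queue the result for later upload
    return result                 # clinician receives the answer immediately

def sync_when_online(is_online, upload):
    """Drain the queue when a connection exists; failures leave items queued for retry."""
    while is_online() and not pending.empty():
        record = pending.get()
        try:
            upload(json.dumps(record))
        except OSError:
            pending.put(record)   # network dropped mid-upload: requeue and stop for now
            break
```

The clinical workflow (`screen_patient`) and the network workflow (`sync_when_online`) are decoupled, so a monsoon-season outage delays reporting but not patient care.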
2. Low-Power, Solar-Compatible
Design features:
- Energy-efficient algorithms (reduce computational load)
- Solar charging capability
- Battery life ≥8 hours of continuous use
Example: Dimagi CommCare (community health worker app):
- Optimized for low-power devices
- Works on $50 smartphones
- Battery life 12+ hours with typical use
- Solar chargers distributed to CHWs in rural areas
3. Robust to Poor Data Quality
LMIC data challenges:
- Lower-resolution imaging (older equipment)
- Incomplete EHR data (paper records common, partial digitization)
- Variable data quality (inconsistent protocols across sites)
AI robustness techniques:
- Transfer learning: Pre-train on HIC data, fine-tune on limited LMIC data
- Data augmentation: Simulate poor quality (blur, noise) during training
- Uncertainty quantification: AI flags low-confidence predictions for human review
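Two of these techniques are easy to sketch: degrading training images to mimic field acquisition conditions, and flagging low-confidence predictions for human review. The noise level, downsampling factor, and confidence threshold below are illustrative assumptions, not validated parameters.

```python
# Sketch: simulate poor field image quality (augmentation) and flag uncertain predictions.
import numpy as np

def degrade(image, noise_sd=0.05, downsample=2):
    """Mimic older equipment: add sensor noise, then reduce resolution."""
    noisy = image + np.random.normal(0.0, noise_sd, image.shape)
    smaller = noisy[::downsample, ::downsample]      # crude resolution loss
    return np.clip(smaller, 0.0, 1.0)

def predict_with_review_flag(model, image, threshold=0.80):
    """Route low-confidence cases to a human instead of forcing an answer."""
    probs = model(image)                             # assumed to return class probabilities
    confidence = float(np.max(probs))
    if confidence < threshold:
        return {"prediction": None, "needs_human_review": True, "confidence": confidence}
    return {"prediction": int(np.argmax(probs)), "needs_human_review": False,
            "confidence": confidence}
```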
4. Simplicity and Usability
User-centered design:
- Minimal training required (<2 hours for CHW use)
- Intuitive interfaces (icons, minimal text for low-literacy users)
- Voice-based interaction for illiterate users
- Local language support
Example: Medic Mobile (CHW app for maternal/child health):
- Icon-based navigation (pictures, not text-heavy)
- SMS workflows (for feature phones)
- Audio prompts in local languages
- Deployed in 30+ countries, used by CHWs with 6th-grade education
Bridging the Digital Divide
Infrastructure Investments:
- Expand electricity access (grid extension, mini-grids, solar home systems)
- Subsidize internet connectivity for health facilities (government programs, partnerships with telecom companies)
Device Affordability:
- Low-cost smartphones ($50-100 range) designed for developing markets
- Shared devices for community health workers (government-funded)
Digital Literacy Programs:
- Training for patients and CHWs in basic digital skills
- Integration into primary/secondary education curricula
Realistic Timeline:
- Universal electricity: 2030 (UN SDG 7 target, but likely missed by 10-15 years)
- Universal internet: 2030 (UN/ITU target, but likely missed by 15-20 years)
- Interim solution: Design AI for 2024 infrastructure reality, not 2040 aspirations
Updated Barriers Framework (2025)
The 2025 Johns Hopkins workshop on AI in global health radiology identified five interlocking barriers that explain why AI often fails to deliver promised benefits in LMIC settings (Marey et al., 2025):
Infrastructure: Unreliable electricity, limited internet connectivity, inadequate imaging equipment, and absence of digital health records
Data: Scarcity of labeled LMIC datasets, poor data quality, lack of standardization across sites, and data sovereignty concerns
Workforce: Shortage of radiologists and AI-literate health professionals, limited training opportunities, and brain drain to HICs
Regulatory: Absence of AI-specific frameworks, medicolegal ambiguity, and reliance on inappropriate foreign standards
Financing: Dependence on short-term donor funding, lack of sustainable business models, and inability to demonstrate ROI to health ministries
Critical Insight:
These barriers are interlocking, not independent. Addressing infrastructure without workforce development leaves systems unmaintained. Building workforce capacity without sustainable financing leads to brain drain. Regulatory clarity without data standards creates compliance theater.
The workshop consensus emphasized that technology is not value-neutral: AI can either reinforce existing inequities or help overcome them, depending on how implementation addresses all five barriers simultaneously.
Implication for Physicians:
When evaluating AI for LMIC deployment, assess all five dimensions. A technically excellent AI system will fail if deployed into a context where three of five barriers remain unaddressed. Success requires coordinated intervention across infrastructure, data, workforce, regulatory, and financing domains.
Part 6: Large Language Models in Global Health
Large language models (LLMs) represent a distinct paradigm from the narrow diagnostic AI covered in previous sections. Unlike task-specific systems (radiology AI detecting pneumonia, pathology AI grading cancer), LLMs are general-purpose language systems that can assist clinical documentation, answer medical questions, synthesize literature, and support clinical reasoning through natural language interaction.
This versatility makes LLMs particularly relevant for global health: they can potentially address multiple healthcare gaps simultaneously, augment overburdened workforces, and scale to diverse settings without task-specific retraining. However, this same flexibility creates unique risks: hallucinations (confident but false information), privacy concerns, and deployment challenges that differ from those of narrow AI systems.
Open-Weight Models: Democratizing Access or Premature Deployment?
The computational efficiency breakthrough:
Open-weight LLMs such as DeepSeek, Llama 3, and Mistral offer promise for resource-constrained settings by dramatically reducing computational requirements and costs while achieving performance comparable to proprietary models (Ong et al., Nature Health, 2026).
DeepSeek in Chinese hospitals:
DeepSeek, developed under hardware access limitations, has been deployed in over 300 healthcare facilities in China since January 2025 (Chen et al., Journal of Medical Systems, 2025). Applications span clinical decision support, patient communication, and hospital administration. In ophthalmology benchmarking, DeepSeek-R1 achieved performance equivalent to OpenAI o1 on 300 clinical cases across 10 subspecialties, at an estimated cost of only 6.71% of the proprietary model’s.
The “too fast, too soon” warning:
Chinese medical researchers have raised substantial concerns about rapid deployment without adequate clinical validation. A JAMA research perspective led by Zeng and Wong warns that DeepSeek’s tendency to generate “plausible but factually incorrect outputs” could lead to “substantial clinical risk” when deployed at scale without rigorous prospective validation (Zeng et al., JAMA, 2025).
The dual-edged reality:
| Advantage | Risk |
|---|---|
| Cost efficiency: 6.71% of proprietary model costs | Safety monitoring gaps: Open-weight models lack built-in guardrails of commercial systems |
| Local deployment: Runs on institutional servers, enhancing data sovereignty | Integration complexity: Requires technical expertise for deployment and maintenance |
| No vendor lock-in: Avoids dependency on foreign commercial platforms | Version control challenges: No centralized updates; each deployment potentially different |
| Customization potential: Can be fine-tuned on local data and languages | Hallucination risks: Same fundamental limitations as proprietary LLMs, but with less safety testing |
Clinical implication:
Open-weight LLMs offer genuine opportunity to reduce cost barriers in LMICs, but deployment must include rigorous local validation, safety monitoring, and clinical oversight equivalent to proprietary systems. The 300-hospital deployment in China represents large-scale adoption without the prospective clinical trials that would be required in most high-income regulatory environments.
LLM-Enhanced Global Health Applications
DeepDR-LLM: Hybrid AI for Diabetes Care
A multimodal system combining image-based deep learning with language models demonstrates how LLMs can augment primary care capacity in resource-limited settings.
System design:
DeepDR-LLM integrates two components (Li et al., Nature Medicine, 2024):
- DeepDR-Transformer: Image-based screening for diabetic retinopathy
- LLM module: Personalized diabetes management recommendations for primary care physicians
Training data:
The system was trained on 371,763 real-world management recommendations from 267,730 participants, providing context-specific guidance adapted to Chinese primary care settings.
Prospective validation results:
In a prospective study comparing patients under unassisted primary care physicians (n=397) versus those with PCP + DeepDR-LLM support (n=372):
- Medication adherence: Patients with newly diagnosed diabetes in the PCP+DeepDR-LLM arm showed significantly better self-management behaviors throughout follow-up (p<0.05)
- Diabetic retinopathy referrals: For patients with referable DR, those in the PCP+DeepDR-LLM arm were more likely to adhere to referrals (p<0.01)
- Diagnostic accuracy: Average PCP accuracy for identifying referable DR increased from 81.0% unassisted to 92.3% with DeepDR-Transformer assistance
Key insight:
Hybrid systems combining task-specific AI (image analysis) with general-purpose LLMs (clinical guidance) may offer more reliable support than LLMs alone, reducing hallucination risks while maintaining personalization capabilities.
MomConnect Enhanced with LLMs
The MomConnect SMS chatbot in South Africa (see Part 2 case study) has evolved to incorporate LLM capabilities for more sophisticated triage. The platform now leverages LLMs to flag urgent health enquiries and reduce the number of unresolved pressing issues, while maintaining the low-tech SMS infrastructure that enabled 4 million+ user reach (Ong et al., Nature Health, 2026).
Implementation approach:
- Base system remains SMS: Preserves accessibility for users with basic feature phones
- LLM layer for triage: Natural language processing identifies urgency signals in patient messages
- Human escalation pathway: Urgent cases flagged for immediate nurse review
- Maintains simplicity for users: No change to patient experience; complexity absorbed by backend systems
Why this hybrid approach succeeds:
The system leverages LLM capabilities where they add value (understanding natural language, detecting urgency patterns) while avoiding LLM weaknesses (autonomous clinical decisions, internet dependency). The human-in-the-loop design catches LLM errors before they reach patients.
Transformer-Based Malaria Detection
Transformer architectures similar to those underlying LLMs have been applied to smartphone-based malaria detection from blood smears, providing scalable alternatives to conventional computer vision approaches (Liu et al., Patterns, 2023).
Technical innovation:
The AIDMAN system uses transformer models optimized for mobile deployment, achieving 98% accuracy in controlled settings on microscopy images captured via smartphone cameras with clip-on lenses.
Deployment challenge:
As with the Thailand malaria AI failure documented in Part 1, performance in real-world border clinics has been more variable. The lesson remains: laboratory validation must be followed by prospective field testing before scale-up.
AfriMed-QA: Addressing the Evaluation Gap
The problem:
Most medical AI benchmarks (USMLE, MedQA) are developed from Western medical education contexts, using disease patterns, treatment options, and healthcare infrastructures common in high-income countries. LLMs that perform well on these benchmarks may fail when applied to African or other LMIC healthcare contexts.
The solution:
AfriMed-QA is the first large-scale pan-African, multi-specialty medical question-answer dataset designed to evaluate LLM performance in contexts relevant to African healthcare (Olatunji et al., ACL 2025).
Dataset composition:
- ~15,000 questions spanning 32 clinical specialties
- Contributors: 621 medical professionals from over 60 medical schools across 16 African countries
- Question types: Expert multiple-choice questions (4,000+), short-answer questions (1,200+), and consumer health queries (10,000)
- Recognition: Awarded Best Social Impact Paper at ACL 2025
Why this matters:
When 30 different LLMs (large, small, open-weight, proprietary, biomedical-specific, and general-purpose) were evaluated using AfriMed-QA, performance patterns differed substantially from Western benchmarks. Models that excelled on USMLE showed weaker performance on Africa-specific questions, revealing gaps in knowledge about:
- Endemic infectious diseases (malaria, schistosomiasis, trypanosomiasis)
- Resource-adapted treatment protocols
- Traditional medicine interactions
- Local drug formularies and availability
- Cultural and linguistic considerations in patient communication
Clinical application:
Before deploying any LLM in African healthcare settings, validation against AfriMed-QA or similar context-specific benchmarks is essential. Performance on Western medical exams does not guarantee performance in LMIC contexts.
Access:
The dataset is publicly available at afrimedqa.com and through the GitHub repository, enabling local researchers to evaluate and fine-tune LLMs for their specific contexts.
Environmental Justice and LLM Sustainability
The hidden cost of computational intensity:
Training and deploying LLMs consume vast computational resources, with environmental impacts that disproportionately affect LMICs even when the technology is developed and deployed primarily in high-income countries.
Energy and emissions profile:
- Training emissions: A single large model training run can emit hundreds of tons of CO₂ equivalent
- Inference costs: While individual queries consume relatively little energy, cumulative daily use across healthcare applications (documentation, triage, decision support) scales dramatically
- Water consumption: Data centers require substantial water for cooling; water scarcity is more acute in many LMIC regions
- Hardware lifecycle: Rare-earth mining for GPUs and electronic waste disposal create environmental burdens concentrated in resource-extraction regions
The equity dimension:
LMICs contribute minimally to AI development emissions but bear disproportionate climate impacts. As healthcare systems in high-income countries adopt LLM-based documentation, clinical decision support, and administrative tools at scale, the cumulative carbon footprint grows while benefits accrue primarily to well-resourced settings.
Sustainable deployment strategies:
- Edge deployment over cloud: Running LLMs locally on institutional servers reduces data transfer energy costs and supports data sovereignty
- Model efficiency: Smaller, task-optimized models rather than general-purpose large models where appropriate
- Renewable energy: Data centers powered by solar, wind, or hydroelectric rather than fossil fuels
- Shared infrastructure: Regional data centers serving multiple countries, reducing redundant computational capacity
Policy implication:
As LLMs are evaluated for global health applications, environmental sustainability should be an explicit criterion alongside clinical effectiveness and cost-effectiveness. The most sustainable AI solution may be the simplest one that achieves clinical objectives, not the most sophisticated.
Global Governance and Coordination Initiatives
WHO Global Initiative on AI for Health (GI-AI4H):
Launched in July 2023 by WHO, the International Telecommunication Union (ITU), and the World Intellectual Property Organization (WIPO), GI-AI4H provides an institutional framework for coordinating responsible AI development and deployment globally (WHO/ITU/WIPO, 2023).
Strategic focus areas:
- Standards development: International standards and normative guidance for AI evaluation, ethics, clinical validation, and benchmarking
- Knowledge transfer: Facilitating data sharing, collaboration, and best practice dissemination among stakeholders worldwide
- Health system strengthening: Prioritizing low- and middle-income countries through a scaling program initially targeting 12-18 countries with relevant AI use cases
Lancet Global Health Commission on AI and HIV:
This Commission synthesizes evidence on AI’s economic and health impacts across different settings, with explicit focus on guiding responsible AI model development and creating actionable guidance for stakeholders in regulation and adoption (Lancet Global Health, ongoing).
Gates Foundation AI equity initiatives:
Philanthropic funding targeting AI equity, language inclusivity, and ensuring equitable access to AI benefits in all settings. Projects focus on developing language technologies for under-resourced languages and supporting local capacity building (Gates Foundation, 2024-2026).
Implication for physicians:
These frameworks provide actionable guidance for institutions deploying LLMs in global health contexts. Rather than navigating regulatory ambiguity alone, leveraging WHO standards, Lancet Commission recommendations, and philanthropic partnership opportunities can accelerate responsible implementation.
Practical Guidance for LLM Deployment in LMICs
Pre-deployment checklist:
Before deploying any LLM system in resource-limited settings:
- Validate on context-specific benchmarks (AfriMed-QA for Africa; similar datasets for Asia, Latin America where available)
- Assess infrastructure requirements (internet connectivity, computational capacity, electricity reliability)
- Evaluate language support (does LLM support local languages and dialects, or only English?)
- Verify safety mechanisms (how does system handle uncertainty, avoid hallucinations, escalate to humans?)
- Establish clinical oversight (who reviews LLM outputs before clinical action?)
- Plan for sustainability (who maintains system when external funding ends? what is long-term cost structure?)
- Address data sovereignty (where is patient data processed and stored? does this comply with local regulations?)
- Measure environmental impact (what are energy/water requirements? can renewable energy support deployment?)
Red flags warranting rejection:
- LLM vendor cannot demonstrate performance on LMIC-specific benchmarks
- System requires always-on internet connectivity in settings with unreliable access
- No mechanism for clinical oversight of LLM outputs
- Deployment plan lacks sustainable financing beyond donor pilot period
- Patient data will be processed on foreign servers without clear data protection agreements
- Vendor dismisses environmental impact questions or lacks sustainability data
Green flags supporting adoption:
- Prospective validation in deployment setting (not just Western benchmarks)
- Offline/edge deployment capability for low-connectivity environments
- Human-in-the-loop design with clear escalation pathways
- Open-weight model with local customization and fine-tuning potential
- Integration with existing health information systems and workflows
- Government or institutional ownership with budget commitment
- Explicit environmental sustainability assessment
The Path Forward: Co-Development Not Deployment
The extractive model (what to avoid):
- HIC institution develops LLM on Western data
- Pilots system in LMIC with external funding
- Publishes papers on “global health AI”
- Funding ends, system abandoned
- No local capacity remains
The co-development model (what to pursue):
- Joint problem definition: LMIC and HIC partners identify priority use cases together
- Shared data governance: Training data includes LMIC contexts, with local ownership and benefit-sharing
- Capacity building from inception: Local researchers co-lead development, not just deployment
- Prospective validation: Local clinical validation before scale-up
- Sustainable financing: Government integration and budget commitment, not donor dependency
- Knowledge transfer: Local teams can maintain, improve, and adapt systems independently
Physician role:
Physicians in high-income countries evaluating LLM vendors or research partnerships should demand evidence of co-development, equitable partnerships, and sustainable implementation rather than extractive “deploy and abandon” models. Physicians in LMICs should advocate for local ownership, capacity building, and benefit-sharing rather than accepting passive recipient roles.
Check Your Understanding
Scenario 1: Deploying Western AI in LMIC Without Validation
You’re an internist working with Doctors Without Borders (MSF) in rural Democratic Republic of Congo (DRC). Hospital receives donated AI chest X-ray system (FDA-cleared in U.S.) for pneumonia detection.
AI specifications:
- Trained on 200,000 U.S. chest X-rays (95% from academic medical centers)
- Validation: Sensitivity 92%, specificity 88% on U.S. test set
- FDA 510(k) cleared (2022)
Your context:
- DRC hospital serves a population with high HIV prevalence (8%), TB (350 per 100,000), and malnutrition (18% of adults underweight)
- X-ray machine: 15-year-old analog system (vs. digital systems in U.S. training data)
- No on-site radiologist (you interpret X-rays with basic training)
You deploy AI as primary pneumonia screening tool (all patients with respiratory symptoms get CXR + AI interpretation).
3 months later: A retrospective chart review by a visiting radiologist identifies problems:
- AI sensitivity for TB: 58% (missed 42% of culture-proven TB)
- AI sensitivity for Pneumocystis jirovecii pneumonia (PCP, in HIV+ patients): 34%
- AI false positive rate: 35% (vs. 12% in U.S. validation)
Question 1: Why did AI fail in DRC?
Training data mismatch:
1. Disease prevalence: U.S. training data had <1% TB, <0.1% PCP; DRC has 30%+ TB, 5%+ PCP in respiratory presentations
2. Patient characteristics: U.S. training data mostly non-HIV, normal BMI; DRC population 8% HIV+, 18% underweight (atypical radiographic findings)
3. Image quality: AI trained on digital X-rays; DRC analog system produces lower resolution, different artifact patterns
4. Co-morbidities: AI learned “pneumonia” from U.S. bacterial pneumonia; missed opportunistic infections (PCP, TB) uncommon in training data
Question 2: What harm occurred?
Clinical impact:
- 42% of TB missed, leading to delayed diagnosis, transmission to contacts, and TB mortality
- 66% of PCP missed, resulting in HIV+ patients with untreated PCP and respiratory failure (PCP mortality 30-50% if untreated)
- 35% false positives, leading to unnecessary antibiotics, costs, and patient anxiety
Trust impact:
- Hospital staff lost confidence in AI: “The American computer doesn’t work here”
- Some stopped using AI, others over-relied on negative results
Question 3: Are you liable?
Possibly yes. Key legal/ethical issues:
Standard of care: Even in resource-limited settings, physicians must provide care meeting local standards. Deploying unvalidated AI without understanding its limitations may constitute negligence.
Informed consent: Did you inform patients that AI was not validated in DRC population?
FDA clearance ≠ global clearance: FDA clearance based on U.S. populations; doesn’t guarantee performance in LMICs
Plaintiff (patient or family) argument:
- Physician deployed unvalidated AI
- AI missed TB/PCP, patient died from delayed treatment
- Standard of care requires validation in the deployment population OR not using AI for high-stakes decisions
Defense argument:
- Resource-limited setting, no radiologist available
- AI was FDA-cleared, represented “best available tool”
- Physician acted in good faith with limited resources
Lessons:
1. FDA clearance ≠ universal validity. Validate in the deployment population (a minimal validation-audit sketch follows this list)
2. High-risk populations (HIV, malnutrition, endemic diseases) require local validation
3. If validation is impossible, use AI as an adjunct to (not a replacement for) clinical judgment
4. Document limitations: “AI chest X-ray interpretation not validated in this population; clinical judgment takes precedence”
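To make “validate in the deployment population” concrete, here is a minimal sketch, in Python, of a pre-deployment local validation audit. The records, condition labels, and the 85% sensitivity floor are illustrative assumptions, not values from this case; a real audit would use hundreds of consecutive, microbiologically confirmed cases per condition.

```python
# Minimal local-validation sketch (hypothetical data and thresholds): compare AI
# chest X-ray calls against locally confirmed diagnoses before routine use.
from math import sqrt

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a proportion."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    margin = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - margin, centre + margin)

def audit(records, condition, min_sensitivity=0.85):
    """records: dicts with 'truth' (confirmed diagnosis) and 'ai_positive' (bool)."""
    cases = [r for r in records if r["truth"] == condition]
    detected = sum(r["ai_positive"] for r in cases)
    lo, hi = wilson_ci(detected, len(cases))
    return {
        "condition": condition,
        "n": len(cases),
        "sensitivity": round(detected / len(cases), 2) if cases else None,
        "ci95": (round(lo, 2), round(hi, 2)),
        # Require the *lower* CI bound to clear the floor, not the point estimate
        "meets_floor": lo >= min_sensitivity,
    }

# Hypothetical audit sample; real audits need far larger, consecutive case series
sample = [
    {"truth": "TB", "ai_positive": True},
    {"truth": "TB", "ai_positive": False},
    {"truth": "PCP", "ai_positive": False},
    {"truth": "bacterial_pneumonia", "ai_positive": True},
]

for condition in ("TB", "PCP", "bacterial_pneumonia"):
    print(audit(sample, condition))
```

Requiring the lower confidence bound, rather than the point estimate, to clear the floor means a small, lucky sample cannot certify the tool; an audit structured this way would have surfaced the TB and PCP sensitivity gaps before the AI became the primary screening pathway.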
Scenario 2: Data Partnership with Benefit-Sharing Failure
You’re chief of medicine at a teaching hospital in Nairobi, Kenya. A U.S. university approaches with a proposal:
Proposal: “We’re developing AI for early sepsis detection. We need diverse African data to make our model generalizable. Can you provide 50,000 patient records (ICU admissions, vitals, labs, outcomes)? In return, we’ll co-author you on publications and acknowledge your hospital.”
You agree. The hospital IT department exports 50,000 de-identified records and sends them to the U.S. university.
18 months later:
- U.S. team publishes 3 papers in Nature Medicine, JAMA, Critical Care Medicine
- Your hospital listed in acknowledgments (not co-authorship as promised)
- AI commercialized: U.S. startup licenses technology, raises $25M Series A funding
- Startup offers to sell AI back to your hospital: $50,000 annual licensing fee
Your hospital administration: “We provided the data for free, now they want us to pay $50K/year for our own data? This is exploitation.”
Question 1: What went wrong?
Classic data extraction:
1. Verbal agreement only: No written contract specifying co-authorship, IP rights, benefit-sharing
2. Data transfer without equity stake: Hospital gave away valuable asset (50K patient records) for vague promises
3. No benefit-sharing clause: U.S. team commercialized AI, Kenyan hospital received nothing
4. Insufficient oversight: Hospital didn’t involve legal team, tech transfer office before data sharing
Question 2: Do you have legal recourse?
Weak legal position:
- No written contract → hard to enforce verbal promises
- Data was “de-identified” → hospital may not have ownership claim
- Kenya data protection laws (2019) were new, untested in court
- U.S. university likely claims IP ownership (researchers developed AI)
Possible arguments:
- Breach of verbal contract (co-authorship promised, not delivered)
- Unjust enrichment (U.S. team profited from Kenyan data without compensation)
- Data sovereignty violation (the Kenya Data Protection Act, 2019, restricts cross-border transfer and processing of personal data)
Likely outcome: Costly litigation with uncertain results, or negotiated settlement (hospital gets discounted AI access, not equity/revenue share)
Question 3: What should you have done differently?
Before sharing data:
A written data-sharing agreement specifying:
- Co-authorship: Kenyan researchers as co-first/co-corresponding authors on all publications
- IP ownership: shared IP rights (hospital owns the data, university owns the algorithm, both jointly own the AI)
- Benefit-sharing: if commercialized, the hospital receives an equity stake (5-10%) or royalties (10-15% of revenues)
- Free access: the resulting AI is provided to the hospital at no cost
- Capacity-building: the U.S. team trains 3-5 Kenyan researchers in AI development
- Data governance: data hosted in Kenya, or federated learning with no raw-data export (see the sketch after this scenario’s lesson)
Institutional approvals:
- Legal team review
- Ethics committee approval
- Tech transfer office involvement (protect IP)
- Data protection officer review (Kenya Data Protection Act compliance)
Pilot first: Start with 1,000 records, evaluate partnership quality before sharing 50,000
Lesson: Data is valuable. Demand equitable partnerships, not exploitation. LMIC hospitals should negotiate from a position of strength: you have the data they need, so demand fair value.
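To illustrate the “federated learning, no raw-data export” option listed in the agreement above: each site trains on its own records and shares only model weights, which a coordinator averages (FedAvg-style), so patient rows never leave the hospital. The following is a minimal sketch with synthetic data; the feature set, model, and hyperparameters are hypothetical, and a production system would add secure aggregation, governance of the shared model, and audit rights.

```python
# Minimal federated-averaging sketch (synthetic data, illustrative only):
# two hospitals jointly train a logistic-regression sepsis model by exchanging
# model weights, never raw patient records.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Gradient-descent steps on one site's private data; returns updated weights."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1 / (1 + np.exp(-X @ w))            # sigmoid
        w -= lr * (X.T @ (preds - y)) / len(y)      # logistic-loss gradient step
    return w

def make_site(n_patients):
    """Synthetic 'private' dataset: 4 hypothetical vital/lab features per patient."""
    X = rng.normal(size=(n_patients, 4))
    y = ((X @ np.array([1.0, -0.5, 0.8, 0.2]) + rng.normal(0, 0.5, n_patients)) > 0).astype(float)
    return X, y

sites = [make_site(500), make_site(500)]             # e.g., Nairobi site and partner site
global_weights = np.zeros(4)

for communication_round in range(10):
    # Only the 4 weight values cross institutional boundaries each round
    local_weights = [local_update(global_weights, X, y) for X, y in sites]
    global_weights = np.mean(local_weights, axis=0)  # FedAvg-style: simple average

print("Federated model weights:", np.round(global_weights, 2))
```

Equal-weight averaging works here because the synthetic sites are the same size; FedAvg proper weights each site’s update by its number of examples, which matters when partners contribute datasets of very different sizes.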
Scenario 3: AI Exacerbating Health Disparities
You’re a public health official in the Ministry of Health, Bangladesh. The government is considering a national rollout of an AI-powered telemedicine platform for rural primary care clinics.
Platform features:
- Smartphone app (patients video-call physician + AI clinical decision support)
- AI provides differential diagnosis, treatment recommendations based on symptoms + patient history
- Targets rural areas with physician shortages (1 physician per 50,000 people)
Deployment plan:
- Distribute subsidized smartphones ($100 each) to 100,000 rural households
- Train 500 community health workers (CHWs) to assist patients using the app
- Budget: $15 million over 3 years
12 months post-launch, evaluation shows:
Utilization by wealth quintile:
| Wealth Quintile | % Using Telemedicine | % Using Traditional Clinics | Combined Use (can exceed 100% when households use both) |
|---|---|---|---|
| Richest 20% | 62% | 45% | 107% |
| Middle 60% | 28% | 38% | 66% |
| Poorest 20% | 9% | 22% | 31% |
Digital divide drivers:
- Poorest 20% often lack literacy (41% illiterate) and struggle with smartphone app despite CHW assistance
- Smartphone distribution focused on households with electricity (excluded 38% of poorest quintile)
- Data costs (mobile internet): $2-5/month equals 5-10% of poorest households’ income, making it unaffordable despite subsidized smartphones
- Language: App in Bengali only, excluded 12% of population speaking minority languages (Chittagonian, Sylheti, tribal languages)
Health equity impact:
- Richest quintile: healthcare access increased 7 percentage points (telemedicine used in addition to clinics)
- Poorest quintile: healthcare access decreased 9 percentage points (traditional clinics closed under the assumption of “telemedicine coverage”, but the poorest can’t access telemedicine)

Result: the access gap between richest and poorest widened by 16 percentage points (+7 for the richest, -9 for the poorest).
Question 1: Why did telemedicine worsen equity?
Inverse care law (Julian Tudor Hart, 1971): “The availability of good medical care tends to vary inversely with the need for it in the population served.”
Applied to AI:
- Telemedicine benefited those with smartphones, literacy, internet, electricity (already better-off)
- Excluded those lacking these resources (poorest, who need healthcare most)
- Clinic closures harmed poorest (who depended on in-person care) while richest benefited from telemedicine
Question 2: What should have been done differently?
Equity-focused design:
Universal access prerequisites:
- Ensure electricity, internet, smartphones, and literacy support BEFORE telemedicine rollout
- OR: design for low-tech (SMS and voice calls, not video; feature phones, not smartphones)
Hybrid model:
- Telemedicine supplements clinics (not replaces)
- Maintain in-person care for those who can’t access digital tools
Targeted support for poorest:
- Free data plans (not just subsidized phones)
- Voice-based apps (for illiterate users)
- Minority language support
- CHW home visits for those unable to use technology
Equity monitoring:
- Track utilization by wealth, literacy, and language from Month 1 (a minimal monitoring sketch follows this answer)
- If disparities emerge, pause deployment and redesign
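A minimal sketch of what such monitoring could look like, assuming the program can match monthly platform users and clinic registers to wealth quintiles; the counts and the 0.5 equity-ratio threshold below are hypothetical, not program requirements.

```python
# Minimal equity-monitoring sketch (hypothetical counts and threshold): flag the
# rollout for review when telemedicine uptake in the poorest quintile falls too
# far below uptake in the richest quintile.
def equity_check(users_by_group, population_by_group, min_ratio=0.5):
    """Return utilization by group and whether poorest/richest uptake clears min_ratio."""
    rates = {g: users_by_group[g] / population_by_group[g] for g in users_by_group}
    ratio = rates["poorest_20"] / rates["richest_20"] if rates["richest_20"] else 0.0
    return {
        "utilization": {g: round(r, 2) for g, r in rates.items()},
        "poorest_to_richest_ratio": round(ratio, 2),
        "action": "continue" if ratio >= min_ratio else "pause and redesign",
    }

# Hypothetical Month-3 counts mirroring the utilization pattern in the table above
users = {"richest_20": 12400, "middle_60": 16800, "poorest_20": 1800}
population = {"richest_20": 20000, "middle_60": 60000, "poorest_20": 20000}

print(equity_check(users, population))
# Uptake: richest 0.62, middle 0.28, poorest 0.09 -> ratio 0.15, action "pause and redesign"
```

The same check can be run for literacy and language groups; the point is that the trigger for pausing is specified before launch, not debated after disparities have already widened.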
Question 3: How to fix the program now?
Immediate actions:
- Reopen closed clinics in poorest areas (don’t rely solely on telemedicine)
- Free data plans for poorest quintile (government subsidy to mobile operators)
- Voice-based app version for illiterate users
- Multilingual support (Chittagonian, Sylheti, tribal languages)
- CHW-assisted telemedicine: CHWs help poorest households access platform (human-in-the-loop)
Lesson: AI can widen disparities if it is designed for already-privileged populations. Equity requires intentional design for the most marginalized, not just “average” users. Monitor equity impacts from the start and correct course quickly.
Key Takeaways
HIC Validation ≠ LMIC Validation: AI validated in HIC populations often fails in LMICs (10-30% performance degradation is common). Require local validation before deployment.
Low-Tech Often Better: MomConnect (SMS chatbot) achieved 4M+ users, 15:1 ROI. High-tech smartphone apps often fail in LMIC settings. Match technology to infrastructure.
Data Extraction ≠ Partnership: Demand benefit-sharing (equity, royalties, free access) when sharing LMIC data for AI development. Reject exploitation.
Algorithmic Bias Is Real: AI trained on Western populations shows 10-50% worse performance on underrepresented groups. Diversify training data, conduct bias audits.
Infrastructure Matters: 770M people lack electricity and 2.8B lack internet access. Design for offline operation, low power, and robustness to poor data quality when targeting global health.
Equity Requires Intention: AI follows the “inverse care law”, benefiting privileged populations unless explicitly designed for marginalized groups. Monitor equity impacts from Day 1.
Capacity-Building > Extraction: Train local AI researchers, leave sustainable infrastructure. Extractive partnerships perpetuate dependency.
Physician Advocacy Role: Demand equitable AI development, reject unvalidated deployments, support open-source tools, partner with LMIC institutions as equals.