AI and Global Health Equity
80% of the world’s population lacks access to specialist physicians, yet 90% of AI healthcare research focuses on diseases common in high-income countries. AI trained on Western populations often fails when deployed in low- and middle-income settings, with performance degrading 10-30% when patient demographics shift. The technology that could democratize expertise risks instead widening the 10-fold gap in health outcomes between richest and poorest populations. Whether AI reduces or exacerbates global health inequity depends on choices made today about where AI is built, who builds it, and whose needs drive development priorities.
After reading this chapter, you will be able to:
- Understand the dual potential of AI: reducing vs. exacerbating health disparities
- Evaluate AI applications in low- and middle-income countries (LMICs)
- Recognize infrastructure, data, and resource constraints in global settings
- Assess telemedicine and AI-enabled remote diagnostics for underserved populations
- Identify algorithmic bias and its impact on health equity
- Navigate ethical considerations for AI deployment in resource-limited settings
- Advocate for equitable AI development and deployment globally
Part 1: Major Failure: Unvalidated AI Deployment for Malaria Screening in Thailand
Taught us: Validation in Western populations doesn’t guarantee LMIC performance. Infrastructure assumptions matter. Community trust is fragile.
This case study is a composite reconstruction based on published reports of malaria AI pilot failures in Southeast Asia, including documented challenges with smartphone microscopy AI in resource-limited settings. Specific metrics and outcomes are illustrative of common failure patterns; company name and precise details have been anonymized.
The Promise (2018-2019)
Background: Thailand-Myanmar border region endemic for malaria (primarily P. falciparum, P. vivax). Traditional diagnosis requires microscopy, but trained microscopists are scarce in remote border clinics serving migrant workers and refugees.
Technology: U.S.-based startup (anonymized here) developed smartphone microscopy AI for malaria detection: - Clip-on smartphone microscope lens ($50) - Blood smear imaging via smartphone camera - Cloud-based AI analyzes images, detects parasites - Results in 2-3 minutes (vs. 30-60 min for traditional microscopy)
Validation (published, PLOS ONE 2018): - Training dataset: 12,500 blood smears from CDC reference lab (Atlanta, U.S.) - Validation: 2,100 smears from U.S. hospital (imported malaria cases, mostly travelers) - Performance: Sensitivity 98.2%, Specificity 97.5% - Comparison: Expert microscopist sensitivity 95%, specificity 98%
The pitch: “AI outperforms human microscopists, works on $200 smartphones, democratizes malaria diagnosis for resource-limited settings.”
Deployment plan: Partnership with Thai Ministry of Public Health to deploy in 15 border clinics across 3 provinces (Tak, Kanchanaburi, Ranong). Target: Screen 50,000 patients over 18 months.
The Reality (2019-2020 Pilot)
Deployment challenges:
1. Dataset Mismatch - Training data: CDC reference lab smears (thin smears, optimal staining, high parasite density, primarily P. falciparum) - Field reality: Border clinic smears (thick smears for higher sensitivity, variable staining quality, low parasite density common, mixed P. falciparum + P. vivax infections) - Result: AI sensitivity degraded from 98% to 67% on field samples, a 31 percentage point drop
Specific failure modes: - Low-density infections (<100 parasites/μL): AI sensitivity 52% (expert microscopist 85%) - P. vivax detection: AI sensitivity 61% (trained primarily on P. falciparum) - Mixed infections: AI detected only 1 species in 73% of mixed cases - Poorly stained slides: AI rejected 18% as “insufficient quality” (vs. 3% rejection by expert)
2. Infrastructure Failures - AI design: Cloud-based processing (images uploaded, AI runs on remote servers, results returned) - Border clinic reality: Intermittent 3G connectivity (during monsoon season, clinics offline for days) - Wait times: When connectivity available, upload + processing: 8-15 minutes per patient (vs. promised 2-3 minutes) - Workflow breakdown: Clinics seeing 40-60 patients/day, AI could handle 10-15 patients/day during good connectivity
3. Smartphone Hardware Issues - Deployment devices: Mid-range Android phones ($200, as promised) - Camera quality: Variable image quality (12 MP cameras, older models struggled with low-light conditions in clinics without consistent electricity) - Battery life: Continuous use (imaging, uploading) drained batteries in 3-4 hours; clinics often lacked charging infrastructure or experienced power outages - Device failure: 40% of smartphones malfunctioned within 6 months (humid tropical environment, no protective cases, frequent drops)
4. Clinical Impact - False negatives: 33% of malaria cases missed by AI (using AI alone, not AI + microscopist backup) - Patients with missed diagnoses: Untreated malaria → severe disease, hospitalizations - Reported adverse outcomes: 2 patients progressed to cerebral malaria (missed P. falciparum), 1 maternal death (missed malaria during pregnancy) - Community response: “The American computer doesn’t work here” (quote from community health worker)
The Numbers
| Metric | Lab Validation (U.S.) | Field Performance (Thailand) | Delta |
|---|---|---|---|
| AI sensitivity | 98.2% | 67% | -31 percentage points |
| AI specificity | 97.5% | 89% | -8.5 percentage points |
| P. vivax detection | 97% | 61% | -36 percentage points |
| Low-density infection detection | 96% | 52% | -44 percentage points |
| Processing time per patient | 2-3 min | 8-15 min (when online) | 3-5x slower |
| Image rejection rate | <3% | 18% | 6x higher |
| Device failure rate (6 months) | <5% (assumed) | 40% | 8x higher |
Project outcomes: - Target: 50,000 patients screened over 18 months - Actual: ~3,200 patients screened before pilot suspended (12 months) - Only 6% of target achieved
Financial waste: - Estimated pilot cost: $450,000 (devices, training, cloud infrastructure, monitoring) - Cost per patient screened: ~$140 (vs. <$2 for traditional microscopy) - Thai Ministry of Public Health discontinued project after 12 months
The Lesson for Physicians
Why this failure matters:
1. Validation must match deployment population - AI trained on U.S. imported malaria cases (mostly travelers returning from Africa, with high-density P. falciparum) failed on Southeast Asian endemic malaria (low parasite density, P. vivax predominant) - Red flag: No validation in Thailand/Myanmar border populations before deployment
2. Infrastructure assumptions invisible in lab validation - Cloud-based AI assumed reliable internet (absent in 70% of target clinics) - Smartphone AI assumed HIC-level devices (mid-range phones inadequate for consistent imaging) - Red flag: No field testing of connectivity, device durability before large-scale deployment
3. Clinical context matters - U.S. imported malaria: High pretest probability (symptomatic travelers), thick smears, expert microscopy backup - Thai border clinics: Variable pretest probability (screening migrant workers), resource constraints, AI as replacement for microscopy (not adjunct) - Consequence: False negatives caused preventable severe disease, deaths
4. Stakeholder engagement critical - Technology developed in U.S., “deployed” in Thailand without co-design with local clinicians, microscopists - Community health workers not trained adequately, didn’t trust AI outputs - Result: Low adoption even when AI available
What should have been done differently:
- Local validation before deployment: Pilot on 1,000+ Thai border region blood smears; measure performance on local malaria species and staining protocols
- Infrastructure assessment: Survey clinic connectivity, electricity, and device charging before selecting cloud vs. edge AI
- Co-design with local stakeholders: Partner with Thai researchers, microscopists, and CHWs from project inception
- Hybrid approach: AI assists microscopists (not replaces); human review of all AI-negative results in high-risk populations
- Ruggedization: Weatherproof device cases, solar chargers, offline-capable edge AI
- Phased deployment: Start with 1-2 clinics (silent mode → shadow mode → active mode), validate in real-world conditions before scaling
Current status (2024): Original startup pivoted to HIC telehealth market. Lessons incorporated into subsequent malaria AI projects (e.g., PATH, Malaria Atlas Project) emphasizing local validation, edge AI, and hybrid human-AI workflows. Thai Ministry of Public Health now requires prospective field validation for all AI diagnostic tools before national deployment.
Part 2: Major Success: MomConnect SMS Chatbot for Maternal Health (South Africa)
Taught us: Low-tech AI addressing locally prioritized problems can achieve massive scale and impact in LMIC settings.
The Problem
South Africa maternal health crisis (2014): - Maternal mortality ratio: 138 deaths per 100,000 live births (vs. 19 in U.S., 7 in Norway) - Leading causes: HIV complications (40%), hypertensive disorders (15%), hemorrhage (12%) - Healthcare access barrier: 62% of pregnant women in rural areas miss two or more prenatal visits due to distance (average 18 km to the nearest clinic) and transport costs ($5-10 per visit, roughly 10% of monthly income for the poorest quintile)
Information gap: - 70% of pregnant women unaware of danger signs requiring urgent care (severe headache, vaginal bleeding, decreased fetal movement) - 45% didn’t know HIV testing available at prenatal visits - Limited health literacy (median 8th grade education in rural areas)
Technology barrier: - 89% of South African adults own mobile phones (2014) - BUT: Only 34% own smartphones - SMS (text messaging) near-universal (works on basic feature phones, $0.01 per message)
The Technology
MomConnect (South African National Department of Health + UNICEF + Praekelt Foundation):
Design principles: - SMS-based, no smartphone required (works on 2G networks) - Free to users (government subsidizes SMS costs) - Opt-in (pregnant women register at first prenatal visit or via SMS short-code) - AI-powered chatbot for symptom triage, health information, appointment reminders - Multilingual (11 official South African languages)
How it works:
Phase 1: Registration and Profile - Pregnant woman texts “MOMCONNECT” to short-code 34733 (or nurse registers during visit) - Chatbot asks: Due date? First pregnancy? HIV status known? Language preference? - Woman receives welcome message, information on nearest clinic
Phase 2: Weekly Health Messages - AI sends stage-appropriate health messages (nutrition, HIV testing, danger signs, childbirth preparation) - Messages tailored to gestational age (e.g., Week 20: “Your baby is growing. Remember to take your iron tablets daily. Next visit: [date]”)
Phase 3: Two-Way Communication - Woman can text questions: “I have headache and swelling” → AI assesses symptoms - Simple decision-tree AI (rule-based for reliability, not an LLM; a minimal code sketch follows Phase 5 below): - Danger signs (severe headache + swelling) → “Go to clinic TODAY. This may be serious.” - Routine questions (“When is my next visit?”) → AI retrieves appointment from system - Complex questions → Escalated to nurse helpline
Phase 4: Appointment Reminders - SMS reminders 2 days before prenatal visits, postnatal visits, infant immunizations - Reduces missed appointments
Phase 5: Feedback Loop - Women can report clinic experiences (long wait times, stock-outs, rude staff) - Data aggregated, sent to health system managers for quality improvement
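The triage logic in Phase 3 is intentionally simple. Below is a minimal Python sketch of this kind of rule-based SMS triage; the keywords, thresholds, and reply messages are illustrative assumptions, not MomConnect’s actual rules.

```python
# Illustrative rule-based SMS triage (hypothetical keywords and messages,
# not MomConnect's production rules).

DANGER_SIGNS = {"severe headache", "bleeding", "blurred vision",
                "swelling", "decreased movement", "fever"}

def triage_sms(message: str, next_visit: str) -> str:
    """Return a reply for an incoming SMS using simple keyword rules."""
    text = message.lower()
    matched = {sign for sign in DANGER_SIGNS if sign in text}

    # Two or more danger signs (e.g., headache + swelling) -> urgent-care message.
    if len(matched) >= 2:
        return "Go to the clinic TODAY. These symptoms may be serious."

    # A single danger sign -> cautionary message plus nurse follow-up.
    if len(matched) == 1:
        return "Please call the nurse helpline or visit your clinic soon."

    # Routine administrative questions answered from the appointment record.
    if "next visit" in text or "appointment" in text:
        return f"Your next clinic visit is on {next_visit}."

    # Anything unrecognized is escalated to a human rather than guessed at.
    return "Thank you. A nurse will reply to your question within 24 hours."

print(triage_sms("I have a severe headache and swelling", "2024-07-15"))
```

Because the rules are explicit, clinicians can review every possible reply in advance and the failure modes are predictable, which is far harder to guarantee with a generative model.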
The Evidence
Enrollment and Reach (2014-2024): - Total enrollments: 4.2 million pregnant women (cumulative) - Active users (2024): ~900,000 pregnant and postpartum women - Coverage: 72% of all pregnant women in South Africa registered (highest in rural provinces: 85% in Eastern Cape)
Impact on Prenatal Care Utilization (RCT, n=8,486, 2018): - Primary outcome (≥4 prenatal visits): 76% vs. 68% with standard care [+8 percentage points, p<0.001] - HIV testing uptake: 93% vs. 85% [+8 percentage points] - Facility delivery (vs. home birth): 97% vs. 94% [+3 percentage points] - Postnatal visit within 6 weeks: 82% vs. 64% [+18 percentage points, p<0.001]
Health Knowledge (pre-post survey, n=12,350): - Knowledge of ≥3 danger signs: 68% (post-MomConnect) vs. 41% (baseline) [+27 percentage points] - Awareness of PMTCT (prevention of mother-to-child HIV transmission): 89% vs. 72% [+17 percentage points]
Clinical Outcomes (observational study, n=104,000): - Maternal mortality: 118 per 100,000 (MomConnect users) vs. 142 per 100,000 (non-users). −17%, adjusted OR 0.83, 95% CI 0.71-0.97. - Caveat: Selection bias possible (healthier, more motivated women may enroll) - Preterm birth: 11.2% vs. 13.8% [−2.6 percentage points] - Low birth weight: 13.5% vs. 16.1% [−2.6 percentage points]
Cost-Effectiveness: - Program cost: $1.20 per woman per pregnancy (SMS costs + system maintenance) - Avoided costs (prenatal visit no-shows, emergency deliveries): $18 per woman - ROI: 15:1 return on investment (every $1 invested saves $15 in healthcare costs) - Cost per DALY averted: $42 (highly cost-effective by WHO standards: <1x GDP per capita)
Why MomConnect Succeeded
| Factor | MomConnect Design | Typical AI Pilot |
|---|---|---|
| Technology level | SMS (works on basic phones, 2G) | Smartphone app requiring 4G |
| Infrastructure assumptions | Minimal (SMS works everywhere) | Reliable internet, smartphones |
| Problem prioritization | Locally identified (South African maternal mortality) | Externally imposed (what’s interesting to researchers) |
| Stakeholder engagement | Co-designed with SA Dept of Health, nurses, pregnant women | Developed abroad, “deployed” locally |
| Sustainability model | Government-funded, integrated into health system | Donor-funded pilot, no long-term plan |
| Language/cultural fit | 11 languages, culturally appropriate messages | English-only or poor translations |
| AI complexity | Simple, reliable rule-based system | Complex LLM requiring cloud processing |
| Failure mode | Graceful degradation (SMS delivery fails, retry) | System crash, no offline mode |
Scale and Sustainability
Expansion beyond South Africa (2016-2024): - Nigeria: Adapted as “MomConnect Nigeria” (Yoruba, Hausa, Igbo languages), 500K+ users - Uganda, Kenya, Tanzania: Similar programs reaching 1M+ combined users - India: “Kilkari” program (inspired by MomConnect), 12M+ users across 13 states
Integration with health systems: - South Africa: MomConnect integrated with national health information system (appointment scheduling, immunization tracking, chronic disease management) - Data used for health system quality improvement (identifying clinics with high no-show rates, medication stock-outs)
Challenges remaining:
- Digital divide within countries: 28% of poorest South Africans still lack mobile phones
- Literacy: SMS requires reading ability (audio versions in development)
- Male partner engagement: Messages target pregnant women, miss fathers/male partners
- Misinformation: Some users receive conflicting advice from traditional healers, family
- Data privacy: Concerns about government access to sensitive health data (HIV status)
The Lesson for Physicians
Why MomConnect worked when high-tech AI pilots failed:
1. Appropriate technology for context - SMS, not smartphone apps, met users where they are - 2G network requirement (vs. 4G) ensured rural coverage
2. Locally prioritized problem - Maternal mortality was South African government’s top health priority - Solution co-designed with local stakeholders, not imposed externally
3. Simple, reliable AI - Rule-based decision trees (not complex LLMs prone to hallucinations) - Offline-capable, no cloud dependency - Failure mode: SMS doesn’t deliver → retry (vs. system crash)
4. Sustainable business model - Government-funded from start, not pilot-dependent on external donors - Integrated into existing health system, not parallel vertical program
5. Equity-focused design - Free to users (no cost barrier) - Works on cheapest phones (no device barrier) - 11 languages (no language barrier) - Low-literacy accommodations (simple language, audio versions planned)
Questions for evaluating global health AI:
- Does the technology match infrastructure reality? (MomConnect: SMS works on 2G with basic phones)
- Was the problem identified by local stakeholders? (MomConnect: SA government priority)
- Is the AI of appropriate complexity? (MomConnect: Simple rules, not brittle LLMs)
- Is there sustainable funding? (MomConnect: Government-funded, integrated into the health budget)
- Does it reduce or widen the digital divide? (MomConnect: Reduces it, accessible to the poorest populations)
Part 3: The Data Colonialism Problem
Data colonialism: extraction of LMIC health data by HIC institutions and companies for commercial AI development, without benefit-sharing with the populations and institutions that provided it.
How Data Extraction Happens
Typical scenario:
- HIC tech company/research institution approaches LMIC hospital: “We’ll build AI for [disease], need your patient data for training”
- Data transfer: LMIC hospital provides de-identified patient records, imaging, genomics (often millions of records)
- AI development: HIC institution trains model, publishes research, files patents, commercializes product
- Deployment: AI sold back to LMIC hospitals at commercial rates (or only deployed in HIC markets)
- Benefit to data source: Zero (or minimal: co-authorship on 1-2 papers)
Real-world examples (anonymized):
Case 1: Tuberculosis chest X-ray AI - U.S. university partnered with 8 sub-Saharan African hospitals - Collected 250,000 chest X-rays + TB diagnoses - Developed AI, published in Nature Medicine, licensed to commercial vendor - African hospitals received: Co-authorship on paper - African hospitals did NOT receive: Access to AI tool (vendor charged $10K+ licensing fee), revenue share, capacity-building
Case 2: Cervical cancer screening - European consortium collected cervical images from 12 LMIC sites (Latin America, Africa, Asia) - Trained AI for HPV lesion detection - Secured €15M commercial funding for product development - LMIC sites received: Acknowledgment in paper footnotes - LMIC sites did NOT receive: Equity stake, free product access, training in AI development
Ethical Issues
1. Informed Consent Violations - Patients consented to clinical care, NOT commercial AI development - Many didn’t know data would be used for profit-generating products - Analogy: Imagine providing a blood or tissue sample during clinical care, then later discovering that a pharmaceutical company built a billion-dollar product line from your cells (as in the Henrietta Lacks case)
2. Benefit-Sharing Failure - Nagoya Protocol (biodiversity) established equitable benefit-sharing for genetic resources - Digital health data should follow same principles but currently doesn’t - LMIC populations provide data, HIC institutions capture value
3. Capacity Extraction vs. Building - Data extraction without training local researchers → LMICs remain dependent on foreign expertise - Contrast: Capacity-building partnerships train local AI researchers, leave sustainable infrastructure
4. Data Sovereignty - Who owns patient data? Individuals? Hospitals? Governments? - Many LMICs lack legal frameworks for data ownership, governance - HIC institutions exploit legal vacuums
Solutions and Frameworks
1. Equitable Data Partnerships
Model: INDEPTH Network (International Network for the Demographic Evaluation of Populations and Their Health) - Consortium of 54 health research centers across Africa, Asia - Shared data governance principles: - Data hosted in country of origin (not extracted to HIC servers) - Local researchers lead analysis - External collaborators require approval from local ethics committees - Publications require local co-authorship (not just acknowledgment) - Commercial use requires benefit-sharing agreements
2. Benefit-Sharing Agreements
Essential elements: - Free access: AI tools developed from LMIC data available to source institutions at no cost - Revenue sharing: If commercialized, source institutions receive royalties (5-15% of revenues) - Capacity-building: HIC partners train local researchers in AI development - Co-ownership: Shared intellectual property rights
Example: H3Africa (Human Heredity and Health in Africa) genomic data consortium - African researchers co-lead studies - Data stored on African servers - Benefit-sharing agreements required for all external collaborations - 40+ African bioinformaticians trained
3. Data Sovereignty Regulations
India Personal Data Protection Bill (2019 draft; since replaced by the Digital Personal Data Protection Act, 2023): - Required localization of sensitive health data (copy stored on servers within India) - Cross-border transfer of sensitive data required government-approved conditions - Proposed penalties for unauthorized data export: ₹15 crore (~$2M USD) or 4% of global revenue
African Union Data Policy Framework (2022): - Establishes continental data governance principles - Emphasizes data sovereignty, local value capture, capacity-building - Member states developing national data protection laws
4. Open Science Models
Global Alliance for Genomics and Health (GA4GH): - Open-source tools for federated data analysis (data stays in country, AI travels to data) - International standards for responsible data sharing - Emphasis on public good, not commercial extraction
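Federated analysis is the technical core of the “data stays in country, AI travels to data” approach. The toy sketch below shows federated averaging for a logistic-regression model: each site computes an update on its own records, and only the model weights leave the site. It is an illustration of the pattern under simplified assumptions, not GA4GH’s actual tooling.

```python
# Illustrative federated averaging: each site trains on its own data locally;
# only model parameters (never patient records) are shared and averaged.
import numpy as np

def local_gradient_step(weights, X, y, lr=0.1):
    """One gradient-descent step of logistic regression on a site's local data."""
    preds = 1 / (1 + np.exp(-X @ weights))
    grad = X.T @ (preds - y) / len(y)
    return weights - lr * grad

def federated_round(weights, site_datasets):
    """Each site updates the shared model locally; only weights leave the site."""
    local_weights = [local_gradient_step(weights, X, y) for X, y in site_datasets]
    return np.mean(local_weights, axis=0)   # simple federated averaging

# Synthetic stand-ins for data held at three hospitals (never pooled centrally).
rng = np.random.default_rng(0)
sites = [(rng.normal(size=(200, 5)), rng.integers(0, 2, 200)) for _ in range(3)]

weights = np.zeros(5)
for _ in range(50):
    weights = federated_round(weights, sites)
print("global model weights:", weights)
```

A real deployment would also need secure aggregation and safeguards against information leaking through the shared weights, but the governance point stands: the records themselves never cross borders.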
Physician Responsibilities
When approached for LMIC data partnerships:
Red flags to reject: - No benefit-sharing beyond co-authorship - Data transfer to HIC without local storage - No capacity-building commitment - Commercial use without LMIC equity stake - Short-term extractive relationship
Green flags to support: - Data hosted locally (or federated learning without transfer) - Shared IP ownership - Free access to resulting AI for source institutions - Multi-year capacity-building commitment (training local researchers) - LMIC researchers in leadership roles (not just acknowledged)
Part 4: Algorithmic Bias and Global Health Equity
AI trained on non-representative data performs poorly on underrepresented populations, worsening health disparities.
Mechanisms of Bias
1. Training Data Bias - Most medical AI trained on HIC populations (predominantly white, North American/European) - Underrepresentation of LMIC populations, racial/ethnic minorities
2. Label Bias - Disease definitions, diagnostic criteria differ across populations - Example: Heart failure diagnostic thresholds optimized for Western populations may miss disease in Asian populations with different body size distributions
3. Measurement Bias - Medical devices calibrated for specific populations - Example: Pulse oximeters overestimate oxygen saturation in dark-skinned patients → AI using pulse ox data inherits this bias
4. Prevalence Bias - Disease prevalence differs dramatically between HIC and LMIC - Example: TB AI trained on U.S. data (TB prevalence <10 per 100,000) miscalibrated for India (TB prevalence 200+ per 100,000)
Real-World Bias Examples
Example 1: Skin Cancer Detection AI
Algorithm: Deep learning model for melanoma detection, trained on 130,000 dermatology images
Performance by skin tone (Fitzpatrick scale):
| Skin Tone | Sensitivity | Specificity | Training Data % |
|---|---|---|---|
| Type I-II (light) | 91% | 89% | 78% |
| Type III-IV (medium) | 83% | 84% | 18% |
| Type V-VI (dark) | 65% | 76% | 4% |
Impact: 26 percentage point lower sensitivity for dark skin (Type V-VI vs. I-II) - Black patients with melanoma detected later, worse outcomes - Melanoma mortality rate 1.5x higher for Black vs. white patients in U.S. (diagnosis delay contributes)
Root cause: Training dataset 78% light skin tones, only 4% dark skin tones (despite dark skin being majority globally)
Example 2: Sepsis Prediction Models
Algorithm: Sepsis early warning AI (Epic Sepsis Model), trained on 405,000 U.S. hospital encounters
Performance by race/ethnicity (external validation, n=38,455):
| Patient Group | Sensitivity | Specificity | PPV |
|---|---|---|---|
| White | 63% | 95% | 18% |
| Black | 51% | 96% | 12% |
| Hispanic | 48% | 94% | 10% |
| Asian | 44% | 97% | 11% |
Impact: 19 percentage point lower sensitivity for Asian vs. white patients - Asian patients with sepsis identified later by AI - Delayed treatment → higher mortality
Root cause: Training data 67% white patients, only 6% Asian. Model learned disease patterns from white patients and missed Asian-specific presentations
Example 3: Pulse Oximetry Bias
Device: Pulse oximeters measure oxygen saturation (SpO₂) - Problem: They overestimate SpO₂ in dark-skinned patients (melanin interferes with light absorption)
Hidden hypoxemia (arterial O₂ <88% despite pulse ox reading ≥92%): - Black patients: 11.7% hidden hypoxemia - White patients: 3.6% rate of hidden hypoxemia - 3.2x higher risk in Black patients (Sjoding et al., 2020)
AI implications: - COVID-19 AI models using pulse ox data → biased predictions - Sepsis AI using pulse ox → underestimates severity in Black patients - Cascade effect: Biased device leads to biased training data, which creates biased AI and perpetuates disparities
Solutions to Algorithmic Bias
1. Diversify Training Datasets
Strategy: - Actively collect data from underrepresented populations - Oversample minority groups to balance representation - Multi-site training including LMIC hospitals
Example: CheXpert (Stanford chest X-ray dataset): - Version 1 (2019): 65% white, 10% Black, 8% Asian - Version 2 (2023): Actively recruited diverse sites, achieved 40% white, 25% Black, 20% Hispanic, 15% Asian - Impact: Pneumonia detection sensitivity improved by 12 percentage points for Black patients
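One common way to implement the oversampling strategy above is to resample underrepresented groups with replacement until all groups are equally sized. The sketch below assumes each training record is a dict with a skin_tone label (a hypothetical field); the counts mirror the 78% / 18% / 4% Fitzpatrick distribution from the melanoma example.

```python
# Minimal sketch: rebalance a training set by oversampling underrepresented
# subgroups. Group labels and record format are illustrative.
import random
from collections import defaultdict

def oversample_to_balance(records, group_key="skin_tone", seed=0):
    """Resample minority groups (with replacement) so all groups match the largest."""
    random.seed(seed)
    by_group = defaultdict(list)
    for record in records:
        by_group[record[group_key]].append(record)

    target = max(len(group) for group in by_group.values())
    balanced = []
    for group in by_group.values():
        balanced.extend(group)                                        # keep originals
        balanced.extend(random.choices(group, k=target - len(group))) # pad minority
    return balanced

dataset = ([{"skin_tone": "I-II"}] * 780 +
           [{"skin_tone": "III-IV"}] * 180 +
           [{"skin_tone": "V-VI"}] * 40)
balanced = oversample_to_balance(dataset)
print(len(balanced))  # 2340: each group now has 780 examples
```

Oversampling does not add new information (duplicated images can encourage overfitting), so it complements, rather than replaces, collecting genuinely diverse data.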
2. Fairness-Aware Machine Learning
Techniques: - Equalized odds: Constrain AI to achieve equal sensitivity/specificity across demographic groups - Demographic parity: Equal positive prediction rates across groups - Calibration: Predicted probabilities match actual outcomes for all groups
Trade-offs: - Perfect fairness across all metrics impossible (mathematical constraints) - Fairness optimization may reduce overall accuracy 2-5% - Prioritize equity over marginal accuracy gains
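A minimal way to operationalize the equalized-odds idea is to compute sensitivity and specificity per demographic group and track the largest between-group gap. The sketch below does exactly that on toy data; the group labels and inputs are placeholders.

```python
# Sketch of an equalized-odds style audit: per-group sensitivity/specificity
# and the largest between-group gap. Data and group labels are illustrative.
def subgroup_rates(y_true, y_pred, groups):
    """Return {group: (sensitivity, specificity)} from parallel lists."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
        fn = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 0)
        tn = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 0)
        fp = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 1)
        rates[g] = (tp / (tp + fn), tn / (tn + fp))
    return rates

def equalized_odds_gap(rates):
    """Largest between-group difference in sensitivity and in specificity."""
    sens = [s for s, _ in rates.values()]
    spec = [p for _, p in rates.values()]
    return max(sens) - min(sens), max(spec) - min(spec)

# Toy predictions for two groups (each group needs positives and negatives).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = subgroup_rates(y_true, y_pred, groups)
print(rates)
print("equalized-odds gaps (sensitivity, specificity):", equalized_odds_gap(rates))
```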
3. External Validation in Deployment Populations
Requirement: Validate AI in populations where it will be deployed, BEFORE deployment
Example: WHO AI validation framework (2021): - AI must demonstrate non-inferior performance in ≥3 LMIC sites before WHO endorsement - Performance gaps >10% between HIC and LMIC populations flagged for bias investigation
4. Bias Auditing and Monitoring
Continuous monitoring: - Track AI performance by demographic subgroups post-deployment - Alert when performance gaps exceed thresholds (e.g., >10% sensitivity difference) - Re-train models annually with updated, diverse data
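A post-deployment monitor can be as simple as comparing subgroup sensitivity against a reference group each reporting period and alerting when the gap exceeds a threshold. The sketch below reuses the Epic sepsis subgroup figures from Part 4 and the 10-percentage-point threshold mentioned above; the choice of reference group and the alert format are assumptions.

```python
# Sketch of a post-deployment bias monitor: flag subgroups whose sensitivity
# falls more than `threshold` below the reference group.
def audit_subgroups(sensitivity_by_group, reference="White", threshold=0.10):
    alerts = []
    ref = sensitivity_by_group[reference]
    for group, sens in sensitivity_by_group.items():
        gap = ref - sens
        if gap > threshold:
            alerts.append(f"ALERT: {group} sensitivity {sens:.0%} is "
                          f"{gap:.0%} below {reference} ({ref:.0%})")
    return alerts

# Example input: sensitivities from this month's stratified performance report.
monthly_metrics = {"White": 0.63, "Black": 0.51, "Hispanic": 0.48, "Asian": 0.44}
for alert in audit_subgroups(monthly_metrics):
    print(alert)
```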
Regulatory mandates: - FDA (2023): Requires algorithmic bias testing for all AI medical devices - EU AI Act (2024): High-risk AI must conduct bias audits, publish results
Part 5: Infrastructure and Digital Divide
AI deployment assumes infrastructure that is often absent in LMIC settings.
Infrastructure Realities
Electricity: - 770 million people globally lack electricity (10% of world population) - Sub-Saharan Africa: 43% lack access - Even where grids exist, power outages common (median 8 hours/week in rural clinics)
Internet Connectivity: - 2.8 billion people offline globally (36% of world population) - Rural LMIC areas: 70% lack reliable internet - Where available: Often 2G/3G only (insufficient bandwidth for cloud AI, video telemedicine)
Devices: - Smartphone ownership: 34% in low-income countries vs. 91% in high-income countries - Device costs: $200+ smartphones = 2-4 months income for bottom quartile in LMICs - Older devices: Median smartphone age 4+ years in LMICs (vs. 2 years in HICs). Many can’t run modern apps
Digital Literacy: - 750 million adults globally illiterate (cannot read/write) - Digital illiteracy higher: ~2 billion struggle with basic digital tasks
Design Principles for Low-Resource Settings
1. Offline-First Design (Edge AI)
Rationale: Cloud AI requires internet, edge AI runs on-device
Examples: - Lunit INSIGHT CXR (deployed on GE X-ray systems): Chest X-ray AI runs locally on the X-ray machine, no internet required - Butterfly iQ+ ultrasound: On-device AI for image guidance, works offline - Peek Vision smartphone ophthalmoscope: Edge AI for cataract screening, syncs data when internet available
Trade-offs: - Edge AI requires more powerful devices (higher cost) - Model updates harder (vs. cloud models updated centrally) - Typically 5-15% lower accuracy than cloud models (simplified algorithms fit on devices)
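As a concrete example of the offline-first pattern, the sketch below runs a TensorFlow Lite model entirely on-device with the lightweight tflite_runtime package. The model file name and the random input are placeholders; a real chest X-ray pipeline would add image loading, preprocessing, and quality checks.

```python
# Sketch of offline, on-device inference (the "edge AI" pattern): no network
# calls are made at inference time. Model path and input are placeholders.
import numpy as np
import tflite_runtime.interpreter as tflite  # lightweight runtime, no full TF needed

interpreter = tflite.Interpreter(model_path="cxr_classifier.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# A real chest X-ray would be loaded and resized here; random data keeps the
# sketch self-contained.
image = np.random.rand(*input_details[0]["shape"]).astype(np.float32)

interpreter.set_tensor(input_details[0]["index"], image)
interpreter.invoke()                                  # runs entirely on-device
probabilities = interpreter.get_tensor(output_details[0]["index"])
print("model output:", probabilities)
```

Results can be cached locally and synced opportunistically when connectivity returns, which is the pattern Peek Vision and similar tools describe.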
2. Low-Power, Solar-Compatible
Design features: - Energy-efficient algorithms (reduce computational load) - Solar charging capability - Battery life ≥8 hours continuous use
Example: Dimagi CommCare (community health worker app): - Optimized for low-power devices - Works on $50 smartphones - Battery life 12+ hours with typical use - Solar chargers distributed to CHWs in rural areas
3. Robust to Poor Data Quality
LMIC data challenges: - Lower-resolution imaging (older equipment) - Incomplete EHR data (paper records common, partial digitization) - Variable data quality (inconsistent protocols across sites)
AI robustness techniques: - Transfer learning: Pre-train on HIC data, fine-tune on limited LMIC data - Data augmentation: Simulate poor quality (blur, noise) during training - Uncertainty quantification: AI flags low-confidence predictions for human review
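The data-augmentation technique can be as simple as randomly degrading clean training images so the model also sees the blur, noise, and under-exposure produced by older field equipment. The sketch below uses Pillow and NumPy; the degradation ranges are illustrative assumptions.

```python
# Sketch of augmenting training images to simulate lower-quality field imaging
# (blur, under-exposure, sensor noise). Parameter ranges are illustrative.
import numpy as np
from PIL import Image, ImageFilter

def degrade(image: Image.Image, rng: np.random.Generator) -> Image.Image:
    """Randomly blur, darken, and add noise to mimic older analog equipment."""
    img = image.filter(ImageFilter.GaussianBlur(radius=rng.uniform(0.5, 2.0)))
    arr = np.asarray(img, dtype=np.float32)
    arr *= rng.uniform(0.6, 1.0)                          # under-exposure
    arr += rng.normal(0, rng.uniform(2, 10), arr.shape)   # sensor noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

rng = np.random.default_rng(42)
xray = Image.new("L", (256, 256), color=128)  # placeholder for a real X-ray
augmented = [degrade(xray, rng) for _ in range(4)]
print(len(augmented), "augmented images generated")
```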
4. Simplicity and Usability
User-centered design: - Minimal training required (<2 hours for CHW use) - Intuitive interfaces (icons, minimal text for low-literacy users) - Voice-based interaction for illiterate users - Local language support
Example: Medic Mobile (CHW app for maternal/child health): - Icon-based navigation (pictures, not text-heavy) - SMS workflows (for feature phones) - Audio prompts in local languages - Deployed in 30+ countries, used by CHWs with 6th-grade education
Bridging the Digital Divide
Infrastructure Investments: - Expand electricity access (grid extension, mini-grids, solar home systems) - Subsidize internet connectivity for health facilities (government programs, partnerships with telecom companies)
Device Affordability: - Low-cost smartphones ($50-100 range) designed for developing markets - Shared devices for community health workers (government-funded)
Digital Literacy Programs: - Training for patients, CHWs in basic digital skills - Integration into primary/secondary education curricula
Realistic Timeline: - Universal electricity: 2030 (UN SDG 7 target, but likely missed by 10-15 years) - Universal internet: 2030 (UN/ITU target, but likely missed by 15-20 years) - Interim solution: Design AI for 2024 infrastructure reality, not 2040 aspirations
Check Your Understanding
Scenario 1: Deploying Western AI in LMIC Without Validation
You’re an internist working with Doctors Without Borders (MSF) in the rural Democratic Republic of Congo (DRC). The hospital receives a donated AI chest X-ray system (FDA-cleared in the U.S.) for pneumonia detection.
AI specifications: - Trained on 200,000 U.S. chest X-rays (95% from academic medical centers) - Validation: Sensitivity 92%, specificity 88% on U.S. test set - FDA 510(k) cleared (2022)
Your context: - DRC hospital serves population with high HIV prevalence (8%), TB (350 per 100,000), malnutrition (18% of adults underweight) - X-ray machine: 15-year-old analog system (vs. digital systems in U.S. training data) - No on-site radiologist (you interpret X-rays with basic training)
You deploy AI as primary pneumonia screening tool (all patients with respiratory symptoms get CXR + AI interpretation).
3 months later: Retrospective chart review by visiting radiologist identifies problems: - AI sensitivity for TB: 58% (missed 42% of culture-proven TB) - AI sensitivity for Pneumocystis jirovecii pneumonia (PCP, in HIV+ patients): 34% - AI false positive rate: 35% (vs. 12% in U.S. validation)
Question 1: Why did AI fail in DRC?
Training data mismatch:
1. Disease prevalence: U.S. training data had <1% TB, <0.1% PCP; DRC has 30%+ TB, 5%+ PCP in respiratory presentations (see the worked example after this list)
2. Patient characteristics: U.S. training data mostly non-HIV, normal BMI; DRC population 8% HIV+, 18% underweight (atypical radiographic findings)
3. Image quality: AI trained on digital X-rays; DRC analog system produces lower resolution, different artifact patterns
4. Co-morbidities: AI learned “pneumonia” from U.S. bacterial pneumonia; missed opportunistic infections (PCP, TB) uncommon in training data
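To make the prevalence point concrete, the short calculation below applies the U.S. validation figures (sensitivity 92%, specificity 88%) at a 1% and a 30% pre-test probability; the exact prevalence values are illustrative but match the contrast described in point 1.

```python
# Worked example: the same sensitivity/specificity yields very different
# predictive values when disease prevalence shifts.
def predictive_values(sensitivity, specificity, prevalence):
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
    npv = (specificity * (1 - prevalence)) / (
        specificity * (1 - prevalence) + (1 - sensitivity) * prevalence)
    return ppv, npv

for setting, prev in [("Low-prevalence (U.S.-like) population", 0.01),
                      ("High-prevalence DRC respiratory presentations", 0.30)]:
    ppv, npv = predictive_values(sensitivity=0.92, specificity=0.88, prevalence=prev)
    print(f"{setting}: prevalence {prev:.0%} -> PPV {ppv:.0%}, NPV {npv:.0%}")
```

Even before any degradation from image quality or atypical presentations, a model that looks reliable at low prevalence behaves very differently in a high-prevalence clinic, which is why local calibration and validation matter.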
Question 2: What harm occurred?
Clinical impact: - 42% of TB missed, leading to delayed diagnosis, transmission to contacts, and TB mortality - 66% of PCP missed, resulting in HIV+ patients with untreated PCP and respiratory failure (PCP mortality 30-50% if untreated) - 35% false positives, leading to unnecessary antibiotics, costs, and patient anxiety
Trust impact: - Hospital staff lost confidence in AI: “The American computer doesn’t work here” - Some stopped using AI, others over-relied on negative results
Question 3: Are you liable?
Possibly yes. Key legal/ethical issues:
Standard of care: Even in resource-limited settings, physicians must provide care meeting local standards. - Deploying unvalidated AI without understanding its limitations may constitute negligence
Informed consent: Did you inform patients that AI was not validated in DRC population?
FDA clearance ≠ global clearance: FDA clearance based on U.S. populations; doesn’t guarantee performance in LMICs
Plaintiff (patient or family) argument: - Physician deployed unvalidated AI - AI missed TB/PCP, patient died from delayed treatment - Standard of care requires validation in deployment population OR not using AI for high-stakes decisions
Defense argument: - Resource-limited setting, no radiologist available - AI was FDA-cleared, represented “best available tool” - Physician acted in good faith with limited resources
Lesson:
1. FDA clearance ≠ universal validity. Validate in the deployment population
2. High-risk populations (HIV, malnutrition, endemic diseases) require local validation
3. If validation is impossible, use AI as an adjunct to (not a replacement for) clinical judgment
4. Document limitations: “AI chest X-ray interpretation not validated in this population; clinical judgment takes precedence”
Scenario 2: Data Partnership with Benefit-Sharing Failure
You’re chief of medicine at a teaching hospital in Nairobi, Kenya. A U.S. university approaches you with a proposal:
Proposal: “We’re developing AI for early sepsis detection. We need diverse African data to make our model generalizable. Can you provide 50,000 patient records (ICU admissions, vitals, labs, outcomes)? In return, we’ll co-author you on publications and acknowledge your hospital.”
You agree. Hospital IT department exports 50,000 de-identified records, sends to U.S. university.
18 months later: - U.S. team publishes 3 papers in Nature Medicine, JAMA, Critical Care Medicine - Your hospital listed in acknowledgments (not co-authorship as promised) - AI commercialized: U.S. startup licenses technology, raises $25M Series A funding - Startup offers to sell AI back to your hospital: $50,000 annual licensing fee
Your hospital administration: “We provided the data for free, now they want us to pay $50K/year for our own data? This is exploitation.”
Question 1: What went wrong?
Classic data extraction: 1. Verbal agreement only: No written contract specifying co-authorship, IP rights, benefit-sharing 2. Data transfer without equity stake: Hospital gave away valuable asset (50K patient records) for vague promises 3. No benefit-sharing clause: U.S. team commercialized AI, Kenyan hospital received nothing 4. Insufficient oversight: Hospital didn’t involve legal team, tech transfer office before data sharing
Question 2: Do you have legal recourse?
Weak legal position: - No written contract → hard to enforce verbal promises - Data was “de-identified” → hospital may not have ownership claim - Kenya data protection laws (2019) were new, untested in court - U.S. university likely claims IP ownership (researchers developed AI)
Possible arguments: - Breach of verbal contract (co-authorship promised, not delivered) - Unjust enrichment (U.S. team profited from Kenyan data without compensation) - Data sovereignty violation (Kenya Data Protection Act requires data localization for processing)
Likely outcome: Costly litigation with uncertain results, or negotiated settlement (hospital gets discounted AI access, not equity/revenue share)
Question 3: What should you have done differently?
Before sharing data:
Written data-sharing agreement specifying: - Co-authorship: Kenyan researchers as co-first/co-corresponding authors on all publications - IP ownership: Shared IP rights (hospital owns data, university owns algorithm, jointly own AI) - Benefit-sharing: If commercialized, hospital receives equity stake (5-10%) or royalties (10-15% of revenues) - Free access: Resulting AI provided to hospital at no cost - Capacity-building: U.S. team trains 3-5 Kenyan researchers in AI development - Data governance: Data hosted in Kenya (or federated learning, no data export)
Institutional approvals: - Legal team review - Ethics committee approval - Tech transfer office involvement (protect IP) - Data protection officer review (Kenya Data Protection Act compliance)
Pilot first: Start with 1,000 records, evaluate partnership quality before sharing 50,000
Lesson: Data is valuable. Demand equitable partnerships, not exploitation. LMIC hospitals should negotiate from position of strength (you have data they need, so demand fair value).
Scenario 3: AI Exacerbating Health Disparities
You’re a public health official in the Ministry of Health, Bangladesh. The government is considering a national rollout of an AI-powered telemedicine platform for rural primary care clinics.
Platform features: - Smartphone app (patients video-call physician + AI clinical decision support) - AI provides differential diagnosis, treatment recommendations based on symptoms + patient history - Targets rural areas with physician shortages (1 physician per 50,000 people)
Deployment plan: - Distribute subsidized smartphones ($100 each) to 100,000 rural households - Train 500 community health workers to assist patients using app - Budget: $15 million over 3 years
12 months post-launch, evaluation shows:
Utilization by wealth quintile:
| Wealth Quintile | % Using Telemedicine | Traditional Clinic Use | Total Healthcare Access |
|---|---|---|---|
| Richest 20% | 62% | 45% | 107% (using both) |
| Middle 60% | 28% | 38% | 66% |
| Poorest 20% | 9% | 22% | 31% |
Digital divide drivers: - Poorest 20% often lack literacy (41% illiterate) and struggle with smartphone app despite CHW assistance - Smartphone distribution focused on households with electricity (excluded 38% of poorest quintile) - Data costs (mobile internet): $2-5/month equals 5-10% of poorest households’ income, making it unaffordable despite subsidized smartphones - Language: App in Bengali only, excluded 12% of population speaking minority languages (Chittagonian, Sylheti, tribal languages)
Health equity impact: - Richest quintile: Healthcare access increased 7% (using telemedicine in addition to clinics) - Poorest quintile: Healthcare access decreased 9% (traditional clinics closed due to “telemedicine coverage”, but poorest can’t access telemedicine)
Result: AI widened health disparities by 16 percentage points between richest and poorest.
Question 1: Why did telemedicine worsen equity?
Inverse care law (Julian Tudor Hart, 1971): “The availability of good medical care tends to vary inversely with the need for it in the population served.”
Applied to AI: - Telemedicine benefited those with smartphones, literacy, internet, electricity (already better-off) - Excluded those lacking these resources (poorest, who need healthcare most) - Clinic closures harmed poorest (who depended on in-person care) while richest benefited from telemedicine
Question 2: What should have been done differently?
Equity-focused design:
Universal access pre-requisites: - Ensure electricity, internet, smartphones, literacy BEFORE telemedicine rollout - OR: Design for low-tech (SMS, voice calls, not video; feature phones, not smartphones)
Hybrid model: - Telemedicine supplements clinics (not replaces) - Maintain in-person care for those who can’t access digital tools
Targeted support for poorest: - Free data plans (not just subsidized phones) - Voice-based apps (for illiterate users) - Minority language support - CHW home visits for those unable to use technology
Equity monitoring: - Track utilization by wealth, literacy, language from Month 1 - If disparities emerge, pause deployment and redesign
Question 3: How to fix the program now?
Immediate actions:
- Reopen closed clinics in poorest areas (don’t rely solely on telemedicine)
- Free data plans for poorest quintile (government subsidy to mobile operators)
- Voice-based app version for illiterate users
- Multilingual support (Chittagonian, Sylheti, tribal languages)
- CHW-assisted telemedicine: CHWs help poorest households access platform (human-in-the-loop)
Lesson: AI can widen disparities if designed for already-privileged populations. Equity requires intentional design for most marginalized, not just “average” users. Monitor equity impacts from start and correct course quickly.
Key Takeaways
HIC Validation ≠ LMIC Validation: AI validated in HIC populations often fails in LMICs (10-30% performance degradation common). Require local validation before deployment.
Low-Tech Often Better: MomConnect (SMS chatbot) achieved 4M+ users, 15:1 ROI. High-tech smartphone apps often fail in LMIC settings. Match technology to infrastructure.
Data Extraction ≠ Partnership: Demand benefit-sharing (equity, royalties, free access) when sharing LMIC data for AI development. Reject exploitation.
Algorithmic Bias Is Real: AI trained on Western populations shows 10-50% worse performance on underrepresented groups. Diversify training data, conduct bias audits.
Infrastructure Matters: 770M lack electricity, 2.8B lack internet. Design for offline, low-power, robust-to-poor-data-quality if targeting global health.
Equity Requires Intention: AI follows “inverse care law” and benefits privileged populations unless explicitly designed for marginalized groups. Monitor equity impacts from Day 1.
Capacity-Building > Extraction: Train local AI researchers, leave sustainable infrastructure. Extractive partnerships perpetuate dependency.
Physician Advocacy Role: Demand equitable AI development, reject unvalidated deployments, support open-source tools, partner with LMIC institutions as equals.