Obstetrics and Gynecology

OBGYN AI navigates two-patient scenarios where maternal and fetal interests may compete, health disparities in which Black women face 2-3x higher maternal mortality, and a medico-legal environment in which obstetric malpractice rates exceed those of all other specialties. AI applications must address these realities while tackling fetal monitoring interpretation with only 30-50% inter-observer agreement, preterm birth prediction where effective interventions remain elusive, and screening tools requiring nuanced counseling about positive predictive values.

Learning Objectives

After reading this chapter, you will be able to:

  • Evaluate AI systems for fetal monitoring and pregnancy risk prediction
  • Understand AI applications in prenatal screening and ultrasound interpretation
  • Assess AI tools for cervical and breast cancer screening
  • Navigate ethical challenges specific to maternal-fetal medicine
  • Identify failure modes in obstetric AI (high-stakes, two-patient scenarios)
  • Recognize equity concerns in women’s health AI
  • Apply evidence-based frameworks for OBGYN AI adoption

The Clinical Context: OBGYN AI navigates two-patient scenarios, significant racial disparities (Black women face 2-3x higher maternal mortality), and the highest malpractice rates in medicine. AI must address these realities while proving clinical benefit beyond prediction accuracy.

Key AI Applications:

Obstetrics:

  • Fetal Monitoring (CTG): Sensitivity 85-95% for fetal compromise, but no RCT shows improved neonatal outcomes. Risk of automation bias increasing cesarean rates. Not ready for routine use.
  • Preterm Birth Prediction: AUC 0.70-0.80, but PPV only 10-20%. Limited actionable interventions. Does not address structural causes of disparities.
  • NIPT/Prenatal Screening: Well-validated for Trisomy 21 (>99% sensitivity). Critical to counsel on PPV for rarer conditions. Screening, not diagnostic.
  • Preeclampsia Prediction: AUC 0.85-0.90 with effective intervention available (aspirin). Most promising obstetric application.
  • PPH Prediction: Modest improvement (AUC 0.70-0.75). Many hemorrhages occur in low-risk women.

Gynecology:

  • Cervical Cancer Screening: Well-validated AI cytology and colposcopy. Greatest promise in low-resource settings.
  • Breast/Ovarian Cancer Risk: Improves risk stratification for screening intensity. Do not use for general ovarian cancer screening (USPSTF recommends against).
  • Surgical Planning: AI endometriosis detection 90-95% accurate. Useful adjunct for surgical planning.

Critical Concerns:

  • Equity: Training data underrepresents minority women. Algorithms may perpetuate disparities.
  • Ethics: Maternal autonomy paramount. AI provides information, not directives.
  • Validation: Demand prospective trials with outcome improvement, diverse population validation, and subgroup performance reporting.

Bottom Line: Preeclampsia prediction and cervical cancer screening show most promise. Fetal monitoring AI lacks outcome evidence. All OBGYN AI must be validated across racial/ethnic groups and preserve maternal autonomy.

Introduction

OBGYN AI faces unique challenges that distinguish it from other medical specialties. The two-patient nature of obstetric care means AI recommendations may optimize fetal outcomes at maternal expense, raising fundamental questions about whose interests the algorithm serves. Maternal mortality disparities are stark: Black women are 2-3x more likely to die from pregnancy-related causes than white women (Petersen et al., 2019), a disparity that persists across education and income levels. The medico-legal environment adds further complexity, with obstetric malpractice rates exceeding those of all other specialties.

AI applications in this field must navigate these realities while addressing genuine clinical needs: fetal monitoring interpretation with inter-observer agreement of only κ=0.30-0.50, preterm birth prediction where effective interventions remain limited, and screening tools requiring nuanced counseling about positive predictive values. This chapter evaluates current evidence for OBGYN AI applications and provides frameworks for assessing whether specific tools are ready for clinical adoption.

Obstetric AI Applications

Fetal Monitoring and Interpretation

Electronic Fetal Monitoring (EFM) Interpretation:

Clinical problem: Cardiotocography (CTG) monitoring is universal in US labor management, but interpretation is highly subjective. Inter-observer agreement κ=0.30-0.50 (poor) (Blackwell et al., 2011).
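
For concreteness, a minimal sketch of how κ is computed for two raters' Category I/II/III calls on the same tracings, using hypothetical labels and scikit-learn's `cohen_kappa_score`:

```python
# Illustrative only: two clinicians' Category I/II/III calls on the same
# 10 tracings (hypothetical labels, not real CTG interpretations).
from sklearn.metrics import cohen_kappa_score

rater_a = ["I", "II", "II", "III", "I", "II", "I", "II", "III", "I"]
rater_b = ["I", "II", "II", "II", "I", "III", "II", "II", "III", "II"]

# Kappa corrects raw percent agreement (here 60%) for agreement expected
# by chance; this example prints ~0.38, inside the reported 0.30-0.50 band.
print(f"Cohen's kappa: {cohen_kappa_score(rater_a, rater_b):.2f}")
```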

AI approaches:

  • Automated CTG interpretation
  • Fetal heart rate pattern recognition
  • Prediction of fetal acidemia/hypoxia

Evidence and major limitations:

  • High false positive rates leading to increased cesarean sections
  • No RCT showing improved outcomes (Apgar scores, cord pH, neonatal morbidity)
  • Risk of automation bias (providers over-relying on AI categorizations)

Cochrane review conclusion: Insufficient evidence that computer analysis of CTG improves perinatal outcomes (Grivell et al., 2015). May increase intervention rates without clear benefit.

Despite high sensitivity for detecting fetal compromise (85-95%), no RCT has shown that automated CTG interpretation actually improves Apgar scores, cord pH, or neonatal morbidity (Grivell et al., 2015; Comert et al., 2018). The risk is automation bias leading to unnecessary cesarean sections. Not ready for routine clinical use until prospective trials demonstrate better outcomes, not just pattern recognition.

Preterm Birth Prediction

Clinical problem: Preterm birth affects 10% of US pregnancies (Martin et al., 2022) and is the leading cause of neonatal morbidity/mortality.

Traditional screening:

  • Cervical length ultrasound
  • Fetal fibronectin testing
  • Clinical history

AI enhancement:

  • Integrate EHR data (demographics, medical history, labs, medications)
  • Predict spontaneous preterm birth <37, <34, <28 weeks

Evidence and limitations:

  • Positive predictive value low (10-20%) due to low prevalence
  • Unclear how to act on predictions (progesterone, cerclage only effective in specific subgroups)
  • Social determinants of health (stress, racism, housing instability) not captured in EHR
  • Racial disparities in preterm birth (Black women 1.5x rate) (March of Dimes, 2015) not explained by medical factors

The research is promising (Arabi Belaghi et al., 2021), but clinical utility remains limited. With a PPV of only 10-20%, most positive predictions are false positives. Worse, effective interventions are lacking for most high-risk women. These algorithms do not address the root causes of preterm birth disparities: structural racism, chronic stress, and social determinants that do not appear in EHR data (Mailath-Pokorny et al., 2023).
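
The low PPV follows directly from Bayes' rule. A minimal sketch with an assumed operating point (sensitivity 70%, specificity 85%, chosen to be roughly consistent with the reported AUC range) and an approximate 3% prevalence of spontaneous birth before 34 weeks:

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumed operating point for a preterm-birth model (illustrative only).
print(f"PPV: {ppv(0.70, 0.85, 0.03):.0%}")  # ~13%: most positives are false alarms
```

Even a substantially better test cannot escape this arithmetic at low prevalence, which is why PPV, not sensitivity, should anchor counseling and implementation decisions.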

Prenatal Genetic Screening and Ultrasound

Cell-Free DNA Screening (NIPT) Enhanced Reporting:

Application: AI analysis of sequencing data for aneuploidy detection

  • Trisomy 21, 18, 13 detection
  • Sex chromosome abnormalities
  • Microdeletion syndromes (emerging)

Evidence and important caveats:

  • NIPT is screening, not diagnostic (amniocentesis/CVS for confirmation)
  • False positives occur (especially low-prevalence conditions)
  • Incidental findings (maternal malignancy) require counseling infrastructure
  • Equity concern: Expensive ($500-2000), not always covered by insurance

NIPT is a well-validated screening tool (Norton et al., 2015), and AI enhancements improve detection of rare aneuploidies (Egbert et al., 2022). But counseling about limitations is essential, especially positive predictive values for rarer conditions. See the clinical scenarios below for why this matters.
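
The same Bayes arithmetic explains why NIPT's PPV diverges so sharply across conditions. An illustrative sketch, assuming 99% sensitivity, ~99.95% specificity, and approximate age-34 prevalences (~1 in 1700 for Trisomy 18, ~1 in 450 for Trisomy 21); exact figures vary by laboratory and gestational age:

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumed test characteristics and prevalences (illustrative only).
print(f"T18 PPV: {ppv(0.99, 0.9995, 1 / 1700):.0%}")  # ~54%: roughly a coin flip
print(f"T21 PPV: {ppv(0.99, 0.9995, 1 / 450):.0%}")   # ~81%: far more reliable
```

The identical assay yields a coin-flip PPV for Trisomy 18 and a much higher PPV for Trisomy 21 purely because of prevalence; this is the distinction the clinical scenarios below turn on.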

Automated Fetal Ultrasound Analysis:

Applications:

  • Nuchal translucency measurement
  • Fetal biometry (head circumference, abdominal circumference, femur length)
  • Anatomic survey automated views and measurements
  • Placental localization

Evidence and limitations:

  • Image quality dependent (maternal habitus, fetal position)
  • Rare anomalies may be missed
  • Does not replace skilled sonographer/perinatologist interpretation
  • Liability concerns: who is responsible for missed anomalies?

AI-assisted biometry (Namburete et al., 2015) and anatomic landmark detection (Baumgartner et al., 2017) improve standardization and efficiency. But this cannot replace human expertise for comprehensive fetal assessment, and the liability question remains unresolved when AI misses a critical anomaly.

Maternal Risk Prediction

Preeclampsia Prediction Models:

Traditional screening:

  • First trimester: maternal factors + PAPP-A + PlGF
  • Fetal Medicine Foundation algorithm

AI enhancement:

  • ML models integrating clinical + biochemical + ultrasound data
  • Predict early-onset (<34 weeks) vs. late-onset preeclampsia

Evidence:

  • ML models AUC 0.85-0.90 for early-onset preeclampsia (Tan et al., 2018)
  • Better discrimination than traditional models
  • Published in Ultrasound in Obstetrics & Gynecology (Tan et al., 2018)
  • Prospective validation ongoing (ASPRE trial derivatives)

Clinical utility:

  • Aspirin prophylaxis reduces preeclampsia risk by 50-60% in high-risk women (Rolnik et al., 2017)
  • AI identifies women who benefit most from prophylaxis

This is promising. ML models achieve better discrimination than traditional algorithms (Tan et al., 2018), and an effective intervention (aspirin) exists for high-risk women. Improved risk stratification could enable truly targeted prevention. Prospective validation and cost-effectiveness analysis still needed.
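
As rough arithmetic on the scale of benefit (illustrative assumptions, not trial results): with a 10% baseline risk of early-onset preeclampsia in the screen-positive group and the ~50-60% relative risk reduction reported for aspirin, roughly 18 high-risk women need treatment to prevent one case:

```python
# Illustrative arithmetic; baseline risk and relative risk reduction are
# assumptions in the ranges cited in the text (Rolnik et al., 2017).
baseline_risk = 0.10             # assumed risk in the AI-flagged group
relative_risk_reduction = 0.55   # midpoint of the 50-60% aspirin effect

arr = baseline_risk * relative_risk_reduction  # absolute risk reduction
print(f"ARR: {arr:.1%}, NNT: {1 / arr:.0f}")   # ARR 5.5%, NNT ~18
```

Better risk stratification raises the baseline risk of the treated group, which lowers the NNT; that is the concrete sense in which AI could make prevention truly targeted.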

Postpartum Hemorrhage (PPH) Prediction:

Challenge: PPH is the leading cause of maternal mortality globally and difficult to predict.

AI approaches:

  • Real-time prediction during labor
  • Integrate vitals, labs, medications, obstetric factors

Evidence and limitations:

  • Many PPH cases occur in low-risk women (unpredictable)
  • Interventions (uterotonics, surgical preparedness) already standard
  • Alert fatigue if predictions not actionable

Modest improvement over clinical risk factors (Venkatesh et al., 2020), but clinical benefit uncertain. The fundamental challenge: many hemorrhages occur in women with no identifiable risk factors, and prophylactic uterotonics are already routine. Research ongoing, but actionability remains the issue.

Gynecologic AI Applications

Cervical Cancer Screening

AI-Assisted Cytology and HPV Testing:

Traditional screening:

  • Pap smear cytology
  • HPV DNA testing
  • Co-testing strategies (ASCCP guidelines)

AI enhancement:

  • Automated cytology interpretation
  • HPV genotyping risk stratification
  • Colposcopy image analysis

Evidence:

  • Automated Pap cytology: Sensitivity 85-95% for HSIL; reduces false negatives by 10-20% (Bao et al., 2023, published in Cancer Cytopathology)
  • AI colposcopy: Detects CIN2+ with sensitivity 90-95%; particularly valuable in low-resource settings lacking cytopathology infrastructure (Hu et al., 2019, published in Journal of Lower Genital Tract Disease)

WHO pilot studies:

  • AI-based visual inspection with acetic acid (VIA) screening in low-middle income countries
  • Sensitivity comparable to expert clinicians (Xue et al., 2020)
  • Could improve access to screening where resources limited

Automated Pap cytology (Bao et al., 2023) and AI colposcopy (Hu et al., 2019) are well-validated adjuncts to cervical cancer screening. The real promise is improving access in under-resourced settings that lack cytopathology and colposcopy infrastructure. This is where AI could genuinely reduce global cancer mortality.

Breast and Ovarian Cancer Risk Prediction

Breast Cancer Risk Models Enhanced with AI:

Traditional models: Gail, Tyrer-Cuzick, BRCAPRO

AI enhancement:

  • Integrate mammographic density, SNPs, family history, reproductive factors
  • Polygenic risk scores

Evidence:

  • ML models improve breast cancer risk prediction (AUC 0.65-0.70) (Yala et al., 2019)
  • Published in Radiology (Yala et al., 2019)
  • Identifies women who may benefit from enhanced screening (MRI, tomosynthesis)

ML models integrating mammographic density, polygenic risk scores, and reproductive factors (Yala et al., 2019) can help personalize screening recommendations. Use this to inform shared decision-making about screening intensity, not to replace the conversation.

Ovarian Cancer Early Detection:

Challenge: Ovarian cancer is often diagnosed at late stage. Screening strategies (CA-125, ultrasound) have high false positive rates.

AI approaches:

  • Multimarker panels analyzed with ML
  • Ultrasound-based ovarian mass characterization

Evidence:

  • O-RADS (Ovarian-Adnexal Reporting and Data System) with AI improves malignancy prediction (Cao et al., 2022)
  • Reduces unnecessary surgeries for benign masses
  • Published in Radiology (Cao et al., 2022)

Limitation: No screening strategy (including AI-enhanced) has been shown to reduce ovarian cancer mortality in average-risk women. USPSTF recommends against screening (Henderson et al., 2018).

AI can help characterize known ovarian masses (Cao et al., 2022), potentially reducing unnecessary surgeries for benign findings. But do not use this for general population screening. USPSTF recommendation against screening stands, regardless of AI enhancement.

Surgical Planning

Endometriosis Detection and Mapping:

AI applications:

  • MRI-based endometriosis detection
  • Surgical planning for deep infiltrating endometriosis
  • Predicting surgical complexity

Evidence:

  • Deep learning models detect endometriomas with 90-95% accuracy (Andres et al., 2020)
  • Published in European Radiology (Andres et al., 2020)
  • Helps surgeons plan approach and counsel patients about surgical complexity

Myomectomy/Hysterectomy Planning:

  • AI analysis of fibroid size, location, vascularity
  • Predicts surgical approach (laparoscopic vs. abdominal)
  • Estimates blood loss risk

Evidence: Limited but promising feasibility studies. A useful adjunct for surgical planning, though clinical judgment remains central.

Equity and Racial Bias in OBGYN AI

Maternal Health Disparities and Algorithmic Bias

Documented Disparities in Maternal Outcomes:

1. Maternal Mortality:

  • Black women 2-3x more likely to die from pregnancy-related causes (Petersen et al., 2019)
  • Published in MMWR by CDC (Petersen et al., 2019)
  • Persists across education and income levels
  • Driven by structural racism, implicit bias, access to quality care

2. Preterm Birth:

  • Black women 50% higher rate of preterm birth (March of Dimes, 2015)
  • Not explained by traditional medical risk factors
  • Chronic stress from racism implicated (weathering hypothesis)

3. Cesarean Section Rates:

  • Black and Hispanic women have higher cesarean rates after controlling for medical indications (Edmonds et al., 2013)

How AI Can Worsen Disparities:

Training Data Bias:

  • Most datasets overrepresent white, insured women
  • Minority women underrepresented or data incomplete
  • Algorithms learn patterns that don’t generalize

Examples:

  • Preterm birth prediction models trained predominantly on white women may underperform in Black women
  • Preeclampsia risk calculators may misclassify Hispanic women (different biomarker distributions)
  • Fetal monitoring algorithms may have different accuracy across racial groups (not well studied)

Pulse Oximetry Bias:

  • Overestimates oxygen saturation in dark skin by 2-3% (Sjoding et al., 2020)
  • AI relying on pulse ox data inherits this measurement bias
  • Affects intrapartum monitoring of mothers and neonates

Mitigation Strategies:

  • Require diverse datasets for training and validation
  • Stratify performance reporting by race/ethnicity
  • Address social determinants of health in models
  • Engage community members in AI development
  • Monitor for bias after deployment
  • Invest in addressing root causes of disparities, not just prediction

Ethical Challenges in Maternal-Fetal Medicine

Two-Patient Dilemmas in Obstetric AI

Maternal-Fetal Conflict:

  • AI recommendations may optimize fetal outcomes at maternal expense (e.g., early cesarean delivery)
  • Maternal autonomy must be preserved
  • Cannot treat fetus as independent patient against maternal wishes

Example Scenario:

  • AI predicts 30% risk of stillbirth if pregnancy continues
  • Recommends immediate cesarean delivery at 35 weeks
  • Mother prefers expectant management to avoid surgery and prematurity risks
  • Ethically: Mother’s decision prevails (informed refusal)
  • Legally: Cannot compel cesarean delivery

Informed Consent Challenges:

  • How to communicate AI-generated risk predictions?
  • Uncertainty in predictions (confidence intervals rarely provided)
  • Risk of coercing decisions through statistical intimidation

Liability Concerns:

  • If AI predicts complication and physician doesn’t act, malpractice risk?
  • If AI wrong and intervention causes harm, who is liable?
  • Documentation burden: Must explain AI role and rationale for agreement/disagreement

Reproductive Autonomy:

  • Prenatal screening AI may influence pregnancy termination decisions
  • Access to screening differs by geography, insurance (equity)
  • Disability rights advocates concerned about selective termination

Principles for Ethical Obstetric AI:

  1. Maternal autonomy paramount
  2. AI provides information, not directives
  3. Shared decision-making framework
  4. Transparent communication of uncertainty
  5. Respect diverse values and preferences
  6. Address disparities, don’t worsen them

Professional Society Guidelines

Society for Maternal-Fetal Medicine (SMFM)

SMFM has been increasingly active in AI applications for maternal-fetal medicine, with dedicated sessions at annual meetings addressing practical implementation.

SMFM 2025 Pregnancy Meeting Highlights:

  • Luncheon Roundtable: Practical Tips for Incorporating AI and Clinical Informatics Into an MFM Practice explored how AI and clinical informatics enhance documentation, clinical decision-making, and workflow efficiency in MFM settings.

  • AI for Postpartum Hemorrhage Prediction: Presented research on AI models using routinely collected clinical data to generate PPH risk predictions at admission, during labor, and after delivery. Trained and validated on data from 12,807 women (386 with severe PPH morbidity).

  • BrightHeart AI for Fetal Cardiac Screening: First major presentation of AI technology for prenatal congenital heart defect detection. BrightHeart received FDA 510(k) clearance for its first AI software product in November 2024.

SMFM-Featured AI Applications:

  • Congenital Heart Defect Detection: AI-assisted fetal echocardiography to improve detection rates of CHDs, which are missed in 50% of cases prenatally.

  • Maternal Risk Stratification: Integration of social determinants, clinical factors, and biomarkers for personalized risk assessment.

For current SMFM guidance and publications, see: smfm.org

American College of Obstetricians and Gynecologists (ACOG)

ACOG has published guidance on technology in obstetric care and endorses evidence-based AI applications while emphasizing maternal autonomy and informed consent.

Key ACOG Principles:

  1. Technology must serve patients’ best interests and respect autonomous decision-making

  2. Informed consent essential for AI-assisted care, especially prenatal screening

  3. Equity considerations paramount: AI must not worsen existing maternal health disparities

  4. Clinical judgment remains central: AI provides decision support, not replacement

ACOG Committee Opinion #640 explicitly addresses prenatal genetic screening, emphasizing that invasive diagnostic testing should be offered when NIPT indicates increased risk and that patients should be counseled that NIPT is a screening test.

For ACOG resources: acog.org

Clinical Guidelines for AI Adoption

ACOG Principles for AI in Women’s Health

Before Adopting OBGYN AI:

  1. Demand robust validation:
    • Prospective studies showing improved outcomes (not just prediction accuracy)
    • Validation in diverse populations (race, ethnicity, SES, geography)
    • Transparent reporting of performance by subgroups
  2. Assess impact on maternal autonomy:
    • Does this support or constrain patient decision-making?
    • How will recommendations be communicated?
    • Can patients decline AI-assisted care?
  3. Evaluate equity implications:
    • Will this widen or narrow disparities?
    • Is it accessible regardless of insurance/geography?
    • Does training data reflect population diversity?
  4. Consider medico-legal landscape:
    • What does malpractice carrier advise?
    • Documentation requirements?
    • Informed consent necessary?
  5. Ensure multidisciplinary review:
    • Obstetricians, midwives, nurses, patients, ethicists
    • Diverse perspectives on benefits and risks

Safe Implementation:

  1. Pilot testing in low-stakes scenarios first
  2. Enhanced informed consent process
  3. Clear protocols for discordance (AI says X, clinician thinks Y)
  4. Systematic bias monitoring (outcomes by race/ethnicity)
  5. Patient feedback mechanisms
  6. Regular performance audits

Red Flags (Avoid These Systems):

  • No validation in diverse populations
  • Claims to replace clinical judgment in high-stakes decisions
  • Black-box models without explanation
  • Vendor resists equity audits
  • Recommends interventions without evidence base
  • Doesn’t account for patient preferences/values

Future Directions

Near-Term (2-5 years):

  • Enhanced preeclampsia screening with targeted prevention
  • AI-assisted cervical cancer screening in low-resource settings
  • Standardized fetal biometry and anatomic survey
  • Improved endometriosis detection on MRI

Medium-Term (5-10 years):

  • Integration of social determinants of health into risk prediction
  • Real-time intrapartum decision support (with appropriate safeguards)
  • Personalized cesarean delivery risk counseling
  • AI-guided fertility treatment optimization

Long-Term (10+ years):

  • Predictive models for pregnancy complications incorporating genetics, environment, social factors
  • Continuous remote monitoring for high-risk pregnancies
  • AI-assisted robotic gynecologic surgery (surgeon-in-the-loop)

Unlikely Despite Hype:

  • AI replacing clinical judgment for delivery timing/mode
  • Fully automated prenatal diagnosis
  • Elimination of health disparities through AI alone (requires addressing root causes)

Conclusion

AI in obstetrics and gynecology must navigate unique ethical challenges: two-patient scenarios, profound health disparities, and reproductive autonomy. While AI shows promise for improving prenatal screening, cervical cancer detection, and risk stratification, deployment must center maternal autonomy, address rather than worsen disparities, and prove clinical benefit beyond prediction accuracy.

Obstetricians and gynecologists should demand robust evidence, transparent algorithms, and equity analyses before adopting AI systems. The goal is not just more accurate predictions, but healthier mothers and babies, especially those from communities bearing disproportionate burdens of maternal morbidity and mortality.

As the American College of Obstetricians and Gynecologists emphasizes: Technology must serve patients’ best interests and respect their autonomous decision-making (ACOG, 2021).

Check Your Understanding

Scenario 1: NIPT False Positive and Pregnancy Termination

You’re an obstetrician managing prenatal care for a 34-year-old G1P0 woman at 12 weeks gestation.

Patient presents: Requesting non-invasive prenatal testing (NIPT) for aneuploidy screening. She has private insurance that covers the test. You counsel her about the test’s purpose (screening, not diagnostic) and she consents.

Results at 13 weeks: NIPT positive for Trisomy 18 (Edward syndrome). The lab report states “High probability of Trisomy 18. Diagnostic testing recommended.”

Your counseling: You explain that NIPT has high sensitivity (>99%) but this is still a screening test. You recommend diagnostic confirmation with amniocentesis. You provide information about Trisomy 18 prognosis (most pregnancies end in miscarriage or stillbirth; liveborn infants typically survive days to weeks, rarely months).

Patient’s decision: “I cannot continue this pregnancy knowing my baby has a fatal condition. I want to terminate. I don’t want an amniocentesis. The NIPT result is clear enough.”

You respond: “I understand this is devastating news. However, NIPT can have false positives, especially for Trisomy 18. The positive predictive value depends on your age and other factors. In women your age, PPV is approximately 50-60% for Trisomy 18. There’s a real chance this is a false positive and your baby is healthy.”

Patient: “50-60% is high enough. I cannot take the risk. Please refer me for pregnancy termination.”

You refer her to a maternal-fetal medicine specialist who performs detailed ultrasound: normal nuchal translucency, normal anatomy visible at this gestational age, no features concerning for Trisomy 18.

MFM recommendation: Strong recommendation for amniocentesis before making decisions. Patient again declines.

Pregnancy is terminated at 14 weeks at patient’s request. She requests tissue karyotyping.

Karyotype result: 46,XX. Chromosomally normal female fetus.

Patient calls 3 weeks later: “The genetics counselor told me the baby was normal. How could the NIPT be wrong? You told me it was 99% accurate. I terminated a healthy pregnancy because of your test.”

Question 1: What went wrong in this case?

NIPT false positive for Trisomy 18 with inadequate counseling about positive predictive value and the critical importance of diagnostic confirmation before irreversible decisions.

Root causes:

  1. Misunderstanding of screening vs. diagnostic testing
    • Sensitivity (99%) is not the same as positive predictive value (50-60% for T18 at maternal age 34)
    • At low prevalence, even highly sensitive tests have significant false positive rates
  2. Laboratory reporting ambiguity
    • “High probability” language may be interpreted as diagnostic certainty
    • PPV not clearly stated on lab report
    • Recommendation for diagnostic testing was present but not sufficiently emphasized
  3. Insufficient counseling infrastructure
    • Pre-test counseling did not adequately prepare patient for possibility of false positive
    • Post-test counseling did not successfully communicate PPV vs. sensitivity distinction
    • Genetics counselor not involved until after termination
  4. Time pressure and patient anxiety
    • Patient wanted rapid decision-making
    • Anxiety about carrying potentially affected fetus overrode statistical reasoning

Question 2: Are you liable for malpractice?

Legal analysis:

Standard of care for NIPT counseling (per ACOG/ACMG guidelines):

  1. Pre-test counseling requirements:
    • NIPT is screening, not diagnostic
    • False positives and false negatives occur
    • Diagnostic testing (CVS or amniocentesis) required for definitive diagnosis
    • Decisions about pregnancy continuation should not be made based on screening alone
  2. Post-positive counseling requirements:
    • Explain positive predictive value (not just sensitivity)
    • Strongly recommend diagnostic confirmation
    • Involve genetics counselor for high-risk results
    • Document detailed counseling and patient decision-making

Plaintiff’s argument:

  • “Dr. Smith told me NIPT was 99% accurate, leading me to believe a positive result meant certainty”
  • “Dr. Smith did not adequately explain positive predictive value”
  • “I was not required to see a genetics counselor before termination decision”
  • “If I had understood there was 40-50% chance of false positive, I would have gotten amniocentesis”
  • Damages: Wrongful termination of healthy pregnancy, emotional distress, need for psychiatric care

Defense arguments:

  • Pre-test consent documented: Screening vs. diagnostic distinction documented in chart
  • Post-positive counseling documented: Chart note states “explained PPV ~50-60%, strongly recommended amnio before any decisions”
  • MFM referral made: Specialist also counseled patient and recommended amniocentesis
  • Autonomous patient decision: Patient capacity intact, made informed refusal of diagnostic testing despite counseling
  • Causation question: Patient chose termination despite appropriate counseling. No breach of standard of care

Likely outcome:

Liability depends heavily on documentation quality:

  • If counseling well-documented: Defense likely prevails. Patient made autonomous decision despite appropriate counseling
  • If counseling poorly documented: Plaintiff may win. Breach of standard to allow pregnancy termination based on screening alone without documented, detailed counseling about false positive risk
  • Informed refusal documentation critical: Must show patient understood and declined amniocentesis despite understanding risks

ACOG Committee Opinion #640 explicitly states: “Invasive diagnostic testing should be offered when NIPT indicates increased risk” and “patients should be counseled that NIPT is a screening test.”

Settlement likely given sympathetic plaintiff (terminated healthy pregnancy), even if standard of care met.

Question 3: How should NIPT counseling be structured to prevent this scenario?

Best practices for NIPT implementation:

1. Pre-Test Counseling (Mandatory)

Required elements (document in chart):

☐ NIPT is screening, not diagnostic
☐ False positives occur, especially for rarer aneuploidies (T18, T13)
☐ Positive predictive value varies by condition and maternal age
☐ Diagnostic testing required before pregnancy decisions
☐ Incidental findings possible (maternal malignancy, vanishing twin)
☐ Patient verbalized understanding of above

2. Post-Positive Result Protocol

IMMEDIATE STEPS:

  1. Do not disclose result by phone/portal without counseling infrastructure
  2. Schedule in-person visit with physician + genetics counselor
  3. Prepare counseling materials with condition-specific PPV for patient’s age

IN-PERSON COUNSELING:

  • Review what NIPT is (screening) and is not (diagnostic)
  • Provide PPV specific to result and patient’s age:
    • Trisomy 21 at age 34: PPV ~80-90%
    • Trisomy 18 at age 34: PPV ~50-60%
    • Trisomy 13 at age 34: PPV ~30-40%
    • Microdeletions: PPV <10% (very low)
  • Explain that roughly 1 in 2 positive T18 results at her age is a false positive
  • Review diagnostic options (CVS if <14 weeks, amniocentesis if ≥15 weeks)
  • Explain amniocentesis risk (~1/500 miscarriage) vs. benefit (diagnostic certainty)

REQUIRE GENETICS COUNSELOR INVOLVEMENT for any positive NIPT before pregnancy decisions

INFORMED REFUSAL PROCESS if patient declines diagnostic testing:

☐ Patient counseled about PPV for specific condition
☐ Patient understands significant false positive rate
☐ Patient offered amniocentesis/CVS, risks and benefits explained
☐ Patient declines diagnostic testing
☐ Patient understands pregnancy decisions based on screening alone carry risk of false positive
☐ Patient signature acknowledging understanding
☐ Physician signature
☐ Genetics counselor signature (if involved)

3. System-Level Safeguards

Laboratory reporting standards:

  • PPV must be clearly stated on result reports (not just sensitivity)
  • Avoid language like “high probability” without quantification
  • Flagged recommendation: “Diagnostic testing required before pregnancy decisions”

Clinical decision support:

  • EMR hard stop: Cannot place pregnancy termination referral order without documented genetics counseling for positive NIPT
  • Automated genetics counselor consultation for all positive NIPTs

Informed consent for termination:

  • Termination centers should require karyotype confirmation or documented informed refusal of diagnostic testing
  • Second physician confirmation of counseling

4. Vendor Accountability

When evaluating NIPT vendors:

REQUIRED:

  • PPV reported on all result letters (not just sensitivity/specificity)
  • Age-stratified performance data provided
  • Clear statement that test is screening, not diagnostic
  • Support for genetics counseling (materials, hotline)

RED FLAGS:

  • Marketing emphasizes “99% accurate” without PPV context
  • No mention of false positives
  • Results reported through patient portal without counseling
  • Pressure for rapid turnaround without counseling infrastructure

Lesson: NIPT is a powerful screening tool, but positive results require diagnostic confirmation before irreversible pregnancy decisions. Counseling must emphasize positive predictive value (not sensitivity), and systems must ensure genetics counseling is integrated into the care pathway. Documentation of informed refusal is essential if patients decline diagnostic testing.

Scenario 2: Electronic Fetal Monitoring AI and Cesarean Section

You’re a laborist covering a busy community hospital labor and delivery unit. The hospital recently implemented an FDA-cleared AI system for continuous cardiotocography (CTG) interpretation (brand name: “FetalGuard AI”).

System description: Real-time AI analysis of fetal heart rate tracings with categorization:

  • Category I (normal): Reassuring, continue labor
  • Category II (indeterminate): Close monitoring, consider interventions
  • Category III (abnormal): Immediate delivery indicated

Your experience: First 2 months, system seems helpful. It flags concerning tracings and reduces cognitive burden during busy shifts.

Case presentation: 28-year-old G2P1 at 39 weeks, spontaneous labor, epidural analgesia, oxytocin augmentation for slow progress.

Labor course:

  • Cervix: 4 cm → 7 cm over 4 hours (adequate progress)
  • Fetal heart rate: Baseline 140s, moderate variability, no decelerations
  • AI categorization: Category I (reassuring)

Hour 5 of labor:

  • AI alert: “Category II - Variable decelerations detected. Consider intervention.”
  • Your review: Tracing shows occasional mild variable decelerations (<30 seconds, <60 bpm drop), moderate variability maintained
  • Your assessment: Category II, likely cord compression, acceptable for vaginal delivery with close monitoring
  • Your plan: Continue labor, nurse to notify you of any Category III features

Hour 6 of labor:

  • AI escalation: “Category III - Concerning tracing. Immediate delivery recommended.”
  • Your review: Moderate variability still present, variable decelerations now more frequent (every 2-3 contractions), one prolonged deceleration to 90 bpm × 90 seconds (returned to baseline 140s)
  • Your assessment: Borderline Category II vs. III. Not clearly Category III by ACOG criteria (would need absent variability + recurrent decelerations)

You perform scalp stimulation: Accelerates to 160 bpm (reassuring, suggests no acidemia)

Your decision: Continue labor, very close monitoring. Cervix now 9 cm, pushing likely within 1 hour.

AI continues to alarm: “Category III - Immediate delivery recommended” every 5 minutes

Nurse: “Dr. Jones, the AI keeps saying Category III. Should we do a C-section?”

You: “I’m watching the tracing closely. This is Category II. The baby is responding to stimulation. She’s almost complete. Let’s get her through to delivery.”

30 minutes later:

  • Cervix complete (10 cm)
  • Patient begins pushing
  • AI: “Category III - Immediate delivery recommended”
  • Fetal heart rate: Baseline 130s with moderate variability, variable decelerations to 80s with pushing (common, usually benign)

Delivery after 45 minutes of pushing:

  • Live male infant, Apgar 8 at 1 minute, 9 at 5 minutes
  • Cord pH 7.28 (normal, >7.20)
  • No neonatal complications

Next day: Chart review by peer reviewer (standard QA process)

Peer reviewer’s note: “AI system categorized as Category III for 75 minutes prior to delivery. Physician did not perform cesarean section. Prolonged Category III exposure concerning. Recommend M&M review.”

M&M Committee: “Why did you override the AI recommendation for immediate delivery? The system is FDA-cleared and categorized this as Category III.”

Question 1: Did you breach the standard of care by not performing cesarean section when AI recommended immediate delivery?

Standard of care analysis:

ACOG Guidelines for Intrapartum Fetal Monitoring:

Category III criteria (requires immediate delivery):

  • Absent baseline variability AND any of:
    • Recurrent late decelerations
    • Recurrent variable decelerations
    • Bradycardia
  • OR: Sinusoidal pattern

Key element: Absent baseline variability is required for Category III classification (except sinusoidal pattern)

Your tracing:

  • Moderate variability present throughout
  • Variable and prolonged decelerations present
  • Does NOT meet ACOG definition of Category III (would be Category II)
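
The rule above is mechanical enough to express as decision logic. A minimal sketch with hypothetical field names summarizing a tracing (an illustration of the ACOG criteria, not a validated classifier):

```python
from dataclasses import dataclass

@dataclass
class Tracing:
    # Hypothetical fields summarizing a CTG interpretation.
    baseline_variability: str   # "absent", "minimal", "moderate", or "marked"
    recurrent_late_decels: bool
    recurrent_variable_decels: bool
    bradycardia: bool
    sinusoidal_pattern: bool

def is_category_iii(t: Tracing) -> bool:
    """ACOG Category III: sinusoidal pattern, OR absent variability plus
    recurrent late/variable decelerations or bradycardia."""
    if t.sinusoidal_pattern:
        return True
    return t.baseline_variability == "absent" and (
        t.recurrent_late_decels or t.recurrent_variable_decels or t.bradycardia
    )

# The scenario's tracing: moderate variability, recurrent variable decels.
print(is_category_iii(Tracing("moderate", False, True, False, False)))  # False
```

With moderate variability present, the criteria cannot be met short of a sinusoidal pattern, which is exactly why the AI's Category III call conflicted with ACOG definitions.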

AI misclassification: The AI system categorized tracings with moderate variability as Category III, which contradicts ACOG criteria.

Physician responsibility:

  • Standard of care requires physician interpretation of fetal monitoring, not blind adherence to AI
  • AI is clinical decision support, not a replacement for clinical judgment
  • Physicians must understand ACOG fetal monitoring definitions and apply them
  • Scalp stimulation with acceleration is reassuring and supports continuing labor

Defense argument:

  • Adherence to ACOG guidelines: Tracing did not meet Category III criteria per ACOG Practice Bulletin #106
  • Appropriate fetal assessment: Scalp stimulation showed reassuring response
  • Excellent outcome: Normal Apgar scores, normal cord pH
  • AI error, not physician error: AI system misclassified Category II as Category III
  • Standard of care met: Physician correctly interpreted tracing and made appropriate clinical decision

Verdict: No breach of standard of care. Physician correctly applied evidence-based guidelines despite AI misclassification.

Question 2: What if the outcome had been bad (low Apgar, neonatal encephalopathy)?

Different legal landscape with adverse outcome:

Plaintiff’s argument:

  • “The FDA-cleared AI system said Category III and recommended immediate delivery”
  • “Dr. Jones ignored the AI warning”
  • “If cesarean had been performed when AI first recommended (75 minutes earlier), baby would not have brain injury”
  • Causation: Delay in delivery caused hypoxic-ischemic encephalopathy

Defense argument:

  • ACOG guidelines do not define this as Category III
  • Cord pH of 7.05 (hypothetical) may reflect chronic placental insufficiency, not acute intrapartum event
  • Cesarean 75 minutes earlier would not have prevented outcome if chronic process
  • AI false positive led to inappropriate recommendation

Key legal issue: Competing standards

  • ACOG guidelines (physician interpretation) vs.
  • FDA-cleared AI (algorithm recommendation)

Jury question: “Which standard should the physician follow when they conflict?”

Risk: Jury may be swayed by “FDA-cleared” designation and believe AI is authoritative

Expert witness battle:

  • Plaintiff expert: “Hospital implemented this system and physician should follow it”
  • Defense expert: “ACOG guidelines are standard of care, AI is adjunct only”

Likely outcome: Defensible but risky

  • Strong defense if expert testimony supports ACOG guidelines as standard
  • Risk if jury perceives physician as “overriding” technology
  • Informed consent documentation critical (“I explained we use AI but I make final decisions”)

Question 3: How should hospitals implement fetal monitoring AI to avoid this liability trap?

Best practices for EFM AI implementation:

1. Validation Before Deployment

Require vendor to demonstrate:

  • Sensitivity/specificity for Category III detection against ACOG criteria (not vendor’s internal definitions)
  • False positive rate (critical for alert fatigue)
  • Validation in diverse populations (race/ethnicity, BMI, electrode type)
  • Prospective trial data showing improved neonatal outcomes (not just classification accuracy)

If vendor definitions differ from ACOG:

  • Do not implement until alignment achieved
  • Insist on ACOG-compliant categorization

2. Physician Training (Mandatory)

All laborists/obstetricians must complete:

  • ACOG fetal monitoring workshop
  • Review of Category I/II/III definitions
  • Understanding that AI is adjunct, not replacement
  • Policy: Physician interpretation is final authority
  • Scenarios where AI may misclassify (examples reviewed)

Key principle: “AI provides a second opinion, not an order”

3. Clinical Protocols

When AI and physician interpretation disagree:

PROTOCOL: AI-Physician Discordance

If AI categorizes as Category III but physician assessment is Category I/II:
1. Physician documents rationale for disagreement in chart
2. Physician performs additional fetal assessment (scalp stimulation, consider scalp pH if available)
3. Physician discusses with patient: "The AI system is concerned, but I believe the tracing is reassuring based on [rationale]. I recommend continuing labor with close monitoring."
4. Physician considers second opinion from colleague if time permits
5. AI alert acknowledged but clinical judgment prevails

Documentation template:
"AI system categorized tracing as Category III. My interpretation: Category II based on moderate variability present, decelerations consistent with cord compression, reassuring response to scalp stimulation. Continuing labor with continuous monitoring per ACOG guidelines."

4. Informed Consent

Admission consent should include:

“Our hospital uses an AI system to assist with fetal monitoring interpretation. This system provides real-time analysis of your baby’s heart rate. However, your physician makes all final decisions about your care. The AI is a tool to assist your physician, not a replacement for their judgment.”

5. Quality Assurance

Regular audits:

  • AI false positive rate (Category III alerts that didn’t meet ACOG criteria)
  • AI false negative rate (missed Category III tracings)
  • Cesarean section rate before/after AI (watch for increase from false positives)
  • Neonatal outcomes stratified by AI alert status
  • Bias assessment: AI performance by race/ethnicity, BMI

Trigger for re-evaluation:

  • False positive rate >20%
  • Cesarean section rate increase >5% without improved outcomes
  • Racial disparities in AI accuracy

6. Vendor Evaluation Questions

MUST ANSWER before purchase:

  1. “Does your system use ACOG Category I/II/III definitions exactly, or internal definitions?”
  2. “What is your false positive rate for Category III alerts?”
  3. “Provide data on cesarean section rates in hospitals using your system vs. controls”
  4. “Provide neonatal outcome data (Apgar, cord pH, NICU admission) in prospective trials”
  5. “Provide performance data stratified by maternal race, BMI, electrode type”
  6. “What happens legally if a physician overrides your recommendation and outcome is bad?”
  7. “Will you indemnify physicians for adverse outcomes when following ACOG guidelines that differ from your recommendations?”

RED FLAGS:

  • Vendor cannot provide false positive rate data
  • Vendor claims “physicians should always follow AI recommendations”
  • Vendor definitions of Category I/II/III differ from ACOG
  • No prospective outcome data, only retrospective classification accuracy
  • Vendor resists bias audits

Lesson: FDA clearance does not mean AI is infallible or replaces clinical judgment. Fetal monitoring AI must align with ACOG guidelines, and physicians must maintain authority to override AI when clinical judgment differs. Hospitals must establish clear protocols for AI-physician discordance and ensure physicians are trained to interpret tracings independently. Documentation of clinical reasoning when overriding AI is essential.

Scenario 3: Preeclampsia Prediction Algorithm and Racial Bias

You’re a maternal-fetal medicine specialist at an academic medical center. Your hospital recently implemented a preeclampsia prediction algorithm (vendor: “PreeclampSafe AI”) for all first-trimester patients.

Algorithm inputs (collected at 11-13 weeks):

  • Maternal age, BMI, race, obstetric history
  • Mean arterial pressure (MAP)
  • Uterine artery Doppler pulsatility index (PI)
  • Serum PAPP-A and PlGF

Algorithm output:

  • High risk (≥10% risk of early-onset preeclampsia <34 weeks): Aspirin 81mg daily + enhanced monitoring
  • Low risk (<10% risk): Routine prenatal care

Your experience: First 6 months, algorithm seems effective. It identifies high-risk patients, and aspirin compliance is good.

Month 7 - Quality review:

Your fellow presents data at division meeting:

Preeclampsia outcomes stratified by race:

| Race/Ethnicity | Patients | Algorithm “High Risk” | Developed Preeclampsia <34 wks | Cases Missed by Algorithm |
|---|---|---|---|---|
| White | 450 | 68 (15%) | 12 | 2/12 (17% missed) |
| Black | 280 | 28 (10%) | 16 | 8/16 (50% missed) |
| Hispanic | 220 | 25 (11%) | 9 | 3/9 (33% missed) |

Alarming finding: Algorithm identifies only 50% of early-onset preeclampsia cases in Black women, compared to 83% in White women.
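
The miss rates reduce to per-group sensitivity arithmetic; a minimal sketch using the counts from the quality review table above:

```python
# (missed cases, total early-onset preeclampsia cases) per group,
# copied from the division's quality review.
missed = {"White": (2, 12), "Black": (8, 16), "Hispanic": (3, 9)}

for group, (n_missed, n_cases) in missed.items():
    sensitivity = 1 - n_missed / n_cases
    print(f"{group}: detected {sensitivity:.0%} of cases")
# White: 83%, Black: 50%, Hispanic: 67% -- a 33-point sensitivity gap.
```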

Outcome of missed cases:

Case 1: 29-year-old Black G1P0, algorithm “low risk” (7% predicted risk)

  • Developed severe preeclampsia at 29 weeks
  • Delivered emergently for HELLP syndrome
  • Infant: 1200g, 8-week NICU stay, severe ROP requiring laser, chronic lung disease
  • Mother: ICU admission for eclampsia, 2 seizures despite magnesium sulfate

Case 2: 34-year-old Black G2P1, algorithm “low risk” (8% predicted risk)

  • Developed early-onset preeclampsia at 31 weeks
  • Placental abruption, emergency cesarean
  • Infant: 1400g, intraventricular hemorrhage grade III, long-term neurodevelopmental concerns
  • Mother: postpartum hemorrhage, required transfusion

Both patients were NOT prescribed aspirin because algorithm classified them as low-risk. If they had been identified as high-risk and prescribed aspirin, evidence suggests 50-60% risk reduction (Rolnik et al., 2017).

Question 1: Why is the algorithm underperforming in Black women?

Root causes of algorithmic racial bias:

1. Training data composition

  • Algorithm likely trained predominantly on White women
  • Biomarker distributions differ by race:
    • PAPP-A levels lower in Black women (biology, not pathology) (Spencer et al., 2005)
    • PlGF levels may differ
    • Uterine artery Doppler cutoffs developed in predominantly White cohorts

2. Algorithm inappropriately adjusts for race

  • “Race” entered as input variable
  • Algorithm may incorrectly lower risk estimates for Black women based on training data patterns
  • Confounding: Training data may have systematically under-diagnosed preeclampsia in Black women (due to bias in care access, delayed presentation)

3. Social determinants not captured

  • Chronic stress from racism (weathering hypothesis)
  • Food insecurity, housing instability
  • Limited prenatal care access
  • Neighborhood-level factors
  • These increase preeclampsia risk but are not in algorithm inputs

4. Measurement bias

  • Blood pressure measurements may be less accurate in Black women (cuff size, technique)
  • MAP calculation error propagates through algorithm

Question 2: Are you liable for adverse outcomes in patients misclassified by biased algorithm?

Legal analysis:

Standard of care for preeclampsia screening:

  • ACOG recommends aspirin for high-risk patients
  • High-risk criteria (ACOG Practice Bulletin #222):
    • History of preeclampsia, especially early-onset
    • Multifetal gestation
    • Chronic hypertension, diabetes, renal disease, autoimmune disease
    • Combination of moderate risk factors

Plaintiff’s argument (Case 1: 29-year-old with HELLP at 29 weeks):

  • “Dr. Smith’s hospital uses a preeclampsia algorithm that is racially biased”
  • “Algorithm missed 50% of Black women who developed preeclampsia, but only 17% of White women”
  • “If I had been prescribed aspirin, my baby would not have spent 8 weeks in NICU with severe complications”
  • “Hospital knew or should have known algorithm was biased and continued using it”
  • Civil Rights violation: Disparate impact. Black patients received inferior care
  • Damages: Neonatal complications, maternal morbidity, NICU costs, long-term disability

Defense arguments:

1. Standard of care met:

  • Algorithm is one tool among many
  • ACOG risk criteria also applied (patient had no traditional high-risk factors)
  • Clinical judgment incorporated

2. Causation uncertain:

  • Not all high-risk women develop preeclampsia even without aspirin
  • Aspirin reduces risk 50-60%, does not eliminate it
  • Cannot prove aspirin would have prevented this specific case

3. Algorithm bias unknown at time:

  • Hospital identified bias through QI process
  • Acting now to address it

4. Race-based differences may reflect biology, not bias:

  • Different biomarker distributions by race may be valid
  • Algorithm optimized for overall population performance

Plaintiff’s rebuttal:

Civil Rights Act Title VI: Hospitals receiving federal funds (Medicare/Medicaid) cannot discriminate based on race

  • Disparate impact doctrine: Even without intent, if AI system causes worse outcomes for racial minorities, potentially illegal
  • “We didn’t know about the bias” is not a defense if hospital failed to conduct equity audit before deployment
  • Continued use after discovery is especially indefensible

Likely outcome:

Significant liability risk, especially for cases after bias was discovered:

  • Before bias discovery: Defensible if standard ACOG risk criteria applied
  • After bias discovery: Difficult to defend continued use of known-biased algorithm
  • Class action risk: All Black women misclassified could join lawsuit
  • Regulatory action: CMS/OCR investigation possible for Title VI violation

Settlement likely given sympathetic plaintiffs (neonatal harm) and evidence of known racial bias

Question 3: How should hospitals address AI racial bias when discovered?

Immediate actions (within 1 week of discovery):

1. Suspend algorithm use OR implement race-stratified thresholds

Option A: Suspend entirely

  • Revert to ACOG clinical risk criteria
  • Notify all clinicians of suspension
  • Notify vendor of bias discovery

Option B: Emergency mitigation (if suspension not feasible)

  • Lower risk threshold for Black and Hispanic women:
    • White women: ≥10% → aspirin
    • Black women: ≥5% → aspirin (more sensitive threshold to compensate for algorithm underperformance)
    • Hispanic women: ≥7% → aspirin
  • Temporary measure until re-validation or replacement
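
A minimal sketch of how the Option B thresholds might be encoded; `recommend_aspirin` and `risk_score` are hypothetical names, and this is explicitly a stopgap compensating for a known model defect, not a fix:

```python
# Temporary mitigation: more sensitive thresholds for groups in which the
# model is known to under-predict, pending re-validation or replacement.
ASPIRIN_THRESHOLD = {"Black": 0.05, "Hispanic": 0.07}  # default 0.10 otherwise

def recommend_aspirin(risk_score: float, race_ethnicity: str) -> bool:
    return risk_score >= ASPIRIN_THRESHOLD.get(race_ethnicity, 0.10)

print(recommend_aspirin(0.07, "Black"))  # True here; False at the old 10% cutoff
```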

2. Patient notification

Notify all Black/Hispanic women classified as “low risk” in past 6 months:

“Our hospital has identified that the preeclampsia prediction algorithm used during your pregnancy may have underestimated risk for Black and Hispanic women. If you are still pregnant, we recommend re-evaluation for aspirin prophylaxis. If you have delivered, we apologize and are reviewing your care.”

Offer:

  • Re-evaluation by MFM specialist
  • Aspirin initiation if still pregnant and appropriate
  • Documentation review and outcomes analysis

3. Root cause analysis

Convene multidisciplinary team:

  • Maternal-fetal medicine specialists
  • Health equity experts
  • Biostatisticians
  • Patient advocates from affected communities
  • Hospital legal/risk management

Investigate:

  • Training data composition (% by race)
  • Performance metrics stratified by race (should have been done before deployment)
  • Algorithm decision tree (how is race variable used?)
  • Comparison to ACOG clinical criteria

4. Vendor accountability

Demand from vendor (PreeclampSafe AI):

  1. Full performance data by race/ethnicity from all deployment sites
  2. Explanation of bias source (training data? feature engineering? race adjustment?)
  3. Correction plan with timeline (re-training? different features?)
  4. Validation in diverse cohort before re-implementation
  5. Financial responsibility for outcomes in misclassified patients

If vendor is unresponsive or defensive:

  • Terminate contract
  • Report to FDA (if device is FDA-cleared, bias may violate regulations)
  • Notify other institutions using same algorithm

5. Policy changes

New institutional policy for all AI systems:

Equity Impact Assessment (Required Before Deployment):

☐ Validation data includes adequate representation of patient populations served
  - Minimum 30% patients from each major racial/ethnic group
☐ Performance metrics (sensitivity, specificity, PPV, NPV) reported separately by:
  - Race/ethnicity
  - Age
  - BMI
  - Insurance status
  - Language
☐ Algorithm does not use race as input variable (unless specific biological justification provided and validated)
☐ Clinical decision thresholds validated separately for each subgroup
☐ Monitoring plan for ongoing bias detection (quarterly audits)
☐ Health equity committee approval obtained

6. Ongoing monitoring (post-implementation)

Quarterly audits required:

| Metric | Overall | White | Black | Hispanic | Asian | Other |
|---|---|---|---|---|---|---|
| % Classified High Risk | | | | | | |
| % Developed Early Preeclampsia | | | | | | |
| Sensitivity (% cases detected) | | | | | | |
| Specificity | | | | | | |
| PPV | | | | | | |

Trigger for intervention: Sensitivity differs by >10% between racial groups → immediate review
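
A sketch of that quarterly trigger check with hypothetical counts; the >10-percentage-point threshold comes from the policy above:

```python
# Hypothetical quarterly counts: (cases detected, total cases) per group.
quarter = {"White": (10, 12), "Black": (8, 16), "Hispanic": (6, 9)}

sens = {g: detected / total for g, (detected, total) in quarter.items()}
gap = max(sens.values()) - min(sens.values())

if gap > 0.10:  # institutional trigger: >10-point sensitivity gap
    print(f"TRIGGER: sensitivity gap of {gap:.0%} -> immediate review")
```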

7. Community engagement

Engage Black and Hispanic patient communities:

  • Explain what happened (transparent communication)
  • Apologize for bias and harm
  • Describe corrective actions
  • Invite feedback on AI governance processes
  • Ensure representation on AI ethics committee

8. Alternative approaches that reduce bias

Race-agnostic models:

  • Remove race as input variable
  • Use only biological/clinical variables validated across populations
  • Accept potentially lower overall performance if equity improved

Social determinants integration:

  • Add neighborhood-level deprivation index
  • Screen for food insecurity, housing instability
  • Incorporate chronic stress measures

Lower threshold for all:

  • Set the algorithm’s risk threshold lower (e.g., ≥5% for everyone)
  • Increase aspirin prescribing overall
  • Trade-off: More aspirin prescriptions (cost, side effects) vs. fewer missed cases

Hybrid approach:

  • Use algorithm as one input, not sole decision-maker
  • Require clinician review of all “borderline” cases (e.g., 8-12% risk)
  • Clinical judgment can override algorithm

Lesson: AI algorithms in high-stakes obstetric settings must be rigorously validated for equity before deployment. Racial bias in preeclampsia prediction is particularly dangerous because it denies effective prophylaxis (aspirin) to the populations at highest baseline risk. Hospitals have legal, ethical, and regulatory obligations to detect and correct algorithmic bias. Once bias is discovered, continued use may constitute civil rights violations. Community engagement and transparent communication are essential to rebuilding trust after algorithmic harm.


References