Hematology-Oncology and Precision Medicine

Oncology generates vast, complex datasets that AI excels at parsing: lung screening CTs with 30+ nodules per scan, whole-slide pathology images with millions of cells, flow cytometry panels with 10-parameter immunophenotyping, genomic sequencing with hundreds of actionable variants. Cancer screening AI shows strong evidence (94-97% sensitivity for lung nodules, non-inferior mammography reading). Digital morphology analyzers are FDA-cleared and deployed. IBM Watson for Oncology failed spectacularly, offering unsafe recommendations without RCT validation.

Learning Objectives

After reading this chapter, you will be able to:

  • Evaluate AI systems for cancer screening and early detection
  • Understand AI applications in pathology and radiology for cancer diagnosis
  • Assess AI tools for peripheral blood smear and bone marrow analysis
  • Navigate AI-assisted flow cytometry interpretation for hematologic malignancies
  • Apply genomic AI tools for treatment selection in solid tumors and leukemias
  • Evaluate AI in radiation therapy planning and CAR-T response prediction
  • Recognize limitations and failure modes of hematology-oncology AI
  • Apply evidence-based frameworks for adopting AI in cancer and blood disorder care

The Clinical Context:

Hematology and oncology generate vast, complex datasets: blood smears, bone marrow aspirates, flow cytometry panels, imaging studies, genomic sequencing, treatment responses. Cancer is not one disease but hundreds. Hematologic malignancies add another dimension of complexity with lineage determination, blast quantification, and measurable residual disease. AI excels at pattern recognition in these high-dimensional data types, but the stakes are extraordinary: treatment decisions affect survival, quality of life, and financial toxicity.

Key Applications:

  • Cancer screening AI (lung CT, mammography, colonoscopy): Well-validated, FDA-cleared, improving detection rates
  • Pathology AI (Gleason grading, HER2 scoring, Ki-67): Reduces inter-observer variability, FDA-cleared systems
  • Peripheral blood smear AI (Scopio, CellaVision): FDA-cleared, automates WBC differential and morphology
  • Bone marrow analysis AI: First FDA clearance 2024 (Scopio), automates cell classification and blast estimation
  • Flow cytometry AI: Automated gating and malignancy classification, >95% accuracy for B-cell malignancies
  • Genomic AI (variant interpretation, treatment matching): Accelerates precision medicine, requires tumor board integration
  • CAR-T response prediction: ML models achieving AUC 0.82 for relapse prediction in DLBCL
  • Radiation therapy AI (auto-contouring, plan optimization): Mature application, widely deployed

What Actually Works:

  • Lung cancer screening AI: 94-97% sensitivity for significant nodules, reduces false positives by 11%
  • Mammography AI: Non-inferior to double-reading in prospective trials
  • Digital morphology analyzers (CellaVision, Scopio): 15% reduction in turnaround time, FDA-cleared
  • Flow cytometry AI for B-cell malignancies: 96% accuracy, AUC 0.98 for classification
  • Radiation therapy auto-contouring: Reduces planning time from hours to minutes

What Doesn’t Work:

  • IBM Watson for Oncology: Unsafe recommendations, no RCT validation, withdrawn
  • Autonomous treatment decision-making: Cannot replace nuanced hematologist-oncologist judgment
  • Most bone marrow AI models: Limited external validation (only 3.7% of studies)

Critical Insights:

  • ASH Subcommittee on AI (2025): Few AI tools fully implemented due to data quality, equity, and validation gaps
  • Demand external validation before clinical adoption
  • AI assists tumor boards and morphology review, never replaces them
  • Flow cytometry AI limited to panels it was trained on

Essential Reading:

ASH Subcommittee review (Blood 2025), digital morphology validation (Diagnostics 2025), flow cytometry AI recommendations (Cytometry B 2024), bone marrow AI review (HemaSphere 2024).


Introduction: AI Across Hematology and Oncology

Hematology and oncology sit at the forefront of AI in medicine. These fields generate enormous volumes of complex data: peripheral blood smears, bone marrow aspirates, flow cytometry panels, imaging studies, pathology slides, genomic sequencing, treatment responses, survival outcomes. Each cancer type and hematologic disorder has distinct biology, staging systems, treatment algorithms, and prognoses. This complexity creates both opportunity and challenge for AI applications.

The opportunity: AI excels at pattern recognition in complex, high-dimensional data. Classifying cells on blood smears, detecting blast populations in bone marrow, interpreting flow cytometry patterns, identifying genomic variants, and predicting treatment response are tasks where machine learning can potentially augment human expertise. Hematology is particularly suited to AI given the structured nature of cell morphology and immunophenotyping data.

The challenge: Cancer is not one disease but hundreds. Hematologic malignancies add layers of complexity: lineage determination, blast quantification, measurable residual disease assessment. Sample sizes for rare disorders are limited. Treatment decisions involve nuanced tradeoffs between efficacy and toxicity, survival and quality of life, evidence-based guidelines and individual patient preferences. These decisions require expertise, empathy, and judgment that current AI cannot replicate.

This chapter examines AI applications across hematology and oncology: screening, diagnosis, blood and bone marrow morphology, flow cytometry, genomic analysis, treatment planning, and prognostication. We evaluate what works, what doesn’t, and how to avoid the premature deployment failures exemplified by IBM Watson for Oncology.


Cancer Screening and Early Detection

Lung Cancer Screening CT Analysis

Clinical context: Low-dose CT screening reduces lung cancer mortality by 20% in high-risk smokers. But screening programs face challenges: radiologist workload, inter-reader variability, and false positives requiring invasive follow-up.

AI enhancement:

  • Automated lung nodule detection and volumetry
  • Lung-RADS classification assistance
  • Reduction in false positives

Evidence:

  • Multiple FDA-cleared AI systems (Aidoc, Lunit, Optellum)
  • Sensitivity 94-97% for significant nodules (≥6mm)
  • Reduces radiologist reading time by 30-40%
  • Decreases false positives by 11%
  • Published in Nature Medicine

This is well-validated AI that’s actually deployed in clinical practice. Multiple FDA-cleared systems improve both efficiency and accuracy of lung cancer screening programs, reducing false positives while maintaining excellent sensitivity.

Implementation considerations:

  • AI assists radiologist interpretation, does not replace it
  • False negatives still occur (especially for ground-glass opacities, small nodules)
  • Integration with Lung-RADS reporting systems essential
  • Patient communication about AI-assisted interpretation

Breast Cancer Mammography AI

Evidence:

  • Deep learning matches or exceeds radiologist performance (published in Nature)
  • Prospective trial in Sweden: AI + single radiologist non-inferior to double-reading
  • Reduces screening recall rates by 15-20%
  • Detects 10-15% more cancers in some studies

Strong evidence base, with deployment expanding in Europe and FDA-cleared systems available in the US. The prospective Swedish trial showing AI plus one radiologist matching two radiologists is particularly compelling. Genuine workflow efficiency with maintained accuracy.

Limitations:

  • Performance varies by breast density (dense breasts more challenging)
  • Most training data from screening populations (may not generalize to diagnostic mammography)
  • Racial bias: many systems trained predominantly on white women
  • Does not replace clinical judgment for complex cases

Colorectal Cancer AI Colonoscopy

Application: Real-time polyp detection during colonoscopy

Evidence:

  • AI-assisted colonoscopy increases adenoma detection rate (ADR) by 10-15% absolute (published in Gastroenterology)
  • Reduces miss rates for clinically significant polyps
  • FDA-cleared systems (Medtronic GI Genius, others)

Limitations:

  • Does not improve detection of flat or subtle serrated polyps
  • May increase procedure time
  • Cost-effectiveness debated

AI-assisted colonoscopy has FDA clearance and RCT evidence showing improved adenoma detection rates. The ADR increase (10-15% absolute) is meaningful, but whether this translates to long-term cancer prevention remains unproven. Quality metric improvement? Yes. Cancer prevention proof? Not yet.


Pathology AI

Prostate Cancer Gleason Grading

Application: AI analysis of prostate biopsy specimens

Evidence:

  • FDA-cleared system (Paige Prostate) for Gleason scoring
  • Reduces inter-pathologist variability
  • Sensitivity 98% for clinically significant cancer (Grade Group ≥2)
  • Flags suspicious regions for pathologist review
  • Published in Archives of Pathology & Laboratory Medicine

Paige Prostate has FDA clearance and genuinely augments pathologist workflow without attempting to replace expertise. Reduces grading variability, which matters for treatment decisions. Gleason 3+4 vs. 4+3 determines therapy intensity.

Implementation:

  • AI pre-screens slides, flags suspicious areas
  • Pathologist reviews flagged regions and renders final diagnosis
  • Reduces time spent on benign tissue
  • Standardizes grading criteria

Breast Cancer Pathology

Applications:

  • HER2 scoring from IHC
  • Ki-67 quantification
  • Lymph node metastasis detection

Evidence:

  • AI matches pathologist accuracy for HER2 2+ vs. 3+ discrimination
  • Automated Ki-67 scoring reduces variability
  • Sentinel lymph node metastasis detection: sensitivity 92-95%
  • Published in JAMA Oncology

Well-validated for specific tasks. HER2 and Ki-67 scoring variability is a genuine clinical problem. Standardization through AI quantification improves treatment decision consistency.


Radiology AI for Cancer Staging

Automated Tumor Segmentation and Measurement

Applications:

  • Automated RECIST measurements for treatment response
  • Tumor volumetry (more accurate than 2D diameter)
  • Longitudinal tracking

Evidence:

  • AI-based volumetry more reproducible than manual RECIST
  • Predicts treatment response earlier than conventional criteria
  • Published in Radiology

Useful for clinical trials and longitudinal monitoring. FDA-cleared systems available. Volumetric measurements beat 2D RECIST for reproducibility. This matters when assessing treatment response.

Lymph Node Metastasis Detection

Applications:

  • Automated detection of suspicious lymph nodes on CT/MRI
  • PET-CT analysis for staging

Evidence:

  • AI models show promise but variable performance
  • Sensitivity 70-85%, specificity 80-90% (not sufficient to replace human interpretation)
  • False negatives problematic (understaging)

Research ongoing, but these aren’t reliable enough for autonomous staging decisions. False negatives mean understaging. Missing N2 disease changes treatment from surgery to neoadjuvant therapy. Can’t afford those misses.


Genomic and Molecular AI

Tumor Mutation Profiling and Treatment Selection

Applications:

  • NGS data analysis for actionable mutations
  • FDA-approved targeted therapy matching
  • Tumor mutational burden (TMB) calculation for immunotherapy eligibility

Evidence:

  • AI tools accelerate variant interpretation
  • Foundation Medicine, Tempus, others use ML for treatment recommendations
  • Published in Nature Genetics

Limitations:

  • Interpretation of variants of unknown significance (VUS) remains challenging
  • Off-label treatment recommendations not always evidence-based
  • Insurance coverage variable

AI tools accelerate variant interpretation, and platforms like Foundation Medicine and Tempus are valuable for precision oncology. But these must be integrated with multidisciplinary tumor board review. Genomic data without clinical context leads to off-label recommendations that may not benefit patients.

Liquid Biopsy and Minimal Residual Disease (MRD)

Application: ctDNA analysis for MRD detection after curative-intent surgery/treatment

Evidence:

  • Multiple platforms (Guardant, Natera, others) show MRD predicts recurrence
  • Detects recurrence months before imaging
  • Published in NEJM

Critical gap: No RCT showing that MRD-directed interventions improve outcomes

Multiple platforms detect minimal residual disease months before imaging; the technology is impressive. But here’s the problem: we don’t have RCTs showing that treating MRD-positive patients actually improves outcomes. Detecting recurrence early without a proven intervention creates anxiety and risks overtreatment. Promising biomarker, but clinical utility remains unproven.


Treatment Planning and Delivery

Radiation Therapy AI

Auto-Contouring:

  • AI-automated organ-at-risk (OAR) and tumor volume delineation
  • Reduces planning time from hours to minutes
  • Commercially available systems widely deployed

Evidence:

  • AI contours comparable to expert radiation oncologists for most OARs
  • Dice similarity coefficients 0.85-0.95 for critical structures
  • Published in International Journal of Radiation Oncology Biology Physics
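
For readers less familiar with the metric: the Dice similarity coefficient measures overlap between two segmentations, from 0 (no overlap) to 1 (identical). A minimal illustration in Python, using small synthetic binary masks in place of real contours:

import numpy as np

def dice_coefficient(mask_a, mask_b):
    # Dice = 2 * |A ∩ B| / (|A| + |B|)
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

# Synthetic stand-ins for an AI contour and an expert contour on a CT volume
ai = np.zeros((64, 64, 32), dtype=bool)
expert = np.zeros((64, 64, 32), dtype=bool)
ai[20:40, 20:40, 10:20] = True
expert[22:42, 20:40, 10:20] = True
print(f"Dice: {dice_coefficient(ai, expert):.2f}")  # 0.90, within the range above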

Treatment Plan Optimization:

  • AI-generated IMRT/VMAT plans
  • Knowledge-based planning using historical data
  • Plan quality improvements (better OAR sparing)

FDA-cleared, widely deployed. Auto-contouring reduces planning time from hours to minutes and improves consistency in radiation therapy planning. This is one of the more mature oncology AI applications.

Systemic Therapy Selection AI

Challenge: Complex decision-making involving tumor characteristics, patient factors, evidence quality, goals of care

AI approaches:

  • IBM Watson for Oncology (discontinued due to poor performance, see Chapter 1)
  • Newer systems integrating guidelines + patient data

Evidence:

  • Mixed at best
  • Concordance with oncologist decisions 50-90% depending on cancer type
  • Does not account for patient preferences, quality of life considerations, financial toxicity

The Watson for Oncology failure demonstrates the dangers of premature deployment (see Chapter 1 for full details). Complex treatment decisions involving efficacy-toxicity tradeoffs cannot be reduced to pattern matching. No current AI system should autonomously recommend systemic therapy. Tumor boards exist for good reason.


Clinical Trial Matching

AI-Assisted Trial Eligibility Screening

Application: Analyze EHR data to identify patients potentially eligible for clinical trials

Evidence:

  • Increases trial enrollment by 20-40%
  • Reduces time to identify eligible patients
  • Published in JCO Clinical Cancer Informatics

Limitations:

  • Eligibility algorithms only capture structured EHR data (miss nuanced exclusions)
  • Requires human review for final determination
  • Doesn’t solve root problems (trial design, access barriers, mistrust)

AI increases trial enrollment by identifying potentially eligible patients from EHR data. Useful screening tool. But it won’t replace detailed eligibility assessment. Too many nuanced exclusion criteria don’t show up in structured data.


Prognostication and Survival Prediction

ML-Based Survival Models

Applications:

  • Predict overall survival, progression-free survival
  • Integrate clinical + genomic + imaging features

Evidence:

  • ML models often outperform traditional nomograms (c-index 0.70-0.80 vs. 0.65-0.70)
  • Meta-analysis in BMJ
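
The c-index (concordance index) cited above is the probability that, for a random comparable pair of patients, the model assigns the higher risk score to the patient who experiences the event first (0.5 = chance, 1.0 = perfect ranking). A minimal illustration on synthetic data, handling right-censoring in the simplest pairwise way:

import numpy as np

def concordance_index(times, events, risk_scores):
    # A pair (i, j) is comparable if i has an observed event before j's time
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

times  = np.array([6, 12, 18, 30, 40])        # months of follow-up
events = np.array([1, 1, 0, 1, 0])            # 1 = death observed, 0 = censored
risk   = np.array([0.9, 0.7, 0.6, 0.4, 0.2])  # model output, higher = riskier
print(f"c-index: {concordance_index(times, events, risk):.2f}")  # 1.00 here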

Critical limitations:

  • Predictions at individual patient level uncertain (wide confidence intervals)
  • Can’t capture all relevant factors (patient goals, social support, unmeasured confounders)
  • Risk of self-fulfilling prophecies (predicted short survival leads to less aggressive treatment, which leads to shorter survival)

Ethical concerns:

  • Prognostic algorithms may influence treatment intensity, hospice referral
  • Vulnerable to bias (if training data underrepresents certain populations)
  • Must not be sole basis for withholding treatment

ML models often outperform traditional nomograms, but individual predictions remain uncertain. These may inform discussions about prognosis, but they should never dictate treatment decisions. Predicted survival is not actual survival. Communicate the uncertainty transparently, and recognize the risk of self-fulfilling prophecies.


Hematology AI: Blood, Bone Marrow, and Beyond

The ASH Subcommittee on Artificial Intelligence published a comprehensive review in Blood (2025) noting that while AI and ML have significant potential for enhancing diagnostic accuracy, risk stratification, and treatment response prediction in hematology, few tools have been fully implemented in clinical practice due to challenges related to data quality, equity, and validation standards.

Peripheral Blood Smear Analysis

Clinical context: Manual peripheral blood smear review is labor-intensive, requires expertise, and suffers from inter-observer variability. Automated digital morphology systems promise standardization and efficiency.

FDA-cleared systems:

  • CellaVision (DM1200, DC-1): The most widely deployed digital morphology platform globally. Uses image recognition to pre-classify WBCs for technologist review. FDA-cleared since the 2000s, with continuous algorithm refinements.

  • Scopio Labs X100: Full-field imaging approach with AI-powered cell detection and classification. Fourth FDA clearance received July 2025 for enhanced RBC and platelet capabilities. Integrates WBC differential, RBC morphology evaluation, and platelet estimates.

Evidence:

  • Real-world validation shows 15.8% reduction in morphology turnaround time compared to traditional workflows
  • Enables remote review, eliminating backlogs and reducing shift requirements
  • High accuracy for routine differentials, though atypical cells still require expert review

Limitations:

  • Performance varies with smear quality and staining consistency
  • Atypical lymphocytes, reactive changes, and rare cell populations require human verification
  • Training data predominantly from reference laboratories may not generalize to all settings

Digital morphology represents one of the more mature hematology AI applications, with genuine workflow improvements. The technology augments rather than replaces morphologist expertise.

Bone Marrow Aspirate Analysis

Clinical context: Bone marrow morphology is essential for diagnosing leukemias, myelodysplastic syndromes, lymphomas, and other hematologic disorders. Manual review is time-consuming and subject to inter-pathologist variability, particularly for blast enumeration and dysplasia assessment.

FDA-cleared systems:

  • Scopio Labs Full-Field Bone Marrow Aspirate (FF-BMA): Received De Novo FDA clearance in April 2024, establishing a new regulatory category for all-digital bone marrow aspirate analysis. Automates cell detection, classification, blast estimation, and myeloid-to-erythroid ratio calculation.

Research systems:

  • Morphogo (convolutional neural network trained on 2.8 million bone marrow cell images): Achieves high accuracy for cell classification
  • Mayo Clinic Histogram of Cell Types (HCT): Deep learning system generating automated cytological fingerprints from bone marrow aspirates (published in Communications Medicine 2022). Achieved 0.97 accuracy for region detection and 0.75 mean average precision for cell classification.

Evidence from systematic reviews:

A comprehensive review in HemaSphere (December 2024) examined AI-based cell classification methods in bone marrow aspirate smears from 2019-2024:

  • Deep learning models achieve 89-94% accuracy for leukemia classification
  • MobileViTv2 hybrid model achieved 98% accuracy for multiple myeloma, 96% for ALL and lymphoma detection
  • Ensemble CNN models (ResNeXt101, ResNeXt50, ResNet50) demonstrated 89% accuracy for ALL diagnosis on external validation

Critical limitation: Only 3.7% of published studies (3 of 81) conducted external validation on independent datasets. Most models show strong performance within training data but generalizability remains unproven.

Flow Cytometry AI

Clinical context: Flow cytometry is essential for diagnosing and classifying hematologic malignancies, requiring expert pattern recognition across high-dimensional immunophenotyping data. Manual gating is time-intensive and subject to inter-operator variability.

Evidence:

Published recommendations for AI in clinical flow cytometry (Cytometry Part B: Clinical Cytometry, 2024) outline the current state:

  • Automated classifiers achieve >95% accuracy for diagnosing B-cell malignancies
  • One validated pipeline achieved 96.02% overall accuracy, 99.26% specificity, 83.57% sensitivity for B-cell malignancy classification (AUC 0.9764)
  • Classifications include: CLL, B-ALL, diffuse large B-cell lymphoma, hairy cell leukemia, plasma cell neoplasms, and various B-NHL subtypes

Augmented Human Intelligence study (American Journal of Clinical Pathology, 2021):

  • UMAP dimension reduction combined with random forest classification
  • Achieves automated diagnosis without manual gating on specific populations
  • Demonstrates potential for decision support in routine diagnostics
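
As a rough sketch of this kind of pipeline (not the published one), the following assumes compensated, transformed flow cytometry events as NumPy matrices and uses the umap-learn and scikit-learn packages; the data here are synthetic stand-ins:

import numpy as np
import umap  # pip install umap-learn
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Each "sample" is 500 events x 10 markers; malignant samples are shifted
def make_sample(shift):
    return rng.normal(loc=shift, scale=1.0, size=(500, 10))

samples = [make_sample(0.0) for _ in range(10)] + [make_sample(1.0) for _ in range(10)]
labels = np.array([0] * 10 + [1] * 10)  # 0 = normal, 1 = B-cell malignancy

# One shared embedding so per-sample summaries are comparable
# (a real study would fit this on training folds only, to avoid leakage)
reducer = umap.UMAP(n_components=2, random_state=0).fit(np.vstack(samples))

def fingerprint(events):
    emb = reducer.transform(events)
    # Fixed-length summary: mean and spread of the sample's embedded events
    return np.concatenate([emb.mean(axis=0), emb.std(axis=0)])

X = np.array([fingerprint(s) for s in samples])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=5).mean())  # cross-validated accuracy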

Implementation considerations:

  • AI models are limited to the antibody panels they were trained on
  • Low-risk applications (QA/QC flagging, panel ordering) appropriate for early adoption
  • High-stakes diagnostic decisions still require expert review
  • Sensitivity for small pathological populations (including measurable residual disease) remains a development priority

CAR-T Cell Therapy Response Prediction

Clinical context: CAR-T therapy achieves complete remission in 30-40% of relapsed/refractory DLBCL patients, but approximately 57% eventually relapse. Predicting response and toxicity could guide patient selection and monitoring.

Evidence:

Multicenter AI model for early relapse prediction (Blood Advances, 2025):

  • 416 adult DLBCL patients receiving axicabtagene ciloleucel across University of California Health Systems (2017-2024)
  • ML model using age and 6 routine laboratory tests (LDH, CRP, ferritin, hematocrit, platelet count, prothrombin time) achieved AUC 0.82 for predicting early relapse
  • Decision curve analysis showed positive net benefit across 0-0.7 threshold range
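
As a sketch of what a model with these seven inputs looks like in code (not the published model), using scikit-learn on synthetic data with hypothetical column names and units:

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins for age plus the six reported routine labs
X = pd.DataFrame({
    "age": rng.normal(62, 10, n),
    "ldh": rng.lognormal(5.6, 0.4, n),        # U/L
    "crp": rng.lognormal(1.5, 0.8, n),        # mg/L
    "ferritin": rng.lognormal(6.0, 0.9, n),   # ng/mL
    "hematocrit": rng.normal(36, 5, n),       # %
    "platelets": rng.normal(180, 60, n),      # K/uL
    "pt": rng.normal(13, 1.5, n),             # seconds
})
# Synthetic outcome loosely tied to LDH and ferritin, for illustration only
logit = 0.002 * X["ldh"] + 0.001 * X["ferritin"] - 1.5
relapse = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = GradientBoostingClassifier(random_state=0)
print(cross_val_score(model, X, relapse, cv=5, scoring="roc_auc").mean())

As the decision curve analysis in the source study suggests, calibration and net clinical benefit matter as much as discrimination (AUC) for a model like this.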

Deep learning image analysis:

  • Pre-treatment CT and PET imaging analyzed for 770 lymph node lesions from 39 patients
  • Patient-level response prediction achieved 81% accuracy, 75% sensitivity, 88% specificity using 12-month outcomes

Computational modeling:

  • Models calibrated on 209 leukemia patients dissect mechanisms behind heterogeneous responses
  • Predict responders, non-responders, and CD19+/CD19- relapse patterns

These models show promise for risk stratification but require prospective validation before clinical implementation.

Sickle Cell Disease AI

Clinical context: Vaso-occlusive crises (VOCs) cause the majority of SCD hospitalizations. Predictive models could enable preventive intervention.

Evidence:

  • AI models for VOC prediction using biomarkers, clinical data, and patient-reported symptoms under development
  • Organ failure prediction models using vital sign features achieved 96% sensitivity, 98% specificity for predicting deterioration up to 6 hours before onset
  • XGBoost models predict acute kidney injury 12 hours before onset in hospitalized SCD patients

Anticoagulation Management

Clinical context: Warfarin’s narrow therapeutic index and significant inter-individual variability make dosing challenging. ML models integrating clinical and pharmacogenomic data show promise.

Evidence:

  • Deep reinforcement learning model trained on 28,232 patients from four pivotal RCTs (RE-LY, ENGAGE AF-TIMI 48, ARISTOTLE, ROCKET AF) optimizes time in therapeutic INR range
  • ML algorithms for cardiac surgery patients guide initial warfarin dosing using routinely available clinical data
  • Genetic integration (CYP2C9, VKORC1) improves predictions, though validation across diverse populations remains limited

ASH Position on AI Implementation

The ASH Subcommittee on AI highlights critical challenges for clinical implementation:

Current barriers:

  • Data quality issues across institutions
  • Equity concerns (training data predominantly from academic centers)
  • Lack of regulatory frameworks and safety standards
  • Limited external validation studies

Recommendations:

  • Prospective validation required before clinical deployment
  • Performance reporting stratified by demographics
  • Clear delineation of AI limitations and failure modes
  • Human oversight maintained for diagnostic and treatment decisions

IBM Watson for Oncology: The Cautionary Tale

(Covered extensively in Chapter 1, summarized here)

What happened:

  • Unsafe treatment recommendations
  • Training on synthetic cases, not real-world evidence
  • Geographic inappropriateness
  • Oncologists lost trust

Lessons:

  • Precision oncology requires deep expertise, not just pattern matching
  • Black-box recommendations unacceptable for high-stakes decisions
  • Marketing does not equal clinical validation
  • Financial incentives can override evidence

Why oncologists must be skeptical:

  • Cancer treatment decisions involve tradeoffs (efficacy vs. toxicity, survival vs. QOL)
  • Guidelines provide frameworks, not algorithms
  • Patient preferences central
  • AI cannot replace nuanced judgment


Equity and Bias in Oncology AI

Cancer Disparities and Algorithmic Bias

Documented Disparities in Cancer Outcomes:

  • Black patients have higher cancer mortality across most cancer types despite similar incidence
  • Hispanic patients less likely to receive guideline-concordant care
  • Rural patients face access barriers to specialized oncology care
  • Low-income patients experience financial toxicity limiting treatment adherence

How AI Can Worsen Disparities:

Training Data Bias:

  • Most cancer datasets from academic medical centers (affluent, insured patients)
  • Genomic databases overrepresent European ancestry
  • Imaging AI trained on specific scanner types and protocols

Examples:

  • Breast cancer screening AI trained predominantly on white women may have lower sensitivity in Black women
  • Genomic classifiers may misclassify variants in underrepresented populations
  • Treatment recommendation AI trained on insured patients may not account for financial toxicity concerns

Mitigation:

  • Require diverse training datasets
  • Validate across demographic subgroups
  • Report performance stratified by race, ethnicity, SES
  • Address root causes of disparities (access, bias, social determinants)


Implementation Guidelines for Oncology AI

ASCO Principles for AI in Oncology

Before Adopting Oncology AI:

  1. Demand high-quality evidence:
    • Prospective validation studies
    • External validation in diverse populations
    • Clinical outcomes (not just prediction accuracy)
  2. Ensure transparency:
    • Explainable AI (especially for treatment decisions)
    • Clear description of training data
    • Known failure modes disclosed
  3. Maintain human oversight:
    • AI assists, never replaces, oncologist judgment
    • Multidisciplinary tumor board review remains standard
  4. Assess equity:
    • Performance in underrepresented populations
    • Access considerations (cost, technology requirements)
  5. Consider patient preferences:
    • Some patients prefer human-only decision-making
    • Informed consent when AI significantly influences care

Safe Implementation:

  • Pilot testing in low-stakes applications first
  • Parallel validation (AI + standard approach)
  • Clear escalation pathways for AI-human disagreement
  • Systematic monitoring for bias and errors
  • Patient feedback mechanisms

Red Flags:

  • Claims of autonomous treatment decision-making
  • No validation in diverse populations
  • Black-box recommendations without rationale
  • Vendor resistance to independent evaluation
  • Replacing rather than augmenting tumor boards

Conclusion

AI in oncology holds immense promise, from earlier cancer detection to personalized treatment selection to accelerating drug discovery. But IBM Watson’s failure demonstrates the perils of premature deployment. Oncologists must demand rigorous evidence, transparent algorithms, and proof of clinical benefit before integrating AI into high-stakes cancer care decisions.

The goal is not just better predictions, but better outcomes for patients, especially those from communities bearing disproportionate cancer burdens.

Check Your Understanding

Scenario 1: Genomic Classifier Misinterpretation and Overtreatment

You’re a medical oncologist at a comprehensive cancer center. Your practice routinely uses the Oncotype DX Breast Recurrence Score® for treatment decisions in hormone receptor-positive, HER2-negative breast cancer.

Patient: 52-year-old woman with newly diagnosed breast cancer

  • Pathology: T1c (1.8 cm), N0 (0/3 sentinel nodes), ER+ 95%, PR+ 80%, HER2-, Ki-67 15%
  • Stage: IA (T1cN0M0)
  • Performance status: Excellent, no comorbidities

Oncotype DX ordered: Tumor tissue sent for 21-gene expression assay

Result: Recurrence Score = 24 (intermediate risk)

  • Lab interpretation: “Intermediate risk. Consider chemotherapy, benefit uncertain. May consider further testing or clinical trial.”

TAILORx trial reference (you review evidence):

  • Recurrence Score 11-25: Chemotherapy benefit for women ≤50 years old (mainly scores 16-25)
  • Recurrence Score 11-25: No chemotherapy benefit for women >50 years old
  • Patient is 52 with a score of 24 → Falls into “no chemotherapy benefit” group

Your initial recommendation: Endocrine therapy alone (tamoxifen or AI for 5-10 years). No chemotherapy benefit demonstrated in TAILORx for her age and Recurrence Score.

Patient responds: “I want to do everything possible. My neighbor had breast cancer and got chemo. Why aren’t you recommending it for me?”

You explain TAILORx findings: For women over 50 with Recurrence Score 21-25, chemotherapy did not improve survival compared to endocrine therapy alone.

Patient: “But the test said ‘intermediate risk’ and ‘consider chemotherapy.’ The lab wouldn’t say that if I didn’t need it.”

Patient sees second opinion oncologist (at outside institution):

Second oncologist’s interpretation: “Your Recurrence Score is 24, which is intermediate-high. I recommend chemotherapy followed by endocrine therapy to maximize cure.”

Patient returns to you confused: “Dr. Johnson says I should get chemo. Why are you saying I don’t need it?”

You review: Second oncologist appears to have misinterpreted or not applied TAILORx data. A Recurrence Score of 24 in a 52-year-old does not benefit from chemotherapy per level 1 evidence.

Patient chooses second opinion oncologist’s recommendation: Receives TC chemotherapy (docetaxel + cyclophosphamide) × 4 cycles

Chemotherapy course: - Cycle 1: Febrile neutropenia, hospitalized 5 days, IV antibiotics - Cycle 2: Dose-reduced, severe fatigue, neuropathy developing - Cycle 3: Further dose reduction, persistent neuropathy - Cycle 4: Completed, but patient develops grade 2 peripheral neuropathy (permanent)

Patient outcome:

  • Completes endocrine therapy
  • 5-year follow-up: No recurrence (excellent prognosis, as expected)
  • Permanent peripheral neuropathy affecting hands and feet (difficulty with fine motor tasks)
  • Lasting anxiety and financial toxicity from chemotherapy ($150,000 in costs, $30,000 out-of-pocket)

Patient later learns (from support group discussion) that chemotherapy may not have been necessary for her specific situation.

Patient files complaint with state medical board against second opinion oncologist for recommending unnecessary chemotherapy.

Question 1: What went wrong in this case?

Misinterpretation of genomic classifier results and failure to apply high-quality clinical trial evidence.

Root causes:

1. Misunderstanding of Oncotype DX Recurrence Score interpretation

  • Recurrence Score is not a treatment recommendation but prognostic information
  • TAILORx trial provides treatment guidance: Score + age determines chemotherapy benefit
  • Second oncologist either:
    • Did not know TAILORx results
    • Misapplied them (patient age 52, not ≤50)
    • Ignored them in favor of “reflexive” chemo for intermediate scores

2. “Intermediate risk” label confusion

  • Lab report language (“consider chemotherapy, benefit uncertain”) ambiguous
  • Patient interpreted “consider chemotherapy” as a recommendation
  • Second oncologist may have anchored on “intermediate risk” without applying TAILORx

3. Communication failure

  • First oncologist (you) correctly applied evidence, but patient sought second opinion
  • Second oncologist contradicted first opinion without clear discussion of evidence differences
  • Patient caught between conflicting recommendations without a framework to evaluate them

4. Cognitive biases (second oncologist)

  • Availability bias: “My patients with intermediate scores get chemo”
  • Omission bias: Fear of not treating > fear of overtreating
  • Anchoring: “Intermediate score = chemo” without nuance

Question 2: Is the second oncologist liable for recommending unnecessary chemotherapy?

Legal analysis:

Standard of care for breast cancer treatment decisions:

  • Treatment should be guided by best available evidence
  • Major clinical trials (TAILORx) establish standards for chemotherapy benefit
  • Shared decision-making required
  • Informed consent must include risks and benefits

Plaintiff’s argument:

  • “Dr. Johnson recommended chemotherapy that provided no survival benefit”
  • “TAILORx trial clearly showed no benefit for my age and Recurrence Score”
  • “I suffered permanent neuropathy, hospitalization, and financial toxicity from unnecessary treatment”
  • “Dr. Johnson either didn’t know the evidence or ignored it”
  • Damages: Permanent neuropathy, chemotherapy complications, financial toxicity ($30,000 out-of-pocket), pain and suffering

Defense arguments:

1. Treatment was within standard of care:

  • Oncotype DX interpretation has some subjectivity
  • Recurrence Score 24 is near the top of the intermediate range
  • Some oncologists offer chemotherapy for upper-intermediate scores based on patient anxiety
  • Patient autonomy: She wanted aggressive treatment

2. Informed consent obtained:

  • Patient was counseled about chemotherapy risks
  • She chose to proceed
  • Complications (febrile neutropenia, neuropathy) are known risks, not negligence

3. TAILORx not universally applied:

  • Some oncologists still offer chemo for intermediate scores
  • Practice variation exists
  • No absolute contraindication to chemotherapy

Plaintiff’s rebuttal:

TAILORx is level 1 evidence (randomized controlled trial, >10,000 patients):

  • Specifically addressed this question: Is there chemotherapy benefit for Recurrence Score 11-25?
  • Answer: No benefit for women >50 years old
  • Patient was 52 → Clear evidence of no benefit

Standard of care requires applying best evidence:

  • Recommending treatment with proven lack of benefit is below standard
  • Analogy: Prescribing an antibiotic for a viral URI despite evidence it doesn’t help

Informed consent was inadequate:

  • If Dr. Johnson had said, “A large trial showed chemotherapy provides no survival benefit for women your age with this score, but I recommend it anyway,” would the patient have proceeded?
  • Likely not. Consent requires explaining the lack of evidence for benefit

Guideline support:

  • NCCN guidelines incorporate TAILORx findings
  • For women >50 with scores 11-25, guidelines support endocrine therapy alone; chemotherapy “may be considered” only for scores 26-30, where evidence of benefit is also lacking for age >50

Likely outcome:

Moderate-to-high liability risk:

  • Strong plaintiff case: Clear evidence (TAILORx) of no benefit, patient suffered permanent harm
  • Sympathetic plaintiff: Unnecessary chemotherapy with permanent neuropathy
  • Expert testimony critical: Oncology experts would likely testify that TAILORx evidence should have been applied

Potential outcomes:

  • Settlement likely ($100,000-$500,000 range): permanent neuropathy and unnecessary treatment are well-documented
  • Trial: Plaintiff-favorable if expert consensus is that TAILORx defines the standard of care
  • Medical board: May find failure to apply evidence constitutes substandard care

Mitigating factors for defense:

  • If informed consent documentation shows patient explicitly chose chemotherapy despite being told “the trial showed no benefit for your age”
  • If second oncologist documented rationale (e.g., “patient’s high anxiety, preference for aggressive treatment”)

Key issue: Was patient told “evidence shows no benefit” or just “chemotherapy is an option”?

Question 3: How should genomic classifiers be used to avoid overtreatment?

Best practices for Oncotype DX and similar assays:

1. Understand What Genomic Classifiers Provide

Oncotype DX Recurrence Score:

  • Prognostic: Estimates recurrence risk with endocrine therapy alone
  • Predictive (for chemotherapy benefit): Requires integration with clinical trial data (TAILORx)

NOT a treatment decision in isolation. Must apply evidence:

Recurrence Score        Age ≤50              Age >50
0-10 (Low)              Endocrine alone      Endocrine alone
11-25 (Intermediate)    Chemo benefit YES    Chemo benefit NO
26-100 (High)           Chemo recommended    Chemo recommended
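
A minimal decision-support sketch encoding the table above, for illustration only (treatment decisions belong to the oncologist and patient):

def tailorx_guidance(recurrence_score, age):
    # Maps Oncotype DX Recurrence Score + age to TAILORx-based guidance
    if not 0 <= recurrence_score <= 100:
        raise ValueError("Recurrence Score must be 0-100")
    if recurrence_score <= 10:
        return "Endocrine therapy alone"
    if recurrence_score <= 25:
        if age <= 50:
            return "Chemotherapy benefit YES (small absolute benefit); discuss"
        return "Chemotherapy benefit NO; endocrine therapy alone"
    return "Chemotherapy recommended (score 26-100)"

print(tailorx_guidance(24, 52))  # the patient in this scenario
# -> Chemotherapy benefit NO; endocrine therapy alone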

2. Shared Decision-Making Framework

When discussing intermediate Recurrence Scores (11-25) with patients:

For patients >50 years old:

“Your Recurrence Score is [X], which is intermediate risk. A large clinical trial called TAILORx studied over 10,000 women with breast cancer like yours. They found that for women over 50 with intermediate scores, adding chemotherapy to hormonal therapy did NOT improve survival compared to hormonal therapy alone.

My recommendation: Hormonal therapy alone (e.g., tamoxifen or aromatase inhibitor for 5-10 years).

Chemotherapy would:

  • NOT improve your survival based on trial evidence
  • Cause side effects (fatigue, nausea, hair loss, neuropathy risk, infection risk)
  • Cost $100,000-$150,000
  • Take 3-4 months

Do you have questions about why I’m recommending against chemotherapy?”

For patients ≤50 years old with scores 16-25:

“Your Recurrence Score is [X]. The TAILORx trial showed that for women 50 and younger with intermediate scores, chemotherapy provided a small benefit, about a 2-3% absolute reduction in recurrence risk at 9 years.

We should discuss:

  • Your personal risk tolerance
  • The small benefit (97% do fine with hormonal therapy alone; chemo helps 2-3% avoid recurrence)
  • Side effects of chemotherapy
  • Your preferences about treatment intensity

This is a close call where either hormonal therapy alone or chemotherapy + hormonal therapy is reasonable.”

3. Combat Cognitive Biases

Oncologist biases to recognize:

Omission bias: “I’d rather overtreat than undertreat”

  • Counter: Overtreatment causes real harm (neuropathy, infections, financial toxicity). Chemotherapy without proven benefit is harm, not help

Availability bias: “I’ve always given chemo for intermediate scores”

  • Counter: TAILORx published 2018. Update practice based on evidence

Patient pressure: “Doctor, I want to do everything”

  • Counter: “Everything that HELPS, yes. Chemotherapy that doesn’t help is NOT doing everything. It’s exposing you to harm without benefit.”

4. Communicate Uncertainty Appropriately

When evidence is clear (TAILORx for age >50, score 11-25):

  • Do NOT say: “Chemotherapy is an option we could consider”
  • DO say: “The evidence shows chemotherapy does not improve survival for women your age with this score. I do not recommend it.”

When evidence is less clear (e.g., score 26-30, age >50):

  • DO say: “Your score is 26, which is just above the range where trials showed no chemotherapy benefit. We don’t have definitive evidence for scores 26-30 in women over 50. Some oncologists offer chemotherapy, others do not. Here are the considerations…”

5. Laboratory Reporting Improvements

Better Oncotype DX report format (include age and TAILORx guidance):

Recurrence Score: 26 (Intermediate Risk)

TREATMENT GUIDANCE (based on TAILORx trial):
- Patient age: 52 years old
- For women >50 with Recurrence Score 11-25: No chemotherapy benefit demonstrated
- For Recurrence Score 26-30: Limited data; chemotherapy benefit unclear for age >50

RECOMMENDATION: Discuss with oncologist. Endocrine therapy strongly recommended.
Chemotherapy benefit uncertain for this specific age/score combination.

6. Tumor Board and Second Opinion Processes

When second opinions diverge:

  • Tumor board discussion to reconcile evidence interpretation
  • Clear documentation of rationale for recommendations
  • Patient provided with summary: “Dr. A recommends X because [evidence Y]. Dr. B recommends Z because [rationale W]. Here’s how they differ…”

7. Audit and Accountability

Institutional review:

  • Track Oncotype DX scores vs. chemotherapy administration
  • Flag cases where chemotherapy given despite TAILORx indicating no benefit
  • Peer review for appropriateness
  • Feedback to oncologists

Example audit trigger (see the sketch below):

  • Patient age >50 + Recurrence Score 11-25 + chemotherapy given → Requires documented rationale
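
A sketch of that trigger as a query over a hypothetical treatment registry (column names are illustrative):

import pandas as pd

cases = pd.DataFrame({
    "patient_id": [101, 102, 103],
    "age": [52, 47, 63],
    "recurrence_score": [24, 18, 30],
    "chemo_given": [True, True, True],
})

# Flag: age >50 AND Recurrence Score 11-25 AND chemotherapy given
flagged = cases[
    (cases["age"] > 50)
    & cases["recurrence_score"].between(11, 25)
    & cases["chemo_given"]
]
for pid in flagged["patient_id"]:
    print(f"Case {pid}: chemo despite TAILORx no-benefit group; rationale required")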

8. Patient Decision Aids

Provide visual tools:

Your Situation:
- Age: 52
- Recurrence Score: 24

Without any treatment: 15% chance of recurrence in 9 years
With hormonal therapy alone: 7% chance of recurrence (roughly halves the risk)
With hormonal therapy + chemotherapy: 7% chance of recurrence (NO ADDITIONAL BENEFIT)

Chemotherapy WILL cause:
- Hair loss (100%)
- Fatigue (90%)
- Nausea (60%)
- Infection risk (20%)
- Permanent neuropathy (5-10%)
- Financial cost ($30,000 out-of-pocket)

Benefits of chemotherapy for you: NONE (based on TAILORx trial)

Lesson: Genomic classifiers like Oncotype DX provide prognostic information, but treatment decisions require integrating this data with high-quality clinical trial evidence (e.g., TAILORx). Oncologists must apply evidence-based guidelines, resist cognitive biases favoring overtreatment, and communicate clearly about proven vs. uncertain benefits. Recommending chemotherapy when evidence shows no benefit exposes patients to harm without benefit and may constitute substandard care.

Scenario 2: Lung Nodule AI False Negative and Delayed Diagnosis

You’re a radiologist at a community hospital that recently implemented an FDA-cleared AI system (Lunit INSIGHT CXR) for chest X-ray interpretation and lung nodule detection.

System description:

  • Analyzes chest X-rays for lung nodules, infiltrates, pneumothorax, other findings
  • Flags abnormalities for radiologist review
  • Deployed as “concurrent reader” (AI analysis available while you read)

Your experience: 6 months deployed, generally helpful for detecting subtle findings, occasional false positives (you override)

Case presentation: 67-year-old man

  • Indication: Routine chest X-ray before elective hip replacement surgery
  • History: 40 pack-year smoking history (quit 5 years ago), no respiratory symptoms
  • Patient: Feels well, no cough, no weight loss, no hemoptysis

Chest X-ray performed: PA and lateral views

AI analysis: “No significant abnormality detected. Low suspicion for nodule. No urgent findings.”

Your interpretation:

  • Lungs clear
  • No infiltrate, no effusion
  • Heart size normal
  • Impression: “No acute cardiopulmonary process”

You sign report without further workup recommendations

6 months later: Patient develops persistent cough

Returns for chest X-ray:

Radiologist review (different radiologist):

  • Right upper lobe mass, 4.5 cm
  • Comparison to prior 6 months ago: “In retrospect, subtle 1.2 cm nodule visible in right apex on prior X-ray, now grown to 4.5 cm”

CT chest ordered:

  • Right upper lobe mass, 4.5 cm, spiculated
  • Mediastinal lymphadenopathy
  • No distant metastases

Biopsy: Adenocarcinoma of lung

Staging: Stage IIIA (T3N2M0) - locally advanced, not surgical candidate

Treatment: Concurrent chemoradiation (definitive intent, not curative)

Retrospective review of original chest X-ray:

Radiologist panel reviews (3 thoracic radiologists independently):

  • All 3 identify subtle 1.2 cm nodule in right apex on original X-ray
  • Consensus: “Nodule was visible but subtle. Missed by both AI and original radiologist. Would recommend CT follow-up for nodule in smoker.”

Patient outcome:

  • Completes chemoradiation
  • 18-month follow-up: Local progression, develops brain metastases
  • Stage IV disease, palliative intent treatment
  • Prognosis: 12-18 months median survival

If nodule detected at 1.2 cm 6 months earlier:

  • Likely Stage IA (T1bN0M0)
  • Surgical resection (lobectomy)
  • 5-year survival: 70-80% (vs. <15% with Stage IV)

Patient files malpractice lawsuit against you and hospital:

Allegations:

  • Missed lung nodule on chest X-ray
  • Failure to recommend follow-up imaging (CT) for smoker with nodule
  • AI system failure contributed to miss
  • 6-month delay resulted in progression from curable (Stage I) to incurable (Stage III→IV) disease

Question 1: What went wrong: AI false negative, radiologist error, or both?

Root causes:

1. AI false negative

  • Lunit INSIGHT CXR failed to detect 1.2 cm nodule in right apex
  • Retrospective review: 3/3 expert radiologists identified nodule → nodule was visible
  • AI limitation: Apical nodules challenging (rib overlap, clavicle overlap, lower contrast)

Why did AI miss it?

  • Training data may have underrepresented subtle apical nodules
  • Algorithm threshold set for higher specificity (fewer false positives) at the cost of sensitivity
  • A 1.2 cm nodule is near the detection limit for chest X-ray AI

2. Radiologist error (independent of AI)

  • You also missed the nodule on independent read
  • Common miss: Apical nodules are a “satisfaction of search” blind spot
  • Absence of an AI alert may have contributed to complacency: “AI says no nodule, must be fine”

3. Automation bias

  • Cognitive bias: Trusting the AI negative result without an independent thorough search
  • “AI didn’t flag anything → I can read quickly”
  • Parallel reading (AI + radiologist simultaneously) may paradoxically reduce sensitivity if the radiologist defers to AI

4. System implementation failure

  • Was AI sensitivity/specificity validated on apical lung nodules specifically?
  • Was there training for radiologists on known AI limitations?
  • Was there a protocol for high-risk patients (smokers) with a lower threshold for CT?

Question 2: Who is liable: radiologist, hospital, AI vendor?

Legal analysis:

Standard of care for chest X-ray interpretation:

  • Radiologists must systematically review all lung zones, including apices
  • Incidental nodules in smokers should prompt a recommendation for CT follow-up per Fleischner Society guidelines
  • AI is an adjunct to, not a replacement for, radiologist interpretation

Plaintiff’s argument:

Against radiologist (you):

  • “Nodule was visible. Three experts identified it retrospectively”
  • “Dr. Smith missed the nodule despite being a trained radiologist”
  • “Failure to detect nodule and recommend CT follow-up in high-risk smoker”
  • “6-month delay caused progression from Stage I (80% 5-year survival) to Stage IV (15% 5-year survival)”

Against hospital:

  • “Hospital implemented AI system that FAILED, gave false reassurance”
  • “AI said ‘no abnormality,’ which contributed to radiologist missing nodule”
  • “Hospital did not adequately validate AI system for apical nodules”
  • “Hospital did not train radiologists about AI limitations”
  • Vicarious liability: Radiologist is hospital employee

Against AI vendor:

  • “AI system marketed as ‘lung nodule detection’ but missed 1.2 cm nodule”
  • “FDA clearance does not mean infallible”
  • “Vendor did not disclose false negative rate for apical nodules”

Defense arguments:

Radiologist defense:

1. Difficult case (not negligence):

  • Apical nodules are subtle, commonly missed
  • Even expert radiologists miss 20-30% of lung nodules on chest X-ray
  • Not every miss is malpractice

2. AI false negative contributed:

  • Radiologist relied on AI (as intended by hospital protocol)
  • AI failure created false reassurance

3. Causation uncertain:

  • Cannot prove the 6-month delay changed the outcome
  • Tumor biology (growth rate) may indicate an aggressive cancer that would have metastasized anyway

Hospital defense:

1. AI system was FDA-cleared:

  • Hospital conducted reasonable due diligence
  • AI vendor claimed high sensitivity for nodule detection
  • Reasonable to implement

2. AI is adjunct, not replacement:

  • Radiologist has ultimate responsibility
  • Hospital trained radiologists that AI is supplemental

3. Standard of care met:

  • Concurrent reading (AI + radiologist) is standard practice
  • No protocol violation

AI vendor defense:

1. FDA clearance demonstrates safety and effectiveness:

  • AI system met FDA performance thresholds
  • No AI is 100% sensitive

2. Radiologist missed nodule independently:

  • AI false negative AND radiologist false negative
  • Radiologist has ultimate responsibility
  • AI is decision support, not a diagnostic device

3. Learned intermediary doctrine:

  • Radiologist is a trained professional who interprets AI output
  • Vendor not liable for radiologist’s clinical decisions

Plaintiff’s rebuttal:

Against “difficult case” defense:

  • 3/3 experts identified the nodule on retrospective review → nodule was detectable
  • High-risk patient (smoker) → lower threshold for recommending CT

Against “AI contributed” defense:

  • Standard of care is systematic independent review by the radiologist
  • Cannot blame AI for one’s own failure to detect a visible nodule
  • Automation bias is the radiologist’s cognitive error

Against “causation uncertain” defense:

  • Staging difference: IA vs. IIIA/IV is dramatic
  • Survival difference: 80% vs. <15% five-year survival
  • Growth over 6 months documented: 1.2 cm → 4.5 cm = rapid growth
  • Jury will likely find the delay caused the worse outcome

Likely outcome:

High liability risk for radiologist and hospital:

  • Sympathetic plaintiff: Missed cancer, now Stage IV, poor prognosis
  • Visible nodule: Experts identified it → not “impossible” case
  • High-risk patient: Smoker with nodule → should have recommended CT per Fleischner
  • Causation strong: Staging difference and survival data compelling

Settlement very likely: $500,000-$2,000,000 range

  • Radiologist/hospital likely primary defendants
  • AI vendor may contribute to settlement, but the “learned intermediary” doctrine provides significant protection

Comparative fault:

  • Radiologist: 60-70% (missed nodule independently)
  • AI vendor: 30-40% (false negative contributed)

Question 3: How should lung nodule AI be implemented to avoid false negatives and automation bias?

Best practices for chest X-ray AI deployment:

1. Understand AI Limitations Before Deployment

Performance validation (REQUIRED):

Ask vendor:

  • “What is sensitivity for nodules by location (apical vs. peripheral vs. central)?”
  • “What is sensitivity for nodules by size (≤5mm, 6-10mm, 11-20mm, >20mm)?”
  • “What is the false negative rate in a high-risk population (smokers)?”
  • “Provide external validation data (not just the internal development set)”

Site-specific validation (a sketch follows below):

  • Run AI on historical cases with known nodules
  • Calculate sensitivity/specificity on YOUR scanner, YOUR population
  • Identify failure modes: What types of nodules does the AI miss?

Example findings:

  • Sensitivity 85% for peripheral nodules, only 60% for apical nodules
  • Sensitivity 90% for nodules >10mm, only 70% for nodules 6-10mm

If sensitivity unacceptable for high-risk population: Do not deploy, or adjust protocol
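
A minimal sketch of that site-specific validation, assuming a local log of CT-proven nodules with the AI's output attached (all names and values here are hypothetical):

import pandas as pd

# Each row is a proven nodule; ai_flagged = did the AI flag it on the X-ray?
df = pd.DataFrame({
    "location":   ["apical", "apical", "peripheral", "peripheral", "central"],
    "size_mm":    [12, 8, 22, 9, 15],
    "ai_flagged": [False, False, True, True, True],
})
df["size_band"] = pd.cut(df["size_mm"], bins=[0, 5, 10, 20, 1000],
                         labels=["<=5mm", "6-10mm", "11-20mm", ">20mm"])

# Per-subgroup sensitivity = flagged nodules / all nodules in that subgroup
print(df.groupby("location")["ai_flagged"].mean().round(2))
print(df.groupby("size_band", observed=True)["ai_flagged"].mean().round(2))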

2. Training Radiologists on AI Limitations

Mandatory education:

  • AI is adjunct, not replacement
  • Known failure modes (apical nodules, small nodules, ground-glass opacities)
  • Automation bias risk: “AI says negative → I read carelessly”
  • Systematic search pattern INDEPENDENT of AI
  • High-risk patients (smokers) require extra scrutiny regardless of AI

Key message: “Read every X-ray as if the AI doesn’t exist. Then use AI as second opinion, not first opinion.”

3. Workflow Design: Independent Read THEN AI Review

Problem with concurrent reading: Radiologist sees AI output while reading → automation bias

Better workflow:

STEP 1: Radiologist reads X-ray independently (AI output hidden)
↓
STEP 2: Radiologist documents preliminary interpretation
↓
STEP 3: AI output revealed
↓
STEP 4: If AI finds something radiologist missed → re-review
↓
STEP 5: Final interpretation (integrate AI + radiologist findings)

Benefit: Reduces automation bias by forcing independent interpretation first

4. Clinical Decision Support for High-Risk Patients

EHR integration:

When a chest X-ray is ordered for a patient with smoking history:

  • Alert radiologist: “Patient is high-risk (smoker). Consider CT if any suspicious finding.”
  • Pre-populate recommendation template: “If nodule detected, recommend CT chest per Fleischner guidelines”

Fleischner Society guidelines (embedded in AI workflow):

Nodule size   Smoker   Recommendation
<6mm          Yes      Optional CT at 12 months
6-8mm         Yes      CT at 6-12 months, then 18-24 months
>8mm          Yes      CT at 3 months, consider PET-CT, biopsy

AI should auto-generate recommendation if nodule detected: “1.2 cm nodule detected in smoker → recommend CT chest in 3 months per Fleischner”
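
A sketch of that auto-generated recommendation, encoding only the abbreviated high-risk table above (the full Fleischner guideline has more branches; this is illustrative):

def fleischner_recommendation(size_mm, smoker):
    # Simplified solid-nodule follow-up for the high-risk (smoker) column above
    if not smoker:
        return "Apply the low-risk Fleischner table (not shown here)"
    if size_mm < 6:
        return "Optional CT at 12 months"
    if size_mm <= 8:
        return "CT at 6-12 months, then 18-24 months"
    return "CT at 3 months; consider PET-CT, biopsy"

print(fleischner_recommendation(12, smoker=True))
# -> CT at 3 months; consider PET-CT, biopsy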

5. Second Read Protocol for High-Risk Cases

Double-read policy:

  • All chest X-rays in smokers reviewed by TWO radiologists, or AI + radiologist with mandatory documentation
  • Reduces miss rate by 10-20%

AI as second reader:

  • If AI flags a nodule, the radiologist MUST review and document agreement or disagreement
  • If the radiologist disagrees with an AI finding, the reason must be documented

6. Audit and Feedback

Systematic review:

  • Track chest X-rays with subsequent CT showing a nodule
  • Retrospective review: Was the nodule visible on the X-ray?
  • If yes → miss event → root cause analysis

Radiologist-specific feedback:

  • “You missed 3 apical nodules in the past 6 months. Review these cases for learning”
  • Targeted education on personal blind spots

AI performance monitoring:

  • Track false negatives (nodule on subsequent CT, not flagged by AI)
  • If false negative rate >15% → re-evaluate AI system (see the sketch below)
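
A small sketch of that monitoring rule, assuming a running log of CT-confirmed nodules and whether the AI flagged each on the preceding X-ray (data hypothetical):

confirmed_nodules = [
    {"case": "A", "ai_flagged": True},
    {"case": "B", "ai_flagged": False},
    {"case": "C", "ai_flagged": True},
    {"case": "D", "ai_flagged": True},
]

misses = sum(1 for n in confirmed_nodules if not n["ai_flagged"])
fn_rate = misses / len(confirmed_nodules)
print(f"AI false negative rate: {fn_rate:.0%}")
if fn_rate > 0.15:  # threshold from the policy above
    print("Re-evaluate AI system: false negative rate exceeds 15%")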

7. Informed Consent for AI-Assisted Imaging

Should patients be told AI is used?

Arguments for disclosure:

  • Transparency
  • Some patients may want to know
  • Medicolegal protection (“We use AI to improve detection”)

Arguments against:

  • May confuse patients
  • AI is an internal radiologist tool, not a separate test

Compromise: General disclosure without detail

  • Hospital website: “We use advanced computer analysis to assist radiologists in detecting abnormalities”
  • Radiology report footer: “AI-assisted interpretation”

8. Vendor Accountability

Questions for lung nodule AI vendors:

  1. “What is false negative rate for apical lung nodules specifically?”
  2. “Provide data on nodules missed by AI that were visible to radiologists”
  3. “What post-market surveillance do you conduct to track false negatives?”
  4. “What is your liability if AI false negative contributes to delayed diagnosis?”
  5. “Will you indemnify radiologists for cases where AI false negative was contributing factor?”

RED FLAGS:

  • Vendor cannot provide location-specific or size-specific sensitivity data
  • Vendor claims “95% sensitivity” without subgroup analysis
  • Vendor resists post-market surveillance or false negative tracking
  • No mechanism for reporting AI failures

Lesson: Lung nodule detection AI improves sensitivity for many findings, but false negatives occur, especially for apical nodules and subtle lesions. Radiologists must maintain independent systematic search patterns, avoid automation bias, and understand AI limitations. Workflow should enforce independent reads before revealing AI output. High-risk patients (smokers) require extra scrutiny and adherence to Fleischner guidelines regardless of AI findings. False negatives resulting in delayed cancer diagnosis carry significant malpractice risk for radiologists and hospitals.

Scenario 3: Liquid Biopsy MRD-Directed Treatment and Overtreatment

You’re a medical oncologist specializing in gastrointestinal cancers. Your institution recently began offering circulating tumor DNA (ctDNA) testing for minimal residual disease (MRD) detection in post-operative colorectal cancer patients.

Test: Guardant Reveal™ (tumor-informed ctDNA assay for MRD)

  • Analyzes patient-specific tumor mutations
  • Detects residual ctDNA after curative-intent surgery
  • Predicts recurrence risk

Patient: 58-year-old man with colon cancer

  • Stage: IIA (T3N0M0): tumor invaded through muscularis propria, no lymph node involvement
  • Surgery: Right hemicolectomy, R0 resection (clear margins)
  • Pathology: Moderately differentiated adenocarcinoma, 0/18 lymph nodes positive, no high-risk features
  • Microsatellite status: Microsatellite stable (MSS)

Standard treatment for Stage IIA colon cancer (per NCCN guidelines):

  • Surgery alone (observation)
  • Adjuvant chemotherapy NOT recommended for average-risk Stage II
  • Chemotherapy benefit: 3-5% absolute improvement in survival
  • Toxicity: Neuropathy, diarrhea, hand-foot syndrome
  • Risk-benefit generally unfavorable unless high-risk features present

Your initial plan: Surveillance (no chemotherapy)

Patient enrolled in institutional ctDNA study: Guardant Reveal testing at 4 weeks post-op

ctDNA result at 4 weeks post-op: POSITIVE (MRD detected)

- 2 tumor-specific mutations detected in plasma at low levels
- Interpretation: "MRD detected. High risk of recurrence."

You review CIRCULATE-Japan trial data (Kotani et al., 2023):

- ctDNA-positive Stage II/III colon cancer patients: 80% recurrence rate at 3 years
- ctDNA-negative patients: 10% recurrence rate at 3 years
- ctDNA is a very strong prognostic biomarker

Your assessment: Patient is MRD-positive → 80% chance of recurrence. Standard surveillance means recurrence will be detected when metastatic (incurable).

You discuss with patient:

“Your blood test shows very small amounts of tumor DNA still circulating, which means you’re at high risk of recurrence, about 80% chance of cancer coming back within 3 years. Normally we don’t recommend chemotherapy for Stage II colon cancer, but in your case, I think we should consider adjuvant chemotherapy to try to eliminate residual cancer cells.”

Patient: “Will chemotherapy prevent recurrence?”

You: “We don’t have definitive proof that chemotherapy helps ctDNA-positive patients, but it makes biological sense. The alternative is watching and waiting for cancer to come back, which I don’t recommend.”

Patient: “If there’s an 80% chance of recurrence, I want to do everything possible. Let’s do chemo.”

Treatment: FOLFOX chemotherapy (oxaliplatin + 5-FU + leucovorin) × 6 months (12 cycles)

Chemotherapy course:

- Cycle 3: Oxaliplatin dose reduction for developing peripheral neuropathy
- Cycle 6: Grade 2 neuropathy, severe cold sensitivity
- Cycle 9: Grade 3 neuropathy (difficulty buttoning shirts, dropping objects); oxaliplatin discontinued
- Cycles 10-12: Completes 5-FU/leucovorin only

Post-chemotherapy:

- ctDNA testing at 3 months post-chemo: NEGATIVE (MRD cleared)
- Surveillance imaging: No evidence of disease

2-year follow-up: No recurrence, disease-free

3-year follow-up: Still no recurrence

Patient outcome:

- Cancer-free (excellent)
- Permanent grade 2 peripheral neuropathy affecting hands and feet
- Financial toxicity: $180,000 chemotherapy cost, $40,000 out-of-pocket
- Quality of life: Neuropathy limits hobbies (guitar playing, woodworking); early retirement due to disability

Patient reflects:

- Joins an online support group for colon cancer survivors
- Learns from the group that no randomized trials prove ctDNA-directed chemotherapy improves survival
- Learns that the 80% recurrence rate applies to observation alone; whether chemotherapy changes it is unknown
- Wonders: "Did I really need chemotherapy, or would I have been one of the 20% who never recurred?"

Patient feels uncertain about whether he was helped or harmed by MRD-directed treatment.

Question 1: Was MRD-directed chemotherapy appropriate for this patient?

Evidence review:

What we know (prognostic value of ctDNA/MRD):

- ctDNA positivity strongly predicts recurrence (validated in multiple studies)
- ctDNA-positive patients: 60-80% recurrence rate
- ctDNA-negative patients: 5-15% recurrence rate

What we DON'T know (predictive value for treatment benefit):

- Does chemotherapy in ctDNA-positive patients reduce recurrence? UNKNOWN
- Does chemotherapy lower the 80% recurrence risk? UNKNOWN
- Do ctDNA-positive patients benefit MORE from chemotherapy than ctDNA-negative patients? UNKNOWN

Critical distinction: Prognostic ≠ Predictive

Prognostic: Tells you risk (ctDNA does this well)

Predictive: Tells you whether treatment will help (ctDNA has NOT been proven to do this)

Randomized trials:

- DYNAMIC (Australia): ctDNA-guided management vs. standard care for Stage II colon cancer. Results published in 2022 showed ctDNA guidance reduced adjuvant chemotherapy use without compromising recurrence-free survival, but ctDNA-positive patients were not randomized to chemotherapy vs. observation
- COBRA (US): Similar ctDNA-guided design
- The key predictive question (does chemotherapy benefit ctDNA-positive patients compared to observation?) remains unanswered as of 2025

Current evidence: Chemotherapy for Stage IIA colon cancer based on ctDNA positivity is UNPROVEN

Question 2: What should have been communicated to the patient?

Honest informed consent conversation:

“Your ctDNA test is positive, which means you’re at higher risk of recurrence, about 80% chance over 3 years if we do observation only. The question is: Will chemotherapy reduce that risk?

What we know:

- You're high-risk (ctDNA tells us that)

What we DON'T know:

- Whether chemotherapy will help you
- Whether chemo reduces your 80% risk to, say, 30%, or whether it doesn't change it at all
- We have NO clinical trial data answering this question yet

Two options:

Option 1: Chemotherapy now

- Potential benefit: Might reduce recurrence risk (but unproven)
- Definite harms: Neuropathy (30-40% chance it is permanent), financial toxicity, 6 months of treatment
- We would be treating based on biological reasoning, not evidence

Option 2: Surveillance with serial ctDNA

- Monitor ctDNA every 3 months
- If ctDNA rises or imaging shows recurrence: Treat at that time
- Risk: May detect recurrence when metastatic (harder to cure)
- Benefit: Avoid chemotherapy toxicity if you're in the 20% who don't recur

My recommendation: I cannot strongly recommend chemotherapy because we lack evidence it helps. But I understand if you prefer to treat now given high risk. This is YOUR decision, and either choice is reasonable given uncertainty.”

Key elements:

- Distinguish prognostic vs. predictive
- Acknowledge the lack of evidence for treatment benefit
- Present both options as reasonable
- Emphasize shared decision-making
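One way to make this conversation concrete is to translate the risks into expected outcomes per 100 patients. The sketch below does that arithmetic; the 30% on-treatment risk is purely hypothetical, echoing the "say, 30%" in the script above, since no trial has measured it.

```python
# Expected outcomes per 100 ctDNA-positive patients, making the consent
# conversation's arithmetic explicit. The treated-risk figure is HYPOTHETICAL:
# no trial has established what (if anything) chemotherapy does to this risk.
N = 100
RISK_OBSERVATION = 0.80          # recurrence risk under observation (prognostic data)
RISK_CHEMO_HYPOTHETICAL = 0.30   # illustrative only, per the script above

recur_obs = N * RISK_OBSERVATION
recur_chemo = N * RISK_CHEMO_HYPOTHETICAL
arr = RISK_OBSERVATION - RISK_CHEMO_HYPOTHETICAL  # absolute risk reduction
nnt = 1 / arr                                     # number needed to treat

print(f"Observation: {recur_obs:.0f}/100 expected recurrences")
print(f"Chemo (hypothetical): {recur_chemo:.0f}/100 expected recurrences")
print(f"If that effect were real, NNT = {nnt:.0f}; "
      f"and {N - recur_obs:.0f}/100 would never have recurred anyway")
```

If instead chemotherapy had no effect, all 100 treated patients would incur toxicity for zero recurrences prevented, which is exactly the uncertainty the consent script names.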

Question 3: How should MRD testing be used to avoid premature or harmful interventions?

Best practices for ctDNA/MRD-directed therapy:

1. Distinguish Prognostic from Predictive Biomarkers

Prognostic biomarker:

- Tells you risk
- Example: ctDNA positivity → 80% recurrence risk
- Does NOT tell you whether treatment helps

Predictive biomarker:

- Tells you whether treatment will benefit
- Example: KRAS wild-type in CRC → anti-EGFR therapy helps
- Requires clinical trial validation

ctDNA is currently PROGNOSTIC ONLY in the adjuvant setting, not yet validated as predictive

2. Evidence Requirements Before MRD-Directed Treatment

Level 1 evidence needed (randomized controlled trial):

- Question: Does chemotherapy in ctDNA-positive patients improve outcomes compared to observation?
- Trial design: ctDNA-positive patients randomized to chemotherapy vs. observation
- Endpoints: Recurrence-free survival, overall survival

Until RCT results available: ctDNA-directed chemotherapy is investigational
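For a sense of scale, a back-of-the-envelope power calculation for the trial design described above is sketched below; the hypothesized treatment effect (80% recurrence reduced to 50%) is an assumption chosen only for illustration, not a result from any study.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Illustrative power calculation for the RCT described above.
# The effect size is an ASSUMPTION chosen for demonstration.
p1 = 0.80   # 3-year recurrence, ctDNA-positive, observation arm (prognostic data)
p2 = 0.50   # hypothesized recurrence with chemotherapy (unproven)
alpha, power = 0.05, 0.80

z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided alpha
z_b = NormalDist().inv_cdf(power)
p_bar = (p1 + p2) / 2

# Standard two-proportion sample-size formula (patients per arm):
n = ((z_a * sqrt(2 * p_bar * (1 - p_bar)) +
      z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p1 - p2) ** 2
print(f"~{ceil(n)} patients per arm")   # roughly 40 per arm for this large effect
```

Even a modest trial could answer the question if the effect were this large; smaller, more plausible effects would require proportionally more patients, which is one reason trial enrollment matters.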

3. Informed Consent for MRD Testing

Pre-test counseling (REQUIRED):

“We’re offering a blood test that detects microscopic cancer DNA. This test tells us your recurrence risk but does NOT tell us if chemotherapy will help you.

If the test is positive:

- You're at high risk for recurrence (60-80% chance)
- We will discuss chemotherapy, BUT there's no proof chemotherapy helps ctDNA-positive patients
- You may face a decision about unproven treatment

If the test is negative:

- You're low-risk (5-15% chance of recurrence)
- Standard surveillance is recommended

Do you want this test, understanding it may create uncertainty about treatment decisions?”

Some patients may decline MRD testing to avoid treatment dilemmas

4. When MRD-Directed Treatment May Be Reasonable

Scenarios where unproven therapy is more justifiable:

High-risk disease + established chemotherapy benefit:

- Stage III colon cancer (chemotherapy already standard)
- ctDNA-positive after chemotherapy → consider extended or intensified treatment
- Rationale: Already treating this stage; MRD guides intensity

Less justifiable:

- Stage II colon cancer (no standard chemotherapy benefit)
- MRD-positive → start chemotherapy based solely on MRD
- Rationale: This adds unproven treatment with real toxicity

5. Alternative Strategies to Avoid Overtreatment

Surveillance-based approach:

Post-op MRD testing:

If MRD-NEGATIVE → Routine surveillance

If MRD-POSITIVE →
  Option A: Discuss chemotherapy (unproven) vs. close surveillance
  Option B: Serial ctDNA monitoring every 8-12 weeks
    - If ctDNA clears spontaneously → Continue surveillance
    - If ctDNA persists or rises → Intensify surveillance, consider treatment
    - If imaging detects recurrence → Treat at that time

Rationale: Some MRD-positive patients clear spontaneously; avoid overtreating them
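The decision tree above is essentially an algorithm, and writing it as one makes the branch points explicit. A minimal Python sketch follows; the function name, enum labels, and result encoding are all invented for illustration.

```python
from enum import Enum
from typing import Optional

class Action(Enum):
    ROUTINE_SURVEILLANCE = "routine surveillance"
    DISCUSS_OPTIONS = "discuss chemotherapy (unproven) vs. close surveillance"
    CONTINUE_SURVEILLANCE = "ctDNA cleared spontaneously; continue surveillance"
    INTENSIFY = "intensify surveillance, consider treatment"
    TREAT_RECURRENCE = "treat recurrence detected on imaging"

def mrd_pathway(mrd_positive: bool,
                recurrence_on_imaging: bool = False,
                serial_ctdna_result: Optional[str] = None) -> Action:
    """Encodes the surveillance-based decision tree above.

    serial_ctdna_result: None (not yet drawn), "cleared", or "persistent/rising".
    """
    if not mrd_positive:
        return Action.ROUTINE_SURVEILLANCE       # MRD-negative branch
    if recurrence_on_imaging:
        return Action.TREAT_RECURRENCE           # treat at time of recurrence
    if serial_ctdna_result is None:
        return Action.DISCUSS_OPTIONS            # Option A vs. Option B discussion
    if serial_ctdna_result == "cleared":
        return Action.CONTINUE_SURVEILLANCE
    return Action.INTENSIFY                      # persistent or rising ctDNA

# Example: MRD-positive patient whose serial ctDNA cleared spontaneously
print(mrd_pathway(True, serial_ctdna_result="cleared").value)
```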

6. Clinical Trial Enrollment

BEST option for MRD-positive patients:

- Enroll in a clinical trial (COBRA and other ongoing MRD-interventional trials)
- Contribute to the evidence base
- Avoid premature standard-of-care designation

If no trial is available:

- Discuss uncertainty transparently
- Document shared decision-making
- Consider tumor board review

7. Manage Patient Expectations

Avoid creating panic:

Poor communication: “Your test is positive. We need to start chemo immediately or cancer will come back!”

Better communication: “Your test shows higher risk, which is concerning. We need to discuss whether chemotherapy is right for you, given that we don’t yet know if it helps in this situation. Let’s review your options carefully.”

8. Financial Counseling

MRD testing is expensive:

- Guardant Reveal: $3,000-$5,000
- Insurance coverage is variable
- Out-of-pocket costs can be significant

Chemotherapy based on MRD:

- $100,000-$200,000 for 6 months of FOLFOX
- If treatment is unproven, insurance may deny coverage
- Financial toxicity can be devastating

Discuss costs upfront: “This test costs $4,000. If positive, chemotherapy based on it may not be covered by insurance since it’s investigational.”

9. Audit and Accountability

Institutional review:

- Track MRD testing → chemotherapy decisions
- Peer review: Was chemotherapy appropriate given the evidence?
- Outcomes: Did MRD-directed chemotherapy improve outcomes vs. historical controls?

Example audit finding:

- 40 MRD-positive Stage II patients
- 35 received chemotherapy (88%)
- Recurrence rate: 15% (vs. predicted 80% with observation)
- Questions:
  - Did chemotherapy reduce recurrence, or would 15% have been the actual rate anyway?
  - Did 25 patients get unnecessary chemotherapy?

Without RCT, cannot answer these questions definitively
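Making that indeterminacy explicit can be useful in committee review. Below is a minimal sketch of the bounds implied by the audit numbers (counts from the example above; the allocation of recurrences to the treated group is an assumption for illustration):

```python
# The audit arithmetic above, made explicit. Without a control arm the
# counterfactual is unknowable; this only brackets the possibilities.
n_positive = 40
n_treated = 35
observed_recurrences = round(0.15 * n_positive)   # 6 of 40 recurred
predicted_if_observed = round(0.80 * n_positive)  # 32 of 40, if the 80% figure held

# Upper bound: chemotherapy caused the entire gap between predicted and observed
prevented_max = predicted_if_observed - observed_recurrences   # 26
# Lower bound: chemotherapy did nothing; the cohort was simply lower-risk than predicted
prevented_min = 0
# Under the null, most treated patients incurred toxicity without benefit
# (assumes all recurrences occurred among treated patients; near the
#  chapter's "25 patients?" question, with the exact count depending on who recurred)
overtreated_if_null = n_treated - observed_recurrences         # 29

print(f"Recurrences prevented: between {prevented_min} and {prevented_max}")
print(f"If chemotherapy did nothing: ~{overtreated_if_null} treated without benefit")
```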

10. Ethical Considerations

Primum non nocere (first, do no harm):

- Chemotherapy causes real harm (neuropathy, infections, financial toxicity)
- Giving unproven treatment is harm without proven benefit
- Burden of proof: Treatment should be proven beneficial before standard use

Equipoise:

- If it is truly uncertain whether chemotherapy helps, a randomized trial is ethical and necessary
- Offering unproven chemotherapy outside a trial may deprive science of the answer

Patient autonomy:

- Some patients prefer aggressive treatment despite uncertainty
- Informed consent allows this
- BUT consent must be truly informed (not coerced by incomplete information)

Lesson: Minimal residual disease (MRD) detection via ctDNA is a powerful prognostic biomarker but has NOT been proven to predict chemotherapy benefit in the adjuvant setting. Randomized trials are ongoing. Until results are available, MRD-directed chemotherapy is investigational and should be offered only with transparent informed consent emphasizing uncertainty, preferably within clinical trials. Treating based solely on MRD positivity risks overtreating patients who would not recur and exposing them to chemotherapy toxicity without proven benefit. Distinguishing prognostic from predictive biomarkers is essential to avoid premature interventions.


References