Pathology and Laboratory Medicine
Pathology is inherently visual pattern recognition: microscopic examination of cells and tissues to diagnose disease. This makes it theoretically ideal for computer vision AI, which excels at identifying visual patterns in images. From histopathology slide analysis to hematology cell classification to clinical chemistry quality control, AI promises to enhance diagnostic accuracy, reduce inter-observer variability, and improve laboratory efficiency. This chapter examines evidence-based AI applications across laboratory medicine.
After reading this chapter, you will be able to:
- Evaluate AI systems for digital pathology and histopathology interpretation, including FDA-cleared prostate and breast cancer detection algorithms
- Critically assess AI applications in cytopathology (Pap smears, thyroid FNA) and hematopathology (peripheral blood smears, bone marrow analysis)
- Understand AI tools for clinical laboratory workflow optimization, quality control, and critical value detection
- Analyze the unique regulatory landscape for laboratory AI, including FDA clearance pathways and CLIA implications
- Recognize failure modes specific to pathology AI, including scanner variability, staining heterogeneity, and rare diagnosis challenges
- Navigate medico-legal implications when AI assists (or misses) pathologic diagnoses
- Apply evidence-based frameworks for evaluating pathology AI before clinical deployment
Introduction: Why Pathology Is AI’s Ideal Specialty
Pathology is pattern recognition. A pathologist examines a biopsy slide, identifies malignant cells among normal tissue, grades tumor differentiation, counts mitotic figures, and integrates these visual features with clinical context to render a diagnosis.
This is exactly what computer vision AI does: detect patterns in images.
Unlike clinical medicine (where symptoms are subjective and examination findings nuanced), pathology offers:
1. Digitized specimens: Whole-slide imaging scanners convert glass slides to gigapixel digital images
2. Abundant training data: Decades of archived specimens with confirmed diagnoses
3. Standardized protocols: H&E staining, immunohistochemistry methods largely consistent across labs
4. Quantifiable features: Cell count, nuclear size, mitotic index, and staining intensity can be measured objectively (see the sketch after this list)
5. Clear ground truth: Biopsy diagnoses confirmed by clinical outcome, surgical pathology, molecular testing
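To make "quantifiable features" concrete, the sketch below counts nuclei and measures nuclear size in an image tile using classical image processing. It is a minimal illustration under stated assumptions, not any vendor's algorithm: it assumes an RGB H&E tile in which nuclei are the darkest structures, and the threshold and size-filter values are illustrative choices.

```python
# Minimal sketch: count nuclei and measure nuclear area in an H&E tile.
# Assumes nuclei (hematoxylin-stained) are the darkest structures; the
# threshold and size filter here are illustrative, not validated values.
import numpy as np
from skimage import color, filters, measure, morphology

def nuclear_features(tile_rgb: np.ndarray) -> dict:
    """tile_rgb: H x W x 3 RGB image tile (uint8 or float)."""
    gray = color.rgb2gray(tile_rgb)               # nuclei appear dark
    mask = gray < filters.threshold_otsu(gray)    # global Otsu threshold
    mask = morphology.remove_small_objects(mask, min_size=30)  # drop debris
    labels = measure.label(mask)
    regions = measure.regionprops(labels)
    areas = np.array([r.area for r in regions])
    return {
        "nucleus_count": len(regions),
        "mean_nuclear_area_px": float(areas.mean()) if len(areas) else 0.0,
        "nuclear_area_fraction": float(mask.mean()),
    }
```

Features like these feed both classical image-analysis pipelines and the training labels used by modern deep-learning systems.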
The result: Pathology AI has advanced faster than almost any other medical specialty. Multiple FDA-cleared algorithms. Growing evidence base. Accelerating clinical adoption.
But pathology AI faces unique challenges:
- Scanner variability: Algorithms trained on Aperio scanners may fail on Leica or Hamamatsu
- Staining heterogeneity: H&E staining intensity varies across labs, confusing algorithms
- Rare diagnoses: Limited training data for uncommon tumors, leading to confident misdiagnoses
- Clinical context: AI sees pixels, not patient history (the chest wall mass in a 70-year-old smoker is likely different from an identical-appearing mass in a 25-year-old)
This chapter examines what works, what doesn’t, and how to evaluate pathology AI critically before deployment.
Part 1: Histopathology AI for Prostate and Breast Cancer
Prostate Cancer Detection and Gleason Grading
Prostate needle core biopsies are among the most common pathology specimens. Gleason grading (scoring architectural patterns) predicts prognosis and guides treatment. But Gleason grading has notorious inter-observer variability: different pathologists assign different scores to the same slide.
The AI solution: Computer vision algorithms trained on tens of thousands of digitized prostate biopsies learn to detect cancer foci and assign Gleason patterns.
Paige Prostate (FDA-Cleared 2021)
FDA clearance: First AI-based cancer detection algorithm cleared by FDA (2021)
Training data: 60,000+ whole-slide prostate biopsy images from 13 institutions
Performance:
- Sensitivity for Gleason ≥7 cancer: 98.0%
- Specificity: 97.2%
- Negative predictive value: 99.5% (very few missed cancers)
- Gleason grading concordance with expert GU pathologists: 87% (comparable to inter-pathologist agreement)
Clinical validation: Pantanowitz et al. (2020) conducted a blinded validation across 16 pathologists and 71 prostate biopsies:
- Paige Prostate detected 3 clinically significant cancers (Gleason 3+4, 4+3) that 2/16 pathologists initially missed
- Zero false negatives for Gleason ≥4+3 (high-grade) cancer
- Reduced Gleason grading variability by 23% compared to unaided pathologists
How pathologists use it:
1. Prostate needle biopsy digitally scanned (whole-slide imaging)
2. Paige algorithm analyzes slide, highlights suspicious regions
3. Pathologist reviews flagged areas + entire slide
4. Pathologist renders final diagnosis (algorithm is advisory, not determinative)
Current deployment: 50+ pathology labs in U.S., analyzing thousands of prostate biopsies monthly
Why this works:
- Common diagnosis: Prostate cancer is the most common male cancer; abundant training data
- Standardized specimen: Needle core biopsies processed similarly across labs
- Clear patterns: Gleason architectural patterns (glands, cribriform, solid sheets) are well-defined visual features
- Addresses a real problem: Inter-pathologist Gleason grading variability is a well-documented clinical issue
Limitations:
- Scanner-specific: Validated on Philips IntelliSite Ultra Fast scanners; performance on other scanners requires separate validation
- Gleason 3+3 vs. 3+4 boundary: Algorithm struggles with borderline cases (as do pathologists)
- Mimics: Atrophy, adenosis, and atypical adenomatous hyperplasia can fool the algorithm (and pathologists)
Breast Cancer Lymph Node Metastasis Detection
Sentinel lymph node biopsy in breast cancer requires pathologists to scan entire lymph nodes for micrometastases (clusters of cells as small as 0.2mm). Tedious, error-prone work.
Google/Verily LYNA (Lymph Node Assistant)
Published: Steiner et al. (2018) in the American Journal of Surgical Pathology
Training data: 399 whole-slide images of sentinel lymph nodes
Performance:
- Sensitivity for micrometastases (≥0.2mm): 99%
- False positive rate: 9% (acceptable for a triage/flagging tool)
- Halves pathologist reading time: Pathologists using LYNA reviewed slides 50% faster with equivalent accuracy
Clinical validation: Six pathologists reviewed 130 lymph node slides without LYNA, then with LYNA:
- Without LYNA: 83% sensitivity for micrometastases (17% false negative rate)
- With LYNA: 99% sensitivity (1% false negative rate)
- Time per slide: 61 seconds (LYNA-assisted) vs. 116 seconds (unaided)
Why this works:
- Clear visual target: Metastatic breast cancer cells look different from lymphocytes
- Tedious human task: Scanning an entire lymph node for rare micrometastases
- High clinical stakes: Missing a micrometastasis changes staging and treatment
Why this didn’t deploy widely:
- Not FDA-cleared: Google/Verily published research but didn’t pursue commercial FDA clearance
- Scanner specificity: Algorithm performance depends on the specific scanning protocol
- Workflow integration challenges: Many labs haven’t adopted whole-slide imaging yet
The lesson: Impressive research results don’t guarantee clinical adoption. FDA clearance, vendor support, and workflow integration matter as much as algorithmic accuracy.
Breast Cancer HER2 Scoring
HER2 immunohistochemistry scoring (0, 1+, 2+, 3+) determines trastuzumab (Herceptin) eligibility for breast cancer patients. But scoring is subjective:
- 3+ = strong complete membrane staining ≥10% of cells → Treat with trastuzumab
- 2+ = equivocal → Requires FISH testing to confirm HER2 amplification
- 0/1+ = negative → No trastuzumab
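The scoring-to-treatment pathway above is a simple decision rule; the sketch below encodes it to make the branch points explicit. It is a minimal illustration of the cutoffs described in the text, not a diagnostic tool, and the `Her2Action` enum and `her2_pathway` function names are illustrative assumptions.

```python
# Minimal sketch of the HER2 IHC decision pathway described above.
# Illustrative only; real-world scoring also weighs staining intensity,
# completeness, and current guideline nuances not captured here.
from enum import Enum
from typing import Optional

class Her2Action(Enum):
    TRASTUZUMAB_ELIGIBLE = "treat with trastuzumab"
    REFLEX_FISH = "equivocal: reflex to FISH for HER2 amplification"
    NOT_ELIGIBLE = "HER2-negative: no trastuzumab"

def her2_pathway(ihc_score: str, fish_amplified: Optional[bool] = None) -> Her2Action:
    """ihc_score: one of '0', '1+', '2+', '3+'."""
    if ihc_score == "3+":
        return Her2Action.TRASTUZUMAB_ELIGIBLE
    if ihc_score == "2+":
        if fish_amplified is None:
            return Her2Action.REFLEX_FISH
        return Her2Action.TRASTUZUMAB_ELIGIBLE if fish_amplified else Her2Action.NOT_ELIGIBLE
    return Her2Action.NOT_ELIGIBLE  # scores 0 and 1+

# An equivocal 2+ case triggers reflex FISH; amplification confirms eligibility.
print(her2_pathway("2+"))        # Her2Action.REFLEX_FISH
print(her2_pathway("2+", True))  # Her2Action.TRASTUZUMAB_ELIGIBLE
```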
The problem: Inter-observer variability. Different pathologists score the same slide differently, especially for borderline 2+/3+ cases.
The AI solution: Quantitative image analysis measures membrane staining intensity objectively.
PathAI AIM-HER2 (Research Use Only, NOT FDA-cleared):
PathAI launched AIM-HER2 Breast Cancer in July 2023 as an AI-powered HER2 scoring algorithm. Developed using 157,000 tissue annotations and consensus scores from over 65 expert breast pathologists on 4,000+ slides.
Performance in research settings (PathAI, 2023):
- Concordance with expert breast pathologists: Improved inter-rater agreement, particularly at the 0/1+ and 1+/2+ cutoffs
- Reduces equivocal scoring: Better distinction between borderline cases
- Standardizes scoring: Decreases inter-pathologist variability
Current status: AIM-HER2 is for research use only and not cleared for diagnostic procedures. PathAI’s AISight Dx platform received FDA 510(k) clearance (K243391, June 2025) for digital pathology image management, but the HER2 scoring algorithm itself does not have diagnostic clearance.
Clinical potential (once cleared):
- Faster trastuzumab decisions (no FISH delay for clear 3+ cases)
- Reduced FISH testing could save $1,000-1,500 per case
- More reproducible scoring for clinical trial consistency
AI-Assisted Colonoscopy and Pathology Workflow Impact
While not strictly pathology AI, AI-assisted colonoscopy systems that improve polyp detection directly impact pathology specimen volume.
Evidence from meta-analysis: A 2024 systematic review and meta-analysis of 44 RCTs (36,201 colonoscopies) found that computer-aided detection (CADe) systems significantly increase adenoma detection (Soleymanjahi et al., 2024):
- Adenoma detection rate (ADR): 44.7% with CADe vs. 36.7% without
- Advanced colorectal neoplasia (ACN) detection: 12.7% vs. 11.5%
- Trade-off: Resection of approximately 2 additional nonneoplastic polyps per 10 colonoscopies
Implications for pathologists:
- Increased specimen volume: Higher polyp detection translates to more polyp specimens for histopathologic examination
- More nonneoplastic polyps: The increased detection includes both neoplastic and nonneoplastic polyps, increasing routine workload
- Upstream AI affecting downstream pathology: This represents an important category where AI in one specialty creates workflow changes in pathology
Current status: Multiple FDA-cleared CADe systems for colonoscopy polyp detection are in clinical use, making this one of the most mature AI applications affecting GI pathology practice.
Part 2: Cytopathology AI
Cervical Cytology (Pap Smear) Screening
Pap smear screening for cervical dysplasia has dramatically reduced cervical cancer incidence. But cytology labs manually review hundreds of slides per day, with cytotechnologists hunting for rare dysplastic cells. Tedious work, prone to false negatives.
AI applications:
- Automated screening: Flag slides with suspected abnormalities for cytotechnologist review
- Quality control: Re-review slides initially called negative to catch false negatives
- Workflow optimization: Prioritize the worklist based on AI-assessed abnormality likelihood
FDA-cleared systems:
- BD FocalPoint: Evolved from the AutoPap 300QC (FDA-cleared 1998 for smear use, 2002 for SurePath slides). BD acquired the technology in 2006 and renamed it the BD FocalPoint Slide Profiler. The BD FocalPoint GS Imaging System received FDA PMA approval in December 2008 for location-guided cervical cancer screening. Note: This is pre-deep-learning, rule-based image analysis, not modern neural network AI.
- Hologic ThinPrep Imaging System: Computer-assisted cytology screening
Performance: Bao et al. (2023) meta-analysis of cervical cytology AI across 23 studies:
- Pooled sensitivity for HSIL+ (high-grade squamous intraepithelial lesion): 94.2%
- Specificity: 88.5%
- Reduction in false negatives: 10-20% compared to unaided cytotechnologists
How cytotechnologists use it:
- AI pre-screens slides, ranks by abnormality likelihood
- Cytotechnologist reviews high-risk slides first
- AI flags slides initially called negative for second review
- Cytotechnologist renders final diagnosis
Clinical benefit:
- Higher sensitivity for detecting high-grade dysplasia
- Workflow efficiency (prioritizes high-risk cases)
- Quality control (catches false negatives before sign-out)
Limitations:
- High false positive rate: Specificity around 88% means roughly 12-15% of normal slides are flagged for additional review
- Doesn’t detect all lesions: Glandular abnormalities (adenocarcinoma in situ) are often missed
- Preparation-dependent: Algorithms validated on ThinPrep may not work on SurePath
Part 3: Hematopathology AI
Peripheral Blood Smear Analysis
Complete blood count (CBC) analyzers provide automated cell counts, but peripheral blood smear review by humans identifies:
- Blast cells (leukemia diagnosis)
- Abnormal red cell morphology (hemolytic anemia)
- Parasites (malaria, babesiosis)
- Left shift (bands, myelocytes indicating infection)
AI applications:
- Automated differential counts: Classify neutrophils, lymphocytes, monocytes, eosinophils, basophils
- Blast detection: Flag suspected leukemia cases
- Malaria detection: Quantify parasitemia in endemic areas
Performance: Variable and analyzer-dependent. A review by Saba et al. (2023) found:
- Differential count concordance with manual review: 80-95% depending on cell type
- Blast detection sensitivity: 85-92% (but high false positive rate from atypical lymphocytes and immature granulocytes)
- Malaria parasite detection: 90-95% sensitivity (better than human microscopists in low-parasitemia cases)
Current use:
- Research stage for most applications
- FDA-cleared automated differential analyzers exist but require manual confirmation for abnormal results
- Malaria detection AI deployed in endemic regions (sub-Saharan Africa, Southeast Asia)
Limitations:
- Morphologic subtlety: Distinguishing reactive lymphocytes from lymphoma cells requires expertise AI lacks
- Rare cells: Abnormal cells (blasts, atypical lymphocytes) are infrequent; limited training data
- Preparation artifacts: Smear quality (cell distribution, staining) affects algorithm performance
Bone Marrow Biopsy Analysis
Bone marrow analysis requires:
- Cellularity assessment: Hypocellular, normocellular, hypercellular
- Blast count: Percentage of blasts determines the AML diagnosis threshold (≥20%)
- Cellular morphology: Dysplasia, maturation abnormalities, infiltrative process
AI research: Multiple academic studies show promise for:
- Automated blast counting (correlation r = 0.90-0.95 with manual counts)
- Cellularity assessment
- Megakaryocyte quantification
Why this isn’t clinically deployed yet:
- Heterogeneity: Bone marrow has far more cell types than peripheral blood
- Context matters: Blast count interpretation requires clinical history (prior chemo, transplant status)
- Rare diagnoses: Hairy cell leukemia, systemic mastocytosis, and metastatic disease have limited training data
- No FDA-cleared systems: All current work is research-stage
Part 4: Clinical Laboratory AI and Workflow Optimization
Quality Control and Critical Value Detection
Clinical chemistry and hematology labs already use rules-based algorithms extensively:
- Delta checks: Flag results that changed dramatically from a prior result (possible specimen mix-up or critical illness)
- Critical value alerts: Automatic notification for life-threatening results (potassium >6.5 mEq/L)
- Quality control monitoring: Levey-Jennings charts, Westgard rules
AI enhancements: Machine learning improves these by learning lab-specific patterns:
- Predict when QC will fail (proactive instrument maintenance)
- Reduce false positive delta check alerts (by learning patient-specific patterns)
- Identify systematic errors (reagent lot problems, instrument drift)
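As a concrete illustration of the rules these enhancements build on, here is a minimal sketch of a critical-value alert and a delta check. The 6.5 mEq/L critical threshold comes from the text; the delta-check rule (three patient-specific standard deviations with a fixed floor) is an illustrative assumption, not any LIS vendor's implementation.

```python
# Minimal sketch: critical-value alert and delta check for serum potassium.
# The 6.5 mEq/L critical threshold is from the text; the delta-check rule
# (flag if |change| exceeds 3 patient-specific standard deviations, with a
# fixed floor) is an illustrative assumption, not a validated protocol.
from statistics import pstdev

POTASSIUM_CRITICAL_HIGH = 6.5  # mEq/L

def check_potassium(current: float, history: list[float]) -> list[str]:
    flags = []
    if current > POTASSIUM_CRITICAL_HIGH:
        flags.append("CRITICAL VALUE: notify clinician immediately")
    if history:
        # Patient-specific variability: tighter delta limits for stable patients.
        baseline_sd = pstdev(history) if len(history) > 1 else 0.3
        delta_limit = max(3 * baseline_sd, 1.0)  # floor of 1.0 mEq/L
        if abs(current - history[-1]) > delta_limit:
            flags.append("DELTA CHECK: verify specimen identity / recollect")
    return flags

print(check_potassium(6.8, [4.1, 4.0, 4.2]))
# ['CRITICAL VALUE: notify clinician immediately',
#  'DELTA CHECK: verify specimen identity / recollect']
```

The machine-learning enhancements described above essentially replace the fixed thresholds in a rule like this with limits learned from lab- and patient-specific data.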
Current use: Research and proprietary implementations in large reference labs. Not yet widely adopted in community hospital labs.
Microbiology AI
Applications:
- Automated colony counting: Computer vision counts bacterial colonies on agar plates
- Species identification from imaging: Distinguish E. coli from Klebsiella based on colony morphology
- Antibiotic resistance prediction: Predict susceptibility from genomic data (whole-genome sequencing)
- Blood culture positivity prediction: Predict which blood cultures will grow bacteria (to prioritize processing)
Performance: Schinkel et al. (2023) ML model for blood culture positivity:
- Predicted positive cultures with an AUC of 0.78 using clinical data (fever, WBC, prior cultures)
- Enabled faster processing of high-risk cultures
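To illustrate the general approach (not the published Schinkel et al. model), the sketch below fits a logistic regression to predict culture positivity from a few clinical features and reports an AUC. The feature names mirror those mentioned in the text; the data are synthetic and the resulting numbers are meaningless outside the example.

```python
# Illustrative sketch of blood-culture positivity prediction: a logistic
# regression on clinical features with AUC evaluation. Synthetic data only;
# this is not the published model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.normal(38.0, 1.0, n),   # temperature (deg C)
    rng.normal(12.0, 4.0, n),   # WBC (x10^9/L)
    rng.integers(0, 2, n),      # prior positive culture (0/1)
])
# Synthetic outcome loosely driven by the features (for illustration only).
logit = 0.8 * (X[:, 0] - 38.5) + 0.1 * (X[:, 1] - 12) + 1.2 * X[:, 2] - 1.5
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"AUC on held-out synthetic data: {auc:.2f}")
```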
Current use:
- Automated colony counting: Commercially available, FDA-cleared systems
- Genomic resistance prediction: Research stage, expensive (WGS not routine)
- Blood culture prediction: Research stage
Part 5: The Scanner Standardization Challenge (The Failure No One Talks About)
Here’s a dirty secret in pathology AI: Most algorithms work well only on the scanner they were trained on.
An algorithm trained on Aperio scanners often performs poorly on Leica or Hamamatsu scanners, even for identical tissue. Why?
Technical reasons:
- Color space differences: Scanners capture RGB values differently
- Compression artifacts: JPEG compression algorithms vary
- Focus algorithms: Autofocus creates slightly different focal planes
- Illumination: LED vs. halogen light sources, color temperature differences
Clinical consequences:
- Labs must validate algorithms on their specific scanner before deployment
- Switching scanner vendors may require re-validating all AI algorithms
- “Scanner-agnostic” algorithms don’t exist yet (despite vendor claims)
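One partial mitigation under active research (see "scanner-agnostic algorithms" in Part 9) is color/stain normalization: match the color statistics of images from a new scanner to a reference scanner before running the algorithm. The sketch below shows a simple Reinhard-style channel-statistics match in LAB color space; it illustrates the idea, not a validated preprocessing pipeline, and `source_rgb`/`reference_rgb` are assumed image arrays.

```python
# Minimal sketch of Reinhard-style color normalization: shift each LAB
# channel of a source image so its mean/std match a reference image.
# Illustrative only; production pipelines use more robust stain
# normalization (e.g., stain-vector deconvolution) plus careful validation.
import numpy as np
from skimage import color

def match_color_stats(source_rgb: np.ndarray, reference_rgb: np.ndarray) -> np.ndarray:
    """Both inputs are H x W x 3 RGB arrays; returns normalized RGB floats in [0, 1]."""
    src = color.rgb2lab(source_rgb)
    ref = color.rgb2lab(reference_rgb)
    out = np.empty_like(src)
    for c in range(3):
        s_mean, s_std = src[..., c].mean(), src[..., c].std()
        r_mean, r_std = ref[..., c].mean(), ref[..., c].std()
        out[..., c] = (src[..., c] - s_mean) / (s_std + 1e-8) * r_std + r_mean
    return np.clip(color.lab2rgb(out), 0.0, 1.0)
```

Even with normalization, local validation on the destination scanner remains necessary; normalization reduces, but does not eliminate, cross-scanner performance drops.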
Real-world example: One academic pathology department deployed prostate cancer detection AI validated on their Aperio scanners. When they upgraded to Leica scanners (for faster scanning), the algorithm’s sensitivity dropped from 98% to 84% because color profiles differed.
They had to:
1. Halt clinical use
2. Re-validate on Leica scanners using 1,000 internal cases
3. Work with the vendor to retrain the algorithm
4. Get institutional IRB approval for the new validation
Cost: 6 months delay + $50,000 validation cost.
The lesson: Ask vendors explicitly: “Was this algorithm validated on our exact scanner model and software version?”
Part 6: The Rare Diagnosis Problem (AI’s Achilles Heel)
AI algorithms are trained on thousands of examples. Breast cancer, prostate cancer, colon adenocarcinoma, all with abundant training data.
But what about:
- Angiosarcoma (rare vascular malignancy)
- Desmoplastic melanoma (melanoma mimicking benign scar tissue)
- Primary bone lymphoma
- Hepatosplenic T-cell lymphoma
These diagnoses occur once per year in most pathology departments. Algorithms have never seen them.
What happens when AI encounters a rare diagnosis?
Scenario 1: AI confidently misdiagnoses
- Algorithm trained on 10,000 benign prostate biopsies and 5,000 adenocarcinomas sees its first granulomatous prostatitis
- Misclassifies as adenocarcinoma with 92% confidence (granulomas have increased cellularity, architectural distortion)
- If pathologist trusts algorithm, patient gets unnecessary treatment
Scenario 2: AI refuses to render opinion
- Better outcome: Algorithm recognizes it’s outside training distribution
- Flags case for expert review
- But most current algorithms don’t have this capability (they always output a prediction)
The solution (still research-stage):
- Outlier detection: Train algorithms to recognize when they’re seeing something outside training data
- Confidence calibration: Algorithms should express low confidence for rare diagnoses
- Human-in-the-loop: All AI-assisted diagnoses require pathologist confirmation (current FDA requirement)
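A minimal sketch of the abstention idea follows: convert model logits to probabilities and refuse to output a diagnosis when confidence is low or predictive entropy is high. The thresholds here are illustrative assumptions; calibrating them, and detecting out-of-distribution inputs more robustly than simple confidence thresholding allows, is exactly the open research problem described above.

```python
# Minimal sketch of confidence-based abstention: flag a case for human
# review when the model's top-class probability is low or its predictive
# entropy is high. Thresholds are illustrative, not validated values.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def predict_or_abstain(logits: np.ndarray, classes: list,
                       min_confidence: float = 0.90,
                       max_entropy: float = 0.5) -> str:
    probs = softmax(logits)
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    if probs.max() < min_confidence or entropy > max_entropy:
        return "ABSTAIN: flag for pathologist review (low confidence / possible outlier)"
    return classes[int(probs.argmax())]

classes = ["benign", "adenocarcinoma"]
print(predict_or_abstain(np.array([0.2, 0.5]), classes))   # abstains (uncertain)
print(predict_or_abstain(np.array([-4.0, 4.0]), classes))  # confident prediction
```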
Current clinical approach:
- Use AI only for common diagnoses where it’s been validated
- Maintain a high index of suspicion for rare diagnoses
- When AI prediction conflicts with clinical context, trust your judgment
Part 7: Implementation Framework
Before Adopting Pathology AI
Questions to ask vendors:
- “What is the FDA clearance status?”
- 510(k) cleared? PMA approved? Investigational?
- FDA clearance ≠ guaranteed performance, but provides baseline validation
- “Was this algorithm validated on our specific scanner?”
- Exact scanner model and software version
- If no, plan for local validation study
- “What diagnoses has this been validated for?”
- Algorithms validated for prostate cancer don’t work for bladder cancer
- Ask for published validation studies for each diagnosis
- “What is sensitivity and specificity for clinically significant disease?”
- Gleason ≥7 prostate cancer (not all cancers)
- Micrometastases ≥0.2mm (not isolated tumor cells)
- HSIL+ cervical lesions (not ASCUS)
- “How does this integrate with our digital pathology workflow?”
- PACS integration? LIS integration?
- Does pathologist see AI annotations directly on slide viewer?
- “What happens when the algorithm encounters a rare diagnosis?”
- Does it output low confidence? Flag for human review?
- Or does it confidently misdiagnose?
- “Can we validate locally before clinical deployment?”
- 500-1,000 cases covering range of diagnoses (see the validation sketch after this checklist)
- Compare algorithm performance to sign-out diagnoses
- Stratify by diagnosis, specimen type, staining protocol
- “What is the cost structure?”
- Per-slide fee? Annual license? Scanner-tied?
- Hidden costs: Scanner upgrade requirements, IT integration
- “Who is liable if the algorithm misses a cancer?”
- Read vendor contract carefully
- Most disclaim liability; pathologist remains responsible
- “Can you provide references from pathologists using this clinically?”
- Not research collaborators. Actual clinical users
- Ask about false positives, workflow disruptions, performance on their scanner
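For the local validation study described above, the sketch below computes sensitivity and specificity against sign-out diagnoses, stratified by diagnosis category. It assumes a hypothetical CSV with `case_id`, `diagnosis_category`, `signout_malignant`, and `ai_malignant` columns; the file name and column names are illustrative, not a standard format.

```python
# Sketch of a local validation analysis: compare AI calls to sign-out
# diagnoses and report sensitivity/specificity per diagnosis category.
# The CSV schema (case_id, diagnosis_category, signout_malignant,
# ai_malignant) is a hypothetical example, not a standard format.
import pandas as pd

def validation_report(csv_path: str) -> pd.DataFrame:
    df = pd.read_csv(csv_path)
    rows = []
    for category, grp in df.groupby("diagnosis_category"):
        tp = ((grp.ai_malignant == 1) & (grp.signout_malignant == 1)).sum()
        fn = ((grp.ai_malignant == 0) & (grp.signout_malignant == 1)).sum()
        tn = ((grp.ai_malignant == 0) & (grp.signout_malignant == 0)).sum()
        fp = ((grp.ai_malignant == 1) & (grp.signout_malignant == 0)).sum()
        rows.append({
            "category": category,
            "n": len(grp),
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        })
    return pd.DataFrame(rows)

# Example: validation_report("local_validation_cases.csv")
```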
Red Flags (Walk Away If You See These)
- “Scanner-agnostic” claims without validation data: All algorithms are scanner-dependent
- Validated for “cancer detection” without specifying types: Each cancer requires separate validation
- No published peer-reviewed validation studies: Internal white papers insufficient
- Claims of “autonomous diagnosis”: FDA requires human confirmation; autonomous claims are misleading
- “Works on all staining protocols”: Stain intensity variation affects all algorithms
Part 8: Cost-Benefit Reality
What Does Pathology AI Cost?
Prostate cancer detection:
- Paige Prostate: ~$15-25 per slide
- ROI: Reduced Gleason grading variability, fewer unnecessary biopsies
HER2 scoring:
- PathAI: ~$20-30 per slide
- ROI: Reduces FISH confirmatory testing (~$1,000-1,500 saved per avoided FISH)
Lymph node metastasis detection:
- Not commercially available (LYNA research-only)
Cervical cytology:
- BD FocalPoint: Bundled into instrument contract
- ROI: Higher sensitivity for HSIL, workflow efficiency
Digital pathology infrastructure (prerequisite for most AI):
- Whole-slide scanner: $150,000-500,000 depending on capacity
- PACS (digital pathology image management): $50,000-200,000
- IT infrastructure: Network bandwidth, storage (1 slide = 1-5 GB)
- Total infrastructure cost before any AI: $300,000-1,000,000
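Storage is often the surprise line item. The sketch below estimates archive size from slide volume and the 1-5 GB-per-slide figure above; the annual slide volume and retention period are assumed values for illustration.

```python
# Back-of-envelope storage estimate for whole-slide imaging.
# The 1-5 GB per slide range is from the text; the annual slide volume
# and retention period below are illustrative assumptions.
def wsi_storage_tb(slides_per_year: int, gb_per_slide: float, retention_years: int) -> float:
    return slides_per_year * gb_per_slide * retention_years / 1024  # TB

for gb in (1, 3, 5):
    tb = wsi_storage_tb(slides_per_year=50_000, gb_per_slide=gb, retention_years=10)
    print(f"{gb} GB/slide -> ~{tb:,.0f} TB over 10 years")
# 1 GB/slide -> ~488 TB; 3 GB/slide -> ~1,465 TB; 5 GB/slide -> ~2,441 TB
```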
Do These Tools Save Money?
HER2 scoring: YES
- Reduces equivocal (2+) cases by 30%
- At $1,200/FISH test × 30% reduction × 500 HER2 cases/year = $180,000 saved
- Algorithm cost: $20 × 500 = $10,000
- Net savings: $170,000/year
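The arithmetic above generalizes to a simple formula; the sketch below parameterizes it so a lab can plug in its own volumes and prices. The defaults reproduce the text's example numbers and are otherwise assumptions.

```python
# Parameterized version of the HER2 ROI arithmetic above; the defaults
# reproduce the text's example and should be replaced with your lab's
# actual case volumes and prices.
def her2_ai_net_savings(cases_per_year: int = 500,
                        fish_cost: float = 1_200.0,
                        equivocal_reduction: float = 0.30,
                        ai_cost_per_slide: float = 20.0) -> float:
    fish_savings = cases_per_year * equivocal_reduction * fish_cost
    ai_cost = cases_per_year * ai_cost_per_slide
    return fish_savings - ai_cost

print(f"Net savings: ${her2_ai_net_savings():,.0f}/year")  # $170,000/year
```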
Prostate cancer detection: MAYBE
- Reduces unnecessary repeat biopsies (fewer false negatives)
- Reduces Gleason grading variability (more consistent treatment decisions)
- Hard to quantify dollar savings, but likely cost-effective for high-volume labs
Cervical cytology: PROBABLY
- 10-15% improvement in HSIL detection → earlier treatment → reduced cervical cancer incidence
- Cost-effectiveness studies show modest benefit
Lymph node metastasis: WOULD IF DEPLOYED
- Halves pathologist reading time → increase throughput or reduce staffing
- But not commercially available, so savings are theoretical
Digital pathology infrastructure: UNCLEAR
- Enables remote sign-out, telepathology consultations, AI applications
- Saves physical slide storage space
- But upfront costs are substantial ($300K-1M)
- ROI depends on lab volume and utilization
Part 9: The Future of Pathology AI
What’s Coming in the Next 5 Years
Likely to reach clinical use:
- Expanded cancer detection algorithms: Colon polyp classification, bladder cancer grading, melanoma diagnosis
- Molecular pathway prediction from H&E: Predict ER/PR/HER2 status from morphology (reducing immunostaining costs)
- Prognostic algorithms: Predict recurrence risk from histology + clinical data (refining current scores)
- Workflow automation: Automated case triage, specimen tracking, quality control
Promising but uncertain:
- Scanner-agnostic algorithms: Stain normalization and domain adaptation to work across scanner vendors
- Multiplex imaging AI: Analyze multi-color immunofluorescence (tumor microenvironment, immune infiltrates)
- Real-time intraoperative diagnosis: AI analyzing frozen sections during surgery (replacing human pathologist)
Overhyped and unlikely:
- Fully autonomous diagnosis without pathologist: FDA won’t approve it; pathologists won’t accept it; medicolegal landscape doesn’t support it
- “One algorithm for all diagnoses”: Each tissue and diagnosis requires specific training
- AI replacing pathologists: Augmentation yes, replacement no
The rate-limiting factors:
- Scanner standardization (biggest technical challenge)
- FDA regulatory pathway clarity
- Medicolegal precedent (when AI-assisted diagnosis goes wrong)
- Pathologist workflow integration and trust
Key Takeaways
10 Principles for Pathology AI
1. Pathology AI is the most mature medical AI application: Multiple FDA-cleared devices, growing evidence base, accelerating adoption
2. Scanner specificity is the hidden challenge: Algorithms validated on one scanner often fail on others; demand scanner-specific validation data
3. AI excels at quantitative tasks: HER2 scoring, Ki-67 quantification, mitotic count, all reducing inter-observer variability
4. Rare diagnoses are AI’s Achilles heel: Algorithms confidently misdiagnose diseases they’ve never seen; maintain clinical suspicion
5. Augmentation, not automation: All FDA-cleared pathology AI requires pathologist confirmation; no autonomous diagnosis
6. Workflow integration determines success: Algorithm accuracy matters less than smooth PACS/LIS integration
7. HER2 scoring AI has clearest ROI: Reduces equivocal cases requiring expensive FISH confirmation
8. Digital pathology infrastructure is expensive: $300K-1M upfront before any AI deployment
9. Demand local validation: Validate algorithms on 500-1,000 internal cases before clinical use
10. Medicolegal landscape is evolving: Pathologist remains liable for AI-assisted diagnoses; document appropriately
Clinical Scenario: Evaluating Prostate Cancer Detection AI
Scenario: Your Pathology Department Is Considering AI for Prostate Biopsies
The pitch: A vendor demonstrates AI that detects prostate adenocarcinoma and assigns Gleason grades. They show you:
- Sensitivity 98% for Gleason ≥7 cancer
- “Reduces inter-pathologist variability by 30%”
- FDA 510(k) cleared
- Cost: $20 per slide
Your department processes 2,000 prostate biopsies/year. The chair asks for your recommendation.
Questions to Ask:
- “Was this validated on our specific scanner?”
- We use Leica Aperio GT450. Was the algorithm trained and validated on this exact model?
- If no, we need local validation before clinical deployment
- “What is performance for Gleason grade groups?”
- Sensitivity/specificity for Grade Group 1 (3+3) vs. 2 (3+4) vs. 3+ (4+3, 4+4, 4+5)?
- Grade Group 2 vs. 3 distinction most clinically important (active surveillance vs. treatment)
- “How does this handle mimics?”
- Atypical adenomatous hyperplasia, atrophy, high-grade PIN
- False positive rate for benign mimics?
- “What is the workflow integration?”
- Integrates with our PACS (Philips IntelliSite)?
- Do pathologists see AI annotations directly on slide viewer?
- Adds extra clicks/steps to sign-out workflow?
- “Can we do local validation study?”
- Test on 500 internal prostate biopsies (mix of benign, Gleason 3+3, 3+4, 4+3, ≥4+4)
- Compare algorithm grades to our sign-out diagnoses
- Stratify performance by Gleason pattern
- “What is the cost-benefit?”
- 2,000 biopsies/year × $20 = $40,000/year
- Benefits: Reduced inter-observer variability, fewer missed cancers, pathologist confidence
- Quantifiable ROI unclear (not like HER2 where we save FISH costs)
- “Who is liable if algorithm misses cancer?”
- Vendor contract liability clause?
- Pathologist still signs out report → pathologist liable
- “How do we document AI assistance?”
- Add statement to pathology report: “Computer-assisted detection used”?
- Document when we disagree with algorithm?
Red Flags in This Scenario:
- “Reduces variability by 30%” without baseline data: What was inter-observer variability before? Calculated how?
- 510(k) clearance doesn’t specify scanner compatibility: Need explicit validation on our scanner
- No mention of performance on Grade Group 2 vs. 3 distinction: Most clinically important classification, should be reported separately
- $20/slide × 2,000 = $40,000/year without clear ROI: For HER2, we save FISH costs; for prostate, savings less clear
Check Your Understanding
Scenario 1: The AI-Detected Micrometastasis
Clinical situation: You’re reviewing a sentinel lymph node from a breast cancer patient. The AI algorithm (LYNA-like system) flags a 0.3mm cluster of cells as suspicious for metastatic carcinoma. You review the flagged area. The cells are slightly enlarged with increased nuclear-to-cytoplasmic ratio, but you’re not certain they’re malignant. Could be reactive sinus histiocytosis.
Without the AI flag, you might have called this negative. With the AI flag, you’re uncertain.
Question 1: Do you sign out as “positive for micrometastasis” based on the AI flagging?
Answer: No, not based solely on AI flagging. Get a second opinion from a breast pathology expert.
Reasoning:
Why you shouldn’t call positive based on AI alone:
1. AI has false positives: Even 91% specificity (LYNA study) = 9% false positive rate
2. Reactive histiocytes can mimic carcinoma: Enlarged cells with increased N:C ratio occur in reactive nodes
3. Clinical consequences are significant: Micrometastasis diagnosis changes staging (N0 → N1mi), may influence adjuvant therapy decisions
4. Pathologist judgment remains standard: AI is assistive, not determinative
What you should do:
1. Get expert consultation: Send digital slide to breast pathology expert for second opinion
2. Consider IHC: Cytokeratin stain (AE1/AE3 or CAM5.2) would confirm epithelial cells vs. histiocytes
3. Review clinical context: Large primary tumor, lymphovascular invasion, high grade → higher pretest probability of nodal metastasis
4. Document uncertainty: If you call it positive, note it was a small focus requiring IHC confirmation or expert consultation
The AI’s value:
- Flagged an area you might have missed on routine review
- Prompted closer examination and consideration of IHC
- But doesn’t replace pathologist judgment for equivocal findings
Bottom line: Use AI to flag suspicious areas for closer review, but don’t diagnose solely based on AI flagging when morphology is equivocal.
If IHC confirms metastatic carcinoma: AI was correct, you caught early metastasis.
If IHC shows histiocytes: AI was false positive, but better to overcall (with IHC confirmation) than miss metastasis.
Scenario 2: The Scanner Upgrade Disaster
Clinical situation: Your department has been using prostate cancer detection AI successfully for 18 months on Aperio AT2 scanners. Sensitivity 97%, very few false negatives, pathologists trust it.
Your institution decides to upgrade to Leica Aperio GT450 scanners for faster throughput (4x scanning speed). Same vendor (Leica), just newer model.
After the scanner upgrade, you notice the AI is flagging many more false positives, with benign glands marked as suspicious. And one case where you caught a Gleason 4+3 cancer that the AI missed (it called the slide benign).
Question 2: What went wrong, and what should you do?
Answer: Scanner upgrade broke AI algorithm calibration. Halt clinical AI use immediately and revalidate.
What went wrong:
Scanner variability problem:
1. Different color profiles: GT450 captures RGB values differently than AT2 (different camera sensors)
2. Different compression: Image compression algorithms may differ
3. Different focus: Autofocus algorithm produces slightly different focal planes
4. Algorithm wasn’t trained on GT450 images: Trained on AT2, doesn’t generalize to GT450 despite being same vendor
Why this is dangerous:
- You caught one false negative (Gleason 4+3 missed). How many did you not catch?
- Patients may be getting false reassurance from benign AI calls
- Medicolegal liability: If AI-missed cancer causes harm, you’re liable
What you should do immediately:
- Halt AI-assisted sign-out for clinical cases:
- Revert to manual review without AI assistance
- Better to lose AI benefits than risk false negatives
- Notify vendor:
- Scanner upgrade broke algorithm performance
- Request algorithm retraining/recalibration for GT450
- Retrospective review of cases signed out since scanner upgrade:
- How many cases signed out with AI assistance post-upgrade?
- Re-review all with special attention to AI-negative calls
- Identify any missed diagnoses requiring patient notification
- Local validation study on GT450:
- Test algorithm on 500 internal prostate biopsies scanned on GT450
- Compare to sign-out diagnoses
- Calculate sensitivity/specificity on new scanner
- Only resume clinical use if performance acceptable
- Update validation protocols:
- Document that algorithm is scanner-model-specific
- Any future scanner changes require revalidation
- Consider this in scanner procurement decisions (vendor lock-in)
Lessons learned:
- Scanner specificity is real: Even upgrading within the same vendor can break AI
- Continuous monitoring essential: Track AI performance over time, watch for degradation
- Plan for scanner changes: Budget time and resources for revalidation when upgrading scanners
Prevention for next time:
- Before scanner upgrade, ask vendor: “Will this require AI revalidation?”
- Plan revalidation into upgrade timeline
- Consider parallel scanning (old + new scanners) during transition for validation
Scenario 3: The Rare Sarcoma Misdiagnosis
Clinical situation: A 45-year-old woman has a breast mass biopsied. Core needle biopsy submitted. Your lab uses breast cancer detection AI that’s been excellent for usual ductal and lobular carcinomas.
The AI flags the case as “high-grade ductal carcinoma, 95% confidence.” You review the slide. The cells are indeed high-grade, pleomorphic, with mitotic activity. But the architecture doesn’t look like typical ductal carcinoma. It’s more spindle-cell, fascicular growth pattern.
You’re not a breast pathology expert. The AI is 95% confident. Your general pathology training says “looks malignant, high-grade.”
Question 3: Do you sign out as high-grade ductal carcinoma based on AI’s 95% confidence?
Answer: No. This is likely angiosarcoma or other sarcoma, not carcinoma. Get expert consultation immediately.
Why the AI is wrong:
The rare diagnosis problem:
1. AI trained on ductal and lobular carcinomas: Seen thousands of typical breast cancers
2. Never seen angiosarcoma: Occurs in <1% of breast malignancies
3. Spindle-cell pattern outside training distribution: Algorithm defaults to “high-grade carcinoma” because it’s the closest thing it knows
4. High confidence is misleading: 95% confidence just means “95% similar to high-grade carcinomas in training set.” It doesn’t mean it is carcinoma
What this actually is:
- Breast angiosarcoma: Rare vascular malignancy
- Spindle-cell morphology: Fascicular growth pattern, not glandular architecture
- Mimics carcinoma: High-grade, mitotically active
- Treatment differs: Angiosarcoma requires wide excision, often radiation; different chemotherapy than carcinoma
What happens if you call it carcinoma:
- Patient gets inappropriate treatment (mastectomy + sentinel node biopsy, but angiosarcoma doesn’t metastasize to axillary nodes typically)
- Oncologist prescribes breast cancer chemotherapy (ineffective for angiosarcoma)
- Patient harm from wrong diagnosis and wrong treatment
What you should do:
- Recognize the morphology doesn’t fit:
- Spindle cells ≠ ductal carcinoma
- Fascicular pattern ≠ typical breast cancer architecture
- Trust your morphologic assessment over AI confidence score
- Get expert consultation:
- Send to breast pathology expert at academic center
- Describe: “High-grade spindle-cell malignancy, AI flagged as carcinoma but morphology atypical”
- Order IHC panel:
- Vascular markers (CD31, CD34, ERG) → positive in angiosarcoma
- Epithelial markers (cytokeratins, GATA3) → negative in angiosarcoma
- IHC will distinguish carcinoma from sarcoma
- Document AI assistance and your clinical reasoning:
- “Computer-assisted detection system flagged as ductal carcinoma; however, spindle-cell morphology atypical for breast carcinoma. IHC and expert consultation recommended.”
- Report this case to AI vendor:
- False positive (confidently misdiagnosed rare sarcoma as carcinoma)
- Vendor should add angiosarcoma cases to training set to prevent future misdiagnoses
Lessons learned:
- AI confidence scores are misleading for rare diagnoses: 95% confident doesn’t mean 95% accurate. It just means 95% similar to training data
- Morphology trumps algorithm: When the pattern doesn’t fit the AI prediction, trust your morphologic assessment
- Rare diagnoses are AI blind spots: Algorithms haven’t seen them, will misclassify as “closest common diagnosis”
- Always integrate clinical context: 45-year-old woman with breast mass. Angiosarcoma is rare but recognized entity; consider it
Bottom line: Never diagnose based solely on AI prediction when morphology is atypical. AI is a tool, not truth.
Professional Society Guidelines on AI in Pathology
The College of American Pathologists has established comprehensive guidance for AI validation in pathology:
Whole Slide Imaging (WSI) Validation Guidelines (2022 Update):
Published in Archives of Pathology & Laboratory Medicine, the CAP guideline provides:
- 3 strong recommendations (SRs)
- 9 good practice statements (GPSs)
- GRADE framework for evidence evaluation
- Specific validation protocols for diagnostic use
Key Validation Principle: Laboratories are not restricted to FDA-approved AI systems. However, a non-FDA-approved system may be employed for clinical testing only if adequate validation has been performed by the laboratory, and the regulatory status should be documented in surgical reports.
CAP AI Resources and Programs
AI Studio (2024): CAP provides members a secure, interactive environment to:
- Experiment with emerging AI tools in pathology
- Explore foundation models
- Build confidence in AI applications
Recent Publications (Archives of Pathology & Laboratory Medicine, 2025):
- “Introduction to Generative Artificial Intelligence: Contextualizing the Future” (February 2025)
- “Harnessing the Power of Generative Artificial Intelligence in Pathology Education” (February 2025)
- “Evaluating Use of Generative Artificial Intelligence in Clinical Pathology Practice” (February 2025)
- “Bridging the Clinical-Computational Transparency Gap in Digital Pathology” (2024)
Digital Pathology CPT Codes (2024)
CAP worked with the AMA CPT Editorial Panel to establish:
- 30 new digital pathology add-on codes for 2024
- Category III add-on codes 0751T-0763T
- New codes 0827T-0856T
These codes capture additional clinical staff work associated with digitizing glass microscope slides for primary diagnosis and AI algorithm use.
American Society for Clinical Pathology (ASCP)
ASCP provides complementary guidance on:
- Laboratory quality management for AI systems
- Technologist training requirements for digital pathology
- Integration of AI into laboratory workflows
Implementation Guidance: ASCP emphasizes that AI implementation requires: - Validated workflows for specimen handling and digitization - Quality control procedures for scanner calibration - Clear documentation of AI involvement in diagnosis
Digital Pathology Association (DPA)
The DPA provides regulatory and implementation resources:
- Healthcare regulatory information for digital pathology
- Best practices for clinical deployment
- Vendor evaluation frameworks