Pathology and Laboratory Medicine
Pathology is inherently visual pattern recognition: microscopic examination of cells and tissues to diagnose disease. This makes it theoretically ideal for computer vision AI, which excels at identifying visual patterns in images. From histopathology slide analysis to hematology cell classification to clinical chemistry quality control, AI promises to enhance diagnostic accuracy, reduce inter-observer variability, and improve laboratory efficiency. This chapter examines evidence-based AI applications across laboratory medicine.
After reading this chapter, you will be able to:
- Evaluate AI systems for digital pathology and histopathology interpretation, including FDA-cleared prostate and breast cancer detection algorithms
- Critically assess AI applications in cytopathology (Pap smears, thyroid FNA) and hematopathology (peripheral blood smears, bone marrow analysis)
- Understand AI tools for clinical laboratory workflow optimization, quality control, and critical value detection
- Analyze the unique regulatory landscape for laboratory AI, including FDA clearance pathways and CLIA implications
- Recognize failure modes specific to pathology AI, including scanner variability, staining heterogeneity, and rare diagnosis challenges
- Navigate medico-legal implications when AI assists (or misses) pathologic diagnoses
- Apply evidence-based frameworks for evaluating pathology AI before clinical deployment
Introduction: Why Pathology Is AI’s Ideal Specialty
Pathology is pattern recognition. A pathologist examines a biopsy slide, identifies malignant cells among normal tissue, grades tumor differentiation, counts mitotic figures, and integrates these visual features with clinical context to render a diagnosis.
This is exactly what computer vision AI does: detect patterns in images.
Unlike clinical medicine (where symptoms are subjective and examination findings nuanced), pathology offers:
1. Digitized specimens: Whole-slide imaging scanners convert glass slides to gigapixel digital images
2. Abundant training data: Decades of archived specimens with confirmed diagnoses
3. Standardized protocols: H&E staining, immunohistochemistry methods largely consistent across labs
4. Quantifiable features: Cell count, nuclear size, mitotic index, and staining intensity can be measured objectively (see the sketch after this list)
5. Clear ground truth: Biopsy diagnoses confirmed by clinical outcome, surgical pathology, molecular testing
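To make "quantifiable features" concrete, the sketch below counts nuclei and measures nuclear size in an image tile using classical image processing. It is a minimal illustration under stated assumptions, not any vendor's algorithm: it assumes an RGB H&E tile in which nuclei are the darkest structures, and the threshold and size-filter values are illustrative choices.

```python
# Minimal sketch: count nuclei and measure nuclear area in an H&E tile.
# Assumes nuclei (hematoxylin-stained) are the darkest structures; the
# threshold and size filter here are illustrative, not validated values.
import numpy as np
from skimage import color, filters, measure, morphology

def nuclear_features(tile_rgb: np.ndarray) -> dict:
    """tile_rgb: H x W x 3 RGB image tile (uint8 or float)."""
    gray = color.rgb2gray(tile_rgb)               # nuclei appear dark
    mask = gray < filters.threshold_otsu(gray)    # global Otsu threshold
    mask = morphology.remove_small_objects(mask, min_size=30)  # drop debris
    labels = measure.label(mask)
    regions = measure.regionprops(labels)
    areas = np.array([r.area for r in regions])
    return {
        "nucleus_count": len(regions),
        "mean_nuclear_area_px": float(areas.mean()) if len(areas) else 0.0,
        "nuclear_area_fraction": float(mask.mean()),
    }
```

Features like these feed both classical image-analysis pipelines and the training labels used by modern deep-learning systems.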
The result: Pathology AI has advanced faster than almost any other medical specialty. Multiple FDA-cleared algorithms. Growing evidence base. Accelerating clinical adoption.
But pathology AI faces unique challenges:
- Scanner variability: Algorithms trained on Aperio scanners may fail on Leica or Hamamatsu
- Staining heterogeneity: H&E staining intensity varies across labs, confusing algorithms
- Rare diagnoses: Limited training data for uncommon tumors, leading to confident misdiagnoses
- Clinical context: AI sees pixels, not patient history (the chest wall mass in a 70-year-old smoker is likely different from an identical-appearing mass in a 25-year-old)
This chapter examines what works, what doesn’t, and how to evaluate pathology AI critically before deployment.
Part 1: Histopathology AI for Prostate and Breast Cancer
Prostate Cancer Detection and Gleason Grading
Prostate needle core biopsies are among the most common pathology specimens. Gleason grading (scoring architectural patterns) predicts prognosis and guides treatment. But Gleason grading has notorious inter-observer variability: different pathologists assign different scores to the same slide.
The AI solution: Computer vision algorithms trained on tens of thousands of digitized prostate biopsies learn to detect cancer foci and assign Gleason patterns.
Paige Prostate (FDA-Cleared 2021)
FDA clearance: First AI-based cancer detection algorithm cleared by FDA (2021)
Training data: 60,000+ whole-slide prostate biopsy images from 13 institutions
Performance:
- Sensitivity for Gleason ≥7 cancer: 98.0%
- Specificity: 97.2%
- Negative predictive value: 99.5% (very few missed cancers)
- Gleason grading concordance with expert GU pathologists: 87% (comparable to inter-pathologist agreement)
Clinical validation: Pantanowitz et al. (2020) conducted a blinded validation across 16 pathologists and 71 prostate biopsies:
- Paige Prostate detected 3 clinically significant cancers (Gleason 3+4, 4+3) that 2/16 pathologists initially missed
- Zero false negatives for Gleason ≥4+3 (high-grade) cancer
- Reduced Gleason grading variability by 23% compared to unaided pathologists
How pathologists use it:
1. Prostate needle biopsy digitally scanned (whole-slide imaging)
2. Paige algorithm analyzes slide, highlights suspicious regions
3. Pathologist reviews flagged areas + entire slide
4. Pathologist renders final diagnosis (algorithm is advisory, not determinative)
Current deployment: 50+ pathology labs in U.S., analyzing thousands of prostate biopsies monthly
Why this works:
- Common diagnosis: Prostate cancer is the most common male cancer; abundant training data
- Standardized specimen: Needle core biopsies processed similarly across labs
- Clear patterns: Gleason architectural patterns (glands, cribriform, solid sheets) are well-defined visual features
- Addresses a real problem: Inter-pathologist Gleason grading variability is a well-documented clinical issue
Limitations:
- Scanner-specific: Validated on Philips IntelliSite Ultra Fast scanners; performance on other scanners requires separate validation
- Gleason 3+3 vs. 3+4 boundary: Algorithm struggles with borderline cases (as do pathologists)
- Mimics: Atrophy, adenosis, and atypical adenomatous hyperplasia can fool the algorithm (and pathologists)
Breast Cancer Lymph Node Metastasis Detection
Sentinel lymph node biopsy in breast cancer requires pathologists to scan entire lymph nodes for micrometastases (clusters of cells as small as 0.2mm). Tedious, error-prone work.
Google/Verily LYNA (Lymph Node Assistant)
Published: Steiner et al. (2018) in the American Journal of Surgical Pathology
Training data: 399 whole-slide images of sentinel lymph nodes
Performance:
- Sensitivity for micrometastases (≥0.2mm): 99%
- False positive rate: 9% (acceptable for a triage/flagging tool)
- Halves pathologist reading time: Pathologists using LYNA reviewed slides 50% faster with equivalent accuracy
Clinical validation: Six pathologists reviewed 130 lymph node slides without LYNA, then with LYNA:
- Without LYNA: 83% sensitivity for micrometastases (17% false negative rate)
- With LYNA: 99% sensitivity (1% false negative rate)
- Time per slide: 61 seconds (LYNA-assisted) vs. 116 seconds (unaided)
Why this works:
- Clear visual target: Metastatic breast cancer cells look different from lymphocytes
- Tedious human task: Scanning an entire lymph node for rare micrometastases
- High clinical stakes: Missing a micrometastasis changes staging and treatment
Why this didn’t deploy widely:
- Not FDA-cleared: Google/Verily published research but didn’t pursue commercial FDA clearance
- Scanner specificity: Algorithm performance depends on the specific scanning protocol
- Workflow integration challenges: Many labs haven’t adopted whole-slide imaging yet
The lesson: Impressive research results don’t guarantee clinical adoption. FDA clearance, vendor support, and workflow integration matter as much as algorithmic accuracy.
Breast Cancer HER2 Scoring
HER2 immunohistochemistry scoring (0, 1+, 2+, 3+) determines trastuzumab (Herceptin) eligibility for breast cancer patients. But scoring is subjective:
- 3+ = strong complete membrane staining ≥10% of cells → Treat with trastuzumab
- 2+ = equivocal → Requires FISH testing to confirm HER2 amplification
- 0/1+ = negative → No trastuzumab
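The scoring-to-treatment pathway above is a simple decision rule; the sketch below encodes it to make the branch points explicit. It is a minimal illustration of the cutoffs described in the text, not a diagnostic tool, and the `Her2Action` enum and `her2_pathway` function names are illustrative assumptions.

```python
# Minimal sketch of the HER2 IHC decision pathway described above.
# Illustrative only; real-world scoring also weighs staining intensity,
# completeness, and current guideline nuances not captured here.
from enum import Enum
from typing import Optional

class Her2Action(Enum):
    TRASTUZUMAB_ELIGIBLE = "treat with trastuzumab"
    REFLEX_FISH = "equivocal: reflex to FISH for HER2 amplification"
    NOT_ELIGIBLE = "HER2-negative: no trastuzumab"

def her2_pathway(ihc_score: str, fish_amplified: Optional[bool] = None) -> Her2Action:
    """ihc_score: one of '0', '1+', '2+', '3+'."""
    if ihc_score == "3+":
        return Her2Action.TRASTUZUMAB_ELIGIBLE
    if ihc_score == "2+":
        if fish_amplified is None:
            return Her2Action.REFLEX_FISH
        return Her2Action.TRASTUZUMAB_ELIGIBLE if fish_amplified else Her2Action.NOT_ELIGIBLE
    return Her2Action.NOT_ELIGIBLE  # scores 0 and 1+

# An equivocal 2+ case triggers reflex FISH; amplification confirms eligibility.
print(her2_pathway("2+"))        # Her2Action.REFLEX_FISH
print(her2_pathway("2+", True))  # Her2Action.TRASTUZUMAB_ELIGIBLE
```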
The problem: Inter-observer variability. Different pathologists score the same slide differently, especially for borderline 2+/3+ cases.
The AI solution: Quantitative image analysis measures membrane staining intensity objectively.
PathAI AIM-HER2 (Research Use Only, NOT FDA-cleared):
PathAI launched AIM-HER2 Breast Cancer in July 2023 as an AI-powered HER2 scoring algorithm. Developed using 157,000 tissue annotations and consensus scores from over 65 expert breast pathologists on 4,000+ slides.
Performance in research settings (PathAI, 2023):
- Concordance with expert breast pathologists: Improved inter-rater agreement, particularly at the 0/1+ and 1+/2+ cutoffs
- Reduces equivocal scoring: Better distinction between borderline cases
- Standardizes scoring: Decreases inter-pathologist variability
Current status: AIM-HER2 is for research use only and not cleared for diagnostic procedures. PathAI’s AISight Dx platform received FDA 510(k) clearance (K243391, June 2025) for digital pathology image management, but the HER2 scoring algorithm itself does not have diagnostic clearance.
Clinical potential (once cleared):
- Faster trastuzumab decisions (no FISH delay for clear 3+ cases)
- Reduced FISH testing could save $1,000-1,500 per case
- More reproducible scoring for clinical trial consistency
AI-Assisted Colonoscopy and Pathology Workflow Impact
While not strictly pathology AI, AI-assisted colonoscopy systems that improve polyp detection directly impact pathology specimen volume.
Evidence from meta-analysis: A 2024 systematic review and meta-analysis of 44 RCTs (36,201 colonoscopies) found that computer-aided detection (CADe) systems significantly increase adenoma detection (Soleymanjahi et al., 2024):
- Adenoma detection rate (ADR): 44.7% with CADe vs. 36.7% without
- Advanced colorectal neoplasia (ACN) detection: 12.7% vs. 11.5%
- Trade-off: Resection of approximately 2 additional nonneoplastic polyps per 10 colonoscopies
Implications for pathologists:
- Increased specimen volume: Higher polyp detection translates to more polyp specimens for histopathologic examination
- More nonneoplastic polyps: The increased detection includes both neoplastic and nonneoplastic polyps, increasing routine workload
- Upstream AI affecting downstream pathology: This represents an important category where AI in one specialty creates workflow changes in pathology
Current status: Multiple FDA-cleared CADe systems for colonoscopy polyp detection are in clinical use, making this one of the most mature AI applications affecting GI pathology practice.
Part 2: Cytopathology AI
Cervical Cytology (Pap Smear) Screening
Pap smear screening for cervical dysplasia has dramatically reduced cervical cancer incidence. But cytology labs manually review hundreds of slides per day, with cytotechnologists hunting for rare dysplastic cells. Tedious work, prone to false negatives.
AI applications:
- Automated screening: Flag slides with suspected abnormalities for cytotechnologist review
- Quality control: Re-review slides initially called negative to catch false negatives
- Workflow optimization: Prioritize the worklist based on AI-assessed abnormality likelihood
FDA-cleared systems:
- BD FocalPoint: Evolved from the AutoPap 300QC (FDA-cleared 1998 for smear use, 2002 for SurePath slides). BD acquired the technology in 2006 and renamed it the BD FocalPoint Slide Profiler. The BD FocalPoint GS Imaging System received FDA PMA approval in December 2008 for location-guided cervical cancer screening. Note: This is pre-deep-learning, rule-based image analysis, not modern neural network AI.
- Hologic ThinPrep Imaging System: Computer-assisted cytology screening
Performance: Bao et al. (2023) meta-analysis of cervical cytology AI across 23 studies:
- Pooled sensitivity for HSIL+ (high-grade squamous intraepithelial lesion): 94.2%
- Specificity: 88.5%
- Reduction in false negatives: 10-20% compared to unaided cytotechnologists
How cytotechnologists use it:
- AI pre-screens slides, ranks by abnormality likelihood
- Cytotechnologist reviews high-risk slides first
- AI flags slides initially called negative for second review
- Cytotechnologist renders final diagnosis
Clinical benefit:
- Higher sensitivity for detecting high-grade dysplasia
- Workflow efficiency (prioritizes high-risk cases)
- Quality control (catches false negatives before sign-out)
Limitations:
- High false positive rate: Specificity around 88% means roughly 12-15% of normal slides are flagged for additional review
- Doesn’t detect all lesions: Glandular abnormalities (adenocarcinoma in situ) are often missed
- Preparation-dependent: Algorithms validated on ThinPrep may not work on SurePath
Part 3: Hematopathology AI
Peripheral Blood Smear Analysis
Complete blood count (CBC) analyzers provide automated cell counts, but peripheral blood smear review by humans identifies:
- Blast cells (leukemia diagnosis)
- Abnormal red cell morphology (hemolytic anemia)
- Parasites (malaria, babesiosis)
- Left shift (bands, myelocytes indicating infection)
AI applications:
- Automated differential counts: Classify neutrophils, lymphocytes, monocytes, eosinophils, basophils
- Blast detection: Flag suspected leukemia cases
- Malaria detection: Quantify parasitemia in endemic areas
Performance: Variable and analyzer-dependent. A review by Saba et al. (2023) found:
- Differential count concordance with manual review: 80-95% depending on cell type
- Blast detection sensitivity: 85-92% (but high false positive rate from atypical lymphocytes and immature granulocytes)
- Malaria parasite detection: 90-95% sensitivity (better than human microscopists in low-parasitemia cases)
Current use:
- Research stage for most applications
- FDA-cleared automated differential analyzers exist but require manual confirmation for abnormal results
- Malaria detection AI deployed in endemic regions (sub-Saharan Africa, Southeast Asia)
Limitations:
- Morphologic subtlety: Distinguishing reactive lymphocytes from lymphoma cells requires expertise AI lacks
- Rare cells: Abnormal cells (blasts, atypical lymphocytes) are infrequent; limited training data
- Preparation artifacts: Smear quality (cell distribution, staining) affects algorithm performance
Bone Marrow Biopsy Analysis
Bone marrow analysis requires:
- Cellularity assessment: Hypocellular, normocellular, hypercellular
- Blast count: Percentage of blasts determines the AML diagnosis threshold (≥20%)
- Cellular morphology: Dysplasia, maturation abnormalities, infiltrative process
AI research: Multiple academic studies show promise for:
- Automated blast counting (correlation r = 0.90-0.95 with manual counts)
- Cellularity assessment
- Megakaryocyte quantification
Why this isn’t clinically deployed yet:
- Heterogeneity: Bone marrow has far more cell types than peripheral blood
- Context matters: Blast count interpretation requires clinical history (prior chemo, transplant status)
- Rare diagnoses: Hairy cell leukemia, systemic mastocytosis, and metastatic disease have limited training data
- No FDA-cleared systems: All current work is research-stage
Part 4: Clinical Laboratory AI and Workflow Optimization
Quality Control and Critical Value Detection
Clinical chemistry and hematology labs already use rules-based algorithms extensively:
- Delta checks: Flag results that changed dramatically from a prior result (possible specimen mix-up or critical illness)
- Critical value alerts: Automatic notification for life-threatening results (potassium >6.5 mEq/L)
- Quality control monitoring: Levey-Jennings charts, Westgard rules
AI enhancements: Machine learning improves these by learning lab-specific patterns:
- Predict when QC will fail (proactive instrument maintenance)
- Reduce false positive delta check alerts (by learning patient-specific patterns)
- Identify systematic errors (reagent lot problems, instrument drift)
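As a concrete illustration of the rules these enhancements build on, here is a minimal sketch of a critical-value alert and a delta check. The 6.5 mEq/L critical threshold comes from the text; the delta-check rule (three patient-specific standard deviations with a fixed floor) is an illustrative assumption, not any LIS vendor's implementation.

```python
# Minimal sketch: critical-value alert and delta check for serum potassium.
# The 6.5 mEq/L critical threshold is from the text; the delta-check rule
# (flag if |change| exceeds 3 patient-specific standard deviations, with a
# fixed floor) is an illustrative assumption, not a validated protocol.
from statistics import pstdev

POTASSIUM_CRITICAL_HIGH = 6.5  # mEq/L

def check_potassium(current: float, history: list[float]) -> list[str]:
    flags = []
    if current > POTASSIUM_CRITICAL_HIGH:
        flags.append("CRITICAL VALUE: notify clinician immediately")
    if history:
        # Patient-specific variability: tighter delta limits for stable patients.
        baseline_sd = pstdev(history) if len(history) > 1 else 0.3
        delta_limit = max(3 * baseline_sd, 1.0)  # floor of 1.0 mEq/L
        if abs(current - history[-1]) > delta_limit:
            flags.append("DELTA CHECK: verify specimen identity / recollect")
    return flags

print(check_potassium(6.8, [4.1, 4.0, 4.2]))
# ['CRITICAL VALUE: notify clinician immediately',
#  'DELTA CHECK: verify specimen identity / recollect']
```

The machine-learning enhancements described above essentially replace the fixed thresholds in a rule like this with limits learned from lab- and patient-specific data.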
Current use: Research and proprietary implementations in large reference labs. Not yet widely adopted in community hospital labs.
Microbiology AI
Applications:
- Automated colony counting: Computer vision counts bacterial colonies on agar plates
- Species identification from imaging: Distinguish E. coli from Klebsiella based on colony morphology
- Antibiotic resistance prediction: Predict susceptibility from genomic data (whole-genome sequencing)
- Blood culture positivity prediction: Predict which blood cultures will grow bacteria (to prioritize processing)
Performance: Schinkel et al. (2023) ML model for blood culture positivity:
- Predicted positive cultures with an AUC of 0.78 using clinical data (fever, WBC, prior cultures)
- Enabled faster processing of high-risk cultures
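To illustrate the general approach (not the published Schinkel et al. model), the sketch below fits a logistic regression to predict culture positivity from a few clinical features and reports an AUC. The feature names mirror those mentioned in the text; the data are synthetic and the resulting numbers are meaningless outside the example.

```python
# Illustrative sketch of blood-culture positivity prediction: a logistic
# regression on clinical features with AUC evaluation. Synthetic data only;
# this is not the published model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.normal(38.0, 1.0, n),   # temperature (deg C)
    rng.normal(12.0, 4.0, n),   # WBC (x10^9/L)
    rng.integers(0, 2, n),      # prior positive culture (0/1)
])
# Synthetic outcome loosely driven by the features (for illustration only).
logit = 0.8 * (X[:, 0] - 38.5) + 0.1 * (X[:, 1] - 12) + 1.2 * X[:, 2] - 1.5
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"AUC on held-out synthetic data: {auc:.2f}")
```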
Current use:
- Automated colony counting: Commercially available, FDA-cleared systems
- Genomic resistance prediction: Research stage, expensive (WGS not routine)
- Blood culture prediction: Research stage
Part 5: The Scanner Standardization Challenge (The Failure No One Talks About)
Here’s a dirty secret in pathology AI: Most algorithms work well only on the scanner they were trained on.
An algorithm trained on Aperio scanners often performs poorly on Leica or Hamamatsu scanners, even for identical tissue. Why?
Technical reasons:
- Color space differences: Scanners capture RGB values differently
- Compression artifacts: JPEG compression algorithms vary
- Focus algorithms: Autofocus creates slightly different focal planes
- Illumination: LED vs. halogen light sources, color temperature differences
Clinical consequences:
- Labs must validate algorithms on their specific scanner before deployment
- Switching scanner vendors may require re-validating all AI algorithms
- “Scanner-agnostic” algorithms don’t exist yet (despite vendor claims)
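One partial mitigation under active research (see "scanner-agnostic algorithms" in Part 9) is color/stain normalization: match the color statistics of images from a new scanner to a reference scanner before running the algorithm. The sketch below shows a simple Reinhard-style channel-statistics match in LAB color space; it illustrates the idea, not a validated preprocessing pipeline, and `source_rgb`/`reference_rgb` are assumed image arrays.

```python
# Minimal sketch of Reinhard-style color normalization: shift each LAB
# channel of a source image so its mean/std match a reference image.
# Illustrative only; production pipelines use more robust stain
# normalization (e.g., stain-vector deconvolution) plus careful validation.
import numpy as np
from skimage import color

def match_color_stats(source_rgb: np.ndarray, reference_rgb: np.ndarray) -> np.ndarray:
    """Both inputs are H x W x 3 RGB arrays; returns normalized RGB floats in [0, 1]."""
    src = color.rgb2lab(source_rgb)
    ref = color.rgb2lab(reference_rgb)
    out = np.empty_like(src)
    for c in range(3):
        s_mean, s_std = src[..., c].mean(), src[..., c].std()
        r_mean, r_std = ref[..., c].mean(), ref[..., c].std()
        out[..., c] = (src[..., c] - s_mean) / (s_std + 1e-8) * r_std + r_mean
    return np.clip(color.lab2rgb(out), 0.0, 1.0)
```

Even with normalization, local validation on the destination scanner remains necessary; normalization reduces, but does not eliminate, cross-scanner performance drops.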
Real-world example: One academic pathology department deployed prostate cancer detection AI validated on their Aperio scanners. When they upgraded to Leica scanners (for faster scanning), the algorithm’s sensitivity dropped from 98% to 84% because color profiles differed.
They had to:
1. Halt clinical use
2. Re-validate on Leica scanners using 1,000 internal cases
3. Work with the vendor to retrain the algorithm
4. Get institutional IRB approval for the new validation
Cost: 6 months delay + $50,000 validation cost.
The lesson: Ask vendors explicitly: “Was this algorithm validated on our exact scanner model and software version?”
Part 6: The Rare Diagnosis Problem (AI’s Achilles Heel)
AI algorithms are trained on thousands of examples. Breast cancer, prostate cancer, colon adenocarcinoma, all with abundant training data.
But what about:
- Angiosarcoma (rare vascular malignancy)
- Desmoplastic melanoma (melanoma mimicking benign scar tissue)
- Primary bone lymphoma
- Hepatosplenic T-cell lymphoma
These diagnoses occur once per year in most pathology departments. Algorithms have never seen them.
What happens when AI encounters a rare diagnosis?
Scenario 1: AI confidently misdiagnoses
- Algorithm trained on 10,000 benign prostate biopsies and 5,000 adenocarcinomas sees its first granulomatous prostatitis
- Misclassifies as adenocarcinoma with 92% confidence (granulomas have increased cellularity, architectural distortion)
- If pathologist trusts algorithm, patient gets unnecessary treatment
Scenario 2: AI refuses to render opinion
- Better outcome: Algorithm recognizes it’s outside training distribution
- Flags case for expert review
- But most current algorithms don’t have this capability (they always output a prediction)
The solution (still research-stage):
- Outlier detection: Train algorithms to recognize when they’re seeing something outside training data
- Confidence calibration: Algorithms should express low confidence for rare diagnoses
- Human-in-the-loop: All AI-assisted diagnoses require pathologist confirmation (current FDA requirement)
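A minimal sketch of the abstention idea follows: convert model logits to probabilities and refuse to output a diagnosis when confidence is low or predictive entropy is high. The thresholds here are illustrative assumptions; calibrating them, and detecting out-of-distribution inputs more robustly than simple confidence thresholding allows, is exactly the open research problem described above.

```python
# Minimal sketch of confidence-based abstention: flag a case for human
# review when the model's top-class probability is low or its predictive
# entropy is high. Thresholds are illustrative, not validated values.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def predict_or_abstain(logits: np.ndarray, classes: list,
                       min_confidence: float = 0.90,
                       max_entropy: float = 0.5) -> str:
    probs = softmax(logits)
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    if probs.max() < min_confidence or entropy > max_entropy:
        return "ABSTAIN: flag for pathologist review (low confidence / possible outlier)"
    return classes[int(probs.argmax())]

classes = ["benign", "adenocarcinoma"]
print(predict_or_abstain(np.array([0.2, 0.5]), classes))   # abstains (uncertain)
print(predict_or_abstain(np.array([-4.0, 4.0]), classes))  # confident prediction
```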
Current clinical approach:
- Use AI only for common diagnoses where it’s been validated
- Maintain a high index of suspicion for rare diagnoses
- When AI prediction conflicts with clinical context, trust your judgment
Part 7: Implementation Framework
Before Adopting Pathology AI
Questions to ask vendors:
- “What is the FDA clearance status?”
- 510(k) cleared? PMA approved? Investigational?
- FDA clearance ≠ guaranteed performance, but provides baseline validation
- “Was this algorithm validated on our specific scanner?”
- Exact scanner model and software version
- If no, plan for local validation study
- “What diagnoses has this been validated for?”
- Algorithms validated for prostate cancer don’t work for bladder cancer
- Ask for published validation studies for each diagnosis
- “What is sensitivity and specificity for clinically significant disease?”
- Gleason ≥7 prostate cancer (not all cancers)
- Micrometastases ≥0.2mm (not isolated tumor cells)
- HSIL+ cervical lesions (not ASCUS)
- “How does this integrate with our digital pathology workflow?”
- PACS integration? LIS integration?
- Does pathologist see AI annotations directly on slide viewer?
- “What happens when the algorithm encounters a rare diagnosis?”
- Does it output low confidence? Flag for human review?
- Or does it confidently misdiagnose?
- “Can we validate locally before clinical deployment?”
- 500-1,000 cases covering range of diagnoses (see the validation sketch after this checklist)
- Compare algorithm performance to sign-out diagnoses
- Stratify by diagnosis, specimen type, staining protocol
- “What is the cost structure?”
- Per-slide fee? Annual license? Scanner-tied?
- Hidden costs: Scanner upgrade requirements, IT integration
- “Who is liable if the algorithm misses a cancer?”
- Read vendor contract carefully
- Most disclaim liability; pathologist remains responsible
- “Can you provide references from pathologists using this clinically?”
- Not research collaborators. Actual clinical users
- Ask about false positives, workflow disruptions, performance on their scanner
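For the local validation study described above, the sketch below computes sensitivity and specificity against sign-out diagnoses, stratified by diagnosis category. It assumes a hypothetical CSV with `case_id`, `diagnosis_category`, `signout_malignant`, and `ai_malignant` columns; the file name and column names are illustrative, not a standard format.

```python
# Sketch of a local validation analysis: compare AI calls to sign-out
# diagnoses and report sensitivity/specificity per diagnosis category.
# The CSV schema (case_id, diagnosis_category, signout_malignant,
# ai_malignant) is a hypothetical example, not a standard format.
import pandas as pd

def validation_report(csv_path: str) -> pd.DataFrame:
    df = pd.read_csv(csv_path)
    rows = []
    for category, grp in df.groupby("diagnosis_category"):
        tp = ((grp.ai_malignant == 1) & (grp.signout_malignant == 1)).sum()
        fn = ((grp.ai_malignant == 0) & (grp.signout_malignant == 1)).sum()
        tn = ((grp.ai_malignant == 0) & (grp.signout_malignant == 0)).sum()
        fp = ((grp.ai_malignant == 1) & (grp.signout_malignant == 0)).sum()
        rows.append({
            "category": category,
            "n": len(grp),
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        })
    return pd.DataFrame(rows)

# Example: validation_report("local_validation_cases.csv")
```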
Red Flags (Walk Away If You See These)
- “Scanner-agnostic” claims without validation data: All algorithms are scanner-dependent
- Validated for “cancer detection” without specifying types: Each cancer requires separate validation
- No published peer-reviewed validation studies: Internal white papers insufficient
- Claims of “autonomous diagnosis”: FDA requires human confirmation; autonomous claims are misleading
- “Works on all staining protocols”: Stain intensity variation affects all algorithms
Part 8: Cost-Benefit Reality
What Does Pathology AI Cost?
Prostate cancer detection:
- Paige Prostate: ~$15-25 per slide
- ROI: Reduced Gleason grading variability, fewer unnecessary biopsies
HER2 scoring:
- PathAI: ~$20-30 per slide
- ROI: Reduces FISH confirmatory testing (~$1,000-1,500 saved per avoided FISH)
Lymph node metastasis detection:
- Not commercially available (LYNA research-only)
Cervical cytology:
- BD FocalPoint: Bundled into instrument contract
- ROI: Higher sensitivity for HSIL, workflow efficiency
Digital pathology infrastructure (prerequisite for most AI):
- Whole-slide scanner: $150,000-500,000 depending on capacity
- PACS (digital pathology image management): $50,000-200,000
- IT infrastructure: Network bandwidth, storage (1 slide = 1-5 GB)
- Total infrastructure cost before any AI: $300,000-1,000,000
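Storage is often the surprise line item. The sketch below estimates archive size from slide volume and the 1-5 GB-per-slide figure above; the annual slide volume and retention period are assumed values for illustration.

```python
# Back-of-envelope storage estimate for whole-slide imaging.
# The 1-5 GB per slide range is from the text; the annual slide volume
# and retention period below are illustrative assumptions.
def wsi_storage_tb(slides_per_year: int, gb_per_slide: float, retention_years: int) -> float:
    return slides_per_year * gb_per_slide * retention_years / 1024  # TB

for gb in (1, 3, 5):
    tb = wsi_storage_tb(slides_per_year=50_000, gb_per_slide=gb, retention_years=10)
    print(f"{gb} GB/slide -> ~{tb:,.0f} TB over 10 years")
# 1 GB/slide -> ~488 TB; 3 GB/slide -> ~1,465 TB; 5 GB/slide -> ~2,441 TB
```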
Do These Tools Save Money?
HER2 scoring: YES
- Reduces equivocal (2+) cases by 30%
- At $1,200/FISH test × 30% reduction × 500 HER2 cases/year = $180,000 saved
- Algorithm cost: $20 × 500 = $10,000
- Net savings: $170,000/year
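The arithmetic above generalizes to a simple formula; the sketch below parameterizes it so a lab can plug in its own volumes and prices. The defaults reproduce the text's example numbers and are otherwise assumptions.

```python
# Parameterized version of the HER2 ROI arithmetic above; the defaults
# reproduce the text's example and should be replaced with your lab's
# actual case volumes and prices.
def her2_ai_net_savings(cases_per_year: int = 500,
                        fish_cost: float = 1_200.0,
                        equivocal_reduction: float = 0.30,
                        ai_cost_per_slide: float = 20.0) -> float:
    fish_savings = cases_per_year * equivocal_reduction * fish_cost
    ai_cost = cases_per_year * ai_cost_per_slide
    return fish_savings - ai_cost

print(f"Net savings: ${her2_ai_net_savings():,.0f}/year")  # $170,000/year
```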
Prostate cancer detection: MAYBE
- Reduces unnecessary repeat biopsies (fewer false negatives)
- Reduces Gleason grading variability (more consistent treatment decisions)
- Hard to quantify dollar savings, but likely cost-effective for high-volume labs
Cervical cytology: PROBABLY
- 10-15% improvement in HSIL detection → earlier treatment → reduced cervical cancer incidence
- Cost-effectiveness studies show modest benefit
Lymph node metastasis: WOULD IF DEPLOYED
- Halves pathologist reading time → increase throughput or reduce staffing
- But not commercially available, so savings are theoretical
Digital pathology infrastructure: UNCLEAR
- Enables remote sign-out, telepathology consultations, AI applications
- Saves physical slide storage space
- But upfront costs are substantial ($300K-1M)
- ROI depends on lab volume and utilization
Part 9: The Future of Pathology AI
What’s Coming in the Next 5 Years
Likely to reach clinical use:
- Expanded cancer detection algorithms: Colon polyp classification, bladder cancer grading, melanoma diagnosis
- Molecular pathway prediction from H&E: Predict ER/PR/HER2 status from morphology (reducing immunostaining costs)
- Prognostic algorithms: Predict recurrence risk from histology + clinical data (refining current scores)
- Workflow automation: Automated case triage, specimen tracking, quality control
Promising but uncertain:
- Scanner-agnostic algorithms: Stain normalization and domain adaptation to work across scanner vendors
- Multiplex imaging AI: Analyze multi-color immunofluorescence (tumor microenvironment, immune infiltrates)
- Real-time intraoperative diagnosis: AI analyzing frozen sections during surgery (replacing human pathologist)
Overhyped and unlikely:
- Fully autonomous diagnosis without pathologist: FDA won’t approve it; pathologists won’t accept it; medicolegal landscape doesn’t support it
- “One algorithm for all diagnoses”: Each tissue and diagnosis requires specific training
- AI replacing pathologists: Augmentation yes, replacement no
The rate-limiting factors:
- Scanner standardization (biggest technical challenge)
- FDA regulatory pathway clarity
- Medicolegal precedent (when AI-assisted diagnosis goes wrong)
- Pathologist workflow integration and trust
Key Takeaways
10 Principles for Pathology AI
1. Pathology AI is the most mature medical AI application: Multiple FDA-cleared devices, growing evidence base, accelerating adoption
2. Scanner specificity is the hidden challenge: Algorithms validated on one scanner often fail on others; demand scanner-specific validation data
3. AI excels at quantitative tasks: HER2 scoring, Ki-67 quantification, mitotic count, all reducing inter-observer variability
4. Rare diagnoses are AI’s Achilles heel: Algorithms confidently misdiagnose diseases they’ve never seen; maintain clinical suspicion
5. Augmentation, not automation: All FDA-cleared pathology AI requires pathologist confirmation; no autonomous diagnosis
6. Workflow integration determines success: Algorithm accuracy matters less than smooth PACS/LIS integration
7. HER2 scoring AI has clearest ROI: Reduces equivocal cases requiring expensive FISH confirmation
8. Digital pathology infrastructure is expensive: $300K-1M upfront before any AI deployment
9. Demand local validation: Validate algorithms on 500-1,000 internal cases before clinical use
10. Medicolegal landscape is evolving: Pathologist remains liable for AI-assisted diagnoses; document appropriately
Clinical Scenario: Evaluating Prostate Cancer Detection AI
Scenario: Your Pathology Department Is Considering AI for Prostate Biopsies
The pitch: A vendor demonstrates AI that detects prostate adenocarcinoma and assigns Gleason grades. They show you:
- Sensitivity 98% for Gleason ≥7 cancer
- “Reduces inter-pathologist variability by 30%”
- FDA 510(k) cleared
- Cost: $20 per slide
Your department processes 2,000 prostate biopsies/year. The chair asks for your recommendation.
Questions to Ask:
- “Was this validated on our specific scanner?”
- We use Leica Aperio GT450. Was the algorithm trained and validated on this exact model?
- If no, we need local validation before clinical deployment
- “What is performance for Gleason grade groups?”
- Sensitivity/specificity for Grade Group 1 (3+3) vs. 2 (3+4) vs. 3+ (4+3, 4+4, 4+5)?
- Grade Group 2 vs. 3 distinction most clinically important (active surveillance vs. treatment)
- “How does this handle mimics?”
- Atypical adenomatous hyperplasia, atrophy, high-grade PIN
- False positive rate for benign mimics?
- “What is the workflow integration?”
- Integrates with our PACS (Philips IntelliSite)?
- Do pathologists see AI annotations directly on slide viewer?
- Adds extra clicks/steps to sign-out workflow?
- “Can we do local validation study?”
- Test on 500 internal prostate biopsies (mix of benign, Gleason 3+3, 3+4, 4+3, ≥4+4)
- Compare algorithm grades to our sign-out diagnoses
- Stratify performance by Gleason pattern
- “What is the cost-benefit?”
- 2,000 biopsies/year × $20 = $40,000/year
- Benefits: Reduced inter-observer variability, fewer missed cancers, pathologist confidence
- Quantifiable ROI unclear (not like HER2 where we save FISH costs)
- “Who is liable if algorithm misses cancer?”
- Vendor contract liability clause?
- Pathologist still signs out report → pathologist liable
- “How do we document AI assistance?”
- Add statement to pathology report: “Computer-assisted detection used”?
- Document when we disagree with algorithm?
Red Flags in This Scenario:
- “Reduces variability by 30%” without baseline data: What was inter-observer variability before? Calculated how?
- 510(k) clearance doesn’t specify scanner compatibility: Need explicit validation on our scanner
- No mention of performance on Grade Group 2 vs. 3 distinction: Most clinically important classification, should be reported separately
- $20/slide × 2,000 = $40,000/year without clear ROI: For HER2, we save FISH costs; for prostate, savings less clear
Check Your Understanding
Scenario 1: The AI-Detected Micrometastasis
Clinical situation: You’re reviewing a sentinel lymph node from a breast cancer patient. The AI algorithm (LYNA-like system) flags a 0.3mm cluster of cells as suspicious for metastatic carcinoma. You review the flagged area. The cells are slightly enlarged with increased nuclear-to-cytoplasmic ratio, but you’re not certain they’re malignant. Could be reactive sinus histiocytosis.
Without the AI flag, you might have called this negative. With the AI flag, you’re uncertain.
Question 1: Do you sign out as “positive for micrometastasis” based on the AI flagging?
Answer: No, not based solely on AI flagging. Get a second opinion from a breast pathology expert.
Reasoning:
Why you shouldn’t call positive based on AI alone:
1. AI has false positives: Even 91% specificity (LYNA study) = 9% false positive rate
2. Reactive histiocytes can mimic carcinoma: Enlarged cells with increased N:C ratio occur in reactive nodes
3. Clinical consequences are significant: Micrometastasis diagnosis changes staging (N0 → N1mi), may influence adjuvant therapy decisions
4. Pathologist judgment remains standard: AI is assistive, not determinative
What you should do:
1. Get expert consultation: Send digital slide to breast pathology expert for second opinion
2. Consider IHC: Cytokeratin stain (AE1/AE3 or CAM5.2) would confirm epithelial cells vs. histiocytes
3. Review clinical context: Large primary tumor, lymphovascular invasion, high grade → higher pretest probability of nodal metastasis
4. Document uncertainty: If you call it positive, note it was a small focus requiring IHC confirmation or expert consultation
The AI’s value:
- Flagged an area you might have missed on routine review
- Prompted closer examination and consideration of IHC
- But doesn’t replace pathologist judgment for equivocal findings
Bottom line: Use AI to flag suspicious areas for closer review, but don’t diagnose solely based on AI flagging when morphology is equivocal.
If IHC confirms metastatic carcinoma: AI was correct, you caught early metastasis.
If IHC shows histiocytes: AI was false positive, but better to overcall (with IHC confirmation) than miss metastasis.
Scenario 2: The Scanner Upgrade Disaster
Clinical situation: Your department has been using prostate cancer detection AI successfully for 18 months on Aperio AT2 scanners. Sensitivity 97%, very few false negatives, pathologists trust it.
Your institution decides to upgrade to Leica Aperio GT450 scanners for faster throughput (4x scanning speed). Same vendor (Leica), just newer model.
After the scanner upgrade, you notice the AI is flagging many more false positives, with benign glands marked as suspicious. And one case where you caught a Gleason 4+3 cancer that the AI missed (it called the slide benign).
Question 2: What went wrong, and what should you do?
Answer: Scanner upgrade broke AI algorithm calibration. Halt clinical AI use immediately and revalidate.
What went wrong:
Scanner variability problem:
1. Different color profiles: GT450 captures RGB values differently than AT2 (different camera sensors)
2. Different compression: Image compression algorithms may differ
3. Different focus: Autofocus algorithm produces slightly different focal planes
4. Algorithm wasn’t trained on GT450 images: Trained on AT2, doesn’t generalize to GT450 despite being same vendor
Why this is dangerous:
- You caught one false negative (Gleason 4+3 missed). How many did you not catch?
- Patients may be getting false reassurance from benign AI calls
- Medicolegal liability: If AI-missed cancer causes harm, you’re liable
What you should do immediately:
- Halt AI-assisted sign-out for clinical cases:
- Revert to manual review without AI assistance
- Better to lose AI benefits than risk false negatives
- Notify vendor:
- Scanner upgrade broke algorithm performance
- Request algorithm retraining/recalibration for GT450
- Retrospective review of cases signed out since scanner upgrade:
- How many cases signed out with AI assistance post-upgrade?
- Re-review all with special attention to AI-negative calls
- Identify any missed diagnoses requiring patient notification
- Local validation study on GT450:
- Test algorithm on 500 internal prostate biopsies scanned on GT450
- Compare to sign-out diagnoses
- Calculate sensitivity/specificity on new scanner
- Only resume clinical use if performance acceptable
- Update validation protocols:
- Document that algorithm is scanner-model-specific
- Any future scanner changes require revalidation
- Consider this in scanner procurement decisions (vendor lock-in)
Lessons learned:
- Scanner specificity is real: Even upgrading within the same vendor can break AI
- Continuous monitoring essential: Track AI performance over time, watch for degradation
- Plan for scanner changes: Budget time and resources for revalidation when upgrading scanners
Prevention for next time:
- Before scanner upgrade, ask vendor: “Will this require AI revalidation?”
- Plan revalidation into upgrade timeline
- Consider parallel scanning (old + new scanners) during transition for validation
Scenario 3: The Rare Sarcoma Misdiagnosis
Clinical situation: A 45-year-old woman has a breast mass biopsied. Core needle biopsy submitted. Your lab uses breast cancer detection AI that’s been excellent for usual ductal and lobular carcinomas.
The AI flags the case as “high-grade ductal carcinoma, 95% confidence.” You review the slide. The cells are indeed high-grade, pleomorphic, with mitotic activity. But the architecture doesn’t look like typical ductal carcinoma. It’s more spindle-cell, fascicular growth pattern.
You’re not a breast pathology expert. The AI is 95% confident. Your general pathology training says “looks malignant, high-grade.”
Question 3: Do you sign out as high-grade ductal carcinoma based on AI’s 95% confidence?
Answer: No. This is likely angiosarcoma or other sarcoma, not carcinoma. Get expert consultation immediately.
Why the AI is wrong:
The rare diagnosis problem:
1. AI trained on ductal and lobular carcinomas: Seen thousands of typical breast cancers
2. Never seen angiosarcoma: Occurs in <1% of breast malignancies
3. Spindle-cell pattern outside training distribution: Algorithm defaults to “high-grade carcinoma” because it’s the closest thing it knows
4. High confidence is misleading: 95% confidence just means “95% similar to high-grade carcinomas in training set.” It doesn’t mean it is carcinoma
What this actually is:
- Breast angiosarcoma: Rare vascular malignancy
- Spindle-cell morphology: Fascicular growth pattern, not glandular architecture
- Mimics carcinoma: High-grade, mitotically active
- Treatment differs: Angiosarcoma requires wide excision, often radiation; different chemotherapy than carcinoma
What happens if you call it carcinoma:
- Patient gets inappropriate treatment (mastectomy + sentinel node biopsy, but angiosarcoma doesn’t metastasize to axillary nodes typically)
- Oncologist prescribes breast cancer chemotherapy (ineffective for angiosarcoma)
- Patient harm from wrong diagnosis and wrong treatment
What you should do:
- Recognize the morphology doesn’t fit:
- Spindle cells ≠ ductal carcinoma
- Fascicular pattern ≠ typical breast cancer architecture
- Trust your morphologic assessment over AI confidence score
- Get expert consultation:
- Send to breast pathology expert at academic center
- Describe: “High-grade spindle-cell malignancy, AI flagged as carcinoma but morphology atypical”
- Order IHC panel:
- Vascular markers (CD31, CD34, ERG) → positive in angiosarcoma
- Epithelial markers (cytokeratins, GATA3) → negative in angiosarcoma
- IHC will distinguish carcinoma from sarcoma
- Document AI assistance and your clinical reasoning:
- “Computer-assisted detection system flagged as ductal carcinoma; however, spindle-cell morphology atypical for breast carcinoma. IHC and expert consultation recommended.”
- Report this case to AI vendor:
- False positive (confidently misdiagnosed rare sarcoma as carcinoma)
- Vendor should add angiosarcoma cases to training set to prevent future misdiagnoses
Lessons learned:
- AI confidence scores are misleading for rare diagnoses: 95% confident doesn’t mean 95% accurate. It just means 95% similar to training data
- Morphology trumps algorithm: When the pattern doesn’t fit the AI prediction, trust your morphologic assessment
- Rare diagnoses are AI blind spots: Algorithms haven’t seen them, will misclassify as “closest common diagnosis”
- Always integrate clinical context: 45-year-old woman with breast mass. Angiosarcoma is rare but recognized entity; consider it
Bottom line: Never diagnose based solely on AI prediction when morphology is atypical. AI is a tool, not truth.
Professional Society Guidelines on AI in Pathology
The College of American Pathologists has established comprehensive guidance for AI validation in pathology:
Whole Slide Imaging (WSI) Validation Guidelines (2022 Update):
Published in Archives of Pathology & Laboratory Medicine, the CAP guideline provides:
- 3 strong recommendations (SRs)
- 9 good practice statements (GPSs)
- GRADE framework for evidence evaluation
- Specific validation protocols for diagnostic use
Key Validation Principle: Laboratories are not restricted to FDA-approved AI systems. However, a non-FDA-approved system may be employed for clinical testing only if adequate validation has been performed by the laboratory, and the regulatory status should be documented in surgical reports.
CAP AI Resources and Programs
AI Studio (2024): CAP provides members a secure, interactive environment to:
- Experiment with emerging AI tools in pathology
- Explore foundation models
- Build confidence in AI applications
Recent Publications (Archives of Pathology & Laboratory Medicine, 2025):
- “Introduction to Generative Artificial Intelligence: Contextualizing the Future” (February 2025)
- “Harnessing the Power of Generative Artificial Intelligence in Pathology Education” (February 2025)
- “Evaluating Use of Generative Artificial Intelligence in Clinical Pathology Practice” (February 2025)
- “Bridging the Clinical-Computational Transparency Gap in Digital Pathology” (2024)
Digital Pathology CPT Codes (2024)
CAP worked with the AMA CPT Editorial Panel to establish:
- 30 new digital pathology add-on codes for 2024
- Category III add-on codes 0751T-0763T
- New codes 0827T-0856T
These codes capture additional clinical staff work associated with digitizing glass microscope slides for primary diagnosis and AI algorithm use.
American Society for Clinical Pathology (ASCP)
ASCP provides complementary guidance on:
- Laboratory quality management for AI systems
- Technologist training requirements for digital pathology
- Integration of AI into laboratory workflows
Implementation Guidance: ASCP emphasizes that AI implementation requires: - Validated workflows for specimen handling and digitization - Quality control procedures for scanner calibration - Clear documentation of AI involvement in diagnosis
Digital Pathology Association (DPA)
The DPA provides regulatory and implementation resources:
- Healthcare regulatory information for digital pathology
- Best practices for clinical deployment
- Vendor evaluation frameworks