Surgical Subspecialties

Colonoscopy AI has the strongest randomized controlled trial evidence of any surgical AI application, but real-world implementation studies reveal a troubling gap: benefits shrink or disappear outside trial conditions. Gastroenterology AI now extends beyond colonoscopy to liver disease screening (AI-ECG doubled cirrhosis detection in a Nature Medicine RCT), celiac disease pathology, and gastric cancer screening across 16 centers. Prostate cancer AI spans three FDA-authorized tools, one of them validated across six phase 3 RCTs. In otolaryngology, the Cochlear Nexa smart implant (FDA approved July 2025) is the first with upgradeable firmware, while the TruDi surgical navigation case study illustrates the risks of rushing AI into the operating room.

Learning Objectives

After reading this chapter, you will be able to:

  • Evaluate colonoscopy AI with RCT evidence and understand the implementation gap
  • Assess gastroenterology AI beyond colonoscopy (liver cirrhosis screening, celiac disease pathology, gastric cancer detection)
  • Identify FDA-authorized prostate cancer AI tools and their clinical evidence
  • Evaluate bladder cancer, kidney stone, and robotic surgery AI applications
  • Understand otolaryngology AI (cochlear implants, head and neck cancer, surgical navigation safety)
  • Apply evidence-based frameworks for surgical subspecialty AI adoption

The Clinical Context: Surgical subspecialties have developed specialty-specific AI applications with varying evidence levels. Colonoscopy AI has the strongest RCT evidence base. Gastroenterology AI now extends to liver cirrhosis screening, celiac disease pathology, and gastric cancer detection. Urology has the most FDA-authorized AI devices. Otolaryngology AI includes both promising applications (cochlear implants, cancer prognostication) and cautionary data (surgical navigation malfunctions).

What Works Well:

Specialty | Application | Evidence Level | Key Finding
GI | Colonoscopy polyp detection | Strong RCT evidence | ADR improves 10-20% relative
GI | AI-ECG liver cirrhosis screening | RCT (Nature Medicine) | Doubled new cirrhosis diagnoses in primary care
GI | Celiac disease pathology AI | Multicenter validation (NEJM AI) | >95% accuracy, pathologist-level performance
Urology | Prostate cancer AI (pathology, MRI, prognostication) | Strong (multiple FDA authorizations) | ArteraAI validated across 6 phase 3 RCTs
Urology | Bladder cancer detection AI | Multicenter studies + meta-analysis | AI outperforms clinicians (AUC 0.89 vs. 0.81)
ENT | Cochlear implant outcome prediction | Multicenter (JAMA Otolaryngology) | 92% accuracy predicting language development

What’s Emerging:

Specialty | Application | Status | Notes
GI | GRAPE gastric cancer screening | Validated across 16 centers | AUC 0.927, outperforms radiologists
Urology | AI urine tests (bladder cancer) | FDA Breakthrough Designation | TOBY test: noninvasive VOC analysis, AUC >0.9
Urology | Kidney stone AI during surgery | Clinically validated | AiFURS: 92–95% accuracy
ENT | Cochlear Nexa smart implant | FDA approved July 2025 | First upgradeable firmware cochlear implant

Safety Signal: The TruDi surgical navigation system saw malfunctions increase from 8 to over 100 after AI enhancement, with at least 10 patient injuries. AI enhancement does not guarantee improved safety.

The Bottom Line: Colonoscopy AI has the strongest RCT evidence base, with multiple FDA-cleared systems. Prostate cancer AI now has three FDA-authorized tools spanning pathology detection (Paige), prognostication (ArteraAI), and tumor mapping (Avenda). Gastroenterology AI has expanded to liver cirrhosis, celiac disease, and gastric cancer. Bladder cancer AI outperforms clinicians in multicenter studies. Real-world implementation often underperforms RCT results.

Introduction

Colonoscopy AI has something most surgical AI lacks: multiple randomized controlled trials showing 10-20% relative improvement in adenoma detection rate, where each 1% ADR increase correlates with 3% reduced interval cancer risk. Yet real-world implementation studies consistently show smaller benefits than trial conditions predicted. Alert fatigue, variable endoscopist technique, and uncontrolled clinical environments erode the gains seen under protocol-driven RCT conditions. Urology AI has advanced rapidly, with three FDA-authorized prostate cancer tools, multicenter bladder cancer validation studies, and clinically validated kidney stone AI. The gap between trial performance and practice performance recurs across these applications and every other surgical subspecialty where AI has been deployed.


Part 1: Gastroenterology AI

Colonoscopy AI: The Evidence Leader

Clinical Need

Colonoscopy quality varies substantially by endoscopist. Adenoma detection rate (ADR), the proportion of screening colonoscopies detecting at least one adenoma, correlates with interval colorectal cancer risk. Each 1% increase in ADR is associated with 3% reduced interval cancer risk (Corley et al., 2014).
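The magnitude of the ADR relationship can be illustrated with a back-of-envelope calculation. This is a sketch only: it assumes the Corley et al. association (roughly 3% relative risk reduction per 1% absolute ADR increase) holds approximately linearly over small ranges.

```python
# Back-of-envelope sketch: expected interval cancer risk reduction from an
# ADR gain, assuming ~3% relative risk reduction per 1-point absolute ADR
# increase (Corley et al., 2014) and approximate linearity.

def interval_cancer_risk_reduction(adr_gain_points: float, rr_per_point: float = 0.03) -> float:
    """Relative reduction in interval cancer risk for a given absolute ADR gain."""
    return adr_gain_points * rr_per_point

# Example: a CADe system that lifts ADR from 36.7% to 44.7% (an 8-point gain)
gain = 44.7 - 36.7
print(f"Approximate relative risk reduction: {interval_cancer_risk_reduction(gain):.0%}")
# prints: Approximate relative risk reduction: 24%
```

The linearity assumption is the weakest link; the point is only that ADR gains of the size seen in trials map to clinically meaningful risk differences.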

Computer-aided detection (CADe) systems aim to reduce missed polyps by providing real-time alerts during colonoscopy.

FDA-Cleared Systems

Multiple CADe systems have received FDA clearance:

System | Manufacturer | FDA Clearance | Key Features
GI Genius | Medtronic | 2021 | Real-time polyp detection overlay
CAD EYE | Fujifilm | 2022 | Integrated with endoscopy processor
EndoScreener | Wision AI | 2022 | AI detection with size estimation
ENDO-AID | Olympus | 2023 | Multiple detection modes

RCT Evidence

Meta-analysis findings (2024):

The largest meta-analysis of AI-assisted colonoscopy included 44 RCTs (Soleymanjahi et al., 2024):

  • ADR increased from 36.7% to 44.7% (RR 1.21, 95% CI 1.15-1.28)
  • Consistent benefit across CADe platforms
  • Improved detection of sessile serrated lesions
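The headline figures above can be cross-checked directly. A minimal sketch (the simple ratio of rates will differ slightly from the pooled RR 1.21, which is estimated across studies rather than from the aggregate percentages):

```python
# Cross-check the meta-analysis headline: ADR 36.7% -> 44.7%.

control_adr = 0.367
cade_adr = 0.447

relative_risk = cade_adr / control_adr          # simple ratio ~1.22, near the pooled RR 1.21
absolute_diff = cade_adr - control_adr          # 8 percentage points
extra_exams_per_detection = 1 / absolute_diff   # colonoscopies per additional adenoma-detecting exam

print(f"Simple RR = {relative_risk:.2f}")
print(f"Absolute difference = {absolute_diff:.1%}")
print(f"~{extra_exams_per_detection:.1f} procedures per extra adenoma-detecting exam")
```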

GI Genius specific evidence:

Study | Design | ADR Effect
Repici et al., 2020 | RCT | ADR 54.8% vs 40.4% (RR 1.30)
COLO-DETECT, 2024 | Pragmatic RCT | RR 1.12 (95% CI 1.03-1.22)
Meta-analysis of GI Genius studies | Multiple | Variable, I²=64%

Real-World vs. RCT Performance

The Implementation Gap

RCT vs. real-world discordance:

Real-world implementation studies show smaller or absent benefits compared to RCTs (Wei et al., 2024):

  • Overall real-world ADR: 36.3% with CADe vs. 35.8% without (RR 1.13)
  • GI Genius specifically: No significant difference (RR 0.96, 95% CI 0.85-1.07)

Why the gap?

  1. RCT conditions: Protocol adherence, selected endoscopists, controlled environments
  2. Real-world conditions: Variable technique, alert fatigue, workflow integration challenges
  3. Ceiling effect: High-performing endoscopists may not benefit from AI

Clinical implication: CADe AI is a supplement to, not substitute for, rigorous colonoscopy technique. Centers with low baseline ADR may see greater benefit.

False Positive Burden

CADe increases detection of non-neoplastic polyps:

  • Hyperplastic polyps (not requiring removal if <5mm in rectosigmoid)
  • Artifacts, stool, and mucosal folds triggering false alerts

Consequence: Increased polypectomy of benign lesions represents unnecessary intervention and procedural risk.

Serrated Lesion Detection

Sessile serrated lesions (SSLs) are precursors to interval cancers and historically difficult to detect. Meta-analysis shows CADe improves SSLDR (RR 1.27, 95% CI 1.11-1.47).

However: Advanced adenoma detection rate (aADR), arguably the most clinically relevant metric, shows no significant improvement (RR 1.01, 95% CI 0.90-1.13).

GI AI Beyond Colonoscopy

AI-ECG liver cirrhosis detection:

A pragmatic cluster-randomized clinical trial (98 primary care teams, 15,596 adults) tested whether an AI-enabled ECG model could identify undiagnosed advanced chronic liver disease. In the intervention group, new diagnoses of advanced liver disease doubled compared with usual care (1.0% vs. 0.5%, OR 2.09, p=0.007). Among AI-positive patients, detection was four-fold higher (4.4% vs. 1.1%, OR 4.37, p<0.001). The model detects cardiac electrical changes associated with liver cirrhosis from a routine 12-lead ECG (Ahn et al., 2025).

Celiac disease pathology AI:

A machine learning model trained on 3,383 whole slide images of duodenal biopsies from four hospitals achieved pathologist-level diagnostic performance for celiac disease, with accuracy, sensitivity, and specificity exceeding 95% and AUC exceeding 99%. Inter-observer agreement between pathologist and model was statistically indistinguishable from pathologist-pathologist agreement (p>0.96), suggesting AI could address diagnostic bottlenecks in settings with pathologist shortages (Hujoel et al., 2025).

GRAPE gastric cancer screening:

The GRAPE (Gastric Cancer Risk Assessment Procedure with AI) model uses noncontrast CT and deep learning to screen for gastric cancer. Trained on data from 2 centers (6,720 cases) and validated across 16 independent centers (18,160 cases), the model achieved AUC of 0.927, improving radiologist sensitivity by 21.8% and specificity by 14.0%, particularly for early-stage disease. In real-world screening, GRAPE yielded gastric cancer detection rates of 17.7–24.5%, with approximately 40% of detected cases lacking prior abdominal symptoms (Xu et al., 2025).

AI-assisted capsule endoscopy:

NaviCam ProScan (AnX Robotica) received FDA clearance via De Novo pathway (DEN230027, 2024) as the first AI-assisted reading tool for small bowel video capsule endoscopy. In clinical testing, AI-assisted reading reduced interpretation time from 58 to 21 minutes (p<0.001) while maintaining detection sensitivity for suspected small bowel bleeding (AnX Robotica, January 2024).


Part 2: Urology AI

Urology has the broadest range of FDA-authorized AI tools among surgical subspecialties, spanning prostate cancer detection and prognostication, bladder cancer screening, kidney stone analysis, and robotic surgery.

Prostate Cancer AI

Prostate cancer has the most FDA-authorized AI tools of any urologic condition, spanning pathology detection, prognostication, tumor mapping, and MRI interpretation.

FDA-Authorized Prostate Cancer AI Devices

Device | Company | FDA Pathway | Year | Application
Paige Prostate | Paige.AI | De Novo (DEN200080) | 2021 | Cancer detection in pathology slides
Avenda iQuest (Unfold AI) | Avenda Health | 510(k) | 2022 | Tumor mapping for focal therapy planning
ArteraAI Prostate | Artera | De Novo | 2025 | 10-year outcome prognostication

Paige Prostate was the first AI product authorized for digital pathology in any disease. In the FDA study, pathologists using Paige improved cancer detection sensitivity from 89.5% to 96.8%, with a 70% reduction in false negatives and 24% reduction in false positives (Paige, FDA DEN200080). Non-specialist pathologists using Paige matched the performance of prostate pathology specialists without it.
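The "70% reduction in false negatives" follows from the sensitivity figures: a sketch verifying the arithmetic, since the relationship between sensitivity gains and false-negative reductions is easy to misread.

```python
# Sanity-check the reported false-negative reduction for Paige Prostate:
# sensitivity 89.5% -> 96.8% means the false-negative rate among cancers
# falls from 10.5% to 3.2%.

sens_without = 0.895
sens_with = 0.968

fn_without = 1 - sens_without
fn_with = 1 - sens_with
relative_fn_reduction = (fn_without - fn_with) / fn_without

print(f"False-negative reduction: {relative_fn_reduction:.0%}")
# prints: False-negative reduction: 70%
```

A 7.3-point sensitivity gain looks modest in absolute terms but eliminates most of the residual misses, which is why the false-negative framing is the more striking one.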

ArteraAI Prostate uses multimodal AI analyzing digital biopsy images together with clinical data to prognosticate 10-year risk of distant metastasis and prostate cancer-specific mortality. Validated across 6 phase 3 randomized trials with median 11.4-year follow-up, the tool showed 9.2–14.6% relative improvement over NCCN risk stratification across all endpoints (Artera, August 2025). The De Novo authorization includes a Predetermined Change Control Plan allowing capability expansion without new 510(k) submissions.

Avenda iQuest (Unfold AI) combines MRI and biopsy data with deep learning to map cancer location within the prostate. Urologists using the tool improved tumor extent identification sensitivity from 37% to 97% and changed treatment recommendations in 27% of cases, predominantly shifting toward more localized therapy (Avenda Health).

Gleason Grading AI

GleasonXAI, an explainable AI for Gleason pattern grading, was trained on 1,015 tissue microarray core images annotated by 54 international pathologists from 10 countries. Unlike conventional black-box models, GleasonXAI uses pathologist-defined terminology and “soft labels” reflecting inter-pathologist variability to provide transparent pattern-level explanations. The model achieved equivalent or better accuracy than conventional approaches while offering interpretability (Mittmann et al., 2025). The team also released the largest freely available dataset with explanatory Gleason pattern annotations.

PI-RADS and AI Integration

Multiparametric MRI (mpMRI) with PI-RADS (Prostate Imaging Reporting and Data System) scoring guides targeted biopsy decisions. AI aims to:

  • Detect suspicious lesions on MRI
  • Assign PI-RADS-equivalent scores
  • Reduce inter-reader variability
  • Improve clinically significant prostate cancer (csPCa) detection

PI-RADS Steering Committee Standards

PI-RADS Steering Committee AI Requirements (2025)

The PI-RADS Steering Committee published requirements for AI development in prostate MRI (Turkbey et al., 2025):

Performance benchmarks:

  • Cancer detection rate: 40-70% for PI-RADS 4 or higher lesions
  • Demonstration of equivalent or better performance than radiologists
  • ROC and precision-recall curves required

Reporting requirements:

  • Training data composition and demographics
  • Biopsy correlation methodology
  • External validation in independent populations
  • Specific failure mode analysis

Clinical context:

  • Focus on biopsy-naive men with positive clinical screening
  • Clinically significant cancer (Gleason ≥7) as primary endpoint

Current AI Performance

External validation studies (2024):

Study | AI System | csPCa Sensitivity | Comparison
Belue et al., 2025 | Biparametric AI | 88.4% | Comparable to radiologists (89.5%)
mdprostate | PI-RADS classification | 85.5% (PI-RADS ≥4) | Specificity 63.2%
Deep learning model | mpMRI analysis | AUC 0.902 | vs. PI-RADS AUC 0.759

Key finding: Combining AI with radiologist interpretation improves csPCa sensitivity by 5.8% compared to either alone.

External Validation Challenges

AI performance degrades on external MRI scans:

  • Lesion detection: 39.7% (external) vs. 56.0% (in-house)
  • csPCa detection: 61% (external) vs. 79% (in-house)

Factors affecting performance:

  • MRI quality (especially diffusion-weighted imaging)
  • Scanner differences
  • Protocol variations

Clinical Implementation

Prostate MRI AI is best positioned for:

  • Second-read quality assurance
  • Lesion detection in high-volume practices
  • Training and education

Not ready for:

  • Autonomous PI-RADS scoring without radiologist review
  • Replacement of urologic clinical judgment

Bladder Cancer AI

Bladder cancer AI has progressed from research-stage to multicenter validation and, for one tool, FDA Breakthrough Device Designation.

Cystoscopy AI:

A multicenter diagnostic study (CAIDS) trained on 69,204 cystoscopy images from 10,729 patients across 6 hospitals achieved diagnostic accuracy of 0.977 (internal validation) and up to 0.991 (external validation), with 93.9% accuracy and 95.4% sensitivity, surpassing experienced urologists (Shkolyar et al., 2022). At AUA 2025, a lightweight AI model from the University of Tsukuba demonstrated improved detection of difficult lesions: small tumor detection rose from 39.9% to 61.7%, flat lesions from 56.6% to 76.2%, and carcinoma in situ identification from 70.6% to 89.2% (EUS/AUA 2025).

AI vs. clinician performance: A 2025 meta-analysis found AI sensitivity and specificity both 83% (DOR 24, AUC 0.89), compared with clinicians at 78% each (DOR 13, AUC 0.81). AI outperformed clinicians across all measured metrics (World Journal of Urology, 2025).
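The reported diagnostic odds ratios can be reproduced from the pooled sensitivity and specificity. A sketch of the standard DOR formula, using the figures quoted above:

```python
# Diagnostic odds ratio (DOR) from pooled sensitivity/specificity.
# DOR = (sens / (1 - sens)) / ((1 - spec) / spec)

def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Odds of a positive test in disease divided by odds of a positive test without disease."""
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)

print(f"AI (sens/spec 83%):         DOR = {diagnostic_odds_ratio(0.83, 0.83):.0f}")  # 24
print(f"Clinicians (sens/spec 78%): DOR = {diagnostic_odds_ratio(0.78, 0.78):.0f}")  # 13
```

Both values match the meta-analysis (DOR 24 vs. 13), illustrating how a 5-point gain in both sensitivity and specificity roughly doubles the odds ratio.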

Risk stratification: The PROGRxN-BCa model, evaluated on the largest non-muscle-invasive bladder cancer (NMIBC) cohort to date (n=12,659), outperformed existing risk models by approximately 10% on overall c-index, including both EAU and AUA risk models, particularly for high-grade Ta disease (AUA 2025).

TOBY urine test: In June 2025, the FDA granted Breakthrough Device Designation to an AI-powered urine test that analyzes volatile organic compounds (VOCs) via gas chromatography-mass spectrometry. Internal validation showed AUC >0.9 for bladder cancer detection from a single noninvasive sample (TOBY, June 2025). Not yet FDA-cleared for clinical use.

Kidney Stone AI

Intraoperative stone detection (AiFURS):

The AiFURS system provides real-time detection, classification, and measurement of kidney stones during flexible ureteroscopy. Clinical validation (100 in vivo cases, 80 external validation cases) demonstrated diagnostic accuracy of 92.2–95.3% in vivo and 86.8–92.2% on external validation, outperforming expert surgeons in patient-level stone type prediction (npj Digital Medicine, 2025).

CT-based stone composition: AI predicts stone composition from CT characteristics (calcium oxalate vs. uric acid) to guide treatment selection (ESWL vs. ureteroscopy vs. PCNL). Research studies show 80–90% accuracy for common stone types.

AI-enhanced surgical robotics: Johnson & Johnson’s Monarch platform, developed in partnership with NVIDIA, integrates AI-driven digital twin simulation for kidney stone procedure planning and training. U.S. commercial launch is planned for 2026 (J&J/NVIDIA, 2025).

Urologic Robotic Surgery

The robotic surgery landscape for urology expanded significantly in 2025, breaking the single-platform paradigm:

Hugo RAS (Medtronic): FDA cleared in December 2025 for urologic procedures including prostatectomy, nephrectomy, and cystectomy, covering approximately 230,000 U.S. surgeries annually. Validated in the Expand URO IDE study (137 patients). The platform has been used in tens of thousands of procedures across 30+ countries (Medtronic, December 2025).

Focal One HIFU: Updated with AI-driven algorithms for tissue ablation visualization and treatment evaluation in prostate cancer (Urology Times, 2025).

Current status: AI integration with robotic platforms includes tissue recognition, surgical phase identification, and quality metrics analysis. Autonomous AI surgery remains unapproved for urologic applications.


Part 3: Otolaryngology AI

AAO-HNS Task Force Report (2024)

AAO-HNS AI Task Force Guidance

The American Academy of Otolaryngology-Head and Neck Surgery Task Force published guidance on AI integration (AAO-HNS, 2024):

Identified applications:

  • Precision medicine in head and neck cancer
  • Clinical decision support
  • Operational efficiency (scheduling, documentation)
  • Research and education tools

Key challenges:

  • Data quality and bias
  • Health equity concerns
  • Privacy and security
  • Regulatory gaps
  • Ethical considerations

Recommendations:

  • Careful validation before clinical deployment
  • Attention to health equity implications
  • Transparency in AI decision-making
  • Specialty-specific training data development

Head and Neck Cancer AI

Oropharyngeal cancer multimodal prognostication:

A multimodal fusion framework (SMuRF) integrating CT imaging of the primary tumor and lymph nodes with whole-slide pathology images predicted disease-free survival and tumor grade in HPV-associated oropharyngeal squamous cell carcinoma (n=277). The model achieved c-index of 0.81 (development) and 0.79 (test) for disease-free survival, functioning as an independent prognostic biomarker with a hazard ratio of 17 (95% CI 4.9–58, p<0.0001) after controlling for clinical variables (Song et al., 2025). This represents the first study combining radiology and pathology imaging for biomarker discovery in oropharyngeal cancer.

Surgical Navigation AI: Safety Signals

TruDi Navigation System: AI Safety Case Study

The TruDi Navigation System (Acclarent/J&J), used for endoscopic sinus surgery, provides a cautionary example of AI integration in surgical navigation. After three years on the market with eight reported malfunctions, AI algorithms were added to the system. Subsequently, over 100 malfunctions and adverse events were reported, with at least 10 patients injured between late 2021 and November 2025 (Reuters investigation, February 2026).

Reported injuries include:

  • Skull base puncture with cerebrospinal fluid leak
  • Carotid artery injury causing stroke requiring ICU admission
  • Incorrect instrument location information during intracranial procedures

Allegations in ongoing litigation:

  • Accuracy target of only 80% for some AI-enhanced features before market integration
  • Multiple concurrent lawsuits against Acclarent

Clinical implication: AI-enhanced surgical navigation systems require the same rigorous validation as the underlying navigation hardware. Surgeons should not assume AI enhancement improves safety; independent validation of accuracy claims is essential before clinical adoption.

Hearing AI Applications

Smartphone-based audiometry:

AI enables hearing screening outside traditional audiology settings:

  • Direct-to-consumer apps for hearing self-assessment
  • School and community screening programs
  • Remote monitoring for hearing aid users

Evidence:

  • Smartphone audiometry shows good correlation with standard audiometry in controlled settings
  • Real-world performance varies with ambient noise and user technique
  • Does not replace comprehensive audiologic evaluation for diagnosis

Age-related hearing loss (ARHL):

The AAO-HNS Clinical Practice Guideline on ARHL (2024) provides context:

  • ARHL affects 1 in 3 adults age 65-74
  • Associated with dementia, depression, and falls
  • AI screening could expand early detection

Cochlear Nexa smart implant:

The Cochlear Nucleus Nexa System, FDA approved July 2025, is the first cochlear implant with upgradeable firmware and internal memory (Cochlear, July 2025). Like smartphones, the implant firmware can be updated to enable new features without surgical revision. The system includes the smallest and lightest sound processor with all-day battery life.

AI-predicted cochlear implant outcomes:

A multicenter study (n=278 children across U.S., Australia, and Hong Kong) used deep transfer learning on pre-implantation brain MRI scans to predict spoken language outcomes 1–3 years after cochlear implantation. The model achieved 92% accuracy in predicting language development trajectories, enabling a “predict-to-prescribe” approach where children likely to have more difficulty with spoken language can be identified before implantation and offered intensified therapy earlier (Tobey et al., 2025).

Hearing aid optimization:

AI powers automatic adjustment of hearing aids based on:

  • Acoustic environment detection
  • User preferences and listening patterns
  • Real-time speech enhancement

Voice Analysis AI

Applications:

  1. Vocal cord pathology detection
    • Analysis of voice recordings for nodules, polyps, paralysis
    • Screening for laryngeal cancer
  2. Speech therapy monitoring
    • Objective voice quality measures
    • Treatment response tracking
  3. Neurological voice changes
    • Parkinson’s disease voice biomarkers
    • Stroke-related dysarthria assessment

Status: Research-stage. No FDA-cleared diagnostic voice AI for ENT applications.

Sleep Apnea Screening

AI tools analyze:

  • Snoring patterns from audio recordings
  • Movement data from wearables
  • Oximetry trends

Performance: Screening tools show 80-90% sensitivity for moderate-severe OSA in research settings. Cannot replace polysomnography for diagnosis.


Part 4: Professional Society Positions

Gastroenterology Societies

American Gastroenterological Association (AGA):

The AGA published its first guideline critically evaluating AI in GI care in 2025, making no recommendation for or against CADe-assisted colonoscopy, citing very low certainty evidence on cancer outcomes.

The AGA Clinical Practice Update on AI in Polyp Diagnosis (Samarasena et al., 2023) provides additional context on polyp characterization AI.

American College of Gastroenterology (ACG):

ACG has not published a formal position statement on AI/machine learning. The 2024 ACG/ASGE Quality Indicators for Colonoscopy (Rex et al., 2024):

  • Establishes ADR benchmarks (≥35% overall, 40% men, 30% women)
  • Does not include AI or CADe as a quality indicator
  • Focuses on technique-based metrics: withdrawal time (≥8 minutes), bowel preparation adequacy (≥90%)

American Society for Gastrointestinal Endoscopy (ASGE):

ASGE has been the most active GI society on AI policy.

American Urological Association (AUA)

The AUA has no standalone AI clinical practice guideline. Review of official AUA Policy and Position Statements confirms no AI-specific policy document. However, the AUA has increased AI-related advocacy and education:

  • Advocacy position (2024): The AUA recognizes AI as “inevitably integral to health care” and has identified strategic initiatives including education on AI use/misuse, incorporation into AUA committee activities, and infrastructure assessment (AUA Advocacy, June 2024)
  • Annual meeting programming: AUA 2025 featured dedicated AI courses including “Practical AI for Practicing Urologists” and multiple AI-focused abstract sessions (AUA 2025)

Prostate MRI guidance: The AUA/SUO Early Detection of Prostate Cancer Guideline (2023) mentions AI only in Future Directions: “evolving MRI protocols, such as biparametric MRI and use of artificial intelligence, requires further study.” AI is not recommended as a current clinical adjunct.

Provider sentiment: A 2025 survey of urology healthcare providers found 83.4% believed AI will improve efficiency, but 82% expressed concerns about technical reliability and 76% worried about diagnostic errors from generative AI (Healthcare, 2025).

Content restrictions: The AUA Privacy Policy explicitly prohibits uploading AUA content into AI systems for training purposes.

Surgical Subspecialty Society Positions

American Academy of Orthopaedic Surgeons (AAOS):

  • Position Statement on Artificial Intelligence (Document #1193, February 2025)
  • Addresses physician understanding of AI benefits, risks, and ethical considerations
  • Emphasizes need for socio-economic awareness of AI integration

American Association of Neurological Surgeons (AANS)/Council of State Neurosurgical Societies (CSNS):

Society of American Gastrointestinal and Endoscopic Surgeons (SAGES):

Society of University Surgeons:

Societies Without Formal AI Position Statements

Several surgical societies have educational resources but no formal AI policy documents:

  • American College of Surgeons (ACS): Informatics and AI Committee, educational programming only
  • Society of Thoracic Surgeons (STS): Educational content on AI/ML, no formal position
  • American Society of Colon and Rectal Surgeons (ASCRS): Educational webinars only
  • American Society of Plastic Surgeons (ASPS): Educational articles, no formal position

Cross-Specialty Themes

Across surgical societies with formal positions, consistent themes emerge:

  • AI as adjunct, not replacement, for surgical judgment
  • Specialty-specific validation required before deployment
  • Human oversight of AI recommendations mandatory
  • Academic integrity standards for AI-generated content
  • Concerns about training data bias and equity

Clinical Scenarios

Case: During a screening colonoscopy with CADe, the AI system generates an alert highlighting a mucosal fold. The endoscopist examines the area and determines it is a false positive. This is the fourth false alert during this procedure.

Question: How should the endoscopist manage alert fatigue while maintaining detection quality?

Discussion

Understanding alert fatigue:

CADe systems have high sensitivity, meaning they detect most polyps but also generate false positives for:

  • Mucosal folds
  • Stool particles
  • Artifacts
  • Vascular patterns

Appropriate response:

  1. Examine each alert carefully: Even with fatigue, each alert deserves brief evaluation
  2. Document appropriately: Override reason helps quality tracking
  3. Maintain technique: CADe supplements but doesn’t replace systematic inspection
  4. Provide feedback: Some systems allow false positive marking to improve algorithms

What not to do:

  • Ignore alerts due to accumulated fatigue
  • Rely solely on AI (it misses flat lesions, has detection gaps)
  • Reduce withdrawal time because AI is “watching”

Teaching point: Alert fatigue is a recognized limitation of CADe. High-quality colonoscopy technique remains essential. AI assists detection but cannot substitute for careful examination.

Case: A 62-year-old man with elevated PSA undergoes multiparametric prostate MRI. The radiologist assigns PI-RADS 3 (equivocal). An AI second-read system identifies the same lesion and assigns it as high-risk (equivalent to PI-RADS 4).

Question: How should the urologist interpret this discordance?

Discussion

Understanding discordance:

PI-RADS 3 represents the most challenging interpretation:

  • 12-40% probability of clinically significant cancer
  • Biopsy decision depends on clinical context
  • AI may have different threshold calibration than radiologists

Factors to consider:

  1. AI validation: Was this AI system validated on similar patient populations?
  2. Clinical context: PSA density, prior biopsy results, family history
  3. Lesion characteristics: Location, size, DWI signal
  4. Patient preferences: Risk tolerance for biopsy vs. active surveillance

Possible approaches:

  • Discuss discordance with radiologist
  • Consider targeted biopsy (MRI-TRUS fusion or cognitive targeting)
  • Repeat MRI if quality concerns
  • PSA density and other biomarkers for risk stratification
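PSA density, one of the risk-stratification adjuncts listed above, is a simple ratio of serum PSA to prostate volume. The sketch below is illustrative only: the 0.15 ng/mL/cc cutoff is a commonly cited threshold, not a value from this chapter, and the patient values are hypothetical.

```python
# PSA density = serum PSA / MRI-estimated prostate volume.
# The ~0.15 ng/mL/cc biopsy threshold is an assumption here (a commonly
# cited cutoff), not a recommendation from this chapter.

def psa_density(psa_ng_ml: float, prostate_volume_cc: float) -> float:
    return psa_ng_ml / prostate_volume_cc

# Hypothetical patient: PSA 6.2 ng/mL, prostate volume 38 cc on MRI
density = psa_density(6.2, 38.0)
print(f"PSA density: {density:.3f} ng/mL/cc")
# prints: PSA density: 0.163 ng/mL/cc  (above 0.15, favoring biopsy)
```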

What not to do:

  • Automatically defer to AI over radiologist
  • Ignore AI finding without consideration
  • Proceed to saturation biopsy without targeted approach

Teaching point: AI second-read can identify lesions that may warrant additional attention. Discordance should prompt discussion rather than automatic action. Clinical judgment integrates AI findings with patient-specific factors.

Case: A 68-year-old patient shows you results from a smartphone hearing screening app indicating moderate hearing loss. They ask if they need hearing aids.

Question: How should you counsel this patient about the app results?

Discussion

Smartphone audiometry limitations:

  • Ambient noise affects results
  • Headphone quality varies
  • Calibration may not match clinical audiometers
  • Cannot assess word recognition, speech-in-noise, or middle ear function

Appropriate response:

  1. Validate concern: The app results suggest possible hearing loss worth evaluating
  2. Recommend formal testing: Refer to audiology for comprehensive evaluation
  3. Discuss ARHL: Age-related hearing loss is common and treatable
  4. Manage expectations: App results may overestimate or underestimate actual loss

Audiologic evaluation includes:

  • Pure tone audiometry in sound-treated booth
  • Speech recognition testing
  • Tympanometry for middle ear function
  • Hearing aid candidacy assessment

When apps are valuable:

  • Motivating patients to seek evaluation
  • Monitoring known hearing loss between visits
  • Screening in resource-limited settings

Teaching point: Smartphone hearing apps serve as screening, not diagnostic, tools. Positive results should prompt formal audiologic evaluation. Treatment decisions require comprehensive assessment.

Case: You are a GI division chief evaluating whether to purchase a CADe colonoscopy system. The sales representative presents RCT data showing 15% relative improvement in ADR. Your division’s current mean ADR is 45%.

Question: What factors should inform this decision?

Discussion

Evaluating the evidence:

The RCT data is promising, but consider:

  1. Baseline ADR matters: Your division ADR of 45% already exceeds the 2024 ACG/ASGE benchmark (≥35% overall). Improvement may be smaller for high performers.

  2. Real-world vs. RCT performance: Meta-analysis of real-world GI Genius studies showed no significant ADR improvement (RR 0.96). Implementation conditions differ from trials.

  3. What improves: Primarily small adenomas and sessile serrated lesions. Advanced adenoma detection may not change.

  4. What increases: Non-neoplastic polypectomy (false positives leading to unnecessary removal).
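The contrast between trial and real-world relative risks can be made concrete at this division's 45% baseline. A minimal sketch, using the RR values discussed in this chapter and assuming the RR applies multiplicatively to baseline ADR:

```python
# Projected division ADR under different relative-risk scenarios,
# applied multiplicatively to the 45% baseline in this case.
# RR values are drawn from the evidence discussed in this chapter.

baseline_adr = 0.45

scenarios = {
    "Vendor RCT claim (RR 1.15)": 1.15,
    "Pooled RCT meta-analysis (RR 1.21)": 1.21,
    "Real-world GI Genius (RR 0.96)": 0.96,
}

for label, rr in scenarios.items():
    projected = baseline_adr * rr
    print(f"{label}: projected ADR {projected:.1%} "
          f"({projected - baseline_adr:+.1%} absolute)")
```

The spread between the best-case RCT projection and the real-world GI Genius estimate (which implies no gain, or even a slight decline) is the core of the purchasing decision.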

Cost-benefit analysis:

  • System cost: typically over $100,000, plus per-procedure fees
  • Procedure time: May increase slightly
  • Revenue: No specific reimbursement for CADe use
  • Quality metrics: Potential ADR improvement affects reporting

Implementation requirements:

  • Workflow integration
  • Endoscopist training
  • IT support
  • Quality monitoring to verify benefit

Recommendation:

  • Honest assessment of current quality gaps
  • Pilot period with outcome tracking
  • Focus on technique improvement alongside technology
  • Consider centers with lower baseline ADR as priority

Teaching point: CADe investment should follow analysis of local quality data, realistic performance expectations, and implementation capacity. RCT results represent best-case scenarios.

Key Takeaways

Clinical Bottom Line

Gastroenterology AI:

  • Colonoscopy AI has the strongest RCT evidence base; real-world effects are smaller
  • AI-ECG doubled liver cirrhosis detection in primary care RCT (Nature Medicine, 2025)
  • Celiac disease pathology AI achieves pathologist-level performance (NEJM AI, 2025)
  • GRAPE gastric cancer screening validated across 16 centers (AUC 0.927)
  • NaviCam ProScan is first FDA-cleared AI capsule endoscopy reading tool
  • AGA makes no recommendation for or against CADe colonoscopy (very low certainty evidence on cancer outcomes)

Urology AI:

  • Three FDA-authorized prostate cancer tools: Paige Prostate (pathology detection), ArteraAI (prognostication), Avenda iQuest (tumor mapping)
  • ArteraAI validated across 6 phase 3 RCTs with median 11.4-year follow-up
  • Bladder cancer AI outperforms clinicians in meta-analysis (AUC 0.89 vs. 0.81)
  • TOBY urine test (FDA Breakthrough Designation, 2025) offers noninvasive bladder cancer detection
  • AiFURS achieves 92–95% intraoperative kidney stone detection accuracy
  • Hugo RAS (Medtronic) FDA-cleared December 2025 for urologic procedures
  • External validation shows performance degradation across sites for all tools

Otolaryngology AI:

  • AAO-HNS Task Force (2024) provides implementation guidance
  • Cochlear Nexa (FDA approved July 2025): first smart implant with upgradeable firmware
  • AI predicts cochlear implant language outcomes with 92% accuracy (JAMA Otolaryngology)
  • Oropharyngeal cancer multimodal AI achieves c-index 0.81 for survival prediction (eBioMedicine)
  • TruDi surgical navigation: 100+ malfunctions after AI enhancement, at least 10 injuries
  • Voice analysis and sleep apnea screening AI remain research-stage

Implementation principles:

  • Real-world performance often lags RCT results
  • AI supplements but does not replace surgical skill and judgment
  • AI enhancement of existing devices does not guarantee improved safety (TruDi case)
  • Specialty-specific validation is essential
  • Alert fatigue affects all real-time detection AI