Surgical Subspecialties

Colonoscopy AI has the strongest randomized controlled trial evidence of any surgical AI application, but real-world implementation studies reveal a troubling gap: benefits shrink or disappear outside trial conditions. Gastroenterology AI now extends beyond colonoscopy to liver disease screening (AI-ECG doubled cirrhosis detection in a Nature Medicine RCT), celiac disease pathology, and gastric cancer screening across 16 centers. Prostate cancer AI spans three FDA-authorized tools, one of them validated across six phase 3 RCTs. In otolaryngology, the Cochlear Nexa smart implant (FDA approved July 2025) is the first with upgradeable firmware, while the TruDi surgical navigation case study illustrates the risks of rushing AI into the operating room.

Learning Objectives

After reading this chapter, you will be able to:

  • Evaluate colonoscopy AI with RCT evidence and understand the implementation gap
  • Assess gastroenterology AI beyond colonoscopy (liver cirrhosis screening, celiac disease pathology, gastric cancer detection)
  • Identify FDA-authorized prostate cancer AI tools and their clinical evidence
  • Evaluate bladder cancer, kidney stone, and robotic surgery AI applications
  • Understand otolaryngology AI (cochlear implants, head and neck cancer, surgical navigation safety)
  • Apply evidence-based frameworks for surgical subspecialty AI adoption

The Clinical Context: Surgical subspecialties have developed specialty-specific AI applications with varying evidence levels. Colonoscopy AI has the strongest RCT evidence base. Gastroenterology AI now extends to liver cirrhosis screening, celiac disease pathology, and gastric cancer detection. Urology has the most FDA-authorized AI devices. Otolaryngology AI includes both promising applications (cochlear implants, cancer prognostication) and cautionary data (surgical navigation malfunctions).

What Works Well:

Specialty | Application | Evidence Level | Key Finding
GI | Colonoscopy polyp detection | Strong RCT evidence | ADR improves 10-20% relative
GI | AI-ECG liver cirrhosis screening | RCT (Nature Medicine) | Doubled new cirrhosis diagnoses in primary care
GI | Celiac disease pathology AI | Multicenter validation (NEJM AI) | >95% accuracy, pathologist-level performance
Urology | Prostate cancer AI (pathology, MRI, prognostication) | Strong (multiple FDA authorizations) | ArteraAI validated across 6 phase 3 RCTs
Urology | Bladder cancer detection AI | Multicenter studies + meta-analysis | AI outperforms clinicians (AUC 0.89 vs. 0.81)
ENT | Cochlear implant outcome prediction | Multicenter (JAMA Otolaryngology) | 92% accuracy predicting language development

What’s Emerging:

Specialty | Application | Status | Notes
GI | GRAPE gastric cancer screening | Validated across 16 centers | AUC 0.927, outperforms radiologists
Urology | AI urine tests (bladder cancer) | FDA Breakthrough Designation | TOBY test: noninvasive VOC analysis, AUC >0.9
Urology | Kidney stone AI during surgery | Clinically validated | AiFURS: 92–95% accuracy
ENT | Cochlear Nexa smart implant | FDA approved July 2025 | First upgradeable firmware cochlear implant

Safety Signal: The TruDi surgical navigation system saw malfunctions increase from 8 to over 100 after AI enhancement, with at least 10 patient injuries. AI enhancement does not guarantee improved safety.

The Bottom Line: Colonoscopy AI has the strongest RCT evidence base, with multiple FDA-cleared systems. Prostate cancer AI now has three FDA-authorized tools spanning pathology detection (Paige), prognostication (ArteraAI), and tumor mapping (Avenda). Gastroenterology AI has expanded to liver cirrhosis, celiac disease, and gastric cancer. Bladder cancer AI outperforms clinicians in multicenter studies. Real-world implementation often underperforms RCT results.

Introduction

Colonoscopy AI has something most surgical AI lacks: multiple randomized controlled trials showing 10-20% relative improvement in adenoma detection rate, where each 1% ADR increase correlates with 3% reduced interval cancer risk. Yet real-world implementation studies consistently show smaller benefits than trial conditions predicted. Alert fatigue, variable endoscopist technique, and uncontrolled clinical environments erode the gains seen under protocol-driven RCT conditions. Urology AI has advanced rapidly, with three FDA-authorized prostate cancer tools, multicenter bladder cancer validation studies, and clinically validated kidney stone AI. The gap between trial performance and practice performance recurs across these applications and every other surgical subspecialty where AI has been deployed.


Part 1: Gastroenterology AI

Colonoscopy AI: The Evidence Leader

Clinical Need

Colonoscopy quality varies substantially by endoscopist. Adenoma detection rate (ADR), the proportion of screening colonoscopies detecting at least one adenoma, correlates with interval colorectal cancer risk. Each 1% increase in ADR is associated with 3% reduced interval cancer risk (Corley et al., 2014).
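The magnitude of the ADR relationship can be illustrated with a back-of-envelope calculation. This is a sketch only: it assumes the Corley et al. association (roughly 3% relative risk reduction per 1% absolute ADR increase) holds approximately linearly over small ranges.

```python
# Back-of-envelope sketch: expected interval cancer risk reduction from an
# ADR gain, assuming ~3% relative risk reduction per 1-point absolute ADR
# increase (Corley et al., 2014) and approximate linearity.

def interval_cancer_risk_reduction(adr_gain_points: float, rr_per_point: float = 0.03) -> float:
    """Relative reduction in interval cancer risk for a given absolute ADR gain."""
    return adr_gain_points * rr_per_point

# Example: a CADe system that lifts ADR from 36.7% to 44.7% (an 8-point gain)
gain = 44.7 - 36.7
print(f"Approximate relative risk reduction: {interval_cancer_risk_reduction(gain):.0%}")
# prints: Approximate relative risk reduction: 24%
```

The linearity assumption is the weakest link; the point is only that ADR gains of the size seen in trials map to clinically meaningful risk differences.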

Computer-aided detection (CADe) systems aim to reduce missed polyps by providing real-time alerts during colonoscopy.

FDA-Cleared Systems

Multiple CADe systems have received FDA clearance:

System | Manufacturer | FDA Clearance | Key Features
GI Genius | Medtronic | 2021 | Real-time polyp detection overlay
CAD EYE | Fujifilm | 2022 | Integrated with endoscopy processor
EndoScreener | Wision AI | 2022 | AI detection with size estimation
ENDO-AID | Olympus | 2023 | Multiple detection modes

RCT Evidence

Meta-analysis findings (2024):

The largest meta-analysis of AI-assisted colonoscopy included 44 RCTs (Soleymanjahi et al., 2024):

  • ADR increased from 36.7% to 44.7% (RR 1.21, 95% CI 1.15-1.28)
  • Consistent benefit across CADe platforms
  • Improved detection of sessile serrated lesions
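The headline figures above can be cross-checked directly. A minimal sketch (the simple ratio of rates will differ slightly from the pooled RR 1.21, which is estimated across studies rather than from the aggregate percentages):

```python
# Cross-check the meta-analysis headline: ADR 36.7% -> 44.7%.

control_adr = 0.367
cade_adr = 0.447

relative_risk = cade_adr / control_adr          # simple ratio ~1.22, near the pooled RR 1.21
absolute_diff = cade_adr - control_adr          # 8 percentage points
extra_exams_per_detection = 1 / absolute_diff   # colonoscopies per additional adenoma-detecting exam

print(f"Simple RR = {relative_risk:.2f}")
print(f"Absolute difference = {absolute_diff:.1%}")
print(f"~{extra_exams_per_detection:.1f} procedures per extra adenoma-detecting exam")
```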

GI Genius specific evidence:

Study | Design | ADR Effect
Repici et al., 2020 | RCT | ADR 54.8% vs 40.4% (RR 1.30)
COLO-DETECT, 2024 | Pragmatic RCT | RR 1.12 (95% CI 1.03-1.22)
Meta-analysis of GI Genius studies | Multiple | Variable, I²=64%

Real-World vs. RCT Performance

The Implementation Gap

RCT vs. real-world discordance:

Real-world implementation studies show smaller or absent benefits compared to RCTs (Wei et al., 2024):

  • Overall real-world ADR: 36.3% with CADe vs. 35.8% without (RR 1.13)
  • GI Genius specifically: No significant difference (RR 0.96, 95% CI 0.85-1.07)

Why the gap?

  1. RCT conditions: Protocol adherence, selected endoscopists, controlled environments
  2. Real-world conditions: Variable technique, alert fatigue, workflow integration challenges
  3. Ceiling effect: High-performing endoscopists may not benefit from AI

Clinical implication: CADe AI is a supplement to, not substitute for, rigorous colonoscopy technique. Centers with low baseline ADR may see greater benefit.

False Positive Burden

CADe increases detection of non-neoplastic polyps:

  • Hyperplastic polyps (not requiring removal if <5mm in rectosigmoid)
  • Artifacts, stool, and mucosal folds triggering false alerts

Consequence: Increased polypectomy of benign lesions represents unnecessary intervention and procedural risk.

Serrated Lesion Detection

Sessile serrated lesions (SSLs) are precursors to interval cancers and historically difficult to detect. Meta-analysis shows CADe improves SSLDR (RR 1.27, 95% CI 1.11-1.47).

However: Advanced adenoma detection rate (aADR), arguably the most clinically relevant metric, shows no significant improvement (RR 1.01, 95% CI 0.90-1.13).

GI AI Beyond Colonoscopy

AI-ECG liver cirrhosis detection:

A pragmatic cluster-randomized clinical trial (98 primary care teams, 15,596 adults) tested whether an AI-enabled ECG model could identify undiagnosed advanced chronic liver disease. In the intervention group, new diagnoses of advanced liver disease doubled compared with usual care (1.0% vs. 0.5%, OR 2.09, p=0.007). Among AI-positive patients, detection was four-fold higher (4.4% vs. 1.1%, OR 4.37, p<0.001). The model detects cardiac electrical changes associated with liver cirrhosis from a routine 12-lead ECG (Ahn et al., 2025).

Celiac disease pathology AI:

A machine learning model trained on 3,383 whole slide images of duodenal biopsies from four hospitals achieved pathologist-level diagnostic performance for celiac disease, with accuracy, sensitivity, and specificity exceeding 95% and AUC exceeding 99%. Inter-observer agreement between pathologist and model was statistically indistinguishable from pathologist-pathologist agreement (p>0.96), suggesting AI could address diagnostic bottlenecks in settings with pathologist shortages (Hujoel et al., 2025).

GRAPE gastric cancer screening:

The GRAPE (Gastric Cancer Risk Assessment Procedure with AI) model uses noncontrast CT and deep learning to screen for gastric cancer. Trained on data from 2 centers (6,720 cases) and validated across 16 independent centers (18,160 cases), the model achieved AUC of 0.927, improving radiologist sensitivity by 21.8% and specificity by 14.0%, particularly for early-stage disease. In real-world screening, GRAPE yielded gastric cancer detection rates of 17.7–24.5%, with approximately 40% of detected cases lacking prior abdominal symptoms (Xu et al., 2025).

AI-assisted capsule endoscopy:

NaviCam ProScan (AnX Robotica) received FDA clearance via De Novo pathway (DEN230027, 2024) as the first AI-assisted reading tool for small bowel video capsule endoscopy. In clinical testing, AI-assisted reading reduced interpretation time from 58 to 21 minutes (p<0.001) while maintaining detection sensitivity for suspected small bowel bleeding (AnX Robotica, January 2024).


Part 2: Urology AI

Urology has the broadest range of FDA-authorized AI tools among surgical subspecialties, spanning prostate cancer detection and prognostication, bladder cancer screening, kidney stone analysis, and robotic surgery.

Prostate Cancer AI

Prostate cancer has the most FDA-authorized AI tools of any urologic condition, spanning pathology detection, prognostication, tumor mapping, and MRI interpretation.

FDA-Authorized Prostate Cancer AI Devices

Device | Company | FDA Pathway | Year | Application
Paige Prostate | Paige.AI | De Novo (DEN200080) | 2021 | Cancer detection in pathology slides
Avenda iQuest (Unfold AI) | Avenda Health | 510(k) | 2022 | Tumor mapping for focal therapy planning
ArteraAI Prostate | Artera | De Novo | 2025 | 10-year outcome prognostication

Paige Prostate was the first AI product authorized for digital pathology in any disease. In the FDA study, pathologists using Paige improved cancer detection sensitivity from 89.5% to 96.8%, with a 70% reduction in false negatives and 24% reduction in false positives (Paige, FDA DEN200080). Non-specialist pathologists using Paige matched the performance of prostate pathology specialists without it.
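The "70% reduction in false negatives" follows from the sensitivity figures: a sketch verifying the arithmetic, since the relationship between sensitivity gains and false-negative reductions is easy to misread.

```python
# Sanity-check the reported false-negative reduction for Paige Prostate:
# sensitivity 89.5% -> 96.8% means the false-negative rate among cancers
# falls from 10.5% to 3.2%.

sens_without = 0.895
sens_with = 0.968

fn_without = 1 - sens_without
fn_with = 1 - sens_with
relative_fn_reduction = (fn_without - fn_with) / fn_without

print(f"False-negative reduction: {relative_fn_reduction:.0%}")
# prints: False-negative reduction: 70%
```

A 7.3-point sensitivity gain looks modest in absolute terms but eliminates most of the residual misses, which is why the false-negative framing is the more striking one.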

ArteraAI Prostate uses multimodal AI analyzing digital biopsy images together with clinical data to prognosticate 10-year risk of distant metastasis and prostate cancer-specific mortality. Validated across 6 phase 3 randomized trials with median 11.4-year follow-up, the tool showed 9.2–14.6% relative improvement over NCCN risk stratification across all endpoints (Artera, August 2025). The De Novo authorization includes a Predetermined Change Control Plan allowing capability expansion without new 510(k) submissions.

Avenda iQuest (Unfold AI) combines MRI and biopsy data with deep learning to map cancer location within the prostate. Urologists using the tool improved tumor extent identification sensitivity from 37% to 97% and changed treatment recommendations in 27% of cases, predominantly shifting toward more localized therapy (Avenda Health).

Gleason Grading AI

GleasonXAI, an explainable AI for Gleason pattern grading, was trained on 1,015 tissue microarray core images annotated by 54 international pathologists from 10 countries. Unlike conventional black-box models, GleasonXAI uses pathologist-defined terminology and “soft labels” reflecting inter-pathologist variability to provide transparent pattern-level explanations. The model achieved equivalent or better accuracy than conventional approaches while offering interpretability (Mittmann et al., 2025). The team also released the largest freely available dataset with explanatory Gleason pattern annotations.

PI-RADS and AI Integration

Multiparametric MRI (mpMRI) with PI-RADS (Prostate Imaging Reporting and Data System) scoring guides targeted biopsy decisions. AI aims to:

  • Detect suspicious lesions on MRI
  • Assign PI-RADS-equivalent scores
  • Reduce inter-reader variability
  • Improve clinically significant prostate cancer (csPCa) detection

PI-RADS Steering Committee Standards

PI-RADS Steering Committee AI Requirements (2025)

The PI-RADS Steering Committee published requirements for AI development in prostate MRI (Turkbey et al., 2025):

Performance benchmarks:

  • Cancer detection rate: 40-70% for PI-RADS 4 or higher lesions
  • Demonstration of equivalent or better performance than radiologists
  • ROC and precision-recall curves required

Reporting requirements:

  • Training data composition and demographics
  • Biopsy correlation methodology
  • External validation in independent populations
  • Specific failure mode analysis

Clinical context:

  • Focus on biopsy-naive men with positive clinical screening
  • Clinically significant cancer (Gleason ≥7) as primary endpoint

Current AI Performance

External validation studies (2024):

Study | AI System | csPCa Sensitivity | Comparison
Belue et al., 2025 | Biparametric AI | 88.4% | Comparable to radiologists (89.5%)
mdprostate | PI-RADS classification | 85.5% (PI-RADS ≥4) | Specificity 63.2%
Deep learning model | mpMRI analysis | AUC 0.902 | vs. PI-RADS AUC 0.759

Key finding: Combining AI with radiologist interpretation improves csPCa sensitivity by 5.8% compared to either alone.

External Validation Challenges

AI performance degrades on external MRI scans:

  • Lesion detection: 39.7% (external) vs. 56.0% (in-house)
  • csPCa detection: 61% (external) vs. 79% (in-house)

Factors affecting performance:

  • MRI quality (especially diffusion-weighted imaging)
  • Scanner differences
  • Protocol variations

Clinical Implementation

Prostate MRI AI is best positioned for:

  • Second-read quality assurance
  • Lesion detection in high-volume practices
  • Training and education

Not ready for:

  • Autonomous PI-RADS scoring without radiologist review
  • Replacement of urologic clinical judgment

Bladder Cancer AI

Bladder cancer AI has progressed from research-stage to multicenter validation and, for one tool, FDA Breakthrough Device Designation.

Cystoscopy AI:

A multicenter diagnostic study (CAIDS) trained on 69,204 cystoscopy images from 10,729 patients across 6 hospitals achieved diagnostic accuracy of 0.977 (internal validation) and up to 0.991 (external validation), with 93.9% accuracy and 95.4% sensitivity, surpassing experienced urologists (Shkolyar et al., 2022). At AUA 2025, a lightweight AI model from the University of Tsukuba demonstrated improved detection of difficult lesions: small tumor detection rose from 39.9% to 61.7%, flat lesions from 56.6% to 76.2%, and carcinoma in situ identification from 70.6% to 89.2% (EUS/AUA 2025).

AI vs. clinician performance: A 2025 meta-analysis found AI sensitivity and specificity both 83% (DOR 24, AUC 0.89), compared with clinicians at 78% each (DOR 13, AUC 0.81). AI outperformed clinicians across all measured metrics (World Journal of Urology, 2025).
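The reported diagnostic odds ratios can be reproduced from the pooled sensitivity and specificity. A sketch of the standard DOR formula, using the figures quoted above:

```python
# Diagnostic odds ratio (DOR) from pooled sensitivity/specificity.
# DOR = (sens / (1 - sens)) / ((1 - spec) / spec)

def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Odds of a positive test in disease divided by odds of a positive test without disease."""
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)

print(f"AI (sens/spec 83%):         DOR = {diagnostic_odds_ratio(0.83, 0.83):.0f}")  # 24
print(f"Clinicians (sens/spec 78%): DOR = {diagnostic_odds_ratio(0.78, 0.78):.0f}")  # 13
```

Both values match the meta-analysis (DOR 24 vs. 13), illustrating how a 5-point gain in both sensitivity and specificity roughly doubles the odds ratio.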

Risk stratification: The PROGRxN-BCa model, evaluated on the largest non-muscle-invasive bladder cancer (NMIBC) cohort to date (n=12,659), outperformed existing risk models by approximately 10% on overall c-index, including both EAU and AUA risk models, particularly for high-grade Ta disease (AUA 2025).

TOBY urine test: In June 2025, the FDA granted Breakthrough Device Designation to an AI-powered urine test that analyzes volatile organic compounds (VOCs) via gas chromatography-mass spectrometry. Internal validation showed AUC >0.9 for bladder cancer detection from a single noninvasive sample (TOBY, June 2025). Not yet FDA-cleared for clinical use.

Kidney Stone AI

Intraoperative stone detection (AiFURS):

The AiFURS system provides real-time detection, classification, and measurement of kidney stones during flexible ureteroscopy. Clinical validation (100 in vivo cases, 80 external validation cases) demonstrated diagnostic accuracy of 92.2–95.3% in vivo and 86.8–92.2% on external validation, outperforming expert surgeons in patient-level stone type prediction (npj Digital Medicine, 2025).

CT-based stone composition: AI predicts stone composition from CT characteristics (calcium oxalate vs. uric acid) to guide treatment selection (ESWL vs. ureteroscopy vs. PCNL). Research studies show 80–90% accuracy for common stone types.

AI-enhanced surgical robotics: Johnson & Johnson’s Monarch platform, developed in partnership with NVIDIA, integrates AI-driven digital twin simulation for kidney stone procedure planning and training. U.S. commercial launch is planned for 2026 (J&J/NVIDIA, 2025).

Urologic Robotic Surgery

The robotic surgery landscape for urology expanded significantly in 2025, breaking the single-platform paradigm:

Hugo RAS (Medtronic): FDA cleared in December 2025 for urologic procedures including prostatectomy, nephrectomy, and cystectomy, covering approximately 230,000 U.S. surgeries annually. Validated in the Expand URO IDE study (137 patients). The platform has been used in tens of thousands of procedures across 30+ countries (Medtronic, December 2025).

Focal One HIFU: Updated with AI-driven algorithms for tissue ablation visualization and treatment evaluation in prostate cancer (Urology Times, 2025).

Current status: AI integration with robotic platforms includes tissue recognition, surgical phase identification, and quality metrics analysis. Autonomous AI surgery remains unapproved for urologic applications.


Part 3: Otolaryngology AI

AAO-HNS Task Force Report (2024)

AAO-HNS AI Task Force Guidance

The American Academy of Otolaryngology-Head and Neck Surgery Task Force published guidance on AI integration (AAO-HNS, 2024):

Identified applications:

  • Precision medicine in head and neck cancer
  • Clinical decision support
  • Operational efficiency (scheduling, documentation)
  • Research and education tools

Key challenges:

  • Data quality and bias
  • Health equity concerns
  • Privacy and security
  • Regulatory gaps
  • Ethical considerations

Recommendations:

  • Careful validation before clinical deployment
  • Attention to health equity implications
  • Transparency in AI decision-making
  • Specialty-specific training data development

Head and Neck Cancer AI

Oropharyngeal cancer multimodal prognostication:

A multimodal fusion framework (SMuRF) integrating CT imaging of the primary tumor and lymph nodes with whole-slide pathology images predicted disease-free survival and tumor grade in HPV-associated oropharyngeal squamous cell carcinoma (n=277). The model achieved c-index of 0.81 (development) and 0.79 (test) for disease-free survival, functioning as an independent prognostic biomarker with a hazard ratio of 17 (95% CI 4.9–58, p<0.0001) after controlling for clinical variables (Song et al., 2025). This represents the first study combining radiology and pathology imaging for biomarker discovery in oropharyngeal cancer.

Surgical Navigation AI: Safety Signals

TruDi Navigation System: AI Safety Case Study

The TruDi Navigation System (Acclarent/J&J), used for endoscopic sinus surgery, provides a cautionary example of AI integration in surgical navigation. After three years on the market with eight reported malfunctions, AI algorithms were added to the system. Subsequently, over 100 malfunctions and adverse events were reported, with at least 10 patients injured between late 2021 and November 2025 (Reuters investigation, February 2026).

Reported injuries include:

  • Skull base puncture with cerebrospinal fluid leak
  • Carotid artery injury causing stroke requiring ICU admission
  • Incorrect instrument location information during intracranial procedures

Allegations in ongoing litigation:

  • Accuracy target of only 80% for some AI-enhanced features before market integration
  • Multiple concurrent lawsuits against Acclarent

Clinical implication: AI-enhanced surgical navigation systems require the same rigorous validation as the underlying navigation hardware. Surgeons should not assume AI enhancement improves safety; independent validation of accuracy claims is essential before clinical adoption.

Hearing AI Applications

Smartphone-based audiometry:

AI enables hearing screening outside traditional audiology settings:

  • Direct-to-consumer apps for hearing self-assessment
  • School and community screening programs
  • Remote monitoring for hearing aid users

Evidence:

  • Smartphone audiometry shows good correlation with standard audiometry in controlled settings
  • Real-world performance varies with ambient noise and user technique
  • Does not replace comprehensive audiologic evaluation for diagnosis

Age-related hearing loss (ARHL):

The AAO-HNS Clinical Practice Guideline on ARHL (2024) provides context:

  • ARHL affects 1 in 3 adults age 65-74
  • Associated with dementia, depression, and falls
  • AI screening could expand early detection

Cochlear Nexa smart implant:

The Cochlear Nucleus Nexa System, FDA approved July 2025, is the first cochlear implant with upgradeable firmware and internal memory (Cochlear, July 2025). Like smartphones, the implant firmware can be updated to enable new features without surgical revision. The system includes the smallest and lightest sound processor with all-day battery life.

AI-predicted cochlear implant outcomes:

A multicenter study (n=278 children across U.S., Australia, and Hong Kong) used deep transfer learning on pre-implantation brain MRI scans to predict spoken language outcomes 1–3 years after cochlear implantation. The model achieved 92% accuracy in predicting language development trajectories, enabling a “predict-to-prescribe” approach where children likely to have more difficulty with spoken language can be identified before implantation and offered intensified therapy earlier (Tobey et al., 2025).

Hearing aid optimization:

AI powers automatic adjustment of hearing aids based on:

  • Acoustic environment detection
  • User preferences and listening patterns
  • Real-time speech enhancement

Voice Analysis AI

Applications:

  1. Vocal cord pathology detection
    • Analysis of voice recordings for nodules, polyps, paralysis
    • Screening for laryngeal cancer
  2. Speech therapy monitoring
    • Objective voice quality measures
    • Treatment response tracking
  3. Neurological voice changes
    • Parkinson’s disease voice biomarkers
    • Stroke-related dysarthria assessment

Status: Research-stage. No FDA-cleared diagnostic voice AI for ENT applications.

Sleep Apnea Screening

AI tools analyze:

  • Snoring patterns from audio recordings
  • Movement data from wearables
  • Oximetry trends

Performance: Screening tools show 80-90% sensitivity for moderate-severe OSA in research settings. Cannot replace polysomnography for diagnosis.


Part 4: Professional Society Positions

Gastroenterology Societies

American Gastroenterological Association (AGA):

The AGA published its first guideline critically evaluating AI in GI care in 2025, making no recommendation for or against CADe-assisted colonoscopy, citing very low certainty evidence on cancer outcomes.

The AGA Clinical Practice Update on AI in Polyp Diagnosis (Samarasena et al., 2023) provides additional context on polyp characterization AI.

American College of Gastroenterology (ACG):

ACG has not published a formal position statement on AI/machine learning. The 2024 ACG/ASGE Quality Indicators for Colonoscopy (Rex et al., 2024):

  • Establishes ADR benchmarks (≥35% overall, 40% men, 30% women)
  • Does not include AI or CADe as a quality indicator
  • Focuses on technique-based metrics: withdrawal time (≥8 minutes), bowel preparation adequacy (≥90%)

American Society for Gastrointestinal Endoscopy (ASGE):

ASGE has been the most active GI society on AI policy.

American Urological Association (AUA)

The AUA has no standalone AI clinical practice guideline. Review of official AUA Policy and Position Statements confirms no AI-specific policy document. However, the AUA has increased AI-related advocacy and education:

  • Advocacy position (2024): The AUA recognizes AI as “inevitably integral to health care” and has identified strategic initiatives including education on AI use/misuse, incorporation into AUA committee activities, and infrastructure assessment (AUA Advocacy, June 2024)
  • Annual meeting programming: AUA 2025 featured dedicated AI courses including “Practical AI for Practicing Urologists” and multiple AI-focused abstract sessions (AUA 2025)

Prostate MRI guidance: The AUA/SUO Early Detection of Prostate Cancer Guideline (2023) mentions AI only in Future Directions: “evolving MRI protocols, such as biparametric MRI and use of artificial intelligence, requires further study.” AI is not recommended as a current clinical adjunct.

Provider sentiment: A 2025 survey of urology healthcare providers found 83.4% believed AI will improve efficiency, but 82% expressed concerns about technical reliability and 76% worried about diagnostic errors from generative AI (Healthcare, 2025).

Content restrictions: The AUA Privacy Policy explicitly prohibits uploading AUA content into AI systems for training purposes.

Surgical Subspecialty Society Positions

American Academy of Orthopaedic Surgeons (AAOS):

  • Position Statement on Artificial Intelligence (Document #1193, February 2025)
  • Addresses physician understanding of AI benefits, risks, and ethical considerations
  • Emphasizes need for socio-economic awareness of AI integration

American Association of Neurological Surgeons (AANS)/Council of State Neurosurgical Societies (CSNS):

Society of American Gastrointestinal and Endoscopic Surgeons (SAGES):

Society of University Surgeons:

Societies Without Formal AI Position Statements

Several surgical societies have educational resources but no formal AI policy documents:

  • American College of Surgeons (ACS): Informatics and AI Committee, educational programming only
  • Society of Thoracic Surgeons (STS): Educational content on AI/ML, no formal position
  • American Society of Colon and Rectal Surgeons (ASCRS): Educational webinars only
  • American Society of Plastic Surgeons (ASPS): Educational articles, no formal position

Cross-Specialty Themes

Across surgical societies with formal positions, consistent themes emerge:

  • AI as adjunct, not replacement, for surgical judgment
  • Specialty-specific validation required before deployment
  • Human oversight of AI recommendations mandatory
  • Academic integrity standards for AI-generated content
  • Concerns about training data bias and equity

Clinical Scenarios

Case: During a screening colonoscopy with CADe, the AI system generates an alert highlighting a mucosal fold. The endoscopist examines the area and determines it is a false positive. This is the fourth false alert during this procedure.

Question: How should the endoscopist manage alert fatigue while maintaining detection quality?

Discussion

Understanding alert fatigue:

CADe systems have high sensitivity, meaning they detect most polyps but also generate false positives for:

  • Mucosal folds
  • Stool particles
  • Artifacts
  • Vascular patterns

Appropriate response:

  1. Examine each alert carefully: Even with fatigue, each alert deserves brief evaluation
  2. Document appropriately: Override reason helps quality tracking
  3. Maintain technique: CADe supplements but doesn’t replace systematic inspection
  4. Provide feedback: Some systems allow false positive marking to improve algorithms

What not to do:

  • Ignore alerts due to accumulated fatigue
  • Rely solely on AI (it misses flat lesions, has detection gaps)
  • Reduce withdrawal time because AI is “watching”

Teaching point: Alert fatigue is a recognized limitation of CADe. High-quality colonoscopy technique remains essential. AI assists detection but cannot substitute for careful examination.

Case: A 62-year-old man with elevated PSA undergoes multiparametric prostate MRI. The radiologist assigns PI-RADS 3 (equivocal). An AI second-read system identifies the same lesion and assigns it as high-risk (equivalent to PI-RADS 4).

Question: How should the urologist interpret this discordance?

Discussion

Understanding discordance:

PI-RADS 3 represents the most challenging interpretation:

  • 12-40% probability of clinically significant cancer
  • Biopsy decision depends on clinical context
  • AI may have different threshold calibration than radiologists

Factors to consider:

  1. AI validation: Was this AI system validated on similar patient populations?
  2. Clinical context: PSA density, prior biopsy results, family history
  3. Lesion characteristics: Location, size, DWI signal
  4. Patient preferences: Risk tolerance for biopsy vs. active surveillance

Possible approaches:

  • Discuss discordance with radiologist
  • Consider targeted biopsy (MRI-TRUS fusion or cognitive targeting)
  • Repeat MRI if quality concerns
  • PSA density and other biomarkers for risk stratification
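PSA density, one of the risk-stratification adjuncts listed above, is a simple ratio of serum PSA to prostate volume. The sketch below is illustrative only: the 0.15 ng/mL/cc cutoff is a commonly cited threshold, not a value from this chapter, and the patient values are hypothetical.

```python
# PSA density = serum PSA / MRI-estimated prostate volume.
# The ~0.15 ng/mL/cc biopsy threshold is an assumption here (a commonly
# cited cutoff), not a recommendation from this chapter.

def psa_density(psa_ng_ml: float, prostate_volume_cc: float) -> float:
    return psa_ng_ml / prostate_volume_cc

# Hypothetical patient: PSA 6.2 ng/mL, prostate volume 38 cc on MRI
density = psa_density(6.2, 38.0)
print(f"PSA density: {density:.3f} ng/mL/cc")
# prints: PSA density: 0.163 ng/mL/cc  (above 0.15, favoring biopsy)
```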

What not to do:

  • Automatically defer to AI over radiologist
  • Ignore AI finding without consideration
  • Proceed to saturation biopsy without targeted approach

Teaching point: AI second-read can identify lesions that may warrant additional attention. Discordance should prompt discussion rather than automatic action. Clinical judgment integrates AI findings with patient-specific factors.

Case: A 68-year-old patient shows you results from a smartphone hearing screening app indicating moderate hearing loss. They ask if they need hearing aids.

Question: How should you counsel this patient about the app results?

Discussion

Smartphone audiometry limitations:

  • Ambient noise affects results
  • Headphone quality varies
  • Calibration may not match clinical audiometers
  • Cannot assess word recognition, speech-in-noise, or middle ear function

Appropriate response:

  1. Validate concern: The app results suggest possible hearing loss worth evaluating
  2. Recommend formal testing: Refer to audiology for comprehensive evaluation
  3. Discuss ARHL: Age-related hearing loss is common and treatable
  4. Manage expectations: App results may overestimate or underestimate actual loss

Audiologic evaluation includes:

  • Pure tone audiometry in sound-treated booth
  • Speech recognition testing
  • Tympanometry for middle ear function
  • Hearing aid candidacy assessment

When apps are valuable:

  • Motivating patients to seek evaluation
  • Monitoring known hearing loss between visits
  • Screening in resource-limited settings

Teaching point: Smartphone hearing apps serve as screening, not diagnostic, tools. Positive results should prompt formal audiologic evaluation. Treatment decisions require comprehensive assessment.

Case: You are a GI division chief evaluating whether to purchase a CADe colonoscopy system. The sales representative presents RCT data showing 15% relative improvement in ADR. Your division’s current mean ADR is 45%.

Question: What factors should inform this decision?

Discussion

Evaluating the evidence:

The RCT data is promising, but consider:

  1. Baseline ADR matters: Your division ADR of 45% already exceeds the 2024 ACG/ASGE benchmark (≥35% overall). Improvement may be smaller for high performers.

  2. Real-world vs. RCT performance: Meta-analysis of real-world GI Genius studies showed no significant ADR improvement (RR 0.96). Implementation conditions differ from trials.

  3. What improves: Primarily small adenomas and sessile serrated lesions. Advanced adenoma detection may not change.

  4. What increases: Non-neoplastic polypectomy (false positives leading to unnecessary removal).
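The contrast between trial and real-world relative risks can be made concrete at this division's 45% baseline. A minimal sketch, using the RR values discussed in this chapter and assuming the RR applies multiplicatively to baseline ADR:

```python
# Projected division ADR under different relative-risk scenarios,
# applied multiplicatively to the 45% baseline in this case.
# RR values are drawn from the evidence discussed in this chapter.

baseline_adr = 0.45

scenarios = {
    "Vendor RCT claim (RR 1.15)": 1.15,
    "Pooled RCT meta-analysis (RR 1.21)": 1.21,
    "Real-world GI Genius (RR 0.96)": 0.96,
}

for label, rr in scenarios.items():
    projected = baseline_adr * rr
    print(f"{label}: projected ADR {projected:.1%} "
          f"({projected - baseline_adr:+.1%} absolute)")
```

The spread between the best-case RCT projection and the real-world GI Genius estimate (which implies no gain, or even a slight decline) is the core of the purchasing decision.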

Cost-benefit analysis:

  • System cost: typically over $100,000, plus per-procedure fees
  • Procedure time: May increase slightly
  • Revenue: No specific reimbursement for CADe use
  • Quality metrics: Potential ADR improvement affects reporting

Implementation requirements:

  • Workflow integration
  • Endoscopist training
  • IT support
  • Quality monitoring to verify benefit

Recommendation:

  • Honest assessment of current quality gaps
  • Pilot period with outcome tracking
  • Focus on technique improvement alongside technology
  • Consider centers with lower baseline ADR as priority

Teaching point: CADe investment should follow analysis of local quality data, realistic performance expectations, and implementation capacity. RCT results represent best-case scenarios.

Key Takeaways

Clinical Bottom Line

Gastroenterology AI:

  • Colonoscopy AI has the strongest RCT evidence base; real-world effects are smaller
  • AI-ECG doubled liver cirrhosis detection in primary care RCT (Nature Medicine, 2025)
  • Celiac disease pathology AI achieves pathologist-level performance (NEJM AI, 2025)
  • GRAPE gastric cancer screening validated across 16 centers (AUC 0.927)
  • NaviCam ProScan is first FDA-cleared AI capsule endoscopy reading tool
  • AGA makes no recommendation for or against CADe colonoscopy (very low certainty evidence on cancer outcomes)

Urology AI:

  • Three FDA-authorized prostate cancer tools: Paige Prostate (pathology detection), ArteraAI (prognostication), Avenda iQuest (tumor mapping)
  • ArteraAI validated across 6 phase 3 RCTs with median 11.4-year follow-up
  • Bladder cancer AI outperforms clinicians in meta-analysis (AUC 0.89 vs. 0.81)
  • TOBY urine test (FDA Breakthrough Designation, 2025) offers noninvasive bladder cancer detection
  • AiFURS achieves 92–95% intraoperative kidney stone detection accuracy
  • Hugo RAS (Medtronic) FDA-cleared December 2025 for urologic procedures
  • External validation shows performance degradation across sites for all tools

Otolaryngology AI:

  • AAO-HNS Task Force (2024) provides implementation guidance
  • Cochlear Nexa (FDA approved July 2025): first smart implant with upgradeable firmware
  • AI predicts cochlear implant language outcomes with 92% accuracy (JAMA Otolaryngology)
  • Oropharyngeal cancer multimodal AI achieves c-index 0.81 for survival prediction (eBioMedicine)
  • TruDi surgical navigation: 100+ malfunctions after AI enhancement, at least 10 injuries
  • Voice analysis and sleep apnea screening AI remain research-stage

Implementation principles:

  • Real-world performance often lags RCT results
  • AI supplements but does not replace surgical skill and judgment
  • AI enhancement of existing devices does not guarantee improved safety (TruDi case)
  • Specialty-specific validation is essential
  • Alert fatigue affects all real-time detection AI