Surgery, Anesthesiology, and Perioperative Care

Surgery combines technical skill, anatomical knowledge, and split-second decision-making under pressure. AI applications span preoperative risk assessment, intraoperative guidance, and postoperative monitoring. This chapter examines evidence-based AI tools for surgical specialties.

Learning Objectives

After reading this chapter, you will be able to:

  • Evaluate AI systems for surgical risk prediction and optimization
  • Understand computer vision applications in robotic and minimally invasive surgery
  • Assess AI tools for surgical phase recognition and workflow analysis
  • Navigate AI-assisted surgical planning and simulation
  • Identify postoperative complication prediction systems
  • Recognize limitations and failure modes of surgical AI
  • Balance AI augmentation with surgical judgment and technical skill

The Clinical Context: Surgery presents unique challenges for AI: high-stakes real-time decisions, anatomical variability, and little tolerance for error. Unlike diagnostic AI analyzing static images, surgical AI must operate in dynamic 3D environments with blood, smoke, and rapidly changing anatomy.

What Works Well:

| Application | Evidence | Key Benefit |
| --- | --- | --- |
| Preoperative Risk (MySurgeryRisk) | Strong | AUC 0.88-0.94 for major complications and mortality (Bihorac et al., 2019) |
| Surgical Planning (3D segmentation) | Solid | 60-80% reduction in planning time |
| Joint Replacement Planning | Solid | Improved implant positioning |
| Fracture Detection | Strong | High accuracy for simple fractures |
| Postoperative Early Warning | Moderate | Reduces unplanned ICU transfers |

What’s Emerging (Use with Caution):

| Application | Status | Limitation |
| --- | --- | --- |
| Surgical Phase Recognition | Research | Limited clinical impact |
| Anatomical Structure ID | Research | Blood/smoke degrade accuracy |
| Skill Assessment AI | Emerging | Doesn’t replace mentorship |
| Complication Prediction | Variable | High false positive rates |

What to Avoid:

  • Autonomous surgical decisions: not validated, not safe
  • Critical structure identification without visual verification
  • Replacing surgical judgment with AI recommendations

Key Principles:

  1. AI augments expertise, never replaces it: Verify all AI-generated information
  2. Preoperative > Intraoperative AI: Better validated, lower stakes for errors
  3. Human oversight mandatory: No autonomous AI decisions during surgery
  4. Local validation essential: Academic center results may not generalize

The Bottom Line: Preoperative risk assessment AI (MySurgeryRisk) has strong evidence. Intraoperative AI is promising but not ready for clinical reliance. Surgeons must maintain independent judgment. AI is a tool, not a decision-maker.

Introduction

Surgery stands apart from other medical specialties in its immediacy, irreversibility, and technical demands. While radiologists can analyze images over minutes, surgeons make split-second decisions with scalpel in hand. While internists can adjust management based on patient response, surgical decisions, once made, cannot be easily undone.

This unique context shapes how AI can and cannot help surgeons. The most promising applications assist with the cognitive work surrounding surgery (risk assessment, planning, outcome prediction) rather than replacing the surgeon’s hands or judgment during the operation itself.

This chapter examines surgical AI applications across the perioperative spectrum, from preoperative optimization through postoperative care, with critical attention to what works, what doesn’t, and what remains science fiction.

Preoperative AI Applications

Surgical Risk Prediction

The Clinical Problem:

Surgeons face a fundamental question before every operation: Will this patient tolerate this procedure? Traditional risk assessment relies on clinical judgment supplemented by scoring systems (ASA classification, the NSQIP risk calculator, the RCRI for cardiac risk in noncardiac surgery). These tools have limitations:

  • Incorporate limited variables (20-30 factors)
  • Use linear models that miss complex interactions
  • Provide population-level estimates, not personalized predictions
  • Updated infrequently as new evidence emerges

Machine Learning Solutions:

Modern ML approaches improve risk prediction by:

  1. Analyzing larger feature sets: 100+ variables from EHR, imaging, labs, medications, vital signs, social determinants
  2. Capturing nonlinear relationships: Age × frailty × procedure complexity interactions
  3. Continuous learning: Models updated with new outcome data
  4. Personalized predictions: Patient-specific risk estimates rather than population averages
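
The modeling pattern behind these systems is standard supervised learning at scale. A minimal sketch, assuming scikit-learn and a fully synthetic feature matrix (the features, interaction term, and data are illustrative, not drawn from any deployed tool):

```python
# Sketch of an ML perioperative risk model: gradient boosting over a wide
# EHR-derived feature set, evaluated by AUC. All features and data are
# synthetic and illustrative, not from any deployed system.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.normal(62, 14, n),      # age (years)
    rng.normal(1.1, 0.4, n),    # creatinine (mg/dL)
    rng.normal(10.5, 1.5, n),   # hemoglobin (g/dL)
    rng.integers(0, 2, n),      # emergency case (0/1)
    rng.integers(1, 5, n),      # ASA class
])
# Outcome with a nonlinear age x emergency interaction: the kind of
# relationship linear scoring systems tend to miss.
logit = -5 + 0.04 * X[:, 0] + 1.2 * X[:, 3] * (X[:, 0] > 75) + 0.5 * X[:, 4]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = HistGradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print(f"Test AUC: {roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]):.2f}")
```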

Evidence:

The MySurgeryRisk algorithm, developed at the University of Florida, analyzed 400,000+ surgical cases and significantly outperformed traditional risk models (Bihorac et al., 2019):

  • 30-day mortality prediction: AUC 0.94 (vs. 0.89 for ASA score)
  • Major complications: AUC 0.88 (vs. 0.82 for NSQIP)
  • ICU admission: AUC 0.91
  • Hospital length of stay: Better calibration across risk spectrum

Other institutions report similar results: Stanford, Partners Healthcare, and Penn Medicine have all described improved risk prediction using ML trained on local data.

Clinical Applications:

How Risk Prediction AI Helps Clinicians

Preoperative Optimization:

  • Identify modifiable risk factors (anemia, hyperglycemia, nutritional deficits)
  • Triage patients for preoperative clinic vs. day-of-surgery admission
  • Guide prehabilitation referrals

Shared Decision-Making:

  • Provide personalized risk estimates during surgical consults
  • Facilitate discussions about alternative treatments
  • Support goals-of-care conversations for high-risk patients

Resource Allocation:

  • Predict ICU vs. floor bed requirements
  • Identify patients needing enhanced postoperative monitoring
  • Optimize OR scheduling based on predicted case duration

Quality Improvement:

  • Risk-adjust outcome comparisons between surgeons/hospitals
  • Identify outliers for focused improvement efforts
  • Benchmark performance against predicted outcomes

Critical Limitations:

Risk calculators should inform, not dictate, surgical decisions:

  • Algorithms miss important factors: patient goals, functional trajectory, social support, frailty nuances
  • High-risk patients may still benefit from surgery if alternative is certain poor outcome
  • Low-risk predictions don’t guarantee good outcomes
  • Models trained on one population may not generalize to different populations

Clinical Bottom Line: Use risk prediction AI to enhance shared decision-making and optimize preoperative preparation. Do not deny surgery based solely on algorithmic risk scores.

Preoperative Planning and Simulation

AI-Assisted Anatomical Segmentation:

Surgical planning for complex cases (oncologic resections, liver surgery, orthopedic reconstructions) traditionally requires manual analysis of CT/MRI to identify anatomy, plan approaches, and anticipate challenges. AI automates and enhances this process:

Applications:

Oncologic Surgery:

  • Tumor segmentation and volumetry
  • Relationship to critical structures (vessels, bile ducts, nerves)
  • Predicted resection margins
  • Assessment of resectability

Liver Surgery:

  • Vascular and biliary anatomy mapping
  • Liver volumetry for donation or resection planning
  • Future liver remnant calculation
  • Virtual hepatectomy simulation
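
Volumetric measures such as future liver remnant reduce to voxel counting over segmentation masks. A minimal sketch, assuming NumPy arrays as stand-ins for AI-exported binary masks (the masks, voxel spacing, and resection plane are illustrative):

```python
# Sketch: liver volumetry and future liver remnant (FLR) from binary
# segmentation masks. Masks, spacing, and the resection plane here are
# synthetic stand-ins for AI-exported segmentations.
import numpy as np

voxel_mm3 = 0.8 * 0.8 * 2.5          # from the CT header (mm^3 per voxel)
rng = np.random.default_rng(0)
liver = rng.random((256, 256, 120)) < 0.08            # stand-in liver mask
plane = np.arange(256)[:, None, None] > 160           # planned transection
resected = liver & plane                              # voxels to be removed

total_ml = liver.sum() * voxel_mm3 / 1000
remnant_ml = (liver & ~resected).sum() * voxel_mm3 / 1000
print(f"Total liver: {total_ml:.0f} mL, FLR: {100 * remnant_ml / total_ml:.1f}%")
```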

Orthopedic Surgery:

  • Joint replacement planning (alignment, component sizing)
  • Osteotomy planning for deformity correction
  • Fracture reduction simulation
  • Bone tumor resection planning

Neurosurgery:

  • Brain tumor segmentation and eloquent cortex mapping
  • Surgical approach trajectory planning
  • Vascular anatomy for aneurysm clipping
  • Epilepsy focus localization

Evidence:

Studies across multiple surgical specialties show that AI segmentation (Hashimoto et al., 2018):

  • Reduces planning time by 60-80% compared to manual segmentation
  • Achieves inter-rater reliability comparable to expert-to-expert agreement
  • Improves standardization of preoperative assessment (Topol, 2019)
  • Enhances patient counseling with 3D visualizations

Limitations:

  • Segmentation errors can propagate to surgical plans (always verify)
  • Quality depends on input imaging (motion artifacts, contrast timing)
  • Doesn’t account for intraoperative findings (adhesions, variant anatomy)
  • Most effective for anatomy-driven procedures with good imaging

3D Printing and Surgical Models:

AI-segmented anatomy can be converted to 3D-printed models for:

  • Pre-surgical rehearsal of complex cases
  • Patient education and consent
  • Trainee education
  • Custom surgical guides and implants

Clinical Impact: Mixed. Some studies show reduced operative time and improved outcomes for complex cases; others show no benefit beyond surgeon confidence. Cost and workflow integration remain barriers to widespread adoption.

Intraoperative AI Applications

Computer Vision in Minimally Invasive Surgery

The laparoscope and robotic camera create continuous video streams, ideal data for computer vision AI. Applications range from documentation to real-time guidance, with varying degrees of validation and clinical readiness.

Surgical Phase Recognition:

What it does: AI analyzes surgical video and identifies current phase (e.g., “dissection of gallbladder from liver bed” in laparoscopic cholecystectomy)

How it works: Deep learning models trained on annotated surgical videos learn to recognize instrument configurations, anatomical landmarks, and surgeon actions characteristic of each phase.

Performance:

  • Accuracy >90% for laparoscopic cholecystectomy (Twinanda et al., 2017)
  • Works across multiple procedures (bariatric, colorectal, gynecologic)
  • Real-time capability (15-30 frames/second)
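
A minimal sketch of the frame-level approach, assuming PyTorch and torchvision; the phases, weights, and frames are stand-ins, not a trained phase-recognition system:

```python
# Sketch of frame-level phase recognition: a CNN backbone scores each
# video frame, then a sliding-window majority vote smooths the sequence.
# Weights, phases, and frames are stand-ins, not a trained system.
import torch
import torchvision.models as models

PHASES = ["preparation", "calot_dissection", "clipping", "gallbladder_dissection"]

net = models.resnet18(weights=None)   # a real system would load trained weights
net.fc = torch.nn.Linear(net.fc.in_features, len(PHASES))
net.eval()

frames = torch.rand(30, 3, 224, 224)  # 30 laparoscopic frames (stand-in)
with torch.no_grad():
    per_frame = net(frames).argmax(dim=1)             # phase index per frame

# Majority vote over a 5-frame window suppresses single-frame flicker.
smoothed = [per_frame[max(0, i - 2): i + 3].mode().values.item()
            for i in range(len(per_frame))]
print([PHASES[p] for p in smoothed[:5]])
```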

Potential applications:

  • Context-aware instrument tracking
  • Automated surgical documentation
  • OR efficiency analysis
  • Surgical skill assessment
  • Adverse event detection

Current status: Primarily research tool. Limited clinical deployment because phase recognition alone doesn’t provide actionable guidance. Surgeons already know which phase they’re in.

Future potential: Phase recognition is foundational for more advanced applications (predictive alerts, context-aware instrument suggestions).

Anatomical Structure Recognition:

The promise: Computer vision identifies critical anatomy (bile ducts, ureters, vessels) to prevent surgical injury.

The reality: This is extraordinarily difficult and not yet clinically reliable.

Why it’s hard:

  1. Visual variability: Blood, smoke, retraction, lighting changes, cautery artifacts
  2. Anatomical variants: Textbook anatomy is the exception, not the rule
  3. Dynamic deformation: Tissue moves, stretches, changes appearance continuously
  4. Occlusion: Critical structures often partially hidden
  5. Context-dependence: What looks like ureter may be vessel or adhesion band

Current evidence:

Research systems demonstrate:

  • 70-85% accuracy for identifying major structures in ideal conditions
  • Performance degrades significantly with bleeding, inflammation, obesity
  • False positives and false negatives both occur at unacceptable rates

Critical safety concern:

Surgeons cannot rely on AI to definitively identify critical structures. Visual confirmation, tactile feedback, anatomical knowledge, and methodical dissection remain essential. AI suggesting “safe to divide this structure” is not acceptable with current technology.

More promising near-term application:

Warning systems: AI detecting absence of expected structures (“ureter not identified in expected location, double-check before dividing anything”) may be safer than positive identification. Alert surgeons to uncertainty rather than provide false confidence.

Augmented Reality Surgical Navigation

AR systems overlay preoperative imaging onto the surgeon’s view of the operative field, enhancing visualization and precision.

Applications:

Spine Surgery:

  • Real-time visualization of screw trajectories
  • Pedicle screw placement guidance
  • Reduces fluoroscopy exposure
  • FDA-cleared systems widely used

Neurosurgery:

  • Tumor localization during resection
  • Trajectory planning for deep lesions
  • Registration of preoperative MRI to intraoperative anatomy

Liver Surgery:

  • Overlay of vascular anatomy on liver surface
  • Guides parenchymal transection planes
  • Helps identify tumor location in real-time

Evidence:

Spine surgery: Multiple studies show AR navigation improves screw placement accuracy (98%+ correct positioning vs. 90-95% with fluoroscopy alone) and reduces radiation exposure (Mason et al., 2014).

Neurosurgery: AR reduces targeting errors, but brain shift (tissue deformation after opening the dura) remains a significant challenge; intraoperative imaging updates are required to maintain accuracy.

Liver surgery: Registration accuracy (aligning preoperative imaging to surgical field) degrades with tissue deformation. Useful for initial approach planning but less reliable as resection progresses.

Critical Limitation: Registration Errors

AR requires precise alignment of imaging to patient anatomy. Registration errors (typically 2-5 mm) can be clinically significant, especially for small structures or narrow safety margins. Surgeons must verify AR guidance against direct visualization and anatomical knowledge.

AI in Robotic Surgery

Current State: No Autonomy

Despite the “robotic surgery” terminology, da Vinci and similar systems are teleoperated tools, not autonomous robots. The surgeon controls every movement. AI plays a minimal role in current clinical robotic systems.

Emerging AI Applications:

Surgical Skill Assessment:

  • AI analyzes instrument paths, economy of motion, smoothness
  • Provides objective feedback for training
  • Correlates with surgical experience and patient outcomes (Gumbs et al., 2021)
  • Used in residency training programs
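
Motion-economy metrics are straightforward to compute once instrument kinematics are available. A minimal sketch with a synthetic trajectory (the metric choices, path length and mean squared jerk, follow common motion-analysis conventions and are illustrative):

```python
# Sketch: objective skill metrics from instrument tip kinematics.
# Path length (economy of motion) and mean squared jerk (smoothness),
# computed from a synthetic 3D trajectory sampled at 50 Hz.
import numpy as np

dt = 1 / 50
t = np.arange(0, 10, dt)
rng = np.random.default_rng(0)
tip = (np.column_stack([np.sin(t), np.cos(t), 0.1 * t])
       + rng.normal(0, 0.002, (len(t), 3)))           # noisy tip positions (m)

path_length = np.linalg.norm(np.diff(tip, axis=0), axis=1).sum()
jerk = np.diff(tip, n=3, axis=0) / dt**3              # 3rd derivative of position
mean_sq_jerk = (jerk**2).sum(axis=1).mean()

print(f"Path length: {path_length:.2f} m (lower = more economical)")
print(f"Mean squared jerk: {mean_sq_jerk:.1f} (lower = smoother)")
```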

Tremor Filtering:

  • Robot compensates for physiologic tremor
  • Standard feature, not novel AI (rule-based filtering)
  • Improves precision for microsurgical tasks

Autonomous Task Execution (Research Only):

The STAR (Smart Tissue Autonomous Robot) performed supervised autonomous bowel anastomosis in pigs (Shademan et al., 2016). This proof-of-concept demonstrated technical feasibility but:

  • Not FDA-approved
  • Not tested in humans
  • Requires perfect conditions: no bleeding, adhesions, or unexpected anatomy
  • Slower than human surgeons
  • Monitoring surgeon must be ready to intervene instantly

The variability of human anatomy, tissue properties, and intraoperative findings far exceeds AI’s ability to respond safely without human judgment. Fully autonomous robotic surgery remains research, not reality.

More realistic future: Semi-autonomous assistance for repetitive sub-tasks (suturing, tissue dissection in clear planes) under continuous surgeon supervision.

Postoperative AI Applications

Complication Prediction

Surgical Site Infection (SSI) Prediction:

ML models predict SSI risk using:

  • Patient factors (diabetes, obesity, smoking, immunosuppression)
  • Operative characteristics (duration, complexity, contamination class)
  • Intraoperative variables (glucose control, normothermia, antibiotic timing)
  • Postoperative factors (drain output, pain scores)

Evidence: Modest improvements over clinical judgment alone (AUC 0.75-0.80 vs. 0.70-0.72).

Limitations:

  • High false positive rates (30-40%) limit actionability
  • Shouldn’t guide prophylactic antibiotic decisions (risk of resistance)
  • Best use: Enhanced surveillance for high-risk patients

Postoperative Delirium:

Prediction models incorporating preoperative cognitive assessment, anesthesia factors, and postoperative medications identify high-risk patients for:

  • Non-pharmacologic prevention (reorientation, sleep hygiene, family presence)
  • Avoidance of deliriogenic medications
  • Enhanced monitoring

Evidence: Better than clinical intuition, but delirium remains multifactorial and incompletely preventable.

Anastomotic Leak Prediction:

ML models analyzing postoperative labs (CRP trajectory), vital signs, and clinical notes can identify leak risk earlier than clinical suspicion alone.

Challenge: Rare outcomes (1-5% incidence) make model training difficult and false positive rates high.

Deterioration Monitoring

AI systems analyzing continuous vitals, lab trends, nursing documentation, and medication administration can detect patterns predicting clinical deterioration 6-12 hours before conventional early warning scores.

Applications:

  • Postoperative hemorrhage
  • Respiratory failure
  • Sepsis
  • Cardiac events

Evidence: Detection performance generally good, but high false positive rates create alert fatigue (similar to sepsis prediction challenges discussed in Chapter 9) (Wong et al., 2021; Beam & Kohane, 2018).

Best Implementation: Integrate AI alerts with rapid response team protocols and ensure alerts are actionable (not just “patient is high-risk”) (Topol, 2019).

Surgical Quality and Education

Video-Based Surgical Assessment

AI analysis of surgical videos enables objective skill assessment and quality improvement.

Applications:

Skill Scoring:

  • Objective assessment of technical performance
  • Identifies specific errors (tissue trauma, bleeding, inefficiency)
  • Provides quantitative feedback for training

Evidence: AI scores correlate strongly with expert human assessment and predict surgical outcomes (Gumbs et al., 2021).

Benefits for surgical education:

  • Objective feedback supplements subjective faculty evaluation
  • Tracks skill progression over time
  • Identifies specific areas needing improvement
  • Benchmarks against peer performance

Quality Improvement:

  • Retrospective review of complications to identify technical factors
  • Process improvement for OR efficiency
  • Standardization of surgical techniques

Challenges:

  • Privacy and medicolegal concerns about routine recording
  • Surgeon resistance to surveillance
  • Doesn’t capture decision-making quality (only technical execution)
  • Storage and analysis infrastructure requirements

Natural Language Processing for Operative Notes

AI extraction of structured data from operative notes enables:

Quality Metrics:

  • Automated calculation of process measures (antibiotic timing, VTE prophylaxis)
  • Complication detection from dictated notes
  • Adherence to surgical best practices

Registry Auto-Population:

  • Reduces manual data entry burden for NSQIP, VASQIP, other registries
  • Improves data completeness and accuracy

Clinical Decision Support:

  • Extraction of critical operative details for downstream care (mesh type in hernia repair, prosthesis in joint replacement)

Evidence: High accuracy (>95%) for structured data elements. Challenges remain for nuanced surgical findings and judgment-based assessments.
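
Production registry pipelines use trained NLP models, but for highly regular fields the extraction pattern can be illustrated with simple rules. A minimal sketch, assuming an invented note and hypothetical field names:

```python
# Sketch: rule-based extraction of discrete operative-note fields.
# Production systems use trained NLP models; a regex pass like this
# handles only highly regular fields. The note and fields are invented.
import re

note = ("Preoperative antibiotics: cefazolin 2g given at 07:42. "
        "Incision at 08:05. A 15x10cm polypropylene mesh was placed.")

patterns = {
    "antibiotic": r"antibiotics?:\s*(\w+)",
    "abx_time": r"given at (\d{2}:\d{2})",
    "incision_time": r"[Ii]ncision at (\d{2}:\d{2})",
    "mesh_type": r"(\w+) mesh",
}
extracted = {field: (m.group(1) if (m := re.search(p, note)) else None)
             for field, p in patterns.items()}
print(extracted)
# {'antibiotic': 'cefazolin', 'abx_time': '07:42',
#  'incision_time': '08:05', 'mesh_type': 'polypropylene'}
```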

Specialty-Specific Applications

Different surgical specialties face unique challenges and opportunities for AI integration:

General Surgery

  • Hernia recurrence risk prediction
  • Cholecystectomy difficulty scoring
  • Bile duct injury prevention (research phase)

Orthopedic Surgery

  • Fracture detection AI (high accuracy for simple fractures)
  • Joint replacement planning and component sizing
  • Spinal navigation systems (FDA-cleared)
  • Ligament injury diagnosis from MRI

Neurosurgery

  • Brain tumor segmentation for resection planning
  • Epilepsy focus localization
  • Surgical navigation systems
  • Intraoperative tumor margin assessment (research)

Cardiac Surgery

  • Surgical risk models (STS score enhanced with ML)
  • Intraoperative echocardiography interpretation
  • ICU outcome prediction

Thoracic Surgery

  • Lung nodule characterization from CT
  • Surgical approach selection (VATS vs. thoracotomy)
  • Lymph node metastasis prediction

Vascular Surgery

  • AAA rupture risk prediction
  • Vascular anatomy segmentation
  • Endovascular procedure planning

Plastic Surgery

  • Breast reconstruction outcome prediction
  • Aesthetic outcome simulation
  • Flap viability monitoring (research)

Critical Limitations and Risks

Why Surgical AI Must Be Approached With Particular Caution

Immediacy of Harm: Unlike diagnostic errors that can be caught through physician review, intraoperative AI errors cause immediate, potentially irreversible patient harm.

Complexity of Surgical Judgment: Surgery requires integration of visual, tactile, and proprioceptive information with anatomical knowledge, pattern recognition from thousands of prior cases, and real-time adaptation to unexpected findings. AI doesn’t replicate this.

Medicolegal Implications: If a surgeon follows AI guidance and causes injury, liability is clear: the surgeon is responsible. If a surgeon ignores an AI warning and causes injury, plaintiff’s attorneys will argue the AI was ignored. This creates defensive pressure to over-rely on AI even when clinical judgment suggests otherwise.

Technology Failure Modes: Computer vision fails with blood, smoke, optical artifacts. ML models fail with out-of-distribution inputs (unusual anatomy, rare findings). Risk models fail when patient circumstances differ from training data.

Trust Calibration: Surgeons must neither over-trust (following AI suggestions without verification) nor under-trust (ignoring useful AI alerts). Achieving appropriate calibration is difficult (Char et al., 2020).

Regulatory and Medicolegal Considerations

FDA Regulation of Surgical AI

  • Surgical planning software: Class II (510(k) clearance)
  • Surgical navigation systems: Class II (moderate-risk devices)
  • Autonomous surgical robots: Would be Class III (PMA required)
  • Risk calculators: Often treated as clinical decision support (limited or no FDA oversight)

Medicolegal Principles

Surgeons remain legally responsible for AI-assisted decisions. Key documentation practices:

  • Informed consent should mention AI use when material to patient decision
  • Documentation should note AI tools used and how output was interpreted
  • Malpractice risk if AI recommendation followed without independent verification

The Liability Dilemma

  • Following AI that’s wrong: Surgeon liable for not exercising independent judgment
  • Ignoring AI that’s right: Plaintiff attorneys argue surgeon ignored available technology
  • Best practice: Document independent verification of AI outputs, explain clinical reasoning when overriding AI recommendations

Evidence-Based Guidelines for Surgical AI Adoption

Recommendations for Surgeons and Surgical Departments

Before Adopting Any Surgical AI:

  1. Demand evidence: Prospective validation studies in diverse populations, not just retrospective accuracy metrics (Nagendran et al., 2020)
  2. Understand training data: Was the model trained on cases like yours? (Procedure types, patient populations, institutional practices) (Beam & Kohane, 2018)
  3. Know the failure modes: How does the system fail? What are the error rates? What happens with unusual cases? (Vabalas et al., 2019)
  4. Assess workflow integration: Does this fit your existing workflow or require disruptive changes?
  5. Clarify liability: What does your malpractice carrier say about using this AI? What does hospital legal counsel advise?
  6. Verify regulatory status: Is this FDA-cleared? For what specific indication?
  7. Evaluate cost-effectiveness: Does the benefit justify the cost (both financial and cognitive/workflow burden)?

Safe Implementation Practices:

  1. Pilot testing: Start with low-stakes applications, expand carefully based on performance
  2. Parallel validation: Run AI alongside current practice, compare results before replacing current approach
  3. Defined oversight: Clear protocols for who reviews AI outputs and how discrepancies are resolved
  4. Incident reporting: Systems to capture AI errors or near-misses
  5. Ongoing validation: Monitor real-world performance, don’t assume initial validation persists indefinitely
  6. User training: Ensure all users understand AI capabilities, limitations, and appropriate use
  7. Informed consent: Discuss AI use with patients when material to their decision-making

Red Flags (Avoid These AI Systems):

  • Claims of autonomous surgical decision-making
  • Black-box models with no explanation of predictions
  • Lack of prospective validation studies
  • Vendors unwilling to disclose training data characteristics
  • No mechanism for reporting errors or failures
  • Regulatory status unclear or misrepresented
  • Pressure to adopt without adequate evaluation period

Professional Society Guidelines on AI in Surgery

ACS Leadership on AI in Surgery (2024-2025)

The American College of Surgeons has established significant AI infrastructure:

Leadership:

  • Dr. Genevieve Melton-Meaux appointed as inaugural Chief Health Informatics Officer (2024)
  • Practicing colorectal surgeon and director of the Center for Learning Health System Sciences at University of Minnesota

Educational Programs:

  • “Artificial Intelligence and Machine Learning: Transforming Surgical Practice and Education” - online course available since 2023
  • Clinical Congress sessions addressing ethical and regulatory AI considerations

Strategic Direction: The ACS emphasizes that surgeons must take the lead in integrating AI, defining how it affects their practice, and influencing what good patient care means. If surgeons don’t step up, what defines successful surgery will be decided by others.

AI Applications Recognized by ACS

The ACS recognizes three primary AI categories transforming surgical practice:

  1. Ambient AI: Automated documentation of surgical encounters and procedures
  2. Prediction tools: Perioperative risk assessment and outcome prediction
  3. Research and writing solutions: Literature review, manuscript preparation assistance

NSQIP and Risk Prediction

The ACS National Surgical Quality Improvement Program (NSQIP) Surgical Risk Calculator represents one of the most validated AI-adjacent tools in surgery:

  • Developed from outcomes data on millions of surgical patients
  • Provides patient-specific risk predictions for major complications
  • Continuously updated with new outcome data
  • Endorsed by ACS as a shared decision-making tool

SAGES Guidelines

The Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) has engaged with AI particularly in:

  • Computer vision for surgical field analysis
  • Real-time anatomical structure identification during laparoscopic procedures
  • Surgical video analysis for quality improvement and training

Implementation Note: SAGES emphasizes that AI in the OR must be validated for the specific surgical context and population before clinical deployment.

Future Directions

Realistic Near-Term Progress (2-5 years)

  • Routine integration of ML risk calculators into preoperative clinics
  • Expanded use of AI surgical planning for complex cases
  • Video-based quality feedback becoming standard in training
  • Better postoperative monitoring with AI-augmented early warning systems

Medium-Term Possibilities (5-10 years)

  • Improved real-time anatomical recognition (still with human verification required)
  • Context-aware intraoperative decision support (suggestions, not autonomous action)
  • Personalized surgical technique optimization based on patient anatomy
  • Semi-autonomous robotic assistance for specific sub-tasks under continuous human supervision

Long-Term Speculation (10+ years)

  • Highly accurate real-time tissue characterization (pathology-level information intraoperatively)
  • Predictive models anticipating surgical course and complications with high accuracy
  • Integration of multi-omic patient data into surgical decision-making
  • Robotic systems handling increasing proportions of routine surgical tasks (still under surgeon control)

Unlikely Despite Hype

  • Fully autonomous robotic surgery without surgeon in the loop
  • AI replacing surgical judgment for complex, high-stakes decisions
  • Elimination of surgical complications through AI

Conclusion

Surgery is fundamentally a human activity requiring manual skill, real-time judgment, and adaptation to unique patient circumstances. AI can enhance the cognitive work surrounding surgery (risk assessment, planning, quality improvement) and may eventually provide useful intraoperative information. But the surgeon’s hands, eyes, judgment, and responsibility remain central.

The most successful surgical AI applications will be those that respect the complexity of surgery, acknowledge uncertainty transparently, augment rather than replace expertise, and prioritize patient safety over technological impressiveness.

Surgeons should embrace AI as a powerful adjunct while maintaining the healthy skepticism, independent verification, and personal accountability that define good surgical practice.


Check Your Understanding

Scenario 1: AI Risk Calculator Overestimates Surgical Risk

You’re a colorectal surgeon evaluating an 82-year-old woman with Stage III colon cancer. She’s otherwise healthy: active, independent ADLs, no major comorbidities, ECOG 0.

AI surgical risk calculator (MySurgeryRisk) estimates:

  • 30-day mortality risk: 18%
  • Major complication risk: 45%
  • Recommendation: “High risk - consider non-operative management”

Traditional ACS NSQIP calculator estimates:

  • 30-day mortality: 3.2%
  • Major complication: 12%

Patient’s oncologist refers to you stating “AI says surgery too risky. Recommend palliative chemo only.”

Your clinical assessment: Patient is good surgical candidate. Age alone shouldn’t preclude curative surgery. Frailty assessment normal. Cardiopulmonary exam reassuring.

Decision point: Do you:

    A. Follow AI recommendation, refer to medical oncology for palliative chemotherapy
    B. Override AI, recommend surgery based on your clinical judgment

Answer 1: What explains the discrepancy between AI and traditional calculators?

AI model likely over-weighted age without considering:

  • Functional status: Patient is ECOG 0, independent, not frail
  • Comorbidity burden: Minimal comorbidities despite age 82
  • Fitness indicators: Normal cardiopulmonary reserve

Potential AI training bias:

  • If AI trained on data where older patients had higher complication rates, model may learn “age 80+ = high risk” without distinguishing fit vs. frail
  • Aggregation bias: age correlates with frailty in the training data, but this patient defies that population-level correlation

Traditional NSQIP calculator:

  • Uses validated risk factors (ASA class, functional status, comorbidities)
  • May better account for physiologic age vs. chronologic age
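
One concrete way to “review AI inputs” (see Answer 3) is a feature-attribution audit. A minimal sketch using scikit-learn’s permutation importance on a synthetic stand-in model (the features, data, and correlation structure are illustrative):

```python
# Sketch: auditing which features drive a risk model, via permutation
# importance. Model, features, and correlation structure are synthetic
# stand-ins for the kind of audit a department might request.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 2000
age = rng.normal(70, 10, n)
frailty = 0.05 * (age - 70) + rng.normal(0, 1, n)     # correlated with age
asa = rng.integers(1, 5, n)
# Outcome driven by frailty and ASA, not directly by age.
y = rng.random(n) < 1 / (1 + np.exp(-(0.9 * frailty + 0.3 * asa - 1.5)))
X = np.column_stack([age, frailty, asa])

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, imp in zip(["age", "frailty", "asa"], result.importances_mean):
    print(f"{name}: {imp:.3f}")
# If age dominates while direct fitness markers barely register, the model
# may be over-weighting chronologic age, as in this scenario.
```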

Answer 2: What are the liability implications of each choice?

Choice A (Follow AI, decline surgery):

Plaintiff argument (if patient dies from untreated cancer):

  • Surgeon inappropriately deferred to AI algorithm
  • Failed to exercise independent clinical judgment
  • Denied patient potentially curative treatment based on flawed AI estimate
  • Standard of care requires surgeons to assess patient individually, not defer to algorithm

Legal precedent: Multiple cases where physicians found liable for following decision support tools that contradict clinical judgment

Choice B (Override AI, proceed with surgery):

If patient has major complication or dies:

Plaintiff argument:

  • Surgeon ignored AI warning of 18% mortality risk
  • Proceeded with high-risk surgery against AI recommendation
  • Reckless disregard for patient safety

Defense argument:

  • AI is decision support tool, not substitute for clinical judgment
  • Surgeon’s assessment (frailty, functional status, cardiopulmonary reserve) more accurate than AI age-based estimate
  • Standard of care requires individualized assessment, not algorithmic adherence
  • Traditional NSQIP calculator (validated, widely used) supported decision
  • Patient underwent informed consent understanding risks

Likely outcome: Defense verdict if surgeon documented thorough clinical assessment, explained AI discrepancy, obtained informed consent discussing both AI and traditional estimates.

Answer 3: How should you handle this AI-clinical judgment conflict?

Appropriate approach:

  1. Investigate AI discrepancy
    • Review AI inputs: What features drove high-risk estimate?
    • Compare with traditional validated tools (e.g., the ACS NSQIP risk calculator)
    • Consult surgical colleagues: Would they operate on this patient?
  2. Comprehensive clinical assessment
    • Gait speed, grip strength (frailty markers)
    • Cardiopulmonary exercise testing if available
    • Geriatric assessment
    • Functional status (independent vs. dependent ADLs)
  3. Multidisciplinary discussion
    • Present case at tumor board
    • Geriatric surgery consult if available
    • Anesthesia risk assessment
  4. Transparent informed consent
    • Discuss both AI and traditional risk estimates with patient
    • Explain why estimates differ (age vs. physiologic status)
    • Present alternatives (surgery, chemotherapy alone, observation)
    • Document: “AI calculator estimated 18% mortality; however, clinical assessment suggests patient is physiologically fit. Traditional NSQIP calculator estimates 3.2% mortality. Discussed both estimates with patient. Patient understands risks, chooses surgery.”
  5. Document clinical reasoning
    • “AI risk calculator estimates high risk primarily based on age 82. However, patient demonstrates excellent functional status (ECOG 0, independent ADLs, normal gait speed), minimal comorbidities, normal cardiopulmonary reserve. Traditional NSQIP calculator estimates mortality 3.2%. Clinical judgment: patient is appropriate surgical candidate. AI estimate likely over-weighted chronologic age without adequate consideration of physiologic fitness.”

Lesson: AI risk calculators are tools to inform, not dictate, surgical decisions. When AI conflicts with clinical judgment and validated traditional tools, surgeon must exercise independent assessment. Age alone should not preclude surgery in fit older adults. Document thorough reasoning when overriding AI recommendations.

Scenario 2: Intraoperative AI Misidentifies Critical Anatomy

You’re performing robotic-assisted partial nephrectomy for small renal mass using da Vinci Xi with integrated AI “Surgical Intelligence” system.

AI system features:

  • Real-time anatomical labeling (kidney, renal artery, renal vein, ureter, tumor)
  • Proximity alerts when instruments near critical structures
  • Augmented reality overlay on surgical view

Intraoperative event:

During hilar dissection, AI labels renal artery and renal vein on display. You prepare to clamp renal artery for tumor excision.

Your visual assessment: Structure labeled “renal artery” appears larger than expected, bluish tint, pulsations not prominent.

Uncertainty: Is this truly renal artery or is AI mislabeling renal vein as artery?

Decision point: Do you:

    A. Trust AI label, clamp structure labeled “renal artery”
    B. Pause, verify anatomy manually before clamping

You choose: Option B (pause and verify)

Manual verification: Doppler ultrasound confirms the structure labeled “renal artery” is actually the renal vein. The true renal artery is 2 mm posterior, unlabeled by the AI.

If you had clamped based on AI label: Would have clamped renal vein, not artery → inadequate ischemic control → bleeding during tumor excision, potential need for total nephrectomy.

Answer 1: Why did the AI mislabel critical anatomy?

AI computer vision failure modes:

  1. Anatomical variation: This patient had variant renal vascular anatomy (early branching, aberrant vessel course)

    • AI trained on typical anatomy
    • Variants (present in 20-30% of patients) not well-represented in training data
  2. Tissue appearance similarity: Renal artery and vein can appear similar on video (both red/pink, both tubular)

    • AI relies on position, caliber, pulsatility
    • In variant anatomy, typical positional relationships disrupted
  3. Partial occlusion: Surgical manipulation may have partially occluded artery → reduced pulsations → AI misidentified as vein

  4. Confidence threshold: AI may have been 60-70% confident (below human comfort level) but still displayed label without uncertainty indication

Answer 2: What are the liability implications if you had clamped the wrong vessel?

If you clamped renal vein instead of artery:

Immediate consequences:

  • Inadequate tumor ischemia → bleeding during excision
  • Potential renal vein thrombosis
  • May require total nephrectomy instead of partial
  • Patient loses kidney function unnecessarily

Malpractice analysis:

Plaintiff argument:

  • Surgeon blindly followed AI labeling without manual verification
  • Failed to exercise fundamental surgical principle: verify anatomy before clamping/cutting
  • Fell below standard of care by deferring anatomical judgment to AI
  • Patient lost kidney due to surgeon’s inappropriate reliance on technology

Defense argument:

  • AI was marketed as “surgical intelligence” system
  • Reasonable to rely on technology validated by manufacturer, FDA-cleared
  • Anatomical variation not surgeon’s fault
  • Damage was not from negligence, but from AI error

Likely outcome:

  • Plaintiff verdict likely: Courts hold surgeons to personal anatomical verification standard
  • “AI told me to” is NOT valid defense
  • Fundamental principle: surgeon must personally verify anatomy before irreversible action
  • FDA clearance of AI tool does not absolve surgeon of personal responsibility

Precedent: In Smith v. Hospital (hypothetical but representative), surgeon relied on navigation system for spine surgery, placed pedicle screw in wrong location causing nerve injury. Court ruled surgeon liable despite navigation system error: “technology augments but does not replace surgeon’s duty to verify.”

Answer 3: What are the appropriate use principles for intraoperative AI?

Surgical AI as “junior resident”:

  1. AI suggestions are hypotheses, not facts
    • AI labels = “This might be renal artery”
    • Surgeon verifies = “I confirm this is renal artery”
  2. Verify before irreversible action
    • Before clamping, cutting, coagulating: manual confirmation
    • Use additional tools: Doppler, manual palpation, ICG angiography, direct visualization
  3. Heightened skepticism in variant anatomy
    • If anatomical landmarks don’t match expected positions
    • If AI labels conflict with visual assessment
    • If patient has known anatomical variants (duplicated vessels, horseshoe kidney)
  4. Demand uncertainty quantification
    • AI should display confidence levels
    • “Renal artery (92% confident)” vs. “Renal artery (60% confident)”
    • Low confidence → require additional verification
  5. Continuous cross-checking
    • Compare AI labels with your visual assessment at each step
    • If discrepancy, investigate before proceeding

Institutional safeguards:

  1. Training requirements
    • Surgeons using AI-augmented systems must complete training on:
      • AI failure modes
      • When to trust vs. verify AI
      • Manual verification techniques
  2. Quality assurance
    • Review cases where AI labeling was incorrect
    • Share at M&M conferences
    • Track AI error rates by anatomy type, procedure
  3. Documentation
    • When AI labeling conflicts with surgeon assessment, document:
      • “AI labeled [structure] as [label]; however, manual verification with [Doppler/ICG/palpation] confirmed [correct identity]”

Lesson: Intraoperative AI is assistive, not authoritative. Surgeons remain responsible for anatomical identification regardless of AI labels. Verify critical anatomy manually before irreversible actions. “Trust but verify” is insufficient; the standard should be “verify independently, with AI assisting.”

Scenario 3: Postoperative AI Alert Fatigue

You’re surgical quality director implementing AI-based early warning system (Rothman Index, commercial product) for postoperative complication detection.

AI system: Analyzes vital signs, lab values, nursing assessments every 15 minutes. Generates alert when patient predicted to be at increased risk for:

  • Sepsis
  • Respiratory failure
  • Acute kidney injury
  • Need for ICU transfer

Month 1 performance:

  • Alerts generated: 847 alerts across 320 postoperative patients (2.6 alerts per patient)
  • True positives: 23 patients developed complications flagged by AI
  • False positives: 824 alerts did not correspond to actual complications
  • False positive rate: 97.3%
  • Positive predictive value: 2.7%

Clinical impact:

  • Nursing staff overwhelmed by alerts
  • Most alerts dismissed as “AI crying wolf”
  • Alert fatigue setting in (nurses ignoring alerts)

Week 4 critical event:

  • 62-year-old man, post-colectomy day 2
  • AI generates alert at 2 AM: “High risk for sepsis - recommend immediate evaluation”
  • Night nurse dismisses alert (patient appears stable, vital signs acceptable)
  • No physician notification
  • 6 AM: Patient found hypotensive (BP 82/45), tachycardic (HR 128), altered mental status
  • Diagnosis: Anastomotic leak with peritonitis and sepsis
  • Patient requires emergent return to OR, ICU care
  • Prolonged hospital stay, family files complaint: “Why wasn’t the AI alert acted on?”

Answer 1: What caused the alert fatigue?

High false positive rate driven by:

  1. Low disease prevalence
    • True complication rate: ~7% of post-op patients
    • AI optimized for high sensitivity (catches 23/25 true complications = 92% sensitivity)
    • But: at ~7% prevalence, even 92% sensitivity cannot rescue precision once alerts repeat on the same patients: 23 true-positive patients against 824 false alerts gives a per-alert PPV of only 2.7% (worked through in the sketch after this list)
  2. Threshold calibration
    • AI vendor set low threshold to maximize sensitivity (fear of missing complications)
    • Resulted in extreme false positive burden
  3. Lack of clinical context
    • AI analyzes physiologic data only
    • Does not know: patient just returned from 2-hour physical therapy session (explains elevated HR), patient received fluid bolus (explains improved BP trends), patient had expected postoperative fever
  4. Poor alarm design
    • All alerts same priority level
    • No distinction between “mild concern” vs. “urgent evaluation needed”
    • No incorporation of clinical trajectories (improving vs. worsening trends)
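
The arithmetic behind this collapse is worth verifying directly. A short sketch using the scenario’s per-alert counts; the patient-level sensitivity, specificity, and prevalence figures are illustrative:

```python
# Sketch: the PPV arithmetic behind alert fatigue, using this scenario's
# per-alert counts plus an illustrative patient-level calculation.
true_positive_alerts, total_alerts = 23, 847
print(f"Per-alert PPV: {true_positive_alerts / total_alerts:.1%}")   # 2.7%

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)

# Even a respectable patient-level PPV collapses per alert when each
# patient generates ~2.6 alerts, mostly repeats on the same false positives.
print(f"Patient-level PPV (sens 92%, spec 85%, prev 7%): "
      f"{ppv(0.92, 0.85, 0.07):.1%}")                 # ~31.6%
```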

Alert fatigue:

  • 824 false positives → nurses learn “AI alerts usually wrong”
  • Cognitive bias: When 97.3% of alerts are false, dismissing alerts becomes learned behavior
  • The 23 true positives get lost in noise

Answer 2: Who is liable for the missed anastomotic leak?

Potentially both hospital and individual nurse:

Hospital institutional liability:

Plaintiff argument:

  • Hospital deployed AI system with 97.3% false positive rate
  • Created alert fatigue environment where critical alerts ignored
  • Failed to calibrate system before clinical deployment
  • Should have monitored alert fatigue, intervened when nurses began dismissing alerts

Nursing liability:

Plaintiff argument:

  • Nurse dismissed AI alert without evaluating patient
  • Failed to notify physician of high-risk alert
  • Did not document why alert was dismissed
  • Fell below nursing standard of care

Defense argument (nursing):

  • 97.3% false positive rate meant 97 of every 100 alerts were false
  • Nurse made reasonable judgment based on clinical assessment (patient appeared stable)
  • Hospital created untenable alert burden
  • An individual nurse cannot be expected to thoroughly evaluate every alert when nearly all prove false (2.6 alerts per patient in the first month alone)

Likely outcome:

  • Shared liability: Hospital bears primary responsibility for deploying poorly calibrated system
  • Individual nurse may bear some liability for not documenting assessment and physician notification

Answer 3: How should AI early warning systems be implemented safely?

System calibration:

  1. Acceptable false positive rate
    • Target PPV ≥10-15% (not 2.7%)
    • May require reducing sensitivity from 92% → 70-75%
    • Trade-off: Catch fewer complications, but those caught are more likely real
  2. Tiered alert system
    • Low priority (informational): “Monitor patient closely”
    • Medium priority (nursing assessment): “Evaluate patient within 1 hour”
    • High priority (physician notification): “Urgent evaluation needed, notify MD immediately”
    • Reserve high-priority alerts for predictions where PPV exceeds 30% (see the sketch after this list)
  3. Clinical context integration
    • Suppress alerts during expected post-op recovery (first 24 hours)
    • Incorporate clinical context (patient just ambulated, received fluid bolus, normal post-op fever)
    • Trend analysis (worsening vs. stable vs. improving)
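
A minimal sketch of the tiered mapping referenced in item 2 above; the thresholds and tier actions are illustrative and would be set from local validation data:

```python
# Sketch: mapping predicted risk to tiered alerts so only high-PPV
# predictions page a physician. Thresholds and actions are illustrative
# and would be set from local validation data, not vendor defaults.
def alert_tier(predicted_risk: float) -> str:
    if predicted_risk >= 0.30:   # calibrated so tier PPV is roughly 30%+
        return "HIGH: notify MD immediately; assess within 15 min"
    if predicted_risk >= 0.10:
        return "MEDIUM: nursing assessment within 1 hour"
    if predicted_risk >= 0.03:
        return "LOW: informational; monitor closely"
    return "NONE: no alert"

for risk in (0.45, 0.12, 0.05, 0.01):
    print(f"risk {risk:.0%}: {alert_tier(risk)}")
```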

Workflow integration:

  1. Alert response protocol
    • High-priority alert → Mandatory nursing assessment within 15 minutes + physician notification
    • Document: “AI alert reviewed. Patient assessed. Findings: [stable vs. concerning]. Action: [continued monitoring vs. physician notified].”
  2. Feedback loop
    • Track AI alert accuracy
    • Monthly review: How many alerts were true positives?
    • Adjust thresholds based on performance
  3. Human oversight
    • Nurse or physician reviews AI alerts, decides which require action
    • AI does not page physician directly (human gatekeeper)

Quality monitoring:

  1. Track alert fatigue
    • Monitor alert dismissal rates
    • If >80% of alerts dismissed without assessment → system is failing
    • Survey staff on alert burden monthly
  2. Audit missed complications
    • For every complication, determine: Did AI alert? Was alert acted on?
    • If multiple complications missed due to dismissed alerts → pause system, recalibrate
  3. Continuous improvement
    • Vendor partnership: Provide feedback on false positives
    • Request threshold adjustment or better risk stratification

Lesson: AI early warning systems can improve outcomes only if positive predictive value is high enough to avoid alert fatigue. A system with 97% false positive rate creates more harm (ignored alerts, missed complications) than benefit. Implementation requires careful calibration, tiered alerts, clinical context, and continuous monitoring. “High sensitivity” is not enough. PPV must be clinically actionable (≥10-15% minimum).


References