Surgery, Anesthesiology, and Perioperative Care
Surgery combines technical skill, anatomical knowledge, and split-second decision-making under pressure. AI applications span preoperative risk assessment, intraoperative guidance, and postoperative monitoring.
After reading this chapter, you will be able to:
- Evaluate AI systems for surgical risk prediction and optimization
- Understand computer vision applications in robotic and minimally invasive surgery
- Create evaluation plans for robotic surgery platforms and surgical video AI
- Assess AI tools for surgical phase recognition and workflow analysis
- Navigate AI-assisted surgical planning and simulation
- Identify postoperative complication prediction systems
- Recognize limitations and failure modes of surgical AI
- Balance AI augmentation with surgical judgment and technical skill
Introduction
Surgery stands apart from other medical specialties in its immediacy, irreversibility, and technical demands. While radiologists can analyze images over minutes, surgeons make split-second decisions with scalpel in hand. While internists can adjust management based on patient response, surgical decisions, once made, cannot be easily undone.
This unique context shapes how AI can and cannot help surgeons. The most promising applications assist with the cognitive work surrounding surgery (risk assessment, planning, outcome prediction) rather than replacing the surgeon’s hands or judgment during the operation itself.
The sections that follow cover surgical AI applications across the perioperative spectrum, from preoperative optimization through postoperative care.
Preoperative AI Applications
Surgical Risk Prediction
The Clinical Problem:
Surgeons face a fundamental question before every operation: Will this patient tolerate this procedure? Traditional risk assessment relies on clinical judgment supplemented by scoring systems (ASA classification, NSQIP risk calculator, RCRI for cardiac risk in non-cardiac surgery). These tools have limitations:
- Incorporate limited variables (20-30 factors)
- Use linear models that miss complex interactions
- Provide population-level estimates, not personalized predictions
- Are updated infrequently as new evidence emerges
Machine Learning Solutions:
Modern ML approaches improve risk prediction by:
- Analyzing larger feature sets: 100+ variables from EHR, imaging, labs, medications, vital signs, social determinants
- Capturing nonlinear relationships: Age × frailty × procedure complexity interactions
- Continuous learning: Models updated with new outcome data
- Personalized predictions: Patient-specific risk estimates rather than population averages
Evidence:
The MySurgeryRisk algorithm developed at University of Florida analyzed 400,000+ surgical cases and significantly outperformed traditional risk models (Bihorac et al., 2019):
- Mortality prediction: AUC 0.77-0.83 across follow-up periods
- Major complications: AUC range 0.82-0.94 across 8 complication types
- ICU admission: AUC within the 0.82-0.94 complications range (not reported separately in the abstract)
- Hospital length of stay: Better calibration across risk spectrum
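To make the AUC figures above concrete, the following is a minimal sketch of how discrimination can be computed from predicted risks and observed outcomes. It uses the rank interpretation of AUC (the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it). The function name and the sample data are illustrative assumptions, not the MySurgeryRisk implementation.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and observed outcomes (1 = complication occurred)
predicted_risk = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80]
complication = [0, 0, 1, 0, 1, 1]
print(round(auc(predicted_risk, complication), 3))
```

AUC measures ranking only. A model can rank patients well and still systematically over- or under-estimate absolute risk, which is why calibration is reported separately.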
Other institutions report similar results: Stanford, Partners Healthcare, and Penn Medicine all describe improved risk prediction using ML on local data.
Clinical Applications:
Preoperative Optimization:
- Identify modifiable risk factors (anemia, hyperglycemia, nutritional deficits)
- Triage patients for preoperative clinic vs. day-of-surgery admission
- Guide prehabilitation referrals
Shared Decision-Making:
- Provide personalized risk estimates during surgical consults
- Facilitate discussions about alternative treatments
- Support goals-of-care conversations for high-risk patients
Resource Allocation:
- Predict ICU vs. floor bed requirements
- Identify patients needing enhanced postoperative monitoring
- Optimize OR scheduling based on predicted case duration
Quality Improvement:
- Risk-adjust outcome comparisons between surgeons/hospitals
- Identify outliers for focused improvement efforts
- Benchmark performance against predicted outcomes
Critical Limitations:
Risk calculators should inform, not dictate, surgical decisions:
- Algorithms miss important factors: patient goals, functional trajectory, social support, frailty nuances
- High-risk patients may still benefit from surgery if the alternative is a certain poor outcome
- Low-risk predictions don’t guarantee good outcomes
- Models trained on one population may not generalize to different populations
Clinical Bottom Line: Use risk prediction AI to enhance shared decision-making and optimize preoperative preparation. Do not deny surgery based solely on algorithmic risk scores.
Preoperative Planning and Simulation
AI-Assisted Anatomical Segmentation:
Surgical planning for complex cases (oncologic resections, liver surgery, orthopedic reconstructions) traditionally requires manual analysis of CT/MRI to identify anatomy, plan approaches, and anticipate challenges. AI automates and enhances this process:
Applications:
Oncologic Surgery:
- Tumor segmentation and volumetry
- Relationship to critical structures (vessels, bile ducts, nerves)
- Predicted resection margins
- Assessment of resectability
Liver Surgery:
- Vascular and biliary anatomy mapping
- Liver volumetry for donation or resection planning
- Future liver remnant calculation
- Virtual hepatectomy simulation
Orthopedic Surgery:
- Joint replacement planning (alignment, component sizing)
- Osteotomy planning for deformity correction
- Fracture reduction simulation
- Bone tumor resection planning
Neurosurgery:
- Brain tumor segmentation and eloquent cortex mapping
- Surgical approach trajectory planning
- Vascular anatomy for aneurysm clipping
- Epilepsy focus localization
Evidence:
Studies across multiple surgical specialties show that AI segmentation (Hashimoto et al., 2018):
- Reduces planning time by 60-80% compared to manual segmentation
- Achieves inter-rater reliability comparable to expert-to-expert agreement
- Improves standardization of preoperative assessment (Topol, 2019)
- Enhances patient counseling with 3D visualizations
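Agreement between an AI segmentation and an expert segmentation is commonly summarized with an overlap score. The sketch below assumes the Dice coefficient as that score; the source does not state which agreement metric the cited studies used. Masks are represented as sets of voxel coordinates, and the toy masks are hypothetical.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice_coefficient(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice overlap: 2 * |A intersect B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical toy masks (real masks contain thousands to millions of voxels)
ai_mask = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
expert_mask = {(0, 1, 0), (1, 1, 0), (1, 2, 0)}
print(round(dice_coefficient(ai_mask, expert_mask), 3))
```

Comparing the AI-versus-expert score with the expert-versus-expert score on the same cases is one way to check the claim that AI agreement is comparable to agreement between experts.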
Limitations:
- Segmentation errors can propagate to surgical plans (always verify)
- Quality depends on input imaging (motion artifacts, contrast timing)
- Doesn’t account for intraoperative findings (adhesions, variant anatomy)
- Most effective for anatomy-driven procedures with good imaging
3D Printing and Surgical Models:
AI-segmented anatomy can be converted to 3D-printed models for:
- Pre-surgical rehearsal of complex cases
- Patient education and consent
- Trainee education
- Custom surgical guides and implants
Clinical Impact: Mixed. Some studies show reduced operative time and improved outcomes for complex cases; others show no benefit beyond surgeon confidence. Cost and workflow integration remain barriers to widespread adoption.
Intraoperative AI Applications
Computer Vision in Minimally Invasive Surgery
The laparoscope and robotic camera create continuous video streams, ideal data for computer vision AI. Applications range from documentation to real-time guidance, with varying degrees of validation and clinical readiness.
Surgical Phase Recognition:
What it does: AI analyzes surgical video and identifies the current phase (e.g., “dissection of gallbladder from liver bed” in laparoscopic cholecystectomy).
How it works: Deep learning models trained on annotated surgical videos learn to recognize instrument configurations, anatomical landmarks, and surgeon actions characteristic of each phase.
Performance:
- Accuracy approximately 82% for laparoscopic cholecystectomy phase recognition (Twinanda et al., 2017)
- Works across multiple procedures (bariatric, colorectal, gynecologic)
- Real-time capability (15-30 frames/second)
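Phase recognition accuracy is usually reported per video frame. The sketch below shows one way to compute overall frame accuracy and per-phase recall from predicted and annotated labels. The phase names and label sequences are hypothetical, and this is not the evaluation code from the cited study.

```python
from collections import defaultdict
from typing import Dict, Sequence


def frame_accuracy(predicted: Sequence[str], annotated: Sequence[str]) -> float:
    """Share of frames where the predicted phase matches the annotation."""
    if len(predicted) != len(annotated) or not annotated:
        raise ValueError("sequences must be non-empty and the same length")
    correct = sum(p == a for p, a in zip(predicted, annotated))
    return correct / len(annotated)


def recall_by_phase(predicted: Sequence[str], annotated: Sequence[str]) -> Dict[str, float]:
    """For each annotated phase, the share of its frames labeled correctly."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for p, a in zip(predicted, annotated):
        totals[a] += 1
        if p == a:
            hits[a] += 1
    return {phase: hits[phase] / totals[phase] for phase in totals}


# Hypothetical ten-frame clip
annotated = ["prep"] * 3 + ["dissection"] * 5 + ["clipping"] * 2
predicted = ["prep"] * 2 + ["dissection"] * 5 + ["clipping"] * 3
print(frame_accuracy(predicted, annotated))
print(recall_by_phase(predicted, annotated))
```

Per-phase recall matters because a single overall accuracy can hide poor performance on short phases, where errors cluster around phase transitions.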
Potential applications:
- Context-aware instrument tracking
- Automated surgical documentation
- OR efficiency analysis
- Surgical skill assessment
- Adverse event detection
Current status: Primarily a research tool. Clinical deployment is limited because phase recognition alone doesn’t provide actionable guidance. Surgeons already know which phase they’re in.
Future potential: Phase recognition is foundational for more advanced applications (predictive alerts, context-aware instrument suggestions).
Anatomical Structure Recognition:
The promise: Computer vision identifies critical anatomy (bile ducts, ureters, vessels) to prevent surgical injury.
The reality: This is extraordinarily difficult and not yet clinically reliable.
Why it’s hard:
- Visual variability: Blood, smoke, retraction, lighting changes, cautery artifacts
- Anatomical variants: Textbook anatomy is the exception, not the rule
- Dynamic deformation: Tissue moves, stretches, changes appearance continuously
- Occlusion: Critical structures often partially hidden
- Context-dependence: What looks like a ureter may be a vessel or an adhesion band
Current evidence:
Research systems demonstrate:
- 70-85% accuracy for identifying major structures in ideal conditions
- Performance degrades significantly with bleeding, inflammation, obesity
- False positives and false negatives both occur at unacceptable rates
Critical safety concern:
Surgeons cannot rely on AI to definitively identify critical structures. Visual confirmation, tactile feedback, anatomical knowledge, and methodical dissection remain essential. AI suggesting “safe to divide this structure” is not acceptable with current technology.
More promising near-term application:
Warning systems: AI that detects the absence of expected structures (“ureter not identified in expected location, double-check before dividing anything”) may be safer than positive identification. Such systems alert surgeons to uncertainty rather than providing false confidence.
AI in Robotic Surgery
Robotic surgery belongs inside surgical AI evaluation, not as a separate specialty. It is a platform category, a surgical training problem, a data-capture layer for surgical AI, and a possible future path for supervised autonomy. Evaluation must separate the robot, the surgeon, the procedure, the AI layer, and the local health-system context.
Current state: teleoperation, not autonomy
Current clinical soft-tissue robotic platforms are teleoperated minimally invasive systems. The surgeon controls instrument motion from a console. The platform may provide three-dimensional visualization, wristed instruments, tremor filtering, motion scaling, ergonomic benefits, and integrated data capture, but those features do not equal autonomous surgical judgment.
The installed base and procedure volume are now large enough that robotic surgery requires routine service-line governance rather than innovation-lab treatment. Intuitive Surgical reported approximately 3.153 million da Vinci procedures in 2025, compared with approximately 2.683 million in 2024, and reported a system installed base of more than 12,100 by year-end 2025 (Intuitive Surgical, 2026 annual report).
FDA status is platform-specific
FDA clearance or authorization applies to specific devices, indications, and intended uses. It does not validate every hospital’s use case, every surgeon’s learning curve, or every AI claim layered on top of the robotic platform.
| Platform | FDA status | Current relevance |
|---|---|---|
| da Vinci 5 | 510(k) clearance, K232610, March 2024 | Fifth-generation multiport da Vinci system. Clearance supports the platform’s intended use, not autonomous surgery (FDA K232610). |
| Versius Surgical System | De Novo authorization, DEN230078, October 2024 | Class II modular electromechanical surgical system, initially indicated in the United States for adult cholecystectomy (FDA DEN230078). |
| Hugo RAS System | 510(k) clearance, K250725, December 2025 | Class II modular electromechanical surgical system for adult minimally invasive urologic surgical procedures (FDA K250725). |
FDA status should be verified from FDA records before procurement, credentialing, patient-facing materials, or handbook updates. Vendor announcements can describe launch strategy and training ecosystems, but FDA records define the cleared indication.
Commercial landscape and evidence status
Commercial activity now falls into distinct evidence categories. FDA-authorized teleoperated platforms, including da Vinci, Versius, and Hugo, should be evaluated by device indication and procedure-specific outcomes. Surgical video and operating-room analytics companies, including OR Black Box, Touch Surgery, Theator, and Caresyntax, should be evaluated as data-capture, documentation, video-review, or quality-measurement systems; peer-reviewed evidence supports feasibility and implementation analysis, but not automatic outcome improvement (Thornton et al., 2026; Aklilu et al., 2024). Autonomy-first startups, including Aleph Surgical, signal where the market is looking next, but Aleph’s March 2026 research preview is not peer-reviewed clinical evidence and no FDA authorization or clinical outcomes publication was located as of June 18, 2026. Treat these companies as systems to track, not as evidence of clinical readiness.
Recent clinical evidence
Robotic surgery does not have one evidence grade. Each procedure has its own learning curve, comparator, outcomes, and cost structure.
| Evidence question | Recent finding | Evidence quality |
|---|---|---|
| Middle and low rectal cancer | The REAL randomized trial included 1171 patients with median 43-month follow-up and found lower 3-year locoregional recurrence with robotic surgery than laparoscopy (1.6% vs. 4.0%) and higher disease-free survival (87.2% vs. 83.4%), with similar overall survival (Feng et al., 2025). | Moderate to high: randomized trial, but high-volume expert centers in China limit generalization. |
| Acute-care cholecystectomy | A propensity-matched JAMA Surgery cohort found similar bile duct injury rates for robotic-assisted and laparoscopic cholecystectomy, but higher major postoperative complications with robotic-assisted surgery (8.37% vs. 5.50%), more drain use, and longer length of stay (Woldehana et al., 2025). A newer JAMA Surgery analysis examined comparative safety in contemporary practice, underscoring that cholecystectomy safety should be monitored with current local outcomes rather than platform assumptions (Mullens et al., 2026). | Low to moderate: large observational cohorts, residual confounding remains possible. |
| Adoption drivers | A 2026 JAMA Network Open cohort of 20,313 surgeons found receipt of direct industry payment was associated with increased proportional use of robotic-assisted surgery, with a dose-response pattern (San Loh et al., 2026). | Low to moderate: observational policy evidence, useful for governance rather than causal proof of patient benefit. |
| Credentialing and privileging | A 2026 JAMA Surgery Viewpoint proposed competency-based, vendor-neutral privileging for robotic surgery rather than platform-specific exposure or case volume alone (Bertolo et al., 2026). | Expert framework: useful for governance, not patient-outcome evidence. |
| Intraoperative AI implementation | A multisite qualitative study of AI-based Operating Room Black Box implementation found gaps between expectations and delivery: additional AI training needs, difficult data access, limited postoperative complication prediction, and limited academic deliverables (Thornton et al., 2026). | Low: qualitative implementation evidence, high value for deployment planning. |
| Surgical video AI | NEJM AI published a computer-vision study that used laparoscopic cholecystectomy video to identify surgical actions associated with blood loss and surgical experience (Aklilu et al., 2024). | Low to moderate: promising analytic method, not a clinical intervention trial. |
The practical lesson is not that robotic surgery is good or bad. The lesson is that the unit of evidence is the procedure-institution-surgeon combination. A claim from rectal cancer surgery at expert centers should not be transferred to acute-care cholecystectomy, ventral hernia repair, hysterectomy, or community urology without local evidence.
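The percentages in the evidence table can be turned into absolute differences with simple arithmetic. The sketch below computes the crude absolute risk difference and its reciprocal for two rows of the table. These are unadjusted point estimates without confidence intervals, the cholecystectomy figures come from observational data, and the helper names are illustrative assumptions.

```python
def absolute_risk_difference(rate_a: float, rate_b: float) -> float:
    """Difference in event rates, expressed as proportions."""
    return rate_a - rate_b


def number_needed(difference: float) -> float:
    """Reciprocal of the absolute difference (number needed to treat or to harm)."""
    if difference == 0:
        raise ValueError("no difference between groups")
    return 1.0 / abs(difference)


# REAL trial, 3-year locoregional recurrence: laparoscopic 4.0% vs. robotic 1.6%
real_diff = absolute_risk_difference(0.040, 0.016)
# Acute-care cholecystectomy, major complications: robotic 8.37% vs. laparoscopic 5.50%
chole_diff = absolute_risk_difference(0.0837, 0.0550)

print(round(real_diff * 100, 2), round(number_needed(real_diff), 1))
print(round(chole_diff * 100, 2), round(number_needed(chole_diff), 1))
```

The two results point in opposite directions for different procedures, which is the same point the table makes: a benefit observed in one procedure does not transfer to another.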
Creating evaluations for robotic surgery
Every robotic surgery evaluation should begin with a falsifiable claim:
- Clinical outcome claim: robotic surgery reduces complications, conversion, recurrence, readmissions, pain, length of stay, or reoperation.
- Operational claim: robotic surgery improves OR throughput, case scheduling, turnover, staffing efficiency, or surgeon ergonomics.
- Training claim: robotic analytics improve skill acquisition, feedback quality, or credentialing reliability.
- AI claim: a model detects phases, instruments, anatomy, errors, risk states, or quality signals accurately enough to change decisions.
- Autonomy claim: a system performs a bounded physical subtask safely under defined oversight and abort conditions.
Do not evaluate “robotic surgery” as a generic intervention. Evaluate one claim at a time.
| Layer | Core question | Minimum evidence |
|---|---|---|
| Platform safety | Does the robot perform as intended under the cleared indication? | FDA record, device training requirements, malfunction reporting plan, local incident tracking. |
| Surgeon performance | Are operators past the learning curve for the procedure? | Case logs, simulation results, proctored cases, conversion and complication monitoring by surgeon. |
| Procedure outcomes | Does the robotic approach improve outcomes over the local comparator? | Procedure-specific outcomes, risk adjustment, comparable surgeon experience, 30-day and long-term outcomes when relevant. |
| AI perception | Does the model correctly detect instruments, anatomy, phase, smoke, bleeding, or errors? | Sensitivity, specificity, false alarms per hour, time-to-detection, external validation across surgeons and video systems. |
| Human factors | Do users understand when to trust, ignore, or override the system? | Simulation, silent-mode pilots, override audits, alert fatigue monitoring, qualitative workflow assessment. |
| Physical autonomy | Can the system act safely when tissue, lighting, bleeding, and anatomy vary? | Bench, simulation, ex vivo, animal, and eventually prospective clinical testing with predefined abort conditions. |
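For the AI perception layer, the metrics in the table can be computed from a confusion matrix plus the amount of video reviewed. The sketch below is a minimal example. It assumes that video has been divided into scoreable units (for example, fixed-length clips) so that true negatives are countable. All names and counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    tp: int  # events the model flagged correctly
    fp: int  # alerts raised with no true event
    tn: int  # scoreable units correctly left unflagged
    fn: int  # true events the model missed
    video_hours: float  # total video duration reviewed


def sensitivity(c: DetectionCounts) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: DetectionCounts) -> float:
    return c.tn / (c.tn + c.fp)


def false_alarms_per_hour(c: DetectionCounts) -> float:
    return c.fp / c.video_hours


# Hypothetical bleeding-detection results over 12.5 hours of video
counts = DetectionCounts(tp=45, fp=30, tn=920, fn=5, video_hours=12.5)
print(sensitivity(counts), round(specificity(counts), 3), false_alarms_per_hour(counts))
```

False alarms per hour is often more informative to an operating team than specificity, because it describes the interruption burden directly.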
Match metrics to risk
Low-risk analytics, such as video indexing or case length prediction:
- Annotation accuracy
- Time saved in review or scheduling
- Inter-rater agreement with expert reviewers
- External validation across services
- User adoption and correction burden
Moderate-risk decision support, such as phase recognition or complication prediction:
- Sensitivity, specificity, PPV, NPV at local prevalence
- False alerts per case and per hour
- Time-to-detection before human recognition
- Calibration by procedure and patient subgroup
- Silent-mode performance before clinical use
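Calibration by procedure and patient subgroup can be checked with an observed-to-expected (O/E) ratio: the number of events that occurred divided by the sum of predicted probabilities in that subgroup. A ratio near 1 suggests the model is calibrated in the large for that group. The sketch below is one simple way to compute it. The group names and records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def observed_to_expected(records: Iterable[Tuple[str, float, int]]) -> Dict[str, float]:
    """records: (subgroup, predicted probability, outcome 0/1). Returns O/E per subgroup."""
    expected: Dict[str, float] = defaultdict(float)
    observed: Dict[str, int] = defaultdict(int)
    for group, predicted_risk, outcome in records:
        expected[group] += predicted_risk
        observed[group] += outcome
    ratios = {}
    for group, exp in expected.items():
        if exp <= 0:
            raise ValueError(f"no expected events in subgroup {group!r}")
        ratios[group] = observed[group] / exp
    return ratios


# Hypothetical silent-mode predictions and outcomes
records = [
    ("elective", 0.10, 0), ("elective", 0.20, 0), ("elective", 0.30, 1), ("elective", 0.40, 0),
    ("emergency", 0.10, 1), ("emergency", 0.20, 0), ("emergency", 0.20, 1), ("emergency", 0.30, 0),
]
print(observed_to_expected(records))
```

In this toy data the model is calibrated for elective cases but underestimates risk in emergency cases by a factor of about 2.5, the kind of subgroup gap that an overall AUC would not reveal.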
High-risk intraoperative guidance, such as anatomy labeling or “do not cut” warnings:
- False negative rate for critical structures
- False positive rate causing unnecessary dissection delay
- Performance under blood, smoke, glare, lens fog, obesity, inflammation, adhesions, and variant anatomy
- Confidence display and uncertainty calibration
- Surgeon override and verification behavior
Physical action, including supervised autonomy:
- Task success rate and safety-margin violations
- Tissue trauma, force, thermal spread, bleeding, and clip or suture placement accuracy
- Recovery from near-miss states
- Human takeover latency
- Abort reliability
- Failure mode severity under worst-case scenarios
For irreversible actions, the evaluation threshold must be higher than diagnostic AI. A missed pulmonary nodule can still be reviewed. A divided bile duct cannot be undivided.
Use staged evidence, not one benchmark
The IDEAL framework for surgical robotics emphasizes development, comparative evaluation, and long-term monitoring rather than treating a single trial as the endpoint (Marcus et al., 2024). A practical hospital sequence is:
- Technical verification: confirm FDA indication, service contracts, instrument compatibility, downtime plan, and MAUDE reporting workflow.
- Simulation and dry-lab testing: test surgeon setup, docking, instrument exchange, emergency undocking, and AI display failure.
- Proctored clinical introduction: restrict to selected surgeons and procedures with clear exclusion criteria.
- Silent AI trial: run AI video analytics without clinical display, compare against expert annotation and outcomes.
- Limited visible pilot: display AI outputs to trained users, require manual verification, and audit overrides.
- Service-line deployment: monitor outcomes, costs, case mix, surgeon learning curves, and patient-reported outcomes.
- Post-deployment surveillance: review complications, conversions, reoperations, device malfunctions, video-model drift, and alert burden.
Skipping stages is most dangerous when AI moves from retrospective analytics to real-time intraoperative guidance.
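The staged sequence above can be enforced in a governance tracker with a simple gate that refuses entry to a stage until every earlier stage is documented as complete. The sketch below is one possible encoding. The stage names mirror the list above, but the class and function names are illustrative assumptions.

```python
from enum import IntEnum
from typing import Set


class Stage(IntEnum):
    TECHNICAL_VERIFICATION = 1
    SIMULATION_DRY_LAB = 2
    PROCTORED_INTRODUCTION = 3
    SILENT_AI_TRIAL = 4
    LIMITED_VISIBLE_PILOT = 5
    SERVICE_LINE_DEPLOYMENT = 6
    POST_DEPLOYMENT_SURVEILLANCE = 7


def may_enter(target: Stage, completed: Set[Stage]) -> bool:
    """True only if every stage before the target has been completed."""
    return all(stage in completed for stage in Stage if stage < target)


done = {Stage.TECHNICAL_VERIFICATION, Stage.SIMULATION_DRY_LAB, Stage.PROCTORED_INTRODUCTION}
print(may_enter(Stage.SILENT_AI_TRIAL, done))        # all prior stages complete
print(may_enter(Stage.LIMITED_VISIBLE_PILOT, done))  # silent trial not yet complete
```

The second call is the case the text warns about: displaying AI output to users before a silent-mode trial has been completed.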
Evals by use case
Surgical video analytics: Evaluate video analytics as a measurement system before treating them as a quality or safety intervention. Minimum evidence includes public and local test sets separated by surgeon, site, patient factors, and video system; inter-rater agreement for ground-truth labels; performance under blood, smoke, glare, lens cleaning, and off-axis camera views; error taxonomy; and prospective silent-mode validation before clinical display.
Robotic skill assessment: Automated performance metrics can reduce subjectivity in surgical education, but they must not collapse skill into speed or motion economy alone. A systematic review in the British Journal of Surgery found heterogeneous tools for robotic technical skills assessment and emphasized validity and reliability as central requirements (Boal et al., 2024). Minimum evidence includes correlation with blinded expert ratings, predictive validity for patient outcomes or supervised entrustment decisions, fairness across training level and prior robotic exposure, separation of technical execution from case complexity, and feedback that identifies remediable behaviors.
Anatomy labeling and warning systems: Anatomy labeling is high risk because confident false labels can create false reassurance. The safer near-term design is an uncertainty-aware warning system rather than an authoritative “this is the duct” display. Minimum evidence includes critical-structure false negative rates under worst-case visual conditions, confidence calibration, “unknown” states, stress tests with inflammation and variant anatomy, and explicit prohibition on irreversible action based on AI label alone.
Autonomous and semi-autonomous subtasks: Research is moving quickly. SRT-H used language-conditioned imitation learning for autonomous ex vivo cholecystectomy steps and achieved 100% success across 8 unseen pig gallbladders (Kim et al., 2025). A separate Science Robotics study introduced a surgical embodied intelligence simulator and demonstrated task autonomy across simulated, ex vivo, and in vivo animal settings (Long et al., 2025). These studies show meaningful progress in perception, planning, recovery, and sim-to-real transfer. They do not establish clinical readiness. Human surgery adds live bleeding, patient motion, anesthetic constraints, instrument failures, legal accountability, and rare anatomy that small experimental samples cannot resolve.
Procurement and governance
Robotic surgery programs should be governed like service-line investments. The evaluation should compare robotic surgery against the institution’s current best alternative, not against a theoretical average laparoscopic program.
Core local metrics:
- Case volume by procedure and surgeon
- Conversion to open surgery
- Intraoperative injury and bleeding
- Operative time, docking time, turnover time, and late-day delays
- 30-day complications, readmissions, emergency department returns, and reoperations
- Cancer-specific outcomes where relevant
- Patient-reported pain, function, urinary, sexual, and quality-of-life outcomes when relevant
- Direct and total costs, including instruments, disposable supplies, service contracts, staffing, depreciation, and OR time
- Training throughput and effects on laparoscopic competency
A 2025 systematic review of cost analyses in randomized trials found that robotic-surgery cost analyses are often incomplete, which means hospitals should not accept generic cost-effectiveness claims without local accounting (Bosscha et al., 2025).
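Local accounting can start from a simple model in which fixed annual costs (for example, depreciation and service contracts) are spread over annual case volume and added to per-case variable costs (for example, instruments and disposables). The sketch below illustrates the arithmetic only. Every figure is a hypothetical placeholder, not a real price, and a full analysis would also include staffing and OR time.

```python
def cost_per_case(annual_fixed: float, per_case_variable: float, annual_cases: int) -> float:
    """Per-case cost = variable cost + fixed annual cost spread over case volume."""
    if annual_cases <= 0:
        raise ValueError("annual_cases must be positive")
    return per_case_variable + annual_fixed / annual_cases


# Hypothetical inputs for illustration only
ANNUAL_FIXED = 500_000.0       # depreciation plus service contract
PER_CASE_VARIABLE = 3_000.0    # instruments and disposable supplies

low_volume = cost_per_case(ANNUAL_FIXED, PER_CASE_VARIABLE, annual_cases=100)
high_volume = cost_per_case(ANNUAL_FIXED, PER_CASE_VARIABLE, annual_cases=400)
print(low_volume, high_volume)
```

Even this simplified model shows why case volume belongs among the core local metrics: the same platform produces very different per-case costs at different volumes.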
Robotic surgery adoption can be shaped by marketing, patient demand, hospital competition, and industry relationships. The 2026 JAMA Network Open study linking industry payments to increased robotic-assisted surgery use does not prove inappropriate care, but it does justify governance around disclosure, credentialing, and value review (San Loh et al., 2026).
Robotic surgery committees should include surgery, anesthesia, nursing, sterile processing, biomedical engineering, finance, compliance, patient safety, and informatics. AI-enabled modules add model governance, data governance, and cybersecurity requirements.
Red flags:
- FDA status is unclear, misrepresented, or outside the intended use
- The vendor describes teleoperation as autonomy
- Outcomes are reported without comparator, case mix, surgeon experience, or learning-curve context
- AI analytics are trained on one institution’s videos and deployed elsewhere without external validation
- Anatomy labels are displayed without uncertainty or “unknown” states
- The system cannot export error logs, model version, video timestamp, and user override records
- Case volume is too low to maintain proficiency or amortize cost
- Training emphasizes platform operation but not failure recognition and emergency undocking
- Patient-facing marketing implies superior outcomes without procedure-specific evidence
No robotic-surgery AI should be trusted for irreversible intraoperative action without independent surgeon verification.
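One of the red flags above is a system that cannot export error logs, model version, video timestamp, and user override records. The sketch below shows what a minimal exportable record might look like. The field names and the JSON-lines format are assumptions for illustration, not any vendor's schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AiEventRecord:
    case_id: str
    model_version: str
    video_timestamp_s: float      # seconds from start of recording
    output_label: str             # what the model displayed or would have displayed
    confidence: Optional[float]   # None if the model reports no confidence
    surgeon_override: bool        # True if the user dismissed or contradicted the output
    error_code: Optional[str] = None


def to_json_line(record: AiEventRecord) -> str:
    """Serialize one record as a single JSON line for audit export."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical event
record = AiEventRecord(
    case_id="case-0001",
    model_version="0.3.1",
    video_timestamp_s=1842.5,
    output_label="structure_unknown",
    confidence=0.41,
    surgeon_override=True,
)
print(to_json_line(record))
```

Recording the model version with every event is what makes later drift analysis and incident review possible after a software update.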
Postoperative AI Applications
Complication Prediction
Surgical Site Infection (SSI) Prediction:
ML models predict SSI risk using:
- Patient factors (diabetes, obesity, smoking, immunosuppression)
- Operative characteristics (duration, complexity, contamination class)
- Intraoperative variables (glucose control, normothermia, antibiotic timing)
- Postoperative factors (drain output, pain scores)
Evidence: Modest improvements over clinical judgment alone (AUC 0.75-0.80 vs. 0.70-0.72).
Limitations:
- High false positive rates (30-40%) limit actionability
- Predictions shouldn’t guide prophylactic antibiotic decisions (risk of resistance)
- Best use: Enhanced surveillance for high-risk patients
Postoperative Delirium:
Prediction models incorporating preoperative cognitive assessment, anesthesia factors, and postoperative medications identify high-risk patients for:
- Non-pharmacologic prevention (reorientation, sleep hygiene, family presence)
- Avoidance of deliriogenic medications
- Enhanced monitoring
Evidence: Better than clinical intuition, but delirium remains multifactorial and incompletely preventable.
Anastomotic Leak Prediction:
ML models analyzing postoperative labs (CRP trajectory), vital signs, and clinical notes can identify leak risk earlier than clinical suspicion alone.
Challenge: Rare outcomes (1-5% incidence) make model training difficult and false positive rates high.
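The effect of low incidence on false alarms follows from Bayes' rule. The sketch below computes positive predictive value (PPV) from sensitivity, specificity, and prevalence. The sensitivity and specificity values are hypothetical and chosen only to illustrate the arithmetic at an incidence within the 1-5% range stated above.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV = true positives / all positives, by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("model never alerts at these settings")
    return true_pos / (true_pos + false_pos)


# Hypothetical model: 80% sensitivity, 90% specificity
ppv_rare = positive_predictive_value(0.80, 0.90, prevalence=0.03)
ppv_common = positive_predictive_value(0.80, 0.90, prevalence=0.30)
print(round(ppv_rare, 3), round(ppv_common, 3))
```

With these assumed values, roughly one alert in five is a true leak at 3% incidence, compared with about three in four if the outcome occurred in 30% of patients. The model is identical in both cases; only the base rate differs.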
Deterioration Monitoring
AI systems analyzing continuous vitals, lab trends, nursing documentation, and medication administration can detect patterns predicting clinical deterioration 6-12 hours before conventional early warning scores.
Applications:
- Postoperative hemorrhage
- Respiratory failure
- Sepsis
- Cardiac events
Evidence: Detection performance generally good, but high false positive rates create alert fatigue (similar to sepsis prediction challenges discussed in Emergency Medicine) (Wong et al., 2021; Beam & Kohane, 2018).
Best Implementation: Integrate AI alerts with rapid response team protocols and ensure alerts are actionable (not just “patient is high-risk”) (Topol, 2019).
Surgical Quality and Education
Video-Based Surgical Assessment
AI analysis of surgical videos enables objective skill assessment and quality improvement.
Applications:
Skill Scoring:
- Objective assessment of technical performance
- Identifies specific errors (tissue trauma, bleeding, inefficiency)
- Provides quantitative feedback for training
Evidence: AI scores correlate strongly with expert human assessment and predict surgical outcomes (Lavanchy et al., Scientific Reports, 2021).
Benefits for surgical education:
- Objective feedback supplements subjective faculty evaluation
- Tracks skill progression over time
- Identifies specific areas needing improvement
- Benchmarks against peer performance
Quality Improvement:
- Retrospective review of complications to identify technical factors
- Process improvement for OR efficiency
- Standardization of surgical techniques
Challenges:
- Privacy and medicolegal concerns about routine recording
- Surgeon resistance to surveillance
- Doesn’t capture decision-making quality (only technical execution)
- Storage and analysis infrastructure requirements
Natural Language Processing for Operative Notes
AI extraction of structured data from operative notes enables:
Quality Metrics:
- Automated calculation of process measures (antibiotic timing, VTE prophylaxis)
- Complication detection from dictated notes
- Adherence to surgical best practices
Registry Auto-Population:
- Reduces manual data entry burden for NSQIP, VASQIP, other registries
- Improves data completeness and accuracy
Clinical Decision Support:
- Extraction of critical operative details for downstream care (mesh type in hernia repair, prosthesis in joint replacement)
Evidence: High accuracy (>95%) for structured data elements. Challenges remain for nuanced surgical findings and judgment-based assessments.
Specialty-Specific Applications
Different surgical specialties face unique challenges and opportunities for AI integration:
General Surgery
- Hernia recurrence risk prediction
- Cholecystectomy difficulty scoring
- Bile duct injury prevention (research phase)
Orthopedic Surgery
- Fracture detection AI (high accuracy for simple fractures)
- Joint replacement planning and component sizing
- Spinal navigation systems (FDA-cleared)
- Ligament injury diagnosis from MRI
Neurosurgery
- Brain tumor segmentation for resection planning
- Epilepsy focus localization
- Surgical navigation systems
- Intraoperative tumor margin assessment (research)
Cardiac Surgery
- Surgical risk models (STS score enhanced with ML)
- Intraoperative echocardiography interpretation
- ICU outcome prediction
Perioperative hypotension prediction: The HYPE-2 randomized trial tested a machine-learning-derived Hypotension Prediction Index with diagnostic guidance during elective on-pump cardiac surgery and ICU care. Among 130 patients included in the primary analysis, the intervention reduced the median time-weighted average of MAP below 65 mm Hg by 63% and reduced time spent in hypotension by a median 28 minutes versus standard care (Schuurmans et al., 2025). The study supports protocolized hemodynamic decision support in cardiac anesthesia, but it was single-center and measured hypotension burden, not downstream complications or mortality.
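The time-weighted average (TWA) of MAP below 65 mm Hg combines the depth and duration of hypotension: the area between the threshold and the MAP trace while MAP is below threshold, divided by total monitoring time. The sketch below assumes regularly sampled MAP values and a rectangle-rule approximation. The trial's exact computation is not described here, and the sample trace is hypothetical.

```python
from typing import Sequence, Tuple


def hypotension_burden(
    map_mmhg: Sequence[float],
    interval_min: float = 1.0,
    threshold: float = 65.0,
) -> Tuple[float, float]:
    """Return (TWA below threshold in mm Hg, minutes spent below threshold)."""
    if not map_mmhg or interval_min <= 0:
        raise ValueError("need at least one sample and a positive interval")
    # Each sample is treated as representing one full sampling interval.
    area = sum(max(0.0, threshold - m) for m in map_mmhg) * interval_min
    minutes_below = sum(1 for m in map_mmhg if m < threshold) * interval_min
    total_minutes = len(map_mmhg) * interval_min
    return area / total_minutes, minutes_below


# Hypothetical ten-minute trace sampled once per minute
trace = [70, 62, 60, 66, 64, 75, 80, 72, 68, 70]
twa, minutes = hypotension_burden(trace)
print(twa, minutes)
```

Because TWA divides by total monitoring time, a brief deep dip and a long shallow one can produce the same value, which is one reason the trial also reports time spent in hypotension.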
Thoracic Surgery
- Lung nodule characterization from CT
- Surgical approach selection (VATS vs. thoracotomy)
- Lymph node metastasis prediction
Vascular Surgery
- AAA rupture risk prediction
- Vascular anatomy segmentation
- Endovascular procedure planning
Plastic Surgery
- Breast reconstruction outcome prediction
- Aesthetic outcome simulation
- Flap viability monitoring (research)
Breast Surgery
Claire (Perimeter Medical Imaging AI) – First FDA-Approved Intraoperative AI for Breast Cancer Surgery:
In March 2026, Claire received FDA Premarket Approval (PMA) as the first AI-enabled imaging device for intraoperative breast cancer margin assessment during breast-conserving surgery (lumpectomy) in patients with stage 0-III invasive ductal carcinoma or DCIS (Perimeter Medical Imaging AI, 2026).
- Uses proprietary wide-field optical coherence tomography (OCT) combined with AI trained on over 2 million breast tissue images
- Provides 10x higher resolution than X-ray or ultrasound at 2 mm depth (the clinically relevant margin width)
- Pivotal trial (206 patients): 88.1% overall accuracy in margin assessment, with statistically significant reduction in residual cancer vs. standard of care (p=0.0050)
- In 40% of patients, Claire detected residual disease missed by standard methods (palpation and specimen radiograph)
- Includes a predetermined change control plan allowing AI enhancements without further FDA review
Clinical significance: Positive surgical margins after lumpectomy require reoperation in approximately 20% of US cases. Real-time intraoperative margin assessment could reduce re-excision rates, additional surgeries, patient anxiety, and healthcare costs. Claire does not replace standard histopathology; it provides additional intraoperative information to guide shave excision decisions.
Critical Limitations and Risks
Immediacy of Harm: Unlike diagnostic errors that can be caught through physician review, intraoperative AI errors cause immediate, potentially irreversible patient harm.
Complexity of Surgical Judgment: Surgery requires integration of visual, tactile, and proprioceptive information with anatomical knowledge, pattern recognition from thousands of prior cases, and real-time adaptation to unexpected findings. AI doesn’t replicate this.
Medicolegal Implications: If a surgeon follows AI guidance and causes injury, the surgeon is responsible. If the surgeon ignores an AI warning and injury follows, plaintiff’s attorneys will argue that available technology was ignored. This creates defensive pressure to over-rely on AI even when clinical judgment suggests otherwise.
Technology Failure Modes: Computer vision degrades when blood, smoke, or optical artifacts obscure the field. ML models fail on out-of-distribution inputs (unusual anatomy, rare findings). Risk models fail when patient circumstances differ from the training data.
Trust Calibration: Surgeons must neither over-trust (following AI suggestions without verification) nor under-trust (ignoring useful AI alerts). Achieving appropriate calibration is difficult (Char et al., 2018).
Regulatory and Medicolegal Considerations
FDA Regulation of Surgical AI
- Surgical planning software: Class II (510(k) clearance)
- Surgical navigation systems: Class II (moderate-risk devices)
- Autonomous surgical robots: Would be Class III (PMA required)
- Risk calculators: Often treated as clinical decision support, which may fall outside FDA device regulation when the statutory exemption criteria are met
Medicolegal Principles
Surgeons remain legally responsible for AI-assisted decisions. Key documentation practices:
- Informed consent should mention AI use when material to patient decision
- Documentation should note AI tools used and how output was interpreted
- Malpractice risk if AI recommendation followed without independent verification
The Liability Dilemma
- Following AI that’s wrong: Surgeon liable for not exercising independent judgment
- Ignoring AI that’s right: Plaintiff attorneys argue surgeon ignored available technology
- Best practice: Document independent verification of AI outputs, explain clinical reasoning when overriding AI recommendations
Evidence-Based Guidelines for Surgical AI Adoption
Before Adopting Any Surgical AI:
- Demand evidence: Prospective validation studies in diverse populations, not just retrospective accuracy metrics (Nagendran et al., 2020)
- Understand training data: Was the model trained on cases like yours? (Procedure types, patient populations, institutional practices) (Beam & Kohane, 2018)
- Know the failure modes: How does the system fail? What are the error rates? What happens with unusual cases? (Vabalas et al., 2019)
- Assess workflow integration: Does this fit your existing workflow or require disruptive changes?
- Clarify liability: What does your malpractice carrier say about using this AI? What does hospital legal counsel advise?
- Verify regulatory status: Is this FDA-cleared? For what specific indication?
- Evaluate cost-effectiveness: Does the benefit justify the cost (both financial and cognitive/workflow burden)?
Safe Implementation Practices:
- Pilot testing: Start with low-stakes applications, expand carefully based on performance
- Parallel validation: Run AI alongside current practice, compare results before replacing current approach
- Defined oversight: Clear protocols for who reviews AI outputs and how discrepancies are resolved
- Incident reporting: Systems to capture AI errors or near-misses
- Ongoing validation: Monitor real-world performance, don’t assume initial validation persists indefinitely
- User training: Ensure all users understand AI capabilities, limitations, and appropriate use
- Informed consent: Discuss AI use with patients when material to their decision-making
Red Flags (Avoid These AI Systems):
- Claims of autonomous surgical decision-making
- Black-box models with no explanation of predictions
- Lack of prospective validation studies
- Vendors unwilling to disclose training data characteristics
- No mechanism for reporting errors or failures
- Regulatory status unclear or misrepresented
- Pressure to adopt without adequate evaluation period
Professional Society Guidelines on AI in Surgery
The American College of Surgeons has established significant AI infrastructure:
Leadership:
- Dr. Genevieve Melton-Meaux appointed as inaugural Chief Health Informatics Officer (2024)
- Practicing colorectal surgeon and director of the Center for Learning Health System Sciences at University of Minnesota
Educational Programs:
- “Artificial Intelligence and Machine Learning: Transforming Surgical Practice and Education” - online course available since 2023
- Clinical Congress sessions addressing ethical and regulatory AI considerations
Strategic Direction: The ACS emphasizes that surgeons must take the lead in integrating AI, defining how it affects their practice, and influencing what good patient care means. If surgeons don’t step up, what defines successful surgery will be decided by others.
AI Applications Recognized by ACS
The ACS recognizes three primary AI categories transforming surgical practice:
- Ambient AI: Automated documentation of surgical encounters and procedures
- Prediction tools: Perioperative risk assessment and outcome prediction
- Research and writing solutions: Literature review, manuscript preparation assistance
NSQIP and Risk Prediction
The ACS National Surgical Quality Improvement Program (NSQIP) Surgical Risk Calculator represents one of the most validated AI-adjacent tools in surgery:
- Developed from outcomes data on millions of surgical patients
- Provides patient-specific risk predictions for major complications
- Continuously updated with new outcome data
- Endorsed by ACS as a shared decision-making tool
SAGES Guidelines
The Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) has engaged with AI particularly in:
- Computer vision for surgical field analysis
- Real-time anatomical structure identification during laparoscopic procedures
- Surgical video analysis for quality improvement and training
Implementation Note: SAGES emphasizes that AI in the OR must be validated for the specific surgical context and population before clinical deployment.
Future Directions
Realistic Near-Term Progress (2-5 years)
- Routine integration of ML risk calculators into preoperative clinics
- Expanded use of AI surgical planning for complex cases
- Video-based quality feedback becoming standard in training
- Better postoperative monitoring with AI-augmented early warning systems
Medium-Term Possibilities (5-10 years)
- Improved real-time anatomical recognition (still with human verification required)
- Context-aware intraoperative decision support (suggestions, not autonomous action)
- Personalized surgical technique optimization based on patient anatomy
- Semi-autonomous robotic assistance for specific sub-tasks under continuous human supervision
Long-Term Speculation (10+ years)
- Highly accurate real-time tissue characterization (pathology-level information intraoperatively)
- Predictive models anticipating surgical course and complications with high accuracy
- Integration of multi-omic patient data into surgical decision-making
- Robotic systems handling increasing proportions of routine surgical tasks (still under surgeon control)
Unlikely Despite Hype
- Fully autonomous robotic surgery without surgeon in the loop
- AI replacing surgical judgment for complex, high-stakes decisions
- Elimination of surgical complications through AI
Conclusion
Surgery is fundamentally a human activity requiring manual skill, real-time judgment, and adaptation to unique patient circumstances. AI can enhance the cognitive work surrounding surgery (risk assessment, planning, quality improvement) and may eventually provide useful intraoperative information. But the surgeon’s hands, eyes, judgment, and responsibility remain central.
The most successful surgical AI applications will be those that respect the complexity of surgery, acknowledge uncertainty transparently, augment rather than replace expertise, and prioritize patient safety over technological impressiveness.
Surgeons should embrace AI as a powerful adjunct while maintaining the healthy skepticism, independent verification, and personal accountability that define good surgical practice.
Check Your Understanding
Scenario 1: AI Risk Calculator Overestimates Surgical Risk
You’re a colorectal surgeon evaluating an 82-year-old woman with Stage III colon cancer. She’s otherwise healthy: active, independent ADLs, no major comorbidities, ECOG 0.
AI surgical risk calculator (MySurgeryRisk) estimates:
- 30-day mortality risk: 18%
- Major complication risk: 45%
- Recommendation: “High risk - consider non-operative management”
Traditional ACS NSQIP calculator estimates:
- 30-day mortality: 3.2%
- Major complication: 12%
Patient’s oncologist refers to you stating “AI says surgery too risky. Recommend palliative chemo only.”
Your clinical assessment: Patient is good surgical candidate. Age alone shouldn’t preclude curative surgery. Frailty assessment normal. Cardiopulmonary exam reassuring.
Decision point: Do you:
- A. Follow AI recommendation, refer to medical oncology for palliative chemotherapy
- B. Override AI, recommend surgery based on your clinical judgment
Answer 1: What explains the discrepancy between AI and traditional calculators?
AI model likely over-weighted age without considering:
- Functional status: Patient is ECOG 0, independent, not frail
- Comorbidity burden: Minimal comorbidities despite age 82
- Fitness indicators: Normal cardiopulmonary reserve
Potential AI training bias:
- If AI trained on data where older patients had higher complication rates, model may learn “age 80+ = high risk” without distinguishing fit vs. frail
- Age as a proxy for frailty: Age correlates with frailty in the training data, so the model can treat age as a stand-in for frailty; THIS patient breaks that correlation (old but fit)
Traditional NSQIP calculator:
- Uses validated risk factors (ASA class, functional status, comorbidities)
- May better account for physiologic age vs. chronologic age
Answer 2: What are the liability implications of each choice?
Choice A (Follow AI, decline surgery):
Plaintiff argument (if patient dies from untreated cancer):
- Surgeon inappropriately deferred to AI algorithm
- Failed to exercise independent clinical judgment
- Denied patient potentially curative treatment based on flawed AI estimate
- Standard of care requires surgeons to assess patient individually, not defer to algorithm
Legal context: Decision support tools do not displace the physician’s duty of independent judgment, so following a tool that contradicts one’s own clinical assessment offers little protection
Choice B (Override AI, proceed with surgery):
If patient has major complication or dies:
Plaintiff argument:
- Surgeon ignored AI warning of 18% mortality risk
- Proceeded with high-risk surgery against AI recommendation
- Reckless disregard for patient safety
Defense argument:
- AI is decision support tool, not substitute for clinical judgment
- Surgeon’s assessment (frailty, functional status, cardiopulmonary reserve) more accurate than AI age-based estimate
- Standard of care requires individualized assessment, not algorithmic adherence
- Traditional NSQIP calculator (validated, widely used) supported decision
- Patient underwent informed consent understanding risks
Likely outcome: Defense verdict if surgeon documented thorough clinical assessment, explained AI discrepancy, obtained informed consent discussing both AI and traditional estimates.
Answer 3: How should you handle this AI-clinical judgment conflict?
Appropriate approach:
- Investigate AI discrepancy
- Review AI inputs: What features drove high-risk estimate?
- Compare with traditional validated tools (e.g., ACS NSQIP calculator)
- Consult surgical colleagues: Would they operate on this patient?
- Comprehensive clinical assessment
- Gait speed, grip strength (frailty markers)
- Cardiopulmonary exercise testing if available
- Geriatric assessment
- Functional status (independent vs. dependent ADLs)
- Multidisciplinary discussion
- Present case at tumor board
- Geriatric surgery consult if available
- Anesthesia risk assessment
- Transparent informed consent
- Discuss both AI and traditional risk estimates with patient
- Explain why estimates differ (age vs. physiologic status)
- Present alternatives (surgery, chemotherapy alone, observation)
- Document: “AI calculator estimated 18% mortality; however, clinical assessment suggests patient is physiologically fit. Traditional NSQIP calculator estimates 3.2% mortality. Discussed both estimates with patient. Patient understands risks, chooses surgery.”
- Document clinical reasoning
- “AI risk calculator estimates high risk primarily based on age 82. However, patient demonstrates excellent functional status (ECOG 0, independent ADLs, normal gait speed), minimal comorbidities, normal cardiopulmonary reserve. Traditional NSQIP calculator estimates mortality 3.2%. Clinical judgment: patient is appropriate surgical candidate. AI estimate likely over-weighted chronologic age without adequate consideration of physiologic fitness.”
Lesson: AI risk calculators are tools to inform, not dictate, surgical decisions. When AI conflicts with clinical judgment and validated traditional tools, surgeon must exercise independent assessment. Age alone should not preclude surgery in fit older adults. Document thorough reasoning when overriding AI recommendations.
Scenario 2: Intraoperative AI Misidentifies Critical Anatomy
You’re performing robotic-assisted partial nephrectomy for small renal mass using da Vinci Xi with integrated AI “Surgical Intelligence” system.
AI system features:
- Real-time anatomical labeling (kidney, renal artery, renal vein, ureter, tumor)
- Proximity alerts when instruments near critical structures
- Augmented reality overlay on surgical view
Intraoperative event:
During hilar dissection, AI labels renal artery and renal vein on display. You prepare to clamp renal artery for tumor excision.
Your visual assessment: Structure labeled “renal artery” appears larger than expected, bluish tint, pulsations not prominent.
Uncertainty: Is this truly renal artery or is AI mislabeling renal vein as artery?
Decision point: Do you:
- A. Trust AI label, clamp structure labeled “renal artery”
- B. Pause, verify anatomy manually before clamping
You choose: Option B (pause and verify)
Manual verification: Doppler ultrasound confirms structure labeled “renal artery” is actually renal vein. True renal artery is 2 mm posterior, unlabeled by AI.
If you had clamped based on AI label: Would have clamped renal vein, not artery → inadequate ischemic control → bleeding during tumor excision, potential need for total nephrectomy.
Answer 1: Why did the AI mislabel critical anatomy?
AI computer vision failure modes:
Anatomical variation: This patient had variant renal vascular anatomy (early branching, aberrant vessel course)
- AI trained on typical anatomy
- Variants (present in 20-30% of patients) not well-represented in training data
Tissue appearance similarity: Renal artery and vein can appear similar on video (both red/pink, both tubular)
- AI relies on position, caliber, pulsatility
- In variant anatomy, typical positional relationships disrupted
Partial occlusion: Surgical manipulation may have partially occluded the true artery → reduced pulsations → AI failed to recognize it as an artery and left it unlabeled
Confidence threshold: AI may have been 60-70% confident (below human comfort level) but still displayed label without uncertainty indication
Answer 2: What are the liability implications if you had clamped the wrong vessel?
If you clamped renal vein instead of artery:
Immediate consequences:
- Inadequate tumor ischemia → bleeding during excision
- Potential renal vein thrombosis
- May require total nephrectomy instead of partial
- Patient loses kidney function unnecessarily
Malpractice analysis:
Plaintiff argument:
- Surgeon blindly followed AI labeling without manual verification
- Failed to exercise fundamental surgical principle: verify anatomy before clamping/cutting
- Fell below standard of care by deferring anatomical judgment to AI
- Patient lost kidney due to surgeon’s inappropriate reliance on technology
Defense argument:
- AI was marketed as “surgical intelligence” system
- Reasonable to rely on technology validated by manufacturer, FDA-cleared
- Anatomical variation not surgeon’s fault
- Damage was not from negligence, but from AI error
Likely outcome:
- Plaintiff verdict likely: Courts hold surgeons to personal anatomical verification standard
- “AI told me to” is NOT valid defense
- Fundamental principle: surgeon must personally verify anatomy before irreversible action
- FDA clearance of AI tool does not absolve surgeon of personal responsibility
Illustrative case: In Smith v. Hospital (hypothetical but representative), a surgeon relied on a navigation system for spine surgery and placed a pedicle screw in the wrong location, causing nerve injury. The court ruled the surgeon liable despite the navigation system error: “technology augments but does not replace surgeon’s duty to verify.”
Answer 3: What are the appropriate use principles for intraoperative AI?
Surgical AI as “junior resident”:
- AI suggestions are hypotheses, not facts
- AI labels = “This might be renal artery”
- Surgeon verifies = “I confirm this is renal artery”
- Verify before irreversible action
- Before clamping, cutting, coagulating: manual confirmation
- Use additional tools: Doppler, manual palpation, ICG angiography, direct visualization
- Heightened skepticism in variant anatomy
- If anatomical landmarks don’t match expected positions
- If AI labels conflict with visual assessment
- If patient has known anatomical variants (duplicated vessels, horseshoe kidney)
- Demand uncertainty quantification
- AI should display confidence levels
- “Renal artery (92% confident)” vs. “Renal artery (60% confident)”
- Low confidence → require additional verification
- Continuous cross-checking
- Compare AI labels with your visual assessment at each step
- If discrepancy, investigate before proceeding
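The uncertainty-quantification principle above can be made concrete with a small display rule. The Python sketch below shows one way a labeling overlay might attach a confidence value to each label and flag low-confidence labels for verification. The function name, the 0.90 cutoff, and the warning text are assumptions for illustration and do not describe any vendor’s system.

```python
def render_label(structure, confidence, verify_below=0.90):
    """Format an anatomy label with its confidence for on-screen display.

    `confidence` is the model's probability in [0, 1]. Labels below the
    hypothetical `verify_below` cutoff carry an explicit verification prompt.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    text = f"{structure} ({round(confidence * 100)}% confident)"
    if confidence < verify_below:
        text += " - VERIFY BEFORE CLAMPING"
    return text


high = render_label("Renal artery", 0.92)
low = render_label("Renal artery", 0.60)
print(high)  # Renal artery (92% confident)
print(low)   # Renal artery (60% confident) - VERIFY BEFORE CLAMPING
```

A displayed confidence only helps if it is calibrated, and even a high-confidence label remains a hypothesis until the surgeon confirms it.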
Institutional safeguards:
- Training requirements
- Surgeons using AI-augmented systems must complete training on:
- AI failure modes
- When to trust vs. verify AI
- Manual verification techniques
- Quality assurance
- Review cases where AI labeling was incorrect
- Share at M&M conferences
- Track AI error rates by anatomy type, procedure
- Documentation
- When AI labeling conflicts with surgeon assessment, document:
- “AI labeled [structure] as [label]; however, manual verification with [Doppler/ICG/palpation] confirmed [correct identity]”
Lesson: Intraoperative AI is assistive, not authoritative. Surgeons remain responsible for anatomical identification regardless of AI labels. Verify critical anatomy manually before irreversible actions. “Trust but verify” is insufficient. Standard should be “Verify independently, AI assists.”
Scenario 3: Postoperative AI Alert Fatigue
You’re surgical quality director implementing AI-based early warning system (Rothman Index, commercial product) for postoperative complication detection.
AI system: Analyzes vital signs, lab values, nursing assessments every 15 minutes. Generates alert when patient predicted to be at increased risk for:
- Sepsis
- Respiratory failure
- Acute kidney injury
- Need for ICU transfer
Month 1 performance:
- Alerts generated: 847 alerts across 320 postoperative patients (2.6 alerts per patient)
- True positives: 23 patients developed complications flagged by AI
- False positives: 824 alerts did not correspond to actual complications
- False alert proportion (1 − PPV, referred to here as the false positive rate): 97.3%
- Positive predictive value: 2.7%
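The Month 1 figures follow directly from the alert counts. The short Python check below reproduces them, treating each of the 23 flagged patients as one true-positive alert, as the scenario’s numbers do; the variable names are illustrative. Note that the 97.3% is the share of alerts that were false (1 − PPV). A false positive rate in the strict sense, FP / (FP + TN), cannot be computed here because the scenario does not report true negatives.

```python
alerts_total = 847
true_positive_alerts = 23
patients = 320

false_positive_alerts = alerts_total - true_positive_alerts
ppv = true_positive_alerts / alerts_total                  # share of alerts that were real
false_alert_fraction = false_positive_alerts / alerts_total
alerts_per_patient = alerts_total / patients

print(f"False positive alerts: {false_positive_alerts}")        # 824
print(f"PPV per alert: {ppv:.1%}")                              # 2.7%
print(f"False alert fraction: {false_alert_fraction:.1%}")      # 97.3%
print(f"Alerts per patient: {alerts_per_patient:.1f}")          # 2.6
```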
Clinical impact:
- Nursing staff overwhelmed by alerts
- Most alerts dismissed as “AI crying wolf”
- Alert fatigue setting in (nurses ignoring alerts)
Week 4 critical event:
- 62-year-old man, post-colectomy day 2
- AI generates alert at 2 AM: “High risk for sepsis - recommend immediate evaluation”
- Night nurse dismisses alert (patient appears stable, vital signs acceptable)
- No physician notification
- 6 AM: Patient found hypotensive (BP 82/45), tachycardic (HR 128), altered mental status
- Diagnosis: Anastomotic leak with peritonitis and sepsis
- Patient requires emergent return to OR, ICU care
- Prolonged hospital stay, family files complaint: “Why wasn’t the AI alert acted on?”
Answer 1: What caused the alert fatigue?
High false positive rate driven by:
- Low disease prevalence
- True complication rate: ~7% of post-op patients
- AI optimized for high sensitivity (catches 23/25 true complications = 92% sensitivity)
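- But: The 2.7% PPV is counted per alert (23 of 847), and repeated alerts on the same patients inflate the denominator. It cannot be reproduced per patient: at ~7% prevalence and 92% sensitivity, per-patient PPV cannot fall below about 6.5% even at zero specificity, and with 85% specificity it would be about 32%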
- Threshold calibration
- AI vendor set low threshold to maximize sensitivity (fear of missing complications)
- Resulted in extreme false positive burden
- Lack of clinical context
- AI analyzes physiologic data only
- Does not know: patient just returned from 2-hour physical therapy session (explains elevated HR), patient received fluid bolus (explains improved BP trends), patient had expected postoperative fever
- Poor alarm design
- All alerts same priority level
- No distinction between “mild concern” vs. “urgent evaluation needed”
- No incorporation of clinical trajectories (improving vs. worsening trends)
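Bayes’ rule links sensitivity, specificity, and prevalence to PPV at the patient level. The Python sketch below applies it to the sensitivity and prevalence quoted above, with 85% specificity as an example value; the function name is illustrative. It works per patient, whereas the 2.7% reported for Month 1 is counted per alert (23 of 847), so the two numbers are not directly comparable.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule (all inputs in [0, 1])."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


per_patient_ppv = ppv_from_rates(0.92, 0.85, 0.07)
floor_ppv = ppv_from_rates(0.92, 0.0, 0.07)  # worst case: every healthy patient flagged

print(f"Per-patient PPV at 85% specificity: {per_patient_ppv:.1%}")  # 31.6%
print(f"Per-patient PPV at 0% specificity:  {floor_ppv:.1%}")        # 6.5%
```

The gap between roughly 32% per patient and 2.7% per alert suggests that repeated alerts on the same patients, not sensitivity alone, drive much of the burden in this scenario.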
Alert fatigue:
- 824 false positives → nurses learn “AI alerts usually wrong”
- Cognitive bias: When 97.3% of alerts are false, dismissing alerts becomes learned behavior
- The 23 true positives get lost in noise
Answer 2: Who is liable for the missed anastomotic leak?
Potentially both hospital and individual nurse:
Hospital institutional liability:
Plaintiff argument:
- Hospital deployed AI system with 97.3% false positive rate
- Created alert fatigue environment where critical alerts ignored
- Failed to calibrate system before clinical deployment
- Should have monitored alert fatigue, intervened when nurses began dismissing alerts
Nursing liability:
Plaintiff argument:
- Nurse dismissed AI alert without evaluating patient
- Failed to notify physician of high-risk alert
- Did not document why alert was dismissed
- Fell below nursing standard of care
Defense argument (nursing):
- 97.3% false positive rate meant 97 of every 100 alerts were false
- Nurse made reasonable judgment based on clinical assessment (patient appeared stable)
- Hospital created untenable alert burden
- Individual nurse cannot be expected to thoroughly evaluate every alert at an average of 2.6 alerts per patient, nearly all of them false
Likely outcome:
- Shared liability: Hospital bears primary responsibility for deploying poorly calibrated system
- Individual nurse may bear some liability for not documenting assessment and physician notification
Answer 3: How should AI early warning systems be implemented safely?
System calibration:
- Acceptable false positive rate
- Target PPV ≥10-15% (not 2.7%)
- May require reducing sensitivity from 92% → 70-75%
- Trade-off: Catch fewer complications, but those caught are more likely real
- Tiered alert system
- Low priority (informational): “Monitor patient closely”
- Medium priority (nursing assessment): “Evaluate patient within 1 hour”
- High priority (physician notification): “Urgent evaluation needed, notify MD immediately”
- Reserve high-priority alerts for PPV >30%
- Clinical context integration
- Suppress alerts during expected post-op recovery (first 24 hours)
- Incorporate clinical context (patient just ambulated, received fluid bolus, normal post-op fever)
- Trend analysis (worsening vs. stable vs. improving)
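The tiered scheme above can be sketched as a simple mapping from an alert’s estimated PPV to a priority level. In the Python sketch below, the 30% cutoff for high priority comes from the text. Using 10% as the medium cutoff and promoting worsening trends to medium are assumptions made for illustration, as are the function and dictionary names.

```python
ACTIONS = {
    "low": "Monitor patient closely",
    "medium": "Evaluate patient within 1 hour",
    "high": "Urgent evaluation needed, notify MD immediately",
}


def alert_tier(estimated_ppv, worsening_trend=False):
    """Assign a priority tier from an alert's estimated PPV.

    High priority is reserved for PPV > 30% (from the text). The 10% medium
    cutoff and the trend-based promotion are hypothetical choices.
    """
    if estimated_ppv > 0.30:
        return "high"
    if estimated_ppv >= 0.10 or worsening_trend:
        return "medium"
    return "low"


tier = alert_tier(0.12)
print(tier, "->", ACTIONS[tier])  # medium -> Evaluate patient within 1 hour
```

In practice the estimated PPV for each score band would come from local validation data, and the cutoffs would be revisited as part of the feedback loop described under workflow integration.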
Workflow integration:
- Alert response protocol
- High-priority alert → Mandatory nursing assessment within 15 minutes + physician notification
- Document: “AI alert reviewed. Patient assessed. Findings: [stable vs. concerning]. Action: [continued monitoring vs. physician notified].”
- Feedback loop
- Track AI alert accuracy
- Monthly review: How many alerts were true positives?
- Adjust thresholds based on performance
- Human oversight
- Nurse or physician reviews AI alerts, decides which require action
- AI does not page physician directly (human gatekeeper)
Quality monitoring:
- Track alert fatigue
- Monitor alert dismissal rates
- If >80% of alerts dismissed without assessment → system is failing
- Survey staff on alert burden monthly
- Audit missed complications
- For every complication, determine: Did AI alert? Was alert acted on?
- If multiple complications missed due to dismissed alerts → pause system, recalibrate
- Continuous improvement
- Vendor partnership: Provide feedback on false positives
- Request threshold adjustment or better risk stratification
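The monitoring steps above reduce to a few counts that can be recomputed each month. The Python sketch below summarizes an alert log and flags the system when more than 80% of alerts are dismissed without assessment, the limit given in the text. The log format, key names, and function name are hypothetical.

```python
def audit_alerts(alert_log, dismissal_limit=0.80):
    """Summarize one review period of alerts.

    `alert_log` is a list of dicts with boolean keys 'assessed' (a clinician
    evaluated the patient) and 'true_positive' (a complication was confirmed).
    """
    total = len(alert_log)
    if total == 0:
        return {"total": 0, "dismissal_rate": 0.0, "ppv": 0.0, "system_failing": False}
    dismissed = sum(1 for a in alert_log if not a["assessed"])
    confirmed = sum(1 for a in alert_log if a["true_positive"])
    dismissal_rate = dismissed / total
    return {
        "total": total,
        "dismissal_rate": dismissal_rate,
        "ppv": confirmed / total,
        "system_failing": dismissal_rate > dismissal_limit,
    }


# Hypothetical month: 10 alerts, 9 dismissed without assessment, 1 confirmed
log = [{"assessed": False, "true_positive": False} for _ in range(9)]
log.append({"assessed": True, "true_positive": True})
summary = audit_alerts(log)
print(summary["system_failing"])  # True
```

A summary like this complements case-by-case audits of missed complications but does not replace them, since a dismissed alert that turned out to be real is invisible in the aggregate rate.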
Lesson: AI early warning systems can improve outcomes only if positive predictive value is high enough to avoid alert fatigue. A system with 97% false positive rate creates more harm (ignored alerts, missed complications) than benefit. Implementation requires careful calibration, tiered alerts, clinical context, and continuous monitoring. “High sensitivity” is not enough. PPV must be clinically actionable (≥10-15% minimum).