AI Tools Every Physician Should Know
Over 1,200 AI medical devices have FDA clearance, but most physicians don’t know which ones actually work. This chapter cuts through marketing hype to identify validated tools you can deploy today: diagnostic AI with prospective trial data, ambient scribes that can save 1-2 hours of documentation time daily, and specialty-specific applications backed by peer-reviewed evidence.
After reading this chapter, you will be able to:
- Identify FDA-cleared and clinically validated AI tools by specialty
- Understand capabilities and limitations of each tool category
- Evaluate tools appropriate for your practice setting
- Navigate privacy, liability, and reimbursement considerations
- Distinguish evidence-based tools from marketing hype
- Access hands-on resources for learning AI tools
Category 1: Clinical Decision Support (CDS)
Traditional CDS (Pre-AI Era)
UpToDate - Type: Evidence-based clinical reference - AI Features: Recently adding AI-powered literature review, question answering - Evidence: Widely adopted, associated with improved outcomes in observational studies (Isaac et al., 2012) - Cost: Institutional/individual subscriptions ($500-700/year individual) - Strength: Trusted source, regularly updated, covers all major specialties - Limitation: Not “AI” in modern sense, though adding AI features
DynaMedex with Dyna AI - Type: Clinical decision support combining DynaMed (evidence-based clinical information) and Micromedex (drug information) with AI integration - AI Features: Dyna AI quickly surfaces concise, evidence-based clinical information, drawing exclusively on DynaMedex sources - Cost: Free for ACP members - Access: Part of the ACP AI Resource Hub (https://www.acponline.org/clinical-information/clinical-resources-products/artificial-intelligence-ai-resource-hub)
DXplain (Massachusetts General Hospital) - Type: Differential diagnosis generator - Function: Enter findings → generates ranked differential - Evidence: Used since 1980s, educational tool primarily - Cost: Free for medical professionals - Strength: Broad knowledge base - Limitation: Generates broad differentials; clinical judgment is required to narrow them
Isabel Healthcare - Type: Differential diagnosis support - Function: Enter patient presentation → suggests diagnoses - Evidence: Some validation studies, primarily educational use - Cost: Subscription-based - Limitation: Accuracy variable, requires clinical interpretation
Modern AI-Enhanced CDS
Epic Sepsis Model - Type: EHR-integrated sepsis prediction - Function: Real-time risk score based on vital signs, labs - Evidence: CONTROVERSIAL - External validation showed 33% sensitivity (missed 67% of cases), 12% PPV (Wong et al., 2021) - Cost: Included with Epic EHR - Strength: Integrated workflow - Limitation: High false positive rates, mixed evidence for clinical benefit - Verdict: Use with caution, understand limitations
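To make those numbers concrete, here is a back-of-the-envelope sketch of the alert burden a 12% PPV implies. The monthly case count is an illustrative assumption, not a figure from Epic or from Wong et al.; only the sensitivity and PPV come from the study.

```python
# Back-of-the-envelope alert burden implied by Wong et al. (2021):
# 33% sensitivity, 12% PPV. The monthly case count is an illustrative
# assumption, not a figure from Epic or the study.

sensitivity = 0.33   # fraction of true sepsis cases the model flags
ppv = 0.12           # fraction of fired alerts that are true positives

true_sepsis_cases = 50                            # assumed cases/month on a unit
caught = sensitivity * true_sepsis_cases          # true-positive alerts
total_alerts = caught / ppv                       # total alerts implied by the PPV
false_alerts = total_alerts - caught

print(f"Sepsis cases caught:    {caught:.0f} of {true_sepsis_cases}")
print(f"Total alerts fired:     {total_alerts:.0f}")
print(f"False alerts to triage: {false_alerts:.0f}")
# ~16 catches at the cost of ~121 false alarms per month: the alarm-fatigue
# problem behind the "use with caution" verdict.
```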
WAVE Clinical Platform (ExcelMedical) - Type: Continuous vital sign monitoring + early warning scores - Function: ICU/step-down monitoring, deterioration prediction - Evidence: Some validation in specific settings - Cost: Institutional licensing - Use case: Hospital early warning systems
Category 2: Diagnostic AI
Radiology AI (Comprehensive List)
Intracranial Hemorrhage Detection
Aidoc has become a widely deployed solution for many radiology departments, with implementation at 1,000+ hospitals. The system detects intracranial hemorrhages, pulmonary embolisms, and C-spine fractures that might otherwise be missed or delayed in the queue.
The sensitivity exceeds 95% for ICH in validation studies. Unlike many AI tools that radiologists ignore, this one gets used because the mobile notifications integrate smoothly into existing PACS workflows. Cost is $20-50K per scanner annually, which adds up for multi-scanner facilities. However, if it catches one missed bleed or gets neurosurgery mobilized 30 minutes faster, the return on investment is substantial.
Viz.ai takes a different approach, focusing on care coordination. Beyond detection, when it flags a large vessel occlusion stroke, it simultaneously alerts the stroke team, neurology, and interventional radiology. The published data show reduced time-to-treatment (Figurelle et al., 2023), with the VISIION study demonstrating a 39% reduction in door-to-groin time for off-hours large vessel occlusion cases.
The platform has since expanded to pulmonary embolism, aortic dissection, and other time-critical diagnoses. Deployed at 1,700+ hospitals, Viz.ai works best in systems where rapid team mobilization impacts patient outcomes. Cost varies by institutional contract.
RapidAI has a significant presence in the neuroradiology space. Beyond hemorrhage detection, it provides ASPECTS scoring, perfusion analysis, and hemorrhage volume quantification. These tools help neurology and neurosurgery make treatment decisions faster. For institutions handling complex stroke cases, RapidAI’s feature set warrants consideration.
Chest X-Ray AI
Lunit INSIGHT CXR tackles the bread-and-butter of emergency and hospital radiology: pneumothorax, nodules, consolidation, pleural effusion, cardiomegaly. The validation data shows solid sensitivity and specificity, and international deployment (particularly strong in Asia and Europe) suggests it performs reliably across different populations and imaging equipment.
What’s interesting about Lunit is that they’ve focused on doing common things well rather than promising to detect everything. That focus shows in the performance metrics.
Oxipit ChestLink does one thing: pneumothorax detection. That’s it. And it does it well, with over 95% sensitivity in validation. This narrow focus makes it perfect for ED and ICU triage where missing a pneumothorax has immediate consequences. If you need full chest X-ray analysis across multiple findings, look elsewhere. If you need rapid, reliable pneumothorax flagging, this is worth a look.
qXR from Qure.ai claims to detect 29 different chest X-ray findings, which sounds impressive until you dig into the validation data. Their strongest evidence comes from TB-endemic regions, which makes sense given the company’s focus on emerging markets. The pricing is competitive, and for resource-limited settings where radiologist access is scarce, qXR provides real value. Just understand that a tool optimized for India’s public health challenges may perform differently in US community hospitals.
Mammography AI
iCAD ProFound AI dominates the US mammography AI market, and the dominance is earned. Validation studies show improved cancer detection rates without dramatically increasing false positives (the holy grail of screening mammography AI). Integration with major mammography vendors means most radiology practices can deploy it without replacing existing infrastructure.
The evidence base is solid. Widespread US deployment gives you the advantage of learning from other institutions’ implementation experiences. If you’re evaluating mammography AI, iCAD is the benchmark against which you’ll compare alternatives.
Lunit INSIGHT MMG offers a credible alternative, particularly if you’re in Europe or Asia where it has stronger market presence. The trials show non-inferiority to radiologists, which is the standard we should demand. What makes Lunit interesting is their international validation. If your patient population is demographically diverse, evidence from European and Asian deployments matters.
Hologic Genius AI is your choice if you already use Hologic mammography equipment. The smooth native integration is compelling: no middleware, no separate workstation, just AI baked into your existing workflow. Performance is solid, though not necessarily better than standalone options. The decision here is mostly about workflow simplicity versus best-in-class performance from independent vendors.
Other Imaging Modalities
Arterys (Cardiac MRI, CT Angiography) - FDA Clearance: Multiple - Function: Automated cardiac chamber quantification, vessel analysis - Evidence: Strong validation, time savings - Deployment: Academic medical centers, cardiology practices - Verdict: Leading cardiac imaging AI
HeartFlow FFR-CT - FDA Clearance: Yes - Function: CT-based fractional flow reserve (non-invasive) - Evidence: RCT evidence for reducing unnecessary catheterizations - Reimbursement: CPT codes established - Cost-effectiveness: Demonstrated in studies - Verdict: Excellent evidence, clinically impactful
Paige Prostate represents what medical AI should look like: narrow scope, prospective validation, clear clinical use case. FDA granted De Novo authorization in 2021 after the company demonstrated that their prostate biopsy cancer detection algorithm actually improves detection of high-grade cancer (Pantanowitz et al., 2020).
This isn’t replacing pathologists. It’s flagging suspicious regions for their review, reducing false negatives while maintaining human-in-the-loop oversight. Several pathology labs have deployed it clinically, which tells you the evidence convinced the people who actually have to stake their reputation on the results.
PathAI and Proscia are worth watching. PathAI has strong validation studies across GI, breast, and prostate pathology, with clinical deployments expanding. Proscia focuses on digital pathology infrastructure plus AI modules, which matters because workflow integration often determines whether AI gets used or ignored. No FDA clearances yet for their AI applications, but the research collaborations suggest they’re building systematically rather than rushing to market.
Dermatology AI
3Derm received FDA Breakthrough Device Designation in 2020 for melanoma risk assessment and targets primary care for dermatology triage. The system is still under clinical investigation and has not yet received FDA clearance. The validation studies that exist show performance varies significantly by skin type, a recurring problem with dermatology AI trained predominantly on lighter skin.
SkinVision and similar direct-to-consumer smartphone apps? I wouldn’t recommend them clinically. Variable validation, inconsistent performance, and a systematic review (Freeman et al., 2020) showed most consumer dermatology apps lack rigorous validation, with sensitivity ranging from 7% to 73%. Patients will use them anyway, so you should know what they are. Just don’t endorse them.
Ophthalmology AI
IDx-DR was the first autonomous AI diagnostic system authorized by the FDA (De Novo pathway, 2018). The prospective RCT showed 87.2% sensitivity and 90.7% specificity for diabetic retinopathy screening (Abràmoff et al., 2018).
Factors contributing to IDx-DR’s clinical adoption include narrow application (referable diabetic retinopathy: yes or no), clear clinical need (primary care physicians need retinal screening but lack ophthalmology expertise), reimbursement pathway (CPT 92229, approximately $50-80), and validation in real primary care settings, not just retrospective datasets.
Deployment has expanded in primary care offices, endocrinology clinics, and federally qualified health centers. For practices seeing diabetic patients with limited ophthalmology referral access, this system warrants consideration.
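Because predictive values depend on prevalence, the trial’s sensitivity and specificity are worth translating into what a positive or negative screen means in your clinic. A minimal sketch: the 8% prevalence of referable diabetic retinopathy is an illustrative assumption to adjust for your own population; sensitivity and specificity are from Abràmoff et al. (2018).

```python
# Convert IDx-DR's trial sensitivity/specificity into predictive values
# at an assumed screening prevalence. The prevalence is illustrative --
# substitute an estimate for your own diabetic population.

sens, spec = 0.872, 0.907     # Abràmoff et al. (2018)
prevalence = 0.08             # assumed rate of referable DR among screened patients

tp = sens * prevalence
fp = (1 - spec) * (1 - prevalence)
fn = (1 - sens) * prevalence
tn = spec * (1 - prevalence)

ppv = tp / (tp + fp)          # chance a "refer" result is truly referable DR
npv = tn / (tn + fn)          # chance a "no refer" result is truly negative

print(f"PPV: {ppv:.1%}  NPV: {npv:.1%}")
# At 8% prevalence: PPV ~45%, NPV ~98.8%. A negative screen is highly
# reassuring; positives still need ophthalmology confirmation.
```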
EyeArt from Eyenuk offers comparable performance, another FDA-cleared diabetic retinopathy screening option with solid validation. RetCAD tackles age-related macular degeneration risk assessment, expanding ophthalmology AI beyond diabetic retinopathy.
Category 3: Documentation and Ambient Scribe AI
FDA Note: These are NOT FDA-regulated (documentation aids, not diagnostic devices)
Ambient documentation represents a significant application of AI for reducing physician administrative burden. These systems address documentation time rather than diagnostic accuracy.
Nuance DAX (Dragon Ambient eXperience) is widely deployed in the ambient scribe market. The system listens to patient encounters, transcribes conversations, extracts clinical information, and auto-generates SOAP note drafts. Physicians review, edit, and sign the generated notes.
Vendor-funded studies report ~50% reduction in documentation time, though independent real-world studies show more modest results (8-15% reduction in active documentation time, with larger impact on after-hours work). Thousands of physicians across specialties have adopted ambient scribe technology. Cost runs approximately $600 per physician per month plus setup fees.
Critical caveat: physicians must review AI-generated notes. These systems make errors, miss nuance, and occasionally misinterpret what was said. However, editing a partially correct note is faster than generating one from scratch.
Abridge creates patient-shareable visit summaries in addition to clinical documentation. Recordings and written summaries allow patients to review encounters afterward. User satisfaction is high, and that patient engagement feature differentiates it from pure documentation tools. Cost is competitive with DAX.
Suki adds features beyond transcription, including order placement, ICD/CPT code lookup, and voice-enabled EHR navigation. Deployment is growing, particularly among physicians seeking complete voice-enabled workflows.
DeepScribe and Freed AI target smaller practices. DeepScribe offers ambient transcription for primary care and specialty clinics. Freed AI has a free tier, making it accessible for solo practitioners or small groups testing ambient documentation.
Implementation note: Ambient scribe technology can reduce documentation burden substantially. Physician satisfaction appears genuine, and time savings are measurable. Start with a trial period, expect an initial learning curve, and maintain the practice of reviewing all AI-generated content.
Implementation Considerations:
Benefits: - Time savings (up to 1-2 hours/day of documentation in vendor reports; independent studies are more modest) - Improved patient eye contact - Reduced burnout - After-hours documentation reduced
Considerations: - Requires physician review (AI makes errors) - Patient consent for recording - Privacy/security (HIPAA-compliant vendors only) - Cost (ROI depends on time saved, productivity gains) - Learning curve (initial weeks slower as physician adapts)
Category 4: Literature Search and Synthesis
PubMed / MEDLINE (with AI enhancements) - Free, covers most clinical scenarios - New features: AI-powered search refinement (limited) - Verdict: Still the gold standard, but time-consuming
Consensus (consensus.app) - Function: AI searches scientific papers, synthesizes findings - Use case: Quick literature review, evidence synthesis - Evidence: Growing adoption among researchers - Cost: Free tier, paid for advanced features - Verdict: Useful for rapid evidence gathering
Elicit (elicit.org) - Function: AI research assistant - finds papers, extracts key info - Use case: Literature review, research questions - Cost: Free tier, paid plans - Verdict: Helpful for systematic searches
Scite.ai - Function: Citation analysis - shows how papers cite each other (supporting, contrasting) - Use case: Evaluating strength of evidence, finding contradictory studies - Cost: Subscription - Verdict: Valuable for critical appraisal
ResearchRabbit - Function: Literature mapping, citation networks - Cost: Free - Verdict: Excellent for exploring research landscapes
Connected Papers - Function: Visual citation networks - Use case: Finding related papers - Cost: Free - Verdict: Great visualization tool
Category 5: Patient Communication
Patient Education
ChatGPT / GPT-4 (with extreme caution) - Capabilities: Generate patient education materials, explain diagnoses - Evidence: Can produce accurate information for common topics (Singhal et al., 2023) - Critical limitations: - Hallucinates (makes up plausible-sounding false information) - No access to patient-specific data - No liability/accountability - May generate outdated or incorrect guidance - Appropriate use: - Draft patient education materials (physician reviews/edits) - Simplify complex medical concepts (verify accuracy) - NOT for patient-specific medical advice - Verdict: Useful tool with physician oversight, NEVER autonomous patient advice
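To make the “draft, then physician review” workflow concrete, here is a minimal sketch using the OpenAI Python client. The model name and prompt wording are illustrative assumptions, any general-purpose LLM works the same way, and no patient-specific data should ever go into the prompt.

```python
# Minimal sketch: drafting (not finalizing) a patient education handout with an LLM.
# Requires: pip install openai, plus an OPENAI_API_KEY environment variable.
# Model name and prompt wording are illustrative assumptions.
# Never include PHI or patient-specific details in the prompt.
from openai import OpenAI

client = OpenAI()

draft = client.chat.completions.create(
    model="gpt-4o",  # illustrative; use whatever model your institution approves
    messages=[
        {"role": "system",
         "content": ("Draft a patient education handout at a 6th-grade reading "
                     "level. Do not give individualized medical advice. "
                     "Flag any statement a physician should verify.")},
        {"role": "user",
         "content": "Explain what a hemoglobin A1c test measures and why it matters."},
    ],
).choices[0].message.content

# The draft is a starting point only: verify every factual claim against a
# trusted source (e.g., UpToDate) before anything reaches a patient.
print(draft)
```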
Google Med-PaLM 2 / MedLM - Medical-specific LLM - Evidence: Expert-level performance on medical licensing exam benchmarks (MedQA) - Status: Now commercially available through MedLM to allowlisted Google Cloud healthcare customers in the U.S., not approved as a medical device - Verdict: Limited commercial availability, requires Google Cloud approval for medical use cases
Symptom Checkers (Patient-Facing)
Ada Health - Function: Symptom assessment, triage guidance - Evidence: Variable accuracy (30-60% for correct diagnosis in top 3) - Use case: Patient triage (ED vs. urgent care vs. PCP) - Verdict: Triage tool, not diagnostic
Buoy Health - Function: Symptom checker + care navigation - Evidence: Validation studies ongoing - Partnerships: Major health systems integrating - Verdict: Promising for patient navigation
K Health - Function: AI symptom assessment + telemedicine - Model: Subscription-based primary care - Verdict: Integrated care model
Caution on Symptom Checkers: - Accuracy limited (patients may not describe symptoms accurately) - Liability unclear if patients rely on recommendations - Best use: Triage, not diagnosis - Physicians should be cautious recommending specific tools
Category 6: Specialty-Specific Tools
Cardiology
HeartFlow FFR-CT (covered above)
Caption Health (GE HealthCare) - FDA Clearance: Yes - Function: AI-guided cardiac ultrasound acquisition - Use case: Point-of-care echo by non-experts - Evidence: Enables accurate image capture by novices - Verdict: Democratizes cardiac ultrasound
Eko Analysis - Function: Digital stethoscope + AI murmur detection - Use case: Primary care, cardiology - Evidence: Detects valvular heart disease - Verdict: Useful screening tool
Oncology
Tempus - Function: Genomic analysis + treatment matching - Use case: Precision oncology - Evidence: Widely used, NCCN-cited - Verdict: Leading precision oncology platform
Foundation Medicine - Function: Comprehensive genomic profiling - Use case: Cancer treatment selection - Evidence: FDA-cleared assays - Verdict: Gold standard tumor profiling
IBM Watson for Oncology ❌ - Status: DISCONTINUED after failures - Lesson: Marketing ≠ clinical validity
Emergency Medicine
Viz.ai suite (covered above - stroke, PE)
Epic Deterioration Index - Function: Patient deterioration prediction - Evidence: Variable - some validation, implementation challenges - Cost: Included with Epic - Verdict: Use with caution, understand limitations
Gastroenterology
Medtronic GI Genius - FDA Clearance: Yes - Function: AI-assisted colonoscopy (polyp detection) - Evidence: Increases adenoma detection rate - Use case: GI procedures - Verdict: Improves polyp detection
Hands-On: Evaluating AI Tools for Your Practice
Step 1: Identify Clinical Need
Ask: - What problem am I trying to solve? - Is this a real workflow pain point? - Will an AI solution improve patient outcomes, efficiency, or satisfaction?
Step 2: Evidence Review
Essential questions: - FDA-cleared? (Check FDA database: accessdata.fda.gov/scripts/cdrh/cfdocs/cfPMN/pmn.cfm) - Peer-reviewed publications? (PubMed search) - Prospective validation? (Not just retrospective) - External validation? (Multiple institutions, populations) - Performance in MY setting? (Demographics, EHR, workflow)
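The clearance check can also be scripted: the FDA’s openFDA API exposes the 510(k) database programmatically. A minimal sketch follows; the device name searched is illustrative, and De Novo and PMA products live in separate openFDA endpoints, so a miss here is not proof that a product lacks authorization.

```python
# Minimal sketch: querying the openFDA 510(k) database for a device by name.
# The search term is illustrative. De Novo and PMA products are in other
# openFDA endpoints, so an empty result does not mean "uncleared".
import requests

resp = requests.get(
    "https://api.fda.gov/device/510k.json",
    params={"search": 'device_name:"ProFound"', "limit": 5},
    timeout=10,
)
resp.raise_for_status()

for rec in resp.json().get("results", []):
    print(rec.get("k_number"), rec.get("device_name"),
          rec.get("applicant"), rec.get("decision_date"))
```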
Step 3: Workflow Assessment
Integration: - EHR-integrated or standalone? - Number of clicks? - Time added or saved? - Who operates it? (Physician, MA, nurse?)
Step 4: Financial Analysis
Costs: - Licensing fees (annual, per-study, per-patient) - Hardware (servers, cameras, specialized equipment) - Personnel (training, IT support, clinical champions) - Maintenance and updates
ROI: - Time saved (value your time) - Reimbursement (CPT codes available?) - Quality metrics (value-based care bonuses) - Risk reduction (fewer malpractice claims) - Patient satisfaction (retention, referrals)
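The ROI math is simple enough to sanity-check on one screen. A minimal sketch: every input below is an assumption to replace with your own contract terms and pilot measurements.

```python
# Back-of-the-envelope ROI for an ambient scribe subscription.
# All inputs are illustrative assumptions -- substitute your own numbers.

monthly_license = 600          # $/physician/month (typical ambient scribe pricing)
minutes_saved_per_day = 45     # measured in your pilot, not the vendor brochure
clinic_days_per_month = 16
physician_hourly_value = 150   # $/hour: salary cost, or revenue per clinical hour

hours_saved = minutes_saved_per_day / 60 * clinic_days_per_month
value_of_time = hours_saved * physician_hourly_value
net_monthly = value_of_time - monthly_license

print(f"Hours saved/month: {hours_saved:.1f}")
print(f"Value of time:     ${value_of_time:,.0f}")
print(f"Net monthly ROI:   ${net_monthly:,.0f}")
# 45 min/day x 16 days = 12 h -> $1,800 of time vs. $600 license = +$1,200/month.
# Omits reimbursement changes, quality bonuses, and burnout/retention effects.
```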
Step 5: Pilot Testing
Before full deployment: - Retrospective testing on YOUR data - Small pilot with limited users - Collect feedback (physician, patient, staff) - Measure impact (time, accuracy, satisfaction) - Identify failure modes
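For the retrospective test, the core computation is a confusion matrix over your own labeled cases. A minimal sketch, with placeholder data standing in for your institution’s chart-reviewed studies:

```python
# Minimal sketch: scoring an AI tool retrospectively against local ground truth.
# `cases` pairs the tool's flag with the chart-review label -- placeholder data.
cases = [
    (True, True), (True, False), (False, False), (False, True), (True, True),
]

tp = sum(ai and truth for ai, truth in cases)
fp = sum(ai and not truth for ai, truth in cases)
fn = sum(not ai and truth for ai, truth in cases)
tn = sum(not ai and not truth for ai, truth in cases)

sensitivity = tp / (tp + fn) if tp + fn else float("nan")
ppv = tp / (tp + fp) if tp + fp else float("nan")

print(f"Sensitivity: {sensitivity:.1%}  PPV: {ppv:.1%}")
# Compare these local numbers against the vendor's published figures before
# expanding the pilot; a large gap is itself a finding.
```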
Step 6: Continuous Monitoring
Post-deployment: - Quarterly performance reviews - User feedback collection - False positive/negative tracking - Clinical outcome monitoring - Vendor support responsiveness
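False positive/negative tracking lends itself to a simple quarterly check: compute PPV per quarter and flag drops against your baseline. A minimal sketch, with placeholder counts and an assumed drift threshold:

```python
# Minimal sketch: quarterly PPV tracking to catch performance drift.
# Counts are placeholders -- feed in your own alert audit numbers.
quarterly = {
    "2025-Q1": {"true_alerts": 40, "false_alerts": 60},
    "2025-Q2": {"true_alerts": 35, "false_alerts": 80},
    "2025-Q3": {"true_alerts": 22, "false_alerts": 110},
}
DRIFT_THRESHOLD = 0.10  # flag a >10-point PPV drop from baseline (assumed cutoff)

baseline = None
for quarter, c in quarterly.items():
    ppv = c["true_alerts"] / (c["true_alerts"] + c["false_alerts"])
    baseline = ppv if baseline is None else baseline  # first quarter is baseline
    flag = "  <-- investigate (possible drift)" if baseline - ppv > DRIFT_THRESHOLD else ""
    print(f"{quarter}: PPV {ppv:.1%}{flag}")
```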
Red Flags: When to Avoid AI Tools
- No FDA clearance for diagnostic applications (wellness/CDS exceptions)
- No peer-reviewed publications (only vendor whitepapers)
- No external validation (only tested at vendor institution)
- Vendor refuses to share performance data (lack of transparency)
- Claims that seem too good to be true (“99.9% accuracy,” “replaces physicians”)
- Unclear data use policies (who owns data, how is it used)
- Poor customer references (other physicians had negative experiences)
- Overly complex integration (requires major workflow changes)
- No clear clinical value proposition (solution looking for problem)
Resources for Staying Current
FDA AI Device Database: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-enabled-medical-devices
Medical AI Research: - npj Digital Medicine (Nature) - The Lancet Digital Health - JAMA Network Open (AI sections) - Radiology: Artificial Intelligence
Professional Organizations: - Society for Imaging Informatics in Medicine (SIIM) - American Medical Informatics Association (AMIA) - Radiological Society of North America (RSNA) AI sessions
Conferences: - RSNA (annual AI showcase) - HIMSS (health IT focus) - ML4H (Machine Learning for Health - NeurIPS workshop)
The Clinical Bottom Line
- Prioritize evidence: FDA clearance, peer-reviewed validation, prospective studies
- Start with proven applications: diabetic retinopathy screening, ambient documentation, specific radiology tasks
- Evaluate for YOUR setting: external validation data, your patient population, your workflow
- Calculate real ROI: time savings, quality metrics, reimbursement, risk reduction
- Pilot before full deployment: test on your data, collect feedback, identify failure modes
- Avoid red flags: no evidence, no transparency, too-good-to-be-true claims
- Continuous monitoring is essential: performance can drift, and vigilance is required
- Patient communication matters: transparency, consent, addressing concerns
- You remain responsible: AI is a tool, and liability stays with the physician
- The field is evolving rapidly: stay current and re-evaluate tools regularly
Next Chapter: We’ll dive deep into Large Language Models (ChatGPT, GPT-4, Med-PaLM) for clinical practice: capabilities, limitations, and safe usage guidelines.