Dermatology
Dermatology’s visual nature made it an early target for AI, but an algorithmic bias crisis threatens to worsen existing health disparities. Melanoma detection algorithms trained predominantly on light skin show a 27-percentage-point sensitivity gap between Fitzpatrick I-II and V-VI skin. Consumer melanoma apps achieve sensitivities as low as 7%. Black patients with melanoma already face diagnostic delays of 1-2 years and present with advanced-stage disease in 52% of cases. Deploying biased AI would deepen these inequities. This chapter examines what works, what fails, and why skin tone matters.
After reading this chapter, you will be able to:
- Evaluate AI systems for skin cancer detection and dermatologic diagnosis
- Understand performance variations across different skin tones (Fitzpatrick scale I-VI)
- Critically assess direct-to-consumer melanoma detection apps and their dangers
- Navigate FDA regulatory landscape for dermatology AI devices
- Recognize dataset bias and its clinical consequences
- Apply evidence-based frameworks for evaluating dermatology AI before clinical adoption
- Counsel patients on appropriate and inappropriate use of skin lesion apps
Introduction: The Promise and Peril
Dermatology presents a unique opportunity for computer vision AI. Unlike radiology (where AI analyzes standardized DICOM images) or pathology (where AI analyzes stained tissue slides), dermatology involves visual assessment of external skin surfaces. This accessibility has led to:
The promise:
- Non-invasive image capture (smartphone, dermoscopy, clinical photography)
- Large potential training datasets (millions of skin lesion images)
- Clear diagnostic tasks (benign vs. malignant, specific diagnoses)
- Potential to expand access (teledermatology, underserved areas)
The peril:
- Training data heavily biased toward light skin
- Direct-to-consumer apps marketed without adequate validation
- False reassurance leading to delayed diagnosis of melanoma
- Worsening of existing racial health disparities
This chapter focuses on what evidence reveals about dermatology AI performance, with particular emphasis on the skin tone bias crisis that threatens to make AI a tool that widens rather than narrows health inequity.
Part 1: The Skin Tone Bias Crisis
The Fitzpatrick Scale
The Fitzpatrick skin type classification system categorizes skin into six types based on response to UV exposure:
| Fitzpatrick Type | Description | Typical Ethnicity |
|---|---|---|
| I | Always burns, never tans | Very fair, freckles |
| II | Usually burns, tans minimally | Fair |
| III | Sometimes burns, tans uniformly | Light brown |
| IV | Rarely burns, tans easily | Moderate brown |
| V | Very rarely burns, tans darkly | Dark brown |
| VI | Never burns, deeply pigmented | Very dark brown to black |
Clinical relevance: Melanoma presentation, diagnostic features, and differential diagnoses vary significantly across Fitzpatrick types.
The Data Problem
Major dermatology training datasets show severe imbalance:
HAM10000 (Human Against Machine with 10,000 training images):
- 81% Fitzpatrick I-III (light skin)
- 19% Fitzpatrick IV-VI (medium to dark skin)
- Most widely used dataset for skin lesion classification
ISIC Archive (International Skin Imaging Collaboration):
- 74% Fitzpatrick I-III
- Only 9% Fitzpatrick V-VI
- Used for annual melanoma detection challenges
Edinburgh Dermofit Library:
- 93% light skin
- Only 7% dark skin
Fitzpatrick17k (Groh et al., 2021):
- First balanced dataset across skin tones
- Created specifically to address bias in existing datasets
- 16,577 images with expert-labeled Fitzpatrick types
Why this matters: Machine learning algorithms learn patterns from training data. If training data is predominantly light skin, algorithms learn to recognize pathology on light skin. Performance on dark skin suffers.
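To make this concrete, here is a minimal sketch of how a team might audit the Fitzpatrick distribution of a candidate training set before trusting a model trained on it. The CSV file and the `fitzpatrick_type` column name are hypothetical placeholders, not the actual schema of HAM10000 or ISIC.

```python
from collections import Counter
import csv

# Hypothetical metadata file and column names; illustrative only.
LIGHT, DARK = {"I", "II", "III"}, {"IV", "V", "VI"}

def audit_fitzpatrick(metadata_csv: str) -> None:
    """Print the Fitzpatrick distribution of a lesion-image dataset."""
    with open(metadata_csv, newline="") as f:
        types = [row["fitzpatrick_type"].strip() for row in csv.DictReader(f)]

    counts = Counter(types)
    total = sum(counts.values())
    light = sum(counts[t] for t in LIGHT)
    dark = sum(counts[t] for t in DARK)

    for t in ["I", "II", "III", "IV", "V", "VI"]:
        print(f"Fitzpatrick {t}: {counts.get(t, 0):6d} ({counts.get(t, 0) / total:5.1%})")
    print(f"Light (I-III): {light / total:.1%}   Medium/dark (IV-VI): {dark / total:.1%}")

# audit_fitzpatrick("lesion_metadata.csv")
```

If the light-skin share approaches the 74-93% figures above, a vendor's headline accuracy number tells you little about performance on Fitzpatrick IV-VI patients.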
The Performance Gap: Quantifying the Bias
Daneshjou et al. (2022), Stanford University Study
Published in Science Advances, this comprehensive analysis examined multiple dermatology AI models across skin tones.
Study design:
- Tested 4 major AI systems for skin cancer classification
- Used clinically validated images across Fitzpatrick I-VI
- Measured sensitivity and specificity by skin tone
Key findings:
| Fitzpatrick Type | Sensitivity for Malignancy | Specificity |
|---|---|---|
| I-II (very light) | 92% | 88% |
| III-IV (medium) | 87% | 85% |
| V-VI (dark) | 65% | 78% |
Clinical translation:
- For Fitzpatrick I-II: AI detects 92 of 100 melanomas
- For Fitzpatrick V-VI: AI detects only 65 of 100 melanomas
Per 100 melanomas, 35 are missed in Fitzpatrick V-VI patients versus 8 in Fitzpatrick I-II patients: 27 additional missed melanomas.
Additional findings:
- False negative rate in dark skin: 35% vs. 8% in light skin
- Lower specificity in dark skin leads to more unnecessary biopsies
- Performance degradation was consistent across all 4 tested AI systems
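A short sketch of the arithmetic behind that translation, applying the published point estimates to a hypothetical cohort of 100 melanomas and 100 benign lesions per group (illustrative only, not a clinical calculator):

```python
def expected_errors(sensitivity: float, specificity: float,
                    n_melanomas: int = 100, n_benign: int = 100):
    """Expected missed melanomas (false negatives) and unnecessary
    workups (false positives) for a hypothetical cohort."""
    missed = round(n_melanomas * (1 - sensitivity))
    false_alarms = round(n_benign * (1 - specificity))
    return missed, false_alarms

# Daneshjou et al. (2022) point estimates by Fitzpatrick group
for group, sens, spec in [("I-II", 0.92, 0.88), ("III-IV", 0.87, 0.85), ("V-VI", 0.65, 0.78)]:
    missed, fp = expected_errors(sens, spec)
    print(f"Fitzpatrick {group}: {missed} of 100 melanomas missed, "
          f"{fp} of 100 benign lesions flagged")
# Gap: 35 - 8 = 27 additional missed melanomas per 100 in type V-VI vs. I-II
```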
Why Algorithms Fail on Dark Skin
Technical explanations:
- Contrast differences: Pigmented lesions have lower contrast against dark skin, making border detection difficult
- Color space limitations: RGB color models optimized for light skin photography
- Morphologic features: Dermoscopic patterns (pigment network, globules) appear different on dark skin
- Training data imbalance: Algorithms never learned dark skin pathology adequately
Clinical explanations:
- Different melanoma presentations: Acral lentiginous melanoma (more common in Black patients) underrepresented in training data
- Different benign lesion patterns: Dermatosis papulosa nigra, seborrheic keratosis patterns differ by skin tone
- Pigmentary variation: Post-inflammatory hyperpigmentation, normal pigmentation variance higher in dark skin
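The contrast explanation above can be made quantitative. Below is a minimal sketch that estimates lesion-versus-background contrast as the distance between mean colors in CIELAB space; the masking approach and the use of CIELAB distance are illustrative assumptions, not a published method.

```python
import numpy as np
from skimage.color import rgb2lab

def lesion_contrast(image_rgb: np.ndarray, lesion_mask: np.ndarray) -> float:
    """Approximate lesion-vs-background contrast as the CIELAB distance
    between mean lesion color and mean surrounding-skin color.

    image_rgb: (H, W, 3) float array in [0, 1]; lesion_mask: (H, W) bool.
    """
    lab = rgb2lab(image_rgb)
    lesion_mean = lab[lesion_mask].mean(axis=0)
    background_mean = lab[~lesion_mask].mean(axis=0)
    return float(np.linalg.norm(lesion_mean - background_mean))

# For the same lesion type, this distance tends to be smaller on darkly
# pigmented skin, which is one reason border detection is harder for models.
```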
Health Equity Implications
Melanoma in Black patients: The existing disparity
- Diagnosis stage: 52% present with Stage III-IV disease (vs. 16% for white patients)
- 5-year survival: 68% for Black patients vs. 92% for white patients
- Median time to diagnosis: 1-2 years longer for Black patients
- Most common subtype: Acral lentiginous (palms, soles, under nails), often missed
Deploying biased AI would:
- Miss more melanomas in Black patients (already diagnosed later)
- Create false reassurance from consumer apps (“low risk” result for actual melanoma)
- Reduce access to dermatology (if AI “triage” excludes patients with missed lesions)
- Widen existing survival disparities
Using AI systems that perform worse on dark skin is not merely suboptimal; it is unethical. It takes an existing disparity (later melanoma diagnosis in Black patients) and makes it worse.
Before deploying any dermatology AI system, demand performance data stratified by Fitzpatrick type I-VI.
If a vendor cannot provide this data, do not deploy their system.
Part 2: Consumer Melanoma Apps: The Danger
Dozens of smartphone apps claim to assess skin cancer risk. Examples include:
- SkinVision
- MoleScope
- Skin Scanner
- MySkinPal
- UMSkinCheck
The marketing claims:
- “Detect skin cancer early”
- “AI-powered melanoma detection”
- “Instant risk assessment”
The regulatory status:
- Most are marketed as “wellness” or “educational” tools, not medical devices
- This allows them to bypass FDA review
- No requirement for clinical validation before app store release
Freeman et al. (2020) BMJ Study
Systematic evaluation of consumer skin cancer apps.
Study design:
- Tested 4 popular apps
- Used 188 clinical images of melanoma and benign lesions
- Compared app recommendations to dermatologist diagnosis
Results:
| App | Sensitivity (Melanoma Detection) | Specificity |
|---|---|---|
| App 1 | 73% | 37% |
| App 2 | 30% | 94% |
| App 3 | 7% | 83% |
| App 4 | 70% | 65% |
Key findings:
- Only 1 app detected >70% of melanomas
- App 3 missed 93% of melanomas (7% sensitivity)
- High false positive rates lead to unnecessary dermatology visits
- No app provided skin tone-stratified performance data
The False Reassurance Problem
Clinical scenario:
- Patient notices changing mole
- Uses consumer app for “risk assessment”
- App returns “low risk” or “benign” classification
- Patient delays seeing dermatologist
- Melanoma progresses from Stage I to Stage II or III
- Prognosis worsens significantly
Real case reports:
- Multiple lawsuits filed against app developers for missed melanomas
- Published case reports of delayed melanoma diagnosis attributed to app reassurance
- No systematic data on how many melanomas are missed due to consumer apps (underreported)
The Liability Question
Who is liable when a consumer app misses a melanoma?
Current legal landscape:
- App developers: Generally shield themselves with disclaimers (“not medical advice,” “not a substitute for physician evaluation”)
- Patients: Difficult to prove causation (would they have seen dermatologist earlier without app?)
- Physicians: If patient mentions using app and physician doesn’t counsel against it, potential liability
Dermatologist responsibility:
- If patient discloses using skin cancer app, document counseling about limitations
- Advise patients NOT to use apps for self-diagnosis
- Perform complete skin exam regardless of app results
Professional Society Position
American Academy of Dermatology (AAD) Statement:
The AAD advises against reliance on smartphone apps for skin cancer detection, emphasizing that:
- Apps are not validated for clinical accuracy
- Apps cannot replace in-person dermatologic evaluation
- Delayed diagnosis is a significant risk
Recommendation to dermatologists:
- Actively counsel patients against using consumer apps for melanoma detection
- If patients insist on using apps, explain they should not delay evaluation based on “low risk” results
- Perform full skin examination regardless of app recommendations
Part 3: FDA-Cleared Dermatology AI Devices
DermaSensor (2024)
Technology: Elastic scattering spectroscopy (ESS)
Mechanism:
- Handheld wireless device placed against the skin lesion
- Measures light scattering properties of tissue at cellular and subcellular levels
- AI algorithm (trained on 20,000+ scans of approximately 4,000 lesions) classifies lesions as “evaluate further” or “low risk”
FDA clearance: De Novo classification (Class II), January 17, 2024. FDA Breakthrough Device Designation granted 2021.
Intended use:
- Aid in skin cancer evaluation for primary care physicians
- Used as an adjunct to visual examination
- Not intended for dermatologists (assumed to have dermoscopy)
- Designed to inform referral decisions to dermatology
Evidence:
DERM-SUCCESS pivotal trial (led by Mayo Clinic across 22 study centers, 1,000+ patients):
| Metric | Performance |
|---|---|
| Sensitivity | 96% across all skin cancers (224 cancers detected) |
| Sensitivity by type | Melanoma: 88%, BCC: 98%, SCC: 99% |
| Negative predictive value | 97% (negative result has 97% chance of being benign) |
| Clinical utility | Diagnostic sensitivity increased from 71% to 82% with device; referral sensitivity increased from 82% to 91% |
Sources: FDA De Novo Summary (DEN230008); DermaSensor FDA clearance announcement.
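Because the 97% negative predictive value is often quoted in isolation, it is worth remembering that NPV depends on the prevalence of cancer in the tested population, not only on sensitivity and specificity. A minimal sketch of the standard formula; the specificity value used here is an illustrative assumption, since it is not publicly disclosed in detail:

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from sensitivity, specificity, and prevalence."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# 96% sensitivity; specificity of 30% is an assumed illustrative value
for prev in (0.05, 0.15, 0.30):
    print(f"Prevalence {prev:.0%}: NPV = {npv(0.96, 0.30, prev):.1%}")
# NPV falls as the pre-test probability of cancer rises, so a "low risk"
# reading is least reassuring exactly when clinical suspicion is highest.
```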
Published clinical studies:
- Merry SP et al. Primary Care Physician Use of Elastic Scattering Spectroscopy on Skin Lesions Suggestive of Skin Cancer. J Prim Care Community Health. 2025;16:21501319251344423.
- Ferris LK, Seiverling EV et al. DERM-SUCCESS FDA Pivotal Study: A Multi-Reader Multi-Case Evaluation. J Prim Care Community Health. 2025;16:21501319251342106.
Limitations:
- Specificity not publicly disclosed in detail (a high false positive rate is expected because the device design prioritizes sensitivity)
- Not validated for use on dark skin (Fitzpatrick V-VI)
- Requires contact with the lesion (not image-based)
- Cost per device: approximately $8,000-10,000 plus annual software license
Clinical role:
- May help primary care physicians decide which lesions to refer to dermatology
- Should not replace clinical judgment
- Cannot be used to reassure a patient that a lesion is benign (too many false positives)
SciBase Nevisense (2017)
Technology: Electrical impedance spectroscopy (EIS)
Mechanism:
- Measures electrical properties of skin tissue at multiple frequencies
- Different tissues have different impedance patterns
- Malignant tissue shows altered impedance due to cellular changes
- Based on 20+ years of research from the Karolinska Institute
FDA approval: Class III medical device (PMA P150046), June 29, 2017. This is the only FDA-approved device specifically for melanoma detection assistance.
Intended use:
- Aid in melanoma detection for lesions with clinical or historical characteristics of melanoma
- Used in conjunction with clinical and dermoscopic examination
- Intended for dermatologists when considering biopsy
- Should not be used on clinically obvious melanoma
Evidence:
Pivotal clinical trial (international, multicenter, prospective, blinded):
| Study | Sensitivity | Specificity | Population |
|---|---|---|---|
| Malvehy et al. (2014) | 96.6% | 34.4% | 2,416 lesions (265 melanomas) |
| Mohr et al. (2013) | 98.1% | 23.6% | Algorithm development study |
Sources: Malvehy et al. (2014); FDA PMA Summary (P150046); SciBase.
Additional validation:
- Negative predictive value: 98.2%
- 100% sensitivity for non-melanoma skin cancers (BCC, SCC)
- Over 300,000 patients tested globally
- 60+ peer-reviewed publications
Limitations:
- Low specificity (23-34%) results in many false positives
- Not validated across skin tones (Fitzpatrick stratification not reported)
- Requires physical contact with the lesion
- Device cost limits widespread adoption
Clinical utility:
- High sensitivity is useful for “ruling out” melanoma in equivocal cases
- Should not be used to “rule in” melanoma (low positive predictive value would result in many unnecessary biopsies)
- Best used when the dermatologist is uncertain whether to biopsy
VisualDx and DermExpert (FDA-Exempt Clinical Decision Support)
Important distinction: VisualDx is not an FDA-cleared medical device. It is classified as FDA-exempt clinical decision support software under FDA guidance for software that provides recommendations to healthcare providers by matching patient information to reference information.
Technology: Clinical decision support system with AI-assisted image analysis
Mechanism:
- DermExpert (VisualDx’s dermatology AI feature) allows clinicians to photograph skin lesions
- AI analyzes images against approximately 80 lesion types
- Returns confidence scores and differential diagnoses for physician consideration
- Database of 120,000+ medical images with expert-labeled diagnoses
Regulatory status: FDA-exempt (not cleared, not requiring clearance)
Per FDA guidance (2019), clinical decision support software that “provides recommendations to healthcare providers by matching patient-specific information to reference information the medical community routinely uses in clinical practice” is exempt from device regulation.
Intended use:
- Clinical decision support for dermatologists and non-dermatologists
- Differential diagnosis generation based on visual and clinical features
- Educational resource for skin condition identification
- EHR integration and telemedicine support
Evidence:
- No FDA pivotal trial required (exempt status)
- Limited peer-reviewed validation studies in the public domain
- Image diversity: 28.5% of images represent Fitzpatrick skin types IV-VI (Alvarado & Feng, JAAD, 2021)
Limitations:
- No FDA review of diagnostic accuracy claims
- No published sensitivity/specificity data for skin cancer detection
- Not validated for autonomous diagnosis
- Requires physician interpretation of AI suggestions
Clinical role:
- May assist clinicians in generating differential diagnoses
- Should not be used as a standalone diagnostic tool
- Useful for educational purposes and rare disease identification
- Physicians remain responsible for diagnostic decisions
FDA-cleared devices (DermaSensor, Nevisense) have undergone FDA review with clinical trial data demonstrating safety and effectiveness.
FDA-exempt software (VisualDx) has not undergone this review. The FDA has determined that certain clinical decision support tools pose low enough risk to not require premarket review. This does NOT mean they are validated for clinical accuracy.
When evaluating AI tools for clinical use, verify regulatory status and demand evidence regardless of FDA classification.
Limitations of Current FDA-Cleared Devices
Common problems:
- Low specificity: Both devices have high sensitivity (good) but low specificity (many false positives)
- Lack of skin tone validation: Neither device provides Fitzpatrick-stratified performance data
- Cost barriers: Device cost limits access to well-resourced practices
- Not image-based: Cannot be used for telemedicine or remote evaluation
What FDA clearance does NOT guarantee:
- Clearance does not mean “accurate” or “ready for widespread use”
- Class II devices have lower evidence bar than Class III
- FDA does not require prospective clinical outcomes data (e.g., does device reduce melanoma mortality?)
- Performance in clinical practice may differ from pivotal trial conditions
Part 4: Dermoscopy-Based AI
The Dermoscopy Advantage
Dermoscopy provides:
- 10x magnification
- Polarized or immersion lighting to reduce surface reflection
- Visualization of subsurface structures (pigment network, vessels)
- Standardized image capture conditions
AI trained on dermoscopy images performs better than AI trained on clinical photographs.
Man Against Machine Studies
Esteva et al. (2017), Stanford/Nature Study
High-profile study claimed dermatologist-level classification.
Study design:
- Deep learning algorithm trained on 129,450 images
- Tested against 21 board-certified dermatologists
- Binary classification tasks (malignant vs. benign)
Results:
- Algorithm performance comparable to dermatologists
- AUC 0.94-0.96 for various tasks
Limitations revealed by subsequent analysis:
- Dataset bias: Predominantly light skin (not reported in original paper)
- Task simplification: Real dermatology involves differential diagnosis, not just binary classification
- No clinical context: Algorithm didn’t know patient age, lesion history, family history
- Dermoscopy vs. clinical photos: Mixed image types in dataset
Follow-up validation studies showed performance degradation in real-world conditions.
Tschandl et al. (2020) Study
More rigorous “Human vs. Machine” evaluation.
Study design:
- 11 AI algorithms tested
- 511 dermatologists evaluated the same images
- Used the HAM10000 dataset
Results:
| Evaluator | Sensitivity | Specificity |
|---|---|---|
| Best AI algorithm | 82% | 77% |
| Average dermatologist | 70% | 80% |
| Expert dermatologists (>10 years) | 78% | 82% |
Key insight: AI performed comparably to average dermatologists but not better than experts.
Critical limitations not addressed:
- No skin tone stratification
- No long-term clinical outcomes
- Diagnostic accuracy ≠ patient outcomes
MoleMapper and Patient-Generated Dermoscopy
Consumer dermoscopy attachments:
- DermLite (smartphone attachment)
- Handyscope
- iDoc
Potential use case:
- Patients take dermoscopy images at home
- Images sent to a dermatologist via telemedicine
- AI pre-screens for high-risk lesions
Current status:
- Limited validation data
- Image quality highly variable (patient technique)
- No clear evidence of clinical benefit over in-person examination
Research applications:
- MoleMapper app (Oregon Health & Science University)
- Patients track moles over time
- Data used for melanoma epidemiology research
- Not validated for clinical diagnosis
Part 5: Addressing Skin Tone Bias: Proposed Solutions
Dataset Diversification
Fitzpatrick17k (Groh et al., 2021):
Created to address imbalance in existing datasets.
Features:
- 16,577 clinical images
- Expert-labeled Fitzpatrick types (I-VI)
- Balanced representation across skin tones
- Publicly available for research
Impact:
- New algorithms trained on Fitzpatrick17k show a reduced performance gap
- But they still lag behind performance on light skin
Challenge: Even balanced datasets may not eliminate bias if pathology presentation differs by skin tone (not just data quantity issue).
Transfer Learning and Fine-Tuning
Approach:
- Train the algorithm on a large light skin dataset
- Fine-tune on a smaller dark skin dataset
- Test whether the performance gap narrows
Results:
- Modest improvement (5-10 percentage points)
- Does not eliminate the 27-point gap identified by Daneshjou et al.
- Requires high-quality dark skin training data (still scarce)
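A minimal PyTorch sketch of the fine-tuning recipe described above. The backbone, frozen layers, and hyperparameters are illustrative assumptions, and ImageNet weights stand in for a model pretrained on a predominantly light-skin dataset; this shows the general approach, not any specific published system.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a pretrained model (ImageNet weights as a stand-in for a
# model trained on a large, predominantly light-skin dataset).
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # benign vs. malignant

# Freeze early layers; fine-tune only the last block and the classifier.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(("layer4", "fc"))

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()

def finetune_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One gradient step on a batch drawn from a Fitzpatrick IV-VI subset."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Even with this recipe, published results suggest the gap narrows only modestly when the dark-skin fine-tuning set is small.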
Explainable AI (XAI) for Dermatology
Goal: Understand what features algorithms use to classify lesions
Techniques:
- Saliency maps (which pixels most influenced the classification?)
- Attention mechanisms (where did the algorithm “look”?)
- Feature importance analysis
Findings:
- Some algorithms learned spurious correlations (rulers in dermoscopy images, skin markers)
- Pigmentation patterns are learned differently for light vs. dark skin
- Algorithms may rely on different features than dermatologists use
Clinical utility:
- Helps identify when an algorithm is making decisions for the wrong reasons
- Allows targeted dataset improvements
- Builds trust (or appropriate distrust) in AI recommendations
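A minimal sketch of gradient-based saliency, one of the simplest techniques listed above. It assumes a PyTorch image classifier and a preprocessed input tensor, and is illustrative rather than any vendor's actual explainability method.

```python
import torch

def saliency_map(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Gradient-based saliency: which pixels most influenced the top class?

    `image` is a single preprocessed tensor of shape (3, H, W).
    """
    model.eval()
    x = image.unsqueeze(0).requires_grad_(True)
    scores = model(x)                      # (1, num_classes)
    top_class = int(scores.argmax(dim=1))
    scores[0, top_class].backward()        # gradient of the winning logit
    # Max absolute gradient across color channels -> (H, W) heat map
    return x.grad.abs().max(dim=1).values.squeeze(0)

# Overlaying the heat map on the photo can reveal spurious cues (rulers, ink
# markings, background skin) rather than lesion features driving the prediction.
```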
Multi-Modal Approaches
Combining image data with clinical context:
- Patient age
- Lesion history (stable vs. changing)
- Family history of melanoma
- Anatomic location
- Patient-reported symptoms (bleeding, itching)
Hypothesis: Adding clinical context may reduce performance gap
Current status: Early research phase, no deployed systems
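As a sketch of what such a multi-modal model could look like, the snippet below fuses a CNN image embedding with a handful of clinical-context features. The architecture, feature list, and dimensions are assumptions for illustration; no deployed system is being described.

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiModalLesionClassifier(nn.Module):
    """Fuse CNN image features with simple clinical context features.

    Clinical features here are illustrative: age (scaled), lesion change
    (0/1), family history (0/1), acral site (0/1), symptoms (0/1).
    """
    def __init__(self, n_clinical: int = 5, n_classes: int = 2):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # expose the 512-d image embedding
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(512 + n_clinical, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, image: torch.Tensor, clinical: torch.Tensor) -> torch.Tensor:
        img_feat = self.backbone(image)                      # (B, 512)
        return self.head(torch.cat([img_feat, clinical], dim=1))

# model = MultiModalLesionClassifier()
# logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 5))
```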
Part 6: International Perspectives and Guidelines
International Skin Imaging Collaboration (ISIC)
Mission: Create large, publicly available dermoscopy archive
ISIC Archive:
- Over 40,000 dermoscopy images
- Used for annual melanoma detection challenges
- Free for research use
Acknowledged limitations:
- 74% Fitzpatrick I-III (light skin bias)
- The ISIC 2024 challenge began including Fitzpatrick labels for bias mitigation
International Dermoscopy Society (IDS)
Position on AI:
- AI should augment, not replace, dermatologist expertise
- Clinical validation required before deployment
- Skin tone equity must be addressed
- Patient communication about AI use essential
European Academy of Dermatology and Venereology (EADV)
EADV Task Force on AI (2023):
Recommendations for dermatology AI deployment:
- Validation: External validation across institutions and skin tones required
- Transparency: Algorithms should disclose training data demographics
- Clinical integration: AI should fit into existing workflow, not create new burden
- Patient autonomy: Patients should be informed when AI is used in their care
- Liability: Clear responsibility when AI-assisted diagnoses are wrong
Part 7: Clinical Scenarios and Practical Guidance
Before Adopting Dermatology AI: Questions to Ask Vendors
Performance data:
- “What is sensitivity and specificity stratified by Fitzpatrick type I-VI?”
- If vendor cannot provide this, do not adopt
- “What datasets were used for training and validation?”
- Look for HAM10000, ISIC (biased) vs. Fitzpatrick17k (balanced)
- “What external validation has been performed?”
- Single-site validation is insufficient
Regulatory status:
- “What is FDA clearance status?”
- Class II De Novo? 510(k)? No clearance?
- “Is this marketed as a medical device or wellness tool?”
- Wellness tools bypass FDA review
Clinical integration:
- “How does this integrate into existing workflow?”
- Standalone dashboards often fail
- “What dermoscopy equipment is required?”
- Equipment costs may be prohibitive
- “What training is provided for clinical staff?”
Liability and cost:
- “What is total cost of ownership?” (device, software license, support)
- “What liability protection does vendor provide?”
- “Can we discontinue if performance is inadequate?”
Red Flags (Walk Away)
- No Fitzpatrick-stratified performance data
- Claims to “replace dermatologist”
- Validated only on academic datasets, not real-world clinical use
- No FDA clearance for diagnostic claims
- Vendor cites Esteva et al. (2017) as sole validation evidence (outdated)
- Cannot adjust sensitivity/specificity thresholds for clinical priorities
Patient Counseling Scripts
When patient asks about smartphone skin cancer apps:
“I understand these apps are convenient, but current apps are not accurate enough for skin cancer detection. Studies show some apps miss up to 93% of melanomas. A ‘low risk’ result from an app should not reassure you or delay seeing a dermatologist. If you’re concerned about a mole, the safest approach is an in-person skin examination.”
When patient has used app and received “low risk” result:
“Smartphone apps cannot replace a dermatologist’s evaluation. These apps have high error rates and are not validated across different skin tones. I will examine your skin thoroughly regardless of what the app said. Please don’t use app results to decide whether to see a doctor. Any changing mole, especially if it bleeds, itches, or has irregular borders, should be evaluated by a dermatologist.”
When discussing AI during dermoscopy examination:
“This dermoscopy device has an AI feature that helps identify lesions that might need biopsy. The AI is a tool that augments my judgment; it doesn’t make the final decision. I will review the AI recommendation along with your history, the appearance of the lesion, and my clinical experience to decide if biopsy is needed.”
Professional Society Guidelines
Core Principles:
Augmentation, not replacement: AI should support dermatologists’ clinical decision-making, not replace it. The term “augmented intelligence” is preferred over “artificial intelligence.”
Skin tone diversity in training data is essential: Algorithms must be trained and validated on diverse skin tones. Performance data should be stratified by Fitzpatrick type.
Local validation required: AI systems validated at one institution may not generalize to others. Prospective validation at point of deployment is necessary.
Patient communication: Patients should be informed when AI tools are used in their diagnosis and treatment.
Data privacy: Patient images used for AI training must have appropriate consent and de-identification.
Continuing education: Dermatologists should receive training on AI capabilities and limitations.
Specific Recommendations:
- Do not recommend consumer skin cancer detection apps to patients
- Demand Fitzpatrick-stratified performance data before adopting AI systems
- Maintain clinical override capability for all AI recommendations
- Monitor AI performance continuously after deployment
ISIC Archive Requirements:
- Image quality standards: Minimum resolution, lighting conditions, focus requirements
- Metadata requirements: Age, sex, anatomic location, diagnosis, Fitzpatrick type (recommended)
- Diagnostic gold standard: Histopathology confirmation for lesions classified as malignant
- Licensing: CC-BY-NC for research use
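A minimal sketch of checking a submission record against those metadata expectations. The field names and the `histopathology_confirmed` flag are illustrative placeholders, not the actual ISIC schema.

```python
REQUIRED = {"age", "sex", "anatomic_site", "diagnosis"}
RECOMMENDED = {"fitzpatrick_type"}

def check_metadata(record: dict) -> list[str]:
    """Return a list of problems with one image's metadata record."""
    problems = [f"missing required field: {f}" for f in REQUIRED - record.keys()]
    problems += [f"missing recommended field: {f}" for f in RECOMMENDED - record.keys()]
    # Malignant diagnoses should carry histopathology confirmation
    if record.get("diagnosis", "").lower() == "melanoma" and not record.get("histopathology_confirmed"):
        problems.append("malignant diagnosis without histopathology confirmation")
    return problems

# check_metadata({"age": 61, "sex": "male", "anatomic_site": "back", "diagnosis": "melanoma"})
```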
ISIC Melanoma Detection Challenge:
Annual competition for best-performing algorithms:
- Standardized evaluation metrics
- Hidden test set to prevent overfitting
- 2024 challenge includes Fitzpatrick type labels for bias assessment
ISIC 2024 Recommendations:
- Report algorithm performance by skin tone
- Include diverse skin tones in training data
- Evaluate algorithm generalization across institutions
- Consider clinical context beyond image analysis
Key Recommendations:
Clinical validation over theoretical performance: Real-world studies more important than retrospective dataset performance
Dermoscopy image quality matters: Consumer dermoscopy attachments produce variable image quality that may degrade AI performance
Differential diagnosis vs. binary classification: Dermatology requires distinguishing multiple conditions, not just “benign vs. malignant”
Longitudinal monitoring: AI should support lesion tracking over time, not just single-timepoint classification
Integration with electronic health records: AI recommendations should be documented in EHR for medicolegal purposes
IDS emphasizes: AI is a tool for dermatologists, not a replacement. Dermoscopy expertise remains essential.
Check Your Understanding
Clinical Scenario 1: Patient with “Low Risk” App Result
Case: A 52-year-old white woman presents to your dermatology clinic concerned about a mole on her upper back. She states, “I used a smartphone app called SkinVision, and it said this mole is low risk, but I’m still worried because my mother had melanoma.”
Question: How do you respond, and what do you do clinically?
Answer:
Immediate response:
“I’m glad you came in despite what the app said. Smartphone apps for skin cancer detection are not accurate enough to trust. Studies show these apps can miss up to 93% of melanomas. Your family history of melanoma is more important than any app result.”
Clinical approach:
- Obtain full history:
- How long has mole been present?
- Has it changed in size, shape, or color?
- Any bleeding, itching, or pain?
- Complete family history (first-degree relatives with melanoma)
- Personal history of melanoma or other skin cancers
- Perform complete skin examination:
- Use dermoscopy to evaluate concerning lesion
- Examine all skin surfaces (not just the lesion in question)
- Look for other atypical nevi
- Apply clinical judgment:
- Does lesion meet ABCDE criteria? (Asymmetry, Border irregularity, Color variation, Diameter >6mm, Evolution)
- Given family history, threshold for biopsy is lower
- Dermoscopy findings: pigment network, blue-white veil, irregular vessels?
- Decision:
- If concerning features present: perform biopsy regardless of app result
- If benign-appearing but patient has family history: consider biopsy or close monitoring (photography + 3-month follow-up)
- Do NOT reassure patient based on app result
- Documentation:
- Note that patient used smartphone app
- Document counseling about app limitations
- Record clinical reasoning for biopsy decision
Key teaching points:
- Consumer apps create false reassurance
- Family history trumps app result
- Complete skin exam necessary (not just lesion in question)
- Medicolegal risk: document counseling about apps
Clinical Scenario 2: Evaluating an AI-Assisted Dermoscopy Device
Case: Your dermatology practice is considering purchasing an AI-assisted dermoscopy device. The vendor presents data showing 95% sensitivity and 85% specificity for melanoma detection. The device costs $15,000 plus $3,000 annual software license.
Question: What questions should you ask before making a purchase decision?
Answer:
Critical questions to ask vendor:
- Fitzpatrick-stratified performance:
- “What is sensitivity and specificity for each Fitzpatrick type I-VI?”
- If vendor says “we don’t have that data,” do not purchase
- If vendor says “performance is similar across skin types,” ask for the actual numbers
- Training and validation datasets:
- “What datasets were used for training?” (Look for HAM10000, ISIC = biased)
- “How many images from Fitzpatrick V-VI skin were in training set?”
- “Was validation performed on a separate dataset from training?”
- External validation:
- “Has this device been tested at institutions other than your development site?”
- “What was real-world performance in clinical practice?”
- Ask for published peer-reviewed studies, not just vendor white papers
- FDA regulatory status:
- “What is FDA clearance status?” (510(k), De Novo, or none?)
- “What claims are FDA-cleared vs. marketing claims?”
- “What were sensitivity and specificity in FDA pivotal trial?”
- Clinical workflow integration:
- “How long does analysis take per lesion?” (>30 seconds per lesion = workflow disruption)
- “Does it integrate with our EHR?”
- “Can we adjust sensitivity thresholds for our patient population?”
- Failure modes:
- “What causes false negatives?” (amelanotic melanoma, acral melanoma?)
- “What causes false positives?” (seborrheic keratosis, benign nevi?)
- “How does device handle poor image quality or user error?”
- Liability and support:
- “What liability protection does vendor provide?”
- “What happens if device misses a melanoma? Does vendor share liability?”
- “Can we return device if performance is inadequate in our practice?”
- Cost-benefit analysis:
- Device cost: $15,000
- Annual license: $3,000
- Assume 5-year use: total cost $30,000
- How many melanomas must you detect to justify cost?
- Will this replace or supplement your dermoscopy expertise?
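One way to make the cost question concrete is to estimate cost per additional cancer detected over the device's lifetime. The detection rates, lesion volume, and prevalence below are hypothetical placeholders to be replaced with your own practice data:

```python
def cost_per_additional_cancer(total_cost: float,
                               lesions_per_year: int,
                               cancer_prevalence: float,
                               baseline_sensitivity: float,
                               device_sensitivity: float,
                               years: int = 5) -> float:
    """Cost of the device per additional cancer detected over its lifetime."""
    cancers = lesions_per_year * cancer_prevalence * years
    additional = cancers * (device_sensitivity - baseline_sensitivity)
    return total_cost / additional if additional > 0 else float("inf")

# Hypothetical inputs: $15,000 device + $3,000/year over 5 years = $30,000;
# 4,000 lesions/year, 2% cancer prevalence, 85% baseline vs. 95% device sensitivity
print(cost_per_additional_cancer(30_000, 4_000, 0.02, 0.85, 0.95))
# ≈ $750 per additional cancer detected under these assumptions
```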
Red flags that should prompt you to decline purchase:
- Vendor cannot provide Fitzpatrick-stratified data
- Only validation study is Esteva et al. (2017) or similar outdated study
- Device marketed for “autonomous diagnosis” without physician oversight
- Cannot adjust sensitivity thresholds
- Vendor claims device will “replace dermoscopy expertise”
- No external validation outside vendor’s own study
Decision framework:
- Request trial period (30-60 days) to test in your practice
- Compare AI recommendations to your clinical judgment
- Track false positives and false negatives
- Survey staff about workflow integration
- Decide based on real-world performance, not vendor promises
Clinical Scenario 3: Dark Skin Melanoma Evaluation
Case: A 68-year-old Black man (Fitzpatrick type VI) presents with a dark lesion on his heel that has been present for 8 months. He delayed seeking care because “I didn’t think Black people could get skin cancer.” Dermoscopy shows irregular pigmentation and blue-white veil. Your AI-assisted dermoscopy device classifies the lesion as “benign, 15% risk.”
Question: How does skin tone bias in AI affect your clinical decision-making?
Answer:
Immediate recognition:
This presentation is concerning for acral lentiginous melanoma, which:
- Accounts for 70% of melanomas in Black patients
- Occurs on palms, soles, under nails
- Is often diagnosed late, with substantially better prognosis when caught early
- Is underrepresented in AI training datasets
Critical reasoning about AI recommendation:
- AI classification (“benign, 15% risk”) should be disregarded because:
- AI systems have 27-percentage-point sensitivity gap in Fitzpatrick V-VI skin
- Acral lentiginous melanoma is underrepresented in training data
- “15% risk” may actually be 40-50% risk when adjusted for skin tone bias
- Clinical features override AI:
- Blue-white veil on dermoscopy = high-risk feature
- Heel location = acral site (high suspicion)
- 8-month duration = prolonged lesion (concerning)
- Irregular pigmentation = atypical
- Patient education is essential:
- “Black patients absolutely can get skin cancer”
- “Melanoma in Black patients often occurs on palms, soles, or under nails”
- “Delayed diagnosis is common and leads to worse outcomes”
Clinical action:
Perform biopsy immediately, regardless of AI classification.
- Technique: punch biopsy or excisional biopsy (avoid shave biopsy for acral lesions)
- Send for histopathology with request to evaluate for melanoma
- Do NOT delay based on AI “benign” classification
Documentation:
“68-year-old Black man with 8-month history of pigmented lesion on heel. Dermoscopy shows blue-white veil and irregular pigmentation concerning for acral lentiginous melanoma. AI-assisted dermoscopy device classified lesion as low risk (15%); however, given known performance gap in AI systems for dark skin (Daneshjou et al., 2022: 27-percentage-point sensitivity gap in Fitzpatrick V-VI), clinical judgment takes precedence. Biopsy performed.”
Key teaching points:
- AI recommendations must be interpreted in context of known biases
- For dark-skinned patients, lower your threshold for biopsy due to AI underperformance
- Acral lentiginous melanoma is a critical diagnosis not to miss
- Patient education about melanoma risk in Black patients is essential
Follow-up:
If biopsy confirms melanoma:
- Stage appropriately (sentinel lymph node biopsy if indicated)
- Educate patient about surveillance
- Screen family members
- Report missed diagnosis to AI vendor (feedback for model improvement)
If biopsy shows benign lesion:
- False positive biopsy is acceptable given high suspicion
- Better to biopsy benign lesion than miss melanoma in high-risk population
- Document reasoning for medicolegal purposes
Clinical Scenario 4: Primary Care Physician Considering DermaSensor
Case: You’re a family medicine physician who performs many skin exams. You’re considering purchasing DermaSensor (FDA-cleared January 2024) to help decide which lesions to refer to dermatology. The device costs approximately $8,000-10,000 plus annual software license. It has 96% sensitivity for skin cancers (97% negative predictive value), but specificity data is limited in public disclosures.
Question: Is DermaSensor appropriate for your practice?
Answer:
Understanding the performance metrics:
| Metric | Value | Clinical Translation |
|---|---|---|
| Sensitivity | 96% overall (88% melanoma, 98% BCC, 99% SCC) | Detects 96 of 100 skin cancers |
| Negative predictive value | 97% | A “low risk” result has 97% chance of being benign |
| Clinical utility | Reduced missed cancers from 18% to 9% | Meaningful improvement in detection |
The referral volume question:
DermaSensor is designed to increase referral sensitivity. The device intentionally prioritizes not missing cancers over avoiding false positives. Before adoption, consider:
- How many additional referrals can your local dermatology network accommodate?
- What is your current referral-to-biopsy ratio?
- What is dermatology wait time in your area?
Cost-benefit analysis:
Assumptions:
- Device cost: $8,000-10,000 plus annual license
- You see 20 patients per day with skin concerns
- Average 2 lesions evaluated per patient = 40 lesions/day
- 200 working days/year = 8,000 lesions/year
- Assume 1% skin cancer prevalence (typical for primary care)
Without DermaSensor:
- 80 skin cancers per year (1% of 8,000)
- You refer based on clinical judgment (assume you catch 82%, based on published PCP rates)
- Approximately 14 skin cancers missed
With DermaSensor:
- Device detects approximately 77 of 80 skin cancers (96% sensitivity)
- Only 3 cancers missed (benefit: 11 additional cancers detected)
- Unknown increase in “evaluate further” recommendations for benign lesions
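The same arithmetic written out, using the chapter's illustrative assumptions (not trial data):

```python
lesions_per_year = 20 * 2 * 200                      # patients/day x lesions/patient x working days
cancers_per_year = round(lesions_per_year * 0.01)    # 1% prevalence -> 80 cancers

caught_without = round(cancers_per_year * 0.82)      # clinical judgment alone
caught_with = round(cancers_per_year * 0.96)         # device sensitivity

print(f"Lesions/year: {lesions_per_year}, cancers/year: {cancers_per_year}")
print(f"Missed without device: {cancers_per_year - caught_without}")            # ~14
print(f"Missed with device:    {cancers_per_year - caught_with}")               # ~3
print(f"Additional cancers detected per year: {caught_with - caught_without}")  # ~11
```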
Key considerations:
Rural/underserved settings: DermaSensor may provide greatest value where dermatology access is limited and any improvement in detection is clinically meaningful.
High-volume suburban practices: Consider whether increased referrals will strain local dermatology capacity.
Selective use: Using device only for equivocal lesions (not screening all lesions) may optimize cost-effectiveness.
When DermaSensor might be useful:
- Rural practice with limited dermatology access (device helps prioritize urgent referrals)
- Pre-screened population (already selected for suspicious lesions)
- Used as “rule-out” tool for equivocal lesions
- Practice where current detection rate is below average
When DermaSensor may be less useful:
- If you already have dermoscopy training and high detection rates
- If local dermatology is already overwhelmed with referrals
- If using for low-suspicion screening (device designed for suspicious lesions)
Questions to ask yourself:
- “What is my current skin cancer detection rate?”
- If already high (>85%), device offers limited incremental benefit
- “What is dermatology wait time in my area?”
- If >3 months, additional referrals may not improve patient outcomes
- “Can I justify device cost for the number of additional cancers detected?”
- Calculate cost per additional cancer detected for your practice volume
Key teaching point:
DermaSensor was designed and validated for a specific use case: helping primary care physicians identify lesions warranting dermatology referral. It is not designed for autonomous diagnosis or for use by dermatologists. Understand the intended use before adoption.
Key Takeaways
Skin tone bias is the defining issue. AI performs 27 percentage points worse in Fitzpatrick V-VI skin compared to Fitzpatrick I-II. This is unacceptable.
Consumer apps are dangerous. Sensitivity ranges from 7-73%. False reassurance leads to delayed melanoma diagnosis. Actively counsel patients against using these apps.
Demand Fitzpatrick-stratified data. Before adopting any dermatology AI, require performance data broken down by skin type I-VI. If vendor can’t provide it, decline.
FDA clearance does not guarantee equity. Current FDA-cleared devices (DermaSensor, Nevisense) lack skin tone-stratified validation data.
Dermoscopy AI > smartphone AI. Algorithms trained on dermoscopy images perform better than those trained on clinical photographs, but bias persists.
Low specificity is a major problem. High sensitivity (good for not missing melanomas) often comes with low specificity (many false positives). This can overwhelm dermatology referral systems.
Acral lentiginous melanoma is underrepresented. This melanoma subtype (common in Black patients, occurs on palms/soles) is underrepresented in training datasets. AI will miss these.
External validation often fails. Algorithms that perform well in academic datasets perform worse in real-world clinical practice.
AI augments, never replaces, dermatologist judgment. Patient history, lesion evolution over time, and clinical context remain essential.
Health equity must be prioritized. Deploying biased AI in dermatology would worsen existing racial disparities in melanoma outcomes.
For primary care physicians:
- Do not recommend consumer skin cancer apps
- Consider dermoscopy training over AI devices
- Refer based on ABCDE criteria + dermoscopy
- DermaSensor (FDA-cleared 2024) may increase referral volume; understand your local dermatology capacity before adoption
For dermatologists:
- Advocate for skin tone equity in AI development
- Demand Fitzpatrick-stratified validation before adopting AI systems
- Maintain dermoscopy expertise (AI is a supplement, not replacement)
- Educate patients about limitations of consumer apps
- Lower your threshold for biopsy in dark-skinned patients when using AI tools (to compensate for bias)
For patients:
- Do not trust smartphone apps for skin cancer detection
- See a dermatologist for any concerning mole, regardless of app results
- Understand that AI is a tool, not a replacement for expert evaluation
- Advocate for yourself if you have dark skin and are concerned about a lesion (demand full evaluation, not just AI scan)
Further Reading
Essential articles on skin tone bias:
- Daneshjou, R. et al. (2022). Disparities in dermatology AI performance on a diverse, curated clinical image set. Science Advances. DOI: 10.1126/sciadv.abq6147
- Groh, M. et al. (2021). Evaluating Deep Neural Networks Trained on Clinical Images in Dermatology with the Fitzpatrick 17k Dataset. CVPR Workshop on ISIC Skin Image Analysis. DOI: 10.1109/CVPRW53098.2021.00201
- Adamson, A.S. & Smith, A. (2018). Machine Learning and Health Care Disparities in Dermatology. JAMA Dermatology. DOI: 10.1001/jamadermatol.2018.2348
Consumer app studies:
- Freeman, K. et al. (2020). Smartphone apps for the detection of melanoma: systematic assessment of their diagnostic accuracy. BMJ. DOI: 10.1136/bmj.m127
Melanoma disparities:
- Bradford, P.T. et al. (2009). Acral Lentiginous Melanoma: Incidence and Survival Patterns in the United States, 1986-2005. Archives of Dermatology. DOI: 10.1001/archdermatol.2009.323
AI validation studies:
- Tschandl, P. et al. (2020). Human-computer collaboration for skin cancer recognition. Nature Medicine. DOI: 10.1038/s41591-020-0942-0
- Esteva, A. et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature. DOI: 10.1038/nature21056
FDA-cleared devices:
- DermaSensor FDA De Novo Summary (DEN230008): FDA CDRH Database
- Merry SP et al. (2025). Primary Care Physician Use of Elastic Scattering Spectroscopy on Skin Lesions Suggestive of Skin Cancer. J Prim Care Community Health. DOI: 10.1177/21501319251344423
- Ferris LK, Seiverling EV et al. (2025). DERM-SUCCESS FDA Pivotal Study: A Multi-Reader Multi-Case Evaluation. J Prim Care Community Health. DOI: 10.1177/21501319251342106
- Nevisense FDA PMA Summary (P150046): FDA PMA Database
- Malvehy J et al. (2014). Clinical performance of the Nevisense system in cutaneous melanoma detection. Br J Dermatol. DOI: 10.1111/bjd.13121
Professional society guidelines:
- Kovarik C, Lee I, Ko J, et al. Commentary: Position statement on augmented intelligence (AuI). JAAD. 2019;81(4):998-1000. DOI: 10.1016/j.jaad.2019.06.032
- International Skin Imaging Collaboration Archive: isic-archive.com
For deeper dives:
- See Chapter 12 (Radiology) for comparison of imaging AI across specialties
- See Chapter 19 (Clinical AI Safety) for failure mode analysis
- See Chapter 21 (Medical Liability) for medicolegal considerations
- See Chapter 22 (Algorithmic Bias and Health Equity) for broader equity framework