Dermatology

Dermatology’s visual nature made it an early target for AI, but algorithmic bias threatens to worsen existing health disparities. Melanoma detection algorithms trained predominantly on light skin show a 27-percentage-point sensitivity gap between Fitzpatrick I-II and V-VI skin. Consumer melanoma apps achieve sensitivities as low as 7%. Black patients with melanoma already face diagnostic delays of 1-2 years and present with advanced-stage disease 52% of the time. Deploying biased AI would deepen these inequities. This chapter examines what works, what fails, and why skin tone matters.

Learning Objectives

After reading this chapter, you will be able to:

  • Evaluate AI systems for skin cancer detection and dermatologic diagnosis
  • Understand performance variations across different skin tones (Fitzpatrick scale I-VI)
  • Critically assess direct-to-consumer melanoma detection apps and their dangers
  • Navigate FDA regulatory landscape for dermatology AI devices
  • Recognize dataset bias and its clinical consequences
  • Apply evidence-based frameworks for evaluating dermatology AI before clinical adoption
  • Counsel patients on appropriate and inappropriate use of skin lesion apps

The Clinical Context: Dermatology is an image-intensive specialty where diagnosis depends heavily on visual pattern recognition. This makes it theoretically ideal for computer vision AI, but significant equity challenges exist.

Critical Issue: Skin Tone Bias

Most dermatology AI training data is predominantly light skin (Fitzpatrick I-III):

| Dataset | Light Skin Representation |
| --- | --- |
| HAM10000 | 81% Fitzpatrick I-III |
| ISIC Archive | 74% Fitzpatrick I-III |
| Fitzpatrick17k (2021) | First balanced dataset across skin tones |

Performance Gap: Daneshjou et al. (2022) Study

(Daneshjou et al., 2022)

| Fitzpatrick Type | Sensitivity | Specificity |
| --- | --- | --- |
| I-II (very light) | 92% | 88% |
| III-IV (medium) | 87% | 85% |
| V-VI (dark) | 65% | 78% |

A 27-percentage-point sensitivity gap separates the lightest and darkest skin types. Deployed at scale, this would worsen existing health disparities.

What Actually Works:

  1. FDA-cleared devices for clinical use:
    • DermaSensor (2024): Elastic scattering spectroscopy for skin cancer evaluation (96% sensitivity, first AI device cleared for primary care)
    • SciBase Nevisense (2017): Electrical impedance spectroscopy for melanoma assessment (only FDA-approved device for melanoma detection)
    • Both require physician oversight, not autonomous diagnosis
  2. Dermoscopy-assisted AI: Algorithms trained on high-quality dermoscopy images perform better than smartphone apps, but still show skin tone bias

What Doesn’t Work:

  1. Consumer melanoma detection apps: Sensitivity ranges from 7-73% (Freeman et al., 2020)
  2. AI trained only on light skin: 10-30% accuracy degradation in darker skin tones
  3. Autonomous diagnosis without clinical context: Algorithms see pixels, not patient history, family history, or lesion change over time

Health Equity Context:

Black patients with melanoma already experience:

  • Later diagnosis (average 1-2 years delay)
  • More advanced stage at presentation (Stage III-IV at diagnosis: 52% vs. 16% for white patients)
  • Worse 5-year survival (68% vs. 92% in white patients)

Deploying biased AI would worsen existing health disparities.

The Bottom Line: Exercise profound skepticism about dermatology AI. Demand performance data stratified by skin tone (Fitzpatrick I-VI). Consumer apps are potentially dangerous. Until equity is achieved, traditional dermoscopy by trained dermatologists remains the standard. Do not recommend consumer skin cancer apps to patients.


Introduction: The Promise and Peril

Dermatology presents a unique opportunity for computer vision AI. Unlike radiology (where AI analyzes standardized DICOM images) or pathology (where AI analyzes stained tissue slides), dermatology involves visual assessment of external skin surfaces. This accessibility has led to:

The promise:

  • Non-invasive image capture (smartphone, dermoscopy, clinical photography)
  • Large potential training datasets (millions of skin lesion images)
  • Clear diagnostic tasks (benign vs. malignant, specific diagnoses)
  • Potential to expand access (teledermatology, underserved areas)

The peril:

  • Training data heavily biased toward light skin
  • Direct-to-consumer apps marketed without adequate validation
  • False reassurance leading to delayed diagnosis of melanoma
  • Worsening of existing racial health disparities

This chapter focuses on what evidence reveals about dermatology AI performance, with particular emphasis on the skin tone bias crisis that threatens to make AI a tool that widens rather than narrows health inequity.


Part 1: The Skin Tone Bias Crisis

The Fitzpatrick Scale

The Fitzpatrick skin type classification system categorizes skin into six types based on response to UV exposure:

| Fitzpatrick Type | Description | Typical Complexion |
| --- | --- | --- |
| I | Always burns, never tans | Very fair, freckles |
| II | Usually burns, tans minimally | Fair |
| III | Sometimes burns, tans uniformly | Light brown |
| IV | Rarely burns, tans easily | Moderate brown |
| V | Very rarely burns, tans darkly | Dark brown |
| VI | Never burns, deeply pigmented | Very dark brown to black |

Clinical relevance: Melanoma presentation, diagnostic features, and differential diagnoses vary significantly across Fitzpatrick types.

The Data Problem

Major dermatology training datasets show severe imbalance:

HAM10000 (Human Against Machine with 10,000 training images):

  • 81% Fitzpatrick I-III (light skin)
  • 19% Fitzpatrick IV-VI (medium to dark skin)
  • Most widely used dataset for skin lesion classification

ISIC Archive (International Skin Imaging Collaboration):

  • 74% Fitzpatrick I-III
  • Only 9% Fitzpatrick V-VI
  • Used for annual melanoma detection challenges

Edinburgh Dermofit Library:

  • 93% light skin
  • Only 7% dark skin

Fitzpatrick17k (Groh et al., 2021):

  • First balanced dataset across skin tones
  • Created specifically to address bias in existing datasets
  • 16,577 images with expert-labeled Fitzpatrick types

(Groh et al., 2021)

Why this matters: Machine learning algorithms learn patterns from training data. If training data is predominantly light skin, algorithms learn to recognize pathology on light skin. Performance on dark skin suffers.
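
Auditing a dataset’s composition is the first step in detecting this bias. Below is a minimal sketch in Python, assuming a hypothetical metadata table with an expert-labeled fitzpatrick column (the file name and schema are illustrative assumptions, not the actual HAM10000 or ISIC format):

```python
import pandas as pd

# Hypothetical metadata table; the file name and column names are
# illustrative assumptions, not the actual HAM10000/ISIC schema.
meta = pd.read_csv("lesion_metadata.csv")  # columns: image_id, fitzpatrick, diagnosis

# Share of light (I-III) vs. medium-to-dark (IV-VI) skin
light = meta["fitzpatrick"].isin([1, 2, 3])
print(f"Fitzpatrick I-III: {light.mean():.1%}")
print(f"Fitzpatrick IV-VI: {(~light).mean():.1%}")

# Per-type proportions reveal whether types V and VI are represented at all
print(meta["fitzpatrick"].value_counts(normalize=True).sort_index())
```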

The Performance Gap: Quantifying the Bias

Daneshjou et al. (2022), Stanford University Study

Published in Science Advances, this comprehensive analysis examined multiple dermatology AI models across skin tones.

(Daneshjou et al., 2022)

Study design:

  • Tested 4 major AI systems for skin cancer classification
  • Used clinically validated images across Fitzpatrick I-VI
  • Measured sensitivity and specificity by skin tone

Key findings:

| Fitzpatrick Type | Sensitivity for Malignancy | Specificity |
| --- | --- | --- |
| I-II (very light) | 92% | 88% |
| III-IV (medium) | 87% | 85% |
| V-VI (dark) | 65% | 78% |

Clinical translation:

  • For Fitzpatrick I-II: AI detects 92 of 100 melanomas
  • For Fitzpatrick V-VI: AI detects only 65 of 100 melanomas

Per 100 melanomas, 35 are missed in the darkest skin types versus 8 in the lightest: 27 additional missed cancers per 100 patients with melanoma.

Additional findings:

  • False negative rate in dark skin: 35% vs. 8% in light skin
  • Lower specificity in dark skin leads to more unnecessary biopsies
  • Performance degradation was consistent across all 4 tested AI systems
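
Reproducing this kind of stratified analysis on your own validation data is straightforward. A minimal sketch, assuming a hypothetical results table with ground-truth labels, model predictions, and Fitzpatrick types (column names are illustrative):

```python
import pandas as pd

# Hypothetical evaluation results, one row per lesion. Assumed columns:
# y_true (1 = biopsy-proven malignant), y_pred (1 = flagged by model),
# fitzpatrick (1-6, expert-labeled).
df = pd.read_csv("model_predictions.csv")

groups = {"I-II": [1, 2], "III-IV": [3, 4], "V-VI": [5, 6]}
for label, types in groups.items():
    sub = df[df["fitzpatrick"].isin(types)]
    tp = ((sub.y_true == 1) & (sub.y_pred == 1)).sum()
    fn = ((sub.y_true == 1) & (sub.y_pred == 0)).sum()
    tn = ((sub.y_true == 0) & (sub.y_pred == 0)).sum()
    fp = ((sub.y_true == 0) & (sub.y_pred == 1)).sum()
    print(f"Fitzpatrick {label}: "
          f"sensitivity {tp / (tp + fn):.0%}, specificity {tn / (tn + fp):.0%}")
```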

Why Algorithms Fail on Dark Skin

Technical explanations:

  1. Contrast differences: Pigmented lesions have lower contrast against dark skin, making border detection difficult (see the sketch after this list)
  2. Color space limitations: RGB color models optimized for light skin photography
  3. Morphologic features: Dermoscopic patterns (pigment network, globules) appear different on dark skin
  4. Training data imbalance: Algorithms never learned dark skin pathology adequately
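
To make the contrast point concrete, here is a toy Weber-contrast calculation; the gray levels are illustrative assumptions, not measured values:

```python
# Weber contrast of a pigmented lesion against surrounding skin.
# Gray levels (0-255) are illustrative assumptions, not measured values.
def weber_contrast(lesion_lum: float, background_lum: float) -> float:
    return abs(lesion_lum - background_lum) / background_lum

print(weber_contrast(lesion_lum=60, background_lum=220))  # light skin: ~0.73
print(weber_contrast(lesion_lum=60, background_lum=90))   # dark skin:  ~0.33
```

The same lesion yields less than half the contrast signal against darker background skin, which degrades edge and border detection.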

Clinical explanations:

  1. Different melanoma presentations: Acral lentiginous melanoma (more common in Black patients) is underrepresented in training data
  2. Different benign lesion patterns: Dermatosis papulosa nigra, seborrheic keratosis patterns differ by skin tone
  3. Pigmentary variation: Post-inflammatory hyperpigmentation, normal pigmentation variance higher in dark skin

Health Equity Implications

Melanoma in Black patients: The existing disparity

(Bradford et al., 2009)

  • Diagnosis stage: 52% present with Stage III-IV disease (vs. 16% for white patients)
  • 5-year survival: 68% for Black patients vs. 92% for white patients
  • Median time to diagnosis: 1-2 years longer for Black patients
  • Most common subtype: Acral lentiginous (palms, soles, under nails), often missed

Deploying biased AI would:

  1. Miss more melanomas in Black patients (already diagnosed later)
  2. Create false reassurance from consumer apps (“low risk” result for actual melanoma)
  3. Reduce access to dermatology (if AI “triage” excludes patients with missed lesions)
  4. Widen existing survival disparities

The Ethical Imperative

Using AI systems that perform worse on dark skin is not merely suboptimal; it is unethical. It takes an existing disparity (later melanoma diagnosis in Black patients) and makes it worse.

Before deploying any dermatology AI system, demand performance data stratified by Fitzpatrick type I-VI.

If a vendor cannot provide this data, do not deploy their system.


Part 2: Consumer Melanoma Apps: The Danger

Dozens of smartphone apps claim to assess skin cancer risk. Examples include:

  • SkinVision
  • MoleScope
  • Skin Scanner
  • MySkinPal
  • UMSkinCheck

The marketing claims:

  • “Detect skin cancer early”
  • “AI-powered melanoma detection”
  • “Instant risk assessment”

The regulatory status:

  • Most are marketed as “wellness” or “educational” tools, not medical devices
  • This allows them to bypass FDA review
  • No requirement for clinical validation before app store release

Freeman et al. (2020) BMJ Study

Systematic evaluation of consumer skin cancer apps.

(Freeman et al., 2020)

Study design:

  • Tested 4 popular apps
  • Used 188 clinical images of melanoma and benign lesions
  • Compared app recommendations to dermatologist diagnosis

Results:

| App | Sensitivity (Melanoma Detection) | Specificity |
| --- | --- | --- |
| App 1 | 73% | 37% |
| App 2 | 30% | 94% |
| App 3 | 7% | 83% |
| App 4 | 70% | 65% |

Key findings:

  • Only 1 app detected >70% of melanomas
  • App 3 missed 93% of melanomas (7% sensitivity)
  • High false positive rates lead to unnecessary dermatology visits
  • No app provided skin tone-stratified performance data
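
How little reassurance a “low risk” result provides can be quantified with Bayes’ rule. A minimal sketch, using App 3’s published figures and a hypothetical 5% melanoma prevalence among lesions worrisome enough for a patient to check (the prevalence is an assumption for illustration):

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Bayes' rule: post-test probabilities from test characteristics."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)  # (PPV, NPV)

# App 3 from Freeman et al. (2020): 7% sensitivity, 83% specificity
ppv, npv = predictive_values(sensitivity=0.07, specificity=0.83, prevalence=0.05)
print("Pre-test melanoma probability:          5.0%")
print(f"Probability after a 'low risk' result: {1 - npv:.1%}")  # ~5.6%
```

A “low risk” result from this app leaves the melanoma probability essentially unchanged from the pre-test value; any reassurance the patient takes from it is unearned.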

The False Reassurance Problem

Clinical scenario:

  1. Patient notices changing mole
  2. Uses consumer app for “risk assessment”
  3. App returns “low risk” or “benign” classification
  4. Patient delays seeing dermatologist
  5. Melanoma progresses from Stage I to Stage II or III
  6. Prognosis worsens significantly

Real case reports:

  • Multiple lawsuits filed against app developers for missed melanomas
  • Published case reports of delayed melanoma diagnosis attributed to app reassurance
  • No systematic data on how many melanomas are missed due to consumer apps (underreported)

The Liability Question

Who is liable when a consumer app misses a melanoma?

Current legal landscape:

  1. App developers: Generally shield themselves with disclaimers (“not medical advice,” “not a substitute for physician evaluation”)
  2. Patients: Difficult to prove causation (would they have seen dermatologist earlier without app?)
  3. Physicians: If patient mentions using app and physician doesn’t counsel against it, potential liability

Dermatologist responsibility:

  • If patient discloses using skin cancer app, document counseling about limitations
  • Advise patients NOT to use apps for self-diagnosis
  • Perform complete skin exam regardless of app results

Professional Society Position

American Academy of Dermatology (AAD) Statement:

The AAD advises against reliance on smartphone apps for skin cancer detection, emphasizing that:

  • Apps are not validated for clinical accuracy
  • Apps cannot replace in-person dermatologic evaluation
  • Delayed diagnosis is a significant risk

Recommendation to dermatologists:

  • Actively counsel patients against using consumer apps for melanoma detection
  • If patients insist on using apps, explain they should not delay evaluation based on “low risk” results
  • Perform full skin examination regardless of app recommendations

Part 3: FDA-Cleared Dermatology AI Devices

DermaSensor (2024)

Technology: Elastic scattering spectroscopy (ESS)

Mechanism:

  • Handheld wireless device placed against skin lesion
  • Measures light scattering properties of tissue at cellular and subcellular levels
  • AI algorithm (trained on 20,000+ scans of approximately 4,000 lesions) classifies lesions as “evaluate further” or “low risk”

FDA clearance: De Novo classification (Class II), January 17, 2024. FDA Breakthrough Device Designation granted 2021.

Intended use:

  • Aid in skin cancer evaluation for primary care physicians
  • Used as adjunct to visual examination
  • Not intended for dermatologists (assumed to have dermoscopy)
  • Designed to inform referral decisions to dermatology

Evidence:

DERM-SUCCESS pivotal trial (led by Mayo Clinic across 22 study centers, 1,000+ patients):

| Metric | Performance |
| --- | --- |
| Sensitivity | 96% across all skin cancers (224 cancers detected) |
| Sensitivity by type | Melanoma 88%, BCC 98%, SCC 99% |
| Negative predictive value | 97% (a negative result has a 97% chance of being benign) |
| Clinical utility | Diagnostic sensitivity increased from 71% to 82% with the device; referral sensitivity increased from 82% to 91% |

(FDA De Novo Summary, DEN230008) | (DermaSensor FDA Clearance Announcement)

Published clinical studies: see Merry et al. (2025) and Ferris et al. (2025) in Further Reading.

Limitations:

  • Specificity not publicly disclosed in detail (a high false positive rate is expected, since the device design prioritizes sensitivity)
  • Not validated for use on dark skin (Fitzpatrick V-VI)
  • Requires contact with lesion (not image-based)
  • Cost per device: approximately $8,000-10,000 plus annual software license

Clinical role:

  • May help primary care physicians decide which lesions to refer to dermatology
  • Should not replace clinical judgment
  • Cannot be used to reassure a patient that a lesion is benign (too many false positives)

SciBase Nevisense (2017)

Technology: Electrical impedance spectroscopy (EIS)

Mechanism:

  • Measures electrical properties of skin tissue at multiple frequencies
  • Different tissues have different impedance patterns
  • Malignant tissue shows altered impedance due to cellular changes
  • Based on 20+ years of research from the Karolinska Institute

FDA approval: Class III medical device (PMA P150046), June 29, 2017. This is the only FDA-approved device specifically for melanoma detection assistance.

Intended use:

  • Aid in melanoma detection for lesions with clinical or historical characteristics of melanoma
  • Used in conjunction with clinical and dermoscopic examination
  • Intended for dermatologists when considering biopsy
  • Should not be used on clinically obvious melanoma

Evidence:

Pivotal clinical trial (international, multicenter, prospective, blinded):

| Study | Sensitivity | Specificity | Population |
| --- | --- | --- | --- |
| Malvehy et al. (2014) | 96.6% | 34.4% | 2,416 lesions (265 melanomas) |
| Mohr et al. (2013) | 98.1% | 23.6% | Algorithm development study |

(Malvehy et al., 2014) | (FDA PMA Summary) | (SciBase)

Additional validation:

  • Negative predictive value: 98.2%
  • 100% sensitivity for non-melanoma skin cancers (BCC, SCC)
  • Over 300,000 patients tested globally
  • 60+ peer-reviewed publications

Limitations:

  • Low specificity (23-34%) results in many false positives
  • Not validated across skin tones (Fitzpatrick stratification not reported)
  • Requires physical contact with lesion
  • Device cost limits widespread adoption

Clinical utility:

  • High sensitivity useful for “ruling out” melanoma in equivocal cases
  • Should not be used to “rule in” melanoma (low PPV would result in many unnecessary biopsies)
  • Best used when dermatologist is uncertain whether to biopsy
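
The rule-out/rule-in asymmetry follows directly from the predictive-value arithmetic shown earlier for consumer apps. Reusing that predictive_values() helper with the Malvehy et al. figures and a hypothetical 5% melanoma prevalence among equivocal lesions (the prevalence is an assumption for illustration):

```python
# Reusing predictive_values() from the consumer-app example in Part 2
ppv, npv = predictive_values(sensitivity=0.966, specificity=0.344, prevalence=0.05)
print(f"PPV: {ppv:.0%}")   # ~7%: most positives are false alarms (no rule-in)
print(f"NPV: {npv:.1%}")   # ~99.5%: a negative result is strong rule-out evidence
```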

VisualDx and DermExpert (FDA-Exempt Clinical Decision Support)

Important distinction: VisualDx is not an FDA-cleared medical device. It is classified as FDA-exempt clinical decision support software under FDA guidance for software that provides recommendations to healthcare providers by matching patient information to reference information.

Technology: Clinical decision support system with AI-assisted image analysis

Mechanism:

  • DermExpert (VisualDx’s dermatology AI feature) allows clinicians to photograph skin lesions
  • AI analyzes images against approximately 80 lesion types
  • Returns confidence scores and differential diagnoses for physician consideration
  • Database of 120,000+ medical images with expert-labeled diagnoses

Regulatory status: FDA-exempt (not cleared, not requiring clearance)

Per FDA guidance (2019), clinical decision support software that “provides recommendations to healthcare providers by matching patient-specific information to reference information the medical community routinely uses in clinical practice” is exempt from device regulation.

Intended use:

  • Clinical decision support for dermatologists and non-dermatologists
  • Differential diagnosis generation based on visual and clinical features
  • Educational resource for skin condition identification
  • EHR integration and telemedicine support

Evidence:

  • No FDA pivotal trial required (exempt status)
  • Limited peer-reviewed validation studies in the public domain
  • Image diversity: 28.5% of images represent Fitzpatrick skin types IV-VI (Alvarado & Feng, JAAD, 2021)

Limitations:

  • No FDA review of diagnostic accuracy claims
  • No published sensitivity/specificity data for skin cancer detection
  • Not validated for autonomous diagnosis
  • Requires physician interpretation of AI suggestions

Clinical role:

  • May assist clinicians in generating differential diagnoses
  • Should not be used as a standalone diagnostic tool
  • Useful for educational purposes and rare disease identification
  • Physicians remain responsible for diagnostic decisions

FDA-Exempt vs. FDA-Cleared

FDA-cleared devices (DermaSensor, Nevisense) have undergone FDA review with clinical trial data demonstrating safety and effectiveness.

FDA-exempt software (VisualDx) has not undergone this review. The FDA has determined that certain clinical decision support tools pose low enough risk to not require premarket review. This does NOT mean they are validated for clinical accuracy.

When evaluating AI tools for clinical use, verify regulatory status and demand evidence regardless of FDA classification.

Limitations of Current FDA-Cleared Devices

Common problems:

  1. Low specificity: Both devices have high sensitivity (good) but low specificity (many false positives)
  2. Lack of skin tone validation: Neither device provides Fitzpatrick-stratified performance data
  3. Cost barriers: Device cost limits access to well-resourced practices
  4. Not image-based: Cannot be used for telemedicine or remote evaluation

What FDA clearance does NOT guarantee:

  • Clearance does not mean “accurate” or “ready for widespread use”
  • Class II devices have lower evidence bar than Class III
  • FDA does not require prospective clinical outcomes data (e.g., does device reduce melanoma mortality?)
  • Performance in clinical practice may differ from pivotal trial conditions

Part 4: Dermoscopy-Based AI

The Dermoscopy Advantage

Dermoscopy provides:

  • 10x magnification
  • Polarized or immersion lighting to reduce surface reflection
  • Visualization of subsurface structures (pigment network, vessels)
  • Standardized image capture conditions

AI trained on dermoscopy images performs better than AI trained on clinical photographs.

Man Against Machine Studies

Esteva et al. (2017), Stanford/Nature Study

High-profile study claimed dermatologist-level classification.

(Esteva et al., 2017)

Study design:

  • Deep learning algorithm trained on 129,450 images
  • Tested against 21 board-certified dermatologists
  • Binary classification tasks (malignant vs. benign)

Results:

  • Algorithm performance comparable to dermatologists
  • AUC 0.94-0.96 for various tasks

Limitations revealed by subsequent analysis:

  1. Dataset bias: Predominantly light skin (not reported in original paper)
  2. Task simplification: Real dermatology involves differential diagnosis, not just binary classification
  3. No clinical context: Algorithm didn’t know patient age, lesion history, family history
  4. Dermoscopy vs. clinical photos: Mixed image types in dataset

Follow-up validation studies showed performance degradation in real-world conditions.

Tschandl et al. (2020) Study

More rigorous “Human vs. Machine” evaluation.

(Tschandl et al., 2020)

Study design:

  • 11 AI algorithms tested
  • 511 dermatologists evaluated the same images
  • Used HAM10000 dataset

Results:

| Evaluator | Sensitivity | Specificity |
| --- | --- | --- |
| Best AI algorithm | 82% | 77% |
| Average dermatologist | 70% | 80% |
| Expert dermatologists (>10 years) | 78% | 82% |

Key insight: AI performed comparably to average dermatologists but not better than experts.

Critical limitations not addressed:

  • No skin tone stratification
  • No long-term clinical outcomes
  • Diagnostic accuracy ≠ patient outcomes

MoleMapper and Patient-Generated Dermoscopy

Consumer dermoscopy attachments:

  • DermLite (smartphone attachment)
  • Handyscope
  • iDoc

Potential use case:

  • Patients take dermoscopy images at home
  • Send to dermatologist via telemedicine
  • AI pre-screens for high-risk lesions

Current status:

  • Limited validation data
  • Image quality highly variable (patient technique)
  • No clear evidence of clinical benefit over in-person examination

Research applications:

  • MoleMapper app (Oregon Health & Science University)
  • Patients track moles over time
  • Data used for melanoma epidemiology research
  • Not validated for clinical diagnosis


Part 5: Addressing Skin Tone Bias: Proposed Solutions

Dataset Diversification

Fitzpatrick17k (Groh et al., 2021):

Created to address imbalance in existing datasets.

(Groh et al., 2021)

Features:

  • 16,577 clinical images
  • Expert-labeled Fitzpatrick types (I-VI)
  • Balanced representation across skin tones
  • Publicly available for research

Impact:

  • New algorithms trained on Fitzpatrick17k show a reduced performance gap
  • But they still lag behind performance on light skin

Challenge: Even balanced datasets may not eliminate bias if pathology presentation differs by skin tone (not just data quantity issue).

Transfer Learning and Fine-Tuning

Approach:

  • Train algorithm on large light skin dataset
  • Fine-tune on smaller dark skin dataset
  • Test whether performance gap narrows

Results:

  • Modest improvement (5-10 percentage points)
  • Does not eliminate the 27-point gap identified by Daneshjou et al.
  • Requires high-quality dark skin training data (still scarce)
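
For concreteness, here is a minimal fine-tuning sketch in PyTorch. The ImageNet backbone stands in for a model trained on predominantly light-skin images, and the one-batch loader is a hypothetical stand-in for a curated darker-skin dataset:

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone (stand-in for a light-skin-trained lesion model)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                    # freeze pretrained features
model.fc = nn.Linear(model.fc.in_features, 2)      # new head: benign vs. malignant

# Hypothetical stand-in for a DataLoader over a balanced darker-skin dataset
dark_skin_loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,)))]

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in dark_skin_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)        # fine-tune only the new head
    loss.backward()
    optimizer.step()
```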

Explainable AI (XAI) for Dermatology

Goal: Understand what features algorithms use to classify lesions

Techniques:

  • Saliency maps (which pixels most influenced classification?)
  • Attention mechanisms (where did the algorithm “look”?)
  • Feature importance analysis

Findings:

  • Some algorithms learned spurious correlations (rulers in dermoscopy images, skin markers)
  • Pigmentation patterns learned differently for light vs. dark skin
  • Algorithms may rely on different features than dermatologists use

Clinical utility:

  • Helps identify when an algorithm is making decisions for the wrong reasons
  • Allows targeted dataset improvements
  • Builds trust (or appropriate distrust) in AI recommendations
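
A gradient saliency map, the simplest of these techniques, takes one backward pass. A minimal sketch (the untrained ResNet and random image are stand-ins for a real lesion classifier and a preprocessed dermoscopy photo):

```python
import torch
import torch.nn as nn
from torchvision import models

# Untrained stand-in for a fine-tuned lesion classifier
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 2)
model.eval()

image = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in input

score = model(image)[0, 1]     # logit for the "malignant" class
score.backward()               # gradient of the score w.r.t. every pixel

# Per-pixel importance: max absolute gradient across color channels.
# If the bright regions are rulers, markers, or background skin rather
# than the lesion itself, the model learned a spurious correlation.
saliency = image.grad.abs().max(dim=1).values.squeeze()
print(saliency.shape)          # torch.Size([224, 224])
```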

Multi-Modal Approaches

Combining image data with clinical context:

  • Patient age
  • Lesion history (stable vs. changing)
  • Family history of melanoma
  • Anatomic location
  • Patient-reported symptoms (bleeding, itching)

Hypothesis: Adding clinical context may reduce performance gap

Current status: Early research phase, no deployed systems
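
One plausible architecture, shown below as a hedged sketch, concatenates an image embedding with a small vector of clinical features before the final classifier (the feature list and dimensions are illustrative assumptions, not a deployed system):

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiModalLesionClassifier(nn.Module):
    """Illustrative fusion model: image embedding + tabular clinical context."""

    def __init__(self, n_clinical_features: int = 5):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()        # expose the 512-dim image embedding
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(512 + n_clinical_features, 64),
            nn.ReLU(),
            nn.Linear(64, 2),              # benign vs. malignant
        )

    def forward(self, image, clinical):
        # clinical: e.g. [age, lesion_changing, family_history, acral_site, symptoms]
        fused = torch.cat([self.backbone(image), clinical], dim=1)
        return self.head(fused)

model = MultiModalLesionClassifier()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 5))
print(logits.shape)                        # torch.Size([4, 2])
```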


Part 6: International Perspectives and Guidelines

International Skin Imaging Collaboration (ISIC)

Mission: Create large, publicly available dermoscopy archive

ISIC Archive:

  • Over 40,000 dermoscopy images
  • Used for annual melanoma detection challenges
  • Free for research use

Acknowledged limitations:

  • 74% Fitzpatrick I-III (light skin bias)
  • The ISIC 2024 challenge began including Fitzpatrick labels for bias mitigation

International Dermoscopy Society (IDS)

Position on AI:

  • AI should augment, not replace, dermatologist expertise
  • Clinical validation required before deployment
  • Skin tone equity must be addressed
  • Patient communication about AI use essential

European Academy of Dermatology and Venereology (EADV)

EADV Task Force on AI (2023):

Recommendations for dermatology AI deployment:

  1. Validation: External validation across institutions and skin tones required
  2. Transparency: Algorithms should disclose training data demographics
  3. Clinical integration: AI should fit into existing workflow, not create new burden
  4. Patient autonomy: Patients should be informed when AI is used in their care
  5. Liability: Clear responsibility when AI-assisted diagnoses are wrong

Part 7: Clinical Scenarios and Practical Guidance

Before Adopting Dermatology AI: Questions to Ask Vendors

Vendor Evaluation Checklist

Performance data:

  1. “What is sensitivity and specificity stratified by Fitzpatrick type I-VI?”
    • If vendor cannot provide this, do not adopt
  2. “What datasets were used for training and validation?”
    • Look for HAM10000, ISIC (biased) vs. Fitzpatrick17k (balanced)
  3. “What external validation has been performed?”
    • Single-site validation is insufficient

Regulatory status:

  1. “What is FDA clearance status?”
    • Class II De Novo? 510(k)? No clearance?
  2. “Is this marketed as a medical device or wellness tool?”
    • Wellness tools bypass FDA review

Clinical integration:

  1. “How does this integrate into existing workflow?”
    • Standalone dashboards often fail
  2. “What dermoscopy equipment is required?”
    • Equipment costs may be prohibitive
  3. “What training is provided for clinical staff?”

Liability and cost:

  1. “What is total cost of ownership?” (device, software license, support)
  2. “What liability protection does vendor provide?”
  3. “Can we discontinue if performance is inadequate?”

Red Flags (Walk Away)

  • No Fitzpatrick-stratified performance data
  • Claims to “replace dermatologist”
  • Validated only on academic datasets, not real-world clinical use
  • No FDA clearance for diagnostic claims
  • Vendor cites Esteva et al. (2017) as sole validation evidence (outdated)
  • Cannot adjust sensitivity/specificity thresholds for clinical priorities (see the sketch after this list)
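
Why threshold adjustability matters: a classifier’s sensitivity/specificity trade-off is set by the cutoff applied to its risk scores, and a practice serving a high-risk population may want a different cutoff than the vendor default. A minimal sketch with synthetic scores (all numbers are illustrative assumptions):

```python
import numpy as np

# Synthetic risk scores: 900 benign lesions (lower scores) and
# 100 malignant lesions (higher scores). Purely illustrative.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(2, 5, 900), rng.beta(5, 2, 100)])
labels = np.concatenate([np.zeros(900), np.ones(100)])

for cutoff in (0.5, 0.3, 0.15):
    flagged = scores >= cutoff
    sens = flagged[labels == 1].mean()
    spec = (~flagged)[labels == 0].mean()
    print(f"cutoff {cutoff:.2f}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Lowering the cutoff raises sensitivity at the cost of specificity; a system that hides this dial forces one vendor-chosen trade-off on every practice.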

Patient Counseling Scripts

When patient asks about smartphone skin cancer apps:

“I understand these apps are convenient, but current apps are not accurate enough for skin cancer detection. Studies show some apps miss up to 93% of melanomas. A ‘low risk’ result from an app should not reassure you or delay seeing a dermatologist. If you’re concerned about a mole, the safest approach is an in-person skin examination.”

When patient has used app and received “low risk” result:

“Smartphone apps cannot replace a dermatologist’s evaluation. These apps have high error rates and are not validated across different skin tones. I will examine your skin thoroughly regardless of what the app said. Please don’t use app results to decide whether to see a doctor. Any changing mole, especially if it bleeds, itches, or has irregular borders, should be evaluated by a dermatologist.”

When discussing AI during dermoscopy examination:

“This dermoscopy device has an AI feature that helps identify lesions that might need biopsy. The AI is a tool that augments my judgment; it doesn’t make the final decision. I will review the AI recommendation along with your history, the appearance of the lesion, and my clinical experience to decide if biopsy is needed.”


Professional Society Guidelines

American Academy of Dermatology (AAD) Position Statement on Augmented Intelligence (2019, updated 2022)

Core Principles:

  1. Augmentation, not replacement: AI should support dermatologists’ clinical decision-making, not replace it. The term “augmented intelligence” is preferred over “artificial intelligence.”

  2. Skin tone diversity in training data is essential: Algorithms must be trained and validated on diverse skin tones. Performance data should be stratified by Fitzpatrick type.

  3. Local validation required: AI systems validated at one institution may not generalize to others. Prospective validation at point of deployment is necessary.

  4. Patient communication: Patients should be informed when AI tools are used in their diagnosis and treatment.

  5. Data privacy: Patient images used for AI training must have appropriate consent and de-identification.

  6. Continuing education: Dermatologists should receive training on AI capabilities and limitations.

Specific Recommendations:

  • Do not recommend consumer skin cancer detection apps to patients
  • Demand Fitzpatrick-stratified performance data before adopting AI systems
  • Maintain clinical override capability for all AI recommendations
  • Monitor AI performance continuously after deployment

Source:

Kovarik C, Lee I, Ko J, et al. Commentary: Position statement on augmented intelligence (AuI). J Am Acad Dermatol. 2019;81(4):998-1000.

International Skin Imaging Collaboration (ISIC) Standards

ISIC Archive Requirements:

  1. Image quality standards: Minimum resolution, lighting conditions, focus requirements
  2. Metadata requirements: Age, sex, anatomic location, diagnosis, Fitzpatrick type (recommended)
  3. Diagnostic gold standard: Histopathology confirmation for lesions classified as malignant
  4. Licensing: CC-BY-NC for research use

ISIC Melanoma Detection Challenge:

Annual competition for best-performing algorithms:

  • Standardized evaluation metrics
  • Hidden test set to prevent overfitting
  • 2024 challenge includes Fitzpatrick type labels for bias assessment

ISIC 2024 Recommendations:

  • Report algorithm performance by skin tone
  • Include diverse skin tones in training data
  • Evaluate algorithm generalization across institutions
  • Consider clinical context beyond image analysis

International Dermoscopy Society (IDS) AI Guidance

Key Recommendations:

  1. Clinical validation over theoretical performance: Real-world studies more important than retrospective dataset performance

  2. Dermoscopy image quality matters: Consumer dermoscopy attachments produce variable image quality that may degrade AI performance

  3. Differential diagnosis vs. binary classification: Dermatology requires distinguishing multiple conditions, not just “benign vs. malignant”

  4. Longitudinal monitoring: AI should support lesion tracking over time, not just single-timepoint classification

  5. Integration with electronic health records: AI recommendations should be documented in EHR for medicolegal purposes

IDS emphasizes: AI is a tool for dermatologists, not a replacement. Dermoscopy expertise remains essential.


Check Your Understanding

Clinical Scenario 1: Patient with “Low Risk” App Result

Case: A 52-year-old white woman presents to your dermatology clinic concerned about a mole on her upper back. She states, “I used a smartphone app called SkinVision, and it said this mole is low risk, but I’m still worried because my mother had melanoma.”

Question: How do you respond, and what do you do clinically?

Answer:

Immediate response:

“I’m glad you came in despite what the app said. Smartphone apps for skin cancer detection are not accurate enough to trust. Studies show these apps can miss up to 93% of melanomas. Your family history of melanoma is more important than any app result.”

Clinical approach:

  1. Obtain full history:
    • How long has mole been present?
    • Has it changed in size, shape, or color?
    • Any bleeding, itching, or pain?
    • Complete family history (first-degree relatives with melanoma)
    • Personal history of melanoma or other skin cancers
  2. Perform complete skin examination:
    • Use dermoscopy to evaluate concerning lesion
    • Examine all skin surfaces (not just the lesion in question)
    • Look for other atypical nevi
  3. Apply clinical judgment:
    • Does lesion meet ABCDE criteria? (Asymmetry, Border irregularity, Color variation, Diameter >6mm, Evolution)
    • Given family history, threshold for biopsy is lower
    • Dermoscopy findings: pigment network, blue-white veil, irregular vessels?
  4. Decision:
    • If concerning features present: perform biopsy regardless of app result
    • If benign-appearing but patient has family history: consider biopsy or close monitoring (photography + 3-month follow-up)
    • Do NOT reassure patient based on app result
  5. Documentation:
    • Note that patient used smartphone app
    • Document counseling about app limitations
    • Record clinical reasoning for biopsy decision

Key teaching points:

  • Consumer apps create false reassurance
  • Family history trumps app result
  • Complete skin exam necessary (not just lesion in question)
  • Medicolegal risk: document counseling about apps

Clinical Scenario 2: Evaluating an AI-Assisted Dermoscopy Device

Case: Your dermatology practice is considering purchasing an AI-assisted dermoscopy device. The vendor presents data showing 95% sensitivity and 85% specificity for melanoma detection. The device costs $15,000 plus $3,000 annual software license.

Question: What questions should you ask before making a purchase decision?

Answer:

Critical questions to ask vendor:

  1. Fitzpatrick-stratified performance:
    • “What is sensitivity and specificity for each Fitzpatrick type I-VI?”
    • If vendor says “we don’t have that data,” do not purchase
    • If vendor says “performance is similar across skin types,” ask for the actual numbers
  2. Training and validation datasets:
    • “What datasets were used for training?” (Look for HAM10000, ISIC = biased)
    • “How many images from Fitzpatrick V-VI skin were in training set?”
    • “Was validation performed on a separate dataset from training?”
  3. External validation:
    • “Has this device been tested at institutions other than your development site?”
    • “What was real-world performance in clinical practice?”
    • Ask for published peer-reviewed studies, not just vendor white papers
  4. FDA regulatory status:
    • “What is FDA clearance status?” (510(k), De Novo, or none?)
    • “What claims are FDA-cleared vs. marketing claims?”
    • “What were sensitivity and specificity in FDA pivotal trial?”
  5. Clinical workflow integration:
    • “How long does analysis take per lesion?” (>30 seconds per lesion = workflow disruption)
    • “Does it integrate with our EHR?”
    • “Can we adjust sensitivity thresholds for our patient population?”
  6. Failure modes:
    • “What causes false negatives?” (amelanotic melanoma, acral melanoma?)
    • “What causes false positives?” (seborrheic keratosis, benign nevi?)
    • “How does device handle poor image quality or user error?”
  7. Liability and support:
    • “What liability protection does vendor provide?”
    • “What happens if device misses a melanoma? Does vendor share liability?”
    • “Can we return device if performance is inadequate in our practice?”
  8. Cost-benefit analysis:
    • Device cost: $15,000
    • Annual license: $3,000
    • Assume 5-year use: total cost $30,000
    • How many melanomas must you detect to justify cost?
    • Will this replace or supplement your dermoscopy expertise?

Red flags that should prompt you to decline purchase:

  • Vendor cannot provide Fitzpatrick-stratified data
  • Only validation study is Esteva et al. (2017) or similar outdated study
  • Device marketed for “autonomous diagnosis” without physician oversight
  • Cannot adjust sensitivity thresholds
  • Vendor claims device will “replace dermoscopy expertise”
  • No external validation outside vendor’s own study

Decision framework:

  • Request trial period (30-60 days) to test in your practice
  • Compare AI recommendations to your clinical judgment
  • Track false positives and false negatives
  • Survey staff about workflow integration
  • Decide based on real-world performance, not vendor promises

Clinical Scenario 3: Dark Skin Melanoma Evaluation

Case: A 68-year-old Black man (Fitzpatrick type VI) presents with a dark lesion on his heel that has been present for 8 months. He delayed seeking care because “I didn’t think Black people could get skin cancer.” Dermoscopy shows irregular pigmentation and blue-white veil. Your AI-assisted dermoscopy device classifies the lesion as “benign, 15% risk.”

Question: How does skin tone bias in AI affect your clinical decision-making?

Answer:

Immediate recognition:

This presentation is concerning for acral lentiginous melanoma, which:

  • Accounts for 70% of melanomas in Black patients
  • Occurs on palms, soles, under nails
  • Often diagnosed late (better prognosis if caught early)
  • Is underrepresented in AI training datasets

Critical reasoning about AI recommendation:

  1. AI classification (“benign, 15% risk”) should be disregarded because:
    • AI systems have 27-percentage-point sensitivity gap in Fitzpatrick V-VI skin
    • Acral lentiginous melanoma is underrepresented in training data
    • “15% risk” may actually be 40-50% risk when adjusted for skin tone bias
  2. Clinical features override AI:
    • Blue-white veil on dermoscopy = high-risk feature
    • Heel location = acral site (high suspicion)
    • 8-month duration = prolonged lesion (concerning)
    • Irregular pigmentation = atypical
  3. Patient education is essential:
    • “Black patients absolutely can get skin cancer”
    • “Melanoma in Black patients often occurs on palms, soles, or under nails”
    • “Delayed diagnosis is common and leads to worse outcomes”

Clinical action:

Perform biopsy immediately, regardless of AI classification.

  • Technique: punch biopsy or excisional biopsy (avoid shave biopsy for acral lesions)
  • Send for histopathology with request to evaluate for melanoma
  • Do NOT delay based on AI “benign” classification

Documentation:

“68-year-old Black man with 8-month history of pigmented lesion on heel. Dermoscopy shows blue-white veil and irregular pigmentation concerning for acral lentiginous melanoma. AI-assisted dermoscopy device classified lesion as low risk (15%); however, given known performance gap in AI systems for dark skin (Daneshjou et al., 2022: 27-percentage-point sensitivity gap in Fitzpatrick V-VI), clinical judgment takes precedence. Biopsy performed.”

Key teaching points:

  • AI recommendations must be interpreted in context of known biases
  • For dark-skinned patients, lower your threshold for biopsy due to AI underperformance
  • Acral lentiginous melanoma is a critical diagnosis not to miss
  • Patient education about melanoma risk in Black patients is essential

Follow-up:

If biopsy confirms melanoma:

  • Stage appropriately (sentinel lymph node biopsy if indicated)
  • Educate patient about surveillance
  • Screen family members
  • Report missed diagnosis to AI vendor (feedback for model improvement)

If biopsy shows benign lesion:

  • False positive biopsy is acceptable given high suspicion
  • Better to biopsy benign lesion than miss melanoma in high-risk population
  • Document reasoning for medicolegal purposes

Clinical Scenario 4: Primary Care Physician Considering DermaSensor

Case: You’re a family medicine physician who performs many skin exams. You’re considering purchasing DermaSensor (FDA-cleared January 2024) to help decide which lesions to refer to dermatology. The device costs approximately $8,000-10,000 plus annual software license. It has 96% sensitivity for skin cancers (97% negative predictive value), but specificity data is limited in public disclosures.

Question: Is DermaSensor appropriate for your practice?

Answer:

Understanding the performance metrics:

| Metric | Value | Clinical Translation |
| --- | --- | --- |
| Sensitivity | 96% overall (88% melanoma, 98% BCC, 99% SCC) | Detects 96 of 100 skin cancers |
| Negative predictive value | 97% | A “low risk” result has a 97% chance of being benign |
| Clinical utility | Missed cancers reduced from 18% to 9% | Meaningful improvement in detection |

The referral volume question:

DermaSensor is designed to increase referral sensitivity. The device intentionally prioritizes not missing cancers over avoiding false positives. Before adoption, consider:

  • How many additional referrals can your local dermatology network accommodate?
  • What is your current referral-to-biopsy ratio?
  • What is dermatology wait time in your area?

Cost-benefit analysis:

Assumptions:

  • Device cost: $8,000-10,000 plus annual license
  • You see 20 patients per day with skin concerns
  • Average 2 lesions evaluated per patient = 40 lesions/day
  • 200 working days/year = 8,000 lesions/year
  • Assume 1% skin cancer prevalence (typical for primary care)

Without DermaSensor:

  • 80 skin cancers per year (1% of 8,000)
  • You refer based on clinical judgment (assume you catch 82%, based on published PCP rates)
  • Approximately 14 skin cancers missed

With DermaSensor:

  • Device detects approximately 77 of 80 skin cancers (96% sensitivity)
  • Only 3 cancers missed (benefit: 11 additional cancers detected)
  • Unknown increase in “evaluate further” recommendations for benign lesions
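
A minimal script for this arithmetic, so you can substitute your own volumes and detection rate (every input below is one of the chapter’s stated assumptions, not practice-specific data):

```python
# Inputs: all values are the illustrative assumptions stated above.
lesions_per_year = 40 * 200              # 40 lesions/day x 200 working days
prevalence = 0.01                        # assumed skin cancer prevalence
cancers = lesions_per_year * prevalence  # 80 cancers/year

missed_clinical = cancers * (1 - 0.82)   # clinical judgment alone: ~14 missed
missed_device = cancers * (1 - 0.96)     # DermaSensor sensitivity: ~3 missed

print(f"Additional cancers detected per year: {missed_clinical - missed_device:.0f}")
```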

Key considerations:

  1. Rural/underserved settings: DermaSensor may provide greatest value where dermatology access is limited and any improvement in detection is clinically meaningful.

  2. High-volume suburban practices: Consider whether increased referrals will strain local dermatology capacity.

  3. Selective use: Using device only for equivocal lesions (not screening all lesions) may optimize cost-effectiveness.

When DermaSensor might be useful:

  • Rural practice with limited dermatology access (device helps prioritize urgent referrals)
  • Pre-screened population (already selected for suspicious lesions)
  • Used as “rule-out” tool for equivocal lesions
  • Practice where current detection rate is below average

When DermaSensor may be less useful:

  • If you already have dermoscopy training and high detection rates
  • If local dermatology is already overwhelmed with referrals
  • If using for low-suspicion screening (device designed for suspicious lesions)

Questions to ask yourself:

  1. “What is my current skin cancer detection rate?”
    • If already high (>85%), device offers limited incremental benefit
  2. “What is dermatology wait time in my area?”
    • If >3 months, additional referrals may not improve patient outcomes
  3. “Can I justify device cost for the number of additional cancers detected?”
    • Calculate cost per additional cancer detected for your practice volume

Key teaching point:

DermaSensor was designed and validated for a specific use case: helping primary care physicians identify lesions warranting dermatology referral. It is not designed for autonomous diagnosis or for use by dermatologists. Understand the intended use before adoption.


Key Takeaways

Clinical Bottom Line for Dermatology AI

  1. Skin tone bias is the defining issue. AI performs 27 percentage points worse in Fitzpatrick V-VI skin compared to Fitzpatrick I-II. This is unacceptable.

  2. Consumer apps are dangerous. Sensitivity ranges from 7-73%. False reassurance leads to delayed melanoma diagnosis. Actively counsel patients against using these apps.

  3. Demand Fitzpatrick-stratified data. Before adopting any dermatology AI, require performance data broken down by skin type I-VI. If vendor can’t provide it, decline.

  4. FDA clearance does not guarantee equity. Current FDA-cleared devices (DermaSensor, Nevisense) lack skin tone-stratified validation data.

  5. Dermoscopy AI > smartphone AI. Algorithms trained on dermoscopy images perform better than those trained on clinical photographs, but bias persists.

  6. Low specificity is a major problem. High sensitivity (good for not missing melanomas) often comes with low specificity (many false positives). This can overwhelm dermatology referral systems.

  7. Acral lentiginous melanoma is underrepresented. This melanoma subtype (common in Black patients, occurs on palms/soles) is underrepresented in training datasets. AI will miss these.

  8. External validation often fails. Algorithms that perform well in academic datasets perform worse in real-world clinical practice.

  9. AI augments, never replaces, dermatologist judgment. Patient history, lesion evolution over time, and clinical context remain essential.

  10. Health equity must be prioritized. Deploying biased AI in dermatology would worsen existing racial disparities in melanoma outcomes.

For primary care physicians:

  • Do not recommend consumer skin cancer apps
  • Consider dermoscopy training over AI devices
  • Refer based on ABCDE criteria + dermoscopy
  • DermaSensor (FDA-cleared 2024) may increase referral volume; understand your local dermatology capacity before adoption

For dermatologists:

  • Advocate for skin tone equity in AI development
  • Demand Fitzpatrick-stratified validation before adopting AI systems
  • Maintain dermoscopy expertise (AI is a supplement, not replacement)
  • Educate patients about limitations of consumer apps
  • Lower your threshold for biopsy in dark-skinned patients when using AI tools (to compensate for bias)

For patients:

  • Do not trust smartphone apps for skin cancer detection
  • See a dermatologist for any concerning mole, regardless of app results
  • Understand that AI is a tool, not a replacement for expert evaluation
  • Advocate for yourself if you have dark skin and are concerned about a lesion (demand full evaluation, not just AI scan)

Further Reading

Essential articles on skin tone bias:

  • Daneshjou, R. et al. (2022). Disparities in dermatology AI performance on a diverse, curated clinical image set. Science Advances. DOI: 10.1126/sciadv.abq6147
  • Groh, M. et al. (2021). Evaluating Deep Neural Networks Trained on Clinical Images in Dermatology with the Fitzpatrick 17k Dataset. CVPR Workshop on ISIC Skin Image Analysis. DOI: 10.1109/CVPRW53098.2021.00201
  • Adamson, A.S. & Smith, A. (2018). Machine Learning and Health Care Disparities in Dermatology. JAMA Dermatology. DOI: 10.1001/jamadermatol.2018.2348

Consumer app studies:

  • Freeman, K. et al. (2020). Smartphone apps for the detection of melanoma: systematic assessment of their diagnostic accuracy. BMJ. DOI: 10.1136/bmj.m127

Melanoma disparities:

  • Bradford, P.T. et al. (2009). Acral Lentiginous Melanoma: Incidence and Survival Patterns in the United States, 1986-2005. Archives of Dermatology. DOI: 10.1001/archdermatol.2009.323

AI validation studies:

  • Tschandl, P. et al. (2020). Human-computer collaboration for skin cancer recognition. Nature Medicine. DOI: 10.1038/s41591-020-0942-0
  • Esteva, A. et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature. DOI: 10.1038/nature21056

FDA-cleared devices:

  • DermaSensor FDA De Novo Summary (DEN230008): FDA CDRH Database
  • Merry SP et al. (2025). Primary Care Physician Use of Elastic Scattering Spectroscopy on Skin Lesions Suggestive of Skin Cancer. J Prim Care Community Health. DOI: 10.1177/21501319251344423
  • Ferris LK, Seiverling EV et al. (2025). DERM-SUCCESS FDA Pivotal Study: A Multi-Reader Multi-Case Evaluation. J Prim Care Community Health. DOI: 10.1177/21501319251342106
  • Nevisense FDA PMA Summary (P150046): FDA PMA Database
  • Malvehy J et al. (2014). Clinical performance of the Nevisense system in cutaneous melanoma detection. Br J Dermatol. DOI: 10.1111/bjd.13121

Professional society guidelines:

  • Kovarik C, Lee I, Ko J, et al. Commentary: Position statement on augmented intelligence (AuI). J Am Acad Dermatol. 2019;81(4):998-1000.

For deeper dives:

  • See Chapter 12 (Radiology) for comparison of imaging AI across specialties
  • See Chapter 19 (Clinical AI Safety) for failure mode analysis
  • See Chapter 21 (Medical Liability) for medicolegal considerations
  • See Chapter 22 (Algorithmic Bias and Health Equity) for broader equity framework
