Dermatology

Dermatology’s visual nature made it an early target for AI, but algorithmic bias threatens to worsen existing health disparities. Melanoma detection algorithms trained predominantly on light skin show a 27-percentage-point sensitivity gap between Fitzpatrick I-II and V-VI skin. Consumer melanoma apps achieve sensitivities as low as 7%. Black patients with melanoma already face diagnostic delays of 1-2 years and present with advanced-stage disease 52% of the time. Deploying biased AI would deepen these inequities. This chapter examines what works, what fails, and why skin tone matters.

Learning Objectives

After reading this chapter, you will be able to:

  • Evaluate AI systems for skin cancer detection and dermatologic diagnosis
  • Understand performance variations across different skin tones (Fitzpatrick scale I-VI)
  • Critically assess direct-to-consumer melanoma detection apps and their dangers
  • Navigate FDA regulatory landscape for dermatology AI devices
  • Recognize dataset bias and its clinical consequences
  • Apply evidence-based frameworks for evaluating dermatology AI before clinical adoption
  • Counsel patients on appropriate and inappropriate use of skin lesion apps

The Clinical Context: Dermatology is an image-intensive specialty where diagnosis depends heavily on visual pattern recognition. This makes it theoretically ideal for computer vision AI, but significant equity challenges exist.

Critical Issue: Skin Tone Bias

Most dermatology AI training data is predominantly light skin (Fitzpatrick I-III):

| Dataset | Light Skin Representation |
| --- | --- |
| HAM10000 | 81% Fitzpatrick I-III |
| ISIC Archive | 74% Fitzpatrick I-III |
| Fitzpatrick17k (2021) | First balanced dataset across skin tones |

Performance Gap: Daneshjou et al. (2022) Study

(Daneshjou et al., 2022)

| Fitzpatrick Type | Sensitivity | Specificity |
| --- | --- | --- |
| I-II (very light) | 92% | 88% |
| III-IV (medium) | 87% | 85% |
| V-VI (dark) | 65% | 78% |

A 27-percentage-point sensitivity gap separates the lightest and darkest skin types. Deployed at scale, this would worsen existing health disparities.

What Actually Works:

  1. FDA-cleared devices for clinical use:
    • DermaSensor (2024): Elastic scattering spectroscopy for skin cancer evaluation (96% sensitivity, first AI device cleared for primary care)
    • SciBase Nevisense (2017): Electrical impedance spectroscopy for melanoma assessment (only FDA-approved device for melanoma detection)
    • Both require physician oversight, not autonomous diagnosis
  2. Dermoscopy-assisted AI: Algorithms trained on high-quality dermoscopy images perform better than smartphone apps, but still show skin tone bias

What Doesn’t Work:

  1. Consumer melanoma detection apps: Sensitivity ranges from 7-73% (Freeman et al., 2020)
  2. AI trained only on light skin: 10-30% accuracy degradation in darker skin tones
  3. Autonomous diagnosis without clinical context: Algorithms see pixels, not patient history, family history, or lesion change over time

Health Equity Context:

Black patients with melanoma already experience:

  • Later diagnosis (average 1-2 years delay)
  • More advanced stage at presentation (Stage III-IV at diagnosis: 52% vs. 16% for white patients)
  • Worse 5-year survival (68% vs. 92% in white patients)

Deploying biased AI would worsen existing health disparities.

The Bottom Line: Exercise profound skepticism about dermatology AI. Demand performance data stratified by skin tone (Fitzpatrick I-VI). Consumer apps are potentially dangerous. Until equity is achieved, traditional dermoscopy by trained dermatologists remains the standard. Do not recommend consumer skin cancer apps to patients.


Introduction: The Promise and Peril

Dermatology presents a unique opportunity for computer vision AI. Unlike radiology (where AI analyzes standardized DICOM images) or pathology (where AI analyzes stained tissue slides), dermatology involves visual assessment of external skin surfaces. This accessibility has led to:

The promise:

  • Non-invasive image capture (smartphone, dermoscopy, clinical photography)
  • Large potential training datasets (millions of skin lesion images)
  • Clear diagnostic tasks (benign vs. malignant, specific diagnoses)
  • Potential to expand access (teledermatology, underserved areas)

The peril:

  • Training data heavily biased toward light skin
  • Direct-to-consumer apps marketed without adequate validation
  • False reassurance leading to delayed diagnosis of melanoma
  • Worsening of existing racial health disparities

This chapter focuses on what evidence reveals about dermatology AI performance, with particular emphasis on the skin tone bias crisis that threatens to make AI a tool that widens rather than narrows health inequity.


Part 1: The Skin Tone Bias Crisis

The Fitzpatrick Scale

The Fitzpatrick skin type classification system categorizes skin into six types based on response to UV exposure:

| Fitzpatrick Type | Description | Typical Complexion |
| --- | --- | --- |
| I | Always burns, never tans | Very fair, freckles |
| II | Usually burns, tans minimally | Fair |
| III | Sometimes burns, tans uniformly | Light brown |
| IV | Rarely burns, tans easily | Moderate brown |
| V | Very rarely burns, tans darkly | Dark brown |
| VI | Never burns, deeply pigmented | Very dark brown to black |

Clinical relevance: Melanoma presentation, diagnostic features, and differential diagnoses vary significantly across Fitzpatrick types.

The Data Problem

Major dermatology training datasets show severe imbalance:

HAM10000 (Human Against Machine with 10,000 training images):

  • 81% Fitzpatrick I-III (light skin)
  • 19% Fitzpatrick IV-VI (medium to dark skin)
  • Most widely used dataset for skin lesion classification

ISIC Archive (International Skin Imaging Collaboration):

  • 74% Fitzpatrick I-III
  • Only 9% Fitzpatrick V-VI
  • Used for annual melanoma detection challenges

Edinburgh Dermofit Library:

  • 93% light skin
  • Only 7% dark skin

Fitzpatrick17k (Groh et al., 2021):

  • First balanced dataset across skin tones
  • Created specifically to address bias in existing datasets
  • 16,577 images with expert-labeled Fitzpatrick types

(Groh et al., 2021)

Why this matters: Machine learning algorithms learn patterns from training data. If training data is predominantly light skin, algorithms learn to recognize pathology on light skin. Performance on dark skin suffers.
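
Auditing a dataset’s composition is the first step in detecting this bias. Below is a minimal sketch in Python, assuming a hypothetical metadata table with an expert-labeled fitzpatrick column (the file name and schema are illustrative assumptions, not the actual HAM10000 or ISIC format):

```python
import pandas as pd

# Hypothetical metadata table; the file name and column names are
# illustrative assumptions, not the actual HAM10000/ISIC schema.
meta = pd.read_csv("lesion_metadata.csv")  # columns: image_id, fitzpatrick, diagnosis

# Share of light (I-III) vs. medium-to-dark (IV-VI) skin
light = meta["fitzpatrick"].isin([1, 2, 3])
print(f"Fitzpatrick I-III: {light.mean():.1%}")
print(f"Fitzpatrick IV-VI: {(~light).mean():.1%}")

# Per-type proportions reveal whether types V and VI are represented at all
print(meta["fitzpatrick"].value_counts(normalize=True).sort_index())
```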

The Performance Gap: Quantifying the Bias

Daneshjou et al. (2022), Stanford University Study

Published in Science Advances, this comprehensive analysis examined multiple dermatology AI models across skin tones.

(Daneshjou et al., 2022)

Study design:

  • Tested 4 major AI systems for skin cancer classification
  • Used clinically validated images across Fitzpatrick I-VI
  • Measured sensitivity and specificity by skin tone

Key findings:

| Fitzpatrick Type | Sensitivity for Malignancy | Specificity |
| --- | --- | --- |
| I-II (very light) | 92% | 88% |
| III-IV (medium) | 87% | 85% |
| V-VI (dark) | 65% | 78% |

Clinical translation:

  • For Fitzpatrick I-II: AI detects 92 of 100 melanomas
  • For Fitzpatrick V-VI: AI detects only 65 of 100 melanomas

Per 100 melanomas, 35 are missed in the darkest skin types versus 8 in the lightest: 27 additional missed cancers per 100 patients with melanoma.

Additional findings:

  • False negative rate in dark skin: 35% vs. 8% in light skin
  • Lower specificity in dark skin leads to more unnecessary biopsies
  • Performance degradation was consistent across all 4 tested AI systems
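
Reproducing this kind of stratified analysis on your own validation data is straightforward. A minimal sketch, assuming a hypothetical results table with ground-truth labels, model predictions, and Fitzpatrick types (column names are illustrative):

```python
import pandas as pd

# Hypothetical evaluation results, one row per lesion. Assumed columns:
# y_true (1 = biopsy-proven malignant), y_pred (1 = flagged by model),
# fitzpatrick (1-6, expert-labeled).
df = pd.read_csv("model_predictions.csv")

groups = {"I-II": [1, 2], "III-IV": [3, 4], "V-VI": [5, 6]}
for label, types in groups.items():
    sub = df[df["fitzpatrick"].isin(types)]
    tp = ((sub.y_true == 1) & (sub.y_pred == 1)).sum()
    fn = ((sub.y_true == 1) & (sub.y_pred == 0)).sum()
    tn = ((sub.y_true == 0) & (sub.y_pred == 0)).sum()
    fp = ((sub.y_true == 0) & (sub.y_pred == 1)).sum()
    print(f"Fitzpatrick {label}: "
          f"sensitivity {tp / (tp + fn):.0%}, specificity {tn / (tn + fp):.0%}")
```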

Why Algorithms Fail on Dark Skin

Technical explanations:

  1. Contrast differences: Pigmented lesions have lower contrast against dark skin, making border detection difficult (see the sketch after this list)
  2. Color space limitations: RGB color models optimized for light skin photography
  3. Morphologic features: Dermoscopic patterns (pigment network, globules) appear different on dark skin
  4. Training data imbalance: Algorithms never learned dark skin pathology adequately
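
To make the contrast point concrete, here is a toy Weber-contrast calculation; the gray levels are illustrative assumptions, not measured values:

```python
# Weber contrast of a pigmented lesion against surrounding skin.
# Gray levels (0-255) are illustrative assumptions, not measured values.
def weber_contrast(lesion_lum: float, background_lum: float) -> float:
    return abs(lesion_lum - background_lum) / background_lum

print(weber_contrast(lesion_lum=60, background_lum=220))  # light skin: ~0.73
print(weber_contrast(lesion_lum=60, background_lum=90))   # dark skin:  ~0.33
```

The same lesion yields less than half the contrast signal against darker background skin, which degrades edge and border detection.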

Clinical explanations:

  1. Different melanoma presentations: Acral lentiginous melanoma (more common in Black patients) is underrepresented in training data
  2. Different benign lesion patterns: Dermatosis papulosa nigra, seborrheic keratosis patterns differ by skin tone
  3. Pigmentary variation: Post-inflammatory hyperpigmentation, normal pigmentation variance higher in dark skin

Health Equity Implications

Melanoma in Black patients: The existing disparity

(Bradford et al., 2009)

  • Diagnosis stage: 52% present with Stage III-IV disease (vs. 16% for white patients)
  • 5-year survival: 68% for Black patients vs. 92% for white patients
  • Median time to diagnosis: 1-2 years longer for Black patients
  • Most common subtype: Acral lentiginous (palms, soles, under nails), often missed

Deploying biased AI would:

  1. Miss more melanomas in Black patients (already diagnosed later)
  2. Create false reassurance from consumer apps (“low risk” result for actual melanoma)
  3. Reduce access to dermatology (if AI “triage” excludes patients with missed lesions)
  4. Widen existing survival disparities

The Ethical Imperative

Using AI systems that perform worse on dark skin is not merely suboptimal; it is unethical. It takes an existing disparity (later melanoma diagnosis in Black patients) and makes it worse.

Before deploying any dermatology AI system, demand performance data stratified by Fitzpatrick type I-VI.

If a vendor cannot provide this data, do not deploy their system.


Part 2: Consumer Melanoma Apps: The Danger

Dozens of smartphone apps claim to assess skin cancer risk. Examples include:

  • SkinVision
  • MoleScope
  • Skin Scanner
  • MySkinPal
  • UMSkinCheck

The marketing claims:

  • “Detect skin cancer early”
  • “AI-powered melanoma detection”
  • “Instant risk assessment”

The regulatory status:

  • Most are marketed as “wellness” or “educational” tools, not medical devices
  • This allows them to bypass FDA review
  • No requirement for clinical validation before app store release

Freeman et al. (2020) BMJ Study

Systematic evaluation of consumer skin cancer apps.

(Freeman et al., 2020)

Study design:

  • Tested 4 popular apps
  • Used 188 clinical images of melanoma and benign lesions
  • Compared app recommendations to dermatologist diagnosis

Results:

| App | Sensitivity (Melanoma Detection) | Specificity |
| --- | --- | --- |
| App 1 | 73% | 37% |
| App 2 | 30% | 94% |
| App 3 | 7% | 83% |
| App 4 | 70% | 65% |

Key findings:

  • Only 1 app detected >70% of melanomas
  • App 3 missed 93% of melanomas (7% sensitivity)
  • High false positive rates lead to unnecessary dermatology visits
  • No app provided skin tone-stratified performance data
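
How little reassurance a “low risk” result provides can be quantified with Bayes’ rule. A minimal sketch, using App 3’s published figures and a hypothetical 5% melanoma prevalence among lesions worrisome enough for a patient to check (the prevalence is an assumption for illustration):

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Bayes' rule: post-test probabilities from test characteristics."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)  # (PPV, NPV)

# App 3 from Freeman et al. (2020): 7% sensitivity, 83% specificity
ppv, npv = predictive_values(sensitivity=0.07, specificity=0.83, prevalence=0.05)
print("Pre-test melanoma probability:          5.0%")
print(f"Probability after a 'low risk' result: {1 - npv:.1%}")  # ~5.6%
```

A “low risk” result from this app leaves the melanoma probability essentially unchanged from the pre-test value; any reassurance the patient takes from it is unearned.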

The False Reassurance Problem

Clinical scenario:

  1. Patient notices changing mole
  2. Uses consumer app for “risk assessment”
  3. App returns “low risk” or “benign” classification
  4. Patient delays seeing dermatologist
  5. Melanoma progresses from Stage I to Stage II or III
  6. Prognosis worsens significantly

Real case reports:

  • Multiple lawsuits filed against app developers for missed melanomas
  • Published case reports of delayed melanoma diagnosis attributed to app reassurance
  • No systematic data on how many melanomas are missed due to consumer apps (underreported)

The Liability Question

Who is liable when a consumer app misses a melanoma?

Current legal landscape:

  1. App developers: Generally shield themselves with disclaimers (“not medical advice,” “not a substitute for physician evaluation”)
  2. Patients: Difficult to prove causation (would they have seen dermatologist earlier without app?)
  3. Physicians: If patient mentions using app and physician doesn’t counsel against it, potential liability

Dermatologist responsibility:

  • If patient discloses using skin cancer app, document counseling about limitations
  • Advise patients NOT to use apps for self-diagnosis
  • Perform complete skin exam regardless of app results

Professional Society Position

American Academy of Dermatology (AAD) Statement:

The AAD advises against reliance on smartphone apps for skin cancer detection, emphasizing that:

  • Apps are not validated for clinical accuracy
  • Apps cannot replace in-person dermatologic evaluation
  • Delayed diagnosis is a significant risk

Recommendation to dermatologists:

  • Actively counsel patients against using consumer apps for melanoma detection
  • If patients insist on using apps, explain they should not delay evaluation based on “low risk” results
  • Perform full skin examination regardless of app recommendations

Part 3: FDA-Cleared Dermatology AI Devices

DermaSensor (2024)

Technology: Elastic scattering spectroscopy (ESS)

Mechanism:

  • Handheld wireless device placed against skin lesion
  • Measures light scattering properties of tissue at cellular and subcellular levels
  • AI algorithm (trained on 20,000+ scans of approximately 4,000 lesions) classifies lesions as “evaluate further” or “low risk”

FDA clearance: De Novo classification (Class II), January 17, 2024. FDA Breakthrough Device Designation granted 2021.

Intended use:

  • Aid in skin cancer evaluation for primary care physicians
  • Used as adjunct to visual examination
  • Not intended for dermatologists (assumed to have dermoscopy)
  • Designed to inform referral decisions to dermatology

Evidence:

DERM-SUCCESS pivotal trial (led by Mayo Clinic across 22 study centers, 1,000+ patients):

| Metric | Performance |
| --- | --- |
| Sensitivity | 96% across all skin cancers (224 cancers detected) |
| Sensitivity by type | Melanoma 88%, BCC 98%, SCC 99% |
| Negative predictive value | 97% (a negative result has a 97% chance of being benign) |
| Clinical utility | Diagnostic sensitivity increased from 71% to 82% with the device; referral sensitivity increased from 82% to 91% |

(FDA De Novo Summary, DEN230008) | (DermaSensor FDA Clearance Announcement)

Published clinical studies: see Merry et al. (2025) and Ferris et al. (2025) in Further Reading.

Limitations:

  • Specificity not publicly disclosed in detail (a high false positive rate is expected, since the device design prioritizes sensitivity)
  • Not validated for use on dark skin (Fitzpatrick V-VI)
  • Requires contact with lesion (not image-based)
  • Cost per device: approximately $8,000-10,000 plus annual software license

Clinical role:

  • May help primary care physicians decide which lesions to refer to dermatology
  • Should not replace clinical judgment
  • Cannot be used to reassure a patient that a lesion is benign (too many false positives)

SciBase Nevisense (2017)

Technology: Electrical impedance spectroscopy (EIS)

Mechanism:

  • Measures electrical properties of skin tissue at multiple frequencies
  • Different tissues have different impedance patterns
  • Malignant tissue shows altered impedance due to cellular changes
  • Based on 20+ years of research from the Karolinska Institute

FDA approval: Class III medical device (PMA P150046), June 29, 2017. This is the only FDA-approved device specifically for melanoma detection assistance.

Intended use:

  • Aid in melanoma detection for lesions with clinical or historical characteristics of melanoma
  • Used in conjunction with clinical and dermoscopic examination
  • Intended for dermatologists when considering biopsy
  • Should not be used on clinically obvious melanoma

Evidence:

Pivotal clinical trial (international, multicenter, prospective, blinded):

| Study | Sensitivity | Specificity | Population |
| --- | --- | --- | --- |
| Malvehy et al. (2014) | 96.6% | 34.4% | 2,416 lesions (265 melanomas) |
| Mohr et al. (2013) | 98.1% | 23.6% | Algorithm development study |

(Malvehy et al., 2014) | (FDA PMA Summary) | (SciBase)

Additional validation:

  • Negative predictive value: 98.2%
  • 100% sensitivity for non-melanoma skin cancers (BCC, SCC)
  • Over 300,000 patients tested globally
  • 60+ peer-reviewed publications

Limitations:

  • Low specificity (23-34%) results in many false positives
  • Not validated across skin tones (Fitzpatrick stratification not reported)
  • Requires physical contact with lesion
  • Device cost limits widespread adoption

Clinical utility:

  • High sensitivity useful for “ruling out” melanoma in equivocal cases
  • Should not be used to “rule in” melanoma (low PPV would result in many unnecessary biopsies)
  • Best used when dermatologist is uncertain whether to biopsy
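
The rule-out/rule-in asymmetry follows directly from the predictive-value arithmetic shown earlier for consumer apps. Reusing that predictive_values() helper with the Malvehy et al. figures and a hypothetical 5% melanoma prevalence among equivocal lesions (the prevalence is an assumption for illustration):

```python
# Reusing predictive_values() from the consumer-app example in Part 2
ppv, npv = predictive_values(sensitivity=0.966, specificity=0.344, prevalence=0.05)
print(f"PPV: {ppv:.0%}")   # ~7%: most positives are false alarms (no rule-in)
print(f"NPV: {npv:.1%}")   # ~99.5%: a negative result is strong rule-out evidence
```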

VisualDx and DermExpert (FDA-Exempt Clinical Decision Support)

Important distinction: VisualDx is not an FDA-cleared medical device. It is classified as FDA-exempt clinical decision support software under FDA guidance for software that provides recommendations to healthcare providers by matching patient information to reference information.

Technology: Clinical decision support system with AI-assisted image analysis

Mechanism:

  • DermExpert (VisualDx’s dermatology AI feature) allows clinicians to photograph skin lesions
  • AI analyzes images against approximately 80 lesion types
  • Returns confidence scores and differential diagnoses for physician consideration
  • Database of 120,000+ medical images with expert-labeled diagnoses

Regulatory status: FDA-exempt (not cleared, not requiring clearance)

Per FDA guidance (2019), clinical decision support software that “provides recommendations to healthcare providers by matching patient-specific information to reference information the medical community routinely uses in clinical practice” is exempt from device regulation.

Intended use:

  • Clinical decision support for dermatologists and non-dermatologists
  • Differential diagnosis generation based on visual and clinical features
  • Educational resource for skin condition identification
  • EHR integration and telemedicine support

Evidence:

  • No FDA pivotal trial required (exempt status)
  • Limited peer-reviewed validation studies in the public domain
  • Image diversity: 28.5% of images represent Fitzpatrick skin types IV-VI (Alvarado & Feng, JAAD, 2021)

Limitations:

  • No FDA review of diagnostic accuracy claims
  • No published sensitivity/specificity data for skin cancer detection
  • Not validated for autonomous diagnosis
  • Requires physician interpretation of AI suggestions

Clinical role:

  • May assist clinicians in generating differential diagnoses
  • Should not be used as a standalone diagnostic tool
  • Useful for educational purposes and rare disease identification
  • Physicians remain responsible for diagnostic decisions

FDA-Exempt vs. FDA-Cleared

FDA-cleared devices (DermaSensor, Nevisense) have undergone FDA review with clinical trial data demonstrating safety and effectiveness.

FDA-exempt software (VisualDx) has not undergone this review. The FDA has determined that certain clinical decision support tools pose low enough risk to not require premarket review. This does NOT mean they are validated for clinical accuracy.

When evaluating AI tools for clinical use, verify regulatory status and demand evidence regardless of FDA classification.

Limitations of Current FDA-Cleared Devices

Common problems:

  1. Low specificity: Both devices have high sensitivity (good) but low specificity (many false positives)
  2. Lack of skin tone validation: Neither device provides Fitzpatrick-stratified performance data
  3. Cost barriers: Device cost limits access to well-resourced practices
  4. Not image-based: Cannot be used for telemedicine or remote evaluation

What FDA clearance does NOT guarantee:

  • Clearance does not mean “accurate” or “ready for widespread use”
  • Class II devices have lower evidence bar than Class III
  • FDA does not require prospective clinical outcomes data (e.g., does device reduce melanoma mortality?)
  • Performance in clinical practice may differ from pivotal trial conditions

Part 4: Dermoscopy-Based AI

The Dermoscopy Advantage

Dermoscopy provides:

  • 10x magnification
  • Polarized or immersion lighting to reduce surface reflection
  • Visualization of subsurface structures (pigment network, vessels)
  • Standardized image capture conditions

AI trained on dermoscopy images performs better than AI trained on clinical photographs.

Man Against Machine Studies

Esteva et al. (2017), Stanford/Nature Study

High-profile study claimed dermatologist-level classification.

(Esteva et al., 2017)

Study design:

  • Deep learning algorithm trained on 129,450 images
  • Tested against 21 board-certified dermatologists
  • Binary classification tasks (malignant vs. benign)

Results:

  • Algorithm performance comparable to dermatologists
  • AUC 0.94-0.96 for various tasks

Limitations revealed by subsequent analysis:

  1. Dataset bias: Predominantly light skin (not reported in original paper)
  2. Task simplification: Real dermatology involves differential diagnosis, not just binary classification
  3. No clinical context: Algorithm didn’t know patient age, lesion history, family history
  4. Dermoscopy vs. clinical photos: Mixed image types in dataset

Follow-up validation studies showed performance degradation in real-world conditions.

Tschandl et al. (2020) Study

More rigorous “Human vs. Machine” evaluation.

(Tschandl et al., 2020)

Study design:

  • 11 AI algorithms tested
  • 511 dermatologists evaluated the same images
  • Used HAM10000 dataset

Results:

| Evaluator | Sensitivity | Specificity |
| --- | --- | --- |
| Best AI algorithm | 82% | 77% |
| Average dermatologist | 70% | 80% |
| Expert dermatologists (>10 years) | 78% | 82% |

Key insight: AI performed comparably to average dermatologists but not better than experts.

Critical limitations not addressed:

  • No skin tone stratification
  • No long-term clinical outcomes
  • Diagnostic accuracy ≠ patient outcomes

MoleMapper and Patient-Generated Dermoscopy

Consumer dermoscopy attachments:

  • DermLite (smartphone attachment)
  • Handyscope
  • iDoc

Potential use case:

  • Patients take dermoscopy images at home
  • Send to dermatologist via telemedicine
  • AI pre-screens for high-risk lesions

Current status:

  • Limited validation data
  • Image quality highly variable (patient technique)
  • No clear evidence of clinical benefit over in-person examination

Research applications:

  • MoleMapper app (Oregon Health & Science University)
  • Patients track moles over time
  • Data used for melanoma epidemiology research
  • Not validated for clinical diagnosis


Part 5: Addressing Skin Tone Bias: Proposed Solutions

Dataset Diversification

Fitzpatrick17k (Groh et al., 2021):

Created to address imbalance in existing datasets.

(Groh et al., 2021)

Features:

  • 16,577 clinical images
  • Expert-labeled Fitzpatrick types (I-VI)
  • Balanced representation across skin tones
  • Publicly available for research

Impact:

  • New algorithms trained on Fitzpatrick17k show a reduced performance gap
  • But they still lag behind performance on light skin

Challenge: Even balanced datasets may not eliminate bias if pathology presentation differs by skin tone (not just data quantity issue).

Transfer Learning and Fine-Tuning

Approach:

  • Train algorithm on large light skin dataset
  • Fine-tune on smaller dark skin dataset
  • Test whether performance gap narrows

Results:

  • Modest improvement (5-10 percentage points)
  • Does not eliminate the 27-point gap identified by Daneshjou et al.
  • Requires high-quality dark skin training data (still scarce)
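
For concreteness, here is a minimal fine-tuning sketch in PyTorch. The ImageNet backbone stands in for a model trained on predominantly light-skin images, and the one-batch loader is a hypothetical stand-in for a curated darker-skin dataset:

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone (stand-in for a light-skin-trained lesion model)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                    # freeze pretrained features
model.fc = nn.Linear(model.fc.in_features, 2)      # new head: benign vs. malignant

# Hypothetical stand-in for a DataLoader over a balanced darker-skin dataset
dark_skin_loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,)))]

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in dark_skin_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)        # fine-tune only the new head
    loss.backward()
    optimizer.step()
```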

Explainable AI (XAI) for Dermatology

Goal: Understand what features algorithms use to classify lesions

Techniques:

  • Saliency maps (which pixels most influenced classification?)
  • Attention mechanisms (where did the algorithm “look”?)
  • Feature importance analysis

Findings:

  • Some algorithms learned spurious correlations (rulers in dermoscopy images, skin markers)
  • Pigmentation patterns learned differently for light vs. dark skin
  • Algorithms may rely on different features than dermatologists use

Clinical utility:

  • Helps identify when an algorithm is making decisions for the wrong reasons
  • Allows targeted dataset improvements
  • Builds trust (or appropriate distrust) in AI recommendations
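
A gradient saliency map, the simplest of these techniques, takes one backward pass. A minimal sketch (the untrained ResNet and random image are stand-ins for a real lesion classifier and a preprocessed dermoscopy photo):

```python
import torch
import torch.nn as nn
from torchvision import models

# Untrained stand-in for a fine-tuned lesion classifier
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 2)
model.eval()

image = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in input

score = model(image)[0, 1]     # logit for the "malignant" class
score.backward()               # gradient of the score w.r.t. every pixel

# Per-pixel importance: max absolute gradient across color channels.
# If the bright regions are rulers, markers, or background skin rather
# than the lesion itself, the model learned a spurious correlation.
saliency = image.grad.abs().max(dim=1).values.squeeze()
print(saliency.shape)          # torch.Size([224, 224])
```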

Multi-Modal Approaches

Combining image data with clinical context:

  • Patient age
  • Lesion history (stable vs. changing)
  • Family history of melanoma
  • Anatomic location
  • Patient-reported symptoms (bleeding, itching)

Hypothesis: Adding clinical context may reduce performance gap

Current status: Early research phase, no deployed systems
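
One plausible architecture, shown below as a hedged sketch, concatenates an image embedding with a small vector of clinical features before the final classifier (the feature list and dimensions are illustrative assumptions, not a deployed system):

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiModalLesionClassifier(nn.Module):
    """Illustrative fusion model: image embedding + tabular clinical context."""

    def __init__(self, n_clinical_features: int = 5):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()        # expose the 512-dim image embedding
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(512 + n_clinical_features, 64),
            nn.ReLU(),
            nn.Linear(64, 2),              # benign vs. malignant
        )

    def forward(self, image, clinical):
        # clinical: e.g. [age, lesion_changing, family_history, acral_site, symptoms]
        fused = torch.cat([self.backbone(image), clinical], dim=1)
        return self.head(fused)

model = MultiModalLesionClassifier()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 5))
print(logits.shape)                        # torch.Size([4, 2])
```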


Part 6: International Perspectives and Guidelines

International Skin Imaging Collaboration (ISIC)

Mission: Create large, publicly available dermoscopy archive

ISIC Archive:

  • Over 40,000 dermoscopy images
  • Used for annual melanoma detection challenges
  • Free for research use

Acknowledged limitations:

  • 74% Fitzpatrick I-III (light skin bias)
  • The ISIC 2024 challenge began including Fitzpatrick labels for bias mitigation

International Dermoscopy Society (IDS)

Position on AI:

  • AI should augment, not replace, dermatologist expertise
  • Clinical validation required before deployment
  • Skin tone equity must be addressed
  • Patient communication about AI use essential

European Academy of Dermatology and Venereology (EADV)

EADV Task Force on AI (2023):

Recommendations for dermatology AI deployment:

  1. Validation: External validation across institutions and skin tones required
  2. Transparency: Algorithms should disclose training data demographics
  3. Clinical integration: AI should fit into existing workflow, not create new burden
  4. Patient autonomy: Patients should be informed when AI is used in their care
  5. Liability: Clear responsibility when AI-assisted diagnoses are wrong

Part 7: Clinical Scenarios and Practical Guidance

Before Adopting Dermatology AI: Questions to Ask Vendors

Vendor Evaluation Checklist

Performance data:

  1. “What is sensitivity and specificity stratified by Fitzpatrick type I-VI?”
    • If vendor cannot provide this, do not adopt
  2. “What datasets were used for training and validation?”
    • Look for HAM10000, ISIC (biased) vs. Fitzpatrick17k (balanced)
  3. “What external validation has been performed?”
    • Single-site validation is insufficient

Regulatory status:

  1. “What is FDA clearance status?”
    • Class II De Novo? 510(k)? No clearance?
  2. “Is this marketed as a medical device or wellness tool?”
    • Wellness tools bypass FDA review

Clinical integration:

  1. “How does this integrate into existing workflow?”
    • Standalone dashboards often fail
  2. “What dermoscopy equipment is required?”
    • Equipment costs may be prohibitive
  3. “What training is provided for clinical staff?”

Liability and cost:

  1. “What is total cost of ownership?” (device, software license, support)
  2. “What liability protection does vendor provide?”
  3. “Can we discontinue if performance is inadequate?”

Red Flags (Walk Away)

  • No Fitzpatrick-stratified performance data
  • Claims to “replace dermatologist”
  • Validated only on academic datasets, not real-world clinical use
  • No FDA clearance for diagnostic claims
  • Vendor cites Esteva et al. (2017) as sole validation evidence (outdated)
  • Cannot adjust sensitivity/specificity thresholds for clinical priorities (see the sketch after this list)
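
Why threshold adjustability matters: a classifier’s sensitivity/specificity trade-off is set by the cutoff applied to its risk scores, and a practice serving a high-risk population may want a different cutoff than the vendor default. A minimal sketch with synthetic scores (all numbers are illustrative assumptions):

```python
import numpy as np

# Synthetic risk scores: 900 benign lesions (lower scores) and
# 100 malignant lesions (higher scores). Purely illustrative.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(2, 5, 900), rng.beta(5, 2, 100)])
labels = np.concatenate([np.zeros(900), np.ones(100)])

for cutoff in (0.5, 0.3, 0.15):
    flagged = scores >= cutoff
    sens = flagged[labels == 1].mean()
    spec = (~flagged)[labels == 0].mean()
    print(f"cutoff {cutoff:.2f}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Lowering the cutoff raises sensitivity at the cost of specificity; a system that hides this dial forces one vendor-chosen trade-off on every practice.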

Patient Counseling Scripts

When patient asks about smartphone skin cancer apps:

“I understand these apps are convenient, but current apps are not accurate enough for skin cancer detection. Studies show some apps miss up to 93% of melanomas. A ‘low risk’ result from an app should not reassure you or delay seeing a dermatologist. If you’re concerned about a mole, the safest approach is an in-person skin examination.”

When patient has used app and received “low risk” result:

“Smartphone apps cannot replace a dermatologist’s evaluation. These apps have high error rates and are not validated across different skin tones. I will examine your skin thoroughly regardless of what the app said. Please don’t use app results to decide whether to see a doctor. Any changing mole, especially if it bleeds, itches, or has irregular borders, should be evaluated by a dermatologist.”

When discussing AI during dermoscopy examination:

“This dermoscopy device has an AI feature that helps identify lesions that might need biopsy. The AI is a tool that augments my judgment; it doesn’t make the final decision. I will review the AI recommendation along with your history, the appearance of the lesion, and my clinical experience to decide if biopsy is needed.”


Professional Society Guidelines

American Academy of Dermatology (AAD) Position Statement on Augmented Intelligence (2019, updated 2022)

Core Principles:

  1. Augmentation, not replacement: AI should support dermatologists’ clinical decision-making, not replace it. The term “augmented intelligence” is preferred over “artificial intelligence.”

  2. Skin tone diversity in training data is essential: Algorithms must be trained and validated on diverse skin tones. Performance data should be stratified by Fitzpatrick type.

  3. Local validation required: AI systems validated at one institution may not generalize to others. Prospective validation at point of deployment is necessary.

  4. Patient communication: Patients should be informed when AI tools are used in their diagnosis and treatment.

  5. Data privacy: Patient images used for AI training must have appropriate consent and de-identification.

  6. Continuing education: Dermatologists should receive training on AI capabilities and limitations.

Specific Recommendations:

  • Do not recommend consumer skin cancer detection apps to patients
  • Demand Fitzpatrick-stratified performance data before adopting AI systems
  • Maintain clinical override capability for all AI recommendations
  • Monitor AI performance continuously after deployment

Source:

Kovarik C, Lee I, Ko J, et al. Commentary: Position statement on augmented intelligence (AuI). J Am Acad Dermatol. 2019;81(4):998-1000.

International Skin Imaging Collaboration (ISIC) Standards

ISIC Archive Requirements:

  1. Image quality standards: Minimum resolution, lighting conditions, focus requirements
  2. Metadata requirements: Age, sex, anatomic location, diagnosis, Fitzpatrick type (recommended)
  3. Diagnostic gold standard: Histopathology confirmation for lesions classified as malignant
  4. Licensing: CC-BY-NC for research use

ISIC Melanoma Detection Challenge:

Annual competition for best-performing algorithms:

  • Standardized evaluation metrics
  • Hidden test set to prevent overfitting
  • 2024 challenge includes Fitzpatrick type labels for bias assessment

ISIC 2024 Recommendations:

  • Report algorithm performance by skin tone
  • Include diverse skin tones in training data
  • Evaluate algorithm generalization across institutions
  • Consider clinical context beyond image analysis

International Dermoscopy Society (IDS) AI Guidance

Key Recommendations:

  1. Clinical validation over theoretical performance: Real-world studies more important than retrospective dataset performance

  2. Dermoscopy image quality matters: Consumer dermoscopy attachments produce variable image quality that may degrade AI performance

  3. Differential diagnosis vs. binary classification: Dermatology requires distinguishing multiple conditions, not just “benign vs. malignant”

  4. Longitudinal monitoring: AI should support lesion tracking over time, not just single-timepoint classification

  5. Integration with electronic health records: AI recommendations should be documented in EHR for medicolegal purposes

IDS emphasizes: AI is a tool for dermatologists, not a replacement. Dermoscopy expertise remains essential.


Check Your Understanding

Clinical Scenario 1: Patient with “Low Risk” App Result

Case: A 52-year-old white woman presents to your dermatology clinic concerned about a mole on her upper back. She states, “I used a smartphone app called SkinVision, and it said this mole is low risk, but I’m still worried because my mother had melanoma.”

Question: How do you respond, and what do you do clinically?

Answer:

Immediate response:

“I’m glad you came in despite what the app said. Smartphone apps for skin cancer detection are not accurate enough to trust. Studies show these apps can miss up to 93% of melanomas. Your family history of melanoma is more important than any app result.”

Clinical approach:

  1. Obtain full history:
    • How long has mole been present?
    • Has it changed in size, shape, or color?
    • Any bleeding, itching, or pain?
    • Complete family history (first-degree relatives with melanoma)
    • Personal history of melanoma or other skin cancers
  2. Perform complete skin examination:
    • Use dermoscopy to evaluate concerning lesion
    • Examine all skin surfaces (not just the lesion in question)
    • Look for other atypical nevi
  3. Apply clinical judgment:
    • Does lesion meet ABCDE criteria? (Asymmetry, Border irregularity, Color variation, Diameter >6mm, Evolution)
    • Given family history, threshold for biopsy is lower
    • Dermoscopy findings: pigment network, blue-white veil, irregular vessels?
  4. Decision:
    • If concerning features present: perform biopsy regardless of app result
    • If benign-appearing but patient has family history: consider biopsy or close monitoring (photography + 3-month follow-up)
    • Do NOT reassure patient based on app result
  5. Documentation:
    • Note that patient used smartphone app
    • Document counseling about app limitations
    • Record clinical reasoning for biopsy decision

Key teaching points:

  • Consumer apps create false reassurance
  • Family history trumps app result
  • Complete skin exam necessary (not just lesion in question)
  • Medicolegal risk: document counseling about apps

Clinical Scenario 2: Evaluating an AI-Assisted Dermoscopy Device

Case: Your dermatology practice is considering purchasing an AI-assisted dermoscopy device. The vendor presents data showing 95% sensitivity and 85% specificity for melanoma detection. The device costs $15,000 plus $3,000 annual software license.

Question: What questions should you ask before making a purchase decision?

Answer:

Critical questions to ask vendor:

  1. Fitzpatrick-stratified performance:
    • “What is sensitivity and specificity for each Fitzpatrick type I-VI?”
    • If vendor says “we don’t have that data,” do not purchase
    • If vendor says “performance is similar across skin types,” ask for the actual numbers
  2. Training and validation datasets:
    • “What datasets were used for training?” (Look for HAM10000, ISIC = biased)
    • “How many images from Fitzpatrick V-VI skin were in training set?”
    • “Was validation performed on a separate dataset from training?”
  3. External validation:
    • “Has this device been tested at institutions other than your development site?”
    • “What was real-world performance in clinical practice?”
    • Ask for published peer-reviewed studies, not just vendor white papers
  4. FDA regulatory status:
    • “What is FDA clearance status?” (510(k), De Novo, or none?)
    • “What claims are FDA-cleared vs. marketing claims?”
    • “What were sensitivity and specificity in FDA pivotal trial?”
  5. Clinical workflow integration:
    • “How long does analysis take per lesion?” (>30 seconds per lesion = workflow disruption)
    • “Does it integrate with our EHR?”
    • “Can we adjust sensitivity thresholds for our patient population?”
  6. Failure modes:
    • “What causes false negatives?” (amelanotic melanoma, acral melanoma?)
    • “What causes false positives?” (seborrheic keratosis, benign nevi?)
    • “How does device handle poor image quality or user error?”
  7. Liability and support:
    • “What liability protection does vendor provide?”
    • “What happens if device misses a melanoma? Does vendor share liability?”
    • “Can we return device if performance is inadequate in our practice?”
  8. Cost-benefit analysis:
    • Device cost: $15,000
    • Annual license: $3,000
    • Assume 5-year use: total cost $30,000
    • How many melanomas must you detect to justify cost?
    • Will this replace or supplement your dermoscopy expertise?

Red flags that should prompt you to decline purchase:

  • Vendor cannot provide Fitzpatrick-stratified data
  • Only validation study is Esteva et al. (2017) or similar outdated study
  • Device marketed for “autonomous diagnosis” without physician oversight
  • Cannot adjust sensitivity thresholds
  • Vendor claims device will “replace dermoscopy expertise”
  • No external validation outside vendor’s own study

Decision framework:

  • Request trial period (30-60 days) to test in your practice
  • Compare AI recommendations to your clinical judgment
  • Track false positives and false negatives
  • Survey staff about workflow integration
  • Decide based on real-world performance, not vendor promises

Clinical Scenario 3: Dark Skin Melanoma Evaluation

Case: A 68-year-old Black man (Fitzpatrick type VI) presents with a dark lesion on his heel that has been present for 8 months. He delayed seeking care because “I didn’t think Black people could get skin cancer.” Dermoscopy shows irregular pigmentation and blue-white veil. Your AI-assisted dermoscopy device classifies the lesion as “benign, 15% risk.”

Question: How does skin tone bias in AI affect your clinical decision-making?

Answer:

Immediate recognition:

This presentation is concerning for acral lentiginous melanoma, which:

  • Accounts for 70% of melanomas in Black patients
  • Occurs on palms, soles, under nails
  • Often diagnosed late (better prognosis if caught early)
  • Is underrepresented in AI training datasets

Critical reasoning about AI recommendation:

  1. AI classification (“benign, 15% risk”) should be disregarded because:
    • AI systems have 27-percentage-point sensitivity gap in Fitzpatrick V-VI skin
    • Acral lentiginous melanoma is underrepresented in training data
    • “15% risk” may actually be 40-50% risk when adjusted for skin tone bias
  2. Clinical features override AI:
    • Blue-white veil on dermoscopy = high-risk feature
    • Heel location = acral site (high suspicion)
    • 8-month duration = prolonged lesion (concerning)
    • Irregular pigmentation = atypical
  3. Patient education is essential:
    • “Black patients absolutely can get skin cancer”
    • “Melanoma in Black patients often occurs on palms, soles, or under nails”
    • “Delayed diagnosis is common and leads to worse outcomes”

Clinical action:

Perform biopsy immediately, regardless of AI classification.

  • Technique: punch biopsy or excisional biopsy (avoid shave biopsy for acral lesions)
  • Send for histopathology with request to evaluate for melanoma
  • Do NOT delay based on AI “benign” classification

Documentation:

“68-year-old Black man with 8-month history of pigmented lesion on heel. Dermoscopy shows blue-white veil and irregular pigmentation concerning for acral lentiginous melanoma. AI-assisted dermoscopy device classified lesion as low risk (15%); however, given known performance gap in AI systems for dark skin (Daneshjou et al., 2022: 27-percentage-point sensitivity gap in Fitzpatrick V-VI), clinical judgment takes precedence. Biopsy performed.”

Key teaching points:

  • AI recommendations must be interpreted in context of known biases
  • For dark-skinned patients, lower your threshold for biopsy due to AI underperformance
  • Acral lentiginous melanoma is a critical diagnosis not to miss
  • Patient education about melanoma risk in Black patients is essential

Follow-up:

If biopsy confirms melanoma:

  • Stage appropriately (sentinel lymph node biopsy if indicated)
  • Educate patient about surveillance
  • Screen family members
  • Report missed diagnosis to AI vendor (feedback for model improvement)

If biopsy shows benign lesion:

  • False positive biopsy is acceptable given high suspicion
  • Better to biopsy benign lesion than miss melanoma in high-risk population
  • Document reasoning for medicolegal purposes

Clinical Scenario 4: Primary Care Physician Considering DermaSensor

Case: You’re a family medicine physician who performs many skin exams. You’re considering purchasing DermaSensor (FDA-cleared January 2024) to help decide which lesions to refer to dermatology. The device costs approximately $8,000-10,000 plus annual software license. It has 96% sensitivity for skin cancers (97% negative predictive value), but specificity data is limited in public disclosures.

Question: Is DermaSensor appropriate for your practice?

Answer:

Understanding the performance metrics:

| Metric | Value | Clinical Translation |
| --- | --- | --- |
| Sensitivity | 96% overall (88% melanoma, 98% BCC, 99% SCC) | Detects 96 of 100 skin cancers |
| Negative predictive value | 97% | A “low risk” result has a 97% chance of being benign |
| Clinical utility | Missed cancers reduced from 18% to 9% | Meaningful improvement in detection |

The referral volume question:

DermaSensor is designed to increase referral sensitivity. The device intentionally prioritizes not missing cancers over avoiding false positives. Before adoption, consider:

  • How many additional referrals can your local dermatology network accommodate?
  • What is your current referral-to-biopsy ratio?
  • What is dermatology wait time in your area?

Cost-benefit analysis:

Assumptions:

  • Device cost: $8,000-10,000 plus annual license
  • You see 20 patients per day with skin concerns
  • Average 2 lesions evaluated per patient = 40 lesions/day
  • 200 working days/year = 8,000 lesions/year
  • Assume 1% skin cancer prevalence (typical for primary care)

Without DermaSensor:

  • 80 skin cancers per year (1% of 8,000)
  • You refer based on clinical judgment (assume you catch 82%, based on published PCP rates)
  • Approximately 14 skin cancers missed

With DermaSensor:

  • Device detects approximately 77 of 80 skin cancers (96% sensitivity)
  • Only 3 cancers missed (benefit: 11 additional cancers detected)
  • Unknown increase in “evaluate further” recommendations for benign lesions
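
A minimal script for this arithmetic, so you can substitute your own volumes and detection rate (every input below is one of the chapter’s stated assumptions, not practice-specific data):

```python
# Inputs: all values are the illustrative assumptions stated above.
lesions_per_year = 40 * 200              # 40 lesions/day x 200 working days
prevalence = 0.01                        # assumed skin cancer prevalence
cancers = lesions_per_year * prevalence  # 80 cancers/year

missed_clinical = cancers * (1 - 0.82)   # clinical judgment alone: ~14 missed
missed_device = cancers * (1 - 0.96)     # DermaSensor sensitivity: ~3 missed

print(f"Additional cancers detected per year: {missed_clinical - missed_device:.0f}")
```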

Key considerations:

  1. Rural/underserved settings: DermaSensor may provide greatest value where dermatology access is limited and any improvement in detection is clinically meaningful.

  2. High-volume suburban practices: Consider whether increased referrals will strain local dermatology capacity.

  3. Selective use: Using device only for equivocal lesions (not screening all lesions) may optimize cost-effectiveness.

When DermaSensor might be useful:

  • Rural practice with limited dermatology access (device helps prioritize urgent referrals)
  • Pre-screened population (already selected for suspicious lesions)
  • Used as “rule-out” tool for equivocal lesions
  • Practice where current detection rate is below average

When DermaSensor may be less useful:

  • If you already have dermoscopy training and high detection rates
  • If local dermatology is already overwhelmed with referrals
  • If using for low-suspicion screening (device designed for suspicious lesions)

Questions to ask yourself:

  1. “What is my current skin cancer detection rate?”
    • If already high (>85%), device offers limited incremental benefit
  2. “What is dermatology wait time in my area?”
    • If >3 months, additional referrals may not improve patient outcomes
  3. “Can I justify device cost for the number of additional cancers detected?”
    • Calculate cost per additional cancer detected for your practice volume

Key teaching point:

DermaSensor was designed and validated for a specific use case: helping primary care physicians identify lesions warranting dermatology referral. It is not designed for autonomous diagnosis or for use by dermatologists. Understand the intended use before adoption.


Key Takeaways

Clinical Bottom Line for Dermatology AI

  1. Skin tone bias is the defining issue. AI performs 27 percentage points worse in Fitzpatrick V-VI skin compared to Fitzpatrick I-II. This is unacceptable.

  2. Consumer apps are dangerous. Sensitivity ranges from 7-73%. False reassurance leads to delayed melanoma diagnosis. Actively counsel patients against using these apps.

  3. Demand Fitzpatrick-stratified data. Before adopting any dermatology AI, require performance data broken down by skin type I-VI. If vendor can’t provide it, decline.

  4. FDA clearance does not guarantee equity. Current FDA-cleared devices (DermaSensor, Nevisense) lack skin tone-stratified validation data.

  5. Dermoscopy AI > smartphone AI. Algorithms trained on dermoscopy images perform better than those trained on clinical photographs, but bias persists.

  6. Low specificity is a major problem. High sensitivity (good for not missing melanomas) often comes with low specificity (many false positives). This can overwhelm dermatology referral systems.

  7. Acral lentiginous melanoma is underrepresented. This melanoma subtype (common in Black patients, occurs on palms/soles) is underrepresented in training datasets. AI will miss these.

  8. External validation often fails. Algorithms that perform well in academic datasets perform worse in real-world clinical practice.

  9. AI augments, never replaces, dermatologist judgment. Patient history, lesion evolution over time, and clinical context remain essential.

  10. Health equity must be prioritized. Deploying biased AI in dermatology would worsen existing racial disparities in melanoma outcomes.

For primary care physicians:

  • Do not recommend consumer skin cancer apps
  • Consider dermoscopy training over AI devices
  • Refer based on ABCDE criteria + dermoscopy
  • DermaSensor (FDA-cleared 2024) may increase referral volume; understand your local dermatology capacity before adoption

For dermatologists:

  • Advocate for skin tone equity in AI development
  • Demand Fitzpatrick-stratified validation before adopting AI systems
  • Maintain dermoscopy expertise (AI is a supplement, not replacement)
  • Educate patients about limitations of consumer apps
  • Lower your threshold for biopsy in dark-skinned patients when using AI tools (to compensate for bias)

For patients:

  • Do not trust smartphone apps for skin cancer detection
  • See a dermatologist for any concerning mole, regardless of app results
  • Understand that AI is a tool, not a replacement for expert evaluation
  • Advocate for yourself if you have dark skin and are concerned about a lesion (demand full evaluation, not just AI scan)

Further Reading

Essential articles on skin tone bias:

  • Daneshjou, R. et al. (2022). Disparities in dermatology AI performance on a diverse, curated clinical image set. Science Advances. DOI: 10.1126/sciadv.abq6147
  • Groh, M. et al. (2021). Evaluating Deep Neural Networks Trained on Clinical Images in Dermatology with the Fitzpatrick 17k Dataset. CVPR Workshop on ISIC Skin Image Analysis. DOI: 10.1109/CVPRW53098.2021.00201
  • Adamson, A.S. & Smith, A. (2018). Machine Learning and Health Care Disparities in Dermatology. JAMA Dermatology. DOI: 10.1001/jamadermatol.2018.2348

Consumer app studies:

  • Freeman, K. et al. (2020). Smartphone apps for the detection of melanoma: systematic assessment of their diagnostic accuracy. BMJ. DOI: 10.1136/bmj.m127

Melanoma disparities:

  • Bradford, P.T. et al. (2009). Acral Lentiginous Melanoma: Incidence and Survival Patterns in the United States, 1986-2005. Archives of Dermatology. DOI: 10.1001/archdermatol.2009.323

AI validation studies:

  • Tschandl, P. et al. (2020). Human-computer collaboration for skin cancer recognition. Nature Medicine. DOI: 10.1038/s41591-020-0942-0
  • Esteva, A. et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature. DOI: 10.1038/nature21056

FDA-cleared devices:

  • DermaSensor FDA De Novo Summary (DEN230008): FDA CDRH Database
  • Merry SP et al. (2025). Primary Care Physician Use of Elastic Scattering Spectroscopy on Skin Lesions Suggestive of Skin Cancer. J Prim Care Community Health. DOI: 10.1177/21501319251344423
  • Ferris LK, Seiverling EV et al. (2025). DERM-SUCCESS FDA Pivotal Study: A Multi-Reader Multi-Case Evaluation. J Prim Care Community Health. DOI: 10.1177/21501319251342106
  • Nevisense FDA PMA Summary (P150046): FDA PMA Database
  • Malvehy J et al. (2014). Clinical performance of the Nevisense system in cutaneous melanoma detection. Br J Dermatol. DOI: 10.1111/bjd.13121

Professional society guidelines:

  • Kovarik C, Lee I, Ko J, et al. Commentary: Position statement on augmented intelligence (AuI). J Am Acad Dermatol. 2019;81(4):998-1000.

For deeper dives:

  • See Chapter 12 (Radiology) for comparison of imaging AI across specialties
  • See Chapter 19 (Clinical AI Safety) for failure mode analysis
  • See Chapter 21 (Medical Liability) for medicolegal considerations
  • See Chapter 22 (Algorithmic Bias and Health Equity) for broader equity framework
