Psychiatry and Behavioral Health
Psychiatric diagnosis lacks objective biomarkers. There is no blood test for depression, no scan that confirms schizophrenia, no ECG equivalent for anxiety disorders. Diagnosis relies on patient self-report and clinician judgment, both inherently variable. This makes psychiatry fundamentally resistant to the algorithmic approaches that work in radiology or pathology, where ground truth can be established from tissue or imaging. Suicide prediction algorithms deployed at major health systems achieve positive predictive values below 1% for suicide attempts, generating thousands of false positives that overwhelm clinicians, while "low risk" labels create dangerous false reassurance for patients who are never flagged.
After reading this chapter, you will be able to:
- Understand why suicide prediction algorithms have failed repeatedly and should not be deployed clinically
- Evaluate digital phenotyping approaches and their significant privacy and accuracy concerns
- Assess AI chatbot therapy applications (Woebot, Wysa) with appropriate skepticism
- Recognize the fundamental challenges of psychiatric diagnosis that resist algorithmic approaches
- Navigate the unique ethical considerations of AI in mental health care
- Identify the rare psychiatric AI applications that may provide clinical value
- Apply evidence-based frameworks for evaluating behavioral health AI
Part 1: Why Psychiatric AI Fails
The Fundamental Problem
Psychiatric diagnosis relies on:
- Patient self-report of symptoms (variable reliability)
- Clinician assessment of behavior and affect (subjective, variable inter-rater reliability)
- Absence of biomarkers (no blood test for depression, no scan for schizophrenia)
- Heterogeneous presentations (10 patients with depression may have 10 different symptom patterns)
This makes psychiatry fundamentally resistant to algorithmic approaches that work well in other medical domains.
The Suicide Prediction Algorithm Failures
Vanderbilt Suicide Attempt and Ideation Likelihood (VSAIL) Model:
Vanderbilt University Medical Center (VUMC) implemented a suicide risk prediction model in its Epic EHR; it remains one of the most rigorously studied deployments of clinical suicide risk prediction.
Prospective validation study (2019-2020): (Walsh et al., PMC, 2021)
- 115,905 predictions for 77,973 patients over 296 days
- Approximately 392 predictions per day
- Patient demographics: 54% men, 45% women, 78% White, 16% Black
Performance in the highest risk group:
| Outcome | Positive Predictive Value | Number Needed to Screen |
|---|---|---|
| Suicidal ideation | 3-4.3% | 23 |
| Suicide attempt | 0.3-0.4% | 271 |
What this means: For every 271 patients flagged as highest risk, only 1 returned for treatment for a suicide attempt. The other 270 were false positives.
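The number needed to screen follows directly from the PPV (NNS ≈ 1/PPV). The sketch below simply recomputes that arithmetic from the published VSAIL ranges; it is an illustrative check, not additional study data.

```python
# Illustrative arithmetic only: recomputing number needed to screen (NNS)
# from the published VSAIL positive predictive values (PPV). NNS ~= 1 / PPV.

def nns_from_ppv(ppv: float) -> float:
    """Number of flagged patients who must be assessed per true positive."""
    return 1.0 / ppv

published_ppv_ranges = {
    "suicidal ideation (PPV 3.0-4.3%)": (0.030, 0.043),
    "suicide attempt (PPV 0.3-0.4%)": (0.003, 0.004),
}

for outcome, (low, high) in published_ppv_ranges.items():
    # A lower PPV means more flagged patients assessed per true positive.
    print(f"{outcome}: NNS ~ {nns_from_ppv(high):.0f} to {nns_from_ppv(low):.0f}")

# suicide attempt: NNS ~ 250 to 333, consistent with the reported 271;
# on the order of 270 false positives for every flagged patient who attempts.
```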
Hybrid approach (2022): (Walsh et al., PMC, 2022)
Combining the VSAIL machine learning model with in-person Columbia Suicide Severity Rating Scale (C-SSRS) screening improved performance:
- PPV for suicide attempt: 1.3-1.4% (vs. 0.4% for VSAIL alone)
- Sensitivity for suicide attempt: 77.6-79.5%
Why even improved performance is inadequate:
- Base rate problem: Suicide is rare (even in high-risk populations, <1% attempt per year). Any screening test yields massive false positives
- Resource drain: 99% of high-risk flags require assessment but yield no intervention
- Alert fatigue: Clinicians stop responding to flags
- False reassurance: “Low risk” predictions give dangerous false confidence
The comparison often cited: Dr. Walsh noted that the number needed to screen (271 for suicide attempt) is “on par with numbers needed to screen for problems like abnormal cholesterol and certain cancers.” However, unlike cholesterol screening, we lack effective interventions for most flagged patients.
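Why does PPV stay stuck in the low single digits even when sensitivity looks respectable? A minimal sketch of the base-rate arithmetic makes the point. The sensitivity and specificity values below are hypothetical, chosen purely for illustration (they are not VSAIL's operating characteristics); only the sub-1% base rate echoes the text above.

```python
# Base-rate arithmetic: even a seemingly strong screener yields a low PPV
# when the outcome is rare. Sensitivity/specificity values are hypothetical.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: P(outcome | flagged positive)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

prevalence = 0.005  # assumed ~0.5% annual attempt rate in a high-risk population
for sens, spec in [(0.80, 0.80), (0.80, 0.95), (0.95, 0.95)]:
    print(f"sens={sens:.0%}, spec={spec:.0%} -> PPV={ppv(sens, spec, prevalence):.1%}")

# Roughly 2.0%, 7.4%, and 8.7%: a large false-positive burden in every case,
# no matter how the hypothetical operating point is tuned.
```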
Facebook/Meta Suicide Prevention AI (2017-2022):
The program was announced with fanfare in 2017 as “AI saving lives” and quietly discontinued in 2022 after internal evaluations showed minimal benefit. No data on suicides prevented was ever published; the program’s termination itself speaks to its failure.
The lesson: Suicide is inherently unpredictable. Even the best-performing algorithms cannot achieve clinically useful prediction. Algorithms that promise to identify who will attempt suicide create dangerous false confidence.
Part 2: Digital Phenotyping: Promise and Peril
The Concept
The idea is to passively monitor smartphone use (typing speed, app usage, GPS movement patterns, voice-call frequency, accelerometer data) and infer depression, anxiety, or mood changes without relying on patient self-report.
What the Research Shows
A systematic review of digital phenotyping for stress, anxiety, and mild depression found that smartphone sensors can identify behavioral patterns associated with mental health symptoms (JMIR mHealth, 2024).
Sensors used in studies (a minimal feature-extraction sketch follows this list):
- GPS (location, movement patterns)
- Accelerometer (physical activity)
- Bluetooth and Wi-Fi (social proximity)
- Ambient audio and light sensors
- Screen usage patterns
- Typing dynamics
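To make the sensor list concrete, here is a minimal, hypothetical sketch of the kind of feature extraction these studies describe. The feature names, toy inputs, and interpretation are invented for illustration; real research pipelines are multimodal and far more complex, and none has demonstrated clinical utility.

```python
# Hypothetical sketch of digital-phenotyping feature extraction.
# Feature definitions are illustrative only; no clinical validity is implied.
import math
from collections import Counter

def location_entropy(place_visits: list[str]) -> float:
    """Shannon entropy of time spent across visited places. Lower entropy
    (less varied movement) is one pattern studies have correlated with
    depressive symptoms at the group level."""
    counts = Counter(place_visits)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mean_daily_screen_hours(screen_on_minutes: list[int]) -> float:
    """Average daily screen time in hours from per-day screen-on minutes."""
    return sum(screen_on_minutes) / len(screen_on_minutes) / 60

# Toy inputs: one week of hourly place labels and per-day screen-on minutes.
places = ["home"] * 150 + ["work"] * 15 + ["gym"] * 3
screen = [410, 395, 520, 610, 480, 555, 600]

print(f"location entropy: {location_entropy(places):.2f} bits")
print(f"mean screen time: {mean_daily_screen_hours(screen):.1f} h/day")
# A research pipeline would feed dozens of such features, plus typing and
# audio-derived measures, into a model, and would still face the validation,
# consent, and equity problems discussed in this section.
```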
Accuracy claims:
- Some studies report depression detection accuracy of 86.5%
- Claims of predicting depressive episodes days before clinical presentation
- Location, physical activity, and social interaction data highly correlated with mental health
Why These Claims Require Skepticism
- Study populations: Most research involves nonclinical cohorts or self-identified depression via questionnaires, not formally diagnosed patients
- Single-modality limitations: Most studies focus on one data type; multimodal approaches needed for accuracy
- External validation: Performance in controlled research settings rarely translates to clinical practice
- Active vs. passive data: Many “passive sensing” studies still rely on self-report surveys for outcome measurement
The Problems
- Consent: Is continuous monitoring with periodic algorithm-generated diagnoses truly informed consent? Can consent be withdrawn without losing access to care?
- Privacy: Smartphone data reveals intimate details: where you go, who you talk to, what you search, when you sleep
- Accuracy in practice: Correlation between “reduced movement” and depression does not mean an algorithm can diagnose depression in individual patients
- Equity: Algorithms trained predominantly on white, Western populations may interpret cultural differences as pathology
- Data security: Who owns the data? What happens if it’s breached or sold?
- Coercion potential: Could employers, insurers, or courts access mental health inferences from passive data?
Current Status
Research-stage only. Multiple academic studies exist, but no validated clinical applications have been deployed. A 2024 review noted that digital phenotyping is the first approach to collect data from adolescent patients for longer than a year, demonstrating feasibility but not clinical utility (PLOS Digital Health, 2024).
Ethical Consensus
Most bioethicists and psychiatrists agree digital phenotyping for psychiatric diagnosis raises profound ethical concerns that remain unresolved. The gap between “can detect patterns” and “should be used clinically” is substantial.
Part 3: What Shows Limited Promise
Chatbot Therapy Applications
Woebot, Wysa, and similar apps provide CBT-based interventions via smartphone. These represent the most studied psychiatric AI applications.
Evidence from systematic reviews (2024):
A systematic review examining studies from 2017-2024 identified large improvements across three chatbots: Woebot (5 studies), Wysa (4 studies), and Youper (1 study) (PMC, 2024).
| Chatbot | Effect on Depression | Effect on Anxiety | Key Populations |
|---|---|---|---|
| Woebot | Significant reduction | Significant reduction | College students, adults |
| Wysa | Significant reduction | Significant reduction | Chronic pain, maternal mental health |
| Youper | 48% decrease | 43% decrease | General adult population |
Meta-analysis findings:
- A 2024 meta-analysis of 176 RCTs (>20,000 participants) found mental health apps produced small but statistically significant improvements: depression (g=0.28) and generalized anxiety (g=0.26) (Linardon et al., 2024); a worked example of what an effect of this size means in raw scores follows this list
- CBT-focused chatbots achieve 34-42% symptom reduction on PHQ-9
- Critical disparities in cross-cultural efficacy (18% performance gaps)
- Only 30% of studies extended beyond 6 months
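To put g = 0.26-0.28 in context, the sketch below works through the standardized mean difference arithmetic with hypothetical PHQ-9 group means and sample sizes. The numbers are invented for illustration; they are not data from the cited trials.

```python
# What a "small" standardized effect (g ~ 0.28) means in raw PHQ-9 points.
# Group means, SDs, and sample sizes below are hypothetical.
import math

def hedges_g(m1: float, m2: float, sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Hedges' g: pooled-SD standardized mean difference with small-sample correction."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction

# Hypothetical post-treatment PHQ-9 means: control 11.0 vs chatbot arm 9.4,
# common SD of 5.5, n = 150 per arm.
g = hedges_g(11.0, 9.4, 5.5, 5.5, 150, 150)
print(f"g = {g:.2f}")  # ~0.29, corresponding to about a 1.6-point PHQ-9 difference
# An effect of this size is real but modest: smaller than typical face-to-face
# psychotherapy effects and easily diluted by the attrition noted below.
```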
Landmark RCT evidence:
- Fitzpatrick et al. (2017): College students using Woebot for 2 weeks showed significantly greater depression score reduction than control group (self-help e-book) (F=6.47, p=.01)
- Chaudhry et al. (2024): Patients with chronic diseases using Wysa showed significant reductions in depression and anxiety vs. no-intervention controls (p≈.004)
Critical limitations:
- High attrition: Approximately 25% of participants dropped out prematurely in meta-analyses
- Comparison groups matter: Effect sizes smaller when compared to active controls vs. waitlist
- Long-term data lacking: Most studies <3 months duration
- Crisis handling: AI chatbots may have difficulty with complex emotional nuance or crisis situations
Regulatory status:
- Wysa received FDA Breakthrough Device Designation (2022) for anxiety, depression, and chronic pain support
- No chatbot has FDA clearance to diagnose, treat, or cure mental health disorders
Appropriate:
- Adjunct to traditional therapy (not replacement)
- Access expansion for underserved populations
- Psychoeducation and skill-building between sessions
- Mild-to-moderate symptoms in motivated patients
Not appropriate:
- Replacement for human therapist
- Severe mental illness (psychosis, severe depression)
- Active suicidality or self-harm
- Patients requiring medication management
- Crisis intervention
Research evaluating 29 AI chatbot agents responding to simulated suicidal crisis scenarios concluded that chatbots should be contraindicated for suicidal patients: their strong tendency to validate users can accentuate self-destructive ideation and turn impulses into action (Nature Scientific Reports, 2025).
Lawsuits filed in 2024 allege that AI chatbots encouraged minors’ suicides and caused mental health trauma, raising liability concerns for platforms that deploy these tools without adequate safeguards.
NLP for Clinical Documentation
Natural language processing can:
- Extract symptoms from clinical notes
- Identify patients not receiving guideline-concordant care
- Support quality improvement
- Auto-complete portions of psychiatric evaluations
This is documentation support, not diagnostic AI. Applications remain research-stage with limited clinical deployment.
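As a toy illustration of the documentation-support idea (not a production method), a simple rule-based pass over a note can surface symptom mentions for clinician review; real systems use far more sophisticated NLP, and even those remain research-stage. Everything in the sketch is invented for illustration.

```python
# Toy illustration of rule-based symptom extraction from a clinical note.
# Real documentation-support NLP is far more sophisticated; this only
# demonstrates the concept of surfacing mentions for clinician review.
import re

SYMPTOM_PATTERNS = {
    "depressed mood": r"\b(depress\w*|low mood|feeling down)\b",
    "anhedonia": r"\b(anhedonia|loss of interest|no longer enjoys?)\b",
    "insomnia": r"\b(insomnia|trouble sleeping|early waking)\b",
    "suicidal ideation": r"\b(suicidal ideation|thoughts of suicide|SI)\b",
}

def extract_symptoms(note: str) -> list[str]:
    """Return symptom labels whose patterns appear in the note (case-insensitive)."""
    return [label for label, pattern in SYMPTOM_PATTERNS.items()
            if re.search(pattern, note, flags=re.IGNORECASE)]

note = ("Patient reports feeling down for 6 weeks with trouble sleeping and "
        "loss of interest in hobbies. Denies thoughts of suicide.")
print(extract_symptoms(note))
# ['depressed mood', 'anhedonia', 'insomnia', 'suicidal ideation']
# Note the last hit: naive matching cannot handle negation ("denies ..."),
# one reason these tools support documentation rather than make diagnoses.
```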
Treatment Response Prediction
Emerging research attempts to predict:
- Which patients will respond to specific antidepressants
- Optimal medication selection based on clinical and genetic factors
- Likelihood of treatment-resistant depression
Current status: Research phase only. No validated clinical applications. The heterogeneity of psychiatric disorders makes prediction challenging.
Part 4: Ethical Considerations
Unique Challenges in Psychiatric AI
- Vulnerable populations: Patients with mental illness may have impaired capacity for informed consent
- Stigma: AI-generated psychiatric labels may follow patients permanently
- Coercion risk: Predictive algorithms could be used for involuntary commitment
- Privacy: Mental health information is especially sensitive
- Equity: Psychiatric presentations vary by culture, language, and socioeconomic status
What Clinicians Should Do
- Do not rely on suicide risk algorithms: No algorithm should replace clinical assessment
- Maintain skepticism about psychiatric AI claims: Demand RCT evidence showing improved patient outcomes
- Prioritize the therapeutic relationship: AI cannot replace human connection in mental health care
- Advocate for appropriate use: Support research while opposing premature clinical deployment
Professional Society Guidelines on Psychiatric AI
American Psychiatric Association (APA)
The APA Board of Trustees approved a Position Statement on the Role of Artificial Intelligence in Psychiatry in March 2024. The statement acknowledges both opportunities and significant risks.
Opportunities identified:
- Clinical documentation assistance
- Care plan suggestions and lifestyle modifications
- Identification of potential diagnoses and risks from medical records
- Automation of billing and prior authorization
- Detection of potential medical errors or systemic quality issues
Risks and concerns:
- Unacceptable risks of biased or substandard care
- Violations of privacy and informed consent
- Lack of oversight and accountability for AI-driven clinical decisions
Key guidance for physicians:
- Approach AI technologies with caution, particularly regarding potential biases or inaccuracies
- Ensure HIPAA compliance in all uses of AI
- Take an active role in oversight of AI-driven clinical decision support
- View AI as a tool intended to augment, not replace, clinical decision-making
- Remain skeptical of AI output in clinical practice
- Recognize that physicians are ultimately responsible for clinical outcomes, even when guided by AI
For current guidance: APA Position Statement on AI
The APA explicitly states that AI is “a tool, not a therapy” and that psychiatrists must proactively help shape the future of AI in psychiatric practice, or “AI may end up shaping psychiatric practice instead.”
Critical note: No AI system has been endorsed by the APA for autonomous psychiatric diagnosis or suicide prediction.
American Medical Association (AMA)
The AMA’s Principles for Augmented Intelligence in Health Care apply to all medical specialties, including psychiatry. The AMA prefers the term “augmented intelligence” to emphasize that these tools should assist, not replace, physician judgment.
- Augmentation over automation: AI should enhance physician decision-making, not replace it
- Transparency: Development, validation, and deployment processes must be transparent
- Physician authority: Physicians must maintain authority over AI recommendations
- Privacy protection: Patient data must be protected throughout AI development and use
- Bias mitigation: Algorithmic bias must be identified and addressed
- Ongoing monitoring: Continuous performance evaluation required after deployment
For current guidance: AMA AI Principles
American Academy of Child and Adolescent Psychiatry (AACAP)
AACAP has engaged with AI through educational programming at annual meetings and research published in the Journal of the American Academy of Child and Adolescent Psychiatry (JAACAP). While no formal position statement on AI exists as of 2024, AACAP’s approach emphasizes:
Pediatric-specific concerns:
- Consent complexity: Minors require parental consent, but adolescents may resist disclosure
- Developmental considerations: AI may misinterpret normal adolescent behavior as pathological
- School-based screening: Profound privacy concerns when AI is used in educational settings
- Evidence requirements: Higher bar for interventions in developing brains
A 2024 study in JAACAP compared ChatGPT versions on suicide risk assessment in youth and found that AI estimated higher risk than psychiatrists, particularly in severe cases, which could lead to inappropriate treatment recommendations (JAACAP, 2024).
FDA Regulatory Status
Current FDA-cleared psychiatric AI:
| Device | Indication | Status |
|---|---|---|
| Rejoyn (Otsuka/Click) | Major depressive disorder (adjunct CBT) | FDA-cleared 2024 |
| reSET (Pear Therapeutics) | Substance use disorder | Discontinued 2023 (bankruptcy) |
| Somryst (Pear Therapeutics) | Chronic insomnia | Discontinued 2023 (bankruptcy) |
Critical findings:
- No FDA-cleared suicide prediction algorithms exist
- No autonomous diagnostic AI cleared for psychiatry
- All cleared devices are prescription digital therapeutics for adjunctive use, not replacements for clinical care
- The Pear Therapeutics bankruptcy (2023) demonstrates the fragility of the digital therapeutics market
The FDA pathway for clearing mental health chatbots is optional, rarely used, and slow. The most commonly used LLM chatbots (ChatGPT, Claude, etc.) have not been tested for safety, efficacy, or confidentiality in psychiatric applications, and they are not subject to FDA premarket review, quality system regulations, or postmarket surveillance requirements.
Professional Consensus on Suicide Prediction
No major professional society endorses the clinical deployment of suicide prediction algorithms. The consensus reflects:
- Base rate problem: Suicide is too rare for accurate prediction
- Unacceptable false positive rates: Even the best algorithms have PPV <5%
- Dangerous false reassurance: “Low risk” predictions discourage proper clinical assessment
- Ethical concerns: Predictive algorithms could be misused for involuntary commitment
The APA’s silence on suicide prediction algorithms is itself a position: these tools have not demonstrated sufficient evidence to warrant endorsement.
Check Your Understanding
Scenario 1: The Suicide Risk Algorithm
Clinical situation: Your hospital deployed a suicide risk prediction algorithm. You’re seeing a 32-year-old woman in primary care for diabetes follow-up. She mentions feeling “a bit down lately.” The algorithm has flagged her as “LOW RISK” (10th percentile).
Question: Do you skip detailed depression and suicidality screening because the algorithm says low risk?
Answer: Absolutely not. The algorithm is irrelevant to your clinical assessment.
Reasoning:
- Base rate problem: Even “high risk” predictions have PPV <5%
- “Low risk” is dangerous false reassurance: Most suicides occur in people predicted to be low risk
- Algorithms can’t detect acute stressors: EHR data doesn’t capture what happened today
- Clinical presentation matters: Patient saying “I’m feeling down” requires exploration
What you should do:
- Ignore the algorithm completely
- Perform standard depression screening: PHQ-9, direct questions about suicidal ideation
- Don’t document reliance on the algorithm: “I didn’t ask about suicide because the algorithm said low risk” is medicolegally indefensible
- Advocate for removing the algorithm: Suicide risk algorithms create false confidence
Bottom line: Suicide risk algorithms are worse than useless. They’re dangerous. They create false reassurance that discourages proper clinical assessment.
Scenario 2: The Therapy Chatbot Recommendation
Clinical situation: A 24-year-old graduate student with mild-to-moderate depression asks about using Woebot or Wysa instead of traditional therapy. She has limited time, limited funds, and is on a 3-month waitlist for a therapist.
Question: Is it appropriate to recommend a therapy chatbot?
Answer: Yes, with significant caveats.
When chatbot therapy may be appropriate:
- Mild-to-moderate symptoms (PHQ-9 5-14)
- No active suicidal ideation
- Motivated, engaged patient
- As bridge to traditional therapy, not permanent replacement
- Patient understands limitations
What to tell the patient:
- “These apps can teach CBT skills and provide support while you wait for a therapist”
- “Evidence shows modest benefits for depression and anxiety, but effects are smaller than traditional therapy”
- “If you feel worse, have thoughts of self-harm, or are in crisis, the app cannot help. Here’s what to do instead…” (provide crisis resources)
- “This is a bridge, not a destination. Continue pursuing traditional therapy”
What to document:
- Discussion of chatbot as adjunct/bridge
- Assessment of suicidality (negative)
- Crisis plan provided
- Plan to continue pursuing traditional therapy
Red flags that contraindicate chatbot therapy (encoded in the triage sketch after this list):
- Active suicidal ideation
- Severe depression (PHQ-9 >19)
- Psychotic symptoms
- Substance use requiring treatment
- History of self-harm
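The appropriateness criteria in this scenario can be summarized as a simple triage check. The sketch below encodes the PHQ-9 thresholds and red flags named above; it is an illustration of the reasoning, not a validated or deployable decision tool, and the field names are invented for the example.

```python
# Illustrative triage check encoding this scenario's criteria. It is NOT a
# validated clinical decision tool; it only restates the chapter's logic.
from dataclasses import dataclass

@dataclass
class Assessment:
    phq9: int                       # PHQ-9 total score (0-27)
    active_si: bool                 # active suicidal ideation
    psychotic_symptoms: bool
    substance_use_needing_tx: bool
    self_harm_history: bool
    motivated: bool                 # motivated, engaged patient

def chatbot_bridge_reasonable(a: Assessment) -> tuple[bool, str]:
    if (a.active_si or a.psychotic_symptoms
            or a.substance_use_needing_tx or a.self_harm_history):
        return False, "red flag present: needs clinician-led care, not an app"
    if a.phq9 > 19:
        return False, "severe depression (PHQ-9 >19): chatbot not appropriate"
    if 5 <= a.phq9 <= 14 and a.motivated:
        return True, "mild-to-moderate symptoms: reasonable as a bridge to therapy"
    return False, "outside studied range or low engagement: discuss other options"

print(chatbot_bridge_reasonable(Assessment(
    phq9=12, active_si=False, psychotic_symptoms=False,
    substance_use_needing_tx=False, self_harm_history=False, motivated=True)))
# (True, 'mild-to-moderate symptoms: reasonable as a bridge to therapy')
```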
Scenario 3: The Digital Phenotyping Research Study
Clinical situation: A research coordinator approaches you about enrolling your patients in a study that monitors smartphone use to predict depressive episodes. The study promises to alert clinicians when patients show “digital biomarkers of depression.”
Question: What concerns should you raise before enrolling patients?
Answer: Multiple ethical and practical concerns require clarification.
Questions to ask the research team:
- Informed consent: How is continuous monitoring explained to patients? Can they withdraw consent without penalty?
- Data ownership: Who owns the collected data? Can it be sold or shared?
- Privacy: What if the data is breached? What if it reveals sensitive information (location of therapy visits, AA meetings)?
- Clinical utility: What is the evidence that detecting “digital biomarkers” improves outcomes?
- Alert response: If I receive an alert, what am I obligated to do? What is my liability if I don’t respond?
- Bias testing: Has the algorithm been validated in diverse populations?
- Patient burden: Will patients feel surveilled? Will this affect their relationship with their smartphone?
Concerns about implementation:
- Alerts without validated interventions create burden without benefit
- Continuous monitoring may worsen anxiety in some patients
- Research ≠ clinical utility: Patterns detected in studies may not generalize
- Alert fatigue if algorithm has high false positive rate
Your response: “I need to review the protocol, consent forms, and evidence for clinical utility before enrolling any patients. My primary obligation is to my patients, not to research recruitment.”
Scenario 4: The AI-Induced Psychosis Case
Clinical situation: A 19-year-old college student presents with new-onset paranoid ideation. During the interview, he describes spending 6-8 hours daily conversing with an AI chatbot (character.ai style), which he believes has developed consciousness and is communicating with him through “hidden messages.” He stopped attending classes, believing the AI is teaching him things his professors cannot.
Question: How do you assess the role of AI in this presentation?
Answer: AI interaction may have contributed to, but likely did not cause, the psychotic symptoms.
Assessment approach:
- Standard psychiatric evaluation: Rule out organic causes (substance use, medical conditions), assess for primary psychotic disorder
- Technology history: Duration, intensity, content of AI interactions; isolation from in-person relationships
- Pre-existing vulnerabilities: Family history of psychosis, prodromal symptoms before AI use
- Reality testing: Does patient recognize AI is not conscious? Can he distinguish AI output from personal beliefs?
What the literature suggests:
- So-called “AI-induced psychosis” has been described as an emerging clinical phenomenon, particularly in vulnerable individuals
- Intensive AI interaction may reinforce delusional thinking through validation
- Isolation and parasocial relationships with AI may worsen prodromal symptoms
- AI chatbots are not designed to recognize or respond appropriately to psychotic symptoms
Management:
- Standard treatment for first-episode psychosis
- Technology boundaries as part of treatment plan
- Family education about AI interaction patterns
- Do not blame AI for the illness, but address its role in symptom maintenance
Key insight: AI chatbots do not cause psychosis, but may reinforce delusional thinking in vulnerable individuals. Assess technology use as part of comprehensive psychiatric evaluation.
Key Takeaways
Suicide prediction algorithms should not be used clinically. No algorithm achieves useful PPV. “Low risk” predictions are dangerous.
Chatbot therapy has modest evidence as an adjunct. Effect sizes 0.26-0.28 for depression and anxiety. Not for severe illness or suicidality.
Digital phenotyping remains research-only. Privacy concerns unresolved, clinical utility unproven.
No FDA-cleared autonomous psychiatric AI exists. All cleared devices are adjunctive.
Professional societies require AI to augment, not replace, clinical judgment. The APA and AMA are explicit on this point.
AI cannot replace the therapeutic relationship. Human connection remains essential to psychiatric care.
When in doubt, perform your own clinical assessment. Never rely on AI predictions for psychiatric decisions.