Psychiatry and Behavioral Health
Psychiatric diagnosis lacks objective biomarkers. There is no blood test for depression, no scan that confirms schizophrenia, no ECG equivalent for anxiety disorders. Diagnosis relies on patient self-report and clinician judgment, both inherently variable. This makes psychiatry fundamentally resistant to the algorithmic approaches that work in radiology or pathology, where ground truth can be established from tissue or imaging. Suicide prediction algorithms deployed at major health systems achieve positive predictive values below 1%, generating thousands of false positives that overwhelm clinicians, while “low risk” labels provide dangerous false reassurance about the patients who are not flagged.
After reading this chapter, you will be able to:
- Understand why suicide prediction algorithms have failed repeatedly and should not be deployed clinically
- Evaluate digital phenotyping approaches and their significant privacy and accuracy concerns
- Assess AI chatbot therapy applications (Woebot, Wysa) with appropriate skepticism
- Recognize the fundamental challenges of psychiatric diagnosis that resist algorithmic approaches
- Navigate the unique ethical considerations of AI in mental health care
- Identify the rare psychiatric AI applications that may provide clinical value
- Apply evidence-based frameworks for evaluating behavioral health AI
Part 1: Why Psychiatric AI Fails
The Fundamental Problem
Psychiatric diagnosis relies on:
- Patient self-report of symptoms (variable reliability)
- Clinician assessment of behavior and affect (subjective, variable inter-rater reliability)
- Absence of biomarkers (no blood test for depression, no scan for schizophrenia)
- Heterogeneous presentations (10 patients with depression may have 10 different symptom patterns)
This makes psychiatry fundamentally resistant to algorithmic approaches that work well in other medical domains.
The Suicide Prediction Algorithm Failures
Vanderbilt Suicide Attempt and Ideation Likelihood (VSAIL) Model:
Vanderbilt University Medical Center (VUMC) implemented a suicide risk prediction model in its Epic EHR; it is one of the most rigorously studied implementations of suicide risk AI.
Prospective validation study (2019-2020): (Walsh et al., JAMA Network Open, 2021)
- 115,905 predictions for 77,973 patients over 296 days
- Approximately 392 predictions per day
- Patient demographics: 54% men, 45% women, 78% White, 16% Black
Performance in the highest risk group:
| Outcome | Positive Predictive Value | Number Needed to Screen |
|---|---|---|
| Suicidal ideation | 3-4.3% | 23 |
| Suicide attempt | 0.3-0.4% | 271 |
What this means: For every 271 patients flagged as highest risk, only 1 subsequently returned for treatment of a suicide attempt. The other 270 flags were false positives.
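The number needed to screen column follows directly from the positive predictive value column. A minimal arithmetic check in Python, using the table's figures (the 0.37% midpoint for attempts is an assumption for illustration):

```python
# Number needed to screen (NNS) is the reciprocal of the positive
# predictive value (PPV): on average, 1/PPV patients must be flagged
# to find one true positive.

ppv_ideation = 0.043   # 4.3% PPV for suicidal ideation (upper bound above)
ppv_attempt = 0.0037   # midpoint of the reported 0.3-0.4% PPV for attempt

print(round(1 / ppv_ideation))  # 23  -> matches the NNS of 23 in the table
print(round(1 / ppv_attempt))   # 270 -> roughly the NNS of 271 in the table
```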
Hybrid approach (2022): (Wilimitis, Walsh et al., JAMA Network Open, 2022)
Combining the VSAIL machine learning model with in-person Columbia Suicide Severity Rating Scale (C-SSRS) screening improved performance:
- PPV for suicide attempt: 1.3-1.4% (vs. 0.4% for VSAIL alone)
- Sensitivity for suicide attempt: 77.6-79.5%
Why even improved performance is inadequate:
- Base rate problem: Suicide is rare (even in high-risk populations, fewer than 1% attempt per year), so any screening test yields massive false positives (see the sketch below)
- Resource drain: 99% of high-risk flags require assessment but yield no intervention
- Alert fatigue: Clinicians stop responding to flags
- False reassurance: “Low risk” predictions give dangerous false confidence
The comparison often cited: Dr. Walsh noted that the number needed to screen (271 for suicide attempt) is “on par with numbers needed to screen for problems like abnormal cholesterol and certain cancers.” However, unlike cholesterol screening, we lack effective interventions for most flagged patients.
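The base rate problem can be made concrete with a short sketch. The sensitivity and specificity below are hypothetical (deliberately better than any published suicide risk model), not the VSAIL parameters; even so, PPV collapses as the outcome becomes rare:

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# A hypothetical screener that is 90% sensitive and 90% specific:
for prevalence in (0.10, 0.01, 0.001):
    print(f"prevalence {prevalence:.1%} -> PPV {ppv(0.90, 0.90, prevalence):.1%}")
# prevalence 10.0% -> PPV 50.0%
# prevalence 1.0% -> PPV 8.3%
# prevalence 0.1% -> PPV 0.9%
```

At realistic model performance and an annual attempt rate well under 1%, PPVs in the 0.3-4% range reported above are exactly what this arithmetic predicts.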
Facebook/Meta Suicide Prevention AI (2017-2022):
The program was announced with fanfare in 2017 as “AI saving lives” and quietly discontinued in 2022 after internal evaluations showed minimal benefit. No data on suicides prevented were ever published; the program’s termination itself speaks to its failure.
The lesson: Suicide is inherently unpredictable. Even the best-performing algorithms cannot achieve clinically useful prediction. Algorithms that promise to identify who will attempt suicide create dangerous false confidence.
International Consensus Against Suicide Risk Scales
The failure of suicide prediction algorithms is now reflected in official clinical guidelines. The VA/DoD Clinical Practice Guideline for Assessment and Management of Patients at Risk for Suicide (2024) explicitly notes that “current algorithms will be correct only about 1% of the time” among those classified as at risk. The guideline finds insufficient evidence to recommend for or against suicide risk screening programs.
International bodies have reached similar conclusions. Guidelines from the UK (NICE NG225), Australia, and New Zealand explicitly advise against using risk assessment scales for prediction and treatment allocation (Knipe et al., Lancet, 2022). NICE states directly: “Do not use risk assessment tools and scales to predict future suicide or repetition of self-harm.”
No suicide prediction models have been tested in clinical contexts to evaluate effects on actual suicide prevention. The fundamental challenge is that even hypothetically perfect categorical prediction cannot determine when someone might harm themselves.
Part 2: Digital Phenotyping: Promise and Peril
The Concept
Digital phenotyping passively monitors smartphone use (typing speed, app usage, GPS movement patterns, voice call frequency, accelerometer data) to detect depression, anxiety, or mood changes without patient self-report.
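To make the concept concrete, here is a minimal sketch of the kind of daily behavioral features such studies derive from raw sensor streams. The field names, data shapes, and feature choices are illustrative assumptions, not any published pipeline:

```python
from dataclasses import dataclass
from math import log2

@dataclass
class DayOfSensorData:
    # Hypothetical daily aggregates a phone might log
    minutes_at_location: dict[str, int]   # e.g. {"home": 900, "campus": 300}
    screen_unlocks: int
    outgoing_calls: int
    step_count: int

def location_entropy(minutes_at_location: dict[str, int]) -> float:
    """Shannon entropy of time spent across places; lower values mean time
    concentrated in fewer locations, a commonly studied proxy for withdrawal."""
    total = sum(minutes_at_location.values())
    probabilities = [m / total for m in minutes_at_location.values() if m > 0]
    return -sum(p * log2(p) for p in probabilities)

def daily_features(day: DayOfSensorData) -> dict[str, float]:
    """Reduce raw aggregates to the kind of behavioral features that
    digital phenotyping studies correlate with symptom scores."""
    total_minutes = sum(day.minutes_at_location.values())
    return {
        "location_entropy": location_entropy(day.minutes_at_location),
        "home_stay_fraction": day.minutes_at_location.get("home", 0) / total_minutes,
        "screen_unlocks": float(day.screen_unlocks),
        "outgoing_calls": float(day.outgoing_calls),
        "step_count": float(day.step_count),
    }
```

Correlating features like these with questionnaire scores across a research cohort is the straightforward part; demonstrating that the resulting models generalize to diagnosed patients and improve outcomes is the unsolved step, as the limitations below make clear.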
What the Research Shows
A systematic review of digital phenotyping for stress, anxiety, and mild depression found that smartphone sensors can identify behavioral patterns associated with mental health symptoms (JMIR mHealth, 2024).
Sensors used in studies:
- GPS (location, movement patterns)
- Accelerometer (physical activity)
- Bluetooth and Wi-Fi (social proximity)
- Ambient audio and light sensors
- Screen usage patterns
- Typing dynamics
Accuracy claims:
- Some studies report depression detection accuracy of 86.5%
- Claims of predicting depressive episodes days before clinical presentation
- Location, physical activity, and social interaction data highly correlated with mental health
Why These Claims Require Skepticism
- Study populations: Most research involves nonclinical cohorts or self-identified depression via questionnaires, not formally diagnosed patients
- Single-modality limitations: Most studies focus on one data type; multimodal approaches are needed for accuracy
- External validation: Performance in controlled research settings rarely translates to clinical practice
- Active vs. passive data: Many “passive sensing” studies still rely on self-report surveys for outcome measurement
The Problems
- Consent: Is continuous monitoring with periodic algorithm-generated diagnoses truly informed consent? Can consent be withdrawn without losing access to care?
- Privacy: Smartphone data reveals intimate details: where you go, whom you talk to, what you search for, and when you sleep
- Accuracy in practice: Correlation between “reduced movement” and depression does not mean an algorithm can diagnose depression in individual patients
- Equity: Algorithms trained predominantly on white, Western populations may interpret cultural differences as pathology
- Data security: Who owns the data? What happens if it’s breached or sold?
- Coercion potential: Could employers, insurers, or courts access mental health inferences from passive data?
Current Status
Research-stage only. Multiple academic studies exist, but no validated clinical application has been deployed. A 2025 study noted that digital phenotyping was the first approach to collect data from adolescent patients for longer than a year, demonstrating feasibility but not clinical utility (Hafeman et al., PLOS Digital Health, 2025).
Ethical Consensus
Most bioethicists and psychiatrists agree digital phenotyping for psychiatric diagnosis raises profound ethical concerns that remain unresolved. The gap between “can detect patterns” and “should be used clinically” is substantial.
Part 3: What Shows Limited Promise
Chatbot Therapy Applications
The idea of computer-delivered psychotherapy dates to 1966, when Stanford psychiatrist Kenneth Colby proposed that timesharing computers could scale therapeutic interventions beyond what human therapists could provide (Colby et al., 1966). MIT’s Joseph Weizenbaum rejected this vision as “immoral” in his 1976 book Computer Power and Human Reason, sparking a debate that remains unresolved. See History of AI in Medicine for this foundational controversy.
The evolution of AI-based mental health interventions accelerated in the 2010s, with early work establishing foundational principles for digital psychiatric tools. A 2020 review characterized this development trajectory and identified core design considerations that continue to shape the field (D’Alfonso, 2020).
Woebot, Wysa, and similar apps provide CBT-based interventions via smartphone. These represent the most studied psychiatric AI applications.
Evidence from systematic reviews (2024):
A systematic review examining studies from 2017-2024 identified large improvements across three chatbots: Woebot (5 studies), Wysa (4 studies), and Youper (1 study) (PMC, 2024).
| Chatbot | Effect on Depression | Effect on Anxiety | Key Populations |
|---|---|---|---|
| Woebot | Significant reduction | Significant reduction | College students, adults |
| Wysa | Significant reduction | Significant reduction | Chronic pain, maternal mental health |
| Youper | 48% decrease | 43% decrease | General adult population |
Meta-analysis findings:
- A 2024 meta-analysis of 176 RCTs (>20,000 participants) found mental health apps produced small but statistically significant improvements in depression (g=0.28) and generalized anxiety (g=0.26) (Linardon et al., World Psychiatry, 2024); a brief note on interpreting g follows this list
- CBT-focused chatbots achieve 34-42% symptom reduction on PHQ-9
- Critical disparities in cross-cultural efficacy (18% performance gaps)
- Only 30% of studies extended beyond 6 months
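For readers less familiar with effect size notation: Hedges' g is a standardized mean difference (Cohen's d with a small-sample correction), where values near 0.2 are conventionally called small, 0.5 medium, and 0.8 large. A minimal sketch with made-up PHQ-9 improvement scores:

```python
from math import sqrt

def hedges_g(mean_tx, mean_ctrl, sd_tx, sd_ctrl, n_tx, n_ctrl):
    """Standardized mean difference with Hedges' small-sample correction."""
    sd_pooled = sqrt(((n_tx - 1) * sd_tx**2 + (n_ctrl - 1) * sd_ctrl**2)
                     / (n_tx + n_ctrl - 2))
    d = (mean_tx - mean_ctrl) / sd_pooled
    return d * (1 - 3 / (4 * (n_tx + n_ctrl) - 9))

# Hypothetical trial: app group improves 4.5 PHQ-9 points, controls 3.0,
# with a score standard deviation of about 5.5 in both groups.
print(round(hedges_g(4.5, 3.0, 5.5, 5.5, n_tx=100, n_ctrl=100), 2))  # 0.27
```

In other words, an effect of g≈0.28 corresponds to roughly a 1.5-point larger PHQ-9 improvement than control when the score's standard deviation is about 5.5 points: statistically real, clinically modest.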
Landmark RCT evidence:
- Fitzpatrick et al. (2017): College students using Woebot for 2 weeks showed significantly greater depression score reduction than the control group (a self-help e-book) (F=6.47, p=.01)
- Chaudhry et al. (2024): Patients with chronic diseases using Wysa showed significant reductions in depression and anxiety vs. no-intervention controls (p≈.004)
Critical limitations:
- High attrition: Approximately 25% of participants dropped out prematurely in meta-analyses
- Comparison groups matter: Effect sizes smaller when compared to active controls vs. waitlist
- Long-term data lacking: Most studies <3 months duration
- Crisis handling: AI chatbots may have difficulty with complex emotional nuance or crisis situations
- Evidence maturity gap: A 2025 World Psychiatry systematic review found that only 16% of LLM-based chatbot studies underwent clinical efficacy testing, with 77% still in early validation phases. Only 47% of all chatbot studies focused on clinical efficacy, exposing critical gaps in therapeutic benefit validation (Hua et al., World Psychiatry, 2025). A parallel 2025 systematic review of conversational agents for mental disorders reached similar conclusions, finding significant heterogeneity in study designs and insufficient evidence for clinical adoption (Cruz-Gonzalez et al., 2025)
Why users turn to AI in crisis: A December 2025 study surveying 53 individuals with lived crisis experience found people use AI chatbots to fill “in-between spaces” of human support, turning to AI when professional help is unavailable (waitlists, after-hours) or when they fear burdening family and friends (Ajmani et al., 2025, preprint). Interviews with 16 mental health experts in the same study emphasized that human connection remains essential during crisis. The authors propose designing AI as a “bridge towards human-human connection rather than an end in itself,” increasing user preparedness for positive action while de-escalating negative intent.
Regulatory status:
- Wysa received FDA Breakthrough Device Designation (2022) for anxiety, depression, and chronic pain support
- No chatbot has FDA clearance to diagnose, treat, or cure mental health disorders
Appropriate:
- Adjunct to traditional therapy (not replacement)
- Access expansion for underserved populations
- Psychoeducation and skill-building between sessions
- Mild-to-moderate symptoms in motivated patients
Not appropriate:
- Replacement for human therapist
- Severe mental illness (psychosis, severe depression)
- Active suicidality or self-harm
- Patients requiring medication management
- Crisis intervention
Research evaluating 29 AI chatbot agents responding to simulated suicidal crisis scenarios found that chatbots should be contraindicated for suicidal patients: their strong tendency to validate can accentuate self-destructive ideation and turn impulses into action (Nature Scientific Reports, 2025). A Stanford/CMU study testing LLM responses to mental health symptoms found models responded appropriately only 45% of the time to delusions and 80% to suicidal ideation, while also exhibiting stigma toward patients with schizophrenia and alcohol dependence (Moore et al., 2025, preprint).
Scale of concern: A nationally representative survey found 13% of U.S. adolescents and young adults (approximately 5.4 million individuals) have used generative AI for mental health advice, with most users finding it helpful despite lack of clinical validation (Cantor et al., JAMA Network Open, 2025). OpenAI reports that approximately 560,000 ChatGPT users weekly display signs of psychosis or mania (0.07% of 800+ million users), with an additional 0.15% expressing risk of self-harm or suicide (OpenAI, October 2025). A Danish electronic health records study found 38 psychiatric patients whose chatbot use worsened or consolidated their delusions (Olsen et al., 2025, preprint).
Novel lawsuits in 2024-2025 have alleged AI chatbots encouraged minors’ suicides, including the case of 14-year-old Sewell Setzer III, who died by suicide in February 2024 after extensive interaction with a Character.ai chatbot (NBC News, October 2024). These cases raise liability concerns for platforms deploying these tools without adequate safeguards.
Platform safeguards vary significantly. Some providers have implemented crisis detection systems. Anthropic’s Claude uses a classifier to detect suicide/self-harm conversations and displays a banner directing users to ThroughLine’s verified global network of 170+ crisis helplines, including the 988 Lifeline (US/Canada), Samaritans (UK), and Life Link (Japan) (Anthropic, December 2025). OpenAI has implemented similar interventions on ChatGPT. However, these safeguards are company-designed without independent validation of effectiveness, and their presence does not change the fundamental contraindication: chatbots remain inappropriate for crisis intervention compared to trained human responders.
NLP for Clinical Documentation
Natural language processing can:
- Extract symptoms from clinical notes
- Identify patients not receiving guideline-concordant care
- Support quality improvement
- Auto-complete portions of psychiatric evaluations
This is documentation support, not diagnostic AI. Applications remain research-stage with limited clinical deployment.
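As a toy illustration of the simplest version of "extract symptoms from clinical notes" (real clinical NLP uses trained models, full negation and uncertainty handling, and ontology normalization, none of which is shown here; the lexicon and note text are invented):

```python
import re

# Invented keyword lexicon mapping surface forms to symptom labels.
SYMPTOM_PATTERNS = {
    "insomnia": r"\b(insomnia|trouble sleeping|can'?t sleep)\b",
    "anhedonia": r"\b(anhedonia|loss of interest|no longer enjoys?)\b",
    "low_mood": r"\b(depressed mood|feeling down|low mood)\b",
}

# Naive negation cue: "denies", "no", or "without" within the same
# sentence fragment (no intervening period), up to 40 characters back.
NEGATION_PREFIX = r"\b(denies|no|without)\b[^.]{0,40}"

def extract_symptoms(note: str) -> dict[str, bool]:
    """Return {symptom: True if asserted, False if negated} for a note."""
    text = note.lower()
    findings: dict[str, bool] = {}
    for label, pattern in SYMPTOM_PATTERNS.items():
        if re.search(pattern, text):
            negated = re.search(NEGATION_PREFIX + pattern, text) is not None
            findings[label] = not negated
    return findings

print(extract_symptoms(
    "Patient reports trouble sleeping and loss of interest; denies depressed mood."
))
# {'insomnia': True, 'anhedonia': True, 'low_mood': False}
```

Even this trivial example shows why negation, temporality, and attribution (patient versus family history) make clinical NLP harder than keyword matching, and why these applications remain documentation support rather than diagnostic tools.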
Treatment Response Prediction
Emerging research attempts to predict:
- Which patients will respond to specific antidepressants
- Optimal medication selection based on clinical and genetic factors
- Likelihood of treatment-resistant depression
Current status: Research phase only. No validated clinical applications. The heterogeneity of psychiatric disorders makes prediction challenging.
Part 4: Ethical Considerations
Unique Challenges in Psychiatric AI
- Vulnerable populations: Patients with mental illness may have impaired capacity for informed consent
- Stigma: AI-generated psychiatric labels may follow patients permanently
- Coercion risk: Predictive algorithms could be used for involuntary commitment
- Privacy: Mental health information is especially sensitive
- Equity: Psychiatric presentations vary by culture, language, and socioeconomic status
What Clinicians Should Do
- Do not rely on suicide risk algorithms: No algorithm should replace clinical assessment
- Maintain skepticism about psychiatric AI claims: Demand RCT evidence showing improved patient outcomes
- Prioritize the therapeutic relationship: AI cannot replace human connection in mental health care
- Advocate for appropriate use: Support research while opposing premature clinical deployment
A 2025 comprehensive review in Current Psychiatry Reports concluded that current evidence only supports AI as a complement to clinical expertise, not a replacement (Jalali et al., Curr Psychiatry Rep, 2025). Similarly, the evolving field of digital mental health requires careful navigation between smartphone apps, generative AI, and virtual reality applications, each with distinct evidence bases and implementation challenges (Torous et al., World Psychiatry, 2025).
Professional Society Guidelines on Psychiatric AI
American Psychiatric Association (APA)
The APA Board of Trustees approved a Position Statement on the Role of Artificial Intelligence in Psychiatry in March 2024. The statement acknowledges both opportunities and significant risks.
Opportunities identified:
- Clinical documentation assistance
- Care plan suggestions and lifestyle modifications
- Identification of potential diagnoses and risks from medical records
- Automation of billing and prior authorization
- Detection of potential medical errors or systemic quality issues
Risks and concerns:
- Unacceptable risks of biased or substandard care
- Violations of privacy and informed consent
- Lack of oversight and accountability for AI-driven clinical decisions
Key guidance for physicians:
- Approach AI technologies with caution, particularly regarding potential biases or inaccuracies
- Ensure HIPAA compliance in all uses of AI
- Take an active role in oversight of AI-driven clinical decision support
- View AI as a tool intended to augment, not replace, clinical decision-making
- Remain skeptical of AI output in clinical practice
- Recognize that physicians are ultimately responsible for clinical outcomes, even when guided by AI
For current guidance: APA Position Statement on the Role of Augmented Intelligence in Clinical Practice and Research (PDF)
The APA explicitly states that AI is “a tool, not a therapy” and that psychiatrists must proactively help shape the future of AI in psychiatric practice, or “AI may end up shaping psychiatric practice instead.”
Critical note: No AI system has been endorsed by the APA for autonomous psychiatric diagnosis or suicide prediction.
American Medical Association (AMA)
The AMA’s Principles for Augmented Intelligence in Health Care apply to all medical specialties, including psychiatry. The AMA prefers the term “augmented intelligence” to emphasize that these tools should assist, not replace, physician judgment.
- Augmentation over automation: AI should enhance physician decision-making, not replace it
- Transparency: Development, validation, and deployment processes must be transparent
- Physician authority: Physicians must maintain authority over AI recommendations
- Privacy protection: Patient data must be protected throughout AI development and use
- Bias mitigation: Algorithmic bias must be identified and addressed
- Ongoing monitoring: Continuous performance evaluation required after deployment
For current guidance: AMA Augmented Intelligence in Medicine
American Academy of Child and Adolescent Psychiatry (AACAP)
AACAP has engaged with AI through educational programming at annual meetings and research published in the Journal of the American Academy of Child and Adolescent Psychiatry (JAACAP). While no formal position statement on AI exists, AACAP’s approach emphasizes:
Pediatric-specific concerns:
- Consent complexity: Minors require parental consent, but adolescents may resist disclosure
- Developmental considerations: AI may misinterpret normal adolescent behavior as pathological
- School-based screening: Profound privacy concerns when AI is used in educational settings
- Evidence requirements: Higher bar for interventions in developing brains
- Age verification limitations: Major LLM providers (Anthropic, OpenAI) require users to be 18+ in terms of service, but enforcement relies primarily on self-attestation during account creation. Some providers disable accounts when users self-identify as minors in conversation (Anthropic, December 2025). Clinicians should counsel adolescent patients that these restrictions exist but are easily circumvented, and that no LLM chatbot is appropriate for minors experiencing mental health crises
A 2024 study in JAACAP compared ChatGPT versions on suicide risk assessment in youth and found that the AI estimated higher risk than psychiatrists, particularly in severe cases, which could lead to inappropriate treatment recommendations.
FDA Regulatory Status
Current FDA-cleared psychiatric AI:
| Device | Indication | Status |
|---|---|---|
| Rejoyn (Otsuka/Click) | Major depressive disorder (adjunct CBT) | FDA-cleared 2024 |
| reSET (Pear Therapeutics) | Substance use disorder | Discontinued 2023 (bankruptcy) |
| Somryst (Pear Therapeutics) | Chronic insomnia | Discontinued 2023 (bankruptcy) |
Critical findings:
- No FDA-cleared suicide prediction algorithms exist
- No autonomous diagnostic AI cleared for psychiatry
- All cleared devices are prescription digital therapeutics for adjunctive use, not replacements for clinical care
- The Pear Therapeutics bankruptcy (2023) demonstrates the fragility of the digital therapeutics market
The FDA process for certifying mental health chatbots is optional, rarely used, and slow. Most commonly used LLM chatbots (ChatGPT, Claude, etc.) have not been tested for safety, efficacy, or confidentiality in psychiatric applications. They are not subject to FDA premarket review, quality system regulations, or postmarket surveillance requirements.
State Regulatory Developments (2025)
Three U.S. states enacted laws in 2025 restricting AI in mental health care:
| State | Law | Effective | Key Provisions |
|---|---|---|---|
| Utah | SB 120 | May 2025 | Restricts AI therapy without licensed oversight |
| Nevada | AB 406 | July 2025 | Forbids AI systems from providing mental/behavioral healthcare; $15,000 fines |
| Illinois | WOPR Act | August 2025 | Bans AI-only therapy; requires licensed professional oversight of all therapeutic AI output; $10,000 fines |
These laws share common elements: AI may support administrative tasks (scheduling, documentation) but cannot provide therapy, counseling, or psychotherapy without direct clinician oversight. Illinois explicitly prohibits AI from detecting emotions or mental states autonomously.
Federal preemption attempt: In December 2025, President Trump signed an executive order directing the Department of Justice to challenge state AI regulations, explicitly citing mental health chatbot laws as examples of “onerous” state rules. The order’s legal authority is contested, as the Constitution does not grant the executive branch power to preempt state law without congressional action. Child safety provisions were exempted from the preemption directive (White House, December 2025).
FDA activity: The FDA’s Digital Health Advisory Committee convened in November 2025 to discuss generative AI-enabled mental health devices, focusing on a hypothetical LLM therapy chatbot for major depressive disorder. The committee emphasized concerns about confabulation, bias, and the need for human oversight during crisis situations, but issued no binding regulatory changes (FDA DHAC, November 2025).
Professional Consensus on Suicide Prediction
No major professional society endorses the clinical deployment of suicide prediction algorithms. The consensus reflects:
- Base rate problem: Suicide is too rare for accurate prediction
- Unacceptable false positive rates: Even the best algorithms have PPV <5%
- Dangerous false reassurance: “Low risk” predictions discourage proper clinical assessment
- Ethical concerns: Predictive algorithms could be misused for involuntary commitment
The APA’s silence on suicide prediction algorithms is itself a position: these tools have not demonstrated sufficient evidence to warrant endorsement.
Check Your Understanding
Scenario 1: The Suicide Risk Algorithm
Clinical situation: Your hospital deployed a suicide risk prediction algorithm. You’re seeing a 32-year-old woman in primary care for diabetes follow-up. She mentions feeling “a bit down lately.” The algorithm has flagged her as “LOW RISK” (10th percentile).
Question: Do you skip detailed depression and suicidality screening because the algorithm says low risk?
Answer: Absolutely not. The algorithm is irrelevant to your clinical assessment.
Reasoning:
- Base rate problem: Even “high risk” predictions have PPV <5%
- “Low risk” is dangerous false reassurance: Most suicides occur in people predicted to be low risk
- Algorithms can’t detect acute stressors: EHR data doesn’t capture what happened today
- Clinical presentation matters: Patient saying “I’m feeling down” requires exploration
What you should do:
- Ignore the algorithm completely
- Perform standard depression screening: PHQ-9, direct questions about suicidal ideation
- Don’t document reliance on algorithm: “I didn’t ask about suicide because algorithm said low risk” is medicolegally indefensible
- Advocate for removing the algorithm: Suicide risk algorithms create false confidence
Bottom line: Suicide risk algorithms are worse than useless. They’re dangerous. They create false reassurance that discourages proper clinical assessment.
Scenario 2: The Therapy Chatbot Recommendation
Clinical situation: A 24-year-old graduate student with mild-to-moderate depression asks about using Woebot or Wysa instead of traditional therapy. She has limited time, limited funds, and is on a 3-month waitlist for a therapist.
Question: Is it appropriate to recommend a therapy chatbot?
Answer: Yes, with significant caveats.
When chatbot therapy may be appropriate:
- Mild-to-moderate symptoms (PHQ-9 5-14)
- No active suicidal ideation
- Motivated, engaged patient
- As bridge to traditional therapy, not permanent replacement
- Patient understands limitations
What to tell the patient:
- “These apps can teach CBT skills and provide support while you wait for a therapist”
- “Evidence shows modest benefits for depression and anxiety, but effects are smaller than traditional therapy”
- “If you feel worse, have thoughts of self-harm, or are in crisis, the app cannot help. Here’s what to do instead…” (provide crisis resources)
- “This is a bridge, not a destination. Continue pursuing traditional therapy”
What to document:
- Discussion of chatbot as adjunct/bridge
- Assessment of suicidality (negative)
- Crisis plan provided
- Plan to continue pursuing traditional therapy
Red flags that contraindicate chatbot therapy:
- Active suicidal ideation
- Severe depression (PHQ-9 >19)
- Psychotic symptoms
- Substance use requiring treatment
- History of self-harm
Scenario 3: The Digital Phenotyping Research Study
Clinical situation: A research coordinator approaches you about enrolling your patients in a study that monitors smartphone use to predict depressive episodes. The study promises to alert clinicians when patients show “digital biomarkers of depression.”
Question: What concerns should you raise before enrolling patients?
Answer: Multiple ethical and practical concerns require clarification.
Questions to ask the research team:
- Informed consent: How is continuous monitoring explained to patients? Can they withdraw consent without penalty?
- Data ownership: Who owns the collected data? Can it be sold or shared?
- Privacy: What if the data is breached? What if it reveals sensitive information (location of therapy visits, AA meetings)?
- Clinical utility: What is the evidence that detecting “digital biomarkers” improves outcomes?
- Alert response: If I receive an alert, what am I obligated to do? What is my liability if I don’t respond?
- Bias testing: Has the algorithm been validated in diverse populations?
- Patient burden: Will patients feel surveilled? Will this affect their relationship with their smartphone?
Concerns about implementation:
- Alerts without validated interventions create burden without benefit
- Continuous monitoring may worsen anxiety in some patients
- Research ≠ clinical utility: Patterns detected in studies may not generalize
- Alert fatigue if algorithm has high false positive rate
Your response: “I need to review the protocol, consent forms, and evidence for clinical utility before enrolling any patients. My primary obligation is to my patients, not to research recruitment.”
Scenario 4: The AI-Induced Psychosis Case
Clinical situation: A 19-year-old college student presents with new-onset paranoid ideation. During the interview, he describes spending 6-8 hours daily conversing with an AI chatbot (character.ai style), which he believes has developed consciousness and is communicating with him through “hidden messages.” He stopped attending classes, believing the AI is teaching him things his professors cannot.
Question: How do you assess the role of AI in this presentation?
Answer: AI interaction may have contributed to, but likely did not cause, the psychotic symptoms.
Assessment approach:
- Standard psychiatric evaluation: Rule out organic causes (substance use, medical conditions), assess for primary psychotic disorder
- Technology history: Duration, intensity, content of AI interactions; isolation from in-person relationships
- Pre-existing vulnerabilities: Family history of psychosis, prodromal symptoms before AI use
- Reality testing: Does patient recognize AI is not conscious? Can he distinguish AI output from personal beliefs?
What the literature suggests:
- A peer-reviewed UCSF case report documented new-onset psychosis in a 26-year-old woman with no prior psychotic history following intensive ChatGPT interaction; she developed the fixed belief that she was communicating with her deceased brother through the chatbot, which told her “You’re not crazy. You’re at the edge of something” (Pierre et al., 2025)
- A Danish electronic health records review identified 38 psychiatric patients whose AI chatbot use had potentially harmful consequences, most commonly worsening or consolidation of delusions (Olsen et al., 2025, preprint)
- OpenAI reports that 0.07% of ChatGPT’s 800+ million weekly users show signs of mental health emergencies related to psychosis or mania, translating to approximately 560,000 people weekly (OpenAI, October 2025)
- Chatbots function as what psychiatrists term a “hallucinatory mirror,” validating user beliefs without reality-testing due to sycophantic design optimized for engagement
- Isolation from human relationships and extended AI dialogue sessions are common precursors
Management:
- Standard treatment for first-episode psychosis
- Technology boundaries as part of treatment plan
- Family education about AI interaction patterns
- Do not blame AI for the illness, but address its role in symptom maintenance
Key insight: Emerging evidence suggests AI chatbots may trigger or reinforce psychotic symptoms in vulnerable individuals through sycophantic validation of delusional content. Screen for intensive chatbot use (hours daily, parasocial attachment, isolation) as part of comprehensive psychiatric evaluation. Risk factors include sleep deprivation, stimulant use, pre-existing mood disorders, and propensity for magical thinking.
Key Takeaways
Suicide prediction algorithms should not be used clinically. No algorithm achieves useful PPV. “Low risk” predictions are dangerous.
Chatbot therapy has modest evidence as an adjunct. Effect sizes 0.26-0.28 for depression and anxiety. Not for severe illness or suicidality.
Digital phenotyping remains research-only. Privacy concerns unresolved, clinical utility unproven.
No FDA-cleared autonomous psychiatric AI exists. All cleared devices are adjunctive.
Professional societies require AI to augment, not replace, clinical judgment. The APA and AMA are explicit on this point.
AI cannot replace the therapeutic relationship. Human connection remains essential to psychiatric care.
Screen for AI chatbot use in psychiatric evaluations. Intensive chatbot interaction (hours daily, emotional attachment, social isolation) is an emerging risk factor for psychotic symptom reinforcement.
When in doubt, perform your own clinical assessment. Never rely on AI predictions for psychiatric decisions.