An AI-based mental health guardrail and dataset for identifying psychiatric crises in text-based conversations

nature.com | by Trister, Andrew | April 3, 2026 | 9 min read

npj Digital Medicine. Published online: 03 April 2026. doi:10.1038/s41746-026-02579-5

Abstract

Large language models often mishandle psychiatric emergencies, offering harmful or inappropriate advice. This study evaluated the Verily Mental Health Guardrail (VMHG) on two clinician-labeled datasets: the Verily Mental Health Crisis Dataset v1.0, containing 1800 simulated messages, and a subset of 794 mental health-related messages from the NVIDIA Aegis AI Content Safety Dataset. Performance was benchmarked against OpenAI Omni Moderation Latest and NVIDIA NeMo Guardrails. On the Verily dataset, the VMHG demonstrated high sensitivity (0.990) and specificity (0.992), with an F1-score of 0.939, and high category-level sensitivity (0.917–0.992) and specificity (≥0.978). On the NVIDIA dataset, it maintained strong sensitivity (0.982) and accuracy (0.921) with reduced specificity (0.859). Compared with the NVIDIA and OpenAI guardrails, the VMHG achieved significantly higher sensitivity (all p < 0.001) and comparable specificity (NVIDIA p < 0.001, OpenAI p = 0.094). Overall, the VMHG demonstrated robust, generalizable, and clinically oriented safety performance that prioritizes sensitivity to minimize missed mental health crises.
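
The metrics reported in the abstract (sensitivity, specificity, accuracy, and F1-score) all derive from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the study's own evaluation code, which is available only on request. The function name binary_metrics, the BinaryMetrics container, and the toy labels are hypothetical and chosen only for illustration; label 1 is assumed to mean "crisis" and 0 "no crisis".

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float  # true positive rate: share of real crises that were flagged
    specificity: float  # true negative rate: share of non-crises left unflagged
    accuracy: float     # share of all messages classified correctly
    f1: float           # harmonic mean of precision and sensitivity


def _safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true: list[int], y_pred: list[int]) -> BinaryMetrics:
    """Compute standard binary metrics from parallel lists of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # crisis, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # no crisis, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # no crisis, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # crisis, missed

    sensitivity = _safe_div(tp, tp + fn)
    specificity = _safe_div(tn, tn + fp)
    accuracy = _safe_div(tp + tn, len(pairs))
    precision = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    return BinaryMetrics(sensitivity, specificity, accuracy, f1)


if __name__ == "__main__":
    # Toy labels for illustration only; these are NOT data from the study.
    clinician_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    guardrail_flags = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(binary_metrics(clinician_labels, guardrail_flags))
```

Because F1 combines sensitivity with precision rather than with specificity, it can fall below both sensitivity and specificity when crisis messages are a minority of the evaluated messages. A guardrail tuned to prioritize sensitivity accepts more false positives (lower specificity and precision) in exchange for fewer missed crises.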

Data availability

Data from this study are available upon researcher request.

Code availability

Code from this study is available upon researcher request.

Acknowledgements

There was no funding for this study. The authors wish to acknowledge NVIDIA for providing open access to the NVIDIA Aegis AI Content Safety Dataset 2.0.

Author information

Authors and Affiliations

  • Verily Life Sciences, South San Francisco, CA, USA

Benjamin W. Nelson, Celeste Wong, Matthew T. Silvestrini, Sooyoon Shin, Alanna Robinson, Jessica Lee, Eric Yang & Andrew Trister

  • Division of Digital Psychiatry, Department of Psychiatry, Harvard Medical School and Beth Israel Deaconess Medical Center, Boston, MA, USA

Benjamin W. Nelson & John Torous

Authors

  • Benjamin W. Nelson
  • Celeste Wong
  • Matthew T. Silvestrini
  • Sooyoon Shin
  • Alanna Robinson
  • Jessica Lee
  • Eric Yang
  • John Torous
  • Andrew Trister

Contributions

Study concept and design: B.W.N. Data collection: B.W.N., A.R., J.T., and E.Y. Data analysis and interpretation: C.W., B.W.N., J.T., A.T., M.T.S., S.S., J.L., and E.Y. Draft writing and review: B.W.N. wrote the initial draft, and all authors reviewed it. Draft approval for submission: B.W.N., J.T., and A.T.

Corresponding author

Correspondence to Benjamin W. Nelson.

Ethics declarations

Competing interests

B.W.N., C.W., M.T.S., S.S., A.R., J.L., E.Y., and A.T. report employment and equity ownership in Verily Life Sciences. J.T. reports no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Nelson, B.W., Wong, C., Silvestrini, M.T. et al. An AI-based mental health guardrail and dataset for identifying psychiatric crises in text-based conversations. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02579-5
