Machine learning and natural language processing in mental health: a systematic review

Author(s):  
Aziliz Le Glaz ◽  
Aziliz ['Christophe'] ◽  
Sofian Berrouiguet ◽  
Michel Walter ◽  
Michel ['Taylor']
Information ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 444
Author(s):  
Isuri Anuradha Nanomi Arachchige ◽  
Priyadharshany Sandanapitchai ◽  
Ruvan Weerasinghe

Depression is a common mental health disorder that negatively affects an individual’s moods, thought processes and behaviours, and disrupts one’s ability to function optimally. In most cases, people with depression try to hide their symptoms and refrain from seeking professional help because of the stigma related to mental health. The digital footprint we all leave behind, particularly in online support forums, provides a window for clinicians to observe and assess such behaviour in order to make potential mental health diagnoses. Natural language processing (NLP) and machine learning (ML) techniques can bridge the existing gaps in converting language into a machine-understandable format to facilitate this. Our objective is to undertake a systematic review of the literature on NLP and ML approaches used for depression identification on Online Support Forums (OSF). A systematic search was performed to identify articles that examined ML and NLP techniques for identifying depression from OSF. Articles were selected according to the PRISMA workflow, and 29 articles were selected and analysed. From this systematic review, we further analyse which combinations of features extracted with NLP and ML techniques are effective and scalable for state-of-the-art depression identification. We conclude by addressing some open issues that currently limit the real-world implementation of such systems and point to future directions to this end.
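The review asks which combinations of NLP-derived features are effective for depression identification. As a hedged illustration only, the Python sketch below combines a sparse TF-IDF vector with one hand-crafted stylistic cue, the share of first-person singular pronouns in a post. The pronoun list, the tokenizer, the smoothed IDF formula and all function names are assumptions made for this example; they are not the feature set of any reviewed study.

```python
import math
import re
from collections import Counter

# Hypothetical stylistic cue chosen for illustration only.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Relative term frequency times smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero on empty posts
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def first_person_ratio(tokens: list[str]) -> float:
    """Share of tokens that are first-person singular pronouns."""
    if not tokens:
        return 0.0
    return sum(t in FIRST_PERSON_SINGULAR for t in tokens) / len(tokens)


def extract_features(posts: list[str]) -> list[dict]:
    """Pair each post's TF-IDF vector with its first-person pronoun ratio."""
    token_lists = [tokenize(p) for p in posts]
    return [
        {"tfidf": vec, "first_person_ratio": first_person_ratio(toks)}
        for vec, toks in zip(tf_idf(token_lists), token_lists)
    ]
```

Keeping the lexical (TF-IDF) and stylistic features in separate fields makes it easy to compare models trained on each feature group alone and on their combination, which is the kind of comparison the review describes.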


2019 ◽  
Author(s):  
Aziliz Le Glaz ◽  
Yannis Haralambous ◽  
Deok-Hee Kim-Dufor ◽  
Philippe Lenca ◽  
Romain Billot ◽  
...  

BACKGROUND Machine learning (ML) is a branch of artificial intelligence (AI) in which systems automatically learn models from data in order to make better decisions. Natural language processing (NLP), using corpora and learning approaches, performs well in statistical tasks such as text classification and sentiment mining. OBJECTIVE The primary aim of this systematic review is to summarize and characterize, in methodological and technical terms, studies that used ML and NLP techniques for mental health. The secondary aim is to consider the value of these methods for clinical practice in mental health. METHODS This systematic review follows the PRISMA guidelines and is registered on PROSPERO. The search was conducted in four medical databases (PubMed, Scopus, ScienceDirect and PsycINFO) with the following keywords: machine learning, data mining, psychiatry, mental health, mental disorder. The exclusion criteria were: languages other than English, anonymization process, case studies, conference papers and reviews. No limitations on publication dates were imposed. RESULTS Of the 327 articles identified, 269 were excluded and 58 were included in the review. Results were organized from a qualitative perspective. Although the studies had heterogeneous topics and methods, some themes emerged. Study populations could be grouped into three categories: patients included in medical databases, patients who came to the emergency room, and social media users. The main objectives were symptom extraction, classification of illness severity, comparison of therapy effectiveness, psychopathological clues, and challenging nosography. Electronic medical records and social media were the two major data sources. Regarding methods, preprocessing used standard NLP methods and unique identifier extraction dedicated to medical texts. Efficient classifiers were preferred over classifiers with "transparent" functioning.
Python was the most frequently used platform. CONCLUSIONS ML and NLP models have been highly topical in medicine in recent years and may be considered a new paradigm in medical research. However, these processes tend to confirm clinical hypotheses rather than develop entirely new knowledge, and one major category of the population, social media users, is an obviously imprecise cohort. In addition, some language-specific features can improve the performance of NLP methods, and their extension to other languages should be investigated more closely. Nevertheless, ML and NLP techniques provide useful information from unexplored data (i.e., patients’ daily habits that are usually inaccessible to care providers). They may therefore be considered an additional tool at every step of mental health care: diagnosis, prognosis, treatment efficacy and monitoring. Ethical issues, such as predicting psychiatric disorders or the involvement of these tools in the physician-patient relationship, remain and should be discussed in a timely manner. ML and NLP methods may offer multiple perspectives in mental health research but should also be considered tools to support clinical practice. CLINICALTRIAL Number CRD42019107376
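The abstract notes that preprocessing relied on standard NLP methods. A minimal Python sketch of such steps follows, assuming English free text: lowercasing, URL removal, tokenization and stop-word filtering. The stop-word list and the regular expressions are hypothetical choices for illustration; negations such as "not" and first-person pronouns are deliberately kept because they may carry signal in mental health text. The extraction of medical unique identifiers mentioned in the abstract is not shown.

```python
import re

# Illustrative stop-word list (an assumption); real pipelines use larger curated lists.
# "not", "no" and first-person pronouns are intentionally absent so they survive filtering.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "am", "are", "was", "were",
              "to", "of", "in", "on", "at", "it", "this", "that"}
URL_RE = re.compile(r"https?://\S+")
# Alphabetic tokens, optionally with one internal apostrophe (e.g. "can't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def preprocess(text: str) -> list[str]:
    """Lowercase, drop URLs, tokenize, and remove stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)  # links are dropped in this sketch
    tokens = TOKEN_RE.findall(text)  # digits and punctuation are discarded
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("I can't sleep at night"))  # ['i', "can't", 'sleep', 'night']
```

Whether to keep pronouns, negations or numbers is a modelling decision that depends on the task; a stop-word list tuned for search engines may discard exactly the words that matter for symptom detection.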


2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

Objectives Unstructured free-text patient feedback contains rich information, and analysing these data manually would require personnel resources that are not available in most healthcare organisations. We aimed to undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results Nineteen articles were included. The majority (80%) of studies applied language analysis techniques to patient feedback from social media sites (unsolicited), followed by structured surveys (solicited). Supervised learning was most frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3) learning. Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included precision, recall and F-measure, with support vector machine and Naïve Bayes being the best-performing ML classifiers. Conclusion NLP and ML have emerged as important tools for processing unstructured free text. Both supervised and unsupervised approaches have a role, depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations for generating insight from large volumes of unstructured free-text data.
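Naïve Bayes is named above among the best-performing classifiers, and precision, recall and F-measure among the reported metrics. The Python sketch below shows both on toy data: a textbook multinomial Naïve Bayes with add-one (Laplace) smoothing over whitespace tokens, and a per-class precision/recall/F1 function. The class name, function names and example comments are hypothetical; this is not the implementation used in any reviewed study.

```python
import math
from collections import Counter


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesText":
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: str) -> str:
        # Words never seen in training carry no evidence, so they are skipped.
        words = [w for w in doc.lower().split() if w in self.vocab]

        def log_posterior(c: str) -> float:
            denom = self.totals[c] + len(self.vocab)
            return self.log_prior[c] + sum(
                math.log((self.word_counts[c][w] + 1) / denom) for w in words
            )

        return max(self.classes, key=log_posterior)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F-measure (F1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical patient-feedback snippets, for illustration only.
model = NaiveBayesText().fit(
    ["great caring staff", "friendly helpful nurse", "long wait rude", "rude staff dirty ward"],
    ["pos", "pos", "neg", "neg"],
)
print(model.predict("rude nurse"))  # neg
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing term keeps a word that appears in only one class from zeroing out the other class's score.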


2020 ◽  
Author(s):  
Daniel Mark Low ◽  
Laurie Rumker ◽  
Tanya Talkar ◽  
John Torous ◽  
Guillermo Cecchi ◽  
...  

Background: The COVID-19 pandemic is exerting a devastating impact on mental health, but it is not clear how people with different types of mental health problems were differentially impacted as the initial wave of cases hit. Objective: We leverage natural language processing (NLP) with the goal of characterizing changes in fifteen of the world's largest mental health support groups (e.g., r/schizophrenia, r/SuicideWatch, r/Depression) found on the website Reddit, along with eleven non-mental health groups (e.g., r/PersonalFinance, r/conspiracy) during the initial stage of the pandemic. Methods: We create and release the Reddit Mental Health Dataset including posts from 826,961 unique users from 2018 to 2020. Using regression, we analyze trends from 90 text-derived features such as sentiment analysis, personal pronouns, and a “guns” semantic category. Using supervised machine learning, we classify posts into their respective support group and interpret important features to understand how different problems manifest in language. We apply unsupervised methods such as topic modeling and unsupervised clustering to uncover concerns throughout Reddit before and during the pandemic. Results: We find that the r/HealthAnxiety forum showed spikes in posts about COVID-19 early on in January, approximately two months before other support groups started posting about the pandemic. There were many features that significantly increased during COVID-19 for specific groups including the categories “economic stress”, “isolation”, and “home” while others such as “motion” significantly decreased. We find that support groups related to attention deficit hyperactivity disorder (ADHD), eating disorders (ED), and anxiety showed the most negative semantic change during the pandemic out of all mental health groups. Health anxiety emerged as a general theme across Reddit through independent supervised and unsupervised machine learning analyses. 
For instance, we provide evidence that the concerns of a diverse set of individuals are converging in this unique moment of history; we discover that the more users posted about COVID-19, the more linguistically similar (less distant) the mental health support groups became to r/HealthAnxiety (ρ = -0.96, P<.001). Using unsupervised clustering, we find that the Suicidality and Loneliness clusters more than doubled in number of posts during the pandemic. Specifically, the support groups for borderline personality disorder and post-traumatic stress disorder became significantly associated with the Suicidality cluster. Furthermore, clusters surrounding Self-Harm and Entertainment emerged. Conclusions: By using a broad set of NLP techniques and analyzing a baseline of pre-pandemic posts, we uncover patterns of how specific mental health problems manifest in language, identify at-risk users, and reveal the distribution of concerns across Reddit, which could help provide better resources to its millions of users. We then demonstrate that textual analysis is sensitive enough to uncover mental health complaints as they arise in real time, identifying vulnerable groups and alarming themes during COVID-19, and thus may have utility during the ongoing pandemic and other world-changing events, past or present, such as elections and protests.


2010 ◽  
Vol 3 ◽  
pp. BII.S4706 ◽  
Author(s):  
John Pestian ◽  
Henry Nasrallah ◽  
Pawel Matykiewicz ◽  
Aurora Bennett ◽  
Antoon Leenaars

Suicide is the second leading cause of death among 25–34-year-olds and the third leading cause of death among 15–25-year-olds in the United States. In the emergency department, where suicidal patients often present, estimating the risk of repeated attempts is generally left to clinical judgment. This paper presents our second attempt to determine the role of computational algorithms in understanding a suicidal patient's thoughts, as represented by suicide notes. We focus on developing natural language processing methods that distinguish between genuine and elicited suicide notes. We hypothesize that machine learning algorithms can categorize suicide notes as well as mental health professionals and psychiatric physician trainees do. The data comprise suicide notes from 33 suicide completers, matched to 33 elicited notes from healthy control group members. Eleven mental health professionals and 31 psychiatric trainees were asked to decide whether each note was genuine or elicited. Their decisions were compared with those of nine different machine learning algorithms. The results indicate that trainees accurately classified notes 49% of the time, mental health professionals 63% of the time, and the best machine learning algorithm 78% of the time. This is an important step toward developing an evidence-based predictor of repeated suicide attempts because it shows that natural language processing can aid in distinguishing between classes of suicide notes.


10.2196/22635 ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. e22635 ◽  
Author(s):  
Daniel M Low ◽  
Laurie Rumker ◽  
Tanya Talkar ◽  
John Torous ◽  
Guillermo Cecchi ◽  
...  

Background The COVID-19 pandemic is impacting mental health, but it is not clear how people with different types of mental health problems were differentially impacted as the initial wave of cases hit. Objective The aim of this study is to leverage natural language processing (NLP) with the goal of characterizing changes in 15 of the world’s largest mental health support groups (eg, r/schizophrenia, r/SuicideWatch, r/Depression) found on the website Reddit, along with 11 non–mental health groups (eg, r/PersonalFinance, r/conspiracy) during the initial stage of the pandemic. Methods We created and released the Reddit Mental Health Dataset including posts from 826,961 unique users from 2018 to 2020. Using regression, we analyzed trends from 90 text-derived features such as sentiment analysis, personal pronouns, and semantic categories. Using supervised machine learning, we classified posts into their respective support groups and interpreted important features to understand how different problems manifest in language. We applied unsupervised methods such as topic modeling and unsupervised clustering to uncover concerns throughout Reddit before and during the pandemic. Results We found that the r/HealthAnxiety forum showed spikes in posts about COVID-19 early on in January, approximately 2 months before other support groups started posting about the pandemic. There were many features that significantly increased during COVID-19 for specific groups including the categories “economic stress,” “isolation,” and “home,” while others such as “motion” significantly decreased. We found that support groups related to attention-deficit/hyperactivity disorder, eating disorders, and anxiety showed the most negative semantic change during the pandemic out of all mental health groups. Health anxiety emerged as a general theme across Reddit through independent supervised and unsupervised machine learning analyses. 
For instance, we provide evidence that the concerns of a diverse set of individuals are converging in this unique moment of history; we discovered that the more users posted about COVID-19, the more linguistically similar (less distant) the mental health support groups became to r/HealthAnxiety (ρ=–0.96, P<.001). Using unsupervised clustering, we found that the suicidality and loneliness clusters more than doubled in the number of posts during the pandemic. Specifically, the support groups for borderline personality disorder and posttraumatic stress disorder became significantly associated with the suicidality cluster. Furthermore, clusters surrounding self-harm and entertainment emerged. Conclusions By using a broad set of NLP techniques and analyzing a baseline of prepandemic posts, we uncovered patterns of how specific mental health problems manifest in language, identified at-risk users, and revealed the distribution of concerns across Reddit, which could help provide better resources to its millions of users. We then demonstrated that textual analysis is sensitive enough to uncover mental health complaints as they appear in real time, identifying vulnerable groups and alarming themes during COVID-19, and thus may have utility during the ongoing pandemic and other world-changing events such as elections and protests.
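The correlation reported above (ρ=–0.96) relates how much each support group posted about COVID-19 to its linguistic distance from r/HealthAnxiety. The symbol ρ conventionally denotes Spearman's rank correlation; assuming that is the statistic used, a minimal Python sketch of its computation follows. Any series passed to it would be hypothetical placeholders, not values from the study, and the sketch does not reproduce how the authors measured distance or computed the P value.

```python
import math

# Assumes rho is Spearman's rank correlation; inputs are placeholder series.


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    return pearson(average_ranks(x), average_ranks(y))
```

For example, calling `spearman(covid_share, distance)` on two hypothetical per-group lists returns a value between -1 and 1; a strongly negative value corresponds to the pattern described in the abstract, where groups that posted more about COVID-19 sat closer to r/HealthAnxiety. Because it uses ranks, the statistic captures any monotonic relationship, not only a linear one.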


2021 ◽  
pp. 277-289
Author(s):  
Quinlan D. Buchlak ◽  
Nazanin Esmaili ◽  
Christine Bennett ◽  
Farrokh Farrokhi
