Machine learning and natural language processing in mental health: a systematic review (Preprint)

2019 ◽  
Author(s):  
Aziliz Le Glaz ◽  
Yannis Haralambous ◽  
Deok-Hee Kim-Dufor ◽  
Philippe Lenca ◽  
Romain Billot ◽  
...  

BACKGROUND Machine learning (ML) systems are a part of artificial intelligence (AI) and automatically learn models from data in order to make better decisions. Natural language processing (NLP), by using corpora and learning approaches, provides good performance in statistical tasks such as text classification or sentiment mining. OBJECTIVE The primary aim of this systematic review is to summarize and characterize, in methodological and technical terms, studies that used ML and NLP techniques for mental health. The secondary aim is to consider the value of these methods in mental health clinical practice. METHODS This systematic review follows the PRISMA guidelines and is registered on PROSPERO. The search was conducted on 4 medical databases (PubMed, Scopus, ScienceDirect, and PsycINFO) with the following keywords: machine learning, data mining, psychiatry, mental health, mental disorder. The exclusion criteria were: languages other than English, anonymization process, case studies, conference papers, and reviews. No limitations on publication dates were imposed. RESULTS A total of 327 articles were identified, 269 were excluded, and 58 were included in the review. Results were organized from a qualitative perspective. Even though studies had heterogeneous topics and methods, some themes emerged. Study populations could be grouped into three categories: patients included in medical databases, patients who came to the emergency room, and social-media users. The main objectives were symptom extraction, classification of illness severity, comparison of therapy effectiveness, identification of psychopathological clues, and challenging the current nosography. Data from electronic medical records and data from social media were the two major data sources. With regard to the methods used, preprocessing relied on standard NLP methods and on unique identifier extraction dedicated to medical texts. Efficient classifiers were preferred over "transparent" classifiers.
Python was the most frequently used platform. CONCLUSIONS ML and NLP models have been highly topical issues in medicine in recent years and may be considered a new paradigm in medical research. However, these processes tend to confirm clinical hypotheses rather than develop entirely new knowledge, and one major category of the population, social-media users, is an imprecise cohort. In addition, some language-specific features can improve the performance of NLP methods, and their extension to other languages should be more closely investigated. Nevertheless, ML and NLP techniques provide useful information from unexplored data (i.e., patients' daily habits that are usually inaccessible to care providers). They may be considered an additional tool at every step of mental health care: diagnosis, prognosis, treatment efficacy, and monitoring. Therefore, ethical issues, such as predicting psychiatric troubles or involvement in the physician-patient relationship, remain and should be discussed in a timely manner. ML and NLP methods may offer multiple perspectives in mental health research but should also be considered as tools to support clinical practice. CLINICALTRIAL Number CRD42019107376

2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

Objectives Unstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations. To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results Nineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers. Conclusion NLP and ML have emerged as an important tool for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.


Information ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 444
Author(s):  
Isuri Anuradha Nanomi Arachchige ◽  
Priyadharshany Sandanapitchai ◽  
Ruvan Weerasinghe

Depression is a common mental health disorder that affects an individual’s moods, thought processes and behaviours negatively, and disrupts one’s ability to function optimally. In most cases, people with depression try to hide their symptoms and refrain from obtaining professional help due to the stigma related to mental health. The digital footprint we all leave behind, particularly in online support forums, provides a window for clinicians to observe and assess such behaviour in order to make potential mental health diagnoses. Natural language processing (NLP) and Machine learning (ML) techniques are able to bridge the existing gaps in converting language to a machine-understandable format in order to facilitate this. Our objective is to undertake a systematic review of the literature on NLP and ML approaches used for depression identification on Online Support Forums (OSF). A systematic search was performed to identify articles that examined ML and NLP techniques to identify depression disorder from OSF. Articles were selected according to the PRISMA workflow. For the purpose of the review, 29 articles were selected and analysed. From this systematic review, we further analyse which combination of features extracted from NLP and ML techniques are effective and scalable for state-of-the-art Depression Identification. We conclude by addressing some open issues that currently limit real-world implementation of such systems and point to future directions to this end.


Author(s):  
Aziliz Le Glaz ◽  
Aziliz ['Christophe'] ◽  
Sofian Berrouiguet ◽  
Michel Walter ◽  
Michel ['Taylor']

2021 ◽  
Author(s):  
Arash Maghsoudi ◽  
Sara Nowakowski ◽  
Ritwick Agrawal ◽  
Amir Sharafkhaneh ◽  
Sadaf Aram ◽  
...  

BACKGROUND The COVID-19 pandemic has imposed additional stress on population health that may result in a higher incidence of insomnia. OBJECTIVE In this study, we hypothesized that using natural language processing (NLP) to explore social media would help to identify the mental health condition of the population experiencing insomnia after the outbreak of COVID-19. METHODS We designed a pre-post retrospective study using public social media content from Twitter. We categorized tweets based on time into two intervals: prepandemic (01/01/2019 to 01/01/2020) and pandemic (01/01/2020 to 01/01/2021). We used NLP to analyze the polarity (positive/negative) and intensity of emotions, as well as the psychological states expressed in users' tweets in terms of sadness, anxiety, and anger, by counting the words related to these categories in each tweet. Additionally, we performed temporal analysis to examine the effect of time on the users' insomnia experience. RESULTS We extracted 268,803 tweets containing the word insomnia (prepandemic, 123,293 and pandemic, 145,510). The odds of negative tweets (OR, 1.31; 95% CI, 1.29-1.33), anger (OR, 1.19; 95% CI, 1.16-1.21), and anxiety (OR, 1.24; 95% CI, 1.21-1.26) were higher during the pandemic than before it. The likelihood of negative tweets after midnight was higher than for other daily intervals, comprising approximately 60% of all negative insomnia-related tweets in 2020 and 2021 collectively. CONCLUSIONS Twitter users shared more negative tweets about insomnia during the pandemic than during the year before. Also, more anger- and anxiety-related content was disseminated during the pandemic on the social media platform.
Future studies using an NLP framework could assess tweets about other psychological distress, habit changes, weight gain due to inactivity, and the effect of viral infection on sleep.
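The odds ratios and 95% confidence intervals reported above can be illustrated with a 2x2 table of tweet counts. The following is a minimal Python sketch, assuming a Wald interval on the log scale, which the abstract does not specify. The function name `odds_ratio_ci` and the counts in the usage lines are hypothetical and are not the study's data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: pandemic tweets in the category (e.g., negative)
    b: pandemic tweets not in the category
    c: prepandemic tweets in the category
    d: prepandemic tweets not in the category
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

With very large samples, as in a corpus of several hundred thousand tweets, the standard error shrinks and the interval becomes narrow, which is consistent with the tight intervals quoted in the abstract.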


2021 ◽  
Author(s):  
Abul Hasan ◽  
Mark Levene ◽  
David Weston ◽  
Renate Fromson ◽  
Nicolas Koslover ◽  
...  

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources in order to assist decision makers. Social media is important in this respect; however, to make sense of the textual information it provides and to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients' posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report macro- and micro-averaged F1 scores in the ranges of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19 when the models are trained on human-labelled data.
Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. Also, we highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.
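The macro- and micro-averaged F1 scores mentioned in the results differ in how they aggregate over classes. The following is a minimal Python sketch for single-label multi-class predictions; the function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter

def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(y_true, y_pred):
    """Return (macro F1, micro F1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative
    # Macro: unweighted mean of per-class F1, so rare classes count equally.
    macro = sum(f1_from_counts(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro

# Toy example with hypothetical class labels "a" and "b".
macro, micro = macro_micro_f1(["a", "a", "a", "b"], ["a", "a", "b", "b"])
print(round(macro, 4), round(micro, 4))
```

In the single-label setting the micro-averaged F1 equals accuracy, whereas the macro average is more sensitive to performance on infrequent classes.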


2021 ◽  
Author(s):  
Vishal Dey ◽  
Peter Krasniak ◽  
Minh Nguyen ◽  
Clara Lee ◽  
Xia Ning

BACKGROUND A new illness can come to public attention through social media before it is medically defined, formally documented, or systematically studied. One example is a condition known as breast implant illness (BII), which has been extensively discussed on social media, although it is vaguely defined in the medical literature. OBJECTIVE The objective of this study is to construct a data analysis pipeline to understand emerging illnesses using social media data and to apply the pipeline to understand the key attributes of BII. METHODS We constructed a pipeline of social media data analysis using natural language processing and topic modeling. Mentions related to signs, symptoms, diseases, disorders, and medical procedures were extracted from social media data using the clinical Text Analysis and Knowledge Extraction System. We mapped the mentions to standard medical concepts and then summarized these mapped concepts as topics using latent Dirichlet allocation. Finally, we applied this pipeline to understand BII from several BII-dedicated social media sites. RESULTS Our pipeline identified topics related to toxicity, cancer, and mental health issues that were highly associated with BII. Our pipeline also showed that cancers, autoimmune disorders, and mental health problems were emerging concerns associated with breast implants, based on social media discussions. Furthermore, the pipeline identified mentions such as rupture, infection, pain, and fatigue as common self-reported issues among the public, as well as concerns about toxicity from silicone implants. CONCLUSIONS Our study could inspire future studies on the suggested symptoms and factors of BII. Our study provides the first analysis and derived knowledge of BII from social media using natural language processing techniques and demonstrates the potential of using social media information to better understand similar emerging illnesses.


2020 ◽  
Author(s):  
Daniel Mark Low ◽  
Laurie Rumker ◽  
Tanya Talkar ◽  
John Torous ◽  
Guillermo Cecchi ◽  
...  

Background: The COVID-19 pandemic is exerting a devastating impact on mental health, but it is not clear how people with different types of mental health problems were differentially impacted as the initial wave of cases hit. Objective: We leverage natural language processing (NLP) with the goal of characterizing changes in fifteen of the world's largest mental health support groups (e.g., r/schizophrenia, r/SuicideWatch, r/Depression) found on the website Reddit, along with eleven non-mental health groups (e.g., r/PersonalFinance, r/conspiracy) during the initial stage of the pandemic. Methods: We create and release the Reddit Mental Health Dataset including posts from 826,961 unique users from 2018 to 2020. Using regression, we analyze trends from 90 text-derived features such as sentiment analysis, personal pronouns, and a “guns” semantic category. Using supervised machine learning, we classify posts into their respective support group and interpret important features to understand how different problems manifest in language. We apply unsupervised methods such as topic modeling and unsupervised clustering to uncover concerns throughout Reddit before and during the pandemic. Results: We find that the r/HealthAnxiety forum showed spikes in posts about COVID-19 early on in January, approximately two months before other support groups started posting about the pandemic. There were many features that significantly increased during COVID-19 for specific groups including the categories “economic stress”, “isolation”, and “home” while others such as “motion” significantly decreased. We find that support groups related to attention deficit hyperactivity disorder (ADHD), eating disorders (ED), and anxiety showed the most negative semantic change during the pandemic out of all mental health groups. Health anxiety emerged as a general theme across Reddit through independent supervised and unsupervised machine learning analyses. 
For instance, we provide evidence that the concerns of a diverse set of individuals are converging in this unique moment of history; we discover that the more users posted about COVID-19, the more linguistically similar (less distant) the mental health support groups became to r/HealthAnxiety (ρ = -0.96, P<.001). Using unsupervised clustering, we find the Suicidality and Loneliness clusters more than doubled in number of posts during the pandemic. Specifically, the support groups for borderline personality disorder and post-traumatic stress disorder became significantly associated with the Suicidality cluster. Furthermore, clusters surrounding Self-Harm and Entertainment emerged. Conclusions: By using a broad set of NLP techniques and analyzing a baseline of pre-pandemic posts, we uncover patterns of how specific mental health problems manifest in language, identify at-risk users, and reveal the distribution of concerns across Reddit, which could help provide better resources to its millions of users. We then demonstrate that textual analysis is sensitive enough to uncover mental health complaints as they arise in real time, identifying vulnerable groups and alarming themes during COVID-19, and thus may have utility during the ongoing pandemic and other world-changing events such as elections and protests from the present or the past.
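The reported ρ = -0.96 relates the volume of COVID-19 posts to the linguistic distance from r/HealthAnxiety. The following is a minimal Python sketch of a rank correlation, assuming ρ denotes Spearman's coefficient (the abstract does not name the statistic). The function names and the numbers in the usage lines are hypothetical and are not data from the study.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values: share of COVID-19 posts vs. distance to a reference forum.
covid_share = [0.01, 0.05, 0.10, 0.20, 0.40]
distance = [0.90, 0.85, 0.70, 0.72, 0.50]
print(round(spearman(covid_share, distance), 2))
```

A strongly negative value, as in the abstract, means that distance tends to fall as the share of COVID-19 posts rises, without assuming the relationship is linear.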

