Natural Language Processing Insights into LGBTQ+ Youth Mental Health during the COVID-19 Pandemic: Longitudinal Analysis of r/LGBTeens Micro-Community Reveals Increased Anxiety in Topics and Trends (Preprint)

2021 ◽  
Author(s):  
Hannah Stevens ◽  
Irena Acic ◽  
Sofia Rhea

BACKGROUND Widespread fear surrounding COVID-19, coupled with extreme physical and social distancing orders, has caused severe negative mental health outcomes. Yet little is known about how the pandemic is affecting LGBTQ+ youth, who already experienced disproportionately high adverse mental health outcomes before it began. This study aims to address this knowledge gap. OBJECTIVE This work aims to harness natural language processing (NLP) methodologies to investigate the evolution of conversation topics in the most popular subreddit for LGBTQ+ youth. METHODS We generated a dataset of all r/LGBTeens subreddit posts made between January 1, 2020, and February 1, 2021, and analyzed trends in anxiety, anger, and sadness in posts. Because the distribution of anxiety before widespread social distancing orders differed meaningfully from the distribution after (P < .001), we employed Latent Dirichlet Allocation (LDA) to examine the topics driving this shift in anxiety. RESULTS While the present study did not find differences in LGBTQ+ youth anger and sadness, anxiety increased significantly during social distancing measures compared with before lockdown (P < .001). Further analysis revealed 10 anxiety-provoking topics discussed during the pandemic: attraction to a friend, coming out, coming out to family, discrimination, education, exploring sexuality, gender pronouns, love/relationship advice, starting a new relationship, and struggling with mental health. CONCLUSIONS Conversation topics related to coming out, gender and sexual identities, discrimination, and relationships were anxiety provoking for LGBTQ+ youth both before and after the pandemic began. The frequency of these conversations increased alongside pandemic-related lifestyle disruptions (e.g., school closures), reflecting LGBTQ+ teens' increased reliance on anonymous discussion forums as safe spaces for discussing these stressors.
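The abstract does not include code; the sketch below illustrates how an LDA pass over the anxious posts might look with scikit-learn, assuming the posts have already been filtered to the high-anxiety subset. The example posts, parameter values, and variable names are illustrative placeholders, not the authors' data or settings.

```python
# Minimal sketch (not the authors' code): LDA topic extraction over posts that
# already scored high on anxiety. All posts and parameters are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

anxious_posts = [
    "i want to come out to my parents but i am scared",
    "my crush is my best friend and i do not know what to do",
    # ... the remaining high-anxiety r/LGBTeens posts would go here
]

# Bag-of-words representation; on the full corpus one would also add
# min_df/max_df thresholds to drop very rare and very common tokens.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(anxious_posts)

# Ten topics, matching the number of anxiety-provoking topics reported above.
lda = LatentDirichletAllocation(n_components=10, random_state=0)
lda.fit(doc_term)

# Print the top words of each topic for manual labeling.
vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [vocab[i] for i in weights.argsort()[-10:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```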

2021 ◽  
Author(s):  
Arash Maghsoudi ◽  
Sara Nowakowski ◽  
Ritwick Agrawal ◽  
Amir Sharafkhaneh ◽  
Sadaf Aram ◽  
...  

BACKGROUND The COVID-19 pandemic has imposed additional stress on population health that may result in a higher incidence of insomnia. OBJECTIVE We hypothesized that using natural language processing (NLP) to explore social media would help to identify the mental health state of people experiencing insomnia after the outbreak of COVID-19. METHODS We designed a pre-post retrospective study using public social media content from Twitter. We categorized tweets based on time into two intervals: prepandemic (01/01/2019 to 01/01/2020) and pandemic (01/01/2020 to 01/01/2021). We used NLP to analyze the polarity (positive/negative) and intensity of emotions, as well as users' psychological states in terms of sadness, anxiety, and anger, by counting the words related to these categories in each tweet. Additionally, we performed a temporal analysis to examine the effect of time on users' insomnia experience. RESULTS We extracted 268,803 tweets containing the word insomnia (prepandemic: 123,293; pandemic: 145,510). The odds of negative tweets (OR, 1.31; 95% CI, 1.29-1.33), anger (OR, 1.19; 95% CI, 1.16-1.21), and anxiety (OR, 1.24; 95% CI, 1.21-1.26) were higher during the pandemic than prepandemic. The likelihood of negative tweets after midnight was higher than for other daily intervals, comprising approximately 60% of all negative insomnia-related tweets in 2020 and 2021 combined. CONCLUSIONS Twitter users shared more negative tweets about insomnia during the pandemic than during the year before, and more anger- and anxiety-related content was disseminated on the platform during the pandemic. Future studies using an NLP framework could assess tweets about other forms of psychological distress, habit changes, weight gain due to inactivity, and the effect of viral infection on sleep.
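The reported odds ratios compare the frequency of negative (or anger- or anxiety-laden) insomnia tweets between the two periods. A minimal sketch of that comparison from 2x2 counts follows; the counts are placeholders, not the study's numbers.

```python
# Minimal sketch (not the authors' code): odds ratio and 95% Wald CI for
# observing a negative insomnia tweet during the pandemic vs. prepandemic.
import math

def odds_ratio_ci(neg_pandemic, other_pandemic, neg_pre, other_pre, z=1.96):
    """Odds of a negative tweet during the pandemic relative to prepandemic."""
    or_ = (neg_pandemic / other_pandemic) / (neg_pre / other_pre)
    se_log_or = math.sqrt(1 / neg_pandemic + 1 / other_pandemic
                          + 1 / neg_pre + 1 / other_pre)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Placeholder counts of negative vs. non-negative insomnia tweets per period.
print(odds_ratio_ci(neg_pandemic=70_000, other_pandemic=75_510,
                    neg_pre=52_000, other_pre=71_293))
```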


2021 ◽  
Author(s):  
Vishal Dey ◽  
Peter Krasniak ◽  
Minh Nguyen ◽  
Clara Lee ◽  
Xia Ning

BACKGROUND A new illness can come to public attention through social media before it is medically defined, formally documented, or systematically studied. One example is a condition known as breast implant illness (BII), which has been extensively discussed on social media, although it is vaguely defined in the medical literature. OBJECTIVE The objective of this study is to construct a data analysis pipeline to understand emerging illnesses using social media data and to apply the pipeline to understand the key attributes of BII. METHODS We constructed a pipeline of social media data analysis using natural language processing and topic modeling. Mentions related to signs, symptoms, diseases, disorders, and medical procedures were extracted from social media data using the clinical Text Analysis and Knowledge Extraction System (cTAKES). We mapped the mentions to standard medical concepts and then summarized these mapped concepts as topics using latent Dirichlet allocation. Finally, we applied this pipeline to understand BII from several BII-dedicated social media sites. RESULTS Our pipeline identified topics related to toxicity, cancer, and mental health issues that were highly associated with BII. It also showed that cancers, autoimmune disorders, and mental health problems were emerging concerns associated with breast implants, based on social media discussions. Furthermore, the pipeline identified mentions such as rupture, infection, pain, and fatigue as common self-reported issues among the public, as well as concerns about toxicity from silicone implants. CONCLUSIONS Our study could inspire future studies on the suggested symptoms and factors of BII. It provides the first analysis and derived knowledge of BII from social media using natural language processing techniques and demonstrates the potential of using social media information to better understand similar emerging illnesses.
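The pipeline's final stage summarizes extracted medical concepts as topics. The sketch below illustrates that stage only, assuming each post has already been processed with cTAKES and reduced to a list of mapped concept names; the concepts and parameters are placeholders, not the authors' pipeline.

```python
# Minimal sketch (not the authors' pipeline): topic modeling over per-post
# medical concept lists, assumed to come from the cTAKES + concept-mapping steps.
from gensim import corpora, models

concept_docs = [
    ["rupture", "silicone", "pain", "fatigue"],
    ["autoimmune disorder", "fatigue", "anxiety"],
    ["breast cancer", "toxicity", "silicone"],
    # ... one concept list per post
]

# Bag-of-concepts corpus over the mapped concepts.
dictionary = corpora.Dictionary(concept_docs)
corpus = [dictionary.doc2bow(doc) for doc in concept_docs]

# Summarize the mapped concepts as a small number of latent topics.
lda = models.LdaModel(corpus, num_topics=5, id2word=dictionary,
                      random_state=0, passes=10)
for topic_id, topic in lda.print_topics(num_words=5):
    print(topic_id, topic)
```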


2013 ◽  
Vol 52 (01) ◽  
pp. 33-42 ◽  
Author(s):  
M.-H. Kuo ◽  
P. Gooch ◽  
J. St-Maurice

Objective: The objective of this study was to undertake a proof of concept demonstrating the use of primary care data, natural language processing, and term extraction to assess emergency room use. The study extracted biopsychosocial concepts from primary care free text and related them to inappropriate emergency room use through odds ratios. Methods: De-identified free-text notes were extracted from a primary care clinic in Guelph, Ontario, and analyzed with a software toolkit that incorporated General Architecture for Text Engineering (GATE) and MetaMap components for natural language processing and term extraction. Results: Over 10 million concepts were extracted from 13,836 patient records. Codes found in at least 1% of the sample were regressed against inappropriate emergency room use. Seventy-seven codes were biopsychosocial, statistically significant (p < 0.001), and had an OR > 2.0. Thematically, these codes involved mental health and pain-related concepts. Conclusions: Analyzed thematically, mental health issues and pain emerged as important themes; we conclude that pain and mental health problems are primary drivers of inappropriate emergency room use. Age and sex were not significant. This proof of concept demonstrates the feasibility of combining natural language processing and primary care data to analyze a system-use question. As a first work, it supports further research and could be applied to investigate other, more complex problems.
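The regression step relates each extracted code to inappropriate emergency room use via an odds ratio. A minimal sketch of that step follows, using univariate logistic regression in statsmodels on simulated placeholder data; the variable names and thresholds mirror the description above but are illustrative.

```python
# Minimal sketch (not the authors' code): per-code logistic regression of
# inappropriate ER use on the presence of an extracted concept code.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
# Placeholder data: one row per patient record, one indicator per code,
# plus a binary outcome for inappropriate emergency room use.
records = pd.DataFrame({
    "code_chronic_pain": rng.integers(0, 2, n),
    "code_depression": rng.integers(0, 2, n),
    "inappropriate_er_use": rng.integers(0, 2, n),
})

results = []
for code in ["code_chronic_pain", "code_depression"]:
    if records[code].mean() < 0.01:            # keep codes in >= 1% of records
        continue
    X = sm.add_constant(records[[code]])
    fit = sm.Logit(records["inappropriate_er_use"], X).fit(disp=False)
    results.append((code, np.exp(fit.params[code]), fit.pvalues[code]))

# Codes with OR > 2.0 and p < 0.001 would be flagged, as in the study.
print(results)
```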


2021 ◽  
Author(s):  
Steven Hobaica ◽  
Paul Kwon ◽  
Shari Reiter ◽  
Aaron Aguilar-Bonnette ◽  
Walter Scott ◽  
...  

The current study utilized the 2018 Washington State Healthy Youth Survey to explore the relations among school district political attitudes, bullying experiences, and mental health outcomes, particularly for LGBTQ+ students. Although bullying was associated with greater psychological distress (i.e., anxiety, depression, and suicidality) for all students, LGBTQ+ students experienced more bullying and psychological distress. Bullying experiences mediated the relation between LGBTQ+ identity and psychological distress. However, school district voting record moderated the relation between LGBTQ+ identity and bullying, such that LGBTQ+ students in more conservative districts, or districts with more votes for Donald Trump in the 2016 election, experienced more bullying, which was associated with greater psychological distress. Additionally, increased teacher intervention during instances of bullying was related to less bullying for LGBTQ+ students. Finally, in more conservative-leaning districts, LGBTQ+ students reported less teacher intervention, which was associated with more bullying and psychological distress. Given that political conservatism was related to higher rates of bullying and poorer mental health outcomes for LGBTQ+ students, we recommend improving school-based LGBTQ+ bullying policies to prioritize the mental health of LGBTQ+ youth.
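The mediation claim above (bullying experiences carrying part of the association between LGBTQ+ identity and distress) can be illustrated with a simple product-of-coefficients sketch; the actual study uses survey data and more appropriate models, so the OLS fits and simulated data below only illustrate the logic.

```python
# Minimal illustrative sketch of a mediation test (not the study's analysis):
# path a (identity -> bullying), path b (bullying -> distress, adjusting for
# identity), and indirect effect a*b. All data below are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
lgbtq = rng.integers(0, 2, n)                    # 1 = LGBTQ+ identity
bullying = 0.8 * lgbtq + rng.normal(size=n)      # simulated mediator
distress = 0.5 * bullying + 0.2 * lgbtq + rng.normal(size=n)

# Path a: identity -> bullying
a_fit = sm.OLS(bullying, sm.add_constant(lgbtq)).fit()
# Paths b and c': distress on bullying and identity together
X = sm.add_constant(np.column_stack([bullying, lgbtq]))
bc_fit = sm.OLS(distress, X).fit()

a, b = a_fit.params[1], bc_fit.params[1]
print("indirect (mediated) effect a*b:", a * b)
print("direct effect c':", bc_fit.params[2])
```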


2020 ◽  
Author(s):  
Daniel Mark Low ◽  
Laurie Rumker ◽  
Tanya Talkar ◽  
John Torous ◽  
Guillermo Cecchi ◽  
...  

Background: The COVID-19 pandemic is exerting a devastating impact on mental health, but it is not clear how people with different types of mental health problems were differentially affected as the initial wave of cases hit. Objective: We leverage natural language processing (NLP) to characterize changes in fifteen of the world's largest mental health support groups (e.g., r/schizophrenia, r/SuicideWatch, r/Depression) on Reddit, along with eleven non-mental health groups (e.g., r/PersonalFinance, r/conspiracy), during the initial stage of the pandemic. Methods: We create and release the Reddit Mental Health Dataset, including posts from 826,961 unique users from 2018 to 2020. Using regression, we analyze trends in 90 text-derived features such as sentiment analysis, personal pronouns, and a “guns” semantic category. Using supervised machine learning, we classify posts into their respective support groups and interpret important features to understand how different problems manifest in language. We apply unsupervised methods such as topic modeling and unsupervised clustering to uncover concerns throughout Reddit before and during the pandemic. Results: We find that the r/HealthAnxiety forum showed spikes in posts about COVID-19 as early as January, approximately two months before other support groups started posting about the pandemic. Many features increased significantly during COVID-19 for specific groups, including the categories “economic stress”, “isolation”, and “home”, while others such as “motion” significantly decreased. Support groups related to attention deficit hyperactivity disorder (ADHD), eating disorders (ED), and anxiety showed the most negative semantic change during the pandemic of all mental health groups. Health anxiety emerged as a general theme across Reddit through independent supervised and unsupervised machine learning analyses. For instance, we provide evidence that the concerns of a diverse set of individuals are converging in this unique moment of history: the more users posted about COVID-19, the more linguistically similar (less distant) the mental health support groups became to r/HealthAnxiety (ρ = -0.96, P < .001). Using unsupervised clustering, we find that the Suicidality and Loneliness clusters more than doubled in number of posts during the pandemic. Specifically, the support groups for borderline personality disorder and post-traumatic stress disorder became significantly associated with the Suicidality cluster. Furthermore, clusters surrounding Self-Harm and Entertainment emerged. Conclusions: By using a broad set of NLP techniques and analyzing a baseline of prepandemic posts, we uncover patterns of how specific mental health problems manifest in language, identify at-risk users, and reveal the distribution of concerns across Reddit, which could help provide better resources to its millions of users. We then demonstrate that textual analysis is sensitive enough to uncover mental health complaints as they arise in real time, identifying vulnerable groups and alarming themes during COVID-19, and thus may have utility during the ongoing pandemic and other world-changing events such as elections and protests, present or past.
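One step described above is classifying posts into their support group and inspecting which features drive the classification. The sketch below shows a stripped-down version of that idea using TF-IDF features and logistic regression; the study's actual feature set (90 text-derived features) and released code differ, and the posts and labels here are placeholders.

```python
# Minimal sketch (not the released analysis code): classify posts into support
# groups from text and inspect informative terms. Data are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "i cannot stop worrying that this cough means something serious",
    "my budget fell apart after i lost hours at work",
    "i have felt empty and unmotivated for weeks",
]
groups = ["healthanxiety", "personalfinance", "depression"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(posts, groups)

# The per-class coefficients of the fitted model indicate which terms the
# classifier relies on, i.e., how each problem manifests in language.
print(clf.predict(["i keep checking my symptoms online"]))
```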


2020 ◽  
Author(s):  
Sohini Sengupta ◽  
Sareeta Mugde ◽  
Garima Sharma

Twitter is one of the world's biggest social media platforms, hosting an abundant number of user-generated posts, and is considered a gold mine of data. Unlike on other social media platforms, the majority of tweets are public and therefore retrievable. In this paper we analyze the mental health topics that were recently (June 2020) discussed on Twitter. Amid the ongoing pandemic, we also examine whether COVID-19 emerges as one of the factors affecting mental health. Finally, we perform an overall sentiment analysis to better understand the emotions of users.
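The paper does not specify the sentiment-analysis tooling; as an illustration only, the sketch below scores placeholder tweets with VADER from NLTK, a common lexicon-based choice for short social media text.

```python
# Illustrative sentiment scoring with VADER (not necessarily the paper's tool).
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

tweets = [  # placeholder tweets, not data from the study
    "Lockdown has been so hard on my mental health",
    "Grateful for my therapist and online support groups",
]
for tweet in tweets:
    scores = analyzer.polarity_scores(tweet)   # neg / neu / pos / compound
    print(round(scores["compound"], 3), tweet)
```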


2015 ◽  
Vol 23 (3) ◽  
pp. 695 ◽  
Author(s):  
Arnaldo Candido Junior ◽  
Célia Magalhães ◽  
Helena Caseli ◽  
Régis Zangirolami

This article evaluates the application of two efficient automatic keyword-extraction methods used by the Corpus Linguistics and Natural Language Processing communities to generate keywords from literary texts: WordSmith Tools and Latent Dirichlet Allocation (LDA). The two tools chosen for this work have their own specificities and different extraction techniques, which led us to a performance-oriented analysis. Our goal was to understand how each method works and to evaluate its application to literary texts. To this end, we used human analysis informed by knowledge of the domain of the texts. The LDA method was used to extract keywords through its integration with Portal Min@s: Corpora de Fala e Escrita, a general corpus-processing system designed for a range of Corpus Linguistics research. The results of the experiment confirm the effectiveness of both WordSmith Tools and LDA in extracting keywords from a literary corpus, and indicate that human analysis of the lists is needed at a stage prior to the experiments to complement the automatically generated list by cross-referencing the WordSmith Tools and LDA results. They also indicate that the human analyst's linguistic intuition about the lists generated separately by the two methods favored the WordSmith Tools keyword list.
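The abstract contrasts a keyness-based extractor (WordSmith Tools) with LDA. As an illustration of the statistic that typically underlies WordSmith-style keyword extraction, the sketch below computes a word's log-likelihood keyness against a reference corpus (Rayson and Garside's formulation); the counts are placeholders and this is not the tools' actual code.

```python
# Illustrative log-likelihood keyness for one word (not WordSmith Tools' code).
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Keyness of a word: higher values mean the word is more characteristic
    of the study corpus relative to the reference corpus."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# Placeholder counts: a word occurring 120 times in a 50,000-token literary
# corpus versus 300 times in a 1,000,000-token reference corpus.
print(log_likelihood(120, 50_000, 300, 1_000_000))
```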


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Simon Geletta ◽  
Lendie Follett ◽  
Marcia Laugerman

Abstract Background This study used natural language processing (NLP) and machine learning (ML) techniques to identify reliable patterns within research narrative documents that distinguish studies that complete successfully from those that terminate. Recent research has reported that at least 10% of all studies funded by major research funding agencies terminate without yielding useful results. Since studies that receive funding from major agencies are carefully planned and rigorously vetted through peer review, it was striking to us that study terminations are this prevalent. Moreover, our review of the literature suggested that the reasons for study terminations are not well understood. We therefore aimed to address that knowledge gap by seeking to identify the factors that contribute to study failures. Method We used data from the ClinicalTrials.gov repository, from which we extracted both structured data (study characteristics) and unstructured data (the narrative description of the studies). We applied natural language processing techniques to the unstructured data to quantify the risk of termination by identifying distinctive topics that are more frequently associated with terminated versus completed trials. We used the Latent Dirichlet Allocation (LDA) technique to derive 25 “topics” with corresponding sets of probabilities, which we then used to predict study termination via random forest modeling. We fit two distinct models: one using only structured data as predictors, and another using both the structured data and the 25 text topics derived from the unstructured data. Results In this paper, we demonstrate the interpretive and predictive value of LDA as it relates to predicting clinical trial failure. The results also demonstrate that the combined modeling approach yields robust predictive probabilities in terms of both sensitivity and specificity, relative to a model that uses the structured data alone. Conclusions Our study demonstrated that topic modeling with LDA significantly raises the utility of unstructured data in predicting the completion versus termination of studies. This study sets the direction for future research to evaluate the viability of the designs of health studies.
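The combined model concatenates structured trial characteristics with the 25 LDA topic proportions and feeds them to a random forest. A minimal sketch of that step on simulated placeholder data follows; the feature names, counts, and splits are illustrative, not the study's.

```python
# Minimal sketch (not the authors' code): random forest on structured features
# plus 25 LDA topic proportions, evaluated by sensitivity and specificity.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_trials = 1000
structured = rng.normal(size=(n_trials, 6))               # e.g., enrollment, phase
topic_props = rng.dirichlet(np.ones(25), size=n_trials)   # 25 LDA topic proportions
terminated = rng.integers(0, 2, n_trials)                 # 1 = terminated

X = np.hstack([structured, topic_props])
X_tr, X_te, y_tr, y_te = train_test_split(X, terminated,
                                          test_size=0.25, random_state=0)

forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, forest.predict(X_te)).ravel()
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
```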


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Julia Ive ◽  
Natalia Viani ◽  
Joyce Kam ◽  
Lucia Yin ◽  
Somain Verma ◽  
...  
