Analysis of mental and physical disorders associated with COVID-19 in online health forums: a natural language processing study

Objectives: Online health forums provide rich and untapped real-time data on population health. Through novel data extraction and natural language processing (NLP) techniques, we characterise the evolution of mental and physical health concerns relating to the COVID-19 pandemic among online health forum users. Setting and design: We obtained data from three leading online health forums: HealthBoards, Inspire and HealthUnlocked, from the period 1 January 2020 to 31 May 2020. Using NLP, we analysed the content of posts related to COVID-19. Primary outcome measures: (1) Proportion of forum posts containing COVID-19 keywords; (2) proportion of forum users making their very first post about COVID-19; (3) proportion of COVID-19-related posts containing content related to physical and mental health comorbidities. Results: Data from 739 434 posts created by 53 134 unique users were analysed. A total of 35 581 posts (4.8%) contained a COVID-19 keyword. Posts discussing COVID-19 and related comorbid disorders spiked in early to mid-March, around the time of global implementation of lockdowns, prompting a large number of users to post on online health forums for the first time. Over a quarter of COVID-19-related thread titles mentioned a physical or mental health comorbidity. Conclusions: We demonstrate that it is feasible to characterise the content of online health forum user posts regarding COVID-19 and measure changes over time. The pandemic and corresponding public response have had a significant impact on posters’ queries regarding mental health. Social media data sources such as online health forums can be harnessed to strengthen population-level mental health surveillance.
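
The first outcome measure, the proportion of posts containing a COVID-19 keyword, can be illustrated with a minimal sketch. The keyword list, the function names and the sample posts below are assumptions made for illustration; the abstract does not list the keywords the authors actually used.

```python
import re

# Hypothetical keyword list; the study's real keyword set is not given in the abstract.
COVID_KEYWORDS = ("covid", "coronavirus", "sars-cov-2")
_PATTERN = re.compile("|".join(re.escape(k) for k in COVID_KEYWORDS), re.IGNORECASE)


def mentions_covid(text: str) -> bool:
    """Return True if the text contains any of the (assumed) COVID-19 keywords."""
    return _PATTERN.search(text) is not None


def keyword_proportion(posts) -> float:
    """Share of posts that contain at least one keyword (0.0 for an empty input)."""
    posts = list(posts)
    if not posts:
        return 0.0
    return sum(mentions_covid(p) for p in posts) / len(posts)


# Invented sample posts, for illustration only.
sample_posts = [
    "Has anyone had trouble getting a COVID test?",
    "Looking for advice on managing migraines.",
    "Coronavirus anxiety is keeping me awake.",
    "New to the forum, hello everyone!",
]
sample_share = keyword_proportion(sample_posts)

# Consistency check on the figures reported in the abstract.
reported_share = 35_581 / 739_434
```

With the counts reported above, `reported_share` rounds to 4.8%, which agrees with the abstract.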

Investigating mental and physical disorders associated with COVID-19 in online health forums

10.1101/2020.12.14.20248155 ◽

2020 ◽

Author(s):

Rashmi Patel ◽

Fabrizio Smeraldi ◽

Maryam Abdollahyan ◽

Jessica Irving ◽

Conrad Bessant

Keyword(s):

Mental Health ◽

Language Processing ◽

Data Extraction ◽

Population Level ◽

Physical And Mental Health ◽

Time Data ◽

Public Response ◽

Physical Disorders ◽

Changes Over Time ◽

First Time

Objectives: Online health forums provide rich and untapped real-time data on population health. Through novel data extraction and natural language processing (NLP) techniques, we characterise the evolution of mental and physical health concerns relating to the COVID-19 pandemic among online health forum users. Setting and design: We obtained data from 739,434 posts by 53,134 unique users of three leading online health forums: HealthBoards, Inspire and HealthUnlocked, from the period 1st January 2020 to 31st May 2020. Using NLP, we analysed the content of posts related to COVID-19. Primary outcome measures: (i) Proportion of forum posts containing COVID-19 keywords; (ii) proportion of forum users making their very first post about COVID-19; (iii) number of COVID-19-related posts containing content related to physical and mental health comorbidities. Results: Posts discussing COVID-19 and related comorbid disorders spiked in early to mid-March, around the time of global implementation of lockdowns, prompting a large number of users to post on online health forums for the first time. The pandemic and corresponding public response have had a significant impact on posters' queries regarding mental health. Conclusions: We demonstrate it is feasible to characterise the content of online health forum user posts regarding COVID-19 and measure changes over time. Social media data sources such as online health forums can be harnessed to strengthen population-level mental health surveillance.
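
The second outcome measure, the proportion of users whose very first post is about COVID-19, could be computed roughly as follows. This is a sketch under assumptions: the `(user_id, posted_on, text)` record layout, the keyword tuple and the sample data are all hypothetical, and the reading of the measure as "share of users whose earliest post mentions a keyword" is an interpretation, not something the abstract spells out.

```python
from datetime import date

# Hypothetical keywords; the study's actual list is not stated.
KEYWORDS = ("covid", "coronavirus")


def mentions_covid(text: str) -> bool:
    """Case-insensitive substring match against the assumed keywords."""
    lowered = text.lower()
    return any(k in lowered for k in KEYWORDS)


def first_post_covid_share(posts) -> float:
    """posts: iterable of (user_id, posted_on, text) tuples.

    Returns the share of users whose earliest post mentions a keyword.
    """
    first = {}
    for user, posted_on, text in posts:
        # Keep only each user's earliest post.
        if user not in first or posted_on < first[user][0]:
            first[user] = (posted_on, text)
    if not first:
        return 0.0
    return sum(mentions_covid(text) for _, text in first.values()) / len(first)


# Invented records, for illustration only.
sample = [
    ("u1", date(2020, 1, 5), "Knee pain after running"),
    ("u1", date(2020, 3, 12), "Worried about coronavirus"),
    ("u2", date(2020, 3, 14), "Is COVID dangerous with asthma?"),
    ("u3", date(2020, 2, 1), "Sleep problems"),
    ("u4", date(2020, 3, 20), "Covid and anxiety"),
]
share = first_post_covid_share(sample)
```

In the invented sample, users `u2` and `u4` open with a COVID-19 post while `u1` and `u3` do not, so the share is one half.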

Using Natural Language Processing to Assess the Psychological Effect of COVID-19 Pandemic on Insomnia via Tweets: A Pre-Post Retrospective Pilot Study (Preprint)

10.2196/preprints.33454 ◽

2021 ◽

Author(s):

Arash Maghsoudi ◽

Sara Nowakowski ◽

Ritwick Agrawal ◽

Amir Sharafkhaneh ◽

Sadaf Aram ◽

...

Keyword(s):

Mental Health ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Temporal Analysis ◽

Health Condition ◽

Additional Stress ◽

Mental Health Condition ◽

Twitter Users

BACKGROUND The COVID-19 pandemic has imposed additional stress on population health that may result in a higher incidence of insomnia. OBJECTIVE In this study, we hypothesized that using natural language processing (NLP) to explore social media would help to identify the mental health condition of the population experiencing insomnia after the outbreak of COVID-19. METHODS We designed a pre-post retrospective study using public social media content from Twitter. We categorized tweets based on time into two intervals: prepandemic (01/01/2019 to 01/01/2020) and pandemic (01/01/2020 to 01/01/2021). We used NLP to analyze the polarity (positive/negative) and intensity of emotions, and to assess the psychological states expressed in users’ tweets in terms of sadness, anxiety, and anger by counting the words related to these categories in each tweet. Additionally, we performed temporal analysis to examine the effect of time on the users’ insomnia experience. RESULTS We extracted 268,803 tweets containing the word insomnia (prepandemic, 123,293 and pandemic, 145,510). The odds of negative tweets (OR, 1.31; 95% CI, 1.29-1.33), anger (OR, 1.19; 95% CI, 1.16-1.21), and anxiety (OR, 1.24; 95% CI, 1.21-1.26) were higher during the pandemic compared to prepandemic. The likelihood of negative tweets after midnight was higher than for other daily intervals, comprising approximately 60% of all negative insomnia-related tweets in 2020 and 2021 collectively. CONCLUSIONS Twitter users shared more negative tweets about insomnia during the pandemic than during the year before. Also, more anger and anxiety-related content was disseminated during the pandemic on the social media platform. Future studies using an NLP framework could assess tweets about other psychological distress, habit changes, weight gain due to inactivity, and the effect of viral infection on sleep.
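
The odds ratios and 95% confidence intervals reported above compare two periods on a binary outcome (for example, negative versus non-negative tweets). A minimal sketch of one common way to obtain such figures, the Wald interval on the log odds ratio, is shown here. The abstract does not state which method the authors used, and the counts in the example are invented, not taken from the study.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: outcome present, pandemic period     b: outcome absent, pandemic period
    c: outcome present, prepandemic period  d: outcome absent, prepandemic period
    z: normal quantile (1.96 for an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(estimate) - z * se)
    upper = exp(log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical counts, for illustration only (not the study's data).
estimate, lower, upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
```

An interval that excludes 1, as all three reported intervals do, indicates that the odds differ between the two periods at the chosen confidence level.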

A Pipeline to Understand Emerging Illness Via Social Media Data Analysis: Case Study on Breast Implant Illness (Preprint)

10.2196/preprints.29768 ◽

2021 ◽

Author(s):

Vishal Dey ◽

Peter Krasniak ◽

Minh Nguyen ◽

Clara Lee ◽

Xia Ning

Keyword(s):

Mental Health ◽

Social Media ◽

Natural Language Processing ◽

Data Analysis ◽

Natural Language ◽

Language Processing ◽

Breast Implant ◽

Public Attention ◽

Social Media Data ◽

Media Data

BACKGROUND A new illness can come to public attention through social media before it is medically defined, formally documented, or systematically studied. One example is a condition known as breast implant illness (BII), which has been extensively discussed on social media, although it is vaguely defined in the medical literature. OBJECTIVE The objective of this study is to construct a data analysis pipeline to understand emerging illnesses using social media data and to apply the pipeline to understand the key attributes of BII. METHODS We constructed a pipeline of social media data analysis using natural language processing and topic modeling. Mentions related to signs, symptoms, diseases, disorders, and medical procedures were extracted from social media data using the clinical Text Analysis and Knowledge Extraction System. We mapped the mentions to standard medical concepts and then summarized these mapped concepts as topics using latent Dirichlet allocation. Finally, we applied this pipeline to understand BII from several BII-dedicated social media sites. RESULTS Our pipeline identified topics related to toxicity, cancer, and mental health issues that were highly associated with BII. Our pipeline also showed that cancers, autoimmune disorders, and mental health problems were emerging concerns associated with breast implants, based on social media discussions. Furthermore, the pipeline identified mentions such as rupture, infection, pain, and fatigue as common self-reported issues among the public, as well as concerns about toxicity from silicone implants. CONCLUSIONS Our study could inspire future studies on the suggested symptoms and factors of BII. Our study provides the first analysis and derived knowledge of BII from social media using natural language processing techniques and demonstrates the potential of using social media information to better understand similar emerging illnesses.

Unmasking the conversation on masks: Natural language processing for topical sentiment analysis of COVID-19 Twitter discourse

10.1101/2020.08.28.20183863 ◽

2020 ◽

Cited By ~ 1

Author(s):

Abraham Sanders ◽

Rachael White ◽

Lauren Severson ◽

Rufeng Ma ◽

Richard McQueen ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Public Attitudes ◽

Online Activity ◽

Health Crisis ◽

Public Response ◽

Health Community ◽

High Level

In this exploratory study, we scrutinize a database of over 1 million tweets collected across the first five months of 2020 to draw conclusions about public attitudes towards the preventative measure of mask usage during the COVID-19 pandemic. In recent months, a body of literature has emerged to suggest the robustness of trends in online activity as proxies for the epidemiological and sociological impact of COVID-19. We employ natural language processing, clustering and sentiment analysis techniques to organize tweets relating to mask-wearing into high-level themes, then relay narratives for individual clusters through automatic text summarization. We find that topic clustering and visualization based on mask-related Twitter data offer revealing insights into societal perceptions of COVID-19 and techniques for its prevention. We observe that the volume and polarity of mask-related tweets have greatly increased. Importantly, the analysis pipeline presented can be leveraged by the health community for the assessment of public response to health interventions in the ongoing global health crisis.

Natural Language Processing-Based Information Extraction and Abstraction for Lease Documents

Advances in Computer and Electrical Engineering - Neural Networks for Natural Language Processing ◽

10.4018/978-1-7998-1159-6.ch011 ◽

2020 ◽

pp. 170-187

Author(s):

Sumathi S. ◽

Rajkumar S. ◽

Indumathi S.

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Data Extraction ◽

Easy Access ◽

Property A ◽

Key Events

Lease abstraction is the process of extracting and organizing key data from a lease document. A lease document for a property contains key business, financial, and legal data about that property. A lease abstract report contains details concerning the property location and basic lease details, price schedules, key events, terms and conditions, parking arrangements, and landowner and tenant obligations. Abstracting a real estate contract into electronic form facilitates easy access to key data, replacing the tedious process of reading the whole contents of the contract every time. Natural language processing may be used for data extraction and abstraction of information from lease documents.

A Proof of Concept for Assessing Emergency Room Use with Primary Care Data and Natural Language Processing

Methods of Information in Medicine ◽

10.3414/me12-01-0012 ◽

2013 ◽

Vol 52 (01) ◽

pp. 33-42 ◽

Cited By ~ 11

Author(s):

M.-H. Kuo ◽

P. Gooch ◽

J. St-Maurice

Keyword(s):

Mental Health ◽

Primary Care ◽

Natural Language Processing ◽

Natural Language ◽

Emergency Room ◽

Language Processing ◽

Free Text ◽

Proof Of Concept ◽

Term Extraction ◽

Primary Care Data

Summary. Objective: The objective of this study was to undertake a proof of concept that demonstrated the use of primary care data and natural language processing and term extraction to assess emergency room use. The study extracted biopsychosocial concepts from primary care free text and related them to inappropriate emergency room use through the use of odds ratios. Methods: De-identified free text notes were extracted from a primary care clinic in Guelph, Ontario and analyzed with a software toolkit that incorporated General Architecture for Text Engineering (GATE) and MetaMap components for natural language processing and term extraction. Results: Over 10 million concepts were extracted from 13,836 patient records. Codes found in at least 1% of the sample were regressed against inappropriate emergency room use. 77 codes fell within the realm of biopsychosocial, were highly statistically significant (p < 0.001) and had an OR > 2.0. Thematically, these codes involved mental health and pain related concepts. Conclusions: Analyzed thematically, mental health issues and pain are important themes; we have concluded that pain and mental health problems are primary drivers for inappropriate emergency room use. Age and sex were not significant. This proof of concept demonstrates the feasibility of combining natural language processing and primary care data to analyze a system use question. As a first study, it supports further research and could be applied to investigate other, more complex problems.

A Natural Language Processing Tool for Large-Scale Data Extraction from Echocardiography Reports

PLoS ONE ◽

10.1371/journal.pone.0153749 ◽

2016 ◽

Vol 11 (4) ◽

pp. e0153749 ◽

Cited By ~ 20

Author(s):

Chinmoy Nath ◽

Mazen S. Albaghdadi ◽

Siddhartha R. Jonnalagadda

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

Data Extraction ◽

Large Scale Data ◽

Natural Language Processing Tool ◽

Scale Data

How Artificial Intelligence Can Improve Our Understanding of the Genes Associated with Endometriosis: Natural Language Processing of the PubMed Database

BioMed Research International ◽

10.1155/2018/6217812 ◽

2018 ◽

Vol 2018 ◽

pp. 1-7 ◽

Cited By ~ 7

Author(s):

J. Bouaziz ◽

R. Mashiach ◽

S. Cohen ◽

A. Kedem ◽

A. Baron ◽

...

Keyword(s):

Artificial Intelligence ◽

Natural Language Processing ◽

Text Mining ◽

Natural Language ◽

Language Processing ◽

Data Extraction ◽

Endometrial Tissue ◽

Endometrial Cells ◽

Pubmed Database ◽

Using Data

Endometriosis is a disease characterized by the development of endometrial tissue outside the uterus, but its cause remains largely unknown. Numerous genes have been studied and proposed to help explain its pathogenesis. However, the large number of these candidate genes has made functional validation through experimental methodologies nearly impossible. Computational methods could provide a useful alternative for prioritizing those most likely to be susceptibility genes. Using artificial intelligence applied to text mining, this study analyzed the genes involved in the pathogenesis, development, and progression of endometriosis. The data extraction by text mining of the endometriosis-related genes in the PubMed database was based on natural language processing, and the data were filtered to remove false positives. Using data from the text mining and gene network information as input for the web-based tool, 15,207 endometriosis-related genes were ranked according to their score in the database. Characterization of the filtered gene set through gene ontology, pathway, and network analysis provided information about the numerous mechanisms hypothesized to be responsible for the establishment of ectopic endometrial tissue, as well as the migration, implantation, survival, and proliferation of ectopic endometrial cells. Finally, the human genome was scanned through various databases using filtered genes as a seed to determine novel genes that might also be involved in the pathogenesis of endometriosis but which have not yet been characterized. These genes could be promising candidates to serve as useful diagnostic biomarkers and therapeutic targets in the management of endometriosis.

Using Natural Language Processing to Determine Chemotherapeutic Regimens Administered within 30 Days Prior to Death in Acute Myelogenous Leukemia Patients

Blood ◽

10.1182/blood.v124.21.1267.1267 ◽

2014 ◽

Vol 124 (21) ◽

pp. 1267-1267

Author(s):

Hanahlyn M Park ◽

Vicky Sandhu ◽

Paul Fearn ◽

Kathleen Shannon Dorcy ◽

Elihu H. Estey ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Data Extraction ◽

Training Sample ◽

Chemotherapeutic Agents ◽

Myelogenous Leukemia ◽

Testing Sample ◽

Data Elements ◽

Access To Data

Abstract. Background: Clinical research in AML relies heavily on databases. Typically, patients’ medical records are examined manually with relevant data entered into the database. This process is slow and subject to human error. Natural Language Processing (NLP) is a technique used to build and train computer algorithms to automatically extract structured data elements from unstructured text. Although we have not used NLP extensively, the potential exists to use NLP as a tool to ease the time- and resource-intensive burden of manual data abstraction. Since many of the data elements needed for clinical and translational research, quality metrics, and operations and analytics are the same throughout FHCRC/UW/SCCA, the biomedical informatics group at FHCRC has decided to invest in the creation of an enterprise-wide NLP pipeline to improve the efficiency and quality of data extraction for researchers, clinicians, and administrators throughout FHCRC/UW/SCCA. Purpose: For this pilot project, an NLP system was trained and tested against a manually curated dataset to determine whether chemotherapeutic regimens were administered within 30 days prior to death in AML patients. The first part of this project was to train the NLP system with a small sample of patients in order to build in rules and logic about how to find both a patient’s date of death and the evidence of a completed chemotherapeutic agent. The second phase was to test the algorithm with unseen data from another set of patients and determine the system’s overall performance in finding the patient’s date of death and determining if they received chemotherapy within the preceding 30 days. Methods: Inclusion criteria were the following: AML patients who came to FHCRC/SCCA/UWMC between 1/1/2010 and 12/31/2012, who were aged ≥ 18 years, and who received chemotherapeutic agents within 30 days of death. Total sample size was 54 patients. Training sample was 24 patients and testing sample was 30 patients. To assess the accuracy of the trained NLP system, manually and automatically extracted data sets were compared. The performance of the system was evaluated in two ways: predictive value of a retrieved NLM identification (the number of correctly retrieved results out of all retrieved results) and sensitivity (the number of correctly retrieved results out of all possible correct results in the gold standard training and testing data). These two metrics will help determine if NLP can be a useful data extraction aid in order to expedite real-time access to data analysis for improvement in outcomes for AML patients. Results: For the training sample, the predictive value of a retrieved result by NLM of finding both the date of death and chemotherapeutic agents was 100%. The sensitivity of both date of death and chemotherapeutic agents was 92% in the training sample. For the testing sample, the predictive value of an NLM identification for finding date of death and chemotherapeutic agents was 96%, while sensitivity of both date of death and chemotherapeutic agents was 73%. Limitations: Sensitivity, in both training and testing populations, is primarily affected by the ubiquitous problem of not having a concrete record of many patients’ deaths. Often patients go back to local facilities for continuing care and are lost to follow-up. The precision of finding date of death in the testing sample was affected by one date of death that was pulled incorrectly from a clinic note due to an error in the NLP algorithm. The recall of finding chemotherapeutic agents in the testing sample was affected by the lack of recognition of a chemotherapeutic trial name that had not appeared in the training sample. Conclusion: The results of this pilot give us a preliminary idea of the feasibility of using the NLP algorithm in the future. Although the trained NLP tool only recalled 70-80% of the two data elements (date of death, chemotherapeutic agents), this was primarily due to the absence of certain data elements in the electronic health record, and the precision of the defined date elements was nearly perfect. With the given results, we conclude that NLP can be a useful tool for data extraction purposes, which will potentially maximize the ability of the leukemia service to have earlier access to data relative to symptom management and disease response, which will influence the development of new clinical pathways for the optimizing of care and possible improvement in outcomes for AML patients. Disclosures: No relevant conflicts of interest to declare.
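
The two evaluation metrics defined in the Methods, predictive value (correctly retrieved results out of all retrieved results) and sensitivity (correctly retrieved results out of all correct results in the gold standard), can be sketched as set operations. The function name, the `(patient_id, date_of_death)` record shape and the sample values are hypothetical and are not drawn from the study.

```python
def predictive_value_and_sensitivity(retrieved, gold):
    """Compare automatically retrieved items against a manually curated gold standard.

    predictive value = |retrieved ∩ gold| / |retrieved|
    sensitivity      = |retrieved ∩ gold| / |gold|
    """
    retrieved, gold = set(retrieved), set(gold)
    true_positives = len(retrieved & gold)
    predictive_value = true_positives / len(retrieved) if retrieved else 0.0
    sensitivity = true_positives / len(gold) if gold else 0.0
    return predictive_value, sensitivity


# Hypothetical (patient_id, date_of_death) pairs, for illustration only.
gold_standard = {
    ("p1", "2011-03-02"),
    ("p2", "2012-07-19"),
    ("p3", "2010-11-30"),
    ("p4", "2012-01-08"),
}
# The extractor finds three of the four records and makes no wrong retrievals.
nlp_output = {
    ("p1", "2011-03-02"),
    ("p2", "2012-07-19"),
    ("p3", "2010-11-30"),
}
ppv, sens = predictive_value_and_sensitivity(nlp_output, gold_standard)
```

The invented example mirrors the pattern the abstract describes: everything retrieved is correct, so the predictive value is high, while a record missing from the output lowers sensitivity.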

Natural language processing reveals vulnerable mental health support groups and heightened health anxiety on Reddit during COVID-19: an observational study

10.31234/osf.io/xvwcy ◽

2020 ◽

Author(s):

Daniel Mark Low ◽

Laurie Rumker ◽

Tanya Talkar ◽

John Torous ◽

Guillermo Cecchi ◽

...

Keyword(s):

Mental Health ◽

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Support Groups ◽

Language Processing ◽

Mental Health Problems ◽

Health Anxiety ◽

Mental Health Support ◽

Health Support

Background: The COVID-19 pandemic is exerting a devastating impact on mental health, but it is not clear how people with different types of mental health problems were differentially impacted as the initial wave of cases hit. Objective: We leverage natural language processing (NLP) with the goal of characterizing changes in fifteen of the world's largest mental health support groups (e.g., r/schizophrenia, r/SuicideWatch, r/Depression) found on the website Reddit, along with eleven non-mental health groups (e.g., r/PersonalFinance, r/conspiracy) during the initial stage of the pandemic. Methods: We create and release the Reddit Mental Health Dataset including posts from 826,961 unique users from 2018 to 2020. Using regression, we analyze trends from 90 text-derived features such as sentiment analysis, personal pronouns, and a “guns” semantic category. Using supervised machine learning, we classify posts into their respective support group and interpret important features to understand how different problems manifest in language. We apply unsupervised methods such as topic modeling and unsupervised clustering to uncover concerns throughout Reddit before and during the pandemic. Results: We find that the r/HealthAnxiety forum showed spikes in posts about COVID-19 early on in January, approximately two months before other support groups started posting about the pandemic. There were many features that significantly increased during COVID-19 for specific groups including the categories “economic stress”, “isolation”, and “home” while others such as “motion” significantly decreased. We find that support groups related to attention deficit hyperactivity disorder (ADHD), eating disorders (ED), and anxiety showed the most negative semantic change during the pandemic out of all mental health groups. Health anxiety emerged as a general theme across Reddit through independent supervised and unsupervised machine learning analyses. For instance, we provide evidence that the concerns of a diverse set of individuals are converging in this unique moment of history; we discover that the more users posted about COVID-19, the more linguistically similar (less distant) the mental health support groups became to r/HealthAnxiety (ρ = -0.96, P<.001). Using unsupervised clustering, we find the Suicidality and Loneliness clusters more than doubled in number of posts during the pandemic. Specifically, the support groups for borderline personality disorder and post-traumatic stress disorder became significantly associated with the Suicidality cluster. Furthermore, clusters surrounding Self-Harm and Entertainment emerged. Conclusions: By using a broad set of NLP techniques and analyzing a baseline of pre-pandemic posts, we uncover patterns of how specific mental health problems manifest in language, identify at-risk users, and reveal the distribution of concerns across Reddit, which could help provide better resources to its millions of users. We then demonstrate that textual analysis is sensitive enough to uncover mental health complaints as they arise in real time, identifying vulnerable groups and alarming themes during COVID-19, and thus may have utility during the ongoing pandemic and other world-changing events such as elections and protests from the present or the past.
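
The reported ρ = -0.96 is a rank correlation between how much users posted about COVID-19 and the linguistic distance to r/HealthAnxiety. A minimal standard-library sketch of Spearman's ρ (Pearson correlation on average ranks) is given below, assuming that ρ here denotes Spearman's coefficient. The variable names and the five data points are hypothetical and chosen only to show a perfectly monotone decreasing relationship.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: share of COVID-19 posts vs. distance to a reference forum.
covid_share = [0.01, 0.05, 0.10, 0.20, 0.30]
distance = [0.90, 0.80, 0.65, 0.50, 0.40]
rho = spearman_rho(covid_share, distance)
```

A strongly negative ρ, as in the abstract, means that higher COVID-19 posting goes with smaller distance, that is, greater linguistic similarity.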
