A Study of the Effects of the COVID-19 Pandemic on the Experience of Back Pain Reported on Twitter® in the United States: A Natural Language Processing Approach

Author(s):  
Krzysztof Fiok ◽  
Waldemar Karwowski ◽  
Edgar Gutierrez ◽  
Maham Saeidi ◽  
Awad M. Aljuaid ◽  
...  

The COVID-19 pandemic has changed our lifestyles, habits, and daily routines. Some of the impacts of COVID-19 have already been widely reported; however, many effects of the pandemic are still to be discovered. The main objective of this study was to assess changes in the frequency of physical back pain complaints reported during the COVID-19 pandemic. In contrast to other published studies, we target the general population using Twitter as a data source. Specifically, we aim to investigate differences in the number of back pain complaints between the pre-pandemic and pandemic periods. A total of 53,234 and 78,559 tweets were analyzed for November 2019 and November 2020, respectively. Because Twitter users do not always complain explicitly when they tweet about the experience of back pain, we designed an intelligent filter based on natural language processing (NLP) to automatically classify the examined tweets into a back pain complaining class and other tweets. Analysis of the filtered tweets indicated an 84% increase in back pain complaints reported in November 2020 compared to November 2019. These results might indicate significant changes in lifestyle during the COVID-19 pandemic, including restrictions in daily body movements and reduced exposure to routine physical exercise.
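As a rough illustration of what such a filter separates, the sketch below uses a hypothetical rule-based first pass in Python; the study itself trained an NLP classifier, and every pattern here is an illustrative assumption, not the authors' actual filter:

```python
import re

# Hypothetical stand-in for the study's NLP filter: accept only tweets that
# read as first-person back-pain complaints, reject general mentions.
COMPLAINT = re.compile(
    r"\bmy (lower |upper )?back (hurts|aches|is killing me)\b"
    r"|\bi (have|got) (terrible |bad )?back pain\b",
    re.IGNORECASE,
)

def is_back_pain_complaint(tweet: str) -> bool:
    """Return True if the tweet reads as a first-person back-pain complaint."""
    return bool(COMPLAINT.search(tweet))
```

A trained classifier generalizes far beyond such patterns (e.g., to sarcasm and indirect complaints), which is why the study uses one; the rule-based version only conveys the classification task.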

10.2196/16816 ◽  
2020 ◽  
Vol 22 (1) ◽  
pp. e16816 ◽  
Author(s):  
Jing Wang ◽  
Huan Deng ◽  
Bangtao Liu ◽  
Anbin Hu ◽  
Jun Liang ◽  
...  

Background: Natural language processing (NLP) is an important traditional field in computer science, but its application in medical research has faced many challenges. With the extensive digitalization of medical information globally and the increasing importance of understanding and mining big data in the medical field, NLP is becoming more crucial. Objective: The goal of the research was to perform a systematic review of the use of NLP in medical research with the aim of understanding the global progress in NLP research outcomes, content, methods, and study groups involved. Methods: A systematic review was conducted using the PubMed database as a search platform. All published studies on the application of NLP in medicine (except biomedicine) during the 20 years between 1999 and 2018 were retrieved. The data obtained from these published studies were cleaned and structured. Excel (Microsoft Corp) and VOSviewer (Nees Jan van Eck and Ludo Waltman) were used to perform bibliometric analysis of publication trends, author orders, countries, institutions, collaboration relationships, research hot spots, diseases studied, and research methods. Results: A total of 3498 articles were obtained during initial screening, and 2336 articles were found to meet the study criteria after manual screening. The number of publications increased every year, with significant growth after 2012 (the annual number of publications ranged from 148 to a maximum of 302). The United States has occupied the leading position since the inception of the field, with the largest number of articles published, contributing 63.01% (1472/2336) of all publications, followed by France (5.44%, 127/2336) and the United Kingdom (3.51%, 82/2336). The author with the largest number of articles published was Hongfang Liu (70), while Stéphane Meystre (17) and Hua Xu (33) published the largest numbers of articles as first and corresponding authors, respectively.
Among first authors' affiliated institutions, Columbia University published the largest number of articles, accounting for 4.54% (106/2336) of the total. Approximately one-fifth (17.68%, 413/2336) of the articles involved research on specific diseases, with subject areas primarily focused on mental illness (16.46%, 68/413), breast cancer (5.81%, 24/413), and pneumonia (4.12%, 17/413). Conclusions: NLP is in a period of robust development in the medical field, with an average of approximately 100 publications annually. Electronic medical records were the most used research materials, but social media such as Twitter have become important research materials since 2015. Cancer (24.94%, 103/413) was the most common subject area in NLP-assisted medical research on diseases, with breast cancer (23.30%, 24/103) and lung cancer (14.56%, 15/103) accounting for the highest proportions of studies. Columbia University and the researchers trained there were the most active and prolific research forces in NLP in the medical field.
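The country-level shares reported above come from straightforward bibliometric tallies; a minimal sketch, assuming one (first-author) country label per included article:

```python
from collections import Counter

def country_shares(countries):
    """Count articles per country and return (country, count, percent),
    ranked by count, as in a bibliometric country breakdown."""
    total = len(countries)
    return [
        (country, n, round(100 * n / total, 2))
        for country, n in Counter(countries).most_common()
    ]
```

The same tally applies to institutions, first authors, or diseases studied, which is essentially what tools like VOSviewer automate alongside the co-occurrence mapping.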




2021 ◽  
Author(s):  
Ari Z. Klein ◽  
Steven Meanley ◽  
Karen O’Connor ◽  
José A. Bauermeister ◽  
Graciela Gonzalez-Hernandez

Background: Pre-exposure prophylaxis (PrEP) is highly effective at preventing the acquisition of Human Immunodeficiency Virus (HIV). There is a substantial gap, however, between the number of people in the United States who have indications for PrEP and the number who are prescribed PrEP. While Twitter content has been analyzed as a source of PrEP-related data (e.g., barriers), methods have not been developed to enable the use of Twitter as a platform for implementing PrEP-related interventions. Objective: Men who have sex with men (MSM) are the population most affected by HIV in the United States. Therefore, the objective of this study was to develop and assess an automated natural language processing (NLP) pipeline for identifying men in the United States who have reported on Twitter that they are gay, bisexual, or MSM. Methods: Between September 2020 and January 2021, we used the Twitter Streaming Application Programming Interface (API) to collect more than 3 million tweets containing keywords that men may include in posts reporting that they are gay, bisexual, or MSM. We deployed handwritten, high-precision regular expressions on the tweets and their user profile metadata designed to filter out noise and identify actual self-reports. We identified 10,043 unique users geolocated in the United States, and drew upon a validated NLP tool to automatically identify their ages. Results: Based on manually distinguishing true and false positive self-reports in the tweets or profiles of 1000 of the 10,043 users identified by our automated pipeline, our pipeline has a precision of 0.85. Among the 8756 users for whom a United States state-level geolocation was detected, 5096 (58.2%) are in the 10 states with the highest numbers of new HIV diagnoses. Among the 6240 users for whom a county-level geolocation was detected, 4252 (68.1%) are in counties or states considered priority jurisdictions by the Ending the HIV Epidemic (EHE) initiative. Furthermore, the majority of the users are in the same two age groups as the majority of MSM in the United States with new HIV diagnoses. Conclusions: Our automated NLP pipeline can be used to identify MSM in the United States who may be at risk for acquiring HIV, laying the groundwork for using Twitter on a large scale to target PrEP-related interventions directly at this population.
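To make the "handwritten, high-precision regular expressions" step concrete, here is a minimal Python sketch; the patterns below are invented for illustration and are far narrower than the study's actual expressions, which also run over profile metadata:

```python
import re

# Hypothetical self-report patterns: anchored to first-person phrasing so that
# mentions of other people ("he is a gay man") are rejected, trading recall
# for precision as the pipeline does.
SELF_REPORT = re.compile(
    r"\bi('| a)?m (a )?(gay|bisexual|bi) (man|guy|dude)\b"
    r"|\bas a (gay|bisexual|bi) (man|guy)\b",
    re.IGNORECASE,
)

def is_self_report(text: str) -> bool:
    """Return True if the text contains a first-person self-report."""
    return bool(SELF_REPORT.search(text))
```

The design choice is typical of high-precision filters: every alternation requires an explicit first-person frame, so false positives mostly come from quotes and jokes, which the reported 0.85 precision reflects at scale.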


2021 ◽  
Vol 4 ◽  
Author(s):  
Yue Wu ◽  
Zhichao Liu ◽  
Leihong Wu ◽  
Minjun Chen ◽  
Weida Tong

Background & Aims: The United States Food and Drug Administration (FDA) regulates a broad range of consumer products, which account for about 25% of the United States market. The FDA regulatory activities often involve producing and reading a large number of documents, which is time consuming and labor intensive. To support regulatory science at FDA, we evaluated artificial intelligence (AI)-based natural language processing (NLP) of regulatory documents for text classification and compared deep learning-based models with a conventional keywords-based model. Methods: FDA drug labeling documents were used as a representative regulatory data source to classify drug-induced liver injury (DILI) risk by employing the state-of-the-art language model BERT. The resulting NLP-DILI classification model was statistically validated with both internal and external validation procedures and applied to the labeling data from the European Medicines Agency (EMA) for cross-agency application. Results: The NLP-DILI model developed using FDA labeling documents and evaluated by cross-validations in this study showed remarkable performance in DILI classification, with a recall of 1 and a precision of 0.78. When cross-agency data were used to validate the model, the performance remained comparable, demonstrating that the model was portable across agencies. Results also suggested that the model was able to capture the semantic meanings of sentences in drug labeling. Conclusion: Deep learning-based NLP models performed well in DILI classification of drug labeling documents and learned the meanings of complex text in drug labeling. This proof-of-concept work demonstrated that using AI technologies to assist regulatory activities is a promising approach to modernize and advance regulatory science.
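The reported figures (recall of 1, precision of 0.78) follow the standard definitions; a minimal sketch of computing them from binary model predictions, independent of the BERT model itself:

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = DILI-positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A recall of 1 with precision 0.78 means the model missed no DILI-positive labels at the cost of some false positives, a sensible operating point for a safety-screening task.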


2021 ◽  
pp. 016327872110469
Author(s):  
Peter Baldwin ◽  
Janet Mee ◽  
Victoria Yaneva ◽  
Miguel Paniagua ◽  
Jean D’Angelo ◽  
...  

One of the most challenging aspects of writing multiple-choice test questions is identifying plausible incorrect response options—i.e., distractors. To help with this task, a procedure is introduced that can mine existing item banks for potential distractors by considering the similarities between a new item’s stem and answer and the stems and response options for items in the bank. This approach uses natural language processing to measure similarity and requires a substantial pool of items for constructing the generating model. The procedure is demonstrated with data from the United States Medical Licensing Examination (USMLE®). For about half the items in the study, at least one of the top three system-produced candidates matched a human-produced distractor exactly; and for about one quarter of the items, two of the top three candidates matched human-produced distractors. A study was conducted in which a sample of system-produced candidates were shown to 10 experienced item writers. Overall, participants thought about 81% of the candidates were on topic and 56% would help human item writers with the task of writing distractors.
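The core of the procedure is ranking bank items by similarity to the new item and surfacing their distractors; a minimal Python sketch under simplifying assumptions (cosine similarity over raw token counts, where the actual system uses richer NLP representations, and a hypothetical `bank` structure of stems plus distractors):

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity between two texts over token-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def candidate_distractors(new_stem, bank, top_k=3):
    """Rank bank items by stem similarity and pool their distractors."""
    ranked = sorted(bank, key=lambda item: cosine(new_stem, item["stem"]),
                    reverse=True)
    return [d for item in ranked[:top_k] for d in item["distractors"]]
```

The need for "a substantial pool of items" follows directly from this design: with few bank items, even the top-ranked neighbors are dissimilar and their distractors rarely transfer.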


2021 ◽  
Author(s):  
Arash Maghsoudi ◽  
Sara Nowakowski ◽  
Ritwick Agrawal ◽  
Amir Sharafkhaneh ◽  
Sadaf Aram ◽  
...  

BACKGROUND The COVID-19 pandemic has imposed additional stress on population health that may result in a higher incidence of insomnia. OBJECTIVE In this study, we hypothesized that using natural language processing (NLP) to explore social media would help to identify the mental health condition of the population experiencing insomnia after the outbreak of COVID-19. METHODS We designed a pre-post retrospective study using public social media content from Twitter. We categorized tweets based on time into two intervals: prepandemic (01/01/2019 to 01/01/2020) and pandemic (01/01/2020 to 01/01/2021). We used NLP to analyze the polarity (positive/negative) and intensity of emotions, as well as users' psychological states in terms of sadness, anxiety, and anger, by counting the words related to these categories in each tweet. Additionally, we performed temporal analysis to examine the effect of time on the users' insomnia experience. RESULTS We extracted 268,803 tweets containing the word insomnia (prepandemic, 123,293 and pandemic, 145,510). The odds of negative tweets (OR, 1.31; 95% CI, 1.29-1.33), anger (OR, 1.19; 95% CI, 1.16-1.21), and anxiety (OR, 1.24; 95% CI, 1.21-1.26) were higher during the pandemic compared to prepandemic. The likelihood of negative tweets after midnight was higher than for other daily intervals, comprising approximately 60% of all negative insomnia-related tweets in 2020 and 2021 collectively. CONCLUSIONS Twitter users shared more negative tweets about insomnia during the pandemic than during the year before. Also, more anger- and anxiety-related content was disseminated during the pandemic on the social media platform.
Future studies using an NLP framework could assess tweets about other psychological distress, habit changes, weight gain due to inactivity, and the effect of viral infection on sleep.
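The pre/post comparison behind the reported odds ratios reduces to a 2x2 table; a minimal sketch of an odds ratio with a 95% confidence interval from the log-odds standard error (the counts in the example are invented, not the study's data):

```python
import math

def odds_ratio_ci(a, b, c, d):
    """Odds ratio with 95% CI for a 2x2 table:
    a/b = negative/other tweets (pandemic), c/d = negative/other (prepandemic)."""
    or_ = (a / b) / (c / d)
    # Woolf's method: SE of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, lo, hi
```

With counts in the hundreds of thousands of tweets, the standard error shrinks, which is why the study's intervals (e.g., 1.29-1.33 around 1.31) are so tight.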


2021 ◽  
Author(s):  
Monique B. Sager ◽  
Aditya M. Kashyap ◽  
Mila Tamminga ◽  
Sadhana Ravoori ◽  
Christopher Callison-Burch ◽  
...  

BACKGROUND Reddit, the fifth most popular website in the United States, boasts a large and engaged user base on its dermatology forums, where users crowdsource free medical opinions. Unfortunately, much of the advice provided is unvalidated and could lead to inappropriate care. Initial testing has shown that artificially intelligent bots can detect misinformation on Reddit forums and may be able to produce responses to posts containing misinformation. OBJECTIVE To analyze the ability of bots to find and respond to health misinformation on Reddit's dermatology forums in a controlled test environment. METHODS Using natural language processing techniques, we trained bots to target misinformation using relevant keywords and to post pre-fabricated responses. We compared performances by evaluating different model architectures across a held-out test set. RESULTS Our models yielded test accuracies ranging from 95% to 100%, with a fine-tuned BERT model achieving the highest test accuracy. Bots were then able to post corrective pre-fabricated responses to misinformation. CONCLUSIONS Using a limited data set, bots had near-perfect ability to detect these examples of health misinformation within Reddit dermatology forums. Given that these bots can then post pre-fabricated responses, this technique may allow for interception of misinformation. Providing correct information, even instantly, however, does not mean users will be receptive or find such interventions persuasive. Further work should investigate this strategy's effectiveness to inform future deployment of bots as a technique in combating health misinformation.
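The bots' control flow reduces to flagging a post and returning a pre-fabricated corrective reply; a minimal sketch, in which the detection step is a hypothetical keyword lookup standing in for the study's fine-tuned classifiers, and the topic and reply text are invented examples:

```python
# Hypothetical topic -> canned correction mapping; the study's bots classify
# posts with trained models rather than literal keyword matching.
RESPONSES = {
    "black salve": (
        "Black salve is corrosive and is not a safe treatment for skin "
        "lesions; please consult a dermatologist."
    ),
}

def respond_to_post(post: str):
    """Return a pre-fabricated correction if the post matches a known
    misinformation topic, else None (no reply)."""
    text = post.lower()
    for keyword, reply in RESPONSES.items():
        if keyword in text:
            return reply
    return None
```

Decoupling detection from the response table is the relevant design point: the classifier can be retrained or swapped while the vetted reply text stays fixed and reviewable.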


2021 ◽  
Author(s):  
AISDL

The meteoric rise of social media news during the ongoing COVID-19 pandemic is worthy of advanced research. Freedom of speech and of socialization in many parts of the world, especially in developed countries, has led to noteworthy information sharing during the pandemic. Although social media has served as a communication intervention in past crises, the volume of tweets generated on Twitter during COVID-19 is unmatched by earlier records. This study examines social media news trends and compares tweets on COVID-19 as a corpus drawn from Twitter. By deploying natural language processing (NLP) methods on the tweets, we extracted and quantified similarities between tweets over time, showing that some people say the same things about the pandemic while other Twitter users view it differently. The tools used were Spacy, Networkx, WordCloud, and Re. This study contributes to the social media literature by examining the similarity and divergence of COVID-19 tweets from the public and from health agencies such as the World Health Organization (WHO). It also sheds light on sparse and dense COVID-19 text networks and their implications for policymakers, and it discusses limitations and proposes future studies.
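Quantifying similarity between tweets, as described above, can be reduced to a pairwise score over token sets; a minimal sketch using Jaccard similarity (an assumption for illustration; the study's pipeline uses Spacy representations, and a Networkx graph can then be built by linking tweet pairs whose score exceeds a threshold):

```python
def jaccard(t1: str, t2: str) -> float:
    """Jaccard similarity between two tweets over their token sets:
    |intersection| / |union|, in [0, 1]."""
    s1, s2 = set(t1.lower().split()), set(t2.lower().split())
    return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0
```

Thresholding such pairwise scores yields exactly the sparse versus dense text networks the abstract mentions: near-duplicate messaging (e.g., repeated public-health slogans) forms dense clusters, while divergent opinions remain sparsely connected.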

