On Mining Words: The Utility of Topic Models in Health Education Research and Practice

2021 ◽  
Vol 22 (3) ◽  
pp. 309-312
Author(s):  
Danny Valdez ◽  
Andrew C. Picket ◽  
Belinda-Rose Young ◽  
Shelley Golden

Written language is the primary means by which scientific research findings are disseminated. Yet in the era of information overload, dissemination of a field of research may require additional efforts given the sheer volume of material available on any specific topic. Topic models are unsupervised natural language processing methods that analyze nonnumeric data (i.e., text data) in abundance. These tools aggregate and make sense of those data, making them interpretable to interested audiences. In this perspective piece, we briefly describe topic models, including their purpose, function, and applicability for health education researchers and practitioners. We note how topic models can be applied in several contexts, including social media–based analyses and mapping trends in scientific literature over time. As a tool for studying words, and patterns of words, topic models stand to improve our understanding of past events and of those occurring in the moment, and to help us look ahead into the future.

Author(s):  
Subhadra Dutta ◽  
Eric M. O’Rourke

Natural language processing (NLP) is the field of decoding human written language. This chapter responds to the growing interest in using machine learning–based NLP approaches for analyzing open-ended employee survey responses. These techniques address scalability and the ability to provide real-time insights to make qualitative data collection equally or more desirable in organizations. The chapter walks through the evolution of text analytics in industrial–organizational psychology and discusses relevant supervised and unsupervised machine learning NLP methods for survey text data, such as latent Dirichlet allocation, latent semantic analysis, sentiment analysis, and word relatedness methods. The chapter also lays out preprocessing techniques and the trade-offs of growing NLP capabilities internally versus externally, points readers to available resources, and ends by discussing implications and future directions of these approaches.


2003 ◽  
Vol 21 (4) ◽  
pp. 369-375
Author(s):  
O. O. Bankole ◽  
O. O. Denloye ◽  
G. A. Aderinokun ◽  
C. O. Badejo R.N. Phn

The development of photo-posters to educate the Nigerian community on the perceived problems of teething was prompted by research findings which revealed that misconceptions about teething were widespread within the populace and in particular among some health professionals. Studies have shown that 58% of ethnic Yoruba rural dwellers in Nigeria attributed ailments to the teething process, while 70% of market women in Enugu State perceived diarrhea in their children was due to teething. In a recent survey, 61.4% of nurses believed diarrhea should accompany the teething process. Furthermore 82.1%, 35.8%, and 27.9% of them implicated fever, weight loss, and boils respectively as signs of teething. Photo-posters adopt the use of visual representation of a problem and the goal of using photo-posters is to begin to create an understanding in the minds of people that babies can be healthy in spite of their erupting teeth. It is believed that using pictures of real babies who are seen to be healthy when their teeth first emerge should go a long way to reducing some of the misconceived ideas. In its development, the participatory approach was adopted involving selected members of the target population, thus making it a culturally appropriate tool. This article describes the rationale behind the choice of the photo-posters and the process of developing them.


2021 ◽  
Vol 54 (2) ◽  
pp. 1-37
Author(s):  
Dhivya Chandrasekaran ◽  
Vijay Mago

Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods beginning from traditional NLP techniques such as kernel-based methods to the most recent research work on transformer-based models, categorizing them based on their underlying principles as knowledge-based, corpus-based, deep neural network–based methods, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems in place for new researchers to experiment and develop innovative ideas to address the issue of semantic similarity.


Author(s):  
Robert Procter ◽  
Miguel Arana-Catania ◽  
Felix-Anselm van Lier ◽  
Nataliya Tkachenko ◽  
Yulan He ◽  
...  

The development of democratic systems is a crucial task as confirmed by its selection as one of the Millennium Sustainable Development Goals by the United Nations. In this article, we report on the progress of a project that aims to address barriers, one of which is information overload, to achieving effective direct citizen participation in democratic decision-making processes. The main objectives are to explore if the application of Natural Language Processing (NLP) and machine learning can improve citizens’ experience of digital citizen participation platforms. Taking as a case study the ‘Decide Madrid’ Consul platform, which enables citizens to post proposals for policies they would like to see adopted by the city council, we used NLP and machine learning to provide new ways to (a) suggest to citizens proposals they might wish to support; (b) group citizens by interests so that they can more easily interact with each other; (c) summarise comments posted in response to proposals; (d) assist citizens in aggregating and developing proposals. Evaluation of the results confirms that NLP and machine learning have a role to play in addressing some of the barriers users of platforms such as Consul currently experience.


2021 ◽  
Author(s):  
Anahita Davoudi ◽  
Natalie Lee ◽  
Thaibinh Luong ◽  
Timothy Delaney ◽  
Elizabeth Asch ◽  
...  

Background: Free-text communication between patients and providers is playing an increasing role in chronic disease management, through platforms varying from traditional healthcare portals to more novel mobile messaging applications. These text data are rich resources for clinical and research purposes, but their sheer volume renders them difficult to manage. Even automated approaches such as natural language processing require labor-intensive manual classification for developing training datasets, which is a rate-limiting step. Automated approaches to organizing free-text data are necessary to facilitate the use of free-text communication for clinical care and research. Objective: We applied unsupervised learning approaches to 1) understand the types of topics discussed and 2) learn medication-related intents from messages sent between patients and providers through a bi-directional text messaging system for managing participant blood pressure. Methods: This study was a secondary analysis of de-identified messages from a remote mobile text-based employee hypertension management program at an academic institution. In experiment 1, we trained a Latent Dirichlet Allocation (LDA) model for each message type (inbound-patient and outbound-provider) and identified the distribution of major topics and significant topics (probability >0.20) across message types. In experiment 2, we annotated all medication-related messages with a single medication intent. Then, we trained a second LDA model (medLDA) to assess how well the unsupervised method could identify more fine-grained medication intents. We encoded each medication message with n-grams (n = 1-3 words) using spaCy, clinical named entities using STANZA, and medication categories using MedEx, and then applied Chi-square feature selection to learn the most informative features associated with each medication intent.
Results: A total of 253 participants and 5 providers engaged in the program, generating 12,131 total messages: 47% patient messages and 53% provider messages. Most patient messages correspond to blood pressure (BP) reporting, BP encouragement, and appointment scheduling. In contrast, most provider messages correspond to BP reporting, medication adherence, and confirmatory statements. In experiment 1, for both patient and provider messages, most messages contained 1 topic and few contained more than 3 topics identified using LDA. However, manual review of some messages within topics revealed significant heterogeneity even within single-topic messages as identified by LDA. In experiment 2, among the 534 medication messages annotated with a single medication intent, most of the 282 patient medication messages referred to medication request (48%; n=134) and medication taking (28%; n=79); most of the 252 provider medication messages referred to medication question (69%; n=173). Although medLDA could identify a majority intent within each topic, the model could not distinguish medication intents with low prevalence within either patient or provider messages. Richer feature engineering identified informative lexical-semantic patterns associated with each medication intent class. Conclusion: LDA can be an effective method for generating subgroups of messages with similar term usage and can facilitate the review of topics to inform annotations. However, few training cases and shared vocabulary between intents preclude the use of LDA for fully automated deep medication intent classification.
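
The Chi-square feature selection step mentioned in the Methods can be illustrated with a minimal sketch. It assumes a simple one-versus-rest setup in which each term is scored from a 2x2 table (term present or absent, by target intent or other intent). The function name `chi_square_scores`, the whitespace tokenizer, and the four toy messages are hypothetical and are not taken from the study, which used spaCy n-grams, STANZA entities, and MedEx categories as features.

```python
def chi_square_scores(docs, labels, target):
    """Score each term with the chi-square statistic of a 2x2 table:
    term present/absent versus label == target / label != target."""
    n = len(docs)
    # Hypothetical tokenizer: lowercase and split on whitespace.
    doc_terms = [set(d.lower().split()) for d in docs]
    vocab = set().union(*doc_terms)
    n_target = sum(1 for y in labels if y == target)
    scores = {}
    for term in vocab:
        a = sum(1 for t, y in zip(doc_terms, labels) if term in t and y == target)
        b = sum(1 for t, y in zip(doc_terms, labels) if term in t and y != target)
        c = n_target - a            # term absent, target intent
        d = (n - n_target) - b      # term absent, other intent
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Toy messages (invented for illustration only).
messages = [
    "please refill my medication",
    "need a refill soon",
    "i took my pill today",
    "took medication this morning",
]
intents = ["request", "request", "taking", "taking"]

scores = chi_square_scores(messages, intents, target="request")
top_terms = sorted(scores, key=scores.get, reverse=True)[:3]
print(top_terms)
```

Terms that appear equally often in both intents (here "my" and "medication") score zero, which is in line with the abstract's observation that shared vocabulary between intents makes them hard to separate. Note that the statistic measures the strength of association only, so a term that marks the other intent scores as highly as one that marks the target.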


2020 ◽  
Author(s):  
David DeFranza ◽  
Himanshu Mishra ◽  
Arul Mishra

Language provides an ever-present context for our cognitions and has the ability to shape them. Languages across the world can be gendered (language in which the form of noun, verb, or pronoun is presented as female or male) versus genderless. In an ongoing debate, one stream of research suggests that gendered languages are more likely to display gender prejudice than genderless languages. However, another stream of research suggests that language does not have the ability to shape gender prejudice. In this research, we contribute to the debate by using a Natural Language Processing (NLP) method which captures the meaning of a word from the context in which it occurs. Using text data from Wikipedia and the Common Crawl project (which contains text from billions of publicly facing websites) across 45 world languages, covering the majority of the world’s population, we test for gender prejudice in gendered and genderless languages. We find that gender prejudice occurs more in gendered than in genderless languages. Moreover, we examine whether genderedness of language influences the stereotypic dimensions of warmth and competence utilizing the same NLP method.


Vector representations for language have been shown to be useful in a number of Natural Language Processing tasks. In this paper, we aim to investigate the effectiveness of word vector representations for the problem of Sentiment Analysis. In particular, we target three sub-tasks, namely sentiment word extraction, detection of the polarity of sentiment words, and text sentiment prediction. We investigate the effectiveness of vector representations over different text data and evaluate the quality of domain-dependent vectors. Vector representations have been used to compute various vector-based features and to conduct systematic experiments to demonstrate their effectiveness. Using simple vector-based features can achieve better results for text sentiment analysis of APP.


2021 ◽  
Author(s):  
Connor Shorten ◽  
Taghi M. Khoshgoftaar ◽  
Borko Furht

Natural Language Processing (NLP) is one of the most captivating applications of Deep Learning. In this survey, we consider how the Data Augmentation training strategy can aid in its development. We begin with the major motifs of Data Augmentation summarized into strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form. We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data. Deep Learning generally struggles with the measurement of generalization and characterization of overfitting. We highlight studies that cover how augmentations can construct test sets for generalization. NLP is at an early stage in applying Data Augmentation compared to Computer Vision. We highlight the key differences and promising ideas that have yet to be tested in NLP. For the sake of practical implementation, we describe tools that facilitate Data Augmentation such as the use of consistency regularization, controllers, and offline and online augmentation pipelines, to preview a few. Finally, we discuss interesting topics around Data Augmentation in NLP such as task-specific augmentations, the use of prior knowledge in self-supervised learning versus Data Augmentation, intersections with transfer and multi-task learning, and ideas for AI-GAs (AI-Generating Algorithms). We hope this paper inspires further research interest in Text Data Augmentation.
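
As a rough illustration of what text data augmentation can look like in practice, the sketch below applies two simple token-level perturbations, random swap and random deletion. These are assumed here as generic examples; the abstract does not say which specific augmentations the survey covers, and the function names, parameters, and example sentence are hypothetical.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


# A seeded generator keeps the augmented samples reproducible.
rng = random.Random(0)
tokens = "data augmentation creates new training examples from existing text".split()

swapped = random_swap(tokens, n_swaps=2, rng=rng)
deleted = random_deletion(tokens, p=0.3, rng=rng)
print(" ".join(swapped))
print(" ".join(deleted))
```

Both functions change the form of a sentence while aiming to leave its label intact, which relates to the distinction between meaning and form raised in the abstract. Whether a label actually survives such perturbations depends on the task and would need to be checked.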


2020 ◽  
Vol 25 (6) ◽  
pp. 755-769
Author(s):  
Noorullah R. Mohammed ◽  
Moulana Mohammed

Text data clustering organizes a set of text documents into the desired number of coherent and meaningful sub-clusters. Modeling the text documents in terms of derived topics is a vital task in text data clustering. Each tweet is considered a text document, and various topic models perform modeling of tweets. In existing topic models, the clustering tendency of tweets is assessed initially based on Euclidean dissimilarity features. The cosine metric is more suitable for a more informative assessment, especially in text clustering. Thus, this paper develops a novel cosine-based external and internal validity assessment of cluster tendency for improving the computational efficiency of tweet data clustering. In the experiments, tweet data clustering results are evaluated using cluster validity index measures. The experiments show that cosine-based internal and external validity metrics outperform the others on benchmark and Twitter-based datasets.
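
The contrast between Euclidean and cosine dissimilarity for text can be shown with a small sketch. It assumes plain term-frequency vectors and three invented toy texts; the helper names are hypothetical, and this is not the validity assessment developed in the paper.

```python
import math
from collections import Counter


def tf_vector(text, vocab):
    """Term-frequency vector of text over a fixed vocabulary order."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_dissimilarity(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


# Toy texts (invented): two on the same topic with different lengths, one unrelated.
short_text = "storm warning"
long_text = "storm warning storm warning storm warning"
other_text = "football match"

vocab = sorted(set((short_text + " " + long_text + " " + other_text).split()))
short_v = tf_vector(short_text, vocab)
long_v = tf_vector(long_text, vocab)
other_v = tf_vector(other_text, vocab)

print(euclidean(short_v, long_v), euclidean(short_v, other_v))
print(cosine_dissimilarity(short_v, long_v), cosine_dissimilarity(short_v, other_v))
```

On these vectors, Euclidean distance places the short text closer to the unrelated text than to the longer text on the same topic, because it is sensitive to document length. Cosine dissimilarity depends only on direction and treats the two same-topic texts as essentially identical.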

