Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study (Preprint)

BACKGROUND: Qualitative research methods are increasingly being used across disciplines because of their ability to help investigators understand the perspectives of participants in their own words. However, qualitative analysis is a laborious and resource-intensive process. To achieve depth, researchers are limited to smaller sample sizes when analyzing text data. One potential method to address this concern is natural language processing (NLP). Qualitative text analysis involves researchers reading data, assigning code labels, and iteratively developing findings; NLP has the potential to automate part of this process. Unfortunately, little methodological research has been done to compare automatic coding using NLP techniques with qualitative coding, a comparison that is critical to establishing the viability of NLP as a useful, rigorous analysis procedure.

OBJECTIVE: The purpose of this study was to compare the utility of a traditional qualitative text analysis, an NLP analysis, and an augmented approach that combines qualitative and NLP methods.

METHODS: We conducted a 2-arm cross-over experiment to compare qualitative and NLP approaches to analyzing data generated through 2 text message (short message service) survey questions, one about prescription drugs and the other about police interactions, sent to youth aged 14-24 years. We randomly assigned a question to each of the 2 experienced qualitative analysis teams for independent coding and analysis before they received NLP results. A third team separately conducted NLP analysis of the same 2 questions. We examined the results of our analyses to compare (1) the similarity of findings derived, (2) the quality of inferences generated, and (3) the time spent in analysis.

RESULTS: The qualitative-only analysis for the drug question (n=58) yielded 4 major findings, whereas the NLP analysis yielded 3 findings that missed contextual elements. The qualitative and NLP-augmented analysis was the most comprehensive. For the police question (n=68), the qualitative-only analysis yielded 4 primary findings and the NLP-only analysis yielded 4 slightly different findings. Again, the augmented qualitative and NLP analysis was the most comprehensive and produced the highest quality inferences, increasing our depth of understanding (ie, details and frequencies). In terms of time, the NLP-only approach was quicker than the qualitative-only approach for the drug (120 vs 270 minutes) and police (40 vs 270 minutes) questions. An approach beginning with qualitative analysis followed by qualitative- or NLP-augmented analysis took longer than one beginning with NLP for both the drug (450 vs 240 minutes) and police (390 vs 220 minutes) questions.

CONCLUSIONS: NLP provides both a foundation to code qualitatively more quickly and a method to validate qualitative findings. NLP methods were able to identify major themes found with traditional qualitative analysis but were not useful in identifying nuances. Traditional qualitative text analysis added important details and context.
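
The time comparison reported in the results can be restated as minutes and percentages saved. The following is a minimal sketch that only re-expresses the figures quoted in the abstract; the dictionary keys and helper names (`saving`, `summarize`) are illustrative assumptions, not part of the study.

```python
# Analysis times in minutes, copied from the RESULTS text of the abstract.
# Key names are hypothetical labels chosen for this sketch.
times = {
    "drug": {"nlp_only": 120, "qual_only": 270, "qual_first": 450, "nlp_first": 240},
    "police": {"nlp_only": 40, "qual_only": 270, "qual_first": 390, "nlp_first": 220},
}


def saving(slow, fast):
    """Return (minutes saved, percent saved) when `fast` replaces `slow`."""
    saved = slow - fast
    return saved, round(100 * saved / slow, 1)


def summarize(times):
    """Compute both comparisons from the abstract for each survey question."""
    rows = {}
    for question, t in times.items():
        rows[question] = {
            "nlp_only_vs_qual_only": saving(t["qual_only"], t["nlp_only"]),
            "nlp_first_vs_qual_first": saving(t["qual_first"], t["nlp_first"]),
        }
    return rows


summary = summarize(times)
for question, row in summary.items():
    print(question, row)
```

On these figures, NLP-only analysis saved 150 minutes (55.6%) on the drug question and 230 minutes (85.2%) on the police question, while starting the augmented workflow with NLP rather than with qualitative coding saved 210 minutes (46.7%) and 170 minutes (43.6%), respectively.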

Short Message Service Filtering with Natural Language Processing in Indonesian Language

10.1109/iciss53185.2021.9532503 ◽

2021 ◽

Author(s):

Vincentius Gabriel Tandra ◽

Yowen Yowen ◽

Ravel Tanjaya ◽

William Lucianto Santoso ◽

Nunung Nurul Qomariyah

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Short Message Service ◽

Short Message ◽

Message Service

Towards Accurate Deceptive Opinions Detection Based on Word Order-Preserving CNN

Mathematical Problems in Engineering ◽

10.1155/2018/2410206 ◽

2018 ◽

Vol 2018 ◽

pp. 1-9 ◽

Cited By ~ 4

Author(s):

Siyuan Zhao ◽

Zhiwei Xu ◽

Limin Liu ◽

Mengjie Guo ◽

Jing Yun

Keyword(s):

Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Convolutional Neural Network ◽

Language Processing ◽

Word Order ◽

Text Analysis ◽

Important Application ◽

Detection Mechanism ◽

Short Text

Convolutional neural networks (CNNs) have revolutionized the field of natural language processing and are considerably efficient at the semantic analysis that underlies difficult natural language processing problems in a variety of domains. Deceptive opinion detection is an important application of existing CNN models. Detection mechanisms based on CNN models have better self-adaptability and can effectively identify all kinds of deceptive opinions. Online opinions are quite short and vary in type and content. To identify deceptive opinions effectively, we need to study their characteristics comprehensively and explore novel characteristics beyond the textual semantics and emotional polarity that have been widely used in text analysis. In this paper, we optimize the convolutional neural network model by embedding word order characteristics in its convolution layer and pooling layer, which makes the convolutional neural network more suitable for short text classification and deceptive opinion detection. TensorFlow-based experiments demonstrate that the proposed detection mechanism achieves more accurate deceptive opinion detection results.
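
The abstract does not say how word order is embedded in the convolution and pooling layers. As an illustration only, the sketch below uses k-max pooling, a known pooling variant that keeps the k strongest activations in their original sequence order. It is a swapped-in technique, not the authors' mechanism. The function names are hypothetical, and each token is reduced to a single scalar score instead of an embedding vector to keep the example short.

```python
def conv1d(seq, kernel):
    """Valid 1D convolution (cross-correlation) of a scalar sequence with a kernel."""
    k = len(kernel)
    return [
        sum(seq[i + j] * kernel[j] for j in range(k))
        for i in range(len(seq) - k + 1)
    ]


def k_max_pool_ordered(features, k):
    """Keep the k largest activations, returned in their original sequence order."""
    # Indices of the k largest values (assumption: ties keep the earlier position).
    top = sorted(range(len(features)), key=lambda i: features[i], reverse=True)[:k]
    # Re-sorting the indices restores the word order of the surviving activations.
    return [features[i] for i in sorted(top)]


# Hypothetical per-token scores for a five-word opinion.
token_scores = [1, 4, 2, 5, 3]
features = conv1d(token_scores, kernel=[2, 1])   # one feature per word bigram
pooled = k_max_pool_ordered(features, k=2)
print(features, pooled)
```

Ordinary max pooling would keep only the single largest activation and discard its position. Here the two strongest bigram features survive in the order in which they occurred in the text, which is one simple way for a pooling layer to retain word order information in short texts.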

Natural language processing versus rule-based text analysis: Comparing BERT score and readability indices to predict crowdfunding outcomes

Journal of Business Venturing Insights ◽

10.1016/j.jbvi.2021.e00276 ◽

2021 ◽

Vol 16 ◽

pp. e00276

Author(s):

C.S. Richard Chan ◽

Charuta Pethe ◽

Steven Skiena

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Analysis ◽

Rule Based

An approach to neural network analysis of text information in the economic assessment of companies

Economic Analysis Theory and Practice ◽

10.24891/ea.20.8.1574 ◽

2021 ◽

Vol 20 (8) ◽

pp. 1574-1594

Author(s):

Aleksandr R. NEVREDINOV

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Analysis ◽

Economic Assessment ◽

Management Decision ◽

Textual Information ◽

Financial Condition ◽

Analysis And Synthesis ◽

Management Decision Making

Subject. When evaluating enterprises, maximum accuracy and comprehensiveness of analysis are important. Although the use of various indicators of an organization’s financial condition and external factors provides sufficiently high forecasting accuracy, many researchers are increasingly turning to natural language processing to analyze various text sources. This subject is highly relevant given the need of companies to analyze their activities quickly and extensively. Objectives. The study aims to explore natural language processing methods and sources of textual information about companies that can be used in the analysis, and to develop an approach to the analysis of textual information. Methods. The study draws on methods of analysis and synthesis, systematization, formalization, and comparative analysis, as well as theoretical and methodological provisions contained in domestic and foreign scientific works on text analysis, including for the purposes of company evaluation. Results. I offer and test an approach to using non-numeric indicators for company analysis. The paper presents a unique model built on existing developments that have shown their effectiveness. I also substantiate the use of this approach to analyze a company’s condition and to include the analysis results in models for the overall assessment of the state of companies. Conclusions. The findings improve the scientific and practical understanding of techniques for analyzing companies and of ways to apply text analysis using machine learning. They can be used to support management decision-making by automating the analysis of a company itself and of the other companies in the market with which it interacts.

Leveraging Python to Process Cross-Cultural Temperament Interviews: A Novel Platform for Text Analysis

Journal of Cross-Cultural Psychology ◽

10.1177/0022022120906478 ◽

2020 ◽

Vol 51 (2) ◽

pp. 168-181 ◽

Cited By ~ 1

Author(s):

Joshua J. Underwood ◽

Cornelia Kirchhoff ◽

Haven Warwick ◽

Maria A. Gartstein

Keyword(s):

Early Childhood ◽

Natural Language Processing ◽

Individual Differences ◽

Natural Language ◽

Language Processing ◽

Data Reduction ◽

Text Analysis ◽

Cross Cultural ◽

Two Samples ◽

Do So

During childhood, parents represent the most commonly used source of their child’s temperament information and, typically, do so by responding to questionnaires. Despite their wide-ranging applications, interviews present notorious data reduction challenges, as quantification of narratives has proven to be a labor-intensive process. However, for the purposes of this study, the labor-intensive nature may have conferred distinct advantages. The present study represents a demonstration project aimed at leveraging emerging technologies for this purpose. Specifically, we used Python natural language processing capabilities to analyze semistructured temperament interviews conducted with U.S. and German mothers of toddlers, expecting to identify differences between these two samples in the frequency of words used to describe individual differences, along with some similarities. Two different word lists were used: (a) a set of German personality words and (b) temperament-related words extracted from the Early Childhood Behavior Questionnaire (ECBQ). Analyses using the German trait words demonstrated that mothers from Germany described their toddlers as significantly more “cheerful” and “careful” compared with U.S. caregivers. According to U.S. mothers, their children were more “independent,” “emotional,” and “timid.” For the ECBQ analysis, German mothers described their children as “calm” and “careful” more often than U.S. mothers. U.S. mothers, however, referred to their children as “upset,” “happy,” and “frustrated” more frequently than German caregivers. The Python code developed herein illustrates this software as a viable research tool for cross-cultural investigations.
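
The word-list frequency comparison described above can be sketched in a few lines of standard-library Python. This is not the authors' code: the function names, the three-word list, and the interview snippets are invented for illustration, and the sketch only computes per-group rates without the significance testing the study reports.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def rate_per_1000(transcripts, word_list):
    """Occurrences of each listed word per 1,000 tokens across a group's transcripts."""
    tokens = [tok for t in transcripts for tok in tokenize(t)]
    counts = Counter(tok for tok in tokens if tok in word_list)
    total = len(tokens) or 1  # avoid division by zero for an empty group
    return {w: 1000 * counts[w] / total for w in sorted(word_list)}


# Invented example snippets; real input would be the interview transcripts.
us_transcripts = [
    "She is very independent and sometimes timid.",
    "He gets upset but is mostly happy.",
]
german_transcripts = [
    "She is calm and careful.",
    "He is cheerful, careful and calm.",
]
word_list = {"calm", "careful", "independent"}  # hypothetical trait words

us_rates = rate_per_1000(us_transcripts, word_list)
german_rates = rate_per_1000(german_transcripts, word_list)
for word in sorted(word_list):
    print(word, round(us_rates[word], 1), round(german_rates[word], 1))
```

Normalizing by total token count keeps the two groups comparable when one set of interviews is longer than the other; comparing raw counts would favor whichever group simply talked more.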

How You Say It Matters: Text Analysis of FOMC Statements Using Natural Language Processing

The Federal Reserve Bank of Kansas City Economic Review ◽

10.18651/er/v106n1dohkimyang ◽

2021 ◽

Author(s):

Taeyoung Doh ◽

Sungil Kim ◽

Shu-Kuei X. Yang

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Analysis

Aspiring to Unintended Consequences of Natural Language Processing: A Review of Recent Developments in Clinical and Consumer-Generated Text Processing

Yearbook of Medical Informatics ◽

10.15265/iy-2016-017 ◽

2016 ◽

Vol 25 (01) ◽

pp. 224-233 ◽

Cited By ~ 11

Author(s):

N. Elhadad ◽

D. Demner-Fushman

Keyword(s):

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Analysis ◽

Disease Modeling ◽

Text Processing ◽

Healthcare Quality ◽

Unintended Consequences ◽

Health Related

Summary Objectives: This paper reviews work over the past two years in Natural Language Processing (NLP) applied to clinical and consumer-generated texts. Methods: We included any application or methodological publication that leverages text to facilitate healthcare and address the health-related needs of consumers and populations. Results: Many important developments in clinical text processing, both foundational and task-oriented, were addressed in community-wide evaluations and discussed in corresponding special issues that are referenced in this review. These focused issues and in-depth reviews of several other active research areas, such as pharmacovigilance and summarization, allowed us to discuss in greater depth disease modeling and predictive analytics using clinical texts, and text analysis in social media for healthcare quality assessment, trends towards online interventions based on rapid analysis of health-related posts, and consumer health question answering, among other issues. Conclusions: Our analysis shows that although clinical NLP continues to advance towards practical applications and more NLP methods are used in large-scale live health information applications, more needs to be done to make NLP use in clinical applications a routine widespread reality. Progress in clinical NLP is mirrored by developments in social media text analysis: the research is moving from capturing trends to addressing individual health-related posts, thus showing potential to become a tool for precision medicine and a valuable addition to the standard healthcare quality evaluation tools.

A Qualitative Analysis of Provider Notes of Atopic Dermatitis-Related Visits Using Natural Language Processing Methods

Dermatology and Therapy ◽

10.1007/s13555-021-00553-5 ◽

2021 ◽

Author(s):

Evangeline J. Pierce ◽

Natalie N. Boytsov ◽

Joe J. Vasey ◽

Theresa C. Sudaria ◽

Xiong Liu ◽

...

Keyword(s):

Atopic Dermatitis ◽

Natural Language Processing ◽

Natural Language ◽

Qualitative Analysis ◽

Language Processing ◽

Processing Methods

Deep learning approach to text analysis for human emotion detection from big data

Journal of Intelligent Systems ◽

10.1515/jisys-2022-0001 ◽

2022 ◽

Vol 31 (1) ◽

pp. 113-126

Author(s):

Jia Guo

Keyword(s):

Big Data ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Analysis ◽

Question Answering ◽

Word Embeddings ◽

Emotion Detection ◽

Human Emotion

Abstract Emotion recognition has emerged as an essential field of study that can reveal a variety of valuable information. Emotion can be expressed in several observable ways, such as speech, facial expressions, written text, and gestures. Emotion recognition in a text document is fundamentally a content-based classification problem that draws on concepts from the natural language processing (NLP) and deep learning fields. Hence, in this study, deep learning assisted semantic text analysis (DLSTA) is proposed for human emotion detection using big data. Emotion detection from textual sources can be performed using NLP concepts. Word embeddings are extensively used for several NLP tasks, such as machine translation, sentiment analysis, and question answering. NLP techniques improve the performance of learning-based methods by incorporating the semantic and syntactic features of the text. The numerical outcomes demonstrate that the suggested method achieves a markedly higher human emotion detection rate of 97.22% and a classification accuracy of 98.02% compared with different state-of-the-art methods, and that it can be enhanced by other emotional word embeddings.

Natural Language Processing in Mixed-methods Text Analysis: A Workflow Approach

International Journal of Social Research Methodology ◽

10.1080/13645579.2021.2018905 ◽

2022 ◽

pp. 1-13

Author(s):

Louisa Parks ◽

Wim Peters

Keyword(s):

Natural Language Processing ◽

Mixed Methods ◽

Natural Language ◽

Language Processing ◽

Text Analysis
