Natural language processing versus rule-based text analysis: Comparing BERT score and readability indices to predict crowdfunding outcomes

Towards Accurate Deceptive Opinions Detection Based on Word Order-Preserving CNN

Mathematical Problems in Engineering ◽

10.1155/2018/2410206 ◽

2018 ◽

Vol 2018 ◽

pp. 1-9 ◽

Cited By ~ 4

Author(s):

Siyuan Zhao ◽

Zhiwei Xu ◽

Limin Liu ◽

Mengjie Guo ◽

Jing Yun

Keyword(s):

Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Convolutional Neural Network ◽

Language Processing ◽

Word Order ◽

Text Analysis ◽

Important Application ◽

Detection Mechanism ◽

Short Text

Convolutional neural network (CNN) has revolutionized the field of natural language processing, which is considerably efficient at semantics analysis that underlies difficult natural language processing problems in a variety of domains. The deceptive opinion detection is an important application of the existing CNN models. The detection mechanism based on CNN models has better self-adaptability and can effectively identify all kinds of deceptive opinions. Online opinions are quite short, varying in their types and content. In order to effectively identify deceptive opinions, we need to comprehensively study the characteristics of deceptive opinions and explore novel characteristics besides the textual semantics and emotional polarity that have been widely used in text analysis. In this paper, we optimize the convolutional neural network model by embedding the word order characteristics in its convolution layer and pooling layer, which makes convolutional neural network more suitable for short text classification and deceptive opinions detection. The TensorFlow-based experiments demonstrate that the proposed detection mechanism achieves more accurate deceptive opinion detection results.

Download Full-text

An approach to neural network analysis of text information in the economic assessment of companies

Economic Analysis Theory and Practice ◽

10.24891/ea.20.8.1574 ◽

2021 ◽

Vol 20 (8) ◽

pp. 1574-1594

Author(s):

Aleksandr R. NEVREDINOV

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Analysis ◽

Economic Assessment ◽

Management Decision ◽

Textual Information ◽

Financial Condition ◽

Analysis And Synthesis ◽

Management Decision Making

Subject. When evaluating enterprises, maximum accuracy and comprehensiveness of analysis are important, although the use of various indicators of organization’s financial condition and external factors provide a sufficiently high accuracy of forecasting. Many researchers are increasingly focusing on the natural language processing to analyze various text sources. This subject is extremely relevant against the needs of companies to quickly and extensively analyze their activities. Objectives. The study aims at exploring the natural language processing methods and sources of textual information about companies that can be used in the analysis, and developing an approach to the analysis of textual information. Methods. The study draws on methods of analysis and synthesis, systematization, formalization, comparative analysis, theoretical and methodological provisions contained in domestic and foreign scientific works on text analysis, including for purposes of company evaluation. Results. I offer and test an approach to using non-numeric indicators for company analysis. The paper presents a unique model, which is created on the basis of existing developments that have shown their effectiveness. I also substantiate the use of this approach to analyze a company’s condition and to include the analysis results in models for overall assessment of the state of companies. Conclusions. The findings improve scientific and practical understanding of techniques for the analysis of companies, the ways of applying text analysis, using machine learning. They can be used to support management decision-making to automate the analysis of their own and other companies in the market, with which they interact.

Download Full-text

Triage and diagnosis of COVID-19 from medical social media (Preprint)

10.2196/preprints.30397 ◽

2021 ◽

Author(s):

Abul Hasan ◽

Mark Levene ◽

David Weston ◽

Renate Fromson ◽

Nicolas Koslover ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning Models ◽

Rule Based ◽

Additional Information ◽

Processing Pipeline ◽

Machine Learning Models

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources, in order to assist decision makers. Social media is important in this respect, however, to make sense of the textual information it provides and be able to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report that Macro- and Micro-averaged F_{1\ }scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19, when the models are trained on human labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. Also, we highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.

Download Full-text

Description of a Rule-based System for the i2b2 Challenge in Natural Language Processing for Clinical Data

Journal of the American Medical Informatics Association ◽

10.1197/jamia.m3083 ◽

2009 ◽

Vol 16 (4) ◽

pp. 571-575 ◽

Cited By ~ 10

Author(s):

L. C. Childs ◽

R. Enelow ◽

L. Simonsen ◽

N. H. Heintzelman ◽

K. M. Kowalski ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Data ◽

Rule Based ◽

Rule Based System

Download Full-text

Leveraging Python to Process Cross-Cultural Temperament Interviews: A Novel Platform for Text Analysis

Journal of Cross-Cultural Psychology ◽

10.1177/0022022120906478 ◽

2020 ◽

Vol 51 (2) ◽

pp. 168-181 ◽

Cited By ~ 1

Author(s):

Joshua J. Underwood ◽

Cornelia Kirchhoff ◽

Haven Warwick ◽

Maria A. Gartstein

Keyword(s):

Early Childhood ◽

Natural Language Processing ◽

Individual Differences ◽

Natural Language ◽

Language Processing ◽

Data Reduction ◽

Text Analysis ◽

Cross Cultural ◽

Two Samples ◽

Do So

During childhood, parents represent the most commonly used source of their child’s temperament information and, typically, do so by responding to questionnaires. Despite their wide-ranging applications, interviews present notorious data reduction challenges, as quantification of narratives has proven to be a labor-intensive process. However, for the purposes of this study, the labor-intensive nature may have conferred distinct advantages. The present study represents a demonstration project aimed at leveraging emerging technologies for this purpose. Specifically, we used Python natural language processing capabilities to analyze semistructured temperament interviews conducted with U.S. and German mothers of toddlers, expecting to identify differences between these two samples in the frequency of words used to describe individual differences, along with some similarities. Two different word lists were used: (a) a set of German personality words and (b) temperament-related words extracted from the Early Childhood Behavior Questionnaire (ECBQ). Analyses using the German trait word demonstrated that mothers from Germany described their toddlers as significantly more “cheerful” and “careful” compared with U.S. caregivers. According to U.S. mothers, their children were more “independent,” “emotional,” and “timid.” For the ECBQ analysis, German mothers described their children as “calm” and “careful” more often than U.S. mothers. U.S. mothers, however, referred to their children as “upset,” “happy,” and “frustrated” more frequently than German caregivers. The Python code developed herein illustrates this software as a viable research tool for cross-cultural investigations.

Download Full-text

Clinical trial cohort selection based on multi-level rule-based natural language processing system

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocz109 ◽

2019 ◽

Vol 26 (11) ◽

pp. 1218-1226 ◽

Cited By ~ 7

Author(s):

Long Chen ◽

Yu Gu ◽

Xin Ji ◽

Chao Lou ◽

Zhiyong Sun ◽

...

Keyword(s):

Clinical Trials ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Rule Based ◽

Language System ◽

Unified Medical Language System ◽

Rule Based System ◽

Medical Language ◽

Cohort Selection

Abstract Objective Identifying patients who meet selection criteria for clinical trials is typically challenging and time-consuming. In this article, we describe our clinical natural language processing (NLP) system to automatically assess patients’ eligibility based on their longitudinal medical records. This work was part of the 2018 National NLP Clinical Challenges (n2c2) Shared-Task and Workshop on Cohort Selection for Clinical Trials. Materials and Methods The authors developed an integrated rule-based clinical NLP system which employs a generic rule-based framework plugged in with lexical-, syntactic- and meta-level, task-specific knowledge inputs. In addition, the authors also implemented and evaluated a general clinical NLP (cNLP) system which is built with the Unified Medical Language System and Unstructured Information Management Architecture. Results and Discussion The systems were evaluated as part of the 2018 n2c2-1 challenge, and authors’ rule-based system obtained an F-measure of 0.9028, ranking fourth at the challenge and had less than 1% difference from the best system. While the general cNLP system didn’t achieve performance as good as the rule-based system, it did establish its own advantages and potential in extracting clinical concepts. Conclusion Our results indicate that a well-designed rule-based clinical NLP system is capable of achieving good performance on cohort selection even with a small training data set. In addition, the investigation of a Unified Medical Language System-based general cNLP system suggests that a hybrid system combining these 2 approaches is promising to surpass the state-of-the-art performance.

Download Full-text

An approach to natural language processing in the rule-based expert system

Proceedings of the 1990 ACM annual conference on Cooperation - CSC '90 ◽

10.1145/100348.100381 ◽

1990 ◽

Cited By ~ 1

Author(s):

Jan Kazimierczak

Keyword(s):

Natural Language Processing ◽

Expert System ◽

Natural Language ◽

Language Processing ◽

Rule Based

Download Full-text

Assessment of Natural Language Processing Methods for Ascertaining the Expanded Disability Status Scale Score From the Electronic Health Records of Patients With Multiple Sclerosis: Algorithm Development and Validation Study

JMIR Medical Informatics ◽

10.2196/25157 ◽

2022 ◽

Vol 10 (1) ◽

pp. e25157

Author(s):

Zhen Yang ◽

Chloé Pou-Prom ◽

Ashley Jones ◽

Michaelia Banning ◽

David Dai ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Expanded Disability Status Scale ◽

Health Records ◽

Rule Based ◽

Disability Status ◽

Edss Score ◽

Electronic Health

Background The Expanded Disability Status Scale (EDSS) score is a widely used measure to monitor disability progression in people with multiple sclerosis (MS). However, extracting and deriving the EDSS score from unstructured electronic health records can be time-consuming. Objective We aimed to compare rule-based and deep learning natural language processing algorithms for detecting and predicting the total EDSS score and EDSS functional system subscores from the electronic health records of patients with MS. Methods We studied 17,452 electronic health records of 4906 MS patients followed at one of Canada’s largest MS clinics between June 2015 and July 2019. We randomly divided the records into training (80%) and test (20%) data sets, and compared the performance characteristics of 3 natural language processing models. First, we applied a rule-based approach, extracting the EDSS score from sentences containing the keyword “EDSS.” Next, we trained a convolutional neural network (CNN) model to predict the 19 half-step increments of the EDSS score. Finally, we used a combined rule-based–CNN model. For each approach, we determined the accuracy, precision, recall, and F-score compared with the reference standard, which was manually labeled EDSS scores in the clinic database. Results Overall, the combined keyword-CNN model demonstrated the best performance, with accuracy, precision, recall, and an F-score of 0.90, 0.83, 0.83, and 0.83 respectively. Respective figures for the rule-based and CNN models individually were 0.57, 0.91, 0.65, and 0.70, and 0.86, 0.70, 0.70, and 0.70. Because of missing data, the model performance for EDSS subscores was lower than that for the total EDSS score. Performance improved when considering notes with known values of the EDSS subscores. Conclusions A combined keyword-CNN natural language processing model can extract and accurately predict EDSS scores from patient records. This approach can be automated for efficient information extraction in clinical and research settings.

Download Full-text

How You Say It Matters: Text Analysis of FOMC Statements Using Natural Language Processing

The Federal Reserve Bank of Kansas City Economic Review ◽

10.18651/er/v106n1dohkimyang ◽

2021 ◽

Author(s):

Taeyoung Doh ◽

Sungil Kim ◽

Shu-Kuei X. Yang

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Analysis

Download Full-text

Aspiring to Unintended Consequences of Natural Language Processing: A Review of Recent Developments in Clinical and Consumer-Generated Text Processing

Yearbook of Medical Informatics ◽

10.15265/iy-2016-017 ◽

2016 ◽

Vol 25 (01) ◽

pp. 224-233 ◽

Cited By ~ 11

Author(s):

N. Elhadad ◽

D. Demner-Fushman

Keyword(s):

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Analysis ◽

Disease Modeling ◽

Text Processing ◽

Healthcare Quality ◽

Unintended Consequences ◽

Health Related

Summary Objectives: This paper reviews work over the past two years in Natural Language Processing (NLP) applied to clinical and consumer-generated texts. Methods: We included any application or methodological publication that leverages text to facilitate healthcare and address the health-related needs of consumers and populations. Results: Many important developments in clinical text processing, both foundational and task-oriented, were addressed in community-wide evaluations and discussed in corresponding special issues that are referenced in this review. These focused issues and in-depth reviews of several other active research areas, such as pharmacovigilance and summarization, allowed us to discuss in greater depth disease modeling and predictive analytics using clinical texts, and text analysis in social media for healthcare quality assessment, trends towards online interventions based on rapid analysis of health-related posts, and consumer health question answering, among other issues. Conclusions: Our analysis shows that although clinical NLP continues to advance towards practical applications and more NLP methods are used in large-scale live health information applications, more needs to be done to make NLP use in clinical applications a routine widespread reality. Progress in clinical NLP is mirrored by developments in social media text analysis: the research is moving from capturing trends to addressing individual health-related posts, thus showing potential to become a tool for precision medicine and a valuable addition to the standard healthcare quality evaluation tools.

Download Full-text