An empirical comparison of distance/similarity measures for Natural Language Processing

Text classification is one of the tasks of Natural Language Processing (NLP). In this area, Graph Convolutional Networks (GCNs) have achieved better results than CNNs and other related models. For a GCN, the metric that defines the correlation between words in a vector space plays a crucial role in classification because it determines the weight of the edge between two words (represented by nodes in the graph). In this study, we empirically investigated the impact of thirteen distance/similarity measures. A representation was built for each document using word embeddings from a word2vec model. In addition, a graph-based representation of five datasets was created for each measure analyzed, where each word is a node in the graph and each edge is weighted by the distance/similarity between words. Finally, each model was run in a simple graph neural network. The results show that, concerning text classification, there is no statistically significant difference between the analyzed metrics and the Graph Convolutional Network. Even with the incorporation of external words or external knowledge, the results were similar to those of the methods without the incorporation of words. However, the results indicate that some distance metrics behave better than others with respect to context capture, with Euclidean distance reaching the best values or being statistically similar to the best.
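The graph construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for word2vec embeddings, the helper name `build_weighted_adjacency` is hypothetical, and only two of the thirteen measures (cosine similarity and Euclidean distance) are shown.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_distance(u, v):
    """Straight-line distance between two vectors (0.0 means identical)."""
    return float(np.linalg.norm(u - v))

def build_weighted_adjacency(vectors, measure):
    """Return a symmetric n x n matrix whose (i, j) entry is
    measure(vectors[i], vectors[j]); the diagonal is left at 0 (no self-loops)."""
    n = len(vectors)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            adj[i, j] = adj[j, i] = measure(vectors[i], vectors[j])
    return adj

# Toy "embeddings" (assumption): real word2vec vectors have far more dimensions.
words = ["cat", "dog", "car"]
vectors = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
])

cos_adj = build_weighted_adjacency(vectors, cosine_similarity)
euc_adj = build_weighted_adjacency(vectors, euclidean_distance)
```

Swapping the `measure` argument is the only change needed to produce a differently weighted graph from the same nodes, which mirrors the kind of comparison the abstract describes.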

Natural Language Processing Service Based on Stroke-Level Convolutional Networks for Chinese Text Classification

2017 IEEE International Conference on Web Services (ICWS) ◽

10.1109/icws.2017.46 ◽

2017 ◽

Cited By ~ 5

Author(s):

Hang Zhuang ◽

Chao Wang ◽

Changlong Li ◽

Qingfeng Wang ◽

Xuehai Zhou

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Chinese Text ◽

Text Classification ◽

Convolutional Networks ◽

Chinese Text Classification ◽

Processing Service

Deep Learning Techniques on Text Classification Using Natural Language Processing (NLP) In Social Healthcare Network: A Comprehensive Survey

2021 3rd International Conference on Signal Processing and Communication (ICSPC) ◽

10.1109/icspc51351.2021.9451752 ◽

2021 ◽

Author(s):

PM. Lavanya ◽

E. Sasikala

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Classification ◽

Healthcare Network ◽

Learning Techniques ◽

Comprehensive Survey

A Natural Language Processing Approach to Measuring Treatment Adherence and Consistency Using Semantic Similarity

AERA Open ◽

10.1177/23328584211028615 ◽

2021 ◽

Vol 7 ◽

pp. 233285842110286

Author(s):

Kylie L. Anglin ◽

Vivian C. Wong ◽

Arielle Boguslav

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Semantic Similarity ◽

Language Processing ◽

Intervention Implementation ◽

Proof Of Concept ◽

Coaching Intervention ◽

Processing Techniques ◽

Teacher Coaching ◽

The Impact

Though there is widespread recognition of the importance of implementation research, evaluators often face intense logistical, budgetary, and methodological challenges in their efforts to assess intervention implementation in the field. This article proposes a set of natural language processing techniques called semantic similarity as an innovative and scalable method of measuring implementation constructs. Semantic similarity methods are an automated approach to quantifying the similarity between texts. By applying semantic similarity to transcripts of intervention sessions, researchers can use the method to determine whether an intervention was delivered with adherence to a structured protocol, and the extent to which an intervention was replicated with consistency across sessions, sites, and studies. This article provides an overview of semantic similarity methods, describes their application within the context of educational evaluations, and provides a proof of concept using an experimental study of the impact of a standardized teacher coaching intervention.
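As a rough illustration of how a semantic similarity score between a protocol and session transcripts might be computed, the sketch below uses cosine similarity over bag-of-words counts. This is an assumption made for illustration: the abstract does not state which similarity method the article uses, and the protocol text, the session texts, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical protocol and transcripts, invented for this example.
protocol = "Review the lesson goal, then give feedback on student engagement."
sessions = {
    "session_a": "We review the lesson goal and give feedback on engagement.",
    "session_b": "Let's talk about the weather and weekend plans.",
}

# Adherence: similarity of each session to the protocol.
adherence = {name: cosine_similarity(protocol, text) for name, text in sessions.items()}

# Consistency: similarity between two sessions.
consistency = cosine_similarity(sessions["session_a"], sessions["session_b"])
```

Word-count vectors ignore synonyms and word order, so a fuller treatment would likely use a richer text representation; the score is computed the same way once texts are mapped to vectors.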

Text Classification by using Natural Language Processing

Journal of Physics Conference Series ◽

10.1088/1742-6596/1802/4/042010 ◽

2021 ◽

Vol 1802 (4) ◽

pp. 042010

Author(s):

Peiyang Yu ◽

Victor Y. Cui ◽

Jiaxin Guan

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Classification

Application of natural language processing methods to extract coded data from administrative data held in the Scottish Prescribing Information System

International Journal for Population Data Science ◽

10.23889/ijpds.v1i1.263 ◽

2017 ◽

Vol 1 (1) ◽

Author(s):

Clifford Nangle ◽

Stuart McTaggart ◽

Margaret MacLeod ◽

Jackie Caldwell ◽

Marion Bennie

Keyword(s):

Information System ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Drug Exposure ◽

Drug Dose ◽

Free Text ◽

Wide Range ◽

The Impact ◽

Prescribing Information

Objectives: The Prescribing Information System (PIS) datamart, hosted by NHS National Services Scotland, receives around 90 million electronic prescription messages per year from GP practices across Scotland. Prescription messages contain information including drug name, quantity and strength stored as coded, machine-readable data, while prescription dose instructions are unstructured free text and difficult to interpret and analyse in volume. The aim, using Natural Language Processing (NLP), was to extract drug dose amount, unit and frequency metadata from freely typed text in dose instructions to support calculating the intended number of days’ treatment. This then allows comparison with actual prescription frequency, treatment adherence and the impact upon prescribing safety and effectiveness.

Approach: An NLP algorithm was developed using the Ciao implementation of Prolog to extract dose amount, unit and frequency metadata from dose instructions held in the PIS datamart for drugs used in the treatment of gastrointestinal, cardiovascular and respiratory disease. Accuracy estimates were obtained by randomly sampling 0.1% of the distinct dose instructions from source records and comparing these with metadata extracted by the algorithm. An iterative approach was used to modify the algorithm to increase accuracy and coverage.

Results: The NLP algorithm was applied to 39,943,465 prescription instructions issued in 2014, consisting of 575,340 distinct dose instructions. For drugs used in the gastrointestinal, cardiovascular and respiratory systems (i.e. chapters 1, 2 and 3 of the British National Formulary (BNF)), the NLP algorithm successfully extracted drug dose amount, unit and frequency metadata from 95.1%, 98.5% and 97.4% of prescriptions respectively. However, instructions containing terms such as ‘as directed’ or ‘as required’ reduce the usability of the metadata by making it difficult to calculate the total dose intended for a specific time period. For drugs in BNF chapters 1, 2 and 3 respectively, 7.9%, 0.9% and 27.9% of dose instructions contained terms meaning ‘as required’, while 3.2%, 3.7% and 4.0% contained terms meaning ‘as directed’.

Conclusion: The NLP algorithm developed can extract dose, unit and frequency metadata from text found in prescriptions issued to treat a wide range of conditions, and this information may be used to support calculating treatment durations, medicines adherence and cumulative drug exposure. The presence of terms such as ‘as required’ and ‘as directed’ has a negative impact on the usability of the metadata, and further work is required to determine the level of impact this has on calculating treatment durations and cumulative drug exposure.
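The extraction task in this abstract can be sketched in a few lines. The study's algorithm was written in Ciao Prolog; the example below swaps in Python regular expressions purely for illustration, and the patterns, unit list, and function names (`parse_dose`, `days_of_treatment`) are hypothetical and far narrower than any real dose-instruction grammar.

```python
import re

# Illustrative vocabularies (assumptions); real instructions vary much more widely.
NUMBER_WORDS = {"one": 1.0, "two": 2.0, "three": 3.0, "four": 4.0}
TIMES_PER_DAY = {"once": 1, "twice": 2, "three times": 3, "four times": 4}

DOSE_PATTERN = re.compile(
    r"\b(?P<amount>\d+(?:\.\d+)?|one|two|three|four)\s+"
    r"(?P<unit>tablet|capsule|puff|ml)s?\b"
)
FREQ_PATTERN = re.compile(
    r"\b(?P<freq>once|twice|three times|four times)\s+(?:(?:a|per)\s+)?(?:daily|day)\b"
)

def parse_dose(instruction):
    """Extract dose amount, unit and daily frequency from free text."""
    text = instruction.lower()
    dose = DOSE_PATTERN.search(text)
    freq = FREQ_PATTERN.search(text)
    amount = unit = per_day = None
    if dose:
        raw = dose.group("amount")
        amount = NUMBER_WORDS[raw] if raw in NUMBER_WORDS else float(raw)
        unit = dose.group("unit")
    if freq:
        per_day = TIMES_PER_DAY[freq.group("freq")]
    return {
        "amount": amount,
        "unit": unit,
        "per_day": per_day,
        "as_required": "as required" in text,
        "as_directed": "as directed" in text,
    }

def days_of_treatment(quantity, parsed):
    """Intended days of treatment, or None when it cannot be calculated."""
    if parsed["as_required"] or parsed["as_directed"]:
        return None  # the total daily dose is not fixed by the instruction
    if parsed["amount"] is None or parsed["per_day"] is None:
        return None
    return quantity / (parsed["amount"] * parsed["per_day"])

fixed = parse_dose("Take TWO tablets twice daily")
flexible = parse_dose("2 puffs four times a day as required")
```

With a fixed instruction, `days_of_treatment(56, fixed)` divides the supplied quantity by the daily dose; with an ‘as required’ instruction it returns `None`, which is the usability limit the abstract reports.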

Money Makes the World Go Frowned. Analyzing the Impact of Chinese Foreign Aid on States’ Sentiment Using Natural Language Processing

Chinas Rolle in einer neuen Weltordnung ◽

10.5771/9783828876361-241 ◽

2021 ◽

pp. 241-264

Author(s):

Dennis Hammerschmidt ◽

Cosima Meyer

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Foreign Aid ◽

Language Processing ◽

The World ◽

Chinese Foreign Aid ◽

The Impact

Qualitative Study in Natural Language Processing: Text Classification

10.1007/978-3-030-85990-9_8 ◽

2021 ◽

pp. 83-92

Author(s):

Ahlam Wahdan ◽

Said A. Salloum ◽

Khaled Shaalan

Keyword(s):

Natural Language Processing ◽

Qualitative Study ◽

Natural Language ◽

Language Processing ◽

Text Classification

Analysis of the Impact of the US Presidential Election on the US Economy Based on Natural Language Processing and Big Data

Computational and Experimental Simulations in Engineering - Mechanisms and Machine Science ◽

10.1007/978-3-030-67090-0_39 ◽

2021 ◽

pp. 483-494

Author(s):

Mingzhen Li ◽

Xiangdong Liu

Keyword(s):

Big Data ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Presidential Election ◽

US Economy ◽

The US ◽

US Presidential Election ◽

The Impact

A Natural Language Processing Approach to Automated Highlighting of New Information in Clinical Notes

Applied Sciences ◽

10.3390/app10082824 ◽

2020 ◽

Vol 10 (8) ◽

pp. 2824

Author(s):

Yu-Hsiang Su ◽

Ching-Ping Chao ◽

Ling-Chien Hung ◽

Sheng-Feng Sung ◽

Pei-Ju Lee

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Task Performance ◽

Language Processing ◽

Automated Identification ◽

Clinical Notes ◽

New Information ◽

Perceived Workload ◽

The Impact ◽

User Experiment

Electronic medical records (EMRs) have been used extensively in most medical institutions for more than a decade in Taiwan. However, information overload associated with rapid accumulation of large amounts of clinical narratives has threatened the effective use of EMRs. This situation is further worsened by the use of “copying and pasting”, leading to lots of redundant information in clinical notes. This study aimed to apply natural language processing techniques to address this problem. New information in longitudinal clinical notes was identified based on a bigram language model. The accuracy of automated identification of new information was evaluated using expert annotations as the reference standard. A two-stage cross-over user experiment was conducted to evaluate the impact of highlighting of new information on task demands, task performance, and perceived workload. The automated method identified new information with an F1 score of 0.833. The user experiment found a significant decrease in perceived workload associated with a significantly higher task performance. In conclusion, automated identification of new information in clinical notes is feasible and practical. Highlighting of new information enables healthcare professionals to grasp key information from clinical notes with less perceived workload.
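A simplified sketch of the idea follows, under stated assumptions: here a bigram in the current note counts as new if it never appeared in earlier notes, which is a set-membership stand-in for the bigram language model named in the abstract, not the study's method. The note texts and function names are hypothetical. The F1 helper shows how such output could be scored against reference annotations.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bigrams(tokens):
    """Consecutive token pairs, in order."""
    return list(zip(tokens, tokens[1:]))

def new_information(previous_notes, current_note):
    """Bigrams of the current note that occur in none of the previous notes."""
    seen = set()
    for note in previous_notes:
        seen.update(bigrams(tokenize(note)))
    return [bg for bg in bigrams(tokenize(current_note)) if bg not in seen]

def f1_score(predicted, reference):
    """Harmonic mean of precision and recall over two collections of items."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)

# Hypothetical notes: the second copies the first and appends one new finding.
previous = ["Patient has hypertension. Continue amlodipine."]
current = "Patient has hypertension. Continue amlodipine. New onset chest pain."

novel = new_information(previous, current)
```

Copied text yields no novel bigrams, so only the appended finding (plus the bigram bridging into it) would be highlighted.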

Text Classification for Clinical Trial Operations: Evaluation and Comparison of Natural Language Processing Techniques

Therapeutic Innovation & Regulatory Science ◽

10.1007/s43441-020-00236-x ◽

2020 ◽

Author(s):

Emma Richard ◽

Bhargava Reddy

Keyword(s):

Clinical Trial ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Classification ◽

Processing Techniques
