Application of natural language processing algorithms for extracting information from news articles in event-based surveillance

2020 ◽  
pp. 186-191
Author(s):  
Victoria Ng ◽  
Erin E Rees ◽  
Jingcheng Niu ◽  
Abdelhamid Zaghlool


2019 ◽  
Author(s):  
Auss Abbood ◽  
Alexander Ullrich ◽  
Rüdiger Busche ◽  
Stéphane Ghozzi

Abstract. According to the World Health Organization (WHO), around 60% of all outbreaks are detected using informal sources. In many public health institutes, including the WHO and the Robert Koch Institute (RKI), dedicated groups of epidemiologists sift through numerous articles and newsletters to detect relevant events. This media screening is one important part of event-based surveillance (EBS). Reading the articles, discussing their relevance, and putting key information into a database is a time-consuming process. To support EBS, but also to gain insight into what makes an article and the event it describes relevant, we developed a natural-language-processing framework for automated information extraction and relevance scoring. First, we scraped sources relevant for EBS as done at RKI (WHO Disease Outbreak News and ProMED) and automatically extracted each article’s key data: disease, country, date, and confirmed-case count. For this, we performed named entity recognition in two steps: EpiTator, an open-source epidemiological annotation tool, suggested many candidates for each field, and a naive Bayes classifier, trained with RKI’s EBS database as labels, selected the single most likely one. Then, for relevance scoring, we defined two classes to which any article might belong: an article is relevant if it is in the EBS database and irrelevant otherwise. We compared the performance of different classifiers, using document and word embeddings. Two of the tested algorithms stood out: the multilayer perceptron performed best overall, with a precision of 0.19, recall of 0.50, specificity of 0.89, F1 of 0.28, and the highest tested index balanced accuracy of 0.46. The support-vector machine, on the other hand, had the highest recall (0.88), which can be of greater interest to epidemiologists. Finally, we integrated these functionalities into a web application called EventEpi, where relevant sources are automatically analyzed and put into a database. The user can also provide any URL or text, which will be analyzed in the same way and added to the database. Each of these steps could be improved, in particular with larger labeled datasets and fine-tuning of the learning algorithms. The overall framework, however, already works well and can be used in production, promising improvements in EBS. The source code is publicly available at https://github.com/aauss/EventEpi.
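The two-step pipeline lends itself to standard tooling. The sketch below illustrates only the relevance-scoring step with scikit-learn; the toy articles, labels, and TF-IDF features are stand-ins for the paper's WHO DON/ProMED documents and embedding representations, not the authors' actual setup.

```python
# Minimal sketch of relevance scoring: articles labeled relevant (would be
# in the EBS database) or irrelevant, classified with an MLP and an SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score, f1_score

texts = [
    "Cholera outbreak reported in region X, 120 confirmed cases",
    "New hospital wing opens after renovation",
    "Avian influenza detected in poultry farms, cull ordered",
    "Local festival draws record crowds",
]
labels = [1, 0, 1, 0]  # 1 = relevant (in the EBS database), 0 = irrelevant

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, labels)
svm = SVC(class_weight="balanced").fit(X, labels)  # recall-oriented choice

for name, clf in [("MLP", mlp), ("SVM", svm)]:
    pred = clf.predict(X)
    print(name,
          "precision", precision_score(labels, pred),
          "recall", recall_score(labels, pred),
          "F1", f1_score(labels, pred))
```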


Energies ◽  
2019 ◽  
Vol 12 (17) ◽  
pp. 3258 ◽  
Author(s):  
Bai ◽  
Sun ◽  
Zang ◽  
Zhang ◽  
Shen ◽  
...  

Power dispatching systems currently receive massive, complicated, and irregular monitoring alarms during their operation, which prevents controllers from making accurate judgments on the alarm events that occur within a short period of time. Given the low efficiency of current monitoring alarm processing, this paper proposes a method for identifying grid monitoring alarm events based on natural language processing (NLP) and a hybrid model that combines a long short-term memory (LSTM) network with a convolutional neural network (CNN). Firstly, the characteristics of the alarm information text were analyzed, summarized, and preprocessed. Then, the monitoring alarm information was vectorized using the Word2vec model. Finally, a monitoring alarm event identification model combining LSTM and CNN was built to match the characteristics of the alarm information. The feasibility and effectiveness of the method were verified by comparison with multiple identification models.
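As an illustration of such a hybrid architecture, here is a minimal Keras sketch. The vocabulary size, sequence length, class count, and randomly generated batch are placeholders, and the paper initializes the embedding layer from Word2vec vectors rather than training it from scratch as done here.

```python
# Hypothetical LSTM+CNN hybrid for alarm-event classification.
import numpy as np
from tensorflow.keras import layers, models

VOCAB, SEQ_LEN, N_EVENTS = 5000, 50, 8  # illustrative sizes

model = models.Sequential([
    layers.Embedding(VOCAB, 100),          # would be Word2vec-initialized
    layers.LSTM(64, return_sequences=True),   # sequential alarm context
    layers.Conv1D(64, 3, activation="relu"),  # local n-gram features
    layers.GlobalMaxPooling1D(),
    layers.Dense(N_EVENTS, activation="softmax"),  # alarm-event classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy batch: token-id sequences standing in for vectorized alarm messages.
x = np.random.randint(0, VOCAB, size=(32, SEQ_LEN))
y = np.random.randint(0, N_EVENTS, size=(32,))
model.fit(x, y, epochs=1, verbose=0)
```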


2020 ◽  
Author(s):  
Michael Prendergast

Abstract – A Verification Cross-Reference Matrix (VCRM) is a table that depicts the verification methods for requirements in a specification. Usually requirement labels are rows, available verification methods are columns, and an “X” in a cell indicates usage of a verification method for that requirement. Verification methods include Demonstration, Inspection, Analysis, and Test, and sometimes Certification, Similarity, and/or Analogy. VCRMs enable acquirers and stakeholders to quickly understand how a product’s requirements will be tested. Maintaining consistency of very large VCRMs can be challenging, and inconsistent verification methods can result in a large set of uncoordinated “spaghetti tests”. Natural language processing algorithms that can identify similarities between requirements offer promise in addressing this challenge. This paper applies and compares four natural language processing algorithms for automatically populating VCRMs from natural language requirements: (a) Naïve Bayesian inference, (b) Nearest Neighbor with weighted Dice similarity, (c) Nearest Neighbor with Latent Semantic Analysis similarity, and (d) an ensemble method combining the first three approaches. The VCRMs used in this study cover slot machine technical requirements derived from gaming regulations of Australia and New Zealand, the province of Nova Scotia (Canada), and the state of Michigan (United States), as well as recommendations from the International Association of Gaming Regulators (IAGR).
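As a hypothetical illustration of the nearest-neighbor-by-Dice-similarity idea, the sketch below assigns a new requirement the verification methods of its most similar labeled requirement. The example requirements are invented, and the token sets here are unweighted; the paper's weighted variant would additionally weight tokens (e.g., by rarity).

```python
# Nearest-neighbor VCRM population via Dice similarity over token sets.
def dice(a, b):
    """Dice similarity between two token sets: 2|A∩B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

# Illustrative labeled requirements mapped to their verification methods.
labeled = {
    "The machine shall display the current credit balance": {"Demonstration"},
    "The RNG shall pass statistical randomness analysis": {"Analysis", "Test"},
}

def predict_methods(requirement):
    """Inherit the methods of the most Dice-similar labeled requirement."""
    tokens = requirement.lower().split()
    best = max(labeled, key=lambda r: dice(tokens, r.lower().split()))
    return labeled[best]

print(predict_methods("The machine shall display the balance"))
# -> {'Demonstration'}
```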


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Lu Zhou ◽  
Shuangqiao Liu ◽  
Caiyan Li ◽  
Yuemeng Sun ◽  
Yizhuo Zhang ◽  
...  

Background. The modernization of traditional Chinese medicine (TCM) demands systematic data mining of medical records. This process is hindered, however, by the fact that many TCM symptoms share the same meaning but have different literal expressions (i.e., TCM synonymous symptoms). The problem can be addressed by using natural language processing algorithms to construct a high-quality TCM symptom normalization model that maps TCM synonymous symptoms to unified literal expressions. Methods. Four types of TCM symptom normalization models based on natural language processing were constructed in search of a high-quality one: (1) a text sequence generation model based on a bidirectional long short-term memory (Bi-LSTM) neural network with an encoder-decoder structure; (2) a text classification model based on a Bi-LSTM neural network and a sigmoid function; (3) a text sequence generation model based on bidirectional encoder representations from transformers (BERT) with the sequence-to-sequence training method of the unified language model (BERT-UniLM); and (4) a text classification model based on BERT and a sigmoid function (BERT-Classification). The performance of the models was compared using four metrics: accuracy, recall, precision, and F1-score. Results. The BERT-Classification model outperformed the Bi-LSTM and BERT-UniLM models on all four metrics. Conclusions. The BERT-Classification model is superior for normalizing expressions of TCM synonymous symptoms.
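A minimal sketch of the BERT-Classification formulation follows: normalization is framed as classifying a raw symptom phrase over a fixed vocabulary of standard terms. The checkpoint, label set, and example phrase are placeholders, and the freshly initialized classification head would need fine-tuning on labeled synonym pairs before its predictions mean anything.

```python
# Symptom normalization as sequence classification with a sigmoid output,
# mirroring the BERT-Classification setup described in the abstract.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

STANDARD_TERMS = ["头痛", "乏力", "失眠"]  # illustrative normalized symptoms

tok = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese",
    num_labels=len(STANDARD_TERMS),
    problem_type="multi_label_classification",  # sigmoid over labels
)
# NOTE: the classification head is randomly initialized here; fine-tune on
# (raw expression, standard term) pairs before using the predictions.

inputs = tok("头部胀痛", return_tensors="pt")  # a synonymous raw expression
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits)[0]
print(STANDARD_TERMS[int(probs.argmax())])  # most likely normalized term
```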

