Combining an Expert-Based Medical Entity Recognizer to a Machine-Learning System: Methods and a Case Study

Medical entity recognition is now generally performed by data-driven methods based on supervised machine learning. Expert-based systems, where linguistic and domain expertise are directly provided to the system, are often combined with data-driven systems. We present here a case study where an existing expert-based medical entity recognition system, Ogmios, is combined with a data-driven system, Caramba, based on a linear-chain Conditional Random Field (CRF) classifier. Our case study specifically highlights the risk of overfitting incurred by an expert-based system. We observe that it prevents the combination of the two systems from obtaining improvements in precision, recall, or F-measure, and analyze the underlying mechanisms through a post-hoc feature-level analysis. Wrapping the expert-based system alone as attributes input to a CRF classifier does boost its F-measure from 0.603 to 0.710, bringing it on par with the data-driven system. The generalization of this method remains to be further investigated.
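
As a rough illustration of "wrapping the expert-based system as attributes input to a CRF classifier", the sketch below turns the tags emitted by a rule-based recognizer into per-token attributes. The tag labels (`B-PROBLEM`, `I-PROBLEM`), the attribute names, and the example sentence are hypothetical; this is not the actual feature set of Ogmios or Caramba, only the general pattern of feeding one system's output to a sequence classifier as features.

```python
from typing import Dict, List


def token_features(tokens: List[str], expert_tags: List[str], i: int) -> Dict[str, object]:
    """Build the attribute dict for token i from surface cues and expert tags."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        # Output of the (hypothetical) expert system, used as a plain attribute.
        "expert.tag": expert_tags[i],
    }
    # Neighbouring expert tags give the classifier some context.
    if i > 0:
        feats["-1:expert.tag"] = expert_tags[i - 1]
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["+1:expert.tag"] = expert_tags[i + 1]
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str], expert_tags: List[str]) -> List[Dict[str, object]]:
    """One attribute dict per token; tokens and tags must be aligned."""
    if len(tokens) != len(expert_tags):
        raise ValueError("tokens and expert_tags must have the same length")
    return [token_features(tokens, expert_tags, i) for i in range(len(tokens))]


# Hypothetical sentence and expert-system output.
tokens = ["Patient", "denies", "chest", "pain"]
expert_tags = ["O", "O", "B-PROBLEM", "I-PROBLEM"]
X = sentence_features(tokens, expert_tags)
```

A list of such dicts per sentence is the input shape that many CRF toolkits accept. If the expert tags were tuned on the same corpus used for training, the classifier may lean on them too heavily, which is one way the overfitting risk described above could arise.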

Data-Driven Machine Learning System for Optimization of Processes Supporting the Distribution of Goods and Services – a case study

Procedia Manufacturing ◽

10.1016/j.promfg.2020.02.205 ◽

2020 ◽

Vol 44 ◽

pp. 60-67

Author(s):

Zbigniew Tarapata ◽

Tadeusz Nowicki ◽

Ryszard Antkiewicz ◽

Jaroslaw Dudzinski ◽

Konrad Janik

Keyword(s):

Machine Learning ◽

Learning System ◽

Data Driven ◽

Goods And Services

Machine learning-based named entity recognition via effective integration of various evidences

Natural Language Engineering ◽

10.1017/s1351324904003559 ◽

2005 ◽

Vol 11 (2) ◽

pp. 189-206 ◽

Cited By ~ 4

Author(s):

GUODONG ZHOU ◽

JIAN SU

Keyword(s):

Machine Learning ◽

Named Entity Recognition ◽

Learning System ◽

Training Data ◽

Entity Recognition ◽

Named Entity ◽

Data Sparseness ◽

Constraint Relaxation ◽

Text Document ◽

F Measure

Named entity recognition identifies and classifies entity names in a text document into some predefined categories. It resolves the “who”, “where” and “how much” problems in information extraction and leads to the resolution of the “what” and “how” problems in further processing. This paper presents a Hidden Markov Model (HMM) and proposes an HMM-based named entity recognizer implemented as the system PowerNE. Through the HMM and an effective constraint relaxation algorithm to deal with the data sparseness problem, PowerNE is able to effectively apply and integrate various kinds of internal and external evidence of entity names. Currently, four types of evidence are included: (1) a simple deterministic internal feature of the words, such as capitalization and digitalization; (2) an internal semantic feature of the important triggers; (3) an internal gazetteer feature, which determines the appearance of the current word string in the provided gazetteer list; and (4) an external macro context feature, which deals with the name alias phenomena. In this way, the named entity recognition problem is resolved effectively. PowerNE has been benchmarked with the Message Understanding Conferences (MUC) data. The evaluation shows that, using the formal training and test data of the MUC-6 and MUC-7 English named entity tasks, it achieves F-measures of 96.6 and 94.1, respectively. Compared with the best reported machine learning system, it achieves a 1.7 higher F-measure with one quarter of the training data on MUC-6, and a 3.6 higher F-measure with one ninth of the training data on MUC-7. In addition, it performs slightly better than the best reported handcrafted rule-based systems on MUC-6 and MUC-7.
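
The four kinds of evidence listed in the abstract above can be pictured as simple per-word indicators. The sketch below is only an illustration under stated assumptions: the function names, the gazetteer, the trigger list, and the acronym-based alias test are all hypothetical stand-ins, not PowerNE's actual features or its HMM.

```python
from typing import Dict, List, Set


def is_acronym_of(word: str, name: str) -> bool:
    """Toy alias test: does `word` spell the initials of a previously seen name?"""
    initials = "".join(part[0] for part in name.split())
    return len(word) > 1 and word == initials


def word_evidence(tokens: List[str], i: int, gazetteer: Set[str],
                  triggers: Set[str], seen_names: List[str]) -> Dict[str, bool]:
    """Collect the four kinds of evidence for token i as boolean indicators."""
    word = tokens[i]
    return {
        # (1) deterministic internal features: capitalization and digits
        "capitalized": word[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        # (2) internal semantic feature: trigger words such as titles
        "is_trigger": word.lower() in triggers,
        # (3) internal gazetteer feature: lookup in a provided list
        "in_gazetteer": word in gazetteer,
        # (4) external macro-context feature: alias of a name seen earlier
        "alias_of_seen": any(is_acronym_of(word, name) for name in seen_names),
    }


# Hypothetical inputs for illustration only.
tokens = ["Mr.", "Smith", "joined", "IBM", "in", "1997"]
gazetteer = {"Smith"}
triggers = {"mr."}
seen_names = ["International Business Machines"]

evidence = [word_evidence(tokens, i, gazetteer, triggers, seen_names)
            for i in range(len(tokens))]
```

In a tagger, indicators like these would condition the model's probabilities rather than decide labels on their own.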

Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics

JCO Clinical Cancer Informatics ◽

10.1200/cci.19.00008 ◽

2019 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Tomasz Oliwa ◽

Steven B. Maron ◽

Leah M. Chase ◽

Samantha Lomnicki ◽

Daniel V.T. Catenacci ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Classification Model ◽

Supervised Machine Learning ◽

Named Entity ◽

Pathology Reports

PURPOSE Robust institutional tumor banks depend on continuous sample curation or else subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use in a manner that can be re-implemented by other institutions.

PATIENTS AND METHODS Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step.

RESULTS We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance.

CONCLUSION Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.

Semi-supervised machine learning approaches for predicting the chronology of archaeological sites: A case study of temples from medieval Angkor, Cambodia

PLoS ONE ◽

10.1371/journal.pone.0205649 ◽

2018 ◽

Vol 13 (11) ◽

pp. e0205649 ◽

Cited By ~ 4

Author(s):

Sarah Klassen ◽

Jonathan Weed ◽

Damian Evans

Keyword(s):

Machine Learning ◽

Supervised Machine Learning ◽

Archaeological Sites ◽

Learning Approaches

Discovering Business Processes from Email Logs using fastText and Process Mining

10.36227/techrxiv.12283835 ◽

2020 ◽

Author(s):

Yaghoub Rashnavadi ◽

Sina Behzadifard ◽

Reza Farzadnia ◽

Sina Zamani

Keyword(s):

Machine Learning ◽

Business Processes ◽

Oil And Gas ◽

Process Mining ◽

The Body ◽

Process Models ◽

Supervised Machine Learning ◽

Implicit Information ◽

Oil And Gas Sector

Communication has never been more accessible than today. With the help of instant messengers and email services, millions of people can transfer information with ease, and this trend has affected organizations as well. Billions of organizational emails are sent or received daily, and their main goal is to facilitate the daily operation of organizations. Behind this vast corpus of human-generated content, there is much implicit information that can be mined and used to improve or optimize organizations’ operations. Business processes are one such area of implicit knowledge that can be discovered from an organization's email logs, as most communications are carried out through email. The purpose of this research is to propose an approach to discover process models from the email log. In this approach, we combine two tools: supervised machine learning and process mining. Using a supervised machine learning classifier, fastText, we classify the body text of emails into activity-related classes. The generated log is then mined with process mining techniques to find process models. We illustrate the approach with a case study of a company from the oil and gas sector.
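
To make the second stage of this approach concrete, here is a minimal sketch of one basic process-mining primitive, the directly-follows graph, built from an event log. It assumes the classification step has already assigned an activity label to each email; the thread identifiers, timestamps, and activity names are hypothetical, and the paper's actual mining technique may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, int, str]  # (case_id, timestamp, activity)


def directly_follows(event_log: List[Event]) -> Counter:
    """Count how often activity a is immediately followed by b within a case."""
    traces: Dict[str, List[str]] = defaultdict(list)
    # Order events by case and time so each trace is chronological.
    for case_id, _ts, activity in sorted(event_log, key=lambda e: (e[0], e[1])):
        traces[case_id].append(activity)
    dfg: Counter = Counter()
    for activities in traces.values():
        for a, b in zip(activities, activities[1:]):
            dfg[(a, b)] += 1
    return dfg


# Hypothetical log: one row per email, with the activity predicted by a classifier.
log = [
    ("thread-1", 1, "request quote"),
    ("thread-1", 2, "approve"),
    ("thread-1", 3, "send order"),
    ("thread-2", 1, "request quote"),
    ("thread-2", 2, "reject"),
    ("thread-3", 2, "approve"),
    ("thread-3", 1, "request quote"),
]
dfg = directly_follows(log)
```

The edge counts in `dfg` are the raw material from which discovery algorithms derive a process model. Treating an email thread as a case is itself an assumption that would need checking against real data.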

Data-Driven Machine-Learning Model in District Heating System for Heat Load Prediction: A Comparison Study

Applied Computational Intelligence and Soft Computing ◽

10.1155/2016/3403150 ◽

2016 ◽

Vol 2016 ◽

pp. 1-11 ◽

Cited By ~ 10

Author(s):

Fisnik Dalipi ◽

Sule Yildirim Yayilgan ◽

Alemayehu Gebremedhin

Keyword(s):

Machine Learning ◽

Heat Load ◽

District Heating ◽

Heating System ◽

Partial Least Square ◽

Supervised Machine Learning ◽

Data Driven ◽

Support Vector ◽

Load Prediction ◽

District Heating System

We present our data-driven supervised machine-learning (ML) model to predict heat load for buildings in a district heating system (DHS). Even though ML has been used as an approach to heat load prediction in the literature, it is hard to select an approach that will qualify as a solution for our case, as existing solutions are quite problem specific. For that reason, we compared and evaluated three ML algorithms within a framework on operational data from a DH system in order to generate the required prediction model. The algorithms examined are Support Vector Regression (SVR), Partial Least Squares (PLS), and random forest (RF). We use data collected from buildings at several locations over a period of 29 weeks. To assess the accuracy of heat load prediction, we evaluate the performance of the proposed algorithms using mean absolute error (MAE), mean absolute percentage error (MAPE), and the correlation coefficient. In order to determine which algorithm had the best accuracy, we conducted a performance comparison among these ML algorithms. The comparison indicates that, for DH heat load prediction, the SVR method presented in this paper is the most efficient of the three, and also when compared to other methods found in the literature.
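
The three evaluation measures named in this abstract have standard definitions, sketched below in plain Python. The heat-load numbers are made up for illustration, and the correlation coefficient is assumed here to be Pearson's r, which the abstract does not specify.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observations and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Made-up heat loads (e.g. in kW) and model predictions.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]

print(f"MAE  = {mae(observed, predicted):.2f}")
print(f"MAPE = {mape(observed, predicted):.2f} %")
print(f"r    = {pearson_r(observed, predicted):.4f}")
```

MAE is in the units of the load itself, while MAPE is relative. A model can therefore rank differently under the two measures when loads vary widely between buildings.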

Implementation of n-gram Methodology for Rotten Tomatoes Review Dataset Sentiment Analysis

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/ijkdb.2017010103 ◽

2017 ◽

Vol 7 (1) ◽

pp. 30-41 ◽

Cited By ~ 12

Author(s):

Prayag Tiwari ◽

Brojo Kishore Mishra ◽

Sachin Kumar ◽

Vivek Kumar

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Maximum Entropy ◽

Learning Strategies ◽

Supervised Machine Learning ◽

Support Vector ◽

N Gram ◽

F Measure ◽

Blog Posts

Sentiment analysis aims to capture the underlying viewpoint of a piece of content, which may be anything that holds a subjective opinion, for example, an online review, comments on blog posts, or a film rating. These reviews and blog posts can be classified into different polarity groups, such as negative, positive, and neutral, in order to extract information from the input dataset. Supervised machine learning strategies are used to classify these reviews. In this paper, three different machine learning algorithms, namely Support Vector Machine (SVM), Maximum Entropy (ME), and Naive Bayes (NB), are considered for the classification of human opinions. The accuracy of the different methods is critically examined in order to assess their performance on the basis of parameters such as precision, recall, F-measure, and accuracy.
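
For reference, the standard evaluation measures for a classifier of this kind (precision, recall, F-measure, and accuracy) can be computed as in the sketch below. The labels and predictions are invented for illustration and are not results from the paper.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[str], y_pred: Sequence[str],
                        positive: str) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one class treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Invented gold labels and predictions for five reviews.
gold = ["pos", "pos", "neg", "neg", "pos"]
pred = ["pos", "neg", "pos", "neg", "pos"]

p, r, f = precision_recall_f1(gold, pred, positive="pos")
acc = accuracy(gold, pred)
```

With more than two polarity classes, these per-class scores would typically be averaged across classes, a choice the abstract does not specify.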

Data-driven sensitivity analysis of complex machine learning models: A case study of directional drilling

Journal of Petroleum Science and Engineering ◽

10.1016/j.petrol.2020.107630 ◽

2020 ◽

Vol 195 ◽

pp. 107630

Author(s):

Andrzej T. Tunkiel ◽

Dan Sui ◽

Tomasz Wiktorski

Keyword(s):

Machine Learning ◽

Sensitivity Analysis ◽

Data Driven ◽

Directional Drilling ◽

Learning Models ◽

Machine Learning Models

Towards a Multi-Layered Phishing Detection

Sensors ◽

10.3390/s20164540 ◽

2020 ◽

Vol 20 (16) ◽

pp. 4540

Author(s):

Kieran Rendall ◽

Antonia Nisioti ◽

Alexios Mylonas

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Detection System ◽

Single Layer ◽

Supervised Machine Learning ◽

Data Driven ◽

Feature Sets ◽

Phishing Attacks ◽

Production Environments ◽

Phishing Detection

Phishing is one of the most common threats that users face while browsing the web. In the current threat landscape, a targeted phishing attack (i.e., spear phishing) often constitutes the first action of a threat actor during an intrusion campaign. To tackle this threat, many data-driven approaches have been proposed, which mostly rely on the use of supervised machine learning under a single-layer approach. However, such approaches are resource-demanding and, thus, their deployment in production environments is infeasible. Moreover, most previous works utilise a feature set that can be easily tampered with by adversaries. In this paper, we investigate the use of a multi-layered detection framework in which a potential phishing domain is classified multiple times by models using different feature sets. In our work, an additional classification takes place only when the initial one scores below a predefined confidence level, which is set by the system owner. We demonstrate our approach by implementing a two-layered detection system, which uses supervised machine learning to identify phishing attacks. We evaluate our system with a dataset consisting of active phishing attacks and find that its performance is comparable to the state of the art.
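
The control flow described above, where a second classification runs only when the first scores below a confidence threshold, can be sketched as follows. The two toy layers, their features (`url_length`, `has_login_form`), and the scoring rules are hypothetical stand-ins, not the models or feature sets used in the paper.

```python
from typing import Callable, Dict, Tuple

Sample = Dict[str, float]
Classifier = Callable[[Sample], Tuple[str, float]]  # returns (label, confidence)


def layered_classify(sample: Sample, first: Classifier, second: Classifier,
                     threshold: float) -> Tuple[str, float, int]:
    """Return (label, confidence, layer_used); call `second` only if needed."""
    label, confidence = first(sample)
    if confidence >= threshold:
        return label, confidence, 1
    label, confidence = second(sample)
    return label, confidence, 2


def url_layer(sample: Sample) -> Tuple[str, float]:
    """Toy first layer: a cheap score based on URL length only."""
    score = min(1.0, sample["url_length"] / 100.0)
    label = "phishing" if score >= 0.5 else "benign"
    confidence = abs(score - 0.5) * 2.0  # distance from the decision boundary
    return label, confidence


def content_layer(sample: Sample) -> Tuple[str, float]:
    """Toy second layer: a costlier check using a different feature."""
    label = "phishing" if sample["has_login_form"] else "benign"
    return label, 0.9


confident = layered_classify({"url_length": 95, "has_login_form": 0},
                             url_layer, content_layer, threshold=0.8)
uncertain = layered_classify({"url_length": 55, "has_login_form": 1},
                             url_layer, content_layer, threshold=0.8)
```

Raising `threshold` sends more samples to the second layer, trading compute for an extra opinion. In the setting described above, that threshold is chosen by the system owner.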

Enhance 3D Point Cloud Accuracy Through Supervised Machine Learning for Automated Rolling Stock Maintenance: A Railway Sector Case Study

2018 International Conference on Computing, Electronics & Communications Engineering (iCCECE) ◽

10.1109/iccecome.2018.8658788 ◽

2018 ◽

Author(s):

Randika K. W. Vithanage ◽

Colin S. Harrison ◽

Anjali K. M. M. DeSilva

Keyword(s):

Machine Learning ◽

Point Cloud ◽

Supervised Machine Learning ◽

Rolling Stock ◽

3D Point Cloud
