An Extensive Study on Cross-Dataset Bias and Evaluation Metrics Interpretation for Machine Learning Applied to Gastrointestinal Tract Abnormality Classification

PurposeThe primary aim of this study is to review the studies from different dimensions including type of methods, experimentation setup and evaluation metrics used in the novel approaches proposed for data imputation, particularly in the machine learning (ML) area. This ultimately provides an understanding about how well the proposed framework is evaluated and what type and ratio of missingness are addressed in the proposals. The review questions in this study are (1) what are the ML-based imputation methods studied and proposed during 2010–2020? (2) How the experimentation setup, characteristics of data sets and missingness are employed in these studies? (3) What metrics were used for the evaluation of imputation method?Design/methodology/approachThe review process went through the standard identification, screening and selection process. The initial search on electronic databases for missing value imputation (MVI) based on ML algorithms returned a large number of papers totaling at 2,883. Most of the papers at this stage were not exactly an MVI technique relevant to this study. The literature reviews are first scanned in the title for relevancy, and 306 literature reviews were identified as appropriate. Upon reviewing the abstract text, 151 literature reviews that are not eligible for this study are dropped. This resulted in 155 research papers suitable for full-text review. From this, 117 papers are used in assessment of the review questions.FindingsThis study shows that clustering- and instance-based algorithms are the most proposed MVI methods. Percentage of correct prediction (PCP) and root mean square error (RMSE) are most used evaluation metrics in these studies. For experimentation, majority of the studies sourced the data sets from publicly available data set repositories. A common approach is that the complete data set is set as baseline to evaluate the effectiveness of imputation on the test data sets with artificially induced missingness. The data set size and missingness ratio varied across the experimentations, while missing datatype and mechanism are pertaining to the capability of imputation. Computational expense is a concern, and experimentation using large data sets appears to be a challenge.Originality/valueIt is understood from the review that there is no single universal solution to missing data problem. Variants of ML approaches work well with the missingness based on the characteristics of the data set. Most of the methods reviewed lack generalization with regard to applicability. Another concern related to applicability is the complexity of the formulation and implementation of the algorithm. Imputations based on k-nearest neighbors (kNN) and clustering algorithms which are simple and easy to implement make it popular across various domains.

Download Full-text

Predicting lineage-specific differences in open chromatin across dozens of mammalian genomes

10.1101/2020.12.04.410795 ◽

2020 ◽

Author(s):

Irene M. Kaplow ◽

Morgan E. Wirthlin ◽

Alyssa J. Lawler ◽

Ashley R. Brown ◽

Michael Kleyman ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Genome Sequence ◽

Evaluation Metrics ◽

Open Chromatin ◽

Learning Models ◽

Cell Type ◽

Mammalian Genomes ◽

Multiple Species ◽

Machine Learning Models

ABSTRACTMany phenotypes have evolved through gene expression, meaning that differences between species are caused in part by differences in enhancers. Here, we demonstrate that we can accurately predict differences between species in open chromatin status at putative enhancers using machine learning models trained on genome sequence across species. We present a new set of criteria that we designed to explicitly demonstrate if models are useful for studying open chromatin regions whose orthologs are not open in every species. Our approach and evaluation metrics can be applied to any tissue or cell type with open chromatin data available from multiple species.

Download Full-text

Towards Deep Learning-Based Approach for Detecting Android Malware

Research Anthology on Artificial Intelligence Applications in Security ◽

10.4018/978-1-7998-7705-9.ch096 ◽

2021 ◽

pp. 2193-2219

Author(s):

Jarrett Booz ◽

Josh McGiff ◽

William G. Hatcher ◽

Wei Yu ◽

James Nguyen ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Learning Environment ◽

Malware Detection ◽

Extensive Study ◽

Detection Accuracy ◽

Android Malware ◽

Android Malware Detection ◽

Mobile Malware Detection ◽

Optimal Settings

In this article, the authors implement a deep learning environment and fine-tune parameters to determine the optimal settings for the classification of Android malware from extracted permission data. By determining the optimal settings, the authors demonstrate the potential performance of a deep learning environment for Android malware detection. Specifically, an extensive study is conducted on various hyper-parameters to determine optimal configurations, and then a performance evaluation is carried out on those configurations to compare and maximize detection accuracy in our target networks. The results achieve a detection accuracy of approximately 95%, with an approximate F1 score of 93%. In addition, the evaluation is extended to include other machine learning frameworks, specifically comparing Microsoft Cognitive Toolkit (CNTK) and Theano with TensorFlow. The future needs are discussed in the realm of machine learning for mobile malware detection, including adversarial training, scalability, and the evaluation of additional data and features.

Download Full-text

Design and Implementation of Machine Learning Evaluation Metrics on HPCC Systems

2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS) ◽

10.1109/csitss47250.2019.9031056 ◽

2019 ◽

Author(s):

A. Suryanarayanan ◽

Arjuna Chala ◽

Lili Xu ◽

G Shobha ◽

Jyoti Shetty ◽

...

Keyword(s):

Machine Learning ◽

Evaluation Metrics ◽

Design And Implementation ◽

Learning Evaluation ◽

Hpcc Systems

Download Full-text

Software defect prediction: A multi-criteria decision-making approach

Nigerian Journal of Technological Research ◽

10.4314/njtr.v15i1.7 ◽

2020 ◽

Vol 15 (1) ◽

pp. 35-42

Author(s):

A.O. Balogun ◽

A.O. Bajeh ◽

H.A. Mojeed ◽

A.G. Akintola

Keyword(s):

Machine Learning ◽

Software Testing ◽

Evaluation Metrics ◽

Defect Prediction ◽

Software Systems ◽

Software Defect Prediction ◽

Learning Models ◽

Decision Method ◽

Software Defect ◽

Machine Learning Models

Failure of software systems as a result of software testing is very much rampant as modern software systems are large and complex. Software testing which is an integral part of the software development life cycle (SDLC), consumes both human and capital resources. As such, software defect prediction (SDP) mechanisms are deployed to strengthen the software testing phase in SDLC by predicting defect prone modules or components in software systems. Machine learning models are used for developing the SDP models with great successes achieved. Moreover, some studies have highlighted that a combination of machine learning models as a form of an ensemble is better than single SDP models in terms of prediction accuracy. However, the efficiency of machine learning models can change with diverse predictive evaluation metrics. Thus, more studies are needed to establish the effectiveness of ensemble SDP models over single SDP models. This study proposes the deployment of Multi-Criteria Decision Method (MCDM) techniques to rank machine learning models. Analytic Network Process (ANP) and Preference Ranking Organization Method for Enrichment Evaluation (PROMETHEE) which are types of MCDM techniques are deployed on 9 machine learning models with 11 performance evaluation metrics and 11 software defects datasets. The experimental results showed that ensemble SDP models are best appropriate SDP models as Boosted SMO and Boosted PART ranked highest for each of the MCDM techniques. Besides, the experimental results also validated the stand of not considering accuracy as the only performance evaluation metrics for SDP models. Conclusively, more performance metrics other than predictive accuracy should be considered when ranking and evaluating machine learning models. Keywords: Ensemble; Multi-Criteria Decision Method; Software Defect Prediction

Download Full-text

INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis

Algorithms for Molecular Biology ◽

10.1186/s13015-021-00198-1 ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Hooman Zabeti ◽

Nick Dexter ◽

Amir Hosein Safari ◽

Nafiseh Sedaghat ◽

Maxwell Libbrecht ◽

...

Keyword(s):

Machine Learning ◽

Drug Resistance ◽

Predictive Accuracy ◽

Group Testing ◽

Predictive Performance ◽

Machine Learning Techniques ◽

Evaluation Metrics ◽

Lower Accuracy ◽

Unseen Data ◽

The One

Abstract Motivation Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit a lower accuracy or lack the flexibility needed to generalize them to previously unseen data. Contribution In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, which yields highly accurate predictions, interpretable results, and is flexible enough to be optimized for various evaluation metrics at the same time. Results We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that it has a higher or comparable accuracy to that of commonly used machine learning models, and is able to identify variants in genes with previously reported association to drug resistance. Our method is intrinsically interpretable, and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed via The Python Package Index (Pypi) under ingotdr. This package is also compatible with most of the tools in the Scikit-learn machine learning library.

Download Full-text

Predictive Modeling for Frailty Conditions in Elderly People: Machine Learning Approaches (Preprint)

10.2196/preprints.16678 ◽

2019 ◽

Author(s):

Adane Tarekegn ◽

Fulvio Ricceri ◽

Giuseppe Costa ◽

Elisa Ferracin ◽

Mario Giacobini

Keyword(s):

Machine Learning ◽

Older Adults ◽

Elderly People ◽

Emergency Admission ◽

Evaluation Metrics ◽

Support Vector ◽

Learning Models ◽

Increased Risk ◽

Wide Range ◽

Machine Learning Models

BACKGROUND Frailty is one of the most critical age-related conditions in older adults. It is often recognized as a syndrome of physiological decline in late life, characterized by a marked vulnerability to adverse health outcomes. A clear operational definition of frailty, however, has not been agreed so far. There is a wide range of studies on the detection of frailty and their association with mortality. Several of these studies have focused on the possible risk factors associated with frailty in the elderly population while predicting who will be at increased risk of frailty is still overlooked in clinical settings. OBJECTIVE The objective of our study was to develop predictive models for frailty conditions in older people using different machine learning methods based on a database of clinical characteristics and socioeconomic factors. METHODS An administrative health database containing 1,095,612 elderly people aged 65 or older with 58 input variables and 6 output variables was used. We first identify and define six problems/outputs as surrogates of frailty. We then resolve the imbalanced nature of the data through resampling process and a comparative study between the different machine learning (ML) algorithms – Artificial neural network (ANN), Genetic programming (GP), Support vector machines (SVM), Random Forest (RF), Logistic regression (LR) and Decision tree (DT) – was carried out. The performance of each model was evaluated using a separate unseen dataset. RESULTS Predicting mortality outcome has shown higher performance with ANN (TPR 0.81, TNR 0.76, accuracy 0.78, F1-score 0.79) and SVM (TPR 0.77, TNR 0.80, accuracy 0.79, F1-score 0.78) than predicting the other outcomes. On average, over the six problems, the DT classifier has shown the lowest accuracy, while other models (GP, LR, RF, ANN, and SVM) performed better. All models have shown lower accuracy in predicting an event of an emergency admission with red code than predicting fracture and disability. In predicting urgent hospitalization, only SVM achieved better performance (TPR 0.75, TNR 0.77, accuracy 0.73, F1-score 0.76) with the 10-fold cross validation compared with other models in all evaluation metrics. CONCLUSIONS We developed machine learning models for predicting frailty conditions (mortality, urgent hospitalization, disability, fracture, and emergency admission). The results show that the prediction performance of machine learning models significantly varies from problem to problem in terms of different evaluation metrics. Through further improvement, the model that performs better can be used as a base for developing decision-support tools to improve early identification and prediction of frail older adults.

Download Full-text

Comparison of Stochastic and Machine Learning Methods for Multi-Step Ahead Forecasting of Hydrological Processes

10.20944/preprints201710.0133.v2 ◽

2018 ◽

Cited By ~ 2

Author(s):

Georgia Papacharalampous ◽

Hristos Tyralis ◽

Demetris Koutsoyiannis

Keyword(s):

Machine Learning ◽

Time Series ◽

Large Scale ◽

Simulation Experiment ◽

Extensive Study ◽

Discharge Time ◽

Machine Learning Methods ◽

Stationary Stochastic Processes ◽

The Subject ◽

First Time

Research within the field of hydrology often focuses on comparing stochastic to machine learning (ML) forecasting methods. The comparisons performed are all based on case studies, while an extensive study aiming to provide generalized results on the subject is missing. Herein, we compare 11 stochastic and 9 ML methods regarding their multi-step ahead forecasting properties by conducting 12 large-scale computational experiments based on simulations. Each of these experiments uses 2 000 time series generated by linear stationary stochastic processes. We conduct each simulation experiment twice; the first time using time series of 100 values and the second time using time series of 300 values. Additionally, we conduct a real-world experiment using 405 mean annual river discharge time series of 100 values. We quantify the performance of the methods using 18 metrics. The results indicate that stochastic and ML methods perform equally well.

Download Full-text

Machine learning using synthetic and real data: Similarity of evaluation metrics for different healthcare datasets and for different algorithms

Data Science and Knowledge Engineering for Sensing Decision Support ◽

10.1142/9789813273238_0160 ◽

2018 ◽

Cited By ~ 3

Author(s):

Rachel Heyburn ◽

Raymond R. Bond ◽

Michaela Black ◽

Maurice Mulvenna ◽

Jonathan Wallace ◽

...

Keyword(s):

Machine Learning ◽

Real Data ◽

Evaluation Metrics ◽

Data Similarity

Download Full-text

Towards Deep Learning-Based Approach for Detecting Android Malware

International Journal of Software Innovation ◽

10.4018/ijsi.2019100101 ◽

2019 ◽

Vol 7 (4) ◽

pp. 1-24 ◽

Cited By ~ 1

Author(s):

Jarrett Booz ◽

Josh McGiff ◽

William G. Hatcher ◽

Wei Yu ◽

James Nguyen ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Learning Environment ◽

Malware Detection ◽

Extensive Study ◽

Detection Accuracy ◽

Android Malware ◽

Android Malware Detection ◽

Mobile Malware Detection ◽

Optimal Settings

In this article, the authors implement a deep learning environment and fine-tune parameters to determine the optimal settings for the classification of Android malware from extracted permission data. By determining the optimal settings, the authors demonstrate the potential performance of a deep learning environment for Android malware detection. Specifically, an extensive study is conducted on various hyper-parameters to determine optimal configurations, and then a performance evaluation is carried out on those configurations to compare and maximize detection accuracy in our target networks. The results achieve a detection accuracy of approximately 95%, with an approximate F1 score of 93%. In addition, the evaluation is extended to include other machine learning frameworks, specifically comparing Microsoft Cognitive Toolkit (CNTK) and Theano with TensorFlow. The future needs are discussed in the realm of machine learning for mobile malware detection, including adversarial training, scalability, and the evaluation of additional data and features.

Download Full-text