A New Stacking Based Ensemble Predictive Model to Predict the Mortality after Heart Surgery on a Highly Imbalanced Dataset

Mapping Intimacies ◽

10.20944/preprints202109.0181.v1 ◽

2021 ◽

Author(s):

Arman Ghavidel ◽

Rouzbeh Ghousi ◽

Alireza Atashi

Keyword(s):

Machine Learning ◽

Predictive Model ◽

Heart Surgery ◽

Cross Validation ◽

Nearest Neighbor ◽

Sampling Algorithm ◽

Predicting Performance ◽

Unbalanced Dataset ◽

Under Sampling ◽

Fold Cross Validation

Nowadays, according to spectacular improvement in health care and biomedical level, a tremendous amount of data is recorded by hospitals. In addition, the most effective approach to reduce disease mortality is to diagnose it as soon as possible. As a result, data mining by applying machine learning in the field of diseases provides good opportunities to examine the hidden patterns of this collection. An exact forecast of the mortality after heart surgery will cause Successful medical treatment and fewer costs. This research wants to recommend a new stacking predictive model after utilizing the random forest feature importance method to foresee the mortality after heart surgery on a highly unbalanced dataset by using the most practical features. To solve the unbalanced data problem, a combination of the SVM-SMOTE over-sampling algorithm and the Edited-Nearest-Neighbor under-sampling algorithm is used. This research compares the introduced model with some other machine learning classifiers to ensure efficiency through shuffle hold-out and 10-fold cross-validation strategies. In order to validate the performance of the implemented machine learning methods in this research, both shuffle hold-out, and 10-fold cross-validation results indicated that our model had the highest efficiency compared to the other models. Furthermore, the Friedman statistical test is applied to survey the differences between models. The result demonstrates that the introduced stacking model reached the most accurate predicting performance after Logistic Regression.

Download Full-text

The Polypharmacology Browser PPB2: Target Prediction Combining Nearest Neighbors with Machine Learning

10.26434/chemrxiv.6895646.v1 ◽

2018 ◽

Cited By ~ 1

Author(s):

Mahendra Awale ◽

Jean-Louis Reymond

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Nearest Neighbor ◽

Target Prediction ◽

Molecular Shape ◽

Public Access ◽

Molecular Fingerprints ◽

Small Molecule Drug ◽

Fold Cross Validation

<div>Here we report PPB2 as a target prediction tool assigning targets to a query molecule based on ChEMBL data. PPB2 computes ligand similarities using molecular fingerprints encoding composition (MQN), molecular shape and pharmacophores (Xfp), and substructures (ECfp4), and features an unprecedented combination of nearest neighbor (NN) searches and Naïve Bayes (NB) machine learning, together with simple NN searches, NB and Deep Neural Network (DNN) machine learning models as further options. Although NN(ECfp4) gives the best results in terms of recall in a 10-fold cross-validation study, combining NN searches with NB machine learning provides superior precision statistics, as well as better results in a case study predicting off-targets of a recently reported TRPV6 calcium channel inhibitor, illustrating the value of this combined approach. PPB2 is available to assess possible off-targets of small molecule drug-like compounds by public access at ppb2.gdb.tools.</div>

Download Full-text

Perbandingan Akurasi dan Waktu Proses Algoritma K-NN dan SVM dalam Analisis Sentimen Twitter

Jurnal Informatika ◽

10.31311/ji.v6i2.5129 ◽

2019 ◽

Vol 6 (2) ◽

pp. 226-235

Author(s):

Muhammad Rangga Aziz Nasution ◽

Mardhiya Hayaty

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Unsupervised Learning ◽

Supervised Learning ◽

Cross Validation ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Fold Cross Validation

Salah satu cabang ilmu komputer yaitu pembelajaran mesin (machine learning) menjadi tren dalam beberapa waktu terakhir. Pembelajaran mesin bekerja dengan memanfaatkan data dan algoritma untuk membuat model dengan pola dari kumpulan data tersebut. Selain itu, pembelajaran mesin juga mempelajari bagaimama model yang telah dibuat dapat memprediksi keluaran (output) berdasarkan pola yang ada. Terdapat dua jenis metode pembelajaran mesin yang dapat digunakan untuk analisis sentimen: supervised learning dan unsupervised learning. Penelitian ini akan membandingkan dua algoritma klasifikasi yang termasuk dari supervised learning: algoritma K-Nearest Neighbor dan Support Vector Machine, dengan cara membuat model dari masing-masing algoritma dengan objek teks sentimen. Perbandingan dilakukan untuk mengetahui algoritma mana lebih baik dalam segi akurasi dan waktu proses. Hasil pada perhitungan akurasi menunjukkan bahwa metode Support Vector Machine lebih unggul dengan nilai 89,70% tanpa K-Fold Cross Validation dan 88,76% dengan K-Fold Cross Validation. Sedangkan pada perhitungan waktu proses metode K-Nearest Neighbor lebih unggul dengan waktu proses 0.0160s tanpa K-Fold Cross Validation dan 0.1505s dengan K-Fold Cross Validation.

Download Full-text

The Polypharmacology Browser PPB2: Target Prediction Combining Nearest Neighbors with Machine Learning

10.26434/chemrxiv.6895646 ◽

2018 ◽

Author(s):

Mahendra Awale ◽

Jean-Louis Reymond

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Nearest Neighbor ◽

Target Prediction ◽

Molecular Shape ◽

Public Access ◽

Molecular Fingerprints ◽

Small Molecule Drug ◽

Fold Cross Validation

Download Full-text

K-Nearest Neighbor with K-Fold Cross Validation and Analytic Hierarchy Process on Data Classification

International Journal of Advances in Data and Information Systems ◽

10.25008/ijadis.v2i1.1204 ◽

2021 ◽

Vol 2 (1) ◽

Author(s):

Zoelkarnain Rinanda Tembusai ◽

Herman Mawengkang ◽

Muhammad Zarlis

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Analytic Hierarchy Process ◽

Cross Validation ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Analytic Hierarchy ◽

Machine Learning Model ◽

Hierarchy Process ◽

Fold Cross Validation

This study analyzes the performance of the k-Nearest Neighbor method with the k-Fold Cross Validation algorithm as an evaluation model and the Analytic Hierarchy Process method as feature selection for the data classification process in order to obtain the best level of accuracy and machine learning model. The best test results are in fold-3, which is getting an accuracy rate of 95%. Evaluation of the k-Nearest Neighbor model with k-Fold Cross Validation can get a good machine learning model and the Analytic Hierarchy Process as a feature selection also gets optimal results and can reduce the performance of the k-Nearest Neighbor method because it only uses features that have been selected based on the level of importance for decision making.

Download Full-text

Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Current Pharmaceutical Design ◽

10.2174/1381612825666191107092214 ◽

2020 ◽

Vol 25 (40) ◽

pp. 4296-4302 ◽

Cited By ~ 2

Author(s):

Yuan Zhang ◽

Zhenyan Han ◽

Qian Gao ◽

Xiaoyi Bai ◽

Chi Zhang ◽

...

Keyword(s):

Machine Learning ◽

Inclusion Bodies ◽

Cross Validation ◽

Independent Set ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Learning Approaches ◽

Validation Test ◽

Excess Number ◽

Fold Cross Validation

Background: β thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises is due to the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain, resulting in a relatively excess number of α-chains. The formation of inclusion bodies deposited on the cell membrane causes a decrease in the ability of red blood cells to deform and a group of hereditary haemolytic diseases caused by massive destruction in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicated that Adaboost could be applied to build a learning model in the prediction of inhibitors against K526 cells.

Download Full-text

Heartbeat Detection by Laser Doppler Vibrometry and Machine Learning

Sensors ◽

10.3390/s20185362 ◽

2020 ◽

Vol 20 (18) ◽

pp. 5362 ◽

Cited By ~ 1

Author(s):

Luca Antognoli ◽

Sara Moccia ◽

Lucia Migliorelli ◽

Sara Casaccia ◽

Lorenzo Scalise ◽

...

Keyword(s):

Machine Learning ◽

Signal Analysis ◽

Cross Validation ◽

Nearest Neighbor ◽

Laser Doppler Vibrometer ◽

Laser Doppler ◽

Support Vector ◽

K Nearest Neighbor ◽

Real World Application ◽

Testing Protocol

Background: Heartbeat detection is a crucial step in several clinical fields. Laser Doppler Vibrometer (LDV) is a promising non-contact measurement for heartbeat detection. The aim of this work is to assess whether machine learning can be used for detecting heartbeat from the carotid LDV signal. Methods: The performances of Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) and K-Nearest Neighbor (KNN) were compared using the leave-one-subject-out cross-validation as the testing protocol in an LDV dataset collected from 28 subjects. The classification was conducted on LDV signal windows, which were labeled as beat, if containing a beat, or no-beat, otherwise. The labeling procedure was performed using electrocardiography as the gold standard. Results: For the beat class, the f1-score (f1) values were 0.93, 0.93, 0.95, 0.96 for RF, DT, KNN and SVM, respectively. No statistical differences were found between the classifiers. When testing the SVM on the full-length (10 min long) LDV signals, to simulate a real-world application, we achieved a median macro-f1 of 0.76. Conclusions: Using machine learning for heartbeat detection from carotid LDV signals showed encouraging results, representing a promising step in the field of contactless cardiovascular signal analysis.

Download Full-text

Evaluation and Identification of the Neuroprotective Compounds of Xiaoxuming Decoction by Machine Learning: A Novel Mode to Explore the Combination Rules in Traditional Chinese Medicine Prescription

BioMed Research International ◽

10.1155/2019/6847685 ◽

2019 ◽

Vol 2019 ◽

pp. 1-14

Author(s):

Shilun Yang ◽

Yanjia Shen ◽

Wendan Lu ◽

Yinglin Yang ◽

Haigang Wang ◽

...

Keyword(s):

Machine Learning ◽

Chinese Medicine ◽

Traditional Chinese Medicine ◽

Cross Validation ◽

Bayesian Models ◽

Machine Learning Algorithms ◽

Therapeutic Effects ◽

Test Set ◽

Screening Experiments ◽

Fold Cross Validation

Xiaoxuming decoction (XXMD), a classic traditional Chinese medicine (TCM) prescription, has been used as a therapeutic in the treatment of stroke in clinical practice for over 1200 years. However, the pharmacological mechanisms of XXMD have not yet been elucidated. The purpose of this study was to develop neuroprotective models for identifying neuroprotective compounds in XXMD against hypoxia-induced and H2O2-induced brain cell damage. In this study, a phenotype-based classification method was designed by machine learning to identify neuroprotective compounds and to clarify the compatibility of XXMD components. Four different single classifiers (AB, kNN, CT, and RF) and molecular fingerprint descriptors were used to construct stacked naïve Bayesian models. Among them, the RF algorithm had a better performance with an average MCC value of 0.725±0.014 and 0.774±0.042 from 5-fold cross-validation and test set, respectively. The probability values calculated by four models were then integrated into a stacked Bayesian model. In total, two optimal models, s-NB-1-LPFP6 and s-NB-2-LPFP6, were obtained. The two validated optimal models revealed Matthews correlation coefficients (MCC) of 0.968 and 0.993 for 5-fold cross-validation and of 0.874 and 0.959 for the test set, respectively. Furthermore, the two models were used for virtual screening experiments to identify neuroprotective compounds in XXMD. Ten representative compounds with potential therapeutic effects against the two phenotypes were selected for further cell-based assays. Among the selected compounds, two compounds significantly inhibited H2O2-induced and Na2S2O4-induced neurotoxicity simultaneously. Together, our findings suggested that machine learning algorithms such as combination Bayesian models were feasible to predict neuroprotective compounds and to preliminarily demonstrate the pharmacological mechanisms of TCM.

Download Full-text

North American Hardwoods Identification Using Machine-Learning

Forests ◽

10.3390/f11030298 ◽

2020 ◽

Vol 11 (3) ◽

pp. 298 ◽

Cited By ~ 2

Author(s):

Dercilio Junior Verly Lopes ◽

Greg W. Burgreen ◽

Edward D. Entsminger

Keyword(s):

Machine Learning ◽

North American ◽

Mobile Application ◽

Cross Validation ◽

Data Augmentation ◽

Technical Note ◽

Machine Learning Method ◽

Training Set ◽

Hardwood Species ◽

Fold Cross Validation

This technical note determines the feasibility of using an InceptionV4_ResNetV2 convolutional neural network (CNN) to correctly identify hardwood species from macroscopic images. The method is composed of a commodity smartphone fitted with a 14× macro lens for photography. The end-grains of ten different North American hardwood species were photographed to create a dataset of 1869 images. The stratified 5-fold cross-validation machine-learning method was used, in which the number of testing samples varied from 341 to 342. Data augmentation was performed on-the-fly for each training set by rotating, zooming, and flipping images. It was found that the CNN could correctly identify hardwood species based on macroscopic images of its end-grain with an adjusted accuracy of 92.60%. With the current growing of machine-learning field, this model can then be readily deployed in a mobile application for field wood identification.

Download Full-text

Classification of Brain Tumors from MRI Images Using a Convolutional Neural Network

Applied Sciences ◽

10.3390/app10061999 ◽

2020 ◽

Vol 10 (6) ◽

pp. 1999 ◽

Cited By ~ 7

Author(s):

Milica M. Badža ◽

Marko Č. Barjaktarović

Keyword(s):

Neural Network ◽

Machine Learning ◽

Brain Tumors ◽

Convolutional Neural Network ◽

Cross Validation ◽

Magnetic Resonance Images ◽

Generalization Capability ◽

Data Set ◽

Fold Cross Validation

The classification of brain tumors is performed by biopsy, which is not usually conducted before definitive brain surgery. The improvement of technology and machine learning can help radiologists in tumor diagnostics without invasive measures. A machine-learning algorithm that has achieved substantial results in image segmentation and classification is the convolutional neural network (CNN). We present a new CNN architecture for brain tumor classification of three tumor types. The developed network is simpler than already-existing pre-trained networks, and it was tested on T1-weighted contrast-enhanced magnetic resonance images. The performance of the network was evaluated using four approaches: combinations of two 10-fold cross-validation methods and two databases. The generalization capability of the network was tested with one of the 10-fold methods, subject-wise cross-validation, and the improvement was tested by using an augmented image database. The best result for the 10-fold cross-validation method was obtained for the record-wise cross-validation for the augmented data set, and, in that case, the accuracy was 96.56%. With good generalization capability and good execution speed, the new developed CNN architecture could be used as an effective decision-support tool for radiologists in medical diagnostics.

Download Full-text

Issues in performance evaluation for host–pathogen protein interaction prediction

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016500116 ◽

2016 ◽

Vol 14 (03) ◽

pp. 1650011 ◽

Cited By ~ 9

Author(s):

Wajid Arshad Abbasi ◽

Fayyaz Ul Amir Afsar Minhas

Keyword(s):

Machine Learning ◽

Protein Interactions ◽

Cross Validation ◽

Protein Protein Interactions ◽

Evaluation Scheme ◽

Host Pathogen ◽

Pathogen Protein ◽

Protein Interaction Prediction ◽

Underlying Mechanisms ◽

Fold Cross Validation

The study of interactions between host and pathogen proteins is important for understanding the underlying mechanisms of infectious diseases and for developing novel therapeutic solutions. Wet-lab techniques for detecting protein–protein interactions (PPIs) can benefit from computational predictions. Machine learning is one of the computational approaches that can assist biologists by predicting promising PPIs. A number of machine learning based methods for predicting host–pathogen interactions (HPI) have been proposed in the literature. The techniques used for assessing the accuracy of such predictors are of critical importance in this domain. In this paper, we question the effectiveness of K-fold cross-validation for estimating the generalization ability of HPI prediction for proteins with no known interactions. K-fold cross-validation does not model this scenario, and we demonstrate a sizable difference between its performance and the performance of an alternative evaluation scheme called leave one pathogen protein out (LOPO) cross-validation. LOPO is more effective in modeling the real world use of HPI predictors, specifically for cases in which no information about the interacting partners of a pathogen protein is available during training. We also point out that currently used metrics such as areas under the precision-recall or receiver operating characteristic curves are not intuitive to biologists and propose simpler and more directly interpretable metrics for this purpose.

Download Full-text