Deterministic dropout for deep neural networks using composite random forest

Background: Cardiac dysrhythmias (CD) affect millions of people in the United States (US) and are associated with considerable morbidity and mortality. New strategies to combat this growing problem are urgently needed. Objectives: Predicting CD from electronic health record (EHR) data would allow earlier diagnosis and treatment of the condition, thus improving overall cardiovascular outcomes. The Guideline Advantage (TGA) is an American Heart Association ambulatory quality clinical data registry of EHR data representing 70 clinics distributed throughout the US; it has been used to monitor outpatient prevention and disease management outcome measures across populations and for longitudinal research on the impact of preventative care. Methods: For this study, we represented all time-series cardiovascular health (CVH) measures and the corresponding data collection time points for each patient as numerical embedding vectors. We then used a deep learning technique, a long short-term memory (LSTM) model, to predict CD from the vector of time-series CVH measures under 5-fold cross-validation, and compared its performance with that of deep neural network, logistic regression, random forest, and Naïve Bayes models. Results: The LSTM model outperformed the other machine learning models and achieved the best prediction performance as measured by the average area under the receiver operating characteristic curve (AUROC): 0.76 for LSTM, 0.71 for deep neural networks, 0.66 for logistic regression, 0.67 for random forest, and 0.59 for Naïve Bayes. The most influential feature in the LSTM model was blood pressure. Conclusions: These findings may be used to prevent CD in the outpatient setting by encouraging appropriate surveillance and management of CVH.
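The evaluation protocol described in the Methods (k-fold cross-validation with AUROC averaged over the folds) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, `fit_mean_difference` is a hypothetical toy scorer standing in for the LSTM and the baseline models, and all function names are invented for the example.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean_difference(X, y):
    """Hypothetical toy model: score a sample along the difference of class means."""
    dim = len(X[0])

    def class_mean(cls):
        rows = [x for x, label in zip(X, y) if label == cls]
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    m0, m1 = class_mean(0), class_mean(1)
    w = [a - b for a, b in zip(m1, m0)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def cross_validated_auroc(X, y, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and average AUROC over folds."""
    per_fold = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_mean_difference([X[i] for i in train], [y[i] for i in train])
        fold_scores = [model(X[i]) for i in held_out]
        per_fold.append(auroc([y[i] for i in held_out], fold_scores))
    return sum(per_fold) / k, per_fold


# Synthetic two-feature data: only the first feature carries signal.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(label, 1.0), rng.gauss(0.0, 1.0)] for label in y]

mean_auc, fold_aucs = cross_validated_auroc(X, y)
print(f"mean AUROC over {len(fold_aucs)} folds: {mean_auc:.3f}")
```

Each model in a comparison like the one reported would be passed through the same folds, so that the averaged AUROC values are computed on identical held-out data.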


Androgen Receptor Binding Prediction with Random Forest, Deep Neural Networks, and Graph Convolutional Neural Networks

10.20944/preprints202102.0318.v1 ◽ 2021

Author(s): Alfonso T. García-Sosa

Keyword(s): Prostate Cancer ◽ Neural Networks ◽ Androgen Receptor ◽ Random Forest ◽ Deep Neural Networks ◽ State Of The Art ◽ Source Code ◽ Protein Structures ◽ Binding Prediction ◽ Machine Learning Classifiers

Substances that can modify the androgen receptor pathway in humans and animals are entering the environment and food chain, with a proven ability to disrupt hormonal systems and cause toxicity and adverse effects on reproduction and brain development, as well as prostate cancer, among others. State-of-the-art databases with experimental data on the effects of chemicals in humans, chimps, and rats have been used to build machine learning classifiers and regressors and to evaluate them on independent sets. Different featurizations, algorithms, and protein structures lead to different results, with deep neural networks on user-defined, physicochemically relevant features developed for this work outperforming graph convolutional networks, random forests, and large featurizations. The results can help provide clues on the risk of substances and support better experimental design for toxicity assays. Source code and data are available at https://github.com/AlfonsoTGarcia-Sosa/ML


A Hybrid Short-Term Load Forecasting Model Based on Improved Fuzzy C-Means Clustering, Random Forest and Deep Neural Networks

IEEE Access ◽ 10.1109/access.2021.3063123 ◽ 2021 ◽ Vol 9 ◽ pp. 59754-59765

Author(s): Fu Liu ◽ Tian Dong ◽ Tao Hou ◽ Yun Liu

Keyword(s): Neural Networks ◽ Random Forest ◽ Deep Neural Networks ◽ Load Forecasting ◽ Forecasting Model ◽ Short Term ◽ Fuzzy C Means ◽ Model Based ◽ Short Term Load Forecasting ◽ Fuzzy C Means Clustering


Classifying image sequences of astronomical transients with deep neural networks

Monthly Notices of the Royal Astronomical Society ◽ 10.1093/mnras/staa2973 ◽ 2020 ◽ Vol 499 (3) ◽ pp. 3130-3138

Author(s): Catalina Gómez ◽ Mauricio Neira ◽ Marcela Hernández Hoyos ◽ Pablo Arbeláez ◽ Jaime E Forero-Romero

Keyword(s): Neural Networks ◽ Deep Learning ◽ Random Forest ◽ Deep Neural Networks ◽ Light Curves ◽ Image Subtraction ◽ Data Set ◽ Random Forest Classification ◽ Forest Classification ◽ Astronomical Images

Supervised classification of temporal sequences of astronomical images into meaningful transient astrophysical phenomena has been considered a hard problem because it requires the intervention of human experts. The classifier uses the expert’s knowledge to find heuristic features to process the images, for instance, by performing image subtraction or by extracting sparse information such as flux time-series, also known as light curves. We present a successful deep learning approach that learns directly from imaging data. Our method explicitly models the spatiotemporal patterns with deep convolutional neural networks and gated recurrent units. We train these deep neural networks using 1.3 million real astronomical images from the Catalina Real-Time Transient Survey to classify the sequences into five classes of astronomical transients. The TAO-Net (for Transient Astronomical Objects Network) architecture outperforms the results from random forest classification on light curves by 10 percentage points as measured by the F1 score for each class; the average F1 over classes goes from 45 per cent with random forest classification to 55 per cent with TAO-Net. This achievement with TAO-Net opens the possibility of developing new deep learning architectures for early transient detection. We make available the training data set and trained models of TAO-Net to allow for future extensions of this work.
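The "average F1 over classes" quoted above is what is usually called a macro-averaged F1: the F1 score is computed for each class separately and the per-class values are averaged with equal weight. A minimal sketch of that computation follows; the class labels and predictions are made up for illustration and are not taken from the survey data, and the function names are hypothetical.

```python
def per_class_f1(y_true, y_pred):
    """Return {class: F1}, where F1 = 2*TP / (2*TP + FP + FN) for each class."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Illustrative labels only (three placeholder classes).
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

for cls, f1 in per_class_f1(y_true, y_pred).items():
    print(f"class {cls}: F1 = {f1:.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

On this toy input the per-class scores are 0.5, 0.8, and about 0.667, giving a macro F1 of about 0.656. Because every class counts equally, a rare transient class influences the average as much as a common one.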


Predicting Melanoma Staging using Targeted RNA Sequencing data using Machine Learning.

10.1101/2021.11.03.21265077 ◽ 2021

Author(s): Fahad Shabbir Ahmed ◽ Furqan Bin Irfan

Keyword(s): Machine Learning ◽ Neural Networks ◽ Random Forest ◽ Rna Sequencing ◽ Deep Neural Networks ◽ Tumor Staging ◽ Nodal Metastasis ◽ Random Forest Classifier ◽ Sequencing Data ◽ Diagnostic Aspect

The aim of this study is to use machine learning to predict tumor staging and metastasis in melanoma from differentially expressed genes. Machine learning has been used in different clinical settings to predict different outcomes. However, it has not been used to predict the diagnostic aspect of tumor staging. We used the TCGA RNA-sequencing data on melanomas to predict tumor staging and nodal and/or distant metastasis using deep neural networks (DNN) and a random forest classifier (RF). Results: We were able to predict tumor staging (lower vs higher stage, i.e. Tis / T1 / T2 vs T3 and higher), nodal metastasis, and combined nodal or distant metastasis in patients with melanomas with high accuracy. However, these results need further validation.


An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments

Sustainability ◽ 10.3390/su11030699 ◽ 2019 ◽ Vol 11 (3) ◽ pp. 699 ◽ Cited By ~ 13

Author(s): Lkhagvadorj Munkhdalai ◽ Tsendsuren Munkhdalai ◽ Oyun-Erdene Namsrai ◽ Jong Lee ◽ Keun Ryu

Keyword(s): Machine Learning ◽ Neural Networks ◽ Random Forest ◽ Deep Neural Networks ◽ Credit Scoring ◽ Support Vector ◽ Learning Approaches ◽ Learning Methods ◽ Human Expert ◽ Machine Learning Methods

Machine learning and artificial intelligence have achieved human-level performance in many application domains, including image classification, speech recognition, and machine translation. In the financial domain, however, expert-based credit risk models still dominate. Establishing meaningful benchmarks and comparisons between machine-learning approaches and human expert-based models is a prerequisite for introducing novel methods. Therefore, our main goal in this study is to establish a new benchmark using real consumer data and to provide machine-learning approaches that can serve as baselines on this benchmark. We performed an extensive comparison between the machine-learning approaches and a human expert-based model, the FICO credit scoring system, using Survey of Consumer Finances (SCF) data. As the SCF data are non-synthetic and consist of a large number of real variables, we applied two variable-selection methods to select the most representative features for effective modelling and to compare them: the first used hypothesis tests, correlation, and random forest-based feature importance measures, and the second was a new approach (NAP) based only on random forests. We then built regression models based on various machine-learning algorithms, ranging from logistic regression and support vector machines to an ensemble of gradient boosted trees and deep neural networks. Our results demonstrated that if lending institutions in the early 2000s had used their own credit scoring models constructed with the machine-learning methods explored in this study, their expected credit losses would have been lower and they would have been more sustainable. In addition, the deep neural network and XGBoost algorithms trained on the subset selected by NAP achieved the highest area under the curve (AUC) and accuracy, respectively.
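That one model can lead on AUC while another leads on accuracy follows from what the two metrics measure: AUC depends only on how well scores rank positives above negatives, whereas accuracy depends on where the scores fall relative to a decision threshold. The sketch below illustrates this with invented scores for two hypothetical models; the numbers, the 0.5 threshold, and the function names are assumptions for the example and have no connection to the SCF data or to the models in the study.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of samples classified correctly when predicting 1 for score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


labels = [0, 0, 0, 1, 1, 1]

# Hypothetical model A: ranks perfectly, but every score sits above the threshold.
model_a = [0.60, 0.70, 0.80, 0.85, 0.90, 0.95]
# Hypothetical model B: one ranking mistake, but scores straddle the threshold.
model_b = [0.10, 0.20, 0.70, 0.40, 0.80, 0.90]

for name, scores in [("A", model_a), ("B", model_b)]:
    print(f"model {name}: AUC = {auc(labels, scores):.3f}, "
          f"accuracy = {accuracy(labels, scores):.3f}")
```

Here model A reaches an AUC of 1.0 with an accuracy of 0.5, while model B reaches an AUC of about 0.889 with an accuracy of about 0.667, so the ranking of the two models flips depending on the metric.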
