Comparison of Neural Network and Random Forest Classifier Performance on Dragon Fruit Disease

Author(s):  
Anita Jaquiline Lado ◽  
Adri Gabriel Sooai ◽  
Natalia Magdalena Rafu Mamulak ◽  
Paskalis Andrianus Nani ◽  
Yulianti Paula Bria ◽  
...  
2021 ◽  
Vol 36 (6) ◽  
pp. 1040-1040
Author(s):  
Emily Brickell ◽  
Andrew Whitford ◽  
Anneliese Boettcher ◽  
Carolina Pereira ◽  
R John Sawyer

Abstract
Objective: Machine learning (ML) classifier performance estimates are affected by sample size and class imbalance in the training data, yet performance is often reported for balanced data only. We explore the effect of varying sample size and dementia conversion base rate on the performance of a classifier that predicts future dementia.
Method: Longitudinal data from the National Alzheimer’s Coordinating Center (NACC) Uniform Data Set (UDS) were used. All participants had MCI at baseline. A random forest classifier (RFC) was trained to predict dementia at 1, 2, and 3 years. Predictors included baseline neuropsychological test scores, demographics, and health history. Cases were sampled at multiple sample sizes (N = 125, 250, 500, 1000, and 2000) and base rates (0.1, 0.2, 0.3, 0.4, and 0.5). Performance was evaluated using the Matthews Correlation Coefficient (MCC).
Results: For balanced data (N = 1000), the classifier predicts conversion to dementia at 3 years with an MCC of 0.54 (sensitivity = 0.79; specificity = 0.75). As expected, mean classifier performance declines as the conversion rate decreases, and the variability of the estimates increases as sample size shrinks. For a conversion rate of 30%, consistent with many memory clinics, classifier performance declines only moderately (MCC = 0.44). At conversion rates of 10% and 20%, performance approaches chance. Performance trends are illustrated in Figure 1.
Conclusions: Such classifiers may have clinical utility in memory clinics with higher conversion rates. The expected tradeoffs are observed: smaller samples increase error variance, and higher base rates of positive cases improve overall performance. The results provide potential guidelines for sample size and recruitment targets in RFC designs.
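As a rough check on the figures reported in this abstract, the MCC implied by a given sensitivity, specificity, and base rate can be computed directly from the confusion-matrix proportions. The sketch below is illustrative only: the function name `mcc_from_rates` is hypothetical, and holding sensitivity and specificity fixed while the base rate changes is an assumption. In the study the classifier was retrained at each base rate, so the reported values at lower base rates need not match this calculation.

```python
import math

def mcc_from_rates(sensitivity, specificity, base_rate):
    """MCC implied by sensitivity, specificity, and the positive-class base rate.

    Works on confusion-matrix proportions rather than counts, so no sample
    size is needed. (Hypothetical helper, not taken from the paper.)
    """
    tp = base_rate * sensitivity
    fn = base_rate * (1 - sensitivity)
    tn = (1 - base_rate) * specificity
    fp = (1 - base_rate) * (1 - specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally MCC is set to 0 when any marginal is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Balanced case reported in the abstract: sensitivity 0.79, specificity 0.75.
balanced = mcc_from_rates(0.79, 0.75, 0.5)
print(round(balanced, 2))  # 0.54, consistent with the reported MCC

# Assumption: same sensitivity/specificity at lower base rates.
for rate in (0.4, 0.3, 0.2, 0.1):
    print(rate, round(mcc_from_rates(0.79, 0.75, rate), 2))
```

Even with sensitivity and specificity held constant, the implied MCC falls as the base rate drops, which is one reason results reported on balanced data can overstate performance in lower-prevalence settings.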


2017
Author(s):  
Patrick J. Trainor ◽  
Andrew P. DeFilippis ◽  
Shesh N. Rai

Abstract: Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes and for furnishing diagnostic and prognostic phenotype predictions in medicine. Despite this, a comprehensive and rigorous evaluation of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics data, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA (sPLS-DA), Random Forests, Support Vector Machines (SVM), and Neural Network classification techniques for discriminating phenotype. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures that mimic the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were randomly simulated to provide consistent estimates of classifier performance over a wide range of possible scenarios. The presence of non-normal error distributions and the effect of prior-significance filtering (dimension reduction) were evaluated. In each simulation, classifier parameters (such as the number of hidden nodes in a neural network) were tuned by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real clinical metabolomics datasets of varying sample medium, sample size, and experimental design. We report that in the scenarios without a significant presence of non-normal error distributions over metabolite clusters, Neural Network and PLS-DA classifiers performed poorly relative to sPLS-DA, SVM, and Random Forest classifiers. When non-normal error distributions were introduced, the performance of PLS-DA classifiers deteriorated further relative to the remaining techniques. At the same time, although Neural Network classifiers improved relative to PLS-DA classifiers, their performance remained poor compared to sPLS-DA, SVM, and Random Forest classifiers. Over the real datasets, a trend of better SVM and Random Forest classifier performance was observed.
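The cross-validated tuning step mentioned in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: a one-dimensional k-nearest-neighbour classifier stands in for the classifiers in the study, the number of neighbours stands in for a tuned parameter such as the number of hidden nodes, and the toy data and all function names are hypothetical.

```python
import random

def k_fold_indices(n, n_folds, seed=0):
    """Shuffle 0..n-1 and deal the indices into n_folds nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (ties go to 0)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def cv_accuracy(data, k, folds):
    """Mean held-out accuracy over the folds for one candidate value of k."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        correct = sum(knn_predict(train, data[i][0], k) == data[i][1]
                      for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

def tune(data, candidates, n_folds=5, seed=0):
    """Score every candidate on the same folds and return the best one."""
    folds = k_fold_indices(len(data), n_folds, seed)
    scores = {c: cv_accuracy(data, c, folds) for c in candidates}
    best = max(candidates, key=lambda c: scores[c])
    return best, scores

# Toy two-class data (assumption): one Gaussian feature per class.
rng = random.Random(42)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(60)]
        + [(rng.gauss(2.0, 1.0), 1) for _ in range(60)])

best_k, scores = tune(data, candidates=[1, 5, 15])
print(best_k, scores)
```

Scoring every candidate on the same folds keeps the comparison between settings fair, and selecting on held-out accuracy rather than training accuracy is what guards against conclusions driven by a poorly tuned classifier.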


Author(s):  
Siji George C G, Et. al.

Sentiment analysis is an active research area in the field of data mining, and machine learning algorithms are well suited to it. Because of their capacity for self-learning and for handling massive amounts of data, deep learning neural networks are widely used for sentiment classification tasks. This paper therefore designs a new model under a hybrid framework of machine learning and deep learning, which couples a Convolutional Neural Network (CNN) with a Random Forest (RF) classifier for fine-grained sentiment analysis. The Continuous Bag-of-Words (CBOW) model is used to vectorize the text input. The most important features are extracted by the CNN, and the extracted features are then used by the RF classifier for sentiment classification. The performance of the proposed hybrid CNNRF model is compared with that of the base models, namely the CNN and the RF classifier. The experimental results show that the proposed model far outperforms the existing base models in terms of classification accuracy and effectively integrates a genetically-modified CNN with the Random Forest classifier.

