The effect of machine learning regression algorithms and sample size on individualized behavioral prediction with functional connectivity features

NeuroImage, 2018, Vol 178, pp. 622-637. Author(s): Zaixu Cui, Gaolang Gong

2021, Vol 13 (3), pp. 368. Author(s): Christopher A. Ramezan, Timothy A. Warner, Aaron E. Maxwell, Bradley S. Price

The size of the training data set is a major determinant of classification accuracy. Nevertheless, collecting a large training data set for supervised classifiers can be challenging, especially for studies covering a large area, as may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample (n = 10,000) to a very small sample (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within a geographic object-based image analysis (GEOBIA) framework. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy (1.0%) when training sample size decreased from 10,000 to 315 samples. GBM provided overall accuracy similar to RF; however, it was very expensive in training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size: NEU classifications were, on average, slightly more accurate than SVM classifications for larger sample sizes but less accurate for the smallest sample sizes. NEU, however, required longer processing time.
The k-NN classifier showed a smaller drop in overall accuracy than NEU and SVM as training set size decreased; however, its overall accuracies were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of the six methods but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, because of its relatively high accuracy with small training sets, minimal variation in overall accuracy between very large and small training sets, and relatively short processing time, RF was a good classifier for large-area land-cover classification of HR remotely sensed data, especially when training data are scarce. However, because the performance of supervised classifiers varies with training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a given project.
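To make the training-set-size experiment concrete, here is a minimal sketch in standard-library Python. It assumes synthetic two-class Gaussian data in place of GEOBIA image-object features and uses only one of the six algorithms (k-NN). It trains on random subsets of decreasing size and reports overall accuracy on a fixed held-out test set. The function names, the class layout, k = 5, and the sizes (40, 315, 2,000) are illustrative assumptions, not the study's protocol.

```python
import math
import random


def make_blobs(n, seed):
    """Hypothetical synthetic data: two 2-D Gaussian classes, alternating labels."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 2.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes_for_1 = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_1 > k else 0


def overall_accuracy(train, test, k):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def learning_curve(sizes, k=5, n_test=500, seed=0):
    """Overall accuracy of k-NN for each training-set size, on one fixed test set."""
    pool = make_blobs(max(sizes), seed)
    test = make_blobs(n_test, seed + 1)
    rng = random.Random(seed + 2)
    return {n: overall_accuracy(rng.sample(pool, n), test, k) for n in sizes}


if __name__ == "__main__":
    for n, acc in learning_curve([40, 315, 2000]).items():
        print(f"n = {n:5d}   overall accuracy = {acc:.3f}")
```

A single random draw per size is noisy; repeating the draw several times per size, and running other classifiers on the same splits, would be needed before comparing sensitivity to sample size in the way the study does.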


2021. Author(s): Herdiantri Sufriyana, Yu Wei Wu, Emily Chia-Yu Su

We aimed to provide a resampling protocol for dimensional reduction that yields a small number of latent variables. The protocol is intended mainly, but not exclusively, for developing machine learning prediction models, where it improves the ratio of sample size to the number of candidate predictors. With this feature representation technique, one can improve generalization by preventing the latent variables from overfitting the data used to conduct the dimensional reduction. However, the technique may require more computational capacity and time. The key stages are derivation of latent variables from multiple resampled subsets, estimation of the latent-variable parameters in the population, and selection of latent variables transformed by the estimated parameters.
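The abstract does not say which dimensional-reduction method the protocol uses. As an illustration only, the sketch below assumes principal component analysis (PCA) computed by power iteration in standard-library Python: stage 1 runs PCA on bootstrap resamples, stage 2 treats the sign-aligned average of the loadings as the population parameter estimate, and stage 3 projects the data with those averaged loadings and keeps components whose variance share reaches a hypothetical `min_share` threshold. All function names and thresholds are assumptions, not the authors' protocol.

```python
import random
import statistics


def column_means(rows):
    return [statistics.fmean(col) for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of the columns of rows."""
    mu = column_means(rows)
    centered = [[v - m for v, m in zip(r, mu)] for r in rows]
    n, p = len(rows), len(mu)
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_components(cov, m, iters=200, seed=0):
    """Leading m eigenvectors of a covariance matrix by power iteration + deflation."""
    rng = random.Random(seed)
    p = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(m):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        av = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = sum(x * y for x, y in zip(v, av))  # Rayleigh quotient = eigenvalue
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
        comps.append(v)
    return comps


def align(v, ref):
    """Eigenvector signs are arbitrary; flip v to point the same way as ref."""
    return v if sum(x * y for x, y in zip(v, ref)) >= 0 else [-x for x in v]


def resampled_loadings(rows, m=2, n_resamples=20, seed=0):
    """Stages 1-2: PCA on bootstrap resamples, then average the aligned loadings."""
    rng = random.Random(seed)
    ref = top_components(covariance(rows), m)  # used only to fix signs
    sums = [[0.0] * len(rows[0]) for _ in range(m)]
    for _ in range(n_resamples):
        boot = [rng.choice(rows) for _ in rows]  # resample with replacement
        for k, comp in enumerate(top_components(covariance(boot), m)):
            sums[k] = [s + x for s, x in zip(sums[k], align(comp, ref[k]))]
    # Re-normalize each averaged loading vector to unit length.
    return [[x / sum(y * y for y in s) ** 0.5 for x in s] for s in sums]


def transform_and_select(rows, loadings, min_share=0.1):
    """Stage 3: project rows with the averaged loadings; keep components whose
    score variance is at least min_share of the total variance of the data."""
    mu = column_means(rows)
    scores = [[sum((v - c) * w for v, c, w in zip(r, mu, load)) for load in loadings]
              for r in rows]
    total = sum(statistics.variance(col) for col in zip(*rows))
    keep = [k for k in range(len(loadings))
            if statistics.variance([s[k] for s in scores]) / total >= min_share]
    return [[s[k] for k in keep] for s in scores], keep
```

Component order can swap between resamples when two eigenvalues are close; this sketch does not handle that case.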


2021, Vol 2083 (3), pp. 032059. Author(s): Qiang Chen, Meiling Deng

Regression algorithms are commonly used in machine learning. Drawing on encryption and privacy-protection methods, this paper studies currently prominent regression algorithms together with the associated encryption techniques and proposes an algorithm based on PPLAR. The correlation between data items is obtained with a logistic regression formula. The algorithm is distributed and parallelized on the Hadoop platform to improve the computing speed of the cluster while maintaining the mean absolute error of the algorithm.
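The abstract does not define PPLAR or describe its encryption scheme, so the sketch below covers only the generic data-parallel pattern it alludes to: logistic regression trained by batch gradient descent, where each data partition computes a partial gradient (the map step) and the partial gradients are summed (the reduce step), which is the kind of work a Hadoop job would spread across nodes. It runs on one machine, performs no encryption, and is not the paper's algorithm; all function names, the toy data, and the hyperparameters are hypothetical.

```python
import math
from functools import reduce


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def partial_gradient(partition, w):
    """Map step: log-loss gradient summed over one partition of (x, y) pairs."""
    g = [0.0] * len(w)
    for x, y in partition:
        err = sigmoid(dot(w, x)) - y
        g = [gi + err * xi for gi, xi in zip(g, x)]
    return g


def add_vectors(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    return [ai + bi for ai, bi in zip(a, b)]


def train(partitions, n_features, lr=0.1, epochs=200):
    """Batch gradient descent; each epoch maps over partitions, then reduces."""
    n = sum(len(p) for p in partitions)
    w = [0.0] * n_features
    for _ in range(epochs):
        partials = [partial_gradient(p, w) for p in partitions]  # parallelizable
        g = reduce(add_vectors, partials)
        w = [wi - lr * gi / n for wi, gi in zip(w, g)]
    return w


def mean_absolute_error(partitions, w):
    """MAE between predicted probabilities and 0/1 labels."""
    errors = [abs(sigmoid(dot(w, x)) - y) for p in partitions for x, y in p]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical toy data: x = (bias, feature); label is 1 when feature > 0.
    data = [((1.0, f), 1 if f > 0 else 0) for f in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]
    parts = [data[0:2], data[2:4], data[4:6]]  # three "nodes"
    w = train(parts, n_features=2)
    print("weights:", w, "MAE:", round(mean_absolute_error(parts, w), 3))
```

Because the reduce step is a plain sum, training on three partitions gives the same weights (up to floating-point rounding) as training on the pooled data, which is what makes this pattern easy to distribute.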


2021, Vol 12. Author(s): Bidhan Lamichhane, Andy G. S. Daniel, John J. Lee, Daniel S. Marcus, Joshua S. Shimony, ...

Glioblastoma multiforme (GBM) is the most frequently occurring brain malignancy. Because of its poor prognosis under currently available treatments, there is a pressing need for easily accessible, non-invasive techniques that help inform pre-treatment planning and patient counseling and improve outcomes. In this study we determined the feasibility of using resting-state functional connectivity (rsFC) to classify GBM patients into short-term and long-term survival groups relative to the reported median overall survival (OS) of 14.6 months. We used a support vector machine with rsFC between regions of interest as predictive features. We employed a novel hybrid feature selection method in which features were first filtered by the correlation between rsFC and OS and then refined with the established method of recursive feature elimination (RFE) to select the optimal feature subset. Model performance was evaluated with leave-one-subject-out cross-validation. Accuracy in classifying short- versus long-term survival was 71.9%. Sensitivity and specificity were 77.1% and 65.5%, respectively. The area under the receiver operating characteristic curve was 0.752 (95% CI, 0.62–0.88). These findings suggest that highly specific rsFC features may predict GBM survival. Taken together, the findings of this study suggest that resting-state fMRI and machine learning analytics could enable a radiomic biomarker for GBM, augmenting care and planning for individual patients.
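A minimal sketch of the two-stage selection inside leave-one-subject-out cross-validation, in standard-library Python. To stay self-contained it swaps in a nearest-centroid classifier for the support vector machine and uses the difference of class means as each feature's linear weight for the RFE-style elimination. The feature counts (`n_filter`, `n_final`), the rule that OS >= 14.6 months means long-term survival, treating long-term survival as the positive class, and the toy cohort are all assumptions. Selection is repeated inside every fold, so the left-out subject never influences which features are chosen.

```python
import math
import random
import statistics

MEDIAN_OS_MONTHS = 14.6  # reported median survival used as the cut-off


def pearson(xs, ys):
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def class_means(X, labels, feats):
    """Per-class mean of each selected feature: ({f: long-term}, {f: short-term})."""
    def mean(label, f):
        return statistics.fmean(row[f] for row, lab in zip(X, labels) if lab == label)
    return {f: mean(1, f) for f in feats}, {f: mean(0, f) for f in feats}


def select_features(X, os_months, labels, n_filter, n_final):
    """Hybrid selection: correlation filter, then RFE-style backward elimination."""
    n_feat = len(X[0])
    r = [abs(pearson([row[f] for row in X], os_months)) for f in range(n_feat)]
    feats = sorted(range(n_feat), key=lambda f: r[f], reverse=True)[:n_filter]
    while len(feats) > n_final:
        mu_long, mu_short = class_means(X, labels, feats)
        # Drop the feature with the smallest |weight| (difference of class means).
        feats.remove(min(feats, key=lambda f: abs(mu_long[f] - mu_short[f])))
    return feats


def predict(X, labels, feats, x):
    """Nearest-centroid classifier on the selected features (1 = long-term)."""
    mu_long, mu_short = class_means(X, labels, feats)
    d_long = sum((x[f] - mu_long[f]) ** 2 for f in feats)
    d_short = sum((x[f] - mu_short[f]) ** 2 for f in feats)
    return 1 if d_long < d_short else 0


def loso_evaluate(X, os_months, n_filter=4, n_final=2):
    """Leave-one-subject-out CV; feature selection is redone inside every fold."""
    labels = [1 if m >= MEDIAN_OS_MONTHS else 0 for m in os_months]
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        idx = [j for j in range(len(X)) if j != i]  # leave subject i out
        Xt = [X[j] for j in idx]
        ot = [os_months[j] for j in idx]
        lt = [labels[j] for j in idx]
        feats = select_features(Xt, ot, lt, n_filter, n_final)
        pred = predict(Xt, lt, feats, X[i])
        if labels[i] == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    return {"accuracy": (tp + tn) / len(X),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}


if __name__ == "__main__":
    # Hypothetical toy cohort: 20 subjects, 6 "rsFC" features, 2 of them tied to OS.
    rng = random.Random(0)
    os_m = [4.0 + 1.3 * i for i in range(20)]
    X = [[m + rng.gauss(0, 2), -m + rng.gauss(0, 2)] + [rng.gauss(0, 5) for _ in range(4)]
         for m in os_m]
    print(loso_evaluate(X, os_m))
```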

