Evaluating the performance of machine learning methods and variable selection methods for predicting difficult-to-measure traits in Holstein dairy cattle using milk infrared spectral data

Author(s):  
Lucio F.M. Mota ◽  
Sara Pegolo ◽  
Toshimi Baba ◽  
Francisco Peñagaricano ◽  
Gota Morota ◽  
...  
2021 ◽  
Author(s):  
Mu Yue

In high-dimensional data, penalized regression is often used for variable selection and parameter estimation. However, these methods typically require time-consuming cross-validation to select tuning parameters, and they retain more false positives under high dimensionality. This chapter discusses sparse-boosting-based machine learning methods for the following high-dimensional problems. First, a sparse boosting method to select important biomarkers is studied for right-censored survival data with high-dimensional biomarkers. Then, a two-step sparse boosting method to carry out variable selection and model-based prediction is studied for high-dimensional longitudinal observations measured repeatedly over time. Finally, a multi-step sparse boosting method to identify patient subgroups that exhibit different treatment effects is studied for high-dimensional dense longitudinal observations. This chapter addresses the problem of how to improve the accuracy and computation speed of variable selection and parameter estimation in high-dimensional data. It aims to expand the application scope of sparse boosting and to develop new methods for high-dimensional survival analysis, longitudinal data analysis, and subgroup analysis, all of which have great application prospects.


2021 ◽  
Vol 15 ◽  
Author(s):  
Meijie Liu ◽  
Baojuan Li ◽  
Dewen Hu

Machine learning methods have been frequently applied in the field of cognitive neuroscience in the last decade. Considerable attention has gone to applying machine learning methods to the study of autism spectrum disorder (ASD) in order to uncover its neurophysiological underpinnings. In this paper, we present a comprehensive review of studies published since 2011 that applied machine learning methods to analyze the functional magnetic resonance imaging (fMRI) data of autistic individuals and typical controls (TCs). The review covers the whole process, including feature construction from raw fMRI data, feature selection methods, machine learning methods, factors for high classification accuracy, and critical conclusions. Across different machine learning methods and fMRI data acquired from different sites, the reported classification accuracies ranged from 48.3% up to 97%, and informative brain regions and networks were located. High classification accuracies were found to occur mostly in studies that involved task-based fMRI data, a single dataset under some selection principle, effective feature selection methods, or advanced machine learning methods. Advanced deep learning together with the multi-site Autism Brain Imaging Data Exchange (ABIDE) dataset became a research trend, especially in the last 4 years. In the future, advanced feature selection and machine learning methods combined with multi-site datasets or easily operated task-based fMRI data may have the potential to serve as a promising diagnostic tool for ASD.


2018 ◽  
Vol 226 (4) ◽  
pp. 259-273 ◽  
Author(s):  
Ranjith Vijayakumar ◽  
Mike W.-L. Cheung

Abstract. Machine learning tools are increasingly used in social sciences and policy fields due to their increase in predictive accuracy. However, little research has been done on how well the models of machine learning methods replicate across samples. We compare machine learning methods with regression on the replicability of variable selection, along with predictive accuracy, using an empirical dataset as well as simulated data with additive, interaction, and non-linear squared terms added as predictors. Methods analyzed include support vector machines (SVM), random forests (RF), multivariate adaptive regression splines (MARS), and the regularized regression variants, least absolute shrinkage and selection operator (LASSO), and elastic net. In simulations with additive and linear interactions, machine learning methods performed similarly to regression in replicating predictors; they also performed mostly equal to or below regression on measures of predictive accuracy. In simulations with square terms, the machine learning methods SVM, RF, and MARS improved predictive accuracy and replicated predictors better than regression. Thus, in simulated datasets, the gap between machine learning methods and regression on predictive measures foreshadowed the gap in variable selection. In replications on the empirical dataset, however, improved prediction by machine learning methods was not accompanied by a visible improvement in replicability in variable selection. This disparity is explained by the overall explanatory power of the models. When predictors have small effects and noise predominates, improved global measures of prediction in a sample by machine learning methods may not lead to the robust selection of predictors; thus, in the presence of weak predictors and noise, regression remains a useful tool for model building and replication.


2021 ◽  
Vol 11 ◽  
Author(s):  
Xuejiao Han ◽  
Jing Yang ◽  
Jingwen Luo ◽  
Pengan Chen ◽  
Zilong Zhang ◽  
...  

Objectives: This study aimed to investigate the reliability of radiomics features extracted from contrast-enhanced CT in differentiating pancreatic cystadenomas from pancreatic neuroendocrine tumors (PNETs) using machine-learning methods. Methods: A total of 120 patients were enrolled, including 66 patients with pancreatic cystadenomas and 54 patients with PNETs. Forty-eight radiomic features were extracted from contrast-enhanced CT images using LIFEx software. Five feature selection methods were adopted to determine the appropriate features for classifiers. Then, nine machine learning classifiers were employed to build predictive models. The performance of the forty-five models was evaluated with area under the curve (AUC), accuracy, sensitivity, specificity, and F1 score in the testing group. Results: The predictive models exhibited a reliable ability to differentiate pancreatic cystadenomas from PNETs when combined with suitable selection methods. A combination of DC as the selection method and RF as the classifier, as well as Xgboost+RF, demonstrated the best discriminative ability, with the highest AUC of 0.997 in the testing group. Conclusions: Radiomics-based machine learning methods might be a noninvasive tool to assist in differentiating pancreatic cystadenomas and PNETs.

