Model Selection Criteria on Beta Regression for Machine Learning

2019 ◽  
Vol 1 (1) ◽  
pp. 427-449
Author(s):  
Patrícia Espinheira ◽  
Luana da Silva ◽  
Alisson Silva ◽  
Raydonal Ospina

Beta regression models are a class of supervised learning tools for regression problems with a univariate and limited response. Current fitting procedures for beta regression require variable selection based on (potentially problematic) information criteria. We propose model selection criteria that take into account the leverage, residuals, and influence of the observations, for both the linear and nonlinear systematic components. To that end, we propose a Predictive Residual Sum of Squares (PRESS)-like machine learning tool and a prediction coefficient, namely the P² statistic, as a computational procedure. Monte Carlo simulation results on the finite-sample behavior of the prediction-based model selection criterion P² are provided. We also evaluate two versions of the R² criterion. Finally, applications to real data are presented. The new criterion proved crucial for choosing models while taking into account the robustness of the maximum likelihood estimation procedure in the presence of influential cases.
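For orientation, the classical PRESS statistic and the associated prediction coefficient can be sketched for an ordinary least-squares fit. This is only the textbook linear-model analogue, not the beta-regression version proposed in the paper; the function name `press_and_p2` and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

def press_and_p2(X, y):
    """Classical leave-one-out PRESS and P^2 for an OLS fit.

    Illustrative only: this is the linear-model analogue, not the
    beta-regression criterion described in the abstract.
    """
    y = np.asarray(y, dtype=float)
    # Design matrix with an intercept column.
    Xd = np.column_stack([np.ones(len(y)), X])
    # Leverages h_ii are the squared row norms of Q in the thin QR of Xd.
    Q, _ = np.linalg.qr(Xd)
    h = np.sum(Q ** 2, axis=1)
    # Ordinary residuals from the full-data fit.
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    # Leave-one-out prediction errors without refitting: e_i / (1 - h_ii).
    press = np.sum((resid / (1.0 - h)) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    # P^2 plays the role of R^2 but is based on out-of-sample errors.
    return press, 1.0 - press / sst
```

Because each prediction error is inflated by its leverage, observations with high leverage and large residuals weigh heavily on PRESS, which is why such a criterion reacts to influential cases.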

2021 ◽  
Vol 20 (3) ◽  
pp. 450-461
Author(s):  
Stanley L. Sclove

Abstract The use of information criteria, especially AIC (Akaike’s information criterion) and BIC (Bayesian information criterion), for choosing an adequate number of principal components is illustrated.
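As a reminder of how these criteria are computed, the sketch below implements the standard definitions AIC = -2 log L + 2k and BIC = -2 log L + k log n and picks the candidate with the smallest value. The likelihood and parameter count for a model with a given number of principal components are not specified in the abstract, so they are left to the caller; the helper `select_by_criterion` and the numbers in the usage example are hypothetical.

```python
import math

def aic(log_lik, k):
    """Akaike's information criterion for log-likelihood log_lik and k parameters."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n):
    """Bayesian information criterion for n observations."""
    return -2.0 * log_lik + k * math.log(n)

def select_by_criterion(candidates, n, criterion="bic"):
    """Return the label with the smallest criterion value.

    candidates maps a label (e.g. number of components) to a
    (log_likelihood, n_parameters) pair supplied by the caller.
    """
    def score(item):
        log_lik, k = item[1]
        return bic(log_lik, k, n) if criterion == "bic" else aic(log_lik, k)
    return min(candidates.items(), key=score)[0]

# Hypothetical values, for illustration only.
candidates = {1: (-120.0, 5), 2: (-100.0, 9), 3: (-99.0, 12)}
best = select_by_criterion(candidates, n=50, criterion="bic")
```

BIC penalizes each extra parameter by log n rather than 2, so for n above about 7 it favors smaller models than AIC does.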


2020 ◽  
Vol 37 (11) ◽  
pp. 3338-3352
Author(s):  
Shiran Abadi ◽  
Oren Avram ◽  
Saharon Rosset ◽  
Tal Pupko ◽  
Itay Mayrose

Abstract Statistical criteria have long been the standard for selecting the best model for phylogenetic reconstruction and downstream statistical inference. Although model selection is regarded as a fundamental step in phylogenetics, existing methods for this task require substantial computational resources and long processing times, are not always feasible, and sometimes depend on preliminary assumptions that do not hold for sequence data. Moreover, although these methods are dedicated to revealing the processes that underlie the sequence data, they do not always produce the most accurate trees. Notably, phylogeny reconstruction consists of two related tasks, topology reconstruction and branch-length estimation. It was previously shown that in many cases the most complex model, GTR+I+G, leads to topologies that are as accurate as those obtained using existing model selection criteria, but overestimates branch lengths. Here, we present ModelTeller, a computational methodology for phylogenetic model selection, devised within the machine-learning framework, optimized to predict the most accurate nucleotide substitution model for branch-length estimation. We demonstrate that ModelTeller leads to more accurate branch-length inference than current model selection criteria on data sets simulated under realistic processes. ModelTeller relies on a readily implemented machine-learning model, and thus prediction from features extracted from the sequence data results in a substantial decrease in running time compared with existing strategies. By harnessing the machine-learning framework, we distinguish between features that mostly contribute to branch-length optimization, concerning the extent of sequence divergence, and features that are related to estimates of the model parameters that are important for the selection made by current criteria.


2020 ◽  
Author(s):  
Shiran Abadi ◽  
Oren Avram ◽  
Saharon Rosset ◽  
Tal Pupko ◽  
Itay Mayrose

Abstract Statistical criteria have long been the standard for selecting the best model for phylogenetic reconstruction and downstream statistical inference. While model selection is regarded as a fundamental step in phylogenetics, existing methods for this task require substantial computational resources and long processing times, are not always feasible, and sometimes depend on preliminary assumptions that do not hold for sequence data. Moreover, while these methods are dedicated to revealing the processes that underlie the sequence data, in most cases they do not produce the most accurate trees. Notably, phylogeny reconstruction consists of two related tasks, topology reconstruction and branch-length estimation. It was previously shown that in many cases the most complex model, GTR+I+G, leads to topologies that are as accurate as those obtained using existing model selection criteria, but overestimates branch lengths. Here, we present ModelTeller, a computational methodology for phylogenetic model selection, devised within the machine-learning framework, optimized to predict the most accurate model for branch-length estimation. ModelTeller relies on a readily implemented machine-learning model, and thus prediction from features extracted from the sequence data results in a substantial decrease in running time compared to existing strategies. We show that on datasets simulated under simple homogeneous substitution models ModelTeller leads to branch-length estimation that is as accurate as that of the statistical model selection criteria. We then demonstrate that ModelTeller outperforms these criteria when more intricate patterns, which aim at mimicking realistic processes, are considered.


2010 ◽  
Vol 47 (1) ◽  
pp. 216-234 ◽  
Author(s):  
Filia Vonta ◽  
Alex Karagrigoriou

Measures of divergence or discrepancy are used either to measure mutual information concerning two variables or to construct model selection criteria. In this paper we focus on divergence measures that are based on a class of measures known as Csiszár's divergence measures. In particular, we propose a measure of divergence between residual lives of two items that have both survived up to some time t as well as a measure of divergence between past lives, both based on Csiszár's class of measures. Furthermore, we derive properties of these measures and provide examples based on the Cox model and frailty or transformation model.
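To make the underlying class concrete, the sketch below evaluates a Csiszár divergence D_φ(p, q) = Σ q_i φ(p_i / q_i) between two discrete distributions, where φ is convex with φ(1) = 0. It illustrates only the general class, not the residual-life or past-life measures proposed in the paper; the function names and the restriction to strictly positive probability vectors are assumptions made for simplicity.

```python
import numpy as np

def csiszar_divergence(p, q, phi):
    """Csiszar phi-divergence sum_i q_i * phi(p_i / q_i) for discrete p, q.

    Assumes both vectors are strictly positive and sum to one.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or np.any(p <= 0) or np.any(q <= 0):
        raise ValueError("p and q must be strictly positive and the same shape")
    return float(np.sum(q * phi(p / q)))

def phi_kl(u):
    """phi(u) = u log u gives the Kullback-Leibler divergence KL(p || q)."""
    return u * np.log(u)

def phi_chi2(u):
    """phi(u) = (u - 1)^2 gives Pearson's chi-square divergence."""
    return (u - 1.0) ** 2
```

Different choices of φ recover different familiar measures, which is what makes the class convenient as a basis for constructing model selection criteria.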


2015 ◽  
Vol 28 (1) ◽  
pp. 67-82 ◽  
Author(s):  
Shuichi Kawano ◽  
Ibuki Hoshina ◽  
Kaito Shimamura ◽  
Sadanori Konishi
