Full Model Selection in Huge Datasets and for Proxy Models Construction

Author(s):  
Angel Díaz-Pacheco ◽  
Carlos Alberto Reyes-García
2019 ◽  
Vol 23 (5) ◽  
pp. 1109-1129
Author(s):  
Ángel Díaz-Pacheco ◽  
Carlos A. Reyes-Garcia

Author(s):  
Nancy Perez-Castro ◽  
Aldo Marquez-Grajales ◽  
Hector Gabriel Acosta-Mesa ◽  
Efren Mezura-Montes

2020 ◽  
Vol 17 (4) ◽  
pp. 1199-1212
Author(s):  
Natalia Gnatiuk ◽  
Iuliia Radchenko ◽  
Richard Davy ◽  
Evgeny Morozov ◽  
Leonid Bobylev

Abstract. The observed warming in the Arctic is more than double the global average, and this enhanced Arctic warming is projected to continue throughout the 21st century. This rapid warming has a wide range of impacts on polar and sub-polar marine ecosystems. One of the examples of such an impact on ecosystems is that of coccolithophores, particularly Emiliania huxleyi, which have expanded their range poleward during recent decades. The coccolithophore E. huxleyi plays an essential role in the global carbon cycle. Therefore, the assessment of future changes in coccolithophore blooms is very important. Currently, there are a large number of climate models that give projections for various oceanographic, meteorological, and biochemical variables in the Arctic. However, individual climate models can have large biases when compared to historical observations. The main goal of this research was to select an ensemble of climate models that most accurately reproduces the state of environmental variables that influence the coccolithophore E. huxleyi bloom over the historical period when compared to reanalysis data. We developed a novel approach for model selection to include a diverse set of measures of model skill including the spatial pattern of some variables, which had not previously been included in a model selection procedure. We applied this method to each of the Arctic and sub-Arctic seas in which E. huxleyi blooms have been observed. Once we have selected an optimal combination of climate models that most skilfully reproduce the factors which affect E. huxleyi, the projections of the future conditions in the Arctic from these models can be used to predict how E. huxleyi blooms will change in the future. Here, we present the validation of 34 CMIP5 (fifth phase of the Coupled Model Intercomparison Project) atmosphere–ocean general circulation models (GCMs) over the historical period 1979–2005. 
Furthermore, we propose a procedure of ranking and selecting these models based on the model's skill in reproducing 10 important oceanographic, meteorological, and biochemical variables in the Arctic and sub-Arctic seas. These factors include the concentration of nutrients (NO3, PO4, and SI), dissolved CO2 partial pressure (pCO2), pH, sea surface temperature (SST), salinity averaged over the top 30 m (SS30 m), 10 m wind speed (WS), ocean surface current speed (OCS), and surface downwelling shortwave radiation (SDSR). The validation of the GCMs' outputs against reanalysis data includes analysis of the interannual variability, seasonal cycle, spatial biases, and temporal trends of the simulated variables. In total, 60 combinations of models were selected for 10 variables over six study regions using the selection procedure we present here. The results show that there is neither a combination of models nor one model that has high skill in reproducing the regional climatic-relevant features of all combinations of the considered variables in target seas. Thereby, an individual subset of models was selected according to our model selection procedure for each combination of variable and Arctic or sub-Arctic sea. Following our selection procedure, the number of selected models in the individual subsets varied from 3 to 11. The paper presents a comparison of the selected model subsets and the full-model ensemble of all available CMIP5 models to reanalysis data. The selected subsets of models generally show a better performance than the full-model ensemble. Therefore, we conclude that within the task addressed in this study it is preferable to employ the model subsets determined through application of our procedure than the full-model ensemble.


2006 ◽  
Vol 45 (01) ◽  
pp. 44-50
Author(s):  
N. H. Augustin ◽  
W. Sauerbrei ◽  
N. Holländer

Summary Objectives: We illustrate a recently proposed two-step bootstrap model averaging (bootstrap MA) approach to cope with model selection uncertainty. The predictive performance is investigated in an example and in a simulation study. Results are compared to those derived from other model selection methods. Methods: In the framework of the linear regression model we use the two-step bootstrap MA, which consists of a screening step to eliminate covariates thought to have no influence on the response, and a model-averaging step. We also apply the full model, variable selection using backward elimination based on Akaike’s Information Criterion (AIC), the Bayes Information Criterion (BIC) and the bagging approach. The predictive performance is measured by the mean squared error (MSE) and the coverage of confidence intervals for the true response. Results: We obtained similar results for all approaches in the example. In the simulation the MSE was reduced by all approaches in comparison to the full model. The smallest values are obtained for bootstrap MA. Only the bootstrap MA and the full model correctly estimated the nominal coverage. The backward elimination procedures led to substantial underestimation and bagging to an overestimation of the true coverage. The screening step of bootstrap MA eliminates most of the unimportant factors. Conclusion: The new bootstrap MA approach shows promising results for predictive performance. It increases practical usefulness by eliminating unimportant factors in the screening step.
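The two steps named in this summary (screening, then model averaging) can be illustrated with a minimal sketch. The summary does not specify the selection rule inside each bootstrap replicate, the screening threshold, or the averaging weights, so the choices below are assumptions: backward elimination by AIC, a hypothetical inclusion-frequency cutoff `min_freq`, and averaging that weights each selected model by how often it is chosen. All function names are illustrative, and the data are assumed to be centred so no intercept is fitted.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit on the given columns."""
    if not cols:
        return float(y @ y)
    Xs = X[:, cols]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ beta
    return float(r @ r)

def aic(X, y, cols):
    """Gaussian AIC up to an additive constant."""
    n = len(y)
    return n * np.log(rss(X, y, cols) / n) + 2 * len(cols)

def backward_aic(X, y, cols):
    """Assumed selection rule: drop one covariate at a time while AIC improves."""
    cols = list(cols)
    best = aic(X, y, cols)
    while cols:
        trials = [(aic(X, y, [c for c in cols if c != d]), d) for d in cols]
        score, drop = min(trials)
        if score >= best:
            break
        best = score
        cols.remove(drop)
    return tuple(cols)

def bootstrap_ma(X, y, X_new, B=100, min_freq=0.3, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Step 1 (screening): count how often each covariate survives selection.
    incl = np.zeros(p)
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        for c in backward_aic(X[idx], y[idx], range(p)):
            incl[c] += 1
    kept = [c for c in range(p) if incl[c] / B >= min_freq]
    # Step 2 (averaging): select among the kept covariates in fresh bootstrap
    # samples, refit each selected model on the original data, and average.
    pred = np.zeros(len(X_new))
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        cols = list(backward_aic(X[idx], y[idx], kept))
        if cols:  # an empty model predicts zero for centred data
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            pred += X_new[:, cols] @ beta
    return kept, pred / B
```

Averaging over replicates, as done here, is equivalent to weighting each distinct model by its bootstrap selection frequency; other weighting schemes would fit the same two-step outline.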


Author(s):  
Russell Cheng

Bootstrap model selection is proposed for the difficult problem of selecting important factors in non-orthogonal linear models when the number of factors, P, is large. In the method, the full model is first fitted to the original data. Then B parametric bootstrap samples are drawn from the fitted model, and the full model is fitted to each. A submodel is obtained from each fitted full model by rejecting those factors found unimportant in the fit. Each distinct selected submodel is then fitted to the original data and its Mallows Cp statistic calculated. A subset of good submodels based on the Cp values is then obtained. A reliability check can be made by also fitting this subset to the bootstrap samples, to see how often each submodel is found to be a good fit. Use of the method is illustrated using a real-data sample.
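The procedure described above can be sketched as follows. This is a rough illustration, not the author's implementation: the rule for rejecting "unimportant" factors is not given in the abstract, so a hypothetical cutoff `t_cut` on the absolute t-statistic is assumed, Gaussian errors are assumed for the parametric bootstrap, and the function names are invented for the example. Mallows Cp is computed in its usual form, RSS_p / s² − n + 2p, with s² taken from the full model.

```python
import numpy as np

def ols(X, y):
    """Least-squares fit; returns coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def t_stats(X, y):
    """t-statistics of the coefficients in the fitted linear model."""
    n, p = X.shape
    beta, rss = ols(X, y)
    sigma2 = rss / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta / np.sqrt(np.diag(cov))

def bootstrap_submodels(X, y, B=200, t_cut=2.0, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Fit the full model to the original data.
    beta_full, rss_full = ols(X, y)
    sigma2_full = rss_full / (n - p)
    counts = {}
    for _ in range(B):
        # Parametric bootstrap sample drawn from the fitted full model.
        noise = rng.normal(0.0, np.sqrt(sigma2_full), size=n)
        y_star = X @ beta_full + noise
        # Assumed rejection rule: keep factors with |t| >= t_cut.
        t = np.abs(t_stats(X, y_star))
        keep = tuple(int(j) for j in np.flatnonzero(t >= t_cut))
        if keep:
            counts[keep] = counts.get(keep, 0) + 1
    results = []
    for keep, freq in counts.items():
        # Refit each distinct submodel to the original data and score it.
        _, rss = ols(X[:, list(keep)], y)
        cp = rss / sigma2_full - n + 2 * len(keep)
        results.append((cp, keep, freq))
    return sorted(results)  # smallest Cp first
```

Each returned tuple holds the Cp value, the retained factor indices, and how many bootstrap samples produced that submodel. A set of "good" submodels could then be taken as those whose Cp falls below some chosen bound, which is left to the user here because the abstract does not state one.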


2014 ◽  
Vol 71 (1) ◽  
pp. 95-105
Author(s):  
Alejandro Rosales-Pérez ◽  
Jesús A. González ◽  
Carlos A. Reyes-García ◽  
Carlos A. Coello Coello

2019 ◽  
Vol 163 ◽  
pp. 14-23
Author(s):  
M. Tremblay ◽  
M. Kammer ◽  
H. Lange ◽  
S. Plattner ◽  
C. Baumgartner ◽  
...  

Author(s):  
Angel Díaz Pacheco ◽  
Jesús A. Gonzalez-Bernal ◽  
Carlos A. Reyes-Garcia
