Full Model Selection in Huge Datasets and for Proxy Models Construction

Facing the full model selection problem in high volume datasets employing intelligent proxy models

Intelligent Data Analysis ◽

10.3233/ida-184199 ◽

2019 ◽

Vol 23 (5) ◽

pp. 1109-1129

Author(s):

Ángel Díaz-Pacheco ◽

Carlos A. Reyes-Garcia

Keyword(s):

Model Selection ◽

High Volume ◽

Selection Problem ◽

Full Model ◽

Proxy Models ◽

Model Selection Problem

Multi-objective Full Model Selection in temporal databases: Optimizing time and performance

2016 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC) ◽

10.1109/ropec.2016.7830617 ◽

2016 ◽

Cited By ~ 2

Author(s):

Nancy Perez-Castro ◽

Hector Gabriel Acosta-Mesa ◽

Efren Mezura-Montes ◽

Hugo Jair Escalante

Keyword(s):

Model Selection ◽

Temporal Databases ◽

Full Model ◽

Multi Objective ◽

And Performance

Full Model Selection issue in temporal data through evolutionary algorithms: A brief review

2017 IEEE Congress on Evolutionary Computation (CEC) ◽

10.1109/cec.2017.7969602 ◽

2017 ◽

Author(s):

Nancy Perez-Castro ◽

Aldo Marquez-Grajales ◽

Hector Gabriel Acosta-Mesa ◽

Efren Mezura-Montes

Keyword(s):

Model Selection ◽

Evolutionary Algorithms ◽

Temporal Data ◽

Full Model

Simulation of factors affecting Emiliania huxleyi blooms in Arctic and sub-Arctic seas by CMIP5 climate models: model validation and selection

Biogeosciences ◽

10.5194/bg-17-1199-2020 ◽

2020 ◽

Vol 17 (4) ◽

pp. 1199-1212

Author(s):

Natalia Gnatiuk ◽

Iuliia Radchenko ◽

Richard Davy ◽

Evgeny Morozov ◽

Leonid Bobylev

Keyword(s):

Model Selection ◽

Climate Models ◽

Selection Procedure ◽

Reanalysis Data ◽

The Arctic ◽

Historical Period ◽

Full Model ◽

Model Ensemble ◽

Arctic Seas ◽

Biochemical Variables

Abstract. The observed warming in the Arctic is more than double the global average, and this enhanced Arctic warming is projected to continue throughout the 21st century. This rapid warming has a wide range of impacts on polar and sub-polar marine ecosystems. One of the examples of such an impact on ecosystems is that of coccolithophores, particularly Emiliania huxleyi, which have expanded their range poleward during recent decades. The coccolithophore E. huxleyi plays an essential role in the global carbon cycle. Therefore, the assessment of future changes in coccolithophore blooms is very important. Currently, there are a large number of climate models that give projections for various oceanographic, meteorological, and biochemical variables in the Arctic. However, individual climate models can have large biases when compared to historical observations. The main goal of this research was to select an ensemble of climate models that most accurately reproduces the state of environmental variables that influence the coccolithophore E. huxleyi bloom over the historical period when compared to reanalysis data. We developed a novel approach for model selection to include a diverse set of measures of model skill including the spatial pattern of some variables, which had not previously been included in a model selection procedure. We applied this method to each of the Arctic and sub-Arctic seas in which E. huxleyi blooms have been observed. Once we have selected an optimal combination of climate models that most skilfully reproduce the factors which affect E. huxleyi, the projections of the future conditions in the Arctic from these models can be used to predict how E. huxleyi blooms will change in the future. Here, we present the validation of 34 CMIP5 (fifth phase of the Coupled Model Intercomparison Project) atmosphere–ocean general circulation models (GCMs) over the historical period 1979–2005. 
Furthermore, we propose a procedure for ranking and selecting these models based on the model's skill in reproducing 10 important oceanographic, meteorological, and biochemical variables in the Arctic and sub-Arctic seas. These factors include the concentration of nutrients (NO3, PO4, and SI), dissolved CO2 partial pressure (pCO2), pH, sea surface temperature (SST), salinity averaged over the top 30 m (SS30 m), 10 m wind speed (WS), ocean surface current speed (OCS), and surface downwelling shortwave radiation (SDSR). The validation of the GCMs' outputs against reanalysis data includes analysis of the interannual variability, seasonal cycle, spatial biases, and temporal trends of the simulated variables. In total, 60 combinations of models were selected for 10 variables over six study regions using the selection procedure we present here. The results show that there is neither a combination of models nor one model that has high skill in reproducing the regional climate-relevant features of all combinations of the considered variables in target seas. Therefore, an individual subset of models was selected according to our model selection procedure for each combination of variable and Arctic or sub-Arctic sea. Following our selection procedure, the number of selected models in the individual subsets varied from 3 to 11. The paper presents a comparison of the selected model subsets and the full-model ensemble of all available CMIP5 models to reanalysis data. The selected subsets of models generally perform better than the full-model ensemble. We conclude that, within the task addressed in this study, it is preferable to employ the model subsets determined through application of our procedure rather than the full-model ensemble.
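
The ranking-and-selection idea in the abstract above can be illustrated with a minimal sketch. It assumes a single skill score (RMSE against a reference series) and a fixed subset size `k`; the paper combines several measures (interannual variability, seasonal cycle, spatial biases, trends) and its exact procedure is not reproduced here. The model names, the function names, and all numbers are hypothetical.

```python
from statistics import mean

def rmse(sim, ref):
    """Root-mean-square error of a simulated series against a reference series."""
    return mean((s - r) ** 2 for s, r in zip(sim, ref)) ** 0.5

def ensemble_mean(series_list):
    """Time-step-wise mean over several model series of equal length."""
    return [mean(vals) for vals in zip(*series_list)]

def select_subset(models, ref, k):
    """Rank models by RMSE against the reference and keep the k best (assumed rule)."""
    ranked = sorted(models, key=lambda name: rmse(models[name], ref))
    return ranked[:k]

# Toy annual-mean SST (deg C) for one region; every value here is made up.
reference = [2.0, 2.1, 2.3, 2.2, 2.5, 2.6]
models = {
    "model_A": [2.1, 2.2, 2.2, 2.3, 2.4, 2.7],
    "model_B": [3.5, 3.6, 3.9, 3.8, 4.0, 4.2],  # strong warm bias
    "model_C": [1.8, 2.0, 2.4, 2.0, 2.6, 2.4],
    "model_D": [2.8, 2.9, 3.2, 3.0, 3.3, 3.5],  # moderate warm bias
}

subset = select_subset(models, reference, k=2)
rmse_subset = rmse(ensemble_mean([models[m] for m in subset]), reference)
rmse_full = rmse(ensemble_mean(list(models.values())), reference)
print(subset, round(rmse_subset, 3), round(rmse_full, 3))
```

In this toy case the two-model subset tracks the reference more closely than the four-model ensemble, because the full ensemble inherits the shared warm bias of the weaker models. With biases of opposite sign the full ensemble can benefit from cancellation, which is one reason a single score is only a rough stand-in for the multi-measure validation the abstract describes.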

Investigation on the Improvement of Prediction by Bootstrap Model Averaging

Methods of Information in Medicine ◽

10.1055/s-0038-1634035 ◽

2006 ◽

Vol 45 (01) ◽

pp. 44-50 ◽

Cited By ~ 8

Author(s):

N. H. Augustin ◽

W. Sauerbrei ◽

N. Holländer

Keyword(s):

Model Selection ◽

Mean Squared Error ◽

Model Averaging ◽

Predictive Performance ◽

Information Criterion ◽

Full Model ◽

Backward Elimination ◽

Study Results ◽

Model Selection Uncertainty ◽

Bootstrap Model

Summary. Objectives: We illustrate a recently proposed two-step bootstrap model averaging (bootstrap MA) approach to cope with model selection uncertainty. The predictive performance is investigated in an example and in a simulation study. Results are compared to those derived from other model selection methods. Methods: In the framework of the linear regression model we use the two-step bootstrap MA, which consists of a screening step to eliminate covariates thought to have no influence on the response, and a model-averaging step. We also apply the full model, variable selection using backward elimination based on Akaike’s Information Criterion (AIC), the Bayes Information Criterion (BIC) and the bagging approach. The predictive performance is measured by the mean squared error (MSE) and the coverage of confidence intervals for the true response. Results: We obtained similar results for all approaches in the example. In the simulation the MSE was reduced by all approaches in comparison to the full model. The smallest values are obtained for bootstrap MA. Only the bootstrap MA and the full model correctly estimated the nominal coverage. The backward elimination procedures led to substantial underestimation and bagging to an overestimation of the true coverage. The screening step of bootstrap MA eliminates most of the unimportant factors. Conclusion: The new bootstrap MA approach shows promising results for predictive performance. It increases practical usefulness by eliminating unimportant factors in the screening step.
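
For reference, the criteria named in this abstract are commonly written for a Gaussian linear regression model as below, up to additive constants that do not affect comparisons between models. The notation is standard textbook notation assumed here, not taken from the paper: n is the sample size, k the number of estimated regression coefficients, RSS the residual sum of squares of the fitted model, and the MSE is written against the true response, which is known only in a simulation.

```latex
\begin{align*}
\mathrm{AIC} &= n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k, \\
\mathrm{BIC} &= n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + k \ln n, \\
\mathrm{MSE} &= \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - \mu_i\right)^2 .
\end{align*}
```

Because ln n exceeds 2 once n is at least 8, BIC penalizes each extra coefficient more heavily than AIC, so backward elimination under BIC tends to stop at smaller models.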

Bootstrapping Linear Models

10.1093/oso/9780198505044.003.0016 ◽

2017 ◽

Author(s):

Russell Cheng

Keyword(s):

Model Selection ◽

Linear Models ◽

Parametric Bootstrap ◽

Real Data ◽

Original Data ◽

Difficult Problem ◽

Full Model ◽

Number Of Factors ◽

Bootstrap Model ◽

Reliability Check

Bootstrap model selection is proposed for the difficult problem of selecting important factors in non-orthogonal linear models when the number of factors, P, is large. In the method, the full model is first fitted to the original data. Then B parametric bootstrap samples are drawn from the fitted model, and the full model fitted to each. A submodel is obtained from each fitted full model by rejecting those factors found unimportant in the fit. Each distinct selected submodel is then fitted to the original data and its Mallows Cp statistic calculated. A subset of good submodels based on the Cp values is then obtained. A reliability check can be made by fitting this subset to the BS samples also, to see how often each submodel is found to be a good fit. Use of the method is illustrated using a real-data sample.
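
The steps in this abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: factors are judged "unimportant" when their |t| statistic falls below a hypothetical cutoff `t_crit`, bootstrap errors are drawn as Gaussian noise with the full-model residual variance, and the data are simulated. All function names are made up for the example.

```python
import random

def inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(M)
    A = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(M)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [a / piv for a in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[k:] for row in A]

def fit(X, y):
    """OLS fit: returns coefficients, residual sum of squares, and (X'X)^-1."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = inverse(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    return beta, rss, inv

def select(X, y, t_crit=2.0):
    """Columns whose |t| exceeds t_crit (assumed rule); column 0, the intercept, is always kept."""
    n, k = len(X), len(X[0])
    beta, rss, inv = fit(X, y)
    s2 = rss / (n - k)
    keep = [0] + [j for j in range(1, k) if abs(beta[j]) / (s2 * inv[j][j]) ** 0.5 > t_crit]
    return tuple(keep)

def cp(X, y, cols, s2_full):
    """Mallows Cp = RSS_p / s2_full - n + 2p for the submodel using columns `cols`."""
    sub = [[r[j] for j in cols] for r in X]
    _, rss, _ = fit(sub, y)
    return rss / s2_full - len(y) + 2 * len(cols)

def bootstrap_select(X, y, B=200, seed=0):
    """Parametric bootstrap model selection; returns (Cp, columns) pairs sorted by Cp."""
    rng = random.Random(seed)
    n, k = len(X), len(X[0])
    beta, rss, _ = fit(X, y)                      # full model on the original data
    s2 = rss / (n - k)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in X]
    submodels = set()
    for _ in range(B):
        y_star = [f + rng.gauss(0.0, s2 ** 0.5) for f in fitted]  # one bootstrap sample
        submodels.add(select(X, y_star))          # submodel chosen from the refitted full model
    # Refit each distinct submodel to the original data and score it with Cp.
    return sorted((cp(X, y, cols, s2), cols) for cols in submodels)

# Simulated non-orthogonal design: x3 is correlated with x1; only x1 and x2 matter.
data_rng = random.Random(1)
X, y = [], []
for _ in range(60):
    x1, x2, x4 = data_rng.gauss(0, 1), data_rng.gauss(0, 1), data_rng.gauss(0, 1)
    x3 = 0.5 * x1 + data_rng.gauss(0, 1)
    X.append([1.0, x1, x2, x3, x4])
    y.append(1.0 + 2.0 * x1 - 3.0 * x2 + data_rng.gauss(0, 1))

ranked = bootstrap_select(X, y)
for score, cols in ranked:
    print(round(score, 2), cols)
```

Submodels whose Cp is close to or below their number of columns are the candidates for the "good" subset. The reliability check mentioned in the abstract (refitting that subset to the bootstrap samples) is not included in this sketch.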

Towards a Surrogate-Assisted Multi-Objective Full Model Selection

Research in Computing Science ◽

10.13053/rcs-71-1-10 ◽

2014 ◽

Vol 71 (1) ◽

pp. 95-105

Author(s):

Alejandro Rosales-Pérez ◽

Jesús A. González ◽

Carlos A. Reyes-García ◽

Carlos A. Coello Coello

Keyword(s):

Model Selection ◽

Full Model ◽

Multi Objective

Prediction model optimization using full model selection with regression trees demonstrated with FTIR data from bovine milk

Preventive Veterinary Medicine ◽

10.1016/j.prevetmed.2018.12.012 ◽

2019 ◽

Vol 163 ◽

pp. 14-23 ◽

Cited By ~ 1

Author(s):

M. Tremblay ◽

M. Kammer ◽

H. Lange ◽

S. Plattner ◽

C. Baumgartner ◽

...

Keyword(s):

Model Selection ◽

Prediction Model ◽

Bovine Milk ◽

Regression Trees ◽

Full Model ◽

Model Optimization

A classification-based fuzzy-rules proxy model to assist in the full model selection problem in high volume datasets

Journal of Experimental & Theoretical Artificial Intelligence ◽

10.1080/0952813x.2021.1925972 ◽

2021 ◽

pp. 1-30

Author(s):

Angel Díaz-Pacheco ◽

Carlos Alberto Reyes-Garcia

Keyword(s):

Model Selection ◽

High Volume ◽

Fuzzy Rules ◽

Selection Problem ◽

Full Model ◽

Proxy Model ◽

Model Selection Problem

A mapreduce based framework to perform full model selection in very large datasets

IADIS INTERNATIONAL JOURNAL ON COMPUTER SCIENCE AND INFORMATION SYSTEMS ◽

10.33965/ijcsis_2018130101 ◽

2018 ◽

Vol 13 (1) ◽

pp. 1-13

Author(s):

Angel Díaz Pacheco ◽

Jesús A. Gonzalez-Bernal ◽

Carlos A. Reyes-Garcia

Keyword(s):

Model Selection ◽

Large Datasets ◽

Full Model ◽

Very Large Datasets
