32 Cross validation of best linear unbiased predictions of breeding values using an efficient leave-one-out strategy

Abstract: Efficient strategies have been developed for leave-one-out cross validation (LOOCV) of predicted phenotypes in a simple model with an overall mean and marker effects or animal genetic effects to evaluate the accuracy of genomic predictions. For such a model, the correlation between the predicted and the observed phenotype is identical to the correlation between the observed phenotype and the estimated breeding value (EBV). When the model is more complex, with multiple fixed and random effects, the correlation between the observed and predicted phenotype can still be obtained efficiently by LOOCV, but it is not equal to the correlation between the observed phenotype and EBV, which is the statistic of interest. The objective here was to develop and evaluate an efficient LOOCV method for EBV or for predictions of other random effects under a general mixed linear model. The approach is based on treating all effects in the model as random, with large variances for fixed effects. Naïve LOOCV requires inverting the (n - 1) x (n - 1) dimensional phenotypic covariance matrix for each of the n training data sets, where n is the number of observations. Our method efficiently obtains these inverses from the inverse of the phenotypic covariance matrix for all n observations. Naïve LOOCV of EBV by pre-correction of fixed effects using the training data (Naïve LOOCV) and the new efficient LOOCV were compared. The new efficient LOOCV for EBV was 962 times faster than Naïve LOOCV. Prediction accuracies from the two strategies were the same (0.20). Funded by USDA-NIFA grant # 2017-67007-26144.
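
The idea of obtaining each (n - 1) x (n - 1) inverse from the single n x n inverse can be illustrated with a small sketch. This is not the authors' implementation. It assumes one record per animal, known variance components, and fixed effects treated as random with an arbitrary large variance (1e4). All names (`loo_ebv_naive`, `loo_ebv_efficient`, `V`, `C`, `G`) and the toy data are hypothetical. The sketch relies on the standard block-inverse identity: if P is the inverse of V, then the inverse of V with row and column i removed equals P with row and column i removed, minus b b' / P[i, i], where b is column i of P without its i-th element.

```python
import numpy as np

def loo_ebv_naive(V, C, y):
    """Leave-one-out EBV, inverting a fresh (n-1) x (n-1) matrix per animal."""
    n = len(y)
    ebv = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # training records
        V_inv = np.linalg.inv(V[np.ix_(keep, keep)])  # costly step, repeated n times
        ebv[i] = C[i, keep] @ V_inv @ y[keep]         # BLUP of u_i from the other records
    return ebv

def loo_ebv_efficient(V, C, y):
    """Same predictions, derived from one inverse of the full covariance matrix."""
    n = len(y)
    P = np.linalg.inv(V)                              # single n x n inverse
    ebv = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b = P[keep, i]
        # Block-inverse (Schur complement) identity for the reduced matrix.
        V_inv = P[np.ix_(keep, keep)] - np.outer(b, b) / P[i, i]
        ebv[i] = C[i, keep] @ V_inv @ y[keep]
    return ebv

# Toy data (hypothetical): 8 animals, an intercept and one covariate.
rng = np.random.default_rng(0)
n = 8
B = rng.normal(size=(n, n))
G = B @ B.T / n + np.eye(n)                 # stand-in genetic covariance matrix
X = np.column_stack([np.ones(n), rng.normal(size=n)])
V = 1e4 * X @ X.T + G + 2.0 * np.eye(n)     # "fixed" effects given a large variance
C = G                                       # cov(u, y) when each animal has one record
y = rng.normal(size=n)

diff = np.max(np.abs(loo_ebv_naive(V, C, y) - loo_ebv_efficient(V, C, y)))
print("largest difference between the two strategies:", diff)
```

The sketch still forms each reduced inverse explicitly, so it only shows that the two strategies agree; a production version would avoid building the (n - 1) x (n - 1) matrices at all.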

Breeding value evaluation in Polish fur animals: Statistical description of fur coat and reproduction traits – relationship and inbreeding

Czech Journal of Animal Science ◽

10.17221/4266-cjas ◽

2011 ◽

Vol 49 (No. 1) ◽

pp. 16-27 ◽

Cited By ~ 2

Author(s):

H. Wierzbicki ◽

A. Filistowicz ◽

W. Jagusiak

Keyword(s):

Fixed Effects ◽

Breeding Value ◽

The Arctic ◽

Arctic Fox ◽

Data Sets ◽

Value Evaluation ◽

Inbreeding Coefficients ◽

Silver Fox ◽

Reproduction Traits ◽

As Relationship

Three data sets were available: records on conformation and coat traits for the arctic fox from one farm (5 540 observations, collected between 1983 and 1997), and the same traits for the silver fox from three farms (8 199 observations, collected between 1984 and 1999). The third set comprised 5 829 observations on reproductive performance of the arctic fox from one farm, collected between 1984 and 1999. The GLM procedure was used to test the significance of fixed effects on the analysed reproduction traits as well as differences between groups. Phenotypic trends as well as relationship and inbreeding across the studied years were computed. Most of the phenotypic trends were positive. Low relationship and inbreeding coefficients in the arctic and silver fox populations under study were estimated. The average relationship coefficients for the silver and arctic fox populations were 0.015 and 0.010, respectively, whereas the average inbreeding coefficients for the same species were 0.0039 and 0.0016, respectively. No inbreeding was found in the arctic fox breeding females.  
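
The abstract does not state how the relationship and inbreeding coefficients were computed. One standard approach is the tabular method for the additive relationship matrix, sketched below under the assumption that animals are numbered so that parents precede their offspring. The function name `relationship_matrix` and the five-animal pedigree are hypothetical and are not taken from the study.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix by the tabular method.

    pedigree: list of (sire, dam) index pairs, None for an unknown parent.
    Parents must appear before their offspring.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        for j in range(i):
            # Relationship with an earlier animal: average of the
            # relationships between that animal and i's parents.
            a = 0.0
            if sire is not None:
                a += 0.5 * A[j][sire]
            if dam is not None:
                a += 0.5 * A[j][dam]
            A[i][j] = A[j][i] = a
        # Diagonal: 1 + inbreeding, where inbreeding is half the
        # relationship between the parents (0 if a parent is unknown).
        both_known = sire is not None and dam is not None
        A[i][i] = 1.0 + (0.5 * A[sire][dam] if both_known else 0.0)
    return A

# Hypothetical pedigree: two founders, two full sibs, and their offspring.
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
A = relationship_matrix(pedigree)
inbreeding = [A[i][i] - 1.0 for i in range(len(pedigree))]
print(inbreeding)
```

Population averages such as those reported in the abstract would then be means over the off-diagonal elements (relationship) and over the inbreeding values.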

Eigenvector Spatial Filtering for Large Data Sets: Fixed and Random Effects Approaches

Geographical Analysis ◽

10.1111/gean.12156 ◽

2018 ◽

Vol 51 (1) ◽

pp. 23-49 ◽

Cited By ~ 17

Author(s):

Daisuke Murakami ◽

Daniel A. Griffith

Keyword(s):

Random Effects ◽

Large Data ◽

Spatial Filtering ◽

Large Data Sets ◽

Data Sets ◽

Fixed And Random Effects ◽

Eigenvector Spatial Filtering

RESAMPLING METHODS IN SOFTWARE QUALITY CLASSIFICATION

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194012400037 ◽

2012 ◽

Vol 22 (02) ◽

pp. 203-223 ◽

Cited By ~ 7

Author(s):

WASIF AFZAL ◽

RICHARD TORKAR ◽

ROBERT FELDT

Keyword(s):

Software Engineering ◽

Software Quality ◽

Cross Validation ◽

Predictor Variables ◽

Primary Study ◽

Data Sets ◽

Resampling Methods ◽

Quality Classification ◽

Leave One Out ◽

Fold Cross Validation

In the presence of a number of algorithms for classification and prediction in software engineering, there is a need to have a systematic way of assessing their performances. The performance assessment is typically done by some form of partitioning or resampling of the original data to alleviate biased estimation. For predictive and classification studies in software engineering, there is a lack of definitive advice on the most appropriate resampling method to use. This is seen as one of the contributing factors for not being able to draw general conclusions on what modeling technique or set of predictor variables is the most appropriate. Furthermore, the use of a variety of resampling methods makes it impossible to perform any formal meta-analysis of the primary study results. Therefore, it is desirable to examine the influence of various resampling methods and to quantify possible differences. Objective and method: This study empirically compares five common resampling methods (hold-out validation, repeated random sub-sampling, 10-fold cross-validation, leave-one-out cross-validation and non-parametric bootstrapping) using 8 publicly available data sets with genetic programming (GP) and multiple linear regression (MLR) as software quality classification approaches. Location of (PF, PD) pairs in the ROC (receiver operating characteristics) space and area under an ROC curve (AUC) are used as accuracy indicators. Results: The results show that in terms of the location of (PF, PD) pairs in the ROC space, bootstrapping results are in the preferred region for 3 of the 8 data sets for GP and for 4 of the 8 data sets for MLR. Based on the AUC measure, there are no significant differences between the different resampling methods using GP and MLR. Conclusion: Certain data set properties may be responsible for the insignificant differences between the resampling methods based on AUC. These include imbalanced data sets, insignificant predictor variables and high-dimensional data sets. With the current selection of data sets and classification techniques, bootstrapping is a preferred method based on the location of (PF, PD) pair data in the ROC space. Hold-out validation is not a good choice for comparatively smaller data sets, where leave-one-out cross-validation (LOOCV) performs better. For comparatively larger data sets, 10-fold cross-validation performs better than LOOCV.
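
The five resampling schemes named in this abstract differ only in how they partition record indices into training and test sets. The following is a minimal sketch of that partitioning, not the study's code. The function names, the 30% test fraction, the number of repeats, and the use of out-of-bag records as the bootstrap test set are illustrative assumptions.

```python
import random

def holdout_split(n, test_fraction=0.3, seed=0):
    """Single random split into training and test indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

def repeated_subsampling(n, repeats=5, test_fraction=0.3, seed=0):
    """Repeated hold-out with a different shuffle each time."""
    return [holdout_split(n, test_fraction, seed + r) for r in range(repeats)]

def k_fold_splits(n, k=10, seed=0):
    """k-fold cross-validation: every record is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    splits = []
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        splits.append((train, folds[f]))
    return splits

def loocv_splits(n):
    """Leave-one-out: n splits, each testing a single record."""
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]

def bootstrap_split(n, seed=0):
    """Train on n records drawn with replacement, test on the out-of-bag rest."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test

n_records = 25
print(len(k_fold_splits(n_records)), "folds;", len(loocv_splits(n_records)), "LOOCV splits")
```

Leave-one-out is the special case of k-fold with k equal to the number of records, which is why its cost grows with data set size while 10-fold stays at ten model fits.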

On modeling and analyzing barley malt data in different years

Biometrical Letters ◽

10.2478/bile-2019-0004 ◽

2019 ◽

Vol 56 (1) ◽

pp. 45-57

Author(s):

Iwona Mejza ◽

Katarzyna Ambroży-Deręgowska ◽

Jan Bocianowski ◽

Józef Błażewicz ◽

Marek Liszewski ◽

...

Keyword(s):

Linear Model ◽

Random Effects ◽

Fixed Effects ◽

Linear Models ◽

Model Fitting ◽

Mixed Linear Model ◽

Barley Grain ◽

Barley Malt ◽

Starting Point ◽

Quality Coefficient

Summary: The main purpose of this study was the model fitting of data deriving from a three-year experiment with barley malt. Two linear models were considered: a fixed linear model with fixed effects of years and other factors, and a mixed linear model with random effects of years and fixed effects of other factors. Two cultivars of brewing barley, Sebastian and Mauritia, six methods of nitrogen fertilization and four germination times were analyzed. Three quantitative traits were observed: practical extractivity of the malt, malting productivity, and a quality coefficient Q. The starting point for the statistical analyses was the available experimental material, which consisted of barley grain samples destined for malting. The analyses were performed over a series of years with respect to fixed or random effects of years. Due to the strong differentiation of the years of the study and some significant interactions of factors with years, annual analyses were also carried out.

Estimated Risk for Insulin Dose Error Among Hospital Patients Due to Glucose Meter Hematocrit Bias in 2020

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2020-0101-ra ◽

2020 ◽

Vol 144 (10) ◽

pp. 1204-1208

Author(s):

Mark Inman ◽

Andrew W. Lyon ◽

Oliver A. S. Lyon ◽

Martha E. Lyon

Keyword(s):

Random Effects ◽

Fixed Effects ◽

Insulin Dose ◽

Reference Method ◽

Hospital Inpatient ◽

Dose Error ◽

Glucose Testing ◽

Dosing Error ◽

Fixed And Random Effects ◽

Glucose Meter

Context.— Glycemic control requires accurate blood glucose testing. The extent of hematocrit interference is difficult to assess to assure quality patient care. Objective.— To predict the effect of patient hematocrit on the performance of a glucose meter and its corresponding impact on insulin-dosing error. Design.— Multilevel mixed regression was conducted to assess the extent that patient hematocrit influences Roche Accu-Chek Inform II glucose meters, using the Radiometer ABL 837 as a reference method collected during validation of 35 new meters. Regression coefficients of fixed effects for reference glucose, hematocrit, an interaction term, and random error were applied to 4 months of patient reference method results extracted from the laboratory information system. A hospital inpatient insulin dose algorithm was used to determine the frequency of insulin dose error between reference glucose and meter glucose results. Results.— Fixed effects regression for method and hematocrit predicted biases to glucose meter results that met the “95% within ±12%” for the US Food and Drug Administration goal, but combinations of fixed and random effects exceeded that target in emergency and hospital inpatient units. Insulin dose errors were predicted from the meter results. Twenty-eight percent of intensive care unit, 20.8% of hospital inpatient, and 17.7% of emergency department results were predicted to trigger a ±1 insulin dose error by fixed and random effects. Conclusions.— The current extent of hematocrit interference on glucose meter performance is anticipated to cause insulin error by 1-dose category, which is likely associated with low patient risk.

Multiple Group Comparisons of the Fixed and Random Effects From the Generalized Linear Mixed Model

Sociological Methods & Research ◽

10.1177/0049124120986182 ◽

2021 ◽

pp. 004912412098618

Author(s):

Daniel Kasper ◽

Katrin Schulz-Heidorf ◽

Knut Schwippert

Keyword(s):

Random Effects ◽

Fixed Effects ◽

Mixed Model ◽

Linear Mixed Model ◽

Asymptotic Properties ◽

Generalized Linear Mixed Model ◽

Test Statistic ◽

Multiple Group ◽

Fixed And Random Effects ◽

Group Comparisons

In this article, we extend Liao’s test for across-group comparisons of the fixed effects from the generalized linear model to the fixed and random effects of the generalized linear mixed model (GLMM). Using as our basis the Wald statistic, we developed an asymptotic test statistic for across-group comparisons of these effects. The test can be applied when the fixed and random effects are multivariate normally distributed, and it works well for any link function and conditional distribution of the dependent variable of the GLMM. We also derived the asymptotic properties of this test, and because power information does not exist for either our new test statistic or Liao’s test, we implemented a power study to demonstrate the superiority of these tests over the alternatively proposed F test. Using an example, we show the application of the test and then discuss its possible restrictions with respect to the distribution of the random effects.

Estimation of the breeding value of sport horses in the Czech Republic

Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis ◽

10.11118/actaun200452010145 ◽

2004 ◽

Vol 52 (1) ◽

pp. 145-152

Author(s):

Iva Jiskrová

Keyword(s):

Czech Republic ◽

Random Effects ◽

Fixed Effects ◽

Prediction Method ◽

Sport Performance ◽

Breeding Value ◽

Progeny Testing ◽

The Czech Republic ◽

Breeding Values ◽

Estimated Breeding Values

The performance of 10671 horses in 10911 sport competitions was used to estimate the breeding value of the population of the Czech warm-blooded horses using the Best Linear Unbiased Prediction method. The sport performance was estimated on the basis of the number of bad points (penalties) in jumping competitions. We analysed 252781 sporting results in the period 1991 – 2002. The estimations encompassed the fixed effects of sex, age and level of the competition, and the random effects of the breeder, rider, competition and the permanent environment. We compared the original and innovated calculations of the estimate of the breeding value of sport horses in the Czech Republic. We then compiled a list of estimated breeding values for stallions having 30 or more offspring and we compared the estimated breeding values with the results of the official system of progeny testing for performance in the Czech Republic.

Fixed part of the model for breeding value estimation in pigs based on litter size

Biotechnology in Animal Husbandry ◽

10.2298/bah0701429r ◽

2007 ◽

Vol 23 (5-6-1) ◽

pp. 429-436

Author(s):

D. Radojkovic ◽

M. Petrovic ◽

M. Mijatovic ◽

C. Radovic

Keyword(s):

Linear Regression ◽

Fixed Effects ◽

Statistical Significance ◽

Least Square Method ◽

Least Square ◽

Breeding Value ◽

Data Sets ◽

Value Analysis ◽

Applied Model ◽

The Republic

The goal of this paper was to investigate the effect of various fixed effects on the number of piglets born alive per litter (NBA), based on fertility results of Swedish Landrace sows on three farms in Serbia, in order to determine the best adapted model for assessing genetic parameters and breeding value. Analysis of the phenotypic variability of the NBA of Swedish Landrace sows was carried out based on fertility results on three swine farms (A, B and C) in the Republic of Serbia. Data sets encompassed reproduction indicators for 2803 (A), 1826 (B) and 2235 (C) sows, i.e. their 11014, 6757 and 8452 litters, respectively. The analysis used a fixed-effects least squares model that included the fixed effects of farrowing number, season of conception (expressed as a combination of year and month), litter genotype and duration of the previous period from weaning to conception, together with sow age at farrowing as a quadratic regression nested within farrowing number and duration of previous lactation as a linear regression. The average NBA was within the interval from 9.13 (A) to 9.76 piglets (B and C). The monitored trait varied highly significantly (p<0.001) under the effect of all systematic factors encompassed by the applied model, regardless of the source of the analyzed data. Only the linear regression effect of duration of previous lactation for farm B was assessed as having lower statistical significance (p<0.05).

HYPER-PARAMETER SELECTION FOR SPARSE LS-SVM VIA MINIMIZATION OF ITS LOCALIZED GENERALIZATION ERROR

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691313500306 ◽

2013 ◽

Vol 11 (03) ◽

pp. 1350030 ◽

Cited By ~ 14

Author(s):

BINBIN SUN ◽

WING W. Y. NG ◽

DANIEL S. YEUNG ◽

PATRICK P. K. CHAN

Keyword(s):

Cross Validation ◽

Parameter Selection ◽

Data Sets ◽

Generalization Error ◽

Generalization Capability ◽

Sensitivity Measure ◽

Minimum Sensitivity ◽

Training Error ◽

Leave One Out ◽

Localized Generalization Error

Sparse LS-SVM yields better generalization capability and reduces prediction time in comparison to full dense LS-SVM. However, both methods require careful selection of hyper-parameters to achieve high generalization capability. Leave-One-Out Cross Validation (LOO-CV) and k-fold Cross Validation (k-CV) are the two most widely used hyper-parameter selection methods for LS-SVMs. However, both fail to select good hyper-parameters for sparse LS-SVM. In this paper we propose a new hyper-parameter selection method, LGEM-HPS, for LS-SVM via minimization of the Localized Generalization Error (L-GEM). The L-GEM consists of two major components: empirical mean square error and sensitivity measure. A new sensitivity measure is derived for LS-SVM to enable LGEM-HPS to select hyper-parameters that yield an LS-SVM with smaller training error and minimum sensitivity to minor changes in inputs. Experiments on eleven UCI data sets show the effectiveness of the proposed method for selecting hyper-parameters for sparse LS-SVM.

Incremental Online Learning in High Dimensions

Neural Computation ◽

10.1162/089976605774320557 ◽

2005 ◽

Vol 17 (12) ◽

pp. 2602-2634 ◽

Cited By ~ 328

Author(s):

Sethu Vijayakumar ◽

Aaron D'Souza ◽

Stefan Schaal

Keyword(s):

Linear Models ◽

Nonlinear Function ◽

Training Data ◽

High Dimensional ◽

Data Sets ◽

Least Squares Regression ◽

Computationally Efficient ◽

Learning Techniques ◽

Leave One Out ◽

Nonlinear Function Approximation

Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of—possibly redundant—inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.