Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models

Author(s):  
Paul-Christian Bürkner ◽  
Jonah Gabry ◽  
Aki Vehtari

Abstract Cross-validation can be used to measure a model’s predictive accuracy for the purpose of model comparison, averaging, or selection. Standard leave-one-out cross-validation (LOO-CV) requires that the observation model can be factorized into simple terms, but many important models in temporal and spatial statistics do not have this property or are inefficient or unstable when forced into a factorized form. We derive how to efficiently compute and validate both exact and approximate LOO-CV for any Bayesian non-factorized model with a multivariate normal or Student-t distribution on the outcome values. We demonstrate the method using lagged simultaneously autoregressive (SAR) models as a case study.
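As a rough illustration of why a non-factorized normal model can still be handled observation by observation, the sketch below uses the standard conditioning identity for a multivariate normal: if y ~ N(mu, Sigma), then y_i given all other observations is normal with variance 1 / [Sigma^{-1}]_ii and mean y_i - g_i / [Sigma^{-1}]_ii, where g = Sigma^{-1}(y - mu). This is a minimal sketch for a single fixed value of mu and Sigma, not the authors' implementation; averaging over posterior draws is not shown, and the function names and the toy covariance are assumptions made for the example.

```python
import numpy as np


def loo_normal_moments(y, mu, sigma):
    """LOO conditional mean and variance of each y_i given y_{-i}
    for y ~ N(mu, sigma), using a single inverse of the full covariance."""
    prec = np.linalg.inv(sigma)   # precision matrix Sigma^{-1}
    g = prec @ (y - mu)           # g = Sigma^{-1} (y - mu)
    cbar = np.diag(prec)          # diagonal entries of Sigma^{-1}
    return y - g / cbar, 1.0 / cbar


def loo_normal_log_density(y, mu, sigma):
    """Log density of each y_i under its LOO conditional normal."""
    m, v = loo_normal_moments(y, mu, sigma)
    return -0.5 * np.log(2.0 * np.pi * v) - 0.5 * (y - m) ** 2 / v


def direct_conditional(y, mu, sigma, i):
    """Reference computation: condition on y_{-i} explicitly (one solve per i)."""
    keep = np.arange(len(y)) != i
    s_io = sigma[i, keep]
    s_oo = sigma[np.ix_(keep, keep)]
    w = np.linalg.solve(s_oo, s_io)
    return mu[i] + w @ (y[keep] - mu[keep]), sigma[i, i] - w @ s_io


# Toy covariance with AR(1)-like correlation (made-up values).
n, rho = 5, 0.6
idx = np.arange(n)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])
mu = np.zeros(n)
y = np.array([0.3, -1.2, 0.8, 0.1, -0.4])

loo_mean, loo_var = loo_normal_moments(y, mu, sigma)
loo_lpd = loo_normal_log_density(y, mu, sigma)
```

The shortcut needs one matrix inverse in total, whereas the reference function solves a separate linear system for every held-out observation; both give the same conditional moments.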

2019 ◽  
Vol 76 (7) ◽  
pp. 2349-2361
Author(s):  
Benjamin Misiuk ◽  
Trevor Bell ◽  
Alec Aitken ◽  
Craig J Brown ◽  
Evan N Edinger

Abstract Species distribution models are commonly used in the marine environment as management tools. The high cost of collecting marine data for modelling limits their availability, especially in remote locations. Underwater image datasets from multiple surveys were leveraged to model the presence–absence and abundance of Arctic soft-shell clam (Mya spp.) to support the management of a local small-scale fishery in Qikiqtarjuaq, Nunavut, Canada. These models were combined to predict Mya abundance, conditional on presence, throughout the study area. Results suggested that water depth was the primary environmental factor limiting Mya habitat suitability, yet seabed topography and substrate characteristics influence their abundance within suitable habitat. Ten-fold cross-validation and spatial leave-one-out cross-validation (LOO CV) were used to assess the accuracy of combined predictions and to test whether this accuracy was inflated by the spatial autocorrelation of transect sample data. Results demonstrated that four different measures of predictive accuracy were substantially inflated due to spatial autocorrelation, and the spatial LOO CV results were therefore adopted as the best estimates of performance.
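One common way to set up spatial LOO CV is to hold out a single sample and also drop every training sample that lies within some buffer distance of it, so that nearby, autocorrelated samples cannot leak information into the fit. The sketch below illustrates that idea only; the abstract does not state how the spatial exclusion was defined, so the buffer rule, the `buffer_radius` parameter, and the coordinates are all hypothetical.

```python
import math


def spatial_loo_splits(coords, buffer_radius):
    """Yield (test_index, train_indices) for buffered spatial LOO CV.

    For each held-out point, training points closer than buffer_radius
    (Euclidean distance) are excluded from the training set.
    """
    for i, (xi, yi) in enumerate(coords):
        train = [
            j
            for j, (xj, yj) in enumerate(coords)
            if j != i and math.hypot(xi - xj, yi - yj) >= buffer_radius
        ]
        yield i, train


# Made-up sample locations along a transect, in arbitrary units.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
splits = list(spatial_loo_splits(coords, buffer_radius=1.5))
```

With a buffer of 1.5 units, the second point is trained only on the distant fourth point, which shows how quickly the effective training set can shrink when samples are clustered.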


2018 ◽  
Author(s):  
Quentin Frederik Gronau ◽  
Eric-Jan Wagenmakers

We recently discussed several limitations of Bayesian leave-one-out cross-validation (LOO) for model selection. Our contribution attracted three thought-provoking commentaries. In this rejoinder, we address each of the commentaries and identify several additional limitations of LOO-based methods such as Bayesian stacking. We focus on differences between LOO-based methods versus approaches that consistently use Bayes' rule for both parameter estimation and model comparison. We conclude that LOO-based methods do not align satisfactorily with the epistemic goal of mathematical psychology.


2019 ◽  
Author(s):  
Yubin Xiao ◽  
Zheng Xiao ◽  
Xiang Feng ◽  
Zhiping Chen ◽  
Linai Kuang ◽  
...  

Abstract Background Accumulating evidence has demonstrated that lncRNAs are closely associated with human diseases, and identifying the relationships between lncRNAs and diseases helps in diagnosing and treating those diseases. Because traditional biological experiments are costly and time-consuming, researchers have in recent years proposed a growing number of computational methods to infer potential lncRNA-disease associations. However, these prediction methods also have various limitations. Results In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. Its major novelty lies in integrating direct and indirect features related to lncRNA-disease associations, such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which ensures that FVTLDA can be used to predict diseases without known related lncRNAs and lncRNAs without known related diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease associations nor requires any negative samples, which ensures that it can infer potential lncRNA-disease associations more equitably and effectively than traditional methods. Additionally, to avoid the limitations of single-model prediction techniques, we combine FVTLDA with Multiple Linear Regression (MLR) and with an Artificial Neural Network (ANN) for data analysis. Simulation results show that FVTLDA with MLR achieves AUCs of 0.8909, 0.8936 and 0.8970 in 5-fold cross-validation (CV), 10-fold CV and leave-one-out cross-validation (LOOCV), respectively, while FVTLDA with ANN achieves AUCs of 0.8766, 0.8830 and 0.8807 in 5-fold CV, 10-fold CV and LOOCV, respectively.
Furthermore, in case studies of gastric cancer, leukemia and lung cancer, 8, 8 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with ANN, have been verified by recent literature. Moreover, compared with the representative prediction model KATZLDA, FVTLDA with MLR and FVTLDA with ANN achieve average case study contrast scores of 0.8429 and 0.8515, respectively, both higher than the average case study contrast score of 0.6375 achieved by KATZLDA.


2021 ◽  
Author(s):  
Maryam Lustberg ◽  
Xuan Wu ◽  
Juan Luis Fernández-Martínez ◽  
Enrique J. de Andrés-Galiana ◽  
Santosh Philips ◽  
...  

Abstract Background Chemotherapy-induced peripheral neuropathy (CIPN) is a common toxicity of taxanes for which there is no effective intervention. Genomic CIPN risk determination has yielded promising but inconsistent results. The present study assessed the utility of a collective SNP cluster, identified using novel analytics, to describe taxane-associated CIPN risk. Methods We analyzed GWAS data derived from ECOG-5103, first identifying SNPs that were most strongly associated with CIPN using Fisher’s ratio. We then rank-ordered those SNPs that discriminated CIPN-positive from CIPN-negative phenotypes by their discriminatory power and developed the cluster of SNPs that provided the highest predictive accuracy using leave-one-out cross validation (LOOCV). Results Using GWAS aggregate data, we identified a 267-SNP cluster that was associated with a CIPN+ phenotype with an accuracy of 96.1%. Conclusions The identified 267-SNP cluster could accurately predict CIPN risk. Validation using an independent patient cohort should be performed.
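The ranking step can be pictured with a small sketch. It assumes one commonly used form of Fisher's ratio for a two-class problem, the squared difference of class means divided by the sum of class variances; the abstract does not give the exact formula used, and the genotype codes, labels, and function names below are made up for illustration.

```python
from statistics import mean, pvariance


def fisher_ratio(pos, neg):
    """Assumed form: (difference of class means)^2 / (sum of class variances)."""
    denom = pvariance(pos) + pvariance(neg)
    diff = mean(pos) - mean(neg)
    if denom == 0:
        # Both classes constant: perfectly separating if the means differ.
        return float("inf") if diff != 0 else 0.0
    return diff ** 2 / denom


def rank_features(rows, labels):
    """Return feature indices sorted by decreasing Fisher's ratio, plus scores."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, lab in zip(rows, labels) if lab == 1]
        neg = [r[j] for r, lab in zip(rows, labels) if lab == 0]
        scores.append(fisher_ratio(pos, neg))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical genotype codes (0, 1, 2) for six samples and three SNPs.
rows = [
    [2, 1, 0],
    [2, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = phenotype-positive, 0 = phenotype-negative

order, scores = rank_features(rows, labels)
```

When a ranking like this is combined with LOOCV, recomputing it inside each fold keeps the held-out sample out of the selection step.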


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Lisha Yu ◽  
Yang Zhao ◽  
Hailiang Wang ◽  
Tien-Lung Sun ◽  
Terrence E. Murphy ◽  
...  

Abstract Background Poor balance has been cited as one of the key causal factors of falls. Timely detection of balance impairment can help identify older adults prone to falls and also trigger early interventions to prevent them. The goal of this study was to develop a surrogate approach for assessing older adults' functional balance based on the Short Form Berg Balance Scale (SFBBS) score. Methods Data were collected from a waist-mounted tri-axial accelerometer while participants performed a timed up and go test. Clinically relevant variables were extracted from the segmented accelerometer signals for fitting SFBBS predictive models. Regularized regression together with random-shuffle-split cross-validation was used to facilitate the development of the predictive models for automatic balance estimation. Results Eighty-five community-dwelling older adults (72.12 ± 6.99 years) participated in our study. Our results demonstrated that combined clinical and sensor-based variables, together with regularized regression and cross-validation, achieved moderate-high predictive accuracy of SFBBS scores (mean MAE = 2.01 and mean RMSE = 2.55). Step length, gender, gait speed and linear acceleration variables describing motor coordination were identified as significant contributors to balance estimation. The predictive model also showed moderate-high discrimination in classifying the risk levels in the performance of three balance assessment motions, with AUC values of 0.72, 0.79 and 0.76 respectively. Conclusions The study presented a feasible option for quantitatively accurate, objectively measured, and unobtrusively collected functional balance assessment at the point of care or in the home environment. It also provided clinicians and older adults with stable and sensitive biomarkers for long-term monitoring of functional balance.


2014 ◽  
Vol 79 (8) ◽  
pp. 965-975 ◽  
Author(s):  
Long Jiao ◽  
Xiaofei Wang ◽  
LI. Hua ◽  
Yunxia Wang

The quantitative structure-property relationship (QSPR) for the gas/particle partition coefficient, Kp, of polychlorinated biphenyls (PCBs) was investigated. The molecular distance-edge vector (MDEV) index was used as the structural descriptor of PCBs. The quantitative relationship between the MDEV index and log Kp was modeled by multivariate linear regression (MLR) and by an artificial neural network (ANN). Leave-one-out cross validation and external validation were carried out to assess the prediction ability of the developed models. When the MLR method is used, the root mean square relative error (RMSRE) of prediction for leave-one-out cross validation and external validation is 4.72 and 8.62 respectively. When the ANN method is employed, the prediction RMSRE of leave-one-out cross validation and external validation is 3.87 and 7.47 respectively. It is demonstrated that the developed models are practicable for predicting the Kp of PCBs. The MDEV index is shown to be quantitatively related to the Kp of PCBs.
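For readers unfamiliar with the metric, the sketch below shows how an RMSRE could be computed from leave-one-out predictions of a linear regression. It assumes RMSRE is the square root of the mean squared relative error, sqrt(mean(((predicted - observed) / observed)^2)); the abstract does not give the formula or say whether its values are percentages. The data are made up and the descriptor is a single hypothetical column, not the MDEV index.

```python
import numpy as np


def rmsre(y_true, y_pred):
    """Assumed definition: sqrt(mean(((pred - obs) / obs)^2)), as a fraction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel = (y_pred - y_true) / y_true
    return float(np.sqrt(np.mean(rel ** 2)))


def loo_predictions(X, y):
    """Refit an ordinary least squares model n times, each time predicting
    the one observation that was left out."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef
    return preds


# Made-up, exactly linear data: LOO predictions should match the observations.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = 2.0 + 3.0 * X[:, 0]

preds = loo_predictions(X, y)
score = rmsre(y, preds)
```

Because the error is relative, observations with values near zero can dominate the score, which is worth keeping in mind when the response is on a log scale.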

