Spatial leave-one-out cross-validation for variable selection in the presence of spatial autocorrelation

2014 ◽  
Vol 23 (7) ◽  
pp. 811-820 ◽  
Author(s):  
Kévin Le Rest ◽  
David Pinaud ◽  
Pascal Monestiez ◽  
Joël Chadoeuf ◽  
Vincent Bretagnolle
2019 ◽  
Vol 76 (7) ◽  
pp. 2349-2361
Author(s):  
Benjamin Misiuk ◽  
Trevor Bell ◽  
Alec Aitken ◽  
Craig J Brown ◽  
Evan N Edinger

Abstract Species distribution models are commonly used in the marine environment as management tools. The high cost of collecting marine data for modelling makes them finite, especially in remote locations. Underwater image datasets from multiple surveys were leveraged to model the presence–absence and abundance of Arctic soft-shell clam (Mya spp.) to support the management of a local small-scale fishery in Qikiqtarjuaq, Nunavut, Canada. These models were combined to predict Mya abundance, conditional on presence throughout the study area. Results suggested that water depth was the primary environmental factor limiting Mya habitat suitability, yet seabed topography and substrate characteristics influence their abundance within suitable habitat. Ten-fold cross-validation and spatial leave-one-out cross-validation (LOO CV) were used to assess the accuracy of combined predictions and to test whether this was inflated by the spatial autocorrelation of transect sample data. Results demonstrated that four different measures of predictive accuracy were substantially inflated due to spatial autocorrelation, and the spatial LOO CV results were therefore adopted as the best estimates of performance.
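The hold-out scheme behind spatial LOO CV can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not describe the exclusion rule, so the buffer-distance rule used here is a common formulation assumed for the example, and the function name, coordinates, and `buffer_radius` value are hypothetical.

```python
import math

def spatial_loo_splits(coords, buffer_radius):
    """Yield (test_index, train_indices) for spatial leave-one-out CV.

    Training points closer than `buffer_radius` to the held-out point are
    dropped, so spatially autocorrelated neighbours cannot leak into the fit.
    `buffer_radius` is an assumed, user-chosen distance.
    """
    for i, (xi, yi) in enumerate(coords):
        train = [
            j for j, (xj, yj) in enumerate(coords)
            if j != i and math.hypot(xi - xj, yi - yj) >= buffer_radius
        ]
        yield i, train

# Hypothetical transect: three clustered points and two distant ones.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
splits = list(spatial_loo_splits(coords, buffer_radius=3.0))
```

With a buffer of 3 units, holding out the first point also removes its two close neighbours from the training set, which is what keeps the resulting accuracy estimate from being inflated by nearby, correlated samples.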


2014 ◽  
Vol 79 (8) ◽  
pp. 965-975 ◽  
Author(s):  
Long Jiao ◽  
Xiaofei Wang ◽  
LI. Hua ◽  
Yunxia Wang

The quantitative structure property relationship (QSPR) for gas/particle partition coefficient, Kp, of polychlorinated biphenyls (PCBs) was investigated. Molecular distance-edge vector (MDEV) index was used as the structural descriptor of PCBs. The quantitative relationship between the MDEV index and log Kp was modeled by multivariate linear regression (MLR) and artificial neural network (ANN) respectively. Leave one out cross validation and external validation were carried out to assess the prediction ability of the developed models. When the MLR method is used, the root mean square relative error (RMSRE) of prediction for leave one out cross validation and external validation is 4.72 and 8.62 respectively. When the ANN method is employed, the prediction RMSRE of leave one out cross validation and external validation is 3.87 and 7.47 respectively. It is demonstrated that the developed models are practicable for predicting the Kp of PCBs. The MDEV index is shown to be quantitatively related to the Kp of PCBs.
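A leave-one-out loop scored with RMSRE might look like the sketch below. The abstract does not define RMSRE, so the formula used here, sqrt(mean(((predicted - observed) / observed)^2)), is an assumption, as are all function names and data values. A one-descriptor regression through the origin stands in for the MLR and ANN models of the paper.

```python
import math

def rmsre(observed, predicted):
    """Root mean square relative error (assumed definition):
    sqrt(mean(((pred - obs) / obs)^2)). Observed values must be non-zero."""
    terms = [((p - o) / o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(terms) / len(terms))

def loocv_predict(xs, ys, fit, predict):
    """Refit on all samples but one, then predict the held-out sample."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(predict(model, xs[i]))
    return preds

# Stand-in model (assumption): least-squares slope through the origin.
def fit_slope(xs, ys):
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def predict_slope(slope, x):
    return slope * x

xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical descriptor values
ys = [2.1, 3.9, 6.2, 7.8]  # hypothetical responses
cv_error = rmsre(ys, loocv_predict(xs, ys, fit_slope, predict_slope))
```

Passing `fit` and `predict` as callables keeps the validation loop independent of the model, so the same loop could score a linear model and a neural network on equal terms.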


2016 ◽  
Vol 2016 ◽  
pp. 1-7 ◽  
Author(s):  
Hong-Jhang Chen ◽  
Yii-Jeng Lin ◽  
Pei-Chen Wu ◽  
Wei-Hsiang Hsu ◽  
Wan-Chung Hu ◽  
...  

Traditional Chinese medicine (TCM) formulates treatment according to body constitution (BC) differentiation. Different constitutions have specific metabolic characteristics and different susceptibility to certain diseases. This study aimed to assess the Yang-Xu constitution using a body constitution questionnaire (BCQ) and clinical blood variables. A BCQ was employed to assess the clinical manifestation of Yang-Xu. A logistic regression model was fitted to explore the relationship between BC scores and biomarkers. Leave-one-out cross-validation (LOOCV) and K-fold cross-validation were performed to evaluate the accuracy of a predictive model in practice. Decision trees (DTs) were built to determine the possible relationships between blood biomarkers and BC scores. According to the BCQ analysis, 49% of participants without any BC were classified as healthy subjects. Among them, 130 samples were selected for further analysis and divided into two groups. One group comprised healthy subjects without any BC (68%), while subjects of the other group, named the sub-healthy group, had three BCs (32%). Six biomarkers, CRE, TSH, HB, MONO, RBC, and LH, were found to have the greatest impact on BCQ outcomes in Yang-Xu subjects. This study indicated significant biochemical differences in Yang-Xu subjects, which may provide a connection between blood variables and the Yang-Xu BC.


Author(s):  
Jung-Han Wang ◽  
Mohamed A. Abdel-Aty ◽  
Jaeyoung Lee

The Highway Safety Manual (HSM) Part C provides a series of safety performance functions (SPFs) for different roadway conditions. The SPFs suggested in the HSM are formulated on the basis of exposure variables: the logarithms of the annual average daily traffic (AADT) on the major road and on the minor road under the base condition. In this research, data from 7,802 intersections in Florida were collected and processed. These intersections were categorized into seven types based on area type (rural or urban), number of legs (three or four), and number of approaches controlled by stop signs. Twenty-two SPF formulations, including the one suggested by the HSM, were developed for each intersection type for examination of the goodness-of-fit measures of the SPFs. In addition, the goodness of fit of each model of the 22 SPFs in each category was examined with 10-fold leave-one-out cross-validation (LOOCV). With a comparison of the delta values generated with the LOOCV method, it is suggested that the SPF with the logarithm of the total entering vehicle volume and the ratio of the AADT on the minor road and the AADT on the major road are important. In addition, the SPFs with the AADT on the major road and the AADT on the minor road and their logarithmic transformations are also important. Therefore, it is suggested that the future HSM compare these two SPF formulations—as suggested in the current research, along with the original SPF formulation in the manual—and select the one with the best model fit on the basis of the delta value using LOOCV.


2019 ◽  
Vol 20 (S23) ◽  
Author(s):  
Cheng Yan ◽  
Guihua Duan ◽  
Fang-Xiang Wu ◽  
Jianxin Wang

Abstract Background: Viral infectious diseases are a serious threat to human health. Receptor binding is the first step in the viral infection of hosts. To treat human viral infectious diseases more effectively, the hidden virus-receptor interactions must be discovered. However, current computational methods for predicting virus-receptor interactions are limited. Results: In this study, we propose a new computational method (IILLS) to predict virus-receptor interactions based on the Initial Interaction scores method via the neighbors and the Laplacian regularized Least Square algorithm. IILLS integrates the known virus-receptor interactions and amino acid sequences of receptors. The similarity of viruses is calculated by the Gaussian Interaction Profile (GIP) kernel. We also compute the receptor GIP similarity and the receptor sequence similarity. The sequence similarity is then used as the final similarity of receptors according to the prediction results. The 10-fold cross validation (10CV) and leave-one-out cross validation (LOOCV) are used to assess the prediction performance of our method. We also compare our method with three other competing methods (BRWH, LapRLS, CMF). Conclusion: The experimental results show that IILLS achieves AUC values of 0.8675 and 0.9061 with 10-fold cross validation and leave-one-out cross validation (LOOCV), respectively, which illustrates that IILLS is superior to the competing methods. In addition, the case studies further indicate that the IILLS method is effective for virus-receptor interaction prediction.
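The GIP kernel mentioned in the abstract can be sketched as below. The abstract gives no formula, so this assumes the commonly used form K(i, j) = exp(-γ ||p_i - p_j||²), with the bandwidth γ set to `gamma_prime` divided by the mean squared norm of the interaction profiles. The function name, the `gamma_prime` default, and the toy interaction matrix are hypothetical.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian Interaction Profile similarity between rows of `profiles`.

    Each row is a binary interaction profile (e.g. one virus against all
    receptors). Assumes at least one profile is non-zero; otherwise the
    bandwidth normalisation would divide by zero.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

# Hypothetical virus-by-receptor interaction matrix (3 viruses, 3 receptors).
interactions = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
K = gip_kernel(interactions)
```

Viruses that share known receptors get similarity close to 1, while viruses with disjoint profiles get lower values. Applying the same function to the transposed matrix would give the receptor-side GIP similarity.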


2014 ◽  
Vol 70 (5) ◽  
Author(s):  
Nor Fazila Rasaruddin ◽  
Mas Ezatul Nadia Mohd Ruah ◽  
Mohamed Noor Hasan ◽  
Mohd Zuli Jaafar

This paper shows the determination of the iodine value (IV) of pure and frying palm oils using Partial Least Squares (PLS) regression with variable selection. A total of 28 samples of pure and frying palm oils were acquired from markets. Seven of them were considered high-priced palm oils, while the remainder were low-priced. PLS regression models were developed for the determination of IV using Fourier Transform Infrared (FTIR) spectral data in absorbance mode in the range from 650 cm⁻¹ to 4000 cm⁻¹. A Savitzky-Golay derivative was applied before developing the prediction models. The models were constructed using wavelengths selected in the FTIR region by adopting the selectivity ratio (SR) plot and the correlation coefficient to the IV parameter. Each model was validated through the root mean square error of cross-validation (RMSECV) and the cross-validation correlation coefficient (R²cv). The best model using the SR plot was the model with mean centring for the pure samples and the model with a combination of row scaling and standardization for the frying samples. The best model with correlation coefficient variable selection was the model with a combination of row scaling and standardization for the pure samples and the model with mean centring data pre-processing for the frying samples. It is not necessary to row scale the variables to develop the model, since the effect of row scaling on model quality is insignificant.


Author(s):  
Federico Belotti ◽  
Franco Peracchi

In this article, we describe jackknife2, a new prefix command for jackknifing linear estimators. It takes full advantage of the available leave-one-out formula, thereby allowing for substantial reduction in computing time. Of special note is that jackknife2 allows the user to compute cross-validation and diagnostic measures that are currently not available after ivregress 2sls, xtreg, and xtivregress.
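The kind of leave-one-out formula that makes such a shortcut possible can be illustrated for simple linear regression. The sketch below is in Python rather than Stata and is not the jackknife2 implementation: it only checks the standard identity that the leave-one-out residual equals e_i / (1 - h_ii), where e_i is the full-sample residual and h_ii is the leverage. All function names and data values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def loo_residuals_refit(xs, ys):
    """Brute force: refit the model n times, once per held-out point."""
    out = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        out.append(ys[i] - (a + b * xs[i]))
    return out

def loo_residuals_shortcut(xs, ys):
    """Single fit: rescale each residual by 1 / (1 - leverage)."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    out = []
    for x, y in zip(xs, ys):
        leverage = 1 / n + (x - x_bar) ** 2 / sxx
        out.append((y - (a + b * x)) / (1 - leverage))
    return out

xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical regressor
ys = [1.1, 1.9, 3.2, 3.8, 5.3]  # hypothetical response
slow = loo_residuals_refit(xs, ys)
fast = loo_residuals_shortcut(xs, ys)
```

Both functions return the same residuals up to floating-point error, but the shortcut needs one fit instead of n, which is where the reduction in computing time comes from.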


2019 ◽  
Vol 17 (1) ◽  
Author(s):  
Guobo Xie ◽  
Zhiliang Fan ◽  
Yuping Sun ◽  
Cuiming Wu ◽  
Lei Ma

Abstract Background: Recently, numerous biological experiments have indicated that microRNAs (miRNAs) play critical roles in exploring the pathogenesis of various human diseases. Since traditional experimental methods for detecting miRNA-disease associations are costly and time-consuming, it has become urgent to design efficient and robust computational techniques for identifying undiscovered interactions. Methods: In this paper, we propose a computational framework named weighted bipartite network projection for miRNA-disease association prediction (WBNPMD). In this method, transfer weights were constructed by combining the known miRNA and disease similarities, and the initial information was properly configured. Then the two-step bipartite network algorithm was implemented to infer potential miRNA-disease associations. Results: The proposed WBNPMD was applied to the known miRNA-disease association data, and leave-one-out cross-validation (LOOCV) and fivefold cross-validation were implemented to evaluate the performance of WBNPMD. Our method achieved AUCs of 0.9321 and 0.9173 ± 0.0005 in LOOCV and fivefold cross-validation, respectively, and outperformed four other state-of-the-art methods. We also carried out two kinds of case studies on prostate neoplasm, colorectal neoplasm, and lung neoplasm, and most of the top 50 predicted miRNAs were confirmed to have an association with the corresponding diseases based on the dbDeMC, miR2Disease, and HMDD V3.0 databases. Conclusions: The experimental results demonstrate that WBNPMD can accurately infer potential miRNA-disease associations. We anticipate that the proposed WBNPMD could serve as a powerful tool for discovering potential miRNA-disease associations.


Minerals ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. 474 ◽  
Author(s):  
Laurens T. Tijsseling ◽  
Quentin Dehaine ◽  
Gavyn K. Rollinson ◽  
Hylke J. Glass

As part of a study investigating the influence of mineralogical variability in a sediment-hosted copper–cobalt deposit in the Democratic Republic of Congo on flotation performance, the flotation of nine sulphide ore samples was investigated through laboratory batch kinetics tests and quantitative mineral analyses. Using a range of ore samples from the same deposit, the influence of mineralogy on flotation performance was studied. Characterisation of the samples through QEMSCAN showed that bornite, chalcopyrite, chalcocite and carrollite are the main copper-bearing sulphide minerals while carrollite is the only cobalt-bearing mineral. Mineralogical characteristics were averaged per sample to allow for a quantitative correlation with flotation performance parameters. Equilibrium recoveries, rate constants and final grades of the samples were correlated to the feed mineralogy through Multiple Linear Regression (MLR). Target sulphide minerals content and particle size, magnesiochlorite content, carrollite liberation and association of the copper and cobalt minerals with magnesiochlorite and dolomite were used to predict flotation performance. Leave One Out Cross Validation (LOOCV) revealed that the final copper and cobalt grades are predicted with an R² of 0.80 and 0.93 and Root Mean Square Error of Cross Validation (RMSECV) of 4.41% and 1.34%. The recovery of cobalt and copper with time can be predicted with an R² of 0.94 for both and an overall test error of 4.70% and 5.14%. Overall, it was shown that quantitative understanding of changes in mineralogy allows for prediction of changes in flotation performance.

