cross validation error
Recently Published Documents


Total documents: 47 (five years: 17)
H-index: 9 (five years: 2)

2022 ◽  
pp. 096228022110417
Author(s):  
Kian Wee Soh ◽  
Thomas Lumley ◽  
Cameron Walker ◽  
Michael O’Sullivan

In this paper, we present a new model averaging technique that can be applied in medical research. The dataset is first partitioned by the values of its categorical explanatory variables. Then for each partition, a model average is determined by minimising some form of squared errors, which could be the leave-one-out cross-validation errors. From our asymptotic optimality study and the results of simulations, we demonstrate under several high-level assumptions and modelling conditions that this model averaging procedure may outperform jackknife model averaging, which is a well-established technique. We also present an example where a cross-validation procedure does not work (that is, a zero-valued cross-validation error is obtained) when determining the weights for model averaging.
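The weight-selection step that this abstract compares against, jackknife model averaging, can be illustrated with a minimal sketch. It assumes only two candidate models (a mean-only model and a simple linear regression), and it does not reproduce the partitioned procedure the authors propose. All function names and data below are hypothetical.

```python
def loo_predictions(xs, ys, fit):
    """Leave-one-out predictions: point i is predicted by a model fitted without it."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fit(train_x, train_y)
        preds.append(predict(xs[i]))
    return preds


def fit_mean(xs, ys):
    """Candidate model 1: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate model 2: simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def jackknife_weight(ys, p1, p2):
    """Weight w in [0, 1] on model 1 minimising sum((y - (w*p1 + (1-w)*p2))**2)."""
    diff = [a - b for a, b in zip(p1, p2)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        return 0.5  # both models give identical predictions; any weight is optimal
    w = sum((y - b) * d for y, b, d in zip(ys, p2, diff)) / denom
    return min(1.0, max(0.0, w))  # the objective is a convex quadratic, so clipping is exact


def loo_cv_error(ys, preds):
    """Mean squared leave-one-out cross-validation error."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.2, 3.9, 6.1, 8.0, 9.8, 12.1]
p_mean = loo_predictions(xs, ys, fit_mean)
p_line = loo_predictions(xs, ys, fit_line)
w = jackknife_weight(ys, p_mean, p_line)
averaged = [w * a + (1 - w) * b for a, b in zip(p_mean, p_line)]
```

Because w = 0 and w = 1 are both feasible, the averaged predictions can never have a larger leave-one-out error than either candidate model alone.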


2021 ◽  
Vol 11 (24) ◽  
pp. 11820
Author(s):  
Paweena Suebsombut ◽  
Aicha Sekhari ◽  
Pradorn Sureephong ◽  
Abdelhak Belhi ◽  
Abdelaziz Bouras

Water, an essential resource for crop production, is becoming increasingly scarce, while cropland continues to expand due to the world’s population growth. Proper irrigation scheduling has been shown to help farmers improve crop yield and quality, resulting in more sustainable water consumption. Soil Moisture (SM), which indicates the amount of water in the soil, is one of the most important crop irrigation parameters. In terms of water usage optimization and crop yield, forecasting future soil moisture is an especially valuable task for crop irrigation, because farmers can base irrigation decisions on this parameter. Sensors can be used to estimate this value in real time, which may assist farmers in deciding whether or not to irrigate. The soil moisture value provided by the sensors, however, is instantaneous and cannot be used to directly compute irrigation parameters such as the best timing or the required water quantity. The soil moisture value can, in fact, vary greatly depending on factors such as humidity, weather, and time. Using machine learning methods, these factors can be used to predict soil moisture levels in the near future. As a potential solution, this paper proposes a new Long Short-Term Memory (LSTM)-based model that forecasts soil moisture values from parameters collected from various sensors. To train and validate this model, a real-world dataset containing weather-forecasting, soil moisture, and other related parameters was collected using smart sensors installed in a greenhouse in Chiang Mai province, Thailand. Preliminary results show that the LSTM-based model performs well in predicting soil moisture, with an RMSE of 0.72% and a cross-validation error of 0.52%; the Bi-LSTM model achieves an RMSE of 0.76% and a cross-validation error of 0.57%. In the future, we aim to test and validate this model on other similar datasets.


Animals ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1998
Author(s):  
José Ignacio Amorena ◽  
Dolores María Eugenia Álvarez ◽  
Elvira Fernández-Ahumada

Llama fibre has the potential to become the most valuable textile resource in the Puna region of Argentina. In this study, near infrared reflectance spectroscopy was evaluated for predicting the mean fibre diameter in llama fleeces. Sets of carded and non-carded samples were analysed in combination with spectral preprocessing techniques, and a total of 169 spectral signatures of llama samples in the Vis and NIR ranges (400–2500 nm) were obtained. Spectral preprocessing consisted of wavelength selection (Vis–NIR, NIR and discrete ranges) and multiplicative and derivative pretreatments; spectra without pretreatments were also included. Modified partial least squares (M-PLS) regression was used to develop prediction models. Predictability was evaluated through R2, standard cross validation error (SECV), external validation error (SEV) and residual predictive value (RPD). A total of 54 calibration models were developed, of which the best (R2 = 0.67; SECV = 1.965; SEV = 2.235 and RPD = 1.91) was obtained in the Vis–NIR range applying the first derivative pretreatment. ANOVA showed differences between the carded and non-carded sets. The models obtained could be used in screening programs and could contribute to the valorisation of llama fibre and the sustainable development of the textile industry in the Puna territory of Catamarca. The data presented in this paper help to fill the gap in the scarce information on this subject.
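The SECV and RPD statistics named in this abstract can be sketched as follows. The abstract does not give its formulas, so this assumes a convention common in NIR calibration work: SECV is the bias-corrected standard deviation of the cross-validation residuals, and RPD is the standard deviation of the reference values divided by SECV. The data are hypothetical.

```python
import math


def secv(reference, cv_predicted):
    """Bias-corrected standard error of cross-validation residuals (assumed convention)."""
    residuals = [r - p for r, p in zip(reference, cv_predicted)]
    n = len(residuals)
    bias = sum(residuals) / n
    return math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))


def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def rpd(reference, cv_predicted):
    """Ratio of the spread of the reference values to the cross-validation error."""
    return sample_sd(reference) / secv(reference, cv_predicted)


# Hypothetical reference measurements and their cross-validated predictions.
reference = [20.0, 22.0, 24.0, 26.0, 28.0]
cv_predicted = [21.0, 21.0, 25.0, 25.0, 29.0]
secv_value = secv(reference, cv_predicted)
rpd_value = rpd(reference, cv_predicted)
```

Under this convention a larger RPD means the prediction error is small relative to the natural spread of the samples.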


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Issam Barra ◽  
Lotfi Khiari ◽  
Stephan M. Haefele ◽  
Ruben Sakrabani ◽  
Fassil Kebede

Vibrational spectroscopy, such as Fourier-transform infrared (FTIR) spectroscopy, has been used successfully for soil diagnosis owing to its low cost, minimal sample preparation, non-destructive nature, and reliable results. This study aimed at optimizing one of the essential settings during the acquisition of FTIR spectra (viz. the number of scans) using the standardized moment distance index (SMDI) as a metric that could trap the fine points of the curve and extract optimal spectral fingerprints of the sample. Furthermore, it can be used successfully to assess the resemblance between spectra. The study revealed that beyond 50 scans the similarity of the acquisitions improved markedly. Subsequently, the effect of the number of scans on the predictive ability of partial least squares regression models for the estimation of five selected soil properties (i.e., soil pH in water, soil organic carbon, total nitrogen, cation exchange capacity and Olsen phosphorus) was assessed, and the results showed a general tendency for the correlation coefficient (R2) to improve as the number of scans increased from 10 to 80. In contrast, the cross-validation error RMSECV decreased with increasing scan number, reflecting an improvement in the predictive quality of the calibration models with an increasing number of scans.
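RMSECV, as used in this abstract, is the root mean square of the prediction errors pooled over all cross-validation folds. The sketch below shows that calculation with k-fold cross-validation. It assumes a simple least-squares line as a stand-in for the partial least squares model and an interleaved fold assignment; neither is taken from the study.

```python
import math


def fit_line(xs, ys):
    """Stand-in calibration model: simple least-squares line (not PLS)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def rmsecv(xs, ys, fit, k=5):
    """Root mean square error of cross-validation over k interleaved folds."""
    n = len(xs)
    squared_error = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th sample is held out
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        predict = fit(train_x, train_y)
        squared_error += sum((ys[i] - predict(xs[i])) ** 2 for i in test_idx)
    return math.sqrt(squared_error / n)


# Hypothetical calibration data: a linear trend with alternating noise.
xs = [float(i) for i in range(10)]
ys = [3.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
error = rmsecv(xs, ys, fit_line, k=5)
```

Every sample is predicted exactly once by a model that never saw it, so a falling RMSECV indicates better out-of-sample calibration, not merely a closer fit.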


2021 ◽  
Vol 11 (6) ◽  
pp. 2630
Author(s):  
Dimitrios S. Kasampalis ◽  
Pavlos Tsouvaltzis ◽  
Konstantinos Ntouros ◽  
Athanasios Gertsis ◽  
Dimitrios Moshou ◽  
...  

Background: The quality and safety of potato depend on both cultivar and postharvest management. The precise assessment of freshness and cultivar is a complex task requiring time-consuming, expensive, and destructive techniques. Method: Potatoes from three commercial cultivars were stored for 5 months at 5 °C. Color and chlorophyll fluorescence were recorded, and Red-Green-Blue (R-G-B), Red-Green-Near infrared (R-G-NIR) and Red-Blue-Near infrared (R-B-NIR) digital images, as well as hyperspectral images, were acquired both on the external periderm of the tuber and in the inner flesh part. Partial least square regression (PLSR) and discriminant analysis, combined with feature selection techniques, were implemented in order to assess potato freshness and to classify the tubers into their respective genotypes. Results: The PLSR analysis of visible/near infrared (Vis/NIR) spectra reflectance most reliably predicted potato freshness, with a cross-validated regression coefficient equal to 0.981 and 0.947, as determined by external or internal measurements, respectively. Variance inflation factor, variable importance scores, and genetic algorithms identified specific wavelength regions that mostly affected the accuracy of the model in terms of strongest regression and lowest collinearity and root mean cross validation error. Conclusions: Vis/NIR spectra reflectance data from the skin of the potato tubers may be reliably used in the assessment of postharvest storage life, as well as in the cultivar discrimination process.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 258
Author(s):  
Karim Karimi ◽  
Duy Ngoc Do ◽  
Mehdi Sargolzaei ◽  
Younes Miar

Characterizing the genetic structure and population history can facilitate the development of genomic breeding strategies for the American mink. In this study, we used the whole genome sequences of 100 mink from the Canadian Centre for Fur Animal Research (CCFAR) at the Dalhousie Faculty of Agriculture (Truro, NS, Canada) and Millbank Fur Farm (Rockwood, ON, Canada) to investigate their population structure, genetic diversity and linkage disequilibrium (LD) patterns. Analysis of molecular variance (AMOVA) indicated that the variation among color-types was significant (p < 0.001) and accounted for 18% of the total variation. The admixture analysis revealed that assuming three ancestral populations (K = 3) provided the lowest cross-validation error (0.49). The effective population size (Ne) five generations ago was estimated to be 99 and 50 for CCFAR and Millbank Fur Farm, respectively. The LD patterns revealed that the average r2 fell to <0.2 at genomic distances of >20 kb and >100 kb in CCFAR and Millbank Fur Farm, respectively, suggesting that densities of 120,000 and 24,000 single nucleotide polymorphisms (SNP) would provide adequate accuracy of genomic evaluation in these two populations. These results indicated that accounting for admixture is critical for designing the SNP panels for genotype-phenotype association studies of American mink.


2020 ◽  
Vol 6 ◽  
pp. e282
Author(s):  
Thomas R. Etherington

Interpolation techniques provide a method to convert point data of a geographic phenomenon into a continuous field estimate of that phenomenon, and have become a fundamental geocomputational technique of spatial and geographical analysts. Natural neighbour interpolation is one method of interpolation that has several useful properties: it is an exact interpolator, it creates a smooth surface free of any discontinuities, it is a local method, is spatially adaptive, requires no statistical assumptions, can be applied to small datasets, and is parameter free. However, as with any interpolation method, there will be uncertainty in how well the interpolated field values reflect actual phenomenon values. Using a method based on natural neighbour distance based rates of error calculated for data points via cross-validation, a cross-validation error-distance field can be produced to associate uncertainty with the interpolation. Virtual geography experiments demonstrate that given an appropriate number of data points and spatial-autocorrelation of the phenomenon being interpolated, the natural neighbour interpolation and cross-validation error-distance fields provide reliable estimates of value and error within the convex hull of the data points. While this method does not replace the need for analysts to use sound judgement in their interpolations, for those researchers for whom natural neighbour interpolation is the best interpolation option the method presented provides a way to assess the uncertainty associated with natural neighbour interpolations.
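The cross-validation step behind an error-distance relationship can be sketched roughly as below. This is not the published method: inverse-distance weighting stands in for natural neighbour interpolation, and the distance to the nearest remaining data point stands in for natural neighbour distance. Both substitutions, and all names and data, are assumptions made to keep the example self-contained.

```python
import math


def idw(points, values, query, power=2.0):
    """Inverse-distance-weighted estimate at query (stand-in interpolator)."""
    numerator = 0.0
    denominator = 0.0
    for (px, py), value in zip(points, values):
        distance = math.hypot(px - query[0], py - query[1])
        if distance == 0.0:
            return value  # exact at data points
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def loo_error_rates(points, values):
    """For each data point, return (distance, absolute error, error per unit distance).

    Each point is removed in turn and re-estimated from the remaining points.
    """
    results = []
    for i, (point, value) in enumerate(zip(points, values)):
        other_points = points[:i] + points[i + 1:]
        other_values = values[:i] + values[i + 1:]
        estimate = idw(other_points, other_values, point)
        distance = min(math.hypot(q[0] - point[0], q[1] - point[1]) for q in other_points)
        error = abs(estimate - value)
        results.append((distance, error, error / distance))
    return results


# Hypothetical sample: a 3 x 3 grid of points on the surface z = x + 2y.
points = [(float(x), float(y)) for x in range(3) for y in range(3)]
values = [x + 2.0 * y for x, y in points]
rates = loo_error_rates(points, values)
```

The per-point error rates could then be multiplied by the distance from any unsampled location to its nearest data point to give a rough uncertainty estimate there, which is the general idea behind an error-distance field.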

