Moving window cross validation: a new cross validation method for the selection of a rational number of components in a partial least squares calibration model

The Analyst, 2006, Vol 131 (4), pp. 529
Authors: Sumaporn Kasemsumran, Yi-Ping Du, Bo-Yan Li, Katsuhiko Maruo, Yukihiro Ozaki

2016, Vol 81 (2), pp. 209-218
Authors: Long Jiao, Shan Bing, Xiaofeng Zhang, Hua Li

The application of interval partial least squares (IPLS) and moving window partial least squares (MWPLS) to the enantiomeric analysis of tryptophan (Trp) was investigated, and a UV-Vis spectroscopy method for determining the enantiomeric composition of Trp was developed. Calibration models were built using full-spectrum partial least squares (PLS), IPLS and MWPLS. Leave-one-out cross validation and external test validation were used to assess the prediction performance of the models. The validation results show that the full-spectrum PLS model is impractical for quantifying the relationship between the spectral data and the enantiomeric composition of L-Trp, whereas the IPLS and MWPLS models are both practicable. For the IPLS model, the root mean square relative error (RMSRE) of external test validation and leave-one-out cross validation is 4.03 and 6.50, respectively; for the MWPLS model, it is 2.93 and 4.73, respectively. The MWPLS model therefore predicts more accurately than the IPLS model. These results show that UV-Vis spectroscopy combined with MWPLS is a suitable method for determining the enantiomeric composition of Trp, and that MWPLS is superior to IPLS for selecting spectral regions in UV-Vis spectroscopic analysis.
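The abstract contrasts full-spectrum PLS with moving window PLS (MWPLS), which fits PLS models on contiguous spectral windows and keeps the window with the lowest cross-validated error. The sketch below illustrates that idea on synthetic data; it is not the authors' code. Assumptions: a NIPALS PLS1 model, a fixed window width slid one variable at a time, leave-one-out cross validation, and RMSRE taken as 100 × sqrt(mean(((ŷ − y)/y)²)), since the abstract does not give the formula. All function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Fit a single-response PLS model with NIPALS; returns (x_mean, y_mean, B)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # centred residual matrices
    p = X.shape[1]
    W, P, q = np.zeros((p, n_comp)), np.zeros((p, n_comp)), np.zeros(n_comp)
    for a in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # score vector
        tt = t @ t
        P[:, a] = Xr.T @ t / tt         # X loadings
        q[a] = yr @ t / tt              # y loading
        W[:, a] = w
        Xr = Xr - np.outer(t, P[:, a])  # deflate X
        yr = yr - q[a] * t              # deflate y
    B = W @ np.linalg.solve(P.T @ W, q)  # regression vector in X space
    return x_mean, y_mean, B


def pls1_predict(model, X):
    x_mean, y_mean, B = model
    return y_mean + (X - x_mean) @ B


def rmsre(y_true, y_pred):
    """Root mean square relative error in percent (assumed definition)."""
    return 100.0 * np.sqrt(np.mean(((y_pred - y_true) / y_true) ** 2))


def loo_predictions(X, y, n_comp):
    """Leave-one-out predictions: refit without sample i, then predict sample i."""
    n = len(y)
    y_hat = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_comp)
        y_hat[i] = pls1_predict(model, X[i:i + 1])[0]
    return y_hat


def moving_window_scan(X, y, width, n_comp):
    """Return (start, LOO RMSRE) for every contiguous window of `width` variables."""
    results = []
    for start in range(X.shape[1] - width + 1):
        Xw = X[:, start:start + width]
        results.append((start, rmsre(y, loo_predictions(Xw, y, n_comp))))
    return results


# Synthetic example: only variables 20..29 carry information about y.
rng = np.random.default_rng(0)
n, p = 30, 60
y = rng.uniform(1.0, 2.0, n)                  # strictly positive, so RMSRE is defined
X = rng.normal(0.0, 1.0, (n, p))              # uninformative channels
X[:, 20:30] = np.outer(y, np.linspace(1.0, 2.0, 10)) + rng.normal(0.0, 0.01, (n, 10))
scan = moving_window_scan(X, y, width=10, n_comp=2)
best_start, best_err = min(scan, key=lambda r: r[1])
```

On these synthetic data the lowest-error window should start close to variable 20. A real MWPLS analysis would also scan the window width and the number of components rather than fixing them.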


2017, Vol 38 (1), pp. 590-594
Authors: Chen Yueyang, Gao Zhishan, Yu Xiaohui, Zhu Dan, Chen Ming, ...

2015, Vol 2015, pp. 1-6
Authors: Jie Yu Chen, Han Zhang, Jinkui Ma, Tomohiro Tuchiya, Yelian Miao

This rapid method for determining the degree of degradation of frying rapeseed oils uses Fourier-transform infrared (FTIR) spectroscopy combined with partial least-squares (PLS) regression. One hundred and fifty-six frying oil samples, degraded to different degrees by frying potatoes, were scanned with an FTIR spectrometer using attenuated total reflectance (ATR). PLS regression with full cross validation was used to predict acid value (AV) and total polar compounds (TPC) from raw, first-derivative and second-derivative FTIR spectra (4000–650 cm−1). A precise calibration model was obtained from the second-derivative spectra: the coefficients of determination for calibration (R²) and standard errors of cross validation (SECV) were 0.99 and 0.16 mg KOH/g for AV and 0.98 and 1.17% for TPC. Tested on the validation set, the model yielded standard errors of prediction (SEP) of 0.16 mg KOH/g and 1.10% for AV and TPC, respectively. The degradation of frying oils can therefore be measured accurately using FTIR spectroscopy combined with PLS regression.
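The abstract reports R², SECV and SEP without giving formulas. A common chemometric convention, assumed here, defines SEP as the bias-corrected standard deviation of the prediction residuals e = ŷ − y; SECV uses the same formula applied to cross-validated predictions. The function name and the four example values are hypothetical.

```python
import numpy as np


def validation_stats(y_ref, y_pred):
    """Bias, bias-corrected SEP, RMSEP and R^2 for a set of predictions.

    Passing cross-validated predictions as y_pred gives SECV in the "sep" field
    (under the assumed bias-corrected convention).
    """
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    e = y_pred - y_ref                                   # residuals
    n = e.size
    bias = e.mean()
    sep = np.sqrt(np.sum((e - bias) ** 2) / (n - 1))     # bias-corrected SD of residuals
    rmsep = np.sqrt(np.mean(e ** 2))                     # uncorrected root mean square error
    r2 = 1.0 - np.sum(e ** 2) / np.sum((y_ref - y_ref.mean()) ** 2)
    return {"bias": bias, "sep": sep, "rmsep": rmsep, "r2": r2}


stats = validation_stats([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9])
```

For these four hypothetical samples, bias = 0.025, SEP = 0.15 and R² = 0.986. The identity RMSEP² = SEP²(n − 1)/n + bias² links the two error conventions, which is worth keeping in mind when comparing SEP values across studies that may use either one.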


2002, Vol 35 (5), pp. 921-941
Authors: M. Martínez Galera, D. Picón Zamora, J. L. Martínez Vidal, A. Garrido Frenich

Agronomy, 2021, Vol 11 (4), pp. 666
Authors: Rafael Font, Mercedes del Río-Celestino, Diego Luna, Juan Gil, Antonio de Haro-Bailón

Near-infrared spectroscopy (NIRS) combined with modified partial least squares (modified PLS) regression was used to determine the neutral detergent fiber (NDF) and acid detergent fiber (ADF) fractions of chickpea (Cicer arietinum L.) seed. Fifty chickpea accessions (24 desi and 26 kabuli types) and fifty F5:6 recombinant inbred lines derived from a kabuli × desi cross were evaluated for NDF and ADF and scanned by NIRS. NDF and ADF values were regressed against different spectral transformations by modified PLS regression. The coefficients of determination in cross-validation and the ratios of the standard deviation to the standard error of cross-validation (SD/SECV) were 0.91 and 3.37 for NDF and 0.98 and 6.73 for ADF, respectively, showing the high potential of NIRS to assess these components in chickpea for screening (NDF) or quality control (ADF) purposes. The spectral information provided by the different chromophores in the chickpea seed was highly correlated with the NDF and ADF composition of the seed, and those electronic transitions therefore strongly influenced the fitting of the fiber models.
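The abstract reports both the cross-validated coefficient of determination and the ratio SD/SECV (often called RPD). If SECV is taken as the root mean square of the cross-validation residuals and SD uses the same 1/n normalisation (an assumption; the abstract does not state the formulas), then R²cv = 1 − 1/RPD² exactly, so the two statistics can be cross-checked. The function names and example values below are hypothetical.

```python
import numpy as np


def rpd_and_r2cv(y_ref, y_cv):
    """Return (RPD, R^2cv) from reference values and cross-validated predictions.

    Assumes SECV = RMS of the CV residuals and SD with ddof=0, so that
    R^2cv = 1 - 1 / RPD^2 holds exactly.
    """
    y_ref = np.asarray(y_ref, dtype=float)
    y_cv = np.asarray(y_cv, dtype=float)
    secv = np.sqrt(np.mean((y_cv - y_ref) ** 2))
    sd = np.std(y_ref)                                   # 1/n normalisation (ddof=0)
    rpd = sd / secv
    r2cv = 1.0 - np.sum((y_cv - y_ref) ** 2) / np.sum((y_ref - y_ref.mean()) ** 2)
    return rpd, r2cv


def r2_from_rpd(rpd):
    """R^2cv implied by an RPD value under the convention above."""
    return 1.0 - 1.0 / rpd ** 2


rpd, r2cv = rpd_and_r2cv([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9])
```

Applied to the reported ratios, `r2_from_rpd(3.37)` ≈ 0.912 and `r2_from_rpd(6.73)` ≈ 0.978, which round to the reported 0.91 (NDF) and 0.98 (ADF), so the two statistics are mutually consistent under this convention.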


2017, Vol 47 (1)
Authors: Fernanda Gomes da Silveira, Darlene Ana Souza Duarte, Lucas Monteiro Chaves, Fabyano Fonseca e Silva, Ivan Carvalho Filho, ...

ABSTRACT: The main application of genomic selection (GS) is the early identification of genetically superior animals for traits that are difficult to measure or are evaluated late, such as meat pH (measured after slaughter). Because the number of markers in GS is generally larger than the number of genotyped animals, and these markers are highly correlated owing to linkage disequilibrium, statistical methods based on dimensionality reduction have been proposed. Among them, the partial least squares (PLS) technique stands out because of its simplicity and high predictive accuracy. However, choosing the optimal number of components remains a relevant issue for PLS applications. We therefore applied PLS, as well as principal component and traditional multiple regression, to GS for pork pH traits (pH measured 45 min and 24 h after slaughter), and identified the optimal number of PLS components using degree-of-freedom (DoF) and cross-validation (CV) methods. The PLS method outperformed the principal component and traditional multiple regression techniques, enabling satisfactory predictions of pork pH traits using only genotypic data (low-density SNP panel). Furthermore, the SNP marker estimates from PLS revealed a relevant region on chromosome 4 that may affect these traits. The DoF and CV methods gave similar results for determining the optimal number of components; from a statistical viewpoint, the DoF method should therefore be preferred because of its theoretical basis (in statistical information theory), whereas CV is an empirical, computationally intensive method.
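The abstract compares degree-of-freedom (DoF) and cross-validation (CV) rules for choosing the number of PLS components. Only the CV route is sketched here, on synthetic data with three latent factors rather than SNP genotypes; the DoF method is not shown. Assumptions: NIPALS PLS1, k-fold CV with random fold assignment, and choosing the component count with the lowest RMSECV (some analysts instead prefer the smallest model within one standard error of the minimum). Function names are hypothetical.

```python
import numpy as np


def pls1_coef(X, y, n_comp):
    """NIPALS PLS1; returns (x_mean, y_mean, B) so that y ≈ y_mean + (X - x_mean) @ B."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                 # weight vector
        t = Xr @ w                             # scores
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        c = yr @ t / tt                        # y loading
        Xr, yr = Xr - np.outer(t, p), yr - c * t  # deflation
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)


def rmsecv_curve(X, y, max_comp, k=5, seed=0):
    """k-fold RMSECV for 1..max_comp components (random, balanced fold labels)."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % k
    press = np.zeros(max_comp)                 # predicted residual sum of squares
    for f in range(k):
        tr, te = folds != f, folds == f
        for a in range(1, max_comp + 1):
            xm, ym, B = pls1_coef(X[tr], y[tr], a)
            press[a - 1] += np.sum((ym + (X[te] - xm) @ B - y[te]) ** 2)
    return np.sqrt(press / n)


# Synthetic data: y depends on three latent factors that also generate X.
rng = np.random.default_rng(1)
T = rng.normal(size=(60, 3))
X = T @ rng.normal(size=(3, 40)) + rng.normal(0.0, 0.05, (60, 40))
y = T @ np.array([1.0, 0.8, 0.6]) + rng.normal(0.0, 0.05, 60)
curve = rmsecv_curve(X, y, max_comp=8)
best = int(np.argmin(curve)) + 1
```

Because y depends on three latent factors, RMSECV should fall sharply up to three components and then level off or rise slowly as extra components start to fit noise.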


2014, Vol 70 (5)
Authors: Nor Fazila Rasaruddin, Mas Ezatul Nadia Mohd Ruah, Mohamed Noor Hasan, Mohd Zuli Jaafar

This paper describes the determination of the iodine value (IV) of pure and frying palm oils using partial least squares (PLS) regression with variable selection. A total of 28 samples of pure and frying palm oils were acquired from markets; seven were considered high-priced palm oils and the remainder low-priced. PLS regression models for IV were developed from Fourier transform infrared (FTIR) spectra recorded in absorbance mode from 650 cm-1 to 4000 cm-1, after applying a Savitzky–Golay derivative. The models were constructed from wavelengths in the FTIR region selected with a selectivity ratio (SR) plot or by their correlation coefficient with IV. Each model was validated using the root mean square error of cross validation (RMSECV) and the cross-validation correlation coefficient (R²cv). With SR-plot selection, the best models used mean centring for the pure samples and a combination of row scaling and standardization for the frying samples. With correlation-coefficient selection, the best models used a combination of row scaling and standardization for the pure samples and mean centring for the frying samples. Row scaling of the variables is not necessary, however, since its effect on model quality is insignificant.
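The abstract selects wavelengths by their correlation with IV and compares mean centring, row scaling and standardization. A minimal sketch of those steps, assuming that row scaling means dividing each spectrum by its Euclidean norm, standardization means column autoscaling, and selection keeps variables whose absolute Pearson correlation with IV meets a user-chosen threshold. The selectivity-ratio method and the Savitzky–Golay derivative are not shown; the data, threshold and function names are hypothetical.

```python
import numpy as np


def mean_centre(X):
    """Subtract each variable's (column's) mean."""
    return X - X.mean(axis=0)


def row_scale(X):
    """Scale each spectrum (row) to unit Euclidean norm (assumed definition)."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def standardize(X):
    """Autoscale each variable (column) to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def correlation_select(X, y, threshold):
    """Return (indices, r): variables whose |Pearson r| with y is at least `threshold`."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.flatnonzero(np.abs(r) >= threshold), r


# Hypothetical data: 28 spectra x 100 variables; only variables 10..14 track IV.
rng = np.random.default_rng(2)
iv = rng.uniform(40.0, 60.0, 28)
X = rng.normal(0.0, 1.0, (28, 100))
X[:, 10:15] = 0.02 * iv[:, None] + rng.normal(0.0, 0.02, (28, 5))
idx, r = correlation_select(X, iv, threshold=0.9)
X_selected = mean_centre(X[:, idx])            # e.g. mean-centred input for a PLS model
```

The selected, preprocessed columns would then feed a PLS model whose RMSECV and R²cv are compared across preprocessing choices, as the abstract describes.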

