The extension of generalized cross-validation to a multi-parameter class of estimators

Author(s):  
D. M. O'Brien ◽  
J. N. Holt

Abstract. The method of generalized cross-validation (GCV) provides a good value for the “ridge” regularization parameter for an ill-conditioned linear system, such as the system produced by discretization of a Fredholm integral equation of the first kind. In this note we apply GCV to a wider class of estimators than the one-parameter ridge estimators. We observe that the expected values of the parameter mean-square error, the predictive mean-square error, and the GCV function are simultaneously minimized over this new class, so we accept the minimizer of the GCV function as the best computable estimator. We present a simple algorithm for computing this estimator from the data, so that a numerical search is not needed.
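
For orientation, the sketch below evaluates the standard GCV function for the ordinary one-parameter ridge estimator over a grid of candidate values. It is not the multi-parameter algorithm of the paper, which avoids a numerical search. The function name `gcv_ridge`, the test kernel, the noise level, and the parameterization A(lam) = X (X^T X + lam I)^(-1) X^T are assumptions made for illustration.

```python
import numpy as np

def gcv_ridge(X, y, lambdas):
    """GCV score n * ||(I - A) y||^2 / trace(I - A)^2 for each ridge parameter.

    A(lam) = X (X^T X + lam I)^(-1) X^T is the ridge influence matrix
    (an assumed parameterization; some texts scale lam by n).
    """
    n = X.shape[0]
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y
    # The part of y outside the column space of X is not changed by A(lam).
    outside = max(float(y @ y - uty @ uty), 0.0)
    scores = []
    for lam in lambdas:
        shrink = s**2 / (s**2 + lam)            # eigenvalues of A(lam)
        rss = np.sum(((1.0 - shrink) * uty) ** 2) + outside
        trace_resid = n - np.sum(shrink)        # trace(I - A(lam))
        scores.append(n * rss / trace_resid**2)
    return np.array(scores)

# Hypothetical ill-conditioned system: a discretized smoothing kernel.
rng = np.random.default_rng(0)
s_grid = np.linspace(0.0, 1.0, 60)
t_grid = np.linspace(0.0, 1.0, 20)
K = np.exp(-((s_grid[:, None] - t_grid[None, :]) ** 2) / 0.02) / 20
f_true = np.sin(2 * np.pi * t_grid)
g = K @ f_true + 1e-3 * rng.normal(size=60)

lambdas = np.logspace(-10, 0, 41)
scores = gcv_ridge(K, g, lambdas)
lam_best = lambdas[np.argmin(scores)]
f_hat = np.linalg.solve(K.T @ K + lam_best * np.eye(20), K.T @ g)
```

The singular value decomposition is computed once, so each additional candidate value costs only a few vector operations.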

Author(s):  
Syafruddin Side ◽  
Wahidah Sanusi ◽  
Mustati'atul Waidah Maksum

Abstract. Semiparametric regression is a regression model that contains both parametric and nonparametric components. This study applies a spline semiparametric regression model to longitudinal data, with a case study of patients with Dengue Hemorrhagic Fever (DHF) at Hasanuddin University Hospital, Makassar, from January to March 2018. The best regression model is obtained by selecting the optimal knots, namely those that give the minimum Generalized Cross Validation (GCV) and Mean Square Error (MSE). The parametric components in this study are hemoglobin (g/dL) and age (years), with body temperature ( ) and platelets ( ) as the nonparametric components. The minimum GCV value of 221.67745153 is reached at the knots 14.552, 14.987, and 15.096, with an MSE of 199.1032 and a coefficient of determination of 75.3%, obtained from a linear spline semiparametric regression model with three knots.
Keywords: semiparametric regression, spline, knot, Generalized Cross Validation, Dengue Hemorrhagic Fever.
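
To make the knot-selection criterion concrete, here is a minimal sketch of a linear truncated spline fitted by ordinary least squares, with GCV computed as MSE / (1 - trace(A)/n)^2, where A is the hat matrix. It ignores the longitudinal structure of the study data, and the function names and the synthetic example are assumptions, not the authors' implementation.

```python
import numpy as np

def truncated_linear_basis(t, knots):
    """Columns t, (t - k1)_+, (t - k2)_+, ... for a linear spline."""
    t = np.asarray(t, dtype=float)
    cols = [t] + [np.maximum(t - k, 0.0) for k in knots]
    return np.column_stack(cols)

def gcv_for_knots(X_param, t, y, knots):
    """Return (GCV, MSE) for a semiparametric fit with the given knots.

    X_param holds the parametric covariates (one column each);
    t is the covariate modeled nonparametrically by the spline.
    """
    n = len(y)
    design = np.column_stack(
        [np.ones(n), X_param, truncated_linear_basis(t, knots)]
    )
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    mse = float(resid @ resid) / n
    # For least squares, trace(hat matrix) equals the rank of the design.
    trace_hat = np.linalg.matrix_rank(design)
    return mse / (1.0 - trace_hat / n) ** 2, mse

# Synthetic data with a slope change at t = 0.5 (illustrative only).
t = np.linspace(0.0, 1.0, 40)
x = np.sin(7 * t)
y = 1.0 + 2.0 * x + 3.0 * t - 4.0 * np.maximum(t - 0.5, 0.0)

candidates = [[], [0.25], [0.5], [0.75]]
best = min(candidates, key=lambda k: gcv_for_knots(x[:, None], t, y, k)[0])
```

Candidate knot sets are compared by their GCV value, and the set with the smallest value is kept.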


2013 ◽  
Vol 807-809 ◽  
pp. 1967-1971
Author(s):  
Yan Bai ◽  
Xiao Yan Duan ◽  
Hai Yan Gong ◽  
Cai Xia Xie ◽  
Zhi Hong Chen ◽  
...  

In this paper, the contents of forsythoside A and ethanol extract were rapidly determined by near-infrared reflectance spectroscopy (NIRS). A total of 85 samples of Forsythiae Fructus, harvested in Luoyang from July to September 2012, were divided into a calibration set (75 samples) and a validation set (10 samples). Quantitative calibration models for forsythoside A and ethanol extract were established using partial least squares (PLS). For forsythoside A and ethanol extract, respectively, the correlation coefficient of cross-validation (R2) was 0.98247 and 0.97214, the root-mean-square error of calibration (RMSEC) was 0.184 and 0.570, and the root-mean-square error of cross-validation (RMSECV) was 0.81736 and 0.36656. The validation set was used to evaluate the performance of the models; the root-mean-square error of prediction (RMSEP) was 0.221 and 0.518. The results indicate that it is feasible to determine the content of forsythoside A and ethanol extract in Forsythiae Fructus by near-infrared spectroscopy.
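
The three error measures quoted above can be sketched as follows. Ordinary least squares stands in for PLS to keep the example dependency-free, leave-one-out is assumed as the cross-validation scheme (the abstract does not state which scheme was used), and all function names and data are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error with divisor n."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))

def fit_linear(X, y):
    """Least-squares coefficients with an intercept (a stand-in for PLS)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([np.ones(X.shape[0]), X]) @ coef

def calibration_errors(X_cal, y_cal, X_val, y_val):
    """Return (RMSEC, RMSECV, RMSEP)."""
    coef = fit_linear(X_cal, y_cal)
    rmsec = rmse(y_cal, predict(coef, X_cal))
    # Leave-one-out: refit without sample i, then predict sample i.
    n = len(y_cal)
    loo_pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        c = fit_linear(X_cal[keep], y_cal[keep])
        loo_pred[i] = predict(c, X_cal[i:i + 1])[0]
    rmsecv = rmse(y_cal, loo_pred)
    rmsep = rmse(y_val, predict(coef, X_val))
    return rmsec, rmsecv, rmsep

# Synthetic data split 75 / 10, mirroring the sample counts in the abstract.
rng = np.random.default_rng(1)
X = rng.normal(size=(85, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=85)
rmsec, rmsecv, rmsep = calibration_errors(X[:75], y[:75], X[75:], y[75:])
```

RMSEC is computed on the samples used for fitting, RMSECV on held-out predictions within the calibration set, and RMSEP on the separate validation set.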


2020 ◽  
Vol 2 (1) ◽  
pp. 9-26
Author(s):  
Syed Abdul Rehman ◽  
Mohammad Asif

In this paper we propose a class of estimators for estimating the finite population mean using auxiliary information under a simple random sampling (SRS) scheme. Expressions for the bias and mean square error (MSE) of the existing and suggested classes of estimators are derived up to the first degree of approximation, and the efficiency of the suggested class is compared with that of other existing estimators, both theoretically and numerically, using real population data sets.
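
The abstract does not spell out the proposed class, so the sketch below uses the classical ratio estimator as a familiar example of an estimator that exploits a known auxiliary mean under SRS, together with its textbook first-order MSE approximation. The function names and the sample values are illustrative assumptions.

```python
from statistics import mean

def ratio_estimate(sample_y, sample_x, population_mean_x):
    """Classical ratio estimator of the population mean of y.

    Scales the sample mean of y by (known population mean of x) /
    (sample mean of x), where x is the auxiliary variable.
    """
    return mean(sample_y) * population_mean_x / mean(sample_x)

def first_order_mse_ratio(n, N, mean_y, cv_y, cv_x, rho):
    """First-order MSE of the ratio estimator under SRS without replacement.

    cv_y and cv_x are coefficients of variation, rho is the correlation
    between y and x, and (1 - n/N) is the finite population correction.
    """
    fpc = (1.0 - n / N) / n
    return fpc * mean_y**2 * (cv_y**2 + cv_x**2 - 2.0 * rho * cv_x * cv_y)

def first_order_mse_sample_mean(n, N, mean_y, cv_y):
    """Variance of the plain sample mean, for efficiency comparison."""
    return (1.0 - n / N) / n * mean_y**2 * cv_y**2

# Hypothetical sample in which y is roughly proportional to x.
sample_x = [10.0, 12.0, 9.0, 15.0, 11.0]
sample_y = [21.0, 23.5, 18.2, 30.1, 22.4]
estimate = ratio_estimate(sample_y, sample_x, population_mean_x=12.0)
```

Under this approximation the ratio estimator beats the sample mean when rho exceeds cv_x / (2 cv_y), which is the kind of efficiency condition such comparisons typically produce.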


2019 ◽  
Vol 27 (3) ◽  
pp. 220-231
Author(s):  
Emmanuel Amomba Seweh ◽  
Zou Xiaobo ◽  
Feng Tao ◽  
Shi Jiachen ◽  
Haroon Elrasheid Tahir ◽  
...  

This comparative study combined three chemometric algorithms with NIR spectroscopy to determine the best-performing algorithm for quantitative prediction of the iodine value, saponification value, free fatty acid content, and peroxide value of unrefined shea butter. Multivariate calibrations were developed for each parameter using supervised partial least squares, interval partial least squares, and genetic-algorithm partial least squares regression to establish a linear relationship between the standard reference values and the Fourier transform near-infrared predicted values. Genetic-algorithm partial least squares models were superior in predicting iodine value and saponification value, while partial least squares was excellent in predicting free fatty acid content and peroxide value. For iodine value, the nine-factor genetic-algorithm partial least squares calibration model yielded excellent R2 cal (0.97) and R2 val (0.97), a low root mean square error of cross-validation (0.26), a low root mean square error of prediction (0.23), and a ratio of performance to deviation of 6.41. For saponification value, the nine-factor genetic-algorithm partial least squares calibration model had excellent R2 cal (0.97) and R2 val (0.99), a low root mean square error of cross-validation (0.73), a low root mean square error of prediction (0.53), and a ratio of performance to deviation of 8.27. For free fatty acids, the 11-factor partial least squares model produced very high R2 cal (0.97) and R2 val (0.97), with a very low root mean square error of cross-validation (0.03), a low root mean square error of prediction (0.04), and a ratio of performance to deviation of 5.30. Finally, for peroxide value, the 11-factor partial least squares calibration model obtained excellent R2 cal (0.96) and R2 val (0.98), with a low root mean square error of cross-validation (0.05), a low root mean square error of prediction (0.04), and a ratio of performance to deviation of 5.86. The models were accurate and robust and can be reliably applied in developing a handheld quality detection device for screening, quality control checks, and on-site prediction of shea butter quality.


Geophysics ◽  
1971 ◽  
Vol 36 (5) ◽  
pp. 938-942 ◽  
Author(s):  
W. E. Sims ◽  
F. X. Bostick ◽  
H. W. Smith

Six different estimates of the magnetotelluric impedance tensor elements may be computed from measured data by use of auto‐power and cross‐power density spectra. We show that each of the estimates satisfies a mean‐square error criterion. Two of the six estimates are relatively unstable in the one‐dimensional case when the incident fields are unpolarized. For the remaining four estimates, it is shown that two are unaffected by random noise on the H signal, but are biased upward by random noise on the E signal. The remaining two estimates are unaffected by random noise on the E signal, but are biased downward by random noise on the H signal. Computation of all of the estimates provides a measure of the total amount of noise present, as indicated by a stability coefficient for the estimates. In the absence of additional information as to the relative signal‐to‐noise ratios of the E and H signals, we suggest that a mean estimate be used. A numerical example is included.
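
The opposite noise biases described above can be illustrated with a scalar analogue, E = Z H, rather than the full impedance tensor treated in the paper. The sketch below forms two spectral-ratio estimates from auto-power and cross-power averages. The noise levels, the true value of Z, and the simple average used as a "mean estimate" are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
z_true = 2.0 + 1.0j

def complex_noise(scale):
    """Circular complex Gaussian samples with mean power scale**2."""
    return scale * (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)

h_clean = complex_noise(1.0)          # noise-free magnetic field spectrum
e_clean = z_true * h_clean            # noise-free electric field spectrum
h = h_clean + complex_noise(0.5)      # measured H with random noise
e = e_clean + complex_noise(0.5)      # measured E with random noise

def power(a, b):
    """Averaged cross-power <a b*>; the auto-power when a is b."""
    return np.mean(a * np.conj(b))

# Uses the auto-power of H: unaffected by E noise, biased downward by H noise.
z_down = power(e, h) / power(h, h)
# Uses the auto-power of E: unaffected by H noise, biased upward by E noise.
z_up = power(e, e) / power(h, e)
# A simple average of the two, standing in for a "mean estimate".
z_mean = 0.5 * (z_down + z_up)
```

Because uncorrelated noise averages out of cross-powers but adds to auto-powers, the estimate with the H auto-power in its denominator shrinks and the one with the E auto-power in its numerator grows, so the pair brackets the true magnitude.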


2010 ◽  
Vol 16 (2) ◽  
pp. 187-193 ◽  
Author(s):  
Yang Meiyan ◽  
Li Jing ◽  
Nie Shaoping ◽  
Hu Jielun ◽  
Yu Qiang ◽  
...  

Near-infrared spectroscopy (NIRS) was used as a rapid and nondestructive method to determine the content of docosahexaenoic acid (DHA) in powdered oil samples. A total of 82 samples were scanned in diffuse reflectance mode with a Nicolet 5700 FTIR spectrometer, and the reference values for DHA were measured by gas chromatography. Calibration equations were developed using partial least-squares (PLS) regression with internal cross-validation. Samples were split into two sets: a calibration set (n = 66) and a validation set (n = 16). Two mathematical treatments (first and second derivative), two scatter-correction options (none, i.e. log(1/R), and standard normal variate), and Savitzky-Golay smoothing were explored. To decide on the number of PLS factors, the model with the lowest root mean square error of cross-validation (RMSECV = 0.44) for the validation set was chosen. The correlation coefficient (r) between the predicted and reference results, which was used as an evaluation parameter for the models, is 0.968. The root mean square error of prediction of the final model is 0.59. The results reported in this article demonstrate that FT-NIR measurements can serve as a rapid method to determine DHA in powdered oil.


2016 ◽  
Vol 06 (02) ◽  
pp. 254-273 ◽  
Author(s):  
Mohamed M. Shoukri ◽  
Tusneem Al-Hassan ◽  
Michael DeNiro ◽  
Abdelmoneim El Dali ◽  
Futwan Al-Mohanna

Author(s):  
OCTAVIANUS BUDI SANTOSA ◽  
MICHAEL RAHARJA GANI ◽  
SRI HARTATI YULIANI

Objective: The objective of this study was to develop a UV spectroscopy method in combination with multivariate analysis for determining vitexin in binahong (Anredera cordifolia (Ten.) Steenis) leaf extract. Methods: Partial least squares (PLS) regression and principal component regression (PCR) were performed to evaluate several statistical performance measures: coefficient of determination (R2), root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV), root mean square error of prediction (RMSEP), and relative error of prediction (REP). Cross-validation was performed using the leave-one-out technique. Results: The R2 values of the calibration data sets obtained from the PLS and PCR methods were 0.9675 and 0.9648, respectively. The low values of RMSEC and RMSECV for both the PLS and PCR methods indicated minimal error in the calibration models. The R2 values of the validation data sets obtained from the PLS and PCR methods were 0.9778 and 0.9820, respectively. The low values of RMSEP for both the PLS and PCR methods indicated minimal prediction error from the calibration data sets. Multivariate calibration techniques were applied to determine the content of vitexin in binahong leaf extract. Predicted values from the multivariate calibration models were compared with the actual values determined by a validated HPLC method. The PLS models gave lower REP values than the PCR models. Conclusion: The chemometrics technique can be applied as an alternative method for determining vitexin levels in the ethanol solution of binahong leaf extract.

