AUTOMATIC KERNEL REGRESSION MODELLING USING COMBINED LEAVE-ONE-OUT TEST SCORE AND REGULARISED ORTHOGONAL LEAST SQUARES

2004 ◽  
Vol 14 (01) ◽  
pp. 27-37 ◽  
Author(s):  
X. HONG ◽  
S. CHEN ◽  
P. M. SHARKEY

This paper introduces an automatic robust nonlinear identification algorithm using the leave-one-out test score, also known as the PRESS (Predicted REsidual Sums of Squares) statistic, and regularised orthogonal least squares. The proposed algorithm aims to maximise model robustness via two effective and complementary approaches: parameter regularisation via ridge regression and selection of an optimally generalising model structure. The major contributions are the derivation of the PRESS error in a regularised orthogonal weight model, an efficient recursive formula for computing PRESS errors within the regularised orthogonal least squares forward-regression framework, and hence the construction of models with good generalisation properties. Owing to the properties of the PRESS statistic, the proposed algorithm achieves a fully automated model construction procedure without resort to any separate validation data set for model evaluation.
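
A minimal sketch of the closed-form leave-one-out (PRESS) computation for ridge regression, assuming synthetic data. The abstract's recursive orthogonal-forward-regression formula is not reproduced here; this illustrates only the underlying identity PRESS = Σᵢ ((yᵢ - ŷᵢ)/(1 - hᵢᵢ))², where hᵢᵢ are the diagonal entries of the ridge "hat" matrix. All names are illustrative.

```python
import numpy as np

def ridge_press(X, y, lam):
    """PRESS statistic for ridge regression without refitting n times."""
    n, p = X.shape
    # Hat matrix H = X (X'X + lam*I)^(-1) X'
    G = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    H = X @ G
    residuals = y - H @ y
    h = np.diag(H)
    loo_residuals = residuals / (1.0 - h)  # exact LOO errors for a linear smoother
    return np.sum(loo_residuals ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)
print(ridge_press(X, y, lam=1e-2))
```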

1994 ◽  
Vol 2 (4) ◽  
pp. 185-198 ◽  
Author(s):  
Joseph G. Montalvo ◽  
Steven E. Buco ◽  
Harmon H. Ramey

In Part I of this series, both cotton fibre property and reflectance spectra data on 185 US cottons, including four Pimas, were analysed by descriptive statistics. In this paper, principal components regression (PCR) models for measuring six properties from the cotton's vis/NIR reflectance spectra are critically examined. These properties are upper-half mean length (UHM), uniformity index (UI), bundle strength (STR), micronaire (MIC) and colour (Rd and +b). The spectra were recorded with a scanning spectrophotometer over the wavelength range from 400 to 2498 nm. A variety of spectral processing options, some of which improved the PCR results, were applied prior to the regressions, allowing over 100 PCR models to be tested. All PCR model results are based on the PRESS statistic computed by leave-one-out rotation, on a fast approximation of the PRESS statistic (to reduce computing time), or on cluster analysis using separate calibration and validation data sets. The standard error of prediction (SEP) of all the properties except UHM compared well with the reference method precision. The precision of the UHM measurement by reflectance spectroscopy was strongly influenced by sample repack error. The SEP of UHM, UI and STR was improved by excluding the Pimas from the data set.
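
A minimal sketch of principal components regression (PCR) scored by the PRESS statistic under leave-one-out rotation, in the spirit described above. The spectral preprocessing options and the fast PRESS approximation are omitted; the component counts and synthetic arrays standing in for vis/NIR spectra are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline

def pcr_press(X_spectra, y_property, n_components):
    """Leave-one-out PRESS for a PCR model with a given number of components."""
    model = make_pipeline(PCA(n_components=n_components), LinearRegression())
    y_loo = cross_val_predict(model, X_spectra, y_property, cv=LeaveOneOut())
    return np.sum((y_property - y_loo) ** 2)

# Illustrative data standing in for reflectance spectra and a fibre property.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 200))          # 60 samples x 200 wavelengths
y = X[:, :3] @ np.array([1.0, -0.5, 0.2]) + 0.05 * rng.standard_normal(60)
for k in (2, 4, 8):
    print(k, pcr_press(X, y, k))
```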


2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Katarzyna Sujka ◽  
Piotr Koczoń ◽  
Alicja Ceglińska ◽  
Magdalena Reder ◽  
Hanna Ciemniewska-Żytkiewicz

Samples of wheat, spelt, rye, and triticale flours produced by different Polish mills were studied by both classic chemical methods and FT-IR MIR spectroscopy. An attempt was made to statistically correlate FT-IR spectral data with reference data on the content of various components, for example proteins, fats, ash, and fatty acids, as well as properties such as moisture, falling number, and energetic value. This correlation yielded calibrated and validated statistical models for versatile evaluation of unknown flour samples. The calibration data set was used to construct calibration models using the CSR and PLS methods with leave-one-out cross-validation. The calibrated models were then validated against a separate validation data set. The results confirmed that statistical models based on MIR spectral data provide a robust, accurate, precise, rapid, inexpensive, and convenient methodology for determining flour characteristics, as well as for detecting the content of selected flour ingredients. The model characteristics were as follows: R² = 0.97, PRESS = 2.14; R² = 0.96, PRESS = 0.69; R² = 0.95, PRESS = 1.27; R² = 0.94, PRESS = 0.76, for content of proteins, lipids, ash, and moisture level, respectively. The best results of the CSR models were obtained for protein, ash, and crude fat (R² = 0.86, 0.82, and 0.78, respectively).
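
A minimal sketch of a PLS calibration scored by leave-one-out cross-validation, reporting PRESS and R² as in the abstract. Real FT-IR spectra and reference protein contents would replace the synthetic arrays; the component count is an assumption for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 300))                 # 50 flours x 300 wavenumbers
y = X[:, :5] @ rng.standard_normal(5) + 0.1 * rng.standard_normal(50)

pls = PLSRegression(n_components=6)
y_loo = cross_val_predict(pls, X, y, cv=LeaveOneOut()).ravel()

press = np.sum((y - y_loo) ** 2)
print(f"PRESS = {press:.2f}, R2 = {r2_score(y, y_loo):.3f}")
```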


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution has been introduced as a lifetime model with good statistical properties. In this paper, estimation of its probability density function and cumulative distribution function is considered using five different estimation methods: the uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS) and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulation based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others, and that when the sample size is large enough the ML and UMVU estimators are almost equivalent and more efficient than the LS, WLS and PC estimators. Finally, the results are illustrated with a real data set.
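
A minimal sketch of comparing density estimates by simulated MSE, assuming the common parameterisation F(x) = 1 - (1 - exp(-λ/x))^α of the generalized inverted exponential distribution. Only the ML estimator is sketched; the UMVU, LS, WLS and PC estimators from the paper are not reproduced, and all names and values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def gied_pdf(x, alpha, lam):
    e = np.exp(-lam / x)
    return alpha * lam / x**2 * e * (1.0 - e) ** (alpha - 1.0)

def gied_sample(n, alpha, lam, rng):
    # Inverse-transform sampling from F(x) = 1 - (1 - exp(-lam/x))**alpha
    u = rng.uniform(size=n)
    return -lam / np.log1p(-u ** (1.0 / alpha))

def gied_mle(x):
    # Log-parameterised negative log-likelihood keeps alpha, lam positive.
    nll = lambda t: -np.sum(np.log(gied_pdf(x, np.exp(t[0]), np.exp(t[1]))))
    res = minimize(nll, x0=[0.0, 0.0], method="Nelder-Mead")
    return np.exp(res.x)

rng = np.random.default_rng(3)
alpha0, lam0, x0 = 2.0, 1.5, 1.0
errs = []
for _ in range(200):                       # Monte Carlo replications
    a_hat, l_hat = gied_mle(gied_sample(100, alpha0, lam0, rng))
    errs.append((gied_pdf(x0, a_hat, l_hat) - gied_pdf(x0, alpha0, lam0)) ** 2)
print("simulated MSE of f(1):", np.mean(errs))
```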


BMJ Open ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. e040778
Author(s):  
Vineet Kumar Kamal ◽  
Ravindra Mohan Pandey ◽  
Deepak Agrawal

Objective: To develop and validate a simple risk score chart to estimate the probability of poor outcomes in patients with severe head injury (HI).
Design: Retrospective.
Setting: Level-1, government-funded trauma centre, India.
Participants: Patients with severe HI admitted to the neurosurgery intensive care unit from 19 May 2010 to 31 December 2011 (n=946) for model development and, for external validation of the model, data from the same centre with the same inclusion criteria from 1 January 2012 to 31 July 2012 (n=284).
Outcome(s): In-hospital mortality and unfavourable outcome at 6 months.
Results: In the development data set, 39.5% had in-hospital mortality and 70.7% had an unfavourable outcome. Multivariable logistic regression analysis of routinely collected admission characteristics revealed the independent predictors (those whose 95% confidence interval (CI) of the odds ratio (OR) did not contain one): for in-hospital mortality, age (51-60, >60 years), motor score (1, 2, 4), pupillary reactivity (none), presence of hypotension, effaced basal cisterns and traumatic subarachnoid haemorrhage/intraventricular haematoma; for unfavourable outcome, age (41-50, 51-60, >60 years), motor score (1-4), pupillary reactivity (none, one), unequal limb movement and presence of hypotension. The discriminative ability (area under the receiver operating characteristic curve (95% CI)) of the score chart for in-hospital mortality and 6-month outcome was excellent in the development data set (0.890 (0.867 to 0.912) and 0.894 (0.869 to 0.918), respectively), the internal validation data set using the bootstrap resampling method (0.889 (0.867 to 0.909) and 0.893 (0.867 to 0.915), respectively) and the external validation data set (0.871 (0.825 to 0.916) and 0.887 (0.842 to 0.932), respectively). Calibration showed good agreement between observed outcome rates and predicted risks in the development and external validation data sets (p>0.05).
Conclusion: For clinical decision making, these score charts can be used to predict outcomes in new patients with severe HI in India and similar settings.
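
A minimal sketch of the modelling pattern described above: a multivariable logistic regression on admission characteristics, with discrimination summarised by the area under the ROC curve. The actual predictors, score chart and bootstrap internal validation are in the paper; the feature matrix and coefficients here are synthetic stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n = 946                                    # development-set size from the abstract
X = rng.standard_normal((n, 6))            # stand-in for age band, motor score, etc.
logits = X @ np.array([0.8, 0.6, 0.5, 0.4, 0.3, 0.2]) - 0.4
y = rng.uniform(size=n) < 1 / (1 + np.exp(-logits))   # in-hospital mortality (0/1)

model = LogisticRegression().fit(X, y)
auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
print(f"apparent AUC = {auc:.3f}")  # the paper corrects optimism via bootstrap
```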


Author(s):  
Jinming Wen ◽  
Jie Li ◽  
Huanmin Ge ◽  
Zhengchun Zhou ◽  
Weiqi Luo

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Zhixiang Yu ◽  
Haiyan He ◽  
Yanan Chen ◽  
Qiuhe Ji ◽  
Min Sun

Ovarian cancer (OV) is a common type of carcinoma in females. Many studies have reported that ferroptosis is associated with the prognosis of OV patients, but the underlying mechanism is not well understood. We utilized Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) data to identify ferroptosis-related genes in OV. In the present study, we applied Cox regression analysis to select hub genes and used the least absolute shrinkage and selection operator (LASSO) to construct a prognosis prediction model from mRNA expression profiles and clinical data in TCGA. A series of analyses of this signature was performed in TCGA, and the identified signature was then verified using International Cancer Genome Consortium (ICGC) data. We identified six hub genes (DNAJB6, RB1, VIMP/SELENOS, STEAP3, BACH1, and ALOX12) that were used to construct a model on a training data set. The model was then tested on a validation data set and was found to have high sensitivity and specificity. The identified ferroptosis-related hub genes might play a critical role in the mechanism of OV development, and the gene signature we identified may be useful for future clinical applications.
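
A minimal sketch of the two-step pattern described above: Cox regression with an L1 (LASSO-type) penalty to shrink a candidate gene panel to a sparse prognostic signature, then a per-patient risk score from the retained coefficients. It uses the lifelines library; the data frame is a synthetic stand-in for the TCGA expression matrix, and the penalty strength is an assumption.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(5)
genes = ["DNAJB6", "RB1", "SELENOS", "STEAP3", "BACH1", "ALOX12"]
df = pd.DataFrame(rng.standard_normal((200, 6)), columns=genes)
df["time"] = rng.exponential(60, size=200)         # months to event/censoring
df["event"] = rng.uniform(size=200) < 0.6          # 1 = death observed

cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)     # pure L1 = LASSO-type shrinkage
cph.fit(df, duration_col="time", event_col="event")
risk_score = df[genes] @ cph.params_[genes]        # linear predictor per patient
print(cph.params_.round(3))
```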


2014 ◽  
Vol 44 (7) ◽  
pp. 784-795 ◽  
Author(s):  
Susan J. Prichard ◽  
Eva C. Karau ◽  
Roger D. Ottmar ◽  
Maureen C. Kennedy ◽  
James B. Cronan ◽  
...  

Reliable predictions of fuel consumption are critical in the eastern United States (US), where prescribed burning is frequently applied to forests and air quality is of increasing concern. CONSUME and the First Order Fire Effects Model (FOFEM), predictive models developed to estimate fuel consumption and emissions from wildland fires, have not been systematically evaluated for application in the eastern US using the same validation data set. In this study, we compiled a fuel consumption data set from 54 operational prescribed fires (43 pine and 11 mixed hardwood sites) to assess each model’s uncertainties and application limits. Regions of indifference between measured and predicted values by fuel category and forest type represent the potential error that modelers could incur in estimating fuel consumption by category. Overall, FOFEM predictions have narrower regions of indifference than CONSUME and suggest better correspondence between measured and predicted consumption. However, both models offer reliable predictions of live fuel (shrubs and herbaceous vegetation) and 1 h fine fuels. Results suggest that CONSUME and FOFEM can be improved in their predictive capability for woody fuel, litter, and duff consumption for eastern US forests. Because of their high biomass and potential smoke management problems, refining estimates of litter and duff consumption is of particular importance.
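
A minimal sketch of a measured-versus-predicted comparison of the kind described above. The paper's "regions of indifference" are an equivalence-testing construct; here they are approximated, purely for illustration, by a 95% interval on the prediction errors for one fuel category. The arrays are synthetic stand-ins for field measurements and CONSUME/FOFEM output, and the units are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
measured = rng.gamma(shape=4.0, scale=2.0, size=54)     # Mg/ha consumed, 54 burns
predicted = measured * 0.9 + rng.normal(0.0, 1.0, 54)   # model output stand-in

errors = predicted - measured
lo, hi = np.percentile(errors, [2.5, 97.5])
print(f"mean bias = {errors.mean():.2f} Mg/ha")
print(f"95% error interval = [{lo:.2f}, {hi:.2f}] Mg/ha")
```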


Geophysics ◽  
2006 ◽  
Vol 71 (5) ◽  
pp. U67-U76 ◽  
Author(s):  
Robert J. Ferguson

The possibility of improving regularization/datuming of seismic data is investigated by treating wavefield extrapolation as an inversion problem. Weighted, damped least squares is then used to produce the regularized/datumed wavefield. Regularization/datuming is extremely costly because it requires computing the Hessian, so an efficient approximation is introduced in which only a limited number of diagonals of the operators involved are computed. Real and synthetic data examples demonstrate the utility of this approach. For synthetic data, regularization/datuming is demonstrated for large extrapolation distances using a highly irregular recording array. Without approximation, regularization/datuming returns a regularized wavefield with reduced operator artifacts compared to a nonregularizing method such as generalized phase shift plus interpolation (PSPI). Approximate regularization/datuming returns a regularized wavefield at approximately two orders of magnitude lower cost, but it is dip limited, though in a controllable way, compared to the full method. The Foothills structural data set, a freely available data set from the Rocky Mountains of Canada, demonstrates the application to real data. The data are highly irregularly sampled along the shot coordinate and suffer from significant near-surface effects. Approximate regularization/datuming returns common-receiver data that are superior in appearance to those from conventional datuming.
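
A minimal sketch of the damped least-squares solve that underlies the regularization/datuming described above, m = (AᴴA + εI)⁻¹Aᴴd, with the Hessian AᴴA replaced by a few of its central diagonals to cut cost. The extrapolation operator A here is a random stand-in, not a real wavefield extrapolator, and the bandwidth and damping values are assumptions.

```python
import numpy as np

def banded(M, k):
    """Keep only the main diagonal and k off-diagonals of M on each side."""
    n = M.shape[0]
    mask = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= k
    return np.where(mask, M, 0.0)

rng = np.random.default_rng(7)
n = 200
A = rng.standard_normal((n, n)) / np.sqrt(n)   # stand-in extrapolation operator
d = rng.standard_normal(n)                     # recorded (irregular) wavefield
eps = 1e-2                                     # damping weight

H_full = A.T @ A + eps * np.eye(n)             # full Hessian
H_approx = banded(A.T @ A, k=5) + eps * np.eye(n)  # limited-diagonal Hessian

m_full = np.linalg.solve(H_full, A.T @ d)
m_approx = np.linalg.solve(H_approx, A.T @ d)
print("relative difference:",
      np.linalg.norm(m_full - m_approx) / np.linalg.norm(m_full))
```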

