AUTOMATIC KERNEL REGRESSION MODELLING USING COMBINED LEAVE-ONE-OUT TEST SCORE AND REGULARISED ORTHOGONAL LEAST SQUARES

2004 ◽  
Vol 14 (01) ◽  
pp. 27-37 ◽  
Author(s):  
X. HONG ◽  
S. CHEN ◽  
P. M. SHARKEY

This paper introduces an automatic robust nonlinear identification algorithm using the leave-one-out test score, also known as the PRESS (Predicted REsidual Sums of Squares) statistic, and regularised orthogonal least squares. The proposed algorithm aims to maximise model robustness via two effective and complementary approaches: parameter regularisation via ridge regression and selection of an optimally generalising model structure. The major contributions are the derivation of the PRESS error in a regularised orthogonal weight model, an efficient recursive formula for computing PRESS errors within the regularised orthogonal least squares forward-regression framework, and hence the construction of models with good generalisation properties. Owing to the properties of the PRESS statistic, the proposed algorithm achieves a fully automated model construction procedure without resort to any separate validation data set for model evaluation.
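
A minimal sketch of the closed-form leave-one-out (PRESS) computation for ridge regression, assuming synthetic data. The abstract's recursive orthogonal-forward-regression formula is not reproduced here; this illustrates only the underlying identity PRESS = Σᵢ ((yᵢ - ŷᵢ)/(1 - hᵢᵢ))², where hᵢᵢ are the diagonal entries of the ridge "hat" matrix. All names are illustrative.

```python
import numpy as np

def ridge_press(X, y, lam):
    """PRESS statistic for ridge regression without refitting n times."""
    n, p = X.shape
    # Hat matrix H = X (X'X + lam*I)^(-1) X'
    G = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    H = X @ G
    residuals = y - H @ y
    h = np.diag(H)
    loo_residuals = residuals / (1.0 - h)  # exact LOO errors for a linear smoother
    return np.sum(loo_residuals ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)
print(ridge_press(X, y, lam=1e-2))
```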

1994 ◽  
Vol 2 (4) ◽  
pp. 185-198 ◽  
Author(s):  
Joseph G. Montalvo ◽  
Steven E. Buco ◽  
Harmon H. Ramey

In Part I of this series, both cotton fibre property and reflectance spectra data on 185 US cottons, including four Pimas, were analysed by descriptive statistics. In this paper, principal components regression (PCR) models for measuring six properties from the cotton's vis/NIR reflectance spectra are critically examined. These properties are upper-half mean length (UHM), uniformity index (UI), bundle strength (STR), micronaire (MIC) and colour (Rd and +b). The spectra were recorded with a scanning spectrophotometer over the wavelength range from 400 to 2498 nm. A variety of spectral processing options, some of which improved the PCR results, were applied prior to the regressions, allowing over 100 PCR models to be tested. All PCR model results are based on the PRESS statistic computed by leave-one-out rotation, on a fast approximation of the PRESS statistic (to reduce computing time), or on cluster analysis using separate calibration and validation data sets. The standard error of prediction (SEP) of all the properties except UHM compared well with the reference method precision. The precision of the UHM measurement by reflectance spectroscopy was strongly influenced by sample repack error. The SEP of UHM, UI and STR was improved by excluding the Pimas from the data set.
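
A minimal sketch of principal components regression (PCR) scored by the PRESS statistic under leave-one-out rotation, in the spirit described above. The spectral preprocessing options and the fast PRESS approximation are omitted; the component counts and synthetic arrays standing in for vis/NIR spectra are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline

def pcr_press(X_spectra, y_property, n_components):
    """Leave-one-out PRESS for a PCR model with a given number of components."""
    model = make_pipeline(PCA(n_components=n_components), LinearRegression())
    y_loo = cross_val_predict(model, X_spectra, y_property, cv=LeaveOneOut())
    return np.sum((y_property - y_loo) ** 2)

# Illustrative data standing in for reflectance spectra and a fibre property.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 200))          # 60 samples x 200 wavelengths
y = X[:, :3] @ np.array([1.0, -0.5, 0.2]) + 0.05 * rng.standard_normal(60)
for k in (2, 4, 8):
    print(k, pcr_press(X, y, k))
```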


2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Katarzyna Sujka ◽  
Piotr Koczoń ◽  
Alicja Ceglińska ◽  
Magdalena Reder ◽  
Hanna Ciemniewska-Żytkiewicz

Samples of wheat, spelt, rye, and triticale flours produced by different Polish mills were studied by both classic chemical methods and FT-IR MIR spectroscopy. An attempt was made to statistically correlate FT-IR spectral data with reference data on the content of various components, for example proteins, fats, ash, and fatty acids, as well as properties such as moisture, falling number, and energetic value. This correlation yielded calibrated and validated statistical models for versatile evaluation of unknown flour samples. The calibration data set was used to construct calibration models using the CSR and PLS methods with leave-one-out cross-validation. The calibrated models were then validated against a separate validation data set. The results confirmed that statistical models based on MIR spectral data provide a robust, accurate, precise, rapid, inexpensive, and convenient methodology for determining flour characteristics, as well as for detecting the content of selected flour ingredients. The model characteristics were as follows: R² = 0.97, PRESS = 2.14; R² = 0.96, PRESS = 0.69; R² = 0.95, PRESS = 1.27; R² = 0.94, PRESS = 0.76, for content of proteins, lipids, ash, and moisture level, respectively. The best results of the CSR models were obtained for protein, ash, and crude fat (R² = 0.86, 0.82, and 0.78, respectively).
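
A minimal sketch of a PLS calibration scored by leave-one-out cross-validation, reporting PRESS and R² as in the abstract. Real FT-IR spectra and reference protein contents would replace the synthetic arrays; the component count is an assumption for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 300))                 # 50 flours x 300 wavenumbers
y = X[:, :5] @ rng.standard_normal(5) + 0.1 * rng.standard_normal(50)

pls = PLSRegression(n_components=6)
y_loo = cross_val_predict(pls, X, y, cv=LeaveOneOut()).ravel()

press = np.sum((y - y_loo) ** 2)
print(f"PRESS = {press:.2f}, R2 = {r2_score(y, y_loo):.3f}")
```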


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution has been introduced as a lifetime model with good statistical properties. In this paper, estimation of its probability density function and cumulative distribution function is considered using five different estimation methods: the uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS) and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulation based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others, and that when the sample size is large enough the ML and UMVU estimators are almost equivalent and more efficient than the LS, WLS and PC estimators. Finally, the results are illustrated with a real data set.
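
A minimal sketch of comparing density estimates by simulated MSE, assuming the common parameterisation F(x) = 1 - (1 - exp(-λ/x))^α of the generalized inverted exponential distribution. Only the ML estimator is sketched; the UMVU, LS, WLS and PC estimators from the paper are not reproduced, and all names and values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def gied_pdf(x, alpha, lam):
    e = np.exp(-lam / x)
    return alpha * lam / x**2 * e * (1.0 - e) ** (alpha - 1.0)

def gied_sample(n, alpha, lam, rng):
    # Inverse-transform sampling from F(x) = 1 - (1 - exp(-lam/x))**alpha
    u = rng.uniform(size=n)
    return -lam / np.log1p(-u ** (1.0 / alpha))

def gied_mle(x):
    # Log-parameterised negative log-likelihood keeps alpha, lam positive.
    nll = lambda t: -np.sum(np.log(gied_pdf(x, np.exp(t[0]), np.exp(t[1]))))
    res = minimize(nll, x0=[0.0, 0.0], method="Nelder-Mead")
    return np.exp(res.x)

rng = np.random.default_rng(3)
alpha0, lam0, x0 = 2.0, 1.5, 1.0
errs = []
for _ in range(200):                       # Monte Carlo replications
    a_hat, l_hat = gied_mle(gied_sample(100, alpha0, lam0, rng))
    errs.append((gied_pdf(x0, a_hat, l_hat) - gied_pdf(x0, alpha0, lam0)) ** 2)
print("simulated MSE of f(1):", np.mean(errs))
```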


BMJ Open ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. e040778
Author(s):  
Vineet Kumar Kamal ◽  
Ravindra Mohan Pandey ◽  
Deepak Agrawal

Objective: To develop and validate a simple risk score chart to estimate the probability of poor outcomes in patients with severe head injury (HI).
Design: Retrospective.
Setting: Level-1, government-funded trauma centre, India.
Participants: Patients with severe HI admitted to the neurosurgery intensive care unit from 19 May 2010 to 31 December 2011 (n=946) for model development and, for external validation of the model, data from the same centre with the same inclusion criteria from 1 January 2012 to 31 July 2012 (n=284).
Outcome(s): In-hospital mortality and unfavourable outcome at 6 months.
Results: In the development data set, 39.5% had in-hospital mortality and 70.7% had an unfavourable outcome. Multivariable logistic regression analysis of routinely collected admission characteristics revealed the independent predictors (those whose 95% confidence interval (CI) of the odds ratio (OR) did not contain one): for in-hospital mortality, age (51-60, >60 years), motor score (1, 2, 4), pupillary reactivity (none), presence of hypotension, effaced basal cisterns and traumatic subarachnoid haemorrhage/intraventricular haematoma; for unfavourable outcome, age (41-50, 51-60, >60 years), motor score (1-4), pupillary reactivity (none, one), unequal limb movement and presence of hypotension. The discriminative ability (area under the receiver operating characteristic curve (95% CI)) of the score chart for in-hospital mortality and 6-month outcome was excellent in the development data set (0.890 (0.867 to 0.912) and 0.894 (0.869 to 0.918), respectively), the internal validation data set using the bootstrap resampling method (0.889 (0.867 to 0.909) and 0.893 (0.867 to 0.915), respectively) and the external validation data set (0.871 (0.825 to 0.916) and 0.887 (0.842 to 0.932), respectively). Calibration showed good agreement between observed outcome rates and predicted risks in the development and external validation data sets (p>0.05).
Conclusion: For clinical decision making, these score charts can be used to predict outcomes in new patients with severe HI in India and similar settings.
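
A minimal sketch of the modelling pattern described above: a multivariable logistic regression on admission characteristics, with discrimination summarised by the area under the ROC curve. The actual predictors, score chart and bootstrap internal validation are in the paper; the feature matrix and coefficients here are synthetic stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n = 946                                    # development-set size from the abstract
X = rng.standard_normal((n, 6))            # stand-in for age band, motor score, etc.
logits = X @ np.array([0.8, 0.6, 0.5, 0.4, 0.3, 0.2]) - 0.4
y = rng.uniform(size=n) < 1 / (1 + np.exp(-logits))   # in-hospital mortality (0/1)

model = LogisticRegression().fit(X, y)
auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
print(f"apparent AUC = {auc:.3f}")  # the paper corrects optimism via bootstrap
```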


Author(s):  
Jinming Wen ◽  
Jie Li ◽  
Huanmin Ge ◽  
Zhengchun Zhou ◽  
Weiqi Luo

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Zhixiang Yu ◽  
Haiyan He ◽  
Yanan Chen ◽  
Qiuhe Ji ◽  
Min Sun

Ovarian cancer (OV) is a common type of carcinoma in females. Many studies have reported that ferroptosis is associated with the prognosis of OV patients, but the underlying mechanism is not well understood. We utilized Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) data to identify ferroptosis-related genes in OV. In the present study, we applied Cox regression analysis to select hub genes and used the least absolute shrinkage and selection operator (LASSO) to construct a prognosis prediction model from mRNA expression profiles and clinical data in TCGA. A series of analyses of this signature was performed in TCGA, and the identified signature was then verified using International Cancer Genome Consortium (ICGC) data. We identified six hub genes (DNAJB6, RB1, VIMP/SELENOS, STEAP3, BACH1, and ALOX12) that were used to construct a model on a training data set. The model was then tested on a validation data set and was found to have high sensitivity and specificity. The identified ferroptosis-related hub genes might play a critical role in the mechanism of OV development, and the gene signature we identified may be useful for future clinical applications.
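
A minimal sketch of the two-step pattern described above: Cox regression with an L1 (LASSO-type) penalty to shrink a candidate gene panel to a sparse prognostic signature, then a per-patient risk score from the retained coefficients. It uses the lifelines library; the data frame is a synthetic stand-in for the TCGA expression matrix, and the penalty strength is an assumption.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(5)
genes = ["DNAJB6", "RB1", "SELENOS", "STEAP3", "BACH1", "ALOX12"]
df = pd.DataFrame(rng.standard_normal((200, 6)), columns=genes)
df["time"] = rng.exponential(60, size=200)         # months to event/censoring
df["event"] = rng.uniform(size=200) < 0.6          # 1 = death observed

cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)     # pure L1 = LASSO-type shrinkage
cph.fit(df, duration_col="time", event_col="event")
risk_score = df[genes] @ cph.params_[genes]        # linear predictor per patient
print(cph.params_.round(3))
```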


2014 ◽  
Vol 44 (7) ◽  
pp. 784-795 ◽  
Author(s):  
Susan J. Prichard ◽  
Eva C. Karau ◽  
Roger D. Ottmar ◽  
Maureen C. Kennedy ◽  
James B. Cronan ◽  
...  

Reliable predictions of fuel consumption are critical in the eastern United States (US), where prescribed burning is frequently applied to forests and air quality is of increasing concern. CONSUME and the First Order Fire Effects Model (FOFEM), predictive models developed to estimate fuel consumption and emissions from wildland fires, have not been systematically evaluated for application in the eastern US using the same validation data set. In this study, we compiled a fuel consumption data set from 54 operational prescribed fires (43 pine and 11 mixed hardwood sites) to assess each model’s uncertainties and application limits. Regions of indifference between measured and predicted values by fuel category and forest type represent the potential error that modelers could incur in estimating fuel consumption by category. Overall, FOFEM predictions have narrower regions of indifference than CONSUME and suggest better correspondence between measured and predicted consumption. However, both models offer reliable predictions of live fuel (shrubs and herbaceous vegetation) and 1 h fine fuels. Results suggest that CONSUME and FOFEM can be improved in their predictive capability for woody fuel, litter, and duff consumption for eastern US forests. Because of their high biomass and potential smoke management problems, refining estimates of litter and duff consumption is of particular importance.
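
A minimal sketch of a measured-versus-predicted comparison of the kind described above. The paper's "regions of indifference" are an equivalence-testing construct; here they are approximated, purely for illustration, by a 95% interval on the prediction errors for one fuel category. The arrays are synthetic stand-ins for field measurements and CONSUME/FOFEM output, and the units are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
measured = rng.gamma(shape=4.0, scale=2.0, size=54)     # Mg/ha consumed, 54 burns
predicted = measured * 0.9 + rng.normal(0.0, 1.0, 54)   # model output stand-in

errors = predicted - measured
lo, hi = np.percentile(errors, [2.5, 97.5])
print(f"mean bias = {errors.mean():.2f} Mg/ha")
print(f"95% error interval = [{lo:.2f}, {hi:.2f}] Mg/ha")
```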


Geophysics ◽  
2006 ◽  
Vol 71 (5) ◽  
pp. U67-U76 ◽  
Author(s):  
Robert J. Ferguson

The possibility of improving regularization/datuming of seismic data is investigated by treating wavefield extrapolation as an inversion problem. Weighted, damped least squares is then used to produce the regularized/datumed wavefield. Regularization/datuming is extremely costly because it requires computing the Hessian, so an efficient approximation is introduced in which only a limited number of diagonals of the operators involved are computed. Real and synthetic data examples demonstrate the utility of this approach. For synthetic data, regularization/datuming is demonstrated for large extrapolation distances using a highly irregular recording array. Without approximation, regularization/datuming returns a regularized wavefield with reduced operator artifacts compared to a nonregularizing method such as generalized phase shift plus interpolation (PSPI). Approximate regularization/datuming returns a regularized wavefield at approximately two orders of magnitude lower cost, but it is dip limited, though in a controllable way, compared to the full method. The Foothills structural data set, a freely available data set from the Rocky Mountains of Canada, demonstrates the application to real data. The data are highly irregularly sampled along the shot coordinate and suffer from significant near-surface effects. Approximate regularization/datuming returns common-receiver data that are superior in appearance to those from conventional datuming.
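
A minimal sketch of the damped least-squares solve that underlies the regularization/datuming described above, m = (AᴴA + εI)⁻¹Aᴴd, with the Hessian AᴴA replaced by a few of its central diagonals to cut cost. The extrapolation operator A here is a random stand-in, not a real wavefield extrapolator, and the bandwidth and damping values are assumptions.

```python
import numpy as np

def banded(M, k):
    """Keep only the main diagonal and k off-diagonals of M on each side."""
    n = M.shape[0]
    mask = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= k
    return np.where(mask, M, 0.0)

rng = np.random.default_rng(7)
n = 200
A = rng.standard_normal((n, n)) / np.sqrt(n)   # stand-in extrapolation operator
d = rng.standard_normal(n)                     # recorded (irregular) wavefield
eps = 1e-2                                     # damping weight

H_full = A.T @ A + eps * np.eye(n)             # full Hessian
H_approx = banded(A.T @ A, k=5) + eps * np.eye(n)  # limited-diagonal Hessian

m_full = np.linalg.solve(H_full, A.T @ d)
m_approx = np.linalg.solve(H_approx, A.T @ d)
print("relative difference:",
      np.linalg.norm(m_full - m_approx) / np.linalg.norm(m_full))
```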

