Performances of some high dimensional regression methods: sparse principal component regression

Author(s):  
Fatma Sevinç Kurnaz
1988 ◽  
Vol 42 (7) ◽  
pp. 1273-1284 ◽  
Author(s):  
Tomas Isaksson ◽  
Tormod Næs

Near-infrared (NIR) reflectance spectra of five different food products were measured. The spectra were transformed by multiplicative scatter correction (MSC). Principal component regression (PCR) was performed on both scatter-corrected and uncorrected spectra. Calibration and prediction were performed for four food constituents: protein, fat, water, and carbohydrates. All regressions gave lower prediction errors (7–68% improvement) with MSC spectra than with uncorrected absorbance spectra. One of these data sets was studied in more detail to clarify the effects of the MSC, using PCR score, residual, and leverage plots. The potential improvement from using nonlinear regression methods is indicated.
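The abstract above names multiplicative scatter correction without describing it. As a minimal sketch, assuming the common formulation in which each spectrum is regressed on the mean spectrum and then corrected as (x - intercept) / slope, the transform could look like the following. The function name `msc` and the list-of-lists input layout are illustrative choices, not taken from the paper.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    spectra: list of equal-length lists, one list per sample (assumed layout).
    Returns (corrected_spectra, reference_spectrum).
    """
    n = len(spectra)
    p = len(spectra[0])

    # Reference spectrum: mean over samples at each wavelength.
    ref = [sum(s[j] for s in spectra) / n for j in range(p)]
    ref_mean = sum(ref) / p
    ref_var = sum((r - ref_mean) ** 2 for r in ref)

    corrected = []
    for s in spectra:
        s_mean = sum(s) / p
        # Least-squares fit of s = intercept + slope * ref.
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_var
        intercept = s_mean - slope * ref_mean
        # Remove the additive and multiplicative scatter effects.
        corrected.append([(x - intercept) / slope for x in s])
    return corrected, ref
```

If every spectrum is an exact offset-and-scale copy of one underlying shape, this correction maps all of them onto the mean spectrum. Real spectra keep their chemical differences, which is what a subsequent regression such as PCR would then model.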


Horticulturae ◽  
2021 ◽  
Vol 7 (3) ◽  
pp. 56
Author(s):  
Milon Chowdhury ◽  
Viet-Duc Ngo ◽  
Md Nafiul Islam ◽  
Mohammod Ali ◽  
Sumaiya Islam ◽  
...  

Spectral reflectance techniques have been applied to quantify functional components in various crops, but related research on kale leaves is limited. This study was conducted to estimate the glucosinolate and anthocyanin components of kale leaves cultivated in a plant factory, based on diffuse reflectance spectroscopy and regression methods. Kale was grown in a plant factory under different treatments. At specific periods after transplantation, leaf samples were collected, and reflectance spectra were measured immediately at nine different points on each leaf. The same leaf samples were freeze-dried and stored for analysis of the functional components. Regression procedures, namely principal component regression (PCR), partial least squares regression (PLSR), and stepwise multiple linear regression (SMLR), were applied to relate the functional components to the spectral data. In the laboratory analysis, progoitrin and glucobrassicin, as well as cyanidin and malvidin, were found to be the dominant components of the glucosinolates and anthocyanins, respectively. In the overall analysis, the SMLR model showed better performance, and the identified wavelengths for estimating the glucosinolates and anthocyanins were in the early near-infrared (NIR) region. Specifically, reflectance at 742, 761, 787, 796, 805, 833, 855, 932, 947, and 1000 nm showed a strong correlation.


2019 ◽  
Vol 30 (3) ◽  
pp. 697-719 ◽  
Author(s):  
Fan Wang ◽  
Sach Mukherjee ◽  
Sylvia Richardson ◽  
Steven M. Hill

Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper, we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2300 data-generating scenarios, including both synthetic and semisynthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely used approaches (Lasso, Adaptive Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector and Stability Selection). We find considerable variation in performance between methods. Our results support a “no panacea” view, with no unambiguous winner across all scenarios or goals, even in this restricted setting where all data align well with the assumptions underlying the methods. The study allows us to make some recommendations as to which approaches may be most (or least) suitable given the goal and some data characteristics. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.
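To make the idea of a penalized regression concrete, here is a minimal sketch of one of the listed methods, the Lasso, fitted by cyclic coordinate descent. It is an illustration only and not the code used in the study. It assumes the objective (1/(2n))·||y − Xb||² + lam·||b||₁, with no intercept and no standardisation of the columns; the names `lasso_cd` and `soft_threshold` are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso coefficients via cyclic coordinate descent.

    X: list of n rows, each a list of p features (assumed layout).
    y: list of n responses. lam: non-negative penalty weight.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual that
            # excludes column j's current contribution.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta
```

The soft-thresholding step is what sets weak coefficients exactly to zero, which is why the Lasso can be used for variable selection as well as prediction. With `lam = 0` the same loop reduces to ordinary least squares by coordinate descent.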

