Multiple and complex network built by the path coefficients for partial least squares variable selection research

Author(s):  
Wangping Xiong ◽  
Ying Xiong ◽  
Jianqiang Du ◽  
Bin Nie
2002 ◽  
Vol 56 (3) ◽  
pp. 337-345 ◽  
Author(s):  
S. Kamaledin Setarehdan ◽  
John J. Soraghan ◽  
David Littlejohn ◽  
Daran A. Sadler

Circulation ◽  
2021 ◽  
Vol 143 (Suppl_1) ◽  
Author(s):  
Natalie Gasca ◽  
Robyn McClelland

Most nutritional epidemiology studies investigating trends between diet and heart disease use outcome-independent dimension reduction methods, like principal component analysis, to create dietary patterns. While these methods construct patterns that describe important aspects of food consumption, these patterns are not inherently related to heart disease. Incorporating disease data into the pattern construction offers the possibility of more concisely summarizing the most disease-related foods. Sparse partial least squares (SPLS), one such method, was found to have favorable interpretation and prediction properties in the continuous outcome setting; while selecting a subset of relevant foods, it constructed a few dietary patterns that were correlated with BMI while also capturing variation in diet composition. These results were validated with simulated data. We propose incorporating SPLS into the Cox proportional hazards model to analyze a right-censored survival outcome. We hypothesized that this method would inherit the beneficial parsimony properties seen in the continuous setting, and we assessed whether this proposed method could use the most relevant covariates to create a few patterns that were associated with a survival outcome. While the proposed method targets covariate-level sparsity (i.e. variable selection), one competitor method exists that integrates pattern-level parsimony and partial least squares (PLS) in the Cox model, but it imposes more model parameters than the proposed method. We compared the variable selection, pattern selection, and predictive performance of four survival methods (Lasso, PLS, competitor sparse PLS, and proposed SPLS) via a simulation study. 
Simulation settings were informed in part by the Multi-Ethnic Study of Atherosclerosis (MESA), which has detailed food frequency questionnaire data on a large multi-ethnic population-based sample (6814 participants aged 45-84), as well as subsequent cardiovascular disease follow-up for over 15 years. In most of the simulation settings studied, the proposed method selected all 9 relevant predictors and the fewest irrelevant predictors (of 15) while creating a similar number of patterns and maintaining predictive ability for the outcome. In the setting most comparable to MESA, PLS chose all 24 predictors (by default) and 3.4 patterns (C-statistic=0.90), the competitor SPLS selected 21.1 predictors and 4.4 patterns (C-statistic=0.91), Lasso chose 16.4 predictors (C-statistic=0.91), and the proposed SPLS selected 11.7 predictors and 4.3 patterns (C-statistic=0.91), on average. We will also present an analysis of a coronary event in MESA using these four survival methods. In conclusion, we propose that using methods like SPLS to summarize food intake can create more heart disease-tailored dietary patterns that can complement the current nutritional epidemiology literature.
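
The C-statistics quoted above measure how well a model ranks subjects by risk under right censoring. As a minimal sketch, assuming Harrell's concordance index is the variant intended (the abstract does not say which estimator was used), it could be computed as follows. The function name and the toy inputs are hypothetical and do not come from the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data (illustrative sketch).

    A pair (i, j) is comparable when subject i had an observed event
    and a strictly shorter follow-up time than subject j. The pair is
    concordant when the subject who failed earlier has the higher
    risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up times, event indicators, risk scores.
c = concordance_index([1, 2, 3], [1, 0, 1], [0.5, 0.9, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.90 to 0.91 values above should be read.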


2014 ◽  
Vol 70 (5) ◽  
Author(s):  
Nor Fazila Rasaruddin ◽  
Mas Ezatul Nadia Mohd Ruah ◽  
Mohamed Noor Hasan ◽  
Mohd Zuli Jaafar

This paper reports the determination of the iodine value (IV) of pure and frying palm oils using partial least squares (PLS) regression with variable selection. A total of 28 samples of pure and frying palm oils were acquired from markets. Seven of them were considered high-priced palm oils, while the remainder were low-priced. PLS regression models were developed for the determination of IV using Fourier transform infrared (FTIR) spectra in absorbance mode over the range 650 cm-1 to 4000 cm-1. A Savitzky-Golay derivative was applied before developing the prediction models. The models were constructed using wavelengths selected in the FTIR region by means of the selectivity ratio (SR) plot and the correlation coefficient with the IV parameter. Each model was validated through the root mean square error of cross-validation (RMSECV) and the cross-validation correlation coefficient (R2cv). With the SR plot, the best models were the one with mean centering for the pure samples and the one with a combination of row scaling and standardization for the frying samples. With correlation coefficient variable selection, the best models were the one with a combination of row scaling and standardization for the pure samples and the one with mean centering for the frying samples. It is not necessary to row scale the variables to develop the model, since the effect of row scaling on model quality is insignificant.
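
The correlation-based wavelength selection and the two validation statistics mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: the threshold, the function names, and the toy data are hypothetical, and R2cv is computed here as 1 - PRESS/SS_total, which is one common definition that the abstract does not confirm. The cross-validated predictions are taken as given, so no PLS model is fitted in this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no correlation
    return sxy / math.sqrt(sxx * syy)


def select_by_correlation(spectra, y, threshold):
    """Indices of spectral variables whose |r| with y meets the threshold."""
    n_vars = len(spectra[0])
    return [
        j for j in range(n_vars)
        if abs(pearson([row[j] for row in spectra], y)) >= threshold
    ]


def rmsecv(y_true, y_cv):
    """Root mean square error of cross-validated predictions."""
    press = sum((a - b) ** 2 for a, b in zip(y_true, y_cv))
    return math.sqrt(press / len(y_true))


def r2_cv(y_true, y_cv):
    """Cross-validated R2, assumed here to be 1 - PRESS / SS_total."""
    mean = sum(y_true) / len(y_true)
    press = sum((a - b) ** 2 for a, b in zip(y_true, y_cv))
    ss_total = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - press / ss_total


# Toy data (hypothetical): 4 samples, 3 spectral variables, reference values.
spectra = [[1, 5, 0.2], [2, 3, 0.1], [3, 6, 0.4], [4, 2, 0.3]]
iodine_value = [10, 20, 30, 40]
selected = select_by_correlation(spectra, iodine_value, threshold=0.5)
```

In practice the selected columns would be passed to the PLS model, and the held-out predictions from each cross-validation fold would be collected before calling `rmsecv` and `r2_cv`.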


2013 ◽  
Vol 28 (5) ◽  
pp. 439-447 ◽  
Author(s):  
Åsmund Rinnan ◽  
Martin Andersson ◽  
Carsten Ridder ◽  
Søren Balling Engelsen

2018 ◽  
Vol 26 (2) ◽  
pp. 95-100 ◽  
Author(s):  
Yanjie Li ◽  
Wenhao Shao ◽  
Ruxiang Dong ◽  
Jingmin Jiang ◽  
Songfeng Diao

In this study, near infrared spectroscopy was shown to determine the saponin content in soapnut fruits quickly. Partial least squares analysis combined with pre-processing methods and significance multivariate correlation variable selection was used to develop a statistical model calibrated for saponin content in soapnut fruits. The results showed that the first derivative yielded the best partial least squares calibration models for spectra of both the surface of dried fruits and the powder of dry seeded fruits, with root mean square error of calibration values of 0.85% and 0.59%, respectively. The surface model was less accurate than the powder model. However, when the significance multivariate correlation variable selection method was applied to select the best variables from the spectra, the partial least squares models using spectra of surface and powder samples became similar, with higher R2 values (0.84 and 0.90) and lower root mean square error of calibration values (0.23% and 0.39%). It was suggested that near infrared spectroscopy could be a promising and rapid method for predicting the saponin content in soapnut fruits without grinding them into powder.
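
To illustrate why a first-derivative pre-treatment can help, the sketch below differentiates a spectrum with simple central differences. This is an assumption made for illustration: the abstract does not state which derivative algorithm was used, and smoothing derivatives such as Savitzky-Golay are more common in practice. The function name and the toy spectrum are hypothetical.

```python
def first_derivative(spectrum, step=1.0):
    """Central-difference first derivative of a 1-D spectrum.

    One-sided differences are used at the two end points, so the
    output has the same length as the input.
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two points")
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        deriv.append((spectrum[hi] - spectrum[lo]) / ((hi - lo) * step))
    return deriv


# Toy spectrum (hypothetical) and the same spectrum with a baseline offset.
raw = [0.0, 1.0, 4.0, 9.0, 16.0]
shifted = [value + 3.0 for value in raw]
d_raw = first_derivative(raw)
d_shifted = first_derivative(shifted)
```

Both derivatives are identical, which shows that a constant baseline offset between samples is removed before calibration.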

