PCR, PLS, or OPLS Evaluation of different regression techniques for hypothesis generation

Author(s):  
Avani Ahuja

In the current era of ‘big data’, scientists are able to quickly amass enormous amounts of data in a limited number of experiments. The investigators then try to hypothesize about the root cause based on the observed trends for the predictors and the response variable. This involves identifying the discriminatory predictors that are most responsible for explaining variation in the response variable. In the current work, we investigated three related multivariate techniques: Principal Component Regression (PCR), Partial Least Squares or Projections to Latent Structures (PLS), and Orthogonal Partial Least Squares (OPLS). To perform a comparative analysis, we used a publicly available dataset for Parkinson’s disease patients. We first performed the analysis using a cross-validated number of principal components for the aforementioned techniques. Our results demonstrated that PLS and OPLS were better suited than PCR for identifying the discriminatory predictors. Since the X data did not exhibit a strong correlation, we also performed Multiple Linear Regression (MLR) on the dataset. A comparison of the top five discriminatory predictors identified by the four techniques showed a substantial overlap between the results obtained by PLS, OPLS, and MLR, and the three techniques exhibited a significant divergence from the variables identified by PCR. A further investigation of the data revealed that PCR could be used to identify the discriminatory variables successfully if the number of principal components in the regression model was increased. In summary, we recommend using PLS or OPLS for hypothesis generation and systemizing the selection process for principal components when using PCR.
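The contrast described in this abstract can be illustrated with a minimal NumPy sketch. It does not use the Parkinson's dataset or the authors' code: the synthetic data, the function names `pcr_coefficients` and `pls1_coefficients`, and the choice of two components are all assumptions made for illustration. The response is driven by a predictor that lies outside the two highest-variance principal components, so a two-component PCR largely misses it, while PLS and a full-rank PCR (equivalent to MLR here) recover it.

```python
import numpy as np

def standardize(A):
    """Center each column and scale it to unit variance."""
    return (A - A.mean(axis=0)) / A.std(axis=0)

def pcr_coefficients(X, y, n_components):
    """PCR: regress y on the leading PCA scores, then map back to predictors."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    T = Xc @ V
    gamma = np.linalg.lstsq(T, yc, rcond=None)[0]
    return V @ gamma

def pls1_coefficients(X, y, n_components):
    """PLS1 (single response) via NIPALS with deflation."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # scores
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - c * t                # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

# Synthetic data (assumption): two correlated blocks dominate the variance of X,
# while the response depends only on the independent predictor at index 4.
rng = np.random.default_rng(0)
n = 200
f1, f2 = rng.normal(size=n), rng.normal(size=n)
noise = rng.normal(size=(n, 6))
X_raw = np.column_stack([
    f1 + 0.3 * noise[:, 0], f1 + 0.3 * noise[:, 1],
    f2 + 0.3 * noise[:, 2], f2 + 0.3 * noise[:, 3],
    noise[:, 4], noise[:, 5],
])
y = 2.0 * X_raw[:, 4] + 0.1 * rng.normal(size=n)
X = standardize(X_raw)

beta_pcr2 = pcr_coefficients(X, y, 2)   # few components
beta_pcr6 = pcr_coefficients(X, y, 6)   # all components, same fit as MLR
beta_pls2 = pls1_coefficients(X, y, 2)

for name, beta in [("PCR-2", beta_pcr2), ("PCR-6", beta_pcr6), ("PLS-2", beta_pls2)]:
    ranking = np.argsort(-np.abs(beta))
    print(name, "predictors ranked by |coefficient|:", ranking.tolist())
```

Ranking predictors by the absolute value of their standardized coefficients is only one possible definition of "discriminatory"; the paper may use a different criterion.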

Processes ◽  
2021 ◽  
Vol 9 (1) ◽  
pp. 166
Author(s):  
Majed Aljunaid ◽  
Yang Tao ◽  
Hongbo Shi

Partial least squares (PLS) and linear regression methods are widely utilized for quality-related fault detection in industrial processes. Standard PLS decomposes the process variables into principal and residual parts. However, the principal part still contains many components unrelated to quality, and failing to remove them can cause many false alarms. Moreover, although these components do not affect product quality, they have a great impact on process safety and carry information about other faults. Removing and discarding them therefore reduces the detection rate of faults unrelated to quality. To overcome the drawbacks of standard PLS, a novel method, MI-PLS (mutual information PLS), is proposed in this paper. The proposed MI-PLS algorithm utilizes mutual information to divide the process variables into selected and residual components, and then uses singular value decomposition (SVD) to further decompose the selected part into quality-related and quality-unrelated components, subsequently constructing quality-related monitoring statistics. To ensure that there is no information loss and that the proposed MI-PLS can be used in quality-related and quality-unrelated fault detection, a principal component analysis (PCA) model is fitted to the residual component to obtain its score matrix, which is combined with the quality-unrelated part to obtain the total quality-unrelated monitoring statistics. Finally, the proposed method is applied to a numerical example and the Tennessee Eastman process. The proposed MI-PLS has a lower computational load and more robust performance compared with T-PLS and PCR.
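The first step described above, using mutual information to split process variables into a selected and a residual set, might look roughly like the sketch below. This is not the MI-PLS algorithm from the paper: the histogram-based estimator, the number of bins, the threshold of 0.1 nats, and the synthetic data are all assumptions, and the subsequent SVD and PCA steps are not shown.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram estimate of the mutual information (in nats) between x and y."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    nz = pxy > 0                              # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# Synthetic process data (assumption): variables 0 and 1 are related to the
# quality variable, variables 2 and 3 are independent of it.
rng = np.random.default_rng(0)
n = 2000
quality = rng.normal(size=n)
X = np.column_stack([
    quality + 0.3 * rng.normal(size=n),
    -0.8 * quality + 0.5 * rng.normal(size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])

mi_threshold = 0.1  # hypothetical cut-off, not taken from the paper
mi = [mutual_information(X[:, j], quality) for j in range(X.shape[1])]
selected = [j for j, value in enumerate(mi) if value > mi_threshold]
residual = [j for j, value in enumerate(mi) if value <= mi_threshold]

print("estimated MI per variable:", np.round(mi, 3).tolist())
print("selected variables:", selected)
print("residual variables:", residual)
```

A histogram estimator is biased upward for small samples, so in practice the threshold would have to be tuned to the sample size and binning.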


2009 ◽  
Vol 51 (2) ◽  
pp. 1-19 ◽  
Author(s):  
Monica Gomez ◽  
Shintaro Okazaki

Despite abundant research that examines the effects of store brands on retail decision making, little attention has been paid to the predictive model of store brand shelf space. This paper intends to fill this research gap by proposing and testing a theoretical model of store brand shelf space. From the literature review, 11 independent variables were identified (i.e. store format, reputation, brand assortment, depth of assortment, in-store promotions, leading national brands’ rivalry, retailers’ rivalry, manufacturers’ concentration, store brand market share, advertising, and innovation) and analysed as potential predictors of the dependent variable (i.e. store brand shelf space). Data were collected for 29 product categories in 55 retail stores. In designing the statistical treatment, a three-phase procedure was adopted: (1) interdependence analysis via principal component analysis; (2) dependence analysis via neural network simulation; and (3) structural equation modelling via partial least squares. The findings corroborate our proposed model, in that all hypothesised relationships and directions are supported. On this basis, we draw theoretical as well as managerial implications. In closing, we acknowledge the limitations of this study and suggest future research directions.


2017 ◽  
Vol 47 (1) ◽  
Author(s):  
Fernanda Gomes da Silveira ◽  
Darlene Ana Souza Duarte ◽  
Lucas Monteiro Chaves ◽  
Fabyano Fonseca e Silva ◽  
Ivan Carvalho Filho ◽  
...  

ABSTRACT: The main application of genomic selection (GS) is the early identification of genetically superior animals for traits that are difficult to measure or evaluated late, such as meat pH (measured after slaughter). Because the number of markers in GS is generally larger than the number of genotyped animals and these markers are highly correlated owing to linkage disequilibrium, statistical methods based on dimensionality reduction have been proposed. Among them, the partial least squares (PLS) technique stands out because of its simplicity and high predictive accuracy. However, choosing the optimal number of components remains a relevant issue for PLS applications. Thus, we applied PLS (and principal component and traditional multiple regression) techniques to GS for pork pH traits (with pH measured at 45 min and 24 h after slaughter) and also identified the optimal number of PLS components based on the degree-of-freedom (DoF) and cross-validation (CV) methods. The PLS method outperforms the principal component and traditional multiple regression techniques, enabling satisfactory predictions for pork pH traits using only genotypic data (low-density SNP panel). Furthermore, the SNP marker estimates from PLS revealed a relevant region on chromosome 4, which may affect these traits. The DoF and CV methods showed similar results for determining the optimal number of components in PLS analysis; thus, from the statistical viewpoint, the DoF method should be preferred because of its theoretical background (based on the "statistical information theory"), whereas CV is an empirical method based on computational effort.
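The cross-validation route to choosing the number of PLS components can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the data are synthetic (more "markers" than "animals", driven by three latent factors), the helper names `pls1_fit` and `cv_mse` are hypothetical, and the DoF criterion mentioned in the abstract is not implemented.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; return training means and regression coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        c = (yk @ t) / tt
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - c * t                # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, beta

def cv_mse(X, y, n_components, n_folds=5, seed=0):
    """Mean squared prediction error from k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    squared_error = 0.0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        x_mean, y_mean, beta = pls1_fit(X[train_idx], y[train_idx], n_components)
        # Center the held-out rows with the training means only.
        pred = y_mean + (X[test_idx] - x_mean) @ beta
        squared_error += np.sum((y[test_idx] - pred) ** 2)
    return squared_error / len(y)

# Synthetic data (assumption): 60 samples, 100 correlated predictors.
rng = np.random.default_rng(1)
n, p = 60, 100
latent = rng.normal(size=(n, 3))
X = latent @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(n, p))
y = latent @ np.array([1.5, -1.0, 0.5]) + 0.1 * rng.normal(size=n)

errors = {k: cv_mse(X, y, k) for k in range(1, 9)}
best_k = min(errors, key=errors.get)
print("CV mean squared error by number of components:")
for k, err in errors.items():
    print(f"  {k}: {err:.4f}")
print("number of components with the lowest CV error:", best_k)
```

Picking the strict minimum is the simplest rule; a more conservative choice is the smallest number of components whose error is close to that minimum.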


Processes ◽  
2021 ◽  
Vol 9 (10) ◽  
pp. 1691
Author(s):  
Nikesh Patel ◽  
Kavitha Sivanathan ◽  
Prashant Mhaskar

This paper addresses the problem of quality modeling in polymethyl methacrylate (PMMA) production. The key challenge is handling the large amounts of missing quality measurements in each batch due to the time and cost sensitive nature of the measurements. To this end, a missing data subspace algorithm that adapts nonlinear iterative partial least squares (NIPALS) algorithms from both partial least squares (PLS) and principal component analysis (PCA) is utilized to build a data driven dynamic model. The use of NIPALS algorithms allows for the correlation structure of the input–output data to minimize the impact of the large amounts of missing quality measurements. These techniques are utilized in a simulated case study to successfully model the PMMA process in particular, and demonstrate the efficacy of the algorithm to handle the quality prediction problem in general.
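To give a sense of how NIPALS tolerates missing measurements, the sketch below runs a PCA-style NIPALS in which every regression step uses only the observed cells. It is a simplified illustration, not the missing-data subspace algorithm of the paper: the function name `nipals_pca_missing`, the synthetic rank-two data, and the 15% missing rate are assumptions, and no dynamic model is built.

```python
import numpy as np

def nipals_pca_missing(X, n_components, n_iter=500, tol=1e-9):
    """NIPALS PCA in which NaN cells are skipped in every least-squares step."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    M = observed.astype(float)                    # 1 where a value is present
    col_means = np.nanmean(X, axis=0)
    R = np.where(observed, X - col_means, 0.0)    # residual; missing cells are 0
    n, p = X.shape
    scores = np.zeros((n, n_components))
    loadings = np.zeros((p, n_components))
    tiny = 1e-12                                  # guards against division by zero
    for a in range(n_components):
        # Start from the residual column with the largest sum of squares.
        t = R[:, np.argmax((R ** 2).sum(axis=0))].copy()
        for _ in range(n_iter):
            # Regress each column on t using only the rows observed in that column.
            load = (R.T @ t) / np.maximum(M.T @ t ** 2, tiny)
            load /= np.linalg.norm(load)
            # Regress each row on the loading using only its observed columns.
            t_new = (R @ load) / np.maximum(M @ load ** 2, tiny)
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        scores[:, a] = t
        loadings[:, a] = load
        R = np.where(observed, R - np.outer(t, load), 0.0)   # deflate observed cells
    return col_means, scores, loadings

# Synthetic data (assumption): rank-two structure plus small noise.
rng = np.random.default_rng(2)
n, p = 100, 8
X_full = (rng.normal(size=(n, 2)) * np.array([3.0, 1.5])) @ rng.normal(size=(2, p))
X_full += 0.05 * rng.normal(size=(n, p))
missing = rng.random(X_full.shape) < 0.15
X_missing = X_full.copy()
X_missing[missing] = np.nan

col_means, scores, loadings = nipals_pca_missing(X_missing, n_components=2)
X_hat = col_means + scores @ loadings.T

ss_before = np.sum((X_full - col_means)[~missing] ** 2)
ss_after = np.sum((X_full - X_hat)[~missing] ** 2)
rmse_missing = np.sqrt(np.mean((X_full - X_hat)[missing] ** 2))
print("sum of squares on observed cells before/after:", ss_before, ss_after)
print("RMSE on the cells that were hidden from the model:", rmse_missing)
```

With missing cells the extracted scores and loadings are no longer exactly orthogonal, which is one reason dedicated missing-data algorithms go beyond this basic scheme.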


2017 ◽  
Vol 2 (1) ◽  
pp. 21
Author(s):  
Muhammad Amin Paris

Structural Equation Modeling (SEM) is one of the multivariate techniques that can estimate a series of interrelated dependence relationships among a number of endogenous and exogenous variables, as well as latent (unobserved) variables, simultaneously. Parameter estimation methods that are often applied in SEM are Maximum Likelihood (ML), Weighted Least Squares (WLS), Unweighted Least Squares (ULS), Generalized Least Squares (GLS), and Partial Least Squares (PLS). This research aims to compare the ULS and PLS methods in estimating the parameters of a model of learning achievement among first-year undergraduate Mathematics students, FMIPA, Bogor Agricultural University (IPB). This research uses secondary and primary data, 112 in total. The results indicate that the ULS method is more accurate than the PLS method. The analysis with the ULS method shows that motivation, capability, and environment had an effect on student learning achievement.

