An Improvised SIMPLS Estimator Based on MRCD-PCA Weighting Function and Its Application to Real Data

Symmetry ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2211
Author(s):  
Siti Zahariah ◽  
Habshah Midi ◽  
Mohd Shafie Mustafa

Multicollinearity often occurs when two or more predictor variables are correlated, especially for high dimensional data (HDD) where p >> n. The statistically inspired modification of the partial least squares (SIMPLS) is a very popular technique for solving a partial least squares regression problem due to its efficiency, speed, and ease of understanding. SIMPLS is based on the empirical covariance matrix of the explanatory and response variables, which makes it highly sensitive to outliers. To rectify this problem, a robust iteratively reweighted SIMPLS (RWSIMPLS) has been introduced. Nonetheless, RWSIMPLS is still not very efficient, because its weighting function does not specify any method for identifying high leverage points (HLPs), i.e., outlying observations in the X-direction. HLPs have the most detrimental effect on the computed values of various estimates, which results in misleading conclusions about the fitted regression model. Hence, their effects need to be reduced by assigning smaller weights to them. As a solution, we propose an improvised SIMPLS based on a new weight function obtained from the MRCD-PCA diagnostic method for identifying HLPs in HDD, and name this method MRCD-PCA-RWSIMPLS. A new MRCD-PCA-RWSIMPLS diagnostic plot is also established for classifying observations into four types of data points, i.e., regular observations, vertical outliers, good leverage points, and bad leverage points. The numerical examples and Monte Carlo simulations indicate that MRCD-PCA-RWSIMPLS offers substantial improvements over SIMPLS and RWSIMPLS. The proposed diagnostic plot classifies observations into the correct groups. By contrast, the SIMPLS and RWSIMPLS plots fail to classify observations correctly and show masking and swamping effects.
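The four-way classification mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes each observation already has an outlyingness distance in the X-direction and a standardized residual; the function name `classify_observation` and both cutoff values are hypothetical placeholders, not the quantities or cutoffs used in the MRCD-PCA-RWSIMPLS plot.

```python
def classify_observation(x_distance, std_residual, x_cutoff=3.0, resid_cutoff=2.5):
    """Label one observation from two precomputed diagnostics.

    x_distance   : outlyingness in the X-direction (assumed given)
    std_residual : standardized regression residual (assumed given)
    Both cutoffs are illustrative defaults, not values from the paper.
    """
    is_leverage = x_distance > x_cutoff
    is_outlying_in_y = abs(std_residual) > resid_cutoff
    if is_leverage and is_outlying_in_y:
        return "bad leverage point"
    if is_leverage:
        return "good leverage point"
    if is_outlying_in_y:
        return "vertical outlier"
    return "regular observation"
```

Under this convention, a robust weight function would typically down-weight observations whose X-distance exceeds the cutoff, which is the role the abstract assigns to the MRCD-PCA diagnostic.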


Author(s):  
Zhi-yong Zhang ◽  
Xin Liu ◽  
Cai-xia Huang ◽  
Da Pan

This paper introduces an application of non-linear partial least squares to vibro-acoustic regression modeling of an industrial sewing machine. In the vibro-acoustic regression model, the vibration accelerations of reference points are defined as explanatory variables, while the noise sound pressures of target points are defined as response variables; the number of explanatory variables is determined initially by a correlation analysis in the time domain. To improve predictive accuracy when a non-linear relationship exists between the explanatory and response variables, the explanatory variables are preprocessed by a kernel function transformation. Comparison of the predicted noise sound pressure with experimental data indicates that the non-linear partial least squares regression model has high predictive accuracy. Furthermore, the contributions of the vibration accelerations to the noise sound pressure are analyzed, and the results are used to guide structural optimization. Comparison of noise test results before and after optimization confirms the effectiveness of the contribution analysis.
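As a rough illustration of a kernel function transformation applied to explanatory variables, the sketch below builds a Gaussian (RBF) kernel matrix from raw samples. The abstract does not say which kernel was used, so the Gaussian kernel, the `gamma` parameter, and the sample values are all assumptions made for illustration.

```python
import math

def rbf_kernel_matrix(rows, gamma=1.0):
    """Return K with K[i][j] = exp(-gamma * squared Euclidean distance(i, j))."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [[math.exp(-gamma * sq_dist(a, b)) for b in rows] for a in rows]

# Made-up acceleration readings: 3 samples x 2 reference points.
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = rbf_kernel_matrix(samples, gamma=0.5)
```

In a kernel PLS setting, a matrix like `K` would stand in for the original explanatory matrix, letting a linear method capture a non-linear relationship.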



2018 ◽  
Vol 7 (4.30) ◽  
pp. 106
Author(s):  
N S M Shariff ◽  
H M B Duzan

Ordinary Least Squares (OLS) is a common method for investigating the linear relationship among variables of interest. In the presence of multicollinearity, OLS produces unreliable parameter estimates. For this reason, this study applies the proposed ridge estimator, formed as linear combinations of the least squares regression coefficients of the explanatory variables, to a real application. A numerical example of stock market price and macroeconomic variables in Malaysia is analyzed using both methods, with the aim of investigating the relationship among the variables in the presence of multicollinearity in the data set. The variables of interest are Consumer Price Index (CPI), Gross Domestic Product (GDP), Base Lending Rate (BLR) and Money Supply (M1). The findings show that the proposed procedure is able to estimate the model and produce reliable results by reducing the effect of multicollinearity in the data set.
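To make the effect of multicollinearity concrete, here is a minimal sketch of the textbook ridge estimator, which solves (X'X + λI)b = X'y, for two nearly identical predictors. It is not the specific estimator proposed in the paper; the data, the helper names, and the choice λ = 1 are made up for illustration.

```python
import math

def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [u, v] = [e, f] by Cramer's rule."""
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - e * c) / det)

def ridge_two_predictors(x1, x2, y, lam):
    """Ridge coefficients for y ~ b1*x1 + b2*x2 (no intercept).

    Solves (X'X + lam*I) b = X'y; lam = 0 gives ordinary least squares.
    """
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    return solve_2x2(s11 + lam, s12, s12, s22 + lam, t1, t2)

# Toy data: x2 is almost a copy of x1, so X'X is nearly singular.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.01, 1.98, 3.02, 3.99, 5.01]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ols = ridge_two_predictors(x1, x2, y, lam=0.0)
ridge = ridge_two_predictors(x1, x2, y, lam=1.0)
print("OLS coefficients:  ", ols, "norm:", math.hypot(*ols))
print("Ridge coefficients:", ridge, "norm:", math.hypot(*ridge))
```

For any λ > 0 the ridge coefficient vector has a smaller norm than the OLS one, which is how the penalty stabilizes estimates when predictors are nearly collinear.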



2012 ◽  
Vol 61 (2) ◽  
pp. 277-290 ◽  
Author(s):  
Ádám Csorba ◽  
Vince Láng ◽  
László Fenyvesi ◽  
Erika Michéli

There is a growing demand for the development and application of technologies and methods that enable rapid, cost-effective, and environmentally friendly soil data collection and evaluation. Reflectance spectroscopy, which is based on reflectance measurements in the visible (VIS) and near-infrared (NIR) ranges (350–2500 nm) of the electromagnetic spectrum, meets these demands. Because the reflectance spectrum recorded from soils is very rich in information, and many soil constituents have a characteristic spectral "fingerprint" in the studied range, a large number of key soil parameters can be determined simultaneously from a single curve. In this paper, we present the first steps of a methodological development, built on the fundamentals of reflectance spectroscopy, that aims to determine soil composition. We built and tested predictive models based on multivariate mathematical-statistical methods (partial least squares regression, PLSR) for estimating the organic carbon and CaCO3 content of soils. Testing of the models showed that the procedure gave high R² values for both soil parameters [R²(organic carbon) = 0.815; R²(CaCO3) = 0.907]. The root mean squared error (RMSE), which indicates the accuracy of the estimate, can be considered moderate for both parameters [RMSE(organic carbon) = 0.467; RMSE(CaCO3) = 3.508], and could be improved considerably by standardizing the reflectance measurement protocols. Based on our investigations, we conclude that the combined use of reflectance spectroscopy and multivariate chemometric procedures can provide a rapid and cost-effective method of data collection and evaluation.
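The two accuracy measures reported here can be computed as in the following minimal sketch. It assumes R² is the coefficient of determination (1 minus residual over total sum of squares); the abstract does not state which definition was used, and the function names are illustrative.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean squared error, in the same units as the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
```

Because RMSE carries the units of the measured quantity, the organic carbon and CaCO3 values are not directly comparable to each other without knowing the range of each parameter.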



2013 ◽  
Vol 38 (4) ◽  
pp. 465-470 ◽  
Author(s):  
Jingjie Yan ◽  
Xiaolan Wang ◽  
Weiyi Gu ◽  
LiLi Ma

Speech emotion recognition is regarded as a meaningful yet difficult problem across a number of domains, including sentiment analysis, computer science, and pedagogy. In this study, we investigate in depth speech emotion recognition based on the sparse partial least squares regression (SPLSR) approach. We use SPLSR to perform feature selection and dimensionality reduction on the full set of extracted speech emotion features. With SPLSR, the coefficients of redundant and uninformative speech emotion features are shrunk to zero, while useful and informative features are retained and passed to the subsequent classification step. Experiments on the Berlin database show that the recognition rate of the SPLSR method reaches up to 79.23% and is superior to that of the other dimensionality reduction methods compared.
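One common way sparse methods drive uninformative coefficients exactly to zero is soft-thresholding under an L1 penalty. The sketch below shows that operator only; whether the SPLSR variant in this study uses exactly this step is an assumption, and the weights and threshold are made up.

```python
def soft_threshold(weights, lam):
    """Shrink each weight toward zero by lam; weights within lam of zero become 0."""
    result = []
    for w in weights:
        shrunk = max(abs(w) - lam, 0.0)
        result.append(shrunk if w >= 0 else -shrunk)
    return result

# Hypothetical feature weights: the second entry is small and gets zeroed out.
sparse_weights = soft_threshold([0.9, -0.05, 0.2, -0.6], lam=0.1)
```

Features whose weights end up at zero drop out of the latent components, which is how feature selection and dimensionality reduction happen in a single step.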



Beverages ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 12 ◽  
Author(s):  
Rosa Perestrelo ◽  
Catarina Silva ◽  
Carolina Gonçalves ◽  
Mariangie Castillo ◽  
José S. Câmara

Madeira wine is a fortified Portuguese wine, which has a crucial impact on the Madeira Island economy. The particular properties of Madeira wine result from the unique and specific winemaking and ageing processes that promote the occurrence of chemical reactions among acids, sugars, alcohols, and polyphenols, which are important to the extraordinary quality of the wine. These chemical reactions contribute to the appearance of novel compounds and/or the transformation of others, consequently promoting changes in the qualitative and quantitative volatile and non-volatile composition. The current review comprises an overview of Madeira wines in terms of their volatile (e.g., terpenes, norisoprenoids, alcohols, esters, fatty acids) and non-volatile composition (e.g., polyphenols, organic acids, amino acids, biogenic amines, and metals). Moreover, the types of aroma compounds, the contribution of volatile organic compounds (VOCs) to the overall Madeira wine aroma, the change in their content during the ageing process, and the establishment of potential ageing markers are also reviewed. The viability of several analytical methods (e.g., gas chromatography-mass spectrometry (GC-MS), two-dimensional gas chromatography and time-of-flight mass spectrometry (GC×GC-ToFMS)) combined with chemometrics tools (e.g., partial least squares regression (PLS-R), partial least squares discriminant analysis (PLS-DA)) was investigated to establish potential ageing markers to guarantee Madeira wine authenticity. Acetals, furanic compounds, and lactones are the chemical families most commonly related to the ageing process.



2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Camilo Broc ◽  
Therese Truong ◽  
Benoit Liquet

Background: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of the sparse group Partial Least Squares (sgPLS) to take into account groups of variables, and a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at the variable level and at the group level. Results: Our method has the advantage of providing a globally readable model while coping with the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example of application on real data with the aim of highlighting common susceptibility variants for breast and thyroid cancers. Conclusion: The joint-sgPLS shows interesting properties for detecting a signal. As an extension of PLS, the method is suited for data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and sets of observations. Furthermore, although the method has been applied to a genetic study, its formulation is adapted to any data with a high number of variables and an explicit a priori architecture in other application fields.


