sparse pca
Recently Published Documents


Total documents: 96 (five years: 26)

H-index: 17 (five years: 3)

2021 ◽  
Author(s):  
Santanu S. Dey ◽  
Rahul Mazumder ◽  
Guanyi Wang

Dual Bounds of Sparse Principal Component Analysis

Sparse principal component analysis (PCA) is a widely used dimensionality reduction tool in machine learning and statistics. Compared with PCA, sparse PCA enhances interpretability by incorporating a sparsity constraint. However, unlike for PCA, conventional heuristics for sparse PCA cannot tractably certify the quality of the primal feasible solutions they obtain via associated dual bounds unless statistical assumptions are imposed. In “Using L1-Relaxation and Integer Programming to Obtain Dual Bounds for Sparse PCA,” Santanu S. Dey, Rahul Mazumder, and Guanyi Wang present a convex integer programming (IP) framework for sparse PCA to derive dual bounds. They establish worst-case results on the quality of the dual bounds provided by the convex IP. Moreover, the authors empirically illustrate that the proposed convex IP framework outperforms existing sparse PCA methods for finding dual bounds.
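To make the roles of primal solutions and dual bounds concrete, here is a minimal sketch on a tiny instance. It is not the authors' convex IP framework. It assumes the standard cardinality-constrained formulation (maximize x^T A x subject to ||x||_2 = 1 and at most k nonzeros), uses a simple thresholding heuristic for a primal feasible solution, brute-force enumeration for the exact optimum, and the largest eigenvalue of A as a trivial dual (upper) bound. All function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def exact_sparse_pca(A, k):
    """Brute-force optimum of max x^T A x s.t. ||x||_2 = 1, ||x||_0 <= k.

    Enumerates every support of size k; only viable for very small n.
    """
    n = A.shape[0]
    best_val, best_x = -np.inf, None
    for support in combinations(range(n), k):
        idx = np.array(support)
        vals, vecs = np.linalg.eigh(A[np.ix_(idx, idx)])
        if vals[-1] > best_val:
            best_val = float(vals[-1])
            best_x = np.zeros(n)
            best_x[idx] = vecs[:, -1]
    return best_val, best_x


def thresholding_heuristic(A, k):
    """Primal feasible solution: keep the k largest-magnitude entries
    of the leading eigenvector of A and renormalize."""
    _, vecs = np.linalg.eigh(A)
    v = vecs[:, -1]
    idx = np.argsort(np.abs(v))[-k:]
    x = np.zeros_like(v)
    x[idx] = v[idx]
    x /= np.linalg.norm(x)
    return float(x @ A @ x), x


def trivial_dual_bound(A):
    """Dropping the sparsity constraint gives the upper bound lambda_max(A)."""
    return float(np.linalg.eigvalsh(A)[-1])


# Illustrative data only: a random sample covariance matrix (assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
A = X.T @ X / X.shape[0]
k = 3

primal, _ = thresholding_heuristic(A, k)
optimum, _ = exact_sparse_pca(A, k)
dual = trivial_dual_bound(A)
print(f"heuristic (primal) = {primal:.4f}")
print(f"exact optimum      = {optimum:.4f}")
print(f"trivial dual bound = {dual:.4f}")
```

The gap between the dual bound and the heuristic value is what certifies solution quality without knowing the optimum. A tighter dual bound, which is what the paper targets, shrinks that gap.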


2021 ◽  
Author(s):  
Loc Tran ◽  
Bich Ngo ◽  
Tuan Tran ◽  
Lam Pham ◽  
An Mai

Psychometrika ◽  
2021 ◽  
Author(s):  
Rosember Guerra-Urzola ◽  
Katrijn Van Deun ◽  
Juan C. Vera ◽  
Klaas Sijtsma

Abstract: PCA is a popular tool for exploring and summarizing multivariate data, especially data consisting of many variables. PCA, however, is often not simple to interpret, because the components are linear combinations of the variables. To address this issue, numerous methods have been proposed to sparsify the coefficients in the components, including rotation-thresholding methods and, more recently, PCA methods subject to sparsity-inducing penalties or constraints. Here, we offer guidelines on how to choose among the different sparse PCA methods. The current literature lacks clear guidance on the properties and performance of the different sparse PCA methods, and often relies on the misconception that the equivalence of the formulations for ordinary PCA also holds for sparse PCA. To guide potential users of sparse PCA methods, we first discuss several popular sparse PCA methods in terms of whether sparseness is imposed on the loadings or on the weights, the assumed model, and the optimization criterion used to impose sparseness. Second, using an extensive simulation study, we assess each of these methods by means of performance measures such as squared relative error, misidentification rate, and percentage of explained variance for several data-generating models and conditions for the population model. Finally, two examples using empirical data are considered.
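The three performance measures named in the abstract can be sketched as follows. The abstract does not give formulas, so the definitions below are assumptions: squared relative error as the squared Frobenius distance between estimated and true weights relative to the squared norm of the true weights, misidentification rate as the share of coefficients whose zero/nonzero status is wrong, and percentage of explained variance from a least-squares reconstruction of the data from the component scores. The function names are hypothetical, and the sketch ignores the sign and ordering ambiguity of components.

```python
import numpy as np


def squared_relative_error(W_true, W_est):
    """||W_true - W_est||_F^2 / ||W_true||_F^2 (assumed definition)."""
    return float(np.sum((W_true - W_est) ** 2) / np.sum(W_true ** 2))


def misidentification_rate(W_true, W_est, tol=1e-12):
    """Fraction of entries whose zero/nonzero status differs (assumed definition)."""
    true_zero = np.abs(W_true) <= tol
    est_zero = np.abs(W_est) <= tol
    return float(np.mean(true_zero != est_zero))


def percentage_explained_variance(X, W):
    """100 * (1 - ||X - T P||_F^2 / ||X||_F^2), with scores T = X W and
    P fitted by least squares. X is assumed to be column-centered and
    W to have shape (variables, components)."""
    T = X @ W
    P, *_ = np.linalg.lstsq(T, X, rcond=None)
    residual = X - T @ P
    return float(100.0 * (1.0 - np.sum(residual ** 2) / np.sum(X ** 2)))


# Illustrative values only (assumption): one component on three variables.
W_true = np.array([[1.0], [0.0], [0.5]])
W_est = np.array([[1.0], [0.2], [0.0]])
print(squared_relative_error(W_true, W_est))
print(misidentification_rate(W_true, W_est))
```

Because sparse components need not be orthogonal, the explained variance here is computed from the reconstruction residual rather than by summing the variances of the scores.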


Author(s):  
Maximilian Theisen ◽  
Gyula Dörgő ◽  
János Abonyi ◽  
Ahmet Palazoglu

2021 ◽  
Author(s):  
Sundaresh Ram ◽  
Wenfei Tang ◽  
Alexander J. Bell ◽  
Cara Spencer ◽  
Alexander Buschhuas ◽  
...  

2021 ◽  
Author(s):  
Alhadi Bustamam ◽  
Haris Hamzah ◽  
Nadya Asanul Husna ◽  
Sarah Syarofina ◽  
Nalendra Dwimantara ◽  
...  

Abstract

Background: New dipeptidyl peptidase-4 (DPP-4) inhibitors need to be developed as agents with low adverse effects for the treatment of type 2 diabetes mellitus. This study aims to build quantitative structure-activity relationship (QSAR) models using the artificial intelligence paradigm. Random Forest and Deep Neural Network are used to build the QSAR models. We compared principal component analysis (PCA) with sparse PCA as transformation methods for Rotation Forest. K-modes clustering with Levenshtein distance was used to select molecules, and CatBoost was used for feature selection.

Results: The K-modes clustering selection step yielded 1020 DPP-4 inhibitor molecules, with logP values ranging from -1.6693 to 4.99044. Several fingerprint methods, namely extended connectivity fingerprints and functional class fingerprints with diameters of 4 and 6, were used to construct four fingerprint datasets: ECFP_4, ECFP_6, FCFP_4, and FCFP_6. There are 1024 features from the four fingerprint datasets, which are then subjected to feature selection with CatBoost. With CatBoost feature selection, both the machine learning and the deep learning QSAR models perform well: sensitivity, specificity, accuracy, and Matthews correlation coefficient all exceed 70% at feature importance levels of 60%, 70%, 80%, and 90%.

Conclusions: The K-modes clustering algorithm can produce a representative subset of DPP-4 inhibitor molecules. Feature selection on the fingerprint datasets using CatBoost is best applied before building QSAR classification and QSAR regression models. QSAR classification using machine learning and QSAR classification using deep learning each achieve an accuracy above 70%. The Rotation Forest (PCA) model performed better than the Rotation Forest (Sparse PCA) model in both QSAR classification and QSAR regression, because Rotation Forest (PCA) is more time-efficient than Rotation Forest (Sparse PCA).


2021 ◽  
Vol 208 ◽  
pp. 104212
Author(s):  
J. Camacho ◽  
A.K. Smilde ◽  
E. Saccenti ◽  
J.A. Westerhuis ◽  
Rasmus Bro

2021 ◽  
Vol 69 ◽  
pp. 1507-1520
Author(s):  
Arnaud Breloy ◽  
Sandeep Kumar ◽  
Ying Sun ◽  
Daniel P. Palomar
