Number of Principal Components: Recently Published Documents

Total documents: 47 (five years: 6)
H-index: 13 (five years: 0)
2021 ◽  
Vol 7 (1) ◽  
pp. 1-12
Author(s):  
Borislava Vrigazova

The established approach to choosing the number of principal components for prediction models is to explore the contribution of each principal component to the total variance of the target variable. A combination of potentially important principal components can then be chosen to explain a large part of the variance in the target. Sometimes several combinations of principal components must be explored to achieve the highest classification accuracy. This research proposes a novel way of automatically deciding how many principal components should be retained to improve classification accuracy. We do that by combining principal components with ANOVA selection. To improve the accuracy resulting from our automatic approach, we use the bootstrap procedure for model selection. We call this procedure the Bootstrapped-ANOVA PCA selection. Our results suggest that this procedure can automate the selection of principal components and improve the accuracy of classification models, in our example the logistic regression.
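A minimal sketch of how such a Bootstrapped-ANOVA PCA selection could be put together with scikit-learn; the breast-cancer stand-in data, the candidate range of components, and the number of bootstrap resamples are illustrative assumptions rather than the authors' exact procedure.

```python
# Hypothetical sketch: ANOVA-based selection of principal components,
# repeated over bootstrap resamples, feeding a logistic regression.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset, not the paper's data

def accuracy_for_k(X, y, k):
    """Cross-validated accuracy when the k best components (by ANOVA F-score) are kept."""
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=None),
        SelectKBest(f_classif, k=k),        # ANOVA selection applied to component scores
        LogisticRegression(max_iter=1000),
    )
    return cross_val_score(model, X, y, cv=5).mean()

# Bootstrap the choice of k: on each resample, pick the k with the best accuracy.
chosen_ks = []
for b in range(20):                          # 20 resamples, chosen only for illustration
    Xb, yb = resample(X, y, random_state=b)
    scores = {k: accuracy_for_k(Xb, yb, k) for k in range(1, 11)}
    chosen_ks.append(max(scores, key=scores.get))

k_star = int(np.median(chosen_ks))           # consensus number of components
print("Selected number of principal components:", k_star)
```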


Author(s):  
Avani Ahuja

In the current era of ‘big data’, scientists can quickly amass enormous amounts of data in a limited number of experiments. The investigators then try to hypothesize about the root cause based on the observed trends for the predictors and the response variable. This involves identifying the discriminatory predictors that are most responsible for explaining variation in the response variable. In the current work, we investigated three related multivariate techniques: Principal Component Regression (PCR), Partial Least Squares or Projections to Latent Structures (PLS), and Orthogonal Partial Least Squares (OPLS). To perform a comparative analysis, we used a publicly available dataset of Parkinson’s disease patients. We first performed the analysis using a cross-validated number of principal components for the aforementioned techniques. Our results demonstrated that PLS and OPLS were better suited than PCR for identifying the discriminatory predictors. Since the X data did not exhibit a strong correlation, we also performed Multiple Linear Regression (MLR) on the dataset. A comparison of the top five discriminatory predictors identified by the four techniques showed a substantial overlap between the results obtained by PLS, OPLS, and MLR, while all three diverged significantly from the variables identified by PCR. A further investigation of the data revealed that PCR could identify the discriminatory variables successfully if the number of principal components in the regression model was increased. In summary, we recommend using PLS or OPLS for hypothesis generation and systemizing the selection process for principal components when using PCR.
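A hedged sketch of the kind of comparison described above, using scikit-learn's PCA, LinearRegression, and PLSRegression; the synthetic data, the number of components, and the use of coefficient magnitude as the "discriminatory" score are assumptions for illustration, not the Parkinson's analysis itself.

```python
# Illustrative comparison of PCR and PLS for ranking influential predictors.
# The random data below is a stand-in for the paper's Parkinson's dataset.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 200, 15
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)  # two true predictors

# Principal Component Regression: regress y on the first k component scores,
# then map the coefficients back to the original predictor space.
k = 3
Xs = StandardScaler().fit_transform(X)
pca = PCA(n_components=k).fit(Xs)
pcr = LinearRegression().fit(pca.transform(Xs), y)
pcr_coef = pca.components_.T @ pcr.coef_     # coefficients per original predictor

# Partial Least Squares with the same number of latent components.
pls = PLSRegression(n_components=k).fit(Xs, y)
pls_coef = pls.coef_.ravel()

top5 = lambda coef: np.argsort(np.abs(coef))[::-1][:5]
print("Top-5 predictors (PCR):", top5(pcr_coef))
print("Top-5 predictors (PLS):", top5(pls_coef))
```

Because PLS builds its latent components to covary with y while PCR's components only track variance in X, PCR with few components can miss predictors that PLS recovers, which mirrors the divergence reported above.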


2021 ◽  
Vol 20 (3) ◽  
pp. 450-461
Author(s):  
Stanley L. Sclove

The use of information criteria, especially AIC (Akaike’s information criterion) and BIC (Bayesian information criterion), for choosing an adequate number of principal components is illustrated.
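One way to illustrate the idea is to score candidate dimensionalities with AIC and BIC computed from the probabilistic-PCA log-likelihood that scikit-learn exposes; the simplified parameter count below is an assumption, not necessarily the counting used in the paper.

```python
# Hedged sketch: rank candidate numbers of components by AIC/BIC based on
# the probabilistic-PCA log-likelihood returned by scikit-learn's PCA.score.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)  # stand-in data
n, d = X.shape

for q in range(1, d):
    pca = PCA(n_components=q).fit(X)
    loglik = pca.score(X) * n              # total log-likelihood under probabilistic PCA
    # Simplified parameter count (an assumption): d*q loadings, minus the q*(q-1)/2
    # rotational redundancy, plus one isotropic noise variance.
    k_params = d * q - q * (q - 1) // 2 + 1
    aic = 2 * k_params - 2 * loglik
    bic = k_params * np.log(n) - 2 * loglik
    print(f"q={q}: AIC={aic:.1f}, BIC={bic:.1f}")
```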


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0248896
Author(s):  
Nico Migenda ◽  
Ralf Möller ◽  
Wolfram Schenck

“Principal Component Analysis” (PCA) is an established linear technique for dimensionality reduction. It performs an orthonormal transformation to replace possibly correlated variables with a smaller set of linearly independent variables, the so-called principal components, which capture a large portion of the data variance. The problem of finding the optimal number of principal components has been widely studied for offline PCA. However, when working with streaming data, the optimal number changes continuously. This requires updating both the principal components and the dimensionality in every timestep. While the continuous update of the principal components is widely studied, the available algorithms for dimensionality adjustment in neural network-based and incremental PCA are limited to increments of one. Therefore, existing approaches cannot account for abrupt changes in the presented data. The contribution of this work is to enable, in neural network-based PCA, continuous dimensionality adjustment by an arbitrary number of components without the need to learn all principal components. A novel algorithm is presented that utilizes several PCA characteristics to adaptively update the optimal number of principal components for neural network-based PCA. A precise estimation of the required dimensionality reduces the computational effort while ensuring that the desired amount of variance is kept. The computational complexity of the proposed algorithm is investigated, and it is benchmarked in an experimental study against other neural network-based and incremental PCA approaches, where it produces highly competitive results.
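A rough sketch of the general idea, not the authors' algorithm: after every batch of a data stream, re-select the smallest number of components whose cumulative explained variance reaches a target fraction, so the dimensionality can jump by more than one between updates. The use of scikit-learn's IncrementalPCA (which still tracks all components, unlike the paper's approach), the 95% variance target, and the synthetic stream are assumptions.

```python
# Illustrative sketch: streaming PCA with a per-batch re-selection of the
# dimensionality, allowing arbitrary jumps when the data changes abruptly.
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
d, target_variance = 20, 0.95              # assumed values for the example
ipca = IncrementalPCA(n_components=d)

for t in range(10):
    # Synthetic stream whose intrinsic dimensionality changes abruptly at t = 5.
    rank = 3 if t < 5 else 8
    basis = rng.normal(size=(rank, d))
    batch = rng.normal(size=(100, rank)) @ basis + 0.01 * rng.normal(size=(100, d))

    ipca.partial_fit(batch)
    cumulative = np.cumsum(ipca.explained_variance_ratio_)
    k = int(np.searchsorted(cumulative, target_variance) + 1)
    print(f"batch {t}: keep {k} principal components")
```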


Author(s):  
Robert Beinert ◽  
Gabriele Steidl

Principal component analysis (PCA) is known to be sensitive to outliers, so various robust PCA variants have been proposed in the literature. A recent model, called reaper, aims to find the principal components by solving a convex optimization problem. Usually the number of principal components must be determined in advance, and the minimization is performed over symmetric positive semi-definite matrices having the size of the data, although the number of principal components is substantially smaller. This prohibits its use if the dimension of the data is large, which is often the case in image processing. In this paper, we propose a regularized version of reaper which enforces the sparsity of the number of principal components by penalizing the nuclear norm of the corresponding orthogonal projector. If only an upper bound on the number of principal components is available, our approach can be combined with the L-curve method to reconstruct the appropriate subspace. Our second contribution is a matrix-free algorithm to find a minimizer of the regularized reaper which is also suited for high-dimensional data. The algorithm couples a primal-dual minimization approach with a thick-restarted Lanczos process. This appears to be the first efficient convex variational method for robust PCA that can handle high-dimensional data. As a side result, we discuss the topic of the bias in robust PCA. Numerical examples demonstrate the performance of our algorithm.
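As a rough sketch of the model class being discussed, a reaper-type problem relaxes the orthogonal projector onto the principal subspace to a symmetric matrix between 0 and the identity and, in the regularized version, adds a nuclear-norm term; the weighting \(\alpha\) is a placeholder and the formulation below is an approximation based on the description above, not the paper's exact model.

```latex
% Sketch of a reaper-type robust PCA problem with a nuclear-norm term that
% promotes a low-rank projector P (the weight \alpha is an assumed placeholder).
\[
\min_{P \in \mathbb{R}^{d \times d}} \;
\sum_{i=1}^{n} \bigl\| x_i - P x_i \bigr\|_2 \;+\; \alpha \, \| P \|_{*}
\quad \text{subject to} \quad
P = P^{\mathsf{T}}, \; 0 \preceq P \preceq I .
\]
```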


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10086
Author(s):  
Omneya Attallah ◽  
Dina A. Ragab ◽  
Maha Sharkas

Coronavirus (COVID-19) was first observed in Wuhan, China, and quickly propagated worldwide. It is considered one of the gravest crises of the present era and one of the most serious hazards threatening worldwide health. Therefore, the early detection of COVID-19 is essential. The common way to detect COVID-19 is the reverse transcription-polymerase chain reaction (RT-PCR) test, although it has several drawbacks. Computed tomography (CT) scans can enable the early detection of suspected patients; however, the overlap between patterns of COVID-19 and other types of pneumonia makes it difficult for radiologists to diagnose COVID-19 accurately. On the other hand, deep learning (DL) techniques, and especially the convolutional neural network (CNN), can classify COVID-19 and non-COVID-19 cases. In addition, DL techniques that use CT images can deliver an accurate diagnosis faster than the RT-PCR test, which consequently saves time for disease control and provides an efficient computer-aided diagnosis (CAD) system. The shortage of publicly available datasets of CT images makes the design of such a CAD system a challenging task. The CAD systems in the literature are based on either an individual CNN or two fused CNNs, one used for segmentation and the other for classification and diagnosis. In this article, a novel CAD system is proposed for diagnosing COVID-19 based on the fusion of multiple CNNs. First, an end-to-end classification is performed. Afterward, the deep features are extracted from each network individually and classified using a support vector machine (SVM) classifier. Next, principal component analysis is applied to each deep feature set extracted from each network. Such feature sets are then used to train an SVM classifier individually. Afterward, a selected number of principal components from each deep feature set are fused and compared with the fusion of the deep features extracted from each CNN. The results show that the proposed system is effective and capable of detecting COVID-19 and distinguishing it from non-COVID-19 cases with an accuracy of 94.7%, an AUC of 0.98, a sensitivity of 95.6%, and a specificity of 93.7%. Moreover, the results show that the system is efficient, as fusing a selected number of principal components reduced the computational cost of the final model by almost 32%.
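A hedged sketch of the per-network pipeline the abstract outlines: PCA on each deep-feature set, an SVM per reduced set, and then fusion of the selected components for a final SVM. The random feature arrays, the number of retained components, and simple concatenation as the fusion step are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch: PCA on each CNN's deep features, an SVM per feature set,
# then fusion of a selected number of principal components for a final SVM.
# The random arrays stand in for deep features extracted from several CNNs.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples = 300
y = rng.integers(0, 2, size=n_samples)            # dummy COVID / non-COVID labels
deep_features = {                                  # assumed feature dimensions per network
    "cnn_a": rng.normal(size=(n_samples, 512)),
    "cnn_b": rng.normal(size=(n_samples, 1024)),
    "cnn_c": rng.normal(size=(n_samples, 2048)),
}

n_keep = 50                                        # selected components per network
reduced = {}
for name, feats in deep_features.items():
    pcs = PCA(n_components=n_keep).fit_transform(feats)
    reduced[name] = pcs
    acc = cross_val_score(SVC(kernel="linear"), pcs, y, cv=5).mean()
    print(f"{name}: SVM accuracy on its own components = {acc:.3f}")

# Fusion step: concatenate the selected components from all networks.
fused = np.hstack(list(reduced.values()))
acc = cross_val_score(SVC(kernel="linear"), fused, y, cv=5).mean()
print(f"fused components: SVM accuracy = {acc:.3f}")
```

Fusing only the retained components keeps the final SVM's input much smaller than concatenating the raw deep features, which is the source of the computational savings reported above.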


Biometrika ◽  
2018 ◽  
Vol 105 (2) ◽  
pp. 389-402 ◽  
Author(s):  
Sungkyu Jung ◽  
Myung Hee Lee ◽  
Jeongyoun Ahn

2017 ◽  
Vol 45 (6) ◽  
pp. 2590-2617 ◽  
Author(s):  
Yunjin Choi ◽  
Jonathan Taylor ◽  
Robert Tibshirani
