The use of principal component analysis to measure fundamental cognitive processes in neuropsychological data

2021 ◽  
Author(s):  
Christoph Sperber

For years, dissociation studies on neurological single cases were the dominant method for inferring fundamental cognitive functions in neuropsychology. In contrast, the association between deficits was considered to be of less epistemological value and even misleading. Still, principal component analysis (PCA), an associational method for dimensionality reduction, has recently become popular for the identification of fundamental functions. The current study evaluated the ability of PCA to identify the fundamental variables underlying a battery of measures. Synthetic data were simulated to resemble typical neuropsychological data, including varying dissociation patterns. In most experiments, PCA succeeded in measuring the underlying target variables with high to almost perfect precision. However, this success relied on additional factor rotation. Unrotated PCA struggled with the dependence structure of the data and often failed. On the other hand, the performance of rotated factor solutions required single measures that anchored the rotation. When no test scores existed that primarily and precisely measured each underlying target variable, rotated solutions also failed their intended purpose. Further, the dimensionality of the simulated data was consistently underestimated. Commonly used strategies to estimate the number of meaningful factors appear to be inappropriate for neuropsychological data. Finally, simulations suggested a high potential of PCA to denoise data, with factor rotation providing an additional filter function. This can be invaluable in neuropsychology, where measures are often inherently noisy, and PCA can be superior to common compound measures, such as the arithmetic mean, for measuring variables with high reliability. In summary, PCA appears to be a powerful tool in neuropsychology that is well capable of inferring fundamental cognitive functions with high precision, but the typical structure of neuropsychological data places clear limitations on the method and carries a risk of complete methodological failure.
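The sketch below is a minimal illustration of the kind of simulation described here, not the author's code: synthetic test scores are generated from two hypothetical latent functions, PCA with and without a varimax rotation is applied, and the recovered components are correlated with the known targets. All sample sizes, loadings, and noise levels are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def varimax(loadings, n_iter=100, tol=1e-6):
    """Standard orthogonal varimax rotation of a loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    var_old = 0.0
    for _ in range(n_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p)
        )
        R = u @ vt
        var_new = s.sum()
        if var_new < var_old * (1 + tol):
            break
        var_old = var_new
    return loadings @ R

rng = np.random.default_rng(0)
n = 300
latent = rng.normal(size=(n, 2))              # two hypothetical fundamental functions
# Six test scores: two anchor-like tests per function plus two mixed tests, all noisy
mixing = np.array([[1.0, 0.9, 0.0, 0.1, 0.5, 0.6],
                   [0.0, 0.1, 1.0, 0.9, 0.5, 0.4]])
scores = latent @ mixing + 0.5 * rng.normal(size=(n, 6))

pca = PCA(n_components=2)
unrotated = pca.fit_transform(scores)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
rotated_loadings = varimax(loadings)
# Factor scores under the rotated solution (least-squares projection)
rotated = (scores - scores.mean(0)) @ np.linalg.pinv(rotated_loadings).T

for name, comp in [("unrotated", unrotated), ("rotated", rotated)]:
    corr = np.corrcoef(np.hstack([latent, comp]).T)[:2, 2:]
    print(name, np.round(np.abs(corr).max(axis=1), 2))  # best match per latent variable
```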

Entropy ◽  
2019 ◽  
Vol 21 (6) ◽  
pp. 548 ◽  
Author(s):  
Yuqing Sun ◽  
Jun Niu

Hydrological regionalization is a useful step in hydrological modeling and prediction. The regionalization is not always straightforward, however, due to the lack of long-term hydrological data and the complex multi-scale variability features embedded in the data. This study examines the multiscale soil moisture variability of data simulated on a grid-cell basis by a large-scale hydrological model, and clusters the grid-cell soil moisture data using wavelet-based multiscale entropy and principal component analysis, over the Xijiang River basin in South China, for the period 2002–2010. The regionalization of the 169 grid cells, at a spatial resolution of 0.5° × 0.5°, produced homogeneous groups based on the pattern of wavelet-based entropy information. Four distinct modes explain 80.14% of the total embedded variability of the transformed wavelet power across different timescales. Moreover, the possible implications of the regionalization results for local hydrological applications, such as parameter estimation for an ungauged catchment and designing a uniform prediction strategy for a sub-area of a large-scale basin, are discussed.
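A minimal sketch of this kind of pipeline, assuming PyWavelets and scikit-learn: wavelet decomposition of each grid cell's soil moisture series, per-scale relative energies and their Shannon entropy as features, PCA to extract a few modes, and clustering on those modes. The placeholder series, wavelet choice, and decomposition level are assumptions, not values from the study (apart from the 169 cells and four modes mentioned in the abstract).

```python
import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_cells, n_months = 169, 108                      # 169 grid cells, 2002-2010 monthly series
soil_moisture = rng.normal(size=(n_cells, n_months)).cumsum(axis=1)  # placeholder series

def multiscale_features(series, wavelet="db4", level=3):
    """Relative wavelet energy per decomposition scale plus its Shannon entropy."""
    coeffs = pywt.wavedec(series, wavelet, level=level)
    energy = np.array([np.sum(c**2) for c in coeffs])
    rel = energy / energy.sum()
    entropy = -np.sum(rel * np.log(rel + 1e-12))
    return np.concatenate([rel, [entropy]])

features = np.array([multiscale_features(s) for s in soil_moisture])

pca = PCA(n_components=4)                         # four modes, as in the study
modes = pca.fit_transform(features)
print("variance explained:", pca.explained_variance_ratio_.sum())

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(modes)
```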


2014 ◽  
Vol 556-562 ◽  
pp. 4317-4320
Author(s):  
Qiang Zhang ◽  
Li Ping Liu ◽  
Chao Liu

As a zero-emission mode of transportation, Electric Vehicles (EVs) are increasingly used in our daily lives. The EV charging station is an important component of the Smart Grid, which is now facing the challenges of big data. This paper presents a data compression and reconstruction method based on Principal Component Analysis (PCA). The data reconstruction error, measured by the Normalized Absolute Percent Error (NAPE), is taken into consideration to balance the compression ratio and the reconstruction quality. Using simulated data, the effectiveness of data compression and reconstruction for EV charging stations is verified.
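A minimal sketch of PCA-based compression and reconstruction with a NAPE-style error, run on synthetic load profiles; the profile dimensions and the exact NAPE definition used here are assumptions and may differ from the paper's.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Hypothetical charging-station load: 365 daily profiles x 96 quarter-hour samples
load = np.abs(rng.normal(50, 15, size=(365, 96))) + 20 * np.sin(np.linspace(0, 2 * np.pi, 96))

for k in (5, 10, 20):
    pca = PCA(n_components=k).fit(load)
    compressed = pca.transform(load)              # 365 x k scores (plus k x 96 components to store)
    reconstructed = pca.inverse_transform(compressed)
    # NAPE-style error: mean absolute error normalized by the data range
    nape = np.mean(np.abs(load - reconstructed)) / (load.max() - load.min())
    ratio = load.size / (compressed.size + pca.components_.size + pca.mean_.size)
    print(f"k={k:2d}  compression ratio={ratio:4.1f}  NAPE={nape:.4f}")
```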


2019 ◽  
Author(s):  
Florian Wagner ◽  
Dalia Barkley ◽  
Itai Yanai

Abstract: Single-cell RNA-Seq measurements are commonly affected by high levels of technical noise, posing challenges for data analysis and visualization. A diverse array of methods has been proposed to computationally remove noise by sharing information across similar cells or genes; however, their respective accuracies have been difficult to establish. Here, we propose a simple denoising strategy based on principal component analysis (PCA). We show that while PCA performed on raw data is biased towards highly expressed genes, this bias can be mitigated with a cell aggregation step, allowing the recovery of denoised expression values for both highly and lowly expressed genes. We benchmark our resulting ENHANCE algorithm and three previously described methods on simulated data that closely mimic real datasets, showing that ENHANCE provides the best overall denoising accuracy, recovering modules of co-expressed genes and cell subpopulations. Implementations of our algorithm are available at https://github.com/yanailab/enhance.
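The linked repository contains the actual implementation; the snippet below is only a generic sketch of the two ideas named in the abstract, aggregating each cell with its nearest neighbors and then reconstructing expression from the top principal components. All preprocessing choices and parameter values are assumptions, not the ENHANCE defaults.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def aggregate_then_pca_denoise(counts, n_neighbors=10, n_components=20):
    """Sketch: pool each cell with its nearest neighbors, then keep only the top
    principal components of the pooled profiles. Not the ENHANCE implementation."""
    # Library-size normalization and log transform to stabilize variance
    norm = counts / counts.sum(axis=1, keepdims=True) * np.median(counts.sum(axis=1))
    log_norm = np.log1p(norm)

    # Neighbor search in a low-dimensional PCA space
    pcs = PCA(n_components=min(n_components, counts.shape[1] - 1)).fit_transform(log_norm)
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(pcs)
    _, idx = nn.kneighbors(pcs)

    # Aggregation step: pool raw counts over each cell's neighborhood
    pooled = counts[idx].sum(axis=1)
    pooled_log = np.log1p(pooled / pooled.sum(axis=1, keepdims=True) * 1e4)

    # Denoise by projecting onto the top components and back
    pca = PCA(n_components=min(n_components, pooled_log.shape[1] - 1))
    return pca.inverse_transform(pca.fit_transform(pooled_log))

# Random counts standing in for a cells x genes matrix
rng = np.random.default_rng(3)
counts = rng.poisson(1.0, size=(500, 200)).astype(float)
denoised = aggregate_then_pca_denoise(counts)
```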


Geofluids ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Lihua Huang ◽  
Liudan Mao ◽  
Yirong Zhu ◽  
Yuling Wang

To address the low accuracy, low efficiency, and large number of parameters required in current calculations of rock slope stability, a prediction model of rock slope stability is proposed that combines principal component analysis (PCA) and the relevance vector machine (RVM). In this model, PCA is used to reduce the dimension of several influencing factors, and four independent principal component variables are selected. With RVM mapping the nonlinear relationship between the slope stability safety factor and the principal component variables, the prediction model of rock slope stability based on PCA-RVM is established. The results show that, for the same sample, the maximum relative error of the PCA-RVM model is only 1.26%, the average relative error is 0.95%, and the mean square error is 0.011, far lower than those of the RVM model and the GEP model. By comparing the results of the traditional calculation method and the PCA-RVM model, it can be concluded that the PCA-RVM model offers high prediction accuracy, small discreteness, and high reliability, providing a useful reference for accurately predicting rock slope stability.
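Scikit-learn ships no relevance vector machine, so the sketch below substitutes a Gaussian process regressor for the RVM stage purely for illustration of the PCA-then-kernel-regression structure; the six influencing factors, the synthetic safety factors, and the train/test split are all hypothetical.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(4)
# Hypothetical slope samples: columns could be height, angle, cohesion, friction angle,
# unit weight, pore pressure ratio; the target is the safety factor.
X = rng.normal(size=(60, 6))
y = 1.2 + 0.3 * X[:, 1] - 0.2 * X[:, 2] + 0.1 * rng.normal(size=60)

# PCA reduces the correlated factors to four independent components,
# and the kernel regressor maps them to the safety factor (RVM stand-in).
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=4),
    GaussianProcessRegressor(kernel=RBF(), alpha=1e-2, normalize_y=True),
)
model.fit(X[:50], y[:50])
pred = model.predict(X[50:])
print("mean relative error:", np.mean(np.abs(pred - y[50:]) / np.abs(y[50:])))
```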


Mathematics ◽  
2018 ◽  
Vol 6 (11) ◽  
pp. 269 ◽  
Author(s):  
Sergio Camiz ◽  
Valério Pillar

The identification of a reduced dimensional representation of the data is among the main issues of exploratory multidimensional data analysis, and several solutions have been proposed in the literature, depending on the method. Principal Component Analysis (PCA) is the method that has received the largest attention thus far, and several identification methods, the so-called stopping rules, have been proposed, giving very different results in practice, and some comparative studies have been carried out. Some inconsistencies in the previous studies led us to try to clarify the distinction between signal and noise in PCA, and its limits, and to propose a new testing method. This consists of generating simulated data according to a predefined eigenvalue structure, including zero eigenvalues. From random populations built according to several such structures, reduced-size samples were extracted, and different levels of random normal noise were added to them. This controlled introduction of noise allows a clear distinction between expected signal and noise, the latter relegated to the non-zero eigenvalues in the samples that correspond to zero eigenvalues in the population. With this new method, we tested the performance of ten different stopping rules. For every method, structure, and noise level, both power (the ability to correctly identify the expected dimension) and type-I error (the detection of a dimension composed only of noise) were measured, by counting the relative frequencies with which the smallest non-zero eigenvalue in the population was recognized as signal in the samples and with which the largest zero eigenvalue was recognized as noise, respectively. This way, the behaviour of the examined methods is clear and their comparison and evaluation are possible. The reported results show that both the generalization of Bartlett's test by Rencher and the Bootstrap method by Pillar perform much better than all the others: both show reasonable power, decreasing with noise, and very good type-I error. Thus, more than the others, these methods deserve to be adopted.
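A minimal sketch of this testing scheme, assuming a population covariance built from a predefined eigenvalue spectrum with zero eigenvalues, reduced-size samples with added normal noise, and Horn's parallel analysis standing in for a stopping rule (shown only as an illustration, not necessarily one of the ten rules evaluated in the paper).

```python
import numpy as np

rng = np.random.default_rng(5)

def population_with_eigenvalues(eigvals, rng):
    """Random orthogonal basis combined with a fixed eigenvalue spectrum."""
    q, _ = np.linalg.qr(rng.normal(size=(len(eigvals), len(eigvals))))
    return q @ np.diag(eigvals) @ q.T

def parallel_analysis_dim(X, n_perm=100, rng=None):
    """Horn's parallel analysis: count eigenvalues above the 95th percentile
    of eigenvalues obtained from column-permuted (noise-only) data."""
    rng = rng or np.random.default_rng()
    Xc = X - X.mean(0)
    eig = np.sort(np.linalg.eigvalsh(np.cov(Xc.T)))[::-1]
    null = np.empty((n_perm, X.shape[1]))
    for b in range(n_perm):
        Xp = np.column_stack([rng.permutation(col) for col in Xc.T])
        null[b] = np.sort(np.linalg.eigvalsh(np.cov(Xp.T)))[::-1]
    return int(np.sum(eig > np.percentile(null, 95, axis=0)))

# Population structure: 4 signal dimensions, 4 zero eigenvalues (known in advance)
eigvals = np.array([5.0, 4.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0])
cov = population_with_eigenvalues(eigvals, rng)

hits = 0
for _ in range(50):                                            # 50 reduced-size samples
    X = rng.multivariate_normal(np.zeros(8), cov, size=100,
                                check_valid="ignore")           # zero eigenvalues allowed
    X += 0.3 * rng.normal(size=X.shape)                         # controlled noise level
    hits += parallel_analysis_dim(X, rng=rng) == 4
print("power (fraction of samples with the correct dimension):", hits / 50)
```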


2010 ◽  
Vol 08 (06) ◽  
pp. 995-1011 ◽  
Author(s):  
Hao Zheng ◽  
Hongwei Wu

Metagenomics is an emerging field in which the power of genomic analysis is applied to an entire microbial community, bypassing the need to isolate and culture individual microbial species. Assembly of metagenomic DNA fragments is much like the overlap-layout-consensus procedure for assembling isolated genomes, but is augmented by an additional binning step to differentiate scaffolds, contigs and unassembled reads into various taxonomic groups. In this paper, we employed n-mer oligonucleotide frequencies as the features and developed a hierarchical classifier (PCAHIER) for binning short (≤ 1,000 bp) metagenomic fragments. Principal component analysis was used to reduce the high dimensionality of the feature space. The hierarchical classifier consists of four layers of local classifiers that are implemented based on linear discriminant analysis. These local classifiers are responsible for binning prokaryotic DNA fragments into superkingdoms, fragments of the same superkingdom into phyla, fragments of the same phylum into genera, and fragments of the same genus into species, respectively. We evaluated the performance of PCAHIER using our own simulated data sets as well as the widely used simHC synthetic metagenome data set from the IMG/M system. The effectiveness of PCAHIER was demonstrated through comparisons against a non-hierarchical classifier and two existing binning algorithms (TETRA and Phylopythia).
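A sketch of one local classifier of this kind of hierarchy, assuming 4-mer frequency features, PCA for dimensionality reduction, and LDA for classification; the random fragments and group base compositions are synthetic stand-ins, and this is not the PCAHIER code.

```python
import numpy as np
from itertools import product
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]   # 4-mers: 256 features
KMER_INDEX = {k: i for i, k in enumerate(KMERS)}

def kmer_frequencies(seq, k=4):
    """Normalized n-mer frequency vector for one DNA fragment."""
    counts = np.zeros(len(KMERS))
    for i in range(len(seq) - k + 1):
        counts[KMER_INDEX[seq[i:i + k]]] += 1
    return counts / max(counts.sum(), 1)

def random_fragment(base_probs, length=1000):
    return "".join(rng.choice(list("ACGT"), size=length, p=base_probs))

# Two hypothetical taxonomic groups with different base compositions
groups = {0: [0.35, 0.15, 0.15, 0.35], 1: [0.2, 0.3, 0.3, 0.2]}
X = np.array([kmer_frequencies(random_fragment(groups[label]))
              for label in (0, 1) for _ in range(100)])
y = np.repeat([0, 1], 100)

# One local classifier of the hierarchy: PCA to reduce dimensionality, then LDA
clf = make_pipeline(PCA(n_components=20), LinearDiscriminantAnalysis())
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```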


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Xuanli Han ◽  
Jigen Peng ◽  
Angang Cui ◽  
Fujun Zhao

In this paper, we describe a novel approach to sparse principal component analysis (SPCA) via a nonconvex sparsity-inducing fraction penalty function (FP-SPCA). First, SPCA is reformulated as a fraction-penalty regression model. Second, an algorithm corresponding to the model is proposed and its convergence is guaranteed. Finally, numerical experiments were carried out on a synthetic data set, and the results show that the FP-SPCA method is more adaptable and achieves a better tradeoff between sparsity and explained variance than SPCA.
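The fraction penalty algorithm itself is not reproduced here; as a rough illustration of the sparsity/explained-variance tradeoff the abstract refers to, the sketch below uses scikit-learn's SparsePCA (an L1-penalized formulation, not FP-SPCA) on synthetic data with sparse underlying components.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(7)
# Synthetic data with two sparse underlying components
n, p = 300, 20
z = rng.normal(size=(n, 2))
W = np.zeros((2, p))
W[0, :5], W[1, 5:10] = 1.0, 1.0                 # each component loads on 5 variables only
X = z @ W + 0.3 * rng.normal(size=(n, p))
X -= X.mean(0)

for alpha in (0.1, 0.5, 1.0):
    spca = SparsePCA(n_components=2, alpha=alpha, random_state=0).fit(X)
    sparsity = np.mean(spca.components_ == 0)
    # Approximate explained variance of the (possibly non-orthogonal) sparse components
    scores = X @ np.linalg.pinv(spca.components_)
    resid = X - scores @ spca.components_
    expl = 1 - resid.var() / X.var()
    print(f"alpha={alpha}: sparsity={sparsity:.2f}, explained variance={expl:.2f}")
```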


2017 ◽  
Vol 33 (1) ◽  
pp. 15-41 ◽  
Author(s):  
Aida Calviño

Abstract: In this article, we propose a simple and versatile method for limiting disclosure in continuous microdata based on Principal Component Analysis (PCA). Instead of perturbing the original variables, we propose to alter the principal components, as they contain the same information but are uncorrelated, which permits working on each component separately and reduces processing times. The number and weight of the perturbed components determine the level of protection and distortion of the masked data. The method preserves the mean vector and the variance-covariance matrix. Furthermore, depending on the technique chosen to perturb the principal components, the proposed method can provide masked, hybrid or fully synthetic data sets. Some examples of application and comparisons with other methods previously proposed in the literature (in terms of disclosure risk and data utility) are also included.
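A rough sketch of the general idea, assuming noise addition as the perturbation technique: transform to principal components, perturb a subset of components, and transform back. The rescaling used here preserves means and variances only approximately, whereas the paper's variants preserve the mean vector and covariance matrix exactly; all data and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
# Hypothetical continuous microdata: income, age, expenses (correlated)
n = 1000
base = rng.normal(size=(n, 3)) @ np.array([[1.0, 0.4, 0.3],
                                           [0.0, 1.0, 0.5],
                                           [0.0, 0.0, 1.0]])
X = base * [15000, 12, 8000] + [40000, 45, 20000]

# PCA via eigendecomposition of the covariance matrix
mean = X.mean(0)
Xc = X - mean
eigval, eigvec = np.linalg.eigh(np.cov(Xc.T))
order = np.argsort(eigval)[::-1]
eigvec = eigvec[:, order]
scores = Xc @ eigvec

# Perturb only the last two components with centered noise, then rescale each
# perturbed component back to its original standard deviation
masked_scores = scores.copy()
for j in (1, 2):
    noisy = scores[:, j] + rng.normal(0, 0.5 * scores[:, j].std(), size=n)
    noisy -= noisy.mean()
    masked_scores[:, j] = noisy * (scores[:, j].std() / noisy.std())

X_masked = masked_scores @ eigvec.T + mean
print("max mean difference:      ", np.abs(X_masked.mean(0) - X.mean(0)).max())
print("max covariance difference:", np.abs(np.cov(X_masked.T) - np.cov(X.T)).max())
```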


Author(s):  
S.M. Shaharudin ◽  
N. Ahmad ◽  
N.H. Zainuddin ◽  
N.S. Mohamed

A robust dimension reduction method in Principal Component Analysis (PCA) was used to rectify the issue of unbalanced clusters in rainfall patterns caused by the skewed nature of rainfall data. A robust measure in PCA using Tukey's biweight correlation to downweight observations was introduced, and the optimum breakdown point for extracting the number of components in PCA with this approach is proposed. A set of simulated data matrices that mimicked the real data set was used to determine an appropriate breakdown point for robust PCA and to compare the performance of both approaches. The simulated data indicated that a breakdown point at 70% cumulative percentage of variance gave a good balance in extracting the number of components. The results showed a substantial improvement with the robust PCA over PCA based on Pearson correlation in terms of the average number of clusters obtained and their cluster quality.
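A hedged sketch of one way to realize the downweighting idea: Tukey biweight weights computed from robust (median/MAD) distances, a weighted correlation matrix, and retention of components up to a 70% cumulative-variance breakdown point. The weighting scheme and thresholds are assumptions, not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(9)
# Skewed, rainfall-like synthetic data: stations x time points
X = rng.gamma(shape=2.0, scale=10.0, size=(80, 40))

def tukey_biweight_weights(X, c=4.685):
    """Per-observation Tukey biweight weights based on robust distances."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) * 1.4826 + 1e-12
    d = np.sqrt(np.mean(((X - med) / mad) ** 2, axis=1))   # robust distance per row
    u = d / (c * np.median(d) + 1e-12)
    return np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)   # outlying rows get weight 0

w = tukey_biweight_weights(X)

# Weighted correlation matrix from the downweighted observations
mu = np.average(X, axis=0, weights=w)
Xc = (X - mu) * np.sqrt(w)[:, None]
cov = Xc.T @ Xc / w.sum()
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

eigval = np.linalg.eigvalsh(corr)[::-1]
cumvar = np.cumsum(eigval) / eigval.sum()
# Retain components up to the 70% cumulative-variance breakdown point, as in the abstract
k = int(np.searchsorted(cumvar, 0.70) + 1)
print("components retained:", k)
```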

