A Neural Network Based on Canonical Correlation for Multicollinearity Diagnosis

2013 ◽  
Vol 756-759 ◽  
pp. 3324-3329
Author(s):  
Ji Fu Nong

We review a recent neural implementation of Canonical Correlation Analysis and show, using ideas suggested by Ridge Regression, how to make the algorithm robust. The network is shown to operate on data sets which exhibit multicollinearity. We develop a second model which performs well not only on multicollinear data but also on general data sets. This model allows us to vary a single parameter so that the network can perform Partial Least Squares regression, Canonical Correlation Analysis, or any intermediate operation between the two. On multicollinear data, the parameter setting is shown to be important, but on more general data no particular parameter setting is required. Finally, we develop a second penalty term which acts on such data as a smoother, in that the resulting weight vectors are much smoother and more interpretable than the weights obtained without the robustification term. We illustrate our algorithms on both artificial and real data.
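The single-parameter continuum between Partial Least Squares and Canonical Correlation Analysis can be illustrated with a closed-form sketch. This is not the neural network described in the abstract: it is a commonly used ridge-style formulation, assumed here for illustration, in which each within-set covariance is blended with the identity matrix. The function name `regularized_cca` and the parameter `gamma` are hypothetical; `gamma = 0` gives classical CCA and `gamma = 1` gives the leading PLS direction pair.

```python
import numpy as np

def inv_sqrt(m):
    # Inverse square root of a symmetric positive definite matrix.
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def regularized_cca(x, y, gamma):
    # Hypothetical helper: first pair of weight vectors for a ridge-style
    # blend between CCA (gamma = 0) and PLS (gamma = 1).
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    sxx = x.T @ x / (n - 1)
    syy = y.T @ y / (n - 1)
    sxy = x.T @ y / (n - 1)
    # Blend each within-set covariance with the identity (assumed form).
    rx = (1.0 - gamma) * sxx + gamma * np.eye(x.shape[1])
    ry = (1.0 - gamma) * syy + gamma * np.eye(y.shape[1])
    wx, wy = inv_sqrt(rx), inv_sqrt(ry)
    u, s, vt = np.linalg.svd(wx @ sxy @ wy)
    return wx @ u[:, 0], wy @ vt[0], s[0]

# Artificial multicollinear data: the third column of x duplicates the first.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
col0 = z + 0.1 * rng.normal(size=(200, 1))
x = np.hstack([col0, rng.normal(size=(200, 1)), col0])
y = np.hstack([z + 0.1 * rng.normal(size=(200, 1)), rng.normal(size=(200, 1))])

a, b, rho = regularized_cca(x, y, gamma=0.5)
print("weights for x:", a, "objective value:", rho)
```

With `gamma = 0` the duplicated column makes the covariance of `x` singular, so the whitening step breaks down; any positive `gamma` restores a well-posed problem, which is the ridge idea the abstract refers to.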

2017 ◽  
Vol 29 (10) ◽  
pp. 2825-2859 ◽  
Author(s):  
Jia Cai ◽  
Hongwei Sun

Canonical correlation analysis (CCA) is a useful tool for detecting the latent relationship between two sets of multivariate variables. In theoretical analyses of CCA, a regularization technique is used to investigate consistency. This letter addresses the consistency property of CCA from a least squares view. We construct a constrained empirical risk minimization framework for CCA and apply a two-stage randomized Kaczmarz method to solve it. In the first stage, we remove the noise, and in the second stage, we compute the canonical weight vectors. We establish rigorous theoretical consistency and extend the statistical consistency result to the kernel version of the method. Moreover, experiments on both synthetic and real-world data sets demonstrate the effectiveness and efficiency of the proposed algorithms.
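For readers unfamiliar with the building block, the following is a minimal sketch of the basic randomized Kaczmarz iteration for a consistent linear system. It is not the two-stage method of the letter (there is no noise-removal stage and no CCA constraint); the function name, the toy system, and the iteration count are assumptions chosen for illustration.

```python
import random

def randomized_kaczmarz(rows, rhs, iterations=2000, seed=0):
    # Basic randomized Kaczmarz: pick row i with probability proportional
    # to its squared norm, then project the iterate onto the hyperplane
    # defined by that row's equation.
    rng = random.Random(seed)
    weights = [sum(v * v for v in row) for row in rows]
    x = [0.0] * len(rows[0])
    for _ in range(iterations):
        i = rng.choices(range(len(rows)), weights=weights)[0]
        row = rows[i]
        residual = rhs[i] - sum(r * xi for r, xi in zip(row, x))
        step = residual / weights[i]
        x = [xi + step * r for xi, r in zip(x, row)]
    return x

# Toy consistent system (assumed for illustration): rhs is built from true_x.
rows = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
true_x = [1.0, -2.0]
rhs = [sum(r * t for r, t in zip(row, true_x)) for row in rows]
solution = randomized_kaczmarz(rows, rhs)
print(solution)
```

Each update touches a single row, which is why Kaczmarz-type methods are attractive for large least squares problems; when the system is noisy, the plain iteration no longer converges to the least squares solution, which is one motivation for a separate noise-removal stage.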


2011 ◽  
Vol 18 (3) ◽  
pp. 399-436
Author(s):  
SAMI VIRPIOJA ◽  
MARI-SANNA PAUKKERI ◽  
ABHISHEK TRIPATHI ◽  
TIINA LINDH-KNUUTILA ◽  
KRISTA LAGUS

Abstract: Vector space models are used in language processing applications for calculating semantic similarities of words or documents. The vector spaces are generated with feature extraction methods for text data. However, evaluation of the feature extraction methods may be difficult. Indirect evaluation in an application is often time-consuming and the results may not generalize to other applications, whereas direct evaluations that measure the amount of captured semantic information usually require human evaluators or annotated data sets. We propose a novel direct evaluation method based on canonical correlation analysis (CCA), the classical method for finding linear relationships between two data sets. In our setting, the two sets are parallel text documents in two languages. A good feature extraction method should provide representations that reflect the semantic content of the documents. Assuming that the underlying semantic content is independent of the language, we can study which feature extraction methods capture the content best by measuring dependence between the representations of a document and its translation. In the case of CCA, the applied measure of dependence is correlation. The evaluation method is based on unsupervised learning, is language- and domain-independent, and requires no additional resources besides a parallel corpus. In this paper, we demonstrate the evaluation method on a sentence-aligned parallel corpus. The method is validated by showing that the results obtained with bag-of-words representations are intuitive and agree well with previous findings. Moreover, we compare the proposed evaluation method with indirect evaluation methods in simple sentence matching tasks, and with a quantitative manual evaluation of word translations. The results of the proposed method correlate well with the results of the indirect and manual evaluations.


2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Xun Chen ◽  
Aiping Liu ◽  
Z. Jane Wang ◽  
Hu Peng

Corticomuscular activity modeling based on multiple data sets such as electroencephalography (EEG) and electromyography (EMG) signals provides a useful tool for understanding human motor control systems. In this paper, we propose modeling corticomuscular activity by combining partial least squares (PLS) and canonical correlation analysis (CCA). The proposed method takes advantage of both PLS and CCA to ensure that the extracted components are maximally correlated across the two data sets while also explaining the information within each data set well. This complementary combination generalizes the statistical assumptions beyond both PLS and CCA methods. Simulations were performed to illustrate the performance of the proposed method. We also applied the proposed method to concurrent EEG and EMG data collected in a Parkinson’s disease (PD) study. The results reveal several highly correlated temporal patterns between EEG and EMG signals and indicate meaningful corresponding spatial activation patterns. In PD subjects, enhanced connections between the occipital region and other regions are noted, which is consistent with previous medical knowledge. The proposed framework is a promising technique for performing multisubject and bimodal data analysis.


2020 ◽  
Vol 57 (1) ◽  
pp. 1-12
Author(s):  
Tomasz Górecki ◽  
Mirosław Krzyśko ◽  
Waldemar Wołyński

Summary: There is a growing need to analyze data sets characterized by several sets of variables observed on the same set of individuals. Such complex data structures are known as multiblock (or multiple-set) data sets. Multiblock data sets are encountered in diverse fields including bioinformatics, chemometrics, food analysis, etc. Generalized Canonical Correlation Analysis (GCCA) is a very powerful method for studying the relationships between such blocks. It can also be viewed as a method for the integration of information from K > 2 distinct sources (Takane and Oshima-Takane 2002). In this paper, GCCA is considered in the context of multivariate functional data. Such data are treated as realizations of multivariate random processes. GCCA is a technique that allows the joint analysis of several sets of data through dimensionality reduction. The central problem of GCCA is to construct a series of components aiming to maximize the association among the multiple variable sets. This method will be presented for multivariate functional data. Finally, a practical example will be discussed.


Biostatistics ◽  
2020 ◽  
Author(s):  
Arnaud Gloaguen ◽  
Cathy Philippe ◽  
Vincent Frouin ◽  
Giulia Gennari ◽  
Ghislaine Dehaene-Lambertz ◽  
...  

Summary: Regularized generalized canonical correlation analysis (RGCCA) is a general multiblock data analysis framework that encompasses several important multivariate analysis methods such as principal component analysis, partial least squares regression, and several versions of generalized canonical correlation analysis. In this article, we extend RGCCA to the case where at least one block has a tensor structure. This method is called multiway generalized canonical correlation analysis (MGCCA). Convergence properties of the MGCCA algorithm are studied, and computation of higher-level components is discussed. The usefulness of MGCCA is shown on simulation and on the analysis of a cognitive study in human infants using electroencephalography (EEG).


2013 ◽  
Vol 50 (2) ◽  
pp. 95-105 ◽  
Author(s):  
Mirosław Krzyśko ◽  
Łukasz Waszak

Summary: Classical canonical correlation analysis seeks the associations between two data sets, i.e. it searches for linear combinations of the original variables having maximal correlation. Maximizing this correlation is equivalent to solving a generalized eigenvalue problem. The maximal correlation coefficient (being a solution of this problem) is the first canonical correlation coefficient. In this paper we propose a new method of constructing canonical correlations and canonical variables for a pair of stochastic processes represented by a finite number of orthonormal basis functions.
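The equivalence mentioned in this summary can be written out in the standard textbook form. The notation is an assumption made here, not taken from the paper: $\Sigma_{XX}$ and $\Sigma_{YY}$ denote the within-set covariance matrices, $\Sigma_{XY} = \Sigma_{YX}^{\top}$ the cross-covariance, and $a$, $b$ the weight vectors of the two linear combinations.

```latex
\rho_1 = \max_{a,\,b}
\frac{a^{\top}\Sigma_{XY}\,b}
     {\sqrt{a^{\top}\Sigma_{XX}\,a}\,\sqrt{b^{\top}\Sigma_{YY}\,b}},
\qquad
\begin{pmatrix} 0 & \Sigma_{XY} \\ \Sigma_{YX} & 0 \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
= \rho
\begin{pmatrix} \Sigma_{XX} & 0 \\ 0 & \Sigma_{YY} \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
```

The largest generalized eigenvalue $\rho$ of the right-hand system is the first canonical correlation coefficient $\rho_1$, and the corresponding eigenvector stacks the two weight vectors.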

