Canonical Correlation Analysis for Multiview Semisupervised Feature Extraction

Author(s):  
Olcay Kursun ◽  
Ethem Alpaydin
2011 ◽  
Vol 18 (3) ◽  
pp. 399-436
Author(s):  
SAMI VIRPIOJA ◽  
MARI-SANNA PAUKKERI ◽  
ABHISHEK TRIPATHI ◽  
TIINA LINDH-KNUUTILA ◽  
KRISTA LAGUS

Abstract: Vector space models are used in language processing applications for calculating semantic similarities of words or documents. The vector spaces are generated with feature extraction methods for text data. However, evaluation of the feature extraction methods may be difficult. Indirect evaluation in an application is often time-consuming and the results may not generalize to other applications, whereas direct evaluations that measure the amount of captured semantic information usually require human evaluators or annotated data sets. We propose a novel direct evaluation method based on canonical correlation analysis (CCA), the classical method for finding linear relationships between two data sets. In our setting, the two sets are parallel text documents in two languages. A good feature extraction method should provide representations that reflect the semantic content of the documents. Assuming that the underlying semantic content is independent of the language, we can study which feature extraction methods capture the content best by measuring dependence between the representations of a document and its translation. In the case of CCA, the applied measure of dependence is correlation. The evaluation method is based on unsupervised learning, is language- and domain-independent, and does not require additional resources besides a parallel corpus. In this paper, we demonstrate the evaluation method on a sentence-aligned parallel corpus. The method is validated by showing that the results obtained with bag-of-words representations are intuitive and agree well with previous findings. Moreover, we compare the proposed evaluation method with indirect evaluation methods in simple sentence matching tasks and with a quantitative manual evaluation of word translations. The results of the proposed method correlate well with the results of the indirect and manual evaluations.
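
The idea of scoring a feature extraction method by the correlation between a document's representation and that of its translation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the synthetic "views", the dimensions, and the name `canonical_correlations` are assumptions made for illustration, and the function assumes more samples than features and full column rank in both views.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two row-aligned views.

    X: (n_samples, dx), Y: (n_samples, dy). Assumes n_samples exceeds dx and dy
    and that both centered matrices have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the two column spaces (this whitens each view).
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles,
    # i.e. the canonical correlations, in decreasing order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Synthetic stand-in for a parallel corpus (illustrative assumption):
# a shared 2-dimensional "semantic content" Z observed through two
# different linear feature maps plus noise.
rng = np.random.default_rng(0)
n = 500
Z = rng.standard_normal((n, 2))
X = Z @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((n, 5))
Y = Z @ rng.standard_normal((2, 4)) + 0.1 * rng.standard_normal((n, 4))

aligned = canonical_correlations(X, Y)                        # documents paired with translations
shuffled = canonical_correlations(X, Y[rng.permutation(n)])   # pairing destroyed

print("aligned :", np.round(aligned, 3))
print("shuffled:", np.round(shuffled, 3))
```

In this toy setting the aligned pairing should yield clearly higher canonical correlations than the shuffled one, which is the kind of contrast such an evaluation relies on. With real text features, which are high-dimensional and sparse, a regularized CCA variant would likely be needed.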


2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Chin-Teng Lin ◽  
Chih-Sheng Huang ◽  
Wen-Yu Yang ◽  
Avinash Kumar Singh ◽  
Chun-Hsiang Chuang ◽  
...  

Electroencephalogram (EEG) signals are usually contaminated with various artifacts of noncerebral origin, such as signals associated with muscle activity, eye movement, and body motion. The amplitude of such artifacts is larger than that of the electrical activity of the brain, so they mask the cortical signals of interest, resulting in biased analysis and interpretation. Several blind source separation methods have been developed to remove artifacts from EEG recordings. However, the iterative process for measuring separation within multichannel recordings is computationally intractable. Moreover, manually excluding the artifact components requires a time-consuming offline process. This work proposes a real-time artifact removal algorithm based on canonical correlation analysis (CCA), feature extraction, and the Gaussian mixture model (GMM) to improve the quality of EEG signals. CCA is used to decompose the EEG signals into components, feature extraction then derives representative features from these components, and the GMM clusters the features into groups to recognize and remove artifacts. The feasibility of the proposed algorithm was demonstrated by effectively removing artifacts caused by blinks, head/body movement, and chewing from EEG recordings while preserving the temporal and spectral characteristics of the signals that are important to cognitive research.
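
The decomposition step can be sketched as follows, under stated assumptions. The abstract does not say how the two CCA views are formed; the sketch assumes the common blind-source-separation setup in which the recording is paired with a one-sample-delayed copy of itself, so that components come out ordered by autocorrelation. The feature extraction and GMM clustering stage is not reproduced: as a stand-in, the component with the lowest canonical correlation is simply treated as the artifact. The synthetic signals and the name `cca_components` are hypothetical.

```python
import numpy as np

def cca_components(eeg):
    """Decompose eeg (channels x samples) by CCA against its one-sample delay.

    Returns S (samples-1 x channels, orthonormal component time courses),
    A (mixing matrix such that the centered data equals S @ A), and
    r (canonical correlations, decreasing).
    """
    X = eeg[:, 1:].T            # current samples
    Y = eeg[:, :-1].T           # delayed samples
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    U, r, _ = np.linalg.svd(Qx.T @ Qy)
    S = Qx @ U                  # canonical variates of the current samples
    A = S.T @ Xc                # least-squares mixing matrix
    return S, A, r

# Synthetic 4-channel recording (illustrative assumption): three sinusoidal
# "cortical" sources plus one large white-noise "artifact" source.
rng = np.random.default_rng(0)
fs, n = 250, 2000
t = np.arange(n) / fs
brain = np.vstack([np.sin(2 * np.pi * f * t) for f in (3.0, 7.0, 11.0)])
artifact = 2.0 * rng.standard_normal((1, n))
clean = rng.standard_normal((4, 3)) @ brain
raw = clean + rng.standard_normal((4, 1)) @ artifact

S, A, r = cca_components(raw)

# Stand-in for the feature extraction + GMM stage: drop the component
# with the lowest canonical correlation (least autocorrelated).
keep = np.ones(len(r), dtype=bool)
keep[np.argmin(r)] = False
denoised = (S[:, keep] @ A[keep, :]).T      # channels x (samples - 1), centered

def centered(x):
    return x - x.mean(axis=1, keepdims=True)

reference = centered(clean[:, 1:])
err_raw = np.linalg.norm(centered(raw[:, 1:]) - reference)
err_denoised = np.linalg.norm(denoised - reference)
print("error before:", err_raw, "after:", err_denoised)
```

Because the QR and SVD steps are non-iterative, a decomposition of this kind is cheap per window, which is consistent with the real-time goal stated in the abstract. Selecting components by a single threshold is much cruder than clustering extracted features, so this sketch should not be read as the proposed algorithm.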

