Conditional canonical correlation estimation based on covariates with random forests

Author(s):  
Cansu Alakuş ◽  
Denis Larocque ◽  
Sébastien Jacquemont ◽  
Fanny Barlaam ◽  
Charles-Olivier Martin ◽  
...  

Abstract. Motivation: Investigating the relationships between two sets of variables helps to understand their interactions and can be done with canonical correlation analysis (CCA). However, the correlation between the two sets can sometimes depend on a third set of covariates, often subject-related ones such as age, gender or other clinical measures. In this case, applying CCA to the whole population is not optimal, and methods that estimate the conditional CCA given the covariates can be useful. Results: We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables given subject-related covariates. The individual trees in the forest are built with a splitting rule specifically designed to partition the data so as to maximize the canonical correlation heterogeneity between child nodes. We also propose a significance test to detect the global effect of the covariates on the relationship between the two sets of variables. The performance of the proposed method and of the global significance test is evaluated through simulation studies, which show that the method provides accurate canonical correlation estimates and well-controlled Type I error. We also show an application of the proposed method to EEG data. Availability and implementation: RFCCA is implemented in a freely available R package on CRAN (https://CRAN.R-project.org/package=RFCCA). Supplementary information: Supplementary data are available at Bioinformatics online.
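The splitting rule can be illustrated with a small sketch. This is not the RFCCA package's code. It makes three assumptions:

- the heterogeneity between child nodes is scored as sqrt(n_L * n_R) * |rho_L - rho_R| on the first canonical correlation of each child;
- only a single covariate `z` is searched;
- the names `canonical_correlations`, `best_split` and `min_leaf` are hypothetical.

Canonical correlations are computed as the singular values of Q_X^T Q_Y, where Q_X and Q_Y are orthonormal bases of the centred blocks.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between X (n, p) and Y (n, q), largest first."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the centred column spaces (assumes full column rank).
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations.
    return np.clip(np.linalg.svd(Qx.T @ Qy, compute_uv=False), 0.0, 1.0)


def best_split(X, Y, z, min_leaf=20):
    """Best threshold on one covariate z under an ASSUMED heterogeneity score:
    sqrt(n_left * n_right) * |rho_left - rho_right| on the first canonical correlation."""
    order = np.argsort(z)
    X, Y, z = X[order], Y[order], z[order]
    n = len(z)
    best_t, best_score = None, -np.inf
    for i in range(min_leaf, n - min_leaf + 1):  # left child = first i sorted rows
        if z[i - 1] == z[i]:
            continue  # tied covariate values cannot be separated
        rho_l = canonical_correlations(X[:i], Y[:i])[0]
        rho_r = canonical_correlations(X[i:], Y[i:])[0]
        score = np.sqrt(i * (n - i)) * abs(rho_l - rho_r)
        if score > best_score:
            best_t, best_score = 0.5 * (z[i - 1] + z[i]), score
    return best_t, best_score


# Synthetic example: X and Y are tightly linked when z < 0.5 and unrelated otherwise.
rng = np.random.default_rng(0)
n = 400
z = np.linspace(0.0, 1.0, n)
X = rng.normal(size=(n, 2))
A = np.array([[1.0, 0.5], [-0.5, 1.0]])
linked = X @ A + 0.05 * rng.normal(size=(n, 2))
Y = np.where((z < 0.5)[:, None], linked, rng.normal(size=(n, 2)))
t, score = best_split(X, Y, z)
print(f"chosen threshold on z: {t:.3f}")
```

A forest would repeat this search over randomly selected covariates at every node. It would then combine many trees grown on bootstrap samples to estimate the conditional canonical correlation for a new covariate value. Those steps, and the global significance test, are not shown.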

2018 ◽  
Author(s):  
Sean D. McCabe ◽  
Dan-Yu Lin ◽  
Michael I. Love

Abstract. Summary: The growth of multi-omics datasets has given rise to many methods for identifying sources of common variation across data types. The unsupervised nature of these methods makes it difficult to evaluate their performance. We present MOVIE, Multi-Omics Visualization of Estimated contributions, as a framework for evaluating the degree of overfitting and the stability of unsupervised multi-omics methods. MOVIE plots the contributions of one data type against those of another to produce contribution plots, where contributions are calculated for each subject and each data type from the results of each multi-omics method. The usefulness of MOVIE is demonstrated by applying existing multi-omics methods to permuted null data and to breast cancer data from The Cancer Genome Atlas. Contribution plots indicated that principal components-based Canonical Correlation Analysis overfit the null data, while Sparse multiple Canonical Correlation Analysis and Multi-Omics Factor Analysis provided stable results with high specificity for both the real and the permuted null datasets. Availability: MOVIE is available as an R package at https://github.com/mccabes292/... Supplementary information: Supplementary data are available at Bioinformatics online.
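Why a principal components-based CCA can overfit null data can be illustrated with a small simulation. This is not MOVIE and does not produce contribution plots. It assumes that "PC-based CCA" means running CCA on the leading `k` principal-component scores of each data type. The sizes and helper names (`first_canonical_correlation`, `pc_scores`) are hypothetical.

The two blocks are independent, and the rows of one are permuted to form a null. Because CCA maximizes correlation over all linear combinations, the first null canonical correlation can only grow as `k` increases. Once `2k` exceeds `n - 1`, the two centred score spaces must intersect, and the correlation is exactly 1.

```python
import numpy as np


def first_canonical_correlation(X, Y):
    """Largest canonical correlation between the columns of X and Y."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return min(float(np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]), 1.0)


def pc_scores(X, k):
    """Scores on the first k principal components of the centred data."""
    U, s, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return U[:, :k] * s[:k]


rng = np.random.default_rng(1)
n, p, q = 40, 200, 150            # hypothetical sizes: many more features than subjects
X = rng.normal(size=(n, p))       # "data type 1"
Y = rng.normal(size=(n, q))       # "data type 2", generated independently of X
Y_null = Y[rng.permutation(n)]    # permuted null: subject pairing is destroyed

null_rho = {k: first_canonical_correlation(pc_scores(X, k), pc_scores(Y_null, k))
            for k in (2, 5, 10, 20)}
for k, rho in null_rho.items():
    print(f"k={k:2d} PCs per block: first null canonical correlation = {rho:.3f}")
```

Even with no real association, the reported correlation approaches 1 as more components are kept. This is the kind of behaviour a null-data diagnostic such as MOVIE is meant to expose.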


2017 ◽  
Vol 5 (325) ◽  
Author(s):  
Mirosław Krzyśko ◽  
Łukasz Waszak

Canonical correlation methods for data representing functions or curves have received much attention in recent years. Such data, known in the literature as functional data (Ramsay and Silverman, 2005), have been the subject of much recent research interest. Examples of functional data can be found in many application domains, such as medicine, economics and meteorology. Unfortunately, canonical correlation methods for multivariate data cannot be applied directly to functional data, because of the high dimensionality of such data and the difficulty of taking into account the correlation and ordering of functional observations. The problem of constructing canonical correlations and canonical variables for functional data was addressed by Leurgans et al. (1993), and further developments were made by Ramsay and Silverman (2005). In this paper we propose a new method of constructing canonical correlations and canonical variables for functional data.


1972 ◽  
Vol 9 (2) ◽  
pp. 187-192 ◽  
Author(s):  
Mark I. Alpert ◽  
Robert A. Peterson

Canonical correlation analysis has been increasingly applied to marketing problems. This article presents some suggestions for interpreting canonical correlations, particularly for avoiding overstatement of the shared variation between sets of independent variables and for explicating relationships among variables within each set.
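One widely used guard against overstating shared variation is to report a redundancy index alongside the squared canonical correlation. The sketch below computes the Stewart-Love redundancy index. It illustrates the general point and is not necessarily the procedure the article recommends.

The index sums, over canonical pairs, the mean squared loading of the Y variables on their canonical variate times the squared canonical correlation. The data and the helper names `cca` and `redundancy` are hypothetical.

```python
import numpy as np


def cca(X, Y):
    """Canonical correlations plus unit-norm canonical variates for X and Y."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    k = len(s)  # min(p, q) canonical pairs
    return np.clip(s, 0.0, 1.0), Qx @ U[:, :k], Qy @ Vt.T[:, :k]


def redundancy(Y, y_variates, rho):
    """Stewart-Love redundancy of Y given X: sum over pairs of
    (mean squared loading of the Y variables on their k-th variate) * rho_k^2."""
    Yc = Y - Y.mean(axis=0)
    Yn = Yc / np.linalg.norm(Yc, axis=0)   # centred, unit-norm columns
    loadings = Yn.T @ y_variates           # correlations (structure coefficients)
    return float(np.sum(np.mean(loadings ** 2, axis=0) * rho ** 2))


# Hypothetical data: one Y variable almost duplicates one X variable; four Y variables are noise.
rng = np.random.default_rng(4)
n = 500
X = rng.normal(size=(n, 2))
Y = np.column_stack([X[:, 0] + 0.05 * rng.normal(size=n), rng.normal(size=(n, 4))])
rho, x_var, y_var = cca(X, Y)
red = redundancy(Y, y_var, rho)
print(f"first squared canonical correlation: {rho[0] ** 2:.3f}")
print(f"redundancy of Y given X:            {red:.3f}")
```

Here the first squared canonical correlation is close to 1. However, X accounts for only about a fifth of the variance in the Y set, because only one of the five Y variables is related to X.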


2017 ◽  
Vol 15 (05) ◽  
pp. 1750018 ◽  
Author(s):  
Guoli Ji ◽  
Qianmin Lin ◽  
Yuqi Long ◽  
Congting Ye ◽  
Wenbin Ye ◽  
...  

Alternative polyadenylation (APA) is a pervasive mechanism that contributes to gene regulation. The growing number of sequenced poly(A) sites places new demands on the development of computational methods to investigate APA regulation. Cluster analysis is important for identifying groups of co-expressed genes. However, clustering of poly(A) sites has not been extensively studied in APA, as most APA studies have failed to consider the distribution, abundance, and variation of APA sites in each gene. Here we constructed a two-layer model based on canonical correlation analysis (CCA) to explore the underlying biological mechanisms in APA regulation. The first layer quantifies the general correlation of APA sites across various conditions between each pair of genes, and the second layer identifies genes with statistically significant correlation in their APA patterns to infer APA-specific gene clusters. Using hierarchical clustering, we comprehensively compared our method with four other widely used distance measures based on three performance indexes. The results showed that our method significantly improved clustering performance on both synthetic and real poly(A) site data and could generate clusters with more biological meaning. We have implemented the CCA-based method as a publicly available R package called PAcluster, which provides an efficient solution for clustering large APA-specific biological datasets.
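The first layer and the clustering step can be approximated with a short sketch. It rests on assumptions the abstract does not state:

- each gene is a conditions-by-sites matrix of poly(A) site usage;
- two genes are compared by their first canonical correlation across conditions, with distance 1 - rho;
- genes are grouped with average-linkage hierarchical clustering from SciPy.

The second layer's significance test is omitted, and this is not the PAcluster implementation. The helper names `first_cc` and `cluster_genes` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def first_cc(A, B):
    """First canonical correlation between two (conditions x sites) matrices."""
    Qa, _ = np.linalg.qr(A - A.mean(axis=0))
    Qb, _ = np.linalg.qr(B - B.mean(axis=0))
    return min(float(np.linalg.svd(Qa.T @ Qb, compute_uv=False)[0]), 1.0)


def cluster_genes(usage, n_clusters):
    """usage: list of per-gene matrices (same conditions as rows, one column per site)."""
    g = len(usage)
    D = np.zeros((g, g))
    for i in range(g):
        for j in range(i + 1, g):
            D[i, j] = D[j, i] = 1.0 - first_cc(usage[i], usage[j])  # ASSUMED distance
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Hypothetical data: 8 genes, 30 conditions, 3 sites each; genes 0-3 and 4-7 follow
# two different latent regulatory programs.
rng = np.random.default_rng(2)
n_cond = 30
programs = [rng.normal(size=(n_cond, 2)) for _ in range(2)]
usage = [programs[gene // 4] @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(n_cond, 3))
         for gene in range(8)]
labels = cluster_genes(usage, n_clusters=2)
print(labels)
```

Using CCA rather than a plain correlation lets genes with different numbers of sites be compared, provided each gene has fewer sites than there are conditions.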


2015 ◽  
Vol 32 (11) ◽  
pp. 2130-2146 ◽  
Author(s):  
Clarence O. Collins ◽  
C. Linwood Vincent ◽  
Hans C. Graber

Abstract. Ocean wave spectra are complex. Because of this complexity, no widely accepted method has been developed for comparing two sets of paired wave spectra. A method for intercomparing wave spectra is developed, using the comparison of model spectra with observed spectra as the motivating example. Canonical correlation analysis (CCA) is used to investigate the correlation structure of the matrix of spectral correlations. The set of N ranked canonical correlations produced by CCA (here termed the r-sequence) is shown to be an effective way of understanding the degree of correlation between sets of paired spectral observations. A standard method for intercomparing sets of wave spectra based on CCA is then described. The method is illustrated through analyses of synthetic and real spectra that span a range of correlation, from random to nearly identical.
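An r-sequence can be sketched as the full descending list of canonical correlations between a model set and an observed set of paired spectra. This rests on several assumptions:

- spectra are the observations and frequency bins are the variables;
- there are more spectra than bins;
- the spectra are synthetic gamma-distributed values;
- `r_sequence` is a hypothetical name.

The paper's exact construction from the matrix of spectral correlations may differ.

```python
import numpy as np


def r_sequence(model, observed):
    """Ranked canonical correlations between paired sets of spectra.

    model, observed: arrays of shape (n_spectra, n_bins); row i of each is one pair.
    Treats frequency bins as variables and spectra as observations (an ASSUMPTION);
    needs n_spectra > n_bins.
    """
    Qm, _ = np.linalg.qr(model - model.mean(axis=0))
    Qo, _ = np.linalg.qr(observed - observed.mean(axis=0))
    # Singular values come back sorted in descending order.
    return np.clip(np.linalg.svd(Qm.T @ Qo, compute_uv=False), 0.0, 1.0)


# Synthetic spectra (hypothetical): gamma-distributed energy densities in 8 frequency bins.
rng = np.random.default_rng(3)
n_spectra, n_bins = 200, 8
observed = rng.gamma(shape=2.0, scale=1.0, size=(n_spectra, n_bins))
seqs = {
    "nearly equal": r_sequence(observed + 0.01 * rng.normal(size=observed.shape), observed),
    "moderately correlated": r_sequence(observed + rng.normal(size=observed.shape), observed),
    "random": r_sequence(rng.gamma(shape=2.0, scale=1.0, size=observed.shape), observed),
}
for name, r in seqs.items():
    print(f"{name:>22}: {np.round(r, 3)}")
```

The whole sequence, not just its first element, carries the information. Nearly equal sets keep every ranked correlation close to 1, while unrelated sets decay quickly toward 0.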


2011 ◽  
Vol 50 (No. 4) ◽  
pp. 163-168 ◽  
Author(s):  
Y. Akbaş ◽  
Ç. Takma

In this study, canonical correlation analysis was applied to data from laying hens to estimate the relationships of egg production with age at sexual maturity, body weight and egg weight. Two sets of variables were defined: egg numbers in three different periods as the first set (Y), and age at sexual maturity, body weight and egg weight as the second set (X). The first and second canonical correlations (between the first and the second pairs of canonical variates) were significant (P < 0.01). The canonical weights and loadings indicated that age at sexual maturity contributed more than body weight and egg weight to the variation in egg numbers across the three periods.
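Significance statements such as "P < 0.01" for canonical correlations are commonly obtained with Bartlett's sequential chi-square test based on Wilks' lambda. The abstract does not say which test was used, so the sketch below only illustrates one standard choice. The test asks whether the k-th and all later canonical correlations are zero, using the statistic -(n - 1 - (p + q + 1)/2) * sum ln(1 - rho_i^2) with (p - k + 1)(q - k + 1) degrees of freedom. The sample size and correlation values in the example are invented, not taken from the study.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_tests(rho, n, p, q):
    """Bartlett's sequential chi-square tests (Wilks' lambda approximation).

    Test k (0-based) asks whether canonical correlations k, k+1, ... are all zero.
    Returns a list of (statistic, degrees_of_freedom, p_value).
    """
    rho = np.asarray(rho, dtype=float)
    out = []
    for k in range(len(rho)):
        stat = -(n - 1 - (p + q + 1) / 2) * np.sum(np.log1p(-rho[k:] ** 2))
        df = (p - k) * (q - k)
        out.append((float(stat), df, float(chi2.sf(stat, df))))
    return out


# Hypothetical inputs: three egg-number variables (p = 3), three X variables (q = 3),
# and an invented sample size and canonical correlations; these are NOT the study's values.
results = bartlett_tests([0.5, 0.2, 0.1], n=100, p=3, q=3)
for k, (stat, df, pval) in enumerate(results, start=1):
    print(f"canonical correlations {k}..3 all zero: chi2={stat:.2f}, df={df}, p={pval:.4f}")
```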


Author(s):  
Charles Christian Adarkwah ◽  
Oliver Hirsch

Background: Burnout is known to have detrimental effects on healthcare staff with regard to both personal and occupational matters. The association between burnout symptoms and work satisfaction in endoscopy nursing staff in Germany has not been studied previously. We aimed to investigate the association between work satisfaction and risk of burnout in endoscopy nursing staff in Germany and to extract predictors for burnout in the area of work satisfaction, which can inform the design of future interventions. Setting: All members of the German Association of Endoscopy Staff (Deutsche Gesellschaft für Endoskopiefachberufe e.V., DEGEA) were invited to take part in an online survey. Methods: The total sample consisted of 674 endoscopy staff members. Of those, 579 were female (85.9%) and 95 were male (14.1%). The mean age of the participants was 44.3 years (SD 10.6), with a median age of 46 years, a minimum age of 20, and a maximum age of 64 years. We used confirmatory factor analyses to examine whether the Maslach Burnout Inventory (MBI) and a questionnaire for assessing general and facet-specific job satisfaction (KAFA) showed their postulated internal structure in our special sample. Canonical correlation analysis was performed to examine the association between work satisfaction and burnout in endoscopy staff members. Results: We were able to replicate the factorial structures of the MBI and the KAFA, both showing an acceptable model fit. The canonical correlation analysis resulted in three canonical functions, with canonical correlations of 0.64 (p < 0.001), 0.32 (p < 0.001), and 0.17 (p < 0.001). The first canonical function revealed that the KAFA scales for colleagues, professional development, payment, supervisor, and general job satisfaction were good predictors of less exhaustion, less depersonalization and lack of empathy, and higher personal accomplishment.
Commonality analysis revealed that general job satisfaction was the most important factor in explaining the squared canonical correlation. The second canonical function showed that occupational function and colleagues were good predictors of exhaustion and personal accomplishment. Conclusions: Interventions aimed at ameliorating symptoms of burnout in endoscopy staff should be tailored to address the specific needs experienced by the employees. The results of this study could therefore inform the design of interventions that address work satisfaction and burnout in endoscopy staff most effectively.


2013 ◽  
Vol 50 (2) ◽  
pp. 95-105 ◽  
Author(s):  
Mirosław Krzyśko ◽  
Łukasz Waszak

Summary: Classical canonical correlation analysis seeks associations between two data sets, i.e. it searches for linear combinations of the original variables that have maximal correlation. Maximizing this correlation is equivalent to solving a generalized eigenvalue problem, and the maximal correlation coefficient obtained from the solution of this problem is the first canonical correlation coefficient. In this paper we propose a new method of constructing canonical correlations and canonical variables for a pair of stochastic processes represented by a finite number of orthonormal basis functions.
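A minimal sketch of the two ingredients named in this summary, under stated assumptions:

- each curve is observed on a uniform grid on [0, 1);
- curves are projected onto an orthonormal Fourier basis, a hypothetical choice, since the paper's basis and estimation details may differ;
- the canonical correlations of the coefficient vectors come from the generalized eigenvalue problem S_xy S_yy^-1 S_yx a = rho^2 S_xx a, solved with `scipy.linalg.eigh`.

Because the basis is orthonormal, inner products between curves equal inner products between their coefficient vectors. That is why CCA on the coefficients is a natural stand-in for CCA on the curves within the span of the basis. The names `fourier_basis`, `basis_coefficients` and `cca_generalized_eig` are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def fourier_basis(t, n_basis):
    """Orthonormal Fourier basis on [0, 1]: 1, sqrt(2)cos(2 pi k t), sqrt(2)sin(2 pi k t), ..."""
    cols, k = [np.ones_like(t)], 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2.0) * np.cos(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2.0) * np.sin(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)


def basis_coefficients(curves, t, n_basis):
    """Coefficients <x_i, phi_j> by a Riemann sum on the uniform grid t (spacing dt)."""
    dt = t[1] - t[0]
    return curves @ fourier_basis(t, n_basis) * dt


def cca_generalized_eig(A, B):
    """Canonical correlations from S_ab S_bb^{-1} S_ba a = rho^2 S_aa a."""
    Ac, Bc = A - A.mean(axis=0), B - B.mean(axis=0)
    n = A.shape[0]
    S_aa, S_bb, S_ab = Ac.T @ Ac / (n - 1), Bc.T @ Bc / (n - 1), Ac.T @ Bc / (n - 1)
    M = S_ab @ np.linalg.solve(S_bb, S_ab.T)
    M = (M + M.T) / 2                  # symmetrise against round-off
    rho2, vecs = eigh(M, S_aa)         # eigenvalues in ascending order
    return np.sqrt(np.clip(rho2[::-1], 0.0, 1.0)), vecs[:, ::-1]


# Hypothetical example: 100 pairs of curves on a 64-point grid, built from 5 basis functions.
rng = np.random.default_rng(5)
n, m, K = 100, 64, 5
t = np.arange(m) / m                   # uniform grid on [0, 1)
Phi = fourier_basis(t, K)
cx = rng.normal(size=(n, K))
cy = cx[:, ::-1] + 0.1 * rng.normal(size=(n, K))
x_curves, y_curves = cx @ Phi.T, cy @ Phi.T
coef_x = basis_coefficients(x_curves, t, K)
coef_y = basis_coefficients(y_curves, t, K)
rho, weights = cca_generalized_eig(coef_x, coef_y)
print(np.round(rho, 3))
```

The columns of `weights` are canonical weight vectors for the X coefficients. Multiplying them by the basis functions, as in Phi @ weights, turns them back into weight functions on the grid.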


2020 ◽  
Vol 13 (4) ◽  
pp. 1463
Author(s):  
Geber Barbosa De Albuquerque Moura ◽  
José Ivaldo Barbosa de Brito ◽  
Francisco de Assis Salviano de Sousa ◽  
Enilson Palmeira Cavalcanti ◽  
Jhon Lennon Bezerra da Silva ◽  
...  

Predictors identification for rain in the east sector of the Northeast Brazil using canonical correlation analysis
Abstract: The objective of this work was to find the best predictor variables, through canonical correlation analysis of the trade winds, Sea Surface Temperature (SST) and surface atmospheric pressure in the Equatorial Pacific Ocean and SST in the Tropical Atlantic (Dipole area), so that models for forecasting rainfall (rainy season) in the eastern sector of northeastern Brazil can be developed for the four rainiest months of the three homogeneous groups, three months in advance. The groups were chosen by cluster analysis using the hierarchical method. To study the canonical correlations between the precipitation of the groups and the standardized SST, wind and atmospheric pressure data, the analyses were based on the series of precipitation totals from April to July and on lagged three-month averages (November to January) of SST, 850 hPa wind in the Equatorial Pacific, and atmospheric pressure at Tahiti and Darwin for the period from 1986 to 2017. The main predictors for the homogeneous groups were, in decreasing order of importance: the three-month lagged mean of the central Equatorial trade-wind index (MedWC), the mean surface atmospheric pressure at Darwin (Mdarwin), the mean of EN 34 (MEN34), the mean surface atmospheric pressure at Tahiti (Mtahiti) and the mean eastern trade-wind index (MedWE). These lagged relationships indicate that the main influence comes from the Pacific, through ENSO.
Keywords: wind, SST, precipitation.
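The predictor and predictand construction described above can be sketched as follows. The sketch assumes monthly series are stored as year-by-month arrays with columns running from January to December. The function names and data layout are hypothetical.

Each target year pairs two quantities:

- the November-to-January mean of a predictor, using November and December of the previous year plus January of the target year;
- the April-to-July total of rainfall in the target year.

```python
import numpy as np


def ndj_mean(monthly):
    """Nov-Dec-Jan mean for each target year.

    monthly: array (n_years, 12), rows = consecutive calendar years, columns = Jan..Dec.
    Row y of the result uses Nov and Dec of year y and Jan of year y + 1, so it lines up
    with target year y + 1 (the first calendar year has no preceding Nov-Dec).
    """
    monthly = np.asarray(monthly, dtype=float)
    return (monthly[:-1, 10] + monthly[:-1, 11] + monthly[1:, 0]) / 3.0


def amjj_total(monthly):
    """April-to-July total for target years 2..n, aligned with ndj_mean."""
    monthly = np.asarray(monthly, dtype=float)
    return monthly[1:, 3:7].sum(axis=1)   # columns 3..6 = April..July


# Toy check: two "years" whose monthly values are simply 0..23.
toy = np.arange(24, dtype=float).reshape(2, 12)
print(ndj_mean(toy), amjj_total(toy))   # -> [11.] [66.]

# Hypothetical assembly of the two CCA sets (the input lists are placeholders):
# X = np.column_stack([ndj_mean(index) for index in predictor_indices])
# Y = np.column_stack([amjj_total(series) for series in group_rainfall])
```

Stacking one column per predictor index and one column per homogeneous group gives the two variable sets that the canonical correlation analysis would relate.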

