Linear combination test for gene set analysis of a continuous phenotype

Gene-set analysis (GSA) aims to identify sets of genes that are differentially expressed by phenotype in DNA microarray studies. Challenges arise from salient characteristics of the data: (1) the number of genes is far larger than the number of observations; (2) gene expression measurements, especially within each gene set, can be highly correlated; and (3) the number of gene sets that can be examined is large and increasing rapidly. These challenges call for gene-set testing procedures that are both computationally efficient for large GSAs and powerful in the presence of high correlation. We propose a new GSA approach called the Linear Combination Test (LCT), which incorporates the covariance matrix estimator of gene expression into the test statistic. The proposed LCT is evaluated in simulation studies and a real microarray study against two other GSA methods: a modification of Hotelling's T² using a shrinkage covariance matrix, and our SAM-GS (Dinu et al. 2007). These are the two methods reported by Tsai and Chen (2009) to perform best in terms of power. The LCT method is more computationally efficient than the modified Hotelling's T² and approximates its superb power. LCT is as computationally efficient as SAM-GS, but more powerful, because it incorporates the covariance matrix estimator. An extra step to enhance the interpretation of GSA results is also proposed in the form of a hierarchical LC (HLC) testing procedure, which gives scientists useful hierarchical information on the gene sets that LCT identified as differentially expressed. Availability: Free R code to perform the LCT-GSA and HLC tests is available at http://www.ualberta.ca/~yyasui/homepage.html.
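To make the idea of folding a covariance estimator into a gene-set statistic concrete, the following is a minimal Python sketch, not the authors' R code. It assumes a binary phenotype coded 0/1, a fixed shrinkage intensity `lam` that pulls the sample covariance toward its diagonal, and a permutation p-value. The function names and the value of `lam` are illustrative assumptions. The statistic d'S⁻¹d equals the largest value of (b'd)² / (b'Sb) over all weight vectors b, which is the sense in which it tests the best linear combination of genes.

```python
import numpy as np

def shrunk_cov(x, lam=0.2):
    """Shrink the sample covariance of x (samples x genes) toward its diagonal.
    lam is an assumed, fixed shrinkage intensity (not a data-driven estimate)."""
    s = np.cov(x, rowvar=False)
    return (1.0 - lam) * s + lam * np.diag(np.diag(s))

def lc_statistic(x, y, cov_inv):
    """Hotelling-type statistic d' S^-1 d, where d is the difference in mean
    expression between the two phenotype groups (y coded 0/1)."""
    d = x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0)
    return float(d @ cov_inv @ d)

def permutation_pvalue(x, y, n_perm=500, lam=0.2, seed=0):
    """Permutation p-value for one gene set."""
    rng = np.random.default_rng(seed)
    # The covariance of the full sample does not depend on the labels,
    # so it is inverted once and reused for every permutation.
    cov_inv = np.linalg.inv(shrunk_cov(x, lam))
    observed = lc_statistic(x, y, cov_inv)
    exceed = sum(
        lc_statistic(x, rng.permutation(y), cov_inv) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

Here `x` would hold the expression values of one gene set (rows are samples, columns are genes) and `y` the group labels. Because the shrunken matrix stays invertible when genes outnumber samples, the sketch also runs in the setting described in point (1) of the abstract.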

Gene set analysis and reduction for a continuous phenotype: Identifying markers of birth weight variation based on embryonic stem cells and immunologic signatures

Computers in Biology and Medicine ◽

10.1016/j.compbiomed.2019.103389 ◽

2019 ◽

Vol 113 ◽

pp. 103389

Author(s):

Shabnam Vatanpour ◽

Saumyadipta Pyne ◽

Ana Paula Leite ◽

Irina Dinu

Keyword(s):

Stem Cells ◽

Birth Weight ◽

Embryonic Stem Cells ◽

Embryonic Stem ◽

Gene Set Analysis ◽

Gene Set ◽

Weight Variation ◽

Continuous Phenotype

A linear combination test for detecting serial correlation in multivariate samples

Institute of Mathematical Statistics Lecture Notes - Monograph Series - Topics in statistical dependence ◽

10.1214/lnms/1215457569 ◽

1990 ◽

pp. 299-313 ◽

Cited By ~ 2

Author(s):

Richard A. Johnson ◽

T. Langeland

Keyword(s):

Linear Combination ◽

Serial Correlation ◽

Combination Test ◽

Linear Combination Test

Longitudinal linear combination test for gene set analysis

BMC Bioinformatics ◽

10.1186/s12859-019-3221-7 ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Elham Khodayari Moez ◽

Morteza Hajihosseini ◽

Jeffrey L. Andrews ◽

Irina Dinu

Keyword(s):

Linear Combination ◽

Pathway Analysis ◽

Time Course ◽

Expression Patterns ◽

Gene Sets ◽

Wide Range ◽

Combination Test ◽

Microarray Studies ◽

Study Designs ◽

Linear Combination Test

Abstract

Background: Although microarray studies have greatly contributed to recent genetic advances, lack of replication has been a continuing concern in this area. Complex study designs have the potential to address this concern, though they remain undervalued by investigators due to the lack of proper analysis methods. The primary challenge in the analysis of complex microarray study data is handling the correlation structure within the data while also dealing with the combination of a large number of genetic measurements and a small number of subjects, which is ubiquitous even in standard microarray studies. Motivated by the lack of available methods for analysis of repeatedly measured phenotypic or transcriptomic data, herein we develop a longitudinal linear combination test (LLCT).

Results: LLCT is a two-step method to analyze multiple longitudinal phenotypes when there is high dimensionality in response and/or explanatory variables. Alternating between calculating within-subjects and between-subjects variations in two steps, LLCT examines whether the maximum possible correlation between a linear combination of the time trends and a linear combination of the predictors given by the gene expressions is statistically significant. A generalization of this method can handle family-based study designs in which the subjects are not independent. The method is also applicable to time-course microarray data, with the ability to identify gene sets that exhibit significantly different expression patterns over time. In a simulation study, LLCT outperformed its alternative, pathway analysis via regression. LLCT was shown to be very powerful in the analysis of large gene sets even when the sample size is small.

Conclusions: This self-contained pathway analysis method is applicable to a wide range of longitudinal genomics, proteomics, and metabolomics (OMICS) data, allows adjusting for potentially time-dependent covariates, and works well with unbalanced and incomplete data. An important potential application of this method could be time-course linkage of OMICS, an attractive possibility for future genetic researchers.

Availability: An R package for LLCT is available at https://github.com/its-likeli-jeff/LLCT
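The two-step idea described in the Results can be illustrated with a rough Python sketch. It is not the LLCT R package. It assumes balanced, complete data, summarizes each subject's time trend by a least-squares slope (step 1), and then computes the largest ridge-regularized canonical correlation between those slopes and the gene expressions, assessed by permuting subjects (step 2). All function names, the `ridge` value, and the use of slopes as the time-trend summary are assumptions made for illustration.

```python
import numpy as np

def subject_slopes(times, values):
    """Step 1 (within subjects): least-squares slope of each phenotype over time.
    times: (n_subjects, n_times); values: (n_subjects, n_times, n_phenotypes)."""
    t = times - times.mean(axis=1, keepdims=True)
    return np.einsum("st,stq->sq", t, values) / (t ** 2).sum(axis=1)[:, None]

def inv_sqrt(m):
    """Inverse symmetric square root of a positive-definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v / np.sqrt(w)) @ v.T

def max_canonical_corr(a, b, ridge=0.1):
    """Step 2 (between subjects): largest correlation between a linear combination
    of the columns of a and a linear combination of the columns of b.
    Columns are standardized; ridge is an assumed penalty that keeps the
    matrices invertible when there are more variables than subjects."""
    a = (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)
    b = (b - b.mean(axis=0)) / b.std(axis=0, ddof=1)
    n = a.shape[0]
    saa = a.T @ a / (n - 1) + ridge * np.eye(a.shape[1])
    sbb = b.T @ b / (n - 1) + ridge * np.eye(b.shape[1])
    sab = a.T @ b / (n - 1)
    k = inv_sqrt(saa) @ sab @ inv_sqrt(sbb)
    return float(np.linalg.svd(k, compute_uv=False)[0])

def llct_style_pvalue(times, values, genes, n_perm=300, ridge=0.1, seed=0):
    """Permutation p-value: shuffle which subject's expression profile is
    paired with which subject's time trends."""
    rng = np.random.default_rng(seed)
    slopes = subject_slopes(times, values)
    observed = max_canonical_corr(slopes, genes, ridge)
    n = genes.shape[0]
    exceed = sum(
        max_canonical_corr(slopes, genes[rng.permutation(n)], ridge) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

The ridge term shrinks the reported correlation slightly below the unregularized value, so the number is best read as a test statistic rather than as an estimate of the correlation itself. Unbalanced or incomplete data, which the abstract says the method handles, would need a different step 1 than the closed-form slope used here.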

A Linear Combination Test for Detecting Serial Correlation in Multivariate Samples

10.21236/ada158179 ◽

1985 ◽

Author(s):

Richard A. Johnson ◽

Thore Langeland

Keyword(s):

Linear Combination ◽

Serial Correlation ◽

Combination Test ◽

Linear Combination Test

Clique-Based Clustering of Correlated SNPs in a Gene Can Improve Performance of Gene-Based Multi-Bin Linear Combination Test

BioMed Research International ◽

10.1155/2015/852341 ◽

2015 ◽

Vol 2015 ◽

pp. 1-11 ◽

Cited By ~ 6

Author(s):

Yun Joo Yoo ◽

Sun Ah Kim ◽

Shelley B. Bull

Keyword(s):

Linear Combination ◽

Degrees Of Freedom ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

SNP Analysis ◽

Nucleotide Polymorphisms ◽

Test Statistic ◽

Global Test ◽

Combination Test ◽

Linear Combination Test

Gene-based analysis of multiple single nucleotide polymorphisms (SNPs) in a gene region is an alternative to single-SNP analysis. The multi-bin linear combination (MLC) test proposed in previous studies uses the correlation among SNPs within a gene to construct a gene-based global test. SNPs are partitioned into clusters of highly correlated SNPs, and the MLC test statistic quadratically combines the linear combination statistics constructed for each cluster. The test has degrees of freedom equal to the number of clusters and can be more powerful than a fully quadratic or fully linear test statistic. In this study, we develop a new SNP clustering algorithm designed to find cliques, which are complete subnetworks of SNPs with all pairwise correlations above a threshold. We evaluate the performance of the MLC test using the clique-based CLQ algorithm versus the tag-SNP-based LDSelect algorithm. In our numerical power calculations, we observed that the two clustering algorithms produce identical clusters about 40 to 60% of the time, yielding similar power on average. However, because the CLQ algorithm tends to produce smaller clusters with stronger positive correlation, the MLC test is less likely to be affected by opposing signs in the individual SNP effect coefficients.
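As an illustration of what clustering SNPs into cliques means, here is a small Python sketch. It is not the authors' CLQ algorithm. It assumes a greedy scheme that repeatedly takes the largest remaining maximal clique in the graph whose edges join SNP pairs with correlation above `threshold`. The function names, the greedy rule, and the tie-breaking are all illustrative assumptions.

```python
def maximal_cliques(adj):
    """Bron-Kerbosch enumeration with pivoting.
    adj maps each node to the set of its neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def clique_clusters(corr, threshold=0.5):
    """Greedily partition SNP indices into cliques of the thresholded
    correlation graph: take the largest maximal clique, remove it, repeat."""
    remaining = set(range(len(corr)))
    clusters = []
    while remaining:
        adj = {
            i: {j for j in remaining if j != i and corr[i][j] > threshold}
            for i in remaining
        }
        # Prefer larger cliques; break ties by the smallest SNP indices.
        best = max(
            maximal_cliques(adj),
            key=lambda c: (len(c), [-i for i in sorted(c)]),
        )
        clusters.append(sorted(best))
        remaining -= best
    return clusters
```

Every pair of SNPs inside a returned cluster has correlation above the threshold, which is the clique property the abstract describes. A SNP that is correlated with only part of a cluster ends up in a different cluster, which is one way smaller but more tightly correlated clusters can arise.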
