Scholarly journals

Longitudinal linear combination test for gene set analysis

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Elham Khodayari Moez ◽  
Morteza Hajihosseini ◽  
Jeffrey L. Andrews ◽  
Irina Dinu

Abstract Background: Although microarray studies have greatly contributed to recent genetic advances, lack of replication has been a continuing concern in this area. Complex study designs have the potential to address this concern, though they remain undervalued by investigators due to the lack of proper analysis methods. The primary challenge in analyzing data from complex microarray studies is handling the correlation structure within the data while also dealing with the combination of a large number of genetic measurements and a small number of subjects, which is ubiquitous even in standard microarray studies. Motivated by the lack of available methods for the analysis of repeatedly measured phenotypic or transcriptomic data, we develop a longitudinal linear combination test (LLCT). Results: LLCT is a two-step method for analyzing multiple longitudinal phenotypes when there is high dimensionality in the response and/or explanatory variables. Alternating between within-subject and between-subject variation across its two steps, LLCT tests whether the maximum possible correlation between a linear combination of the time trends and a linear combination of the predictors given by the gene expressions is statistically significant. A generalization of this method can handle family-based study designs in which the subjects are not independent. The method is also applicable to time-course microarray data, with the ability to identify gene sets that exhibit significantly different expression patterns over time. In a simulation study, LLCT outperformed its alternative, pathway analysis via regression, and was very powerful in the analysis of large gene sets even when the sample size was small. Conclusions: This self-contained pathway analysis method is applicable to a wide range of longitudinal genomics, proteomics, and metabolomics (OMICS) data, allows adjustment for potentially time-dependent covariates, and works well with unbalanced and incomplete data.
An important potential application of this method could be time-course linkage of OMICS, an attractive possibility for future genetic research. Availability: An R package implementing LLCT is available at https://github.com/its-likeli-jeff/LLCT
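To make the two-step idea concrete, here is a minimal Python sketch under stated assumptions. Step 1 summarizes each subject's phenotype trajectory by an ordinary least-squares slope on time (a hypothetical choice of "time trend"), and step 2 computes the largest canonical correlation between the subject-level slopes and the gene expressions, with a permutation p-value. The ridge term added to the covariance matrices and all function names (`subject_slopes`, `max_correlation`, `permutation_test`) are illustrative assumptions, not the LLCT package's implementation; covariate adjustment and family-based designs are not covered.

```python
import numpy as np


def subject_slopes(times, values):
    """Step 1 (within-subject): OLS slope of one phenotype on time, per subject.

    times, values: sequences of 1-D arrays, one pair per subject; the number
    of visits may differ between subjects (unbalanced data).
    """
    slopes = []
    for t, y in zip(times, values):
        t = np.asarray(t, dtype=float)
        y = np.asarray(y, dtype=float)
        design = np.column_stack([np.ones_like(t), t])  # intercept + linear trend
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        slopes.append(beta[1])
    return np.array(slopes)


def max_correlation(Y, X, ridge=0.1):
    """Step 2 (between-subject): largest canonical correlation between the
    columns of Y (n x q time trends) and X (n x p gene expressions).

    The ridge term (an assumption) keeps both covariance matrices invertible.
    """
    Y = Y - Y.mean(axis=0)
    X = X - X.mean(axis=0)
    n = X.shape[0]
    Syy = Y.T @ Y / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxx = X.T @ X / (n - 1) + ridge * np.eye(X.shape[1])
    Syx = Y.T @ X / (n - 1)
    # Eigenvalues of Syy^-1 Syx Sxx^-1 Sxy are the squared canonical correlations.
    M = np.linalg.solve(Syy, Syx) @ np.linalg.solve(Sxx, Syx.T)
    top = np.linalg.eigvals(M).real.max()
    return float(np.sqrt(max(top, 0.0)))


def permutation_test(Y, X, n_perm=199, seed=0):
    """Permutation p-value: shuffle subjects' expression rows relative to Y."""
    rng = np.random.default_rng(seed)
    observed = max_correlation(Y, X)
    null = [max_correlation(Y, X[rng.permutation(X.shape[0])]) for _ in range(n_perm)]
    p_value = (1 + sum(r >= observed for r in null)) / (n_perm + 1)
    return observed, p_value


# Toy data: 40 subjects, a 10-gene set, 3 to 5 visits per subject, two phenotypes.
rng = np.random.default_rng(1)
n, p = 40, 10
genes = rng.normal(size=(n, p))
times = [np.arange(rng.integers(3, 6)) for _ in range(n)]
trend_a = 2.0 * genes[:, 0]   # phenotype A's slope depends on gene 0
trend_b = -1.5 * genes[:, 1]  # phenotype B's slope depends on gene 1
pheno_a = [s * t + rng.normal(scale=0.5, size=t.size) for s, t in zip(trend_a, times)]
pheno_b = [s * t + rng.normal(scale=0.5, size=t.size) for s, t in zip(trend_b, times)]
Y = np.column_stack([subject_slopes(times, pheno_a), subject_slopes(times, pheno_b)])
r, p_val = permutation_test(Y, genes)
print(f"max correlation = {r:.3f}, permutation p = {p_val:.3f}")
```

Permuting whole rows of the expression matrix keeps the within-gene-set correlation intact under the null, which is why a permutation test is a natural (assumed) choice here.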

Author(s):  
Xiaoming Wang ◽  
Irina Dinu ◽  
Wei Liu ◽  
Yutaka Yasui

Gene-set analysis (GSA) aims to identify sets of genes that are differentially expressed with respect to a phenotype in DNA microarray studies. Challenges arise from the salient characteristics of the data: (1) the number of genes is far larger than the number of observations; (2) gene expression measurements, especially within each gene set, can be highly correlated; and (3) the number of gene sets that can be examined is large and increasing rapidly. These challenges call for gene-set testing procedures that are both computationally efficient for large GSAs and highly powerful in the presence of high correlation. We propose a new GSA approach called the Linear Combination Test (LCT), which incorporates the covariance matrix estimator of gene expression into the test statistic. The proposed LCT and two other GSA methods, a modification of Hotelling's T² using a shrinkage covariance matrix and our SAM-GS (Dinu et al. 2007), the two methods reported by Tsai and Chen (2009) to perform best in terms of power, are evaluated in simulation studies and a real microarray study. The LCT method is more computationally efficient than the modified Hotelling's T² and approximates its superb power. LCT is slightly faster than SAM-GS, and more powerful because it incorporates the covariance matrix estimator. An extra step to enhance the interpretation of GSA results is also proposed in the form of a hierarchical LC (HLC) testing procedure, providing scientists with useful hierarchical information on the gene sets that LCT identified as differentially expressed. Availability: Free R code to perform LCT-GSA and the HLC test is available at http://www.ualberta.ca/~yyasui/homepage.html.
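The role of a covariance matrix estimator in a gene-set statistic can be illustrated with the comparator named above, a two-sample Hotelling's T² whose pooled covariance is shrunk toward its diagonal, evaluated with a label-permutation p-value. This is a hypothetical sketch, not LCT itself: the fixed shrinkage intensity `lam=0.5`, the diagonal shrinkage target, and the function names are assumptions for illustration.

```python
import numpy as np


def shrunk_pooled_cov(X1, X2, lam=0.5):
    """Pooled within-group covariance shrunk toward its diagonal.

    lam is a fixed, illustrative shrinkage intensity (an assumption);
    data-driven shrinkage estimators choose it from the data.
    """
    n1, n2 = len(X1), len(X2)
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    return (1 - lam) * S + lam * np.diag(np.diag(S))


def hotelling_t2(X1, X2, lam=0.5):
    """Two-sample Hotelling's T^2 with the shrunk covariance (usable when genes > samples)."""
    n1, n2 = len(X1), len(X2)
    d = X1.mean(axis=0) - X2.mean(axis=0)
    S = shrunk_pooled_cov(X1, X2, lam)
    return float(n1 * n2 / (n1 + n2) * d @ np.linalg.solve(S, d))


def gene_set_test(X, labels, lam=0.5, n_perm=199, seed=0):
    """Permutation p-value for one gene set: X is samples x genes, labels is a boolean phenotype."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    observed = hotelling_t2(X[labels], X[~labels], lam)
    null = []
    for _ in range(n_perm):
        perm = rng.permutation(labels)  # shuffle phenotype labels across samples
        null.append(hotelling_t2(X[perm], X[~perm], lam))
    p_value = (1 + sum(t >= observed for t in null)) / (n_perm + 1)
    return observed, p_value


# Toy gene set: 40 correlated genes, 10 + 10 samples, first 10 genes shifted in one group.
rng = np.random.default_rng(7)
n_per_group, n_genes = 10, 40
shared = rng.normal(size=(2 * n_per_group, 1))  # common factor induces correlation
X = np.sqrt(0.5) * shared + np.sqrt(0.5) * rng.normal(size=(2 * n_per_group, n_genes))
labels = np.array([True] * n_per_group + [False] * n_per_group)
X[labels, :10] += 1.5
t2, p_val = gene_set_test(X, labels)
print(f"T^2 = {t2:.1f}, permutation p = {p_val:.3f}")
```

With 40 genes and only 20 samples the raw pooled covariance is singular; adding any positive weight on its (positive) diagonal makes it invertible, which is the practical reason shrinkage is needed here.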


mBio ◽  
2020 ◽  
Vol 11 (4) ◽  
Author(s):  
José Luis López ◽  
Mauricio Javier Lozano ◽  
María Laura Fabre ◽  
Antonio Lagares

ABSTRACT Prokaryote genomes exhibit a wide range of GC contents and codon usages, both resulting from an interaction between mutational bias and natural selection. To investigate the basis underlying specific codon changes, we performed a comprehensive analysis of 29 different prokaryote families. The analysis of core gene sets with increasing ancestries in each family lineage revealed that codon usages became progressively more adapted to the tRNA pools. While, as previously reported, highly expressed genes presented the most optimized codon usage, the singletons contained the least selectively favored codons. The results showed that codons with the highest translational adaptation were usually preferentially enriched. In agreement with previous reports, a C bias in 2- to 3-fold pyrimidine-ending codons and a U bias in 4-fold codons occurred in all families, irrespective of the global genomic GC content. Furthermore, the U biases suggested that U3-mRNA–U34-tRNA interactions were responsible for a prominent codon optimization in both the most ancestral core and the highly expressed genes. A comparative analysis of sequences that encode conserved (cr) or variable (vr) translated products, each under high (HEP) or low (LEP) expression levels, demonstrated that efficiency was more relevant (by a factor of 2) than accuracy in modeling codon usage. Finally, analysis of the third position of codons (GC3) revealed that in genomes with global GC contents higher than 35 to 40%, selection favored a GC3 increase, whereas in genomes with very low GC contents, GC3 decreased. A comprehensive final model is presented in which all patterns of codon usage variation are condensed into four distinct behavioral groups. IMPORTANCE The prokaryotic genomes, the current heritage of the most ancient life forms on earth, comprise diverse gene sets, all characterized by varied origins, ancestries, and spatial-temporal expression patterns.
Such genetic diversity has long raised the question of how cells shape their coding strategies to optimize protein demands (i.e., product abundance) and accuracy (i.e., translation fidelity) using the same genetic code in genomes with GC contents ranging from less than 20% to more than 80%. Here, we present evidence on how codon usage is adjusted across the prokaryotic tree of life and on how specific biases have operated to improve translation. Using proteome data, we characterized conserved and variable sequence domains in genes with either high or low expression levels and quantitated the relative weight of efficiency and accuracy, as well as their interaction, in shaping codon usage in prokaryotes.
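GC3, the quantity compared against global GC content above, is the fraction of codons whose third position is G or C. A minimal Python sketch of both measures follows; the function names are hypothetical, and the plain definition used here counts every complete codon. Some codon-usage studies instead use variants such as GC3s, which count only synonymous third positions (excluding, e.g., Met, Trp, and stop codons).

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a nucleotide sequence."""
    s = seq.upper()
    return sum(base in "GC" for base in s) / len(s)


def gc3(cds: str) -> float:
    """Fraction of complete codons whose third base is G or C.

    Simplification (assumption): every complete codon is counted, including
    start and stop codons; a trailing partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")  # also accept mRNA-style input
    usable = len(seq) - len(seq) % 3
    third_bases = [seq[i + 2] for i in range(0, usable, 3)]
    if not third_bases:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


cds = "ATGGCCTTA"  # codons ATG, GCC, TTA
print(gc_content(cds), gc3(cds))
```

For this toy sequence, `gc3` returns 2/3 because two of the three codons (ATG, GCC) end in G or C, while the overall GC content is 4/9.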


2009 ◽  
Vol 07 (04) ◽  
pp. 645-661 ◽  
Author(s):  
XIN CHEN

There is increasing interest in clustering time-course gene expression data to investigate a wide range of biological processes. However, developing a clustering algorithm suited to time-course gene expression data remains challenging. Because timing is an important factor in defining true clusters, a clustering algorithm should take expression correlations between time points into account to achieve high clustering accuracy. Moreover, inter-cluster gene relationships are often desired to facilitate the computational inference of biological pathways and regulatory networks. In this paper, a new clustering algorithm called CurveSOM is developed to offer both features. It first represents each gene by a cubic smoothing spline fitted to its time-course expression profile, and then groups genes into clusters by applying self-organizing map (SOM)-based clustering to the resulting splines. CurveSOM has been tested on three well-studied yeast cell cycle datasets and compared with four popular programs: Cluster 3.0, GENECLUSTER, MCLUST, and SSClust. The results show that CurveSOM is a very promising tool for the exploratory analysis of time-course expression data, as it not only groups genes into clusters with high accuracy but also finds true time-shifted correlations of expression patterns across clusters.
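The two-stage pipeline described above (cubic smoothing splines, then SOM-based clustering of the fitted curves) can be sketched in Python with NumPy and SciPy's `UnivariateSpline`. This is a hypothetical illustration, not CurveSOM itself: the z-scoring of curves, the PCA initialization of a one-dimensional map, the learning-rate and neighborhood schedules, and all function names are assumptions, and the sketch does not attempt CurveSOM's detection of time-shifted correlations across clusters.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def smooth_profiles(times, expr, smoothing, n_grid=50):
    """Fit a cubic smoothing spline (k=3) to each gene's profile, evaluate it on a
    common grid, and z-score the curve so clustering compares shapes (an assumption)."""
    grid = np.linspace(times[0], times[-1], n_grid)
    curves = []
    for y in expr:
        fitted = UnivariateSpline(times, y, k=3, s=smoothing)(grid)
        curves.append((fitted - fitted.mean()) / fitted.std())
    return grid, np.array(curves)


def pca_init(data, n_nodes):
    """Place the map nodes along the first principal component of the curves."""
    mean = data.mean(axis=0)
    _, sv, vt = np.linalg.svd(data - mean, full_matrices=False)
    scale = sv[0] / np.sqrt(len(data))  # standard deviation of PC1 scores
    return mean + np.linspace(-1.0, 1.0, n_nodes)[:, None] * scale * vt[0]


def train_som(data, n_nodes, n_iter=2000, lr0=0.5, sigma0=0.5, seed=0):
    """Online one-dimensional self-organizing map with a Gaussian neighborhood."""
    rng = np.random.default_rng(seed)
    weights = pca_init(data, n_nodes)
    positions = np.arange(n_nodes)
    for it in range(n_iter):
        frac = it / n_iter
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = 0.1 + sigma0 * (1.0 - frac)  # shrinking neighborhood width
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights


def assign(data, weights):
    """Cluster label = index of the nearest trained node."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Toy data: 12 time points, 10 "up-then-down" and 10 "down-then-up" genes.
rng = np.random.default_rng(42)
times = np.linspace(0.0, 2.0 * np.pi, 12)
up_down = [np.sin(times) + rng.normal(scale=0.3, size=times.size) for _ in range(10)]
down_up = [-np.sin(times) + rng.normal(scale=0.3, size=times.size) for _ in range(10)]
grid, curves = smooth_profiles(times, up_down + down_up, smoothing=times.size * 0.09)
weights = train_som(curves, n_nodes=2)
labels = assign(curves, weights)
print(labels)
```

The smoothing target `s` is set to roughly (number of time points) × (noise variance), here 12 × 0.3², which is a common rule of thumb for `UnivariateSpline`; the 0.3 noise level is a property of the toy data, not a recommendation.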


2021 ◽  
Author(s):  
Ingrid Jakobsen ◽  
Max Sundkvist ◽  
Niclas Björn ◽  
Henrik Gréen ◽  
Kourosh Lotfi

Abstract Background: Elucidation of the genetic mechanisms underlying treatment response to standard induction chemotherapy in acute myeloid leukemia (AML) patients is warranted to aid risk-adapted treatment decisions as novel treatments emerge. In this pilot study, we explored the treatment-induced expression patterns in a small cohort of AML patients by analyzing differential gene expression (DGE) over the first two days of induction chemotherapy. Methods: Blood samples were collected from ten AML patients at baseline (before treatment initiation) and during the first two days of treatment (Day 1, approximately 24 h, and Day 2, approximately 48 h after treatment initiation), and RNA was extracted for subsequent RNA sequencing. DGE between time points was assessed by pairwise analysis using the R package edgeR version 3.18.1 in all patients, as well as in relation to treatment response (complete remission, CR, vs non-complete remission, nCR). Ingenuity Pathway Analysis (Qiagen) software was used for pathway analysis and visualization. Results: After initial data quality control, two patients were excluded from further analysis, resulting in a final cohort of eight patients with data from all three time points. DGE analysis demonstrated activation of pathways with genes directly or indirectly associated with NF-κB signaling. Significant activation of the NF-κB pathway was seen in 50% of the patients two days after treatment start, while iNOS pathway effects could be identified as early as Day 1. nCR patients displayed activation of pathways associated with cell cycle progression, oncogenesis, and anti-apoptotic behavior, including the STAT3 pathway and salvage pathways of pyrimidine ribonucleotides.
Notably, a significant induction of cytidine deaminase, an enzyme responsible for the deamination of Ara-C, was observed between baseline and Day 2 in the nCR patients but not in patients achieving CR. Conclusions: We show that time-course analysis of gene expression is a feasible approach to identify relevant pathways affected by standard induction chemotherapy in AML patients. This represents a potential method for elucidating new drug targets and biomarkers for categorizing disease aggressiveness and evaluating treatment response. However, studies of larger cohorts are warranted to elucidate the transcriptional basis for drug response.


2013 ◽  
Vol 14 (1) ◽  
Author(s):  
Irina Dinu ◽  
Xiaoming Wang ◽  
Linda E Kelemen ◽  
Shabnam Vatanpour ◽  
Saumyadipta Pyne

2020 ◽  
Author(s):  
José Luis López ◽  
Mauricio Javier Lozano ◽  
María Laura Fabre ◽  
Antonio Lagares

ABSTRACT Prokaryote genomes exhibit a wide range of GC contents and codon usages, both resulting from an interaction between mutational bias and natural selection. To investigate the basis underlying specific codon changes, we performed a comprehensive analysis of 29 different prokaryote families. The analysis of core gene sets with increasing ancestries in each family lineage revealed that codon usages became progressively more adapted to the tRNA pools. While, as previously reported, highly expressed genes presented the most optimized codon usage, the singletons contained the least selectively favored codons. The results showed that codons with the highest translational adaptation were usually preferentially enriched. In agreement with previous reports, a C bias in 2- to 3-fold codons and a U bias in 4-fold codons occurred in all families, irrespective of the global genomic GC content. Furthermore, the U biases suggested that U3-mRNA–U34-tRNA interactions were responsible for a prominent codon optimization in both the most ancestral core and the highly expressed genes. A comparative analysis of sequences that encode conserved (cr) or variable (vr) translated products, each under high (HEP) or low (LEP) expression levels, demonstrated that efficiency was more relevant (by a factor of 2) than accuracy in modelling codon usage. Finally, analysis of the third position of codons (GC3) revealed that in genomes with global GC contents higher than 35-40%, selection favored a GC3 increase, whereas in genomes with very low GC contents, GC3 decreased. A comprehensive final model is presented in which all patterns of codon usage variation are condensed into five distinct behavioral groups. IMPORTANCE The prokaryotic genomes, the current heritage of the most ancient life forms on earth, comprise diverse gene sets, all characterized by varied origins, ancestries, and spatial-temporal expression patterns.
Such genetic diversity has long raised the question of how cells shape their coding strategies to optimize protein demands (i.e., product abundance) and accuracy (i.e., translation fidelity) using the same genetic code in genomes with GC contents ranging from less than 20% to over 80%. In this work, we present evidence on how codon usage is adjusted across the prokaryote tree of life and on how specific biases have operated to improve translation. Using proteome data, we characterized conserved and variable sequence domains in genes with either high or low expression levels, and quantitated the relative weight of efficiency and accuracy, as well as their interaction, in shaping codon usage in prokaryotes.

