The gene expression data of Mycobacterium tuberculosis based on Affymetrix gene chips provide insight into regulatory and hypothetical genes

2007, Vol 7 (1), pp. 37
Author(s): Li M Fu, Casey S Fu-Liu

2020
Author(s): Cynthia Ma, Michael R. Brent

ABSTRACT

Background: The activity of a transcription factor (TF) in a sample of cells is the extent to which it is exerting its regulatory potential. Many methods of inferring TF activity from gene expression data have been described, but due to the lack of appropriate large-scale datasets, systematic and objective validation has not been possible until now.

Results: Using a new dataset, we systematically evaluate and optimize the approach to TF activity inference in which a gene expression matrix is factored into a condition-independent matrix of control strengths and a condition-dependent matrix of TF activity levels. These approaches require a TF network map, which specifies the target genes of each TF, as input. We evaluate different approaches to building the network map and deriving constraints on the matrices. We find that such constraints are essential for good performance. Constraints can be obtained from expression data in which the activities of individual TFs have been perturbed, and we find that such data are both necessary and sufficient for obtaining good performance. Remaining uncertainty about whether a TF activates or represses a target is a major source of error. To a considerable extent, control strengths inferred using expression data from one growth condition carry over to other conditions. As a result, the control strength matrices derived here can be used for other applications. Finally, we apply these methods to gain insight into the upstream factors that regulate the activities of four yeast TFs: Gcr2, Gln3, Gcn4, and Msn2. Evaluation code and data available at https://github.com/BrentLab/TFA-evaluation

Conclusions: When a high-quality network map, constraints, and perturbation-response data are available, inferring TF activity levels by factoring gene expression matrices is effective. Furthermore, it provides insight into regulators of TF activity.
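The factorization described in this abstract can be illustrated with a minimal sketch. It assumes an expression matrix E (genes x samples) is approximated by the product of a control-strength matrix C (genes x TFs) and an activity matrix A (TFs x samples), with C forced to zero wherever the network map says a gene is not a target of a TF. The function names, the toy data, and the plain gradient-descent fit are illustrative assumptions, not the authors' implementation; sign constraints derived from perturbation data are not modelled here.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def squared_error(E, C, A):
    """Sum of squared residuals between E and the product C * A."""
    R = matmul(C, A)
    return sum((e - r) ** 2 for e_row, r_row in zip(E, R) for e, r in zip(e_row, r_row))

def factor_expression(E, mask, n_steps=2000, lr=0.01, seed=0):
    """Fit E ~ C * A by gradient descent (hypothetical helper).

    mask[g][t] is 1 if gene g is a target of TF t in the network map, else 0.
    Entries of C outside the network map are held at zero.
    """
    rng = random.Random(seed)
    n_genes, n_samples, n_tfs = len(E), len(E[0]), len(mask[0])
    C = [[rng.uniform(0.1, 1.0) * mask[g][t] for t in range(n_tfs)] for g in range(n_genes)]
    A = [[rng.uniform(0.1, 1.0) for _ in range(n_samples)] for _ in range(n_tfs)]
    for _ in range(n_steps):
        R = matmul(C, A)
        resid = [[e - r for e, r in zip(e_row, r_row)] for e_row, r_row in zip(E, R)]
        step_C = matmul(resid, transpose(A))  # descent direction for C (genes x TFs)
        step_A = matmul(transpose(C), resid)  # descent direction for A (TFs x samples)
        # Update C, then re-apply the network-map mask so non-targets stay zero.
        C = [[(c + lr * s) * m for c, s, m in zip(c_row, s_row, m_row)]
             for c_row, s_row, m_row in zip(C, step_C, mask)]
        A = [[a + lr * s for a, s in zip(a_row, s_row)] for a_row, s_row in zip(A, step_A)]
    return C, A

# Toy example (made-up numbers): 5 genes, 2 TFs, 4 samples.
mask = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]]
C_true = [[1.0, 0.0], [0.5, 0.0], [0.8, -0.6], [0.0, 1.2], [0.0, -0.9]]
A_true = [[1.0, 0.2, 0.5, 1.5], [0.3, 1.1, 0.8, 0.1]]
E = matmul(C_true, A_true)

C_fit, A_fit = factor_expression(E, mask)
print("reconstruction error:", squared_error(E, C_fit, A_fit))
```

Even when the reconstruction error is small, C and A are determined only up to a rescaling of each TF's column of C and row of A, which is one reason a sketch like this cannot stand in for the constrained methods the abstract evaluates.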


2021, Vol 23 (Supplement_6), pp. vi148-vi148
Author(s): Sonali Arora, Nicholas Nuechterlein, Siobhan Pattwell, Eric Holland

Abstract

Whole transcriptome sequencing (RNA-seq) is an important tool for understanding the genetic mechanisms underlying complex human diseases. Several ground-breaking projects have uniformly processed RNA-seq data from publicly available studies to enable cross-comparison. One noteworthy example is the recount2 pipeline, which in 2017 reprocessed ~70,000 samples from the Short Read Archive (SRA), The Cancer Genome Atlas (TCGA), and Genotype-Tissue Expression (GTEx). This vast dataset also includes gene expression data for GTEx-defined brain regions, neurological and psychiatric disorders (such as Parkinson's, Alzheimer’s, Huntington’s) and gliomas (such as TCGA, Chinese Glioma Genome Atlas (CGGA)). We apply uniform manifold approximation and projection (UMAP), a non-linear dimension reduction tool, to bulk gene expression data from brain-related diseases to build a BRAIN-UMAP, which allows gene expression profiles to be visualized across datasets. This UMAP shows that while gliomas form a distinct cluster, the neurological and psychiatric diseases are similar to GTEx-defined normal brain regions, which exhibit tissue-specific profiles and patterns. Incorporating gliomas from various publicly available datasets also reveals unique clustering of particular subtypes, which can increase our genetic understanding of the disease. We also present a resource where researchers interested in mechanisms can easily compare and contrast the expression of a given gene and/or pathway of interest across various diseases, gliomas, and normal brain regions. Our current study, focusing on brain-related diseases, offers insight into what may be possible for the broader neuroscientific community if we continually reprocess newly available brain-related RNA-seq samples using recount2. Additionally, if we build similar uniform processing pipelines for other kinds of next-generation sequencing data, we would be able to use multi-omic sequencing data to find novel associations between biological entities and increase our mechanistic knowledge of the disease.

