Building Gene Networks by Analyzing Gene Expression Profiles

Author(s):  
Crescenzio Gallo

The possible applications of modeling and simulation in bioinformatics are very extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that show similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing three important topics: (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expression.
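As an illustration of step (1), here is a minimal sketch, assuming NumPy, that ranks genes by the absolute Welch t-statistic between two groups of samples. The function name `top_differential_genes` and the choice of statistic are assumptions for illustration, not the chapter's method; real analyses usually add moderated variance estimates and multiple-testing correction.

```python
import numpy as np


def top_differential_genes(expr, groups, k):
    """Rank genes by |Welch t| between two sample groups (illustrative sketch).

    expr:   genes x samples array of log-scale expression values
    groups: boolean array over samples, True marks group 1 (hypothetical labels)
    k:      number of top-ranked genes to return
    """
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups, dtype=bool)
    a, b = expr[:, groups], expr[:, ~groups]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    # Welch standard error: the two groups may have unequal variances
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    # Guard against division by zero when both groups have zero variance
    t = mean_diff / np.maximum(se, 1e-12)
    order = np.argsort(-np.abs(t))
    return order[:k], t


expr = np.array([
    [1.0, 1.1, 0.9, 5.0, 5.1, 4.9],  # gene 0: shifted between groups
    [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],  # gene 1: same values in both groups
    [0.0, 0.1, 0.0, 0.0, 0.1, 0.0],  # gene 2: same values in both groups
])
groups = np.array([True, True, True, False, False, False])
top, t_stats = top_differential_genes(expr, groups, k=1)
print(top)  # gene 0 ranks first
```

The top-ranked genes would then feed steps (2) and (3), grouping and classification.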


2008 ◽  
Vol 5 (2) ◽  
Author(s):  
Li Teng ◽  
Laiwan Chan

Traditional analysis of gene expression profiles uses clustering to find groups of coexpressed genes with similar expression patterns. However, clustering is time-consuming and can be difficult for very large-scale datasets. We propose the idea of Discovering Distinct Patterns (DDP) in gene expression profiles. Because the patterns shown by gene expression reveal the underlying regulatory mechanisms, it is important to find all the distinct patterns in a dataset when there is little prior knowledge; doing so is also a helpful starting point for further analysis. We propose an algorithm for DDP that iteratively picks out pairs of gene expression patterns with the largest dissimilarities. The method can also be used as a preprocessing step to initialize the centers of clustering methods such as K-means. Experiments on both synthetic and real gene expression datasets show that our method is efficient and very effective at finding distinct patterns of functional significance.
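The abstract does not spell out the selection rule beyond picking maximally dissimilar pairs, so the following is only a sketch under stated assumptions: dissimilarity is taken as one minus the Pearson correlation, and after the most dissimilar pair is chosen, further patterns are added farthest-first (each new pick maximizes its dissimilarity to the closest pattern already chosen). This is a common seeding heuristic, not necessarily the authors' DDP algorithm, and `distinct_patterns` is a hypothetical name.

```python
import numpy as np


def distinct_patterns(x, k):
    """Pick k mutually dissimilar expression profiles (farthest-first sketch).

    x: genes x conditions array; rows are assumed non-constant so that
       the Pearson correlation is defined.
    Returns the row indices of the selected profiles.
    """
    # Dissimilarity = 1 - Pearson correlation between profiles (range 0..2)
    d = 1.0 - np.corrcoef(np.asarray(x, dtype=float))
    # Start from the single most dissimilar pair of profiles
    i, j = np.unravel_index(np.argmax(d), d.shape)
    chosen = [int(i), int(j)]
    while len(chosen) < k:
        # Dissimilarity of every profile to its closest already-chosen profile
        nearest = d[:, chosen].min(axis=1)
        nearest[chosen] = -np.inf  # never re-pick a chosen profile
        chosen.append(int(np.argmax(nearest)))
    return chosen[:k]


x = np.array([
    [1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 3.1, 4.0],  # rising profiles
    [4.0, 3.0, 2.0, 1.0], [4.0, 3.1, 2.0, 1.2],  # falling profiles
    [1.0, 4.0, 4.0, 1.0], [1.2, 4.0, 3.9, 1.0],  # peaked profiles
])
seeds = distinct_patterns(x, k=3)
initial_centers = x[seeds]  # one representative per pattern, usable as K-means seeds
```

Seeding K-means this way tends to spread the initial centers across different pattern shapes instead of placing several centers in one dense group.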


2015 ◽  
Vol 11 (1) ◽  
pp. 86-96 ◽  
Author(s):  
Aakash Chavan Ravindranath ◽  
Nolen Perualila-Tan ◽  
Adetayo Kasim ◽  
Georgios Drakakis ◽  
Sonia Liggi ◽  
...  

Integrating gene expression profiles with certain proteins can improve our understanding of the fundamental mechanisms in protein–ligand binding.


Cells ◽  
2019 ◽  
Vol 8 (7) ◽  
pp. 675 ◽  
Author(s):  
Xia ◽  
Liu ◽  
Zhang ◽  
Guo

High-throughput technologies generate a tremendous amount of expression data at the mRNA, miRNA and protein levels. Mining and visualizing such large amounts of expression data requires sophisticated computational skills. An easy-to-use web server for visualizing gene expression profiles could greatly facilitate data exploration and hypothesis generation for biologists. Here, we curated and normalized gene expression data at the mRNA, miRNA and protein levels in 23,315, 9,009 and 9,244 samples, respectively, from 40 tissues (The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx)) and 1,594 cell lines (Cancer Cell Line Encyclopedia (CCLE) and MD Anderson Cell Lines Project (MCLP)). We then constructed the Gene Expression Display Server (GEDS), a web-based tool for the quantification, comparison and visualization of gene expression data. GEDS integrates multiscale expression data and provides multiple types of figures and tables to satisfy a range of user requirements. The comprehensive expression profiles plotted in the one-stop GEDS platform greatly help experimental biologists use big data for better experimental design and analysis. GEDS is freely available at http://bioinfo.life.hust.edu.cn/web/GEDS/.


2005 ◽  
Vol 14 (05) ◽  
pp. 771-789 ◽  
Author(s):  
Jiong Yang ◽  
Haixun Wang ◽  
Wei Wang ◽  
Philip S. Yu

Microarrays are one of the latest breakthroughs in experimental molecular biology: they provide a powerful tool for monitoring the expression patterns of thousands of genes simultaneously and are already producing huge amounts of valuable data. The concept of a bicluster was introduced by Cheng and Church [1] to capture the coherence of a subset of genes and a subset of conditions. They also designed a set of heuristic algorithms to find either one bicluster or a set of biclusters; these heuristics consist of iterations of masking null values and discovered biclusters, coarse and fine node deletion, node addition, and the inclusion of inverted data. These heuristics inevitably suffer from some serious drawbacks. Masking null values and discovered biclusters with random numbers may cause random interference, which in turn impairs the discovery of high-quality biclusters. To address this issue and to further accelerate the biclustering process, we generalize the bicluster model to incorporate null values and propose a probabilistic algorithm (FLOC) that can discover a set of k possibly overlapping biclusters simultaneously. Furthermore, this algorithm can easily be extended to support additional features that suit different requirements at little additional cost. An experimental study on the yeast gene expression data [2] shows that FLOC can offer substantial improvements over the previously proposed algorithm.
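The coherence that Cheng and Church's biclusters capture is usually scored by the mean squared residue, H(I, J) = (1 / (|I||J|)) Σ (a_ij − a_iJ − a_Ij + a_IJ)², where a_iJ, a_Ij and a_IJ are the row, column and overall means of the submatrix. Below is a minimal NumPy sketch of that score. The assumptions are as follows. Null values are encoded as NaN and skipped in every mean, which echoes the idea of incorporating null values but is not FLOC's exact definition. The function name is hypothetical.

```python
import numpy as np


def mean_squared_residue(a, rows, cols):
    """Mean squared residue of the bicluster (rows, cols), illustrative sketch.

    Null values are encoded as NaN and skipped in every mean (an assumption,
    not FLOC's exact formulation). Each selected row and column needs at
    least one non-NaN entry.
    """
    sub = np.asarray(a, dtype=float)[np.ix_(rows, cols)]
    row_mean = np.nanmean(sub, axis=1, keepdims=True)  # a_iJ
    col_mean = np.nanmean(sub, axis=0, keepdims=True)  # a_Ij
    all_mean = np.nanmean(sub)                         # a_IJ
    residue = sub - row_mean - col_mean + all_mean
    return float(np.nanmean(residue ** 2))


# A perfectly additive submatrix a_ij = r_i + c_j is fully coherent
additive = np.add.outer([0.0, 1.0, 2.0], [0.0, 10.0, 20.0])
noisy = additive.copy()
noisy[0, 0] += 5.0        # breaks the additive pattern
with_null = additive.copy()
with_null[1, 2] = np.nan  # a missing measurement

print(mean_squared_residue(additive, [0, 1, 2], [0, 1, 2]))  # ~0
print(mean_squared_residue(noisy, [0, 1, 2], [0, 1, 2]))     # > 0
```

Lower residue means a more coherent bicluster; algorithms such as FLOC search for row and column sets that keep this score small.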


Blood ◽  
2013 ◽  
Vol 122 (21) ◽  
pp. 2779-2779 ◽  
Author(s):  
Andrea Pellagatti ◽  
Moritz Gerstung ◽  
Elli Papaemmanuil ◽  
Luca Malcovati ◽  
Aristoteles Giagounidis ◽  
...  

A particular gene expression profile can reflect an underlying molecular abnormality in malignancy. Distinct gene expression profiles and deregulated gene pathways can be driven by specific gene mutations, and may shed light on the biology of the disease and lead to the identification of new therapeutic targets. We selected 143 cases from our large-scale gene expression profiling (GEP) dataset on bone marrow CD34+ cells from patients with myelodysplastic syndromes (MDS), for which matching genotyping data were obtained by next-generation sequencing of a comprehensive list of 111 genes involved in myeloid malignancies (including the spliceosomal genes SF3B1, SRSF2, U2AF1 and ZRSR2, as well as TET2, ASXL1 and many others). The GEP data were then correlated with mutational status to identify significantly differentially expressed genes associated with each of the most common gene mutations found in MDS. The expression levels of the mutated genes analyzed were generally lower in patients carrying a mutation than in patients wild-type for that gene (e.g. SF3B1, ASXL1 and TP53), with the exception of RUNX1, for which patients carrying a mutation showed higher expression levels than patients without the mutation. Principal component analysis showed that the main directions of gene expression change (the principal components) tend to coincide with some of the common gene mutations, including SF3B1, SRSF2 and TP53. SF3B1 and STAG2 were the mutated genes with the highest number of associated significantly differentially expressed genes, including ABCB7, differentially expressed in association with SF3B1 mutation, and SULT2A1, in association with STAG2 mutation.
We found distinct differentially expressed genes associated with the four most common splicing gene mutations (SF3B1, SRSF2, U2AF1 and ZRSR2) in MDS, suggesting that the different phenotypes associated with these mutations may be driven by different effects on gene expression and that their target genes may differ. We also evaluated the prognostic impact of the GEP data in comparison with that of the genotype data and, importantly, found a larger contribution of gene expression data in predicting progression-free survival compared with mutation-based multivariate survival models. In summary, this analysis correlating gene expression data with genotype data has revealed that mutational status shapes the gene expression landscape. We have identified deregulated genes associated with the most common gene mutations in MDS and found that the prognostic power of gene expression data is greater than that provided by mutation data. AP and MG contributed equally to this work. JB and PJC are co-senior authors. Disclosures: No relevant conflicts of interest to declare.
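The abstract relates the principal components of the expression data to mutation status. A minimal NumPy sketch of that idea on synthetic data: compute principal component scores by SVD of the gene-centered matrix, then correlate each component with a binary mutation indicator (a point-biserial correlation). The function names and the synthetic data are hypothetical; the study's actual pipeline is not described in the abstract.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Principal component scores for a samples x genes matrix via SVD."""
    x = np.asarray(expr, dtype=float)
    x = x - x.mean(axis=0)  # center each gene
    u, s, _vt = np.linalg.svd(x, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)  # fraction of variance per component
    return u[:, :n_components] * s[:n_components], explained[:n_components]


def pc_mutation_correlation(scores, mutated):
    """Pearson correlation of each PC score with a 0/1 mutation indicator."""
    m = np.asarray(mutated, dtype=float)
    return [float(np.corrcoef(scores[:, k], m)[0, 1]) for k in range(scores.shape[1])]


# Synthetic example: 10 "mutated" samples shift 5 genes upward
rng = np.random.default_rng(0)
expr = rng.normal(0.0, 0.1, size=(20, 30))
mutated = np.array([True] * 10 + [False] * 10)
expr[mutated, :5] += 3.0
scores, explained = pca_scores(expr)
r = pc_mutation_correlation(scores, mutated)
print(r)  # |r| for PC1 should be close to 1; the sign of a PC is arbitrary
```

A high absolute correlation between a component and a mutation indicator is the kind of coincidence the abstract reports for SF3B1, SRSF2 and TP53.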


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Bárbara Andrade Barbosa ◽  
Saskia D. van Asten ◽  
Ji Won Oh ◽  
Arantza Farina-Sarasqueta ◽  
Joanne Verheij ◽  
...  

Deconvolution of bulk gene expression profiles into their cellular components is pivotal to portraying the complex cellular make-up of tissues, such as the tumor microenvironment. However, the inherently variable nature of gene expression requires a comprehensive statistical model and reliable prior knowledge of individual cell types, which can be obtained from single-cell RNA sequencing. We introduce BLADE (Bayesian Log-normAl Deconvolution), a unified Bayesian framework to estimate both the cellular composition and the gene expression profile of each cell type. Unlike previous comprehensive statistical approaches, BLADE can handle more than 20 cell types thanks to efficient variational inference. In an extensive evaluation with more than 700 simulated and real datasets, BLADE demonstrated enhanced robustness against gene expression variability and better completeness than conventional methods, particularly in reconstructing the gene expression profiles of each cell type. In summary, BLADE is a powerful tool to unravel heterogeneous cellular activity in complex biological systems from standard bulk gene expression data.
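BLADE itself is a Bayesian log-normal model fitted by variational inference, and the sketch below is not BLADE. It shows the simpler kind of baseline such methods aim to improve on. The bulk profile is treated as a non-negative mixture of cell-type reference profiles, and the mixing weights are solved for with non-negative least squares (`scipy.optimize.nnls`). The signature matrix and the function name are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve_nnls(bulk, signature):
    """Estimate cell-type fractions of one bulk sample (baseline sketch).

    bulk:      length-G vector of bulk expression
    signature: G x C matrix; column c is the reference profile of cell type c
    """
    weights, _rnorm = nnls(np.asarray(signature, dtype=float), np.asarray(bulk, dtype=float))
    total = weights.sum()
    # Rescale so the estimated fractions sum to one
    return weights / total if total > 0 else weights


signature = np.array([
    [10.0, 0.0, 1.0],
    [0.0, 10.0, 1.0],
    [1.0, 1.0, 10.0],
    [5.0, 5.0, 0.0],
])
true_fractions = np.array([0.5, 0.3, 0.2])
bulk = signature @ true_fractions  # noise-free synthetic mixture
est = deconvolve_nnls(bulk, signature)
print(est)  # recovers approximately [0.5, 0.3, 0.2]
```

Unlike BLADE, this baseline estimates only fractions, not per-sample cell-type expression profiles, and it ignores the variability of gene expression that the abstract emphasizes.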


2021 ◽  
Author(s):  
Kangning Dong ◽  
Shihua Zhang

Recent advances in spatially resolved transcriptomics have enabled comprehensive measurement of gene expression patterns while retaining the spatial context of the tissue microenvironment. Deciphering the spatial context of spots in a tissue requires careful use of their spatial information. To this end, we developed STGATE, a graph attention autoencoder framework that accurately identifies spatial domains by learning low-dimensional latent embeddings that integrate spatial information and gene expression profiles. To better characterize spatial similarity at the boundaries of spatial domains, STGATE adopts an attention mechanism that adaptively learns the similarity of neighboring spots, together with an optional cell type-aware module that integrates a pre-clustering of gene expression. We validated STGATE on diverse spatial transcriptomics datasets generated by different platforms with different spatial resolutions. STGATE could substantially improve the identification accuracy of spatial domains and denoise the data while preserving spatial expression patterns. Importantly, STGATE can be extended to multiple consecutive sections to reduce batch effects between sections and to extract 3D expression domains from the reconstructed 3D tissue.
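STGATE's attention weights are learned inside an autoencoder, which is beyond a short example. The sketch below shows only the input it builds on, under stated assumptions. First, a spatial neighbor graph links spots that lie within a chosen radius; the `radius` value is a hypothetical choice. Second, expression is averaged uniformly over each spot's neighbors. In STGATE, the uniform weights would instead be attention weights learned per edge.

```python
import numpy as np


def radius_neighbor_graph(coords, radius):
    """Boolean adjacency: spots i and j are linked if 0 < distance <= radius."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist <= radius) & (dist > 0)


def aggregate_neighbors(expr, adj):
    """Average each spot's expression with its graph neighbors (uniform weights)."""
    weights = adj.astype(float) + np.eye(adj.shape[0])  # include the spot itself
    weights /= weights.sum(axis=1, keepdims=True)       # row-normalize
    return weights @ np.asarray(expr, dtype=float)


coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])  # three spots on a line
expr = np.array([[1.0], [3.0], [10.0]])                 # one gene per spot
adj = radius_neighbor_graph(coords, radius=1.5)
smoothed = aggregate_neighbors(expr, adj)
print(smoothed.ravel())  # spots 0 and 1 are averaged together; spot 2 has no neighbors
```

Replacing the uniform row weights with learned, similarity-dependent weights is what lets an attention model down-weight neighbors that sit across a domain boundary.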


2020 ◽  
Author(s):  
Bárbara Andrade Barbosa ◽  
Saskia van Asten ◽  
Ji-won Oh ◽  
Arantza Fariña-Sarasqueta ◽  
Joanne Verheij ◽  
...  

High-resolution deconvolution of bulk gene expression profiles is pivotal to characterizing the complex cellular make-up of tissues, such as the tumor microenvironment. Single-cell RNA-seq provides reliable prior knowledge for deconvolution; however, a comprehensive statistical model is required to use it efficiently, owing to the inherently variable nature of gene expression. We introduce BLADE (Bayesian Log-normAl Deconvolution), a comprehensive probabilistic framework to estimate both the cellular make-up and the gene expression profiles of each cell type in each sample. Unlike previous comprehensive statistical approaches, BLADE can handle more than 20 cell types thanks to efficient variational inference. In an extensive evaluation using more than 700 datasets, BLADE showed enhanced robustness against gene expression variability and better completeness than conventional methods, particularly in reconstructing the gene expression profiles of each cell type. Overall, BLADE is a powerful tool to unravel heterogeneous cellular activity in complex biological systems based on standard bulk gene expression data.


2017 ◽  
Vol 16 ◽  
pp. 117693511772851 ◽  
Author(s):  
Baishali Bandyopadhyay ◽  
Veda Chanda ◽  
Yupeng Wang

Background: Constructing gene co-expression networks from cancer expression data is important for investigating the genetic mechanisms underlying cancer. However, correlation coefficients and linear regression models cannot capture sophisticated relationships among gene expression profiles. Here, we address the 3-way interaction in which the expression levels of 2 genes cluster in different regions of expression space under the control of a third gene's expression levels.
Results: We present xSyn, a software tool for identifying such 3-way interactions from cancer gene expression data, based on an optimization procedure that uses UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and synergy. Its effectiveness is demonstrated by application to 2 real gene expression datasets.
Conclusions: xSyn is a useful tool for decoding the complex relationships among gene expression profiles. xSyn is available at http://www.bdxconsult.com/xSyn.html.
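The abstract does not define synergy. This sketch assumes a common information-theoretic definition, Syn(X, Y; Z) = I(X, Y; Z) − I(X; Z) − I(Y; Z), which is positive when two genes jointly tell more about a third than they do separately. The sketch computes synergy for already-discretized expression levels. It omits the UPGMA step that xSyn uses, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a sequence of hashable discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for paired discrete samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def synergy(x, y, z):
    """Assumed definition: I(X,Y; Z) - I(X; Z) - I(Y; Z)."""
    joint_xy = list(zip(x, y))
    return mutual_information(joint_xy, z) - mutual_information(x, z) - mutual_information(y, z)


# Discretized (low=0 / high=1) levels of two genes and a third gene that
# depends on them jointly (XOR): neither gene alone is informative about z.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [0, 1, 1, 0]
print(synergy(x, y, z))  # 1.0 bit
```

The XOR case is the extreme example of the interaction described above: each gene alone carries no information about the third, but together they determine it completely.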

