Integrative Genomics of the Mammalian Alveolar Macrophage Response to Intracellular Mycobacteria

Abstract Background Bovine TB (BTB), caused by infection with Mycobacterium bovis, is a major endemic disease affecting global cattle production, particularly in many developing countries. The key innate immune cell that first encounters the pathogen is the alveolar macrophage, previously shown to be substantially reprogrammed during intracellular infection by the pathogen. Here we use differential expression, and correlation- and interaction-based network approaches to analyse the host response to infection with M. bovis at the transcriptome level to identify core infection response pathways and gene modules. These outputs were then integrated with genome-wide association study (GWAS) data sets to enhance detection of genomic variants for susceptibility/resistance to M. bovis infection. Results The host gene expression data consisted of bovine RNA-seq data from alveolar macrophages infected with M. bovis at 24 and 48 hours post-infection. These RNA-seq data were analysed using three distinct analysis pipelines, and novel response pathways and modules were further refined using cross-comparison and integration of the results. First, a differential expression analysis was carried out to determine the most significantly differentially expressed (DE) genes between conditions at each time point. Second, two networks were constructed at each time point using gene correlation patterns to determine changes in expression across conditions. Functional sub-modules within each correlation network were selected by statistical criteria for modularity. Third, a base gene interaction network of the mammalian host response to mycobacterial infection was generated using the GeneCards database and InnateDB. Differential gene expression data were superimposed on this base network to extract functional modules of interconnected DE genes. Conclusions Bovine GWAS data were obtained from a published BTB susceptibility/resistance study. The results from the three parallel analyses were integrated with these data to determine which of the three approaches identified genes significantly enriched for SNPs associated with susceptibility/resistance to M. bovis infection. Results indicate distinct and significant overlap in SNP discovery, demonstrating that network-based integration of biologically relevant transcriptomics data can leverage substantial additional information from GWAS data sets.

Integrative genomics of the mammalian alveolar macrophage response to intracellular mycobacteria

BMC Genomics ◽

10.1186/s12864-021-07643-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Thomas J. Hall ◽

Michael P. Mullen ◽

Gillian P. McHugo ◽

Kate E. Killick ◽

Siobhán C. Ring ◽

...

Keyword(s):

Data Integration ◽

Alveolar Macrophage ◽

Differential Expression ◽

Genome Wide Association Study ◽

Immune Cell ◽

GWAS Data ◽

Data Sets ◽

RNA Seq ◽

Gene Sets ◽

Holstein Friesian

Abstract Background Bovine TB (bTB), caused by infection with Mycobacterium bovis, is a major endemic disease affecting global cattle production. The key innate immune cell that first encounters the pathogen is the alveolar macrophage, previously shown to be substantially reprogrammed during intracellular infection by the pathogen. Here we use differential expression, and correlation- and interaction-based network approaches to analyse the host response to infection with M. bovis at the transcriptome level to identify core infection response pathways and gene modules. These outputs were then integrated with genome-wide association study (GWAS) data sets to enhance detection of genomic variants for susceptibility/resistance to M. bovis infection. Results The host gene expression data consisted of RNA-seq data from bovine alveolar macrophages (bAM) infected with M. bovis at 24 and 48 h post-infection (hpi) compared to non-infected control bAM. These RNA-seq data were analysed using three distinct computational pipelines to produce six separate gene sets: 1) DE genes filtered using stringent fold-change and P-value thresholds (DEG-24: 378 genes, DEG-48: 390 genes); 2) genes obtained from expression correlation networks (CON-24: 460 genes, CON-48: 416 genes); and 3) genes obtained from differential expression networks (DEN-24: 339 genes, DEN-48: 495 genes). These six gene sets were integrated with three bTB breed GWAS data sets by employing a new genomics data integration tool—gwinteR. Using GWAS summary statistics, this methodology enabled detection of 36, 102 and 921 prioritised SNPs for Charolais, Limousin and Holstein-Friesian, respectively. Conclusions The results from the three parallel analyses showed that the three computational approaches could identify genes significantly enriched for SNPs associated with susceptibility/resistance to M. bovis infection. 
Results indicate distinct and significant overlap in SNP discovery, demonstrating that network-based integration of biologically relevant transcriptomics data can leverage substantial additional information from GWAS data sets. These analyses also demonstrated significant differences among breeds, with the Holstein-Friesian breed GWAS proving most useful for prioritising SNPs through data integration. Because the functional genomics data were generated using bAM from this population, this suggests that the genomic architecture of bTB resilience traits may be more breed-specific than previously assumed.
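
As a rough illustration of the first gene-set type described in the abstract above (DE genes filtered using fold-change and P-value thresholds), the Python sketch below applies such a filter to a toy results table. The cut-offs (`min_abs_log2_fc`, `max_adjusted_p`), the `DEResult` record and the gene names are assumptions made for illustration only; the abstract does not state the thresholds or the software that was used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DEResult:
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log2_fold_change: float
    adjusted_p: float


def filter_de_genes(results, min_abs_log2_fc=1.0, max_adjusted_p=0.05):
    """Keep genes that pass both thresholds.

    Both default cut-offs are illustrative assumptions, not values
    taken from the study.
    """
    return [
        r.gene
        for r in results
        if abs(r.log2_fold_change) >= min_abs_log2_fc
        and r.adjusted_p <= max_adjusted_p
    ]


# Toy input with made-up gene names and values.
example = [
    DEResult("GENE_A", 2.3, 0.001),    # passes both thresholds
    DEResult("GENE_B", 0.4, 0.0001),   # fold change too small
    DEResult("GENE_C", -1.8, 0.2),     # not significant
    DEResult("GENE_D", -1.5, 0.01),    # passes (down-regulated)
]

selected = filter_de_genes(example)
print(selected)  # ['GENE_A', 'GENE_D']
```

Filtering on the absolute log2 fold change keeps both up- and down-regulated genes, which is the usual choice when a gene set is meant to capture any strong response to infection.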

Comparing RNA-Seq and microarray gene expression data in two zones of the Arabidopsis root apex relevant to spaceflight

Applications in Plant Sciences ◽

10.1002/aps3.1197 ◽

2018 ◽

Vol 6 (11) ◽

pp. e01197 ◽

Cited By ~ 3

Author(s):

Aparna Krishnamurthy ◽

Robert J. Ferl ◽

Anna-Lisa Paul

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Root Apex ◽

Microarray Gene Expression Data ◽

Expression Data ◽

RNA Seq ◽

Microarray Gene Expression ◽

Arabidopsis Root ◽

Microarray Gene

Analyzing Large Gene Expression Data Sets

Computational Text Analysis ◽

10.1093/oso/9780198567400.003.0014 ◽

2006 ◽

Author(s):

Soumya Raychaudhuri

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Data Sets ◽

Expression Data ◽

Clustering Methods ◽

Biologically Relevant ◽

Large Gene ◽

Functional Coherence

The most interesting and challenging gene expression data sets to analyze are large multidimensional data sets that contain expression values for many genes across multiple conditions. In these data sets the use of scientific text can be particularly useful, since there are a myriad of genes examined under vastly different conditions, each of which may induce or repress expression of the same gene for different reasons. There is an enormous complexity to the data that we are examining: each gene is associated with dozens if not hundreds of expression values as well as multiple documents built up from vocabularies consisting of thousands of words. In Section 2.4 we reviewed common gene expression strategies, most of which revolve around defining groups of genes based on common profiles. A limitation of many gene expression analytic approaches is that they do not incorporate comprehensive background knowledge about the genes into the analysis. We present computational methods that leverage the peer-reviewed literature in the automatic analysis of gene expression data sets. Including the literature in gene expression data analysis offers an opportunity to incorporate background functional information about the genes when defining expression clusters. In Chapter 5 we saw how literature-based approaches could help in the analysis of single condition experiments. Here we will apply the strategies introduced in Chapter 6 to assess the coherence of groups of genes to enhance gene expression analysis approaches. The methods proposed here could, in fact, be applied to any multivariate genomics data type. The key concepts discussed in this chapter are listed in the frame box. We begin with a discussion of gene groups and their role in expression analysis; we briefly discuss strategies to assign keywords to groups and strategies to assess their functional coherence. 
We apply functional coherence measures to gene expression analysis; for examples we focus on a yeast expression data set. We first demonstrate how functional coherence can be used to focus in on the key biologically relevant gene groups derived by clustering methods such as self-organizing maps and k-means clustering.

IRIS-EDA: An integrated RNA-Seq interpretation system for gene expression data analysis

PLoS Computational Biology ◽

10.1371/journal.pcbi.1006792 ◽

2019 ◽

Vol 15 (2) ◽

pp. e1006792 ◽

Cited By ~ 11

Author(s):

Brandon Monier ◽

Adam McDermaid ◽

Cankun Wang ◽

Jing Zhao ◽

Allison Miller ◽

...

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Gene Expression Data ◽

Expression Data ◽

RNA Seq ◽

Gene Expression Data Analysis ◽

Interpretation System

ExAtlas: An interactive online tool for meta-analysis of gene expression data

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720015500195 ◽

2015 ◽

Vol 13 (06) ◽

pp. 1550019 ◽

Cited By ~ 37

Author(s):

Alexei A. Sharov ◽

David Schlessinger ◽

Minoru S. H. Ko

Keyword(s):

Gene Expression ◽

Gene Ontology ◽

Gene Expression Data ◽

Fixed Effects ◽

Expression Profiles ◽

Meta Analysis ◽

Data Sets ◽

Expression Data ◽

Gene Set ◽

Public Data

We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users’ own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher’s methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein–protein interaction) are pre-loaded and can be used for functional annotations.
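
Of the standard meta-analysis options listed above, Fisher's method is the simplest to write down: for k independent P-values the statistic X = -2 Σ ln(p_i) follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The Python sketch below is a minimal stdlib-only illustration of that formula, not ExAtlas code; the function name `fisher_combined_p` is hypothetical, and the closed-form tail probability used here is valid because the degrees of freedom are always even.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    Uses X = -2 * sum(ln p_i), which is chi-square distributed with
    2k degrees of freedom under the null. For even degrees of freedom
    the upper tail has the closed form
        P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not p_values:
        raise ValueError("at least one P-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in the interval (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    # Accumulate sum_{i=0}^{k-1} half_stat^i / i! term by term.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Two studies that are each only borderline significant.
combined = fisher_combined_p([0.04, 0.04])
print(f"combined P-value: {combined:.4f}")
```

A single P-value is returned unchanged, which is a convenient sanity check. The method assumes the input tests are independent; correlated studies would need a different combination rule.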

Abstract P3-04-10: Comparison between RNA-Seq and Affymetrix gene expression data

10.1158/0008-5472.sabcs12-p3-04-10 ◽

2012 ◽

Cited By ~ 1

Author(s):

D Fumagalli ◽

B Haibe-Kains ◽

S Michiels ◽

DN Brown ◽

D Gacquer ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Data ◽

RNA Seq ◽

Affymetrix Gene Expression

Abstract 328: An integrated analysis of three distinct IBC/non-IBC Affymetrix gene expression data sets to study the transcriptional heterogeneity both between IBC and non-IBC and within IBC

10.1158/1538-7445.am2011-328 ◽

2011 ◽

Author(s):

Steven J. Van Laere ◽

Naoto Ueno ◽

Pascal Finetti ◽

Peter B. Vermeulen ◽

Anthony Lucci ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Integrated Analysis ◽

Data Sets ◽

Expression Data ◽

Affymetrix Gene Expression

CLASSIFYING TEMPORAL MICROARRAY DATA BY SELECTING INFORMATIVE GENES

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720013410060 ◽

2013 ◽

Vol 11 (03) ◽

pp. 1341006

Author(s):

QIANG LOU ◽

ZORAN OBRADOVIC

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Microarray Data ◽

Data Sets ◽

Temporal Data ◽

Expression Data ◽

Selection Methods ◽

Temporal Gene Expression ◽

Single Matrix

In order to more accurately predict an individual's health status, in clinical applications it is often important to perform analysis of high-dimensional gene expression data that varies with time. A major challenge in predicting from such temporal microarray data is that the number of biomarkers used as features is typically much larger than the number of labeled subjects. One way to address this challenge is to perform feature selection as a preprocessing step and then apply a classification method on selected features. However, traditional feature selection methods cannot handle multivariate temporal data without applying techniques that flatten temporal data into a single matrix in advance. In this study, a feature selection filter that can directly select informative features from temporal gene expression data is proposed. In our approach, we measure the distance between multivariate temporal data from two subjects. Based on this distance, we define the objective function of temporal margin based feature selection to maximize each subject's temporal margin in its own relevant subspace. The experimental results on synthetic and two real flu data sets provide evidence that our method outperforms the alternatives, which flatten the temporal data in advance.

INTERRELATED TWO-WAY CLUSTERING AND ITS APPLICATION ON GENE EXPRESSION DATA

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213005002272 ◽

2005 ◽

Vol 14 (04) ◽

pp. 577-597 ◽

Cited By ~ 6

Author(s):

CHUN TANG ◽

AIDONG ZHANG

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Domain Knowledge ◽

Gene Clusters ◽

Data Sets ◽

Messenger RNAs ◽

Expression Data ◽

Large Numbers ◽

Clustering Approach ◽

mRNA Expression Profiling

Microarray technologies are capable of simultaneously measuring the signals for thousands of messenger RNAs and large numbers of proteins from single samples. Arrays are now widely used in basic biomedical research for mRNA expression profiling and are increasingly being used to explore patterns of gene expression in clinical research. Most research has focused on interpreting microarray data, which are transformed into gene expression matrices in which the rows usually represent genes and the columns represent samples. Clustering samples can be done by analyzing and eliminating irrelevant genes. However, most methods are supervised (or assisted by domain knowledge), and less attention has been paid to unsupervised approaches, which are important when little domain knowledge is available. In this paper, we present a new framework for unsupervised analysis of gene expression data, which applies an interrelated two-way clustering approach to the gene expression matrices. The goal of clustering is to identify important genes and perform cluster discovery on samples. The advantage of this approach is that we can dynamically manipulate the relationship between the gene clusters and sample groups while conducting an iterative clustering through both of them. The performance of the proposed method on various gene expression data sets is also illustrated.

Defining Immune Response Signatures in DLBCL As Potential Predictive Biomarkers for Outcome to Immunotherapy

Blood ◽

10.1182/blood.v126.23.2663.2663 ◽

2015 ◽

Vol 126 (23) ◽

pp. 2663-2663

Author(s):

Matthew A Care ◽

Stephen M Thirdborough ◽

Andrew J Davies ◽

Peter W.M. Johnson ◽

Andrew Jack ◽

...

Keyword(s):

Gene Expression ◽

Immune Response ◽

Network Analysis ◽

Gene Expression Data ◽

Research Funding ◽

Data Sets ◽

Expression Data ◽

Data Set ◽

Gene Correlation ◽

Cancer Types

Abstract Purpose To assess whether comparative gene network analysis can reveal characteristic immune response signatures that predict clinical response in diffuse large B-cell lymphoma (DLBCL). Background The wealth of available gene expression data sets for DLBCL and other cancer types provides a resource to define recurrent pathological processes at the level of gene expression and gene correlation neighbourhoods. This is of particular relevance in the context of cancer immune responses, where convergence onto common patterns may drive shared gene expression profiles. Where existing and novel immunotherapies harness the immune response for therapeutic benefit, such responses may provide predictive biomarkers. Methods We independently analysed publicly available DLBCL gene expression data sets and a wide compendium of gene expression data from diverse cancer types, and then asked whether common elements of cancer host response could be identified from resulting networks. Using 10 DLBCL gene expression data sets, encompassing 2030 cases, we established pairwise gene correlation matrices per data set, which were merged to generate median correlations of gene pairs across all data sets. Gene network analysis and unsupervised clustering were then applied to define global representations of DLBCL gene expression neighbourhoods. In parallel, a diverse range of solid and lymphoid malignancies, including breast, colorectal, oesophageal, head and neck, non-small cell lung, prostate and pancreatic cancer, Hodgkin lymphoma, follicular lymphoma and DLBCL, were independently analysed using an orthogonal weighted gene correlation network analysis (WGCNA) of gene expression data sets, from which correlated modules across diverse cancer types were identified. The biology of resulting gene neighbourhoods was assessed by signature and ontology enrichment, and the overlap between gene correlation neighbourhoods and WGCNA-derived modules associated with immune/host responses was analysed. 
Results Amongst DLBCL data, we identified distinct gene correlation neighbourhoods associated with the immune response. These included both elements of IFN-polarised responses, core T-cell, and cytotoxic signatures as well as distinct macrophage responses. Neighbourhoods linked to macrophages separated CD163 from CD68 and CD14. In the WGCNA analysis of diverse cancer types, clusters corresponding to these immune response neighbourhoods were independently identified, including a highly similar cluster related to CD163. The overlapping CD163 clusters in both analyses linked to diverse Fc-Receptors, complement pathway components and patterns of scavenger receptors potentially linked to alternative macrophage activation. The relationship between the CD163 macrophage gene expression cluster and outcome was tested in DLBCL data sets, identifying a poor response in CD163-cluster-high patients, which reached statistical significance in one data set (GSE10846). Notably, the effect of the CD163-associated gene neighbourhood, which correlates with poor outcome post rituximab-containing immunochemotherapy, is distinct from the effect of IFNG-STAT1-IRF1 polarised cytotoxic responses. The latter represents the predominant immune response pattern separating cell of origin unclassifiable (Type-III) DLBCL from either ABC or GCB DLBCL subsets, and is associated with a trend toward positive outcome. Conclusion Comparative gene expression network analysis identifies common immune response signatures shared between DLBCL and other cancer types. Gene expression clusters linked to CD163 macrophage responses and IFNG-STAT1-IRF1 polarised cytotoxic responses are common patterns with apparent divergent outcome association. 
Disclosures Davies: CTI: Honoraria; Gilead: Consultancy, Honoraria, Research Funding; Mundipharma: Honoraria, Research Funding; Bayer: Research Funding; Takeda: Honoraria, Research Funding; Janssen: Honoraria, Research Funding; Roche: Honoraria, Research Funding; GSK: Research Funding; Pfizer: Honoraria; Celgene: Honoraria, Research Funding. Jack: Janssen: Research Funding.
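
The merging step in the Methods above (one pairwise gene correlation matrix per data set, combined into a median correlation per gene pair) can be sketched in a few lines of Python. This is an illustrative reading of that sentence, not the authors' pipeline: the use of Pearson correlation, the dictionary layout, the function names and the toy values are all assumptions.

```python
import math
from itertools import combinations
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def median_pairwise_correlations(data_sets):
    """Merge per-data-set gene-gene correlations into one median per pair.

    `data_sets` is a list of dicts mapping gene name -> expression values
    across the samples of that data set (hypothetical layout). A pair
    contributes only from data sets in which both genes are present.
    """
    per_pair = {}
    for expression in data_sets:
        for gene_1, gene_2 in combinations(sorted(expression), 2):
            r = pearson(expression[gene_1], expression[gene_2])
            per_pair.setdefault((gene_1, gene_2), []).append(r)
    return {pair: median(values) for pair, values in per_pair.items()}


# Three toy data sets with made-up genes and values.
toy_data_sets = [
    {"GENE_A": [1, 2, 3, 4], "GENE_B": [2, 4, 6, 8], "GENE_C": [4, 3, 2, 1]},
    {"GENE_A": [1, 2, 3, 4], "GENE_B": [1, 3, 2, 4], "GENE_C": [4, 3, 2, 1]},
    {"GENE_A": [1, 2, 3, 4], "GENE_B": [4, 3, 2, 1], "GENE_C": [1, 2, 3, 4]},
]

merged = median_pairwise_correlations(toy_data_sets)
for pair, r in sorted(merged.items()):
    print(pair, round(r, 3))
```

Taking the median rather than the mean keeps a single discordant data set (the third one in the toy example) from dominating the merged value for a gene pair.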
