Integrative genomics of the mammalian alveolar macrophage response to intracellular mycobacteria

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Thomas J. Hall ◽  
Michael P. Mullen ◽  
Gillian P. McHugo ◽  
Kate E. Killick ◽  
Siobhán C. Ring ◽  
...  

Abstract Background Bovine TB (bTB), caused by infection with Mycobacterium bovis, is a major endemic disease affecting global cattle production. The key innate immune cell that first encounters the pathogen is the alveolar macrophage, previously shown to be substantially reprogrammed during intracellular infection by the pathogen. Here we use differential expression, and correlation- and interaction-based network approaches to analyse the host response to infection with M. bovis at the transcriptome level to identify core infection response pathways and gene modules. These outputs were then integrated with genome-wide association study (GWAS) data sets to enhance detection of genomic variants for susceptibility/resistance to M. bovis infection. Results The host gene expression data consisted of RNA-seq data from bovine alveolar macrophages (bAM) infected with M. bovis at 24 and 48 h post-infection (hpi) compared to non-infected control bAM. These RNA-seq data were analysed using three distinct computational pipelines to produce six separate gene sets: 1) DE genes filtered using stringent fold-change and P-value thresholds (DEG-24: 378 genes, DEG-48: 390 genes); 2) genes obtained from expression correlation networks (CON-24: 460 genes, CON-48: 416 genes); and 3) genes obtained from differential expression networks (DEN-24: 339 genes, DEN-48: 495 genes). These six gene sets were integrated with three bTB breed GWAS data sets by employing a new genomics data integration tool—gwinteR. Using GWAS summary statistics, this methodology enabled detection of 36, 102 and 921 prioritised SNPs for Charolais, Limousin and Holstein-Friesian, respectively. Conclusions The results from the three parallel analyses showed that the three computational approaches could identify genes significantly enriched for SNPs associated with susceptibility/resistance to M. bovis infection. 
Results indicate distinct and significant overlap in SNP discovery, demonstrating that network-based integration of biologically relevant transcriptomics data can leverage substantial additional information from GWAS data sets. These analyses also demonstrated significant differences among breeds, with the Holstein-Friesian breed GWAS proving most useful for prioritising SNPs through data integration. Because the functional genomics data were generated using bAM from this population, this suggests that the genomic architecture of bTB resilience traits may be more breed-specific than previously assumed.

2020 ◽  
Author(s):  
Thomas J. Hall ◽  
Michael P. Mullen ◽  
Gillian P. McHugo ◽  
Kate E. Killick ◽  
Siobhán C. Ring ◽  
...  

Abstract Background Bovine TB (BTB), caused by infection with Mycobacterium bovis, is a major endemic disease affecting global cattle production, particularly in many developing countries. The key innate immune cell that first encounters the pathogen is the alveolar macrophage, previously shown to be substantially reprogrammed during intracellular infection by the pathogen. Here we use differential expression, and correlation- and interaction-based network approaches to analyse the host response to infection with M. bovis at the transcriptome level to identify core infection response pathways and gene modules. These outputs were then integrated with genome-wide association study (GWAS) data sets to enhance detection of genomic variants for susceptibility/resistance to M. bovis infection. Results The host gene expression data consisted of bovine RNA-seq data from alveolar macrophages infected with M. bovis at 24 and 48 hours post-infection. These RNA-seq data were analysed using three distinct analysis pipelines, and novel response pathways and modules were further refined using cross-comparison and integration of the results. First, a differential expression analysis was carried out to determine the most significantly differentially expressed (DE) genes between conditions at each time point. Second, two networks were constructed at each time point using gene correlation patterns to determine changes in expression across conditions. Functional sub-modules within each correlation network were selected by statistical criteria for modularity. Third, a base gene interaction network of the mammalian host response to mycobacterial infection was generated using the GeneCards database and InnateDB. Differential gene expression data were superimposed on this base network to extract functional modules of interconnected DE genes. Conclusions Bovine GWAS data were obtained from a published BTB susceptibility/resistance study. 
The results from the three parallel analyses were integrated with these data to determine which of the three approaches identified genes significantly enriched for SNPs associated with susceptibility/resistance to M. bovis infection. Results indicate distinct and significant overlap in SNP discovery, demonstrating that network-based integration of biologically relevant transcriptomics data can leverage substantial additional information from GWAS data sets.


2017 ◽  
Author(s):  
Charlotte Soneson ◽  
Mark D. Robinson

Abstract Background As single-cell RNA-seq (scRNA-seq) is becoming increasingly common, the amount of publicly available data grows rapidly, generating a useful resource for computational method development and extension of published results. Although processed data matrices are typically made available in public repositories, the procedure to obtain these varies widely between data sets, which may complicate reuse and cross-data set comparison. Moreover, while many statistical methods for performing differential expression analysis of scRNA-seq data are becoming available, their relative merits and the performance compared to methods developed for bulk RNA-seq data are not sufficiently well understood. Results We present conquer, a collection of consistently processed, analysis-ready public single-cell RNA-seq data sets. Each data set has count and transcripts per million (TPM) estimates for genes and transcripts, as well as quality control and exploratory analysis reports. We use a subset of the data sets available in conquer to perform an extensive evaluation of the performance and characteristics of statistical methods for differential gene expression analysis, evaluating a total of 30 statistical approaches on both experimental and simulated scRNA-seq data. Conclusions Considerable differences are found between the methods in terms of the number and characteristics of the genes that are called differentially expressed. Pre-filtering of lowly expressed genes can have important effects on the results, particularly for some of the methods originally developed for analysis of bulk RNA-seq data. Generally, however, methods developed for bulk RNA-seq analysis do not perform notably worse than those developed specifically for scRNA-seq.
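
For readers unfamiliar with the TPM unit and the pre-filtering step mentioned in this abstract, the following is a minimal sketch of the textbook TPM definition and a simple expression filter. It is not how conquer derives its estimates; the function names, the use of plain gene lengths in kilobases, and the threshold of 1.0 are illustrative assumptions.

```python
def tpm(counts, lengths_kb):
    """Convert read counts to transcripts per million (textbook definition).

    counts     -- read counts per gene (hypothetical input)
    lengths_kb -- gene lengths in kilobases, same order as counts
    """
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths_kb must have the same length")
    # Step 1: divide each count by the gene length (reads per kilobase).
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Step 2: rescale so that the values of one sample sum to one million.
    return [r / total * 1e6 for r in rates]


def filter_low_expression(values, threshold=1.0):
    """Return indices of genes whose value is at or above the threshold.

    The threshold is an assumed example value, not one taken from the study.
    """
    return [i for i, v in enumerate(values) if v >= threshold]


if __name__ == "__main__":
    sample = tpm([10, 20, 30], [1.0, 2.0, 3.0])
    print(sample)
    print(filter_low_expression([0.5, 2.0, 1.0]))
```

Because TPM values within a sample always sum to one million, they are comparable across genes within that sample, which is one reason a filter on TPM is a common way to drop lowly expressed genes before testing.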


2019 ◽  
Author(s):  
Ludwig Geistlinger ◽  
Gergely Csaba ◽  
Mara Santarelli ◽  
Marcel Ramos ◽  
Lucas Schiffer ◽  
...  

Abstract Background Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected data sets and biological reasoning on the relevance of resulting enriched gene sets. However, this is typically incomplete and biased towards the goals of individual investigations. Results We present a general framework for standardized and structured benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization, and detection of relevant processes. This framework incorporates a curated compendium of 75 expression data sets investigating 42 different human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods on the benchmark compendium, identifying significant differences in (i) runtime and applicability to RNA-seq data, (ii) fraction of enriched gene sets depending on the type of null hypothesis tested, and (iii) recovery of the a priori defined relevance rankings. Based on these findings, we make practical recommendations on (i) how methods originally developed for microarray data can efficiently be applied to RNA-seq data, (ii) how to interpret results depending on the type of gene set test conducted, and (iii) which methods are best suited to effectively prioritize gene sets with high relevance for the phenotype investigated. Conclusion We carried out a systematic assessment of existing enrichment methods, and identified best performing methods, but also general shortcomings in how gene set analysis is currently conducted. 
We provide a directly executable benchmark system for straightforward assessment of additional enrichment methods. Availability http://bioconductor.org/packages/GSEABenchmarkeR
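
As a concrete illustration of what a gene set enrichment test computes, the sketch below implements a basic over-representation test using the upper tail of the hypergeometric distribution. It is a generic example and not code from the GSEABenchmarkeR package; the function name and the numbers in the usage example are hypothetical.

```python
from math import comb


def ora_pvalue(universe_size, set_size, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    universe_size -- number of genes measured in the experiment
    set_size      -- number of those genes belonging to the gene set
    n_selected    -- number of genes called differentially expressed
    n_overlap     -- number of selected genes that are in the gene set
    """
    if not 0 <= set_size <= universe_size:
        raise ValueError("set_size must lie between 0 and universe_size")
    if not 0 <= n_selected <= universe_size:
        raise ValueError("n_selected must lie between 0 and universe_size")
    max_overlap = min(set_size, n_selected)
    if not 0 <= n_overlap <= max_overlap:
        raise ValueError("n_overlap is outside the possible range")
    # Count the ways to draw k set members and (n_selected - k) non-members,
    # summed over every overlap at least as large as the one observed.
    favourable = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return favourable / comb(universe_size, n_selected)


if __name__ == "__main__":
    # Hypothetical numbers: 12,000 genes measured, a 150-gene set,
    # 400 genes selected, 15 of them in the set.
    print(ora_pvalue(12000, 150, 400, 15))
```

In practice one such p-value is computed per gene set and the results are corrected for multiple testing; the sketch leaves that step out.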


F1000Research ◽  
2018 ◽  
Vol 6 ◽  
pp. 784
Author(s):  
Daniel E. Carlin ◽  
Kassi Kosnicki ◽  
Sara Garamszegi ◽  
Trey Ideker ◽  
Helga Thorvaldsdóttir ◽  
...  

One commonly performed bioinformatics task is to infer functional regulation of transcription factors by observing differential expression under a knockout, and integrating DNA binding information of that transcription factor.   However, until now, this task has required dedicated bioinformatics support to perform the necessary data integration. GenomeSpace provides a protocol, or “recipe”, and a user interface with inter-operating software tools to identify protein occupancies along the genome from a ChIP-seq experiment and associated differentially regulated genes from a RNA-Seq experiment. By integrating RNA-Seq and ChIP-seq analyses, a user is easily able to associate differing expression phenotypes with changing epigenetic landscapes.


2020 ◽  
Author(s):  
John Stegmayr ◽  
Hani N. Alsafadi ◽  
Wojciech Langwiński ◽  
Anna Niroomand ◽  
Sandra Lindstedt ◽  
...  

Abstract Precision-cut lung slices (PCLS) have gained increasing interest as a model to study lung biology and disease, as well as for screening novel therapeutics. In particular, PCLS derived from human tissue can better recapitulate some aspects of lung biology and disease as compared to PCLS derived from animals (e.g. clinical heterogeneity), but access to human tissue is limited. A number of different experimental readouts have been established for use with PCLS, but obtaining high-yield, high-quality RNA for downstream gene expression analysis has remained challenging. This is particularly problematic for utilizing the power of next-generation sequencing techniques, such as RNA-sequencing (RNA-seq), for unbiased, high-throughput analysis of human PCLS cohorts. In the current study, we present a novel approach for isolating high-quality RNA from a small amount of tissue, including diseased human tissue, such as idiopathic pulmonary fibrosis (IPF). We show that the RNA isolated using this method is of sufficient quality for both RT-qPCR and RNA-seq analysis. Furthermore, the RNA-seq data from human PCLS were comparable to data generated from native tissue and could be used in several established computational pipelines, including deconvolution of bulk RNA-seq data using publicly available single-cell RNA-seq data sets. Deconvolution using Bisque revealed a diversity of cell populations in human PCLS derived from distal lung tissue, including several immune cell populations, which correlated with cell populations known to be present and aberrant in human disease, such as IPF.


2021 ◽  
Vol 12 ◽  
Author(s):  
Triinu Peters ◽  
Jochen Antel ◽  
Roaa Naaresh ◽  
Björn-Hergen Laabs ◽  
Manuel Föcker ◽  
...  

Genetic correlations suggest a coexisting genetic predisposition to both low leptin levels and risk for anorexia nervosa (AN). To investigate the causality and direction of these associations, we performed bidirectional two-sample Mendelian randomization (MR) analyses using data of the most recent genome-wide association study (GWAS) for AN and both a GWAS and an exome-wide-association-study (EWAS) for leptin levels. Most MR methods with genetic instruments from GWAS showed a causal effect of lower leptin levels on higher risk of AN (e.g. IVW b = −0.923, p = 1.5 × 10⁻⁴). Because most patients with AN are female, we additionally performed analyses using leptin GWAS data of females only. Again, there was a significant effect of leptin levels on the risk of AN (e.g. IVW b = −0.826, p = 1.1 × 10⁻⁴). MR with genetic instruments from EWAS showed no overall effect of leptin levels on the risk for AN. For the opposite direction, MR revealed no causal effect of AN on leptin levels. If our results are confirmed in extended GWAS data sets, a low endogenous leptin synthesis represents a risk factor for developing AN.
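
The IVW values quoted above come from the inverse-variance weighted estimator, which can be sketched as follows. This is the standard fixed-effect form with first-order weights, not the authors' analysis code; the function name and the per-SNP numbers in the usage example are hypothetical.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    beta_exposure -- per-SNP effects on the exposure (e.g. leptin level)
    beta_outcome  -- per-SNP effects on the outcome (e.g. log-odds of AN)
    se_outcome    -- standard errors of the per-SNP outcome effects
    Returns (causal effect estimate, standard error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    # Each SNP is weighted by the inverse variance of its outcome effect.
    weights = [1.0 / (s * s) for s in se_outcome]
    numerator = sum(
        w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    if denominator == 0:
        raise ValueError("no instrument has a non-zero exposure effect")
    return numerator / denominator, sqrt(1.0 / denominator)


if __name__ == "__main__":
    # Hypothetical summary statistics for three instruments.
    b, se = ivw_estimate([0.10, 0.20, 0.15], [-0.09, -0.19, -0.13],
                         [0.05, 0.06, 0.05])
    print(b, se)
```

A negative estimate, as in the abstract, means that genetically higher exposure levels are associated with lower outcome risk, or equivalently that lower leptin levels go with higher AN risk.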


2019 ◽  
Author(s):  
Megan E. Chan ◽  
Pranav S. Bhamidipati ◽  
Heather J. Goldsby ◽  
Arend Hintze ◽  
Hans A. Hofmann ◽  
...  

Abstract Despite life’s diversity, studies of variation across animals often remind us of our shared evolutionary past. Abundant genome sequencing over the last ~25 years reveals remarkable conservation of genes and recent analyses of gene regulatory networks illustrate that not only genes but entire pathways are conserved, reused, and elaborated in the evolution of diversity. Predating these discoveries, 19th-century embryologists observed that though morphology at birth varies tremendously, certain stages of embryogenesis appear remarkably similar across vertebrates. Specifically, while early and late stages are variable across species, the anatomy of mid-stage embryos (the ‘phylotypic’ stage) is conserved. This model of vertebrate development and diversification has found mixed support in recent analyses comparing gene expression across species, possibly owing to differences across studies in species, embryonic stages, and gene sets compared. Here we perform a comparative analysis using 186 microarray and RNA-seq expression data sets covering embryogenesis in six vertebrate species spanning ~420 million years of evolution. We use an unbiased clustering approach to group stages of embryogenesis by transcriptomic similarity and ask whether gene expression similarity of clustered embryonic stages deviates from the null hypothesis of no relationship between timing and diversification. We use a phylogenetic comparative approach to characterize the expression conservation pattern (i.e., early conservation, hourglass, inverse hourglass, late conservation, or no relationship) of each gene at each evolutionary node. Across vertebrates, we find an enrichment of genes exhibiting early conservation, hourglass, and late conservation patterns and a large depletion of genes exhibiting no distinguishable pattern of conservation in both microarray and RNA-seq data sets. 
Enrichment of genes showing patterned conservation through embryogenesis indicates diversification of embryogenesis may be temporally constrained. However, the circumstances (e.g., gene groups, evolutionary nodes, species) under which each pattern emerges remain unknown and require both broad evolutionary sampling and systematic examination of embryogenesis across species.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 784
Author(s):  
Daniel E. Carlin ◽  
Kassi Kosnicki ◽  
Sara Garamszegi ◽  
Trey Ideker ◽  
Helga Thorvaldsdóttir ◽  
...  

One commonly performed bioinformatics task is to infer functional regulation of transcription factors by observing differential expression under a knockout, and integrating DNA binding information of that transcription factor. However, until now, this task has required dedicated bioinformatics support to perform the necessary data integration. GenomeSpace provides a protocol, or “recipe”, and a user interface with inter-operating software tools to identify protein occupancies along the genome from a ChIP-seq experiment and associated differentially regulated genes from an RNA-Seq experiment. By integrating RNA-Seq and ChIP-seq analyses, a user is easily able to associate differing expression phenotypes with changing epigenetic landscapes.


Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 276-276
Author(s):  
Bhakti Dwivedi ◽  
Manoj Bhasin

Abstract The genomics data-driven identification of gene signatures and pathways has been routinely explored for predicting cancer survival and making decisions related to targeted treatments. Many packages and tools have been developed to correlate single-gene features to clinical outcomes, but they lack support for such analyses based on multiple genes, gene sets, and gene ratios. Furthermore, cluster marker genes associated with cell types, states and function from cancer single-cell transcriptomics studies remain an underutilized prognostic option. Additionally, no bioinformatics online tool evaluates associations between the enrichment of known cell types and survival outcome across cancers. We have developed Survival Genie (https://bbisr.shinyapps.winship.emory.edu/SurvivalGenie/), a web tool to perform survival analysis on single-cell RNA-seq data and a variety of other molecular inputs such as gene sets, gene ratios, tumor-infiltrating immune cell proportions, gene expression profile scores, and tumor mutation burden (Fig. 1). For comprehensive survival evaluation, Survival Genie contains 53 datasets of 27 distinct malignancies from 11 different cancer programs for both adult and pediatric cancers, including different types of leukemia. Users can upload single-cell data or gene sets and select partitioning methods (i.e., mean, median, quartile, cutp) to determine the effect of their levels on patient survival outcomes. The tool provides comprehensive results including box plots of low- and high-risk groups, Kaplan-Meier plots with a univariate Cox proportional hazards model, and correlation of immune cell enrichment and molecular profile (Fig. 1). The Survival Genie source code is written in the R programming language and the interactive web application is built with the R Shiny framework. 
We demonstrate the application of the Survival Genie tool by exploring the prognostic utility of blast cell and immune cell markers generated from single-cell RNA-seq analysis of paired pediatric AML bone marrow samples taken at the time of diagnosis and end of induction (Thomas, Perumalla et al. 2020). We identified an AML blast-specific signature consisting of 7 genes (CLEC11A, PRAME, AZU1, NREP, ARMH1, C1QBP, TRH) that showed a significant association with poor survival (HR=2.3 and Log Rank P-value=.007). Further analysis of AML relapse-associated single-cell clusters showed increased levels of individual markers, including CRIP1, FLNA, and RFLNB/FAM101B, and their significant association with poor survival in the TARGET AML dataset. Additionally, expression of combined RFLNB/FAM101B and WDFY4 genes was associated with poor overall survival (HR=1.8, Log Rank P-value=0.01) and shorter event-free survival (HR=1.9, Log Rank P-value<0.0001). This shows the usefulness of the Survival Genie tool in exploring the prognostic association of genes as well as gene sets. Survival Genie is a one-stop web portal for single-cell phenotype clusters, lists of genes, and cell composition-based survival analyses across multiple cancer datasets including hematological malignancies. The analytical options and harmonized collection of multiple cancer types make Survival Genie a comprehensive resource to correlate gene sets, pathways, cellular enrichment, and single-cell clusters to clinical outcome to assist in developing next-generation prognostic and therapeutic biomarkers. Figure 1. Disclosures No relevant conflicts of interest to declare.
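
The median partitioning and Kaplan-Meier steps described in this abstract can be illustrated with a short sketch. Survival Genie itself is written in R, so the Python below is only a generic illustration of the technique and not the tool's code; the function names and the toy data are hypothetical.

```python
from statistics import median


def split_by_median(scores):
    """Label each sample 'high' or 'low' relative to the median score."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        n_at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy data: expression score, follow-up time, event indicator.
    scores = [2.1, 0.4, 3.3, 1.0, 2.8, 0.7]
    times = [5, 20, 3, 18, 7, 25]
    events = [1, 0, 1, 1, 1, 0]
    groups = split_by_median(scores)
    for label in ("low", "high"):
        idx = [i for i, g in enumerate(groups) if g == label]
        curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
        print(label, curve)
```

Comparing the two resulting curves, typically with a log-rank test or a Cox model as the abstract mentions, is what yields the hazard ratios and p-values reported for each signature.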

