Identification of population-level differentially expressed genes in one-phenotype data

2020 ◽  
Vol 36 (15) ◽  
pp. 4283-4290
Author(s):  
Jiajing Xie ◽  
Yang Xu ◽  
Haifeng Chen ◽  
Meirong Chi ◽  
Jun He ◽  
...  

Abstract Motivation For some specific tissues, such as the heart and brain, normal controls are difficult to obtain. Thus, studies with only a particular type of disease samples (one phenotype) cannot be analyzed using common methods, such as significance analysis of microarrays, edgeR and limma. The RankComp algorithm, which was mainly developed to identify individual-level differentially expressed genes (DEGs), can be applied to identify population-level DEGs for the one-phenotype data but cannot identify the dysregulation directions of DEGs. Results Here, we optimized the RankComp algorithm, termed PhenoComp. Compared with RankComp, PhenoComp provided the dysregulation directions of DEGs and had more robust detection power in both simulated and real one-phenotype data. Moreover, using the DEGs detected by common methods as the ‘gold standard’, the results showed that the DEGs detected by PhenoComp using only one-phenotype data were comparable to those identified by common methods using case-control samples, independent of the measurement platform. PhenoComp also exhibited good performance for weakly differential expression signal data. Availability and implementation The PhenoComp algorithm is available on the web at https://github.com/XJJ-student/PhenoComp. Supplementary information Supplementary data are available at Bioinformatics online.

Author(s):  
Peter Ebert ◽  
Marcel H Schulz

Abstract Motivation The generation of genome-wide maps of histone modifications using chromatin immunoprecipitation sequencing (ChIP-seq) is a standard approach to dissect the complexity of the epigenome. Interpretation and differential analysis of histone datasets remain challenging due to regulatorily meaningful co-occurrences of histone marks and their difference in genomic spread. To ease interpretation, chromatin state segmentation maps are a commonly employed abstraction combining individual histone marks. We developed the tool SCIDDO as a fast, flexible, and statistically sound method for the differential analysis of chromatin state segmentation maps. Results We demonstrate the utility of SCIDDO in a comparative analysis that identifies differential chromatin domains (DCD) in various regulatory contexts and with only moderate computational resources. We show that the identified DCDs correlate well with observed changes in gene expression and can recover a substantial number of differentially expressed genes. We showcase SCIDDO’s ability to directly interrogate chromatin dynamics such as enhancer switches in downstream analysis, which simplifies exploring specific questions about regulatory changes in chromatin. By comparing SCIDDO to competing methods, we provide evidence that SCIDDO’s performance in identifying differentially expressed genes (DEG) via differential chromatin marking is more stable across a range of cell-type comparisons and parameter cut-offs. Availability The SCIDDO source code is openly available at github.com/ptrebert/sciddo. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Nick Strayer ◽  
Jana K Shirey-Rice ◽  
Yu Shyr ◽  
Joshua C Denny ◽  
Jill M Pulley ◽  
...  

Abstract Summary Electronic health records (EHRs) linked with a DNA biobank provide unprecedented opportunities for biomedical research in precision medicine. The Phenome-wide association study (PheWAS) is a widely used technique for the evaluation of relationships between genetic variants and a large collection of clinical phenotypes recorded in EHRs. PheWAS analyses are typically presented as static tables and charts of summary statistics obtained from statistical tests of association between a genetic variant and individual phenotypes. Comorbidities are common and typically lead to complex, multivariate gene–disease association signals that are challenging to interpret. Discovering and interrogating multimorbidity patterns and their influence in PheWAS is difficult and time-consuming. We present PheWAS-ME: an interactive dashboard to visualize individual-level genotype and phenotype data side-by-side with PheWAS analysis results, allowing researchers to explore multimorbidity patterns and their associations with a genetic variant of interest. We expect this application to enrich PheWAS analyses by illuminating clinical multimorbidity patterns present in the data. Availability and implementation A demo PheWAS-ME application is publicly available at https://prod.tbilab.org/phewas_me/. Sample datasets are provided for exploration with the option to upload custom PheWAS results and corresponding individual-level data. Online versions of the appendices are available at https://prod.tbilab.org/phewas_me_info/. The source code is available as an R package on GitHub (https://github.com/tbilab/multimorbidity_explorer). Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Ling-Yun Chen ◽  
Diego F. Morales-Briones ◽  
Courtney N. Passow ◽  
Ya Yang

Abstract Motivation The quality of gene expression analyses using de novo assembled transcripts in species that experienced recent polyploidization is as yet unexplored. Results Five plant species with various polyploidy histories were used for differential gene expression (DGE) analyses. DGE analyses using putative genes inferred by Trinity performed similarly to or better than Corset and Grouper in precision, but lower in sensitivity. In species that lack a polyploidy event in the past few million years, DGE analyses using the de novo assembled transcriptome identified 50–76% of the differentially expressed genes recovered by mapping reads to the reference genes. However, in species with a more recent polyploidy event, the percentage decreased to 7–30%. In addition, 7–89% of differentially expressed genes from de novo assembly are contaminations. Gene co-expression network analyses using de novo assemblies vs. mapping to the reference genes recovered the same module that significantly correlated with treatment in one of the five species tested. Availability and implementation Commands and scripts used in this study are available at https://bitbucket.org/lychen83/chen_et_al_2018_benchmark_dge/; analysis files are available at Dryad doi: [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Jiyeon Kim Denninger ◽  
Logan A Walker ◽  
Xi Chen ◽  
Altan M Turkoglu ◽  
Alexander Pan ◽  
...  

Multipotent neural stem cells (NSCs) are found in several isolated niches of the adult mammalian brain where they have unique potential to assist in tissue repair. Modern transcriptomics offer high-throughput methods for identifying disease- or injury-associated gene expression signatures in endogenous adult NSCs, but they require adaptation to accommodate the rarity of NSCs. Bulk RNA sequencing (RNAseq) of NSCs requires pooling several mice, which impedes application to labor-intensive injury models. Alternatively, single cell RNAseq can profile hundreds to thousands of cells from a single mouse and is increasingly used to study NSCs. The consequences of the low RNA input from a single NSC on downstream identification of differentially expressed genes (DEGs) remain largely unexplored. Here, to clarify the role that low RNA input plays in NSC DEG identification, we directly compared DEGs in an oxidative stress model of cultured NSCs by bulk and single cell sequencing. While both methods yielded DEGs that were replicable, single cell sequencing DEGs derived from genes with higher relative transcript counts compared to all detected genes and exhibited smaller fold changes than DEGs identified by bulk RNAseq. The loss of high fold-change DEGs in the single cell platform presents an important limitation for identifying disease-relevant genes. To facilitate identification of such genes, we determined an RNA-input threshold that enables transcriptional profiling of NSCs comparable to standard bulk sequencing and used it to establish a workflow for in vivo profiling of endogenous NSCs. We then applied this workflow to identify DEGs after lateral fluid percussion injury, a labor-intensive animal model of traumatic brain injury. Our work suggests that single cell RNA sequencing may underestimate the diversity of pathologic DEGs, but that population-level transcriptomic analysis can be adapted to capture more of these DEGs with similar efficacy and diversity as standard bulk sequencing. Together, our data and workflow will be useful for investigators interested in understanding and manipulating adult hippocampal NSC responses to various stimuli.


Author(s):  
Silver A Wolf ◽  
Lennard Epping ◽  
Sandro Andreotti ◽  
Knut Reinert ◽  
Torsten Semmler

Abstract Summary RNA-sequencing (RNA-Seq) is the current method of choice for studying bacterial transcriptomes. To date, many computational pipelines have been developed to predict differentially expressed genes from RNA-Seq data, but no gold-standard has been widely accepted. We present the Snakemake-based tool Smart Consensus Of RNA Expression (SCORE) which uses a consensus approach founded on a selection of well-established tools for differential gene expression analysis. This allows SCORE to increase the overall prediction accuracy and to merge varying results into a single, human-readable output. SCORE performs all steps for the analysis of bacterial RNA-Seq data, from read preprocessing to the overrepresentation analysis of significantly associated ontologies. Development of consensus approaches like SCORE will help to streamline future RNA-Seq workflows and will fundamentally contribute to the creation of new gold-standards for the analysis of these types of data. Availability and implementation https://github.com/SiWolf/SCORE. Supplementary information Supplementary data are available at Bioinformatics online.
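The consensus idea described in this abstract can be illustrated with a simple majority vote over the gene lists returned by several differential expression tools. This is only a sketch under an assumed voting rule: the abstract does not state how SCORE combines its tools, and the function name, the `min_votes` parameter, and the tool labels below are hypothetical.

```python
from collections import Counter


def consensus_degs(calls_by_tool, min_votes=None):
    """Return genes called differentially expressed by at least min_votes tools.

    calls_by_tool maps a tool label to the collection of genes it reported.
    The default threshold is a strict majority of the tools (an assumption,
    not the rule used by any particular pipeline).
    """
    n_tools = len(calls_by_tool)
    if min_votes is None:
        min_votes = n_tools // 2 + 1
    # Count each gene at most once per tool.
    votes = Counter(g for genes in calls_by_tool.values() for g in set(genes))
    return sorted(g for g, v in votes.items() if v >= min_votes)


# Hypothetical tool labels and gene identifiers, for illustration only.
calls = {
    "tool_a": {"g1", "g2"},
    "tool_b": {"g2", "g3"},
    "tool_c": {"g1", "g2"},
}
print(consensus_degs(calls))
```

Raising `min_votes` to the number of tools would keep only genes on which every tool agrees, trading sensitivity for precision.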


2020 ◽  
Vol 2 (Supplement_2) ◽  
pp. ii8-ii8
Author(s):  
Mario Henriquez ◽  
Wei Huff ◽  
Jack Shireman ◽  
Gina Monaco ◽  
Namita Agrawal ◽  
...  

Abstract BACKGROUND Stereotactic radiosurgery (SRS) is an increasingly common modality used with or without surgery for the treatment of brain metastases (BM). However, the effects of SRS on tumors in vivo are unknown. METHODS Patients were treated with SRS prior to surgery as per clinical trial NCT03398694. Resected tumor was divided into two groups: ‘center’ and ‘periphery’ with respect to SRS treatment. Tissues were analyzed by DNA and RNA sequencing and compared between the two groups and to non-radiated tumor. RESULTS DNA analysis showed that, at the individual level, matched SRS samples from the center and periphery of the same tumor differed in mutational burden. RNA analysis revealed no differentially expressed genes between center and periphery, but there were 62 and 192 differentially expressed genes between the center or periphery and non-radiated control, respectively. At an individual level, matched center and periphery tumor had an average of 16641 differentially expressed genes. Comparing the total number of up- and downregulated genes with SNP and Indel mutations of matched patient samples, patients with higher mutational burdens in peripheral tumors than in center tumors had a higher number of upregulated genes. Conversely, when mutation burden was higher in center tumor, the total numbers of up- and downregulated genes were about the same. Pooled analysis revealed significant downregulation of oncogenes, such as TP63 and RECQL4, in the SRS group. DO enrichment analysis also revealed pathways related to NSCLC and lung carcinoma significantly altered in the radiation cohort. CONCLUSION In summary, this study demonstrates that SRS alters the molecular and genomic profile of NSCLC BM. It results in downregulation of oncogenes and pathways related to lung cancer. Additionally, sampling the tumor at the center and periphery shows differential effects of the dose gradient on the cellular and molecular response to ionizing radiation.


Author(s):  
Yixin Guo ◽  
Ziwei Xue ◽  
Ruihong Yuan ◽  
Jingyi Jessica Li ◽  
William A Pastor ◽  
...  

Abstract Summary With the advance of genomic sequencing techniques, chromatin accessible regions, transcription factor binding sites and epigenetic modifications can be identified at genome-wide scale. Conventional analyses focus on gene regulation at proximal regions; distal regions usually receive less attention, largely due to the lack of reliable tools to link these regions to coding genes. In this study, we introduce RAD (Region Associated Differentially expressed genes), a user-friendly web tool to identify both proximal and distal region associated differentially expressed genes (DEGs). With DEGs and genomic regions of interest (gROI) as input, RAD maps the up- and down-regulated genes associated with any gROI and helps researchers to infer the regulatory function of these regions based on the distance of gROI to differentially expressed genes. RAD includes visualization of the results and statistical inference for significance. Availability and implementation RAD is implemented with Python 3.7 and runs on an Nginx server. RAD is freely available at https://labw.org/rad as an online web service. Supplementary information Supplementary data are available at Bioinformatics online.
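The distance-based association mentioned in this abstract can be sketched as follows. This is not RAD's implementation: the use of a single transcription start site (TSS) coordinate per gene, the tuple layouts, the `max_dist` cutoff, and all function names are assumptions made for illustration.

```python
def distance_to_region(tss, start, end):
    """Distance in bp from a TSS to a region; 0 if the TSS lies inside it."""
    if start <= tss <= end:
        return 0
    return min(abs(tss - start), abs(tss - end))


def degs_near_regions(regions, genes, max_dist):
    """List DEGs whose TSS lies within max_dist of any region of interest.

    regions: list of (chrom, start, end)
    genes:   list of (name, chrom, tss, direction), direction is "up" or "down"
    Returns (name, direction, distance) tuples in input order.
    """
    hits = []
    for chrom, start, end in regions:
        for name, g_chrom, tss, direction in genes:
            if g_chrom != chrom:
                continue  # only compare features on the same chromosome
            d = distance_to_region(tss, start, end)
            if d <= max_dist:
                hits.append((name, direction, d))
    return hits


# Hypothetical coordinates, for illustration only.
regions = [("chr1", 1000, 2000)]
genes = [
    ("A", "chr1", 1500, "up"),
    ("B", "chr1", 5000, "down"),
    ("C", "chr2", 1500, "up"),
    ("D", "chr1", 2500, "down"),
]
print(degs_near_regions(regions, genes, max_dist=1000))
```

Counting the "up" and "down" hits at increasing values of `max_dist` gives a rough picture of whether a set of regions tends to sit near activated or repressed genes.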


2019 ◽  
Vol 2 ◽  
pp. 63 ◽  
Author(s):  
Syed Shariyar Murtaza ◽  
Patrycja Kolpak ◽  
Ayse Bener ◽  
Prabhat Jha

Verbal autopsy (VA) deals with post-mortem surveys about deaths, mostly in low- and middle-income countries, where the majority of deaths occur at home rather than in a hospital, for retrospective assignment of causes of death (COD) and subsequently evidence-based health system strengthening. Automated algorithms for VA COD assignment have been developed and their performance has been assessed against physician and clinical diagnoses. Since the performance of automated classification methods remains low, we aimed to enhance the Naïve Bayes Classifier (NBC) algorithm to produce better ranked COD classifications on 26,766 deaths from four globally diverse VA datasets compared to some of the leading VA classification methods, namely Tariff, InterVA-4, InSilicoVA and NBC. We used a different strategy, by training multiple NBC algorithms using the one-against-all approach (OAA-NBC). To compare performance, we computed the cumulative cause-specific mortality fraction (CSMF) accuracies for population-level agreement from rank one to five COD classifications. To assess individual-level COD assignments, cumulative partially-chance corrected concordance (PCCC) and sensitivity were measured for up to five ranked classifications. Overall results show that OAA-NBC consistently assigns CODs that are the most similar to physician and clinical COD assignments compared to some of the leading algorithms based on the cumulative CSMF accuracy, PCCC and sensitivity scores. The results demonstrate that our approach improves the performance of classification (sensitivity) by between 6% and 8% compared with other VA algorithms. Population-level agreements for OAA-NBC and NBC were found to be similar to or higher than those of the other algorithms used in the experiments. Although OAA-NBC still requires improvement for individual-level COD assignment, the one-against-all approach improved its ability to assign CODs that more closely resemble physician or clinical COD classifications compared to some of the other leading VA classifiers.
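The population-level metric named in this abstract, CSMF accuracy, is commonly defined in the verbal autopsy literature as one minus the summed absolute difference between true and predicted cause fractions, divided by the largest possible error, 2(1 − min true CSMF). The sketch below assumes that definition and evaluates only the top-ranked cause; the cumulative rank-one-to-five variant used by the authors is not reproduced, and the function names are illustrative.

```python
from collections import Counter


def csmf(labels, causes):
    """Cause-specific mortality fraction: share of deaths assigned to each cause."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts.get(c, 0) / n for c in causes}


def csmf_accuracy(true_labels, pred_labels):
    """CSMF accuracy under the assumed definition given in the lead-in.

    1.0 means the predicted cause distribution matches the true one exactly;
    0.0 corresponds to the worst possible distribution.
    """
    causes = sorted(set(true_labels) | set(pred_labels))
    true_f = csmf(true_labels, causes)
    pred_f = csmf(pred_labels, causes)
    abs_err = sum(abs(true_f[c] - pred_f[c]) for c in causes)
    worst = 2 * (1 - min(true_f.values()))
    if worst == 0:
        return 1.0  # single cause that is trivially matched
    return 1 - abs_err / worst


# Toy labels, for illustration only.
true = ["a", "a", "b", "c"]
pred = ["a", "b", "b", "c"]
print(round(csmf_accuracy(true, pred), 4))
```

Because the metric compares distributions rather than individual assignments, two classifiers with different per-death errors can reach the same CSMF accuracy, which is why the abstract reports PCCC and sensitivity separately for individual-level agreement.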


2018 ◽  
Vol 2 ◽  
pp. 63
Author(s):  
Syed Shariyar Murtaza ◽  
Patrycja Kolpak ◽  
Ayse Bener ◽  
Prabhat Jha

Verbal autopsy (VA) deals with post-mortem surveys about deaths, mostly in low- and middle-income countries, where the majority of deaths occur at home rather than in a hospital, for retrospective assignment of causes of death (COD) and subsequently evidence-based health system strengthening. Automated algorithms for VA COD assignment have been developed and their performance has been assessed against physician and clinical diagnoses. Since the performance of automated classification methods remains low, we aimed to enhance the Naïve Bayes Classifier (NBC) algorithm to produce better ranked COD classifications on 26,766 deaths from four globally diverse VA datasets compared to some of the leading VA classification methods, namely Tariff, InterVA-4, InSilicoVA and NBC. We used a different strategy, by training multiple NBC algorithms using the one-against-all approach (OAA-NBC). To compare performance, we computed the cumulative cause-specific mortality fraction (CSMF) accuracies for population-level agreement from rank one to five COD classifications. To assess individual-level COD assignments, cumulative partially-chance corrected concordance (PCCC) and sensitivity were measured for up to five ranked classifications. Overall results show that OAA-NBC consistently assigns CODs that are the most similar to physician and clinical COD assignments compared to some of the leading algorithms based on the cumulative CSMF accuracy, PCCC and sensitivity scores. The results demonstrate that our approach improves the performance of classification (sensitivity) by 6% to 8% when compared against current leading VA classifiers. Population-level agreements for OAA-NBC and NBC were found to be similar to or higher than those of the other algorithms used in the experiments. Although OAA-NBC still requires improvement for individual-level COD assignment, the one-against-all approach improved its ability to assign CODs that more closely resemble physician or clinical COD classifications compared to some of the other leading VA classifiers.


mBio ◽  
2017 ◽  
Vol 8 (2) ◽  
Author(s):  
Meredith S. Wright ◽  
Michael R. Jacobs ◽  
Robert A. Bonomo ◽  
Mark D. Adams

ABSTRACT Acinetobacter baumannii is an increasingly common multidrug-resistant pathogen in health care settings. Although the genetic basis of antibiotic resistance mechanisms has been extensively studied, much less is known about how genetic variation contributes to other aspects of successful infections. Genetic changes that occur during host infection and treatment have the potential to remodel gene expression patterns related to resistance and pathogenesis. Longitudinal sets of multidrug-resistant A. baumannii isolates from eight patients were analyzed by RNA sequencing (RNA-seq) to identify differentially expressed genes and link them to genetic changes contributing to transcriptional variation at both within-patient and population levels. The number of differentially expressed genes among isolates from the same patient ranged from 26 (patient 588) to 145 (patient 475). Multiple patients had isolates with differential gene expression patterns related to mutations in the pmrAB and adeRS two-component regulatory system genes, as well as significant differences in genes related to antibiotic resistance, iron acquisition, amino acid metabolism, and surface-associated proteins. Population level analysis revealed 39 genetic regions with clade-specific differentially expressed genes, for which 19, 8, and 3 of these could be explained by insertion sequence mobilization, recombination-driven sequence variation, and intergenic mutations, respectively. Multiple types of mutations that arise during infection can significantly remodel the expression of genes that are known to be important in pathogenesis. IMPORTANCE Health care-associated multidrug-resistant Acinetobacter baumannii can cause persistent infections in patients, but bacterial cells must overcome host defenses and antibiotic therapies to do so. 
Genetic variation arises during host infection, and new mutations are often enriched in genes encoding transcriptional regulators, iron acquisition systems, and surface-associated structures. In this study, genetic variation was shown to result in transcriptome remodeling at the level of individual patients and across phylogenetic groups. Differentially expressed genes include those related to capsule modification, iron acquisition, type I pili, and antibiotic resistance. Population level transcriptional variation reflects genome dynamics over longer evolutionary time periods, and convergent transcriptional changes support the adaptive significance of these regions. Transcriptional changes can be attributed to multiple types of genomic change, but insertion sequence mobilization had a predominant effect. The transcriptional effects of mutations that arise during infection highlight the rapid adaptation of A. baumannii during host exposure.

