DEgenes Hunter - A Flexible R Pipeline for Automated RNA-seq Studies in Organisms without Reference Genome

2017 ◽  
Vol 3 (3) ◽  
pp. 31 ◽  
Author(s):  
Isabel González Gayte ◽  
Rocío Bautista Moreno ◽  
Pedro Seoane Zonjic ◽  
M. Gonzalo Claros

Differential gene expression based on RNA-seq is widely used. Bioinformatics skills are required, since no single algorithm is appropriate for all experimental designs. Moreover, when working with organisms without a reference genome, functional analysis is less than straightforward in most situations. DEgenes Hunter, an attempt to automate the process, is based on two independent scripts, one for differential expression and one for functional interpretation. Based on the replicates, the R script decides which of the edgeR, DESeq2, NOISeq and limma algorithms are appropriate. It performs quality control calculations and provides the prevalent, most reliable set of differentially expressed genes, and lists all other possible candidates for further functional interpretation. It also provides a combined P-value that allows differentially expressed genes to be ranked. It has been tested with synthetic and real-world datasets, showing in both cases ease of use and reliable results. With real data, DEgenes Hunter offers straightforward functional interpretation.
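The abstract mentions a combined P-value used to rank genes but does not say how it is computed. As an illustration only, the sketch below assumes Fisher's method, one common way to combine per-tool P-values; the function name and the gene identifiers are hypothetical. Note that P-values produced by different tools on the same counts are not independent, so the combined value is better read as a ranking score than as a calibrated significance level.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine p-values with Fisher's method (assumed here, not confirmed by the source).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    survival function has the closed form
        exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    so no external library is needed.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    tiny = 1e-300  # clamp to avoid log(0)
    half_stat = -sum(math.log(max(p, tiny)) for p in pvalues)  # equals X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical per-gene p-values reported by several tools.
per_gene = {
    "geneA": [0.001, 0.004, 0.010],
    "geneB": [0.200, 0.050, 0.300],
    "geneC": [0.030, 0.020, 0.045],
}
ranking = sorted(per_gene, key=lambda g: fisher_combined_pvalue(per_gene[g]))
```

Sorting by the combined value places genes supported by all tools ahead of genes that only one tool flags.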

2020 ◽  
Vol 26 (Supplement_1) ◽  
pp. S32-S32
Author(s):  
Reza Yarani ◽  
Oana Palasca ◽  
Nadezhda Tsankova Doncheva ◽  
Christian Anthon ◽  
Bartosz Pilecki ◽  
...  

Abstract Background Dextran sulfate sodium (DSS) ulcerative colitis (UC) murine models have long been used for in vivo studies. DSS is a negatively charged polysaccharide with colitogenic properties. Although the mechanisms by which DSS induces intestinal inflammation are not fully understood, there are several good reasons why the DSS chemical colitis model is widely used for investigating the immunopathogenesis of UC. These include strong phenotypic clinical manifestations which emulate numerous clinical and histopathological features of human UC, ease of use, low mortality rate and high reproducibility. Here, by using high-throughput RNA sequencing analysis, we set out to investigate the major predicted gene regulators (GRs) affected by differentially expressed genes in the DSS-treated UC model in order to obtain regulatory insights into the pathogenic mechanisms of UC development. Methods A DSS-induced mouse model of UC was established. Total RNA from colon tissue and blood of 3 healthy and 3 DSS-treated mice was extracted and sequenced by Illumina HiSeq 4000. Gene expression levels were obtained by mapping and quantification to the annotated mouse genome. Subsequently, differential gene expression analysis between DSS-treated and control mice both in colon and blood was performed. Ingenuity pathway analysis software (IPA®, Qiagen) was used to predict/identify major GRs affected by significantly differentially expressed genes (SDEGs, |FC| > 2, p ≤ 0.05) in both colon and blood. Results Our analysis revealed how many and which major GRs are affected in DSS-treated mice and also the direction of change as compared to healthy (untreated) mice. In colon, 595 activated and 198 inhibited major GRs (p-value of overlap ≤ 0.05) in relation to ∼3180 SDEGs were identified, while in blood, we identified 205 activated and 62 inhibited GRs (in relation to ∼650 SDEGs). Colon and blood share 181 activated and 41 inhibited GRs.
Identified GRs include transcription regulators, cytokines, transmembrane receptors and enzymes that mainly contribute to the development of inflammatory/immune responses. In colon and blood, the top 10 activated and inhibited regulators with the highest positive and negative activation z-score with target molecules as well as expression in the datasets are indicated in Figure 1a and 1b, respectively. Conclusion In this study, we analyzed linkage of GRs to SDEGs through coordinated expression and identified potential major regulators that have significant effect on UC pathogenic-related gene expression. These GRs seem to be the key regulators of transcriptomic changes induced by inflammation. These findings expand our molecular understanding of putative new targets that may be important in the pathophysiology of UC and provide biological insights into the observed expression changes between the UC and healthy controls.
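The SDEG criterion in the Methods (fold change beyond 2 in either direction, p ≤ 0.05) can be sketched as a simple filter. This is a minimal illustration, assuming the fold change is a linear treated/control ratio so that a 2-fold decrease is 0.5; the function name, gene labels and values are hypothetical and do not come from the study.

```python
import math


def is_sdeg(fold_change, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Return True if a gene passes both thresholds.

    fold_change is assumed to be a positive linear ratio (treated / control).
    Comparing |log2(FC)| with log2(cutoff) treats up- and down-regulation
    symmetrically: 4.0 and 0.25 are both "4-fold" changes.
    """
    return abs(math.log2(fold_change)) > math.log2(fc_cutoff) and p_value <= p_cutoff


# Hypothetical (fold change, p-value) pairs.
genes = {
    "gene1": (4.0, 0.010),   # up-regulated, significant
    "gene2": (0.25, 0.030),  # down-regulated, significant
    "gene3": (1.5, 0.001),   # significant but small change
    "gene4": (5.0, 0.200),   # large change but not significant
}
sdegs = sorted(name for name, (fc, p) in genes.items() if is_sdeg(fc, p))
```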


2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Yeye Zhao ◽  
Yuanqing Si ◽  
Longfei Mei ◽  
Jiadi Wu ◽  
Jing Shao ◽  
...  

Abstract Objectives The purpose of this experiment is to analyze transcriptome changes in Pseudomonas aeruginosa under the action of sodium houttuyfonate (SH) to reveal the possible mechanism by which SH inhibits P. aeruginosa. We analyzed these data in order to compare the transcriptomic differences of P. aeruginosa between SH treatment and blank control groups. Data description In this project, RNA-seq on the BGISEQ-500 platform was used to sequence the transcriptome of P. aeruginosa, and sequencing data of 8 samples of P. aeruginosa are generated as follows: SH treatment (SH1, SH2, SH3, SH4), negative control (Control 1, Control 2, Control 3, Control 4). Quality control is carried out on raw reads to determine whether the sequencing data is suitable for subsequent analysis. In total, 170.53 MB of transcriptome sequencing data is obtained. Then the filtered clean reads are aligned to the reference genome for a second round of quality control. After completion, 5938 genes are assembled from sequencing data. Further quantitative analysis of genes and screening of differentially expressed genes based on gene expression level reveals that there are 2047 significantly differentially expressed genes under SH treatment, including 368 up-regulated genes and 1679 down-regulated genes.


Author(s):  
Silver A Wolf ◽  
Lennard Epping ◽  
Sandro Andreotti ◽  
Knut Reinert ◽  
Torsten Semmler

Abstract Summary RNA-sequencing (RNA-Seq) is the current method of choice for studying bacterial transcriptomes. To date, many computational pipelines have been developed to predict differentially expressed genes from RNA-Seq data, but no gold-standard has been widely accepted. We present the Snakemake-based tool Smart Consensus Of RNA Expression (SCORE) which uses a consensus approach founded on a selection of well-established tools for differential gene expression analysis. This allows SCORE to increase the overall prediction accuracy and to merge varying results into a single, human-readable output. SCORE performs all steps for the analysis of bacterial RNA-Seq data, from read preprocessing to the overrepresentation analysis of significantly associated ontologies. Development of consensus approaches like SCORE will help to streamline future RNA-Seq workflows and will fundamentally contribute to the creation of new gold-standards for the analysis of these types of data. Availability and implementation https://github.com/SiWolf/SCORE. Supplementary information Supplementary data are available at Bioinformatics online.
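The abstract describes a consensus over several differential expression tools without stating the voting rule. The sketch below assumes a plain majority vote, which is one simple way to build such a consensus and is not taken from the SCORE code base; the tool labels and gene identifiers are hypothetical.

```python
from collections import Counter


def consensus_degs(calls_by_tool, min_votes=None):
    """Return genes called differentially expressed by at least min_votes tools.

    calls_by_tool maps a tool label to the collection of genes it reported.
    By default a strict majority of tools is required (an assumption made
    for this illustration).
    """
    n_tools = len(calls_by_tool)
    if min_votes is None:
        min_votes = n_tools // 2 + 1
    # set() ensures a tool cannot vote twice for the same gene.
    votes = Counter(g for genes in calls_by_tool.values() for g in set(genes))
    return sorted(g for g, n in votes.items() if n >= min_votes)


calls = {
    "toolA": {"g1", "g2"},
    "toolB": {"g2", "g3"},
    "toolC": {"g1", "g2"},
}
majority = consensus_degs(calls)               # genes seen by at least 2 of 3 tools
unanimous = consensus_degs(calls, min_votes=3)  # genes seen by every tool
```

Raising `min_votes` trades sensitivity for specificity: the unanimous set is smaller but each gene in it is supported by every tool.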


Viruses ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 2374
Author(s):  
Joanna Sajewicz-Krukowska ◽  
Jan Paweł Jastrzębski ◽  
Maciej Grzybek ◽  
Katarzyna Domańska-Blicharz ◽  
Karolina Tarasiuk ◽  
...  

Astrovirus infections pose a significant problem in the poultry industry, leading to multiple adverse effects such as decreased egg production, breeding disorders, poor weight gain, and even increased mortality. The commonly observed chicken astrovirus (CAstV) was recently reported to be responsible for the “white chicks syndrome” associated with increased embryo/chick mortality. CAstV-mediated pathogenesis in chickens occurs due to complex interactions between the infectious pathogen and the immune system. Many aspects of CAstV–chicken interactions remain unclear, and there is no information available regarding possible changes in gene expression in the chicken spleen in response to CAstV infection. We aim to investigate changes in gene expression triggered by CAstV infection. Ten 21-day-old SPF White Leghorn chickens were divided into two groups of five birds each. One group was inoculated with CAstV, and the other was used as the negative control. At 4 days post infection, spleen samples were collected and immediately frozen at −70 °C for RNA isolation. We analyzed the isolated RNA, using RNA-seq to generate transcriptional profiles of the chickens’ spleens and identify differentially expressed genes (DEGs). The RNA-seq findings were verified by quantitative reverse-transcription PCR (qRT-PCR). A total of 31,959 genes were identified in response to CAstV infection. Eventually, 45 DEGs (p-value < 0.05; log2 fold change > 1) were recognized in the spleen after CAstV infection (26 upregulated DEGs and 19 downregulated DEGs). qRT-PCR performed on four genes (IFIT5, OASL, RASD1, and DDX60) confirmed the RNA-seq results. The most differentially expressed genes encode putative IFN-induced CAstV restriction factors. Most DEGs were associated with the RIG-I-like signaling pathway or more generally with an innate antiviral response (upregulated: BLEC3, CMPK2, IFIT5, OASL, DDX60, and IFI6; downregulated: SPIK5, SELENOP, HSPA2, TMEM158, RASD1, and YWHAB).
The study provides a global analysis of host transcriptional changes that occur during CAstV infection in vivo and proves that, in the spleen, CAstV infection in chickens predominantly affects the cell cycle and immune signaling.


2018 ◽  
Author(s):  
Adam McDermaid ◽  
Brandon Monier ◽  
Jing Zhao ◽  
Qin Ma

Abstract Differential gene expression (DGE) is one of the most common applications of RNA-sequencing (RNA-seq) data. This process allows for the elucidation of differentially expressed genes (DEGs) across two or more conditions. Interpretation of the DGE results can be non-intuitive and time consuming due to the variety of formats based on the tool of choice and the numerous pieces of information provided in these results files. Here we present an R package, ViDGER (Visualization of Differential Gene Expression Results using R), which contains nine functions that generate information-rich visualizations for the interpretation of DGE results from three widely-used tools, Cuffdiff, DESeq2, and edgeR.


Plants ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 1011
Author(s):  
Junping Xu ◽  
Chang Ho Ahn ◽  
Ju Young Shin ◽  
Pil Man Park ◽  
Hye Ryun An ◽  
...  

Toluene is an industrial raw material and solvent that is found abundantly in everyday products. The amount of toluene vapor is one of the most important measurements for evaluating air quality. The toluene-scavenging ability of different plants has been evaluated, but the mechanism of plant response to toluene is only partially understood. In this study, we performed RNA sequencing (RNA-seq) analysis to detect differential gene expression in toluene-treated and untreated leaves of Ardisia pusilla. A total of 88,444 unigenes were identified by RNA-seq analysis, of which 49,623 were successfully annotated and 4101 were differentially expressed. Gene ontology analysis revealed several subcategories of genes related to toluene response, including cell part, cellular process, organelle, and metabolic processes. We mapped the main metabolic pathways of genes related to toluene response and found that the differentially expressed genes were mainly involved in glycolysis/gluconeogenesis, starch and sucrose metabolism, glycerophospholipid metabolism, carotenoid biosynthesis, phenylpropanoid biosynthesis, and flavonoid biosynthesis. In addition, 53 transcription factors belonging to 13 transcription factor families were identified. We verified 10 differentially expressed genes related to metabolic pathways using quantitative real-time PCR and found that the results of RNA-seq were positively correlated with them, indicating that the transcriptome data were reliable. This study provides insights into the metabolic pathways involved in toluene response in plants.


2020 ◽  
Vol 6 (4) ◽  
pp. 205521732097851
Author(s):  
IS Brorson ◽  
AM Eriksson ◽  
IS Leikfoss ◽  
V Vitelli ◽  
EG Celius ◽  
...  

Background Genetic and clinical observations have indicated that T cells are involved in MS pathology. There is little insight into how T cells are involved and whether or not these can be used as markers for MS. Objectives Analysis of the gene expression profiles of circulating CD8+ T cells of MS patients compared to healthy controls. Methods RNA from purified CD8+ T cells was sequenced and analyzed for differential gene expression. Pathway analyses of genes at several p-value cutoffs were performed to identify putative pathways involved. Results We identified 36 genes with significant differential gene expression in MS patients. Four genes reached at least 2-fold differences in expression. The majority of differentially expressed genes were more highly expressed in MS patients. Genes associated with MS in GWAS showed enrichment amongst the differentially expressed genes. We did not identify enrichment of specific pathways amongst the differentially expressed genes in MS patients. Conclusions CD8+ T cells of MS patients show differential gene expression, with predominantly higher activity of genes in MS patients. We do not identify specific biological pathways in our study. More detailed analysis of CD8+ T cells and subtypes of these may increase understanding of how T cells are involved in MS.


Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. 1972-1972
Author(s):  
Abderrahmane Bousta ◽  
Sabrina Bondu ◽  
Alexandre Houy ◽  
Nicolas Cagnard ◽  
Carine Lefevre ◽  
...  

Abstract Introduction SF3B1 hotspot mutations are associated with various cancers like uveal melanoma, chronic lymphocytic leukemia and myelodysplastic syndrome with ring sideroblasts (MDS-RS). These mutations affect RNA splicing by the use of alternative branchpoints resulting in an aberrant 3' splice site (ss) selection. RNA-sequencing (RNA-seq) analyzed to quantify exon-exon junctions identified aberrantly spliced transcripts in target genes, and half of them are predicted to be degraded by nonsense-mediated decay. For this reason, target genes in SF3B1-mutated MDS remain only partially characterized. In the present study, we performed deep RNA-seq analysis of bone marrow mononuclear cells in low/int-1 MDS with SF3B1 mutations to identify aberrant/cryptic splicing events among up or down-regulated gene sets. Methods SF3B1MUT MDS (n=21) were compared to 6 SF3B1WT cases and 5 controls. Analysis of RNA-seq read count was performed using the Voom method associated with the Limma empirical Bayes analysis pipeline (https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-2-r29). Up or downregulated gene sets were identified using Gene Set Enrichment Analysis (GSEA, false discovery rate<0.1). Gene expression profiling data (Affymetrix Hu2.0) were also available for 26/27 patient samples. TopHat (v2.0.6) was used to align the reads against the human reference genome Hg19 RefSeq (RNA sequences, GRCh37) downloaded from the UCSC Genome Browser (http://genome.ucsc.edu). Read counts for splicing junctions from junctions.bed TopHat output were considered for a differential analysis using DESeq2. Only alternative acceptor splice sites (two or more 3′ss with junctions to the same 5′ss) and alternative donor splice sites (two or more 5′ss with junctions to the same 3′ss) with P-values ≤ 10⁻⁵ (Benjamini-Hochberg) and absolute Log2 (fold change) ≥1 were considered.
Results Principal component analysis (PCA) clearly discriminated controls from patients, and patients according to the presence of a SF3B1 mutation. A set of 6971 genes was differentially expressed (P-value<0.05) between SF3B1MUT and SF3B1WT cases and allowed unsupervised clustering in two separate groups (Fig. 1). Distinct gene sets also discriminated SF3B1MUT or SF3B1WT from controls. Consistent with the increased amount of erythroblasts in MDS-RS bone marrows, a set of erythroid genes including several genes involved in the heme biosynthesis pathway (ALAD, UROS, ALAS2, UROD) was significantly enriched in SF3B1MUT samples. Genes selected for their involvement in the core iron-sulfur cluster mitochondrial machinery (FXN, BOLA3, FDXR, GLRX5, ISCA2, NFS1, ISCU), the iron binding and trafficking (SLC25A38, ABCB10, TFR2, SLC25A37, ABCB6, FAM132B, SLC25A39, FTH1) and the cellular iron homeostasis (ACO1, ACO2, GLRX3) were also significantly enriched (FDR<10% and nominal P-value<0.05) when input in GSEA. Moreover, other enriched gene sets were G2M checkpoint, MYC targets, oxidative phosphorylation and E2F targets. All of these observations were similarly obtained when analyzing Affymetrix data. Furthermore, SF3B1MUT samples with a K700E substitution harbored a specific pattern of deregulated genes, which allowed the ordering of SF3B1MUT samples according to the type of substitution. As previously reported by Alsafadi S et al. (2016), analysis of splice junctions using DESeq2 revealed an overall high level of differences between SF3B1MUT and SF3B1WT samples. Among more than 540 differentially spliced junctions, more than 80% involved an aberrant acceptor (3'ss) site. As determined by PCA, the top 50 genes associated with relevant aberrant junctions were linked to iron metabolism or erythropoiesis and differentially expressed between SF3B1MUT and SF3B1WT samples.
Conclusion In this study, we combined robust analyses of gene expression and aberrantly spliced transcript expression in MDS with SF3B1 mutation. By comparing SF3B1MUT versus SF3B1WT samples, we identified a set of deregulated genes in which both normally and aberrantly spliced transcripts were detected that could contribute to the physiopathology of MDS-RS. Figure 1. Hierarchical clustering and heatmap showing differentially expressed genes (P-value<0.05) between SF3B1MUT (n=21, black) and SF3B1WT samples (n=6, grey). Ref. Alsafadi S et al. Nat Commun. 2016 Feb 4;7:10615. Disclosures No relevant conflicts of interest to declare.
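The junction filter in the Methods combines a Benjamini-Hochberg adjusted P-value cutoff with an absolute log2 fold change cutoff. The sketch below shows the standard Benjamini-Hochberg step-up adjustment in plain Python together with such a filter. It is an illustration of the procedure only, not DESeq2's implementation, and the function names and example values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest), then a running
    minimum taken from the largest rank downward enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def keep_junction(adj_p, log2_fc, p_cutoff=1e-5, lfc_cutoff=1.0):
    """Apply the two thresholds quoted in the abstract."""
    return adj_p <= p_cutoff and abs(log2_fc) >= lfc_cutoff


# Hypothetical raw p-values and log2 fold changes for four junctions.
raw_p = [1e-9, 2e-6, 0.03, 4e-7]
log2_fc = [2.5, -1.2, 3.0, 0.4]
adj_p = benjamini_hochberg(raw_p)
selected = [i for i in range(len(raw_p)) if keep_junction(adj_p[i], log2_fc[i])]
```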


2020 ◽  
Author(s):  
Chanwoo Kim ◽  
Hanbin Lee ◽  
Juhee Jeong ◽  
Keehoon Jung ◽  
Buhm Han

Abstract A common approach to analyzing single-cell RNA-sequencing data is to cluster cells first and then identify differentially expressed genes based on the clustering result. However, clustering has an innate uncertainty and can be imperfect, undermining the reliability of differential expression analysis results. To overcome this challenge, we present MarcoPolo, a clustering-free approach to exploring differentially expressed genes. To find informative genes without clustering, MarcoPolo exploits the bimodality of gene expression to learn the group information of the cells with respect to the expression level directly from given data. Using simulations and real data analyses, we showed that our method puts biologically informative genes at higher ranks more accurately and robustly than other existing methods. As our method provides information on how cells can be grouped for each gene, it can help identify cell types that are not separated well in the standard clustering process. Our method can also be used as a feature selection method to improve the robustness against changes in the number of genes used in clustering.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yan Zhou ◽  
Bin Yang ◽  
Junhui Wang ◽  
Jiadi Zhu ◽  
Guoliang Tian

Abstract Background Identifying differentially expressed genes between the same or different species is an urgent demand for biological and medical research. For RNA-seq data, systematic technical effects and different sequencing depths are usually encountered when conducting experiments. Normalization is regarded as an essential step in the discovery of biologically important changes in expression. The present methods usually involve normalization of the data with a scaling factor, followed by detection of significant genes. However, more than one scaling factor may exist because of the complexity of real data. Consequently, methods that normalize data by a single scaling factor may deliver suboptimal performance or may not even work. The development of modern machine learning techniques has provided a new perspective regarding discrimination between differentially expressed (DE) and non-DE genes. However, in reality, the non-DE genes comprise only a small set and may contain housekeeping genes (in same species) or conserved orthologous genes (in different species). Therefore, the process of detecting DE genes can be formulated as a one-class classification problem, where only non-DE genes are observed, while DE genes are completely absent from the training data. Results In this study, we transform the problem to an outlier detection problem by treating DE genes as outliers, and we propose a scaling-free minimum enclosing ball (SFMEB) method to construct a smallest possible ball to contain the known non-DE genes in a feature space. The genes outside the minimum enclosing ball can then be naturally considered to be DE genes. Compared with the existing methods, the proposed SFMEB method does not require data normalization, which is particularly attractive when the RNA-seq data include more than one scaling factor. Furthermore, the SFMEB method could be easily extended to different species without normalization.
Conclusions Simulation studies demonstrate that the SFMEB method works well in a wide range of settings, especially when the data are heterogeneous or biological replicates. Analysis of the real data also supports the conclusion that the SFMEB method outperforms other existing competitors. The R package of the proposed method is available at https://bioconductor.org/packages/MEB.
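The idea of enclosing known non-DE genes in a ball and treating genes outside it as DE can be illustrated with a much simpler setting than the one in the paper. The sketch below assumes plain Euclidean space and uses the Badoiu-Clarkson iteration to approximate the minimum enclosing ball; it is not the SFMEB algorithm or the MEB package, and the function names and sample points are hypothetical.

```python
import math


def approx_min_enclosing_ball(points, iterations=1000):
    """Approximate the minimum enclosing ball with the Badoiu-Clarkson iteration.

    Starting from an arbitrary point, the center is repeatedly moved a
    fraction 1 / (t + 1) of the way toward the current farthest point.
    Returns (center, radius), where radius covers every input point.
    """
    center = list(points[0])
    for t in range(1, iterations + 1):
        far = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (t + 1) for c, f in zip(center, far)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius


def flag_outliers(candidates, center, radius):
    """Mark candidates lying strictly outside the ball (treated as DE here)."""
    return [math.dist(p, center) > radius for p in candidates]


# Hypothetical two-dimensional features for known non-DE genes.
non_de = [(-1.0, 0.0), (1.0, 0.0), (0.0, 0.5)]
center, radius = approx_min_enclosing_ball(non_de)
flags = flag_outliers([(0.2, 0.1), (3.0, 0.0)], center, radius)
```

Because the ball is fitted to the reference genes only, no cross-sample scaling factor enters the decision; the choice of features carries that burden instead.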

