scholarly journals Differential variation and expression analysis

2018 ◽  
Author(s):  
Haim Bar ◽  
Elizabeth D. Schifano

Abstract We propose an empirical Bayes approach using a three-component mixture model, the L2N model, that may be applied to detect both differential expression (mean) and variation. It consists of two log-normal components (L2) for the differentially expressed (dispersed) features (one component for under-expressed [dispersed] and the other for over-expressed [dispersed] features), and a single normal component (N) for the null features (i.e., non-differentially expressed [dispersed] features). Simulation results show that L2N can capture asymmetries in the numbers of over- and under-expressed (dispersed) features (e.g., genes) when they exist, can provide a better fit to data in which the distributions of the null and non-null features are not well-separated, but can also perform well under symmetry and separation. Thus the L2N model is particularly appealing when no a priori biological knowledge about the mixture density is available. The L2N model is implemented in an R package called DVX, for Differential Variation and eXpression analysis. The package also includes an implementation of differential expression analysis via the limma package, and a differential variation and expression analysis using a three-way normal mixture model. DVX is a user-friendly graphical interface implemented via the ‘Shiny’ package [6], so that a user is not required to have R programming knowledge. It offers a set of diagnostic plots, data transformation tools, and report generation in Microsoft Excel- and Word-compatible formats. The package is available on the web, at https://haim-bar.uconn.edu/software/DVX/.
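To make the three-component structure concrete, here is a minimal Python sketch of a mixture density with a normal null component and two log-normal non-null components, one of them mirrored onto the negative axis for under-expressed features. The function names, the mixing weights (`p_under`, `p_over`), and the parameters (`null_sigma`, `ln_mu`, `ln_sigma`) are assumptions made for illustration. They are not taken from the DVX package, whose actual parameterization and fitting procedure may differ.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a log-normal distribution at x (zero for x <= 0)."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def l2n_density(x, p_under, p_over, null_sigma=1.0, ln_mu=0.0, ln_sigma=0.5):
    """Illustrative three-component mixture density (assumed parameterization).

    null features:      normal, centred at zero
    over-expressed:     log-normal on the positive axis
    under-expressed:    the same log-normal mirrored onto the negative axis
    """
    p_null = 1.0 - p_under - p_over
    if min(p_under, p_over, p_null) < 0.0:
        raise ValueError("mixing weights must be non-negative and sum to 1")
    return (
        p_null * normal_pdf(x, 0.0, null_sigma)
        + p_over * lognormal_pdf(x, ln_mu, ln_sigma)
        + p_under * lognormal_pdf(-x, ln_mu, ln_sigma)
    )
```

Because the two non-null weights are separate parameters, a density of this shape can be asymmetric: setting `p_over` larger than `p_under` puts more mass on the positive side, which is the kind of imbalance the abstract says L2N can capture.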

2020 ◽  
Vol 11 ◽  
Author(s):  
Xiayi Liu ◽  
Zhou Wu ◽  
Junying Li ◽  
Haigang Bao ◽  
Changxin Wu

The feather rate phenotype in chicks, including early-feathering and late-feathering phenotypes, is widely used as a sexing system in the poultry industry. The objective of this study was to obtain candidate genes associated with the feather rate in Shouguang chickens. In the present study, we collected 56 blood samples and 12 hair follicle samples of flight feathers from female Shouguang chickens. Then we identified the chromosome region associated with the feather rate by genome-wide association analysis (GWAS). We also performed RNA sequencing and analyzed differentially expressed genes between the early-feathering and late-feathering phenotypes using HISAT2, StringTie, and DESeq2. We identified a genomic region of 10.0–13.0 Mb of chromosome Z, which is statistically associated with the feather rate of Shouguang chickens at one day old. After RNA sequencing analysis, 342 differentially expressed known genes between the early-feathering (EF) and late-feathering (LF) phenotypes were screened out, which were involved in epithelial cell differentiation, intermediate filament organization, protein serine kinase activity, peptidyl-serine phosphorylation, retinoic acid binding, and so on. The sperm flagellar 2 gene (SPEF2) and prolactin receptor (PRLR) gene were the only two overlapping genes between the results of GWAS and differential expression analysis, which implies that SPEF2 and PRLR are possible candidate genes for the formation of the chicken feathering phenotype in the present study. Our findings help to elucidate the molecular mechanism of the feather rate in chicks.


BMC Urology ◽  
2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Hongjian Wu ◽  
Wubing Jiang ◽  
Guanghua Ji ◽  
Rong Xu ◽  
Gaobo Zhou ◽  
...  

Abstract Background Bladder cancer (BC) is the second most frequent malignancy of the urinary system. The aim of this study was to identify key microRNAs (miRNAs) and hub genes associated with BC as well as analyse their targeted relationships. Methods According to the microRNA dataset GSE112264 and gene microarray dataset GSE52519, differentially expressed microRNAs (DEMs) and differentially expressed genes (DEGs) were obtained using the R limma software package. The FunRich software database was used to predict the miRNA-targeted genes. The overlapping common genes (OCGs) between miRNA-targeted genes and DEGs were screened to construct the PPI network. Then, gene ontology (GO) analysis was performed through the “clusterProfiler” and “org.Hs.eg.db” R packages. The differential expression analysis and hierarchical clustering of these hub genes were analysed through the GEPIA and UCSC Cancer Genomics Browser databases, respectively. KEGG pathway enrichment analyses of hub genes were performed through gene set enrichment analysis (GSEA). Results A total of 12 DEMs and 10 hub genes were identified. Differential expression analysis of the hub genes using the GEPIA database was consistent with the results for the UCSC Cancer Genomics Browser database. The results indicated that these hub genes were oncogenes, but VCL, TPM2, and TPM1 were tumour suppressor genes. The GSEA also showed that hub genes were most enriched in those pathways that were closely associated with tumour proliferation and apoptosis. Conclusions In this study, we built a miRNA-mRNA regulatory targeted network, which explores an understanding of the pathogenesis of cancer development and provides key evidence for novel targeted treatments for BC.


Vaccines ◽  
2021 ◽  
Vol 9 (10) ◽  
pp. 1056
Author(s):  
Aijaz Parray ◽  
Fayaz Ahmad Mir ◽  
Asmma Doudin ◽  
Ahmad Iskandarani ◽  
Ibn Mohammed Masud Danjuma ◽  
...  

There is a lack of predictive markers for early and rapid identification of disease progression in COVID-19 patients. Our study aims at identifying microRNAs (miRNAs)/small nucleolar RNAs (snoRNAs) as potential biomarkers of COVID-19 severity. Using differential expression analysis of microarray data (n = 29), we identified hsa-miR-1246, ACA40, hsa-miR-4532, hsa-miR-145-5p, and ACA18 as the top five differentially expressed transcripts in severe versus asymptomatic, and ACA40, hsa-miR-3609, ENSG00000212378 (SNORD78), hsa-miR-1231, hsa-miR-885-3p as the most significant five in severe versus mild cases. Moreover, we found that white blood cell (WBC) count, absolute neutrophil count (ANC), neutrophil (%), lymphocyte (%), red blood cell (RBC) count, hemoglobin, hematocrit, D-Dimer, and albumin are significantly correlated with the identified differentially expressed miRNAs and snoRNAs. We report a unique miRNA and snoRNA profile that is associated with a higher risk of severity in a cohort of SARS-CoV-2 infected patients. Altogether, we present a differential expression analysis of COVID-19-associated microRNA (miRNA)/small nucleolar RNA (snoRNA) signature, highlighting their importance in SARS-CoV-2 infection.


2017 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Katrijn De Paepe ◽  
Celine Everaert ◽  
Pieter Mestdagh ◽  
Olivier Thas ◽  
...  

Abstract Background Protein-coding RNAs (mRNA) have been the primary target of most transcriptome studies in the past, but in recent years, attention has expanded to include long non-coding RNAs (lncRNA). lncRNAs are typically expressed at low levels, and are inherently highly variable. This is a fundamental challenge for differential expression (DE) analysis. In this study, the performance of 14 popular tools for testing DE in RNA-seq data along with their normalization methods is comprehensively evaluated, with a particular focus on lncRNAs and low-abundance mRNAs. Results Thirteen performance metrics were used to evaluate DE tools and normalization methods using simulations and analyses of six diverse RNA-seq datasets. Non-parametric procedures are used to simulate gene expression data in such a way that realistic levels of expression and variability are preserved in the simulated data. Throughout the assessment, we kept track of the results for mRNA and lncRNA separately. All statistical models exhibited inferior performance for lncRNAs compared to mRNAs across all simulated scenarios and analysis of benchmark RNA-seq datasets. No single tool uniformly outperformed the others. Conclusion Overall, the linear modeling with empirical Bayes moderation (limma) and the nonparametric approach (SAMSeq) showed the best performance: good control of the false discovery rate (FDR) and reasonable sensitivity. However, for achieving a sensitivity of at least 50%, more than 80 samples are required when studying expression levels in realistic clinical settings such as in cancer research. About half of the methods showed severe excess of false discoveries, making these methods unreliable for differential expression analysis and jeopardizing reproducible science. The detailed results of our study can be consulted through a user-friendly web application, http://statapps.ugent.be/tools/AppDGE/
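The two headline metrics in this abstract, false discoveries and sensitivity, can be computed for a single simulated dataset once the truly DE genes are known. The sketch below is only an illustration: the function name `fdp_and_sensitivity` and the gene identifiers are hypothetical, and it returns the realized false discovery proportion for one run, whereas the FDR is the expectation of that proportion over repeated runs.

```python
def fdp_and_sensitivity(called, truly_de):
    """Return (false discovery proportion, sensitivity) for one set of DE calls.

    called:    gene IDs a tool reported as differentially expressed
    truly_de:  gene IDs that are differentially expressed by construction
               (known only in a simulation or benchmark)
    """
    called = set(called)
    truly_de = set(truly_de)

    true_positives = len(called & truly_de)
    false_positives = len(called - truly_de)

    # By convention the proportion is 0 when nothing is called.
    fdp = false_positives / len(called) if called else 0.0
    # Sensitivity is undefined when there are no truly DE genes.
    sensitivity = true_positives / len(truly_de) if truly_de else float("nan")
    return fdp, sensitivity
```

For example, if a tool calls four genes and two of them are among four truly DE genes, both the false discovery proportion and the sensitivity are 0.5.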


2021 ◽  
Author(s):  
Jordan W. Squair ◽  
Matthieu Gautier ◽  
Claudia Kathe ◽  
Mark A. Anderson ◽  
Nicholas D. James ◽  
...  

Differential expression analysis in single-cell transcriptomics enables the dissection of cell-type-specific responses to perturbations such as disease, trauma, or experimental manipulation. While many statistical methods are available to identify differentially expressed genes, the principles that distinguish these methods and their performance remain unclear. Here, we show that the relative performance of these methods is contingent on their ability to account for variation between biological replicates. Methods that ignore this inevitable variation are biased and prone to false discoveries. Indeed, the most widely used methods can discover hundreds of differentially expressed genes in the absence of biological differences. Our results suggest an urgent need for a paradigm shift in the methods used to perform differential expression analysis in single-cell data.


2014 ◽  
Author(s):  
Zong Hong Zhang ◽  
Dhanisha J. Jhaveri ◽  
Vikki M. Marshall ◽  
Denis C. Bauer ◽  
Janette Edson ◽  
...  

Recent advances in next-generation sequencing technology allow high-throughput cDNA sequencing (RNA-Seq) to be widely applied in transcriptomic studies, in particular for detecting differentially expressed genes between groups. Many software packages have been developed for the identification of differentially expressed genes (DEGs) between treatment groups based on RNA-Seq data. However, there is a lack of consensus on how to approach an optimal study design and choice of suitable software for the analysis. In this comparative study we evaluate the performance of three of the most frequently used software tools: Cufflinks-Cuffdiff2, DESeq and edgeR. A number of important parameters of RNA-Seq technology were taken into consideration, including the number of replicates, sequencing depth, and balanced vs. unbalanced sequencing depth within and between groups. We benchmarked results relative to sets of DEGs identified through either quantitative RT-PCR or microarray. We observed that edgeR performs slightly better than DESeq and Cuffdiff2 in terms of the ability to uncover true positives. Overall, DESeq or taking the intersection of DEGs from two or more tools is recommended if the number of false positives is a major concern in the study. In other circumstances, edgeR is slightly preferable for differential expression analysis at the expense of potentially introducing more false positives.
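The recommendation to take the intersection of DEGs from two or more tools can be sketched as a simple voting rule over per-tool gene lists. This is a minimal illustration, not the study's own code: the function name `consensus_degs` and its `min_tools` parameter are hypothetical, and each tool's DEG list is assumed to have been exported beforehand.

```python
from collections import Counter


def consensus_degs(calls_by_tool, min_tools=None):
    """Return genes reported as DE by at least `min_tools` tools.

    calls_by_tool maps a tool name to the iterable of gene IDs it called.
    With min_tools=None every tool must agree, i.e. a plain intersection.
    """
    if not calls_by_tool:
        return set()
    if min_tools is None:
        min_tools = len(calls_by_tool)

    votes = Counter()
    for genes in calls_by_tool.values():
        votes.update(set(genes))  # set() so each tool votes once per gene
    return {gene for gene, n in votes.items() if n >= min_tools}
```

Requiring agreement from all tools is the strictest setting and trades sensitivity for fewer false positives. Lowering `min_tools` relaxes that trade-off.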


2020 ◽  
Author(s):  
Diana Lobo ◽  
Raquel Godinho ◽  
John Archer

Abstract Background In the last decades, the evolution of RNA-Seq has yielded archived datasets that possess the potential for providing unprecedented inter-study insight into transcriptome evolution, once background noise has been reduced. Here we present a method to quantify intra-condition variation and to remove reference-based transcripts associated with highly variable read counts, prior to differential expression analysis. The method utilizes variation within pairwise distances between normalized read counts for each transcript across all included samples of a given condition. As a case study, we demonstrate our approach at an inter- and intra-study level using RNA-seq data from brain samples of dogs, wolves, and two strains of fox (aggressive and tame) prior to performing differential expression analysis to identify common genes associated with tame behaviour. Results By applying our method, the distribution of the gene-wise dispersion estimates improved and the number of outliers detected in differential expression analysis decreased. Several genes that initially were differentially expressed in the non-filtered datasets were removed due to high intra-condition variation. Additionally, by optimizing the detection of differentially expressed transcripts, the overall number increased between dogs vs wolves and tame vs aggressive foxes when compared to the non-filtered datasets. Using these filtered sets, we found common overexpressed genes in dogs and tame foxes, including those involved in brain development, neurotransmission and immunity, factors known to be involved in domestication. Conclusions We presented a method to quantify and remove intra-condition variation from RNA-seq count data and demonstrated its use in improving the distribution of gene-wise dispersion estimates and, ultimately, reducing the number of false positives in differential gene expression analysis. We provide the method as a freely available tool, to aid studies using RNA-seq to calculate and characterize the variation present within data prior to performing differential expression analysis. Additionally, we identify candidate genes involved with selection for tameness, which seems to have played a crucial role in canine domestication.
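The general idea of filtering transcripts by the variation in pairwise distances between normalized read counts within one condition can be sketched as follows. This is not the authors' tool. The summary statistic used here (standard deviation of the pairwise absolute differences, scaled by the mean count), the function names, and the `max_spread` threshold are all assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev


def pairwise_distance_spread(counts):
    """Spread of pairwise distances between samples for one transcript.

    counts: normalized read counts for one transcript, one value per
            sample, all from the same condition.
    Returns the standard deviation of all pairwise absolute differences,
    divided by the mean count (an assumed, illustrative statistic).
    """
    distances = [abs(a - b) for a, b in combinations(counts, 2)]
    if len(distances) < 2:
        return 0.0  # fewer than three samples: no spread to measure
    centre = mean(counts)
    if centre == 0:
        return 0.0  # transcript not expressed in this condition
    return pstdev(distances) / centre


def filter_variable_transcripts(norm_counts, max_spread):
    """Keep transcripts whose intra-condition spread is at most max_spread.

    norm_counts maps a transcript ID to its list of normalized counts
    within a single condition.
    """
    return {
        transcript: counts
        for transcript, counts in norm_counts.items()
        if pairwise_distance_spread(counts) <= max_spread
    }
```

In practice a filter like this would be applied to each condition separately before the remaining transcripts are passed to the differential expression step.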


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Matthew Chung ◽  
Vincent M. Bruno ◽  
David A. Rasko ◽  
Christina A. Cuomo ◽  
José F. Muñoz ◽  
...  

Abstract Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.

