Effects of technical noise on bulk RNA-seq differential gene expression inference

2019 ◽  
Author(s):  
Dylan Sheerin ◽  
Daniel O’Connor ◽  
Andrew J Pollard ◽  
Irina Mohorianu

Abstract
Motivation: Inconsistent, analytical noise introduced either by the sequencing technology or by the choice of read-processing tools can bias bulk RNA-seq analyses by shifting the focus to variation in the expression of low-abundance transcripts; as a consequence, these highly variable genes are often included in the differential expression (DE) call and affect the interpretation of results.
Results: To illustrate the effects of “noise”, we present simulated datasets that closely follow the characteristics of an H. sapiens and an M. musculus dataset, respectively, highlighting the extent of technical noise in both a high inter-individual variability (H. sapiens) and a reduced variability (M. musculus) setup. The sequencing-induced noise is assessed using correlations of distributions of expression across transcripts; analytical noise is evaluated through side-by-side comparisons of several standard choices. The proportion of genes in the noise range differs for each tool combination. Data-driven, sample-specific noise thresholds were applied to reduce the impact of low-level variation. Noise adjustment reduced the number of significantly DE genes and gave rise to convergent calls across tool combinations.
Availability: The code for determining the sequence-derived noise is available for download from https://github.com/yry/noiseAnalysis/tree/master/noiseDetection_mRNA; the code for running the analysis is available for download from https://github.com/sheerind/noise_detection.
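To make the idea of a sample-specific noise threshold concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple flooring rule (every count below the threshold is raised to the threshold) and uses hypothetical function names and toy counts; how the threshold itself is derived from the data is not reproduced here.

```python
from math import log2


def apply_noise_threshold(counts, threshold):
    """Raise every count below `threshold` up to `threshold` (assumed flooring rule)."""
    return [max(count, threshold) for count in counts]


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of treated over control; the pseudocount avoids division by zero."""
    return log2((treated + pseudocount) / (control + pseudocount))


# Toy counts for two genes: one low-abundance, one well expressed.
control = [1, 100]
treated = [7, 400]

raw_lfc = [log2_fold_change(c, t) for c, t in zip(control, treated)]

# Hypothetical thresholds; in practice each sample would get its own data-driven value.
control_adj = apply_noise_threshold(control, threshold=10)
treated_adj = apply_noise_threshold(treated, threshold=10)
adj_lfc = [log2_fold_change(c, t) for c, t in zip(control_adj, treated_adj)]

print(raw_lfc)  # the low-abundance gene shows a large fold change
print(adj_lfc)  # after flooring, only the well-expressed gene keeps its fold change
```

Under this assumption, a gene whose counts sit entirely inside the noise range no longer contributes an apparent fold change, while genes above the threshold are left untouched.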

2020 ◽  
Author(s):  
Paulina G. Eusebi ◽  
Natalia Sevane ◽  
Thomas O’Rourke ◽  
Manuel Pizarro ◽  
Cedric Boeckx ◽  
...  

Abstract
Aggressiveness is one of the most basic behaviors, characterized by targeted intentional actions oriented to cause harm. The reactive type of aggression is regulated mostly by the brain’s prefrontal cortex; however, the molecular changes underlying aggressiveness in adults have not been fully characterized. Here we used an RNA-seq approach to investigate differential gene expression in the prefrontal cortex of bovines from the aggressive Lidia breed at different age stages: young three-year-old and adult four-year-old bulls. A total of 50 up-regulated and 193 down-regulated genes were identified in the adult group. Furthermore, a cross-species comparative analysis retrieved 29 genes in common with previous studies on aggressive behaviors, representing an above-chance overlap with the differentially expressed genes in adult bulls. In particular, we detected changes in the regulation of networks such as synaptogenesis, involved in the maintenance and refinement of synapses, and the glutamate receptor pathway, which acts as an excitatory driver in aggressive responses. Our results provide insights into candidate genes and networks involved in the molecular mechanisms leading to the maturation of the brain. The reduced reactive aggression typical of domestication has been proposed to form part of a retention of juvenile traits as adults (neoteny). The significant age-associated differential expression of genes implicated in aggressive behaviors and the concomitant increase in Lidia cattle aggression validate this species as a novel model comparator to explore the impact of behavioral neoteny under domestication.


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 1255
Author(s):  
Breon Schmidt ◽  
Marek Cmero ◽  
Paul Ekert ◽  
Nadia Davidson ◽  
Alicia Oshlack

Visualisation of the transcriptome relative to a reference genome is fraught with sparsity. This is due to RNA sequencing (RNA-Seq) reads being predominantly mapped to exons, which account for just under 3% of the human genome. Recently, we have used exon-only references, superTranscripts, to improve visualisation of aligned RNA-Seq data through the omission of supposedly unexpressed regions such as introns. However, variation within these regions can lead to novel splicing events that may drive a pathogenic phenotype. In these cases, the loss of information from retaining only annotated exons presents significant drawbacks. Here we present Slinker, a bioinformatics pipeline written in Python and Bpipe that uses a data-driven approach to assemble sample-specific superTranscripts. At its core, Slinker uses Stringtie2 to assemble transcripts with any sequence across any gene. This assembly is merged with reference transcripts and converted to a superTranscript, from which rich visualisations are produced with Plotly, together with associated annotation and coverage information. Slinker was validated on five novel splicing events in rare disease samples from a cohort of primary muscular disorders. In addition, Slinker was shown to be effective in visualising deletion events within transcriptomes of tumour samples in the important leukemia gene, IKZF1. Slinker offers a succinct visualisation of RNA-Seq alignments across typically sparse regions and is freely available on GitHub.


2018 ◽  
Author(s):  
Adam McDermaid ◽  
Xin Chen ◽  
Yiran Zhang ◽  
Juan Xie ◽  
Cankun Wang ◽  
...  

Abstract
Motivation: One of the main benefits of modern RNA-sequencing (RNA-Seq) technology is more accurate gene expression estimation compared with previous generations of expression data, such as microarrays. However, numerous issues can cause an RNA-Seq read to map to multiple locations on the reference genome with the same alignment score, which occurs in plant, animal, and metagenome samples. Such a read is called a multiple-mapping read (MMR). The impact of these MMRs is reflected in gene expression estimation and all downstream analyses, including differential gene expression, functional enrichment, etc. Current analysis pipelines lack the tools to effectively test the reliability of gene expression estimations and thus cannot ensure the validity of downstream analyses.
Results: Our investigation into 95 RNA-Seq datasets from seven species (totaling 1,951 GB) indicates that, on average, roughly 22% of all reads are MMRs for plant and animal species. Here we present a tool called GeneQC (Gene expression Quality Control), which can accurately estimate the reliability of each gene’s expression level. The underlying algorithm is based on extracted genomic and transcriptomic features, which are combined using elastic-net regularization and mixture model fitting to provide a clearer picture of mapping uncertainty for each gene. GeneQC allows researchers to determine reliable expression estimations and conduct further analysis on the gene expression that is of sufficient quality. This tool also enables researchers to investigate continued re-alignment methods to determine more accurate gene expression estimates for those with low reliability.
Availability: GeneQC is freely available at http://bmbl.sdstate.edu/GeneQC/
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
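The definition of a multiple-mapping read given above (a read whose best alignment score is shared by more than one location) can be sketched as follows. This is only an illustration of the definition, not GeneQC's algorithm; the data layout, function names, and toy alignments are assumptions made for the example.

```python
def is_multi_mapping(alignments):
    """Return True when the top alignment score is shared by two or more locations.

    `alignments` is a list of (location, score) pairs for a single read.
    """
    if not alignments:
        return False
    best = max(score for _, score in alignments)
    return sum(1 for _, score in alignments if score == best) > 1


def mmr_fraction(reads):
    """Fraction of mapped reads that are multiple-mapping reads (MMRs)."""
    mapped = [alignments for alignments in reads.values() if alignments]
    if not mapped:
        return 0.0
    return sum(is_multi_mapping(a) for a in mapped) / len(mapped)


# Toy input: read id -> candidate alignments as (location, score).
reads = {
    "r1": [("chr1:100", 60)],
    "r2": [("chr1:500", 60), ("chr2:900", 60)],  # tie at the best score: an MMR
    "r3": [("chr3:10", 60), ("chr3:700", 42)],   # unique best hit: not an MMR
    "r4": [("chr4:250", 55)],
}

print(mmr_fraction(reads))
```

Restricting the same tally to the reads assigned to one gene would give a per-gene MMR proportion, which is one intuitive indicator of how uncertain that gene's expression estimate may be.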


2020 ◽  
Author(s):  
Davide Risso ◽  
Stefano M. Pagnotta

Abstract
Motivation: Data transformations are an important step in the analysis of RNA-seq data. Nonetheless, the impact of transformations on the outcome of unsupervised clustering procedures is still unclear.
Results: Here, we present an Asymmetric Winsorization per Sample Transformation (AWST), which is robust to data perturbations and removes the need for selecting the most informative genes prior to sample clustering. Our procedure leads to robust and biologically meaningful clusters both in bulk and in single-cell applications.
Availability: The AWST method is available at https://github.com/drisso/awst. The code to reproduce the analyses is available at https://github.com/drisso/awst_analysis.
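For readers unfamiliar with winsorization, the sketch below shows a generic one-sided (upper-tail) version applied to a single sample: values above a chosen quantile are capped at that quantile, while the lower tail is left alone. It is not the AWST procedure itself, whose details are not given in the abstract; the function name, the nearest-rank quantile rule, and the toy values are assumptions.

```python
from math import ceil


def winsorize_upper(values, upper_quantile):
    """Cap values above the given quantile (nearest-rank rule); leave the lower tail as is."""
    if not values:
        return []
    ordered = sorted(values)
    # Nearest-rank quantile: the smallest value with at least `upper_quantile` of the data at or below it.
    rank = min(len(ordered), max(1, ceil(upper_quantile * len(ordered))))
    cap = ordered[rank - 1]
    return [min(value, cap) for value in values]


# One sample with a single extreme expression value.
sample = [5, 1, 100, 3]
print(winsorize_upper(sample, upper_quantile=0.75))
```

Applying the cap per sample, rather than across the whole matrix, keeps one outlying sample from dictating the cap for all others.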


2016 ◽  
Author(s):  
Atray Dixit

Abstract
As part of the process of preparing scRNA-seq libraries, a diverse template is typically amplified by PCR. During amplification, spurious chimeric molecules can be formed between molecules originating in different cells. While several computational and experimental strategies have been suggested to mitigate the impact of chimeric molecules, they have not been addressed in the context of scRNA-seq experiments. We demonstrate that chimeras become increasingly problematic as samples are sequenced deeply and propose two computational solutions. The first is unsupervised and relies only on cell barcode and UMI information. The second is a supervised approach built on labeled data and a set of molecule-specific features. The classifier can accurately identify most of the contaminating molecules in a deeply sequenced species-mixing dataset. Code is publicly available at https://github.com/asncd/schimera.
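One plausible way an unsupervised filter could use only cell barcode and UMI information is sketched below. It assumes, as an illustration rather than a description of the published method, that a UMI and gene combination observed in several cells belongs to the cell holding most of its reads, and that minority copies are likely chimeras. The function name, the `min_fraction` cut-off, and the toy counts are hypothetical.

```python
from collections import defaultdict


def flag_chimeras(read_counts, min_fraction=0.8):
    """Flag (cell_barcode, umi, gene) molecules holding a minority of the reads for their (umi, gene).

    `read_counts` maps (cell_barcode, umi, gene) -> number of supporting reads.
    """
    totals = defaultdict(int)
    for (_, umi, gene), n_reads in read_counts.items():
        totals[(umi, gene)] += n_reads

    flagged = set()
    for (cell_barcode, umi, gene), n_reads in read_counts.items():
        if n_reads / totals[(umi, gene)] < min_fraction:
            flagged.add((cell_barcode, umi, gene))
    return flagged


# Toy data: UMI1/GeneX is seen in two cells, but almost all of its reads come from one of them.
read_counts = {
    ("AAAC", "UMI1", "GeneX"): 18,
    ("TTTG", "UMI1", "GeneX"): 2,
    ("TTTG", "UMI2", "GeneX"): 5,
}

print(flag_chimeras(read_counts))
```

With this rule, an even split between two cells flags both copies, which treats ambiguous molecules conservatively.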


Author(s):  
A K M Firoj Mahmud ◽  
Soumyadeep Nandi ◽  
Maria Fällman

Abstract
Summary: Since its introduction, RNA-seq technology has been used extensively in studies of pathogenic bacteria to identify and quantify differences in gene expression across multiple samples from bacteria exposed to different conditions. With some exceptions, the current tools for assessing gene expression have been designed around the structures of eukaryotic genes. There are a few stand-alone tools designed for prokaryotes, and they require improvement. A well-defined pipeline for prokaryotes that includes all the necessary tools for quality control, determination of differential gene expression, downstream pathway analysis, and normalization of data collected in extreme biological conditions is still lacking. Here we describe ProkSeq, a user-friendly, fully automated RNA-seq data analysis pipeline designed for prokaryotes. ProkSeq provides a wide variety of options for analysing differential expression, normalizing expression data, and visualizing data and results, and it produces publication-quality figures.
Availability and implementation: ProkSeq is implemented in Python and is published under the ISC open source license. The tool and a detailed user manual are hosted at Docker: https://hub.docker.com/repository/docker/snandids/prokseq-v2.1; Anaconda: https://anaconda.org/snandiDS/prokseq; GitHub: https://github.com/snandiDS/prokseq.


2021 ◽  
Vol 12 ◽  
Author(s):  
Zhijin Wu ◽  
Kenong Su ◽  
Hao Wu

Single-cell RNA-seq data, like data from other sequencing technologies, contain systematic technical noise. Such noise results from the combined effect of unequal efficiencies in the capturing and counting of mRNA molecules, such as extraction/amplification efficiency and sequencing depth. We show that such technical effects are not only cell-specific but also affect genes differently, so a simple cell-wise size factor adjustment may not be sufficient. We present a non-linear normalization approach that provides a cell- and gene-specific normalization factor for each gene in each cell. We show that the proposed normalization method (implemented in the “SC2P” package) reduces more technical variation than competing methods, without reducing biological variation. When technical effects such as sequencing depth are not balanced between cell populations, SC2P normalization also removes the bias due to uneven technical noise. This method is applicable to scRNA-seq experiments that do not use unique molecular identifiers (UMIs) and thus retain amplification biases.
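For reference, the simple cell-wise size factor adjustment that the abstract describes as potentially insufficient can be sketched as below. This is the baseline, not the SC2P method: it assumes a library-size definition of the size factor (cell total divided by the mean total), and the function names and toy matrix are illustrative only.

```python
def size_factors(matrix):
    """One scalar per cell: the cell's total count divided by the mean total across cells.

    `matrix` is a list of cells, each a list of per-gene counts; every cell is
    assumed to have a non-zero total.
    """
    totals = [sum(cell) for cell in matrix]
    mean_total = sum(totals) / len(totals)
    return [total / mean_total for total in totals]


def normalize(matrix):
    """Divide every gene count in a cell by that cell's single size factor."""
    factors = size_factors(matrix)
    return [[count / factor for count in cell] for cell, factor in zip(matrix, factors)]


# Two cells with the same composition; the second was sequenced twice as deeply.
matrix = [
    [10, 90],
    [20, 180],
]

print(size_factors(matrix))
print(normalize(matrix))
```

Because every gene in a cell is divided by the same scalar, this adjustment cannot correct technical effects that differ from gene to gene, which is the limitation a cell- and gene-specific factor is meant to address.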


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 143
Author(s):  
Travis L. Jensen ◽  
William F. Hooper ◽  
Sami R. Cherikh ◽  
Johannes B. Goll

Ribosomal profiling is an emerging experimental technology to measure protein synthesis by sequencing short mRNA fragments undergoing translation in ribosomes. Applied at the genome-wide scale, this is a powerful tool to profile global protein synthesis within cell populations of interest. Such information can be utilized for biomarker discovery and detection of treatment-responsive genes. However, analysis of ribosomal profiling data requires careful preprocessing to reduce the impact of artifacts, as well as dedicated statistical methods for visualizing and modeling the high-dimensional discrete read count data. Here we present Ribosomal Profiling Reports (RP-REP), new open-source, cloud-enabled software that allows users to execute start-to-end gene-level ribosomal profiling and RNA-Seq analysis on a pre-configured Amazon Virtual Machine Image (AMI) hosted on AWS or on the user’s own Ubuntu Linux server. The software works with FASTQ files stored locally, on AWS S3, or at the Sequence Read Archive (SRA). RP-REP automatically executes a series of customizable steps including filtering of contaminant RNA, enrichment of true ribosomal footprints, reference alignment and gene translation quantification, gene body coverage, CRAM compression, reference alignment QC, data normalization, multivariate data visualization, identification of differentially translated genes, and generation of heatmaps, co-translated gene clusters, enriched pathways, and other custom visualizations. RP-REP provides functionality to contrast RNA-Seq and ribosomal profiling results, and calculates translational efficiency per gene. The software outputs a PDF report and publication-ready table and figure files. As a use case, we provide RP-REP results for a dengue virus study that tested cytosol and endoplasmic reticulum cellular fractions of human Huh7 cells pre-infection and at 6 h, 12 h, 24 h, and 40 h post-infection.
Case study results, Ubuntu installation scripts, and the most recent RP-REP source code are accessible at GitHub. The cloud-ready AMI is available at AWS (AMI ID: RPREP RSEQREP (Ribosome Profiling and RNA-Seq Reports) v2.1 (ami-00b92f52d763145d3)).
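Translational efficiency per gene is commonly expressed as the ratio of ribosome footprint abundance to mRNA abundance, often on a log2 scale. The sketch below illustrates that common definition only; the exact formula used by RP-REP is not stated in the abstract, and the function name, pseudocount, and toy values are assumptions.

```python
from math import log2


def translational_efficiency(footprint_abundance, rna_abundance, pseudocount=1.0):
    """log2 ratio of normalized ribosome footprint abundance to normalized RNA-Seq abundance."""
    return log2((footprint_abundance + pseudocount) / (rna_abundance + pseudocount))


# Hypothetical normalized abundances for three genes: (ribosome footprints, RNA-Seq).
genes = {
    "geneA": (7.0, 1.0),    # more footprints than transcript level alone would suggest
    "geneB": (50.0, 50.0),  # footprints in line with transcript level
    "geneC": (1.0, 7.0),    # fewer footprints than transcript level would suggest
}

for name, (footprints, rna) in genes.items():
    print(name, translational_efficiency(footprints, rna))
```

Both inputs are assumed to be normalized for library size beforehand, so that the ratio reflects translation rather than differences in sequencing depth.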

