MOCCASIN: a method for correcting for known and unknown confounders in RNA splicing analysis

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Barry Slaff ◽  
Caleb M. Radens ◽  
Paul Jewell ◽  
Anupama Jha ◽  
Nicholas F. Lahens ◽  
...  

Abstract: The effects of confounding factors on gene expression analysis have been extensively studied following the introduction of high-throughput microarrays and, subsequently, RNA sequencing. In contrast, there is a lack of equivalent analysis and tools for RNA splicing. Here we first assess the effect of confounders on both expression and splicing quantifications in two large public RNA-Seq datasets (TARGET, ENCODE). We show that quantifications of splicing variations are affected at least as much as those of gene expression, revealing unwanted sources of variation in both datasets. Next, we develop MOCCASIN, a method to correct the effect of both known and unknown confounders on RNA splicing quantification, and demonstrate MOCCASIN’s effectiveness on both synthetic and real data. Code, synthetic datasets, and corrected datasets are all made available as resources.

2020 ◽  
Author(s):  
Barry Slaff ◽  
Caleb M Radens ◽  
Paul Jewell ◽  
Anupama Jha ◽  
Nicholas F Lahens ◽  
...  

Abstract: While the effects of confounders on gene expression analysis have been extensively studied, there is a lack of equivalent analysis and tools for RNA splicing analysis. Here we assess the effect of confounders in two large public RNA-Seq datasets (TARGET, ENCODE), develop a new method, MOCCASIN, to correct the effect of both known and unknown confounders on RNA splicing quantification, and demonstrate MOCCASIN’s effectiveness on both synthetic and real data.


Author(s):  
Isaac Raplee ◽  
Alexei Evsikov ◽  
Caralina Marín de Evsikova

The rapid expansion of transcriptomics, driven by the increased affordability of next-generation sequencing (NGS) technologies, is generating rapidly growing amounts of gene expression data across biology and medicine, notably in cancer research. Concomitantly, many bioinformatics tools have been developed to streamline gene expression analysis and quantification. We tested the concordance of NGS RNA sequencing (RNA-seq) analysis outcomes between the two predominant programs for read alignment, HISAT2 and STAR, and the two most popular programs for quantifying gene expression in NGS experiments, edgeR and DESeq2, using RNA-seq data from a series of breast cancer progression specimens, which include histologically confirmed normal, early neoplasia, ductal carcinoma in situ and infiltrating ductal carcinoma samples microdissected from formalin-fixed, paraffin-embedded (FFPE) breast tissue blocks. We identified significant differences in the aligners’ performance: HISAT2 was prone to misaligning reads to retrogene genomic loci, whereas STAR generated more precise alignments, especially for early neoplasia samples. edgeR and DESeq2 produced similar lists of differentially expressed genes in stage comparisons, with edgeR producing more conservative, and therefore shorter, lists of genes. Nevertheless, Gene Ontology (GO) enrichment analysis revealed no skew in the significant GO categories identified among differentially expressed genes by edgeR vs DESeq2. As transcriptome analysis of archived FFPE samples moves to the vanguard of precision medicine, identification and fine-tuning of bioinformatics tools become critical for clinical research. Our results indicate that STAR and edgeR are well-suited tools for differential gene expression analysis from FFPE samples.
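As an illustration of what comparing two lists of differentially expressed genes can look like, here is a minimal Python sketch. It is not the procedure used in the study above: the choice of the Jaccard index as the concordance measure, the function name `concordance`, and the placeholder gene identifiers are all assumptions made for this example.

```python
def concordance(genes_a, genes_b):
    """Compare two gene lists (hypothetical helper, not from the study).

    Returns the shared genes, the genes unique to each list, and the
    Jaccard index (size of intersection divided by size of union).
    """
    a, b = set(genes_a), set(genes_b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Two empty lists are treated as perfectly concordant.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Placeholder identifiers, not real results from either tool.
list_from_tool_a = ["G1", "G2", "G3"]
list_from_tool_b = ["G2", "G3", "G4", "G5"]
result = concordance(list_from_tool_a, list_from_tool_b)
print(result["shared"], result["jaccard"])
```

A shorter list that is fully contained in a longer one gives an empty `only_a`, which is one simple way to see whether the more conservative tool is reporting a subset of the other's calls.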


GigaScience ◽  
2021 ◽  
Vol 10 (3) ◽  
Author(s):  
Holly C Beale ◽  
Jacquelyn M Roger ◽  
Matthew A Cattle ◽  
Liam T McKay ◽  
Drew K A Thompson ◽  
...  

Abstract. Background: The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) is dependent on the sequencing depth. While unmapped or non-exonic reads do not contribute to gene expression quantification, duplicate reads contribute to the quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of reproducibility of RNA-Seq datasets used for gene expression analysis. Findings: In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1–77% of all reads (median [IQR], 3% [3–6%]); duplicate reads constitute 3–100% of mapped reads (median [IQR], 27% [13–43%]); and non-exonic reads constitute 4–97% of mapped, non-duplicate reads (median [IQR], 25% [16–37%]). MEND reads constitute 0–79% of total reads (median [IQR], 50% [30–61%]). Conclusions: Because not all reads in an RNA-Seq dataset are informative for reproducibility of gene expression measurements and the fraction of reads that are informative varies, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth.
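The nested denominators in the Findings (unmapped out of all reads, duplicates out of mapped reads, non-exonic out of mapped non-duplicate reads) can be made concrete with a short Python sketch. This is not the authors' script from the Docker image: the `ReadCounts` class, the `mend_summary` function, and the example counts are hypothetical, and the sketch assumes the four counts have already been obtained from an alignment file by other tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadCounts:
    """Hypothetical container for per-sample read counts."""
    total: int       # all sequenced reads
    unmapped: int    # reads that did not align
    duplicate: int   # mapped reads flagged as duplicates
    non_exonic: int  # mapped, non-duplicate reads outside exons


def mend_summary(c: ReadCounts) -> dict:
    """Derive MEND reads and the three nested fractions."""
    mapped = c.total - c.unmapped
    mapped_nondup = mapped - c.duplicate
    mend = mapped_nondup - c.non_exonic
    if c.total <= 0 or min(c.unmapped, c.duplicate, c.non_exonic, mend) < 0:
        raise ValueError("inconsistent read counts")

    def frac(part: int, whole: int) -> float:
        # Guard against an empty denominator (e.g. no mapped reads).
        return part / whole if whole else 0.0

    return {
        "unmapped_of_total": frac(c.unmapped, c.total),
        "duplicate_of_mapped": frac(c.duplicate, mapped),
        "non_exonic_of_mapped_nondup": frac(c.non_exonic, mapped_nondup),
        "mend_reads": mend,
        "mend_of_total": frac(mend, c.total),
    }


# Illustrative counts only, not taken from the study.
summary = mend_summary(
    ReadCounts(total=1000, unmapped=100, duplicate=300, non_exonic=150)
)
print(summary["mend_reads"], summary["mend_of_total"])
```

Because each fraction uses a different denominator, the three percentages do not sum to the non-MEND share of total reads; the MEND count is obtained by successive subtraction instead.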


2016 ◽  
Vol 6 (3) ◽  
pp. 144 ◽  
Author(s):  
Takuya Yamane ◽  
Miyuki Kozuka ◽  
Yoshio Yamamoto ◽  
Yoshihisa Nakano ◽  
Takenori Nakagaki ◽  
...  

Background: Aronia berries have many potential health effects, including antioxidant, antimutagenic, hepatoprotective, cardioprotective, and antidiabetic effects and inhibition of cancer cell proliferation. Previous human studies have shown that aronia juice may be useful for treatment of obesity disorders. Objective: To reveal the relationship between the beneficial effects of aronia berries and changes in gene expression, we analyzed mouse livers using RNA sequencing and RT-qPCR. Method: At 28 days after starting a normal diet, a high-fat diet, or a high-fat diet containing 10% freeze-dried aronia berries, serum was obtained from veins of mice after isoflurane anesthesia, and liver tissues were isolated and weighed. Triglyceride, total cholesterol and LDL cholesterol levels were measured and total RNAs were extracted. cDNA libraries were prepared according to Illumina protocols and sequenced using an Illumina HiSeq2500 to perform 100 paired-end sequencing. RNA-seq read mapping was performed using DNAnexus. Gene expression analysis was performed. The liver tissue specimens were fixed and embedded in paraffin. After 5-mm-thick paraffin sections had been cut, they were stained with hematoxylin-eosin using the standard procedure and also with Sirius Red. Results: In this study, we found that mild fibrosis induced by a high-fat diet was reduced in livers of mice fed a high-fat diet containing aronia berries. RNA sequencing and RT-qPCR analyses revealed that gene expression levels of Igfbp1 and Gadd45g were increased in livers from mice fed a high-fat diet containing aronia berries. Furthermore, results of an enzyme-linked immunoassay showed that secreted protein levels of FABP1 and FABP4 were reduced in serum from mice fed a high-fat diet containing aronia berries. The results suggest that aronia berries have beneficial effects on mild fibrosis in liver. Conclusion: Aronia berries have a beneficial effect on liver fibrosis. The recovery from liver fibrosis is associated with expression levels of Gadd45g and Igfbp1 in the liver. The beneficial effects of aronia berries on liver fibrosis reduce the risk of liver cancer and insulin resistance, resulting in reduction of serum FABP1 and FABP4 levels.
Keywords: aronia; fibrosis; liver; Igfbp1; Gadd45g


2014 ◽  
Vol 128 (10) ◽  
pp. 848-858 ◽  
Author(s):  
T J Ow ◽  
K Upadhyay ◽  
T J Belbin ◽  
M B Prystowsky ◽  
H Ostrer ◽  
...  

Abstract. Background: Advances in high-throughput molecular biology, genomics and epigenetics, coupled with exponential increases in computing power and data storage, have led to a new era in biological research and information. Bioinformatics, the discipline devoted to storing, analysing and interpreting large volumes of biological data, has become a crucial component of modern biomedical research. Research in otolaryngology has evolved along with these advances. Objectives: This review highlights several modern high-throughput research methods, and focuses on the bioinformatics principles necessary to carry out such studies. Several examples from recent literature pertinent to otolaryngology are provided. The review is divided into two parts; this first part discusses the bioinformatics approaches applied in nucleotide sequencing and gene expression analysis. Conclusion: This paper demonstrates how high-throughput nucleotide sequencing and transcriptomics are changing biology and medicine, and describes how these changes are affecting otorhinolaryngology. Sound bioinformatics approaches are required to obtain useful information from the vast new sources of data.

