CoCo: RNA-seq read assignment correction for nested genes and multimapped reads

Gabrielle Deschamps-Francoeur; Vincent Boivin; Sherif Abou Elela; Michelle S Scott

doi:10.1093/bioinformatics/btz433

Bioinformatics ◽

10.1093/bioinformatics/btz433 ◽

2019 ◽

Vol 35 (23) ◽

pp. 5039-5047 ◽

Cited By ~ 6

Author(s):

Gabrielle Deschamps-Francoeur ◽

Vincent Boivin ◽

Sherif Abou Elela ◽

Michelle S Scott

Keyword(s):

Supplementary Information ◽

Rna Seq ◽

Non Coding Rna ◽

Abundance Estimates ◽

Gene Coverage ◽

Nested Genes ◽

Quantification Accuracy ◽

Whole Transcriptome Analysis ◽

Whole Transcriptome ◽

Generation Sequencing

Abstract. Motivation: Next-generation sequencing techniques revolutionized the study of RNA expression by permitting whole transcriptome analysis. However, sequencing reads generated from nested and multi-copy genes are often either misassigned or discarded, which greatly reduces both quantification accuracy and gene coverage. Results: Here we present count corrector (CoCo), a read assignment pipeline that takes into account the multitude of overlapping and repetitive genes in the transcriptome of higher eukaryotes. CoCo uses a modified annotation file that highlights nested genes and proportionally distributes multimapped reads between repeated sequences. CoCo salvages over 15% of discarded aligned RNA-seq reads and significantly changes the abundance estimates for both coding and non-coding RNA as validated by PCR and bedgraph comparisons. Availability and implementation: The CoCo software is an open source package written in Python and available from http://gitlabscottgroup.med.usherbrooke.ca/scott-group/coco. Supplementary information: Supplementary data are available at Bioinformatics online.
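The proportional distribution of multimapped reads mentioned in the abstract can be illustrated with a minimal sketch. This is not CoCo's implementation: the function name `distribute_multimapped`, the input layout, and the choice of uniquely mapped read counts as the weights are all assumptions made for illustration, since the abstract does not state how the proportions are computed.

```python
from collections import defaultdict


def distribute_multimapped(unique_counts, multimapped_groups):
    """Split multimapped reads between genes in proportion to their unique counts.

    unique_counts: dict mapping gene -> number of uniquely mapped reads.
    multimapped_groups: dict mapping a tuple of genes -> number of reads that
        align equally well to every gene in that tuple.
    Both structures are hypothetical; they are not CoCo's file formats.
    """
    corrected = defaultdict(float)
    for gene, n_unique in unique_counts.items():
        corrected[gene] += n_unique

    for genes, n_reads in multimapped_groups.items():
        total_unique = sum(unique_counts.get(g, 0) for g in genes)
        for g in genes:
            if total_unique > 0:
                # Weight each gene by its share of the uniquely mapped reads.
                share = unique_counts.get(g, 0) / total_unique
            else:
                # No unique evidence for any gene: fall back to an even split.
                share = 1 / len(genes)
            corrected[g] += n_reads * share
    return dict(corrected)


if __name__ == "__main__":
    unique = {"A": 30, "B": 10}
    groups = {("A", "B"): 8, ("C", "D"): 4}
    print(distribute_multimapped(unique, groups))
```

In this sketch no multimapped read is discarded: the corrected counts always sum to the number of unique reads plus the number of multimapped reads.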

CoCo: RNA-seq Read Assignment Correction for Nested Genes and Multimapped Reads

10.1101/477869 ◽

2018 ◽

Cited By ~ 1

Author(s):

Gabrielle Deschamps-Francoeur ◽

Vincent Boivin ◽

Sherif Abou Elela ◽

Michelle S Scott

Keyword(s):

Rna Seq ◽

Non Coding Rna ◽

Abundance Estimates ◽

Gene Coverage ◽

Nested Genes ◽

Quantification Accuracy ◽

Higher Eukaryotes ◽

Whole Transcriptome Analysis ◽

Whole Transcriptome ◽

Generation Sequencing

Abstract. Motivation: Next-generation sequencing techniques revolutionized the study of RNA expression by permitting whole transcriptome analysis. However, sequencing reads generated from nested and multi-copy genes are often either misassigned or discarded, which greatly reduces both quantification accuracy and gene coverage. Results: Here we present CoCo, a read assignment pipeline that takes into account the multitude of overlapping and repetitive genes in the transcriptome of higher eukaryotes. CoCo uses a modified annotation file that highlights nested genes and proportionally distributes multimapped reads between repeated sequences. CoCo salvages over 15% of discarded aligned RNA-seq reads and significantly changes the abundance estimates for both coding and non-coding RNA as validated by PCR and bed-graph comparisons. Availability: The CoCo software is an open source package written in Python and available from http://gitlabscottgroup.med.usherbrooke.ca/scott-group/[email protected]

184 Whole-transcriptome analysis by RNA-Seq for genetic diagnosis of Mendelian skin disorders in the context of consanguinity

Journal of Investigative Dermatology ◽

10.1016/j.jid.2021.02.204 ◽

2021 ◽

Vol 141 (5) ◽

pp. S32

Author(s):

L. Youssefian ◽

A. Saeidian ◽

P. Fortina ◽

A. South ◽

J. Uitto ◽

...

Keyword(s):

Transcriptome Analysis ◽

Genetic Diagnosis ◽

Rna Seq ◽

Skin Disorders ◽

Whole Transcriptome Analysis ◽

Whole Transcriptome

Whole Transcriptome Analysis: Implication to Estrous Cycle Regulation

10.21203/rs.3.rs-292826/v1 ◽

2021 ◽

Author(s):

Xiaopeng An ◽

Yue Zhang ◽

Fu Li ◽

Zhanhang Wang ◽

Shaohua Yang ◽

...

Keyword(s):

Estrous Cycle ◽

Transcriptome Analysis ◽

Circular Rna ◽

Differentially Expressed ◽

Non Coding Rna ◽

Non Coding Rnas ◽

Whole Transcriptome Analysis ◽

Cycle Regulation ◽

Whole Transcriptome ◽

Goat Ovary

Abstract. Background: The estrous cycle is a female characteristic that appears after sexual maturity and includes the estrus (ES) and diestrus (DS) stages. It is important in female physiology, and its disorder may lead to disease. In recent years, the effects of non-coding RNAs and mRNA on the estrous cycle have attracted growing attention; however, a whole transcriptome analysis covering both non-coding RNAs and mRNA has not been reported. Results: Here we report a whole transcriptome analysis of goat ovary in the estrus and diestrus periods. Estrus synchronization was conducted to induce the estrus phase, and on day 32 the goats naturally shifted into the diestrus stage. Ovary RNA from the estrus and diestrus stages was collected separately for RNA sequencing. Circular RNA, microRNA, long non-coding RNA and mRNA databases of goat ovary were then acquired, and differential expression between the estrus and diestrus stages was screened to construct circRNA-miRNA-mRNA/lncRNA and lncRNA-miRNA/mRNA networks, thus providing potential pathways involved in the regulation of the estrous cycle. Differentially expressed mRNAs, such as MMP9, TIMP1, 3BHSD and PTGIS, and differentially expressed microRNAs, such as miR-21-3p, miR-202-3p and miR-223-3p, which play key roles in estrous cycle regulation, were extracted from the network. Conclusions: Our data provide the miRNA, circRNA, lncRNA and mRNA databases of goat ovary and the differential expression profile of each between ES and DS. Networks among differentially expressed miRNAs, circRNAs, lncRNAs and mRNAs were constructed to provide valuable resources for the study of the estrous cycle and related diseases.

BioSeqZip: a collapser of NGS redundant reads for the optimization of sequence analysis

Bioinformatics ◽

10.1093/bioinformatics/btaa051 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2705-2711 ◽

Cited By ~ 2

Author(s):

Gianvito Urgese ◽

Emanuele Parisi ◽

Orazio Scicolone ◽

Santa Di Cataldo ◽

Elisa Ficarra

Keyword(s):

Sequence Analysis ◽

Supplementary Information ◽

Sorting Algorithm ◽

Rna Seq ◽

Compact Sets ◽

Analysis Pipeline ◽

Alignment Algorithms ◽

External Sorting ◽

Computational Resources ◽

Generation Sequencing

Abstract. Motivation: High-throughput next-generation sequencing can generate huge sequence files, whose analysis requires alignment algorithms that are typically very demanding in terms of memory and computational resources. This is a significant issue, especially for machines with limited hardware capabilities. As the redundancy of the sequences typically increases with coverage, collapsing such files into compact sets of non-redundant reads has the twofold advantage of reducing file size and speeding up the alignment by avoiding mapping the same sequence multiple times. Method: BioSeqZip generates compact and sorted lists of alignment-ready non-redundant sequences, keeping track of their occurrences in the raw files as well as of their quality score information. By exploiting a memory-constrained external sorting algorithm, it can be executed on either single- or multi-sample datasets even on computers with medium computational capabilities. On request, it can even re-expand the compacted files to their original state. Results: Our extensive experiments on RNA-Seq data show that BioSeqZip considerably reduces the computational costs of a standard sequence analysis pipeline, with particular benefits for the alignment procedures, which typically have the highest requirements in terms of memory and execution time. In our tests, BioSeqZip was able to compact 2.7 billion reads into 963 million unique tags, reducing the size of sequence files by up to 70% and speeding up the alignment by at least 50%. Availability and implementation: BioSeqZip is available at https://github.com/bioinformatics-polito/BioSeqZip. Supplementary information: Supplementary data are available at Bioinformatics online.
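The idea of collapsing redundant reads while tracking their occurrences can be sketched in a few lines. This is only an in-memory illustration under stated assumptions, not BioSeqZip's code: the function names `collapse_reads` and `expand_reads` are hypothetical, quality scores are ignored, and a plain `Counter` stands in for the memory-constrained external sort described in the abstract.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse a list of read sequences into sorted (sequence, count) pairs.

    Each distinct sequence appears once, so an aligner would only need to
    map it a single time; the count records how often it occurred.
    """
    counts = Counter(reads)
    return sorted(counts.items())


def expand_reads(collapsed):
    """Re-expand (sequence, count) pairs into one entry per original read.

    This restores the multiset of reads, not their original file order.
    """
    return [seq for seq, count in collapsed for _ in range(count)]


if __name__ == "__main__":
    raw = ["ACGT", "TTGA", "ACGT", "ACGT", "GGCC"]
    tags = collapse_reads(raw)
    print(tags)
    print(expand_reads(tags))
```

After alignment, per-tag results would be multiplied by the stored count to recover per-read abundances, which is why the occurrence counts have to be kept alongside the unique sequences.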

Comprehensive molecular insights into the stress response dynamics of rice (Oryza sativa L.) during rice tungro disease by RNA-seq-based comparative whole transcriptome analysis

Journal of Biosciences ◽

10.1007/s12038-020-9996-x ◽

2020 ◽

Vol 45 (1) ◽

Author(s):

Gaurav Kumar ◽

Indranil Dasgupta

Keyword(s):

Oryza Sativa ◽

Stress Response ◽

Transcriptome Analysis ◽

Oryza Sativa L ◽

Rna Seq ◽

Response Dynamics ◽

Rice Tungro Disease ◽

Whole Transcriptome Analysis ◽

Whole Transcriptome

Whole Transcriptome Analysis of Bleomycin (BLM) Induced Pulmonary Fibrosis (PF) of Rhesus Using RNA-seq

10.1183/1393003.congress-2017.pa920 ◽

2017 ◽

Author(s):

Lian Liu ◽

Zhicheng Yuan ◽

Tao Wang ◽

Dan Xu ◽

Lei Chen ◽

...

Keyword(s):

Pulmonary Fibrosis ◽

Transcriptome Analysis ◽

Rna Seq ◽

Whole Transcriptome Analysis ◽

Whole Transcriptome

Anti‐Fibrotic Effects of HDAC8 Inhibitor in Bleomycin‐Induced Pulmonary Fibrosis Mouse Model–Whole Transcriptome Analysis Using RNA‐seq

The FASEB Journal ◽

10.1096/fasebj.2019.33.1_supplement.474.12 ◽

2019 ◽

Vol 33 (S1) ◽

Author(s):

Shigeki Saito ◽

Yan Zhuang ◽

Joseph Lasky ◽

Yaozhong Liu

Keyword(s):

Pulmonary Fibrosis ◽

Mouse Model ◽

Transcriptome Analysis ◽

Rna Seq ◽

Whole Transcriptome Analysis ◽

Whole Transcriptome

Holistic optimization of an RNA-seq workflow for multi-threaded environments

Bioinformatics ◽

10.1093/bioinformatics/btz169 ◽

2019 ◽

Vol 35 (20) ◽

pp. 4173-4175 ◽

Cited By ~ 3

Author(s):

Ling-Hong Hung ◽

Wes Lloyd ◽

Radhika Agumbe Sridhar ◽

Saranya Devi Athmalingam Ravishankar ◽

Yuguang Xiong ◽

...

Keyword(s):

Parallel Implementation ◽

Reference Sequence ◽

Supplementary Information ◽

Rna Seq ◽

Computationally Intensive ◽

Holistic Optimization ◽

Parallel Workflow ◽

Unique Molecular Identifier ◽

Generation Sequencing ◽

Alignment Step

Abstract. Summary: For many next-generation sequencing pipelines, the most computationally intensive step is the alignment of reads to a reference sequence. As a result, alignment software such as the Burrows-Wheeler Aligner is optimized for speed and is often executed in parallel on the cloud. However, there are other less demanding steps that can also be optimized to significantly increase the speed, especially when using many threads. We demonstrate this using a unique molecular identifier RNA-sequencing pipeline consisting of three steps: split, align, and merge. Optimization of all three steps yields a 40% increase in speed when executed using a single thread. However, when executed using 16 threads, we observe a 4-fold improvement over the original parallel implementation and more than an 8-fold improvement over the original single-threaded implementation. In contrast, optimizing only the alignment step results in just a 13% improvement over the original parallel workflow using 16 threads. Availability and implementation: Code (MIT license), supporting scripts and Dockerfiles are available at https://github.com/BioDepot/LINCS_RNAseq_cpp and Docker images at https://hub.docker.com/r/biodepot/rnaseq-umi-cpp/. Supplementary information: Supplementary data are available at Bioinformatics online.

Abstract P5-07-01: Successful whole transcriptome analysis of 25-year-old breast tumor samples from the phase III trial SWOG-8814 by next generation sequencing (NGS): Standardized analytical methods for exploratory and validation studies

10.1158/1538-7445.sabcs15-p5-07-01 ◽

2016 ◽

Author(s):

DB Cherbavaz ◽

DF Hayes ◽

K Qu ◽

MR Crager ◽

WR Barlow ◽

...

Keyword(s):

Next Generation Sequencing ◽

Transcriptome Analysis ◽

Validation Studies ◽

Breast Tumor ◽

Phase Iii ◽

Phase Iii Trial ◽

Whole Transcriptome Analysis ◽

Next Generation Sequencing Ngs ◽

Whole Transcriptome ◽

Generation Sequencing

Whole transcriptome analysis using next-generation sequencing of model species Setaria viridis to support C4 photosynthesis research

Plant Molecular Biology ◽

10.1007/s11103-013-0025-4 ◽

2013 ◽

Vol 83 (1-2) ◽

pp. 77-87 ◽

Cited By ~ 38

Author(s):

Jiajia Xu ◽

Yuanyuan Li ◽

Xiuling Ma ◽

Jianfeng Ding ◽

Kai Wang ◽

...

Keyword(s):

Next Generation Sequencing ◽

Transcriptome Analysis ◽

C4 Photosynthesis ◽

Setaria Viridis ◽

Next Generation ◽

Model Species ◽

Photosynthesis Research ◽

Whole Transcriptome Analysis ◽

Whole Transcriptome ◽

Generation Sequencing
