Transcript Assembly: Latest Research Papers

Precise Transcript Reconstruction with End-Guided Assembly
Accurate annotation of transcript isoforms is crucial for functional genomics research, but automated methods for reconstructing full-length transcripts from RNA sequencing (RNA-seq) data are imprecise. We developed a generalized transcript assembly framework called Bookend that incorporates data from multiple modes of RNA-seq, with a focus on identifying, labeling, and deconvoluting RNA 5′ and 3′ ends. Through end-guided assembly with Bookend we demonstrate that correctly modeling transcript start and end sites is essential for precise transcript assembly. Furthermore, we discover that reads from full-length single-cell RNA-seq (scRNA-seq) methods are sparsely end-labeled, and that these ends are sufficient to dramatically improve precision of assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq in the model plant Arabidopsis and meta-assembly of single mouse embryonic stem cells (mESCs) are both capable of producing tissue-specific end-to-end transcript annotations of comparable or superior quality to existing reference isoforms.
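
The abstract centers on identifying and labeling RNA 5′ and 3′ ends. The minimal sketch below illustrates one way to group the coordinates of end-labeled reads into candidate start or end sites. It merges sorted positions that lie within a fixed distance of each other (single linkage). The function name `cluster_end_sites` and the `max_gap` parameter are hypothetical. This is a generic technique, not Bookend's published algorithm.

```python
from collections import Counter


def cluster_end_sites(positions, max_gap=10):
    """Group end-labeled read coordinates into candidate end sites.

    Hypothetical helper (not Bookend's method): distinct sorted positions
    that are at most max_gap bases apart are merged into one cluster.
    Returns a list of (first_pos, last_pos, read_count, mode_pos) tuples.
    """
    if not positions:
        return []
    counts = Counter(positions)          # reads supporting each coordinate
    coords = sorted(counts)
    clusters = []
    current = [coords[0]]
    for pos in coords[1:]:
        if pos - current[-1] <= max_gap:
            current.append(pos)          # close enough: extend current site
        else:
            clusters.append(current)     # gap too large: start a new site
            current = [pos]
    clusters.append(current)
    result = []
    for members in clusters:
        total = sum(counts[p] for p in members)
        # Most frequently observed coordinate; ties go to the smallest position.
        mode = max(members, key=lambda p: (counts[p], -p))
        result.append((members[0], members[-1], total, mode))
    return result


print(cluster_end_sites([100, 102, 102, 105, 300, 301]))
# [(100, 105, 4, 102), (300, 301, 2, 300)]
```

A fixed distance threshold is the simplest choice. Real end-labeled data would likely also need read-strand handling and filtering of unlabeled or low-support ends.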

Jumper enables discontinuous transcript assembly in coronaviruses

Nature Communications · 10.1038/s41467-021-26944-y · 2021 · Vol 12 (1)

Author(s): Palash Sashittal, Chuanyi Zhang, Jian Peng, Mohammed El-Kebir

Keyword(s): Alternative Splicing, Maximum Likelihood, RNA Polymerase, Drug Response, Transcript Level, RNA-Dependent RNA Polymerase, Short Read, Likelihood Model, Discontinuous Transcription, Transcript Assembly

Abstract Genes in SARS-CoV-2 and other viruses in the order Nidovirales are expressed by a process of discontinuous transcription, which is distinct from alternative splicing in eukaryotes and is mediated by the viral RNA-dependent RNA polymerase. Here, we introduce the DISCONTINUOUS TRANSCRIPT ASSEMBLY problem of finding transcripts and their abundances given an alignment of paired-end short reads under a maximum likelihood model that accounts for varying transcript lengths. Using simulations, we show that our method, JUMPER, outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1, SARS-CoV-2 and MERS-CoV samples, we find that JUMPER not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are supported by subsequent orthogonal analyses. Moreover, applying JUMPER to samples with and without treatment reveals viral drug response at the transcript level. As such, JUMPER enables detailed analyses of Nidovirales transcriptomes under varying conditions.
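
The abstract mentions a maximum likelihood model that accounts for varying transcript lengths. The sketch below illustrates why length matters, using a generic length-aware mixture model of the kind widely used in RNA-seq quantification. It is not JUMPER's exact likelihood, and the function name and input layout are assumptions. Under this assumed model, a read compatible with a set of transcripts S contributes log(Σ_{t∈S} c_t / Σ_j c_j L_j) to the log-likelihood, where c_t is the relative molar abundance and L_j the effective length of transcript j.

```python
import math


def length_aware_log_likelihood(abundances, lengths, compatibility):
    """Log-likelihood of reads under a generic length-aware mixture model.

    abundances: relative molar abundance c_t of each transcript (nonnegative).
    lengths: effective length L_t of each transcript (positive).
    compatibility: for each read, the indices of transcripts it aligns to.

    Assumed model (illustrative, not JUMPER's exact formulation): a read
    comes from transcript t with probability c_t * L_t / sum_j c_j * L_j,
    then starts at a uniformly random position, so P(read | t) = 1 / L_t
    when the read is compatible with t and 0 otherwise.
    """
    denom = sum(c * l for c, l in zip(abundances, lengths))
    if denom <= 0:
        raise ValueError("total length-weighted abundance must be positive")
    total = 0.0
    for compatible in compatibility:
        # The L_t factors cancel: sum_t (c_t L_t / denom) * (1 / L_t) = sum_t c_t / denom.
        p = sum(abundances[t] for t in compatible) / denom
        if p == 0:
            return float("-inf")  # read cannot be explained by these abundances
        total += math.log(p)
    return total


print(length_aware_log_likelihood([0.5, 0.5], [100, 200], [[0], [0, 1], [1]]))
```

The length-weighted denominator is the key design point. Without it, read counts scale with c_t * L_t, so estimating abundance from raw read fractions would favor long transcripts.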

Scallop2 enables accurate assembly of multiple-end RNA-seq data

10.1101/2021.09.03.458862 · 2021

Author(s): Qimin Zhang, Qian Shi, Mingfu Shao

Keyword(s): False Positive, False Positive Rate, RNA-Seq, Sequencing Errors, True Path, Assembly Accuracy, Splice Graph, Positive Rate, Framework, Transcript Assembly

Abstract Transcript assembly (i.e., reconstructing the full-length expressed transcripts from RNA-seq data) remains a critical yet unsolved step in RNA-seq analysis. Modern RNA-seq protocols can produce paired-/multiple-end reads, which carry the information that two or more reads originate from the same transcript. The long-range constraints implied by these paired-/multiple-end reads can be highly beneficial for correctly phasing complicated spliced isoforms. However, there are often gaps between individual ends, which may even contain junctions, making efficient use of such constraints algorithmically challenging. Here we introduce Scallop2, a new reference-based transcript assembler optimized for multiple-end (including paired-end) RNA-seq data. Scallop2 uses an algorithmic framework that first represents reads from the same molecule as so-called multiple-end phasing paths in a splice graph, then “bridges” each multiple-end phasing path into a long, single-end phasing path, and finally decomposes the splice graph into paths (i.e., transcripts) guided by the bridged phasing paths. An efficient bridging algorithm, based on a novel formulation that is robust to sequencing errors and transcript noise, infers the true path connecting two consecutive ends. Observing that failures to bridge two ends are mainly due to incomplete splice graphs, we propose a new method to identify false starting/ending vertices in splice graphs, which is shown to effectively reduce the false positive rate. Evaluations on both (multiple-end) single-cell RNA-seq datasets from the Smart-seq3 protocol and Illumina paired-end RNA-seq samples demonstrate that Scallop2 vastly outperforms recent assemblers, including StringTie2, Scallop, and CLASS2, in assembly accuracy.
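
To illustrate the idea of "bridging" two read ends through a splice graph, here is a simplified stand-in. Among all paths in a DAG from the vertex where one end stops to the vertex where the next end starts, it picks the path whose weakest edge has the highest read support (the maximum-bottleneck path), using dynamic programming over a topological order. The function name, the edge-weight meaning, and the max-bottleneck criterion are assumptions for illustration. Scallop2's actual bridging formulation may differ.

```python
def bridge_max_bottleneck(num_vertices, edges, source, target):
    """Pick a source->target path in a DAG that maximizes its weakest edge weight.

    Hypothetical simplification of bridging two read ends in a splice graph.
    Assumes vertices are numbered in topological (genomic) order, so every
    edge (u, v, w) satisfies u < v; w is the read support of that edge.
    Returns (bottleneck, path), or None if target is unreachable.
    """
    out = {u: [] for u in range(num_vertices)}
    for u, v, w in edges:
        if u >= v:
            raise ValueError("edges must point forward in topological order")
        out[u].append((v, w))
    best = [None] * num_vertices  # best bottleneck value reaching each vertex
    prev = [None] * num_vertices  # predecessor on that best path
    best[source] = float("inf")
    # Forward edges only, so best[u] is final by the time u is processed.
    for u in range(source, num_vertices):
        if best[u] is None:
            continue
        for v, w in out[u]:
            cand = min(best[u], w)
            if best[v] is None or cand > best[v]:
                best[v] = cand
                prev[v] = u
    if best[target] is None:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]


edges = [(0, 1, 10), (1, 3, 2), (0, 2, 5), (2, 3, 6), (3, 4, 9)]
print(bridge_max_bottleneck(5, edges, 0, 4))  # (5, [0, 2, 3, 4])
```

Here the path through the weakly supported junction (1, 3) is rejected in favor of 0 → 2 → 3 → 4. Scoring by the weakest edge is one simple way to avoid committing to a path that relies on a single low-coverage, possibly erroneous junction.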

JUMPER Enables Discontinuous Transcript Assembly in Coronaviruses

10.21203/rs.3.rs-600334/v1 · 2021

Author(s): Palash Sashittal, Chuanyi Zhang, Jian Peng, Mohammed El-Kebir

Keyword(s): Alternative Splicing, Hamiltonian Path, Transcript Level, Mixed Integer, Mixed Integer Linear Program, Splice Graph, Long Read, Likelihood Model, Characterization of Solutions, Transcript Assembly

Abstract Genes in SARS-CoV-2 and other viruses in the order Nidovirales are expressed by a process of discontinuous transcription mediated by the viral RNA-dependent RNA polymerase. This process is distinct from alternative splicing in eukaryotes and produces subgenomic RNAs that express different viral genes. Here, we introduce the DISCONTINUOUS TRANSCRIPT ASSEMBLY problem of finding transcripts T and their abundances c given an alignment R of paired-end short reads under a maximum likelihood model that accounts for varying transcript lengths. Underpinning our approach is the concept of a segment graph, a directed acyclic graph that, unlike the splice graph used to characterize alternative splicing, has a unique Hamiltonian path. We provide a compact characterization of solutions as subsets of non-overlapping edges in this graph, enabling the formulation of an efficient progressive heuristic that uses a mixed integer linear program. Using simulations, we show that our method, JUMPER, drastically outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1, SARS-CoV-2 and MERS-CoV samples, we find that JUMPER not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are well supported by direct evidence from long-read data, presence in multiple independent samples, or a conserved core sequence. Moreover, applying JUMPER to samples with and without treatment reveals viral drug response at the transcript level. As such, JUMPER enables detailed analyses of Nidovirales transcriptomes under varying conditions.
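
One way to read the "compact characterization of solutions as subsets of non-overlapping edges" is sketched below, under stated assumptions.

- Segments are numbered 0..n-1 in genomic order, and consecutive segments are always joined, which gives the unique Hamiltonian path.
- A discontinuous edge (u, v) with v > u + 1 skips the segments strictly between u and v.
- A transcript runs from the first to the last segment.

Under these assumptions, a transcript is fully described by the set of jumps it takes, and those jumps must not overlap. The helper `transcript_from_jumps`, the indexing, and the exact non-overlap condition are hypothetical illustrations, not JUMPER's code.

```python
def transcript_from_jumps(num_segments, jumps):
    """Expand a set of discontinuous edges into the segments a transcript visits.

    Assumed model (illustrative only): segments 0..num_segments-1 lie in
    genomic order and consecutive segments are always connected, so
    0 -> 1 -> ... -> num_segments-1 is the unique Hamiltonian path. A jump
    (u, v) with v > u + 1 skips the segments strictly between u and v.
    Jumps must not overlap: each jump must start at or after the previous
    jump's landing segment.
    """
    ordered = sorted(jumps)
    last_end = 0
    for u, v in ordered:
        if not (0 <= u and v < num_segments and v > u + 1):
            raise ValueError(f"invalid jump {(u, v)}")
        if u < last_end:
            raise ValueError(f"jump {(u, v)} overlaps a previous jump")
        last_end = v
    segments = []
    pos = 0
    for u, v in ordered:
        segments.extend(range(pos, u + 1))  # follow the Hamiltonian path up to u
        pos = v                             # then jump directly to v
    segments.extend(range(pos, num_segments))
    return segments


print(transcript_from_jumps(6, [(0, 3)]))          # [0, 3, 4, 5]
print(transcript_from_jumps(6, [(3, 5), (0, 2)]))  # [0, 2, 3, 5]
```

Representing a transcript by its jump set rather than by an explicit path is what makes the characterization compact. The number of decision variables grows with the number of candidate discontinuous edges, not with the number of possible paths.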

Jumper Enables Discontinuous Transcript Assembly in Coronaviruses

10.1101/2021.02.12.431026 · 2021

Author(s): Palash Sashittal, Chuanyi Zhang, Jian Peng, Mohammed El-Kebir

Keyword(s): Alternative Splicing, Hamiltonian Path, Mixed Integer, Mixed Integer Linear Program, Splice Graph, Long Read, Likelihood Model, Characterization of Solutions, Conserved Core, Transcript Assembly

Abstract Genes in SARS-CoV-2 and, more generally, in viruses in the order Nidovirales are expressed by a process of discontinuous transcription mediated by the viral RNA-dependent RNA polymerase. This process is distinct from alternative splicing in eukaryotes, rendering current transcript assembly methods unsuitable for Nidovirales sequencing samples. Here, we introduce the Discontinuous Transcript Assembly problem of finding transcripts T and their abundances c given an alignment R under a maximum likelihood model that accounts for varying transcript lengths. Underpinning our approach is the concept of a segment graph, a directed acyclic graph that, unlike the splice graph used to characterize alternative splicing, has a unique Hamiltonian path. We provide a compact characterization of solutions as subsets of non-overlapping edges in this graph, enabling the formulation of an efficient mixed integer linear program. Using simulations, we show that our method, Jumper, drastically outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1 and SARS-CoV-2 samples, we find that Jumper not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are well supported by direct evidence from long-read data, presence in multiple independent samples, or a conserved core sequence. Jumper enables detailed analyses of Nidovirales transcriptomes.

Code availability: Software is available at https://github.com/elkebir-group/Jumper

Transcript assembly improves expression quantification of transposable elements in single-cell RNA-seq data

Genome Research · 10.1101/gr.265173.120 · 2020 · Vol 31 (1) · pp. 88-100

Author(s): Wanqing Shao, Ting Wang

Keyword(s): Transposable Elements, Single Cell, RNA-Seq, Transcript Assembly, Expression Quantification

Reference-Aided Full-length Transcript Assembly, cDNA Cloning, and Molecular Characterization of Coronatine-insensitive 1b (COI1b) Gene in Coconut (Cocos nucifera L.)

10.1101/2020.11.23.395202 · 2020

Author(s): Frenzee Kroeizha L. Pammit, Anand Noel C. Manohar, Darlon V. Lantican, Roanne R. Gardoce, Hayde F. Galvez

Keyword(s): Amino Acids, Defense Response, Homology Modelling, Cocos nucifera, The Philippines, Full Length, Full-Length cDNA, Insect Defense, Transcript Assembly

Abstract In the Philippines, 26% of the total agricultural land is devoted to coconut production, making coconut one of the most valuable industrial crops in the country. However, the country’s multimillion-dollar coconut industry is threatened by the outbreak of coconut scale insect (CSI) and other re-emerging insect pests, prompting national research institutes to work jointly on developing new tolerant coconut varieties. Here, we report the cloning and characterization of the coronatine-insensitive 1 (COI1) gene, one of the candidate insect defense genes, using the ‘Catigan Green Dwarf’ (CATD) genome sequence assembly as a reference. Two splicing variants were identified and annotated: CnCOI1b-1 and CnCOI1b-2. The full-length cDNA of CnCOI1b-1 was 7,919 bp with an ORF of 1,176 bp encoding a deduced protein of 391 amino acids, while CnCOI1b-2 has a 2,360 bp full-length cDNA with an ORF of 1,743 bp encoding a deduced protein of 580 amino acids. 3D structural models of the two isoforms were generated through homology modelling. Functional analysis revealed that both isoforms are involved in various physiological and developmental plant processes, including the defense response of plants to insects and pathogens. Phylogenetic analysis confirms a high degree of COI1 protein conservation during evolution, especially among monocot species.

Key Message: This paper reports the molecular cloning and characterization of the coronatine-insensitive 1 (COI1) gene in coconut using a reference-guided transcript assembly approach. As COI1 is a well-known insect defense-response gene in other crops, the results of this study are expected to assist in the development of new resistant coconut varieties as one of the strategies to address threats to coconut production.
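
The reported ORF and protein lengths can be cross-checked with simple codon arithmetic. Assuming each reported ORF length includes the stop codon, the deduced protein length is ORF length / 3 − 1. The helper `deduced_protein_length` is a hypothetical illustration, not part of the authors' pipeline.

```python
def deduced_protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_bp nucleotides.

    Assumes the ORF is a whole number of codons; if includes_stop is True,
    the final codon is a stop codon and contributes no amino acid.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


print(deduced_protein_length(1176))  # 391 (CnCOI1b-1)
print(deduced_protein_length(1743))  # 580 (CnCOI1b-2)
```

Both results match the values in the abstract, which is consistent with the reported ORF lengths including the stop codon.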

Combining protein-based transcriptome assembly, and efficient MinION long read sequencing for targeted transcript sequencing in orphan species. Validation on herbicide targets and low copy number genes in Gymnosperms, Juncaceae and Pteridophyta

10.1101/2020.10.24.353441 · 2020

Author(s): Dyfed Lloyd Evans

Keyword(s): Protein Sequence, Copy Number, Transcriptome Assembly, Direct Sequencing, Reference Sequence, Sequence Information, Azolla filiculoides, A Genome, Low Copy Number, Transcript Assembly

Abstract Orphan species that are evolutionarily distant from their closest sequenced/assembled neighbour pose a significant challenge for gene or transcript assembly for functional analysis. This is because at 30% sequence divergence from the closest available reference sequence, mapping-based or reference-based approaches to gene assembly and gene identification break down, even with a complete genome or transcriptome sequence. A new approach is required for reference-guided gene and transcript assembly in such orphan species, or in species that are evolutionarily very divergent from their closest relatives. When annotating genes, the protein sequence is often preferred, as it diverges less than the DNA/RNA sequence and meaningful homology is often simpler to find at the protein level. This greater conservation of protein sequence across evolutionary time also makes proteins a prime candidate for use as the basis for sequence assembly. A protein-based pipeline was developed for transcript assembly between distantly related species. It was tested on three evolutionarily divergent species with little available sequence information, whose closest genome representatives diverged at least 40 million years ago, as well as on one species (Azolla filiculoides) for which a genome assembly is available. All of these species are potential weeds; herbicide targets were chosen as functional genes, whilst low copy number genes were chosen for evolutionary studies. Transcriptomic sequences were assembled using a bait-and-assemble strategy, and final assemblies were verified by direct sequencing.

Evaluation of transcript assembly in multiple porcine tissues suggests optimal sequencing depth for RNA-Seq using total RNA library

Animal Gene · 10.1016/j.angen.2020.200105 · 2020 · Vol 17-18 · pp. 200105

Author(s): Brittney N. Keel, William T. Oliver, John W. Keele, Amanda K. Lindholm-Perry

Keyword(s): Sequencing Depth, RNA-Seq, Total RNA, Optimal Sequencing, Transcript Assembly, Porcine Tissues

Transcript Assembly and Quantification by RNA-Seq Reveals Significant Differences in Gene Expression and Genetic Variants in Mosquitoes of the Culex pipiens (Diptera: Culicidae) Complex

Journal of Medical Entomology · 10.1093/jme/tjaa167 · 2020

Author(s): David S Kang, Sungshil Kim, Michael A Cotten, Cheolho Sim

Keyword(s): Gene Expression, Culex pipiens, Nucleotide Polymorphisms, RNA-Seq, Culex pipiens quinquefasciatus, Niche Differences, Differential Gene, Culex pipiens molestus, Transcript Assembly, Blood Meals

Abstract The taxonomy of the Culex pipiens complex of mosquitoes is still debated, but in North America it is generally regarded to include Culex pipiens pipiens, Culex pipiens molestus, and Culex quinquefasciatus (or Culex pipiens quinquefasciatus). Although these mosquitoes have very similar morphometry, each has a unique life strategy adapted to its ecological niche. Differences include the capacity for overwintering diapause, bloodmeal preference, mating behaviors, and reliance on blood meals to produce eggs. Here, we used RNA-seq transcriptome analysis to investigate differential gene expression and nucleotide polymorphisms that may be linked to the divergent traits of Cx. pipiens pipiens and Cx. pipiens molestus.

Transcript Assembly: Recently Published Documents

Precise Transcript Reconstruction with End-Guided Assembly

Jumper enables discontinuous transcript assembly in coronaviruses

Scallop2 enables accurate assembly of multiple-end RNA-seq data

JUMPER Enables Discontinuous Transcript Assembly in Coronaviruses

Jumper Enables Discontinuous Transcript Assembly in Coronaviruses

Transcript assembly improves expression quantification of transposable elements in single-cell RNA-seq data

Reference-Aided Full-length Transcript Assembly, cDNA Cloning, and Molecular Characterization of Coronatine-insensitive 1b (COI1b) Gene in Coconut (Cocos nucifera L.)

Combining protein-based transcriptome assembly, and efficient MinION long read sequencing for targeted transcript sequencing in orphan species. Validation on herbicide targets and low copy number genes in Gymnosperms, Juncaceae and Pteridophyta

Evaluation of transcript assembly in multiple porcine tissues suggests optimal sequencing depth for RNA-Seq using total RNA library

Transcript Assembly and Quantification by RNA-Seq Reveals Significant Differences in Gene Expression and Genetic Variants in Mosquitoes of the Culex pipiens (Diptera: Culicidae) Complex
