transcript assembly
Recently Published Documents


TOTAL DOCUMENTS

48
(FIVE YEARS 25)

H-INDEX

10
(FIVE YEARS 3)

2022 ◽  
Author(s):  
Michael A Schon ◽  
Stefan Lutzmayer ◽  
Falko Hofmann ◽  
Michael D Nodine

Accurate annotation of transcript isoforms is crucial for functional genomics research, but automated methods for reconstructing full-length transcripts from RNA sequencing (RNA-seq) data are imprecise. We developed a generalized transcript assembly framework called Bookend that incorporates data from multiple modes of RNA-seq, with a focus on identifying, labeling, and deconvoluting RNA 5′ and 3′ ends. Through end-guided assembly with Bookend we demonstrate that correctly modeling transcript start and end sites is essential for precise transcript assembly. Furthermore, we discover that reads from full-length single-cell RNA-seq (scRNA-seq) methods are sparsely end-labeled, and that these ends are sufficient to dramatically improve precision of assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq in the model plant Arabidopsis and meta-assembly of single mouse embryonic stem cells (mESCs) are both capable of producing tissue-specific end-to-end transcript annotations of comparable or superior quality to existing reference isoforms.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Palash Sashittal ◽  
Chuanyi Zhang ◽  
Jian Peng ◽  
Mohammed El-Kebir

Abstract Genes in SARS-CoV-2 and other viruses in the order Nidovirales are expressed by a process of discontinuous transcription, which is distinct from alternative splicing in eukaryotes and is mediated by the viral RNA-dependent RNA polymerase. Here, we introduce the DISCONTINUOUS TRANSCRIPT ASSEMBLY problem of finding transcripts and their abundances given an alignment of paired-end short reads under a maximum likelihood model that accounts for varying transcript lengths. We show, using simulations, that our method, JUMPER, outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1, SARS-CoV-2 and MERS-CoV samples, we find that JUMPER not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are supported by subsequent orthogonal analyses. Moreover, application of JUMPER to samples with and without treatment reveals viral drug response at the transcript level. As such, JUMPER enables detailed analyses of Nidovirales transcriptomes under varying conditions.


2021 ◽  
Author(s):  
Qimin Zhang ◽  
Qian Shi ◽  
Mingfu Shao

Abstract Transcript assembly (i.e., reconstructing the full-length expressed transcripts from RNA-seq data) has been a critical yet unsolved step in RNA-seq analysis. Modern RNA-seq protocols can produce paired-/multiple-end RNA-seq reads, where information is available that two or more reads originate from the same transcript. The long-range constraints implied by these paired-/multiple-end reads can be highly beneficial in correctly phasing complicated spliced isoforms. However, there often exist gaps among individual ends, which may even contain junctions, making the efficient use of such constraints algorithmically challenging. Here we introduce Scallop2, a new reference-based transcript assembler optimized for multiple-end (including paired-end) RNA-seq data. Scallop2 uses an algorithmic framework that first represents reads from the same molecule as so-called multiple-end phasing paths in the context of a splice graph, then “bridges” each multiple-end phasing path into a long, single-end phasing path, and finally decomposes the splice graph into paths (i.e., transcripts) guided by the bridged phasing paths. An efficient bridging algorithm is designed to infer the true path connecting two consecutive ends, following a novel formulation that is robust to sequencing errors and transcript noise. Observing that failure to bridge two ends is mainly due to incomplete splice graphs, we propose a new method to determine false starting/ending vertices of the splice graphs, which has been shown to be effective in reducing the false positive rate. Evaluations on both (multiple-end) single-cell RNA-seq datasets from the Smart-seq3 protocol and Illumina paired-end RNA-seq samples demonstrate that Scallop2 vastly outperforms recent assemblers including StringTie2, Scallop, and CLASS2 in assembly accuracy.
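The bridging step named in the abstract above can be illustrated with a small sketch. This is not Scallop2's actual algorithm, whose formulation is not given in the abstract. It assumes a toy splice graph stored as a dict of weighted edges, with integer vertices in topological order, and it assumes a bottleneck score (the path whose weakest edge is strongest wins). The function name `bridge` and the graph encoding are hypothetical.

```python
from functools import lru_cache


def bridge(graph, left_end, right_end):
    """Join two ends of a multiple-end phasing path into one path.

    graph: dict mapping vertex -> {successor: edge_weight}. Vertices are
    assumed to be integers in topological order (hypothetical toy encoding).
    left_end, right_end: vertex lists covered by the two read ends.
    Returns the bridged vertex list, or None if no connecting path exists.
    """
    source, target = left_end[-1], right_end[0]

    @lru_cache(maxsize=None)
    def best(v):
        # Best (bottleneck_weight, path) from v to target, or None.
        if v == target:
            return (float("inf"), (v,))
        result = None
        for w, weight in graph.get(v, {}).items():
            if w > target:
                continue  # topological order: w can no longer reach target
            sub = best(w)
            if sub is None:
                continue
            candidate = (min(weight, sub[0]), (v,) + sub[1])
            if result is None or candidate[0] > result[0]:
                result = candidate
        return result

    found = best(source)
    if found is None:
        return None
    return list(left_end[:-1]) + list(found[1]) + list(right_end[1:])


# Toy splice graph with two routes between vertices 1 and 4.
toy_graph = {0: {1: 10}, 1: {2: 8, 3: 2}, 2: {4: 7}, 3: {4: 2}, 4: {5: 9}}
bridged = bridge(toy_graph, [0, 1], [4, 5])
```

In this toy graph the gap between the two ends is filled through vertex 2 because that route has the larger minimum edge weight. A real assembler would also have to handle sequencing errors and incomplete graphs, which this sketch ignores.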


2021 ◽  
Author(s):  
Palash Sashittal ◽  
Chuanyi Zhang ◽  
Jian Peng ◽  
Mohammed El-Kebir

Abstract Genes in SARS-CoV-2 and other viruses in the order Nidovirales are expressed by a process of discontinuous transcription mediated by the viral RNA-dependent RNA polymerase. This process is distinct from alternative splicing in eukaryotes and produces subgenomic RNAs that express different viral genes. Here, we introduce the DISCONTINUOUS TRANSCRIPT ASSEMBLY problem of finding transcripts T and their abundances c given an alignment R of paired-end short reads under a maximum likelihood model that accounts for varying transcript lengths. Underpinning our approach is the concept of a segment graph, a directed acyclic graph that, distinct from the splice graph used to characterize alternative splicing, has a unique Hamiltonian path. We provide a compact characterization of solutions as subsets of non-overlapping edges in this graph, enabling the formulation of an efficient progressive heuristic that uses a mixed integer linear program. We show using simulations that our method, JUMPER, drastically outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1, SARS-CoV-2 and MERS-CoV samples, we find that JUMPER not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are well supported by direct evidence from long-read data, presence in multiple, independent samples or a conserved core sequence. Moreover, application of JUMPER to samples with and without treatment reveals viral drug response at the transcript level. As such, JUMPER enables detailed analyses of Nidovirales transcriptomes under varying conditions.
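The segment graph idea in the abstract above can be sketched in a few lines. This is an illustrative reading, not JUMPER's implementation. It assumes junctions are given as (donor, acceptor) genome coordinates with 0 < donor < acceptor < genome length, that every transcript starts at the first segment and ends at the last, and that "non-overlapping" means the jumps do not overlap as intervals of segment indices. The function names and the encoding are hypothetical.

```python
def build_segment_graph(genome_length, junctions):
    """Split [0, genome_length) into segments at junction coordinates.

    junctions: list of (donor, acceptor) positions, assumed to satisfy
    0 < donor < acceptor < genome_length (hypothetical encoding).
    Returns (segments, continuous_edges, discontinuous_edges); edges are
    pairs of segment indices.
    """
    breakpoints = sorted({0, genome_length} | {p for j in junctions for p in j})
    segments = list(zip(breakpoints[:-1], breakpoints[1:]))
    end_index = {end: i for i, (_, end) in enumerate(segments)}
    start_index = {start: i for i, (start, _) in enumerate(segments)}
    # Continuous edges chain adjacent segments; together they form the
    # only path that visits every segment (the Hamiltonian path).
    continuous = [(i, i + 1) for i in range(len(segments) - 1)]
    # Discontinuous edges model polymerase jumps from donor to acceptor.
    discontinuous = [(end_index[d], start_index[a]) for d, a in junctions]
    return segments, continuous, discontinuous


def transcript_path(num_segments, jumps):
    """Segment indices visited by a transcript that uses the given jumps.

    Raises ValueError if two jumps overlap, i.e. one starts before the
    previous one has landed.
    """
    path, current = [], 0
    for src, dst in sorted(jumps):
        if src < current:
            raise ValueError("jumps overlap")
        path.extend(range(current, src + 1))
        current = dst
    path.extend(range(current, num_segments))
    return path


# Toy genome of length 100 with three candidate junctions.
segments, continuous, discontinuous = build_segment_graph(
    100, [(10, 60), (10, 80), (70, 90)]
)
full_genome = transcript_path(len(segments), [])
subgenomic = transcript_path(len(segments), [(0, 2), (2, 5)])
```

With no jumps selected, the path is the Hamiltonian path itself, which corresponds to the full-length genomic RNA. Each valid subset of non-overlapping jumps yields one candidate transcript, which is what makes a compact encoding of the solution space possible.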


2021 ◽  
Author(s):  
Palash Sashittal ◽  
Chuanyi Zhang ◽  
Jian Peng ◽  
Mohammed El-Kebir

Abstract Genes in SARS-CoV-2 and, more generally, in viruses in the order Nidovirales are expressed by a process of discontinuous transcription mediated by the viral RNA-dependent RNA polymerase. This process is distinct from alternative splicing in eukaryotes, rendering current transcript assembly methods unsuitable for Nidovirales sequencing samples. Here, we introduce the Discontinuous Transcript Assembly problem of finding transcripts T and their abundances c given an alignment R under a maximum likelihood model that accounts for varying transcript lengths. Underpinning our approach is the concept of a segment graph, a directed acyclic graph that, distinct from the splice graph used to characterize alternative splicing, has a unique Hamiltonian path. We provide a compact characterization of solutions as subsets of non-overlapping edges in this graph, enabling the formulation of an efficient mixed integer linear program. We show using simulations that our method, Jumper, drastically outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1 and SARS-CoV-2 samples, we find that Jumper not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are well supported by direct evidence from long-read data, presence in multiple, independent samples or a conserved core sequence. Jumper enables detailed analyses of Nidovirales transcriptomes. Code availability: Software is available at https://github.com/elkebir-group/Jumper


2020 ◽  
Author(s):  
Frenzee Kroeizha L. Pammit ◽  
Anand Noel C. Manohar ◽  
Darlon V. Lantican ◽  
Roanne R. Gardoce ◽  
Hayde F. Galvez

Abstract In the Philippines, 26% of the total agricultural land is devoted to coconut production, making coconut one of the most valuable industrial crops in the country. However, the country’s multimillion-dollar coconut industry is threatened by the outbreak of coconut scale insect (CSI) and other re-emerging insect pests, prompting national research institutes to work jointly on developing new tolerant coconut varieties. Here, we report the cloning and characterization of the coronatine-insensitive 1 (COI1) gene, one of the candidate insect defense genes, using the ‘Catigan Green Dwarf’ (CATD) genome sequence assembly as reference. Two (2) splicing variants were identified and annotated: CnCOI1b-1 and CnCOI1b-2. The full-length cDNA of CnCOI1b-1 was 7,919 bp with an ORF of 1,176 bp encoding a deduced protein of 391 amino acids, while CnCOI1b-2 has a 2,360 bp full-length cDNA with an ORF of 1,743 bp encoding a deduced protein of 580 amino acids. 3D structural models for the two (2) isoforms were generated through homology modelling. Functional analysis revealed that both isoforms are involved in various physiological and developmental plant processes, including the defense response of plants to insects and pathogens. Phylogenetic analysis confirms a high degree of COI1 protein conservation during evolution, especially among monocot species. Key Message: This paper reports the molecular cloning and characterization of the coronatine-insensitive 1 (COI1) gene in coconut using a reference-guided transcript assembly approach. As COI1 is a well-known insect defense-response gene in other crops, the results of this study are expected to assist in the development of new resistant coconut varieties as one of the strategies to address threats to coconut production.


2020 ◽  
Author(s):  
Dyfed Lloyd Evans

Abstract Orphan species that are evolutionarily distant from their closest sequenced/assembled neighbour pose a significant challenge in terms of gene or transcript assembly for functional analysis. This is because 30% sequence divergence from the closest available reference sequence means that, even with a complete genome or transcriptome sequence, mapping-based or reference-based approaches to gene assembly and gene identification break down. A new approach is required for reference-guided gene and transcript assembly in such orphan species, or species that are evolutionarily very divergent from their closest relatives. When annotating genes, the protein sequence is often preferred as it diverges less than the DNA/RNA sequence and it is often simpler to find meaningful homology at the protein level. This greater conservation of protein sequence across evolutionary time also makes proteins a prime candidate for use as the basis for sequence assembly. A protein-based pipeline was developed for transcript assembly between distantly related species. This was tested on three evolutionarily divergent species with little sequence information available for them and for which the closest genome representatives were at least 40 million years divergent, as well as one species (Azolla filiculoides) for which a genome assembly is available. All the species have the potential to be weeds; herbicide targets were chosen as functional genes, whilst low copy number genes were chosen for evolutionary studies. Transcriptomic sequences were assembled using a bait-and-assemble strategy and final assemblies were verified by direct sequencing.


Animal Gene ◽  
2020 ◽  
Vol 17-18 ◽  
pp. 200105
Author(s):  
Brittney N. Keel ◽  
William T. Oliver ◽  
John W. Keele ◽  
Amanda K. Lindholm-Perry

Author(s):  
David S Kang ◽  
Sungshil Kim ◽  
Michael A Cotten ◽  
Cheolho Sim

Abstract The taxonomy of the Culex pipiens complex of mosquitoes is still debated, but in North America it is generally regarded to include Culex pipiens pipiens, Culex pipiens molestus, and Culex quinquefasciatus (or Culex pipiens quinquefasciatus). Although these mosquitoes have very similar morphometry, they each have unique life strategies specifically adapted to their ecological niche. Differences include the capability for overwintering diapause, blood meal preference, mating behaviors, and reliance on blood meals to produce eggs. Here, we used RNA-seq transcriptome analysis to investigate the differential gene expression and nucleotide polymorphisms that may be linked to the divergent traits, specifically between Cx. pipiens pipiens and Cx. pipiens molestus.

