short reads
Recently Published Documents


Total documents: 305 (five years: 108)
H-index: 36 (five years: 8)

2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Nadia M. Davidson ◽  
Ying Chen ◽  
Teresa Sadras ◽  
Georgina L. Ryland ◽  
Piers Blombery ◽  
...  

In cancer, fusions are important diagnostic markers and targets for therapy. Long-read transcriptome sequencing allows the discovery of fusions with their full-length isoform structure. However, due to higher sequencing error rates, fusion-finding algorithms designed for short reads do not work. Here we present JAFFAL, to identify fusions from long-read transcriptome sequencing. We validate JAFFAL using simulations, cell lines, and patient data from Nanopore and PacBio. We apply JAFFAL to single-cell data and find fusions spanning three genes, demonstrating that transcripts arising from complex rearrangements can be detected. JAFFAL is available at https://github.com/Oshlack/JAFFA/wiki.


2022 ◽  
Author(s):  
Karl Johan Westrin ◽  
Warren W Kretzschmar ◽  
Olof Emanuelsson

Motivation: Transcriptome assembly from RNA sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often reconstruct transcript isoforms inadequately. This impedes the study of alternative splicing, in particular for lowly expressed isoforms. Result: We present the de novo transcript isoform assembler ClusTrast, which clusters a set of guiding contigs by similarity, aligns short reads to the guiding contigs, and assembles each clustered set of short reads individually. We tested ClusTrast on datasets from six eukaryotic species, and showed that ClusTrast reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at a moderate reduction in precision. An appreciable fraction of these isoforms was reconstructed to at least 95% of their length. We suggest that ClusTrast will be useful for studying alternative splicing in the absence of a reference genome. Availability and implementation: The code and usage instructions are available at https://github.com/karljohanw/clustrast.


Author(s):  
Steven O. Sewe ◽  
Gonçalo Silva ◽  
Paulo Sicat ◽  
Susan E. Seal ◽  
Paul Visendi
Keyword(s):  
RNA-Seq

2021 ◽  
Author(s):  
Alaina Shumate ◽  
Brandon Wong ◽  
Geo Pertea ◽  
Mihaela Pertea

Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are unable to span multiple exons. Long-read technology can capture full-length transcripts, but its high error rate often leads to mis-identified splice sites, and its low throughput makes quantification difficult. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read-only or short-read-only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than long-read assembly alone. We demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly, while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie.


2021 ◽  
Vol 12 ◽  
Author(s):  
Panpan Zhang ◽  
Haoran Peng ◽  
Christel Llauro ◽  
Etienne Bucher ◽  
Marie Mirouze

Extrachromosomal circular DNA (eccDNA) has been observed in different species for decades, and a growing body of evidence shows that this specific type of DNA molecule may play an important role in rapid adaptation. Therefore, characterizing the full landscape of eccDNA has become critical, and there are several protocols for enriching eccDNAs and performing short-read or long-read sequencing. However, there is currently no available bioinformatic tool to identify eccDNAs from Nanopore reads. More importantly, the current tools based on Illumina short reads lack an efficient standardized pipeline, notably to identify eccDNA originating from repeated loci, and cannot be applied to very large genomes. Here, we introduce a comprehensive tool, ecc_finder, to solve both of these issues. Applying ecc_finder to eccDNA-seq data (mobilome-seq, Circle-Seq, or CIDER-seq) from Arabidopsis, human, and wheat (with genome sizes ranging from 120 Mb to 17 Gb), we document the improvement in computational time, sensitivity, and accuracy and demonstrate ecc_finder's wide applicability and functionality.


2021 ◽  
Author(s):  
Miquel Angel Schikora-Tamarit ◽  
Toni Gabaldon

Structural variants (SVs) like translocations, deletions, and other rearrangements underlie genetic and phenotypic variation. SVs are often overlooked because they are difficult to detect from short-read sequencing. Most algorithms yield low recall on humans, but the performance in other organisms is unclear. Similarly, despite remarkable differences across species genomes, most approaches use parameters optimized for humans. To overcome this and enable species-tailored approaches, we developed perSVade (personalized Structural Variation Detection), a pipeline that identifies SVs in a way that is optimized for any input sample. Starting from short reads, perSVade uses simulations on the reference genome to choose the best SV calling parameters. The output includes the optimally called SVs and their accuracy, which is useful for assessing confidence in the results. In addition, perSVade can call small variants and copy-number variations. In summary, perSVade automatically identifies several types of genomic variation from short reads using sample-optimized parameters. We validated that perSVade increases the SV calling accuracy on simulated variants for six diverse eukaryotes, and on datasets of validated human variants. Importantly, we found no universal set of optimal parameters, which underscores the need for species-specific parameter optimization. PerSVade will improve our understanding of the role of SVs in non-human organisms.
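The idea of choosing calling parameters from simulations can be illustrated with a minimal sketch. This is not perSVade's code or interface: `toy_caller`, the filter names `min_support` and `min_quality`, the candidate records, and the parameter grid are all hypothetical, invented only to show a grid search that keeps the parameter set with the best F1 score against a known (simulated) truth set.

```python
from itertools import product


def f_score(called, truth):
    """Return (precision, recall, F1) for two sets of SV identifiers."""
    true_pos = len(called & truth)
    precision = true_pos / len(called) if called else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def toy_caller(candidates, min_support, min_quality):
    """Hypothetical stand-in for an SV caller: keep candidates passing both filters."""
    return {
        c["id"]
        for c in candidates
        if c["support"] >= min_support and c["quality"] >= min_quality
    }


def choose_parameters(candidates, truth, grid):
    """Try every parameter combination and return (best F1, best parameters)."""
    best = None
    for min_support, min_quality in product(grid["min_support"], grid["min_quality"]):
        called = toy_caller(candidates, min_support, min_quality)
        _, _, f1 = f_score(called, truth)
        if best is None or f1 > best[0]:
            best = (f1, {"min_support": min_support, "min_quality": min_quality})
    return best


# Invented example data: three simulated (true) SVs and two false positives.
SIMULATED_TRUTH = {"sv1", "sv2", "sv3"}
CANDIDATES = [
    {"id": "sv1", "support": 12, "quality": 40},
    {"id": "sv2", "support": 6, "quality": 35},
    {"id": "sv3", "support": 3, "quality": 30},
    {"id": "fp1", "support": 4, "quality": 10},
    {"id": "fp2", "support": 2, "quality": 25},
]
GRID = {"min_support": [2, 5, 10], "min_quality": [0, 20, 30]}

best_f1, best_params = choose_parameters(CANDIDATES, SIMULATED_TRUTH, GRID)
print(best_f1, best_params)
```

Because the truth set is known for simulated variants, the same loop also yields the accuracy estimate that accompanies the chosen parameters. A different genome or sample would produce different candidate records, and therefore possibly a different best parameter set.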


2021 ◽  
Vol 10 (42) ◽  
Author(s):  
Yoshi Yamano ◽  
Sachiko Sugimoto ◽  
Katsuyoshi Matsunami

Here, we describe the closed complete genome sequence of Actinoplanes sp. strain L3-i22, which was obtained by assembly with long reads and subsequent polishing with short reads. The complete genome consists of a 12,014,766-bp chromosome, with a GC content of 71.4%, and contains no plasmids.
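As a worked illustration of the reported figures, the sketch below shows how GC content is computed from a sequence and what the stated percentage implies for the chromosome. The function name `gc_fraction` is an illustrative choice, and the base count is only approximate because the published percentage is rounded to one decimal place.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = sum(1 for base in seq if base in "GC")
    return gc_count / len(seq)


# Figures taken from the abstract above.
CHROMOSOME_LENGTH = 12_014_766  # bp
GC_PERCENT = 71.4

# Approximate number of G/C bases implied by the rounded percentage.
approx_gc_bases = round(CHROMOSOME_LENGTH * GC_PERCENT / 100)
print(approx_gc_bases)
```

The result is roughly 8.58 million G or C bases. Since 71.4% could stand for anything from 71.35% to 71.45%, the true count may differ from this estimate by several thousand bases.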


2021 ◽  
Author(s):  
Ryan R Wick ◽  
Kathryn E Holt

Long-read-only bacterial genome assemblies usually contain residual errors, most commonly homopolymer-length errors. Short-read polishing tools can use short reads to fix these errors, but most rely on short-read alignment, which is unreliable in repeat regions. Errors in such regions are therefore challenging to fix and often remain after short-read polishing. Here we introduce Polypolish, a new short-read polisher which uses all-per-read alignments to repair errors in repeat sequences that other polishers cannot. In benchmarking tests using both simulated and real reads, we find that Polypolish performs well, and the best results are achieved by using Polypolish in combination with other short-read polishers.


2021 ◽  
Vol 22 (18) ◽  
pp. 9842
Author(s):  
Zheng-Shan He ◽  
Andan Zhu ◽  
Jun-Bo Yang ◽  
Weishu Fan ◽  
De-Zhu Li

Posttranscriptional modifications, including intron splicing and RNA editing, are common processes during regulation of gene expression in plant organelle genomes. However, the intermediate products of intron splicing, and the interplay between intron splicing and RNA editing, have not been well studied. Most organelle transcriptome analyses were based on Illumina short reads, which are unable to capture the full spectrum of transcript intermediates within an organelle. To fully investigate the intermediates during intron splicing and the underlying relationships with RNA editing, we used PacBio DNA-seq and Iso-seq, together with Illumina short-read genome and transcriptome sequencing data, to assemble the chloroplast and mitochondrial genomes of Nymphaea ‘Joey Tomocik’ and analyze their posttranscriptional features. With the direct evidence from Iso-seq, multiple partially or fully intron-spliced intermediates were observed, and we also found that both cis- and trans-splicing introns were spliced randomly. Moreover, by using rRNA-depleted and non-Oligo(dT)-enrichment strand-specific RNA-seq data and combining direct SNP-calling and transcript-mapping methods, we identified 98 and 865 RNA-editing sites in the plastome and mitogenome of N. ‘Joey Tomocik’, respectively. The target codon preference, the tendency of increasing protein hydrophobicity, and the biased distribution of editing sites are similar in both organelles, suggesting their common evolutionary origin and shared editing machinery. The distribution of RNA editing sites also implies that the RNA editing sites in the intron and exon regions may splice synchronously, except those exonic sites adjacent to introns, which could only be edited after being intron-spliced. Our study provides solid evidence for the multiple intermediates co-existing during intron splicing and their interplay with RNA editing in organelle genomes of a basal angiosperm.


2021 ◽  
Author(s):  
Ridvan Eksi ◽  
Daiyao Yi ◽  
Hongyang Li ◽  
Bradley Godfrey ◽  
Lisa R. Mathew ◽  
...  

Studying isoform expression at the microscopic level has always been a challenging task. A classical example is the kidney, where glomerular and tubulo-interstitial compartments carry out drastically different physiological functions and thus presumably their isoform expression also differs. We aim to develop an experimental and computational pipeline for identifying isoforms at the level of microscopic structures. We microdissected glomerular and tubulo-interstitial compartments from healthy human kidney tissues from two cohorts. The two compartments were separately sequenced with the PacBio RS II platform. These transcripts were then validated using transcripts of the same samples obtained with the traditional Illumina RNA-Seq protocol, distinct Illumina RNA-Seq short reads from European Renal cDNA Bank (ERCB) samples, and the annotated GENCODE transcript list, thus identifying novel transcripts. We identified 14,739 and 14,259 annotated transcripts, and 17,268 and 13,118 potentially novel transcripts in the glomerular and tubulo-interstitial compartments, respectively. Of note, relying solely on either short or long reads would have resulted in many erroneous identifications. We identified distinct pathways involved in glomerular and tubulo-interstitial compartments at the isoform level. We demonstrated the possibility of micro-dissecting a tissue and incorporating both long- and short-read sequencing to identify isoforms for each compartment.

