scholarly journals Graphmap2 - splice-aware RNA-seq mapper for long reads

2019 ◽  
Author(s):  
Josip Marić ◽  
Ivan Sović ◽  
Krešimir Križanović ◽  
Niranjan Nagarajan ◽  
Mile Šikić

AbstractIn this paper we present Graphmap2, a splice-aware mapper built on our previously developed DNA mapper Graphmap. Graphmap2 is tailored for long reads produced by Pacific Biosciences and Oxford Nanopore devices. It uses several newly developed algorithms which enable higher precision and recall of correctly detected transcripts and exon boundaries. We compared its performance with the state-of-the-art tools Minimap2 and Gmap. On both simulated and real datasets Graphmap2 achieves higher mappability and more correctly recognized exons and their ends. In addition we present an analysis of potential of splice aware mappers and long reads for the identification of previously unknown isoforms and even genes. The Graphmap2 tool is publicly available at https://github.com/lbcb-sci/graphmap2.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Mikko Rautiainen ◽  
Tobias Marschall

Abstract Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools.Availability: Package manager: https://anaconda.org/bioconda/graphalignerand source code: https://github.com/maickrau/GraphAligner



2016 ◽  
Author(s):  
Clemens Messerschmidt ◽  
Manuel Holtgrewe ◽  
Dieter Beule

AbstractSummaryWe propose the simple method HLA-MA for consistency checking in pipelines operating on human HTS data. The method is based on the HLA typing result of the state-of-the-art method Opti-Type. Provided that there is sufficient coverage of the HLA loci, comparing HLA types allows for simple, fast, and robust matching of samples from whole genome, exome, and RNA-seq data. This approach is reliable for sample re-identification even for samples with high mutational loads, e.g., caused by microsatellite instability or POLE1 defects.Availability and ImplementationThe software is implemented In Python 3 and freely available under the MIT license at https://github.com/bihealth/hlama and via [email protected]



Author(s):  
Fabricio Almeida-Silva ◽  
Kanhu C Moharana ◽  
Thiago M Venancio

Abstract In the past decade, over 3000 samples of soybean transcriptomic data have accumulated in public repositories. Here, we review the state of the art in soybean transcriptomics, highlighting the major microarray and RNA-seq studies that investigated soybean transcriptional programs in different tissues and conditions. Further, we propose approaches for integrating such big data using gene coexpression network and outline important web resources that may facilitate soybean data acquisition and analysis, contributing to the acceleration of soybean breeding and functional genomics research.



F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 100 ◽  
Author(s):  
Jason L Weirather ◽  
Mariateresa de Cesare ◽  
Yunhao Wang ◽  
Paolo Piazza ◽  
Vittorio Sebastiano ◽  
...  

Background: Given the demonstrated utility of Third Generation Sequencing [Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)] long reads in many studies, a comprehensive analysis and comparison of their data quality and applications is in high demand. Methods: Based on the transcriptome sequencing data from human embryonic stem cells, we analyzed multiple data features of PacBio and ONT, including error pattern, length, mappability and technical improvements over previous platforms. We also evaluated their application to transcriptome analyses, such as isoform identification and quantification and characterization of transcriptome complexity, by comparing the performance of PacBio, ONT and their corresponding Hybrid-Seq strategies (PacBio+Illumina and ONT+Illumina). Results: PacBio shows overall better data quality, while ONT provides a higher yield. As with data quality, PacBio performs marginally better than ONT in most aspects for both long reads only and Hybrid-Seq strategies in transcriptome analysis. In addition, Hybrid-Seq shows superior performance over long reads only in most transcriptome analyses. Conclusions: Both PacBio and ONT sequencing are suitable for full-length single-molecule transcriptome analysis. As this first use of ONT reads in a Hybrid-Seq analysis has shown, both PacBio and ONT can benefit from a combined Illumina strategy. The tools and analytical methods developed here provide a resource for future applications and evaluations of these rapidly-changing technologies.



2019 ◽  
Author(s):  
David Pellow ◽  
Itzik Mizrahi ◽  
Ron Shamir

AbstractBackgroundMany bacteria contain plasmids, but separating between contigs that originate on the plasmid and those that are part of the bacterial genome can be difficult. This is especially true in metagenomic assembly, which yields many contigs of unknown origin. Existing tools for classifying sequences of plasmid origin give less reliable results for shorter sequences, are trained using a fraction of the known plasmids, and can be difficult to use in practice.ResultsWe present PlasClass, a new plasmid classifier. It uses a set of standard classifiers trained on the most current set of known plasmid sequences for different sequence lengths. PlasClass outperforms the state-of-the-art plasmid classification tool on shorter sequences, which constitute the majority of assembly contigs, while using less time and memory.ConclusionsPlasClass can be used to easily classify plasmid and bacterial genome sequences in metagenomic or isolate assemblies. It is available from: https://github.com/Shamir-Lab/PlasClass



2013 ◽  
Vol 10 (12) ◽  
pp. 1165-1166 ◽  
Author(s):  
Ian Korf
Keyword(s):  


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 100 ◽  
Author(s):  
Jason L Weirather ◽  
Mariateresa de Cesare ◽  
Yunhao Wang ◽  
Paolo Piazza ◽  
Vittorio Sebastiano ◽  
...  

Background: Given the demonstrated utility of Third Generation Sequencing [Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)] long reads in many studies, a comprehensive analysis and comparison of their data quality and applications is in high demand. Methods: Based on the transcriptome sequencing data from human embryonic stem cells, we analyzed multiple data features of PacBio and ONT, including error pattern, length, mappability and technical improvements over previous platforms. We also evaluated their application to transcriptome analyses, such as isoform identification and quantification and characterization of transcriptome complexity, by comparing the performance of size-selected PacBio, non-size-selected ONT and their corresponding Hybrid-Seq strategies (PacBio+Illumina and ONT+Illumina). Results: PacBio shows overall better data quality, while ONT provides a higher yield. As with data quality, PacBio performs marginally better than ONT in most aspects for both long reads only and Hybrid-Seq strategies in transcriptome analysis. In addition, Hybrid-Seq shows superior performance over long reads only in most transcriptome analyses. Conclusions: Both PacBio and ONT sequencing are suitable for full-length single-molecule transcriptome analysis. As this first use of ONT reads in a Hybrid-Seq analysis has shown, both PacBio and ONT can benefit from a combined Illumina strategy. The tools and analytical methods developed here provide a resource for future applications and evaluations of these rapidly-changing technologies.



2019 ◽  
Author(s):  
Mikko Rautiainen ◽  
Tobias Marschall

AbstractGenome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pan-genome graph. Yet, so far this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to state-of-the-art tools, GraphAligner is 12x faster and uses 5x less memory, making it as efficient as aligning reads to linear reference genomes. When employing GraphAligner for error correction, we find it to be almost 3x more accurate and over 15x faster than extant tools.Availability Package managerhttps://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner



2021 ◽  
Author(s):  
Brandon D. Pickett ◽  
Jessica R. Glass ◽  
Perry G. Ridge ◽  
John S. K. Kauwe

ABSTRACTCaranx ignobilis, commonly known as the kingfish or giant trevally, is a large, reef-associated apex predator. It is a prized sportfish, targeted heavily throughout its tropical and subtropical range in the Indian and Pacific Oceans, and it has drawn significant interest in aquaculture due to an unusual tolerance for freshwater. In this study, we present a high-quality nuclear genome assembly of a C. ignobilis individual from Hawaiian waters, which have recently been shown to host a genetically distinct population. The assembly has a contig NG50 of 7.3Mbp and scaffold NG50 of 46.3Mbp. Twenty-five of the 203 scaffolds contain 90% of the genome. We also present the raw Pacific Biosciences continuous long-reads from which the assembly was created. A Hi-C dataset (Dovetail Genomics Omni-C) and Illumina-based RNA-seq from eight tissues are also presented; the latter of which can be particularly useful for annotation and studies of freshwater tolerance. Overall, this genome assembly and supporting data is a valuable tool for ecological and comparative genomics studies of kingfish and other carangoid fishes.



2021 ◽  
Author(s):  
Nathalie Lehmann ◽  
Sandrine Perrin ◽  
Claire Wallon ◽  
Xavier Bauquet ◽  
Vivien Deshaies ◽  
...  

Motivation: Core sequencing facilities produce huge amounts of sequencing data that need to be analysed with automated workflows to ensure reproducibility and traceability. Eoulsan is a versatile open-source workflow engine meeting the needs of core facilities, by automating the analysis of a large number of samples. Its core design separates the description of the workflow from the actual commands to be run. This originality simplifies its usage as the user does not need to handle code, while ensuring reproducibility. Eoulsan was initially developed for bulk RNA-seq data, but the transcriptomics applications have recently widened with the advent of long-read sequencing and single-cell technologies, calling for the development of new workflows. Result: We present Eoulsan 2, a major update that (i) enhances the workflow manager itself, (ii) facilitates the development of new modules, and (iii) expands its applications to long reads RNA-seq (Oxford Nanopore Technologies) and scRNA-seq (Smart-seq2 and 10x Genomics). The workflow manager has been rewritten, with support for execution on a larger choice of computational infrastructure (workstations, Hadoop clusters, and various job schedulers for cluster usage). Eoulsan now facilitates the development of new modules, by reusing wrappers developed for the Galaxy platform, with support for container images (Docker or Singularity) packaging tools to execute. Finally, Eoulsan natively integrates novel modules for bulk RNA-seq, as well as others specifically designed for processing long read RNA-seq and scRNA-seq. Eoulsan 2 is distributed with ready-to-use workflows and companion tutorials. Availability and implementation: Eoulsan is implemented in Java, supported on Linux systems and distributed under the LGPL and CeCILL-C licenses at: http://outils.genomique.biologie.ens.fr/eoulsan/. The source code and sample workflows are available on GitHub: https://github.com/GenomicParisCentre/eoulsan. A GitHub repository for modules using the Galaxy tool XML syntax is further provided at: https://github.com/GenomicParisCentre/eoulsan-tools



Sign in / Sign up

Export Citation Format

Share Document