genome assembler Latest Research Papers

LazyB: fast and cheap genome assembly

Abstract. Background: Advances in genome sequencing over the last years have led to a fundamental paradigm shift in the field. With steadily decreasing sequencing costs, genome projects are no longer limited by the cost of raw sequencing data, but rather by computational problems associated with genome assembly. There is an urgent demand for more efficient and more accurate methods, in particular with regard to the highly complex and often very large genomes of animals and plants. Most recently, “hybrid” methods that integrate short- and long-read data have been devised to address this need. Results: LazyB is such a hybrid genome assembler. It has been designed specifically with an emphasis on utilizing low-coverage short and long reads. LazyB starts from a bipartite overlap graph between long reads and restrictively filtered short-read unitigs. This graph is translated into a long-read overlap graph G. Instead of the more conventional approach of removing tips, bubbles, and other local features, LazyB extracts, step by step, subgraphs whose global properties approach a disjoint union of paths. First, a consistently oriented subgraph is extracted, which in a second step is reduced to a directed acyclic graph. In the next step, properties of proper interval graphs are used to extract contigs as maximum weight paths. These paths are translated into genomic sequences only in the final step. A prototype implementation of LazyB, entirely written in Python, not only yields significantly more accurate assemblies of the yeast and fruit fly genomes compared to state-of-the-art pipelines but also requires much less computational effort. Conclusions: LazyB is a new low-cost genome assembler that copes well with large genomes and low coverage. It is based on a novel approach for reducing the overlap graph to a collection of paths, thus opening new avenues for future improvements. Availability: The prototype is available at https://github.com/TGatter/LazyB.
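
The step of extracting a contig as a maximum weight path from a directed acyclic graph can be illustrated with a small sketch. This is not the LazyB implementation: the function name `max_weight_path`, the toy graph, and the edge weights are hypothetical, and the sketch only finds a single heaviest path by dynamic programming over a topological order.

```python
from collections import defaultdict, deque


def max_weight_path(edges):
    """Return (weight, path) of a heaviest path in a DAG.

    edges: iterable of (u, v, w), where w is the weight of edge u -> v.
    Hypothetical helper for illustration; not taken from LazyB.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))
    if not nodes:
        return 0, []

    # Kahn's algorithm: compute a topological order of the graph.
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # best[v] is the weight of the heaviest path that ends at v.
    best = {n: 0 for n in nodes}
    prev = {n: None for n in nodes}
    for u in order:
        for v, w in succ[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u

    # Walk back from the best end node to recover the path.
    end = max(order, key=lambda n: best[n])
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = prev[node]
    return best[end], path[::-1]


# Toy overlap graph: nodes stand for reads, weights for overlap scores.
weight, path = max_weight_path(
    [("a", "b", 3), ("b", "c", 2), ("a", "c", 4), ("c", "d", 1)]
)
print(weight, path)
```

In an assembler, each node on the returned path would correspond to a read, and the path would be turned into sequence only afterwards, in line with the final step described in the abstract.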

ESCA pipeline: Easy-to-use SARS-CoV-2 genome Assembler

10.1101/2021.05.21.445156 ◽ 2021

Author(s): Martina Rueca ◽ Emanuela Giombini ◽ Francesco Messina ◽ Barbara Bartolini ◽ Antonino Di Caro ◽ ...

Keyword(s): Amino Acid ◽ Genome Assembly ◽ Global Level ◽ Sequencing Data ◽ High Quality ◽ Rapid Succession ◽ Novel Variants ◽ Low Coverage ◽ High Quality Genome ◽ Genome Assembler

Early sequencing and quick analysis of the SARS-CoV-2 genome are contributing to understanding the dynamics of COVID-19 epidemics and to the design of countermeasures at a global level. Amplicon-based NGS methods are widely used to sequence the SARS-CoV-2 genome and to identify novel variants that are emerging in rapid succession, harboring multiple deletions and amino-acid-changing mutations. To facilitate the analysis of NGS data obtained from amplicon-based sequencing methods, here we propose an easy-to-use SARS-CoV-2 genome assembler: the ESCA pipeline. Results showed that ESCA can perform high-quality genome assembly from Ion Torrent and Illumina raw data and helps the user easily correct low-coverage regions. Moreover, ESCA can compare the assembled genomes of multi-sample runs through an easy table format.
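
How low-coverage regions might be located before correction can be sketched as follows. The abstract does not say how ESCA does this, so the function name `low_coverage_regions`, the per-base depth list, and the default threshold of 10 reads are all assumptions made for illustration.

```python
def low_coverage_regions(depth, min_depth=10):
    """Return 1-based inclusive (start, end) intervals where depth < min_depth.

    depth: per-base read depth along the genome (hypothetical input).
    min_depth: assumed threshold; not a value taken from ESCA.
    """
    regions = []
    start = None
    for pos, d in enumerate(depth, start=1):
        if d < min_depth:
            if start is None:
                start = pos  # a low-coverage run begins here
        elif start is not None:
            regions.append((start, pos - 1))  # the run ended at the previous base
            start = None
    if start is not None:
        regions.append((start, len(depth)))  # the run extends to the last base
    return regions


print(low_coverage_regions([0, 5, 20, 30, 3, 2, 50, 1]))
```

Intervals reported this way could then be masked or inspected by hand, which is the kind of user-driven correction the abstract mentions.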

Assembling Long Accurate Reads Using de Bruijn Graphs

10.1101/2020.12.10.420448 ◽ 2020

Author(s): Anton Bankevich ◽ Andrey Bzikadze ◽ Mikhail Kolmogorov ◽ Pavel A. Pevzner

Keyword(s): Human Genome ◽ De Bruijn Graph ◽ High Fidelity ◽ High Quality ◽ De Bruijn Graphs ◽ String Graph ◽ De Bruijn ◽ Genome Assembler ◽ Large Genomes

Abstract. Although de Bruijn graphs represent the basis of many genome assemblers, it remains unclear how to construct these graphs for large genomes and large k-mer sizes. This algorithmic challenge has become particularly important with the emergence of long and accurate high-fidelity (HiFi) reads, which were recently used to generate a semi-manual telomere-to-telomere assembly of the human genome with the alternative string graph assembly approach. To enable fully automated high-quality HiFi assemblies of various genomes, we developed the efficient jumboDB algorithm for constructing the de Bruijn graph for large genomes and large k-mer sizes, and the LJA genome assembler, which error-corrects HiFi reads and uses jumboDB to construct the de Bruijn graph on the error-corrected reads. Since the de Bruijn graph constructed for a fixed k-mer size is typically either too tangled or too fragmented, LJA uses a new concept of a multiplex de Bruijn graph with varying k-mer sizes. We demonstrate that LJA produces contiguous assemblies of complex repetitive regions in genomes, including automated assemblies of various highly repetitive human centromeres.
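
For readers unfamiliar with the data structure, a de Bruijn graph for a fixed k can be built naively as below. This is a textbook dictionary-based construction, not the jumboDB algorithm, which is designed for large genomes and large k; the function name `de_bruijn_graph` and the toy reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the sorted list of (k-1)-mers that follow it.

    Every k-mer in a read contributes one edge from its prefix to its suffix.
    Naive illustration only; memory use grows with the number of distinct k-mers.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


print(de_bruijn_graph(["ACGTAC"], 3))
```

Rerunning the construction with different values of k on the same reads is a simple way to see the trade-off the abstract describes: small k merges repeats into tangles, while large k breaks the graph where reads do not overlap by at least k-1 bases.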

RepAHR: an improved approach for de novo repeat identification by assembly of the high-frequency reads

BMC Bioinformatics ◽ 10.1186/s12859-020-03779-w ◽ 2020 ◽ Vol 21 (1)

Author(s): Xingyu Liao ◽ Xin Gao ◽ Xiankai Zhang ◽ Fang-Xiang Wu ◽ Jianxin Wang

Keyword(s): High Frequency ◽ Structural Variation ◽ De Novo ◽ Repetitive Sequences ◽ Data Sets ◽ Sequence Coverage ◽ Coverage Ratio ◽ Next Generation Sequencing Ngs ◽ Genome Assembler ◽ Generation Sequencing

Abstract. Background: Repetitive sequences account for a large proportion of eukaryotic genomes. Identification of repetitive sequences plays a significant role in many applications, such as structural variation detection and genome assembly. Many existing de novo repeat identification pipelines and tools obtain repeats by assembling high-frequency k-mers. However, assemblers require a certain degree of sequence coverage to produce the desired assemblies. In addition, assemblers cut the reads into shorter k-mers for assembly, which may destroy the structure of the repetitive regions. For these reasons, it is difficult to obtain complete and accurate repetitive regions in the genome with existing tools. Results: In this study, we present a new method called RepAHR for de novo repeat identification by assembly of the high-frequency reads. First, RepAHR scans next-generation sequencing (NGS) reads to find the high-frequency k-mers. Second, RepAHR filters the high-frequency reads from the whole set of NGS reads according to certain rules based on the high-frequency k-mers. Finally, the high-frequency reads are assembled to generate repeats using SPAdes, which is considered an outstanding genome assembler for NGS sequences. Conclusions: We tested RepAHR on five data sets, and the experimental results show that RepAHR outperforms RepARK and REPdenovo for detecting repeats in terms of N50, reference alignment ratio, coverage ratio of reference, mask ratio of Repbase, and some other metrics.
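
The first two steps, finding high-frequency k-mers and then selecting high-frequency reads, can be sketched as follows. The abstract only says that reads are filtered "according to certain rules", so the rule used here (keep a read when at least `min_fraction` of its k-mers are high-frequency) is an assumption, as are the function names and thresholds. It is not the RepAHR implementation.

```python
from collections import Counter


def kmers(seq, k):
    """Yield all overlapping k-mers of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def high_frequency_kmers(reads, k, min_count):
    """Return the set of k-mers seen at least min_count times across all reads."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    return {km for km, c in counts.items() if c >= min_count}


def high_frequency_reads(reads, k, min_count, min_fraction=0.5):
    """Keep reads in which enough k-mers are high-frequency (assumed rule)."""
    frequent = high_frequency_kmers(reads, k, min_count)
    kept = []
    for read in reads:
        total = len(read) - k + 1
        if total <= 0:
            continue  # read is shorter than k
        hits = sum(1 for km in kmers(read, k) if km in frequent)
        if hits / total >= min_fraction:
            kept.append(read)
    return kept


reads = ["ACGACG", "ACGACG", "TTTTGC", "ACGTTT"]
print(high_frequency_reads(reads, k=3, min_count=4))
```

The selected reads would then be passed to an assembler as whole reads, which is the point of the method: the repeat structure inside each read is preserved until assembly.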

Raven: a de novo genome assembler for long reads

10.1101/2020.08.07.242461 ◽ 2020 ◽ Cited By ~ 5

Author(s): Robert Vaser ◽ Mile Šikić

Keyword(s): Human Genome ◽ Genome Assembly ◽ De Novo ◽ De Novo Genome Assembly ◽ New Methods ◽ Long Reads ◽ Long Read ◽ Comparable Accuracy ◽ Genome Assembler ◽ Genome Dataset

We present new methods for the improvement of long-read de novo genome assembly, incorporated into a straightforward tool called Raven (https://github.com/lbcb-sci/raven). Compared with other assemblers, Raven is one of the two fastest, reconstructs the sequenced genome in the fewest fragments, has better or comparable accuracy, and maintains similar performance across various genomes. Raven takes 500 CPU hours to assemble a 44x human genome dataset into only 259 fragments.

rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data

GigaScience ◽ 10.1093/gigascience/giz100 ◽ 2019 ◽ Vol 8 (9) ◽ Cited By ~ 52

Author(s): Elena Bushmanova ◽ Dmitry Antipov ◽ Alla Lapidus ◽ Andrey D Prjibelski

Keyword(s): Rna Sequencing ◽ De Novo ◽ Transcriptome Assembly ◽ The Novel ◽ Rna Seq ◽ De Novo Transcriptome ◽ Weak Points ◽ Transcriptome Reconstruction ◽ Evaluation Approaches ◽ Genome Assembler

Abstract. Background: The possibility of generating large RNA-sequencing datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to organisms with finished and well-annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem, which is complicated by the varying expression levels across different genes, alternative splicing, and paralogous genes. Results: Herein we describe the novel transcriptome assembler rnaSPAdes, which has been developed on top of the SPAdes genome assembler and explores computational parallels between the assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-sequencing datasets, and briefly highlight strong and weak points of different assemblers. Conclusions: Based on the comparison between different assembly methods, we infer that no assembler is the absolute leader according to all quality metrics and on all datasets used. However, rnaSPAdes typically outperforms other assemblers on such important properties as the number of assembled genes and isoforms, and at the same time has higher accuracy statistics on average compared to the closest competitors.

Yet another de novo genome assembler

2019 11th International Symposium on Image and Signal Processing and Analysis (ISPA) ◽ 10.1109/ispa.2019.8868909 ◽ 2019 ◽ Cited By ~ 6

Author(s): Robert Vaser ◽ Mile Sikic

Keyword(s): De Novo ◽ Genome Assembler

Integrative Meta-Assembly Pipeline (IMAP): Chromosome-level genome assembler combining multiple de novo assemblies

PLoS ONE ◽ 10.1371/journal.pone.0221858 ◽ 2019 ◽ Vol 14 (8) ◽ pp. e0221858 ◽ Cited By ~ 2

Author(s): Giltae Song ◽ Jongin Lee ◽ Juyeon Kim ◽ Seokwoo Kang ◽ Hoyong Lee ◽ ...

Keyword(s): De Novo ◽ Assembly Pipeline ◽ Genome Assembler ◽ Chromosome Level

rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data

10.1101/420208 ◽ 2018 ◽ Cited By ~ 13

Author(s): Elena Bushmanova ◽ Dmitry Antipov ◽ Alla Lapidus ◽ Andrey D. Prjibelski

Keyword(s): De Novo ◽ Transcriptome Assembly ◽ Model Organisms ◽ Challenging Problem ◽ Rna Seq ◽ De Novo Transcriptome ◽ Weak Points ◽ Transcriptome Reconstruction ◽ Evaluation Approaches ◽ Genome Assembler

Abstract. Summary: The possibility of generating large RNA-seq datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to model organisms with finished and annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem, which is complicated by the varying expression levels across different genes, alternative splicing, and paralogous genes. In this paper we describe a novel transcriptome assembler called rnaSPAdes, which is developed on top of the SPAdes genome assembler and explores surprising computational parallels between the assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-Seq datasets, and briefly highlight strong and weak points of different assemblers. Availability and implementation: rnaSPAdes is implemented in C++ and Python and is freely available at cab.spbu.ru/software/rnaspades/.

Complete mitochondrial genome sequence of the “copper moss” Mielichhoferia elongata reveals independent nad7 gene functionality loss

PeerJ ◽ 10.7717/peerj.4350 ◽ 2018 ◽ Vol 6 ◽ pp. e4350 ◽ Cited By ~ 9

Author(s): Denis V. Goruynov ◽ Svetlana V. Goryunova ◽ Oxana I. Kuznetsova ◽ Maria D. Logacheva ◽ Irina A. Milyutina ◽ ...

Keyword(s): Mitochondrial Genome ◽ Complete Mitochondrial Genome ◽ Phylogenetic Reconstruction ◽ Data Sets ◽ Base Pairs ◽ Protein Coding ◽ Repeat Content ◽ Copper Moss ◽ Genome Assembler ◽ Simple Sequence

The mitochondrial genome of the moss Mielichhoferia elongata has been sequenced and assembled with the SPAdes genome assembler. It consists of 100,342 base pairs and has practically the same gene set and order as other known bryophyte chondriomes. The genome contains 66 genes including three rRNAs, 24 tRNAs, and 40 conserved mitochondrial protein genes. Unlike the majority of previously sequenced bryophyte mitogenomes, it lacks a functional nad7 gene. The phylogenetic reconstruction and close analysis of the primary structure of the nad7 gene carried out in this study suggest its independent pseudogenization in different bryophyte lineages. Evaluation of the microsatellite (simple sequence repeat) content of the M. elongata mitochondrial genome indicates that it could be used as a phylogenetic marker in further studies. The strongly supported phylogenetic tree presented here, derived from 33 protein-coding sequences of 40 bryophyte species, is consistent with other reconstructions based on a number of different data sets.
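
A simple way to survey microsatellite (simple sequence repeat) content in an assembled sequence is a regular-expression scan for tandem repeats. The sketch below is an illustration only: the function name `find_ssrs`, the motif length, and the minimum repeat count are hypothetical choices, and the abstract does not state which tool or thresholds the authors used.

```python
import re


def find_ssrs(seq, motif_len, min_repeats):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    start is a 0-based offset into seq. Only motifs of exactly motif_len
    bases over the alphabet ACGT are considered.
    """
    if motif_len < 1 or min_repeats < 2:
        raise ValueError("motif_len must be >= 1 and min_repeats >= 2")
    # One motif copy followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "ATAT".
        if any(motif == motif[:d] * (motif_len // d)
               for d in range(1, motif_len) if motif_len % d == 0):
            continue
        hits.append((m.start(), motif, (m.end() - m.start()) // motif_len))
    return hits


print(find_ssrs("GGATATATATCC", motif_len=2, min_repeats=3))
```

Running the scan once per motif length (for example 1 to 6) and tallying the hits would give a basic per-genome summary of repeat content.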
