NucMerge: Genome assembly quality improvement assisted by alternative assemblies and paired-end Illumina reads

2018
Author(s): Ksenia Khelik, Alexander Johan Nederbragt, Geir Kjetil Sandve, Torbjørn Rognes

Abstract
Background: Despite major breakthroughs in second-generation sequencing technologies and the development of a plethora of assemblers over the last ten years, the resulting genome assemblies may still be fragmented and contain errors. In genome projects involving second-generation reads, it is typical to run multiple assemblers with different parameters and choose the best assembly. However, such an approach always involves a trade-off between the strengths and weaknesses of the assemblies. To exploit the advantages of different assemblers, an alternative approach that combines the best parts of several assemblies into one may be applied. Existing tools based on this approach help elongate assembly fragments and/or improve assembly accuracy. Although progress has been made with this strategy, there is still room to improve the existing tools.
Results: We present NucMerge, a tool for improving genome assembly accuracy by incorporating information derived from an alternative assembly and paired-end Illumina reads from the same genome. The tool corrects insertion, deletion, substitution, and inversion errors and locates various inter- and intra-chromosomal rearrangement errors. NucMerge was compared with two existing alternatives, Metassembler and GAM-NGS.
Conclusions: The benchmarking results show that NucMerge generally performs better than the other tools tested, improving the accuracy of more assemblies. NucMerge is freely available at https://github.com/uio-bmi/NucMerge under the MPL license.
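The general idea of correcting one assembly with an alternative assembly plus read evidence can be illustrated with a toy sketch. This is not NucMerge's algorithm: it assumes, hypothetically, that the two assemblies are already aligned column by column (gaps written as '-') and that per-column paired-end read support counts are available, and it simply lets the better-supported symbol win wherever the assemblies disagree.

```python
from typing import Dict, List


def merge_aligned(primary: str, alternative: str,
                  support: List[Dict[str, int]]) -> str:
    """Toy correction of `primary` using an aligned `alternative` assembly.

    Hypothetical inputs: both assemblies are pre-aligned, equal-length
    strings with '-' marking gaps, and support[i] maps each candidate
    symbol in column i (a base or '-') to the number of paired-end reads
    supporting it. Where the assemblies disagree, the symbol with more
    read support wins; ties keep the primary assembly's symbol.
    """
    if not (len(primary) == len(alternative) == len(support)):
        raise ValueError("inputs must have equal length")
    merged = []
    for p, a, counts in zip(primary, alternative, support):
        chosen = p
        if p != a and counts.get(a, 0) > counts.get(p, 0):
            chosen = a  # the alternative assembly is better supported by reads
        if chosen != "-":  # gap columns contribute no base to the output
            merged.append(chosen)
    return "".join(merged)


# Column 2 fixes a substitution (G -> T); column 5 fills a deletion (- -> G).
support = [{}, {}, {"G": 1, "T": 9}, {}, {}, {"-": 2, "G": 8}, {}]
print(merge_aligned("ACGTA-C", "ACTTAGC", support))  # ACTTAGC
```

The sketch only covers substitutions and small indels inside an already-aligned region; inversions and inter- or intra-chromosomal rearrangements, which the abstract says NucMerge also handles, require alignment-level analysis that is not shown here.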


Author(s): Nadège Guiglielmoni, Ramón Rivera-Vicéns, Romain Koszul, Jean-François Flot

Non-vertebrate species represent about 95% of known metazoan (animal) diversity. They remain relatively unexplored genetically to this day, but understanding their genome structure and function is pivotal for expanding our current knowledge of evolution, ecology and biodiversity. Following continuous improvements and decreasing costs of sequencing technologies, many genome assembly tools have been released, and a significant number of genome projects have been completed in recent years. In this review, we examine the current state of genome projects of non-vertebrate animal species. We present an overview of available sequencing technologies, assembly approaches, pre- and post-processing steps, and genome assembly evaluation methods, and of their application to non-vertebrate animal genomes.
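The review mentions genome assembly evaluation methods without naming them here; one of the most widely used contiguity statistics is N50, the length L such that contigs (or scaffolds) of length at least L together cover at least half of the total assembly length. A minimal sketch, assuming only a list of contig lengths as input:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Contigs are sorted from longest to shortest, and the length of the
    contig at which the running total first reaches half of the total
    assembly length is returned. Returns 0 for an empty assembly.
    """
    sorted_lengths = sorted((x for x in lengths if x > 0), reverse=True)
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    return 0


print(n50([100, 80, 70, 50, 30, 20]))  # 80
```

N50 measures contiguity only; it says nothing about correctness, so it is usually reported alongside completeness and accuracy measures.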



2016
Author(s): Anna Kuosmanen, Veli Mäkinen

Abstract
Motivation: Transcript prediction can be modelled as a graph problem in which exons are modelled as nodes and reads spanning two or more exons are modelled as exon chains. PacBio third-generation sequencing technology produces significantly longer reads than earlier second-generation sequencing technologies, which gives valuable information about longer exon chains in a graph. However, given the high error rates of third-generation sequencing, aligning long reads correctly around splice sites is challenging. Incorrect alignments lead to spurious nodes and arcs in the graph, which in turn lead to incorrect transcript predictions.
Results: We survey several approaches for finding the exon chains that correspond to long reads in a splicing graph, and experimentally study the performance of these methods on simulated data, which allows sensitivity/precision analysis. Our experiments show that short reads from second-generation sequencing can significantly improve exon chain correctness, either by error-correcting the long reads before splicing graph creation or by building a splicing graph onto which the long-read alignments are then projected. We also study the memory and time consumption of various modules, and show that accurate exon chains lead to significantly increased transcript prediction accuracy.
Availability: The simulated data and in-house scripts used for this article are available at http://cs.helsinki.fi/u/aekuosma/exon_chain_evaluation_publish.tar.gz.
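The graph formulation in the Motivation can be illustrated with a small sketch: exons are nodes, an arc joins two exons seen consecutively in a read, and a long read's exon chain is accepted only if it is a path in the graph. The exon identifiers and the plain path check are illustrative assumptions, not one of the surveyed methods; it loosely mirrors the idea of building the graph from short reads and then checking long-read chains against it.

```python
from typing import Dict, Sequence, Set


def build_splicing_graph(chains: Sequence[Sequence[str]]) -> Dict[str, Set[str]]:
    """Build a splicing graph: exons are nodes, and an arc u -> v is added
    whenever exon v directly follows exon u in some exon chain."""
    graph: Dict[str, Set[str]] = {}
    for chain in chains:
        for exon in chain:
            graph.setdefault(exon, set())  # every observed exon becomes a node
        for u, v in zip(chain, chain[1:]):
            graph[u].add(v)
    return graph


def is_valid_chain(graph: Dict[str, Set[str]], chain: Sequence[str]) -> bool:
    """Return True if `chain` is a path in the graph: all exons are known
    nodes and every consecutive pair is connected by an arc."""
    if not chain or any(exon not in graph for exon in chain):
        return False
    return all(v in graph[u] for u, v in zip(chain, chain[1:]))


# Hypothetical short-read chains define the graph; long-read chains are checked.
graph = build_splicing_graph([["e1", "e2"], ["e2", "e3"], ["e1", "e3"]])
print(is_valid_chain(graph, ["e1", "e2", "e3"]))  # True
print(is_valid_chain(graph, ["e1", "e4"]))        # False: e4 is not a node
```

A chain that fails this check would indicate a spurious node or arc of the kind the abstract attributes to incorrect long-read alignments.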



2019, Vol 102 (4), pp. 351-376
Author(s): Alexander WY Chan, James Naphtali, Herb E Schellhorn

Conventional microbiological water monitoring uses culture-dependent techniques to screen for indicator microbial species such as Escherichia coli and fecal coliforms. As high-throughput second-generation sequencing technologies become less expensive, water quality monitoring programs can now leverage their massively parallel nature for batch sample processing, simultaneously obtaining compositional and functional information on both culturable and as-yet-uncultured microorganisms. This review introduces the technical capabilities and considerations involved in using second-generation sequencing technologies, specifically 16S rDNA amplicon and whole-metagenome sequencing, to investigate the composition and functional potential of microbiomes found in water and wastewater systems.
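Compositional information from 16S rDNA amplicon data typically starts from per-taxon read counts for each sample. A minimal sketch, assuming a hypothetical dictionary of taxon counts for one water sample, converts counts to relative abundances and computes the Shannon diversity index H = -Σ pᵢ ln pᵢ (a common summary statistic that the review abstract does not itself name):

```python
import math
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-taxon read counts for one sample into proportions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(counts: Dict[str, int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    proportions = relative_abundance(counts)
    return -sum(p * math.log(p) for p in proportions.values() if p > 0)


# Hypothetical read counts for a single sample.
sample = {"Escherichia": 25, "Pseudomonas": 75}
print(relative_abundance(sample))  # {'Escherichia': 0.25, 'Pseudomonas': 0.75}
print(round(shannon_index({"a": 1, "b": 1}), 3))  # 0.693
```

Raw proportions ignore differences in sequencing depth between samples and 16S copy number between taxa, so real analyses usually add normalization steps not shown here.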



Genes, 2021, Vol 12 (1), pp. 124
Author(s): Alessio Iannucci, Alexey I. Makunin, Artem P. Lisachov, Claudio Ciofi, Roscoe Stanyon, ...

The study of vertebrate genome evolution is currently undergoing a revolution brought about by next-generation sequencing technologies that allow researchers to produce nearly complete and error-free genome assemblies. However, novel approaches do not always provide a direct link to the information on vertebrate genome evolution gained from cytogenetic approaches. It is therefore useful to preserve cytogenetic data and link them with novel genomic discoveries. Sequencing of DNA from single isolated chromosomes (ChromSeq) is an elegant approach to determining chromosome content and assigning genome assemblies to chromosomes, thus bridging the gap between cytogenetics and genomics. The aim of this paper is to describe how ChromSeq can support the study of vertebrate genome evolution and how it can help link cytogenetic and genomic data. We show key examples of ChromSeq applications in the refinement of vertebrate genome assemblies and in the study of vertebrate chromosome and karyotype evolution. We also provide a general overview of the approach and a concrete example of genome refinement using this method in the species Anolis carolinensis.



Author(s): Valentina Peona, Mozes P.K. Blom, Luohao Xu, Reto Burri, Shawn Sullivan, ...

Abstract
Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies have opened up a whole new world of genomic biodiversity. Although these technologies generate high-quality genome assemblies, some genomic regions, such as repetitive elements and GC-rich regions (genomic “dark matter”), remain difficult to assemble. In this study, we compare the efficiency of currently used sequencing technologies (short, linked and long reads, and proximity ligation maps) and combinations thereof in assembling genomic dark matter starting from the same sample. By adopting different de novo assembly strategies, we were able to compare each individual draft assembly to a curated multiplatform one and identify the nature of the previously missing dark matter, with a particular focus on transposable elements, multi-copy MHC genes, and GC-rich regions. Using this multiplatform approach, we demonstrate the feasibility of producing a high-quality chromosome-level assembly for a non-model organism (paradise crow) for which only suboptimal samples are available. Our approach was able to reconstruct complex chromosomes such as the repeat-rich W sex chromosome and several GC-rich microchromosomes. Telomere-to-telomere assemblies are not yet a reality for most organisms, but by choosing sequencing technologies carefully it is possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap for tailoring sequencing projects to the completeness of both the coding and non-coding parts of genomes.
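GC-rich regions are one class of genomic dark matter examined in this study. As an illustrative sketch only, GC-rich windows in an assembled sequence can be flagged with a sliding window; the window size, step, and 0.6 threshold are arbitrary assumptions rather than values from the study, and ambiguous bases (such as N) are excluded from the GC fraction.

```python
from typing import List, Tuple


def gc_windows(seq: str, window: int = 100, step: int = 50) -> List[Tuple[int, float]]:
    """Return (start, GC fraction) for each full window along `seq`.

    The GC fraction is (G + C) / (A + C + G + T); windows containing no
    unambiguous bases are skipped because their GC fraction is undefined.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(base) for base in "ACGT")
        if acgt == 0:
            continue  # window is entirely N or other ambiguity codes
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / acgt))
    return results


def gc_rich(seq: str, window: int = 100, step: int = 50,
            threshold: float = 0.6) -> List[int]:
    """Return start positions of windows whose GC fraction >= threshold."""
    return [start for start, frac in gc_windows(seq, window, step) if frac >= threshold]


demo = "AT" * 50 + "GC" * 50  # 100 bp AT-only followed by 100 bp GC-only
print(gc_windows(demo))  # [(0, 0.0), (50, 0.5), (100, 1.0)]
print(gc_rich(demo))     # [100]
```

Comparing such windows between a draft assembly and a curated one is one simple way to see where GC-rich sequence went missing, though the study's own analyses may differ.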



2019, Vol 4 (1)
Author(s): Yumeng Li

Organ transplantation has become a powerful strategy for the treatment of malignant diseases. Nevertheless, graft rejection is one of the main factors affecting graft survival after organ transplantation, and transplant-related mortality remains high. This invention provides precise medication guidance for tacrolimus (FK506) in applicable populations, to guard against the side effects of this drug. Because it is based on second-generation sequencing, the invention has the advantages of relatively low cost and high sequencing throughput. During the design process, we collected data on single nucleotide polymorphisms (SNPs) associated with adverse drug reactions to tacrolimus. We then filtered and summarized fifteen SNPs according to their degree of importance (level > key enzyme > race). After processing the raw data with BWA, Picard-tools, GATK, and Perl, we annotated the SNPs with Annovar. Through this innovation, people can obtain further feedback on drugs that target different genes, supporting precision medication and minimizing the risk of misusing tacrolimus.
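The final check in such a workflow, finding which called variants fall in a curated SNP panel, can be sketched as below. The fifteen SNPs are not listed in the abstract, so the panel, the rsID-style identifiers, the sample VCF lines, and the function name are all hypothetical; in the described workflow, annotation itself is done by Annovar rather than by code like this. The sketch relies only on the standard VCF layout (header lines beginning with '#', tab-separated CHROM, POS, ID, REF, ALT columns).

```python
from typing import Dict, Iterable, List, Set


def panel_hits(vcf_lines: Iterable[str], panel: Set[str]) -> List[Dict[str, str]]:
    """Return VCF records whose ID column matches an identifier in `panel`.

    Assumes plain-text VCF: header lines start with '#', and data columns
    are tab-separated as CHROM, POS, ID, REF, ALT, ...
    """
    hits = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information, and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # malformed record
        chrom, pos, ids, ref, alt = fields[:5]
        # The ID column may hold several semicolon-separated identifiers.
        if any(i in panel for i in ids.split(";")):
            hits.append({"chrom": chrom, "pos": pos, "id": ids, "ref": ref, "alt": alt})
    return hits


# Hypothetical records and panel, for illustration only.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "7\t100\trsEXAMPLE1\tA\tG",
    "7\t200\t.\tC\tT",
    "1\t300\trsEXAMPLE2;rsEXAMPLE3\tG\tA",
]
for hit in panel_hits(vcf, {"rsEXAMPLE1", "rsEXAMPLE3"}):
    print(hit["id"], hit["ref"], ">", hit["alt"])
```

Matching on the ID column assumes the caller or an annotation step has already filled in identifiers; matching on chromosome and position against a fixed reference build would be a more robust alternative.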



2020, Vol 20 (1)
Author(s): Huifang Zhang, Chunyan He, Rui Tian, Ruilan Wang

Abstract
Background: Cellulosimicrobium cellulans is a gram-positive filamentous bacterium found primarily in soil and sewage. It rarely causes human infection, especially in previously healthy adults, but when it does, it often indicates a poor prognosis.
Case presentation: We report a case of endocarditis and intracranial infection caused by C. cellulans in a 52-year-old woman with normal immune function and no implanted devices. The patient presented with a febrile headache that progressed to impaired consciousness after 20 days, and she ultimately died after treatment with vancomycin combined with rifampicin. C. cellulans was isolated from her blood cultures on 3 consecutive days after admission; however, the second-generation sequencing data generated from her peripheral blood showed evidence of C. cellulans sequences in only two samples, and this evidence was overlooked by the technicians. No C. cellulans bands were detected in her cerebrospinal fluid by second-generation sequencing.
Conclusions: Second-generation sequencing seems to have limitations in detecting certain specific bacterial strains.



PLoS ONE, 2010, Vol 5 (5), pp. e10612
Author(s): Raphael Bueno, Assunta De Rienzo, Lingsheng Dong, Gavin J. Gordon, Colin F. Hercus, ...

