DATMA: Distributed AuTomatic Metagenomic Assembly and annotation framework

PeerJ, 2020, Vol 8, pp. e9762
Author(s): Andres Benavides, Friman Sanchez, Juan F. Alzate, Felipe Cabarcas

Background: A prime objective in metagenomics is to classify DNA sequence fragments into taxonomic units. This usually requires several stages: read quality control, de novo assembly, contig annotation, gene prediction, and so on. Because sequencing projects produce very large numbers of reads, these stages need highly efficient programs. Furthermore, the complexity of metagenomes calls for efficient, automatic tools that orchestrate the different stages.
Method: DATMA is a pipeline for fast metagenomic analysis that orchestrates the following stages: sequencing quality control, 16S rRNA identification, read binning, de novo assembly and evaluation, gene prediction, and taxonomic annotation. Its distributed computing model can use multiple computing resources to reduce analysis time.
Results: We used a controlled experiment to demonstrate DATMA's functionality, and two pre-annotated metagenomes to compare its accuracy and speed against other metagenomic frameworks. We then used DATMA to recover a draft genome of a novel member of the Anaerolineaceae from a biosolid metagenome.
Conclusions: DATMA is a bioinformatics tool that automatically analyzes complex metagenomes. It is faster than similar tools and, in some cases, it can extract genomes that the other tools do not. DATMA is freely available at https://github.com/andvides/DATMA.
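To make the orchestration idea concrete, the following Python sketch chains per-bin stages and processes independent read bins concurrently. It is a hypothetical illustration, not DATMA's code: the stage functions (`assemble`, `predict_genes`, `annotate_taxonomy`), their placeholder outputs, and the assumption that bins can be processed independently are all invented here, and a local thread pool stands in for DATMA's distribution across multiple computing resources.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical model: a read "bin" is a plain dict; each stage returns an updated copy.
Stage = Callable[[dict], dict]


def assemble(bin_: dict) -> dict:
    # Placeholder: a real stage would run a de novo assembler on the bin's reads.
    return {**bin_, "contigs": f"{bin_['name']}.contigs.fa"}


def predict_genes(bin_: dict) -> dict:
    # Placeholder: a real stage would run a gene finder on the contigs.
    return {**bin_, "genes": f"{bin_['name']}.genes.faa"}


def annotate_taxonomy(bin_: dict) -> dict:
    # Placeholder: a real stage would assign taxonomy to contigs or genes.
    return {**bin_, "taxonomy": "unassigned"}


PIPELINE: list[Stage] = [assemble, predict_genes, annotate_taxonomy]


def run_bin(bin_: dict, stages: list[Stage] = PIPELINE) -> dict:
    """Apply each stage, in order, to a single bin."""
    for stage in stages:
        bin_ = stage(bin_)
    return bin_


def run_all(bins: list[dict], workers: int = 4) -> list[dict]:
    """Process bins concurrently; Executor.map yields results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_bin, bins))


if __name__ == "__main__":
    for result in run_all([{"name": "bin1"}, {"name": "bin2"}], workers=2):
        print(result)
```

In a real pipeline each stage would invoke an external tool, and the thread pool could be swapped for a cluster scheduler; because `Executor.map` preserves input order, results stay aligned with the bins that produced them.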


2020, Vol 10 (1)
Author(s): Danilo Guillermo Ceschin, Natalia Susana Pires, Mariana Noelia Mardirosian, Cecilia Inés Lascano, Andrés Venturino


2017
Author(s): Robert M. Waterhouse, Mathieu Seppey, Felipe A. Simão, Mosè Manni, Panagiotis Ioannidis, ...

Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluating completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). Now in its third release, BUSCO is useful beyond quality control, with applications in comparative genomics, gene predictor training, metagenomics, and phylogenomics.
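BUSCO expresses completeness as the share of expected single-copy orthologs recovered complete (single-copy or duplicated), with fragmented and missing orthologs reported alongside. The sketch below computes such percentages from hypothetical counts; the class `BuscoCounts`, the function `busco_summary`, and the input numbers are illustrative assumptions, and the output string only imitates the general layout of BUSCO's short summary rather than reproducing the tool's own code.

```python
from dataclasses import dataclass


@dataclass
class BuscoCounts:
    """Hypothetical ortholog counts, one field per BUSCO category."""
    single: int      # complete and single-copy
    duplicated: int  # complete but present in more than one copy
    fragmented: int  # only partially recovered
    missing: int     # not found

    @property
    def total(self) -> int:
        return self.single + self.duplicated + self.fragmented + self.missing


def pct(part: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


def busco_summary(c: BuscoCounts) -> str:
    """Format counts as a BUSCO-style C[S,D],F,M,n summary string."""
    n = c.total
    complete = c.single + c.duplicated  # completeness counts both complete classes
    return (
        f"C:{pct(complete, n)}%"
        f"[S:{pct(c.single, n)}%,D:{pct(c.duplicated, n)}%],"
        f"F:{pct(c.fragmented, n)}%,M:{pct(c.missing, n)}%,n:{n}"
    )


if __name__ == "__main__":
    # Hypothetical counts for a lineage set of 1,000 orthologs.
    counts = BuscoCounts(single=850, duplicated=30, fragmented=50, missing=70)
    print(busco_summary(counts))
```

With these made-up counts, 880 of 1,000 orthologs are complete, so the completeness figure is 88.0%.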



2022
Author(s): Shinichi Morita, Tomoko F. Shibata, Tomoaki Nishiyama, Yuuki Kobayashi, Katsushi Yamaguchi, ...

Beetles (Coleoptera) are the largest insect order and one of the most successful animal groups in terms of number of species. The Japanese rhinoceros beetle Trypoxylus dichotomus (Coleoptera, Scarabaeidae, Dynastini) is a giant beetle with distinctive, exaggerated horns on the head and prothorax of the male. T. dichotomus has been used as a research model in fields such as evolutionary developmental biology, ecology, ethology, biomimetics, and drug discovery. In this study, we obtained a de novo assembly of 615 Mb, representing 80% of the genome size estimated by flow cytometry, using the 10x Chromium platform. The scaffold N50 of the assembly was 8.02 Mb, and repetitive elements were predicted to comprise 49.5% of the assembly. In total, 23,987 protein-coding genes were predicted. In addition, de novo assembly of the mitochondrial genome yielded a contig of 20,217 bp. We also analyzed the transcriptome by generating 16 RNA-seq libraries from a variety of tissues of both sexes and several developmental stages, which allowed us to identify 13 co-expressed gene modules. This genomic and transcriptomic information for T. dichotomus is the most comprehensive reported for any species of Dynastinae. It will be an excellent resource for further functional and evolutionary analyses, including of the evolutionary origin and genetic regulation of beetle horns and the molecular mechanisms underlying sexual dimorphism.
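The scaffold N50 is a contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the total assembly length. A minimal Python sketch of the standard calculation follows; the function name `n50` and the example lengths are illustrative, not taken from the study.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are sorted from longest to shortest and summed; the N50 is the
    length of the sequence at which the running total first reaches at
    least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

For the example lengths 2 to 10 (total 54), the running total first reaches half of the assembly (27) at the sequence of length 8, so the N50 is 8.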



2021
Author(s): Richard Finkers, Martijn P.W. van Kaauwen, Kai Ament, Karin Burger-Meijer, Raymond J. Egging, ...

Onion is an important vegetable crop with an estimated genome size of 16 Gb. We describe the de novo assembly and ab initio annotation of the genome of the doubled haploid onion line DHCU066619, which resulted in a final assembly of 14.9 Gb with an N50 of 461 kb. Of this, 2.2 Gb was ordered into 8 pseudomolecules using five genetic linkage maps; the remainder of the genome is available in 89.8K scaffolds. Analysis of the genome shows that at least 72.4% of it is repetitive and consists largely of (retro)transposons. Many (retro)transposons were already quite old, having accumulated many mutations; this facilitated their assembly but hampered their identification. The draft ab initio gene prediction indicated 540,925 putative gene models, far more than expected, possibly because of the presence of pseudogenes. Of these, 86,073 models showed similarity to published proteins (UniProt). No gene-rich regions were found; genes are uniformly distributed over the genome. Analysis of synteny with A. sativum (garlic) showed collinearity but also major rearrangements between the two species. Nevertheless, this assembly is the first high-quality draft genome sequence available for the study of onion and will be a valuable resource for further research.



2019, Vol 20 (18), pp. 4334
Author(s): Fradj, Gonçalves dos Santos, de Montigny, Awwad, Boumghar, ...

Chaga (Inonotus obliquus) is a medicinal fungus used in the traditional medicine of Native American and North Eurasian cultures. Several studies have demonstrated the medicinal properties of chaga’s bioactive molecules. For example, several terpenoids (e.g., betulin, betulinic acid and inotodiol) isolated from I. obliquus cells have proven effective against different types of tumor cells. However, the molecular mechanisms and regulation underlying the biosynthesis of chaga terpenoids remain unknown. In this study, we report on the optimization of growing conditions for cultured I. obliquus in the presence of different betulin sources (e.g., betulin or white birch bark). Better results were obtained with a liquid culture at pH 6.2 and 28 °C. In addition, we performed de novo assembly and characterization of the I. obliquus transcriptome under these growth conditions using Illumina technology. A total of 219,288,500 clean reads were generated, allowing the identification of 20,072 I. obliquus transcripts, including transcripts involved in terpenoid biosynthesis. The differential expression of these genes was confirmed by quantitative PCR. This study provides new insights into the molecular mechanisms and regulation of I. obliquus terpenoid production. It also contributes useful molecular resources for gene prediction or for the development of biotechnologies for the alternative production of terpenoids.



2018, Vol 6 (16), pp. e00265-18
Author(s): Stewart T. G. Burgess, Kathryn Bartley, Edward J. Marr, Harry W. Wright, Robert J. Weaver, ...

Sheep scab, caused by infestation with Psoroptes ovis, is highly contagious, results in intense pruritus, and represents a major welfare and economic concern. Here, we report the first draft genome assembly and gene prediction of P. ovis based on PacBio de novo sequencing. The ∼63.2-Mb genome encodes 12,041 protein-coding genes.



BMC Genomics, 2021, Vol 22 (1)
Author(s): Zack Saud, Alexandra M. Kortsinoglou, Vassili N. Kouvelis, Tariq M. Butt

Background: More accurate and complete reference genomes have improved our understanding of gene function, biology, and evolutionary mechanisms. Hybrid genome assembly approaches combine the benefits of long, relatively error-prone reads from third-generation sequencing technologies and short, accurate reads from second-generation sequencing technologies to produce more accurate and contiguous de novo genome assemblies than either technology achieves independently. In this study, we present a novel hybrid assembly pipeline that enabled both de novo assembly of the mitogenome and telomere-to-telomere de novo assembly of all 7 chromosomes of the model entomopathogenic fungus Metarhizium brunneum.
Results: The improved assembly allowed for better ab initio gene prediction and yielded a proteome set that is more complete, as assessed by BUSCO, than those of the eight current NCBI reference Metarhizium spp. genomes. Notably, including the mitogenome in ab initio gene prediction training improved overall gene prediction. The assembly was further validated by comparing contig assembly agreement across various assemblers, assessing the assembly performance of each tool. Genomic synteny and orthologous protein clusters were compared between Metarhizium brunneum and three other Hypocreales species with complete genomes, identifying core proteins and listing orthologous protein clusters shared uniquely between the two entomopathogenic fungal species, to further facilitate understanding of the molecular mechanisms underpinning fungal-insect pathogenesis.
Conclusions: The novel assembly pipeline may be used for other haploid fungal species, addressing the need for high-quality reference fungal genomes and leading to a better understanding of fungal genomic evolution, chromosome structuring, and gene regulation.



2020
Author(s): Zack Saud, Alexandra M. Kortsinoglou, Vassili N. Kouvelis, Tariq M. Butt




2018
Author(s): Justin Jiang, Andrea M. Quattrini, Warren R. Francis, Joseph F. Ryan, Estefanía Rodríguez, ...

Background: Over 3,000 species of octocorals (Cnidaria, Anthozoa) inhabit an expansive range of environments, from shallow tropical seas to the deep-ocean floor. They are important foundation species that create coral “forests”, which provide unique niches and three-dimensional living space for other organisms. The octocoral genus Renilla inhabits sandy continental shelves in the subtropical and tropical Atlantic and eastern Pacific Oceans. Renilla is especially interesting because it produces secondary metabolites for defense, exhibits bioluminescence, and produces a luciferase that is widely used in dual-reporter assays in molecular biology. Although several cnidarian genomes are currently available, the majority are from hexacorals. Here, we present a de novo assembly of the R. muelleri genome, making this the first complete draft genome from an octocoral.
Findings: We generated a hybrid de novo assembly using the Maryland Super-Read Celera Assembler v.3.2.6 (MaSuRCA). The final assembly included 4,825 scaffolds and a haploid genome size of 172 Mb. A BUSCO assessment found 88% of metazoan orthologs present in the genome. Augustus ab initio gene prediction found 23,660 genes, of which 66% (15,635) had detectable similarity to annotated genes from the starlet sea anemone, Nematostella vectensis, or to the UniProt database. Although the R. muelleri genome (172 Mb) is smaller than other publicly available hexacoral genomes (256-448 Mb), it is similar to them in terms of the number of complete metazoan BUSCOs and predicted gene models.
Conclusions: The R. muelleri hybrid genome provides a novel resource for researchers to investigate the evolution of genes and gene families within Octocorallia and more widely across Anthozoa. It will be a key resource for future comparative genomics with other corals and for understanding the genomic basis of coral diversity.



2021, Vol 14 (1)
Author(s): Aya Satoh, Miwako Takasu, Kentaro Yano, Yohey Terai

Objectives: The mangrove cricket, Apteronemobius asahinai, shows endogenous activity rhythms that synchronize with the tidal cycle (i.e., a free-running rhythm with a period of ~12.4 h, the circatidal rhythm). Little is known about the molecular mechanisms underlying the circatidal rhythm. We present the draft genome of the mangrove cricket to facilitate future studies of the molecular mechanisms behind this rhythm.
Data description: The draft genome contains 151,060 scaffolds with a total length of 1.68 Gb (N50: 27 kb) and 92% BUSCO completeness. We obtained 28,831 predicted genes, of which 19,896 (69%) were successfully annotated using at least one of two databases (UniProtKB/Swiss-Prot and Pfam).


