BUSCO applications from quality assessments to gene prediction and phylogenomics

2017 ◽  
Author(s):  
Robert M. Waterhouse ◽  
Mathieu Seppey ◽  
Felipe A. Simão ◽  
Mosè Manni ◽  
Panagiotis Ioannidis ◽  
...  

ABSTRACT Genomics promises comprehensive surveys of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluating their completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). Now in its third release, BUSCO offers utilities that extend beyond quality control to applications in comparative genomics, gene predictor training, metagenomics, and phylogenomics.
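The way BUSCO expresses completeness can be illustrated with a minimal sketch. It assumes a hypothetical, already-computed input mapping each BUSCO identifier to the number of complete hits found and whether only a fragmented hit was found. Real BUSCO derives these calls from HMMER searches and score and length cut-offs, none of which are modeled here. The sketch only shows how the four categories (single-copy, duplicated, fragmented, missing) roll up into the familiar `C:…[S:…,D:…],F:…,M:…,n:…` summary line.

```python
from collections import Counter
from typing import Dict, Tuple


def classify(complete_hits: int, has_fragment: bool) -> str:
    """Assign one BUSCO category: S, D, F or M."""
    if complete_hits == 1:
        return "S"  # complete, single-copy
    if complete_hits > 1:
        return "D"  # complete, duplicated
    if has_fragment:
        return "F"  # only a partial match was found
    return "M"  # missing


def busco_summary(hits: Dict[str, Tuple[int, bool]]) -> str:
    """Format the one-line summary from hypothetical per-BUSCO hit data."""
    n = len(hits)
    if n == 0:
        raise ValueError("the lineage dataset must contain at least one BUSCO")
    counts = Counter(classify(c, f) for c, f in hits.values())
    pct = {cat: 100.0 * counts[cat] / n for cat in "SDFM"}
    complete = pct["S"] + pct["D"]  # C = S + D
    return (
        f"C:{complete:.1f}%[S:{pct['S']:.1f}%,D:{pct['D']:.1f}%],"
        f"F:{pct['F']:.1f}%,M:{pct['M']:.1f}%,n:{n}"
    )


if __name__ == "__main__":
    toy = {"b1": (1, False), "b2": (2, False), "b3": (0, True), "b4": (0, False)}
    print(busco_summary(toy))  # C:50.0%[S:25.0%,D:25.0%],F:25.0%,M:25.0%,n:4
```

A high duplicated (D) share in an assembly that should be haploid is one common signal of uncollapsed haplotypes, which is why the single-copy and duplicated fractions are reported separately.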

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9762
Author(s):  
Andres Benavides ◽  
Friman Sanchez ◽  
Juan F. Alzate ◽  
Felipe Cabarcas

Background A prime objective in metagenomics is to classify DNA sequence fragments into taxonomic units. This usually requires several stages: read quality control, de novo assembly, contig annotation, gene prediction, etc. Because metagenomic projects generate very large numbers of reads, these stages need highly efficient programs. Furthermore, the complexity of metagenomes requires efficient, automatic tools that orchestrate the different stages. Method DATMA is a pipeline for fast metagenomic analysis that orchestrates the following stages: sequencing quality control, 16S rRNA identification, read binning, de novo assembly and evaluation, gene prediction, and taxonomic annotation. Its distributed computing model can use multiple computing resources to reduce analysis time. Results We used a controlled experiment to demonstrate DATMA's functionality, and two pre-annotated metagenomes to compare its accuracy and speed against other metagenomic frameworks. Then, using DATMA, we recovered a draft genome of a novel member of the Anaerolineaceae from a biosolid metagenome. Conclusions DATMA is a bioinformatics tool that automatically analyzes complex metagenomes. It is faster than similar tools and, in some cases, it can extract genomes that the other tools cannot. DATMA is freely available at https://github.com/andvides/DATMA.
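The idea of orchestrating sequential stages and then fanning independent bins out to several workers can be sketched as follows. This is not DATMA's code: every stage function here is a hypothetical toy stand-in (length-based QC, GC-only binning, and an "assembler" that just reports bin size), and a local thread pool stands in for the distributed computing resources a real pipeline would dispatch to.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def quality_control(reads: List[str], min_len: int = 4) -> List[str]:
    # Hypothetical stand-in for read QC: discard reads shorter than min_len (>= 1).
    return [r for r in reads if len(r) >= min_len]


def bin_reads(reads: List[str]) -> Dict[str, List[str]]:
    # Hypothetical stand-in for binning: split reads by GC content only.
    bins: Dict[str, List[str]] = {"gc_rich": [], "at_rich": []}
    for r in reads:
        gc = sum(base in "GC" for base in r) / len(r)
        bins["gc_rich" if gc >= 0.5 else "at_rich"].append(r)
    return {name: members for name, members in bins.items() if members}


def assemble(bin_members: List[str]) -> Dict[str, int]:
    # Hypothetical stand-in for a real assembler: report bin size only.
    return {"reads": len(bin_members), "bases": sum(len(r) for r in bin_members)}


def run_pipeline(reads: List[str], workers: int = 2) -> Dict[str, Dict[str, int]]:
    clean = quality_control(reads)  # stage 1: runs once over all reads
    bins = bin_reads(clean)         # stage 2: runs once over the clean reads
    # Stage 3: bins are independent, so they can be processed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(assemble, bins.values())  # preserves input order
        return dict(zip(bins.keys(), results))


if __name__ == "__main__":
    print(run_pipeline(["GGCC", "ATAT", "AT", "GCGA"]))
```

Binning before assembly is what makes the parallel step possible: once reads are partitioned, each bin can be assembled without reference to the others.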


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Raíssa Silva ◽  
Kleber Padovani ◽  
Fabiana Góes ◽  
Ronnie Alves

Abstract Background Microbes play fundamental economic, social, and environmental roles in our society. Metagenomics makes it possible to investigate microbes in their natural environments (complex communities) and their interactions. Their activity is usually estimated from the functions they perform in those environments, and these functions are encoded by their genes. Advances in next-generation sequencing technology have facilitated metagenomics research; however, they also create a heavy computational burden. Large and complex biological datasets are available as never before. Many gene predictors are available to aid the gene annotation process, but they do not appropriately handle the complexities of metagenomic data. There is no standard metagenomic benchmark dataset for gene prediction. Thus, gene predictors may inflate their results by obfuscating low false discovery rates. Results We introduce geneRFinder, an ML-based gene predictor that outperforms state-of-the-art gene prediction tools across the new benchmark introduced here, using only one pre-trained Random Forest model. Average prediction rates of geneRFinder differed by 54% and 64% from those of Prodigal and FragGeneScan, respectively, when handling high-complexity metagenomes. The specificity of geneRFinder differed most from that of FragGeneScan, by 79 percentage points, and from that of Prodigal by 66 percentage points. According to McNemar’s test, all percentage differences between the predictors' performances are statistically significant for all datasets at a 99% confidence level.
Conclusions We provide geneRFinder, an approach for gene prediction across distinct metagenomic complexities, available at gitlab.com/r.lorenna/generfinder and https://osf.io/w2yd6/. We also provide a novel, comprehensive benchmark dataset for gene prediction, based on The Critical Assessment of Metagenome Interpretation (CAMI) challenge and containing labeled gene regions, available at https://sourceforge.net/p/generfinder-benchmark.
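McNemar's test compares two predictors scored on the same items by looking only at the discordant pairs. The sketch below is a generic implementation, not the authors' evaluation code; it assumes each predictor's output has already been reduced to a per-item correct/incorrect flag, and it uses the common continuity-corrected form of the statistic (the abstract does not say which variant was used). The p-value uses the fact that the survival function of a chi-square distribution with one degree of freedom equals `erfc(sqrt(x / 2))`.

```python
import math
from typing import Sequence, Tuple


def mcnemar(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> Tuple[float, float]:
    """Continuity-corrected McNemar test for two predictors on the same items.

    Returns (chi-square statistic, two-sided p-value).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both predictors must be scored on the same items")
    # Only discordant pairs matter: b = A right/B wrong, c = A wrong/B right.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Toy data: 10 items only A gets right, 2 only B gets right, 5 both get right.
    a = [True] * 10 + [False] * 2 + [True] * 5
    b = [False] * 10 + [True] * 2 + [True] * 5
    stat, p = mcnemar(a, b)
    print(f"chi2={stat:.3f}, p={p:.4f}, significant at 0.01: {p < 0.01}")
```

In this toy case the statistic is 49/12 and p is roughly 0.043, so the difference would be significant at the 0.05 level but not at the 0.01 level that a 99% confidence claim requires.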


2020 ◽  
Vol 6 (7) ◽  
Author(s):  
Ethan T. Hillman ◽  
Ariangela J. Kozik ◽  
Casey A. Hooker ◽  
John L. Burnett ◽  
Yoojung Heo ◽  
...  

Roseburia species are important denizens of the human gut microbiome that ferment complex polysaccharides to butyrate as a terminal fermentation product, which influences human physiology and serves as an energy source for colonocytes. Previous comparative genomics analyses of the genus Roseburia have examined polysaccharide degradation genes. Here, using orthology-based methods, we characterize the core and pangenomes of the genus Roseburia with respect to central carbon and energy metabolism, as well as the biosynthesis of amino acids and B vitamins, uncovering significant differences among species in their biosynthetic capacities. Variation in gene content among Roseburia species and strains was most significant for cofactor biosynthesis. Unlike all other species of Roseburia that we analysed, Roseburia inulinivorans strains lacked biosynthetic genes for riboflavin or pantothenate but possessed folate biosynthesis genes. Differences in gene content for B vitamin synthesis were matched with differences in putative salvage and synthesis strategies among species. For example, we observed extended biotin salvage capabilities in R. intestinalis strains, which further suggests that B vitamin acquisition strategies may affect fitness in the gut ecosystem. As differences in the functional potential to synthesize components of biomass (e.g. amino acids, vitamins) can drive interspecies interactions, variation in auxotrophies among Roseburia spp. genomes may influence in vivo gut ecology. This study advances our understanding of the potential metabolic interactions that influence the ecology of Roseburia spp. and, ultimately, may provide a basis for rational strategies to manipulate the abundances of these species.
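Core and pangenome definitions reduce to set operations once genes have been grouped into orthologous groups. The sketch below assumes that step is already done (the abstract does not name the orthology tool) and uses hypothetical orthogroup IDs (`OG1`, `OG2`, …) rather than real Roseburia data. The "missing" sets list, for each genome, orthogroups seen elsewhere in the genus but absent from it; absences in a biosynthetic pathway are the kind of signal one might follow up as a candidate auxotrophy.

```python
from typing import Dict, Set


def pangenome_partition(genomes: Dict[str, Set[str]]) -> Dict[str, object]:
    """Split orthogroup presence data into core, accessory, unique and missing sets."""
    if not genomes:
        raise ValueError("need at least one genome")
    members = list(genomes.values())
    core = set.intersection(*members)  # present in every genome
    pan = set.union(*members)          # present in at least one genome
    unique = {}
    for name, groups in genomes.items():
        others = set().union(*(g for other, g in genomes.items() if other != name))
        unique[name] = groups - others  # found only in this genome
    missing = {name: pan - groups for name, groups in genomes.items()}
    return {"core": core, "pan": pan, "accessory": pan - core,
            "unique": unique, "missing": missing}


if __name__ == "__main__":
    toy = {"A": {"OG1", "OG2", "OG3"}, "B": {"OG1", "OG2"}, "C": {"OG1", "OG4"}}
    for key, value in pangenome_partition(toy).items():
        print(key, value)
```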


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yan-Yan Guo ◽  
Jia-Xing Yang ◽  
Ming-Zhu Bai ◽  
Guo-Qiang Zhang ◽  
Zhong-Jian Liu

Abstract Background Paphiopedilum is the largest genus of slipper orchids. Previous studies showed that the phylogenetic relationships within this genus are not well resolved, and sparse taxon sampling documented inverted repeat (IR) expansion and small single copy (SSC) contraction in the chloroplast genomes of Paphiopedilum. Results Here, we sequenced, assembled, and annotated 77 plastomes of Paphiopedilum species (ranging in size from 152,130 to 164,092 bp). The plastome-based phylogeny resolved the relationships within the genus except for the positions of two unstable species. We used phylogenetic and comparative genomic approaches to elucidate the plastome evolution of Paphiopedilum. The plastomes of Paphiopedilum have a conserved genome structure and gene content except in the SSC region. The large single copy/inverted repeat (LSC/IR) boundaries are relatively stable, while the boundaries between the inverted repeat and small single copy regions (IR/SSC) varied among species. Corresponding to the IR/SSC boundary shifts, the chloroplast genomes of the genus experienced IR expansion and SSC contraction. The IR region incorporated one to six genes of the SSC region. Unexpectedly, we found great variation in the size, gene order, and gene content of the SSC regions, especially in subg. Parvisepalum. Furthermore, Paphiopedilum provides evidence for the ongoing degradation of the ndh genes in photoautotrophic plants. Estimated substitution rates of the protein-coding genes show accelerated evolution in clpP, psbH, and psbZ. Genes transferred to the IR region by the boundary shift also have higher substitution rates. Conclusions With dense taxon sampling, we found IR expansion and SSC contraction in the chloroplast genomes of Paphiopedilum, and the genus shows variation in the size, gene order, and gene content of the SSC region. This genus provides an ideal system for investigating the dynamics of plastome evolution.
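The link between IR/SSC boundary shifts and region sizes is simple arithmetic on the four junction coordinates of the quadripartite plastome (LSC, IRb, SSC, IRa). The sketch below is illustrative only: the coordinates are toy values, not Paphiopedilum data, and the shift function is a deliberate simplification that holds total length fixed, whereas real IR expansions typically change total plastome size as well.

```python
from typing import Dict


def plastome_regions(total: int, lsc_irb: int, irb_ssc: int, ssc_ira: int) -> Dict[str, int]:
    """Region lengths of a circular quadripartite plastome.

    Junctions are 0-based positions along the linearized genome:
    LSC = [0, lsc_irb), IRb = [lsc_irb, irb_ssc), SSC = [irb_ssc, ssc_ira),
    IRa = [ssc_ira, total).
    """
    if not 0 < lsc_irb < irb_ssc < ssc_ira < total:
        raise ValueError("junctions must be strictly increasing inside the genome")
    sizes = {
        "LSC": lsc_irb,
        "IRb": irb_ssc - lsc_irb,
        "SSC": ssc_ira - irb_ssc,
        "IRa": total - ssc_ira,
    }
    if sizes["IRa"] != sizes["IRb"]:
        raise ValueError("the two inverted repeats should have equal length")
    return sizes


def shift_ir_into_ssc(total: int, lsc_irb: int, irb_ssc: int, ssc_ira: int,
                      k: int) -> Dict[str, int]:
    # Move both IR/SSC junctions k bp into the SSC: each IR gains k bp and the
    # SSC loses 2k bp. LSC and total length are held fixed (a simplification).
    return plastome_regions(total, lsc_irb, irb_ssc + k, ssc_ira - k)


if __name__ == "__main__":
    print(plastome_regions(160, 90, 115, 135))      # toy coordinates
    print(shift_ir_into_ssc(160, 90, 115, 135, 3))  # IR expansion by 3 bp
```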


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Guilherme B. Dias ◽  
Musaad A. Altammami ◽  
Hamadttu A. F. El-Shafie ◽  
Fahad M. Alhoshani ◽  
Mohamed B. Al-Fageeh ◽  
...  

Abstract The red palm weevil Rhynchophorus ferrugineus (Coleoptera: Curculionidae) is an economically important invasive species that attacks multiple species of palm trees around the world. A better understanding of gene content and function in R. ferrugineus has the potential to inform pest control strategies and thereby mitigate economic and biodiversity losses caused by this species. Using 10x Genomics linked-read sequencing, we produced a haplotype-resolved diploid genome assembly for R. ferrugineus from a single heterozygous individual with modest sequencing coverage (∼62x). Benchmarking against conserved single-copy arthropod orthologs suggests that both pseudo-haplotypes in our R. ferrugineus genome assembly are highly complete with respect to gene content and do not suffer from the haplotype-induced duplication artifacts present in a recently published hybrid assembly for this species. Annotation of the larger pseudo-haplotype in our assembly provides evidence for 23,413 protein-coding loci in R. ferrugineus, including over 13,000 predicted proteins annotated with Gene Ontology terms and over 6,000 loci independently supported by high-quality Iso-Seq transcriptomic data. Our assembly also includes 95% of the R. ferrugineus chemosensory, detoxification and neuropeptide-related transcripts previously identified using RNA-seq transcriptomic data, and provides a platform for the molecular analysis of these and other functionally relevant genes that can help guide management of this widespread insect pest.
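Figures like "∼62x" are average sequencing depths. A common back-of-the-envelope estimate is the Lander–Waterman relation C = N·L/G (reads × read length ÷ genome size). The sketch below applies that generic formula; it is not how the authors computed their value, and the read counts and genome size used are hypothetical, since neither is given in the abstract.

```python
import math


def coverage(num_reads: int, read_len: int, genome_size: int) -> float:
    """Average sequencing depth C = N * L / G."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return num_reads * read_len / genome_size


def reads_needed(target_cov: float, read_len: int, genome_size: int) -> int:
    """Smallest read count whose expected depth reaches target_cov."""
    if read_len <= 0:
        raise ValueError("read length must be positive")
    return math.ceil(target_cov * genome_size / read_len)


if __name__ == "__main__":
    # Hypothetical numbers: 150 bp reads on a 1 Mb toy genome.
    n = reads_needed(62, 150, 1_000_000)
    print(n, coverage(n, 150, 1_000_000))
```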


2020 ◽  
Author(s):  
Vimaladhasan Senthamizhan ◽  
Balaraman Ravindran ◽  
Karthik Raman

Abstract Essential gene prediction models built so far rely heavily on sequence-based features, and the scope of network-based features has been narrow. Previous work from our group demonstrated the importance of network-based features for predicting essential genes with high accuracy. Here, we applied our approach to predict essential genes in organisms from the STRING database and hosted the results on a standalone website. Our database, NetGenes, contains essential gene predictions for more than 2,700 bacteria, made using features derived from STRING protein-protein functional association networks. Housing more than 3.5 million genes, NetGenes provides essentiality scores, annotations and feature vectors for each gene. NetGenes is available at https://rbc-dsai.iitm.github.io/NetGenes/
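Network-based features of the kind mentioned above can be computed directly from a weighted association network. The exact NetGenes feature set is not listed in the abstract, so the three features here (degree, weighted degree or "strength", and local clustering coefficient) are common illustrative choices, not the authors' list. The sketch assumes each gene pair appears at most once and that edge weights are already-normalized association scores; the toy edges and gene names are hypothetical. The resulting vectors would feed a classifier, which is not shown.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple


def network_features(edges: Iterable[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Per-gene degree, weighted degree (strength) and local clustering
    coefficient from an undirected, weighted association network."""
    nbrs = defaultdict(set)
    strength = defaultdict(float)
    for u, v, w in edges:
        if u == v:
            continue  # ignore self-associations
        nbrs[u].add(v)
        nbrs[v].add(u)
        strength[u] += w
        strength[v] += w
    feats = {}
    for node, ns in nbrs.items():
        k = len(ns)
        # Count edges among this node's neighbours (closed triangles).
        links = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
        clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        feats[node] = {"degree": float(k), "strength": strength[node],
                       "clustering": clustering}
    return feats


if __name__ == "__main__":
    toy = [("A", "B", 0.9), ("A", "C", 0.5), ("B", "C", 0.7), ("C", "D", 0.4)]
    for gene, f in sorted(network_features(toy).items()):
        print(gene, f)
```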


Author(s):  
Frédéric Lemoine ◽  
Luc Blassel ◽  
Jakub Voznica ◽  
Olivier Gascuel

Abstract Motivation The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, fewer than 1,000 genomes were available, and their multiple alignment was easily achieved using standard approaches. Subsequently, the number of available genomes has grown dramatically. Moreover, some genomes are of low quality, with sequencing or assembly errors, making accurate daily re-alignment of all genomes nearly impossible. A more efficient, yet accurate, approach was clearly required to support all subsequent bioinformatics analyses of these crucial data. Results hCoV-19 genomes are highly conserved, with very few indels and no recombination. This makes the profile HMM approach particularly well suited to aligning new genomes, adding them to an existing alignment and filtering out problematic ones. Using a core of ∼2,500 high-quality genomes, we estimated a profile with HMMER and implemented this profile in COVID-Align, a user-friendly interface that can be used online or as a standalone tool via Docker. Aligning 1,000 genomes takes less than 20 min on our cluster. Moreover, COVID-Align provides summary statistics that can be used to assess the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). Availability https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/[email protected], [email protected] Supplementary information Supplementary information is available at Bioinformatics online.
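Per-genome summary statistics such as new mutations and indels can be read off an alignment row once a query has been aligned against a reference. The sketch below is hypothetical and not COVID-Align's implementation: it takes an already-aligned reference/query pair (`-` for gaps), names substitutions as reference base, 1-based reference position and query base (e.g. `A1T`), counts each run of gap columns as one indel event, skips ambiguous `N` calls, and treats any substitution missing from a caller-supplied `known` set as "new".

```python
from typing import Dict, FrozenSet, List


def alignment_summary(ref: str, qry: str,
                      known: FrozenSet[str] = frozenset()) -> Dict[str, object]:
    """Count substitutions and indel events of an aligned query vs. a reference."""
    if len(ref) != len(qry):
        raise ValueError("sequences must come from the same alignment")
    subs: List[str] = []
    insertions = deletions = 0
    ref_pos = 0
    prev = None  # kind of the previous column: "ins", "del" or None
    for r, q in zip(ref.upper(), qry.upper()):
        if r == "-" and q == "-":
            continue  # column gapped only because of other sequences
        if r != "-":
            ref_pos += 1
        if r == "-":  # query base absent from the reference
            if prev != "ins":
                insertions += 1  # count each run of columns once
            prev = "ins"
        elif q == "-":  # reference base missing from the query
            if prev != "del":
                deletions += 1
            prev = "del"
        else:
            prev = None
            if q not in (r, "N"):  # ignore matches and ambiguous calls
                subs.append(f"{r}{ref_pos}{q}")
    return {
        "substitutions": subs,
        "new_mutations": [m for m in subs if m not in known],
        "insertions": insertions,
        "deletions": deletions,
    }


if __name__ == "__main__":
    print(alignment_summary("ACG-TACGT", "TCGATAC-T"))
```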


2015 ◽  
Vol 53 (12) ◽  
pp. 3719-3722
Author(s):  
Susan E. Sharp ◽  
Melissa B. Miller ◽  
Janet Hindler

The Centers for Medicare & Medicaid Services (CMS) recently published its Individualized Quality Control Plan (IQCP [https://www.cms.gov/regulations-and-guidance/legislation/CLIA/Individualized_Quality_Control_Plan_IQCP.html]), which will be the only option for quality control (QC) starting in January 2016 for laboratories that choose not to perform Clinical Laboratory Improvement Act (CLIA) [U.S. Statutes at Large 81(1967):533] default QC. Laboratories will no longer be able to use “equivalent QC” (EQC) or the Clinical and Laboratory Standards Institute (CLSI) standards alone for quality control of their microbiology systems. The implementation of IQCP in clinical microbiology laboratories will almost certainly be an added burden, the benefits of which are currently unknown.


2020 ◽  
Vol 33 (8) ◽  
pp. 1022-1024
Author(s):  
Giovanni Cafà ◽  
Thaís Regina Boufleur ◽  
Renata Rebellato Linhares de Castro ◽  
Nelson Sidnei Massola ◽  
Riccardo Baroncelli

The genus Stagonosporopsis is classified within the family Didymellaceae and has around 40 associated species. Among them, several species are important plant pathogens responsible for significant losses in economically important crops worldwide. Stagonosporopsis vannaccii is a newly described species pathogenic to soybean. Here, we present the draft whole-genome sequence, gene prediction, and annotation of S. vannaccii isolate LFN0148 (also known as IMI 507030). To our knowledge, this is the first genome sequenced for this species, and it represents a useful new resource for future fungal comparative genomics studies.


2013 ◽  
Vol 6 (273) ◽  
pp. ec96-ec96
Author(s):  
L. Bryan Ray

Damaged mitochondria are removed from cells in a process known as mitophagy. Failure of this quality-control mechanism contributes to Parkinson’s disease. When damaged mitochondria lose their membrane potential, the protein kinase PINK1 accumulates on the mitochondrial surface, recruits Parkin, and promotes mitophagy. Chen and Dorn describe another component of this process, mitofusin 2, which appears to function as the receptor for Parkin on the surface of damaged mitochondria. Y. Chen, G. W. Dorn II, PINK1-phosphorylated mitofusin 2 is a Parkin receptor for culling damaged mitochondria. Science 340, 471–475 (2013).

