The OceanDNA MAG catalog contains over 50,000 prokaryotic genomes originated from various marine environments

2021 ◽  
Author(s):  
Yosuke Nishimura ◽  
Susumu Yoshizawa

Marine microorganisms are immensely diverse and play fundamental roles in global geochemical cycling. Recent metagenome-assembled genome studies, notably large-scale projects such as Tara Oceans, have expanded the genomic repertoire of marine microorganisms. However, published marine metagenome data have not yet been fully explored. Here, we collected 2,057 marine metagenomes (>29 Tbp of sequence) covering various marine environments and developed a new genome reconstruction pipeline. We reconstructed 52,325 qualified genomes, forming 8,466 prokaryotic species-level clusters spanning 59 phyla, including genomes from the deep sea below 1,000 m (n=3,337), low-oxygen zones with <90 μmol O2 per kg of water (n=7,884), and polar regions (n=7,752). Novelty evaluation against a genome taxonomy database shows that 6,256 species (73.9%) are novel, including genomes of high taxonomic novelty such as candidate new classes. Together, these genomes expand the known phylogenetic diversity of marine prokaryotes by 34.2%, and the species representatives cover 26.5–42.0% of prokaryote-enriched metagenomes. This genome resource, which thoroughly leverages accumulated metagenomic data, illuminates uncharacterized lineages of marine microbial dark matter.
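
The abstract groups the 52,325 genomes into 8,466 species-level clusters but does not state the clustering rule. A common approach, assumed for this sketch and not confirmed by the abstract, is greedy clustering at roughly 95% average nucleotide identity (ANI). Genomes are visited from highest to lowest quality, and each one either joins the first existing representative it matches or founds a new cluster. The ANI table, the quality scores, and the function name are hypothetical; computing ANI itself requires a dedicated tool.

```python
from typing import Dict, List, Tuple

# Assumption: ~95% ANI as the species boundary; the abstract does not say
# which threshold or algorithm the OceanDNA pipeline actually used.
SPECIES_ANI = 0.95


def greedy_species_clusters(
    genomes: List[str],
    quality: Dict[str, float],
    ani: Dict[Tuple[str, str], float],
    threshold: float = SPECIES_ANI,
) -> Dict[str, List[str]]:
    """Map each representative genome to the members of its species cluster."""

    def pair_ani(a: str, b: str) -> float:
        # ANI is looked up in either order; missing pairs count as unrelated.
        return ani.get((a, b), ani.get((b, a), 0.0))

    clusters: Dict[str, List[str]] = {}
    # Highest-quality genomes are visited first so they become representatives.
    for g in sorted(genomes, key=lambda x: quality[x], reverse=True):
        for rep in clusters:
            if pair_ani(g, rep) >= threshold:
                clusters[rep].append(g)
                break
        else:
            # No representative is close enough: start a new species cluster.
            clusters[g] = [g]
    return clusters
```

For four hypothetical genomes A to D with qualities 90, 80, 70 and 60, ANI(A,B)=0.97, ANI(C,D)=0.96 and ANI(A,C)=0.80, the function returns {'A': ['A', 'B'], 'C': ['C', 'D']}. Visiting high-quality genomes first is a design choice that makes the best genome of each species its representative.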

Author(s):  
Donovan H Parks ◽  
Michael Imelfort ◽  
Connor T Skennerton ◽  
Philip Hugenholtz ◽  
Gene W Tyson

Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. While this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally use a limited number of ‘marker’ genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree, along with information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate, single-cell, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently affecting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. To facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities. CheckM is open-source software available at http://ecogenomics.github.io/CheckM.
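
The completeness and contamination estimates described above can be sketched as follows, following our reading of the scoring outlined in the CheckM publication. Collocated marker genes are grouped into marker sets, and each set contributes equally: completeness averages the fraction of each set's genes that are found, and contamination averages the surplus copies (hits beyond the first) per gene in each set. This is a simplified sketch, not CheckM's code. It assumes the marker sets and per-gene hit counts are already known (hypothetical inputs); real CheckM also selects lineage-specific markers and runs HMM searches, which are not shown.

```python
from typing import Dict, List, Set, Tuple


def completeness_contamination(
    marker_sets: List[Set[str]],
    hits: Dict[str, int],
) -> Tuple[float, float]:
    """Return (completeness %, contamination %) from collocated marker sets.

    hits[g] is the number of copies of marker gene g found in the genome;
    genes that were not found may be omitted or set to 0.
    """
    if not marker_sets:
        raise ValueError("at least one marker set is required")
    completeness = 0.0
    contamination = 0.0
    for s in marker_sets:
        # Fraction of this set's genes detected at least once.
        present = sum(1 for g in s if hits.get(g, 0) >= 1)
        # Extra copies beyond the first, summed over the set's genes.
        extra = sum(max(hits.get(g, 0) - 1, 0) for g in s)
        completeness += present / len(s)
        contamination += extra / len(s)
    n = len(marker_sets)
    # Each marker set is weighted equally, regardless of its size.
    return 100.0 * completeness / n, 100.0 * contamination / n
```

With marker sets {a, b} and {c, d, e, f}, where b is missing and c is found twice, completeness is (1/2 + 4/4)/2 = 75% and contamination is (0 + 1/4)/2 = 12.5%. Weighting sets rather than individual genes keeps a block of collocated genes, which tend to be gained or lost together, from dominating the estimate.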


Metabolites ◽  
2019 ◽  
Vol 9 (2) ◽  
pp. 22 ◽  
Author(s):  
Partho Sen ◽  
Matej Orešič

There is growing interest in the metabolic interplay between the gut microbiome and host metabolism. Taxonomic and functional profiling of the gut microbiome by next-generation sequencing (NGS) has unveiled substantial richness and diversity. However, the mechanisms underlying interactions between diet, gut microbiome and host metabolism are still poorly understood. Genome-scale metabolic modeling (GSMM) is an emerging approach that has been increasingly applied to infer diet–microbiome, microbe–microbe and host–microbe interactions under physiological conditions. GSMM can, for example, be applied to estimate the metabolic capabilities of microbes in the gut. Here, we discuss how meta-omics datasets, such as shotgun metagenomics data, can be processed and integrated to develop large-scale, condition-specific, personalized microbiota models in healthy and disease states. Furthermore, we summarize various tools and resources available for metagenomic data processing and GSMM, highlighting the experimental approaches needed to validate the model predictions.
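
GSMM methods such as flux balance analysis rest on a steady-state constraint: for a stoichiometric matrix S (metabolites × reactions) and a flux vector v, every internal metabolite must be balanced, so S·v = 0. The toy three-reaction network below (uptake → A, A → B, B → biomass) is hypothetical and only illustrates that check. An actual GSMM would also impose flux bounds and optimize an objective, such as biomass production, with a linear-programming solver.

```python
from typing import List, Sequence

# Hypothetical toy network. Rows: metabolites A, B.
# Columns: R1 (uptake -> A), R2 (A -> B), R3 (B -> biomass).
S = [
    [1.0, -1.0, 0.0],  # A: produced by R1, consumed by R2
    [0.0, 1.0, -1.0],  # B: produced by R2, consumed by R3
]


def mass_balance(stoich: Sequence[Sequence[float]], fluxes: Sequence[float]) -> List[float]:
    """Return S·v, the net production rate of each metabolite."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in stoich]


def is_steady_state(
    stoich: Sequence[Sequence[float]], fluxes: Sequence[float], tol: float = 1e-9
) -> bool:
    """True if every metabolite's net production rate is (numerically) zero."""
    return all(abs(x) <= tol for x in mass_balance(stoich, fluxes))
```

Here the flux vector [2, 2, 2] is a steady state, while [2, 1, 1] is not, because A accumulates at a net rate of 1.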


2020 ◽  
Vol 17 (5) ◽  
pp. 716-724
Author(s):  
Yan A. Ivanenkov ◽  
Renat S. Yamidanov ◽  
Ilya A. Osterman ◽  
Petr V. Sergiev ◽  
Vladimir A. Aladinskiy ◽  
...  

Background: A key challenge in the development of novel antimicrobials is the rapid spread of new bacterial strains resistant to current antibiotics. Indeed, the World Health Organization has reported that bacteria commonly causing infections in hospitals and in the community, e.g. E. coli, K. pneumoniae and S. aureus, show high resistance to the latest generations of cephalosporins, carbapenems and fluoroquinolones. Over the past decades, only a few efforts to develop and launch new antibacterial medications have succeeded. This study aims to identify a new class of antibacterial agents using a novel high-throughput screening (HTS) technique. Methods: We designed a library of 125K compounds that are structurally dissimilar (Tanimoto coefficient < 0.7) to previously published antibiotics. An HTS platform based on the double reporter system pDualrep2 was used to distinguish between molecules that block the translational machinery and those that induce the SOS response in a model E. coli system. MICs of the most active compounds in LB and M9 media were determined by broth microdilution. Results: In an attempt to discover novel classes of antibacterials, we performed HTS of a large-scale small-molecule library using our unique screening platform. This approach allowed us to evaluate a large number of compounds quickly and robustly, and to determine the mechanism of action of compounds acting either as translational machinery inhibitors or as DNA-damaging agents/replication blockers. HTS yielded several new structural classes of molecules with attractive antibacterial activity. Herein, we report a series of 5'-(carbonylamino)-2,3'-bithiophene-4'-carboxylates as promising antibacterials. The two most active compounds in this series, 5 and 6, showed MIC values of 1.2 and 1.8 μg/mL, respectively, and good selectivity indices. Compound 6 caused RFP induction and a weak SOS response, and an in vitro luciferase assay revealed that it slightly inhibits protein biosynthesis.
Compound 5 was tested against several archival strains and showed slight activity against Gram-negative bacteria and outstanding activity against S. aureus. We also explored the key structural requirements for antibacterial potency and found that an unsubstituted carboxylic group and bulky hydrophobic substituents on the phenyl fragment are both crucial for activity. Conclusion: These results provide a solid basis for further characterization and optimization of the 5'-(carbonylamino)-2,3'-bithiophene-4'-carboxylate derivatives discussed herein as a new class of antibacterials.
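
The library design step above keeps only compounds with a Tanimoto coefficient below 0.7 to known antibiotics. A minimal sketch of that filter follows, assuming each molecule's binary fingerprint is already available as a set of "on" bit indices. The abstract does not say which fingerprint type was used; in practice fingerprints would come from a cheminformatics toolkit such as RDKit, and the compound names and bit sets here are hypothetical.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / union


def filter_dissimilar(
    library: Dict[str, FrozenSet[int]],
    references: List[FrozenSet[int]],
    cutoff: float = 0.7,
) -> List[str]:
    """Keep compounds whose similarity to every reference antibiotic is below cutoff."""
    return [
        name
        for name, fp in library.items()
        if all(tanimoto(fp, ref) < cutoff for ref in references)
    ]
```

For a reference fingerprint {1, 2, 3, 4}, a candidate {1, 2, 3, 4, 5} scores 4/5 = 0.8 and is excluded, while {1, 2, 9, 10} scores 2/6 ≈ 0.33 and is kept. Requiring dissimilarity to every reference, rather than on average, is the stricter reading of "not similar to previously published antibiotics".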


Genetics ◽  
2001 ◽  
Vol 159 (4) ◽  
pp. 1765-1778
Author(s):  
Gregory J Budziszewski ◽  
Sharon Potter Lewis ◽  
Lyn Wegrich Glover ◽  
Jennifer Reineke ◽  
Gary Jones ◽  
...  

Abstract We have undertaken a large-scale genetic screen to identify genes with a seedling-lethal mutant phenotype. From screening ~38,000 insertional mutant lines, we identified >500 seedling-lethal mutants, completed cosegregation analysis of the insertion and the lethal phenotype for >200 mutants, molecularly characterized 54 mutants, and provided a detailed description for 22 of them. Most of the seedling-lethal mutants seem to affect chloroplast function because they display altered pigmentation and affect genes encoding proteins predicted to have chloroplast localization. Although a high level of functional redundancy in Arabidopsis might be expected because 65% of genes are members of gene families, we found that 41% of the essential genes identified in this study are members of Arabidopsis gene families. In addition, we isolated several interesting classes of mutants and genes. We found three mutants in the recently discovered nonmevalonate isoprenoid biosynthetic pathway and mutants disrupting genes similar to Tic40 and tatC, which are likely to be involved in chloroplast protein translocation. Finally, we directly compared T-DNA and Ac/Ds transposon mutagenesis methods in Arabidopsis on a genome scale. In each population, we found that only about one-third of the insertion mutations cosegregated with a mutant phenotype.
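
A first sanity check often applied to a seedling-lethal line, though not described in the abstract, is textbook Mendelian segregation: progeny of a selfed heterozygote carrying a single recessive lethal allele should be roughly 3 normal : 1 lethal. The sketch below computes the chi-square goodness-of-fit statistic for that ratio with 1 degree of freedom; the function names and seedling counts are hypothetical.

```python
# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_P05 = 3.841


def chi_square_3_to_1(normal: int, lethal: int) -> float:
    """Chi-square statistic for observed normal:lethal counts vs. a 3:1 expectation."""
    total = normal + lethal
    if total == 0:
        raise ValueError("no seedlings scored")
    observed = (normal, lethal)
    expected = (0.75 * total, 0.25 * total)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def consistent_with_single_recessive(normal: int, lethal: int) -> bool:
    """True if the 3:1 hypothesis is not rejected at the 5% level."""
    return chi_square_3_to_1(normal, lethal) < CHI2_CRITICAL_DF1_P05
```

For 290 normal and 110 lethal seedlings, the statistic is 100/300 + 100/100 ≈ 1.33, so a single recessive lethal is not rejected. A 200:200 split gives about 133 and is clearly inconsistent with 3:1. Passing this test does not by itself show that the insertion causes the phenotype; that is what the cosegregation analysis addresses.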


2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Martin Johnsson ◽  
Andrew Whalen ◽  
Roger Ros-Freixedes ◽  
Gregor Gorjanc ◽  
Ching-Yi Chen ◽  
...  

Abstract Background: Meiotic recombination results in the exchange of genetic material between homologous chromosomes. Recombination rate varies along the genome and between individuals, and is influenced by genetics. In this paper, we assessed the genetic variation in recombination rate along the genome and between individuals in the pig using multilocus iterative peeling on 150,000 individuals across nine genotyped pedigrees. We used these data to estimate the heritability of recombination and to perform a genome-wide association study of recombination in the pig. Results: Our results confirmed known features of the recombination landscape of the pig genome, including differences in genetic length of chromosomes and marked sex differences. The recombination landscape was repeatable between lines, but lines differed in average autosome-wide recombination rate. The heritability of autosome-wide recombination rate was low but nonzero (on average 0.07 for females and 0.05 for males). We found six genomic regions that are associated with recombination rate, five of which harbour known candidate genes involved in recombination: RNF212, SHOC1, SYCP2, MSH4 and HFM1. Conclusions: Our results on the variation in recombination rate in the pig genome agree with those reported for other vertebrates, with a low but nonzero heritability and the identification of a major quantitative trait locus for recombination rate that is homologous to that detected in several other species. This work also highlights the utility of large-scale livestock data for understanding biological processes.
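
Per-individual recombination measurements start from counting crossovers in transmitted gametes. One common approach, assumed here rather than taken from the paper, is to count the points along a chromosome where the inferred grandparental origin of a parent's transmitted alleles switches. The input format below (one 0/1 origin call per marker, None where the inference is uncertain) is hypothetical; it stands in for the output of a method such as the multilocus iterative peeling named in the abstract.

```python
from typing import Optional, Sequence


def count_crossovers(origins: Sequence[Optional[int]]) -> int:
    """Count switches in grandparental origin along one transmitted chromosome.

    origins[i] is 0 or 1 for the grandparent of origin at marker i,
    or None when the call is uncertain (such markers are skipped).
    """
    crossovers = 0
    last: Optional[int] = None
    for o in origins:
        if o is None:
            continue  # uncertain call: do not break or create a run
        if last is not None and o != last:
            crossovers += 1
        last = o
    return crossovers
```

For origins [0, 0, 1, 1, 0] the count is 2. Because uncertain calls are skipped, a switch across a gap is still counted once, although its exact position is unknown. Summing such counts over autosomes per meiosis gives the kind of per-individual phenotype that heritability and GWAS analyses can use.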


Author(s):  
Lina Kloub ◽  
Sean Gosselin ◽  
Matthew Fullmer ◽  
Joerg Graf ◽  
J Peter Gogarten ◽  
...  

Abstract Horizontal gene transfer (HGT) is central to prokaryotic evolution. However, little is known about the “scale” of individual HGT events. In this work, we introduce the first computational framework to help answer the following fundamental question: How often does more than one gene get horizontally transferred in a single HGT event? Our method, called HoMer, uses phylogenetic reconciliation to infer single-gene HGT events across a given set of species/strains, employs several techniques to account for inference error and uncertainty, combines that information with gene order information from extant genomes, and uses statistical analysis to identify candidate horizontal multi-gene transfers (HMGTs) in both extant and ancestral species/strains. HoMer is highly scalable and can be easily used to infer HMGTs across hundreds of genomes. We apply HoMer to a genome-scale dataset of over 22,000 gene families from 103 Aeromonas genomes and identify a large number of plausible HMGTs of various scales at both small and large phylogenetic distances. Analysis of these HMGTs reveals interesting relationships between gene function, phylogenetic distance, and frequency of multi-gene transfer. Among other insights, we find that (i) the observed relative frequency of HMGT increases as divergence between genomes increases, (ii) HMGTs often have conserved gene functions, and (iii) rare genes are frequently acquired through HMGT. We also analyze in detail HMGTs involving the zonula occludens toxin and type III secretion systems. By enabling the systematic inference of HMGTs on a large scale, HoMer will facilitate a more accurate and more complete understanding of HGT and microbial evolution.
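
HoMer combines inferred single-gene transfers with gene order to find candidate multi-gene transfers. A much simplified version of the gene-order step, and not HoMer's actual statistical procedure, is to take the genes in one recipient genome whose transfers were inferred from the same donor and report runs of adjacent genes. The inputs (gene positions as integer indices along the recipient genome) and the function name are hypothetical. The sketch ignores reconciliation uncertainty, genome rearrangement, and significance testing.

```python
from typing import Iterable, List


def adjacent_runs(positions: Iterable[int], min_size: int = 2) -> List[List[int]]:
    """Group gene positions into runs of consecutive indices.

    Only runs with at least min_size genes are returned; these are the
    candidates for a single multi-gene transfer event.
    """
    runs: List[List[int]] = []
    current: List[int] = []
    for p in sorted(set(positions)):
        if current and p == current[-1] + 1:
            current.append(p)  # extends the current run of neighbours
        else:
            if len(current) >= min_size:
                runs.append(current)
            current = [p]  # start a new run at this position
    if len(current) >= min_size:
        runs.append(current)
    return runs
```

For example, adjacent_runs([9, 3, 4, 5, 10, 15]) returns [[3, 4, 5], [9, 10]], and the isolated gene at position 15 is left as a single-gene transfer. Requiring strict adjacency is the simplest possible choice; a real method would need to tolerate intervening genes and weigh how likely such runs are by chance.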


2012 ◽  
Vol 15 (3) ◽  
pp. 442-452 ◽  
Author(s):  
Thomas Espeseth ◽  
Andrea Christoforou ◽  
Astri J. Lundervold ◽  
Vidar M. Steen ◽  
Stephanie Le Hellard ◽  
...  

Data collection for the Norwegian Cognitive NeuroGenetics sample (NCNG) was initiated in 2003 with a research grant (to Ivar Reinvang) to study cognitive aging, brain function, and genetic risk factors. The original focus was on the effects of aging (from middle age and up) and candidate genes (e.g., APOE, CHRNA4) in cross-sectional and longitudinal designs, with the cognitive and MRI-based data primarily being used for this purpose. However, as the main topic of the project broadened from cognitive aging to imaging and cognitive genetics more generally, the sample size, the age range of the participants, and the scope of available phenotypes and genotypes have grown beyond the initial project. In 2009, a genome-wide association (GWA) study was undertaken, and the NCNG proper was established to study the genetics of cognitive and brain function more comprehensively. The NCNG is now controlled by the NCNG Study Group, which consists of the present authors. Prominent features of the NCNG are the adult life-span coverage of healthy participants with high-dimensional imaging, and cognitive data from a genetically homogeneous sample. Another unique property is the large-scale (sample size 300–700) use of experimental cognitive tasks focusing on attention and working memory. The NCNG data are now used in numerous ongoing GWA-based studies and have contributed to several international consortia on imaging and cognitive genetics. The objective of the following presentation is to give other researchers the information necessary to evaluate possible contributions from the NCNG to various multi-sample data analyses.


Author(s):  
Andrew J. Watson ◽  
Timothy M. Lenton ◽  
Benjamin J. W. Mills

The major biogeochemical cycles that keep the present-day Earth habitable are linked by a network of feedbacks, which has led to a broadly stable chemical composition of the oceans and atmosphere over hundreds of millions of years. This includes the processes that control both the atmospheric and oceanic concentrations of oxygen. However, one notable exception to the generally well-behaved dynamics of this system is the propensity for episodes of ocean anoxia to occur and to persist for 10⁵–10⁶ years; these ocean anoxic events (OAEs) are particularly associated with warm ‘greenhouse’ climates. A powerful mechanism responsible for past OAEs was an increase in phosphorus supply to the oceans, leading to higher ocean productivity and oxygen demand in subsurface water. This can be amplified by positive feedbacks on the nutrient content of the ocean, with low oxygen promoting further release of phosphorus from ocean sediments, leading to a potentially self-sustaining condition of deoxygenation. We use a simple model for phosphorus in the ocean to explore this feedback, and to evaluate the potential for humans to bring on global-scale anoxia by enhancing phosphorus (P) supply to the oceans. While this is not an immediate global change concern, it is a future possibility on millennial and longer time scales, when considering both phosphate rock mining and increased chemical weathering due to climate change. Ocean deoxygenation, once begun, may be self-sustaining and could eventually result in long-lasting and unpleasant consequences for the Earth's biosphere. This article is part of the themed issue ‘Ocean ventilation and deoxygenation in a warming world’.
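
The phosphorus feedback described above can be illustrated with a toy one-box model. This is not the authors' model; the functional forms, parameter values and function names are hypothetical. Ocean phosphate P (normalized so today's value is 1) rises with supply and falls with burial, and burial efficiency drops as the anoxic fraction grows, which releases more P and deepens anoxia. The model is integrated with forward Euler steps until it settles.

```python
def anoxic_fraction(p: float) -> float:
    """Hypothetical mapping: no anoxia at today's P (1.0), full anoxia at 3x today's P."""
    return min(1.0, max(0.0, (p - 1.0) / 2.0))


def steady_phosphate(
    supply: float,
    feedback: float,
    k_burial: float = 1.0,
    dt: float = 0.01,
    t_end: float = 200.0,
) -> float:
    """Integrate dP/dt = supply - k_burial * (1 - feedback * anoxia(P)) * P from P = 1.

    feedback must be below 1 so that burial stays positive even at full anoxia.
    """
    p = 1.0
    for _ in range(round(t_end / dt)):
        # Anoxia weakens P burial (more P recycled from sediments).
        burial = k_burial * (1.0 - feedback * anoxic_fraction(p)) * p
        p += dt * (supply - burial)
    return p
```

With these hypothetical settings and no feedback, a 1.5-fold supply increase gives a 1.5-fold steady P. With feedback = 0.5, the same increase gives a 2-fold P. A 2-fold supply increase pushes the system past the point where a partially anoxic steady state exists, so P climbs to the fully anoxic branch at 4-fold. This abrupt jump mirrors, in caricature, the self-sustaining deoxygenation the abstract describes.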

