Critical assessment of pan-genomics of metagenome-assembled genomes

2022 ◽  
Author(s):  
Tang Li ◽  
Yanbin Yin

Background: Large-scale metagenome assembly and binning to generate metagenome-assembled genomes (MAGs) have become possible in the past five years. As a result, millions of MAGs have been produced and are increasingly included in pan-genomics workflows. However, pan-genome analyses of MAGs may suffer from the known issues with MAGs: fragmentation, incompleteness, and contamination, which arise from mis-assembly and mis-binning. Here, we conducted a critical assessment of including MAGs in pan-genome analysis by comparing pan-genome analysis results for complete bacterial genomes and simulated MAGs. Results: We found that incompleteness led to more significant core gene loss than fragmentation. Contamination had little effect on core genome size but had a major influence on accessory genomes. The core gene loss remained when using different pan-genome analysis tools and when using a mixture of MAGs and complete genomes. Importantly, the core gene loss was partially alleviated by lowering the core gene threshold and by using gene prediction algorithms that consider fragmented genes, but to a lesser degree when incompleteness was higher than 5%. The core gene loss also led to incorrect pan-genome functional predictions and inaccurate phylogenetic trees. Conclusions: We conclude that lowering the core gene threshold and predicting genes in metagenome mode (as Anvio does with Prodigal) are necessary in pan-genome analysis of MAGs to alleviate the accuracy loss. Better quality control of MAGs and the development of new pan-genome analysis tools specifically designed for MAGs are needed in future studies.
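
The effect of lowering the core gene threshold can be illustrated with a minimal sketch. It is not taken from the paper: the function name `core_genes`, the toy gene families, and the 95% cutoff are assumptions chosen for illustration, and real tools work from clustered orthologs rather than plain gene names.

```python
from typing import Dict, List, Set


def core_genes(presence: Dict[str, Set[str]], threshold: float = 1.0) -> List[str]:
    """Return gene families found in at least `threshold` (0..1) of the genomes.

    `presence` maps a genome name to the set of gene families detected in it.
    This is a hypothetical helper, not the API of any pan-genome tool.
    """
    n_genomes = len(presence)
    if n_genomes == 0:
        return []
    counts: Dict[str, int] = {}
    for families in presence.values():
        for family in families:
            counts[family] = counts.get(family, 0) + 1
    return sorted(f for f, c in counts.items() if c / n_genomes >= threshold)


# Toy data: 19 complete genomes carry both families; one incomplete MAG
# is missing geneB because the contig carrying it was not binned.
presence = {f"genome{i}": {"geneA", "geneB"} for i in range(19)}
presence["incomplete_mag"] = {"geneA"}

strict_core = core_genes(presence, threshold=1.0)    # ['geneA']
relaxed_core = core_genes(presence, threshold=0.95)  # ['geneA', 'geneB']
```

With a strict 100% threshold, a single incomplete MAG removes `geneB` from the core; relaxing the threshold to 95% recovers it, at the cost of also admitting families that are genuinely absent from a few genomes.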

2015 ◽  
Author(s):  
Andrew J Page ◽  
Carla A Cummins ◽  
Martin Hunt ◽  
Vanessa K Wong ◽  
Sandra Reuter ◽  
...  

A typical prokaryote population sequencing study can now consist of hundreds or thousands of isolates. Interrogating these datasets can provide detailed insights into the genetic structure of prokaryotic genomes. We introduce Roary, a tool that rapidly builds large-scale pan genomes, identifying the core and dispensable accessory genes. Roary makes construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results. Using a single CPU, Roary can produce a pan genome consisting of 1000 isolates in 4.5 hours using 13 GB of RAM, with further speedups possible using multiple processors.


GigaScience ◽  
2018 ◽  
Vol 7 (4) ◽  
Author(s):  
Harry A Thorpe ◽  
Sion C Bayliss ◽  
Samuel K Sheppard ◽  
Edward J Feil

2019 ◽  
Vol 7 (11) ◽  
pp. 487
Author(s):  
Samrat Ghosh ◽  
Aditya Narayan Sarangi ◽  
Mayuri Mukherjee ◽  
Swati Bhowmick ◽  
Sucheta Tripathy

Lactobacillus paracasei are diverse Gram-positive bacteria that are very closely related to Lactobacillus casei and belong to the Lactobacillus casei group. Because of the extreme genome similarity between L. casei and L. paracasei, many strains have been assigned to the wrong species. We had earlier sequenced and analyzed the genome of Lactobacillus paracasei Lbs2, but mistakenly identified it as L. casei. We re-analyzed the Lbs2 reads into a 2.5 Mb genome that is 91.28% complete with 0.8% contamination, and which is now suitably placed under L. paracasei based on Average Nucleotide Identity and Average Amino Acid Identity. For comparison, we took 74 sequenced genomes of L. paracasei from GenBank, with assembly sizes ranging from 2.3 to 3.3 Mb and genome completeness between 88% and 100%. The pan-genome of the 75 L. paracasei strains holds 15,945 gene families (215,232 genes), while the core genome contains about 8.4% of the total genes of the pan-genome (243 gene families with 18,225 genes). Phylogenomic analysis based on core gene families revealed that the Lbs2 strain is most closely related to L. paracasei subsp. tolerans DSM20258. Finally, in-silico analysis of the L. paracasei Lbs2 genome revealed an important pathway that could underpin the production of thiamin, which may contribute to host energy metabolism.
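
The core-genome figures in this abstract can be checked with a short calculation. It assumes the total gene count is 215,232 and that each of the 243 core families contributes exactly one gene per strain; neither assumption is stated explicitly in the abstract.

```latex
\[
243 \times 75 = 18{,}225,
\qquad
\frac{18{,}225}{215{,}232} \approx 0.0847
\]
```

The result is in line with the roughly 8.4% reported. Note that the percentage refers to genes, not gene families: as a share of families, the core is only $243 / 15{,}945 \approx 1.5\%$.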


2020 ◽  
Vol 14 ◽  
pp. 117793222093806
Author(s):  
Sávio Souza Costa ◽  
Luís Carlos Guimarães ◽  
Artur Silva ◽  
Siomar Castro Soares ◽  
Rafael Azevedo Baraúna

The pan-genome is defined as the set of orthologous and unique genes of a specific group of organisms. It is composed of the core genome, the accessory genome, and species- or strain-specific genes. A pan-genome is considered open or closed based on the alpha value of Heaps' law. In an open pan-genome, the number of gene families continues to increase as new genomes are added to the analysis, while in a closed pan-genome, the number of gene families does not increase considerably. The first step of a pan-genome analysis is the homogenization of genome annotation: the same software, such as GeneMark or RAST, should be used to annotate all genomes. Subsequently, software such as BPGA, GET_HOMOLOGUES, or PGAP, among others, is used to calculate the pan-genome. This review presents these initial steps for those who want to perform a pan-genome analysis and explains key concepts of the area. Furthermore, we present the pan-genomic analysis of 9 bacterial species, which are the species with the highest number of genomes deposited in GenBank. We also show the influence of the identity and coverage parameters on the prediction of orthologous and paralogous genes. Finally, we discuss perspectives for several research areas where pan-genome analysis can be used to answer important questions.
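
The open/closed classification can be sketched as a small fit. The abstract does not give the formula, so the sketch assumes one commonly used formulation in which the number of new genes found when the N-th genome is added follows n = kappa * N^(-alpha), with alpha <= 1 read as open and alpha > 1 as closed. The function names and the synthetic data are hypothetical, and tools such as BPGA may parameterize the curve differently.

```python
import math
from typing import Sequence, Tuple


def fit_heaps_alpha(n_genomes: Sequence[int],
                    new_genes: Sequence[float]) -> Tuple[float, float]:
    """Fit n = kappa * N**(-alpha) by least squares on log-transformed data.

    Returns (kappa, alpha). All inputs must be positive.
    """
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(g) for g in new_genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log n = log kappa - alpha * log N, so alpha is the negated slope.
    return math.exp(intercept), -slope


def classify_pan_genome(alpha: float) -> str:
    """Assumed convention: alpha <= 1 means open, alpha > 1 means closed."""
    return "closed" if alpha > 1.0 else "open"


# Synthetic example: new-gene counts that decay slowly (alpha = 0.6).
genomes = list(range(2, 31))
new_per_genome = [500 * n ** -0.6 for n in genomes]
kappa, alpha = fit_heaps_alpha(genomes, new_per_genome)
status = classify_pan_genome(alpha)
```

In practice the new-gene counts would be averaged over many random orderings of the genomes before fitting, because the count at each step depends on the order in which genomes are added.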


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Sonam Gaba ◽  
Abha Kumari ◽  
Marnix Medema ◽  
Rajeev Kaushik

Halobacteria, a class of Euryarchaeota, are extremely halophilic archaea that can adapt to a wide range of salt concentrations, generally from 10% NaCl to a saturated salt concentration of 32% NaCl. The class consists of the orders Halobacteriales, Haloferacales and Natrialbales. Pan-genome analysis of the class Halobacteria was done to explore the core (300) and variable components (Softcore: 998, Cloud: 36531, Shell: 11784). The core component revealed genes of replication, transcription, translation and repair, whereas a major portion of the variable component was devoted to environmental information processing. The pan-gene matrix was mapped onto the core-gene tree to find the ancestral (44.8%) and derived genes (55.1%) of the Last Common Ancestor of Halobacteria. A high percentage of derived genes, along with the presence of transformation and conjugation genes, indicates the occurrence of horizontal gene transfer during the evolution of Halobacteria. Core and pan-gene trees were also constructed to infer a phylogeny, which pointed to a new super-order comprising Natrialbales and Halobacteriales.


2020 ◽  
Vol 1 (1) ◽  
pp. 23-25
Author(s):  
Nguyen Thanh Luan

Aquatic diseases caused by a wide diversity of pathogenic bacteria pose major challenges to the development of sustainable bio-control methods, such as antimicrobial measures and vaccine strategies. Recent advances in genome sequencing technology have revolutionized the field of pathogen pan-genomics and have also influenced disease management in aqua farms. In this study, Edwardsiella strains were classified into four species by a phylogenomic reconstruction based on pan-genome analysis. Edwardsiella species were correctly classified by pan-genome analysis (core genes, dispensable genes, singleton genes) of 15 complete genomes. Candidate genes were screened from the core genes based on the gene repertoires and on genes encoding extracellular proteins, outer membrane proteins, adhesion ability and antigenic sites. This yielded 9 genes (E. ictaluri), 13 genes (E. anguillarum), 9 genes (E. piscicida), 12 genes (E. tarda), and 14 genes (all species), screened from core genomes of 2686, 2673, 2877, 2920, and 1957 genes, respectively, that have potential for developing a reverse vaccinology strategy to prevent edwardsiellosis. The in-silico analysis may also help to shorten development time and improve the cross-serotype reaction of vaccines in farmed fish. Reverse vaccinology (RV) research implementing pan-genome analysis will be a strategy that is applicable to pathogens in both aquatic and terrestrial animals.


2015 ◽  
Vol 31 (22) ◽  
pp. 3691-3693 ◽  
Author(s):  
Andrew J. Page ◽  
Carla A. Cummins ◽  
Martin Hunt ◽  
Vanessa K. Wong ◽  
Sandra Reuter ◽  
...  
