Roary: Rapid large-scale prokaryote pan genome analysis

2015 ◽  
Author(s):  
Andrew J Page ◽  
Carla A Cummins ◽  
Martin Hunt ◽  
Vanessa K Wong ◽  
Sandra Reuter ◽  
...  

A typical prokaryote population sequencing study can now consist of hundreds or thousands of isolates. Interrogating these datasets can provide detailed insights into the genetic structure of prokaryotic genomes. We introduce Roary, a tool that rapidly builds large-scale pan genomes, identifying the core and dispensable accessory genes. Roary makes construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results. Using a single CPU, Roary can produce a pan genome consisting of 1000 isolates in 4.5 hours using 13 GB of RAM, with further speedups possible using multiple processors.

2022 ◽  
Author(s):  
Tang Li ◽  
Yanbin Yin

Background: Large-scale metagenome assembly and binning to generate metagenome-assembled genomes (MAGs) has become possible in the past five years. As a result, millions of MAGs have been produced and increasingly included in pan-genomics workflows. However, pan-genome analyses of MAGs may suffer from the known issues with MAGs: fragmentation, incompleteness, and contamination, due to mis-assembly and mis-binning. Here, we conducted a critical assessment of including MAGs in pan-genome analysis, by comparing pan-genome analysis results of complete bacterial genomes and simulated MAGs. Results: We found that incompleteness led to more significant core gene loss than fragmentation. Contamination had little effect on core genome size but had a major influence on accessory genomes. The core gene loss remained when using different pan-genome analysis tools and when using a mixture of MAGs and complete genomes. Importantly, the core gene loss was partially alleviated by lowering the core gene threshold and using gene prediction algorithms that consider fragmented genes, but to a lesser degree when incompleteness was higher than 5%. The core gene loss also led to incorrect pan-genome functional predictions and inaccurate phylogenetic trees. Conclusions: We conclude that lowering the core gene threshold and predicting genes in metagenome mode (as Anvio does with Prodigal) are necessary in pan-genome analysis of MAGs to alleviate the accuracy loss. Better quality control of MAGs and development of new pan-genome analysis tools specifically designed for MAGs are needed in future studies.
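
The effect of the core gene threshold described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `core_genes`, the toy gene names, and the 90% cut-off are all hypothetical, chosen only to show how a gene missing from a single incomplete genome drops out of a strict core but is retained under a relaxed threshold.

```python
def core_genes(presence, n_genomes, threshold=1.0):
    """Return gene families present in at least `threshold` of the genomes.

    `presence` maps a gene family name to the set of genome IDs carrying it.
    `threshold` is a fraction in (0, 1]; 1.0 means a strict core.
    All names and cut-offs here are illustrative assumptions.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    return {
        gene
        for gene, genomes in presence.items()
        if len(genomes) / n_genomes >= threshold
    }


# Toy data: 20 genomes; geneB is missing from one incomplete genome (g19).
all_genomes = {f"g{i}" for i in range(20)}
presence = {
    "geneA": set(all_genomes),            # present everywhere
    "geneB": all_genomes - {"g19"},       # lost from one incomplete genome
    "geneC": {"g0", "g1", "g2"},          # accessory gene
}

strict_core = core_genes(presence, 20, threshold=1.0)
relaxed_core = core_genes(presence, 20, threshold=0.9)
```

With these toy inputs the strict core holds only `geneA`, while the relaxed core also recovers `geneB`. The accessory `geneC` is excluded in both cases.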


GigaScience ◽  
2018 ◽  
Vol 7 (4) ◽  
Author(s):  
Harry A Thorpe ◽  
Sion C Bayliss ◽  
Samuel K Sheppard ◽  
Edward J Feil

2020 ◽  
Vol 14 ◽  
pp. 117793222093806
Author(s):  
Sávio Souza Costa ◽  
Luís Carlos Guimarães ◽  
Artur Silva ◽  
Siomar Castro Soares ◽  
Rafael Azevedo Baraúna

The pan-genome is defined as the set of orthologous and unique genes of a specific group of organisms. It is composed of the core genome, the accessory genome, and species- or strain-specific genes. The pan-genome is considered open or closed based on the alpha value of Heaps' law. In an open pan-genome, the number of gene families will continuously increase with the addition of new genomes to the analysis, while in a closed pan-genome, the number of gene families will not increase considerably. The first step of a pan-genome analysis is the homogenization of genome annotation: the same software, such as GeneMark or RAST, should be used to annotate all genomes. Subsequently, several software tools, such as BPGA, GET_HOMOLOGUES, and PGAP, among others, are used to calculate the pan-genome. This review presents all these initial steps for those who want to perform a pan-genome analysis, explaining key concepts of the area. Furthermore, we present the pan-genomic analysis of 9 bacterial species. These are the species with the highest number of genomes deposited in GenBank. We also show the influence of the identity and coverage parameters on the prediction of orthologous and paralogous genes. Finally, we cite the perspectives of several research areas where pan-genome analysis can be used to answer important issues.
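
The open/closed distinction can be sketched numerically. The abstract does not give a formula, so the following assumes one commonly used formulation in which the number of new gene families n found when the N-th genome is added follows n = kappa * N^(-alpha), with alpha > 1 read as a closed pan-genome and alpha <= 1 as an open one. The function names and the synthetic data are hypothetical, and the fit is a plain least-squares line in log-log space rather than the method of any particular tool.

```python
import math


def fit_heaps_alpha(points):
    """Fit n = kappa * N**(-alpha) by least squares on log-transformed data.

    `points` is a list of (N, new_gene_families) pairs. Pairs with zero
    new families are skipped because log(0) is undefined.
    Returns (kappa, alpha). Illustrative sketch only.
    """
    logs = [(math.log(n_genomes), math.log(new))
            for n_genomes, new in points if new > 0]
    if len(logs) < 2:
        raise ValueError("need at least two points with new genes > 0")
    mean_x = sum(x for x, _ in logs) / len(logs)
    mean_y = sum(y for _, y in logs) / len(logs)
    sxx = sum((x - mean_x) ** 2 for x, _ in logs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in logs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def classify_pan_genome(alpha):
    """Apply the assumed rule: alpha > 1 is closed, otherwise open."""
    return "closed" if alpha > 1 else "open"


# Synthetic example generated with kappa = 500 and alpha = 0.5.
points = [(n, 500 * n ** -0.5) for n in range(2, 11)]
kappa, alpha = fit_heaps_alpha(points)
status = classify_pan_genome(alpha)
```

On this synthetic input the fit recovers alpha close to 0.5, which the assumed rule labels as open. Real data would be noisier and are usually averaged over many random genome orderings before fitting.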


2011 ◽  
Vol 130-134 ◽  
pp. 2010-2014
Author(s):  
Wen Chang Tsai ◽  
Bill Chen

This paper studies the core losses of non-oriented electrical steel laminations under high-frequency voltage excitation. The core losses of the laminations are measured in an Epstein frame step by step from 50 Hz to 5000 Hz. The accuracy of the results evaluated with the Expanded GSE method is compared with those measured in the Epstein frame. A core loss database for three kinds of electrical steel laminations of medium, medium-high and high quality (50CS350, 50CS470, 50CS600) is completed in the paper. In addition, 85 test points of core loss data are established for flux densities ranging from 0.3 T to 1.8 T and power supply frequencies ranging from 50 Hz to 5000 Hz. The maximum core loss value is close to 443 W/kg. The measured core loss data and the Expanded GSE models are useful and may cover applications in large-scale wind-driven generators and general motors. They also give designers sufficiently accurate information to minimize the core losses of wind turbine generators.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Sonam Gaba ◽  
Abha Kumari ◽  
Marnix Medema ◽  
Rajeev Kaushik

Abstract: Halobacteria, a class of Euryarchaeota, are extremely halophilic archaea that can adapt to a wide range of salt concentrations, generally from 10% NaCl to the saturated salt concentration of 32% NaCl. The class consists of the orders Halobacteriales, Haloferacales and Natrialbales. Pan-genome analysis of class Halobacteria was done to explore the core (300) and variable components (Softcore: 998, Cloud: 36531, Shell: 11784). The core component revealed genes of replication, transcription, translation and repair, whereas the variable component had a major portion of environmental information processing. The pan-gene matrix was mapped onto the core-gene tree to find the ancestral (44.8%) and derived genes (55.1%) of the Last Common Ancestor of Halobacteria. A high percentage of derived genes, along with the presence of transformation and conjugation genes, indicates the occurrence of horizontal gene transfer during the evolution of Halobacteria. Core and pan-gene trees were also constructed to infer a phylogeny, which pointed to a new super-order comprising Natrialbales and Halobacteriales.
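
The core, soft-core, shell and cloud compartments reported above can be illustrated with a short sketch that bins gene clusters by the fraction of genomes carrying them. The cut-offs used here (core at 99% or more, soft core from 95%, shell from 15%, cloud below 15%) follow the convention of Roary's summary statistics and are an assumption; the abstract does not state which cut-offs the study used, and other tools define these classes differently. The function names are hypothetical.

```python
from collections import Counter


def classify_cluster(n_present, n_genomes):
    """Assign a gene cluster to a pan-genome compartment.

    `n_present` is the number of genomes containing the cluster.
    Cut-offs are assumed (Roary-style), not taken from the study.
    """
    if not 0 < n_present <= n_genomes:
        raise ValueError("n_present must be between 1 and n_genomes")
    fraction = n_present / n_genomes
    if fraction >= 0.99:
        return "core"
    if fraction >= 0.95:
        return "soft_core"
    if fraction >= 0.15:
        return "shell"
    return "cloud"


def summarize(cluster_counts, n_genomes):
    """Count how many clusters fall into each compartment."""
    return Counter(classify_cluster(c, n_genomes) for c in cluster_counts)


# Toy example: four clusters observed across 100 genomes.
summary = summarize([100, 97, 50, 3], n_genomes=100)
```

In the toy example each of the four clusters lands in a different compartment, so `summary` holds one count per class.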


2015 ◽  
Vol 31 (22) ◽  
pp. 3691-3693 ◽  
Author(s):  
Andrew J. Page ◽  
Carla A. Cummins ◽  
Martin Hunt ◽  
Vanessa K. Wong ◽  
Sandra Reuter ◽  
...  

Radiation ◽  
2021 ◽  
Vol 1 (2) ◽  
pp. 79-94
Author(s):  
Peter K. Rogan ◽  
Eliseos J. Mucaki ◽  
Ben C. Shirley ◽  
Yanxin Li ◽  
Ruth C. Wilkins ◽  
...  

The dicentric chromosome (DC) assay accurately quantifies exposure to radiation; however, manual and semi-automated assignment of DCs has limited its use for a potential large-scale radiation incident. The Automated Dicentric Chromosome Identifier and Dose Estimator (ADCI) software automates unattended DC detection and determines radiation exposures, fulfilling IAEA criteria for triage biodosimetry. This study evaluates the throughput of high-performance ADCI (ADCI-HT) to stratify exposures of populations in 15 simulated population scale radiation exposures. ADCI-HT streamlines dose estimation using a supercomputer by optimal hierarchical scheduling of DC detection for varying numbers of samples and metaphase cell images in parallel on multiple processors. We evaluated processing times and accuracy of estimated exposures across census-defined populations. Image processing of 1744 samples on 16,384 CPUs required 1 h 11 min 23 s and radiation dose estimation based on DC frequencies required 32 sec. Processing of 40,000 samples at 10 exposures from five laboratories required 25 h and met IAEA criteria (dose estimates were within 0.5 Gy; median = 0.07). Geostatistically interpolated radiation exposure contours of simulated nuclear incidents were defined by samples exposed to clinically relevant exposure levels (1 and 2 Gy). Analysis of all exposed individuals with ADCI-HT required 0.6–7.4 days, depending on the population density of the simulation.


Genetics ◽  
1997 ◽  
Vol 147 (2) ◽  
pp. 643-655 ◽  
Author(s):  
Kenneth G Ross ◽  
Michael J B Krieger ◽  
D DeWayne Shoemaker ◽  
Edward L Vargo ◽  
Laurent Keller

We describe genetic structure at various scales in native populations of the fire ant Solenopsis invicta using two classes of nuclear markers, allozymes and microsatellites, and markers of the mitochondrial genome. Strong structure was found at the nest level in both the monogyne (single queen) and polygyne (multiple queen) social forms using allozymes. Weak but significant microgeographic structure was detected above the nest level in polygyne populations but not in monogyne populations using both classes of nuclear markers. Pronounced mitochondrial DNA (mtDNA) differentiation was evident also at this level in the polygyne form only. These microgeographic patterns are expected because polygyny in ants is associated with restricted local gene flow due mainly to limited vagility of queens. Weak but significant nuclear differentiation was detected between sympatric social forms, and strong mtDNA differentiation also was found at this level. Thus, queens of each form seem unable to establish themselves in nests of the alternate type, and some degree of assortative mating by form may exist as well. Strong differentiation was found between the two study regions using all three sets of markers. Phylogeographic analyses of the mtDNA suggest that recent limitations on gene flow rather than longstanding barriers to dispersal are responsible for this large-scale structure.

