Reconstruction of ancestral genomes in presence of gene gain and loss

Mapping Intimacies ◽

10.1101/040196 ◽

2016 ◽

Author(s):

Pavel Avdeyev ◽

Shuai Jiang ◽

Sergey Aganezov ◽

Fei Hu ◽

Max A. Alekseyev

Keyword(s):

Software Tool ◽

Single Copy ◽

Genome Rearrangements ◽

Superior Performance ◽

Gene Gain ◽

Gene Duplications ◽

Genomic Changes ◽

Breakpoint Reuse ◽

Gain Loss ◽

The Given

Since the most dramatic genomic changes are caused by genome rearrangements as well as gene duplications and gain/loss events, it is crucial to understand their mechanisms and to reconstruct the ancestral genomes of the given genomes. This problem was shown to be NP-complete even in the "simplest" case of three genomes, thus calling for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice, as was earlier illustrated with MGRA, a state-of-the-art software tool for reconstructing the ancestral genomes of multiple genomes. One of the key obstacles for MGRA and similar tools is the presence of breakpoint reuse, where the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes, with each gene present in a single copy in every genome. This limitation makes these tools inapplicable to many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene contents. The resulting next-generation tool, MGRA2, can handle gene gain/loss events and shares MGRA's ability to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. Furthermore, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and to process datasets inaccessible to MGRA. In practical experiments, MGRA2 shows superior performance on simulated and real genomes compared to other ancestral genome reconstruction tools. The MGRA2 tool is distributed as open-source software and can be downloaded from the GitHub repository http://github.com/ablab/mgra/. It is also available as a web server at http://mgra.cblab.org, which makes it readily accessible to inexperienced users.
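To make "breakpoints" and "breakpoint reuse" concrete, here is a minimal Python sketch. It is not MGRA2's algorithm (MGRA and MGRA2 work on multiple breakpoint graphs); it assumes each genome is a single linear chromosome written as a list of signed gene IDs, ignores telomeric adjacencies, handles unequal gene content simply by discarding genes present in only one genome, and computes the breakpoint reuse rate as 2d/b for a given number d of rearrangements and b of breakpoints. All function names are hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: a genome is one linear chromosome given as signed gene IDs,
# where -g means gene g in reverse orientation, and each gene is single-copy.
Adjacency = Tuple[int, int]


def adjacencies(genome: List[int]) -> Set[Adjacency]:
    """Canonical adjacencies between consecutive genes (telomeres ignored)."""
    result = set()
    for a, b in zip(genome, genome[1:]):
        # Reading (a, b) from the opposite strand gives (-b, -a); store one form.
        result.add(min((a, b), (-b, -a)))
    return result


def breakpoints(p: List[int], q: List[int]) -> int:
    """Adjacencies of p missing from q, after dropping genes found in only
    one of the two genomes (i.e. genes gained or lost on either lineage)."""
    shared = {abs(g) for g in p} & {abs(g) for g in q}
    p_shared = [g for g in p if abs(g) in shared]
    q_shared = [g for g in q if abs(g) in shared]
    q_adj = adjacencies(q_shared)
    return sum(1 for adj in adjacencies(p_shared) if adj not in q_adj)


def reuse_rate(num_rearrangements: int, num_breakpoints: int) -> float:
    """Breakpoint reuse rate 2d/b; values above 1.0 indicate reuse."""
    return 2 * num_rearrangements / num_breakpoints


if __name__ == "__main__":
    p = [1, 2, 3, 4, 5]    # gene 5 is absent from q (lost)
    q = [1, -3, -2, 4, 6]  # gene 6 is new (gained); segment 2..3 is reversed
    print(breakpoints(p, q))  # 2
    print(reuse_rate(1, 2))   # 1.0: one reversal, two fresh breakpoints
```

Dropping unshared genes is the crudest way to cope with gene gain/loss; it discards exactly the information that tools such as MGRA2 are designed to exploit, so the sketch is only meant to illustrate the definitions.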

What can we learn from over 100,000 Escherichia coli genomes?

10.1101/708131 ◽

2019 ◽

Cited By ~ 4

Author(s):

Kaleb Abram ◽

Zulema Udaondo ◽

Carissa Bleker ◽

Visanu Wanchai ◽

Trudy M. Wassenaar ◽

...

Keyword(s):

Escherichia Coli ◽

Large Scale ◽

Phylogenetic Analyses ◽

Bacterial Species ◽

Single Copy ◽

Gene Gain ◽

E Coli ◽

Phylogenetic Profiles ◽

Genomic Studies ◽

Gain Loss

The explosion of microbial genome sequences in public databases allows for large-scale population genomic studies of bacterial species such as Escherichia coli. In this study, we examine and classify more than one hundred thousand E. coli and Shigella genomes. After removing outliers, a semi-automated Mash-based analysis of 10,667 assembled genomes reveals 14 distinct phylogroups. A representative genome, or medoid, identified for each phylogroup serves as a proxy to classify more than 95,000 unassembled genomes. This analysis shows that most sequenced E. coli genomes belong to 4 phylogroups (A, C, B1 and E2(O157)). The authenticity of the 14 phylogroups is supported by pangenomic and phylogenetic analyses, which show differences in gene preservation between phylogroups. A phylogenetic tree constructed with 2,613 single-copy core genes, together with a matrix of phylogenetic profiles, is used to confirm that the 14 phylogroups undergo gene gain/loss/duplication at different rates. The methodology used in this work is able to identify previously uncharacterized phylogroups in the E. coli species. Some of these new phylogroups harbor clonal strains that have undergone genomic adaptation to the acquisition of new genomic elements related to virulence or antibiotic resistance. To our knowledge, this is the largest E. coli genome dataset analyzed to date, and it provides valuable insights into the population structure of the species.
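The medoid-based classification described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the Jaccard index j of canonical k-mer sets is computed exactly here, whereas the Mash tool estimates it from MinHash sketches (and defaults to k = 21 for genomes); the distance uses the Mash formula D = -(1/k) ln(2j / (1 + j)). The helper names and toy sequences are hypothetical.

```python
import math
from typing import Callable, Dict, List, Set, TypeVar

T = TypeVar("T")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """All k-mers of seq, each stored as min(k-mer, reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        kmers.add(min(kmer, kmer.translate(_COMPLEMENT)[::-1]))
    return kmers


def mash_distance(a: str, b: str, k: int) -> float:
    """D = -(1/k) * ln(2j / (1 + j)), with j the exact k-mer Jaccard index."""
    ka, kb = canonical_kmers(a, k), canonical_kmers(b, k)
    union = ka | kb
    if not union:
        raise ValueError("both sequences are shorter than k")
    j = len(ka & kb) / len(union)
    if j == 0:
        return 1.0  # no shared k-mers: report the maximum distance
    return math.log((1 + j) / (2 * j)) / k


def medoid(members: List[T], dist: Callable[[T, T], float]) -> T:
    """The member with the smallest total distance to all members."""
    return min(members, key=lambda m: sum(dist(m, other) for other in members))


def classify(query: T, medoids: Dict[str, T], dist: Callable[[T, T], float]) -> str:
    """Assign the query to the group whose medoid is closest."""
    return min(medoids, key=lambda group: dist(query, medoids[group]))


if __name__ == "__main__":
    def dist(x: str, y: str) -> float:
        return mash_distance(x, y, k=3)

    reps = {"A": "ACGGTCAAGT", "B": "TTTTTTTTTT"}  # hypothetical medoids
    print(classify("ACGGTCAAGT", reps, dist))      # A
```

Classifying a new genome then costs one distance computation per phylogroup rather than one per reference genome, which is the practical benefit of using medoids as proxies.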

Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups

Communications Biology ◽

10.1038/s42003-020-01626-5 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Kaleb Abram ◽

Zulema Udaondo ◽

Carissa Bleker ◽

Visanu Wanchai ◽

Trudy M. Wassenaar ◽

...

Keyword(s):

Escherichia Coli ◽

Phylogenetic Tree ◽

Single Copy ◽

Gene Gain ◽

E Coli ◽

Sequence Read Archive ◽

Gain Loss ◽

Lines Of Evidence ◽

Core Genes ◽

Genome Dataset

In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogroups. A representative genome or medoid identified for each phylogroup was used as a proxy to classify 95,525 unassembled genomes from the Sequence Read Archive (SRA). We find that most of the sequenced E. coli genomes belong to four phylogroups (A, C, B1 and E2(O157)). Authenticity of the 14 phylogroups is supported by several different lines of evidence: phylogroup-specific core genes, a phylogenetic tree constructed with 2613 single copy core genes, and differences in the rates of gene gain/loss/duplication. The methodology used in this work is able to reproduce known phylogroups, as well as to identify previously uncharacterized phylogroups in E. coli species.

Single-Trait and Multiple-Trait Genomic Prediction From Multi-Class Bayesian Alphabet Models Using Biological Information

Frontiers in Genetics ◽

10.3389/fgene.2021.717457 ◽

2021 ◽

Vol 12 ◽

Author(s):

Zigui Wang ◽

Hao Cheng

Keyword(s):

Molecular Markers ◽

Genomic Prediction ◽

Simulated Data ◽

Software Tool ◽

Biological Information ◽

Superior Performance ◽

Multiple Trait ◽

Bayesian Lasso ◽

Trait Analysis ◽

Causal Variants

Genomic prediction has been widely used in multiple areas, and various genomic prediction methods have been developed. The majority of these methods, however, focus on statistical properties and ignore abundant, useful biological information such as genome annotation or previously discovered causal variants. Therefore, to improve prediction performance, several methods have been developed to incorporate biological information into genomic prediction, mostly in single-trait analysis. A commonly used way to incorporate biological information is to allocate molecular markers into different classes based on that information and to assign separate priors to markers in different classes. Such methods have been shown to achieve higher prediction accuracy than conventional methods in some circumstances. However, they mainly focus on single-trait analysis, and the priors available to them are limited. We therefore propose multi-class Bayesian Alphabet methods for both single-trait and multiple-trait analysis, in which multiple Bayesian Alphabet priors, including RR-BLUP, BayesA, BayesB, BayesCπ, and Bayesian LASSO, can be used for markers allocated to different classes. The superior performance of the multi-class Bayesian Alphabet in genomic prediction is demonstrated using both real and simulated data. The software tool JWAS offers open-source routines to perform these analyses.
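The idea of giving each marker class its own prior can be illustrated by drawing marker effects from per-class priors. The Python sketch below only samples from simplified priors with fixed hyperparameters (scale, π, ν); it does not fit a model, is not how JWAS implements these methods, and uses BayesC with a fixed π where BayesCπ would estimate π from the data. Marker names, class labels and function names are hypothetical.

```python
import math
import random
from typing import Dict


def draw_effect(prior: str, rng: random.Random, scale: float = 1.0,
                pi: float = 0.9, nu: float = 4.0) -> float:
    """Draw one marker effect from a simplified, fixed-hyperparameter prior."""
    if prior == "RR-BLUP":
        # Normal with a variance shared by all markers in the class.
        return rng.gauss(0.0, scale)
    if prior == "BayesA":
        # Scaled Student-t: Z / sqrt(V / nu), V ~ chi-square(nu) = Gamma(nu/2, scale 2),
        # i.e. a normal whose variance follows a scaled inverse chi-square.
        v = rng.gammavariate(nu / 2.0, 2.0)
        return rng.gauss(0.0, scale) / math.sqrt(v / nu)
    if prior in ("BayesB", "BayesC"):
        # Point mass at zero with probability pi; otherwise a t slab (BayesB)
        # or a normal slab (BayesC). BayesCpi would estimate pi instead.
        if rng.random() < pi:
            return 0.0
        slab = "BayesA" if prior == "BayesB" else "RR-BLUP"
        return draw_effect(slab, rng, scale, pi, nu)
    if prior == "BayesianLASSO":
        # Laplace(0, scale) as the difference of two independent Exp(1) draws.
        return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    raise ValueError(f"unknown prior: {prior}")


def simulate_effects(marker_class: Dict[str, str], class_prior: Dict[str, str],
                     seed: int = 1) -> Dict[str, float]:
    """Give every marker an effect drawn from the prior assigned to its class."""
    rng = random.Random(seed)
    return {m: draw_effect(class_prior[c], rng) for m, c in marker_class.items()}


if __name__ == "__main__":
    # Hypothetical annotation: one marker near a known causal variant, two others.
    marker_class = {"snp1": "annotated", "snp2": "other", "snp3": "other"}
    class_prior = {"annotated": "BayesianLASSO", "other": "BayesB"}
    print(simulate_effects(marker_class, class_prior))
```

The point of the multi-class setup is visible in the output: markers in the "other" class are mostly exactly zero under the BayesB prior, while the annotated class gets a heavier-tailed, non-sparse prior.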

A Screen for Gene Paralogies Delineating Evolutionary Branching Order of Early Metazoa

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400951 ◽

2019 ◽

Vol 10 (2) ◽

pp. 811-826 ◽

Cited By ~ 5

Author(s):

Albert Erives ◽

Bernd Fritzsch

Keyword(s):

Phylogenetic Analyses ◽

Gene Families ◽

Single Copy ◽

Gene Duplications ◽

Nervous Systems ◽

Evolutionary Diversification ◽

Transmembrane Channel ◽

Sensory Epithelia ◽

Protein X ◽

Ancient Gene

The evolutionary diversification of animals is one of Earth’s greatest marvels, yet its earliest steps are shrouded in mystery. Animals, the monophyletic clade known as Metazoa, evolved wildly divergent multicellular life strategies featuring ciliated sensory epithelia. In many lineages epithelial sensoria became coupled to increasingly complex nervous systems. Currently, different phylogenetic analyses of single-copy genes support mutually exclusive possibilities that either Porifera or Ctenophora is sister to all other animals. Resolving this dilemma would advance the ecological and evolutionary understanding of the first animals and the evolution of nervous systems. Here we describe a comparative phylogenetic approach based on gene duplications. We computationally identify and analyze gene families with early metazoan duplications using an approach that mitigates apparent gene loss resulting from the miscalling of paralogs. In the transmembrane channel-like (TMC) family of mechano-transducing channels, we find ancient duplications that define separate clades for Eumetazoa (Placozoa + Cnidaria + Bilateria) vs. Ctenophora, and one duplication that is shared only by Eumetazoa and Porifera. In the Max-like protein X (MLX and MLXIP) family of bHLH-ZIP regulators of metabolism, we find that all major lineages from Eumetazoa and Porifera (sponges) share a duplicated gene pair that is sister to the single-copy gene maintained in Ctenophora. These results suggest a new avenue for deducing deep phylogeny by choosing rather than avoiding ancient gene paralogies.

Life and Death of Selfish Genes: Comparative Genomics Reveals the Dynamic Evolution of Cytoplasmic Incompatibility

Molecular Biology and Evolution ◽

10.1093/molbev/msaa209 ◽

2020 ◽

Vol 38 (1) ◽

pp. 2-15 ◽

Cited By ~ 3

Author(s):

Julien Martinez ◽

Lisa Klasson ◽

John J Welch ◽

Francis M Jiggins

Keyword(s):

Cytoplasmic Incompatibility ◽

Gene Gain ◽

Evolutionary Models ◽

Loss Of Function ◽

Genome Sequences ◽

Life And Death ◽

Selfish Genes ◽

Wolbachia Genome ◽

The Cross ◽

Gain Loss

Cytoplasmic incompatibility is a selfish reproductive manipulation induced by the endosymbiont Wolbachia in arthropods. In males Wolbachia modifies sperm, leading to embryonic mortality in crosses with Wolbachia-free females. In females, Wolbachia rescues the cross and allows development to proceed normally. This provides a reproductive advantage to infected females, allowing the maternally transmitted symbiont to spread rapidly through host populations. We identified homologs of the genes underlying this phenotype, cifA and cifB, in 52 of 71 new and published Wolbachia genome sequences. They are strongly associated with cytoplasmic incompatibility. There are up to seven copies of the genes in each genome, and phylogenetic analysis shows that Wolbachia frequently acquires new copies due to pervasive horizontal transfer between strains. In many cases, the genes have subsequently acquired loss-of-function mutations to become pseudogenes. As predicted by theory, this tends to occur first in cifB, whose sole function is to modify sperm, and then in cifA, which is required to rescue the cross in females. Although cif genes recombine, recombination is largely restricted to closely related homologs. This is predicted under a model of coevolution between sperm modification and embryonic rescue, where recombination between distantly related pairs of genes would create a self-incompatible strain. Together, these patterns of gene gain, loss, and recombination support evolutionary models of cytoplasmic incompatibility.

geneCo: a visualized comparative genomic method to analyze multiple genome structures

Bioinformatics ◽

10.1093/bioinformatics/btz596 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5303-5305 ◽

Cited By ~ 4

Author(s):

Jaehee Jung ◽

Jong Im Kim ◽

Gangman Yi

Keyword(s):

Genome Structure ◽

Software Tool ◽

Detailed Comparison ◽

Supplementary Information ◽

Comparative Genomic ◽

Web Based ◽

Computational Environment ◽

Gene Comparison ◽

User Data ◽

Gain Loss

Summary: In comparative and evolutionary genomics, a detailed comparison of common features between organisms is essential to evaluate genetic distance. However, identifying differences in matched and mismatched genes among multiple genomes is difficult with current comparative genomic approaches, because of their complicated methodologies or the meager information their results provide. This study describes a visualized software tool, geneCo (gene Comparison), for comparing genome structures and gene arrangements between various organisms. Starting from user-provided GenBank files, geneCo aligns the user data, recognizes gene information and compares genome structures. It reports inversions, gains, losses, duplications and gene rearrangements among the organisms being compared through a web-based interface that users can access without any need to consider the computational environment. Availability and implementation: The software is freely available at https://bigdata.dongguk.edu/geneCo. The main module of geneCo is implemented in Python, and the web-based user interface is built with PHP, HTML and CSS to support all browsers. Supplementary information: Supplementary data are available at Bioinformatics online.
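As a rough illustration of the gain, loss and duplication categories that geneCo reports, the Python sketch below compares gene-name multisets of two genomes. It is not geneCo's implementation (which works from GenBank files and also covers inversions and rearrangements); it assumes genes are already identified by name, treats a gene as duplicated when it has more copies than in the reference, and uses hypothetical gene names.

```python
from collections import Counter
from typing import Dict, List


def compare_gene_content(reference: List[str], other: List[str]) -> Dict[str, List[str]]:
    """Summarize gene gain, loss and duplication of `other` relative to `reference`."""
    ref, oth = Counter(reference), Counter(other)
    return {
        # genes found only in `other`
        "gained": sorted(g for g in oth if g not in ref),
        # genes found only in `reference`
        "lost": sorted(g for g in ref if g not in oth),
        # genes present in both, but with more copies in `other`
        "duplicated": sorted(g for g in oth if g in ref and oth[g] > ref[g]),
    }


if __name__ == "__main__":
    ref_genes = ["geneA", "geneB", "geneC"]
    other_genes = ["geneA", "geneA", "geneB", "geneD"]
    print(compare_gene_content(ref_genes, other_genes))
    # {'gained': ['geneD'], 'lost': ['geneC'], 'duplicated': ['geneA']}
```

Using `Counter` keeps copy numbers, so duplication is detected from the same pass that detects presence and absence; order-based events such as inversions would need the gene coordinates as well.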

A Genomic Survey of Signalling in the Myxococcaceae

Microorganisms ◽

10.3390/microorganisms8111739 ◽

2020 ◽

Vol 8 (11) ◽

pp. 1739

Author(s):

David E. Whitworth ◽

Allison Zwarycz

Keyword(s):

Core Genome ◽

Gene Gain ◽

Fruiting Body Formation ◽

Two Component System ◽

Genome Sequences ◽

The Core ◽

Accessory Genes ◽

Two Component ◽

Gain Loss ◽

Core Genes

As prokaryotes diverge by evolution, essential ‘core’ genes required for conserved phenotypes are preferentially retained, while inessential ‘accessory’ genes are lost or diversify. We used the recently expanded set of myxobacterial genome sequences to investigate the conservation of their signalling proteins, focusing on two sister genera (Myxococcus and Corallococcus) and on one species within each genus (Myxococcus xanthus and Corallococcus exiguus). Four new C. exiguus genome sequences are also described here. Although accessory genes account for substantial proportions of each myxobacterial genome, signalling proteins were found to be enriched in the core genome, and two-component system genes were almost exclusively core. We also investigated the conservation of signalling proteins in three myxobacterial behaviours. The linear carotenogenesis pathway was entirely conserved, with no gene gain/loss observed. However, the modular fruiting body formation network was found to be evolutionarily plastic, with dispensable components in all modules (including components required for fruiting in the model myxobacterium M. xanthus DK1622). Quorum signalling (QS) is thought to be absent from most myxobacteria; however, they generally appear able to produce CAI-1 (cholerae autoinducer-1), to sense other QS molecules, and to disrupt the QS of other organisms, abilities that are potentially important during predation of other prokaryotes.
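The core/accessory split and the "enrichment in the core genome" can be expressed with plain set operations. The Python sketch below assumes each genome has already been reduced to a set of gene-family identifiers and uses the strict definition (core means present in every genome); real pangenome studies often relax this with a prevalence threshold. Genome names, family names and function names are hypothetical.

```python
from typing import Dict, Set, Tuple


def core_and_accessory(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split gene families into core (in every genome) and accessory (in some)."""
    if not genomes:
        return set(), set()
    families = list(genomes.values())
    pangenome = set().union(*families)
    core = set.intersection(*families)
    return core, pangenome - core


def core_fraction(genes: Set[str], core: Set[str]) -> float:
    """Share of a gene category (e.g. signalling genes) that falls in the core."""
    return len(genes & core) / len(genes)


if __name__ == "__main__":
    genomes = {
        "strain1": {"famA", "famB", "famC"},
        "strain2": {"famA", "famB"},
        "strain3": {"famA", "famB", "famD"},
    }
    core, accessory = core_and_accessory(genomes)
    print(sorted(core), sorted(accessory))        # ['famA', 'famB'] ['famC', 'famD']
    print(core_fraction({"famA", "famC"}, core))  # 0.5
```

Comparing `core_fraction` for a category such as signalling genes against the same fraction for all gene families is one simple way to quantify whether that category is enriched in the core.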

progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement

PLoS ONE ◽

10.1371/journal.pone.0011147 ◽

2010 ◽

Vol 5 (6) ◽

pp. e11147 ◽

Cited By ~ 2179

Author(s):

Aaron E. Darling ◽

Bob Mau ◽

Nicole T. Perna

Keyword(s):

Gene Gain ◽

Genome Alignment ◽

Multiple Genome ◽

Gain Loss ◽

Multiple Genome Alignment

Wilkinson Support Calculated with Exact Probabilities: An Example Using Floricaula/LEAFY Amino Acid Sequences that Compares Three Hypotheses Involving Gene Gain/Loss in Seed Plants

Molecular Biology and Evolution ◽

10.1093/oxfordjournals.molbev.a026293 ◽

2000 ◽

Vol 17 (12) ◽

pp. 1914-1925 ◽

Cited By ~ 4

Author(s):

Michael W. Frohlich ◽

George F. Estabrook

Keyword(s):

Amino Acid ◽

Amino Acid Sequences ◽

Seed Plants ◽

Gene Gain ◽

Gain Loss

Sensitivity Analysis for Reversal Distance and Breakpoint Reuse in Genome Rearrangements

Biocomputing 2008 ◽

10.1142/9789812776136_0005 ◽

2007 ◽

Cited By ~ 1

Author(s):

Amit U. Sinha ◽

Jaroslaw Meller

Keyword(s):

Sensitivity Analysis ◽

Genome Rearrangements ◽

Breakpoint Reuse