A spectrum of verticality across genes

PLoS Genetics ◽  
2020 ◽  
Vol 16 (11) ◽  
pp. e1009200 ◽  
Author(s):  
Falk S. P. Nagies ◽  
Julia Brueckner ◽  
Fernando D. K. Tria ◽  
William F. Martin

Lateral gene transfer (LGT) has impacted prokaryotic genome evolution, yet the extent to which LGT compromises vertical evolution across individual genes and individual phyla is unknown, as are the factors that govern LGT frequency across genes. Estimating LGT frequency from tree comparisons is problematic when thousands of genomes are compared, because LGT becomes difficult to distinguish from phylogenetic artefacts. Here we report quantitative estimates for verticality across all genes and genomes, leveraging a well-known property of phylogenetic inference: phylogeny works best at the tips of trees. From terminal (tip) phylum-level relationships, we calculate the verticality for 19,050,992 genes from 101,422 clusters in 5,655 prokaryotic genomes and rank them by their verticality. Among functional classes, translation, followed by nucleotide and cofactor biosynthesis, and DNA replication and repair are the most vertical. The most vertically evolving lineages are those rich in ecological specialists such as Acidithiobacilli, Chlamydiae, Chlorobi and Methanococcales. Lineages most affected by LGT are the α-, β-, γ-, and δ- classes of Proteobacteria and the Firmicutes. The 2,587 eukaryotic clusters in our sample having prokaryotic homologues fail to reject eukaryotic monophyly using the likelihood ratio test. The low verticality of α-proteobacterial and cyanobacterial genomes requires only three partners (an archaeal host, a mitochondrial symbiont, and a plastid ancestor), each with mosaic chromosomes, to directly account for the prokaryotic origin of eukaryotic genes. In terms of phylogeny, the 100 most vertically evolving prokaryotic genes are neither representative nor predictive for the remaining 97% of an average genome.
In search of factors that govern LGT frequency, we find a simple but natural principle: Verticality correlates strongly with gene distribution density, LGT being least likely for intruding genes that must replace a preexisting homologue in recipient chromosomes. LGT is most likely for novel genetic material, intruding genes that encounter no competing copy.

2016 ◽  
Vol 113 (41) ◽  
pp. 11399-11407 ◽  
Author(s):  
Itamar Sela ◽  
Yuri I. Wolf ◽  
Eugene V. Koonin

Bacteria and archaea typically possess small genomes that are tightly packed with protein-coding genes. The compactness of prokaryotic genomes is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. Here, by fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. These results suggest that the number of genes in prokaryotic genomes reflects the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias (i.e., the rate of deletion of genetic material being slightly greater than the rate of acquisition). Thus, new genes acquired by microbial genomes, on average, appear to be adaptive. The tight spacing of protein-coding genes likely results from a combination of the deletion bias and purifying selection that efficiently eliminates nonfunctional, noncoding sequences.


Author(s):  
Yan-Ting Jin ◽  
Cong Ma ◽  
Xin Wang ◽  
Shu-Xuan Wang ◽  
Kai-Yue Zhang ◽  
...  

In 2002, our research group observed a gene clustering pattern based on the base frequency of A versus T at the second codon position in the genome of Vibrio cholerae and found that the functional category distribution of genes in the two clusters was different. With the availability of a large number of sequenced genomes, we performed a systematic investigation of the A2–T2 distribution and found that 2694 out of 2764 prokaryotic genomes have an optimal clustering number of two, indicating a consistent pattern. Analysis of the functional categories of the coding genes in each cluster in 1483 prokaryotic genomes indicated that 99.33% of the genomes exhibited a significant difference (p < 0.01) in function distribution between the two clusters. Specifically, functional category P (inorganic ion transport and metabolism) was overrepresented in the small cluster in 98.65% of genomes, whereas categories J (translation), K (transcription), and L (replication, recombination and repair) were overrepresented in the larger cluster in over 98.52% of genomes. Lineage analysis showed that these preferences are consistent across all phyla. Overall, our work revealed an almost universal clustering pattern based on the relative frequency of A2 versus T2 and its role in functional category preference. These findings will advance understanding of the rationale for predicting the functional classes of genes from their nucleotide sequences, and of how protein function is determined by DNA sequence.
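The A2–T2 split can be made concrete with a short sketch. It is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes in-frame coding sequences, scores each gene by its relative A2 share, A2 / (A2 + T2), and fixes the number of clusters at two using a hypothetical one-dimensional two-means helper (`two_means`), whereas the study estimated the optimal cluster number for each genome.

```python
"""Sketch: cluster genes by the relative frequency of A versus T at codon position 2."""


def a2_t2_counts(cds: str) -> tuple[int, int]:
    """Count A and T at the second position of each complete codon of an in-frame CDS."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    second_positions = cds[1::3][:n_codons]  # drop a trailing partial codon
    return second_positions.count("A"), second_positions.count("T")


def relative_a2(cds: str) -> float:
    """A2 / (A2 + T2); 0.5 when the gene has neither A nor T at position 2."""
    a2, t2 = a2_t2_counts(cds)
    return a2 / (a2 + t2) if a2 + t2 else 0.5


def two_means(values: list[float], max_iter: int = 100) -> list[int]:
    """Hypothetical helper: 1-D k-means with k = 2, seeded at the min and max values."""
    low, high = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(max_iter):
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_low = sum(group0) / len(group0) if group0 else low
        new_high = sum(group1) / len(group1) if group1 else high
        if (new_low, new_high) == (low, high):
            break  # centroids stopped moving
        low, high = new_low, new_high
    return labels


def cluster_by_a2_t2(cdss: list[str]) -> list[int]:
    """Assign each CDS to cluster 0 (T2-leaning) or cluster 1 (A2-leaning)."""
    return two_means([relative_a2(s) for s in cdss])


if __name__ == "__main__":
    genes = ["ATGAAAGAC", "ATGGACAAA", "ATGTTTCTG", "ATGCTGTTT"]
    print(cluster_by_a2_t2(genes))  # prints [1, 1, 0, 0]
```

Fixing k at two keeps the example short; a faithful reproduction would compare several cluster numbers per genome before settling on one.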


2009 ◽  
Vol 07 (01) ◽  
pp. 19-38 ◽  
Author(s):  
Guojun Li ◽
Dongsheng Che ◽
Ying Xu

Identification of operons at the genome scale of prokaryotic organisms represents a key step in deciphering their transcriptional regulation machinery, biological pathways, and networks. While numerous computational methods have been shown to be effective in predicting operons for well-studied organisms such as Escherichia coli K12 and Bacillus subtilis 168, these methods generally do not generalize well beyond the genomes used to train them and their close relatives, because they rely on organism-specific information. Several methods have tried to address this problem by using only genomic structural information conserved across multiple organisms, but they all suffer from low prediction sensitivity. In this paper, we report a novel operon prediction method that is applicable to any prokaryotic genome with high prediction accuracy. The key idea of the method is to predict operons by identifying gene clusters conserved across multiple genomes and by deriving a key parameter relevant to the distribution of intergenic distances in genomes. We have implemented this method using a graph-theoretic approach to calculate a set of maximum gene clusters in the target genome that are conserved across multiple reference genomes. Our computational results have shown that this method has higher prediction sensitivity as well as specificity than most of the published methods. We have carried out a preliminary study on operons unique to archaea and bacteria, respectively, and derived a number of new insights into how operons differ between these two domains. The software and predicted operons of 365 prokaryotic genomes are available at .
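As a rough illustration of the two ingredients named above, conserved gene neighbourhoods and an intergenic-distance cutoff, the sketch below chains adjacent same-strand genes into putative operons. It is not the authors' graph-theoretic algorithm: the `Gene` record, the `conserved_pairs` set (standing in for adjacencies observed in reference genomes, keyed by ortholog-group labels) and the 50 bp `max_gap` default are all hypothetical, whereas the paper derives its distance parameter from the intergenic-distance distribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str    # hypothetical: an ortholog-group label shared across genomes
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def predict_operons(genes, conserved_pairs, max_gap=50):
    """Group adjacent genes into putative operons.

    Two neighbours are linked only if they lie on the same strand, are separated
    by at most `max_gap` bp, and their adjacency is conserved in reference genomes.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    if not ordered:
        return []
    operons, current = [], [ordered[0]]
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.start - prev.end - 1  # intergenic distance (negative if genes overlap)
        linked = (
            prev.strand == nxt.strand
            and gap <= max_gap
            and frozenset((prev.name, nxt.name)) in conserved_pairs
        )
        if linked:
            current.append(nxt)
        else:
            operons.append([g.name for g in current])
            current = [nxt]
    operons.append([g.name for g in current])
    return operons


if __name__ == "__main__":
    genome = [
        Gene("a", 1, 100, "+"), Gene("b", 120, 300, "+"), Gene("c", 320, 500, "+"),
        Gene("d", 900, 1000, "+"), Gene("e", 1010, 1100, "-"),
    ]
    conserved = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("d", "e")]}
    print(predict_operons(genome, conserved))  # [['a', 'b', 'c'], ['d'], ['e']]
```

Requiring all three conditions at once trades sensitivity for specificity; relaxing the conservation test would recover more operons at the cost of more false joins.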


2018 ◽  
Author(s):  
Itamar Sela ◽  
Yuri I. Wolf ◽  
Eugene V. Koonin

In prokaryotic genomes, the number of genes that belong to distinct functional classes shows apparently universal scaling with the total number of genes [1–5] (Fig. 1). This scaling can be approximated with a power law, where the scaling exponent can be sublinear, near-linear or superlinear. Scaling laws are robust under various statistical tests [4], across different databases and for different gene classifications [1–5]. Several models aimed at explaining the observed scaling laws have been proposed, primarily based on the specifics of the respective biological functions [1, 5–8]. However, a coherent theory to explain the emergence of scaling within the framework of population genetics is lacking. We employ a simple mathematical model for prokaryotic genome evolution [9] which, together with the analysis of 34 clusters of closely related microbial genomes [10], allows us to identify the underlying forces that dictate genome content evolution. In addition to the scaling of the number of genes in different functional classes, we explore gene content divergence to characterize the evolutionary processes acting upon genomes [11]. We find that evolution of the gene content is dominated by two factors that are specific to a functional class, namely, the selection landscape and genome plasticity. The selection landscape quantifies the fitness cost associated with deletion of a gene in a given functional class or the advantage of successful incorporation of an additional gene. Genome plasticity, which can be considered a measure of evolvability, reflects both the availability of genes of a given functional class in the external gene pool accessible to the evolving microbial population and the ability of microbial genomes to accommodate these genes. The selection landscape determines the gene loss rate, and genome plasticity is the principal determinant of the gene gain rate.
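The power law can be written as N_c ≈ a·N^γ, where N is the total gene count, N_c the number of genes in one functional class and γ the scaling exponent (γ < 1 sublinear, γ ≈ 1 near-linear, γ > 1 superlinear). The sketch below estimates γ by ordinary least squares on log-transformed counts; that estimator is a common choice assumed here, not necessarily the one used in the cited studies, and the ±0.1 tolerance for "near-linear" is a hypothetical cutoff.

```python
import math


def fit_power_law(total_genes, class_genes):
    """Fit N_c = a * N**gamma by least squares on log N_c = log a + gamma * log N."""
    xs = [math.log(n) for n in total_genes]
    ys = [math.log(k) for k in class_genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)  # needs at least two distinct genome sizes
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    a = math.exp(mean_y - gamma * mean_x)
    return a, gamma


def scaling_regime(gamma, tolerance=0.1):
    """Label an exponent; the tolerance for 'near-linear' is a hypothetical choice."""
    if gamma < 1 - tolerance:
        return "sublinear"
    if gamma > 1 + tolerance:
        return "superlinear"
    return "near-linear"


if __name__ == "__main__":
    totals = [1000, 2000, 4000, 8000]
    class_counts = [0.001 * n ** 1.5 for n in totals]  # synthetic, exactly superlinear
    a, gamma = fit_power_law(totals, class_counts)
    print(round(a, 6), round(gamma, 6), scaling_regime(gamma))
```

On real genome collections the scatter around the fitted line matters as much as γ itself; a fit on noiseless synthetic counts only checks that the estimator recovers a known exponent.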


2015 ◽  
Vol 83 (4) ◽  
pp. 1305-1317 ◽  
Author(s):  
Nicolas Dreux ◽  
Maria del Mar Cendra ◽  
Sébastien Massier ◽  
Arlette Darfeuille-Michaud ◽  
Nicolas Barnich ◽  
...  

A critical step in the life cycle of all organisms is the duplication of the genetic material during cell division. Ribonucleotide reductases (RNRs) are essential enzymes for this step because they control the de novo production of the deoxyribonucleotides required for DNA synthesis and repair. Enterobacteriaceae have three functional classes of RNRs (Ia, Ib, and III), which are transcribed from separate operons and encoded by the genes nrdAB, nrdHIEF, and nrdDG, respectively. Here, we investigated the role of RNRs in the virulence of adherent-invasive Escherichia coli (AIEC) isolated from Crohn's disease (CD) patients. Interestingly, the LF82 strain of AIEC harbors four different RNRs (two class Ia, one class Ib, and one class III). Although the E. coli RNR enzymes have been extensively characterized both biochemically and enzymatically, little is known about their roles during bacterial infection. We found that RNR expression was modified in AIEC LF82 bacteria during cell infection, suggesting that RNRs play an important role in AIEC virulence. Knockout of the nrdR and nrdD genes, which encode a transcriptional regulator of RNRs and class III anaerobic RNR, respectively, decreased AIEC LF82's ability to colonize the gut mucosa of transgenic mice that express human CEACAM6 (carcinoembryonic antigen-related cell adhesion molecule 6). Microarray experiments demonstrated that NrdR plays an indirect role in AIEC virulence by interfering with bacterial motility and chemotaxis. Thus, the development of drugs targeting RNR classes, in particular NrdR and NrdD, could be a promising new strategy to control gut colonization by AIEC bacteria in CD patients.


Genes ◽  
2018 ◽  
Vol 9 (12) ◽  
pp. 623 ◽  
Author(s):  
María Fernández-Casañas ◽  
Kok-Lung Chan

Accurate duplication and transmission of identical genetic information into offspring cells lies at the heart of the cell division cycle. During the last stage of cellular division, namely mitosis, the fully replicated DNA molecules are condensed into X-shaped chromosomes, followed by a chromosome separation process called sister chromatid disjunction. This process allows for the equal partition of genetic material into two newly born daughter cells. However, emerging evidence has shown that faithful chromosome segregation is challenged by persistent DNA intertwining structures generated during DNA replication and repair, which manifest as so-called ultra-fine DNA bridges (UFBs) during anaphase. Failure to disentangle these DNA linkages poses a severe threat to mitosis and genome integrity. This review summarizes the possible causes of DNA bridges, particularly sister DNA inter-linkage structures, and discusses how they may be processed and how they influence faithful chromosome segregation and the maintenance of genome stability.


2018 ◽  
Author(s):  
Na L. Gao ◽  
Jingchao Chen ◽  
Martin J Lercher ◽  
Wei-Hua Chen

Bacteriophages and plasmids can introduce novel DNA into bacterial cells, thereby creating an opportunity for genome expansion; conversely, CRISPR, the prokaryotic adaptive immune system, which targets and eliminates foreign DNA, may impair genome expansion. Recent studies have presented conflicting results on the impact of CRISPR on genome expansion. In this study, we assembled a comprehensive dataset of prokaryotic genomes and identified their associations with phages and plasmids. We found that genomes associated with phages and/or plasmids were significantly larger than those without, indicating that both phages and plasmids contribute to genome expansion. Genome size increased with the number of associated phages or plasmids. Conversely, genomes with CRISPR systems were significantly smaller than those without, indicating that CRISPR has a negative impact on genome size. These results confirmed that, on evolutionary timescales, bacteriophages and plasmids facilitate genome expansion, while CRISPR impairs this process in prokaryotes. Our results also revealed that CRISPR systems show a strong preference for targeting phages over plasmids.


2017 ◽  
Author(s):  
Ulrich Omasits ◽  
Adithi R. Varadarajan ◽  
Michael Schmid ◽  
Sandra Goetze ◽  
Damianos Melidis ◽  
...  

Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies in the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy towards accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. With the PeptideClassifier concept of unambiguous peptides extended to prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, which greatly simplifies downstream analysis. Searching a comprehensive Bartonella henselae proteomics dataset against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted parallel reaction monitoring mass spectrometry, including unique ORFs and variants identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin, and release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli as well as the software to generate such proteogenomics search databases for any prokaryote.
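The notion that an identifiable peptide should "imply one distinct protein" can be illustrated with a much-simplified check. This sketch is not PeptideClassifier itself, which distinguishes more peptide classes: it only asks whether a peptide occurs as an exact substring of one or of several entries in a hypothetical protein database, and it ignores isoleucine/leucine indistinguishability, digestion rules and modifications.

```python
def map_peptides(peptides, proteins):
    """Map each peptide to the set of protein IDs whose sequence contains it exactly."""
    return {pep: {pid for pid, seq in proteins.items() if pep in seq} for pep in peptides}


def unambiguous_fraction(mapping):
    """Share of identifiable peptides (at least one hit) that map to exactly one protein."""
    identifiable = [hits for hits in mapping.values() if hits]
    if not identifiable:
        return 0.0
    return sum(len(hits) == 1 for hits in identifiable) / len(identifiable)


if __name__ == "__main__":
    # Hypothetical toy database: P1 and P2 share an N-terminal stretch.
    db = {"P1": "MKTAYIAKQR", "P2": "MKTAYLLGGR", "P3": "GGSEQVDLK"}
    mapping = map_peptides(["MKTAY", "IAKQR", "EQVDLK", "WWWWW"], db)
    print(mapping["MKTAY"] == {"P1", "P2"}, unambiguous_fraction(mapping))
```

In this toy case the shared N-terminal peptide is ambiguous, which is exactly the situation an integrated database of overlapping ORF predictions (alternative start sites, for instance) tends to create.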


2017 ◽  
Vol 114 (28) ◽  
pp. E5616-E5624 ◽  
Author(s):  
Jaime Iranzo ◽  
José A. Cuesta ◽  
Susanna Manrubia ◽  
Mikhail I. Katsnelson ◽  
Eugene V. Koonin

We combine mathematical modeling of genome evolution with comparative analysis of prokaryotic genomes to estimate the relative contributions of selection and intrinsic loss bias to the evolution of different functional classes of genes and mobile genetic elements (MGE). An exact solution for the dynamics of gene family size was obtained under a linear duplication–transfer–loss model with selection. With the exception of genes involved in information processing, particularly translation, which are maintained by strong selection, the average selection coefficient for most nonparasitic genes is low albeit positive, compatible with the observed positive correlation between genome size and effective population size. Free-living microbes evolve under stronger selection for gene retention than parasites. Different classes of MGE show a broad range of fitness effects, from nearly neutral transposons to prophages, which are actively eliminated by selection. Genes involved in antiparasite defense, on average, incur a fitness cost to the host that is at least as high as the cost of plasmids. This cost is probably due to the adverse effects of autoimmunity and curtailment of horizontal gene transfer caused by the defense systems, and to the selfish behavior of some of these systems, such as toxin–antitoxin and restriction–modification modules. Transposons follow biphasic dynamics, with bursts of gene proliferation followed by a decay in copy number that is quantitatively captured by the model. The ratio of horizontal gene transfer to loss, but not the ratio of duplication to loss, correlates with genome size, potentially explaining the increased abundance of neutral and costly elements in larger genomes.
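To see what a linear duplication–transfer–loss model implies for family size, consider a simplified version without selection: a family of n copies gains a copy by horizontal transfer at a constant rate κ (independent of n), by duplication at rate δn, and loses a copy at rate λn. Because this is a one-dimensional birth–death chain, its stationary distribution follows from detailed balance, p(n+1)·λ(n+1) = p(n)·(κ + δn), and for δ < λ its mean is κ/(λ − δ). The sketch below assumes exactly these rates (the parameter names `gain`, `dup` and `loss` are illustrative) and omits the selection term of the published model.

```python
def stationary_family_size(gain, dup, loss, n_max=2000):
    """Stationary P(n) for a birth-death chain with birth rate gain + dup*n and death rate loss*n.

    Selection is deliberately omitted; the chain is truncated at n_max copies.
    """
    if gain < 0 or dup < 0 or dup >= loss:
        raise ValueError("need gain >= 0 and 0 <= dup < loss")
    weights = [1.0]
    for n in range(n_max):
        # detailed balance: p(n+1) * loss * (n+1) = p(n) * (gain + dup * n)
        weights.append(weights[-1] * (gain + dup * n) / (loss * (n + 1)))
    total = sum(weights)
    return [w / total for w in weights]


def mean_family_size(dist):
    """Expected number of copies under a distribution indexed by copy number."""
    return sum(n * p for n, p in enumerate(dist))


if __name__ == "__main__":
    dist = stationary_family_size(gain=2.0, dup=0.5, loss=1.0)
    print(round(mean_family_size(dist), 6))  # analytic mean: 2.0 / (1.0 - 0.5) = 4.0
```

With `dup=0` the chain reduces to a Poisson distribution with mean `gain / loss`; a positive duplication rate fattens the tail into a negative binomial, which is one way a small number of families can become large without any selective advantage.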


2017 ◽  
Author(s):  
Alexandre Lomsadze ◽  
Karl Gemayel ◽  
Shiyuyun Tang ◽  
Mark Borodovsky

In the conventional view of prokaryotic genome organization, promoters precede operons, and ribosome binding sites (RBSs) with the Shine-Dalgarno consensus precede genes. However, recent experimental research suggesting a more diverse picture motivated us to develop an algorithm with improved gene-finding accuracy. We describe GeneMarkS-2, an ab initio algorithm that uses a model derived by self-training for finding species-specific (native) genes, along with an array of pre-computed heuristic models designed to identify harder-to-detect genes (likely horizontally transferred). Importantly, we designed GeneMarkS-2 to identify several types of distinct sequence patterns (signals) involved in gene expression control, among them the patterns characteristic of leaderless transcription as well as non-canonical RBS patterns. To assess the accuracy of GeneMarkS-2, we used genes validated by COG annotation, proteomics experiments, and N-terminal protein sequencing. We observed that GeneMarkS-2 performed better on average in all accuracy measures when compared with the current state-of-the-art gene prediction tools. Furthermore, screening of ∼5,000 representative prokaryotic genomes with GeneMarkS-2 predicted frequent leaderless transcription in both archaea and bacteria. We also observed that the RBSs in some species with leadered transcription did not necessarily exhibit the Shine-Dalgarno consensus. Modeling the different types of sequence motifs regulating gene expression prompted a division of prokaryotic genomes into five categories with distinct sequence patterns around the gene starts. [Supplemental material is available for this article.]
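As a toy contrast to the conventional picture described above, the sketch below flags whether the region just upstream of a start codon contains a Shine-Dalgarno-like motif. It is far simpler than GeneMarkS-2, which derives its models by self-training: here the core consensus AGGAGG, the 20-nt window and the one-mismatch threshold are hypothetical fixed choices, and a gene without a match is merely a candidate for leaderless transcription or a non-canonical RBS, not a prediction of either.

```python
SD_CORE = "AGGAGG"  # commonly cited Shine-Dalgarno core; used here as an assumption


def best_sd_match(upstream, motif=SD_CORE):
    """Return (mismatches, offset) of the closest ungapped match to `motif` in `upstream`."""
    upstream = upstream.upper()
    best = (len(motif) + 1, -1)  # sentinel: worse than any real match
    for i in range(len(upstream) - len(motif) + 1):
        window = upstream[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches < best[0]:
            best = (mismatches, i)
    return best


def has_sd_like_site(genome, start, window=20, max_mismatches=1):
    """True if the `window` nt before the 0-based `start` position hold an SD-like motif."""
    upstream = genome[max(0, start - window):start]
    mismatches, _ = best_sd_match(upstream)
    return mismatches <= max_mismatches


if __name__ == "__main__":
    leadered = "TTTTTAGGAGGTTTTTTATGAAA"
    no_sd = "CCCCCCCCCCCCCCCCATGAAA"
    print(has_sd_like_site(leadered, leadered.index("ATG")),
          has_sd_like_site(no_sd, no_sd.index("ATG")))  # True False
```

A fixed consensus with a mismatch budget is the simplest possible motif model; the observation above that some leadered species lack the Shine-Dalgarno consensus is precisely the case such a fixed rule would misclassify.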

