Computational design of genes encoding completely overlapping protein domains: Influence of genetic code and taxonomic rank

2020 ◽  
Author(s):  
Stefan Wichmann ◽  
Siegfried Scherer ◽  
Zachary Ardern

Overlapping genes (OLGs) with long protein-coding overlapping sequences are often excluded by genome annotation programs, with the exception of virus genomes. A recent study used a novel algorithm to construct OLGs from arbitrary protein domain pairs and concluded that virus genes are best suited for creating OLGs, a result that fit common assumptions. However, improving sequence evaluation using Hidden Markov Models shows that the previous result is an artifact originating from dataset-database biases. When parameters for OLG design and evaluation are optimized, we find that 94.5% of the constructed OLG pairs score at least as highly as naturally occurring sequences, while 9.6% of the artificial OLGs cannot be distinguished from typical sequences in their protein family. Constructed OLG sequences are also indistinguishable from natural sequences in terms of amino acid identity and secondary structure, while the minimum nucleotide change required for overprinting an overlapping sequence can be as low as 1.8% of the sequence. Separate analysis of datasets containing only sequences from either archaea, bacteria, eukaryotes or viruses showed that, surprisingly, virus genes are much less suitable for designing OLGs than bacterial or eukaryotic genes. An important factor influencing OLG design is the structure of the standard genetic code. Success rates in different reading frames strongly correlate with their respective code-determined amino acid constraints. There is a tendency indicating that the structure of the standard genetic code could be optimized in its ability to create OLGs while conserving mutational robustness. The findings reported here add to the growing evidence that OLGs should no longer be excluded in prokaryotic genome annotations. Determining the factors facilitating the computational design of artificial overlapping genes may improve our understanding of the origin of these remarkable genetic constructs and may also open up exciting possibilities for synthetic biology.
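To make the idea of overlapping reading frames concrete, the following minimal Python sketch translates one nucleotide sequence in several frames using the standard genetic code. It is only an illustration of how a single stretch of DNA can encode different peptides, not the OLG design algorithm of the study; the names `translate` and `reverse_complement` and the example sequence are assumptions introduced here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, offset=0):
    """Translate seq codon by codon starting at offset; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3)
    )


def reverse_complement(seq):
    """Return the reverse complement, used for the antisense reading frames."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical example sequence, chosen only for illustration.
seq = "ATGGCCAAGTGA"
print(translate(seq, 0))                   # reference frame
print(translate(seq, 1))                   # sense frame shifted by one nucleotide
print(translate(seq, 2))                   # sense frame shifted by two nucleotides
print(translate(reverse_complement(seq)))  # one antisense frame
```

Each frame reads the same nucleotides in different triplets, so a change that is silent in one frame may alter the amino acid in another. This is the code-determined constraint that the abstract relates to success rates in the different frames.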

Life ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 773
Author(s):  
Ádám Radványi ◽  
Ádám Kun

The genetic code evolved, to some extent, to minimize the effects of mutations. The effects of mutations depend on the amino acid repertoire, the structure of the genetic code and the frequencies of amino acids in proteomes. The amino acid compositions of proteins and the corresponding codon usages are still under selection, which allows us to ask what kind of environment the standard genetic code is adapted to. Using simple computational models and comprehensive datasets comprising genomic and environmental data from all three domains of Life, we estimate the expected severity of non-synonymous genomic mutations in proteins, measured by the change in amino acid physicochemical properties. We show that the fidelity in these physicochemical properties is expected to deteriorate with extremophilic codon usages, especially in thermophiles. These findings suggest that the genetic code performs better under non-extremophilic conditions. This not only explains the low substitution rates encountered in halophiles and thermophiles; the revealed relationship between the genetic code and habitat also allows us to reflect on earlier phases in the history of Life.


2016 ◽  
Vol 14 (3) ◽  
pp. 275-298 ◽  
Author(s):  
Natasa Misic

This paper presents preliminary results and conclusions on one of the fundamental questions of the genetic code, concerning the underlying selective mechanisms involved in its origin and evolution, in particular their hypothetically different nature, originally considered in [1,2,3]. A novel approach is introduced, based on known arithmetic regularities inside the genetic code, determined by the nucleon balances of amino acids and their divisibility by the decimal number 37 [4]. An aggregate nucleon number of each amino acid and its cognate codon is introduced as a parameter for systematizing the genetic code, while the divisibility test is carried out not only by the number 37, but also by 13.7, the selfsimilarity constant of decimal scaling [5]. Relevant nucleon sums were obtained for the most prominent divisions of the standard genetic code (SGC) according to the p-adic model of the vertebrate mitochondrial code (VMC) in [6]. The nucleon number divisibility pattern of 37 and 13.7 for the RNA and DNA codon space, as well as for the amino acid space, is also analyzed. The obtained results are, in particular, a generally higher divisibility of the nucleon sums by the numbers 37 and 13.7 in SGC than in VMC, and a correspondence between the code degeneracy pattern and the nucleon number divisibility pattern of both the RNA codon space and the amino acid space of SGC, separately as well as jointly. These results suggest several conclusions: they support the hypothesis [1,2,3,7] that the selective driving forces acting during the emergence (an ancient phase) and the evolution (a modern phase) of the genetic code are different, imply the existence of an environment-dependent stereochemical mechanism throughout the entire period of the genetic code's emergence, and support a mineral-mediated origin of the genetic code [7,8].


Genes ◽  
2021 ◽  
Vol 12 (3) ◽  
pp. 409
Author(s):  
Tamara L. Hendrickson ◽  
Whitney N. Wood ◽  
Udumbara M. Rathnayake

The twenty amino acids in the standard genetic code were fixed prior to the last universal common ancestor (LUCA). Factors that guided this selection included establishment of pathways for their metabolic synthesis and the concomitant fixation of substrate specificities in the emerging aminoacyl-tRNA synthetases (aaRSs). In this conceptual paper, we propose that the chemical reactivity of some amino acid side chains (e.g., lysine, cysteine, homocysteine, ornithine, homoserine, and selenocysteine) delayed or prohibited the emergence of the corresponding aaRSs and helped define the amino acids in the standard genetic code. We also consider the possibility that amino acid chemistry delayed the emergence of the glutaminyl- and asparaginyl-tRNA synthetases, neither of which is ubiquitous in extant organisms. We argue that fundamental chemical principles played critical roles in fixation of some aspects of the genetic code pre- and post-LUCA.


2019 ◽  
Vol 464 ◽  
pp. 21-32 ◽  
Author(s):  
Paweł Błażej ◽  
Małgorzata Wnętrzak ◽  
Dorota Mackiewicz ◽  
Przemysław Gagat ◽  
Paweł Mackiewicz

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ádám Radványi ◽  
Ádám Kun

The mutational robustness of the genetic code is rarely discussed in the context of biological diversity, such as codon usage and related factors, and is often considered independent of the actual organism's proteome. Here we put living organisms back into the picture and use distortion as a metric of mutational robustness. Distortion estimates the expected severity of non-synonymous mutations, measured by amino acid physicochemical properties and weighted by codon usage. Using the biological variance of codon frequencies, we interpret the mutational robustness of the standard genetic code with regard to the corresponding environments and genomic compositions (GC-content). Employing phylogenetic analyses, we show that coding fidelity in physicochemical properties can deteriorate with codon usages adapted to extreme environments, and that these putative effects are not artefacts of phylogenetic bias. High-temperature environments select for codon usages with decreased mutational robustness of hydrophobic, volumetric, and isoelectric properties. Selection at high saline concentrations also leads to reduced fidelity in polar and isoelectric patterns. These results show that the genetic code performs best with mesophilic codon usages, strengthening the view that LUCA or its ancestors preferred lower temperature environments. Taxonomic implications, such as rooting the tree of life, are also discussed.
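The distortion idea can be sketched in a few lines of Python. This is a simplified stand-in, not the authors' exact definition, which the abstract does not give. It assumes that every single-nucleotide change is equally likely, that synonymous and nonsense changes are ignored, and that the property is the Kyte-Doolittle hydropathy scale. The function name `distortion` and the input format (a mapping from codon to relative frequency) are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy index, used here as one example property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def distortion(codon_usage, prop=HYDROPATHY):
    """Codon-usage-weighted mean |property change| over missense point mutations.

    Assumption: all nine single-nucleotide neighbours of a codon are equally
    likely; synonymous changes and changes to or from stop codons are skipped.
    """
    total = weight = 0.0
    for codon, freq in codon_usage.items():
        src = CODON_TABLE[codon]
        if src == "*":
            continue
        for pos, base in product(range(3), BASES):
            if base == codon[pos]:
                continue
            dst = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if dst == "*" or dst == src:
                continue
            total += freq * abs(prop[src] - prop[dst])
            weight += freq
    return total / weight if weight else 0.0


# Uniform usage over the 61 sense codons, as a neutral baseline.
uniform = {c: 1.0 for c, a in CODON_TABLE.items() if a != "*"}
print(round(distortion(uniform), 3))
```

Replacing `uniform` with the observed codon frequencies of a genome would give one value per organism, which is the kind of comparison across environments that the abstract describes.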


2021 ◽  
Author(s):  
Louis R Nemzer

A three-dimensional representation of the twenty proteinogenic amino acids in a physicochemical space is presented. Vectors corresponding to amino acid substitutions are classified based on whether they are accessible via a single-nucleotide mutation. It is shown that the standard genetic code establishes a "choice architecture" that permits nearly independent tuning of the properties related to size and those related to hydrophobicity. This work sheds light on the metarules of evolvability that may have shaped the standard genetic code to increase the probability that adaptive point mutations will be generated. The usefulness of visualizing amino acid substitutions in a 3D physicochemical space is illustrated using data collected from the SARS-CoV-2 receptor binding domain. The substitutions most responsible for antibody escape are almost always inaccessible via single-nucleotide mutation, and also change multiple properties concurrently. The results of this research can extend our understanding of certain hereditary disorders caused by point mutations, as well as guide rational protein and vaccine design.
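The classification of substitutions by single-nucleotide accessibility follows directly from the codon table. The Python sketch below enumerates every pair of amino acids connected by at least one point mutation between their codons. The helper names `neighbors` and `is_accessible` are assumptions introduced for illustration, and the sketch covers only the accessibility test, not the 3D physicochemical embedding.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def neighbors(codon):
    """Yield the nine codons that differ from codon at exactly one position."""
    for pos, base in product(range(3), BASES):
        if base != codon[pos]:
            yield codon[:pos] + base + codon[pos + 1:]


# Unordered amino acid pairs linked by at least one single-nucleotide change.
ACCESSIBLE = {
    frozenset((CODON_TABLE[c], CODON_TABLE[n]))
    for c in CODON_TABLE
    for n in neighbors(c)
    if CODON_TABLE[c] != CODON_TABLE[n]
    and "*" not in (CODON_TABLE[c], CODON_TABLE[n])
}


def is_accessible(a, b):
    """True if amino acid a can become b through one nucleotide substitution."""
    return frozenset((a, b)) in ACCESSIBLE


print(is_accessible("M", "I"))  # ATG -> ATA is one change
print(is_accessible("W", "D"))  # TGG differs from GAT and GAC at all three positions
```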


2018 ◽  
Vol 2 (4) ◽  
pp. 607-618 ◽  
Author(s):  
Jean-François Brugère ◽  
John F. Atkins ◽  
Paul W. O'Toole ◽  
Guillaume Borrel

The 22nd amino acid discovered to be directly encoded, pyrrolysine, is specified by UAG. Until recently, pyrrolysine was only known to be present in archaea from a methanogenic lineage (Methanosarcinales), where it is important in enzymes catalysing anoxic methylamine metabolism, and in a few anaerobic bacteria. Relatively new discoveries have revealed a wider presence in archaea, deepened functional understanding, shown remarkable carbon source-dependent expression of expanded decoding and extended exploitation of the pyrrolysine machinery for synthetic code expansion. At the same time, other studies have shown the presence of pyrrolysine-containing archaea in the human gut, and this has prompted health considerations. The article reviews our knowledge of this fascinating exception to the ‘standard’ genetic code.


Life ◽  
2021 ◽  
Vol 11 (9) ◽  
pp. 975
Author(s):  
Alexander Nesterov-Mueller ◽  
Roman Popov

The combinatorial fusion cascade was proposed as a transition stage between prebiotic chemistry and early forms of life. The combinatorial fusion cascade consists of three stages: eight initial complementary pairs of amino acids, four protocodes, and the standard genetic code. The initial complementary pairs and the protocodes are divided into dominant and recessive entities. The transitions between these stages obey the same combinatorial fusion rules for all amino acids. The combinatorial fusion cascade mathematically describes the codon assignments in the standard genetic code. It explains the availability of amino acids with even and odd numbers of codons, the appearance of stop codons, the inclusion of novel canonical amino acids, the exceptionally high numbers of codons for the amino acids arginine, leucine, and serine, and the temporal order of amino acid inclusion into the genetic code. The temporal order of amino acids within the cascade is congruent with the consensus temporal order previously derived from the similarities between the available hypotheses. Control over combinatorial fusion cascades would open the road to a novel technology for developing artificial microorganisms.


2021 ◽  
Author(s):  
Michael Yarus

Minimally-evolved codes are constructed with randomly chosen Standard Genetic Code (SGC) triplets, and completed with completely random triplet assignments. Such “genetic codes” have not evolved, but retain SGC qualities. Retained qualities are inescapable, part of the logic of code evolution. For example, sensitivity of coding to arbitrary assignments, which must be ≲ 10%, is intrinsic. Such sensitivity comes from elementary combinatorial properties of coding, and constrains any SGC evolution hypothesis. Similarly, evolution of last-evolved functions is difficult, due to late kinetic phenomena, likely common across codes. A census of minimally-evolved code assignments shows that the shape and size of wobble domains control packing into a coding table, shifting the accuracy of codon assignments. Access to the SGC therefore requires a plausible pathway to limited randomness, avoiding difficult completion while packing a highly ordered, degenerate code into a fixed three-dimensional space. Late Crick wobble in a 3-dimensional genetic code previously assembled by lateral transfer satisfies these varied, simultaneous requirements. By allowing parallel evolution of SGC domains, it can yield shortened evolution to SGC-level order, and allow the code to arise in smaller populations. It effectively yields full codes. Less obviously, it unifies well-studied sources for order in amino acid coding, including a minority of stereochemical triplet-amino acid associations. Finally, fusion of its intermediates into the definitive SGC is credible, mirroring broadly-accepted later events in cellular evolution.
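The construction in the first sentence can be illustrated with a toy Python sketch: keep the SGC meaning for a random subset of triplets and fill every remaining triplet uniformly at random. This is an assumed reading of the abstract, not the author's simulation. The function names, the uniform choice among 20 amino acids plus stop, the DNA alphabet, and the parameter `n_sgc` are all hypothetical.

```python
import random
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def minimally_evolved_code(n_sgc, rng):
    """Keep n_sgc randomly chosen SGC assignments; randomize all other triplets.

    Assumption: random assignments are drawn uniformly from the 21 SGC
    meanings (20 amino acids plus stop).
    """
    codons = list(SGC)
    kept = set(rng.sample(codons, n_sgc))
    meanings = sorted(set(SGC.values()))
    return {c: SGC[c] if c in kept else rng.choice(meanings) for c in codons}


def sgc_agreement(code):
    """Fraction of the 64 triplets whose meaning matches the SGC."""
    return sum(code[c] == SGC[c] for c in SGC) / len(SGC)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
code = minimally_evolved_code(16, rng)
print(sgc_agreement(code))
```

Agreement is at least `n_sgc / 64` by construction and usually slightly higher, because some random assignments coincide with the SGC by chance.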

