The genetic code is very close to a global optimum in a model of its origin taking into account both the partition energy of amino acids and their biosynthetic relationships

2021 ◽  
Author(s):  
Massimo Di Giulio ◽  
Franco Caldararo

We used the Moran's I index of global spatial autocorrelation with the aim of studying the distribution of the physicochemical or biological properties of amino acids within the genetic code table. First, using this index we are able to identify the amino acid property - among the 530 analyzed - that best correlates with the organization of the genetic code in the set of amino acid permutation codes. Considering, then, a model suggested by the coevolution theory of the genetic code origin - which in addition to the biosynthetic relationships between amino acids took into account also their physicochemical properties - we investigated the level of optimization achieved by these properties either on the entire genetic code table, or only on its columns or only on its rows. Specifically, we estimated the optimization achieved in the restricted set of amino acid permutation codes subject to the constraints derived from the biosynthetic classes of amino acids, in which we identify the most optimized amino acid property among all those present in the database. Unlike what has been claimed in the literature, it would appear that it was not the polarity of amino acids that structured the genetic code, but that it could have been their partition energy instead. In actual fact, it would seem to reach an optimization level of about 96% on the whole table of the genetic code and 98% on its columns. Given that this result has been obtained for amino acid permutation codes subject to biosynthetic constraints, that is to say, for a model of the genetic code consistent with the coevolution theory, we should consider the following conclusions reasonable. (i) The coevolution theory might be corroborated by these observations because the model used referred to the biosynthetic relationships between amino acids, which are suggested by this theory as having been fundamental in structuring the genetic code. 
(ii) The very high optimization on the columns of the genetic code would not only be compatible but would further corroborate the coevolution theory because this suggests that, as the genetic code was structured along its rows by the biosynthetic relationships of amino acids, on its columns strong selective pressure might have been put in place to minimize, for example, the deleterious effects of translation errors. (iii) The finding that partition energy could be the most optimized property of amino acids in the genetic code would in turn be consistent with one of the main predictions of the coevolution theory. In other words, since the partition energy is reflective of the protein structure and therefore of the enzymatic catalysis, the latter might really have been the main selective pressure that would have promoted the origin of the genetic code. Indeed, we observe that the beta-strands show an optimization percentage of 94.45%, so it is possible to hypothesize that they might have become the object of selection during the origin of the genetic code, conditioning the choice of biosynthetic relationships between amino acids. (iv) The finding that the polarity of amino acids is less optimized than their partition energy in the genetic code table might be interpreted against the physicochemical theories of the origin of the genetic code because these would suggest, for example, that a very high optimization of the polarity of amino acids in the code could be an expression of interactions between amino acids and codons or anticodons, which would have promoted their origin. This might now become less sustainable, given the very high optimization that is instead observed in favor of partition energy but not polarity. Finally, (v) the very high optimization of the partition energy of amino acids would seem to make a neutral origin of the ability of the genetic code to buffer, for example, the deleterious effects of translation errors very unlikely. 
Indeed, it would seem that an optimization of about 100% could not have been achieved by a simple neutral process; this ability was probably generated instead by the intervention of natural selection. In fact, we show that the neutral hypothesis of the origin of error minimization is falsified for the model analyzed here. We therefore discuss our observations within the theories proposed to explain the origin of the organization of the genetic code, and conclude that the coevolution theory is the most strongly corroborated.
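As a rough illustration of the index named in this abstract, the following is a minimal sketch of Moran's I for values laid out on a rectangular table. The abstract does not say which neighbourhood or weights the authors used, so the rook (edge-sharing) adjacency with binary weights here is an assumption, and `morans_i` is a hypothetical helper name rather than the authors' code.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid of numbers.

    Assumption: cells are neighbours when they share an edge (rook
    adjacency) and every neighbour pair has weight 1.
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(v for row in grid for v in row) / n
    dev = [[v - mean for v in row] for row in grid]
    denom = sum(d * d for row in dev for d in row)
    if denom == 0:
        raise ValueError("all values are identical; Moran's I is undefined")

    num = 0.0    # sum of w_ij * dev_i * dev_j over ordered neighbour pairs
    w_sum = 0    # sum of all weights w_ij
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += dev[r][c] * dev[rr][cc]
                    w_sum += 1
    return (n / w_sum) * (num / denom)


if __name__ == "__main__":
    clustered = [[1, 1, 0, 0], [1, 1, 0, 0]]
    checkerboard = [[1, 0], [0, 1]]
    print(morans_i(clustered))      # positive: similar values sit together
    print(morans_i(checkerboard))   # negative: neighbours always differ
```

In the setting of the abstract, the grid would hold one amino acid property value per cell of the genetic code table, and a higher index would indicate that similar amino acids occupy neighbouring positions.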

1991 ◽  
Vol 46 (3-4) ◽  
pp. 305-312 ◽  
Author(s):  
Massimo Di Giulio

This paper analyzes the relationships between the genetic code coevolution hypothesis and the physicochemical hypothesis by means of a comparative study of the precursor-product amino acid pairs on which the former hypothesis is based. Although the coevolution between the biosynthetic relationships of amino acids and the organization of the genetic code is not questioned in this paper, the results and the arguments used lead us to believe that the selective pressures considered essential by the physicochemical postulates played a more active role than the precursor-product relationships in defining the allocation of these amino acids in the genetic code. It is furthermore pointed out that the two evolutionary hypotheses might be aspects of the same selective pressure, and thus difficult to differentiate.


2015 ◽  
Vol 9 (1) ◽  
pp. 216-220
Author(s):  
Zhang Dakun ◽  
Song Guozhi ◽  
Huang Cui

All proteins are made up of 20 different amino acids, which are encoded using 4 kinds of nucleotides. Three consecutive nucleotides on the gene, called a triplet codon, code for an amino acid, and 64 triplet codons make up the genetic code table. The central dogma (DNA-RNA-protein) is widely accepted, but the process and mechanism by which mRNA passes through the nuclear membrane still require further investigation. To address these two problems, this paper proposes a conjecture of a nucleotide free triplet and obtains 20 equivalence classes of mappings from the free-triplet vertex set to the nucleotide set using group theory. Whether the four numbers 3, 4, 20 and 64 are related is considered here. The numbers 3, 4, 20 and 64 are then connected, which is important for the analysis of the triplet code and protein composition.
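The count of 20 can be checked numerically under one reading of this abstract. The sketch below assumes that a "free triplet" is a triangle whose three vertices each carry one of the 4 nucleotides, and that two labelings are equivalent when one is obtained from the other by any permutation of the vertices. The abstract does not state the group used, so this interpretation and the function names are assumptions.

```python
from itertools import permutations, product

NUCLEOTIDES = "ACGU"


def count_classes_brute_force():
    """Count labelings of 3 vertices with 4 nucleotides, up to vertex permutation."""
    seen = set()
    classes = 0
    for triplet in product(NUCLEOTIDES, repeat=3):
        if triplet in seen:
            continue
        classes += 1
        # Mark every labeling reachable from this one as already counted.
        for perm in permutations(range(3)):
            seen.add(tuple(triplet[i] for i in perm))
    return classes


def count_classes_burnside(k=4):
    """Same count via Burnside's lemma for the 6 symmetries of a triangle.

    The identity fixes k^3 labelings, each of the 3 reflections fixes k^2,
    and each of the 2 rotations fixes k.
    """
    return (k**3 + 3 * k**2 + 2 * k) // 6


if __name__ == "__main__":
    print(count_classes_brute_force(), count_classes_burnside())
```

Under these assumptions both routes give (64 + 48 + 8) / 6 = 20 classes, which ties together the numbers 3 (vertices), 4 (nucleotides), 64 (ordered triplets) and 20 (classes) mentioned in the abstract.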


Amino Acids ◽  
2020 ◽  
Author(s):  
Thomas L. Williams ◽  
Debra J. Iskandar ◽  
Alexander R. Nödling ◽  
Yurong Tan ◽  
Louis Y. P. Luk ◽  
...  

Genetic code expansion is a powerful technique for site-specific incorporation of an unnatural amino acid into a protein of interest. This technique relies on an orthogonal aminoacyl-tRNA synthetase/tRNA pair and has enabled incorporation of over 100 different unnatural amino acids into ribosomally synthesized proteins in cells. Pyrrolysyl-tRNA synthetase (PylRS) and its cognate tRNA from Methanosarcina species are arguably the most widely used orthogonal pair. Here, we investigated whether the beneficial effect on unnatural amino acid incorporation caused by N-terminal mutations in the PylRS of one species is transferable to the PylRS of another species. It has been shown that conserved mutations in the N-terminal domain of MmPylRS improve unnatural amino acid incorporation efficiency by up to fivefold. As MbPylRS shares high sequence identity with MmPylRS, and the two homologs are often used interchangeably, we examined incorporation of five unnatural amino acids by four MbPylRS variants at two temperatures. Our results indicate that the beneficial N-terminal mutations in MmPylRS did not improve unnatural amino acid incorporation efficiency by MbPylRS. Knowledge from this work contributes to our understanding of PylRS homologs, which is needed to improve the technique of genetic code expansion in the future.


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
J. A. Tenreiro Machado ◽  
António C. Costa ◽  
Maria Dulce Quelhas

Proteins are biochemical entities consisting of one or more blocks typically folded in a 3D pattern. Each block (a polypeptide) is a single linear sequence of amino acids that are biochemically bonded together. The amino acid sequence in a protein is defined by the sequence of a gene or several genes encoded in the DNA-based genetic code. This genetic code typically uses twenty amino acids, but in certain organisms the genetic code can also include two other amino acids. After linking the amino acids during protein synthesis, each amino acid becomes a residue in a protein, which is then chemically modified, ultimately changing and defining the protein function. In this study, the authors analyze the amino acid sequence using alignment-free methods, aiming to identify structural patterns in sets of proteins and in the proteome, without any other previous assumptions. The paper starts by analyzing amino acid sequence data by means of histograms using fixed length amino acid words (tuples). After creating the initial relative frequency histograms, they are transformed and processed in order to generate quantitative results for information extraction and graphical visualization. Selected samples from two reference datasets are used, and results reveal that the proposed method is able to generate relevant outputs in accordance with current scientific knowledge in domains like protein sequence/proteome analysis.
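The first step described in this abstract, relative frequency histograms of fixed-length amino acid words, can be sketched as follows. This is a minimal illustration only: the use of overlapping windows and the name `word_histogram` are assumptions, and the later transformation and visualization steps are not covered.

```python
from collections import Counter


def word_histogram(sequence, k):
    """Relative frequencies of all length-k words in an amino acid sequence.

    Assumption: words are taken from overlapping windows that advance one
    residue at a time.
    """
    words = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not words:
        return {}
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


if __name__ == "__main__":
    # Toy sequence, not taken from the datasets used in the paper.
    for word, freq in sorted(word_histogram("MKVLMKVL", 2).items()):
        print(word, round(freq, 3))
```

Histograms built this way for different proteins or proteomes share the same set of possible words for a given k, so they can be compared without aligning the sequences.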


2019 ◽  
Author(s):  
Arne Elofsson

It is well known that the GC content varies enormously between organisms; this is believed to be caused by a combination of mutational preferences and selective pressure. Within coding regions, the variation in GC is larger in codon position three and smaller in positions one and two. Less well known is that this variation also has an enormous impact on the frequency of amino acids, as their codons vary in GC content. For instance, the fraction of alanines in different proteomes varies from 1.1% to 16.5%. In general, the frequency of different amino acids correlates strongly with the number of codons, the GC content of these codons and the genomic GC content. However, there are clear and systematic deviations from the expected frequencies. Some amino acids are more frequent than expected by chance, while others are less frequent. A plausible model to explain this is that two different selective forces act on the genes: first, a force acting to maintain the overall GC level and, second, a selective force acting at the amino acid level. Here, we use the divergence in amino acid frequency from what is expected from the GC content to analyze the selective pressure acting on codon frequencies in the three kingdoms of life. We find four major selective forces. First, the frequency of serine is lower than expected in all genomes, but most in prokaryotes. Second, there exists a selective pressure acting to balance positively and negatively charged amino acids, which results in a reduction of arginine and all the negatively charged amino acids. Third, the frequency of the hydrophobic residues encoded by a T in the second codon position does not change with GC; their frequency is lower in eukaryotes than in prokaryotes. Finally, some amino acids with unique properties, such as glycine and proline, are limited in their frequency variation.
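To make the idea of an "expected" amino acid frequency concrete, here is a minimal sketch of one simple null model. It assumes that every codon position is drawn independently with P(G) = P(C) = GC/2 and P(A) = P(T) = (1 - GC)/2, using the standard genetic code with stop codons excluded. This independence model and the name `expected_aa_frequencies` are assumptions for illustration and may differ from the model used in the paper.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def expected_aa_frequencies(gc):
    """Expected amino acid frequencies for a given GC fraction (0..1).

    Assumption: nucleotides are independent across codon positions.
    """
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    freq = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":  # skip stop codons
            continue
        prob = p[codon[0]] * p[codon[1]] * p[codon[2]]
        freq[aa] = freq.get(aa, 0.0) + prob
    total = sum(freq.values())  # renormalise over sense codons only
    return {aa: f / total for aa, f in freq.items()}


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(gc, round(expected_aa_frequencies(gc)["A"], 4))  # alanine (GCN)
```

Under this model the expected alanine fraction rises with GC content because all four alanine codons start with GC; comparing such expectations with observed proteome frequencies is the kind of divergence the abstract describes.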


2021 ◽  
Author(s):  
Isabella Tolle ◽  
Stefan Oehm ◽  
Michael Georg Hoesl ◽  
Christin Treiber-Kleinke ◽  
Lauri Peil ◽  
...  

Billions of years of evolution have produced only slight variations in the standard genetic code, and the number and identity of proteinogenic amino acids have remained mostly consistent throughout all three domains of life. These observations suggest a certain rigidity of the genetic code and prompt musings as to the origin and evolution of the code. Here we conducted an adaptive laboratory evolution (ALE) experiment to push the limits of the code restriction, by evolving Escherichia coli to fully replace tryptophan, thought to be the latest addition to the genetic code, with the analog L-β-(thieno[3,2-b]pyrrolyl)alanine ([3,2]Tpa). We identified an overshooting of the stress response system as the main factor limiting ancestral growth upon exposure to β-(thieno[3,2-b]pyrrole ([3,2]Tp), a metabolic precursor of [3,2]Tpa, and Trp limitation. During the ALE, E. coli was able to “calm down” its stress response machinery, thereby restoring growth. In particular, the inactivation of RpoS itself, the master regulon of the general stress response, was a key event during the adaptation. Knocking out the rpoS gene in the ancestral background, independent of other changes, conferred growth on [3,2]Tp. Our results add further evidence that frozen regulatory constraints rather than a rigid protein translation apparatus are Life’s gatekeepers of the canonical amino acid repertoire. This information will not only enable us to design enhanced synthetic amino acid incorporation systems but may also shed light on a general biological mechanism trapping organismal configurations in a status quo.

Significance statement: The (apparent) rigidity of the genetic code, as well as its universality, have long prompted explorations into expanding the code with synthetic, new-to-nature building blocks and testing its boundaries. While nowadays even proteome-wide incorporation of synthetic amino acids has been reported on several occasions [1–3], little is known about the underlying mechanisms. We here report ALE with auxotrophic E. coli that yielded successful proteome-wide replacement of Trp by its synthetic analog [3,2]Tpa, accompanied by selection for loss of RpoS [4] function. Such laboratory domestication of bacteria by the acquisition of rpoS mitigation mutations is beneficial not only to overcome the stress of nutrient (Trp) starvation but also to evolve the paths to use environmental xenobiotics (e.g. [3,2]Tp) as essential nutrients for growth. We propose that regulatory constraints rather than a rigid and conserved protein translation apparatus are Life’s gatekeepers of the canonical amino acid repertoire (at least where close structural analogs are concerned). Our findings contribute a step towards understanding possible environmental causes of genetic changes and their relationship to evolution. Our evolved strain affords a platform for homogeneous protein labeling with [3,2]Tpa as well as for the production of biomolecules [5], which are challenging to synthesize chemically. Top-down synthetic biology will also benefit greatly from breaking through the boundaries of the frozen bacterial genetic code, as this will enable us to begin creating synthetic cells capable of utilizing an expanded range of substrates essential for life.


2017 ◽  
Author(s):  
Miloje M. Rakocevic

In two previous works [1], [2] we showed the determination of the genetic code by the golden and harmonic means within the standard Genetic Code Table, i.e. the nucleotide triplet table. In this paper we show the same determination through a specific connection between two tables, the nucleotide doublet table and the triplet table, over the polarity of amino acids, measured by cloister energy.


2018 ◽  
Author(s):  
Jeffrey I. Boucher ◽  
Troy W. Whitfield ◽  
Ann Dauphin ◽  
Gily Nachum ◽  
Carl Hollins ◽  
...  

The evolution of HIV-1 protein sequences should be governed by a combination of factors including nucleotide mutational probabilities, the genetic code, and fitness. The impacts of these factors on protein sequence evolution are interdependent, making it challenging to infer the individual contribution of each factor from phylogenetic analyses alone. We investigated the protein sequence evolution of HIV-1 by determining an experimental fitness landscape of all individual amino acid changes in protease. We compared our experimental results to the frequency of protease variants in a publicly available dataset of 32,163 sequenced isolates from drug-naïve individuals. The most common amino acids in sequenced isolates supported robust experimental fitness, indicating that the experimental fitness landscape captured key features of selection acting on protease during viral infections of hosts. Amino acid changes requiring multiple mutations from the likely ancestor were slightly less likely to support robust experimental fitness than single mutations, consistent with the genetic code favoring chemically conservative amino acid changes. Amino acids that were common in sequenced isolates were predominantly accessible by single mutations from the likely protease ancestor. Multiple mutations commonly observed in isolates were accessible by mutational walks with highly fit single mutation intermediates. Our results indicate that the prevalence of multiple base mutations in HIV-1 protease is strongly influenced by mutational sampling.


Author(s):  
Mazhar MW ◽  
Raza A ◽  
Sikandar M ◽  
Mahmood J ◽  
...  

The COI sequence of O. laetus was submitted to the GenBank database under accession number HQ908084 (Figure 1). The amino acid sequence of the corresponding COI gene was also submitted under accession number ADZ05746 and contains 222 amino acids. Base statistics of the O. laetus COI are presented in Figure 2. The statistics show that the fragment is rich in AT content, as expected, with thymine occurring most frequently, followed by A, C and G. The AT content was 67.2%, compared with a GC content of 32.8%. The protein entry was subjected to family confirmation by searching the InterProScan database, and the results indicate a very high and significant match, confirming our sequence to be a part of Cytochrome C.
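The base statistics quoted above (per-base counts, AT% and GC%) are simple to compute for any nucleotide sequence. The sketch below is a minimal illustration with a toy input; it does not reproduce the HQ908084 record, and `base_composition` is a hypothetical helper name.

```python
def base_composition(seq):
    """Return per-base counts plus AT and GC percentages for a DNA sequence.

    Characters other than A, C, G and T (for example N) are ignored.
    """
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    at_percent = 100.0 * (counts["A"] + counts["T"]) / total
    gc_percent = 100.0 * (counts["G"] + counts["C"]) / total
    return counts, at_percent, gc_percent


if __name__ == "__main__":
    # Toy sequence for illustration only.
    counts, at_percent, gc_percent = base_composition("AATTTGC")
    print(counts, round(at_percent, 1), round(gc_percent, 1))
```

Because only A, C, G and T are counted, the two percentages always sum to 100, as the 67.2% and 32.8% figures in the text do.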


2016 ◽  
Author(s):  
Guang-Zhong Wang

The transcriptional and translational systems are essentially information processing systems. However, how to quantify the amount of information decoded during expression remains a mystery. Here, we propose a simple method to evaluate the amount of information transcribed and translated during gene expression. We found that although proteins with a high copy number have more information translated, the average number of bits per amino acid is not high. The negative correlation between protein copy number and bits per amino acid indicates selective pressure to reduce translational errors. Moreover, interacting proteins have similar bits per residue translated. All of these findings highlight the importance of understanding transcription and translation from an information processing perspective.

