Some theoretical aspects of reprogramming the standard genetic code

Reprogramming of the standard genetic code to include non-canonical amino acids (ncAAs) opens new prospects for medicine, industry, and biotechnology. There are several methods of code engineering, which allow us to store new genetic information in DNA sequences and produce proteins with new properties. Here, we provided a theoretical background for the optimal genetic code expansion, which may find application in the experimental design of the genetic code. We assumed that the expanded genetic code includes both canonical and non-canonical information stored in 64 classical codons. What is more, the new coding system is robust to point mutations and minimizes the possibility of reversion from the new to old information. In order to find such codes, we applied graph theory to analyze the properties of optimal codon sets. We presented the formal procedure for finding the optimal codes with various numbers of vacant codons that could be assigned to new amino acids. Finally, we discussed the optimal number of the newly incorporated ncAAs and also the optimal size of codon groups that can be assigned to ncAAs.


Some theoretical aspects of reprogramming the standard genetic code

10.1101/2020.09.12.294553 ◽

2020 ◽

Author(s):

Kuba Nowak ◽

Paweł Błażej ◽

Małgorzata Wnetrzak ◽

Dorota Mackiewicz ◽

Paweł Mackiewicz

Keyword(s):

Amino Acids ◽

Genetic Code ◽

DNA Sequences ◽

Theoretical Perspective ◽

Optimal Number ◽

Optimal Size ◽

Coding System ◽

Standard Genetic Code ◽

Nucleotide Mutation ◽

Code Extension

Reprogramming of the standard genetic code in order to include non-canonical amino acids (ncAAs) opens a new perspective in medicine, industry and biotechnology. There are several methods of engineering the code, which allow us to store new genetic information in DNA sequences and transmit it into the protein world. Here, we investigate the problem of optimal genetic code extension from a theoretical perspective. We assume that the new coding system should encode both canonical amino acids and new ncAAs using 64 classical codons. What is more, the extended genetic code should be robust to point nucleotide mutations and minimize the possibility of reversion from new to old information. In order to do so, we use graph theory to study the properties of optimal codon sets, which can encode 20 canonical amino acids and the stop coding signal. Finally, we describe the set of vacant codons that could be assigned to new amino acids. Moreover, we discuss the optimal number of the newly incorporated ncAAs and also the optimal size of codon blocks that are assigned to ncAAs.


Basic principles of the genetic code extension

10.1101/704908 ◽

2019 ◽

Author(s):

Paweł Błażej ◽

Małgorzata Wnetrzak ◽

Dorota Mackiewicz ◽

Paweł Mackiewicz

Keyword(s):

Amino Acids ◽

Genetic Code ◽

Point Mutations ◽

Coding System ◽

Base Pairs ◽

Induced Subgraphs ◽

Single Nucleotide ◽

Basic Principles ◽

Code Extension ◽

Incremental Addition

Compounds including non-canonical amino acids or other artificially designed molecules can find a lot of applications in medicine, industry and biotechnology. They can be produced thanks to the modification or extension of the standard genetic code (SGC). Such peptides or proteins including the non-canonical amino acids can be constantly delivered in a stable way by organisms with the customized genetic code. Among several methods of engineering the code, using non-canonical base pairs is especially promising, because it enables generating many new codons, which can be used to encode any new amino acid. Since even one pair of new bases can extend the SGC up to 216 codons generated by a six-letter nucleotide alphabet, the extension of the SGC can be achieved in many ways. Here, we proposed a stepwise procedure of the SGC extension with one pair of non-canonical bases to minimize the consequences of point mutations. We reported relationships between codons in the framework of graph theory. All 216 codons were represented as nodes of the graph, whereas its edges were induced by all possible single nucleotide mutations occurring between codons. Therefore, every set of canonical and newly added codons induces a specific subgraph. We characterized the properties of the induced subgraphs generated by selected sets of codons. Thanks to that, we were able to describe a procedure for incremental addition of the set of meaningful codons up to the full coding system consisting of three pairs of bases. The procedure of gradual extension of the SGC makes the whole system robust to changing genetic information due to mutations and is compatible with the views assuming that codons and amino acids were added successively to the primordial SGC, which evolved to minimize harmful consequences of mutations or mistranslations of encoded proteins.


Basic principles of the genetic code extension

Royal Society Open Science ◽

10.1098/rsos.191384 ◽

2020 ◽

Vol 7 (2) ◽

pp. 191384

Author(s):

Paweł Błażej ◽

Małgorzata Wnetrzak ◽

Dorota Mackiewicz ◽

Paweł Mackiewicz

Keyword(s):

Amino Acids ◽

Genetic Code ◽

Point Mutations ◽

Coding System ◽

Base Pairs ◽

Induced Subgraphs ◽

Single Nucleotide ◽

Basic Principles ◽

Code Extension ◽

Incremental Addition

Compounds including non-canonical amino acids (ncAAs) or other artificially designed molecules can find a lot of applications in medicine, industry and biotechnology. They can be produced thanks to the modification or extension of the standard genetic code (SGC). Such peptides or proteins including the ncAAs can be constantly delivered in a stable way by organisms with the customized genetic code. Among several methods of engineering the code, using non-canonical base pairs is especially promising, because it enables generating many new codons, which can be used to encode any new amino acid. Since even one pair of new bases can extend the SGC up to 216 codons generated by a six-letter nucleotide alphabet, the extension of the SGC can be achieved in many ways. Here, we proposed a stepwise procedure of the SGC extension with one pair of non-canonical bases to minimize the consequences of point mutations. We reported relationships between codons in the framework of graph theory. All 216 codons were represented as nodes of the graph, whereas its edges were induced by all possible single nucleotide mutations occurring between codons. Therefore, every set of canonical and newly added codons induces a specific subgraph. We characterized the properties of the induced subgraphs generated by selected sets of codons. Thanks to that, we were able to describe a procedure for incremental addition of the set of meaningful codons up to the full coding system consisting of three pairs of bases. The procedure of gradual extension of the SGC makes the whole system robust to changing genetic information due to mutations and is compatible with the views assuming that codons and amino acids were added successively to the primordial SGC, which evolved minimizing harmful consequences of mutations or mistranslations of encoded proteins.
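The graph described in this abstract can be sketched in a few lines. The following is only an illustrative reconstruction, not the authors' code: the letters `X` and `Y` are placeholder names for one hypothetical non-canonical base pair, and the helper names `codon_graph` and `induced_edges` are assumptions introduced here.

```python
from itertools import product


def codon_graph(alphabet):
    """Return (codons, edges); an edge joins two codons that differ at exactly one position."""
    codons = ["".join(p) for p in product(alphabet, repeat=3)]
    edges = set()
    for codon in codons:
        for pos in range(3):
            for base in alphabet:
                if base != codon[pos]:
                    neighbor = codon[:pos] + base + codon[pos + 1:]
                    # frozenset makes the edge undirected, so each pair is stored once
                    edges.add(frozenset((codon, neighbor)))
    return codons, edges


def induced_edges(edges, subset):
    """Edges of the subgraph induced by `subset` (both endpoints must lie in it)."""
    subset = set(subset)
    return {edge for edge in edges if edge <= subset}


canonical = "ACGT"
extended = canonical + "XY"  # X, Y: hypothetical labels for the added base pair

codons, edges = codon_graph(extended)
canonical_codons = [c for c in codons if set(c) <= set(canonical)]
canonical_edges = induced_edges(edges, canonical_codons)

print(len(codons), len(edges))                     # nodes and edges of the full graph
print(len(canonical_codons), len(canonical_edges))  # the induced canonical subgraph
```

With a six-letter alphabet every codon has 15 single-nucleotide neighbors, so the full graph has 216 nodes and 216 × 15 / 2 = 1620 edges; the 64 canonical codons induce a subgraph with 288 edges. Passing any other codon set to `induced_edges` gives the subgraph for that choice of meaningful codons.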


The Molecular Genetics of Hemophilia A

10.1055/s-0038-1643980 ◽

1987 ◽

Author(s):

Stylianos E Antonarakis

Keyword(s):

Amino Acids ◽

DNA Sequences ◽

Hemophilia A ◽

Restriction Analysis ◽

Point Mutations ◽

Leader Peptide ◽

Molecular Defect ◽

Severe Hemophilia ◽

CpG Dinucleotides ◽

Oligonucleotide Hybridization

Hemophilia A is a common X-linked hereditary disorder of blood coagulation due to deficiency of factor 8. The gene for factor 8 has been cloned and characterized (Nature 312:326-342, 1984). It is divided into 26 exons and 25 introns and spans 186 kb of DNA. The cDNA is 9 kb and codes for 2351 amino acids. The first 19 amino acids comprise the secretory leader peptide and the mature excreted polypeptide consists of 2332 amino acids. The nucleotide sequence of the exons and the exon-intron junctions is known and the complete amino acid sequence has been deduced. Several laboratories have used cloned factor 8 DNA sequences as probes to characterize mutations that are responsible for hemophilia A in certain pedigrees. These mutations have been characterized by restriction analysis, oligonucleotide hybridization, cloning and sequencing of DNA from appropriate patients. In about 500 patients with hemophilia A examined, the molecular defect has been recognized in 39. Both gross alterations (mainly deletions) and point mutations of the factor 8 gene have been found. A total of 19 different deletions have been observed. No two unrelated pedigrees share the same exact deletion. The size of the deleted DNA varies from 1.5 kb to more than 210 kb. All but one of these deletions are associated with severe hemophilia A. A deletion of 6 kb that contains exon 22 only is associated with moderate hemophilia. Some deletions are present in patients with inhibitors to factor 8. No correlation of the size or the position of the deletions can be found with the presence of inhibitors to factor 8. A total of 20 point mutations have been characterized. All are recognized by restriction analysis and involve Taq I sites. All are mutations of CpG dinucleotides and generate nonsense or missense codons. Unrelated pedigrees have the same single nucleotide change because of independent origin of the same mutation. In many instances de novo occurrence of a point mutation has been observed. CpG dinucleotides are hot spots for mutation to TG or CA, presumably because of spontaneous deamination of methylcytosine. Some point mutations are present in patients with inhibitors, but no correlation of the site of mutation and inhibitor formation has been found. The nonsense mutations are present in patients with severe hemophilia A. A missense mutation (Arg to Gln) in exon 26 was found in a patient with mild hemophilia, while another Arg to Gln mutation in exon 24 has been observed in a patient with severe disease. The creation of a donor splice site in IVS 4 of the factor 8 gene has been observed in a patient with mild hemophilia. A few DNA polymorphisms within the factor 8 gene and two other closely linked polymorphisms have been used for carrier detection and prenatal diagnosis of hemophilia A. These DNA markers are useful in more than 90% of families at risk for hemophilia A. The author thanks Drs. Gitschier, Din, Olek, Pirastou, Lawn for communication of their data prior to publication. The hemophilia project at Johns Hopkins was supported by an Institutional grant and NIH grant to S.S.A. and Haig H. Kazazian, Jr.


Computational Analysis of Genetic Code Variations Optimized for the Robustness against Point Mutations with Wobble-like Effects

Life ◽

10.3390/life11121338 ◽

2021 ◽

Vol 11 (12) ◽

pp. 1338

Author(s):

Elena Fimmel ◽

Markus Gumbel ◽

Martin Starman ◽

Lutz Strüngmann

Keyword(s):

Genetic Code ◽

Computational Analysis ◽

Point Mutations ◽

Weighted Graph ◽

Single Nucleotide Variants ◽

Negative Effects ◽

Standard Genetic Code ◽

Single Nucleotide ◽

Random Code ◽

Optimal Weights

It is believed that the codon–amino acid assignments of the standard genetic code (SGC) help to minimize the negative effects caused by point mutations. All possible point mutations of the genetic code can be represented as a weighted graph with weights that correspond to the probabilities of these mutations. The robustness of a code against point mutations can be described then by means of the so-called conductance measure. This paper quantifies the wobble effect, which was investigated previously by applying the weighted graph approach, and seeks optimal weights using an evolutionary optimization algorithm to maximize the code’s robustness. One result of our study is that the robustness of the genetic code is least influenced by mutations in the third position—like with the wobble effect. Moreover, the results clearly demonstrate that point mutations in the first, and even more importantly, in the second base of a codon have a very large influence on the robustness of the genetic code. These results were compared to single nucleotide variants (SNV) in coding sequences which support our findings. Additionally, it was analyzed which structure of a genetic code evolves from random code tables when the robustness is maximized. Our calculations show that the resulting code tables are very close to the standard genetic code. In conclusion, the results illustrate that the robustness against point mutations seems to be an important factor in the evolution of the standard genetic code.
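To make the weighted-graph idea concrete, here is a minimal sketch of a conductance calculation. It assumes one common definition (the weight of mutations leaving a codon block divided by the weight of all mutations from that block), position-dependent weights, and a toy partition of the 64 codons into 16 blocks sharing their first two bases. The paper's exact measure, weights, and code tables may differ, and all function names below are illustrative.

```python
from itertools import product

BASES = "ACGU"


def neighbors(codon):
    """Yield (neighbor, position) for every single-nucleotide change of a codon."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:], pos


def set_conductance(block, weights):
    """Weight of mutations leaving `block` divided by the weight of all mutations from it."""
    block = set(block)
    leaving = total = 0.0
    for codon in block:
        for other, pos in neighbors(codon):
            total += weights[pos]
            if other not in block:
                leaving += weights[pos]
    return leaving / total


def average_conductance(partition, weights):
    """Mean set conductance over all blocks; lower values mean a more robust code."""
    return sum(set_conductance(block, weights) for block in partition) / len(partition)


# Toy partition (an assumption, not the SGC): blocks of four codons sharing the first two bases.
partition = [[a + b + c for c in BASES] for a, b in product(BASES, repeat=2)]

uniform = average_conductance(partition, (1.0, 1.0, 1.0))      # equal weight per position
third_heavy = average_conductance(partition, (1.0, 1.0, 4.0))  # third position weighted more

print(uniform, third_heavy)
```

In this toy partition only third-position changes stay inside a block, so the conductance is (w1 + w2) / (w1 + w2 + w3): about 0.667 for uniform weights and about 0.333 when the third position carries four times the weight. Changes in the first and second positions always leave the block, which mirrors the abstract's observation that those positions matter most for robustness.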


Golden and Harmonic Mean in the Genetic Code

10.31219/osf.io/2pfe7 ◽

2017 ◽

Author(s):

Miloje M. Rakocevic

Keyword(s):

Amino Acids ◽

Genetic Code ◽

Harmonic Mean ◽

Standard Genetic Code ◽

Code Table ◽

Nucleotide Triplet

In two previous works [1], [2] we have shown the determination of the genetic code by golden and harmonic mean within the standard Genetic Code Table, i.e. the nucleotide triplet table, whereas in this paper we show the same determination through a specific connection between two tables, the nucleotide doublets Table and the triplets Table, over polarity of amino acids, measured by Cloister energy.


Evolution of the standard genetic code

10.1101/2020.02.20.958546 ◽

2020 ◽

Author(s):

Michael Yarus

Keyword(s):

Genetic Code ◽

Coding System ◽

Standard Genetic Code ◽

Evolutionary Mechanisms ◽

Random Code ◽

Chemical Order ◽

Single Origin ◽

Unique Event ◽

Evolutionary Paths ◽

Universal Standard

A near-universal Standard Genetic Code (SGC) implies a single origin for Earthly life. To study this unique event, I compute paths to the SGC, comparing different plausible histories. Notably, SGC-like coding emerges from traditional evolutionary mechanisms, and a superior path can be identified. To objectively measure evolution, progress values from 0 (random coding) to 1 (SGC-like) are defined: these measure fractions of random-code-to-SGC distance. Progress types are spacing/distance/delta Polar Requirement, detecting space between identical assignments/mutational distance to the SGC/chemical order, respectively. A coding system was based on known RNAs performing aminoacyl-RNA synthetase reactions. Acceptor RNAs exhibit SGC-like wobble; alternatively, non-wobbling triplets uniquely encode 20 amino acids/start/stop. Triplets acquire 22 functions by stereochemistry, selection, coevolution, or randomly. Assignments also propagate to an assigned triplet’s neighborhood via single mutations, but can also decay. Futile evolutionary paths are plentiful due to the vast code universe. Thus SGC evolution is critically sensitive to disorder from random assignments. Evolution also inevitably slows near coding completion. Coding likely avoided these difficulties, and two suitable paths are compared. In late wobble, a majority of non-wobble assignments are made before wobble is adopted. In continuous wobble, a uniquely advantageous early intermediate supplies the gateway to an ordered SGC. Revised coding evolution (limited randomness, late wobble, concentration on amino acid encoding, chemically conservative coevolution with a chemically-ordered elite) produces varied full codes with excellent joint progress values. A population of only 600 independent coding tables includes SGC-like members; a Bayesian path toward more accurate SGC evolution is available.


Did Amino Acid Side Chain Reactivity Dictate the Composition and Timing of Aminoacyl-tRNA Synthetase Evolution?

Genes ◽

10.3390/genes12030409 ◽

2021 ◽

Vol 12 (3) ◽

pp. 409

Author(s):

Tamara L. Hendrickson ◽

Whitney N. Wood ◽

Udumbara M. Rathnayake

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Genetic Code ◽

Chemical Reactivity ◽

tRNA Synthetase ◽

Amino Acid Side Chain ◽

Standard Genetic Code ◽

Last Universal Common Ancestor ◽

tRNA Synthetases ◽

Universal Common Ancestor

The twenty amino acids in the standard genetic code were fixed prior to the last universal common ancestor (LUCA). Factors that guided this selection included establishment of pathways for their metabolic synthesis and the concomitant fixation of substrate specificities in the emerging aminoacyl-tRNA synthetases (aaRSs). In this conceptual paper, we propose that the chemical reactivity of some amino acid side chains (e.g., lysine, cysteine, homocysteine, ornithine, homoserine, and selenocysteine) delayed or prohibited the emergence of the corresponding aaRSs and helped define the amino acids in the standard genetic code. We also consider the possibility that amino acid chemistry delayed the emergence of the glutaminyl- and asparaginyl-tRNA synthetases, neither of which are ubiquitous in extant organisms. We argue that fundamental chemical principles played critical roles in fixation of some aspects of the genetic code pre- and post-LUCA.


Golden and Harmonic Mean in the Genetic Code

10.31219/osf.io/fzgjp ◽

2017 ◽

Author(s):

Miloje M. Rakocevic

Keyword(s):

Amino Acids ◽

Genetic Code ◽

Harmonic Mean ◽

Standard Genetic Code ◽

International Conference ◽

Theoretical Approaches ◽

Code Table ◽

Nucleotide Triplet ◽

Polar Requirement

In two previous works (Rakočević, 1998; 2013), we have shown the determination of the genetic code by golden and harmonic mean within the standard Genetic Code Table, i.e. the nucleotide triplet table, whereas in this paper we show the same determination through a specific connection between two tables, the nucleotide doublets Table and the triplets Table, over polarity of amino acids, measured by Cloister energy in general, and by hydropathy and polar requirement, partially. [This is the expanded version of the article published in Proceedings of the 2nd International Conference “Theoretical Approaches to BioInformation Systems” (TABIS.2013), September 17–22, 2013, Belgrade, Serbia. That first version is also stored, as Version 1, in OSF Preprints.]


Visualizing Amino Acid Substitutions in a Physicochemical Vector Space

10.1101/2021.07.15.452549 ◽

2021 ◽

Author(s):

Louis R Nemzer

Keyword(s):

Amino Acid ◽

Genetic Code ◽

Three Dimensional ◽

Point Mutations ◽

Amino Acid Substitutions ◽

Standard Genetic Code ◽

Single Nucleotide ◽

Single Nucleotide Mutation ◽

Nucleotide Mutation ◽

Hereditary Disorders

A three-dimensional representation of the twenty proteinogenic amino acids in a physicochemical space is presented. Vectors corresponding to amino acid substitutions are classified based on whether they are accessible via a single-nucleotide mutation. It is shown that the standard genetic code establishes a "choice architecture" that permits nearly independent tuning of the properties related with size and those related with hydrophobicity. This work sheds light on the metarules of evolvability that may have shaped the standard genetic code to increase the probability that adaptive point mutations will be generated. An illustration of the usefulness of visualizing amino acid substitutions in a 3D physicochemical space is shown using data collected from the SARS-CoV-2 receptor binding domain. The substitutions most responsible for antibody escape are almost always inaccessible via single nucleotide mutation, and also change multiple properties concurrently. The results of this research can extend our understanding of certain hereditary disorders caused by point mutations, as well as guide the development of rational protein and vaccine design.
