Computational Analysis of Genetic Code Variations Optimized for the Robustness against Point Mutations with Wobble-like Effects

It is believed that the codon–amino acid assignments of the standard genetic code (SGC) help to minimize the negative effects caused by point mutations. All possible point mutations of the genetic code can be represented as a weighted graph whose weights correspond to the probabilities of these mutations. The robustness of a code against point mutations can then be described by means of the so-called conductance measure. This paper quantifies the wobble effect, which was investigated previously by applying the weighted graph approach, and seeks optimal weights using an evolutionary optimization algorithm to maximize the code’s robustness. One result of our study is that the robustness of the genetic code is least influenced by mutations in the third position, as with the wobble effect. Moreover, the results clearly demonstrate that point mutations in the first and, even more importantly, in the second base of a codon have a very large influence on the robustness of the genetic code. These results were compared with single nucleotide variants (SNVs) in coding sequences, which support our findings. Additionally, we analyzed which genetic code structure evolves from random code tables when the robustness is maximized. Our calculations show that the resulting code tables are very close to the standard genetic code. In conclusion, the results illustrate that the robustness against point mutations seems to be an important factor in the evolution of the standard genetic code.
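
The weighted-graph view can be illustrated with a minimal sketch. It assumes one common definition of set conductance (weight of edges leaving a block of synonymous codons divided by the total edge weight at those codons) and a single weight per codon position; the paper's exact weighting scheme and optimizer are not reproduced here. The names `set_conductance`, `mean_conductance`, and the example weights are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def neighbors(codon):
    """Yield (neighbor, position) for each single-base change of a codon."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:], pos


def set_conductance(block, weights):
    """Edge weight leaving `block` divided by the total edge weight at its codons."""
    cut = volume = 0.0
    for codon in block:
        for other, pos in neighbors(codon):
            volume += weights[pos]
            if other not in block:
                cut += weights[pos]
    return cut / volume


def mean_conductance(code, weights):
    """Average set conductance over all blocks of synonymous codons (stops included)."""
    blocks = {}
    for codon, aa in code.items():
        blocks.setdefault(aa, set()).add(codon)
    return sum(set_conductance(b, weights) for b in blocks.values()) / len(blocks)


# Assumed example weights: equal positions vs. mutations concentrated in position 3.
uniform = mean_conductance(SGC, (1.0, 1.0, 1.0))
third_heavy = mean_conductance(SGC, (0.1, 0.1, 1.0))
print(f"uniform weights: {uniform:.3f}, third-position-heavy: {third_heavy:.3f}")
```

Lower conductance means fewer weighted mutations change the encoded amino acid, so under this definition a code is more robust when the heavily weighted position is the one where most synonymous changes occur.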

Visualizing Amino Acid Substitutions in a Physicochemical Vector Space

10.1101/2021.07.15.452549 ◽

2021 ◽

Author(s):

Louis R Nemzer

Keyword(s):

Amino Acid ◽

Genetic Code ◽

Three Dimensional ◽

Point Mutations ◽

Amino Acid Substitutions ◽

Standard Genetic Code ◽

Single Nucleotide ◽

Single Nucleotide Mutation ◽

Nucleotide Mutation ◽

Hereditary Disorders

A three-dimensional representation of the twenty proteinogenic amino acids in a physicochemical space is presented. Vectors corresponding to amino acid substitutions are classified based on whether they are accessible via a single-nucleotide mutation. It is shown that the standard genetic code establishes a "choice architecture" that permits nearly independent tuning of the properties related to size and those related to hydrophobicity. This work sheds light on the metarules of evolvability that may have shaped the standard genetic code to increase the probability that adaptive point mutations will be generated. An illustration of the usefulness of visualizing amino acid substitutions in a 3D physicochemical space is shown using data collected from the SARS-CoV-2 receptor binding domain. The substitutions most responsible for antibody escape are almost always inaccessible via single nucleotide mutation and also change multiple properties concurrently. The results of this research can extend our understanding of certain hereditary disorders caused by point mutations, as well as guide the development of rational protein and vaccine design.
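
The classification step described above can be sketched as follows. This is an illustrative reconstruction, not the author's code: it only tests whether two amino acids are connected by at least one single-nucleotide change in the standard genetic code, and the helper names `single_step_pairs` and `is_accessible` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_step_pairs(code):
    """Return unordered amino acid pairs linked by at least one single-nucleotide change."""
    pairs = set()
    for codon, aa in code.items():
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                # Skip synonymous changes and anything involving a stop codon.
                if other != aa and "*" not in (aa, other):
                    pairs.add(frozenset((aa, other)))
    return pairs


PAIRS = single_step_pairs(SGC)


def is_accessible(aa_from, aa_to):
    """True if the substitution can arise from a single-nucleotide mutation."""
    return frozenset((aa_from, aa_to)) in PAIRS


for sub in [("D", "G"), ("W", "F"), ("M", "W")]:
    label = "accessible" if is_accessible(*sub) else "needs 2+ nucleotide changes"
    print(sub, label)
```

Storing unordered pairs reflects the fact that single-nucleotide accessibility is symmetric: if one codon of amino acid A is one change away from a codon of B, the reverse change exists as well.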

Refactoring the Genetic Code for Increased Evolvability

10.1101/128058 ◽

2017 ◽

Author(s):

Gur Pines ◽

James D. Winkler ◽

Assaf Pines ◽

Ryan T. Gill

Keyword(s):

Genetic Code ◽

Directed Evolution ◽

Single Gene ◽

Point Mutations ◽

Saturation Mutagenesis ◽

Mutagenic Potential ◽

Single Nucleotide ◽

Common Error ◽

Mutational Landscape ◽

Genetic Codes

Abstract: The standard genetic code is robust to mutations and base-pairing errors during transcription and translation. Point mutations are most likely to be synonymous or preserve the chemical properties of the original amino acid. Saturation mutagenesis experiments suggest that in some cases the best performing mutant requires a replacement of more than a single nucleotide within a codon. These replacements are essentially inaccessible to common error-based laboratory engineering techniques that alter a single nucleotide per mutation event, due to the extreme rarity of adjacent mutations. In this theoretical study, we suggest a radical reordering of the genetic code that maximizes the mutagenic potential of single nucleotide replacements. We explore several possible genetic codes that allow a greater degree of accessibility to the mutational landscape and may result in a hyper-evolvable organism serving as an ideal platform for directed evolution experiments. We then conclude by evaluating potential applications for recoded organisms within the synthetic biology field.

Significance Statement: The conservative nature of the genetic code prevents bioengineers from efficiently accessing the full mutational landscape of a gene using common error-prone methods. Here we present two computational approaches to generate alternative genetic codes with increased accessibility. These new codes allow mutational transition to a larger pool of amino acids and with a greater degree of chemical differences, using a single nucleotide replacement within the codon, thus increasing evolvability both at the single gene and at the genome levels. Given the widespread use of these techniques for strain and protein improvement along with more fundamental evolutionary biology questions, the use of recoded organisms that maximize evolvability should significantly improve the efficiency of directed evolution, library generation and fitness maximization.
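
One way to make "accessibility" concrete is to count, for each sense codon, how many different amino acids a single nucleotide replacement can reach. The sketch below does this for the standard code and for a randomly shuffled code used purely as a baseline; it does not implement either of the authors' two computational approaches, and `reachable`, `mean_reach`, and `shuffled_code` are hypothetical names.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reachable(codon, code):
    """Amino acids one nucleotide away, excluding the codon's own and stops."""
    own = code[codon]
    found = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                aa = code[codon[:pos] + base + codon[pos + 1:]]
                if aa not in (own, "*"):
                    found.add(aa)
    return found


def mean_reach(code):
    """Average number of distinct amino acids reachable per sense codon."""
    sense = [c for c, aa in code.items() if aa != "*"]
    return sum(len(reachable(c, code)) for c in sense) / len(sense)


def shuffled_code(code, seed=0):
    """Baseline only: same amino acid counts, random codon assignment."""
    rng = random.Random(seed)
    labels = list(code.values())
    rng.shuffle(labels)
    return dict(zip(code, labels))


print("standard code:", round(mean_reach(SGC), 2))
print("shuffled code:", round(mean_reach(shuffled_code(SGC)), 2))
```

A score of this kind could serve as an objective when searching for codes with higher accessibility, although a realistic objective would presumably also weigh the chemical distance between the amino acids involved.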

Cytosine base editor 4 but not adenine base editor generates off-target mutations in mouse embryos

Communications Biology ◽

10.1038/s42003-019-0745-3 ◽

2020 ◽

Vol 3 (1) ◽

Cited By ~ 15

Author(s):

Hye Kyung Lee ◽

Harold E. Smith ◽

Chengyu Liu ◽

Michaela Willi ◽

Lothar Hennighausen

Keyword(s):

Point Mutations ◽

Mouse Embryos ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Base Editing ◽

Genome Wide ◽

Wide Range ◽

Family Based ◽

Correct Point ◽

Adenine Base

Abstract: Deaminase base editing has emerged as a tool to install or correct point mutations in the genomes of living cells in a wide range of organisms. However, the genome-wide off-target effects introduced by base editors in the mammalian genome have been examined in only one study. Here, we have investigated the fidelity of cytosine base editor 4 (BE4) and adenine base editors (ABE) in mouse embryos using unbiased whole-genome sequencing of a family-based trio cohort. The same sgRNA was used for BE4 and ABE. We demonstrate that BE4-edited mice carry an excess of single-nucleotide variants and deletions compared to ABE-edited mice and controls. Therefore, an optimization of cytosine base editors is required to improve their fidelity. While the remarkable fidelity of ABE has implications for a wide range of applications, the occurrence of rare aberrant C-to-T conversions at specific target sites needs to be addressed.

Basic principles of the genetic code extension

10.1101/704908 ◽

2019 ◽

Author(s):

Paweł Błażej ◽

Małgorzata Wnetrzak ◽

Dorota Mackiewicz ◽

Paweł Mackiewicz

Keyword(s):

Amino Acids ◽

Genetic Code ◽

Point Mutations ◽

Coding System ◽

Base Pairs ◽

Induced Subgraphs ◽

Single Nucleotide ◽

Basic Principles ◽

Code Extension ◽

Incremental Addition

Abstract: Compounds including non-canonical amino acids or other artificially designed molecules have many applications in medicine, industry and biotechnology. They can be produced by modifying or extending the standard genetic code (SGC). Peptides or proteins that include non-canonical amino acids can then be delivered continuously and stably by organisms with the customized genetic code. Among several methods of engineering the code, using non-canonical base pairs is especially promising, because it enables generating many new codons, which can be used to encode any new amino acid. Since even one pair of new bases can extend the SGC up to 216 codons generated by a six-letter nucleotide alphabet, the extension of the SGC can be achieved in many ways. Here, we proposed a stepwise procedure of the SGC extension with one pair of non-canonical bases to minimize the consequences of point mutations. We reported relationships between codons in the framework of graph theory. All 216 codons were represented as nodes of the graph, whereas its edges were induced by all possible single nucleotide mutations occurring between codons. Therefore, every set of canonical and newly added codons induces a specific subgraph. We characterized the properties of the induced subgraphs generated by selected sets of codons. Thanks to that, we were able to describe a procedure for incremental addition of the set of meaningful codons up to the full coding system consisting of three pairs of bases. The procedure of gradual extension of the SGC makes the whole system robust to changing genetic information due to mutations and is compatible with the views assuming that codons and amino acids were added successively to the primordial SGC, which evolved to minimize harmful consequences of mutations or mistranslations of encoded proteins.
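
The graph construction described in the abstract can be sketched in a few lines. The letters `X` and `Y` stand in for a hypothetical pair of non-canonical bases, and the function names are illustrative; the authors' selection procedure for codon sets is not reproduced.

```python
from itertools import product


def codon_graph(alphabet):
    """Nodes are all triplets; edges join codons differing at exactly one position."""
    nodes = ["".join(c) for c in product(alphabet, repeat=3)]
    edges = set()
    for codon in nodes:
        for pos in range(3):
            for base in alphabet:
                if base != codon[pos]:
                    edges.add(frozenset((codon, codon[:pos] + base + codon[pos + 1:])))
    return nodes, edges


def induced_edges(edges, subset):
    """Edges of the subgraph induced by `subset` (both endpoints inside it)."""
    subset = set(subset)
    return {edge for edge in edges if edge <= subset}


# "X" and "Y" are placeholders for one pair of non-canonical bases.
nodes, edges = codon_graph("ACGTXY")
canonical = ["".join(c) for c in product("ACGT", repeat=3)]
canonical_edges = induced_edges(edges, canonical)

print(len(nodes), "codons,", len(edges), "single-mutation edges")
print(len(canonical), "canonical codons induce", len(canonical_edges), "edges")
```

In the six-letter graph every codon has 3 × 5 = 15 neighbors, while inside the canonical subgraph each codon keeps only its 3 × 3 = 9 canonical neighbors; the remaining edges are the routes by which a point mutation moves between canonical and newly added codons.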

Evolution of the standard genetic code

10.1101/2020.02.20.958546 ◽

2020 ◽

Author(s):

Michael Yarus

Keyword(s):

Genetic Code ◽

Coding System ◽

Standard Genetic Code ◽

Evolutionary Mechanisms ◽

Random Code ◽

Chemical Order ◽

Single Origin ◽

Unique Event ◽

Evolutionary Paths ◽

Universal Standard

Abstract: A near-universal Standard Genetic Code (SGC) implies a single origin for Earthly life. To study this unique event, I compute paths to the SGC, comparing different plausible histories. Notably, SGC-like coding emerges from traditional evolutionary mechanisms, and a superior path can be identified. To objectively measure evolution, progress values from 0 (random coding) to 1 (SGC-like) are defined: these measure fractions of random-code-to-SGC distance. Progress types are spacing/distance/delta Polar Requirement, detecting space between identical assignments/mutational distance to the SGC/chemical order, respectively. A coding system was based on known RNAs performing aminoacyl-RNA synthetase reactions. Acceptor RNAs exhibit SGC-like wobble; alternatively, non-wobbling triplets uniquely encode 20 amino acids/start/stop. Triplets acquire 22 functions by stereochemistry, selection, coevolution, or randomly. Assignments also propagate to an assigned triplet’s neighborhood via single mutations, but can also decay. Futile evolutionary paths are plentiful due to the vast code universe. Thus SGC evolution is critically sensitive to disorder from random assignments. Evolution also inevitably slows near coding completion. Coding likely avoided these difficulties, and two suitable paths are compared. In late wobble, a majority of non-wobble assignments are made before wobble is adopted. In continuous wobble, a uniquely advantageous early intermediate supplies the gateway to an ordered SGC. Revised coding evolution (limited randomness, late wobble, concentration on amino acid encoding, chemically conservative coevolution with a chemically-ordered elite) produces varied full codes with excellent joint progress values. A population of only 600 independent coding tables includes SGC-like members; a Bayesian path toward more accurate SGC evolution is available.
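
A progress value of this kind can be read as a linear rescaling of a raw metric between its random-code and SGC values. The sketch below assumes exactly that reading; the function name `progress` and the example numbers are hypothetical, and the paper's specific spacing, distance, and Polar Requirement metrics are not computed here.

```python
def progress(value, random_value, sgc_value):
    """Fraction of the random-code-to-SGC distance covered by `value`.

    Returns 0.0 for a code that scores like random coding and 1.0 for one
    that scores like the SGC on the chosen raw metric.
    """
    if sgc_value == random_value:
        raise ValueError("metric does not distinguish random codes from the SGC")
    return (value - random_value) / (sgc_value - random_value)


# Made-up raw scores for illustration: lower is better on this imaginary metric.
random_score, sgc_score = 40.0, 10.0
for raw in (40.0, 25.0, 10.0):
    print(raw, "->", progress(raw, random_score, sgc_score))
```

Because the rescaling divides by the signed gap, it works whether the SGC scores higher or lower than random codes on the raw metric.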

Comprehensive Custom NGS Panel Validation for the Improvement of the Stratification of B-Acute Lymphoblastic Leukemia Patients

Journal of Personalized Medicine ◽

10.3390/jpm10030137 ◽

2020 ◽

Vol 10 (3) ◽

pp. 137

Author(s):

Adrián Montaño ◽

Jesús Hernández-Sánchez ◽

Maribel Forero-Castro ◽

María Matorra-Miguel ◽

Eva Lumbreras ◽

...

Keyword(s):

Acute Lymphoblastic Leukemia ◽

Wide Spectrum ◽

Lymphoblastic Leukemia ◽

Point Mutations ◽

Genetic Alterations ◽

Copy Number Variations ◽

Single Step ◽

Fusion Genes ◽

Single Nucleotide Variants ◽

Single Nucleotide

Background: B-acute lymphoblastic leukemia (B-ALL) is a hematological neoplasm of the stem lymphoid cell of the B lineage, characterized by the presence of genetic alterations closely related to the course of the disease. The number of alterations identified in these patients grows as studies of the disease progress, but in clinical practice, the conventional techniques frequently used are only capable of detecting the most common alterations. However, techniques, such as next-generation sequencing (NGS), are being implemented to detect a wide spectrum of new alterations that also include point mutations. Methods: In this study, we designed and validated a comprehensive custom NGS panel to detect the main genetic alterations present in the disease in a single step. For this purpose, 75 B-ALL diagnosis samples from patients previously characterized by standard-of-care diagnostic techniques were sequenced. Results: The use of the custom NGS panel allowed the correct detection of the main genetic alterations present in B-ALL patients, including the presence of an aneuploid clone in 14 of the samples and some of the recurrent fusion genes in 35 of the samples. The panel was also able to successfully detect a number of secondary alterations, such as single nucleotide variants (SNVs) and copy number variations (CNVs) in 66 and 46 of the samples analyzed, respectively, allowing for further refinement of the stratification of patients. The custom NGS panel could also detect alterations with a high level of sensitivity and reproducibility when the findings obtained by NGS were compared with those obtained from other conventional techniques. Conclusions: The use of this custom NGS panel allows us to quickly and efficiently detect the main genetic alterations present in B-ALL patients in a single assay (SNVs and insertions/deletions (INDELs), recurrent fusion genes, CNVs, aneuploidies, and single nucleotide polymorphisms (SNPs) associated with pharmacogenetics). The application of this panel would thus allow us to speed up and simplify the molecular diagnosis of patients, helping patient stratification and management.

Next-generation forward genetic screens: using simulated data to improve the design of mapping-by-sequencing experiments in Arabidopsis

Nucleic Acids Research ◽

10.1093/nar/gkz806 ◽

2019 ◽

Vol 47 (21) ◽

pp. e140-e140

Author(s):

David Wilson-Sánchez ◽

Samuel Daniel Lup ◽

Raquel Sarmiento-Mañús ◽

María Rosa Ponce ◽

José Luis Micol

Keyword(s):

Point Mutations ◽

Sequencing Depth ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Genetic Screens ◽

Induced Mutations ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Forward Genetic Screens ◽

Mapping By Sequencing

Abstract: Forward genetic screens have successfully identified many genes and continue to be powerful tools for dissecting biological processes in Arabidopsis and other model species. Next-generation sequencing technologies have revolutionized the time-consuming process of identifying the mutations that cause a phenotype of interest. However, due to the cost of such mapping-by-sequencing experiments, special attention should be paid to experimental design and technical decisions so that the read data allow the desired mutation to be mapped. Here, we simulated different mapping-by-sequencing scenarios. We first evaluated which short-read technology was best suited for analyzing gene-rich genomic regions in Arabidopsis and determined the minimum sequencing depth required to confidently call single nucleotide variants. We also designed ways to discriminate mutagenesis-induced mutations from background single nucleotide polymorphisms in mutants isolated in Arabidopsis non-reference lines. In addition, we simulated bulked segregant mapping populations for identifying point mutations and monitored how the size of the mapping population and the sequencing depth affect mapping precision. Finally, we provide the computational basis of a protocol that we already used to map T-DNA insertions with paired-end Illumina-like reads, using very low sequencing depths and pooling several mutants together; this approach can also be used with single-end reads as well as to map any other insertional mutagen. All these simulations proved useful for designing experiments that allowed us to map several mutations in Arabidopsis.
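
The question of minimum sequencing depth can be illustrated with a simple analytic model rather than the read-level simulations the authors ran. The sketch assumes that per-site coverage follows a Poisson distribution around the mean depth and that a variant caller needs at least `k` reads at a site; both assumptions, and the function names, are hypothetical.

```python
import math


def prob_at_least(k, mean_depth):
    """P(X >= k) for per-site coverage X ~ Poisson(mean_depth)."""
    below = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - below


def min_mean_depth(k, target, max_depth=200):
    """Smallest integer mean depth at which a site has >= k reads with probability >= target."""
    for depth in range(1, max_depth + 1):
        if prob_at_least(k, depth) >= target:
            return depth
    return None  # target not reachable within max_depth


# Assumed caller requirement: at least 5 reads covering the site.
for target in (0.90, 0.95, 0.99):
    print(f"target {target:.2f}: mean depth >= {min_mean_depth(5, target)}")
```

Real coverage is usually more dispersed than a Poisson model predicts, so a calculation like this gives a lower bound on the depth to budget for rather than a recommendation.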

Reproducibility of SNV-calling in multiple sequencing runs from single tumors

PeerJ ◽

10.7717/peerj.1508 ◽

2016 ◽

Vol 4 ◽

pp. e1508 ◽

Cited By ~ 3

Author(s):

Dakota Z. Derryberry ◽

Matthew C. Cowperthwaite ◽

Claus O. Wilke

Keyword(s):

Glioblastoma Multiforme ◽

Large Fraction ◽

Point Mutations ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Single Nucleotide Variants ◽

Specific Point ◽

Single Nucleotide ◽

Cancer Genome Atlas ◽

Genome Atlas

We examined 55 technical sequencing replicates of Glioblastoma multiforme (GBM) tumors from The Cancer Genome Atlas (TCGA) to ascertain the degree of repeatability in calling single-nucleotide variants (SNVs). We used the same mutation-calling pipeline on all pairs of samples, and we measured the extent of the overlap between two replicates; that is, how many specific point mutations were found in both replicates. We further tested whether additional filtering increased or decreased the size of the overlap. We found that about half of the putative mutations identified in one sequencing run of a given sample were also identified in the second, and that this percentage remained steady throughout orders of magnitude of variation in the total number of mutations identified (from 23 to 10,966). We further found that using filtering after SNV-calling removed the overlap completely. We concluded that there is variation in the frequency of mutations in GBMs, and that while some filtering approaches preferentially removed putative mutations found in only one replicate, others removed a large fraction of putative mutations found in both.
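
The overlap measurement described above amounts to a set intersection over variant calls. The following sketch assumes each SNV is identified by chromosome, position, reference base, and alternate base; the function name `overlap_fraction` and the toy calls are hypothetical and are not TCGA data.

```python
def overlap_fraction(calls_a, calls_b):
    """Fraction of variants called in replicate A that were also called in replicate B."""
    a, b = set(calls_a), set(calls_b)
    if not a:
        return 0.0
    return len(a & b) / len(a)


# Toy variant calls as (chromosome, position, ref, alt) tuples.
replicate_1 = [("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"),
               ("chr7", 42, "G", "A"), ("chrX", 9, "T", "C")]
replicate_2 = [("chr1", 1000, "A", "G"), ("chr7", 42, "G", "A"),
               ("chr9", 77, "C", "G")]

print(overlap_fraction(replicate_1, replicate_2))  # share of replicate 1 confirmed by 2
print(overlap_fraction(replicate_2, replicate_1))  # share of replicate 2 confirmed by 1
```

The measure is asymmetric, so for a pair of replicates with very different call counts the two directions can disagree substantially, and reporting both is more informative.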

Cytosine but not adenine base editor generates mutations in mice

10.1101/731927 ◽

2019 ◽

Cited By ~ 1

Author(s):

Hye Kyung Lee ◽

Harold E. Smith ◽

Chengyu Liu ◽

Michaela Willi ◽

Lothar Hennighausen

Keyword(s):

Point Mutations ◽

Living Cells ◽

Mouse Embryos ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Base Editing ◽

Wide Range ◽

Family Based ◽

Correct Point ◽

Adenine Base

Abstract: Deaminase base editing has emerged as a tool to install or correct point mutations in the genomes of living cells in a wide range of organisms, and its ultimate success therapeutically depends on its accuracy. Here we have investigated the fidelity of cytosine base editor 4 (BE4) and adenine base editor (ABE) in mouse embryos using unbiased whole genome sequencing of a family-based trio cohort. We demonstrate that BE4-edited mice carry an excess of single-nucleotide variants and deletions compared to ABE-edited mice and controls.

Evolution of the Standard Genetic Code

Journal of Molecular Evolution ◽

10.1007/s00239-020-09983-9 ◽

2021 ◽

Vol 89 (1-2) ◽

pp. 19-44 ◽

Cited By ~ 1

Author(s):

Michael Yarus

Keyword(s):

Genetic Code ◽

Coding System ◽

Standard Genetic Code ◽

Evolutionary Mechanisms ◽

Random Code ◽

Chemical Order ◽

Single Origin ◽

Unique Event ◽

Evolutionary Paths ◽

Universal Standard

Abstract: A near-universal Standard Genetic Code (SGC) implies a single origin for present Earth life. To study this unique event, I compute paths to the SGC, comparing different plausible histories. Notably, SGC-like coding emerges from traditional evolutionary mechanisms, and a superior route can be identified. To objectively measure evolution, progress values from 0 (random coding) to 1 (SGC-like) are defined: these measure fractions of random-code-to-SGC distance. Progress types are spacing/distance/delta Polar Requirement, detecting space between identical assignments/mutational distance to the SGC/chemical order, respectively. The coding system is based on selected RNAs performing aminoacyl-RNA synthetase reactions. Acceptor RNAs exhibit SGC-like Crick wobble; alternatively, non-wobbling triplets uniquely encode 20 amino acids/start/stop. Triplets acquire 22 functions by stereochemistry, selection, coevolution, or at random. Assignments also propagate to an assigned triplet’s neighborhood via single mutations, but can also decay. A vast code universe makes futile evolutionary paths plentiful. Thus, SGC evolution is critically sensitive to disorder from random assignments. Evolution also inevitably slows near coding completion. The SGC likely avoided these difficulties, and two suitable paths are compared. In late wobble, a majority of non-wobble assignments are made before wobble is adopted. In continuous wobble, a uniquely advantageous early intermediate yields an ordered SGC. Revised coding evolution (limited randomness, late wobble, concentration on amino acid encoding, chemically conservative coevolution with a chemically ordered elite) produces varied full codes with excellent joint progress values. A population of only 600 independent coding tables includes SGC-like members; a Bayesian path toward more accurate SGC evolution is available.
