A computational screen for alternative genetic codes in over 250,000 genomes

2021 ◽  
Author(s):  
Yekaterina Shulgina ◽  
Sean R. Eddy

The genetic code has been proposed to be a "frozen accident", but the discovery of alternative genetic codes over the past four decades has shown that it can evolve to some degree. Since most examples were found anecdotally, it is difficult to draw general conclusions about the evolutionary trajectories of codon reassignment and why some codons are affected more frequently. To fill in the diversity of genetic codes, we developed Codetta, a computational method to predict the amino acid decoding of each codon from nucleotide sequence data. We surveyed the genetic code usage of over 250,000 bacterial and archaeal genome sequences in GenBank and discovered five new reassignments of arginine codons (AGG, CGA, and CGG), representing the first sense codon changes in bacteria. In a clade of uncultivated Bacilli, the reassignment of AGG to become the dominant methionine codon likely evolved by a change in the amino acid charging of an arginine tRNA. The reassignments of CGA and/or CGG were found in genomes with low GC content, an evolutionary force which likely helped drive these codons to low frequency and enable their reassignment.
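To make concrete what a codon reassignment means for translation, here is a minimal Python sketch. It is not Codetta and does not infer anything from data; it only applies the standard genetic code with an optional override table. The function name `translate`, the `reassignments` parameter, and the toy sequence are assumptions introduced for illustration. The AGG-to-methionine override mirrors the reassignment described in the abstract.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, reassignments=None):
    """Translate an in-frame DNA sequence, applying optional codon overrides.

    `reassignments` is a hypothetical mapping such as {"AGG": "M"}.
    Codons with ambiguous bases are reported as "X"; trailing bases that
    do not fill a codon are ignored.
    """
    code = dict(STANDARD_CODE)
    code.update(reassignments or {})
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(code.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# Toy sequence (invented): ATG AGG CGA TAA
seq = "ATGAGGCGATAA"
standard = translate(seq)               # AGG read as arginine
recoded = translate(seq, {"AGG": "M"})  # AGG read as methionine
print(standard, recoded)
```

Under the standard code the AGG codon contributes an arginine; with the override it contributes a methionine, while every other codon is read as before.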

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Yekaterina Shulgina ◽  
Sean R Eddy



1999 ◽  
Vol 26 (5) ◽  
pp. 495 ◽  
Author(s):  
Kazumasa Yoshida ◽  
Kiyoshi Tazaki

Three genomic clones (Rplec2, Rplec5 and Rplec6) and a cDNA clone (LECRPA4) that encoded lectin or lectin-related polypeptides were isolated from Robinia pseudoacacia L. A comparison of the nucleotide sequences of Rplec2 and a previously reported cDNA for the subunit indicated that Rplec2 encoded the 29 kDa subunit of the inner-bark lectin RPbAI. Rplec5 encoded a polypeptide whose deduced amino acid sequence was 96.1% identical to that of a subunit of seed lectin. The amino acid sequence deduced from the open reading frame of Rplec6 showed 61.1% identity to that encoded by Rplec5. LECRPA4 was isolated from an inner-bark cDNA library and appeared to encode the 26 kDa subunit of inner-bark lectin RPbAII. The expression patterns of the various genes in tissues were examined by the reverse transcriptase-polymerase chain reaction (RT-PCR) with appropriate primers. Rplec2 transcripts were detected in the inner bark and roots. Rplec5 transcripts were detected in the inner bark, seeds and roots. No Rplec6 transcripts were detected in any of the tissues examined. LECRPA4 transcripts were found in leaves and in the inner bark. The level of expression of Rplec2 in the inner bark appeared to be similar in samples collected in different years and from different trees, whereas levels of expression of Rplec5 and LECRPA4 varied. These results suggest the differential regulation of expression of members of the lectin gene family in tissues of R. pseudoacacia. The nucleotide sequence data reported herein will appear in the DDBJ, EMBL and GenBank Nucleotide Sequence Databases under the accession numbers AB012632 (Rplec2), AB012633 (Rplec5), AB012634 (Rplec6) and AB012635 (LECRPA4).


2012 ◽  
Vol 9 (3) ◽  
pp. 18-32 ◽  
Author(s):  
David Reboiro-Jato ◽  
Miguel Reboiro-Jato ◽  
Florentino Fdez-Riverola ◽  
Cristina P. Vieira ◽  
Nuno A. Fonseca ◽  
...  

Summary: Maximum-likelihood methods based on models of codon substitution have been widely used to infer positively selected amino acid sites that are responsible for adaptive changes. Nevertheless, in order to use such an approach, software applications are required to align protein and DNA sequences, infer a phylogenetic tree and run the maximum-likelihood models. Considerable effort therefore goes into preparing input files for the different software applications and into analyzing the output of each step. In this paper we present the ADOPS (Automatic Detection Of Positively Selected Sites) software. It was developed with the goal of providing an automatic and flexible tool for detecting positively selected sites given a set of unaligned nucleotide sequence data. An example of the usefulness of such a pipeline is given by showing, under different conditions, positively selected amino acid sites in a set of 54 Coffea putative S-RNase sequences. ADOPS software is freely available and can be downloaded from http://sing.ei.uvigo.es/ADOPS.


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
J. A. Tenreiro Machado ◽  
António C. Costa ◽  
Maria Dulce Quelhas

Proteins are biochemical entities consisting of one or more blocks typically folded in a 3D pattern. Each block (a polypeptide) is a single linear sequence of amino acids that are biochemically bonded together. The amino acid sequence in a protein is defined by the sequence of a gene or several genes encoded in the DNA-based genetic code. This genetic code typically uses twenty amino acids, but in certain organisms the genetic code can also include two other amino acids. After linking the amino acids during protein synthesis, each amino acid becomes a residue in a protein, which is then chemically modified, ultimately changing and defining the protein function. In this study, the authors analyze the amino acid sequence using alignment-free methods, aiming to identify structural patterns in sets of proteins and in the proteome, without any other previous assumptions. The paper starts by analyzing amino acid sequence data by means of histograms using fixed length amino acid words (tuples). After creating the initial relative frequency histograms, they are transformed and processed in order to generate quantitative results for information extraction and graphical visualization. Selected samples from two reference datasets are used, and results reveal that the proposed method is able to generate relevant outputs in accordance with current scientific knowledge in domains like protein sequence/proteome analysis.
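The first step described above, a relative frequency histogram of fixed-length amino acid words, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `word_frequencies`, the use of overlapping words, and the toy sequence are not taken from the paper, and the later transformation and visualization steps are not shown.

```python
from collections import Counter


def word_frequencies(protein, k=2):
    """Relative frequencies of overlapping length-k words (tuples).

    Assumption: words overlap and are read with a sliding window of
    step 1; the paper may define the windows differently.
    """
    words = [protein[i:i + k] for i in range(len(protein) - k + 1)]
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


# Toy sequence (invented) with six overlapping 2-letter words:
# MK, KK, KL, LL, LK, KK
freqs = word_frequencies("MKKLLKK", k=2)
for word, freq in sorted(freqs.items()):
    print(word, round(freq, 3))
```

Because each histogram sums to one, sequences of different lengths can be compared directly without an alignment.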


2019 ◽  
Vol 464 ◽  
pp. 21-32 ◽  
Author(s):  
Paweł Błażej ◽  
Małgorzata Wnętrzak ◽  
Dorota Mackiewicz ◽  
Przemysław Gagat ◽  
Paweł Mackiewicz

2020 ◽  
Vol 32 (1) ◽  
pp. 9-10
Author(s):  
Stephen John Knabel ◽  
Istvan Hargittai

Abstract: We propose to keep the term “genetic code” to describe the nucleotide sequence in DNA and RNA and use the term “genetic cipher” to describe the key for decoding the genetic codes of DNA and RNA into the amino acid sequences of proteins.


2021 ◽  
Author(s):  
Adair L Borges ◽  
Yue Clare Lou ◽  
Rohan Sachdeva ◽  
Basem Al-Shayeb ◽  
Alexander L. Jaffe ◽  
...  

The genetic code is a highly conserved feature of life. However, some alternative genetic codes use reassigned stop codons to code for amino acids. Here, we survey stop codon recoding across bacteriophages (phages) in human and animal gut microbiomes. We find that stop codon recoding has evolved in diverse clades of phages predicted to infect hosts that use the standard code. We provide evidence for an evolutionary path towards recoding involving reduction in the frequency of TGA and TAG stop codons due to low GC content, followed by acquisition of suppressor tRNAs and the emergence of recoded stop codons in structural and lysis genes. In analyses of two distinct lineages of recoded virulent phages, we find that lysis-related genes are uniquely biased towards use of recoded stop codons. This convergence supports the inference that stop codon recoding is a strategy to regulate the expression of late stage genes and control lysis timing. Interestingly, we identified prophages with recoded stop codons integrated into genomes of bacteria that use standard code, and hypothesize that recoding may control the lytic-lysogenic switch. Alternative coding has evolved many times, often in closely related lineages, indicating that genetic code is plastic in bacteriophages and adaptive recoding can occur over very short evolutionary timescales.
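The two quantities this abstract relates, GC content and the frequency of TGA and TAG codons, are simple to compute for a single coding sequence. The Python sketch below is illustrative only: the helper names `gc_content` and `codon_counts` and the toy sequence are assumptions, and it does not reproduce the survey's actual pipeline.

```python
from collections import Counter


def gc_content(dna):
    """Fraction of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna) if dna else 0.0


def codon_counts(cds):
    """Count in-frame codons, ignoring any incomplete trailing codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Toy coding sequence (invented): ATG AAA TGA TTT TAG TAA
cds = "ATGAAATGATTTTAGTAA"
counts = codon_counts(cds)
gc = gc_content(cds)
print(f"GC content: {gc:.3f}")
print({stop: counts[stop] for stop in ("TGA", "TAG", "TAA")})
```

In a genome that uses the standard code, in-frame TGA or TAG codons before the final codon would end translation early; in a recoded genome they would be read as amino acids.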


1998 ◽  
Vol 64 (7) ◽  
pp. 2473-2478 ◽  
Author(s):  
Ashraf A. Khan ◽  
Eungbin Kim ◽  
Carl E. Cerniglia

Abstract: Aeromonas trota AK2, which was derived from ATCC 49659 and produces the extracellular pore-forming hemolytic toxin aerolysin, was mutagenized with the transposon mini-Tn5Km1 to generate a hemolysin-deficient mutant, designated strain AK253. Southern blotting data indicated that an 8.7-kb NotI fragment of the genomic DNA of strain AK253 contained the kanamycin resistance gene of mini-Tn5Km1. The 8.7-kb NotI DNA fragment was cloned into the vector pGEM5Zf(−) by selecting for kanamycin resistance, and the resultant clone, pAK71, showed aerolysin activity in Escherichia coli JM109. The nucleotide sequence of the aerA gene, located on the 1.8-kb ApaI-EcoRI fragment, was determined to consist of 1,479 bp and to have an ATG initiation codon and a TAA termination codon. An in vitro coupled transcription-translation analysis of the 1.8-kb region suggested that the aerA gene codes for a 54-kDa protein, in agreement with nucleotide sequence data. The deduced amino acid sequence of the aerA gene product of A. trota exhibited 99% homology with the amino acid sequence of the aerA product of Aeromonas sobria AB3 and 57% homology with the amino acid sequences of the products of the aerA genes of Aeromonas salmonicida 17-2 and A. sobria 33.
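The agreement between the 1,479-bp gene length and the 54-kDa protein can be checked with back-of-the-envelope arithmetic. The Python sketch below rests on two assumptions not stated in the abstract: that the 1,479 bp include the TAA termination codon, and that an average residue weighs roughly 110 Da (a common rule of thumb, not a measured value for aerolysin). The function name is hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue, in daltons (assumption).
AVG_RESIDUE_MASS_DA = 110.0


def estimate_protein_mass_kda(orf_length_bp, includes_stop=True):
    """Estimate residue count and mass (kDa) from an open reading frame length."""
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_length_bp // 3
    # The stop codon does not encode an amino acid.
    residues = codons - 1 if includes_stop else codons
    return residues, residues * AVG_RESIDUE_MASS_DA / 1000.0


residues, mass_kda = estimate_protein_mass_kda(1479)
print(residues, round(mass_kda, 1))
```

With these assumptions the gene encodes 492 residues and an estimated mass of about 54 kDa, consistent with the transcription-translation result reported above.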

