Interdependence, Reflexivity, Fidelity, Impedance Matching, and the Evolution of Genetic Coding

Mapping Intimacies ◽

10.1101/139139 ◽

2017 ◽

Cited By ~ 5

Author(s):

Charles W. Carter ◽

Peter Wills

Keyword(s):

De Novo ◽

Rna World ◽

Impedance Matching ◽

Error Rates ◽

Ancestral Gene ◽

Trna Synthetases ◽

Translation Error ◽

Genetic Coding ◽

Necessary And Sufficient ◽

Genetic Complementarity

ABSTRACT Genetic coding is generally thought to have required ribozymes whose functions were taken over by polypeptide aminoacyl-tRNA synthetases (aaRS). Two discoveries about aaRS and their tRNA substrates now furnish a unifying rationale for the opposite conclusion: that the key processes of the Central Dogma of molecular biology emerged simultaneously and naturally from simple origins in a peptide•RNA partnership, eliminating the epistemological need for a prior RNA world. First, the two aaRS classes likely arose from opposite strands of the same ancestral gene, implying a simple genetic alphabet. Inversion symmetries in aaRS structural biology arising from genetic complementarity would have stabilized the initial and subsequent differentiation of coding specificities and hence rapidly promoted diversity in the proteome. Second, amino acid physical chemistry maps onto tRNA identity elements, establishing reflexivity in protein aaRS. Bootstrapping of increasingly detailed coding is thus intrinsic to polypeptide aaRS, but impossible in an RNA world. These notions underline the following concepts that contradict gradual replacement of ribozymal aaRS by polypeptide aaRS: (i) any set of aaRS must be interdependent; (ii) reflexivity intrinsic to polypeptide aaRS production dynamics promotes bootstrapping; (iii) takeover of RNA-catalyzed aminoacylation by enzymes will necessarily degrade specificity; (iv) the Central Dogma’s emergence is most probable when replication and translation error rates remain comparable. These characteristics are necessary and sufficient for the essentially de novo emergence of a coupled gene-replicase-translatase system of genetic coding that would have continuously preserved the functional meaning of genetically encoded protein genes whose phylogenetic relationships match those observed today.

Class I and II aminoacyl-tRNA synthetase tRNA groove discrimination created the first synthetase•tRNA cognate pairs and was therefore essential to the origin of genetic coding

10.1101/593269 ◽

2019 ◽

Author(s):

Charles W. Carter ◽

Peter R. Wills

Keyword(s):

Trna Synthetase ◽

Class I ◽

Stem Loop ◽

Class Division ◽

Trna Synthetases ◽

Regression Methods ◽

Genetic Coding ◽

Base Sequences ◽

Coding Rules ◽

Necessary And Sufficient

ABSTRACT The genetic code likely arose when a bidirectional gene began to produce ancestral aminoacyl-tRNA synthetases (aaRS) capable of distinguishing between two distinct sets of amino acids. The synthetase Class division therefore necessarily implies a mechanism by which the two ancestral synthetases could also discriminate between two different kinds of tRNA substrates. We used regression methods to uncover the possible patterns of base sequences capable of such discrimination and find that they appear to be related to thermodynamic differences in the relative stabilities of a hairpin necessary for recognition of tRNA substrates by Class I aaRS. The thermodynamic differences appear to be exploited by secondary structural differences between models for the ancestral aaRS called synthetase Urzymes and reinforced by packing of aromatic amino acid side chains against the nonpolar face of the ribose of A76 if and only if the tRNA CCA sequence forms a hairpin. The patterns of bases 1, 2 and 73 and stabilization of the hairpin by structural complementarity with Class I, but not Class II, aaRS Urzymes appear to be necessary and sufficient to have enabled the generation of the first two aaRS•tRNA cognate pairs, and the launch of a rudimentary binary genetic coding related recognizably to contemporary cognate pairs. As a consequence, it seems likely that non-random aminoacylation of tRNAs preceded the advent of the tRNA anticodon stem-loop. Consistent with this suggestion, coding rules in the acceptor-stem bases also reveal a palimpsest of the codon•anticodon interaction, as previously proposed.

Optimizing de novo genome assembly from PCR-amplified metagenomes

PeerJ ◽

10.7717/peerj.6902 ◽

2019 ◽

Vol 7 ◽

pp. e6902 ◽

Cited By ~ 9

Author(s):

Simon Roux ◽

Gareth Trubl ◽

Danielle Goudeau ◽

Nandita Nath ◽

Estelle Couradeau ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Pcr Amplification ◽

Error Rates ◽

De Novo Genome Assembly ◽

Low Input ◽

Assembly Algorithm ◽

Coverage Bias ◽

Size Number ◽

Assembly Pipeline

Background. Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical to provide ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest these typically yield smaller and more fragmented assemblies than regular metagenomes. Methods. Here we evaluate de novo assembly methods of 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10 kb) and error rates to evaluate which are the best suited for PCR-amplified metagenomes. Results. Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets versus unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥ 10 kb by 10 to 100-fold for low input metagenomes. Conclusions. PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to an improved de novo genome assembly from PCR-amplified datasets, and enables a better genome recovery from low input metagenomes.
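
The assembly-size metric named in this abstract (number of bp in contigs ≥ 10 kb) and the idea of read deduplication can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, and treating "duplicate" as "identical sequence" is an assumption made only for illustration.

```python
from typing import Iterable, List


def assembly_size(contig_lengths: Iterable[int], min_len: int = 10_000) -> int:
    """Total bp in contigs at least `min_len` long (10 kb, as in the abstract)."""
    return sum(length for length in contig_lengths if length >= min_len)


def deduplicate_reads(reads: Iterable[str]) -> List[str]:
    """Drop exact duplicate read sequences, keeping first occurrences in order.

    Assumption: duplicates are reads with identical sequence. The criteria
    used in the published pipeline may differ.
    """
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


if __name__ == "__main__":
    # Only the 10 kb and 25 kb contigs count toward the metric.
    print(assembly_size([5_000, 10_000, 25_000]))
    print(deduplicate_reads(["ACGT", "ACGT", "TTGA"]))
```

Comparing `assembly_size` across pipelines run on the same library gives the kind of fold-improvement figure the abstract reports.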

Optimizing de novo genome assembly from PCR-amplified metagenomes

10.7287/peerj.preprints.27453 ◽

2018 ◽

Author(s):

Simon Roux ◽

Gareth Trubl ◽

Danielle Goudeau ◽

Nandita Nath ◽

Estelle Couradeau ◽

...

Keyword(s):

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Pcr Amplification ◽

Error Rates ◽

De Novo Genome Assembly ◽

Low Input ◽

Assembly Algorithm ◽

Coverage Bias ◽

Assembly Pipeline

Background. Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical to provide ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest these typically yield smaller and more fragmented assemblies than regular metagenomes. Methods. Here we evaluate de novo assembly methods of 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10kb) and error rates to evaluate which are the best suited for PCR-amplified metagenomes. Results. Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets versus unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. 
We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥ 10kb by 10 to 100-fold for low input metagenomes. Conclusions. PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to an improved de novo genome assembly from PCR-amplified datasets, and enables a better genome recovery from low input metagenomes.

Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads

Nature Biotechnology ◽

10.1038/s41587-020-0719-5 ◽

2020 ◽

Author(s):

David Porubsky ◽

Peter Ebert ◽

Peter A. Audano ◽

Mitchell R. Vollger ◽

...

Keyword(s):

Single Cell ◽

Genome Assembly ◽

De Novo ◽

Error Rates ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

De Novo Genome Assembly ◽

Parental Data ◽

Human Genome Assembly ◽

Long Read

Abstract Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
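
The two summary statistics quoted in this abstract, quality value and contig N50, can be made concrete with a short Python sketch. It assumes the usual Phred scaling for quality values (the abstract does not spell out its definition), and the function names are hypothetical.

```python
from typing import Sequence


def qv_to_error_probability(qv: float) -> float:
    """Per-base error probability for a Phred-scaled quality value (assumed scaling)."""
    return 10 ** (-qv / 10)


def n50(contig_lengths: Sequence[int]) -> int:
    """Largest length L such that contigs of length >= L hold at least half the bases."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    half_total = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for a non-empty input")


if __name__ == "__main__":
    # Under Phred scaling, a quality value of 40 is about one error per 10,000 bases.
    print(qv_to_error_probability(40))
    print(n50([2, 3, 4, 5, 6]))
```

Under that assumption, "quality value > 40" corresponds to fewer than roughly one error per 10,000 bases.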

High-Fidelity Translation of Recombinant Human Hemoglobin in Escherichia coli

Applied and Environmental Microbiology ◽

10.1128/aem.64.5.1589-1593.1998 ◽

1998 ◽

Vol 64 (5) ◽

pp. 1589-1593 ◽

Cited By ~ 11

Author(s):

Michael J. Weickert ◽

Izydor Apostol

Keyword(s):

Escherichia Coli ◽

Error Rates ◽

High Fidelity ◽

Human Hemoglobin ◽

Heterologous Proteins ◽

Expression Systems ◽

High Level Expression ◽

E Coli ◽

Translation Error ◽

High Level

ABSTRACT Coexpression of di-α-globin and β-globin in Escherichia coli in the presence of exogenous heme yielded high levels of soluble, functional recombinant human hemoglobin (rHb1.1). High-level expression of rHb1.1 provides a good model for measuring mistranslation in heterologous proteins. rHb1.1 does not contain isoleucine; therefore, any isoleucine present could be attributed to mistranslation, most likely mistranslation of one or more of the 200 codons that differ from an isoleucine codon by 1 bp. Sensitive amino acid analysis of highly purified rHb1.1 typically revealed ≤0.2 mol of isoleucine per mol of hemoglobin. This corresponds to a translation error rate of ≤0.001, which is not different from typical translation error rates found for E. coli proteins. Two different expression systems that resulted in accumulation of globin proteins to levels equivalent to ∼20% of the level of E. coli soluble proteins also resulted in equivalent translational fidelity.
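
The arithmetic linking the two figures in this abstract can be written out in a few lines of Python. This is a sketch under one assumption: that the error rate is the measured isoleucine per hemoglobin molecule divided by the number of codons susceptible to a single-base misreading, which is how the quoted numbers appear to relate. The function name is hypothetical.

```python
def translation_error_rate(ile_per_protein: float, susceptible_codons: int) -> float:
    """Mistranslation events per susceptible codon.

    Assumption: every measured isoleucine comes from misreading one of the
    codons that differ from an isoleucine codon by a single base.
    """
    if susceptible_codons <= 0:
        raise ValueError("susceptible_codons must be positive")
    return ile_per_protein / susceptible_codons


if __name__ == "__main__":
    # Upper bound from the abstract: 0.2 mol Ile per mol hemoglobin, 200 codons.
    print(translation_error_rate(0.2, 200))
```

With the abstract's upper bound of 0.2 mol of isoleucine per mol of hemoglobin spread over 200 codons, the quotient is 0.2 / 200 = 0.001.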

Athena: Automated Tuning of k-mer based Genomic Error Correction Algorithms using Language Models

Scientific Reports ◽

10.1038/s41598-019-52196-4 ◽

2019 ◽

Vol 9 (1) ◽

Author(s):

Mustafa Abdallah ◽

Ashraf Mahgoub ◽

Hany Ahmed ◽

Somali Chaterji

Keyword(s):

Error Correction ◽

Language Processing ◽

De Novo ◽

Geometric Mean ◽

Language Modeling ◽

Error Rates ◽

Language Models ◽

Hill Climbing ◽

Strong Negative Correlation ◽

Best Value

Abstract The performance of most error-correction (EC) algorithms that operate on genomics reads is dependent on the proper choice of its configuration parameters, such as the value of k in k-mer based techniques. In this work, we target the problem of finding the best values of these configuration parameters to optimize error correction and consequently improve genome assembly. We perform this in an adaptive manner, adapted to different datasets and to EC tools, due to the observation that different configuration parameters are optimal for different datasets, i.e., from different platforms and species, and vary with the EC algorithm being applied. We use language modeling techniques from the Natural Language Processing (NLP) domain in our algorithmic suite, Athena, to automatically tune the performance-sensitive configuration parameters. Through the use of N-Gram and Recurrent Neural Network (RNN) language modeling, we validate the intuition that the EC performance can be computed quantitatively and efficiently using the “perplexity” metric, repurposed from NLP. After training the language model, we show that the perplexity metric calculated from a sample of the test (or production) data has a strong negative correlation with the quality of error correction of erroneous NGS reads. Therefore, we use the perplexity metric to guide a hill climbing-based search, converging toward the best configuration parameter value. Our approach is suitable for both de novo and comparative sequencing (resequencing), eliminating the need for a reference genome to serve as the ground truth. We find that Athena can automatically find the optimal value of k with a very high accuracy for 7 real datasets and using 3 different k-mer based EC algorithms, Lighter, Blue, and Racer. 
The inverse relation between the perplexity metric and alignment rate exists under all our tested conditions: for real and synthetic datasets, for all kinds of sequencing errors (insertion, deletion, and substitution), and for high and low error rates. The absolute value of that correlation is at least 73%. In our experiments, the best value of k found by Athena achieves an alignment rate within 0.53% of the oracle best value of k found through brute force searching (i.e., scanning through the entire range of k values). Athena’s selected value of k lies within the top-3 best k values using N-Gram models and the top-5 best k values using RNN models. With best parameter selection by Athena, the assembly quality (NG50) is improved by a Geometric Mean of 4.72X across the 7 real datasets.
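
The perplexity-guided hill climbing described in this abstract can be sketched as a greedy search over integer values of k. This is only an illustration of the general technique, not Athena's implementation: the step size, the stopping rule, and `toy_perplexity` (a hypothetical stand-in for scoring corrected reads with a trained language model) are all assumptions.

```python
from typing import Callable


def hill_climb_k(score: Callable[[int], float], k_start: int, k_min: int, k_max: int) -> int:
    """Greedy search for the k with the lowest score (lower perplexity is better).

    Moves to the better neighbour (k - 1 or k + 1) until neither improves.
    Like any hill climb, it can stop at a local optimum.
    """
    k = min(max(k_start, k_min), k_max)  # clamp the start into the allowed range
    best = score(k)
    while True:
        scored = {n: score(n) for n in (k - 1, k + 1) if k_min <= n <= k_max}
        if not scored:
            return k
        candidate = min(scored, key=scored.get)
        if scored[candidate] >= best:
            return k
        k, best = candidate, scored[candidate]


def toy_perplexity(k: int) -> float:
    """Hypothetical score with a single minimum at k = 21 (not real data)."""
    return (k - 21) ** 2 + 5.0


if __name__ == "__main__":
    print(hill_climb_k(toy_perplexity, k_start=15, k_min=11, k_max=31))
```

The point of such a search is that it evaluates far fewer k values than the brute-force scan the abstract uses as its oracle, and it needs no reference genome because the score comes from the reads themselves.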

Pairwise Multiple Comparisons in Repeated Measures Designs

Journal of Educational Statistics ◽

10.3102/10769986005003269 ◽

1980 ◽

Vol 5 (3) ◽

pp. 269-287 ◽

Cited By ~ 53

Author(s):

Scott E. Maxwell

Keyword(s):

Repeated Measures ◽

Mixed Model ◽

Multiple Comparisons ◽

Error Rates ◽

Type I ◽

Omnibus Test ◽

Mixed Model Approach ◽

Significant Difference ◽

Repeated Measures Designs ◽

Necessary And Sufficient

Five methods of performing pairwise multiple comparisons in repeated measures designs were investigated. Tukey's Wholly Significant Difference (WSD) test, recommended by most experimental design texts, requires that all differences between pairs of means have a common variance. However, this assumption is equivalent to the sphericity condition that is necessary and sufficient for the validity of the mixed-model approach to the omnibus test. Monte Carlo methods revealed that Tukey's WSD leads to an inflated alpha level when the sphericity assumption is not met. Consideration of both Type I and Type II error rates found in the simulated conditions for the five procedures suggests that a Bonferroni method utilizing a separate error term for each comparison should be employed.
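
One reading of the recommended procedure, a Bonferroni correction with a separate error term for each comparison, is a paired t statistic computed from each pair's own difference scores and judged against a per-comparison alpha. The Python sketch below follows that reading. It is an assumption about the method, not a reproduction of the paper's analysis, and the function name is hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List, Tuple


def pairwise_paired_t(
    data: Dict[str, List[float]], family_alpha: float = 0.05
) -> Tuple[float, Dict[Tuple[str, str], Tuple[float, int]]]:
    """Return the Bonferroni per-comparison alpha and, for each pair, (t, df).

    `data` maps each condition to scores listed in the same subject order.
    Raises ZeroDivisionError if a pair's difference scores are all identical.
    """
    pairs = list(combinations(data, 2))
    if not pairs:
        raise ValueError("need at least two conditions")
    per_comparison_alpha = family_alpha / len(pairs)
    results = {}
    for a, b in pairs:
        diffs = [x - y for x, y in zip(data[a], data[b])]
        n = len(diffs)
        # Separate error term: the SD of this pair's own difference scores,
        # rather than a pooled term that would rely on sphericity.
        se = stdev(diffs) / sqrt(n)
        results[(a, b)] = (mean(diffs) / se, n - 1)
    return per_comparison_alpha, results


if __name__ == "__main__":
    scores = {"A": [1, 2, 3, 4], "B": [2, 4, 5, 7], "C": [3, 3, 6, 6]}
    alpha, stats = pairwise_paired_t(scores)
    print(alpha)
    for pair, (t, df) in stats.items():
        print(pair, round(t, 3), df)
```

Turning each t statistic into a p-value needs the t distribution with the listed degrees of freedom, which is left to a statistics library here. Each p-value would then be compared with the per-comparison alpha.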

In Vivo Analysis of Cobinamide Salvaging in Rhodobacter sphaeroides Strain 2.4.1

Journal of Bacteriology ◽

10.1128/jb.00230-09 ◽

2009 ◽

Vol 191 (12) ◽

pp. 3842-3851 ◽

Cited By ~ 22

Author(s):

Michael J. Gray ◽

Jorge C. Escalante-Semerena

Keyword(s):

Rhodobacter Sphaeroides ◽

High Performance ◽

Biosynthetic Pathway ◽

De Novo ◽

Nutritional Analysis ◽

In Vivo Analysis ◽

Essential Enzyme ◽

Necessary And Sufficient ◽

Coenzyme B

ABSTRACT The genome of Rhodobacter sphaeroides encodes the components of two distinct pathways for salvaging cobinamide (Cbi), a precursor of adenosylcobalamin (AdoCbl, coenzyme B12). One pathway, conserved among bacteria, depends on a bifunctional kinase/guanylyltransferase (CobP) enzyme to convert adenosylcobinamide (AdoCbi) to AdoCbi-phosphate (AdoCbi-P), an intermediate in de novo AdoCbl biosynthesis. The other pathway, of archaeal origin, depends on an AdoCbi amidohydrolase (CbiZ) enzyme to generate adenosylcobyric acid (AdoCby), which is converted to AdoCbi-P by the AdoCbi-P synthetase (CobD) enzyme. Here we report that R. sphaeroides strain 2.4.1 synthesizes AdoCbl de novo and that it salvages Cbi using both of the predicted Cbi salvaging pathways. AdoCbl produced by R. sphaeroides was identified and quantified by high-performance liquid chromatography and bioassay. The deletion of cobB (encoding an essential enzyme of the de novo corrin ring biosynthetic pathway) resulted in a strain of R. sphaeroides that would not grow on acetate in the absence of exogenous corrinoids. The results from a nutritional analysis showed that the presence of either CbiZ or CobP was necessary and sufficient for Cbi salvaging, that CbiZ-dependent Cbi salvaging depended on the presence of CobD, and that CobP-dependent Cbi salvaging occurred in a cbiZ+ strain. Possible reasons why R. sphaeroides maintains two distinct pathways for Cbi salvaging are discussed.

High contiguity long read assembly of Brassica nigra allows localization of active centromeres and provides insights into the ancestral Brassica genome

10.1101/2020.02.03.932665 ◽

2020 ◽

Cited By ~ 5

Author(s):

Sampath Perumal ◽

Chu Shin Koh ◽

Lingling Jin ◽

Miles Buchwaldt ◽

Erin Higgins ◽

...

Keyword(s):

De Novo ◽

Low Complexity ◽

Error Rates ◽

Brassica Nigra ◽

Genome Integrity ◽

Ancestral Genome ◽

Genomic Distance ◽

Long Read ◽

Genome Assemblies ◽

Technology Comparison

Abstract High-quality nanopore genome assemblies were generated for two Brassica nigra genotypes (Ni100 and CN115125), a member of the agronomically important Brassica species. The N50 contig lengths for the two assemblies were 17.1 Mb (58 contigs) and 0.29 Mb (963 contigs), respectively, reflecting recent improvements in the technology. Comparison with a de novo short read assembly for Ni100 corroborated genome integrity and quantified sequence-related error rates (0.002%). The contiguity and coverage allowed unprecedented access to low complexity regions of the genome. Pericentromeric regions and coincidence of hypo-methylation enabled localization of active centromeres and identified a novel centromere-associated ALE class I element which appears to have proliferated through relatively recent nested transposition events (<1 million years ago). Computational abstraction was used to define a post-triplication Brassica-specific ancestral genome and to calculate the extensive rearrangements that define the genomic distance separating B. nigra from its diploid relatives.

Chaperone-mediated ordered assembly of the SAGA and NuA4 transcription co-activator complexes

10.1101/524959 ◽

2019 ◽

Author(s):

Alberto Elías-Villalobos ◽

Damien Toullec ◽

Céline Faux ◽

Martial Séveno ◽

Dominique Helmlinger

Keyword(s):

Fission Yeast ◽

Transcription Initiation ◽

De Novo ◽

Functional Modules ◽

Assembly Pathway ◽

Ordered Assembly ◽

Conserved Region ◽

Transcriptional Complexes ◽

Necessary And Sufficient ◽

Functional Analyses

Abstract Transcription initiation involves the coordinated activities of large multimeric complexes that are organized into functional modules. Little is known about the mechanisms and pathways that govern their assembly from individual components. We report here several principles governing the assembly of the highly conserved SAGA and NuA4 co-activator complexes. Using fission yeast, which contains two functionally non-redundant paralogs of the shared Tra1 subunit, we demonstrate that Tra1 contributes to scaffolding the entire NuA4 complex. In contrast, within SAGA, Tra1 specifically promotes the incorporation of the de-ubiquitination module (DUB), defining an ordered assembly pathway. Biochemical and functional analyses elucidated the mechanism by which Tra1 assembles differentially into SAGA or NuA4 and identified a small, conserved region of Spt20 that is both necessary and sufficient to anchor Tra1 within SAGA. Finally, we establish that Hsp90 and its cochaperone TTT are required for Tra1 de novo incorporation into both SAGA and NuA4, indicating that Tra1, a pseudokinase of the PIKK family, shares a dedicated chaperone machinery with its cognate kinases. Overall, our work brings mechanistic insights into the de novo assembly of transcriptional complexes through ordered pathways and reveals the contribution of dedicated chaperones to this process.