Bacterial genes outnumber archaeal genes in eukaryotic genomes

bioRxiv

10.1101/779579

2019

Author(s):

Julia Brückner

William F. Martin

Keyword(s):

Ribosomal Proteins

Phylogenetic Trees

Higher Plants

Protein Sequences

Eukaryotic Genome

Metabolic Processes

Eukaryotic Genes

Bacterial Genes

Origin of Eukaryotes

Eukaryotic Genomes

Abstract The origin of eukaryotes is one of evolution’s most important transitions, yet it is still poorly understood. Evidence for how it occurred should be preserved in eukaryotic genomes. Based on phylogenetic trees from ribosomal RNA and ribosomal proteins, eukaryotes are typically depicted as branching together with or within archaea. This ribosomal affiliation is widely interpreted as evidence for an archaeal origin of eukaryotes. However, the extent to which the archaeal ancestry of the genes for the cytosolic ribosomes of eukaryotic cells is representative of the rest of the eukaryotic genome is unknown. Here we have clustered 19,050,992 protein sequences from 5,443 bacteria and 212 archaea with 3,420,731 protein sequences from 150 eukaryotes spanning six eukaryotic supergroups to identify genes that link eukaryotes exclusively to bacteria or exclusively to archaea. By downsampling the bacterial sample, we obtain estimates for the bacterial and archaeal proportions of genes among 150 eukaryotic genomes. Eukaryotic genomes possess a bacterial majority of genes. On average, eukaryotic genes are 56% bacterial in origin. The majority drops to 53% in eukaryotes that never possessed plastids and increases to 61% in photosynthetic eukaryotic lineages, where the cyanobacterial ancestor of plastids contributed additional genes to the eukaryotic genome, reaching 67% in higher plants. Intracellular parasites, which undergo reductive evolution in adaptation to the nutrient-rich environment of the cells that they infect, relinquish bacterial genes for metabolic processes. In the current sample, this process of adaptive gene loss is most pronounced in the human parasite Encephalitozoon intestinalis, with 86% archaeal-derived and 14% bacterial-derived genes. The most bacterial eukaryote genome sampled is rice, with 67% bacterial and 33% archaeal genes. The functional dichotomy first described for yeast, in which archaeal genes are involved in genetic information processing and bacterial genes in metabolic processes, is conserved across all eukaryotic supergroups.

Bacterial Genes Outnumber Archaeal Genes in Eukaryotic Genomes

Genome Biology and Evolution

10.1093/gbe/evaa047

2020

Vol 12 (4)

pp. 282-292

Cited By ~ 6

Author(s):

Julia Brueckner

William F Martin

Keyword(s):

Gene Loss

Protein Sequences

Metabolic Processes

Encephalitozoon Intestinalis

Human Parasite

Intracellular Parasites

Genetic Information Processing

Bacterial Genes

Eukaryotic Genomes

Eukaryote Genome

Abstract Eukaryotes are typically depicted as descendants of archaea, but their genomes are evolutionary chimeras with genes stemming from archaea and bacteria. Which prokaryotic heritage predominates? Here, we have clustered 19,050,992 protein sequences from 5,443 bacteria and 212 archaea with 3,420,731 protein sequences from 150 eukaryotes spanning six eukaryotic supergroups. By downsampling the bacterial sample, we obtain estimates for the bacterial and archaeal proportions of eukaryotic genes. Eukaryotic genomes possess a bacterial majority of genes. On average, this bacterial majority is 56% overall, 53% in eukaryotes that never possessed plastids, and 61% in photosynthetic eukaryotic lineages, where the cyanobacterial ancestor of plastids contributed additional genes to the eukaryotic lineage. Intracellular parasites, which undergo reductive evolution in adaptation to the nutrient-rich environment of the cells that they infect, relinquish bacterial genes for metabolic processes. Such adaptive gene loss is most pronounced in the human parasite Encephalitozoon intestinalis, with 86% archaeal-derived and 14% bacterial-derived genes. The most bacterial eukaryote genome sampled is rice, with 67% bacterial and 33% archaeal genes. The functional dichotomy first described for yeast, in which archaeal genes are involved in genetic information processing and bacterial genes in metabolic processes, is conserved across all eukaryotic supergroups.
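The downsampling step can be illustrated with a minimal Python sketch. It is a hypothetical simplification, not the authors' pipeline, and rests on three assumptions:

- each cluster is a set of genome identifiers;
- a cluster counts as bacterial (or archaeal) for a given eukaryote only if its prokaryotic members come from one domain alone, after the bacterial genomes are randomly subsampled down to the number of archaeal genomes;
- the names `classify_cluster`, `bacterial_fraction` and the `n_reps` default are illustrative.

```python
import random
from statistics import mean


def classify_cluster(members, bacteria_kept, archaea):
    """Label a cluster 'B' or 'A' if its sampled prokaryotes come from one domain only."""
    has_bac = any(m in bacteria_kept for m in members)
    has_arc = any(m in archaea for m in members)
    if has_bac and not has_arc:
        return "B"
    if has_arc and not has_bac:
        return "A"
    return None  # both domains present, or no sampled prokaryote left


def bacterial_fraction(clusters, eukaryote, bacteria, archaea, n_reps=100, seed=0):
    """Mean bacterial share of domain-specific clusters containing `eukaryote`.

    In each replicate the bacterial genomes are randomly downsampled to the
    number of archaeal genomes, so the much larger bacterial sample does not
    by itself inflate the bacterial count.
    """
    rng = random.Random(seed)
    bacteria_list = sorted(bacteria)
    k = min(len(archaea), len(bacteria_list))
    fractions = []
    for _ in range(n_reps):
        kept = set(rng.sample(bacteria_list, k))
        labels = [classify_cluster(c, kept, archaea) for c in clusters if eukaryote in c]
        n_bac, n_arc = labels.count("B"), labels.count("A")
        if n_bac + n_arc:
            fractions.append(n_bac / (n_bac + n_arc))
    return mean(fractions) if fractions else float("nan")
```

With real data, `clusters` would come from the sequence-clustering step. The replicate count and seed here are arbitrary choices.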

Origin of eukaryotes from within archaea, archaeal eukaryome and bursts of gene gain: eukaryogenesis just made easier?

Philosophical Transactions of the Royal Society B: Biological Sciences

10.1098/rstb.2014.0333

2015

Vol 370 (1678)

pp. 20140333

Cited By ~ 72

Author(s):

Eugene V. Koonin

Keyword(s):

Genomic Analysis

Eukaryotic Cells

Comparative Genomic

Gene Gain

Primitive Form

Eukaryotic Genes

Phylogenomic Analyses

Complex Features

Bacterial Genes

Origin Of Eukaryotes

The origin of eukaryotes is a fundamental, forbidding evolutionary puzzle. Comparative genomic analysis clearly shows that the last eukaryotic common ancestor (LECA) possessed most of the signature complex features of modern eukaryotic cells, in particular the mitochondria, the endomembrane system including the nucleus, an advanced cytoskeleton and the ubiquitin network. Numerous duplications of ancestral genes, e.g. DNA polymerases, RNA polymerases and proteasome subunits, can also be traced back to the LECA. Thus, the LECA was not a primitive organism, and its emergence must have resulted from extensive evolution towards cellular complexity. However, the scenario of eukaryogenesis, and in particular the relationship between endosymbiosis and the origin of eukaryotes, remains far from clear. Four recent developments provide new clues to the likely routes of eukaryogenesis. First, evolutionary reconstructions suggest complex ancestors for most of the major groups of archaea, with subsequent evolution dominated by gene loss. Second, homologues of signature eukaryotic proteins, such as actin and tubulin, which form the core of the cytoskeleton, and components of the ubiquitin system, have been detected in diverse archaea. The discovery of this ‘dispersed eukaryome’ implies that the archaeal ancestor of eukaryotes was a complex cell that might have been capable of a primitive form of phagocytosis and thus conducive to endosymbiont capture. Third, phylogenomic analyses converge on an origin of most eukaryotic genes of archaeal descent from within the archaeal evolutionary tree, specifically from the TACK superphylum. Fourth, evidence has been presented that the origin of the major archaeal phyla involved massive acquisition of bacterial genes. Taken together, these findings make the symbiogenetic scenario for the origin of eukaryotes considerably more plausible, and the origin of the organizational complexity of eukaryotic cells more readily explainable, than they appeared until recently.

Phylogenetic relationships within the family Halobacteriaceae inferred from rpoB′ gene and protein sequences

International Journal of Systematic and Evolutionary Microbiology

10.1099/ijs.0.65190-0

2007

Vol 57 (10)

pp. 2289-2295

Cited By ~ 33

Author(s):

Madalin Enache

Takashi Itoh

Tadamasa Fukushima

Ron Usami

Lucia Dumitru

...

Keyword(s):

16S rRNA

Molecular Marker

16S rRNA Gene

Phylogenetic Trees

Protein Sequences

rpoB Gene

rRNA Gene

Gene Sequences

16S rRNA Gene Sequences

The Family

To clarify the current phylogeny of the haloarchaea, particularly of closely related genera that have been difficult to resolve using 16S rRNA gene sequences, the gene for DNA-dependent RNA polymerase subunit B′ (rpoB′) was used as a complementary molecular marker. Partial sequences of the gene were determined from 16 strains of the family Halobacteriaceae. Comparison of phylogenetic trees inferred from the rpoB′ gene and protein sequences and from the corresponding 16S rRNA gene sequences suggested that species of the genera Natrialba, Natronococcus, Halobiforma, Natronobacterium, Natronorubrum, Natrinema/Haloterrigena and Natronolimnobius formed a monophyletic group in all trees. In the RpoB′ protein tree, the alkaliphilic species Natrialba chahannaoensis, Natrialba hulunbeirensis and Natrialba magadii formed a tight group, while the neutrophilic species Natrialba asiatica formed a separate group with species of the genera Natronorubrum and Natronolimnobius. Species of the genus Natronorubrum were split into two groups in both the rpoB′ gene and protein trees. The main advantage of the rpoB′ gene over the 16S rRNA gene is that its sequences are highly conserved among species of the family Halobacteriaceae: all sequences determined so far can be aligned unambiguously without any gaps. By contrast, the alignment of 16S rRNA gene sequences requires gaps at 49 positions in its interior. The rpoB′ gene and protein sequences can be used as an excellent alternative molecular marker in phylogenetic analysis of the Halobacteriaceae.
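The comparison of gapped versus gap-free alignments above can be checked mechanically. The sketch below is a hypothetical helper, not the authors' procedure. It assumes the alignment is a list of equal-length strings with `-` as the gap character, and it reports every column containing a gap. It does not trim ragged alignment ends, so it is only a rough proxy for the count of interior gapped positions given in the abstract.

```python
def gapped_columns(aligned, gap="-"):
    """Return 0-based indices of alignment columns that contain at least one gap.

    `aligned` is assumed to be a list of equal-length aligned sequences.
    """
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    # zip(*aligned) walks the alignment column by column.
    return [i for i, column in enumerate(zip(*aligned)) if gap in column]
```

A gap-free alignment such as the rpoB′ one would return an empty list.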

Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal

10.1042/bj1870065

1980

Vol 187 (1)

pp. 65-74

Cited By ~ 12

Author(s):

D Penny

M D Hendy

L R Foulds

Keyword(s):

Amino Acid

Phylogenetic Tree

Protein Sequence

Phylogenetic Trees

Sequence Data

Protein Sequences

Nucleotide Sequences

Amino Acid Sequences

Minimal Tree

Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. Converting the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented here instead builds up a tree progressively, taxon by taxon. We illustrate this approach by constructing a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method for deriving a directed phylogenetic tree from the minimal tree.
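The quantity a minimal tree minimizes, the total number of character-state changes, can be computed for a fixed rooted binary tree with Fitch's small-parsimony algorithm. This sketch swaps in that standard technique; it is not the authors' taxon-by-taxon construction. Its assumptions:

- trees are nested 2-tuples whose leaves are equal-length aligned nucleotide strings;
- every substitution costs 1;
- `shortest` simply compares an explicit list of candidate trees rather than searching tree space.

```python
from typing import Set, Tuple, Union

# A tree is either a leaf (an aligned sequence) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, site: int) -> Tuple[Set[str], int]:
    """Fitch bottom-up pass for one column: (candidate state set, changes so far)."""
    if isinstance(tree, str):
        return {tree[site]}, 0
    left, right = tree
    left_states, left_changes = fitch_site(left, site)
    right_states, right_changes = fitch_site(right, site)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    # Disjoint state sets force one extra change on one of the child branches.
    return left_states | right_states, left_changes + right_changes + 1


def tree_length(tree: Tree, n_sites: int) -> int:
    """Parsimony length: minimal number of changes summed over all sites."""
    return sum(fitch_site(tree, i)[1] for i in range(n_sites))


def shortest(trees, n_sites):
    """Return the candidate tree with the smallest parsimony length."""
    return min(trees, key=lambda t: tree_length(t, n_sites))
```

The Fitch length of an unrooted tree does not depend on where it is rooted, so rooting the candidates arbitrarily, as the nested tuples do, does not affect the comparison.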

The cyanelle genome of Cyanophora paradoxa encodes ribosomal proteins not encoded by the chloroplast genomes of higher plants

FEBS Letters

10.1016/0014-5793(90)80026-f

1990

Vol 259 (2)

pp. 273-280

Cited By ~ 35

Author(s):

Donald A. Bryant

Veronica L. Stirewalt

Keyword(s):

Ribosomal Proteins

Higher Plants

Chloroplast Genomes

The Nuclear Genes for Chloroplast Ribosomal Proteins L11 and L12 in Higher Plants

The Translational Apparatus

10.1007/978-1-4615-2407-6_52

1993

pp. 555-564

Author(s):

Jürgen Schmidt

Wolfgang Weglöhner

Alap R. Subramanian

Keyword(s):

Ribosomal Proteins

Higher Plants

Nuclear Genes

Analysis of SARS-CoV-2 nucleocapsid protein sequence variations in ASEAN countries

Medical Journal of Indonesia

10.13181/mji.oa.215304

2021

Author(s):

Mochammad Rajasa Mukti Negara

Ita Krissanti

Gita Widya Pradini

Keyword(s):

Phylogenetic Tree

Phylogenetic Trees

Protein Sequences

Reference Sequence

N Protein

ASEAN Country

Sequence Variations

Complete Sequences

ASEAN Countries

Global Initiative

BACKGROUND The nucleocapsid (N) protein is one of the four structural proteins of SARS-CoV-2; it is known to be more conserved than the spike protein and is highly immunogenic. This study aimed to analyze the variation of SARS-CoV-2 N protein sequences in ASEAN countries, including Indonesia. METHODS Complete SARS-CoV-2 N protein sequences from each ASEAN country were obtained from the Global Initiative on Sharing All Influenza Data (GISAID), and the reference sequence was obtained from GenBank. All sequences collected from December 2019 to March 2021 were grouped into clades according to GISAID, and two representative isolates were chosen from each clade for the analysis. The sequences were aligned with MUSCLE, and phylogenetic trees were built with MEGA-X from the nucleotide sequences and the translated amino acid (AA) sequences. RESULTS Ninety-eight isolates with complete N protein genes from ASEAN countries were analyzed. The nucleotide sequences of all isolates were 97.5% conserved. Of 31 nucleotide changes, 22 led to AA substitutions; thus, the AA sequences were 94.5% conserved. The phylogenetic trees of the nucleotide and AA sequences showed similar branching. Nucleotide variations in clade O (C28311T), clade GR (28881–28883 GGG>AAC) and clade GRY (28881–28883 GGG>AAC and C28977T) led to specific branches corresponding to each clade in both trees. CONCLUSIONS The N protein sequences of SARS-CoV-2 across ASEAN countries are highly conserved. Most isolates were closely related to the reference sequence originating from China, except the isolates representing clades O, GR and GRY, which formed specific branches in the phylogenetic tree.
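Conservation percentages like the 97.5% above can be computed directly from an alignment. The abstract does not define "conserved", so this sketch assumes one plausible reading: the share of alignment columns in which every sequence carries the same residue. The function names are illustrative. The same functions work on nucleotide or translated amino acid alignments.

```python
def variable_positions(aligned):
    """0-based indices of alignment columns where not all sequences agree."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]


def percent_conserved(aligned):
    """Percentage of columns that are invariant across every aligned sequence."""
    variable = variable_positions(aligned)
    length = len(aligned[0])
    if length == 0:
        raise ValueError("alignment is empty")
    return 100.0 * (length - len(variable)) / length
```

`variable_positions` also gives the number of changed sites, the analogue of the nucleotide-change count reported above.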

The Dual Origin of the Yeast Mitochondrial Proteome

Yeast

10.1002/1097-0061(20000930)17:3<170::aid-yea25>3.0.co;2-v

2000

Vol 17 (3)

pp. 170-187

Cited By ~ 103

Author(s):

Olof Karlberg

Björn Canbäck

Charles G. Kurland

Siv G. E. Andersson

Keyword(s):

Nuclear Genome

Mitochondrial Proteins

Mitochondrial Proteome

Phylogenetic Reconstructions

Eukaryotic Genes

Genes Encoding

Parallel Mode

Bacterial Genes

Dual Origin

Regulatory Functions

We propose a scheme for the origin of mitochondria based on phylogenetic reconstructions with more than 400 yeast nuclear genes that encode mitochondrial proteins. Half of the yeast mitochondrial proteins have no discernible bacterial homologues, while one-tenth are unequivocally of α-proteobacterial origin. These data suggest that the majority of genes encoding yeast mitochondrial proteins are descendants of two different genomic lineages that have evolved in different modes. First, the ancestral free-living α-proteobacterium evolved into an endosymbiont of an anaerobic host. Most of the ancestral bacterial genes were lost, but a small fraction of genes supporting bioenergetic and translational processes were retained and eventually transferred to what became the host nuclear genome. In a second, parallel mode, a larger number of novel mitochondrial genes were recruited from the nuclear genome to complement the remaining genes from the bacterial ancestor. These eukaryotic genes, which are primarily involved in transport and regulatory functions, transformed the endosymbiont into an ATP-exporting organelle.

Ammonia, glutamine, and asparagine: a carbon–nitrogen interface

Canadian Journal of Botany

10.1139/b88-288

1988

Vol 66 (10)

pp. 2103-2109

Cited By ~ 109

Author(s):

K. W. Joy

Keyword(s):

Glutamine Synthetase

Glutamate Dehydrogenase

Higher Plants

Amino Nitrogen

Glutamate Synthase

Dinitrogen Fixation

Similar Function

Metabolic Processes

Organic Form

Primary Input

In plants, the primary input of nitrogen (obtained from the soil or from symbiotic dinitrogen fixation) occurs through the assimilation of ammonia into organic form. Synthesis of glutamine (via glutamine synthetase) is the major, and possibly exclusive, route for this process, and there is little evidence for the participation of glutamate dehydrogenase. A variety of reactions distribute glutamine nitrogen to other compounds, including transfer to amino nitrogen through glutamate synthase. In many plants asparagine is a major recipient of glutamine nitrogen and provides a mobile reservoir for transport to sites of growth; ureides perform a similar function in some legumes. Utilisation of transport forms of nitrogen, and a number of other metabolic processes, involves release of ammonia, which must be reassimilated. In illuminated leaves, there is an extensive flux of ammonia released by the photorespiratory cycle, requiring continuous efficient reassimilation. Aspects of ammonia recycling and related amide metabolism in higher plants are reviewed.

Progress in quickly finding orthologs as reciprocal best hits: comparing blast, last, diamond and MMseqs2

BMC Genomics

10.1186/s12864-020-07132-6

2020

Vol 21 (1)

Author(s):

Julie E. Hernández-Salmerón

Gabriel Moreno-Hagelsieb

Keyword(s):

Comparative Genomics

Protein Sequences

Fast Algorithms

Error Rates

Estimated Error

Bacterial Proteomes

Eukaryotic Genomes

Abstract Background Finding orthologs remains an important bottleneck in comparative genomics analyses. While the authors of software for the quick comparison of protein sequences evaluate its speed and compare their results against the most commonly used software for the task, they rarely evaluate their software for more particular uses, such as finding orthologs as reciprocal best hits (RBH). Here we compared RBH results obtained with software that runs faster than blastp, namely lastal, diamond and MMseqs2. Results We found that lastal required the least time to produce results. However, it yielded fewer results than any other program when comparing the proteins encoded by evolutionarily distant genomes. The program whose number of RBH came closest to that of blastp was diamond run with the “ultra-sensitive” option. However, this was diamond’s slowest option, with the “very-sensitive” option offering the best balance between speed and RBH results. The speed-up was much more evident with eukaryotic genomes, which code for more proteins. For example, lastal took a median of approximately 1.5% of the blastp run time with bacterial proteomes and 0.6% with eukaryotic ones, while diamond with the very-sensitive option took 7.4% and 5.2%, respectively. Though estimated error rates were very similar among the RBH obtained with all programs, RBH obtained with MMseqs2 had the lowest error rates among the programs tested. Conclusions The fast algorithms for pairwise protein comparison produced results very similar to blast in a fraction of the time, with diamond offering the best compromise in speed, sensitivity and quality, as long as a sensitivity option other than the default was chosen.
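The reciprocal-best-hit criterion itself is simple once the similarity searches have been run in both directions. This minimal sketch makes several assumptions:

- both searches were exported as 12-column BLAST-style tabular text with the bitscore in the last column, as blastp `-outfmt 6` and diamond `--outfmt 6` produce; lastal output would need converting first;
- ties are broken by keeping the first hit seen;
- no coverage or e-value filters are applied;
- `best_hits` and `reciprocal_best_hits` are illustrative names, not functions from any of the tools compared.

```python
import csv
import io


def best_hits(tabular_text):
    """Map each query to its highest-bitscore subject.

    Assumes 12-column BLAST-style tabular output with the bitscore in the
    12th column. On ties, the first hit seen is kept.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    a_to_b = best_hits(a_vs_b)
    b_to_a = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)
```

Because only the best hit per query is kept, the speed and sensitivity settings of the search program matter mainly through whether the true best hit is found at all. This is consistent with the differences in RBH counts reported above.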