Concerted evolution at a multicopy locus in the protozoan parasite Theileria parva: extreme divergence of potential protein-coding sequences.

1997 ◽  
Vol 17 (3) ◽  
pp. 1666-1673 ◽  
Author(s):  
R Bishop ◽  
A Musoke ◽  
S Morzaria ◽  
B Sohanpal ◽  
E Gobright

Concerted evolution of multicopy gene families in vertebrates is recognized as an important force in the generation of biological novelty but has not been documented for the multicopy genes of protozoa. A multicopy locus, Tpr, which consists of tandemly arrayed open reading frames (ORFs) containing several repeated elements has been described for Theileria parva. Herein we show that probes derived from the 5'/N-terminal ends of ORFs in the genomic DNAs of T. parva Uganda (1,108 codons) and Boleni (699 codons) hybridized with multicopy sequences in homologous DNA but did not detect similar sequences in the DNA of 14 heterologous T. parva stocks and clones. The probe sequences were, however, protein coding according to predictive algorithms and codon usage. The 3'/C-terminal ends of the Uganda and Boleni ORFs exhibited 75% similarity and identity, respectively, to the previously identified Tpr1 and Tpr2 repetitive elements of T. parva Muguga. Tpr1-homologous sequences were detected in two additional species of Theileria. Eight different Tpr1-homologous transcripts were present in piroplasm mRNA from a single T. parva Muguga-infected animal. The Tpr1 and Tpr2 amino acid sequences contained six predicted membrane-associated segments. The ratio of synonymous to nonsynonymous substitutions indicates that Tpr1 evolves like protein-encoding DNA. The previously determined nucleotide sequence of the gene encoding the p67 antigen is completely identical in T. parva Muguga, Boleni, and Uganda, including the third base in codons. The data suggest that concerted evolution can lead to the radical divergence of coding sequences and that this can be a mechanism for the generation of novel genes.

2017 ◽  
Author(s):  
Sondos Samandi ◽  
Annie V. Roy ◽  
Vivian Delcourt ◽  
Jean-François Lucier ◽  
Jules Gagnon ◽  
...  

Recent studies in eukaryotes have demonstrated the translation of alternative open reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and evolutionary patterns indicate that altORFs are particularly constrained in CDSs that evolve slowly. Thousands of predicted alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. Protein domains and co-conservation analyses suggest a potential functional relationship between small and large proteins encoded in the same genes. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many coding genes code for more than one protein that are often functionally related.


2004 ◽  
Vol 186 (21) ◽  
pp. 7123-7133 ◽  
Author(s):  
F. Chris Minion ◽  
Elliot J. Lefkowitz ◽  
Melissa L. Madsen ◽  
Barbara J. Cleary ◽  
Steven M. Swartzell ◽  
...  

We present the complete genome sequence of Mycoplasma hyopneumoniae, an important member of the porcine respiratory disease complex. The genome is composed of 892,758 bp and has an average G+C content of 28.6 mol%. There are 692 predicted protein coding sequences, the average protein size is 388 amino acids, and the mean coding density is 91%. Functions have been assigned to 304 (44%) of the predicted protein coding sequences, while 261 (38%) of the proteins are conserved hypothetical proteins and 127 (18%) are unique hypothetical proteins. There is a single 16S-23S rRNA operon, and there are 30 tRNA coding sequences. The cilium adhesin gene has six paralogs in the genome, only one of which contains the cilium binding site. The companion gene, P102, also has six paralogs. Gene families constitute 26.3% of the total coding sequences, and the largest family is the 34-member ABC transporter family. Protein secretion occurs through a truncated pathway consisting of SecA, SecY, SecD, PrsA, DnaK, Tig, and LepA. Some highly conserved eubacterial proteins, such as GroEL and GroES, are notably absent. The DnaK-DnaJ-GrpE complex is intact, providing the only control over protein folding. There are several proteases that might serve as virulence factors, and there are 53 coding sequences with prokaryotic lipoprotein lipid attachment sites. Unlike other mycoplasmas, M. hyopneumoniae contains few genes with tandem repeat sequences that could be involved in phase switching or antigenic variation. Thus, it is not clear how M. hyopneumoniae evades the immune response and establishes a chronic infection.
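The annotation breakdown quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only re-derives the rounded percentages from the counts given in the text; the variable and function names are illustrative and do not come from the paper.

```python
# Counts as reported in the abstract above.
total_cds = 692
categories = {
    "assigned function": 304,
    "conserved hypothetical": 261,
    "unique hypothetical": 127,
}

def percent_of_total(count, total):
    """Share of `total` represented by `count`, rounded to a whole percent."""
    return round(100 * count / total)

# Rounded share of each annotation category among all predicted CDSs.
shares = {name: percent_of_total(n, total_cds) for name, n in categories.items()}

for name, pct in shares.items():
    print(f"{name}: {categories[name]} of {total_cds} = {pct}%")
```

The three categories sum to the 692 predicted coding sequences, and the rounded shares match the 44%, 38% and 18% stated in the abstract.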


2020 ◽  
Author(s):  
Richard V. Miller ◽  
Rafik Neme ◽  
Derek M. Clay ◽  
Jananan S. Pathmanathan ◽  
Michael W. Lu ◽  
...  

The germline-soma divide is a fundamental distinction in developmental biology, and different genes are expressed in germline and somatic cells throughout metazoan life cycles. Ciliates, a group of microbial eukaryotes, exhibit germline-somatic nuclear dimorphism within a single cell with two different genomes. The ciliate Oxytricha trifallax undergoes massive RNA-guided DNA elimination and genome rearrangement to produce a new somatic macronucleus (MAC) from a copy of the germline micronucleus (MIC). This process eliminates noncoding DNA sequences that interrupt genes and also deletes hundreds of germline-limited open reading frames (ORFs) that are transcribed during genome rearrangement. Here, we update the set of transcribed germline-limited ORFs (TGLOs) in O. trifallax. We show that TGLOs tend to be expressed during nuclear development and then are absent from the somatic MAC. We also demonstrate that exposure to synthetic RNA can reprogram TGLO retention in the somatic MAC and that TGLO retention leads to transcription outside the normal developmental program. These data suggest that TGLOs represent a group of developmentally regulated protein coding sequences whose gene expression is terminated by DNA elimination.


Genes ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 982
Author(s):  
Maksim Makarenko ◽  
Alexander Usatov ◽  
Tatiana Tatarinova ◽  
Kirill Azarin ◽  
Alexey Kovalevich ◽  
...  

The genus Helianthus is a diverse taxonomic group with approximately 50 species. Most sunflower genomic investigations are devoted to economically valuable species, e.g., H. annuus, while other Helianthus species, especially perennial, are predominantly a blind spot. In the current study, we have assembled the complete mitogenomes of two perennial species: H. grosseserratus (273,543 bp) and H. strumosus (281,055 bp). We analyzed their sequences and gene profiles in comparison to the available complete mitogenomes of H. annuus. Except for sdh4 and trnA-UGC, both perennial sunflower species had the same gene content and almost identical protein-coding sequences when compared with each other and with annual sunflowers (H. annuus). Common mitochondrial open reading frames (ORFs) (orf117, orf139, and orf334) in sunflowers and unique ORFs for H. grosseserratus (orf633) and H. strumosus (orf126, orf184, orf207) were identified. The maintenance of plastid-derived coding sequences in the mitogenomes of both annual and perennial sunflowers and the low frequency of nonsynonymous mutations point at an extremely low variability of mitochondrial DNA (mtDNA) coding sequences in the Helianthus genus.


eLife ◽  
2017 ◽  
Vol 6 ◽  
Author(s):  
Sondos Samandi ◽  
Annie V Roy ◽  
Vivian Delcourt ◽  
Jean-François Lucier ◽  
Jules Gagnon ◽  
...  

Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open-reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many genes are multicoding genes and code for a large protein and one or several small proteins.
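To make the idea of an alternative ORF concrete, the following minimal sketch scans the three forward reading frames of a transcript for ATG-to-stop ORFs and sets aside the annotated CDS. It is not the authors' pipeline: the function name `find_orfs`, the default `min_codons` threshold, the toy sequence and the CDS coordinates are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for each ATG-to-stop ORF on the forward strand.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    `min_codons` counts codons from the ATG up to, but not including, the stop.
    The default of 30 is an illustrative choice, not a value taken from the abstract.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return orfs

# Hypothetical toy transcript: one "annotated" CDS in frame 0 (positions 0-21)
# with a short ORF nested inside it in frame 2.
toy_mrna = "ATGGCATGCCCTGATAAGTAA"
annotated_cds = (0, 21)

all_orfs = find_orfs(toy_mrna, min_codons=2)
alt_orfs = [orf for orf in all_orfs if (orf[0], orf[1]) != annotated_cds]
print(all_orfs)  # [(0, 21, 0), (5, 14, 2)]
print(alt_orfs)  # [(5, 14, 2)]
```

In this toy case the frame-2 ORF overlaps the annotated CDS in a different reading frame, which is the kind of arrangement the abstract refers to as an altORF. A real analysis would also need to handle the reverse strand, untranslated regions, non-ATG initiation and evidence of translation.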


2015 ◽  
Author(s):  
Anil Raj ◽  
Sidney H. Wang ◽  
Heejung Shim ◽  
Arbel Harpak ◽  
Yang I. Li ◽  
...  

Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into biological functions. Here we describe riboHMM, a new method that uses ribosome footprint data along with gene expression and sequence information to accurately infer translated sequences. We applied our method to human lymphoblastoid cell lines and identified 7,273 previously unannotated coding sequences, including 2,442 translated upstream open reading frames. We observed an enrichment of harringtonine-treated ribosome footprints at the inferred initiation sites, validating many of the novel coding sequences. The novel sequences exhibit significant signatures of selective constraint in the reading frames of the inferred proteins, suggesting that many of these are functional. Nearly 40% of bicistronic transcripts showed significant negative correlation in the levels of translation of their two coding sequences, suggesting a key regulatory role for these novel translated sequences. Our work significantly expands the set of known coding regions in humans.


eLife ◽  
2016 ◽  
Vol 5 ◽  
Author(s):  
Anil Raj ◽  
Sidney H Wang ◽  
Heejung Shim ◽  
Arbel Harpak ◽  
Yang I Li ◽  
...  

Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7273 novel coding sequences, including 2442 translated upstream open reading frames. We observed an enrichment of footprints at inferred initiation sites after drug-induced arrest of translation initiation, validating many of the novel coding sequences. The novel proteins exhibit significant selective constraint in the inferred reading frames, suggesting that many are functional. Moreover, ~40% of bicistronic transcripts showed negative correlation in the translation levels of their two coding sequences, suggesting a potential regulatory role for these novel regions. Despite known limitations of mass spectrometry to detect protein expressed at low level, we estimated a 14% validation rate. Our work significantly expands the set of known coding regions in humans.


Microbiology ◽  
2011 ◽  
Vol 157 (3) ◽  
pp. 760-773 ◽  
Author(s):  
Hagai Rechnitzer ◽  
Elzbieta Brzuszkiewicz ◽  
Axel Strittmatter ◽  
Heiko Liesegang ◽  
Inna Lysnyansky ◽  
...  

We present the complete genomic sequence of Mycoplasma fermentans, an organism suggested to be associated with the pathogenesis of rheumatoid arthritis in humans. The genome is composed of 977 524 bp and has a mean G+C content of 26.95 mol%. There are 835 predicted protein-coding sequences and a mean coding density of 87.6 %. Functions have been assigned to 58.8 % of the predicted protein-coding sequences, while 18.4 % of the proteins are conserved hypothetical proteins and 22.8 % are hypothetical proteins. In addition, there are two complete rRNA operons and 36 tRNA coding sequences. The largest gene families are the ABC transporter family (42 members), and the functionally heterogeneous group of lipoproteins (28 members), which encode the characteristic prokaryotic cysteine ‘lipobox’. Protein secretion occurs through a pathway consisting of SecA, SecD, SecE, SecG, SecY and YidC. Some highly conserved eubacterial proteins, such as GroEL and GroES, are notably absent. The genes encoding DnaK-DnaJ-GrpE and Tig, forming the putative complex of chaperones, are intact, providing the only known control over protein folding. Eighteen nucleases and 17 proteases and peptidases were detected as well as three genes for the thioredoxin-thioreductase system. Overall, this study presents insights into the physiology of M. fermentans, and provides several examples of the genetic basis of systems that might function as virulence factors in this organism.


2001 ◽  
Vol 353 (2) ◽  
pp. 403-409 ◽  
Author(s):  
Marina FRANCESCHETTI ◽  
Colin HANFREY ◽  
Sonia SCARAMAGLI ◽  
Patrizia TORRIGIANI ◽  
Nello BAGNI ◽  
...  

S-Adenosyl-L-methionine decarboxylase (AdoMetDC; EC 4.1.1.50) is one of the key regulatory enzymes in the biosynthesis of polyamines. Isolation of genomic and cDNA sequences from rice and Arabidopsis had indicated that this enzyme is encoded by a small multigene family in monocot and dicot plants. Analysis of rice, maize and Arabidopsis AdoMetDC cDNA species revealed that the monocot enzyme possesses an extended C-terminus relative to dicot and human enzymes. Interestingly, we discovered that all expressed plant AdoMetDC mRNA 5' leader sequences contain a highly conserved pair of overlapping upstream open reading frames (uORFs) that overlap by one base. The 5' tiny uORF consists of two or three codons and the 3' small uORF encodes 50–54 residues. Sequences of the small uORFs are highly conserved between monocot, dicot and gymnosperm AdoMetDC mRNA species and the C-terminus of the plant small uORFs is conserved with the C-terminus of nematode AdoMetDC uORFs; such a conserved arrangement is strongly suggestive of a translational regulatory mechanism. No introns were found in the main AdoMetDC proenzyme ORF from any of the plant genes encoding AdoMetDC, whereas introns were found in conserved positions flanking the overlapping uORFs. The absence of the furthest 3' intron from the Arabidopsis gene encoding AdoMetDC2 suggests that this intron was lost recently. Reverse-transcriptase-mediated PCR analysis of the two Arabidopsis genes for AdoMetDC indicated that AdoMetDC1 is abundant and ubiquitous, whereas the gene for AdoMetDC2 is expressed preferentially in leaves and inflorescences. Investigation of recently released Arabidopsis genome sequences has revealed that in addition to the two genes encoding AdoMetDC isolated as part of the present work, four additional genes are present in Arabidopsis but they are probably not expressed.

