Predicting the Function of Hypothetical Genes in Genomes of Bioleaching Microorganisms

2009 ◽  
Vol 71-73 ◽  
pp. 203-206
Author(s):  
F.J. Ossandón ◽  
G. Rivera ◽  
F. Lazo ◽  
David S. Holmes

A particularly challenging problem in genome annotation is to attribute function to genes annotated as “hypothetical, no known function”. These typically account for about 40% of all genes regardless of the genome. Some of these are “orphan” genes that are not found in any other genome. Some could encode species-specific proteins and so are particularly interesting for evaluating novel metabolic potential and for understanding the evolution of genes and genomes. Several similarity and non-similarity bioinformatics tools exist that help predict the function of hypotheticals, but none can suggest function for more than a few percent, and the annotation of the others remains a formidable task. We have developed a bioinformatics tool called AlterORF (www.AlterORF.cl) that is able to identify alternate open reading frames (ORFs) embedded within annotated genes. Analysis of over 2 million genes in over 700 completely sequenced genomes reveals that alternate ORFs of substantial length (potentially encoding 70 amino acids or more) are surprisingly common, especially in G+C rich genomes. During our examination of these alternate ORFs, we uncovered hundreds of examples where the alternate ORF has a significant hit with databases of motifs and domains (e.g. CDD, Pfam) and where the actual annotated gene is described as hypothetical and has no database match. This strongly suggests that the annotated gene has been incorrectly identified and that the alternate ORF is the real gene. We describe the evaluation, using AlterORF, of the following genomes of bioleaching microorganisms and others that reside in similar ecological niches: Acidithiobacillus ferrooxidans (2 strains), Leptospirillum type II, Methylacidiphilum infernorum, Picrophilus torridus, Sulfolobus acidocaldarius, S. solfataricus, S. tokodaii, Thermodesulfovibrio yellowstonii, Thermoplasma acidophilum and T. volcanium. Examples of novel genes from these microorganisms and their suggested roles in metabolism will be described.
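
The idea of looking for alternate ORFs of 70 amino acids or more in all reading frames of an annotated gene can be illustrated with a minimal six-frame scan. This is only a sketch and not the AlterORF implementation, which the abstract does not describe: the function names `find_orfs` and `reverse_complement`, the restriction to ATG start codons, and the standard stop codons TAA/TAG/TGA are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumptions for illustration: standard stop codons, ATG as the only start.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 70) -> List[Tuple[int, int, int]]:
    """Scan all six reading frames for ORFs of at least `min_codons` codons.

    Returns (frame, start, end) tuples. `frame` is +1..+3 for the given
    strand and -1..-3 for its reverse complement. `start` and `end` are
    0-based offsets on the strand that was scanned; `end` is exclusive and
    includes the stop codon. The codon count excludes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            start = None  # position of the first start codon since the last stop
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand * (offset + 1), start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence: ATG followed by five GCC codons and a TAA stop.
    toy = "ATG" + "GCC" * 5 + "TAA"
    print(find_orfs(toy, min_codons=3))
```

In a workflow like the one the abstract describes, ORFs found in frames other than the annotated one would then be searched against motif and domain databases such as CDD or Pfam; that step is outside this sketch.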

2021 ◽  
Vol 12 ◽  
Author(s):  
Jing Li ◽  
Urminder Singh ◽  
Zebulun Arendsee ◽  
Eve Syrkin Wurtele

The “dark transcriptome” can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins (“orphan-ORFs”); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.


Reproduction ◽  
2016 ◽  
Vol 152 (6) ◽  
pp. 727-739 ◽  
Author(s):  
Andrea Miccoli ◽  
Ike Olivotto ◽  
Andrea De Felice ◽  
Iole Leonori ◽  
Oliana Carnevali

The European anchovy Engraulis encrasicolus, a member of the order Clupeiformes, is of great biological and economic importance. In the past, this species was mostly investigated with the aim of assessing its reproductive biology, trophic ecology, population dynamics and its relations with the physical environment. At present, however, there is an almost complete lack of information on its neuroendocrinology and reproductive physiology. The hypothalamic–pituitary–gonadal (HPG) axis at its highest levels was herein investigated. In this study, the gonadotropin-releasing hormone (GnRH), a neuropeptide underlying many reproduction-related processes, the most critical of which is the stimulation of gonadotropin synthesis and secretion from the pituitary gland, was cloned. Three forms (salmon GnRH, chicken-II GnRH and the species-specific type) were characterized in their full-length open reading frames and, in accordance with other Clupeiformes species, the distinctive one was found to be the herring-type GnRH. We qualitatively and semiquantitatively evaluated the localization of expression and the temporal transcription patterns of the three GnRH forms in male and female specimens throughout their reproductive cycle, and described their phylogeny with regard to teleost GnRH lineages and, specifically, to other Clupeiformes species.


Author(s):  
Chaitanya Erady ◽  
Krishna Amin ◽  
Temiloluwa O. A. E. Onilogbo ◽  
Jakub Tomasik ◽  
Rebekah Jukes-Jones ◽  
...  

Abstract Schizophrenia (SCZ) and bipolar disorder are debilitating neuropsychiatric disorders arising from a combination of environmental and genetic factors. Novel open reading frames (nORFs) are genomic loci that give rise to previously uncharacterized transcripts and protein products. In our previous work, we have shown that nORFs can be biologically regulated and that they may play a role in cancer and rare diseases. More importantly, we have shown that nORFs may emerge in accelerated regions of the genome giving rise to species-specific functions. We hypothesize that nORFs represent a potentially important group of biological factors that may contribute to SCZ and bipolar disorder pathophysiology. Human accelerated regions (HARs) are genomic features showing human-lineage-specific rapid evolution that may be involved in biological regulation and have additionally been found to associate with SCZ genes. Transposable elements (TEs) are another set of genomic features that have been shown to regulate gene expression. As with HARs, their relevance to SCZ has also been suggested. Here, nORFs are investigated in the context of HARs and TEs. This work shows that nORFs whose expression is disrupted in SCZ and bipolar disorder are in close proximity to HARs and TEs and that some of them are significantly associated with SCZ and bipolar disorder genomic hotspots. We also show that nORF encoded proteins can form structures and potentially constitute novel drug targets.


2008 ◽  
Vol 75 (3) ◽  
pp. 811-822 ◽  
Author(s):  
Ralf Rosenstein ◽  
Christiane Nerz ◽  
Lalitha Biswas ◽  
Alexandra Resch ◽  
Guenter Raddatz ◽  
...  

ABSTRACT The Staphylococcus carnosus genome has the highest GC content of all sequenced staphylococcal genomes, with 34.6%, and therefore represents a species that is set apart from S. aureus, S. epidermidis, S. saprophyticus, and S. haemolyticus. With only 2.56 Mbp, the genome belongs to a family of smaller staphylococcal genomes, and the ori and ter regions are asymmetrically arranged with the replichores I (1.05 Mbp) and II (1.5 Mbp). The events leading up to this asymmetry probably occurred not that long ago in evolution, as there was not enough time to approach the natural tendency of a physical balance. Unlike the genomes of pathogenic species, the TM300 genome does not contain mobile elements such as plasmids, insertion sequences, transposons, or STAR elements; also, the number of repeat sequences is markedly decreased, suggesting a comparatively high stability of the genome. While most S. aureus genomes contain several prophages and genomic islands, the TM300 genome contains only one prophage, ΦTM300, and one genomic island, νSCA1, which is characterized by a mosaic structure mainly composed of species-specific genes. Most of the metabolic core pathways are present in the genome. Some open reading frames are truncated, which reflects the nutrient-rich environment of the meat starter culture, making some functions dispensable. The genome is well equipped with all functions necessary for the starter culture, such as nitrate/nitrite reduction, various sugar degradation pathways, two catalases, and nine osmoprotection systems. The genome lacks most of the toxins typical of S. aureus as well as genes involved in biofilm formation, underscoring the nonpathogenic status.


2004 ◽  
Vol 186 (2) ◽  
pp. 518-534 ◽  
Author(s):  
Jens Klockgether ◽  
Oleg Reva ◽  
Karen Larbig ◽  
Burkhard Tümmler

ABSTRACT The Pseudomonas aeruginosa plasmid pKLC102 coexists as a plasmid and a genome island in clone C strains. Whereas the related plasmid pKLK106 reversibly recombines with P. aeruginosa clone K chromosomes at one of the two tRNALys genes, pKLC102 is incorporated into the tRNALys gene only close to the pilA locus. Targeting of the other tRNALys copy in the chromosome is blocked by a 23,395-bp mosaic of truncated PAO open reading frames, transposons, and pKLC102 homologs. Annotation and phylogenetic analysis of the large 103,532-bp pKLC102 sequence revealed that pKLC102 is a hybrid of plasmid and phage origin. The plasmid lineage conferred oriV and genes for replication, partitioning, and conjugation, including a pil cluster encoding type IV thin sex pili and an 8,524-bp chvB glucan synthetase gene that is known to be a major determinant for host tropism and virulence. The phage lineage conferred integrase, att, and a syntenic set of conserved hypothetical genes also observed in the tRNAGly-associated genome islands of P. aeruginosa clone C chromosomes. In subgroup C isolates from patients with cystic fibrosis, pKLC102 was irreversibly fixed into the chromosome by the insertion of the large 23,061-bp class I transposon TNCP23, which is a composite of plasmid, integron, and IS6100 elements. Intramolecular transposition of a copy of IS6100 led to chromosomal inversions and disruption of plasmid synteny. The case of pKLC102 in P. aeruginosa clone C documents the intraclonal evolution of a genome island from a mobile ancestor via a reversibly integrated state to irreversible incorporation and dissipation in the chromosome.


2001 ◽  
Vol 82 (9) ◽  
pp. 2041-2050 ◽  
Author(s):  
Hiroaki Okamoto ◽  
Tsutomu Nishizawa ◽  
Masaharu Takahashi ◽  
Akio Tawara ◽  
Yihong Peng ◽  
...  

TT virus (TTV) was recovered from the sera of tupaias (Tupaia belangeri chinensis) by PCR using primers derived from the noncoding region of the human TTV genome, and its entire genomic sequence was determined. One tupaia TTV isolate (Tbc-TTV14) consisted of only 2199 nucleotides (nt) and had three open reading frames (ORFs), spanning 1506 nt (ORF1), 177 nt (ORF2) and 642 nt (ORF3), which were in the same orientation as the ORFs of the human prototype TTV (TA278). ORF3 was presumed to arise from a splicing of TTV mRNA, similar to reported human TTVs whose spliced mRNAs have been identified, and encoded a joint protein of 214 amino acids with a Ser-, Lys- and Arg-rich sequence at the C terminus. Tbc-TTV14 was less than 50% similar to previously reported TTVs of 3·4–3·9 kb and TTV-like mini viruses (TLMVs) of 2·8–3·0 kb isolated from humans and non-human primates, and known animal circoviruses. Although Tbc-TTV14 has a genomic length similar to animal circoviruses (1·8–2·3 kb), Tbc-TTV14 resembled TTVs and TLMVs with regard to putative genomic organization and transcription profile. Conserved motifs were commonly observed in the coding and noncoding regions of the Tbc-TTV14 genome and in all TTV and TLMV genomes. Phylogenetic analysis revealed that Tbc-TTV14 is the closest to TLMVs, and is closer to TTVs isolated from tamarin and douroucouli than to TTVs isolated from humans and chimpanzees. These results indicate that tupaias are naturally infected with a new TTV species that has not been identified among primates.


2013 ◽  
Vol 94 (6) ◽  
pp. 1365-1372 ◽  
Author(s):  
Christian E. Lange ◽  
Elisabeth Vetsch ◽  
Mathias Ackermann ◽  
Claude Favrot ◽  
Kurt Tobler

Papillomaviruses appear to be species-specific pathogens, and it has been suggested that each animal species might harbour its own set of papillomaviruses. However, all approaches addressing the underlying evolutionary phenomena still suffer from very limited data about animal papillomaviruses. In the case of the horse, for example, only three equine papillomaviruses (EcPVs) have been identified. To further address the situation in this host, suspected papillomavirus-associated lesions were tested for EcPV DNA. Four novel EcPV types were detected and their genomes entirely cloned and sequenced. They display the characteristic organization, with early (E) and late (L) regions harbouring the seven classical open reading frames divided by non-coding regions. They were named EcPVs 4, 5, 6 and 7, according to their dissimilarity to other papillomaviruses. The highest L1 nucleotide identities were shared with EcPV2 in the case of EcPV4 (62 %) and EcPV5 (60 %) or with EcPV3 in the case of EcPV6 (70 %) and EcPV7 (71 %). Thus, EcPVs 4 and 5 may establish novel species within the genus Dyoiota, while EcPVs 6 and 7 might fit into the genus Dyorho and belong to the same species as EcPV3. They were found in genital plaques (EcPV4), aural plaques (EcPV5, EcPV6) or penile masses (EcPV7). Interestingly, PCR analysis revealed the DNA of EcPV2 and EcPV4, as well as that of EcPV3 and EcPV6, together in the same tissue samples. In conclusion, the DNA of four novel EcPV types was identified and cloned. They cluster with the known types and support broad genetic EcPV diversity in at least two of the known clades. Furthermore, PCR assays also provide evidence for EcPV co-infections in horses.


2005 ◽  
Vol 187 (21) ◽  
pp. 7292-7308 ◽  
Author(s):  
Fumihiko Takeuchi ◽  
Shinya Watanabe ◽  
Tadashi Baba ◽  
Harumi Yuzawa ◽  
Teruyo Ito ◽  
...  

ABSTRACT Staphylococcus haemolyticus is an opportunistic bacterial pathogen that colonizes human skin and is remarkable for its highly antibiotic-resistant phenotype. We determined the complete genome sequence of S. haemolyticus to better understand its pathogenicity and evolutionary relatedness to the other staphylococcal species. A large proportion of the open reading frames in the genomes of S. haemolyticus, Staphylococcus aureus, and Staphylococcus epidermidis were conserved in their sequence and order on the chromosome. We identified a region of the bacterial chromosome just downstream of the origin of replication that showed little homology among the species but was conserved among strains within a species. This novel region, designated the “oriC environ,” likely contributes to the evolution and differentiation of the staphylococcal species, since it was enriched for species-specific nonessential genes that contribute to the biological features of each staphylococcal species. A comparative analysis of the genomes of S. haemolyticus, S. aureus, and S. epidermidis elucidated differences in their biological and genetic characteristics and pathogenic potentials. We identified as many as 82 insertion sequences in the S. haemolyticus chromosome that probably mediated frequent genomic rearrangements, resulting in phenotypic diversification of the strain. Such rearrangements could have brought genomic plasticity to this species and contributed to its acquisition of antibiotic resistance.


1998 ◽  
Vol 180 (10) ◽  
pp. 2701-2710 ◽  
Author(s):  
Ekaterina V. Pestova ◽  
Donald A. Morrison

ABSTRACT Although more than a dozen new proteins are produced when Streptococcus pneumoniae cells become competent for genetic transformation, only a few of the corresponding genes have been identified to date. To find genes responsible for the production of competence-specific proteins, a random lacZ transcriptional fusion library was constructed in S. pneumoniae by using the insertional lacZ reporter vector pEVP3. Screening the library for clones with competence-specific β-galactosidase (β-Gal) production yielded three insertion mutants with induced β-Gal levels of about 4, 10, and 40 Miller units. In all three clones, activation of the lacZ reporter correlated with competence and depended on competence-stimulating peptide. Chromosomal loci adjacent to the integrated vector were subcloned from the insertion mutants, and their nucleotide sequences were determined. Genes at two of the loci exhibited strong similarity to parts of Bacillus subtilis com operons. One locus contained open reading frames (ORFs) homologous to the comEA and comEC genes in B. subtilis but lacked a comEB homolog. A second locus contained four ORFs with homology to the B. subtilis comG gene ORFs 1 to 4, but comG gene ORFs 5 to 7 were replaced in S. pneumoniae with an ORF encoding a protein homologous to transport ATP-binding proteins. Genes at all three loci were confirmed to be required for transformation by mutagenesis using pEVP3 for insertion duplications or an erm cassette for gene disruptions.


1998 ◽  
Vol 180 (11) ◽  
pp. 3007-3012 ◽  
Author(s):  
Joseph C. Oppon ◽  
Robert J. Sarnovsky ◽  
Nancy L. Craig ◽  
Douglas E. Rawlings

ABSTRACT The region downstream of the Thiobacillus ferrooxidans ATCC 33020 atp operon was examined, and the genes encoding N-acetylglucosamine-1-uridyltransferase (glmU) and glucosamine synthetase (glmS) were found. This atpEFHAGDC-glmUS gene order is identical to that of Escherichia coli. The T. ferrooxidans glmS gene was shown to complement E. coli glmS mutants for growth on minimal medium lacking glucosamine. A Tn7-like transposon, Tn5468, was found inserted into the region immediately downstream of the glmS gene in a manner similar to the site-specific insertion of transposon Tn7 within the termination region of the E. coli glmS gene. Tn5468 was sequenced, and Tn7-like terminal repeat sequences as well as several open reading frames which are related to the Tn7 transposition genes tnsA, tnsB, tnsC, and tnsD were found. Tn5468 is the closest relative of Tn7 to have been characterized to date. Southern blot hybridization indicated that a similar or identical transposon was present in three T. ferrooxidans strains isolated from different parts of the world but not in two Thiobacillus thiooxidans strains or a Leptospirillum ferrooxidans strain. Since T. ferrooxidans is an obligately acidophilic autotroph and E. coli is a heterotroph, ancestors of the Tn7-like transposons must have been active in a variety of physiologically different bacteria so that their descendants are now found in bacteria that occupy very different ecological niches.

