Terpene synthases are widely distributed in bacteria

2014 ◽  
Vol 112 (3) ◽  
pp. 857-862 ◽  
Author(s):  
Yuuki Yamada ◽  
Tomohisa Kuzuyama ◽  
Mamoru Komatsu ◽  
Kazuo Shin-ya ◽  
Satoshi Omura ◽  
...  

Odoriferous terpene metabolites of bacterial origin have been known for many years. In genome-sequenced Streptomycetaceae microorganisms, the vast majority produce the degraded sesquiterpene alcohol geosmin. Two minor groups of bacteria do not produce geosmin, with one of these groups instead producing other sesquiterpene alcohols, whereas members of the remaining group do not produce any detectable terpenoid metabolites. Because bacterial terpene synthases typically show no significant overall sequence similarity to any other known fungal or plant terpene synthases and usually exhibit relatively low levels of mutual sequence similarity with other bacterial synthases, simple correlation of protein sequence data with the structure of the cyclized terpene product has been precluded. We have previously described a powerful search method based on the use of hidden Markov models (HMMs) and protein families database (Pfam) search that has allowed the discovery of monoterpene synthases of bacterial origin. Using an enhanced set of HMM parameters generated using a training set of 140 previously identified bacterial terpene synthase sequences, a Pfam search of 8,759,463 predicted bacterial proteins from public databases and in-house draft genome data has now revealed 262 presumptive terpene synthases. The biochemical function of a considerable number of these presumptive terpene synthase genes could be determined by expression in a specially engineered heterologous Streptomyces host and spectroscopic identification of the resulting terpene products. In addition to a wide variety of terpenes that had been previously reported from fungal or plant sources, we have isolated and determined the complete structures of 13 previously unidentified cyclic sesquiterpenes and diterpenes.

DNA Research ◽  
2020 ◽  
Vol 27 (3) ◽  
Author(s):  
Kin H Lau ◽  
Wajid Waheed Bhat ◽  
John P Hamilton ◽  
Joshua C Wood ◽  
Brieanne Vaillancourt ◽  
...  

Abstract Chiococca alba (L.) Hitchc. (snowberry), a member of the Rubiaceae, has been used as a folk remedy for a range of health issues including inflammation and rheumatism and produces a wealth of specialized metabolites including terpenes, alkaloids, and flavonoids. We generated a 558 Mb draft genome assembly for snowberry which encodes 28,707 high-confidence genes. Comparative analyses with other angiosperm genomes revealed enrichment in snowberry of lineage-specific genes involved in specialized metabolism. Synteny between snowberry and Coffea canephora Pierre ex A. Froehner (coffee) was evident, including the chromosomal region encoding caffeine biosynthesis in coffee, albeit syntelogs of N-methyltransferase were absent in snowberry. A total of 27 putative terpene synthase genes were identified, including 10 that encode diterpene synthases. Functional validation of a subset of putative terpene synthases revealed that combinations of diterpene synthases yielded access to products of both general and specialized metabolism. Specifically, we identified plausible intermediates in the biosynthesis of merilactone and ribenone, structurally unique antimicrobial diterpene natural products. Access to the C. alba genome will enable additional characterization of biosynthetic pathways responsible for health-promoting compounds in this medicinal species.


mBio ◽  
2012 ◽  
Vol 3 (1) ◽  
Author(s):  
Kelli L. Palmer ◽  
Paul Godfrey ◽  
Allison Griggs ◽  
Veronica N. Kos ◽  
Jeremy Zucker ◽  
...  

ABSTRACT The enterococci are Gram-positive lactic acid bacteria that inhabit the gastrointestinal tracts of diverse hosts. However, Enterococcus faecium and E. faecalis have emerged as leading causes of multidrug-resistant hospital-acquired infections. The mechanism by which a well-adapted commensal evolved into a hospital pathogen is poorly understood. In this study, we examined high-quality draft genome data for evidence of key events in the evolution of the leading causes of enterococcal infections, including E. faecalis, E. faecium, E. casseliflavus, and E. gallinarum. We characterized two clades within what is currently classified as E. faecium and identified traits characteristic of each, including variation in operons for cell wall carbohydrate and putative capsule biosynthesis. We examined the extent of recombination between the two E. faecium clades and identified two strains with mosaic genomes. We determined the underlying genetics for the defining characteristics of the motile enterococci E. casseliflavus and E. gallinarum. Further, we identified species-specific traits that could be used to advance the detection of medically relevant enterococci and their identification to the species level. IMPORTANCE The enterococci, in particular, vancomycin-resistant enterococci, have emerged as leading causes of multidrug-resistant hospital-acquired infections. In this study, we examined genome sequence data to define traits with the potential to influence host-microbe interactions and to identify sequences and biochemical functions that could form the basis for the rapid identification of enterococcal species or lineages of importance in clinical and environmental samples.


2020 ◽  
Author(s):  
Janani Durairaj ◽  
Elena Melillo ◽  
Harro J Bouwmeester ◽  
Jules Beekwilder ◽  
Dick de Ridder ◽  
...  

Abstract Sesquiterpene synthases (STSs) catalyze the formation of a large class of plant volatiles called sesquiterpenes. While thousands of putative STS sequences from diverse plant species are available, only a small number of them have been functionally characterized. Sequence identity-based screening for desired enzymes, often used in biotechnological applications, is difficult to apply here as STS sequence similarity is strongly affected by species. This calls for more sophisticated computational methods for functionality prediction. We investigate the specificity of precursor cation formation in these elusive enzymes. By inspecting multi-product STSs, we demonstrate that STSs have a strong selectivity towards one precursor cation. We use a machine learning approach combining sequence and structure information to accurately predict precursor cation specificity for STSs across all plant species. We combine this with a co-evolutionary analysis on the wealth of uncharacterized putative STS sequences, to pinpoint residues and distant functional contacts influencing cation formation and reaction pathway selection. These structural factors can be used to predict and engineer enzymes with specific functions, as we demonstrate by predicting and characterizing two novel STSs from Citrus bergamia.
Author summary Predicting enzyme function is a popular problem in the bioinformatics field that grows more pressing with the increase in protein sequences, and more attainable with the increase in experimentally characterized enzymes. Terpenes and terpenoids form the largest classes of natural products and find use in many drugs, flavouring agents, and perfumes. Terpene synthases catalyze the biosynthesis of terpenes via multiple cyclizations and carbocation rearrangements, generating a vast array of product skeletons. In this work, we present a three-pronged computational approach to predict carbocation specificity in sesquiterpene synthases, a subset of terpene synthases with one of the highest diversities of products. Using homology modelling, machine learning and co-evolutionary analysis, our approach combines sparse structural data, large amounts of uncharacterized sequence data, and the current set of experimentally characterized enzymes to provide insight into residues and structural regions that likely play a role in determining product specificity. Similar techniques can be repurposed for function prediction and enzyme engineering in many other classes of enzymes.


2018 ◽  
Author(s):  
Hsin-Nan Lin ◽  
Ching-Tai Chen ◽  
Ting-Yi Sung ◽  
Wen-Lian Hsu

ABSTRACT There is a growing gap between protein subcellular localization (PSL) data and protein sequence data, raising the need for computational methods to rapidly determine subcellular localizations for uncharacterized proteins. Currently, the most efficient computational method involves finding sequence-similar proteins (hereafter referred to as similar proteins) in the annotated database and transferring their annotations to the target protein. When a sequence-similarity search fails to find similar proteins, many PSL predictors adopt machine learning methods for the prediction of localization sites. We proposed a universal protein localization site predictor - UniLoc - to take advantage of implicit similarity among proteins through sequence analysis alone. The notion of related protein words is introduced to explore the localization site assignment of uncharacterized proteins. UniLoc is found to identify useful template proteins and produce reliable predictions when similar proteins are not available.


2020 ◽  
Author(s):  
Benzhong Fu ◽  
Olakunle Olawole ◽  
Gwyn A Beattie

Glutamicibacter sp. FBE-19 was isolated based on its strong antagonism to the cucurbit bacterial blight pathogen Erwinia tracheiphila on plates. Members of the Glutamicibacter genus can promote plant growth under saline conditions and antagonize fungi on plates via chitinolytic activity, but their production of antibacterial compounds has not been examined. Here we report the genome sequence of strain FBE-19. The genome is 3.85 Mbp with a G+C content of 60.1% and comprises 3,791 genes. Genes that may contribute to its antagonistic activity include genes for the secondary metabolites stenothricin, salinosporamide A, a second beta-lactone compound, and a carotenoid. The Glutamicibacter sp. FBE-19 genome data may be a useful resource if this strain proves an effective biocontrol agent against E. tracheiphila.


2019 ◽  
Author(s):  
Carol A. Soderlund

Abstract De novo transcriptome sequencing and analysis provide a way for researchers of non-model organisms to explore the differences between various conditions and species. These experiments are expensive and produce large-scale data. The results are typically not definitive but will lead to new hypotheses to study. Therefore, it is important that the results be reproducible, extensible, queryable, and easily available to all members of the team. Towards this end, the Transcriptome Computational Workbench (TCW) is a software package to perform the fundamental computations for transcriptome analysis (singleTCW) and comparative analysis (multiTCW). It is a Java-based desktop application that uses MySQL. The input to singleTCW is sequence and optional count files; the computations are sequence similarity annotation, gene ontology assignment, open reading frame (ORF) finding using hit information and 5th-order Markov models, and differential expression (DE). For DE analysis, TCW interfaces with an R script, where R scripts for edgeR and DEseq are provided, but the user can supply their own. TCW provides support for searching with the super-fast DIAMOND program against UniProt taxonomic databases, though the user can request BLAST and provide other databases to search against. The input to multiTCW is multiple singleTCW databases; the computations are homologous pair assignment, pairwise analysis (e.g. Ka/Ks) from codon-based alignments, clustering (bidirectional best hit, Closure, OrthoMCL, user-supplied), and cluster analysis and annotation. Both singleTCW and multiTCW provide a graphical interface for extensive query and display of the data. Example results are presented from three datasets: (i) a rhizome plant with de novo assembled contigs, (ii) a rhizome plant with gene models from a draft genome sequence, and (iii) a non-rhizome plant with gene models from a finished genome sequence. The two rhizome plants have replicate count data for rhizome, root, stem and leaf samples. The software is freely available at https://github.com/csoderlund/TCW.
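
The bidirectional best hit clustering mentioned in the abstract above can be illustrated with a short sketch. This is only the general idea, written in Python for brevity; it is not TCW's implementation (TCW is a Java application), and the function names, the input layout (a dictionary mapping a query/subject pair to a similarity score) and the example identifiers and scores are all hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) -> similarity score.
    The layout is an assumption made for this sketch.
    """
    best = {}
    for (query, subject), score in scores.items():
        # Keep the subject with the highest score seen so far for this query.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between sequences of two datasets, A and B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 80, ("a3", "b1"): 150}
b_vs_a = {("b1", "a1"): 190, ("b1", "a3"): 140, ("b2", "a2"): 75}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, a3 has b1 as its best hit, but b1's best hit is a1, so a3 is left unpaired; only reciprocal pairs are reported.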


2020 ◽  
Author(s):  
Hamid Bagheri ◽  
Robert Dyer ◽  
Andrew Severin ◽  
Hridesh Rajan

Abstract Background: Scientists around the world use NCBI’s non-redundant (NR) database to identify the taxonomic origin and functional annotation of their favorite protein sequences using BLAST. Unfortunately, due to the exponential growth of this database, many scientists do not have a good understanding of the contents of the NR database. There is a need for tools to explore the contents of large biological datasets, such as NR, to better understand the assumptions and limitations of the data they contain. Results: Protein sequence data, protein functional annotation, and taxonomic assignment from NCBI’s NR database were placed into a BoaG database, a domain-specific language and shared data science infrastructure for genomics, along with a CD-HIT clustering of all these protein sequences at different sequence similarity levels. We show that BoaG can efficiently perform queries on this large dataset to determine the average length of protein sequences and identify the most common taxonomic assignments and functional annotations. Using the clustering information, we also show that the non-redundant (NR) database has a considerable amount of annotation redundancy at the 95% similarity level. Conclusions: We implemented BoaG and provided a web-based interface to BoaG’s infrastructure that will help researchers to explore the dataset further. Researchers can submit queries and download the results or share them with others. Availability and implementation: The web-interface of the BoaG infrastructure can be accessed here: http://boa.cs.iastate.edu/boag. Please use user = boag and password = boag to login. Source code and other documentation are also provided as a GitHub repository: https://github.com/boalang/NR_Dataset.


2005 ◽  
Vol 03 (04) ◽  
pp. 803-819 ◽  
Author(s):  
ARUNKUMAR CHINNASAMY ◽  
WING-KIN SUNG ◽  
ANKUSH MITTAL

Due to the large volume of protein sequence data, computational methods to determine the structure class and the fold class of a protein sequence have become essential. Several techniques based on sequence similarity, Neural Networks, Support Vector Machines (SVMs), etc. have been applied. Since most of these methods use binary classifiers for multi-classification, up to C(N, 2) = N(N-1)/2 classifiers may be required for N classes. This paper presents a framework using the Tree-Augmented Bayesian Networks (TAN) which performs multi-classification based on the theory of learning Bayesian Networks and uses the improved feature vector representation of Ding et al. (2001). In order to enhance TAN's performance, pre-processing of data is done by feature discretization and post-processing is done by using a Mean Probability Voting (MPV) scheme. The advantage of using a Bayesian approach over other learning methods is that the network structure is intuitive. In addition, one can read off the TAN structure probabilities to determine the significance of each feature (say, hydrophobicity) for each class, which helps to further understand the complexity in protein structure. The experiments on the datasets used in three prominent recent works show that our approach is more accurate than other discriminative methods. The framework is implemented on the BAYESPROT web server and it is available at . More detailed results are also available on the above website.
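
The remark above about the number of binary classifiers refers to the pairwise (one-versus-one) scheme, in which one classifier is trained per unordered pair of classes. A minimal sketch of the count follows; the function names and the example class counts and labels are illustrative assumptions, not values taken from the abstract.

```python
from itertools import combinations
from math import comb


def pairwise_classifier_count(num_classes: int) -> int:
    """Number of one-versus-one binary classifiers for num_classes classes."""
    return comb(num_classes, 2)  # equals N * (N - 1) / 2


def class_pairs(labels):
    """List the unordered class pairs, one per binary classifier."""
    return list(combinations(labels, 2))


# Hypothetical class counts chosen only to show how quickly the total grows.
for n in (4, 10, 27):
    print(n, pairwise_classifier_count(n))

# Hypothetical labels: three classes need three pairwise classifiers.
print(class_pairs(["class_a", "class_b", "class_c"]))
```

The quadratic growth is the motivation the abstract gives for a single multi-class model such as TAN.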


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Taras K Oleksyk ◽  
Walter W Wolfsberger ◽  
Alexandra M Weber ◽  
Khrystyna Shchubelka ◽  
Olga T Oleksyk ◽  
...  

Abstract Background The main goal of this collaborative effort is to provide genome-wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes by an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that has been resequenced by Illumina NovaSeq6000 S4 at high coverage. Results The genome data have been searched for genomic variation represented in this population, and a number of variants have been reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest to-date survey of genetic variation in Ukraine, creating a public reference resource aiming to provide data for medical research in a large understudied population. Conclusions Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information bringing forth a wealth of novel, endemic and medically related alleles.


Data in Brief ◽  
2021 ◽  
Vol 35 ◽  
pp. 106784
Author(s):  
Chinda Chhe ◽  
Ayaka Uke ◽  
Sirilak Baramee ◽  
Umbhorn Ungkulpasvich ◽  
Chakrit Tachaapaikoon ◽  
...  
