Design to Data for mutants of β-glucosidase B from Paenibacillus polymyxa: M319C, T431I, and K337D

2019 ◽  
Author(s):  
Peishan Huang ◽  
Stephanie C. Contreras ◽  
Eliana Bloomfield ◽  
Kristine Schmitz ◽  
Augustine Arredondo ◽  
...  

ABSTRACT The use of computational tools has become increasingly popular for engineering protein function. While there are numerous examples of computational tools enabling the design of novel protein functions, there remains room for improvement in both prediction accuracy and success. To improve algorithms for functional and stability predictions, we have initiated the development of a data set designed for training new computational algorithms for enzyme design. To date, our data set is composed of over 129 mutants with associated expression levels, kinetic data, and thermal stability for the enzyme β-glucosidase B (BglB) from Paenibacillus polymyxa. In this study, we introduced three new variants (M319C, T431I, and K337D) to our existing data set with the goal of cultivating a larger data set to train new design algorithms and more broadly explore structure-function relationships in BglB.

2016 ◽  
Author(s):  
Morgan N. Price ◽  
Kelly M. Wetmore ◽  
R. Jordan Waters ◽  
Mark Callaghan ◽  
Jayashree Ray ◽  
...  

Summary The function of nearly half of all protein-coding genes identified in bacterial genomes remains unknown. To systematically explore the functions of these proteins, we generated saturated transposon mutant libraries from 25 diverse bacteria and we assayed mutant phenotypes across hundreds of distinct conditions. From 3,903 genome-wide mutant fitness assays, we obtained 14.9 million gene phenotype measurements and we identified a mutant phenotype for 8,487 proteins with previously unknown functions. The majority of these hypothetical proteins (57%) had phenotypes that were either specific to a few conditions or were similar to that of another gene, thus enabling us to make informed predictions of protein function. For 1,914 of these hypothetical proteins, the functional associations are conserved across related proteins from different bacteria, which confirms that these associations are genuine. This comprehensive catalogue of experimentally-annotated protein functions also enables the targeted exploration of specific biological processes. For example, sensitivity to a DNA-damaging agent revealed 28 known families of DNA repair proteins and 11 putative novel families. Across all sequenced bacteria, 14% of proteins that lack detailed annotations have an ortholog with a functional association in our data set. Our study demonstrates the utility and scalability of high-throughput genetics for large-scale annotation of bacterial proteins and provides a vast compendium of experimentally-determined protein functions across diverse bacteria.


2017 ◽  
Author(s):  
Pin-San Xu ◽  
Jun Luo ◽  
Tong-Yi Dou

Most biological processes within a cell are carried out by protein-protein interaction (PPI) networks, or so-called interactomics. Therefore, identifying PPIs is crucial to elucidating protein functions and to further understanding various cellular biological processes. Currently, a series of high-throughput experimental technologies for detecting PPIs have been presented. However, the time-consuming and labor-intensive nature of these methods has led researchers to turn to computational methods for PPI prediction. Herein, we developed a new predictor that uses a stacking algorithm with information extraction by wavelet transform. When applied to the Saccharomyces cerevisiae PPI dataset, the proposed method achieved a prediction accuracy of 83.35% with a sensitivity of 92.95% at a specificity of 65.41%. An independent data set of 2726 Helicobacter pylori PPIs was also used to evaluate this prediction model, and the prediction accuracy was 80.39%, which is better than that of most existing methods.
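
For readers comparing the figures reported above, the following minimal sketch shows how accuracy, sensitivity, and specificity are conventionally computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true interactions that were recovered
    specificity = tn / (tn + fp)  # share of non-interacting pairs that were rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only (not from the paper).
acc, sens, spec = classification_metrics(tp=90, fn=10, tn=60, fp=40)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

Because accuracy is a class-weighted mix of sensitivity and specificity, a high sensitivity paired with a lower specificity can still yield an accuracy between the two, depending on the ratio of positive to negative pairs.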


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2007 ◽  
Author(s):  
Sandeep Chakraborty ◽  
Rafael Nascimento ◽  
Paulo A. Zaini ◽  
Hossein Gouran ◽  
Basuthkar J. Rao ◽  
...  

Background. Xylella fastidiosa, the causative agent of various plant diseases including Pierce’s disease in the US, and Citrus Variegated Chlorosis in Brazil, remains a continual source of concern and economic losses, especially since almost all commercial varieties are sensitive to this Gammaproteobacteria. Differential expression of proteins in infected tissue is an established methodology to identify key elements involved in plant defense pathways. Methods. In the current work, we developed a methodology named CHURNER that emphasizes relevant protein functions from proteomic data, based on identification of proteins with similar structures that do not necessarily have sequence homology. Such clustering emphasizes protein functions which have multiple copies that are up/down-regulated, and highlights similar proteins which are differentially regulated. As a working example we present proteomic data enumerating differentially expressed proteins in xylem sap from grapevines that were infected with X. fastidiosa. Results. Analysis of this data by CHURNER highlighted pathogenesis related PR-1 proteins, reinforcing this as the foremost protein function in xylem sap involved in the grapevine defense response to X. fastidiosa. β-1,3-glucanase, which has both anti-microbial and anti-fungal activities, is also up-regulated. Simultaneously, chitinases are found to be both up and down-regulated by CHURNER, and thus the net gain of this protein function loses its significance in the defense response. Discussion. We demonstrate how structural data can be incorporated in the pipeline of proteomic data analysis prior to making inferences on the importance of individual proteins to plant defense mechanisms. We expect CHURNER to be applicable to any proteomic data set.


Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant (anywhere between 5-10%) gains in prediction accuracy relative to the G-BLUP.
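
To make the role of λ concrete, here is a minimal sketch of an L1-penalized index solved by coordinate descent. It rests on assumptions the abstract does not state: the weights for one prediction individual are taken to minimize ½βᵀAβ − bᵀβ + λ‖β‖₁, where `A` stands in for the training relationship matrix plus a variance-ratio ridge term and `b` for the relationships between the target and the training individuals. The names `A`, `b`, `soft_threshold`, and `sparse_index_weights`, the toy numbers, and the choice of solver are all hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def sparse_index_weights(A, b, lam, n_iter=500):
    """Coordinate descent for min 0.5*beta'A*beta - b'beta + lam*||beta||_1.

    A is assumed symmetric positive definite (list of lists); b is a list.
    """
    n = len(b)
    beta = [0.0] * n
    for _ in range(n_iter):
        for j in range(n):
            # Partial residual for coordinate j with all other weights held fixed.
            r = b[j] - sum(A[j][k] * beta[k] for k in range(n) if k != j)
            beta[j] = soft_threshold(r, lam) / A[j][j]
    return beta


# Toy inputs for illustration only (assumed values, not data from the study).
A = [[2.0, 0.5], [0.5, 2.0]]
b = [1.0, 0.5]
dense = sparse_index_weights(A, b, lam=0.0)   # no penalty: solves A @ beta = b
sparse = sparse_index_weights(A, b, lam=0.6)  # penalty zeroes the weaker support point
print(dense, sparse)
```

With `lam=0.0` the weights reduce to the unpenalized solution of the linear system, which mirrors the abstract's statement that G-BLUP is the special case λ = 0; larger values drop training individuals from the support set of the prediction.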


2006 ◽  
Vol 5 (7) ◽  
pp. 1711-1720 ◽  
Author(s):  
Deepti Bhushan ◽  
Aarti Pandey ◽  
Arnab Chattopadhyay ◽  
Mani Kant Choudhary ◽  
Subhra Chakraborty ◽  
...  

2021 ◽  
Vol 28 ◽  
Author(s):  
Yu-He Yang ◽  
Jia-Shu Wang ◽  
Shi-Shi Yuan ◽  
Meng-Lu Liu ◽  
Wei Su ◽  
...  

Protein-ligand interactions are necessary for the majority of protein functions. Adenosine-5’-triphosphate (ATP) is one such ligand that plays a vital role as a coenzyme in providing energy for cellular activities, catalyzing biological reactions, and signaling. Knowing the ATP binding residues of proteins is helpful for annotation of protein function and drug design. However, due to the huge influx of protein sequences into databases in the post-genome era, experimentally identifying ATP binding residues is cost-ineffective and time-consuming. To address this problem, computational methods have been developed to predict ATP binding residues. In this review, we briefly summarize the application of machine learning methods in detecting ATP binding residues of proteins. We expect this review will be helpful for further research.


2021 ◽  
Vol 90 (1) ◽  
Author(s):  
Jihye Seong ◽  
Michael Z. Lin

Optobiochemical control of protein activities allows the investigation of protein functions in living cells with high spatiotemporal resolution. Over the last two decades, numerous natural photosensory domains have been characterized and synthetic domains engineered and assembled into photoregulatory systems to control protein function with light. Here, we review the field of optobiochemistry, categorizing photosensory domains by chromophore, describing photoregulatory systems by mechanism of action, and discussing protein classes frequently investigated using optical methods. We also present examples of how spatial or temporal control of proteins in living cells has provided new insights not possible with traditional biochemical or cell biological techniques. Expected final online publication date for the Annual Review of Biochemistry, Volume 90 is June 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


2022 ◽  
Author(s):  
Maxat Kulmanov ◽  
Robert Hoehndorf

Motivation: Protein functions are often described using the Gene Ontology (GO), which is an ontology consisting of over 50,000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology, and a variety of machine learning methods have been developed for this purpose. However, these methods usually require a significant amount of training data and cannot make predictions for GO classes which have only a few or no experimental annotations. Results: We developed DeepGOZero, a machine learning model which improves predictions for functions with no or only a small number of annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted. Availability: http://github.com/bio-ontology-research-group/deepgozero


Machines ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. 80
Author(s):  
Yalong Li ◽  
Fan Yang ◽  
Wenting Zha ◽  
Licheng Yan

With the continuous optimization of energy structures, wind power generation has become the dominant new energy source. The strong random fluctuation of natural wind will bring challenges to power system dispatching, so it is necessary to predict wind power. In order to improve the short-term prediction accuracy of regional wind power, this paper proposes a new combination prediction model based on convolutional neural network (CNN) and similar days analysis. Firstly, the least square fitting and batch normalization (BN) are used to preprocess the data, and then the recent historical wind power data set for CNN is established. Secondly, the Pearson correlation coefficient and cosine similarity combination method are utilized to find similar days in the long-term data set, and the prediction model based on similar days is constructed by the weighting method. Finally, based on the particle swarm optimization (PSO) method, a combined forecasting model is established. The results show that the combined model can accurately predict the future short-term wind power curve, and the prediction accuracy is improved to different extents compared to a single method.
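
The similar-days step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the Pearson correlation coefficient and cosine similarity are combined, so the equal weighting `w=0.5`, the function names, and the toy power curves are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine(x, y):
    """Cosine similarity between two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def most_similar_days(target, history, k=2, w=0.5):
    """Rank historical days by a weighted mix of Pearson and cosine similarity.

    The weight w is an assumed parameter, not a value from the paper.
    """
    scored = [
        (w * pearson(target, day) + (1 - w) * cosine(target, day), i)
        for i, day in enumerate(history)
    ]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Toy daily power curves for illustration only (assumed values).
history = [
    [1.0, 2.0, 3.0, 4.0],  # day 0: rising output
    [4.0, 3.0, 2.0, 1.0],  # day 1: falling output
    [1.1, 2.1, 2.9, 4.2],  # day 2: rising output
]
target = [1.0, 2.0, 3.0, 4.1]
print(most_similar_days(target, history, k=2))
```

Pearson correlation captures the shape of the curve while cosine similarity is also sensitive to its level, which is one plausible reason for combining the two; a forecast could then weight the selected days by their scores.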

