CLUSTERING MULTI-DOMAIN PROTEIN STRUCTURES IN THE ESSENTIAL DYNAMICS SUBSPACE

2013 ◽  
Vol 12 (08) ◽  
pp. 1341008 ◽  
Author(s):  
BIN WEN ◽  
YUNYU SHI ◽  
ZHIYONG ZHANG

A multi-domain protein can exist in solution as an equilibrium of different conformations, which may be critical to its biological function. Besides experimental techniques, computational methods such as molecular dynamics (MD) simulations are suitable for studying the inter-domain motions of the protein and sampling its different conformational states. An MD simulation usually generates a trajectory containing a large number of protein structures, so a post-processing cluster analysis is needed to group similar structures into clusters and identify the typical conformations of the multi-domain protein. In this paper, the widely used k-means clustering algorithm is implemented in the protein essential dynamics (ED) subspace, defined by principal component analysis of the MD trajectory. Cluster analysis of the formin binding protein 21 (FBP21) tandem WW domains demonstrates that k-means clustering based on distances between structures in the ED subspace gives results superior to those obtained with other metrics, such as pairwise inter-domain residue distances.
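
The core idea (project every MD frame onto the leading principal components, then run k-means on those projections) can be sketched with NumPy alone. This is a minimal illustration, not the authors' implementation: it assumes the frames are already superposed on a common reference, uses flattened Cartesian coordinates, and seeds k-means with k-means++; `essential_subspace` and `kmeans` are hypothetical helper names.

```python
import numpy as np

def essential_subspace(coords, n_components=2):
    """Project MD frames onto the top principal components (the ED subspace).

    coords: array of shape (n_frames, n_atoms, 3), assumed already superposed.
    Returns the projections and the fraction of variance carried by each kept PC.
    """
    X = coords.reshape(len(coords), -1)
    X = X - X.mean(axis=0)                      # center each coordinate
    # Right singular vectors of the centered data = covariance eigenvectors
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ Vt[:n_components].T
    var_frac = s**2 / np.sum(s**2)
    return proj, var_frac[:n_components]

def kmeans(P, k, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding on the projected frames P."""
    rng = np.random.default_rng(seed)
    centers = [P[rng.integers(len(P))]]
    for _ in range(1, k):
        # Squared distance of each point to its nearest chosen center
        d2 = np.min(((P[:, None, :] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(P[rng.choice(len(P), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((P[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        converged = np.allclose(new, centers)
        centers = new
        if converged:
            break
    # Final assignment against the final centers
    labels = np.argmin(((P[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    return labels, centers
```

In this sketch the distance between two structures is the Euclidean distance between their projections onto the retained PCs; the number of PCs and the number of clusters are free choices that would need to be justified for a real trajectory.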

2021 ◽  
Author(s):  
Pritam Biswas ◽  
Uttam Pal ◽  
Aniruddha Adhikari ◽  
Susmita Mondal ◽  
Ria Ghosh ◽  
...  

Conformational dynamics of macromolecules, including enzymes, are essential for their function. The present work reports the role of essential dynamics in alpha-chymotrypsin (CHT) and its correlation with the enzyme's catalytic activity. Detailed optical spectroscopy and classical molecular dynamics (MD) simulations were used to study the thermal stability, catalytic activity and dynamical flexibility of the enzyme. Enzyme kinetics reveal an optimum catalytic efficiency at 308 K. Polarization-gated fluorescence anisotropy with 8-anilino-1-naphthalene sulfonate (ANS) indicated increasing flexibility of the enzyme with increasing temperature. Examination of the structure of CHT reveals five loop regions (LRs) around the catalytic S1 pocket. MD simulations indicated that flexibility increases concurrently with temperature, while catalytic efficiency decreases beyond the optimum temperature. Principal component analysis (PCA) of the eigenvectors reveals the essential dynamics and the gatekeeping role of the five LRs surrounding the catalytic pocket, which control the enzyme activity.
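
The abstract does not describe how the anisotropy data were analysed. As a hedged illustration, the standard definition of time-resolved anisotropy, r(t) = (I_par - G I_perp) / (I_par + 2 G I_perp), and a single-exponential fit for a rotational correlation time θ can be written as below; a shorter θ means faster reorientation of the probe. The G factor value, the single-exponential model and the function names are assumptions, not details from the study.

```python
import numpy as np

def anisotropy(i_par, i_perp, g_factor=1.0):
    """Time-resolved fluorescence anisotropy from polarized decays.

    r(t) = (I_par - G*I_perp) / (I_par + 2*G*I_perp), where G corrects for the
    polarization bias of the detection system (assumed known here).
    """
    i_par = np.asarray(i_par, dtype=float)
    i_perp = np.asarray(i_perp, dtype=float)
    return (i_par - g_factor * i_perp) / (i_par + 2.0 * g_factor * i_perp)

def fit_single_exponential(t, r):
    """Fit r(t) = r0 * exp(-t / theta) by linear regression on log r.

    A single rotational correlation time is a simplifying assumption; real
    probe-protein decays often need two or more components.
    """
    slope, intercept = np.polyfit(t, np.log(r), 1)
    return np.exp(intercept), -1.0 / slope   # (r0, theta)
```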


2015 ◽  
pp. 125-138 ◽  
Author(s):  
I. V. Goncharenko

In this article we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The k-nearest neighbor (k-NN) classification method was originally developed in 1951 (Fix, Hodges, 1951). Later, the term “k-NN graph” and several k-NN clustering algorithms appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an “excessive” graph, the so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed a different strategy, “upward” clustering, which forms (assembles sequentially) one cluster after another. Until now, graph-based cluster analysis has not been considered for the classification of vegetation datasets.
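
The abstract does not specify the “upward” algorithm, so the following is only a generic illustration of a k-NN graph, not the author's method: each object is linked to its k nearest neighbours, only mutual links are kept, and the connected components are read off as clusters. The mutual-neighbour rule, Euclidean distance and the name `knn_graph_clusters` are assumptions for illustration; a dissimilarity suited to vegetation data could be substituted.

```python
import numpy as np

def knn_graph_clusters(X, k):
    """Cluster objects as connected components of a mutual k-NN graph."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None], axis=-1)
    np.fill_diagonal(D, np.inf)                 # an object is not its own neighbour
    nbrs = np.argsort(D, axis=1)[:, :k]         # indices of the k nearest neighbours
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    mutual = A & A.T                            # keep edge i-j only if each lists the other

    # Union-find over the mutual edges to get connected components
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]       # path halving
            i = parent[i]
        return i
    for i, j in zip(*np.nonzero(mutual)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels
```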


2019 ◽  
Author(s):  
Aleksandra Badaczewska-Dawid ◽  
Andrzej Kolinski ◽  
Sebastian Kmiecik

Summary: Conformational flexibility of protein structures can play an important role in protein function. This flexibility is often studied using computational methods, since experimental characterization can be difficult. Depending on the size of the protein system, computational tools may require large computational resources or significant simplifications of the modeled systems to speed up calculations. In this work, we present protocols for efficient simulations of the flexibility of folded protein structures that use coarse-grained simulation tools of different resolutions: medium, represented by CABS-flex, and low, represented by SURPASS. We test the protocols on a set of 140 globular proteins and compare the results with structure fluctuations observed in MD simulations, ENM modeling and NMR ensembles. As demonstrated, CABS-flex predictions show high correlation with experimental and MD simulation data, while SURPASS is less accurate but promising in terms of future developments.
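
One common way to make such a comparison concrete is to compute a per-residue fluctuation profile for each ensemble (coarse-grained models, MD frames or NMR models) and correlate the profiles. The sketch below assumes superposed C-alpha coordinates and uses root-mean-square fluctuation (RMSF) with Pearson correlation; the abstract does not say which measures were used, so both choices are assumptions.

```python
import numpy as np

def rmsf(ensemble):
    """Per-residue root-mean-square fluctuation around the ensemble mean.

    ensemble: array (n_models, n_residues, 3), e.g. C-alpha coordinates,
    assumed already superposed on a common reference.
    """
    X = np.asarray(ensemble, dtype=float)
    mean = X.mean(axis=0)
    return np.sqrt(((X - mean) ** 2).sum(axis=-1).mean(axis=0))

def profile_correlation(a, b):
    """Pearson correlation between two per-residue fluctuation profiles."""
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])
```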


2021 ◽  
Author(s):  
John B. Lemos ◽  
Matheus R. S. Barbosa ◽  
Edric B. Troccoli ◽  
Alexsandro G. Cerqueira

This work aims to delimit Direct Hydrocarbon Indicator (DHI) zones over the FS8 seismic horizon in the seismic data of the Dutch F3 Field using the Gaussian Mixture Model (GMM) algorithm, an unsupervised machine learning method. The dataset used for the cluster analysis was extracted from the 3D seismic dataset and comprises the following seismic attributes: Sweetness, Spectral Decomposition, Acoustic Impedance, Coherence, and Instantaneous Amplitude. The Principal Component Analysis (PCA) algorithm was applied to the original dataset for dimensionality reduction and noise filtering, and we chose the first three principal components as the input to the clustering algorithm. The cluster analysis using the Gaussian Mixture Model was performed with the number of groups varying from 2 to 20. The Elbow Method suggested fewer groups than were needed to isolate the DHI zones; we therefore found four to be the optimal number of clusters for highlighting this seismic feature. Furthermore, other clusters could be interpreted in terms of lithology using geophysical well log data.
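
The pipeline described above (PCA down to three components, then GMM clustering) can be sketched with NumPy. This is a simplified illustration rather than the authors' code: the mixture uses diagonal covariances, farthest-point initialization and a fixed convergence tolerance, all of which are assumptions. The returned log-likelihood could be tracked over a range of cluster counts, as in the 2 to 20 sweep, but no elbow criterion is implemented here.

```python
import numpy as np

def pca(X, n_components=3):
    """Project centered data onto its leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def gmm_diag(X, k, n_iter=200, tol=1e-8, eps=1e-6, seed=0):
    """EM for a Gaussian mixture with diagonal covariances.

    Returns hard labels (most responsible component) and the log-likelihood.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Farthest-point initialization of the component means
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        d = np.min(np.linalg.norm(X[:, None] - X[idx][None], axis=-1), axis=1)
        idx.append(int(np.argmax(d)))
    mu = X[idx].copy()
    var = np.tile(X.var(axis=0) + eps, (k, 1))
    w = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log w_j + log N(x | mu_j, diag(var_j)) for each point and component
        diff2 = (X[:, None, :] - mu[None]) ** 2
        log_p = (np.log(w)[None]
                 - 0.5 * (diff2 / var[None]).sum(-1)
                 - 0.5 * np.log(2 * np.pi * var).sum(-1)[None])
        m = log_p.max(axis=1, keepdims=True)    # log-sum-exp for stability
        log_norm = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
        resp = np.exp(log_p - log_norm)
        ll = float(log_norm.sum())
        # M-step: update mixture weights, means and diagonal variances
        nk = resp.sum(axis=0) + 1e-12
        w = nk / n
        mu = (resp.T @ X) / nk[:, None]
        var = np.einsum("nk,nkd->kd", resp, (X[:, None, :] - mu[None]) ** 2) / nk[:, None] + eps
        if abs(ll - prev_ll) < tol * abs(ll):
            break
        prev_ll = ll
    return resp.argmax(axis=1), ll
```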


2019 ◽  
Author(s):  
Carlos P. Modenutti ◽  
Juan I. Blanco Capurro ◽  
Roberta Ibba ◽  
Snežana Vasiljević ◽  
Mario Hensen ◽  
...  

Summary: UDP-glucose:glycoprotein glucosyltransferase (UGGT) is the only known glycoprotein folding quality control checkpoint in the eukaryotic glycoprotein secretory pathway. When the enzyme detects a misfolded glycoprotein in the Endoplasmic Reticulum (ER), it dispatches it for ER retention by re-glucosylating it on one of its N-linked glycans. Recent crystal structures of a fungal UGGT have suggested that the enzyme is conformationally mobile. Here, a negative stain electron microscopy reconstruction of UGGT in complex with a monoclonal antibody confirms that the misfold-sensing N-terminal portion of UGGT and its C-terminal catalytic domain are tightly associated. Molecular Dynamics (MD) simulations capture UGGT in previously unobserved conformational states, giving new insights into the molecule’s flexibility. Principal component analysis of the MD trajectories provides a description of UGGT’s overall inter-domain motions, highlighting three types of inter-domain movement: bending, twisting and clamping. These inter-domain motions modify the accessible surface area of the enzyme’s central saddle, likely enabling the protein to recognize and re-glucosylate substrates of different sizes and shapes, and/or to re-glucosylate N-linked glycans situated at variable distances from the site of misfolding. We propose the name “Parodi limit” for the maximum distance between a site of misfolding on a UGGT glycoprotein substrate and an N-linked glycan that monomeric UGGT can re-glucosylate on the same glycoprotein. MD simulations estimate the Parodi limit to be around 60-70 Å. Re-glucosylation assays using UGGT deletion mutants suggest that the TRXL2 domain is necessary for activity against urea-misfolded bovine thyroglobulin. Taken together, our findings support a “one-size-fits-all adjustable spanner” substrate recognition model, with a crucial role for the TRXL2 domain in the recruitment of misfolded substrates to the enzyme’s active site.


Author(s):  
Hyeuk Kim

Unsupervised learning in machine learning divides data into several groups. Observations in the same group have similar characteristics, and observations in different groups have different characteristics. In this paper, we cluster data by partitioning around medoids (PAM), which has some advantages over k-means clustering, and apply it to baseball players in the Korea Baseball League. We also apply principal component analysis to the data and plot them using two principal components as axes. Through this procedure, we interpret the meaning of the clustering graphically. The combination of partitioning around medoids and principal component analysis can be applied to other data as well, and the approach makes it easy to identify the characteristics of each group.
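
PAM differs from k-means in that each cluster is represented by an actual observation (the medoid) rather than a mean, which makes it usable with any dissimilarity and less sensitive to outliers. A compact, unoptimized sketch of the classic BUILD and SWAP phases follows; Euclidean distance, the first-improvement swap rule and the name `pam` are assumptions for illustration, not the setup used in the paper, and the exhaustive swap search only suits small datasets.

```python
import numpy as np

def pam(X, k, max_iter=100):
    """Partitioning around medoids: greedy BUILD, then SWAP until no improvement."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # BUILD: start from the most central object, then greedily add medoids
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        best, best_cost = None, np.inf
        for c in range(n):
            if c in medoids:
                continue
            cost = D[:, medoids + [c]].min(axis=1).sum()
            if cost < best_cost:
                best, best_cost = c, cost
        medoids.append(best)
    cost = D[:, medoids].min(axis=1).sum()

    # SWAP: replace a medoid with a non-medoid whenever total cost decreases
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                trial_cost = D[:, trial].min(axis=1).sum()
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
        if not improved:
            break

    labels = np.argmin(D[:, medoids], axis=1)   # assign each object to nearest medoid
    return np.array(medoids), labels, float(cost)
```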


Author(s):  
Rameez Jabeer Khan ◽  
Rajat Kumar Jha ◽  
Gizachew Muluneh Amera ◽  
Jayaraman Muthukumaran ◽  
Rashmi Prabha Singh ◽  
...  

Introduction: Lactoperoxidase (LPO) is a member of the mammalian heme peroxidase family and an enzyme of the innate immune system. It possesses a covalently linked heme prosthetic group (a derivative of protoporphyrin IX) in its active site. LPO catalyzes the oxidation of halides and pseudohalides in the presence of hydrogen peroxide (H2O2) and shows a broad range of antimicrobial activity. Methods: In this study, we used two pharmaceutically important drug molecules, dapsone and propofol, which have previously been reported as potent inhibitors of LPO. However, the stereochemistry and binding mode of dapsone and propofol to LPO are still unknown because no crystal structure of LPO with either drug is available. To fill this gap, we carried out molecular docking and molecular dynamics (MD) simulation studies of LPO in its native form and in complex with dapsone and propofol. Results: Docking gave estimated binding free energies (ΔG) of -9.25 kcal/mol (Ki = 0.16 μM) for dapsone and -7.05 kcal/mol (Ki = 6.79 μM) for propofol. The standard error of the AutoDock program is 2.5 kcal/mol; therefore, the molecular docking results alone were inconclusive. Conclusion: To further validate the docking results, we performed MD simulations of unbound LPO and of LPO bound to each of the two drugs. Interestingly, the MD simulation results showed that the LPO-propofol complex was structurally more stable than the LPO-dapsone complex. The results of this study establish the binding mode and interaction pattern of dapsone and propofol as inhibitors of LPO.
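
The reported inhibition constants follow from the docked free energies through Ki = exp(ΔG/RT). As a worked check, assuming T = 298.15 K (the abstract does not state the temperature), the sketch below gives about 0.17 μM and 6.8 μM, close to the quoted values. It also shows why a 2.5 kcal/mol standard error makes the docking comparison inconclusive: that error corresponds to roughly a 68-fold uncertainty in Ki, larger than the roughly 40-fold ratio between the two reported Ki values.

```python
import math

R_KCAL = 1.98719e-3  # gas constant in kcal/(mol*K)

def ki_from_dg(dg_kcal_per_mol, temp_k=298.15):
    """Inhibition constant (molar) from a binding free energy: Ki = exp(dG / RT)."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temp_k))

def fold_uncertainty(error_kcal_per_mol, temp_k=298.15):
    """Multiplicative uncertainty in Ki implied by an error in dG."""
    return math.exp(error_kcal_per_mol / (R_KCAL * temp_k))

for name, dg in [("dapsone", -9.25), ("propofol", -7.05)]:
    print(f"{name}: Ki = {ki_from_dg(dg) * 1e6:.2f} uM")
print(f"2.5 kcal/mol error -> factor {fold_uncertainty(2.5):.0f} in Ki")
```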


2020 ◽  
Vol 15 ◽  
Author(s):  
Shuwen Zhang ◽  
Qiang Su ◽  
Qin Chen

Abstract: Major animal diseases pose a great threat to animal husbandry and to human beings. With deepening globalization and increasingly abundant data resources, the prediction and analysis of animal diseases using big data are becoming more and more important. The focus of machine learning is to make computers learn from data and use the learned experience for analysis and prediction. First, this paper introduces the animal epidemic situation and machine learning. It then briefly introduces the application of machine learning to animal disease analysis and prediction. Machine learning is mainly divided into supervised and unsupervised learning. Supervised learning includes support vector machines, naive Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning, and AdaBoost. Unsupervised learning includes the expectation-maximization algorithm, principal component analysis, hierarchical clustering, and maxent. Through this discussion, the paper aims to give readers a clearer concept of machine learning and of its application prospects in animal diseases.

