Deep clustering of protein folding simulations

2018
Author(s):  
Debsindhu Bhowmik ◽  
Shang Gao ◽  
Michael T Young ◽  
Arvind Ramanathan

We examine the problem of clustering biomolecular simulations using deep learning techniques. Because biomolecular simulation datasets are inherently high-dimensional, it is often necessary to build low-dimensional representations that can be used to extract quantitative insights into the atomistic mechanisms underlying complex biological processes. In this paper, we use a convolutional variational autoencoder (CVAE) to learn low-dimensional, biophysically relevant latent features from long-timescale protein folding simulations in an unsupervised manner. We demonstrate our approach on three model protein folding systems: the Fs-peptide (14 μs aggregate sampling), the villin headpiece (a single 125 μs trajectory), and the mixed β-β-α (BBA) protein (223 + 102 μs of sampling across two independent trajectories). In these systems, we show that the learned CVAE latent features correspond to distinct conformational substates along the protein folding pathways. The CVAE model correctly predicts nearly 89% of all contacts within the folding trajectories, while extracting folded, unfolded, and potentially misfolded states in an unsupervised manner. Further, latent features learned by the CVAE model can be applied to other, independent trajectories, which makes the model particularly attractive for identifying intrinsic features that correspond to conformational substates sharing similar structural features. Together, these results show that the CVAE model can quantitatively describe complex biophysical processes such as protein folding.
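A CVAE of this kind is typically trained on per-frame residue contact maps, with an objective that combines a reconstruction term and a Kullback-Leibler (KL) penalty on the latent Gaussian. The standard-library Python sketch below shows those ingredients and one possible way to score reconstructed contacts. It is not the authors' code: the 8 Å cutoff, the 0.5 probability threshold, and the per-entry accuracy definition are assumptions, and the convolutional encoder and decoder are omitted.

```python
import math

# Assumed values for illustration; the abstract does not specify them.
CUTOFF_ANGSTROM = 8.0   # hypothetical contact distance cutoff
THRESHOLD = 0.5         # hypothetical probability threshold for a predicted contact


def contact_map(coords, cutoff=CUTOFF_ANGSTROM):
    """Binary N x N contact map: 1 if residues i and j lie within `cutoff`."""
    n = len(coords)
    return [[1 if math.dist(coords[i], coords[j]) <= cutoff else 0 for j in range(n)]
            for i in range(n)]


def contact_accuracy(true_map, recon_probs, threshold=THRESHOLD):
    """Fraction of map entries whose thresholded reconstruction matches the input."""
    n = len(true_map)
    correct = sum(
        (recon_probs[i][j] >= threshold) == bool(true_map[i][j])
        for i in range(n) for j in range(n)
    )
    return correct / (n * n)


def binary_cross_entropy(true_map, recon_probs, eps=1e-7):
    """Reconstruction term: summed BCE between input contacts and decoder probabilities."""
    total = 0.0
    for row_t, row_p in zip(true_map, recon_probs):
        for y, p in zip(row_t, row_p):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the VAE regularizer."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def vae_loss(true_map, recon_probs, mu, logvar):
    """Negative ELBO for one frame: reconstruction error plus KL penalty."""
    return binary_cross_entropy(true_map, recon_probs) + gaussian_kl(mu, logvar)
```

For example, `contact_map([(0, 0, 0), (3.8, 0, 0), (20, 0, 0)])` marks residues 0 and 1 as in contact and leaves residue 2 isolated.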

2021
Vol 22 (3)
pp. 1496
Author(s):  
Domenico Loreto ◽  
Giarita Ferraro ◽  
Antonello Merlino

The structures of the adducts formed upon reaction of the cytotoxic paddlewheel dirhodium complex [Rh₂(μ-O₂CCH₃)₄] with the model protein hen egg white lysozyme (HEWL) under different experimental conditions are reported. Results indicate that [Rh₂(μ-O₂CCH₃)₄] reacts extensively with HEWL: it partly breaks down, in contrast to what happens in its reactions with other proteins. A Rh center coordinates the side chains of Arg14 and His15. Dimeric Rh–Rh units with Rh–Rh distances between 2.3 and 2.5 Å are bound to the side chains of Asp18, Asp101, Asn93, and Lys96, while a dirhodium unit with a Rh–Rh distance of 3.2–3.4 Å binds the C-terminal carboxylate and the side chain of Lys13 at the interface between two symmetry-related molecules. An additional monometallic fragment binds the side chain of Lys33. These data, supported by replicated structural determinations, shed light on the reactivity of dirhodium tetracarboxylates with proteins. They provide useful information for the design of new Rh-containing biomaterials with potential applications in catalysis or medicinal chemistry, as well as valuable insight into the mechanism of action of these potential anticancer agents.


2021
Vol 15
pp. 174830262110249
Author(s):  
Cong-Zhe You ◽  
Zhen-Qiu Shu ◽  
Hong-Hui Fan

Subspace clustering of multi-view data has recently become an active research topic in artificial intelligence and machine learning. Its goal is to divide data samples drawn from different sources into groups. In this paper, we propose a new subspace clustering method for multi-view data, termed Non-negative Sparse Laplacian regularized Latent Multi-view Subspace Clustering (NSL2MSC). The method learns a latent space representation of multi-view data samples and performs data reconstruction in that latent space, so it can cluster data in the latent representation space while exploiting the relationships among different views. Traditional representation-based methods, however, do not consider the nonlinear geometry inside the data and may lose local similarity information between samples during learning. By using graph regularization, the proposed method captures both the global low-dimensional structural features of the data and its nonlinear geometric structure. Experimental results show that the proposed method is effective and outperforms most existing alternatives.
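Graph (Laplacian) regularization of this kind is commonly written as a trace penalty tr(Z L Zᵀ), where the columns of Z are latent representations of the samples and L = D - W is the Laplacian of a sample-affinity graph W. The standard-library Python sketch below builds a k-nearest-neighbour graph (an assumed construction; the abstract does not say how the graph is built) and shows that the trace form equals a weighted sum of squared distances between neighbouring representations, which is why minimising it keeps nearby samples close in the latent space. It covers only the regularizer, not the full NSL2MSC objective or its solver.

```python
import math


def knn_affinity(points, k):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix W (assumed graph construction)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other samples by distance and keep the k closest as neighbours.
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)[:k]
        for _, j in nearest:
            w[i][j] = w[j][i] = 1.0  # symmetrise: link i and j if either is a neighbour of the other
    return w


def laplacian(w):
    """Unnormalised graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def laplacian_trace(z, lap):
    """tr(Z L Z^T); here z[i] (a row) is the latent representation of sample i."""
    n = len(z)
    return sum(lap[i][j] * sum(a * b for a, b in zip(z[i], z[j]))
               for i in range(n) for j in range(n))


def pairwise_smoothness(z, w):
    """Equivalent form: (1/2) * sum_ij W_ij * ||z_i - z_j||^2."""
    n = len(z)
    return 0.5 * sum(w[i][j] * sum((a - b) ** 2 for a, b in zip(z[i], z[j]))
                     for i in range(n) for j in range(n))
```

The two functions return the same value for any symmetric W, so a solver can use whichever form is more convenient.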


2021
Author(s):  
Anderson F. Santos ◽  
Amanda A. O. do Carmo ◽  
Vanessa C. Harthman ◽  
Mariza B. Romagnolo ◽  
Luiz A. Souza

The Rubiaceae tribe Psychotrieae sensu lato and its two largest genera, Psychotria L. and Palicourea Aubl., have long been considered taxonomically controversial. We aimed to identify structural features of fruit and seed ontogeny with taxonomic potential for the tribe, using species of these two genera and Rudgea jasminoides (Cham.) Müll.Arg. The samples were obtained from a herbarium and from Brazilian state parks and sectioned with a rotary microtome. The fruits were found to derive from an inferior ovary and were characterised by a fleshy mesocarp and a sclerenchymatic sinuate pyrene. The seeds were pachychalazal and arillate. The fruit was classified as a pomaceous drupoid nuculanium. The investigation showed that some fruit features are useful for discriminating species. Our study also showed that the ontogenetic features of fruits and seeds are very homogeneous in Palicourea and Psychotria, which supports the inclusion of both genera in the tribe Psychotrieae.


2018
Vol 19 (S18)
Author(s):  
Debsindhu Bhowmik ◽  
Shang Gao ◽  
Michael T. Young ◽  
Arvind Ramanathan

2021
Author(s):  
Rogini Runghen ◽  
Daniel B Stouffer ◽  
Giulio Valentino Dalla Riva

Collecting network interaction data is difficult. Non-exhaustive sampling and complex hidden processes often result in an incomplete dataset. Identifying potentially present but unobserved interactions is therefore crucial both for understanding the structure of large-scale data and for predicting how previously unseen elements will interact. Recent studies in network analysis have shown that accounting for metadata (such as node attributes) can improve both our understanding of how nodes interact with one another and the accuracy of link prediction. However, the dimension of the object we need to learn to predict interactions in a network grows quickly with the number of nodes, which makes the problem computationally and conceptually challenging for large networks. Here, we present a new predictive procedure that combines a graph embedding method with machine learning techniques to predict interactions on the basis of node metadata. Graph embedding methods project the nodes of a network onto a low-dimensional latent feature space, and the positions of the nodes in this space can then be used to predict interactions between them. Learning a mapping from the nodes' metadata to their positions in the latent feature space is a classic, low-dimensional machine learning problem. In our current study, we used the Random Dot Product Graph model to estimate the embedding of an observed network, and we tested different neural network architectures to predict the positions of nodes in the latent feature space. Using flexible machine learning techniques to map nodes onto their latent positions allows us to account for multivariate and possibly complex node metadata. To illustrate the utility of the proposed procedure, we apply it to a large dataset of tourist visits to destinations across New Zealand.
We found that our procedure accurately predicts interactions for both existing nodes and nodes newly added to the network, while being computationally feasible even for very large networks. Overall, our study highlights that by exploiting the properties of a well understood statistical model for complex networks and combining it with standard machine learning techniques, we can simplify the link prediction problem when incorporating multivariate node metadata. Our procedure can be immediately applied to different types of networks, and to a wide variety of data from different systems. As such, both from a network science and data science perspective, our work offers a flexible and generalisable procedure for link prediction.
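Under the Random Dot Product Graph model, each node i has a latent position x_i, and nodes i and j interact with probability x_i · x_j. A standard estimator for these positions, assumed here because the abstract does not name one, is adjacency spectral embedding: take the top-d eigenpairs of the adjacency matrix and scale each eigenvector by the square root of its eigenvalue. The standard-library Python sketch below uses power iteration with deflation, which assumes the leading eigenvalues are positive and well separated. The neural networks that map node metadata to latent positions are not shown; a position predicted for a new node would simply be passed to `interaction_prob`.

```python
import math
import random


def mat_vec(a, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def top_eigenpairs(a, d, iters=500, seed=0):
    """Leading eigenpairs of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    n = len(a)
    a = [row[:] for row in a]  # work on a copy, since deflation modifies it
    pairs = []
    for _ in range(d):
        v = [rng.uniform(0.1, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = mat_vec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, mat_vec(a, v)))  # Rayleigh quotient, |v| = 1
        pairs.append((lam, v))
        # Deflate so the next power iteration finds the next eigenpair.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def adjacency_spectral_embedding(a, d):
    """Latent positions x_i = (sqrt(|lambda_k|) * v_k[i]) for k = 1..d."""
    pairs = top_eigenpairs(a, d)
    return [[math.sqrt(abs(lam)) * v[i] for lam, v in pairs] for i in range(len(a))]


def interaction_prob(x_i, x_j):
    """RDPG interaction probability: dot product of latent positions, clipped to [0, 1]."""
    return min(1.0, max(0.0, sum(a * b for a, b in zip(x_i, x_j))))
```

The positions are identifiable only up to an orthogonal rotation, so the dot products, not the individual coordinates, carry meaning.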


2020
Vol 34 (04)
pp. 3357-3364
Author(s):  
Abdulkadir Celikkanat ◽  
Fragkiskos D. Malliaros

Representing networks in a low-dimensional latent space is a crucial task with many applications in graph learning problems, such as link prediction and node classification. A widely applied network representation learning paradigm combines random walks, which sample context nodes, with the traditional Skip-Gram model, which captures center-context node relationships. In this paper, we focus on exponential family distributions to capture rich interaction patterns between nodes in random walk sequences. We introduce the generic exponential family graph embedding model, which generalizes random walk-based network representation learning techniques to exponential family conditional distributions. We study three particular instances of this model, analyzing their properties and showing their relationship to existing unsupervised learning models. Our experimental evaluation on real-world datasets demonstrates that the proposed techniques outperform well-known baseline methods in two downstream machine learning tasks.
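The random-walk paradigm described above has two stages: sample walks to collect center-context co-occurrences, then model each co-occurrence with a conditional distribution whose natural parameter is the inner product of a center embedding and a context embedding. The standard-library Python sketch below illustrates that structure with a Poisson conditional on co-occurrence counts, log p(x | η) = xη - e^η - log x!, chosen only as one example of an exponential family distribution; it is not necessarily one of the paper's three instances, and it evaluates the log-likelihood without showing how the embeddings are optimised. The walk length, window size, and uniform neighbour sampling are assumptions.

```python
import math
import random
from collections import Counter


def random_walks(adj, walk_len, walks_per_node, seed=0):
    """Uniform random walks starting from every node (assumed sampling scheme)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            # Stop early if the current node has no neighbours.
            while len(walk) < walk_len and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def context_counts(walks, window):
    """Count (center, context) pairs that co-occur within `window` steps of a walk."""
    counts = Counter()
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(center, walk[j])] += 1
    return counts


def poisson_log_likelihood(counts, nodes, center_emb, context_emb):
    """Sum of log Poisson(x_uv | rate = exp(eta_uv)) with eta_uv = <center_u, context_v>."""
    total = 0.0
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            eta = sum(a * b for a, b in zip(center_emb[u], context_emb[v]))
            x = counts[(u, v)]  # Counter returns 0 for unseen pairs
            total += x * eta - math.exp(eta) - math.lgamma(x + 1)
    return total
```

Swapping the Poisson term for another exponential family log-density (for example a Bernoulli with a sigmoid link) changes only the last line of `poisson_log_likelihood`, which is the kind of generalization the abstract describes.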


2019
Vol 21 (22)
pp. 11924-11936
Author(s):  
Qiang Shao ◽  
Weiliang Zhu

Folding simulations of three proteins with ββα-motif and β-barrel structures (NTL9, NuG2b, and CspA) were performed to determine the important roles of native and nonnative contacts in protein folding.
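A common way to quantify native and nonnative contacts along a folding trajectory is the fraction of native contacts Q, reported together with a count of contacts that are absent from the native structure. The standard-library Python sketch below computes both for a single frame. The 8 Å cutoff and the minimum sequence separation of three residues are illustrative assumptions, not the contact definition used in the cited study.

```python
import math


def contacts(coords, cutoff=8.0, min_seq_sep=3):
    """Residue pairs (i, j) with j - i >= min_seq_sep lying within `cutoff` (assumed values)."""
    n = len(coords)
    return {(i, j) for i in range(n) for j in range(i + min_seq_sep, n)
            if math.dist(coords[i], coords[j]) <= cutoff}


def native_nonnative(frame, native_set, cutoff=8.0, min_seq_sep=3):
    """Return (Q, number of nonnative contacts) for one frame of a trajectory."""
    current = contacts(frame, cutoff, min_seq_sep)
    # Q: fraction of the native contacts that are formed in this frame.
    q = len(current & native_set) / len(native_set) if native_set else 0.0
    # Nonnative contacts: formed in this frame but absent from the native structure.
    return q, len(current - native_set)
```

Tracking both quantities frame by frame separates progress toward the native fold (rising Q) from transient or misfolded interactions (nonnative contacts).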

