Geometricus Represents Protein Structures as Shape-mers Derived from Moment Invariants

2020 ◽  
Author(s):  
Janani Durairaj ◽  
Mehmet Akdel ◽  
Dick de Ridder ◽  
Aalt DJ van Dijk

Abstract Motivation As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Due to the large variation in protein sizes, folds, and topologies, an attractive approach is to embed protein structures into fixed-length vectors, which can be used in machine learning algorithms aimed at predicting and understanding functional and physical properties. Many existing embedding approaches are alignment-based, which is both time-consuming and ineffective for distantly related proteins. On the other hand, library- or model-based approaches depend on a small library of fragments or require the use of a trained model, both of which may not generalize well. Results We present Geometricus, a novel and universally applicable approach to embedding proteins in a fixed-dimensional space. The approach is fast, accurate, and interpretable. Geometricus uses a set of 3D moment invariants to discretize fragments of protein structures into shape-mers, which are then counted to describe the full structure as a vector of counts. We demonstrate the applicability of this approach in various tasks, ranging from fast structure similarity search, unsupervised clustering, and structure classification across proteins from different superfamilies as well as within the same family. Availability Python code available at https://git.wur.nl/durai001/geometricus.

2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i718-i725
Author(s):  
Janani Durairaj ◽  
Mehmet Akdel ◽  
Dick de Ridder ◽  
Aalt D J van Dijk

Abstract Motivation As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Due to the large variation in protein sizes, folds and topologies, an attractive approach is to embed protein structures into fixed-length vectors, which can be used in machine learning algorithms aimed at predicting and understanding functional and physical properties. Many existing embedding approaches are alignment based, which is both time-consuming and ineffective for distantly related proteins. On the other hand, library- or model-based approaches depend on a small library of fragments or require the use of a trained model, both of which may not generalize well. Results We present Geometricus, a novel and universally applicable approach to embedding proteins in a fixed-dimensional space. The approach is fast, accurate, and interpretable. Geometricus uses a set of 3D moment invariants to discretize fragments of protein structures into shape-mers, which are then counted to describe the full structure as a vector of counts. We demonstrate the applicability of this approach in various tasks, ranging from fast structure similarity search, unsupervised clustering and structure classification across proteins from different superfamilies as well as within the same family. Availability and implementation Python code available at https://git.wur.nl/durai001/geometricus.
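The counting idea in the abstract above can be illustrated with a minimal sketch. It is not the Geometricus implementation: the three invariants used here (trace, sum of principal minors, and determinant of the second-order central moment matrix) are a stand-in chosen for this example, and the function names, the fragment length `k`, and the `resolution` parameter are all assumptions. Only the overall flow follows the abstract: cut the structure into fragments, compute rotation- and translation-invariant moments, discretize them into a shape-mer, and count.

```python
import math
from collections import Counter


def second_order_invariants(fragment):
    """Three quantities of a 3D point set that do not change under rotation or translation.

    Assumption for this sketch: these are NOT the invariants used by Geometricus.
    """
    n = len(fragment)
    centroid = [sum(p[i] for p in fragment) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in fragment]
    # Symmetric 3x3 matrix of second-order central moments.
    m = [[sum(q[i] * q[j] for q in centered) / n for j in range(3)] for i in range(3)]
    trace = m[0][0] + m[1][1] + m[2][2]
    minors = (m[0][0] * m[1][1] - m[0][1] ** 2
              + m[0][0] * m[2][2] - m[0][2] ** 2
              + m[1][1] * m[2][2] - m[1][2] ** 2)
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] ** 2)
           - m[0][1] * (m[0][1] * m[2][2] - m[1][2] * m[0][2])
           + m[0][2] * (m[0][1] * m[1][2] - m[1][1] * m[0][2]))
    return trace, minors, det


def shape_mer(fragment, resolution=2.0):
    """Discretize the invariants into a tuple of integers (hypothetical binning rule)."""
    return tuple(round(resolution * math.log1p(max(v, 0.0)))
                 for v in second_order_invariants(fragment))


def embed(coords, k=8, resolution=2.0):
    """Count the shape-mers of all overlapping k-residue fragments."""
    fragments = (coords[i:i + k] for i in range(len(coords) - k + 1))
    return Counter(shape_mer(f, resolution) for f in fragments)


# Toy input: 20 points on a helix-like curve standing in for alpha-carbon coordinates.
helix = [(2.3 * math.cos(1.745 * i), 2.3 * math.sin(1.745 * i), 1.5 * i) for i in range(20)]
counts = embed(helix)
```

Mapping the resulting counter onto a fixed vocabulary of shape-mers would give a fixed-length count vector per structure, which is the kind of input a standard classifier or clustering method expects.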


2020 ◽  
Author(s):  
Javier Caceres-Delpiano ◽  
Roberto Ibañez ◽  
Patricio Alegre ◽  
Cynthia Sanhueza ◽  
Romualdo Paz-Fiblas ◽  
...  

Abstract Protein sequences are high-dimensional, which is one of the main problems for the optimization and study of sequence-structure relations. The intrinsic degeneration of protein sequences is hard to follow, but the continued discovery of new protein structures has shown that there is convergence in terms of the possible folds that proteins can adopt, such that proteins with sequence identities lower than 30% may still fold into similar structures. Given that proteins share a set of conserved structural motifs, machine-learning algorithms can play an essential role in the study of sequence-structure relations. Deep-learning neural networks are becoming an important tool in the development of new techniques, such as protein modeling and design, and they continue to gain power as new algorithms are developed and as increasing amounts of data are released every day. Here, we trained a deep-learning model based on previous recurrent neural networks to design analog protein structures using representation learning based on the evolutionary and structural information of proteins. We test the capabilities of this model by creating de novo variants of an antifungal peptide, with sequence identities of 50% or lower relative to the wild-type (WT) peptide. We show by in silico approximations, such as molecular dynamics, that the new variants and the WT peptide can successfully bind to a chitin surface with comparable relative binding energies. These results are supported by in vitro assays, where the de novo designed peptides showed antifungal activity that equaled or exceeded that of the WT peptide.


Author(s):  
Zhixian Liu ◽  
Qingfeng Chen ◽  
Wei Lan ◽  
Jiahai Liang ◽  
Yiping Pheobe Chen ◽  
...  

Traditional network-based computational methods have shown good results in drug analysis and prediction. However, these methods are time-consuming and lack universality, and it is difficult for them to exploit the auxiliary information of nodes and edges. Network embedding provides a promising way of alleviating these problems by transforming a network into a low-dimensional space while preserving network structure and auxiliary information. This facilitates the application of machine learning algorithms for subsequent processing. Network embedding has been introduced into drug analysis and prediction in the last few years, and has shown superior performance over traditional methods. However, there is no systematic review of this issue. This article offers a comprehensive survey of the primary network embedding methods and their applications in drug analysis and prediction. The network embedding technologies applied to homogeneous and heterogeneous networks are investigated and compared, including matrix decomposition, random walk, and deep learning. In particular, graph neural network (GNN) methods in deep learning are highlighted. Further, the applications of network embedding in drug similarity estimation, drug-target interaction prediction, adverse drug reaction prediction, and protein function and therapeutic peptide prediction are discussed. Several potential future research directions are also discussed.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Truong Khanh Linh Dang ◽  
Thach Nguyen ◽  
Michael Habeck ◽  
Mehmet Gültas ◽  
Stephan Waack

Abstract Background Conformational transitions are implicated in the biological function of many proteins. Structural changes in proteins can be described approximately as the relative movement of rigid domains against each other. Despite previous efforts, there is a need to develop new domain segmentation algorithms that are capable of analysing the entire structure database efficiently and do not require the choice of protein-dependent tuning parameters such as the number of rigid domains. Results We develop a graph-based method for detecting rigid domains in proteins. Structural information from multiple conformational states is represented by a graph whose nodes correspond to amino acids. Graph clustering algorithms allow us to reduce the graph and run the Viterbi algorithm on the associated line graph to obtain a segmentation of the input structures into rigid domains. In contrast to many alternative methods, our approach does not require knowledge about the number of rigid domains. Moreover, we identified default values for the algorithmic parameters that are suitable for a large number of conformational ensembles. We test our algorithm on examples from the DynDom database and illustrate our method on various challenging systems whose structural transitions have been studied extensively. Conclusions The results strongly suggest that our graph-based algorithm forms a novel framework to characterize structural transitions in proteins via detecting their rigid domains. The web server is available at http://azifi.tz.agrar.uni-goettingen.de/webservice/.


Foods ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 608
Author(s):  
Inma Arenas ◽  
Miguel Ribeiro ◽  
Luís Filipe-Ribeiro ◽  
Rafael Vilamarim ◽  
Elisa Costa ◽  
...  

In this work, the effect of pre-fermentative skin maceration (PFSM) on the chemical composition of the macromolecular fraction, polysaccharides and proteins, phenolic compounds, chromatic characteristics, and protein stability of Albariño monovarietal white wines was studied. PFSM increased the extraction of phenolic compounds and polysaccharides and reduced the extraction of pathogenesis-related proteins (PRPs). PFSM wine showed significantly higher protein instability. Sodium and calcium bentonites were used for protein stabilisation of wines obtained with PFSM (+PFSM) and without PFSM (−PFSM), and their efficiencies compared to fungal chitosan (FCH) and k-carrageenan. k-Carrageenan reduced the content of PRPs and the protein instability in both wines, and it was more efficient than sodium and calcium bentonites. FCH was unable to heat stabilise both wines, and PRPs levels remained unaltered. On the other hand, FCH decreased the levels of wine polysaccharides by 60%. Sodium and calcium bentonite also decreased the levels of wine polysaccharides although to a lower extent (16% to 59%). k-Carrageenan did not affect the wine polysaccharide levels. Overall, k-carrageenan is suitable for white wine protein stabilisation, having a more desirable impact on the wine macromolecular fraction than the other fining agents, reducing the levels of the wine PRPs without impacting polysaccharide composition.


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Garima Sharma ◽  
Ashish Ranjan Sharma ◽  
Eun-Min Seo ◽  
Ju-Suk Nam

The Wnt signaling pathway is mediated by a family of secreted glycoproteins through canonical and noncanonical mechanisms. The signaling pathways are regulated by various modulators, which are classified into two classes on the basis of their interaction with either Wnt or its receptors. Secreted frizzled-related proteins (sFRPs) are members of the class that binds to Wnt proteins and antagonizes the Wnt signaling pathway. The other class consists of the Dickkopf (DKK) protein family, which binds to the Wnt receptor complex. The present review discusses the disease-related associations of various polymorphisms in Wnt signaling modulators. Furthermore, this review also highlights that some of the sFRPs and DKKs are unable to act as antagonists of the Wnt signaling pathway, and thus their function needs to be explored more extensively.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Ahmet Mert ◽  
Hasan Huseyin Celik

Abstract The feasibility of using time–frequency (TF) ridge estimation on multi-channel electroencephalogram (EEG) signals is investigated for emotion recognition. Informative component extraction at low computational cost, without decreasing the accuracy of valence/arousal recognition, is examined using multivariate ridge estimation. The advanced TF representation technique called multivariate synchrosqueezing transform (MSST) is used to obtain well-localized components of multi-channel EEG signals. Maximum-energy components in the 2D TF distribution are determined using TF-ridge estimation to extract instantaneous frequency and instantaneous amplitude. The statistical values of the estimated ridges are used as a feature vector for the inputs of machine learning algorithms. Thus, component information in multi-channel EEG signals can be captured and compressed into a low-dimensional space for emotion recognition. Mean and variance values of the five maximum-energy ridges in the MSST-based TF distribution are adopted as the feature vector. Properties of the five TF ridges in the frequency and energy planes (e.g., mean frequency, frequency deviation, mean energy, and energy deviation over time) are computed to obtain a 20-dimensional feature space. The proposed method is evaluated on the DEAP emotional EEG recordings for benchmarking, and recognition rates of up to 71.55% and 70.02% are obtained for high/low arousal and high/low valence, respectively.
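The arithmetic behind the 20-dimensional feature space (five ridges, each summarized by four statistics) can be sketched as follows. This is only an illustration of the feature assembly step: the MSST and the ridge estimation are not shown, the function name and input layout are hypothetical, and variance is used for the "deviation" terms on the basis of the abstract's mention of mean and variance values.

```python
from statistics import mean, pvariance


def ridge_features(ridges):
    """Flatten per-ridge statistics into one feature vector.

    `ridges` is assumed to be a list of (frequency_series, energy_series) pairs,
    one pair per estimated TF ridge (hypothetical layout for this sketch).
    """
    features = []
    for freq, energy in ridges:
        features += [mean(freq), pvariance(freq), mean(energy), pvariance(energy)]
    return features


# Toy input: five ridges with two time samples each.
toy_ridges = [([float(k), float(k + 2)], [1.0, 3.0]) for k in range(5)]
toy_features = ridge_features(toy_ridges)
```

With five ridges the vector has 5 × 4 = 20 entries, matching the dimensionality stated in the abstract.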


2018 ◽  
Vol 618 ◽  
pp. A59 ◽  
Author(s):  
A. Castro-Ginard ◽  
C. Jordi ◽  
X. Luri ◽  
F. Julbe ◽  
M. Morvan ◽  
...  

Context. The publication of the Gaia Data Release 2 (Gaia DR2) opens a new era in astronomy. It includes precise astrometric data (positions, proper motions, and parallaxes) for more than 1.3 billion sources, mostly stars. To analyse such a vast amount of new data, the use of data-mining techniques and machine-learning algorithms is mandatory. Aims. A great example of the application of such techniques and algorithms is the search for open clusters (OCs), groups of stars that were born and move together, located in the disc. Our aim is to develop a method to automatically explore the data space, requiring minimal manual intervention. Methods. We explore the performance of a density-based clustering algorithm, DBSCAN, to find clusters in the data together with a supervised learning method such as an artificial neural network (ANN) to automatically distinguish between real OCs and statistical clusters. Results. The development and implementation of this method in a five-dimensional space (l, b, ϖ, μα*, μδ) with the Tycho-Gaia Astrometric Solution (TGAS) data, and a posterior validation using Gaia DR2 data, lead to the proposal of a set of new nearby OCs. Conclusions. We have developed a method to find OCs in astrometric data, designed to be applied to the full Gaia DR2 archive.
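For readers unfamiliar with DBSCAN, the density-based step named in the Methods can be sketched in plain Python. This is a generic textbook version of the algorithm and not the authors' pipeline: the toy points, `eps`, and `min_pts` are placeholders, the five coordinates are assumed to be already standardized, and the neural-network step that separates real OCs from statistical clusters is not shown.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Brute-force range query; the point itself is included.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:   # core point: keep expanding the cluster
                queue.extend(nbrs)
    return labels


# Toy five-dimensional data: two dense groups and one isolated source.
group_a = [(0.1 * i, 0.0, 0.0, 0.0, 0.0) for i in range(5)]
group_b = [(10.0 + 0.1 * i, 10.0, 10.0, 10.0, 10.0) for i in range(5)]
outlier = [(50.0, 50.0, 50.0, 50.0, 50.0)]
labels = dbscan(group_a + group_b + outlier, eps=0.25, min_pts=3)
```

The brute-force neighbour search is quadratic in the number of points; a catalogue of the size described in the abstract would need a spatial index or a library implementation.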


2014 ◽  
Vol 70 (a1) ◽  
pp. C491-C491
Author(s):  
Jürgen Haas ◽  
Alessandro Barbato ◽  
Tobias Schmidt ◽  
Steven Roth ◽  
Andrew Waterhouse ◽  
...  

Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing goal in structural biology. Over the last two decades, a paradigm shift has occurred: starting from a large "knowledge gap" between the huge number of protein sequences and the small number of experimentally known structures, today some form of structural information – either experimental or computational – is available for the majority of amino acids encoded by common model organism genomes. Methods for structure modeling and prediction have made substantial progress over the last decades, and template-based homology modeling techniques have matured to a point where they are now routinely used to complement experimental techniques. However, computational modeling and prediction techniques often fall short in accuracy compared to high-resolution experimental structures, and it is often difficult to convey the expected accuracy and structural variability of a specific model. Retrospectively assessing the quality of blind structure prediction in comparison to experimental reference structures allows benchmarking the state of the art in structure prediction and identifying areas that need further development. The Critical Assessment of Structure Prediction (CASP) experiment has for the last 20 years assessed the progress in the field of protein structure modeling based on predictions for ca. 100 blind prediction targets per experiment, which are carefully evaluated by human experts. The "Continuous Model EvaluatiOn" (CAMEO) project aims to provide a fully automated blind assessment for prediction servers based on weekly pre-released sequences of the Protein Data Bank (PDB). CAMEO has been made possible by the development of novel scoring methods such as lDDT, which are robust against domain movements and so allow automated continuous structure comparison without human intervention.
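Why a distance-based score such as lDDT tolerates domain movements can be seen from a heavily simplified sketch. The version below is an assumption-laden reduction, not the published method: it uses one point per residue, returns a single global value, and omits stereochemistry checks. The 15 Å inclusion radius and the 0.5, 1, 2, and 4 Å tolerance thresholds are the commonly quoted defaults and should be read as assumptions here. Because only local pairwise distances are compared, no superposition is needed.

```python
import math


def lddt_ca(reference, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Fraction of local reference distances that the model preserves.

    Simplified sketch: one coordinate per residue, global score only.
    """
    preserved = 0
    total = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= radius:
                continue  # only local contacts in the reference are scored
            d_mod = math.dist(model[i], model[j])
            total += len(thresholds)
            preserved += sum(abs(d_ref - d_mod) < t for t in thresholds)
    return preserved / total if total else 0.0


# Toy example: a straight five-residue chain, a rigidly shifted copy, and a distorted copy.
ref = [(3.8 * i, 0.0, 0.0) for i in range(5)]
shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in ref]
distorted = ref[:4] + [(15.2, 10.0, 0.0)]
score_shifted = lddt_ca(ref, shifted)
score_distorted = lddt_ca(ref, distorted)
```

A rigid shift of the whole chain leaves every pairwise distance unchanged, so the shifted copy scores the same as the reference itself, while the distorted copy loses credit only for the pairs that involve the displaced residue.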

