A graph-based algorithm for detecting rigid domains in protein structures

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Truong Khanh Linh Dang ◽  
Thach Nguyen ◽  
Michael Habeck ◽  
Mehmet Gültas ◽  
Stephan Waack

Abstract
Background: Conformational transitions are implicated in the biological function of many proteins. Structural changes in proteins can be described approximately as the relative movement of rigid domains against each other. Despite previous efforts, there is a need to develop new domain segmentation algorithms that are capable of analysing the entire structure database efficiently and do not require the choice of protein-dependent tuning parameters such as the number of rigid domains.
Results: We develop a graph-based method for detecting rigid domains in proteins. Structural information from multiple conformational states is represented by a graph whose nodes correspond to amino acids. Graph clustering algorithms allow us to reduce the graph and run the Viterbi algorithm on the associated line graph to obtain a segmentation of the input structures into rigid domains. In contrast to many alternative methods, our approach does not require knowledge about the number of rigid domains. Moreover, we identified default values for the algorithmic parameters that are suitable for a large number of conformational ensembles. We test our algorithm on examples from the DynDom database and illustrate our method on various challenging systems whose structural transitions have been studied extensively.
Conclusions: The results strongly suggest that our graph-based algorithm forms a novel framework to characterize structural transitions in proteins via detecting their rigid domains. The web server is available at http://azifi.tz.agrar.uni-goettingen.de/webservice/.
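The core intuition behind building a residue graph from several conformational states can be sketched in a few lines: two residues belong to the same rigid body if their distance barely changes between states. The sketch below is a deliberately simplified stand-in, not the authors' algorithm: it joins residues whose pairwise-distance spread stays below a hypothetical tolerance `max_spread` and returns connected components, with no graph reduction, line graph, or Viterbi step. The function name `rigid_domains` and the parameter are illustrative assumptions.

```python
import math
from itertools import combinations


def rigid_domains(conformations, max_spread=1.0):
    """Group residues whose mutual distances stay (nearly) constant.

    conformations: list of states; each state is a list of (x, y, z)
    coordinates, one per residue, in the same residue order.
    max_spread: hypothetical tolerance (in coordinate units, e.g. Angstrom)
    on how much a pairwise distance may vary across states.
    """
    n = len(conformations[0])
    parent = list(range(n))  # union-find forest over residues

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        dists = [math.dist(state[i], state[j]) for state in conformations]
        # join i and j if their distance hardly changes between states
        if max(dists) - min(dists) < max_spread:
            parent[find(j)] = find(i)

    domains = {}
    for i in range(n):
        domains.setdefault(find(i), []).append(i)
    return sorted(domains.values())
```

Plain connected components can chain two domains together through hinge residues that look rigid relative to both sides; more elaborate clustering and segmentation steps, such as those the abstract describes, are one way to avoid that failure mode.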

2021 ◽  
Author(s):  
Jiaying Lai ◽  
Jordan Yang ◽  
Ece D Uzun ◽  
Brenda Rubenstein ◽  
Indra Neil Sarkar

Single amino acid variations (SAVs) are a primary contributor to variation in the human genome. Identifying pathogenic SAVs can aid in the diagnosis and understanding of the genetic architecture of complex diseases, such as cancer. Most approaches for predicting the functional effects or pathogenicity of SAVs rely on either sequence or structural information. Nevertheless, previous analyses have shown that methods that depend on only sequence or only structural information may have limited accuracy. Recently, researchers have attempted to increase the accuracy of their predictions by incorporating protein dynamics into pathogenicity predictions. This study presents Lai Yang Rubenstein Uzun Sarkar (LYRUS), a machine learning method that uses an XGBoost classifier selected by TPOT to predict the pathogenicity of SAVs. LYRUS incorporates five sequence-based features, six structure-based features, and four dynamics-based features. Uniquely, LYRUS includes a newly proposed sequence co-evolution feature called variation number. LYRUS's performance was evaluated using a dataset that contains 4,363 protein structures corresponding to 20,307 SAVs, based on human genetic variant data from the ClinVar database. On this dataset, the LYRUS classifier has higher accuracy, specificity, F-measure, and Matthews correlation coefficient (MCC) than alternative methods including PolyPhen2, PROVEAN, SIFT, Rhapsody, EVMutation, MutationAssessor, SuSPect, FATHMM, and MVP. Variation numbers used within LYRUS differ greatly between pathogenic and neutral SAVs and have a high feature weight in the XGBoost classifier employed by this method. Applications of the method to PTEN and TP53 further corroborate LYRUS's strong performance. LYRUS is freely available, and the source code can be found at https://github.com/jiaying2508/LYRUS.
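The comparison above rests on four standard binary-classification metrics. As a reference sketch (not code from the LYRUS repository), they can be computed from a confusion matrix as follows, treating "pathogenic" as the positive class; the function name `classification_metrics` is illustrative.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, specificity, F-measure and MCC from confusion-matrix counts.

    The positive class is assumed to be 'pathogenic'.
    """
    def ratio(num, den):
        return num / den if den else 0.0  # guard against empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "specificity": ratio(tn, tn + fp),
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
```

MCC uses all four cells of the confusion matrix, which makes it less sensitive than accuracy to an imbalance between pathogenic and neutral variants.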


2015 ◽  
pp. 125-138 ◽  
Author(s):  
I. V. Goncharenko

In this article, we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The k-nearest neighbor (k-NN) classification method was originally developed in 1951 (Fix, Hodges, 1951). Later, the term “k-NN graph” and several k-NN clustering algorithms appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an “excessive” graph, the so-called hypergraph, and then truncate it into subgraphs simply by partitioning and coarsening the hypergraph. We developed a different strategy, “upward” clustering, which forms (assembles sequentially) one cluster after another. Until now, graph-based cluster analysis has not been applied to the classification of vegetation datasets.
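For readers unfamiliar with k-NN graphs, the sketch below builds one and extracts clusters as connected components of its mutual version (an edge is kept only if each point is among the other's k nearest neighbours). This is a generic illustration, not the “upward” algorithm described above, whose details are not given here. Euclidean distance on plot coordinates is an assumption; vegetation studies often use other dissimilarities between relevés.

```python
import math


def knn_graph(points, k):
    """Map each point index to the set of its k nearest neighbours (Euclidean)."""
    neighbours = {}
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours[i] = {j for _, j in others[:k]}
    return neighbours


def mutual_knn_clusters(points, k):
    """Connected components of the mutual k-NN graph."""
    nbrs = knn_graph(points, k)
    # keep edge i-j only if each node lists the other among its k neighbours
    adj = {i: {j for j in nbrs[i] if i in nbrs[j]} for i in nbrs}
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(component))
    return clusters
```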


2021 ◽  
Vol 15 (4) ◽  
pp. 1-22
Author(s):  
Huan Wang ◽  
Chunming Qiao ◽  
Xuan Guo ◽  
Lei Fang ◽  
Ying Sha ◽  
...  

Recently, dynamic social network research has attracted a great deal of attention, especially in anomaly analysis, which examines anomalous changes in the evolution of dynamic social networks. However, most current research has focused on anomaly analysis of the macro-level representation of dynamic social networks and has failed to analyze nodes that undergo anomalous structural changes at the micro level. To identify and evaluate anomalous structural change-based nodes in generalized dynamic social networks that have only limited structural information, this research considers undirected, unweighted graphs and develops a multiple-neighbor superposition similarity method ( ), which mainly consists of a multiple-neighbor range algorithm ( ) and a superposition similarity fluctuation algorithm ( ). The multiple-neighbor range algorithm introduces observation nodes, characterizes the structural similarities of nodes within multiple-neighbor ranges, and proposes a new multiple-neighbor similarity index based on extensional similarity indices. Subsequently, the superposition similarity fluctuation algorithm maximally reflects the structural change of each node using a new superposition similarity fluctuation index derived from diverse multiple-neighbor similarities. As a result, building on these two algorithms, the multiple-neighbor superposition similarity method not only identifies anomalous structural change-based nodes by detecting their anomalous structural changes but also evaluates their degree of anomaly by quantifying these changes. Extensive experiments comparing the method with state-of-the-art approaches show that it accurately identifies anomalous structural change-based nodes and evaluates their degree of anomaly well.
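The general idea of scoring a node by how much its neighbourhoods change between two snapshots can be illustrated with a simple stand-in. The sketch below is an assumption-laden analogue, not the paper's multiple-neighbor similarity index or superposition similarity fluctuation index: it averages the Jaccard dissimilarity of a node's 1..`max_hops` neighbourhoods across two undirected, unweighted snapshots (given as adjacency dicts of sets) and ranks nodes by that score. All function names are hypothetical.

```python
def neighbourhood(adj, node, hops):
    """Nodes reachable from `node` in at most `hops` steps, excluding `node`."""
    seen, frontier = {node}, {node}
    for _ in range(hops):
        frontier = {v for u in frontier for v in adj.get(u, ())} - seen
        seen |= frontier
    return seen - {node}


def change_score(adj_before, adj_after, node, max_hops=2):
    """Mean Jaccard dissimilarity of the node's 1..max_hops neighbourhoods."""
    total = 0.0
    for h in range(1, max_hops + 1):
        a = neighbourhood(adj_before, node, h)
        b = neighbourhood(adj_after, node, h)
        union = a | b
        if union:
            total += 1.0 - len(a & b) / len(union)
    return total / max_hops


def rank_nodes(adj_before, adj_after, max_hops=2):
    """Nodes sorted from most to least changed."""
    nodes = set(adj_before) | set(adj_after)
    scores = {n: change_score(adj_before, adj_after, n, max_hops) for n in nodes}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Combining several neighbourhood ranges lets a node that rewires only distant connections still receive a non-zero score, which a purely first-order comparison would miss.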


2014 ◽  
Vol 70 (a1) ◽  
pp. C491-C491
Author(s):  
Jürgen Haas ◽  
Alessandro Barbato ◽  
Tobias Schmidt ◽  
Steven Roth ◽  
Andrew Waterhouse ◽  
...  

Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequences has been a long-standing goal in structural biology. Over the last two decades, a paradigm shift has occurred: starting from a large "knowledge gap" between the huge number of protein sequences and the small number of experimentally determined structures, some form of structural information, either experimental or computational, is today available for the majority of amino acids encoded by common model organism genomes. Methods for structure modeling and prediction have made substantial progress over the last decades, and template-based homology modeling techniques have matured to the point where they are now routinely used to complement experimental techniques. However, computational modeling and prediction techniques often fall short in accuracy compared to high-resolution experimental structures, and it is often difficult to convey the expected accuracy and structural variability of a specific model. Retrospectively assessing the quality of blind structure predictions against experimental reference structures allows benchmarking the state of the art in structure prediction and identifying areas that need further development. The Critical Assessment of protein Structure Prediction (CASP) experiment has, for the last 20 years, assessed progress in protein structure modeling based on predictions for ca. 100 blind prediction targets per experiment, which are carefully evaluated by human experts. The Continuous Automated Model EvaluatiOn (CAMEO) project aims to provide a fully automated blind assessment of prediction servers based on weekly pre-released sequences of the Protein Data Bank (PDB). CAMEO has been made possible by the development of novel scoring methods such as lDDT, which are robust against domain movements and thus allow automated, continuous structure comparison without human intervention.
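Why lDDT tolerates domain movements becomes clearer from its definition: it compares inter-atomic distances inside a local inclusion radius instead of superposing the model onto the reference. The sketch below is a simplified, C-alpha-only approximation assuming the commonly cited defaults (15 Å inclusion radius, tolerance thresholds of 0.5, 1, 2 and 4 Å); the full score works on all atoms and adds stereochemical checks, so treat `lddt_ca` as illustrative only.

```python
import math


def lddt_ca(ref, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified superposition-free lDDT on one atom per residue.

    ref, model: lists of (x, y, z) coordinates in the same residue order.
    """
    n = len(ref)
    # only pairs that are close in the *reference* structure are scored
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(ref[i], ref[j]) < radius]
    if not pairs:
        return 0.0
    fractions = []
    for t in thresholds:
        preserved = sum(
            1 for i, j in pairs
            if abs(math.dist(ref[i], ref[j]) - math.dist(model[i], model[j])) < t
        )
        fractions.append(preserved / len(pairs))
    return sum(fractions) / len(fractions)  # average over the thresholds
```

Because only distances enter the score, rotating or translating the whole model leaves it unchanged, and a hinge motion only penalizes the relatively few pairs that span the two moving parts within the inclusion radius.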


1987 ◽  
Vol 19 (1-2) ◽  
pp. 7-49 ◽  
Author(s):  
S. J. Opella ◽  
P. L. Stewart ◽  
K. G. Valentine

The three-dimensional structures of proteins are among the most valuable contributions of biophysics to the understanding of biological systems (Dickerson & Geis, 1969; Creighton, 1983). Protein structures are used in the description and interpretation of a wide variety of biological phenomena, including genetic regulation, enzyme mechanisms, antibody recognition, cellular energetics, and the macroscopic mechanical and structural properties of molecular assemblies. Virtually all of the information currently available about the structures of proteins at atomic resolution has been obtained from diffraction studies of single protein crystals (Wyckoff et al., 1985). However, recently developed NMR methods are capable of determining the structures of proteins and are now being applied to a variety of systems, including proteins in solution and in other non-crystalline environments that are not amenable to X-ray diffraction studies. Solid-state NMR methods are useful for proteins that undergo limited overall reorientation because they are in the crystalline solid state or are integral parts of supramolecular structures that do not reorient rapidly in solution. For reviews of applications of solid-state NMR spectroscopy to biological systems, see Torchia and VanderHart (1979), Griffin (1981), Oldfield et al. (1982), Opella (1982), Torchia (1982), Gauesh (1984), Torchia (1984) and Opella (1986). This review describes how solid-state NMR can be used to obtain structural information about proteins, with emphasis on methods applicable to samples with macroscopic orientation.


2018 ◽  
Vol 16 (02) ◽  
pp. 1840005 ◽  
Author(s):  
Dmitry Suplatov ◽  
Yana Sharapova ◽  
Daria Timonina ◽  
Kirill Kopylov ◽  
Vytas Švedas

The visualCMAT web-server was designed to assist experimental research in protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive, easy-to-use interface for the analysis of correlated mutations/co-evolving residues. Sequence and structural information describing homologous proteins is used to predict correlated substitutions with the mutual-information-based CMAT approach. The predicted correlations are classified into spatially close co-evolving pairs, which either form a direct physical contact or interact with the same ligand (e.g. a substrate or a crystallographic water molecule), and long-range correlations; binding sites on the protein surface are then annotated and ranked by the presence of statistically significant co-evolving positions. The results of visualCMAT are organized for convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMOL session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or further studied online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5, so neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server, which can construct large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, applied to selecting hotspots and compensatory mutations for rational design and directed evolution experiments to produce novel enzymes with improved properties, and employed to study the mechanism of selective ligand binding and allosteric communication between topologically independent sites in protein structures.
The web-server is freely available at https://biokinet.belozersky.msu.ru/visualcmat and there are no login requirements.
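Mutual information between two alignment columns is the basic quantity behind correlated-mutation analysis. The sketch below computes plain MI (in nats) between columns of a toy multiple sequence alignment; it is not the CMAT implementation, which additionally assesses statistical significance, and it ignores gap handling and sequence weighting. The helper names are illustrative.

```python
import math
from collections import Counter


def alignment_columns(msa):
    """Turn a list of equal-length aligned sequences into column strings."""
    return ["".join(seq[i] for seq in msa) for i in range(len(msa[0]))]


def mutual_information(col_a, col_b):
    """MI between two alignment columns from observed residue frequencies."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


cols = alignment_columns(["AD", "AD", "KE", "KE"])
print(mutual_information(cols[0], cols[1]))  # perfectly co-varying columns
```

Raw MI is inflated by column variability and phylogenetic relatedness, which is one reason tools report only statistically significant co-evolving positions.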


2019 ◽  
Vol 20 (10) ◽  
pp. 2442 ◽  
Author(s):  
Teppei Ikeya ◽  
Peter Güntert ◽  
Yutaka Ito

To date, in-cell NMR has elucidated various aspects of protein behaviour by investigating protein structures under physiological conditions. However, most current studies using this method have deduced protein states in cells exclusively from ‘indirect’ structural information, such as peak patterns and chemical shift changes, rather than from ‘direct’ data that explicitly include interatomic distances and angles. To fully understand the functions and physical properties of proteins inside cells, it is indispensable to obtain explicit structural data or to determine the three-dimensional (3D) structures of proteins in cells. Although the short lifetime of cells in a sample tube, low sample concentrations, and massive background signals make it difficult to observe NMR signals from proteins inside cells, several methodological advances help to overcome these problems. Paramagnetic effects have outstanding potential for in-cell structural analysis. Combining a limited amount of experimental in-cell data with software for ab initio protein structure prediction opens an avenue to visualise 3D protein structures inside cells. Conventional nuclear Overhauser effect spectroscopy (NOESY)-based structure determination is advantageous for elucidating the conformations of side-chain atoms as well as global structures. In this article, we review current progress in the structural analysis of proteins in living systems and discuss the feasibility of future work.


2019 ◽  
Vol 35 (22) ◽  
pp. 4854-4856 ◽  
Author(s):  
James D Stephenson ◽  
Roman A Laskowski ◽  
Andrew Nightingale ◽  
Matthew E Hurles ◽  
Janet M Thornton

Abstract
Motivation: Understanding the structural context and patterning of genomic variants on proteins can help to separate benign from pathogenic variants and reveal their molecular consequences. However, mapping genomic coordinates to protein structures is non-trivial, complicated by alternative splicing and transcript evidence.
Results: Here we present VarMap, a web tool for mapping a list of chromosome coordinates to canonical UniProt sequences and associated protein 3D structures, including validation checks, and annotating them with structural information.
Availability and implementation: https://www.ebi.ac.uk/thornton-srv/databases/VarMap.
Supplementary information: Supplementary data are available at Bioinformatics online.
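One reason the genomic-to-protein mapping is non-trivial is that coding sequence is split across exons and may lie on either strand. The sketch below converts a genomic position into a residue number and codon position for a single, already chosen transcript. It is a hypothetical helper, not VarMap's implementation: the input format (1-based inclusive coding intervals listed in transcript order) is an assumption, and real pipelines must also select the isoform that matches the canonical UniProt sequence.

```python
def genomic_to_residue(pos, cds_exons, strand="+"):
    """Map a 1-based genomic position to (residue_number, codon_position).

    cds_exons: coding intervals (start, end), 1-based and inclusive, listed
    in transcript order (descending genomic coordinates on the '-' strand).
    Returns None for positions outside the coding sequence (introns, UTRs).
    """
    offset = 0  # number of coding bases in the preceding exons
    for start, end in cds_exons:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            cds_index = offset + within  # 0-based index into the coding sequence
            return cds_index // 3 + 1, cds_index % 3 + 1
        offset += end - start + 1
    return None


print(genomic_to_residue(200, [(100, 105), (200, 208)]))  # first base of exon 2
```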


2020 ◽  
Vol 117 (11) ◽  
pp. 5977-5986 ◽  
Author(s):  
Greg Slodkowicz ◽  
Nick Goldman

Understanding the molecular basis of adaptation to the environment is a central question in evolutionary biology, yet linking detected signatures of positive selection to molecular mechanisms remains challenging. Here we demonstrate that combining sequence-based phylogenetic methods with structural information assists in making such mechanistic interpretations on a genomic scale. Our integrative analysis shows that positively selected sites tend to colocalize on protein structures and that positively selected clusters are found in functionally important regions of proteins, indicating that positive selection can contravene the well-known principle of evolutionary conservation of functionally important regions. This unexpected finding, along with our discovery that positive selection acts on structural clusters, opens previously unexplored strategies for the development of better models of protein evolution. Remarkably, the proteins in which we detect the strongest evidence of clustering belong to just two functional groups: components of the immune response and metabolic enzymes. This gives a coherent picture of pathogens and xenobiotics as important drivers of adaptive evolution of mammals.
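A common way to ask whether selected sites "colocalize on protein structures" is a permutation test: compare the spatial compactness of the observed sites with that of randomly drawn residue sets of the same size. The sketch below is a generic version of that idea, not the statistic used in this study; mean pairwise C-alpha distance as the compactness measure and the function names are assumptions.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(coords, sites):
    """Average Euclidean distance between all pairs of the given residues."""
    pairs = list(combinations(sites, 2))
    return sum(math.dist(coords[i], coords[j]) for i, j in pairs) / len(pairs)


def clustering_p_value(coords, selected, n_perm=10000, seed=0):
    """One-sided permutation p-value for spatial clustering of `selected`.

    coords: per-residue (x, y, z) coordinates; selected: residue indices.
    Small values mean the selected residues are more compact than random sets.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(coords, selected)
    k = len(selected)
    hits = sum(
        1 for _ in range(n_perm)
        if mean_pairwise_distance(coords, rng.sample(range(len(coords)), k)) <= observed
    )
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

Drawing random sets uniformly from all residues ignores factors such as surface exposure; a more careful test would restrict the null sets to comparable residues.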

