Protein complex prediction with AlphaFold-Multimer

Protein complex similarity based on Weisfeiler-Lehman labeling

10.7287/peerj.preprints.26612 ◽

2018 ◽

Author(s):

Bianca K Stöcker ◽

Till Schäfer ◽

Petra Mutzel ◽

Johannes Köster ◽

Nils Kriege ◽

...

Keyword(s):

Protein Complex ◽

Large Scale ◽

Protein Complexes ◽

Similarity Measures ◽

Interaction Networks ◽

Protein Protein Interaction ◽

Protein Complex Prediction ◽

Protein Protein Interaction Networks ◽

Human Integrin ◽

Good Agreement

Being able to quantify the similarity between two protein complexes is essential for numerous applications. Prominent examples are database searches for known complexes similar to a given query complex, comparison of the outputs of different protein complex prediction algorithms, and summarizing and clustering protein complexes, e.g., for visualization. While the corresponding problems have received much attention for single proteins and protein families, the question of how to model and compute similarity between protein complexes has not yet been systematically studied. Because protein complexes can be naturally modeled as graphs, general graph similarity measures may in principle be used, but these are often computationally hard to obtain and do not take typical properties of protein complexes into account. Here we propose a parametric family of similarity measures based on Weisfeiler-Lehman labeling. We evaluate it on simulated complexes of the extended human integrin adhesome network. Because the connectivity (graph topology) of real complexes is often unknown and hard to obtain experimentally, we use both known protein-protein interaction networks and known interdependencies (constraints) between interactions to simulate more realistic complexes than interaction networks alone would yield. We empirically show that the defined family of similarity measures is in good agreement with edit similarity, a similarity measure derived from graph edit distance, but can be computed much more efficiently. It can therefore be used in large-scale studies and simulations and serve as a basis for further refinements of modeling protein complex similarity.
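
The abstract does not give the exact form of the similarity family, so the following is only a minimal sketch of the general idea under stated assumptions: proteins are labeled by name, Weisfeiler-Lehman (WL) labels are built by concatenating a vertex's label with the sorted labels of its neighbors, and similarity is a weighted sum over WL rounds of the normalized overlap of the two label multisets. The function names and the per-round `weights` (a hypothetical stand-in for the paper's parameters) are illustrative, not the authors' implementation.

```python
from collections import Counter

def wl_histograms(adj, labels, iterations):
    """Label histograms after 0..iterations rounds of Weisfeiler-Lehman relabeling."""
    current = dict(labels)
    histograms = [Counter(current.values())]
    for _ in range(iterations):
        # New label = old label + sorted multiset of neighbor labels. Keeping the full
        # string (instead of a compressed integer id) keeps labels comparable across graphs.
        current = {
            v: current[v] + "(" + ",".join(sorted(current[u] for u in adj[v])) + ")"
            for v in adj
        }
        histograms.append(Counter(current.values()))
    return histograms

def wl_similarity(g1, g2, iterations=2, weights=None):
    """Weighted sum over WL rounds of the normalized label-multiset overlap (in [0, 1])."""
    h1 = wl_histograms(*g1, iterations)
    h2 = wl_histograms(*g2, iterations)
    if weights is None:
        weights = [1.0 / (iterations + 1)] * (iterations + 1)  # hypothetical default
    score = 0.0
    for w, a, b in zip(weights, h1, h2):
        overlap = sum((a & b).values())               # Counter '&' keeps minimum counts
        size = max(sum(a.values()), sum(b.values()))  # assumes non-empty complexes
        score += w * overlap / size
    return score

# Two toy complexes over the same proteins: a triangle and a path A-B-C.
labels = {"A": "A", "B": "B", "C": "C"}
triangle = ({"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}, labels)
path = ({"A": ["B"], "B": ["A", "C"], "C": ["B"]}, labels)
print(wl_similarity(triangle, path))      # about 0.444 (= 4/9)
print(wl_similarity(triangle, triangle))  # about 1.0
```

Round 0 only compares protein composition, so the two toy complexes match fully there; later rounds encode increasingly large neighborhoods and separate the two topologies. Each round costs roughly linear time in the number of interactions, which is the kind of efficiency advantage over graph edit distance the abstract refers to.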

A Grammar-Based Structural CNN Decoder for Code Generation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33017055 ◽

2019 ◽

Vol 33 ◽

pp. 7055-7062 ◽

Cited By ~ 3

Author(s):

Zeyu Sun ◽

Qihao Zhu ◽

Lili Mou ◽

Yingfei Xiong ◽

Ge Li ◽

...

Keyword(s):

Neural Network ◽

Programming Language ◽

Code Generation ◽

State Of The Art ◽

Semantic Parsing ◽

Code Generator ◽

Percentage Points ◽

Grammar Rules ◽

Previous State ◽

Program Description

Code generation maps a program description to executable source code in a programming language. Existing approaches mainly rely on a recurrent neural network (RNN) as the decoder. However, we find that a program contains significantly more tokens than a natural language sentence, so an RNN may be ill-suited to capturing such long sequences. In this paper, we propose a grammar-based structural convolutional neural network (CNN) for code generation. Our model generates a program by predicting the grammar rules of the programming language; we design several CNN modules, including tree-based convolution and pre-order convolution, whose information is further aggregated by dedicated attentive pooling layers. Experimental results on the HearthStone benchmark dataset show that our CNN code generator significantly outperforms the previous state-of-the-art method by 5 percentage points; additional experiments on several semantic parsing tasks demonstrate the robustness of our model. We also conduct an in-depth ablation test to better understand each component of our model.
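
To make "generating a program by predicting grammar rules" concrete, the sketch below uses Python's standard `ast` module to flatten a snippet into the pre-order sequence of node expansions that a grammar-based decoder would emit one step at a time. This is a simplification under loud assumptions: AST node type names stand in for grammar symbols, field names and literal values are dropped, and nothing here models the paper's CNN modules or attentive pooling. The function name `preorder_expansions` is hypothetical.

```python
import ast

def preorder_expansions(source):
    """Pre-order list of 'Node -> child child ...' expansions of a Python AST.

    Simplification: node type names stand in for grammar symbols; field names,
    identifiers, and literal values are ignored.
    """
    expansions = []

    def visit(node):
        children = list(ast.iter_child_nodes(node))
        rhs = " ".join(type(c).__name__ for c in children) or "<leaf>"
        expansions.append(f"{type(node).__name__} -> {rhs}")
        for child in children:  # depth-first, left to right = pre-order
            visit(child)

    visit(ast.parse(source))
    return expansions

for step in preorder_expansions("x = 1"):
    print(step)
```

On Python 3.8 or later, `x = 1` yields five steps: `Module -> Assign`, `Assign -> Name Constant`, `Name -> Store`, `Store -> <leaf>`, `Constant -> <leaf>`. Predicting such a rule sequence, rather than raw tokens, guarantees that every decoded program is syntactically well formed.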

Identifying protein complexes based on an edge weight algorithm and core-attachment structure

BMC Bioinformatics ◽

10.1186/s12859-019-3007-y ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 2

Author(s):

Rongquan Wang ◽

Guixia Liu ◽

Caixia Wang

Keyword(s):

Protein Complex ◽

State Of The Art ◽

Protein Complexes ◽

Statistical Significance ◽

Academic Research ◽

Edge Weight ◽

Weighting Method ◽

Ppi Networks ◽

Art Methods ◽

Attachment Proteins

Background: Protein complex identification from protein-protein interaction (PPI) networks is crucial for understanding cellular organization principles and functional mechanisms. In recent decades, numerous computational methods have been proposed to identify protein complexes. However, most current state-of-the-art methods still face several challenges, including high false-positive rates, an inability to identify overlapping complexes, a lack of consideration for the inherent organization within protein complexes, and the absence of some biological attachment proteins. Results: In this paper, to overcome these limitations, we present a protein complex identification method based on an edge weight method and the core-attachment structure (EWCA), in which each complex consists of a core and some sparse attachment proteins. First, we propose a new weighting method to assess the reliability of interactions. Second, we identify protein complex cores by using the structural similarity between a seed and its direct neighbors. Third, we introduce a new method to detect attachment proteins that can distinguish and identify peripheral proteins and overlapping proteins. Finally, we bind attachment proteins to their corresponding complex cores to form protein complexes and discard redundant protein complexes. The experimental results indicate that EWCA outperforms existing state-of-the-art methods in terms of both accuracy and p-value. Furthermore, EWCA identifies many more protein complexes with statistical significance. Additionally, EWCA achieves a better balance between accuracy and efficiency than some state-of-the-art methods with high accuracy. Conclusions: In summary, a comprehensive comparison with twelve algorithms across different evaluation metrics shows that EWCA performs better at protein complex identification. The datasets and software are freely available for academic research at https://github.com/RongquanWang/EWCA.
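
The abstract does not state EWCA's weighting formula, so the following is only a hypothetical sketch of the first two steps (weight the interactions, then grow a core from a seed using structural similarity with its direct neighbors). It assumes, purely for illustration, that the edge weight is the Jaccard index of the two proteins' closed neighborhoods and that a neighbor joins the core when its weight reaches a fixed `threshold`. The function names are hypothetical; the real method is in the linked repository.

```python
def closed_neighborhood(adj, v):
    """Protein v together with all of its direct interaction partners."""
    return set(adj[v]) | {v}

def edge_weight(adj, u, v):
    """Hypothetical reliability weight: Jaccard index of closed neighborhoods."""
    nu, nv = closed_neighborhood(adj, u), closed_neighborhood(adj, v)
    return len(nu & nv) / len(nu | nv)

def seed_core(adj, seed, threshold):
    """Seed plus the direct neighbors whose weight to the seed reaches the threshold."""
    return {seed} | {u for u in adj[seed] if edge_weight(adj, seed, u) >= threshold}

# Triangle A-B-C with a pendant protein D attached to A.
ppi = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A", "B"], "D": ["A"]}
print(edge_weight(ppi, "A", "B"))                  # 0.75
print(edge_weight(ppi, "A", "D"))                  # 0.5
print(sorted(seed_core(ppi, "A", threshold=0.6)))  # ['A', 'B', 'C']
```

Edges inside the densely connected triangle score higher than the edge to the pendant protein D, so D is left out of the core; in a core-attachment scheme it would instead be a candidate attachment protein.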

USING INDIRECT PROTEIN–PROTEIN INTERACTIONS FOR PROTEIN COMPLEX PREDICTION

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720008003497 ◽

2008 ◽

Vol 06 (03) ◽

pp. 435-466 ◽

Cited By ~ 97

Author(s):

HON NIAN CHUA ◽

KANG NING ◽

WING-KIN SUNG ◽

HON WAI LEONG ◽

LIMSOON WONG

Keyword(s):

Protein Interactions ◽

Protein Complex ◽

Protein Complexes ◽

Clustering Algorithms ◽

Indirect Interactions ◽

Protein Protein Interactions ◽

Protein Complex Prediction ◽

Ppi Networks ◽

Level 2 ◽

Novel Protein

Protein complexes are fundamental for understanding principles of cellular organization. As the sizes of protein–protein interaction (PPI) networks increase, accurate and fast protein complex prediction from these PPI networks can serve as a guide for biological experiments to discover novel protein complexes. However, it is not easy to predict protein complexes from PPI networks, especially when the PPI network is noisy and still incomplete. Here, we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. We know from previous work that proteins that do not interact but share interaction partners (level-2 neighbors) often share biological functions. We have proposed a method in which all direct and indirect interactions are first weighted using a topological weight (FS-Weight), which estimates the strength of functional association. Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into the interaction network. Existing clustering algorithms can then be applied to this modified network. We have also proposed a novel algorithm that searches for cliques in the modified network and merges cliques to form clusters using a "partial clique merging" method. Experiments show that (1) augmenting protein–protein interactions with indirect interactions and topological weights can improve the precision of clusters predicted by various existing clustering algorithms; and (2) our complex-finding algorithm performs very well on interaction networks modified in this way. Since no information other than the original PPI network is used, our approach should be useful for protein complex prediction, especially for predicting novel protein complexes.
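
The network-modification step can be sketched as follows. FS-Weight itself is not reproduced; as a loudly labeled simplification, each pair of proteins is scored by the fraction of their combined partners that they share, low-scoring direct interactions are dropped, and high-scoring level-2 pairs are added as new edges. The thresholds and function names are hypothetical.

```python
from itertools import combinations

def shared_partner_score(adj, u, v):
    """Simplified stand-in for FS-Weight: shared partners / all partners of u and v."""
    nu, nv = set(adj[u]), set(adj[v])
    others = (nu | nv) - {u, v}
    return len(nu & nv) / len(others) if others else 0.0

def modify_network(adj, add_threshold, keep_threshold):
    """Keep well-supported direct edges and add strongly supported level-2 edges."""
    edges = set()
    for u, v in combinations(sorted(adj), 2):
        score = shared_partner_score(adj, u, v)
        if v in adj[u]:
            if score >= keep_threshold:  # low-weight direct interactions are removed
                edges.add((u, v))
        elif score >= add_threshold:     # with add_threshold > 0, only level-2 neighbors qualify
            edges.add((u, v))
    return edges

# Two triangles sharing edge B-C, plus a pendant E on A; A and D never interact directly.
ppi = {"A": ["B", "C", "E"], "B": ["A", "C", "D"], "C": ["A", "B", "D"],
       "D": ["B", "C"], "E": ["A"]}
print(sorted(modify_network(ppi, add_threshold=0.5, keep_threshold=0.3)))
```

Here the level-2 pair A-D (score 2/3) is introduced and the weakly supported A-E interaction (score 0) is removed, so A, B, C and D become a 4-clique that a clique-searching step could then report as a candidate complex.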

Un-complicating protein complex prediction.

10.1101/017376 ◽

2015 ◽

Author(s):

Konstantinos Koutroumpas ◽

François Képès

Keyword(s):

Protein Complex ◽

Large Scale ◽

Protein Complexes ◽

Parametric Method ◽

Training Data ◽

Parametric Methods ◽

Protein Protein Interaction ◽

Protein Complex Prediction ◽

Parameter Values ◽

Non Parametric

Identification of protein complexes from proteomic experiments is crucial to understand not only their function but also the principles of cellular organization. Advances in experimental techniques have enabled the construction of large-scale protein-protein interaction networks, and computational methods have been developed to analyze high-throughput data. In most cases, several parameters are introduced that must be trained before application. But how do we select parameter values when no training data are available? How much data is needed to properly train a method? How is the performance of a method affected when the parameter values are selected incorrectly? These questions, although important for determining the applicability of a method, are usually overlooked. We highlight the importance of such an analysis by investigating how limited knowledge, in the form of incomplete training data, affects the performance of parametric protein-complex prediction algorithms. Furthermore, we develop a simple non-parametric method that does not rely on the existence of training data, and we compare it with the parametric alternatives. Using datasets from yeast and fly, we demonstrate that parametric methods trained with limited data provide sub-optimal predictions, while our non-parametric method performs better than or on par with the parametric alternatives. Overall, our analysis questions, at least for this specific problem, whether parametric methods provide results sufficiently better than non-parametric ones to justify the additional effort of applying them.

TreeGen: A Tree-Based Transformer Architecture for Code Generation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6430 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8984-8991

Author(s):

Zeyu Sun ◽

Qihao Zhu ◽

Yingfei Xiong ◽

Yican Sun ◽

Lili Mou ◽

...

Keyword(s):

Code Generation ◽

State Of The Art ◽

Structural Information ◽

Semantic Parsing ◽

Generation System ◽

Neural Architecture ◽

Percentage Points ◽

Code Generators ◽

Grammar Rules ◽

Previous State

A code generation system generates programming language code based on an input natural language description. State-of-the-art approaches rely on neural networks for code generation. However, these code generators suffer from two problems. One is the long-dependency problem, where a code element often depends on another, far-away code element. A variable reference, for example, depends on its definition, which may appear many lines earlier. The other problem is structure modeling, as programs contain rich structural information. In this paper, we propose a novel tree-based neural architecture, TreeGen, for code generation. TreeGen uses the attention mechanism of Transformers to alleviate the long-dependency problem, and introduces a novel AST reader (encoder) to incorporate grammar rules and AST structures into the network. We evaluated TreeGen on a Python benchmark, HearthStone, and two semantic parsing benchmarks, ATIS and GEO. TreeGen outperformed the previous state-of-the-art approach by 4.5 percentage points on HearthStone, and achieved the best accuracy among neural network-based approaches on ATIS (89.1%) and GEO (89.6%). We also conducted an ablation test to better understand each component of our model.
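
Why attention helps with long dependencies can be shown with the standard Transformer scaled dot-product attention, $\mathrm{softmax}(QK^\top/\sqrt{d})\,V$. The sketch below implements only that generic operation on plain Python lists; TreeGen's AST reader, tree convolution, and decoder are not reproduced, and the toy vectors are made up for illustration.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, on plain lists."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Position 0 plays the "definition"; a query far down the sequence matches it directly.
keys = [[4.0, 0.0]] + [[0.0, 1.0]] * 9
values = [[1.0, 0.0]] + [[0.0, 1.0]] * 9
out = attention([[4.0, 0.0]], keys, values)
print(out)  # first component close to 1: almost all weight goes to position 0
```

Unlike an RNN, which must carry information about position 0 through every intermediate step, the query here reaches the matching key in a single weighted sum, regardless of how many positions lie in between.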

Multi-Resolution Autoregressive Graph-to-Graph Translation for Molecules

10.26434/chemrxiv.8266745.v1 ◽

2019 ◽

Author(s):

Wengong Jin ◽

Regina Barzilay ◽

Tommi S Jaakkola

Keyword(s):

Drug Discovery ◽

State Of The Art ◽

Molecular Graph ◽

Biochemical Properties ◽

Large Margin ◽

Previous State ◽

Translation Methods ◽

Atom Level ◽

Precursor Molecules ◽

Prior State

Accelerating drug discovery relies heavily on automatic tools that optimize precursor molecules to endow them with better biochemical properties. Our work in this paper substantially extends the prior state of the art in graph-to-graph translation methods for molecular optimization. In particular, we realize coherent multi-resolution representations by interweaving trees over substructures with the atom-level encoding of the original molecular graph. Moreover, our graph decoder is fully autoregressive and interleaves each step of adding a new substructure with the process of resolving its connectivity to the emerging molecule. We evaluate our model on multiple molecular optimization tasks and show that it outperforms previous state-of-the-art baselines by a large margin.

The Degree of Oxidation of Graphene Oxide

Nanomaterials ◽

10.3390/nano11030560 ◽

2021 ◽

Vol 11 (3) ◽

pp. 560

Author(s):

Alexandra Carvalho ◽

Mariana C. F. Costa ◽

Valeria S. Marangoni ◽

Pei Rou Ng ◽

Thi Le Hang Nguyen ◽

...

Keyword(s):

Graphene Oxide ◽

Ab Initio ◽

State Of The Art ◽

High Accuracy ◽

Precise Determination ◽

Photoemission Spectroscopy ◽

Pristine Graphene ◽

X Ray ◽

Degree Of Oxidation

We show that the degree of oxidation of graphene oxide (GO) can be obtained by combining state-of-the-art ab initio computational modeling with X-ray photoemission spectroscopy (XPS). The shift of the XPS C1s peak relative to pristine graphene, $\Delta E_{\mathrm{C1s}}$, can be described with high accuracy by $\Delta E_{\mathrm{C1s}} = A\,(c_{\mathrm{O}} - c_l)^2 + E_0$, where $c_{\mathrm{O}}$ is the oxygen concentration, $A = 52.3$ eV, $c_l = 0.122$, and $E_0 = 1.22$ eV. Our results demonstrate a precise determination of the oxygen content of GO samples.
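
The fitted relation can be evaluated and inverted directly, which is how a measured C1s shift would be turned into an oxygen concentration. A minimal sketch using the constants from the abstract, with two assumptions labeled here and in the comments: $c_{\mathrm{O}}$ is treated as a dimensionless fraction (the abstract does not state its unit), and the inversion takes the root with $c_{\mathrm{O}} \ge c_l$, i.e. $c_{\mathrm{O}} = c_l + \sqrt{(\Delta E_{\mathrm{C1s}} - E_0)/A}$. The fit is presumably valid only over the composition range the authors modeled, which the abstract does not give.

```python
import math

A_EV = 52.3   # fitted prefactor A, in eV (value from the abstract)
C_L = 0.122   # fitted concentration offset c_l (value from the abstract)
E0_EV = 1.22  # fitted offset E_0, in eV (value from the abstract)

def c1s_shift(c_o):
    """Predicted C1s shift relative to pristine graphene (eV); c_o assumed a fraction."""
    return A_EV * (c_o - C_L) ** 2 + E0_EV

def oxygen_concentration(shift_ev):
    """Invert the fit, assuming the branch c_o >= c_l (an assumption, not from the source)."""
    if shift_ev < E0_EV:
        raise ValueError("shift below E_0 lies outside the fitted parabola")
    return C_L + math.sqrt((shift_ev - E0_EV) / A_EV)

shift = c1s_shift(0.3)
print(round(shift, 4))                        # 2.8771
print(round(oxygen_concentration(shift), 4))  # 0.3
```

For $c_{\mathrm{O}} = 0.3$ the worked arithmetic is $52.3 \times 0.178^2 + 1.22 = 52.3 \times 0.031684 + 1.22 \approx 2.877$ eV, and feeding that shift back through the inverse recovers 0.3.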

Using spatial-temporal ensembles of convolutional neural networks for lumen segmentation in ureteroscopy

International Journal of Computer Assisted Radiology and Surgery ◽

10.1007/s11548-021-02376-3 ◽

2021 ◽

Author(s):

Jorge F. Lazo ◽

Aldo Marzullo ◽

Sara Moccia ◽

Michele Catellani ◽

Benoit Rosa ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Automatic Segmentation ◽

Temporal Information ◽

Invasive Technique ◽

Dice Similarity Coefficient ◽

Specular Reflections ◽

Lumen Segmentation ◽

Previous State

Purpose: Ureteroscopy is an efficient, minimally invasive endoscopic technique for the diagnosis and treatment of upper tract urothelial carcinoma. During ureteroscopy, automatic segmentation of the hollow lumen is of primary importance, since it indicates the path that the endoscope should follow. To obtain an accurate segmentation of the hollow lumen, this paper presents an automatic method based on convolutional neural networks (CNNs). Methods: The proposed method is based on an ensemble of 4 parallel CNNs that simultaneously process single- and multi-frame information. Of these, two architectures are taken as core models, namely a U-Net based on residual blocks ($m_1$) and Mask-RCNN ($m_2$), which are fed with single still frames $I(t)$. The other two models ($M_1$, $M_2$) are modifications of the former ones, consisting of the addition of a stage that uses 3D convolutions to process temporal information. $M_1$ and $M_2$ are fed with triplets of frames ($I(t-1)$, $I(t)$, $I(t+1)$) to produce the segmentation for $I(t)$. Results: The proposed method was evaluated using a custom dataset of 11 videos (2673 frames) collected and manually annotated from 6 patients. We obtain a Dice similarity coefficient of 0.80, outperforming previous state-of-the-art methods. Conclusion: The results show that spatial-temporal information can be effectively exploited by the ensemble model to improve hollow lumen segmentation in ureteroscopic images. The method is also effective in the presence of poor visibility, occasional bleeding, or specular reflections.
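
Three pieces of the pipeline described above can be sketched in plain Python: the Dice similarity coefficient used for evaluation, $2|P \cap T| / (|P| + |T|)$, the construction of $(I(t-1), I(t), I(t+1))$ triplets for the temporal models, and a fusion of the four models' outputs. The abstract does not say how the ensemble combines its members or how the first and last frames are handled, so the probability averaging and the skipped endpoints below are hypothetical choices, and all function names are illustrative.

```python
def dice(pred, truth):
    """Dice similarity coefficient 2|P ∩ T| / (|P| + |T|) for binary masks (nested lists)."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(1 for a in p if a) + sum(1 for b in t if b)
    return 1.0 if total == 0 else 2.0 * intersection / total

def frame_triplets(frames):
    """(I(t-1), I(t), I(t+1)) windows; first and last frames skipped (an assumption)."""
    return [(frames[t - 1], frames[t], frames[t + 1]) for t in range(1, len(frames) - 1)]

def ensemble_mask(prob_maps, threshold=0.5):
    """Hypothetical fusion: average per-pixel probabilities from the models, then threshold."""
    n = len(prob_maps)
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    return [[int(sum(m[i][j] for m in prob_maps) / n >= threshold) for j in range(cols)]
            for i in range(rows)]

print(dice([[1, 1], [0, 0]], [[1, 0], [0, 0]]))  # 2*1 / (2+1), about 0.667
print(frame_triplets(["I0", "I1", "I2", "I3"]))  # windows centred on I1 and I2
```

Averaging probabilities before thresholding lets the single-frame and temporal models outvote each other pixel by pixel; majority voting on already-thresholded masks would be an equally plausible alternative.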
