QAcon: single model quality assessment using protein structural and contact information with machine learning techniques

Nowadays an ever-increasing number of applications require complete and up-to-date spatial data, in particular maps. However, mapping is an expensive process and the vastness and dynamics of our world usually render centralized and authoritative maps outdated and incomplete. In this context crowd-sourced maps have the potential to provide a complete, up-to-date, and free representation of our world. However, the proliferation of such maps largely remains limited due to concerns about their data quality. While most of the current data quality assessment mechanisms for such maps require referencing to authoritative maps, we argue that such referencing of a crowd-sourced spatial database is ineffective. Instead we focus on the use of machine learning techniques that we believe have the potential to not only allow the assessment but also to recommend the improvement of the quality of crowd-sourced maps without referencing to external databases. This chapter gives an overview of these approaches.


Protein model quality assessment using 3D oriented convolutional neural networks

Bioinformatics ◽

10.1093/bioinformatics/btz122 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3313-3319 ◽

Cited By ~ 14

Author(s):

Guillaume Pagès ◽

Benoit Charmettant ◽

Sergei Grudinin

Keyword(s):

Neural Networks ◽

Quality Assessment ◽

Convolutional Neural Networks ◽

Single Model ◽

Model Quality ◽

Model Quality Assessment ◽

Density Maps ◽

Protein Model ◽

Protein Model Quality Assessment ◽

3D CNN

Abstract
Motivation: Protein model quality assessment (QA) is a crucial and still open problem in structural bioinformatics. The current best methods for single-model QA typically combine results from different approaches, each based on different input features constructed by experts in the field. The prediction model is then trained using a machine-learning algorithm. Recently, with the development of convolutional neural networks (CNNs), the training paradigm has changed. In computer vision, expert-developed features have been significantly outperformed by automatically trained convolutional filters. This motivated us to apply a three-dimensional (3D) CNN to the problem of protein model QA.
Results: We developed Ornate (Oriented Routed Neural network with Automatic Typing), a novel method for single-model QA. Ornate is a residue-wise scoring function that takes 3D density maps as input. It predicts the local (residue-wise) and the global model quality through a deep 3D CNN. Specifically, Ornate aligns the input density map, corresponding to each residue and its neighborhood, with the backbone topology of this residue. This circumvents the problem of ambiguous orientations of the initial models. Ornate also includes automatic identification of atom types and dynamic routing of the data in the network. Established benchmarks (CASP 11 and CASP 12) demonstrate the state-of-the-art performance of our approach among single-model QA methods.
Availability and implementation: The method is available at https://team.inria.fr/nano-d/software/Ornate/. It consists of a C++ executable that transforms molecular structures into volumetric density maps, and Python code based on the TensorFlow framework for applying the Ornate model to these maps.
Supplementary information: Supplementary data are available at Bioinformatics online.
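The orientation step in this abstract, aligning each residue's neighborhood with its backbone, can be illustrated with a minimal sketch. It is not the Ornate implementation: the frame construction (Gram-Schmidt on the CA-C and CA-N directions), the function names, and the box parameters `half_width` and `voxel` are all assumptions chosen for illustration.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def local_frame(n_atom, ca_atom, c_atom):
    """Orthonormal frame built from backbone N, CA, C (assumed construction)."""
    e1 = normalize(sub(c_atom, ca_atom))
    v = sub(n_atom, ca_atom)
    proj = dot(v, e1)
    # Remove the component of CA->N along e1, then normalize (Gram-Schmidt).
    e2 = normalize(tuple(vi - proj * ei for vi, ei in zip(v, e1)))
    e3 = cross(e1, e2)
    return e1, e2, e3

def to_local(point, ca_atom, frame):
    """Coordinates of a point in the residue's frame, centered on CA."""
    d = sub(point, ca_atom)
    return tuple(dot(d, e) for e in frame)

def voxel_index(local_point, half_width=12.0, voxel=1.0):
    """Grid cell for a local point, or None if it falls outside the box.

    half_width and voxel are hypothetical values, not Ornate's settings.
    """
    size = int(2 * half_width / voxel)
    idx = tuple(int(math.floor((x + half_width) / voxel)) for x in local_point)
    return idx if all(0 <= i < size for i in idx) else None
```

Because the frame is derived from the residue's own backbone atoms, rotating or translating the whole model leaves the local coordinates of every neighboring atom unchanged, which is the property the abstract relies on to remove orientation ambiguity.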


SVMQA: support–vector-machine-based protein single-model quality assessment

Bioinformatics ◽

10.1093/bioinformatics/btx222 ◽

2017 ◽

Vol 33 (16) ◽

pp. 2496-2503 ◽

Cited By ~ 102

Author(s):

Balachandran Manavalan ◽

Jooyoung Lee

Keyword(s):

Support Vector Machine ◽

Quality Assessment ◽

Support Vector ◽

Single Model ◽

Model Quality ◽

Model Quality Assessment


P3CMQA: Single-Model Quality Assessment Using 3DCNN with Profile-Based Features

Bioengineering ◽

10.3390/bioengineering8030040 ◽

2021 ◽

Vol 8 (3) ◽

pp. 40

Author(s):

Yuma Takei ◽

Takashi Ishida

Keyword(s):

Quality Assessment ◽

Structure Prediction ◽

Tertiary Structure ◽

Protein Structures ◽

Three Dimensional ◽

Sequence Profile ◽

Single Model ◽

Model Quality ◽

Model Quality Assessment ◽

Assessment Performance

Model quality assessment (MQA), which selects near-native structures from structure models, is an important process in protein tertiary structure prediction. A three-dimensional convolutional neural network (3DCNN) was previously applied to the task, but its performance was only comparable to existing methods because it used only atom-type features as input. We therefore added sequence profile-based features, which are also used in other methods, to improve the performance. We developed P3CMQA, a single-model MQA method for protein structures based on a 3DCNN using sequence profile-based features. Performance evaluation using a CASP13 dataset showed that profile-based features improved the assessment performance, and the proposed method was better than currently available single-model MQA methods, including the previous 3DCNN-based method. We also implemented a web interface for the method to make it more user-friendly.
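The idea of extending atom-type input channels with profile-based features can be sketched as follows. This is an illustration only, not the P3CMQA code: the reduced atom typing in `ATOM_TYPES`, the function names, and the choice to sum per-atom features within each voxel are assumptions.

```python
ATOM_TYPES = ["C", "N", "O", "S"]  # hypothetical reduced atom typing

def atom_features(atom_type, residue_profile):
    """One-hot atom type followed by the profile row of the atom's residue."""
    one_hot = [1.0 if atom_type == t else 0.0 for t in ATOM_TYPES]
    return one_hot + list(residue_profile)

def voxel_channels(atoms, profiles):
    """Accumulate per-atom features into a sparse voxel grid.

    atoms: list of (voxel_index, atom_type, residue_index) tuples.
    profiles: one profile row (list of floats) per residue.
    Returns a dict mapping voxel_index to a channel vector.
    """
    grid = {}
    for idx, atom_type, res_i in atoms:
        feats = atom_features(atom_type, profiles[res_i])
        acc = grid.setdefault(idx, [0.0] * len(feats))
        for k, f in enumerate(feats):
            acc[k] += f  # assumed aggregation: sum over atoms in the voxel
    return grid
```

With this layout the first `len(ATOM_TYPES)` channels carry the atom-type signal alone, so dropping the remaining channels gives an atom-type-only input for comparison.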


The Power of Noise and the Art of Prediction

10.31235/osf.io/zu64w ◽

2017 ◽

Cited By ~ 1

Author(s):

ZhiMin Xiao ◽

Steve Higgins

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Multiple Models ◽

Generation Process ◽

Data Generation ◽

Single Model ◽

Intervention Effect ◽

Specific Data ◽

Learning Techniques

Data analysis usually aims to identify a particular signal, such as an intervention effect. Conventional analyses often assume a specific data generation process, which suggests a theoretical model that best fits the data. Machine learning techniques do not make such an assumption. In fact, they encourage multiple models to compete on the same data. Applying logistic regression and machine learning algorithms to real and simulated datasets with different features of noise and signal, we demonstrate that no single model dominates others under all circumstances. By showing when different models shine or struggle, we argue it is both possible and important to conduct comparative analyses.
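A toy version of letting several models compete on the same data is sketched below. It does not reproduce the authors' analysis: a one-dimensional threshold rule stands in for logistic regression, a 1-nearest-neighbour rule stands in for a flexible machine learning algorithm, and the two small datasets are invented for illustration.

```python
def fit_threshold(train):
    """Stand-in for a linear model: cut halfway between the class means."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    m0, m1 = sum(xs0) / len(xs0), sum(xs1) / len(xs1)
    cut = (m0 + m1) / 2
    sign = 1 if m1 >= m0 else -1
    return lambda x: 1 if sign * (x - cut) > 0 else 0

def fit_nearest(train):
    """Stand-in for a flexible learner: copy the label of the closest point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(model, test):
    return sum(model(x) == y for x, y in test) / len(test)

def compete(fitters, train, test):
    """Fit every model on the same training data and score it on the same test data."""
    return {name: accuracy(fit(train), test) for name, fit in fitters.items()}

FITTERS = {"threshold": fit_threshold, "nearest": fit_nearest}

# Invented data: a monotone signal with one mislabeled training point.
noisy_train = [(-2, 0), (-1, 0), (-0.5, 1), (1, 1), (2, 1)]
noisy_test = [(-0.6, 0), (1.5, 1), (-1.8, 0)]

# Invented data: a non-monotone signal (class 1 at both extremes).
curved_train = [(-3, 1), (-1, 0), (1, 0), (3, 1)]
curved_test = [(-2.6, 1), (0.2, 0), (2.6, 1)]

noisy_scores = compete(FITTERS, noisy_train, noisy_test)
curved_scores = compete(FITTERS, curved_train, curved_test)
```

On these two invented datasets the ranking of the models flips: the rigid rule is more robust to the mislabeled point, while the flexible rule captures the non-monotone signal.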


A single-model quality assessment method for poor quality protein structure

10.21203/rs.3.rs-17080/v1 ◽

2020 ◽

Author(s):

Jianquan Ouyang ◽

Ningqiao Huang ◽

Yunqi Jiang

Keyword(s):

Protein Structure ◽

Quality Assessment ◽

Structure Prediction ◽

Assessment Method ◽

Poor Quality ◽

Single Model ◽

Model Quality ◽

Model Quality Assessment ◽

Quality Assessment Method

Abstract: Quality assessment of protein tertiary structure prediction models, in which structures of the best quality are selected from decoys, is a major challenge in protein structure prediction, and is crucial to determine a model’s utility and potential applications. Single-model quality assessment estimates a model’s quality from that model alone. In general, the Pearson correlation achieved by a quality assessment method increases in tandem with the quality of the model pool. However, there is no consensus regarding the best method to select a few good models from a poor quality model pool. In this work, we introduce a novel single-model quality assessment method for poor quality models that uses simple linear combinations of six features. We perform weighted search and linear regression on a large dataset of models from the 12th Critical Assessment of Protein Structure Prediction (CASP12) and benchmark the results on CASP13 models. We demonstrate that our method achieves outstanding performance on poor quality models.
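Scoring a model as a linear combination of features and judging the result by Pearson correlation can be sketched as below. The abstract does not specify the six features or the search procedure, so the exhaustive grid over a few weight values, the function names, and the toy numbers are assumptions.

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def combine(feature_rows, weights):
    """Predicted quality of each model: weighted sum of its features."""
    return [sum(w * f for w, f in zip(weights, row)) for row in feature_rows]

def search_weights(feature_rows, true_scores, grid=(0.0, 0.5, 1.0)):
    """Try every weight vector on a coarse grid (assumed search strategy)."""
    best_w, best_r = None, -2.0
    for w in itertools.product(grid, repeat=len(feature_rows[0])):
        pred = combine(feature_rows, w)
        if max(pred) == min(pred):
            continue  # constant predictions: correlation is undefined
        r = pearson(pred, true_scores)
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r

# Toy example: four models, six hypothetical feature values each.
rows = [
    [0.9, 0.1, 0.5, 0.3, 0.7, 0.2],
    [0.4, 0.8, 0.6, 0.1, 0.2, 0.9],
    [0.2, 0.3, 0.1, 0.9, 0.5, 0.4],
    [0.7, 0.6, 0.8, 0.5, 0.9, 0.1],
]
scores = [0.8, 0.3, 0.2, 0.7]  # invented reference quality scores
weights, r = search_weights(rows, scores)
```

With six features and three grid values the search evaluates 729 weight vectors, which is cheap; a finer grid or a least-squares fit would be the natural next step on real data.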


Protein model quality assessment using 3D oriented convolutional neural networks

10.1101/432146 ◽

2018 ◽

Cited By ~ 1

Author(s):

Guillaume Pagès ◽

Benoit Charmettant ◽

Sergei Grudinin

Keyword(s):

Neural Networks ◽

Quality Assessment ◽

Convolutional Neural Networks ◽

Single Model ◽

Model Quality ◽

Model Quality Assessment ◽

Density Maps ◽

Protein Model ◽

Protein Model Quality Assessment ◽

3D CNN

Protein model quality assessment (QA) is a crucial and still open problem in structural bioinformatics. The current best methods for single-model QA typically combine results from different approaches, each based on different input features constructed by experts in the field. The prediction model is then trained using a machine-learning algorithm. Recently, with the development of convolutional neural networks (CNNs), the training paradigm has changed. In computer vision, expert-developed features have been significantly outperformed by automatically trained convolutional filters. This motivated us to apply a three-dimensional (3D) CNN to the problem of protein model QA. We developed a novel method for single-model QA called Ornate. Ornate (Oriented Routed Neural network with Automatic Typing) is a residue-wise scoring function that takes 3D density maps as input. It predicts the local (residue-wise) and the global model quality through a deep 3D CNN. Specifically, Ornate aligns the input density map, corresponding to each residue and its neighborhood, with the backbone topology of this residue. This circumvents the problem of ambiguous orientations of the initial models. Ornate also includes automatic identification of atom types and dynamic routing of the data in the network. Established benchmarks (CASP 11 and CASP 12) demonstrate the state-of-the-art performance of our approach among single-model QA methods. The method is available at https://team.inria.fr/nano-d/software/Ornate/. It consists of a C++ executable that transforms molecular structures into volumetric density maps, and Python code based on the TensorFlow framework for applying the Ornate model to these maps.
