ZoomQA: Residue-Level Single-Model QA Support Vector Machine Utilizing Sequential and 3D Structural Features

Abstract Motivation: Estimation of model accuracy is a cornerstone problem in bioinformatics. When a structure is predicted for a protein whose native structure is unknown, it is hard to tell how good the tertiary structure prediction is, especially in protein binding regions, which are useful for drug discovery. Currently, most methods evaluate only the overall quality of a protein decoy, and few work at the residue level or on protein complexes. Here we introduce ZoomQA, a novel single-model method for assessing the accuracy of a tertiary protein structure or complex prediction at the residue level. ZoomQA differs from other methods by considering how the chemical and physical features of a fragment structure (the portion of a protein within a radius r of the target amino acid) change as the radius of contact increases. Fourteen physical and chemical properties of amino acids are used to build a comprehensive representation of every residue within a protein and to grade its placement within the protein as a whole. Moreover, ZoomQA can evaluate the quality of protein complexes, which is unique. Results: We benchmark ZoomQA on CASP14, where it outperforms other state-of-the-art local QA methods and rivals state-of-the-art QA methods on global prediction metrics. Our experiments show the efficacy of these new features and show that our method matches the performance of other state-of-the-art methods without homology searches against databases or PSSMs. Availability: http://[email protected]
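The "zooming" idea in the abstract above, tracking how a fragment's properties change as the contact radius grows, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not list the fourteen properties, so Kyte-Doolittle hydropathy stands in as a single illustrative property, and the function name `zoom_profile`, the coordinates, and the radii are all hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values for a few amino acids. This is one
# illustrative property; the abstract does not name ZoomQA's fourteen.
HYDROPATHY = {"A": 1.8, "G": -0.4, "L": 3.8, "K": -3.9, "D": -3.5}


def zoom_profile(coords, sequence, target, radii):
    """Mean hydropathy of the residues within each radius of the target.

    The target residue is always at distance 0 from itself, so every
    fragment contains at least one residue for any radius >= 0.
    """
    center = coords[target]
    profile = []
    for r in radii:
        members = [
            HYDROPATHY[aa]
            for xyz, aa in zip(coords, sequence)
            if math.dist(center, xyz) <= r
        ]
        profile.append(sum(members) / len(members))
    return profile


# Hypothetical toy fragment: five residues spaced 4 angstroms apart on the x axis.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0),
          (12.0, 0.0, 0.0), (16.0, 0.0, 0.0)]
sequence = "LAGKD"

# One value per radius; the change across radii is the "zoom" signal.
profile = zoom_profile(coords, sequence, target=0, radii=[5.0, 10.0, 20.0])
print(profile)
```

In this toy case the mean hydropathy falls as the radius grows, because the hydrophobic target is surrounded by increasingly polar neighbors. A feature vector built this way for several properties could then be passed to a classifier or regressor.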

Protein Structure Prediction Using Robust Principal Component Analysis and Support Vector Machine

International Journal on Data Science ◽

10.18517/ijods.1.1.14-17.2020 ◽

2020 ◽

Vol 1 (1) ◽

pp. 14-17

Author(s):

Nur Aini Zakaria ◽

Zuraini Ali Shah ◽

Shahreen Kasim

Keyword(s):

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Secondary Structure Prediction ◽

Principal Component ◽

Training Dataset ◽

Support Vector ◽

Testing Dataset ◽

Prediction Function ◽

RBF Kernel

The purpose of bioinformatics is to deepen our understanding of biological processes. Protein structure is one of the major challenges in structural bioinformatics. With prior knowledge of the structure, the quality of secondary structure prediction, tertiary structure prediction, and prediction of function from the amino acid sequence increases significantly. Recently, the gap between the number of proteins with known sequences and those with known structures has increased dramatically. It is therefore essential to understand protein structure in order to close this gap and make further functional analysis easier. This research applies the RPCA algorithm to extract the essential features from the original high-dimensional input vectors, and then trains an SVM with an RBF kernel on them. The proposed method achieves an accuracy of 84.41% on the training dataset and 89.09% on the testing dataset. The result is then compared with the same method using PCA for feature extraction. The prediction is assessed by analyzing the accuracy and the number of principal components selected. The results show that the combination of RPCA and SVM produces high-quality classification of protein structure.
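For readers unfamiliar with the RBF kernel named in this abstract, the following minimal sketch computes the standard kernel value k(x, y) = exp(-gamma * ||x - y||^2) between two feature vectors. It does not reproduce the paper's pipeline: the vectors and the `gamma` value are hypothetical, and the RPCA feature-extraction step is not shown.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical low-dimensional feature vectors (e.g. after feature extraction).
u = [0.0, 0.0]
v = [1.0, 1.0]

same = rbf_kernel(u, u, gamma=0.5)   # identical vectors give similarity 1.0
apart = rbf_kernel(u, v, gamma=0.5)  # squared distance 2, so exp(-1)
print(same, apart)
```

The kernel value decays from 1 toward 0 as the two vectors move apart, and `gamma` controls how quickly it decays.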

State-of-the-art web services for de novo protein structure prediction

Briefings in Bioinformatics ◽

10.1093/bib/bbaa139 ◽

2020 ◽

Cited By ~ 1

Author(s):

Luciano A Abriata ◽

Matteo Dal Peraro

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

De Novo ◽

State Of The Art ◽

Data Bank ◽

End Users ◽

Model Quality ◽

Uncharacterized Protein

Abstract Residue coevolution estimations coupled to machine learning methods are revolutionizing the ability of protein structure prediction approaches to model proteins that lack clear homologous templates in the Protein Data Bank (PDB). This was evident in the last round of the Critical Assessment of Structure Prediction (CASP), which presented several very good models for the hardest targets. Unfortunately, literature reporting on these advances often lacks digests tailored to lay end users; moreover, some of the top-ranking predictors do not provide webservers that can be used by nonexperts. How, then, can end users benefit from these advances and correctly interpret the predicted models? Here we review the web resources that biologists can use today to take advantage of these state-of-the-art methods in their research, including not only the best de novo modeling servers but also datasets of models precomputed by experts for structurally uncharacterized protein families. We highlight their features, advantages and pitfalls for predicting structures of proteins without clear templates. We present a broad range of applications that span from driving forward biochemical investigations that lack experimental structures to actually assisting experimental structure determination in X-ray diffraction, cryo-EM and other forms of integrative modeling. We also discuss issues that must be considered by users yet still require further developments, such as global and residue-wise model quality estimates and sources of residue coevolution other than monomeric tertiary structure.

Evaluating the absolute quality of a single protein model using structural features and support vector machines

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.22275 ◽

2009 ◽

Vol 75 (3) ◽

pp. 638-647 ◽

Cited By ~ 64

Author(s):

Zheng Wang ◽

Allison N. Tegge ◽

Jianlin Cheng

Keyword(s):

Support Vector Machines ◽

Structural Features ◽

Support Vector ◽

Protein Model ◽

Vector Machines ◽

Single Protein ◽

The Absolute

P3CMQA: Single-Model Quality Assessment Using 3DCNN with Profile-Based Features

Bioengineering ◽

10.3390/bioengineering8030040 ◽

2021 ◽

Vol 8 (3) ◽

pp. 40

Author(s):

Yuma Takei ◽

Takashi Ishida

Keyword(s):

Quality Assessment ◽

Structure Prediction ◽

Tertiary Structure ◽

Protein Structures ◽

Three Dimensional ◽

Sequence Profile ◽

Single Model ◽

Model Quality ◽

Model Quality Assessment ◽

Assessment Performance

Model quality assessment (MQA), which selects near-native structures from structure models, is an important process in protein tertiary structure prediction. The three-dimensional convolution neural network (3DCNN) was applied to the task, but the performance was comparable to existing methods because it used only atom-type features as the input. Thus, we added sequence profile-based features, which are also used in other methods, to improve the performance. We developed a single-model MQA method for protein structures based on 3DCNN using sequence profile-based features, namely, P3CMQA. Performance evaluation using a CASP13 dataset showed that profile-based features improved the assessment performance, and the proposed method was better than currently available single-model MQA methods, including the previous 3DCNN-based method. We also implemented a web-interface of the method to make it more user-friendly.

A General Approach to Multimodal Document Quality Assessment

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11647 ◽

2020 ◽

Vol 68 ◽

pp. 607-632

Author(s):

Aili Shen ◽

Bahar Salehi ◽

Jianzhong Qi ◽

Timothy Baldwin

Keyword(s):

Quality Assessment ◽

State Of The Art ◽

Feature Learning ◽

Structural Features ◽

Joint Model ◽

Visual Features ◽

General Applicability ◽

Textual Features ◽

Text Content

The perceived quality of a document is affected by various factors, including grammaticality, readability, stylistics, and expertise depth, making the task of document quality assessment a complex one. In this paper, we explore this task in the context of assessing the quality of Wikipedia articles and academic papers. Observing that the visual rendering of a document can capture implicit quality indicators that are not present in the document text (such as images, font choices, and visual layout), we propose a joint model that combines the text content with a visual rendering of the document for document quality assessment. Our joint model achieves state-of-the-art results over five datasets in two domains (Wikipedia and academic papers), which demonstrates the complementarity of textual and visual features, and the general applicability of our model. To examine what kinds of features our model has learned, we further train our model in a multi-task learning setting, where document quality assessment is the primary task and feature learning is an auxiliary task. Experimental results show that visual embeddings are better at learning structural features while textual embeddings are better at learning readability scores, which further verifies the complementarity of visual and textual features.

A single-model quality assessment method for poor quality protein structure

10.21203/rs.3.rs-17080/v1 ◽

2020 ◽

Author(s):

Jianquan Ouyang ◽

Ningqiao Huang ◽

Yunqi Jiang

Keyword(s):

Protein Structure ◽

Quality Assessment ◽

Structure Prediction ◽

Assessment Method ◽

Poor Quality ◽

Single Model ◽

Model Quality ◽

Model Quality Assessment ◽

Quality Assessment Method

Abstract Quality assessment of protein tertiary structure prediction models, in which structures of the best quality are selected from decoys, is a major challenge in protein structure prediction, and is crucial to determine a model’s utility and potential applications. Estimating the quality of a single model predicts the model’s quality based on the single model itself. In general, the Pearson correlation value of the quality assessment method increases in tandem with an increase in the quality of the model pool. However, there is no consensus regarding the best method to select a few good models from the poor quality model pool. In this work, we introduce a novel single-model quality assessment method for poor quality models that uses simple linear combinations of six features. We perform weighted search and linear regression on a large dataset of models from the 12th Critical Assessment of Protein Structure Prediction (CASP12) and benchmark the results on CASP13 models. We demonstrate that our method achieves outstanding performance on poor quality models.
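The two ingredients this abstract mentions, a linear combination of six features and Pearson correlation as the evaluation measure, can be sketched as follows. The abstract does not name the six features or report the fitted weights, so the weights, feature values, and true quality scores below are hypothetical placeholders, not the authors' values.

```python
import math


def linear_score(features, weights):
    """Predicted quality as a weighted sum of per-model features."""
    return sum(w * f for w, f in zip(weights, features))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical weights for six unnamed features (they sum to 1 here).
weights = [0.3, 0.2, 0.2, 0.1, 0.1, 0.1]

# Hypothetical feature vectors for three decoy models.
models = [
    [0.9, 0.8, 0.7, 0.9, 0.8, 0.6],
    [0.5, 0.4, 0.6, 0.5, 0.3, 0.4],
    [0.2, 0.3, 0.1, 0.2, 0.2, 0.1],
]
# Hypothetical true quality scores of the same three models.
true_quality = [0.85, 0.50, 0.20]

predicted = [linear_score(m, weights) for m in models]
r = pearson(predicted, true_quality)
print(predicted, r)
```

In practice the weights would be fitted on a training pool (the abstract mentions weighted search and linear regression on CASP12 models) and the correlation would be reported on a held-out pool.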

Clustering Support Vector Machines and Its Application to Local Protein Tertiary Structure Prediction

Computational Science – ICCS 2006 - Lecture Notes in Computer Science ◽

10.1007/11758525_96 ◽

2006 ◽

pp. 710-717 ◽

Cited By ~ 2

Author(s):

Jieyue He ◽

Wei Zhong ◽

Robert Harrison ◽

Phang C. Tai ◽

Yi Pan

Keyword(s):

Support Vector Machines ◽

Structure Prediction ◽

Tertiary Structure ◽

Support Vector ◽

Tertiary Structure Prediction ◽

Protein Tertiary Structure ◽

Vector Machines ◽

Protein Tertiary Structure Prediction ◽

Local Protein

Multi-View Consistency for Relation Extraction via Mutual Information and Structure Prediction

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6445 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9106-9113

Author(s):

Amir Veyseh ◽

Franck Dernoncourt ◽

My Thai ◽

Dejing Dou ◽

Thien Nguyen

Keyword(s):

Structure Prediction ◽

State Of The Art ◽

Relation Extraction ◽

Semantic Relations ◽

General Strategy ◽

Novel Method ◽

Dependency Trees ◽

The Common ◽

The One

Relation Extraction (RE) is one of the fundamental tasks in Information Extraction. The goal of this task is to find the semantic relations between entity mentions in text. It has been shown in much previous work that the structure of the sentences (i.e., dependency trees) can provide important information/features for the RE models. However, the common limitation of the previous work on RE is the reliance on some external parsers to obtain the syntactic trees for the sentence structures. On the one hand, it is not guaranteed that the independent external parsers can offer the optimal sentence structures for RE, and the customized structures for RE might help to further improve the performance. On the other hand, the quality of the external parsers might suffer when applied to different domains, thus also affecting the performance of the RE models on such domains. In order to overcome this issue, we introduce a novel method for RE that simultaneously induces the structures and predicts the relations for the input sentences, thus avoiding the external parsers and potentially leading to better sentence structures for RE. Our general strategy to learn the RE-specific structures is to apply two different methods to infer the structures for the input sentences (i.e., two views). We then introduce several mechanisms to encourage the structure and semantic consistencies between these two views so the effective structure and semantic representations for RE can emerge. We perform extensive experiments on the ACE 2005 and SemEval 2010 datasets to demonstrate the advantages of the proposed method, achieving state-of-the-art performance on such datasets.

Smooth orientation-dependent scoring function for coarse-grained protein quality assessment

Bioinformatics ◽

10.1093/bioinformatics/bty1037 ◽

2018 ◽

Vol 35 (16) ◽

pp. 2801-2808 ◽

Cited By ~ 22

Author(s):

Mikhail Karasikov ◽

Guillaume Pagès ◽

Sergei Grudinin

Keyword(s):

Quality Assessment ◽

Structure Prediction ◽

High Performance ◽

Protein Quality ◽

Scoring Function ◽

Structural Features ◽

Coarse Grained ◽

Supplementary Information ◽

Single Model ◽

Protein Models

Abstract Motivation: Protein quality assessment (QA) is a crucial element of protein structure prediction, a fundamental and yet open problem in structural bioinformatics. QA aims at ranking predicted protein models to select the best candidates. The assessment can be performed based either on a single model or on a consensus derived from an ensemble of models. The latter strategy can yield very high performance but substantially depends on the pool of available candidate models, which limits its applicability. Hence, single-model QA methods remain an important research target, also because they can assist the sampling of candidate models. Results: We present a novel single-model QA method called SBROD. The SBROD (Smooth Backbone-Reliant Orientation-Dependent) method uses only the backbone protein conformation, and hence it can be applied to scoring coarse-grained protein models. The proposed method deduces its scoring function from a training set of protein models. The SBROD scoring function is composed of four terms related to different structural features: residue–residue orientations, contacts between backbone atoms, hydrogen bonding and solvent–solute interactions. It is smooth with respect to atomic coordinates and thus is potentially applicable to continuous gradient-based optimization of protein conformations. Furthermore, it can also be used for coarse-grained protein modeling and computational protein design. SBROD proved to achieve similar performance to state-of-the-art single-model QA methods on diverse datasets (CASP11, CASP12 and MOULDER). Availability and implementation: The standalone application implemented in C++ and Python is freely available at https://gitlab.inria.fr/grudinin/sbrod and supported on Linux, MacOS and Windows. Supplementary information: Supplementary data are available at Bioinformatics online.

An Empirical Study of Different Approaches for Protein Classification

The Scientific World JOURNAL ◽

10.1155/2014/236717 ◽

2014 ◽

Vol 2014 ◽

pp. 1-17 ◽

Cited By ~ 33

Author(s):

Loris Nanni ◽

Alessandra Lumini ◽

Sheryl Brahnam

Keyword(s):

Tertiary Structure ◽

State Of The Art ◽

Support Vector ◽

Protein Classification ◽

Matrix Representations ◽

New Methods ◽

Multiple Datasets ◽

Different Types ◽

Scoring Matrix ◽

Better Than

Many domains would benefit from reliable and efficient systems for automatic protein classification. An area of particular interest in recent studies on automatic protein classification is the exploration of new methods for extracting features from a protein that work well for specific problems. These methods, however, are not generalizable and have proven useful in only a few domains. Our goal is to evaluate several feature extraction approaches for representing proteins by testing them across multiple datasets. Different types of protein representations are evaluated: those starting from the position-specific scoring matrix (PSSM) of the proteins, those derived from the amino-acid sequence, two matrix representations, and features taken from the 3D tertiary structure of the protein. We also test new variants of protein descriptors. We develop our system experimentally by comparing and combining different descriptors taken from the protein representations. Each descriptor is used to train a separate support vector machine (SVM), and the results are combined by the sum rule. Some stand-alone descriptors work well on some datasets but not on others. Through fusion, the different descriptors provide a performance that works well across all tested datasets, in some cases performing better than the state of the art.
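The sum-rule fusion mentioned in this abstract can be sketched in a few lines: each classifier contributes a score per class, the scores are added class by class, and the class with the largest total is predicted. The scores below are hypothetical, and the sketch assumes the per-classifier scores are already on a comparable scale; the abstract does not say how the SVM outputs were normalized.

```python
def sum_rule(score_lists):
    """Fuse per-classifier class scores by summing them class by class.

    Returns the fused scores and the index of the highest-scoring class.
    """
    n_classes = len(score_lists[0])
    fused = [sum(scores[c] for scores in score_lists) for c in range(n_classes)]
    predicted_class = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted_class


# Hypothetical scores for two classes from three descriptor-specific classifiers.
scores_by_classifier = [
    [0.6, 0.4],  # e.g. a classifier trained on a PSSM-based descriptor
    [0.3, 0.7],  # e.g. a classifier trained on a sequence-based descriptor
    [0.8, 0.2],  # e.g. a classifier trained on a structure-based descriptor
]

fused, predicted_class = sum_rule(scores_by_classifier)
print(fused, predicted_class)
```

Here the second classifier disagrees with the other two, but the summed evidence still favors class 0, which is the kind of robustness across descriptors that the abstract attributes to fusion.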
