FTIP: an accurate and efficient method for global protein surface comparison

Motivation: Global protein surface comparison (GPSC) studies have been limited compared to other research works on protein structure alignment/comparison due to lack of real applications associated with GPSC. However, the technology advances in cryo-electron tomography (CET) have made methods to identify proteins from their surface shapes extremely useful. Results: In this study, we developed a new method called Farthest point sampling (FPS)-enhanced Triangulation-based Iterative-closest-Point (ICP) (FTIP) for GPSC. We applied it to protein classification using only surface shape information. Our method first extracts a set of feature points from protein surfaces using FPS and then uses a triangulation-based efficient ICP algorithm to align the feature points of the two proteins to be compared. Tested on a benchmark dataset with 2329 proteins using nearest-neighbor classification, FTIP outperformed the state-of-the-art method for GPSC based on 3D Zernike descriptors. Using real and simulated cryo-EM data, we show that FTIP could be applied in the future to address problems in protein identification in CET experiments. Availability and implementation: Programs/scripts we developed/used in the study are available at http://ani.stat.fsu.edu/∼yuan/index.fld/FTIP.tar.bz2. Supplementary information: Supplementary data are available at Bioinformatics online.
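The feature-extraction step named in the abstract, farthest point sampling, can be illustrated with a minimal sketch. This is the generic greedy form of FPS, not the authors' implementation; the function name, the `start` parameter, and the toy coordinates are assumptions made for illustration, and the triangulation-based ICP alignment step is not shown.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen.

    `points` is a list of 3D coordinate tuples; returns the indices of k points.
    The starting index is an arbitrary choice (assumption for this sketch).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # min_dist[i] holds the distance from point i to its nearest chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # The next sample is the point with the largest nearest-sample distance.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen

# Toy "surface" points (hypothetical data, not from the benchmark).
surface = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.1, 0.0, 0.0),
           (0.0, 5.0, 0.0), (0.0, 0.0, 2.0)]
sample = farthest_point_sampling(surface, 3)
print(sample)
```

Because each new point maximizes its distance to the points already selected, the sample spreads over the whole surface instead of clustering in densely sampled regions, which is what makes it a plausible source of feature points for a subsequent alignment.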

MatAlign: PRECISE PROTEIN STRUCTURE COMPARISON BY MATRIX ALIGNMENT

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720006002417 ◽

2006 ◽

Vol 04 (06) ◽

pp. 1197-1216 ◽

Cited By ~ 18

Author(s):

ZEYAR AUNG ◽

KIAN-LEE TAN

Keyword(s):

Protein Structure ◽

Protein Structures ◽

Scoring Function ◽

Structure Alignment ◽

Supplementary Information ◽

Protein Structure Alignment ◽

Initial Alignment ◽

Structure Comparison ◽

Structural Database ◽

Step Algorithm

We propose a detailed protein structure alignment method named "MatAlign". It is a two-step algorithm. Firstly, we represent 3D protein structures as 2D distance matrices, and align these matrices by means of dynamic programming in order to find the initially aligned residue pairs. Secondly, we refine the initial alignment iteratively into the optimal one according to an objective scoring function. We compare our method against DALI and CE, which are among the most accurate and the most widely used of the existing structural comparison tools. On the benchmark set of 68 protein structure pairs by Fischer et al., MatAlign provides better alignment results, according to four different criteria, than both DALI and CE in a majority of cases. MatAlign also performs as well in structural database search as DALI does, and much better than CE does. MatAlign is about two to three times faster than DALI, and has about the same speed as CE. The software and the supplementary information for this paper are available at . .
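The first step described above, representing a 3D structure as a 2D distance matrix, can be sketched as follows. This is only an illustration of the representation, assuming one coordinate per residue (for example the C-alpha atom); the function name and coordinates are hypothetical, and the dynamic-programming alignment and iterative refinement of MatAlign are not reproduced here.

```python
import math

def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    return [[math.dist(a, b) for b in coords] for a in coords]

# Three hypothetical residue positions.
ca = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
dm = distance_matrix(ca)

# Translating the structure leaves the matrix unchanged.
shifted = [(x + 10.0, y - 2.0, z + 1.0) for x, y, z in ca]
dm_shifted = distance_matrix(shifted)
print(dm[0][1], dm[1][2], dm[0][2])
```

The matrix depends only on internal distances, so it is unchanged by rotation and translation of the structure. That property is what allows two structures to be compared without first superimposing them.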

PROTEIN STRUCTURE ALIGNMENT AND FAST SIMILARITY SEARCH USING LOCAL SHAPE SIGNATURES

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720004000533 ◽

2004 ◽

Vol 02 (01) ◽

pp. 215-239 ◽

Cited By ~ 4

Author(s):

TOLGA CAN ◽

YUAN-FANG WANG

Keyword(s):

Protein Structure ◽

Protein Structures ◽

Structure Alignment ◽

Protein Structure Alignment ◽

Specific Information ◽

Alignment Algorithm ◽

Screening Process ◽

Domain Specific ◽

Local Sequence ◽

Shape Signatures

We present a new method for conducting protein structure similarity searches, which improves on the efficiency of some existing techniques. Our method is grounded in the theory of differential geometry on 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. The invariance of the shape signatures allows us to improve similarity searching efficiency by adopting a hierarchical coarse-to-fine strategy. We index the shape signatures using an efficient hashing-based technique. With the help of this technique we screen out unlikely candidates and perform detailed pairwise alignments only for the small number of candidates that survive the screening process. Unlike other hashing-based techniques, our technique employs domain-specific information (not just geometric information) in constructing the hash key, and hence is better tuned to the domain of biology. Furthermore, the invariance, localization, and compactness of the shape signatures allow us to utilize a well-known local sequence alignment algorithm for aligning two protein structures. One measure of the efficacy of the proposed technique is that we were able to perform structure alignment queries 36 times faster (on average) than a well-known method while keeping the quality of the query results at an approximately similar level.
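The screening idea, hashing signatures and keeping only candidates that share hash keys with the query, can be sketched as below. Everything here is an assumption for illustration: signatures are modeled as plain numeric tuples (for example curvature and torsion values), the hash key is a simple quantization, and candidates are ranked by vote count. The paper's actual hash key also uses domain-specific information, which this sketch does not model.

```python
from collections import Counter, defaultdict

def quantize(signature, step=0.5):
    """Map a numeric signature to a coarse integer key (hypothetical hash key)."""
    return tuple(round(v / step) for v in signature)

def build_index(database, step=0.5):
    """Index every signature of every protein under its quantized key."""
    index = defaultdict(set)
    for name, signatures in database.items():
        for sig in signatures:
            index[quantize(sig, step)].add(name)
    return index

def screen(index, query_signatures, top=2, step=0.5):
    """Return the `top` proteins sharing the most hash keys with the query."""
    votes = Counter()
    for sig in query_signatures:
        for name in index.get(quantize(sig, step), ()):
            votes[name] += 1
    return [name for name, _ in votes.most_common(top)]

# Hypothetical database of (curvature, torsion) signatures.
database = {
    "A": [(1.0, 0.2), (2.0, -1.0)],
    "B": [(1.1, 0.2), (5.0, 5.0)],
    "C": [(9.0, 9.0)],
}
index = build_index(database)
candidates = screen(index, [(1.0, 0.2), (2.0, -1.0)])
print(candidates)
```

Only the surviving candidates would then be passed to a detailed pairwise alignment. A simple quantization like this one can separate near-identical signatures that fall on opposite sides of a bin boundary, so a practical index would need some tolerance for that.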

Fast and accurate non-sequential protein structure alignment using a new asymmetric linear sum assignment heuristic

Bioinformatics ◽

10.1093/bioinformatics/btv580 ◽

2015 ◽

Vol 32 (3) ◽

pp. 370-377 ◽

Cited By ~ 11

Author(s):

Peter Brown ◽

Wayne Pullan ◽

Yuedong Yang ◽

Yaoqi Zhou

Keyword(s):

Protein Structure ◽

Structure Alignment ◽

Protein Structure Alignment

Measurement of protein surface shape by solid angles

Journal of Molecular Graphics ◽

10.1016/0263-7855(86)80086-8 ◽

1986 ◽

Vol 4 (1) ◽

pp. 3-6 ◽

Cited By ~ 90

Author(s):

M L Connolly

Keyword(s):

Surface Shape ◽

Protein Surface ◽

Solid Angles

mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation

Bioinformatics ◽

10.1093/bioinformatics/bty1047 ◽

2018 ◽

Vol 35 (16) ◽

pp. 2757-2765 ◽

Cited By ~ 63

Author(s):

Balachandran Manavalan ◽

Shaherin Basith ◽

Tae Hwan Shin ◽

Leyi Wei ◽

Gwang Lee

Keyword(s):

Nearest Neighbor ◽

Feature Representation ◽

Superior Performance ◽

Supplementary Information ◽

Gradient Boosting ◽

Support Vector ◽

Pharmaceutical Drugs ◽

K Nearest Neighbor ◽

Feature Descriptors ◽

Predicted Probability

Motivation: Cardiovascular disease is the primary cause of death globally accounting for approximately 17.7 million deaths per year. One of the stakes linked with cardiovascular diseases and other complications is hypertension. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there is no comprehensive analysis, assessment of diverse features and implementation of various machine-learning (ML) algorithms applied for antihypertensive peptide (AHTP) model construction. Results: In this study, we utilized six different ML algorithms, namely, Adaboost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM) using 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. While ERT-based trained models performed consistently better than other algorithms regardless of various feature descriptors, we treated them as baseline predictors, whose predicted probability of AHTPs was further used as input features separately for four different ML-algorithms (ERT, GB, RF and SVM) and developed their corresponding meta-predictors using a two-step feature selection protocol. Subsequently, the integration of four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Upon comparison with existing methods, mAHTPred showed superior performance with an overall improvement of approximately 6–7% in both benchmarking and independent datasets. Availability and implementation: The user-friendly online prediction tool, mAHTPred is freely accessible at http://thegleelab.org/mAHTPred. Supplementary information: Supplementary data are available at Bioinformatics online.
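The final integration step can be illustrated with a small sketch that combines the predicted probabilities of four meta-predictors. The abstract does not state how the ensemble is formed, so the unweighted averaging and the 0.5 threshold used here are assumptions, as are the function names and the example probabilities.

```python
def ensemble_probability(probabilities):
    """Average the predicted probabilities of several meta-predictors (assumed rule)."""
    if not probabilities:
        raise ValueError("need at least one probability")
    return sum(probabilities) / len(probabilities)

def is_predicted_ahtp(probabilities, threshold=0.5):
    """Call a peptide antihypertensive if the averaged probability reaches the threshold."""
    return ensemble_probability(probabilities) >= threshold

# Hypothetical outputs of the ERT, GB, RF and SVM meta-predictors for one peptide.
meta_outputs = [0.9, 0.6, 0.4, 0.7]
score = ensemble_probability(meta_outputs)
print(score, is_predicted_ahtp(meta_outputs))
```

Averaging lets one dissenting model (0.4 in this example) be outvoted by the others, which is one way an ensemble can be more robust than any single predictor.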

Implementation of a Parallel Protein Structure Alignment Service on Cloud

International Journal of Genomics ◽

10.1155/2013/439681 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 17

Author(s):

Che-Lun Hung ◽

Yaw-Ling Lin

Keyword(s):

Protein Structure ◽

Programming Model ◽

Protein Structures ◽

Structure Alignment ◽

Evolutionary Relationships ◽

Protein Structure Alignment ◽

Alignment Algorithm ◽

Cloud Platform ◽

Computational Performance ◽

Refinement Algorithm

Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.
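The MapReduce decomposition described above can be sketched in plain Python without Hadoop. In this sketch the map phase scores each structure pair independently and the reduce phase keeps the best-scoring partner per structure. The scoring function is a deliberate placeholder (it only compares residue counts) and is not a structure alignment; all names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def toy_score(a, b):
    """Placeholder score, NOT a real alignment: penalizes length difference."""
    return -abs(len(a) - len(b))

def map_phase(structures):
    """Emit (structure, (partner, score)) records; each pair is an independent task."""
    for (name_a, a), (name_b, b) in combinations(sorted(structures.items()), 2):
        score = toy_score(a, b)
        yield name_a, (name_b, score)
        yield name_b, (name_a, score)

def reduce_phase(records):
    """Group records by key and keep the highest-scoring partner for each structure."""
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(value)
    return {key: max(values, key=lambda v: v[1]) for key, values in grouped.items()}

# Hypothetical structures, given as lists of residue coordinates.
structures = {
    "p1": [(0.0, 0.0, 0.0)] * 4,
    "p2": [(0.0, 0.0, 0.0)] * 5,
    "p3": [(0.0, 0.0, 0.0)] * 9,
}
best = reduce_phase(map_phase(structures))
print(best)
```

Since no pair depends on any other, the map records can be computed on separate workers, which is consistent with the reported scaling of performance with the number of processors.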
