PROTEIN STRUCTURE ALIGNMENT AND FAST SIMILARITY SEARCH USING LOCAL SHAPE SIGNATURES

We present a new method for protein structure similarity search that improves on the efficiency of some existing techniques. The method is grounded in the differential geometry of 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. The invariance of the shape signatures lets us improve search efficiency by adopting a hierarchical coarse-to-fine strategy. We index the shape signatures using an efficient hashing-based technique, use the index to screen out unlikely candidates, and perform detailed pairwise alignments only for the small number of candidates that survive the screening. Unlike other hashing-based techniques, ours uses domain-specific information (not just geometric information) in constructing the hash key and is therefore better tuned to the domain of biology. Furthermore, the invariance, localization, and compactness of the shape signatures allow us to use a well-known local sequence alignment algorithm to align two protein structures. As one measure of efficacy, the proposed technique answered structure alignment queries 36 times faster on average than a well-known method while keeping the quality of the query results at approximately the same level.
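The screening step described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the signature contents or the hash key, so everything concrete here is an assumption: each residue is given a hypothetical (curvature, torsion, residue class) triple, the key is formed by quantizing the two geometric values, and the names `hash_key`, `build_index`, and `screen` are illustrative only.

```python
from collections import Counter, defaultdict

def hash_key(curvature, torsion, residue_class, step=0.5):
    # Assumed key layout: quantized geometry plus a domain-specific label.
    return (round(curvature / step), round(torsion / step), residue_class)

def build_index(database):
    """Map each hash key to the set of protein ids containing it."""
    index = defaultdict(set)
    for protein_id, signature in database.items():
        for curvature, torsion, residue_class in signature:
            index[hash_key(curvature, torsion, residue_class)].add(protein_id)
    return index

def screen(index, query_signature, min_hits=2):
    """Return ids of proteins sharing at least `min_hits` keys with the query."""
    hits = Counter()
    for curvature, torsion, residue_class in query_signature:
        key = hash_key(curvature, torsion, residue_class)
        for protein_id in index.get(key, ()):
            hits[protein_id] += 1
    return sorted(pid for pid, count in hits.items() if count >= min_hits)

# Hypothetical toy data: "H", "P", "C" are made-up residue classes.
database = {
    "A": [(1.0, 0.2, "H"), (2.1, -0.4, "P")],
    "B": [(5.0, 3.0, "H"), (2.0, -0.5, "C")],
}
index = build_index(database)
candidates = screen(index, [(1.1, 0.1, "H"), (2.0, -0.3, "P")])
```

Only the proteins returned by `screen` would go on to the detailed pairwise alignment, which is the coarse-to-fine idea in miniature.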


Implementation of a Parallel Protein Structure Alignment Service on Cloud

International Journal of Genomics ◽

10.1155/2013/439681 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 17

Author(s):

Che-Lun Hung ◽

Yaw-Ling Lin

Keyword(s):

Protein Structure ◽

Programming Model ◽

Protein Structures ◽

Structure Alignment ◽

Evolutionary Relationships ◽

Protein Structure Alignment ◽

Alignment Algorithm ◽

Cloud Platform ◽

Computational Performance ◽

Refinement Algorithm

Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.
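As a rough illustration of the MapReduce programming model mentioned in this abstract, the sketch below runs the map, shuffle, and reduce phases in plain Python rather than on Hadoop. The function names and the `score_fn` parameter are hypothetical; the scoring function in the example is a toy stand-in, not the paper's alignment or refinement algorithm.

```python
from collections import defaultdict
from itertools import product

def map_phase(queries, targets, score_fn):
    """Emit (query_id, (target_id, score)) for every query/target pair."""
    for (qid, query), (tid, target) in product(queries.items(), targets.items()):
        yield qid, (tid, score_fn(query, target))

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Keep the best-scoring target for each query."""
    return {qid: max(hits, key=lambda hit: hit[1]) for qid, hits in grouped.items()}

# Toy stand-in score (assumption): penalize differences in chain length.
toy_score = lambda q, t: -abs(len(q) - len(t))
queries = {"q1": "AAAA"}
targets = {"t1": "AA", "t2": "AAAA"}
best = reduce_phase(shuffle(map_phase(queries, targets, toy_score)))
```

Because each emitted pair is scored independently, the map phase is the part that a cluster can spread across processors.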


Fast Protein Structure Alignment Algorithm Based on Local Geometric Similarity

Lecture Notes in Computer Science - MICAI 2006: Advances in Artificial Intelligence ◽

10.1007/11925231_113 ◽

2006 ◽

pp. 1179-1189 ◽

Cited By ~ 1

Author(s):

Chan-Yong Park ◽

Sung-Hee Park ◽

Dae-Hee Kim ◽

Soo-Jun Park ◽

Man-Kyu Sung ◽

...

Keyword(s):

Protein Structure ◽

Structure Alignment ◽

Protein Structure Alignment ◽

Alignment Algorithm ◽

Geometric Similarity


FAST: A novel protein structure alignment algorithm

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.20331 ◽

2004 ◽

Vol 58 (3) ◽

pp. 618-627 ◽

Cited By ~ 114

Author(s):

Jianhua Zhu ◽

Zhiping Weng

Keyword(s):

Protein Structure ◽

Structure Alignment ◽

Protein Structure Alignment ◽

Alignment Algorithm ◽

Novel Protein


TOP: a new method for protein structure comparisons and similarity searches

Journal of Applied Crystallography ◽

10.1107/s0021889899012339 ◽

2000 ◽

Vol 33 (1) ◽

pp. 176-183 ◽

Cited By ~ 149

Author(s):

Guoguang Lu

Keyword(s):

User Interface ◽

Protein Structure ◽

Protein Structures ◽

Three Dimensional ◽

Data Bank ◽

Structure Alignment ◽

Dimensional Structure ◽

Protein Structure Alignment ◽

Protein Structure Analysis ◽

Structure Comparison

In order to facilitate the three-dimensional structure comparison of proteins, software for making comparisons and searching for similarities to protein structures in databases has been developed. The program identifies the residues that share similar positions of both main-chain and side-chain atoms between two proteins. The unique functions of the software also include database processing via Internet- and Web-based servers for different types of users. The developed method and its friendly user interface cope with many of the problems that frequently occur in protein structure comparisons, such as detecting structurally equivalent residues, misalignment caused by coincident match of Cα atoms, circular sequence permutations, tedious repetition of access, maintenance of the most recent database, and inconvenience of user interface. The program is also designed to cooperate with other tools in structural bioinformatics, such as the 3DB Browser software [Prilusky (1998). Protein Data Bank Q. Newslett. 84, 3–4] and the SCOP database [Murzin, Brenner, Hubbard & Chothia (1995). J. Mol. Biol. 247, 536–540], for convenient molecular modelling and protein structure analysis. A similarity ranking score of 'structure diversity' is proposed in order to estimate the evolutionary distance between proteins based on the comparisons of their three-dimensional structures. The function of the program has been utilized as a part of an automated program for multiple protein structure alignment. In this paper, the algorithm of the program and results of systematic tests are presented and discussed.


OPTIMAL PAIRWISE ALIGNMENT OF FIXED PROTEIN STRUCTURES IN SUBQUADRATIC TIME

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720011005562 ◽

2011 ◽

Vol 09 (03) ◽

pp. 367-382 ◽

Cited By ~ 7

Author(s):

ALEKSANDAR POLEKSIC

Keyword(s):

Protein Structure ◽

Structural Alignment ◽

Protein Structures ◽

Dynamic Programming Algorithm ◽

Pairwise Alignment ◽

Structure Alignment ◽

Programming Algorithm ◽

Protein Structure Alignment ◽

Running Time ◽

Speed Accuracy

The problem of finding an optimal structural alignment for a pair of superimposed proteins is often amenable to the Smith–Waterman dynamic programming algorithm, which runs in time proportional to the product of lengths of the sequences being aligned. While the quadratic running time is acceptable for computing a single alignment of two fixed protein structures, the time complexity becomes a bottleneck when running the Smith–Waterman routine multiple times in order to find a globally optimal superposition and alignment of the input proteins. We present a subquadratic running time algorithm capable of computing an alignment that optimizes one of the most widely used measures of protein structure similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. The algorithm presented in this article can be used to significantly improve the speed–accuracy tradeoff in a number of popular protein structure alignment methods.
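For orientation, the similarity measure named in this abstract (the number of residue pairs that lie within a distance cutoff after superposition) can be maximized for two fixed structures with a quadratic-time dynamic program. The sketch below is that quadratic baseline, not the subquadratic algorithm the paper presents; the function name, the gap-free scoring, and the default cutoff are assumptions made for illustration.

```python
import math

def count_aligned_pairs(coords_a, coords_b, cutoff=3.0):
    """Largest number of order-preserving residue pairs within `cutoff`.

    coords_a and coords_b are lists of (x, y, z) tuples for two structures
    that are assumed to be already superimposed. Runs in O(n * m) time.
    """
    n, m = len(coords_a), len(coords_b)
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = math.dist(coords_a[i - 1], coords_b[j - 1]) <= cutoff
            best[i][j] = max(
                best[i - 1][j],                            # skip residue i of A
                best[i][j - 1],                            # skip residue j of B
                best[i - 1][j - 1] + (1 if close else 0),  # pair i with j
            )
    return best[n][m]

# Toy coordinates (hypothetical): the third pair is far apart.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.5, 0.0, 0.0), (3.8, 1.0, 0.0), (20.0, 0.0, 0.0)]
pairs = count_aligned_pairs(a, b, cutoff=2.0)
```

Methods that search over many candidate superpositions call a routine like this once per superposition, which is why its running time dominates the overall cost.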


FATCAT 2.0: towards a better understanding of the structural diversity of proteins

Nucleic Acids Research ◽

10.1093/nar/gkaa443 ◽

2020 ◽

Vol 48 (W1) ◽

pp. W60-W64

Author(s):

Zhanwen Li ◽

Lukasz Jaroszewski ◽

Mallika Iyer ◽

Mayya Sedova ◽

Adam Godzik

Keyword(s):

Protein Structures ◽

Structural Diversity ◽

Structure Alignment ◽

Protein Structure Alignment ◽

Alignment Algorithm ◽

Graphical Interface ◽

Structural Differences ◽

Functional Forms ◽

Crystallization Conditions ◽

Flexible Protein

The FATCAT 2.0 server (http://fatcat.godziklab.org/) provides access to a flexible protein structure alignment algorithm developed in our group. In such an alignment, rotations and translations between elements in the structure are allowed in order to minimize the overall root mean square deviation (RMSD) between the compared structures. This allows effective comparison of protein structures even if they underwent structural rearrangements in different functional forms, under different crystallization conditions, or as a result of mutations. The major update for the server introduces a new graphical interface, much faster database searches, and several new options for visualization of the structural differences between proteins.
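The effect of allowing movement between structural elements can be shown with a deliberately simplified sketch. It is not the FATCAT algorithm: it assumes the residue pairing is already known, uses translations only (rotations are omitted), and takes a single hinge position as given. The names `rmsd_rigid` and `rmsd_with_hinge` are hypothetical.

```python
import math

def centered(points):
    """Translate points so that their centroid sits at the origin."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    return [tuple(p[k] - centroid[k] for k in range(3)) for p in points]

def sum_sq(a, b):
    """Sum of squared distances between paired points."""
    return sum(math.dist(p, q) ** 2 for p, q in zip(a, b))

def rmsd_rigid(a, b):
    """RMSD after one translation applied to the whole structure."""
    return math.sqrt(sum_sq(centered(a), centered(b)) / len(a))

def rmsd_with_hinge(a, b, hinge):
    """RMSD when the two segments split at `hinge` are translated separately."""
    total = sum_sq(centered(a[:hinge]), centered(b[:hinge]))
    total += sum_sq(centered(a[hinge:]), centered(b[hinge:]))
    return math.sqrt(total / len(a))

# Toy example: the second half of b is shifted by 5 units along y.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 5.0, 0.0), (3.0, 5.0, 0.0)]
rigid = rmsd_rigid(a, b)
flexible = rmsd_with_hinge(a, b, hinge=2)
```

In this toy case the rigid comparison reports a large deviation even though both halves are individually identical, while the hinged comparison reports none.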


A rapid protein structure alignment algorithm based on a text modeling technique

Bioinformation ◽

10.6026/97320630006344 ◽

2011 ◽

Vol 6 (9) ◽

pp. 344-347 ◽

Cited By ~ 4

Author(s):

Jafar Razmara ◽

Safaai Deris ◽

Sepideh Parvizpour

Keyword(s):

Protein Structure ◽

Structure Alignment ◽

Modeling Technique ◽

Protein Structure Alignment ◽

Alignment Algorithm


TM-align: a protein structure alignment algorithm based on the TM-score

Nucleic Acids Research ◽

10.1093/nar/gki524 ◽

2005 ◽

Vol 33 (7) ◽

pp. 2302-2309 ◽

Cited By ~ 1476

Author(s):

Y. Zhang

Keyword(s):

Protein Structure ◽

Structure Alignment ◽

Protein Structure Alignment ◽

Alignment Algorithm


MatAlign: PRECISE PROTEIN STRUCTURE COMPARISON BY MATRIX ALIGNMENT

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720006002417 ◽

2006 ◽

Vol 04 (06) ◽

pp. 1197-1216 ◽

Cited By ~ 18

Author(s):

ZEYAR AUNG ◽

KIAN-LEE TAN

Keyword(s):

Protein Structure ◽

Protein Structures ◽

Scoring Function ◽

Structure Alignment ◽

Supplementary Information ◽

Protein Structure Alignment ◽

Initial Alignment ◽

Structure Comparison ◽

Structural Database ◽

Step Algorithm

We propose a detailed protein structure alignment method named "MatAlign". It is a two-step algorithm. Firstly, we represent 3D protein structures as 2D distance matrices, and align these matrices by means of dynamic programming in order to find the initially aligned residue pairs. Secondly, we refine the initial alignment iteratively into the optimal one according to an objective scoring function. We compare our method against DALI and CE, which are among the most accurate and the most widely used of the existing structural comparison tools. On the benchmark set of 68 protein structure pairs by Fischer et al., MatAlign provides better alignment results, according to four different criteria, than both DALI and CE in a majority of cases. MatAlign also performs as well in structural database search as DALI does, and much better than CE does. MatAlign is about two to three times faster than DALI, and has about the same speed as CE. The software and the supplementary information for this paper are available at . .
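The first step of this abstract represents a 3D structure as a 2D distance matrix. A minimal sketch of that representation follows, assuming one coordinate per residue (for example the Cα atom, which the abstract does not state); the matrix alignment and refinement steps are not shown.

```python
import math

def distance_matrix(coords):
    """Matrix of pairwise Euclidean distances between residue coordinates."""
    return [[math.dist(p, q) for q in coords] for p in coords]

# Toy coordinates (hypothetical) and a translated copy of the same structure.
structure = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
shifted = [(x + 10.0, y - 2.0, z + 1.0) for x, y, z in structure]

matrix = distance_matrix(structure)
matrix_shifted = distance_matrix(shifted)
```

Because the matrix holds only internal distances, it does not change when the structure is translated or rotated, so two matrices can be compared without first superimposing the structures.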


ALIGNING MULTIPLE PROTEIN STRUCTURES BY DETERMINISTIC ANNEALING

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720005001351 ◽

2005 ◽

Vol 03 (04) ◽

pp. 837-860 ◽

Cited By ~ 5

Author(s):

TIANSHOU ZHOU ◽

LUONAN CHEN ◽

YUN TANG ◽

XIANGSUN ZHANG

Keyword(s):

Protein Structure ◽

Structure Prediction ◽

Protein Structures ◽

Structure Alignment ◽

Deterministic Annealing ◽

Protein Structure Alignment ◽

Protein Chain ◽

Fold Family ◽

Multiple Protein ◽

Wide Range

Protein structure alignment plays a key role in protein structure prediction and fold family classification. We present an efficient, mathematically formulated method for multiple protein structure alignment based on the deterministic annealing technique. The alignment problem is mapped onto a nonlinear continuous optimization problem (NCOP) with a common consensus chain, matching assignment matrices, and atomic coordinates as variables. At each step in the annealing procedure, the NCOP is decomposed into as many subproblems as there are protein chains. Each subproblem is an independent pairwise structure alignment between a protein chain and the consensus chain, so the subproblems can be solved efficiently by parallel computation. The proposed method is robust with respect to the choice of iteration parameters for a wide range of proteins, and performs well in both multiple and pairwise structure alignment cases compared with existing alignment methods.
