Index-based similarity search for protein structure databases

We propose new methods for finding similarities in protein structure databases. These methods extract feature vectors from triplets of SSEs (Secondary Structure Elements) of proteins. The feature vectors are then indexed using a multidimensional index structure. Our first technique addresses the problem of finding proteins similar to a given query protein in a protein dataset. It quickly finds promising proteins using the index structure. These proteins are then aligned to the query protein using a popular pairwise alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. Our second technique addresses the problem of joining two protein datasets to find all-to-all similarities. Experimental results show that our techniques improve the pruning time of VAST by a factor of 3 to 3.5 while keeping the sensitivity similar. Our technique can also be combined with DALI and CE to improve their running times by factors of 2 and 2.7, respectively. The software is available online at .
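The filter-and-refine idea in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: each SSE is reduced to a 3D centroid, the feature vector of a triplet is its three sorted pairwise centroid distances, and a uniform grid hash (cell size `CELL`) stands in for the multidimensional index. The names `triplet_features`, `build_index`, and `candidates` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import dist

CELL = 2.0  # assumed grid cell size used to quantize feature vectors


def triplet_features(sses):
    """Yield one feature vector per SSE triplet: the three sorted pairwise
    distances between SSE centroids (a simplified, assumed feature)."""
    for a, b, c in combinations(sses, 3):
        yield tuple(sorted((dist(a, b), dist(b, c), dist(a, c))))


def cell_of(vec):
    """Map a feature vector to its grid cell (stand-in for an index lookup)."""
    return tuple(int(x // CELL) for x in vec)


def build_index(database):
    """database: dict mapping protein name -> list of SSE centroids."""
    index = defaultdict(set)
    for name, sses in database.items():
        for vec in triplet_features(sses):
            index[cell_of(vec)].add(name)
    return index


def candidates(index, query_sses, top_k=2):
    """Vote for database proteins sharing grid cells with query triplets."""
    votes = Counter()
    for vec in triplet_features(query_sses):
        for name in index.get(cell_of(vec), ()):
            votes[name] += 1
    return [name for name, _ in votes.most_common(top_k)]


# Toy data: protein "A" and a translated copy of it used as the query.
database = {
    "A": [(0, 0, 0), (3, 0, 0), (0, 4, 0), (0, 0, 12)],
    "B": [(0, 0, 0), (20, 0, 0), (0, 30, 0), (0, 0, 40)],
}
query = [(x + 10, y + 10, z + 10) for x, y, z in database["A"]]
index = build_index(database)
print(candidates(index, query, top_k=1))
```

Only the proteins returned by `candidates` would then be passed to an expensive pairwise aligner. A plain grid hash misses near neighbors that fall just across a cell boundary, which is one reason a real system would query a proper multidimensional index with a distance tolerance instead.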

Protein Structure Databases

Molecular Biotechnology ◽

10.1007/s12033-010-9372-4 ◽

2011 ◽

Vol 48 (2) ◽

pp. 183-198 ◽

Cited By ~ 4

Author(s):

Roman A. Laskowski

Keyword(s):

Protein Structure ◽

Structure Databases

A Method of Structure Comparison Using Spatial Topological Patterns

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.277-279.272 ◽

2005 ◽

Vol 277-279 ◽

pp. 272-277

Author(s):

Sung Hee Park ◽

Keun Ho Ryu

Keyword(s):

Spatial Data ◽

Similarity Search ◽

3D Structure ◽

Structural Similarity ◽

Query Protein ◽

Data Types ◽

Structure Comparison ◽

3D Space ◽

Structure Databases ◽

Computationally Expensive

Comparing structural similarity has been a complex and computationally expensive problem. The first step toward solving it in 3D structure databases is to develop fast comparison methods. We therefore propose a new method of comparing structural similarity in protein structure databases by using topological patterns of proteins. In our approach, the geometry of secondary structure elements (SSEs) in 3D space is represented by spatial data types and is indexed using R-trees. Topological patterns are discovered by spatial topology relations based on the R-tree index join. A similarity search algorithm compares the topological patterns of a query protein with those of proteins in structure databases by the intersection frequency of SSEs. Our experimental results show that our method runs three times faster than the well-known method DALITE. Our method can generate small candidate sets for more accurate alignment tools such as DALI and SSAP.
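As a rough illustration of comparing proteins by the intersection frequency of SSEs, the sketch below makes several assumptions that are not stated in the abstract: each SSE is an axis-aligned bounding box tagged with a kind ("H" for helix, "E" for strand), a pattern is the count of intersecting box pairs per kind pair, and two patterns are scored by a multiset overlap ratio. A brute-force pairwise loop replaces the R-tree index join, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def boxes_intersect(a, b):
    """True if two axis-aligned boxes (min_corner, max_corner) overlap."""
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[i] <= bmax[i] and bmin[i] <= amax[i] for i in range(3))


def intersection_pattern(sses):
    """sses: list of (kind, (min_corner, max_corner)).
    Returns how often each unordered pair of SSE kinds intersects."""
    pattern = Counter()
    for (k1, b1), (k2, b2) in combinations(sses, 2):
        if boxes_intersect(b1, b2):
            pattern[tuple(sorted((k1, k2)))] += 1
    return pattern


def pattern_similarity(p, q):
    """Multiset overlap: shared intersections divided by the union."""
    shared = sum((p & q).values())
    total = sum((p | q).values())
    return shared / total if total else 1.0


protein = [
    ("H", ((0, 0, 0), (2, 2, 2))),
    ("E", ((1, 1, 1), (3, 3, 3))),
    ("H", ((10, 10, 10), (11, 11, 11))),
]
print(intersection_pattern(protein))
```

The brute-force loop is quadratic in the number of SSEs; the point of a spatial index join is to avoid testing most non-intersecting pairs.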

Protein Structure Databases

Computational Structural Biology ◽

10.1142/9789812778789_0026 ◽

2008 ◽

pp. 705-727

Author(s):

D. Dimitropoulos ◽

M. John ◽

E. Krissinel ◽

R. Newman ◽

G. J. Swaminathan

Keyword(s):

Protein Structure ◽

Structure Databases

3P004 Shape comparison of 3D electron microscopy data using both feature-vectors and GMM-based superimpositions (01A. Protein: Structure, Poster, The 52nd Annual Meeting of the Biophysical Society of Japan (BSJ2014))

Seibutsu Butsuri ◽

10.2142/biophys.54.s249_4 ◽

2014 ◽

Vol 54 (supplement1-2) ◽

pp. S249

Author(s):

Hirofumi Suzuki ◽

Takeshi Kawabata ◽

Haruki Nakamura

Keyword(s):

Electron Microscopy ◽

Protein Structure ◽

Annual Meeting ◽

Feature Vectors ◽

Shape Comparison ◽

Electron Microscopy Data ◽

3D Electron Microscopy ◽

Biophysical Society ◽

3D Electron ◽

Microscopy Data

Towards index-based similarity search for protein structure databases

Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003 ◽

10.1109/csb.2003.1227314 ◽

2004 ◽

Cited By ~ 6

Author(s):

O. Camoglu ◽

T. Kahveci ◽

A.K. Singh

Keyword(s):

Protein Structure ◽

Similarity Search ◽

Structure Databases

Improving classification in protein structure databases using text mining

BMC Bioinformatics ◽

10.1186/1471-2105-10-129 ◽

2009 ◽

Vol 10 (1) ◽

pp. 129 ◽

Cited By ~ 11

Author(s):

Antonis Koussounadis ◽

Oliver C Redfern ◽

David T Jones

Keyword(s):

Protein Structure ◽

Text Mining ◽

Structure Databases

G-PAS 2.0 – an improved version of protein alignment tool with an efficient backtracking routine on multiple GPUs

Bulletin of the Polish Academy of Sciences Technical Sciences ◽

10.2478/v10175-012-0062-1 ◽

2012 ◽

Vol 60 (3) ◽

pp. 491-494 ◽

Cited By ~ 1

Author(s):

W. Frohmberg ◽

M. Kierzynka ◽

J. Blazewicz ◽

P. Wojciechowski

Keyword(s):

High Throughput ◽

Graphics Processing Units ◽

Pairwise Alignment ◽

Protein Alignment ◽

Multiple Gpus ◽

The Past ◽

Highly Efficient ◽

Computational Architecture ◽

Alignment Tool ◽

Graphics Processing

Several highly efficient alignment tools have been released over the past few years, including those taking advantage of GPUs (Graphics Processing Units). G-PAS (GPU-based Pairwise Alignment Software) was one of them, though with a couple of interesting features that made it unique. Nevertheless, some changes had to be introduced to adapt it to a new computational architecture. In this paper we present G-PAS 2.0, a new version of the software for performing high-throughput alignment. Results show that the new version is nearly a fourth faster on the same hardware, reaching over 20 GCUPS (Giga Cell Updates Per Second).
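For readers unfamiliar with the GCUPS unit, the following sketch shows how the figure is conventionally computed for pairwise alignment: the number of dynamic-programming cells (the product of the two sequence lengths, summed over all pairs) divided by the elapsed time and by 10^9. The function name and the workload below are illustrative assumptions, not measurements from the paper.

```python
def gcups(pairs, seconds):
    """Giga cell updates per second for a batch of pairwise alignments.

    pairs: iterable of (query_length, target_length) tuples.
    seconds: wall-clock time taken to align the whole batch.
    """
    cells = sum(m * n for m, n in pairs)  # one DP cell per residue pair
    return cells / (seconds * 1e9)


# Hypothetical workload: 10,000 alignments of 1,000 x 2,000 residues in 1 s.
workload = [(1000, 2000)] * 10_000
print(gcups(workload, seconds=1.0))  # 20.0
```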

Deep Template-based Protein Structure Prediction

10.1101/2020.12.26.424433 ◽

2020 ◽

Author(s):

Fandi Wu ◽

Jinbo Xu

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Random Fields ◽

Structure Prediction ◽

Conditional Random Fields ◽

3D Models ◽

Query Protein ◽

Supplementary Information ◽

Distance Information ◽

Alternating Direction

Motivation: TBM (template-based modeling) is a popular method for protein structure prediction. When very good templates are not available, it is challenging to identify the best templates, build accurate sequence-template alignments and construct 3D models from alignments. Results: This paper presents a new method, NDThreader (New Deep-learning Threader), to address the challenges of TBM. NDThreader first employs DRNF (deep convolutional residual neural fields), which is an integration of deep ResNet (convolutional residue neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. Then NDThreader uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of predicted distance potential. Finally, NDThreader builds 3D models from a sequence-template alignment by feeding it and sequence co-evolution information into a deep ResNet to predict inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results on the CASP13 and CAMEO data show that our methods outperform existing ones such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as a part of the RaptorX server, which obtained the best GDT score among all CASP14 servers on the 58 TBM targets. Availability and Implementation: available as a part of web server at http://[email protected] Supplementary Information: Supplementary data are available online.

Analytical tools in protein structure determination

International Journal of Pharmaceutical Chemistry and Analysis ◽

10.18231/j.ijpca.2021.021 ◽

2021 ◽

Vol 8 (3) ◽

pp. 103-111

Author(s):

Krishna R Gupta ◽

Uttam Patle ◽

Uma Kabra ◽

P. Mishra ◽

Milind J Umekar

Keyword(s):

Protein Structure ◽

Structure Determination ◽

Structure Prediction ◽

Protein Structures ◽

Three Dimensional ◽

Protein Structure Determination ◽

Computational Techniques ◽

Determination Process ◽

Structure Databases ◽

Analytical Tools

Three-dimensional protein structure prediction from amino acid sequence has been a thought-provoking task for decades, but it is of pivotal importance because it provides a better understanding of protein function. In recent years, the methods for prediction of protein structures have advanced considerably. Computational techniques and the growth of protein sequence and structure databases have influenced the laborious protein structure determination process. Still, there is no single method that can predict all protein structures. In this review, we describe the four stages of protein structure determination. We have also explored the current techniques used to uncover protein structure and highlight the most suitable method for a given protein.

CopulaNet: Learning residue co-evolution directly from multiple sequence alignment for protein structure prediction

10.1101/2020.10.06.327585 ◽

2020 ◽

Author(s):

Fusong Ju ◽

Jianwei Zhu ◽

Bin Shao ◽

Lupeng Kong ◽

Tie-Yan Liu ◽

...

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Tertiary Structure ◽

Query Protein ◽

Spatial Proximity ◽

Multiple Sequence ◽

Variance Matrix

Protein functions are largely determined by the final details of their tertiary structures, and the structures could be accurately reconstructed based on inter-residue distances. Residue co-evolution has become the primary principle for estimating inter-residue distances since the residues in close spatial proximity tend to co-evolve. The widely-used approaches infer residue co-evolution using an indirect strategy, i.e., they first extract from the multiple sequence alignment (MSA) of query protein some handcrafted features, say, co-variance matrix, and then infer residue co-evolution using these features rather than the raw information carried by MSA. This indirect strategy always leads to considerable information loss and inaccurate estimation of inter-residue distances. Here, we report a deep neural network framework (called CopulaNet) to learn residue co-evolution directly from MSA without any handcrafted features. The CopulaNet consists of two key elements: i) an encoder to model context-specific mutation for each residue, and ii) an aggregator to model correlations among residues and thereafter infer residue co-evolutions. Using the CASP13 (the 13th Critical Assessment of Protein Structure Prediction) target proteins as representatives, we demonstrated the successful application of CopulaNet for estimating inter-residue distances and further predicting protein tertiary structure with improved accuracy and efficiency. Head-to-head comparison suggested that for 24 out of the 31 free modeling CASP13 domains, ProFOLD outperformed AlphaFold, one of the state-of-the-art prediction approaches.
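The "handcrafted feature" that the abstract contrasts CopulaNet against can be made concrete with a small sketch. It computes the covariance between residue types at two MSA columns, cov(a, b) = f_ij(a, b) - f_i(a) f_j(b), from raw frequencies. This illustrates the indirect, covariance-based strategy only; it is not CopulaNet, it omits the sequence weighting and pseudocounts that practical pipelines usually apply, and the function name is hypothetical.

```python
from collections import Counter


def pair_covariance(msa, i, j):
    """Covariance between residue types at columns i and j of an MSA.

    msa: list of equal-length aligned sequences (strings).
    Returns a dict mapping (a, b) -> f_ij(a, b) - f_i(a) * f_j(b).
    """
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)             # single-column counts
    fj = Counter(seq[j] for seq in msa)
    fij = Counter((seq[i], seq[j]) for seq in msa)  # joint counts
    return {
        (a, b): fij[(a, b)] / n - (fi[a] / n) * (fj[b] / n)
        for a in fi
        for b in fj
    }


# Toy alignment in which the two columns co-vary perfectly.
msa = ["AC", "AC", "GT", "GT"]
cov = pair_covariance(msa, 0, 1)
print(cov[("A", "C")], cov[("A", "T")])  # 0.25 -0.25
```

Positive entries mark residue pairs that occur together more often than independence would predict, which is the signal that co-evolution-based distance estimation relies on.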