protein structure retrieval

2007 ◽  
Author(s):  
Pin-Hao Chi

Functionally important sites of proteins are potentially conserved within specific three-dimensional structural folds. To understand the structure-to-function relationship, life science researchers need to retrieve similar structures from protein databases and classify them into the same protein fold. Traditional protein structure retrieval and classification methods are known to be either computationally expensive or labor intensive. In the past decade, more than 35,000 protein structures have been identified. To meet the need for fast retrieval and classification of high-throughput protein data, our research covers three main subjects: (1) Real-time global protein structure retrieval: We introduce an image-based approach that extracts signatures of three-dimensional protein structures. These high-level protein signatures are then indexed by multi-dimensional indexing trees for fast retrieval. (2) Real-time global protein structure classification: An advanced knowledge discovery and data mining (KDD) model is proposed to convert high-level protein signatures into itemsets for mining association rules. The advantage of this KDD approach is that it effectively reveals hidden knowledge from similar protein tertiary structures and quickly suggests possible SCOP domains for a newly discovered protein. In addition, we develop a non-parametric classifier, E-Predict, that can rapidly assign known SCOP folds and recognize novel folds for newly discovered proteins. (3) Efficient local protein structure retrieval and classification: We propose a novel algorithm, the Index-based Protein Substructure Alignment (IPSA), that constructs a two-layer indexing tree to capture the obscured similarity of protein substructures in a timely fashion. Our methods exhibit high efficiency with reasonably high accuracy and will benefit the study of high-throughput protein structure-function evolutionary relationships.
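
The association-rule step in (2) can be illustrated with a minimal sketch. It does not reproduce the author's KDD model: the item names (`"a"`, `"b"`, `"c"`), the toy transactions, and the function `rule_stats` are all hypothetical, and the labels are used only as example class names. The sketch shows the two standard quantities behind any association rule of the form "itemset implies label": support and confidence.

```python
def rule_stats(transactions, antecedent, label):
    """Return (support, confidence) of the rule: antecedent -> label.

    transactions: list of (frozenset_of_items, label) pairs.
    Items stand in for discretized signature features (hypothetical).
    """
    antecedent = frozenset(antecedent)
    # Transactions whose itemset contains every item of the antecedent.
    n_ante = sum(1 for items, _ in transactions if antecedent <= items)
    # Of those, the ones that also carry the target label.
    n_both = sum(
        1 for items, lab in transactions
        if antecedent <= items and lab == label
    )
    support = n_both / len(transactions)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Toy data: each structure is an itemset plus a known class label.
transactions = [
    (frozenset({"a", "b"}), "all-alpha"),
    (frozenset({"a", "c"}), "all-alpha"),
    (frozenset({"b", "c"}), "all-beta"),
    (frozenset({"a", "b", "c"}), "all-beta"),
]

support, confidence = rule_stats(transactions, {"a"}, "all-alpha")
```

Rules with high support and confidence could then be used to suggest a label for a new structure whose itemset contains the antecedent; how the original work selects and ranks rules is not stated in the abstract.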


2006 ◽  
Vol 7 (Suppl 5) ◽  
pp. S5 ◽  
Author(s):  
Sourangshu Bhattacharya ◽  
Chiranjib Bhattacharyya ◽  
Nagasuma R Chandra

Author(s):  
Pin-Hao Chi ◽  
Grant Scott ◽  
Chi-Ren Shyu

Indexing protein tertiary structures has been shown to provide a scalable solution for structure-to-structure comparisons in large protein structure retrieval systems. To conduct similarity searches against 53,356 polypeptide chains in a database with real-time responses, two critical issues must be addressed: information extraction and suitable indexing. In this paper, we apply computer vision techniques to extract the predominant information encoded in each 2D distance matrix, generated from the 3D coordinates of protein chains. Distance matrices are capable of representing specific protein structural topologies, and similar proteins generate similar matrices. Once meaningful features are extracted from the distance images, an advanced indexing structure, the Entropy Balanced Statistical (EBS) k-d tree, can be used to index the multidimensional data. With a limited amount of training data from domain experts, namely the structural classification of a subset of available protein chains, we apply various pattern recognition techniques to determine clusters of proteins in the multi-dimensional feature space. Our system returns search results from the protein database in ranked order within seconds, with a reasonably high degree of precision.
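
The distance-matrix idea can be sketched as follows. This is an illustration under stated assumptions, not the paper's system: the feature (a normalized histogram of pairwise distances) is a simple stand-in for the computer vision features, a linear scan replaces the EBS k-d tree, and the helix and strand coordinates are synthetic toy chains rather than real protein data.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points (e.g. one per residue)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_histogram(matrix, bin_width=4.0, n_bins=8):
    """Toy feature vector: normalized histogram of the upper-triangle distances.

    bin_width and n_bins are arbitrary illustrative choices.
    """
    hist = [0] * n_bins
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            b = min(int(matrix[i][j] // bin_width), n_bins - 1)
            hist[b] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def rank(query, database):
    """Return database names ordered by feature distance to the query.

    A linear scan is used here; a real system would query an index instead.
    """
    q = distance_histogram(distance_matrix(query))
    scored = [
        (math.dist(q, distance_histogram(distance_matrix(coords))), name)
        for name, coords in database.items()
    ]
    return [name for _, name in sorted(scored)]


# Synthetic toy chains (assumed geometry, for illustration only).
helix = [
    (2.3 * math.cos(math.radians(100 * i)),
     2.3 * math.sin(math.radians(100 * i)),
     1.5 * i)
    for i in range(10)
]
strand = [(3.8 * i, 0.0, 0.0) for i in range(10)]
database = {"helix": helix, "strand": strand}

# A translated copy of the helix: pairwise distances are unchanged.
query = [(x + 10.0, y - 5.0, z + 3.0) for x, y, z in helix]
ranking = rank(query, database)
```

Because the matrix holds only internal distances, it does not change when a chain is translated or rotated, which is what makes it a convenient starting point for feature extraction.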


2005 ◽  
Vol 277-279 ◽  
pp. 324-330 ◽  
Author(s):  
Sung Hee Park ◽  
Soo Jun Park ◽  
Seon Hee Park

This paper proposes a novel protein structure descriptor (or representation) and its application to structure comparison. Since a protein's function may derive from its structure, measuring the structural similarity between two proteins can help infer their functional closeness. We have developed a novel descriptor, the 3D edge histogram, to compare the structures of proteins. The 3D edge histogram is a local distribution of bonds between the atoms in a protein. We have designed and implemented a protein structure retrieval system based on the 3D edge histogram to demonstrate that it can be effective in protein structure comparison. The system performs principal component analysis for alignment, voxelization for volume generation, quantization, 3D edge extraction, and comparison of 3D edge histograms. The protein structure retrieval system using the 3D edge histogram shows fast retrieval with relatively precise results. It can be used for pre-screening against very large databases.
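
A heavily simplified sketch of the voxelization and histogram-comparison steps is given below. It is not the paper's descriptor: the PCA alignment step is skipped (coordinates are assumed to be already aligned), an "edge" is taken to be an occupied voxel facing an empty neighbor along one of six axis directions, and the names `voxelize`, `edge_histogram`, and `l1_distance` as well as the voxel size are assumptions made for illustration.

```python
# Six axis-aligned neighbor directions: +x, -x, +y, -y, +z, -z.
DIRECTIONS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def voxelize(coords, voxel_size=2.0):
    """Map 3D points to the set of integer voxel indices they occupy."""
    return {tuple(int(c // voxel_size) for c in p) for p in coords}


def edge_histogram(voxels):
    """Normalized count of occupied-to-empty transitions per direction.

    This is a stand-in for the 3D edge histogram, not the published one.
    """
    counts = [0] * len(DIRECTIONS)
    for (x, y, z) in voxels:
        for k, (dx, dy, dz) in enumerate(DIRECTIONS):
            if (x + dx, y + dy, z + dz) not in voxels:
                counts[k] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def l1_distance(h1, h2):
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Two toy "structures": the same rod laid along x and along z.
rod_x = [(2.0 * i + 1.0, 1.0, 1.0) for i in range(5)]
rod_z = [(1.0, 1.0, 2.0 * i + 1.0) for i in range(5)]

hist_x = edge_histogram(voxelize(rod_x))
hist_z = edge_histogram(voxelize(rod_z))
difference = l1_distance(hist_x, hist_z)
```

The two rods have identical shapes but different histograms, which illustrates why a direction-sensitive descriptor of this kind needs an alignment step such as PCA before comparison.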

