Parallel protein sequence matching on multicore computers

The relationship between protein sequence and structure arises entirely from amino acid physical properties. We therefore propose an alternative method for identifying homologs, in which residue equivalence is based exclusively on the pairwise similarity of the sequences' physical properties. This approach, the property factor method (PFM), differs fundamentally from the methods in current use. We compare PFM with PSI-BLAST. We show that some pairs of sequences have very low sequence similarity in the traditional sense, and therefore cannot be identified with PSI-BLAST, yet have similar physical property distributions and almost identical 3D structures. In a comparison using targets from CASP10 (89 targets) and CASP11 (51 targets), PFM performs better than PSI-BLAST when sequence matching is comparable. PFM also outperforms PSI-BLAST on informatically challenging targets.
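The core idea, that residues can be treated as equivalent when their physical properties are similar even if they are not identical, can be illustrated with a deliberately simplified, ungapped comparison. This is not PFM itself: the property factors PFM uses are not given here, so the sketch assumes a single property (the Kyte-Doolittle hydropathy scale) and a hypothetical score, `property_similarity`, defined as one minus the mean per-position hydropathy difference scaled to [0, 1]. The toy sequences are invented for illustration.

```python
# Kyte-Doolittle hydropathy scale: one standard physical property per residue.
# Assumption: PFM combines several property factors; one scale is used here for brevity.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
KD_RANGE = max(KD.values()) - min(KD.values())  # 4.5 - (-4.5) = 9.0


def percent_identity(a: str, b: str) -> float:
    """Fraction of positions with identical residues (ungapped, equal length)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def property_similarity(a: str, b: str) -> float:
    """Hypothetical score: 1 minus the mean absolute hydropathy difference, scaled to [0, 1]."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    diffs = [abs(KD[x] - KD[y]) for x, y in zip(a, b)]
    return 1.0 - (sum(diffs) / len(diffs)) / KD_RANGE


# Compare one toy query against two toy partners that share no identical residues with it.
for other in ("REIVLQ", "LIKEDV"):
    print(other, percent_identity("KDLIVE", other), round(property_similarity("KDLIVE", other), 3))
```

Neither partner shares an identical residue with `KDLIVE` at any position, so percent identity is zero in both cases, yet the property score separates them (about 0.96 for `REIVLQ` versus about 0.13 for `LIKEDV`). A realistic property-based matcher would also need gapped alignment and more than one property.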


Fold-specific sequence scoring improves protein sequence matching

BMC Bioinformatics, Vol 17 (1), 2016
DOI: 10.1186/s12859-016-1198-z
Cited by: ~2

Author(s): Sumudu P. Leelananda, Andrzej Kloczkowski, Robert L. Jernigan

Keyword(s): Protein Sequence; Sequence Matching; Specific Sequence


Protein Sequence Matching Using Parametric Spectral Estimate Scheme

International Journal of Advanced Computer Science and Applications, Vol 6 (11), 2015
DOI: 10.14569/ijacsa.2015.061121

Author(s): Hsuan-Ting Chang, Hsiao-Wei Peng, Ciing-He Li, Neng-Wen Lo

Keyword(s): Protein Sequence; Spectral Estimate; Sequence Matching


SeqStruct: A New Amino Acid Similarity Matrix Based on Sequence Correlations and Structural Contacts Yields Sequence-Structure Congruence

2018
DOI: 10.1101/268904

Author(s): Kejue Jia, Robert L. Jernigan

Keyword(s): Amino Acid; Protein Sequence; Sequence Similarity; Protein Structures; Substitution Matrix; Similarity Matrix; Sequence Matching; Sequence Structure; Amino Acid Similarity; Simple Amino Acid

SUMMARY

Protein sequence matching does not properly account for some well-known features of protein structures, such as the greater variability of surface residues compared with core residues and the high packing densities of globular proteins, and it fails to produce good matches for the sequences of many proteins known to be close structural relatives. Abundant protein sequences and structures are now available to enable major improvements in sequence matching. Here, we mount the observed sequence correlations onto structural frameworks to identify the most important correlated parts, on the rationale that protein structures provide the physical framework needed to improve sequence matching. Combining sequence and structure data in this way yields a simple amino acid substitution matrix that can be readily incorporated into any sequence-matching procedure. This brings allosteric information into sequence matching and effectively transforms it from a 1-D into a 3-D procedure. Tests on over 3,000 sequence matches show a 37% gain in sequence similarity and 26% fewer gaps compared with BLOSUM62. Importantly, there are also major gains in the specificity of sequence matching across diverse proteins; specifically, all known cases in which protein structures match but sequences do not match well are resolved.
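A substitution matrix can be "readily incorporated into any sequence matching" because an aligner consults it only through a per-pair score lookup. The sketch below is a plain Needleman-Wunsch global aligner with a linear gap penalty, written so the scorer is a parameter. It does not reproduce the SeqStruct matrix or BLOSUM62: `identity_score`, `grouped_score` and its residue groups are hypothetical stand-ins, the gap penalty of -4 is arbitrary, and `similarity_and_gaps` uses assumed definitions (fraction of positively scoring columns, count of gap columns) rather than the paper's metrics.

```python
from typing import Callable, Tuple

# A scorer maps a pair of residues to a substitution score; any matrix lookup fits.
Scorer = Callable[[str, str], int]


def needleman_wunsch(a: str, b: str, score: Scorer, gap: int = -4) -> Tuple[str, str]:
    """Global alignment with a linear gap penalty; returns the two gapped strings."""
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity_score(x: str, y: str) -> int:
    """Hypothetical baseline: rewards identical residues only."""
    return 2 if x == y else -1


# Hypothetical residue groups standing in for a learned substitution matrix.
GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]


def grouped_score(x: str, y: str) -> int:
    """Rewards identities most, and substitutions within the same group less."""
    if x == y:
        return 2
    return 1 if any(x in g and y in g for g in GROUPS) else -1


def similarity_and_gaps(x: str, y: str, score: Scorer) -> Tuple[float, int]:
    """Fraction of alignment columns with a positive score, and number of gap columns."""
    similar = sum(1 for p, q in zip(x, y) if p != "-" and q != "-" and score(p, q) > 0)
    gaps = sum(1 for p, q in zip(x, y) if p == "-" or q == "-")
    return similar / len(x), gaps


# Same aligner, two interchangeable scorers.
for scorer in (identity_score, grouped_score):
    x, y = needleman_wunsch("KDLIVE", "REIVLQ", scorer)
    print(scorer.__name__, x, y, similarity_and_gaps(x, y, scorer))
```

Swapping the scorer changes which columns count as similar without touching the aligner: on this toy pair, the identity scorer finds no similar columns, while the grouped scorer finds five of six. A real matrix such as BLOSUM62 or SeqStruct would simply replace the scorer function.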
