Reducing Dimensionality in Remote Homology Detection

Abstract: Homology detection plays a major role in bioinformatics, and several types of methods are used for it. Here we extract information from protein sequences and then use various algorithms to predict the similarity between protein families. SVM is the most commonly used algorithm in homology detection. Classification techniques perform poorly in homology detection because they are not well suited to high-dimensional datasets. Reducing the dimensionality is therefore important, so that the similarity of protein families can be predicted more easily. Keywords: Homology detection, Protein, Sequence, Reducing dimensionality, BLAST, SCOP.
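The abstract does not say which features or which reduction technique are used, so the following is only a minimal sketch of why sequence features become high dimensional and how one common approach shrinks them. It assumes k-mer count features and a reduced amino-acid alphabet; the six-group mapping `GROUPS`, the helper names, and the example sequence are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Hypothetical 6-group reduced alphabet (illustrative grouping by rough
# physicochemical similarity; an assumption, not the paper's method).
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # small / special
}
REDUCE = {aa: group for group, members in GROUPS.items() for aa in members}


def kmer_vector(seq, alphabet, k):
    """Count every possible k-mer over `alphabet` in `seq`, in a fixed order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get("".join(p), 0) for p in product(alphabet, repeat=k)]


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in seq)


seq = "MKVLAAGIVGLE"  # made-up example sequence
full = kmer_vector(seq, AMINO_ACIDS, 3)                        # 20**3 features
small = kmer_vector(reduce_sequence(seq), "".join(GROUPS), 3)  # 6**3 features
print(len(full), len(small))
```

With k = 3 the full alphabet gives 20^3 = 8000 features per sequence, while the six-group alphabet gives 6^3 = 216. Both vectors count the same ten overlapping 3-mers, so the reduction trades residue-level detail for a much smaller input to a classifier such as an SVM.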

SPSO: Synthetic Protein Sequence Oversampling for Imbalanced Protein Data and Remote Homology Detection

Biological and Medical Data Analysis - Lecture Notes in Computer Science, 2006, pp. 104-115

DOI: 10.1007/11946465_10

Cited By ~ 2

Author(s): Majid Beigi, Andreas Zell

Keyword(s): Protein Sequence, Homology Detection, Remote Homology, Synthetic Protein, Remote Homology Detection

Filling-in Void and Sparse Regions in Protein Sequence Space by Protein-Like Artificial Sequences Enables Remarkable Enhancement in Remote Homology Detection Capability

Journal of Molecular Biology, 2014, Vol 426 (4), pp. 962-979

DOI: 10.1016/j.jmb.2013.11.026

Cited By ~ 8

Author(s): Richa Mudgal, Ramanathan Sowdhamini, Nagasuma Chandra, Narayanaswamy Srinivasan, Sankaran Sandhya

Keyword(s): Sequence Space, Protein Sequence, Homology Detection, Detection Capability, Remote Homology, Protein Sequence Space, Remote Homology Detection

Synthetic Protein Sequence Oversampling Method for Classification and Remote Homology Detection in Imbalanced Protein Data

Bioinformatics Research and Development - Lecture Notes in Computer Science, 2007, pp. 263-277

DOI: 10.1007/978-3-540-71233-6_21

Cited By ~ 2

Author(s): Majid M. Beigi, Andreas Zell

Keyword(s): Protein Sequence, Homology Detection, Remote Homology, Synthetic Protein, Remote Homology Detection

Word correlation matrices for protein sequence analysis and remote homology detection

BMC Bioinformatics, 2008, Vol 9 (1)

DOI: 10.1186/1471-2105-9-259

Cited By ~ 15

Author(s): Thomas Lingner, Peter Meinicke

Keyword(s): Sequence Analysis, Protein Sequence, Protein Sequence Analysis, Homology Detection, Correlation Matrices, Remote Homology, Remote Homology Detection

Cascaded walks in protein sequence space: use of artificial sequences in remote homology detection between natural proteins

Molecular BioSystems, 2012, Vol 8 (8), pp. 2076

DOI: 10.1039/c2mb25113b

Cited By ~ 6

Author(s): S. Sandhya, R. Mudgal, C. Jayadev, K. R. Abhinandan, R. Sowdhamini, ...

Keyword(s): Sequence Space, Protein Sequence, Space Use, Homology Detection, Remote Homology, Protein Sequence Space, Remote Homology Detection

The Irredundant Class Method for Remote Homology Detection of Protein Sequences

Journal of Computational Biology, 2011, Vol 18 (12), pp. 1819-1829

DOI: 10.1089/cmb.2010.0171

Cited By ~ 18

Author(s): Matteo Comin, Davide Verzotto

Keyword(s): Protein Sequences, Homology Detection, Remote Homology, Remote Homology Detection

STRUCTFAST: Protein sequence remote homology detection and alignment using novel dynamic programming and profile-profile scoring

Proteins Structure Function and Bioinformatics, 2006, Vol 64 (4), pp. 960-967

DOI: 10.1002/prot.21049

Cited By ~ 27

Author(s): Derek A. Debe, Joseph F. Danzer, William A. Goddard, Aleksandar Poleksic

Keyword(s): Dynamic Programming, Protein Sequence, Homology Detection, Remote Homology, Remote Homology Detection

Sequence-Order Frequency Matrix - Sampling and Machine learning with Smith-Waterman (SOFM-SMSW) for Protein Remote Homology Detection

DOI: 10.21203/rs.3.rs-729077/v1, 2021

Author(s): Sajithra Nakshathram, Ramyachitra Duraisamy, Manikandan Pandurangan

Keyword(s): Machine Learning, Structural Alignment, Protein Sequences, Support Vector, Local Alignment, Homology Detection, Remote Homology, Matrix Sampling, N Gram, Remote Homology Detection

Abstract

Background: Protein Remote Homology Detection (PRHD) is used to find homologous proteins that are similar in function and structure but share low sequence identity. The Sequence-Order Frequency Matrix (SOFM) has commonly been used for protein remote homology detection. In the SOFM Top-n-gram (SOFM-Top) algorithm, the probability of substrings is calculated based on the highest probability value of the substrings. The SOFM-Smith Waterman (SOFM-SW) algorithm combines the SOFM with local alignment for protein remote homology detection. However, the computational complexity of SOFM-based PRHD is high, since it processes all protein sequences in the SOFM.

Objective: The Sequence-Order Frequency Matrix - Sampling and Machine learning with Smith-Waterman (SOFM-SMSW) algorithm is proposed for predicting protein remote homology. The SOFM-SMSW algorithm uses the PVS method to select the optimum target sequences based on the uniform distribution measure.

Method: This research work considers the most important sequences for PRHD by introducing Proportional Volume Sampling (PVS). After the protein sequences are sampled, a feature vector is constructed and labeling is performed based on the concatenation of two protein sequences. Then, a substitution score that represents the structural alignment is learned using k-Nearest Neighbor (k-NN). Based on the learned substitution score and the alignment score, protein homology is detected using the Smith-Waterman algorithm and a Support Vector Machine (SVM). Selecting the most important sequences improves the accuracy of PRHD, and using structural alignment along with local alignment reduces its computational complexity.

Results: The performance of the proposed SOFM-SMSW algorithm is tested on the SCOP database and compared with various existing algorithms such as SVM Top-N-gram, SVM pairwise, GPkernal, Long Short-Term Memory (LSTM), SOFM Top-N-gram and SOFM-SW.

Conclusion: The experimental results illustrate that the proposed SOFM-SMSW algorithm has better accuracy, precision, recall, ROC and ROC 50 for PRHD than the other existing algorithms.
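The local alignment step named in the abstract can be illustrated with a plain Smith-Waterman score computation. This is a minimal sketch only: the fixed `match`, `mismatch`, and `gap` values are assumptions chosen for readability, whereas the abstract describes a substitution score learned with k-NN, and the sampling and SVM stages are not shown.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and a simple match/mismatch score; these
    values are illustrative assumptions, not the paper's learned scores.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # gap in b
            left = H[i][j - 1] + gap  # gap in a
            # Clamping at 0 lets an alignment restart anywhere (local alignment).
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


# Two made-up sequences that share only the segment "ACD".
print(smith_waterman("GGACDGG", "TTACDTT"))
```

Because scores are clamped at zero, only the shared segment contributes, which is why local alignment suits remote homologs that agree in short regions rather than along their full length.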
