Homology Detection: Latest Research Papers

Abstract: Homology detection plays a major role in bioinformatics, and different types of methods are used for it. Here we extract information from protein sequences and then use various algorithms to predict the similarity between protein families. The support vector machine (SVM) is the most commonly used algorithm in homology detection. Classification techniques are not well suited to homology detection because they do not handle high-dimensional datasets well. Reducing the dimensionality is therefore very important, since it makes it easier to predict the similarity of protein families. Keywords: Homology detection, Protein, Sequence, Reducing dimensionality, BLAST, SCOP.
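To make the dimensionality concern concrete, here is a minimal sketch of one common way to turn a protein sequence into a feature vector. The abstract does not say which features the authors extract, so the k-mer counting scheme, the function name `kmer_features`, and the example sequence are all assumptions for illustration only.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_features(sequence, k=2):
    """Return a fixed-length vector of overlapping k-mer counts (hypothetical helper)."""
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in index:  # skip k-mers containing non-standard residues
            vector[index[kmer]] += 1
    return vector


features = kmer_features("MKVLAAGIVMKV", k=2)  # made-up example sequence
print(len(features))  # 400 dimensions for k=2
print(sum(features))  # 11 overlapping 2-mers in a 12-residue sequence
```

The vector length is 20^k, so k=3 already gives 8000 dimensions, most of them zero for any single sequence. That growth is the kind of high dimensionality a reduction step would aim to tame before classification.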

Cross-Platform Binary Code Homology Analysis Based on GRU Graph Embedding

Security and Communication Networks ◽ 10.1155/2021/3095203 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-8

Author(s): Shen Wang ◽ Xunzhi Jiang ◽ Xiangzhan Yu ◽ Xiaohui Su

Keyword(s): Natural Language ◽ Binary Code ◽ Graph Embedding ◽ Control Flow ◽ Detection Accuracy ◽ Plagiarism Detection ◽ Homology Detection ◽ Homology Analysis ◽ Cross Platform ◽ Iot Devices

Binary code homology analysis refers to detecting whether two pieces of binary code are compiled from the same piece of source code, which is a fundamental technique for many security applications, such as vulnerability search, plagiarism detection, and malware detection. With the increase in critical vulnerabilities in IoT devices, homology analysis is increasingly needed to perform cross-platform vulnerability searches. Existing methods for cross-platform binary code homology detection usually convert binary code to instruction sequences and do semantic embedding of the sequences as if they were natural language. However, the gap between natural language and binary code is large, and the spatial features of the binary code are easily lost by directly comparing the semantics. In this paper, we propose a GRU-based graph embedding method to compare the homology of binary functions. First, the attribute control flow graph (ACFG) is built for the assembly function, then the GRU-based graph embedding neural network is used to generate the embedding vector for the ACFG, and finally the homology of the binary code is determined by calculating the distance between the embedding vectors. The experimental results show that our method greatly improves the detection accuracy of negative samples compared with Gemini, the latest method based on graph embedding binary code similarity detection.
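The final step described above, deciding homology from the distance between embedding vectors, might look like the following minimal sketch. The abstract does not name the distance metric, so cosine similarity, the function names, and the 0.9 threshold are all assumptions; the vectors are made-up numbers standing in for the output of the GRU-based embedding network, which is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_homologous(u, v, threshold=0.9):
    """Hypothetical decision rule: treat two functions as homologous above a similarity threshold."""
    return cosine_similarity(u, v) >= threshold


# Made-up ACFG embeddings for the same function compiled for two platforms,
# plus an unrelated function.
func_x86 = [0.8, 0.1, 0.5, 0.3]
func_arm = [0.7, 0.2, 0.5, 0.3]
unrelated = [-0.4, 0.9, -0.2, 0.1]

print(is_homologous(func_x86, func_arm))   # similar vectors
print(is_homologous(func_x86, unrelated))  # dissimilar vectors
```

In practice the threshold would be tuned on labeled pairs, which is where the accuracy on negative samples mentioned in the abstract would come into play.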

Evolutionary Characterization of the Short Protein SPAAR

Genes ◽ 10.3390/genes12121864 ◽ 2021 ◽ Vol 12 (12) ◽ pp. 1864

Author(s): Jiwon Lee ◽ Aaron Wacholder ◽ Anne-Ruxandra Carvunis

Keyword(s): Muscle Regeneration ◽ Evolutionary Dynamics ◽ De Novo ◽ Sequence Divergence ◽ Biological Processes ◽ Noncoding Sequence ◽ Homology Detection ◽ Evolutionary Innovation ◽ Short Protein

Microproteins (<100 amino acids) are receiving increasing recognition as important participants in numerous biological processes, but their evolutionary dynamics are poorly understood. SPAAR is a recently discovered microprotein that regulates muscle regeneration and angiogenesis through interactions with conserved signaling pathways. Interestingly, SPAAR does not belong to any known protein family and has known homologs exclusively among placental mammals. This lack of distant homology could be caused by challenges in homology detection of short sequences, or it could indicate a recent de novo emergence from a noncoding sequence. By integrating syntenic alignments and homology searches, we identify SPAAR orthologs in marsupials and monotremes, establishing that SPAAR has existed at least since the emergence of mammals. SPAAR shows substantial primary sequence divergence but retains a conserved protein structure. In primates, we infer two independent evolutionary events leading to the de novo origination of 5′ elongated isoforms of SPAAR from a noncoding sequence and find evidence of adaptive evolution in this extended region. Thus, SPAAR may be of ancient origin, but it appears to be experiencing continual evolutionary innovation in mammals.

Transfer of Knowledge from Model Organisms to Evolutionarily Distant Non-Model Organisms: The Coral Pocillopora damicornis Membrane Signaling Receptome

10.1101/2021.10.18.464760 ◽ 2021

Author(s): Lokender Kumar ◽ Nathanael Brenner ◽ Samuel Sledieski ◽ Monsurat Olaosebikan ◽ Matthew Lynn-Goin ◽ ...

Keyword(s): Markov Models ◽ Sequence Data ◽ Function Analysis ◽ Light Sensitivity ◽ Membrane Receptors ◽ Model Organisms ◽ Transfer Of Knowledge ◽ Pocillopora Damicornis ◽ Homology Detection ◽ Remote Homology

With the ease of gene sequencing and the technology available to study and manipulate non-model organisms, the need to translate our understanding of model organisms to non-model organisms has become an urgent problem. For example, mining of large coral and their symbiont sequence data is a challenge, but also provides an opportunity for understanding functionality and evolution of these and other non-model organisms. Much more information than for any other eukaryotic species is available for humans, especially related to signal transduction and diseases. However, the coral cnidarian host and human have diverged over 700 million years ago and homologies between proteins are therefore often in the gray zone or undetectable with traditional BLAST searches. We introduce a two-stage approach to identifying putative coral homologues of human proteins. First, through remote homology detection using Hidden Markov Models, we identify candidate human homologues in the cnidarian genome. However, for many proteins, the human genome alone contains multiple family members with similar or even more divergence in sequence. In the second stage, therefore, we filter the remote homology results based on the functional and structural plausibility of each coral candidate, shortlisting the coral proteins likely to be true human homologues. We demonstrate our approach with a pipeline for mapping membrane receptors in humans to membrane receptors in corals, with specific focus on the stony coral, P. damicornis. More than 1000 human membrane receptors mapped to 335 coral receptors, including 151 G protein coupled receptors (GPCRs). To validate specific sub-families, we chose opsin proteins, representative GPCRs that confer light sensitivity, and Toll-like receptors, representative non-GPCRs, which function in the immune response, and their ability to communicate with microorganisms. 
Through detailed structure-function analysis of their ligand-binding pockets and downstream signaling cascades, we selected those candidate remote homologues likely to carry out related functions in the corals. This pipeline may prove generally useful for other non-model organisms, such as to support the growing field of synthetic biology.

Protein remote homology detection combining PCA and multiobjective optimization tools

Evolutionary Intelligence ◽ 10.1007/s12065-021-00642-6 ◽ 2021

Author(s): Mukti Routray ◽ Swati Vipsita

Keyword(s): Multiobjective Optimization ◽ Homology Detection ◽ Remote Homology ◽ Remote Homology Detection

Sequence-Order Frequency Matrix - Sampling and Machine learning with Smith-Waterman (SOFM-SMSW) for Protein Remote Homology Detection

10.21203/rs.3.rs-729077/v1 ◽ 2021

Author(s): Sajithra Nakshathram ◽ Ramyachitra Duraisamy ◽ Manikandan Pandurangan

Keyword(s): Machine Learning ◽ Structural Alignment ◽ Protein Sequences ◽ Support Vector ◽ Local Alignment ◽ Homology Detection ◽ Remote Homology ◽ Matrix Sampling ◽ N Gram ◽ Remote Homology Detection

Abstract
Background: Protein Remote Homology Detection (PRHD) is used to find homologous proteins that are similar in function and structure but share low sequence identity. In general, the Sequence-Order Frequency Matrix (SOFM) was used for protein remote homology detection. In the SOFM Top-n-gram (SOFM-Top) algorithm, the probability of substrings was calculated based on the highest probability value of substrings. Moreover, the SOFM-Smith Waterman (SOFM-SW) algorithm combines the SOFM with local alignment for protein remote homology detection. However, the computational complexity of SOFM-based PRHD is high since it processes all protein sequences in SOFM.
Objective: The Sequence-Order Frequency Matrix - Sampling and Machine learning with Smith-Waterman (SOFM-SMSW) algorithm is proposed for predicting protein remote homology. The SOFM-SMSW algorithm uses the PVS method to select the optimum target sequences based on the uniform distribution measure.
Method: This research work considers the most important sequences for PRHD by introducing Proportional Volume Sampling (PVS). After sampling the protein sequences, a feature vector is constructed and labeling is performed based on the concatenation between two protein sequences. Then, a substitution score which represents the structural alignment is learned using k-Nearest Neighbor (k-NN). Based on the learned substitution score and alignment score, the protein homology is detected using the Smith-Waterman algorithm and a Support Vector Machine (SVM). By selecting the most important sequences, the accuracy of PRHD is improved, and the computational complexity of PRHD is reduced by using structural alignment along with the local alignment.
Results: The performance of the proposed SOFM-SMSW algorithm is tested on the SCOP database and compared with various existing algorithms such as SVM Top-N-gram, SVM pairwise, GPkernal, Long Short-Term Memory (LSTM), SOFM Top-N-gram and SOFM-SW.
Conclusion: The experimental results illustrate that the proposed SOFM-SMSW algorithm has better accuracy, precision, recall, ROC and ROC 50 for PRHD than the other existing algorithms.
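For readers unfamiliar with the local alignment step named above, the following is a minimal sketch of the classic Smith-Waterman scoring recurrence. It is not the SOFM-SMSW method: the fixed match, mismatch, and linear gap values are assumptions standing in for the learned substitution score described in the abstract, and the function name is hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at zero is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("XXACDEYY", "ZZACDEWW"))  # 8: the shared "ACDE" core
print(smith_waterman_score("AAAA", "CCCC"))          # 0: nothing aligns
```

Only two rows are kept in memory because the sketch returns the score alone; recovering the aligned residues would require storing the full matrix and tracing back from the best cell.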

Remote Homology Detection Identifies a Eukaryotic RPA DBD-C-like DNA Binding Domain as a Conserved Feature of Archaeal Rpa1-Like Proteins

Frontiers in Molecular Biosciences ◽ 10.3389/fmolb.2021.675229 ◽ 2021 ◽ Vol 8

Author(s): Stuart A. MacNeill

Keyword(s): Dna Binding ◽ Zinc Finger ◽ Protein A ◽ Evolutionary Relationship ◽ Terminal Region ◽ Homology Detection ◽ Winged Helix ◽ C Terminus ◽ Evolutionary Relatedness ◽ Ob Fold

The eukaryotic single-stranded DNA binding factor replication protein A (RPA) is essential for DNA replication, repair and recombination. RPA is a heterotrimer containing six related OB folds and a winged helix-turn-helix (wH) domain. The OB folds are designated DBD-A through DBD-F, with DBD-A through DBD-D being directly involved in ssDNA binding. DBD-C is located at the C-terminus of the RPA1 protein and has a distinctive structure that includes an integral C4 zinc finger, while the wH domain is found at the C-terminus of the RPA2 protein. Previously characterised archaeal RPA proteins fall into a number of classes with varying numbers of OB folds, but one widespread class includes proteins that contain a C4 or C3H zinc finger followed by a 100–120 amino acid C-terminal region reported to lack detectable sequence or structural similarity. Here, the sequences spanning this zinc finger and including the C-terminal region are shown to comprise a previously unrecognised DBD-C-like OB fold, confirming the evolutionary relatedness of this group of archaeal RPA proteins to eukaryotic RPA1. The evolutionary relationship between eukaryotic and archaeal RPA is further underscored by the presence of RPA2-like proteins comprising an OB fold and C-terminal winged helix (wH) domain in multiple species and crucially, suggests that several biochemically characterised archaeal RPA proteins previously thought to exist as monomers are likely to be RPA1-RPA2 heterodimers.

PHROG: families of prokaryotic virus proteins clustered using remote homology

NAR Genomics and Bioinformatics ◽ 10.1093/nargab/lqab067 ◽ 2021 ◽ Vol 3 (3)

Author(s): Paul Terzian ◽ Eric Olo Ndela ◽ Clovis Galiez ◽ Julien Lossouarn ◽ Rubén Enrique Pérez Bucio ◽ ...

Keyword(s): Reference Sequence ◽ Homology Detection ◽ Protein Families ◽ Great Opportunity ◽ Remote Homology ◽ Viral Genomes ◽ Manual Inspection ◽ Clustering Approach ◽ Biological Entities ◽ Similarity Searches

Abstract: Viruses are abundant, diverse and ancestral biological entities. Their diversity is high, both in terms of the number of different protein families encountered and in the sequence heterogeneity of each protein family. The recent increase in sequenced viral genomes constitutes a great opportunity to gain new insights into this diversity and consequently urges the development of annotation resources to help functional and comparative analysis. Here, we introduce PHROG (Prokaryotic Virus Remote Homologous Groups), a library of viral protein families generated using a new clustering approach based on remote homology detection by HMM profile-profile comparisons. Considering 17 473 reference (pro)viruses of prokaryotes, 868 340 of the total 938 864 proteins were grouped into 38 880 clusters that proved to be a 2-fold deeper clustering than using a classical strategy based on BLAST-like similarity searches, and yet to remain homogeneous. Manual inspection of similarities to various reference sequence databases led to the annotation of 5108 clusters (containing 50.6 % of the total protein dataset) with 705 different annotation terms, included in 9 functional categories, specifically designed for viruses. Hopefully, PHROG will be a useful tool to better annotate future prokaryotic viral sequences thus helping the scientific community to better understand the evolution and ecology of these entities.

Extending the Horizon of Homology Detection with Coevolution-based structure prediction

Journal of Molecular Biology ◽ 10.1016/j.jmb.2021.167106 ◽ 2021 ◽ pp. 167106

Author(s): Luis Sanchez-Pulido ◽ Chris P Ponting

Keyword(s): Structure Prediction ◽ Homology Detection

Recent Advances in Protein Homology Detection Propelled by Inter-Residue Interaction Map Threading

Frontiers in Molecular Biosciences ◽ 10.3389/fmolb.2021.643752 ◽ 2021 ◽ Vol 8

Author(s): Sutanu Bhattacharya ◽ Rahmatullah Roche ◽ Md Hossain Shuvo ◽ Debswapna Bhattacharya

Keyword(s): Structure Prediction ◽ Accurate Estimation ◽ Protein Homology ◽ Homology Detection ◽ Sequence Alignments ◽ Homologous Proteins ◽ Multiple Sequence ◽ Additional Information ◽ Residue Interaction ◽ Interaction Map

Sequence-based protein homology detection has emerged as one of the most sensitive and accurate approaches to protein structure prediction. Despite the success, homology detection remains very challenging for weakly homologous proteins with divergent evolutionary profile. Very recently, deep neural network architectures have shown promising progress in mining the coevolutionary signal encoded in multiple sequence alignments, leading to reasonably accurate estimation of inter-residue interaction maps, which serve as a rich source of additional information for improved homology detection. Here, we summarize the latest developments in protein homology detection driven by inter-residue interaction map threading. We highlight the emerging trends in distant-homology protein threading through the alignment of predicted interaction maps at various granularities ranging from binary contact maps to finer-grained distance and orientation maps as well as their combination. We also discuss some of the current limitations and possible future avenues to further enhance the sensitivity of protein homology detection.
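The relationship between the coarsest and finer granularities mentioned above can be sketched briefly: a binary contact map is what remains when a distance map is thresholded. The 8 Å cutoff is a widely used convention rather than something stated in the abstract, and the function name and toy distances below are assumptions for illustration.

```python
def contact_map(distances, cutoff=8.0):
    """Binarize an inter-residue distance map: 1 where residues are closer than the cutoff."""
    return [[1 if d < cutoff else 0 for d in row] for row in distances]


# Toy symmetric distance map (in angstroms) for a three-residue fragment.
toy_distances = [
    [0.0, 3.8, 9.5],
    [3.8, 0.0, 6.1],
    [9.5, 6.1, 0.0],
]

for row in contact_map(toy_distances):
    print(row)
# [1, 1, 0]
# [1, 1, 1]
# [0, 1, 1]
```

Thresholding discards how far apart the non-contacting residues are, which is one reason finer-grained distance and orientation maps can carry more information for threading than binary contacts.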

homology detection
Recently Published Documents


Reducing Dimensionality in Remote Homology Detection

Cross-Platform Binary Code Homology Analysis Based on GRU Graph Embedding

Evolutionary Characterization of the Short Protein SPAAR

Transfer of Knowledge from Model Organisms to Evolutionarily Distant Non-Model Organisms: The Coral Pocillopora damicornis Membrane Signaling Receptome

Protein remote homology detection combining PCA and multiobjective optimization tools

Sequence-Order Frequency Matrix - Sampling and Machine learning with Smith-Waterman (SOFM-SMSW) for Protein Remote Homology Detection

Remote Homology Detection Identifies a Eukaryotic RPA DBD-C-like DNA Binding Domain as a Conserved Feature of Archaeal Rpa1-Like Proteins

PHROG: families of prokaryotic virus proteins clustered using remote homology

Extending the Horizon of Homology Detection with Coevolution-based structure prediction

Recent Advances in Protein Homology Detection Propelled by Inter-Residue Interaction Map Threading
