Protein Structural Alignments From Sequence

SHsearch - a method for fast remote homology detection

DOI: 10.31219/osf.io/sh5gj, 2018

Author(s): Mohamed Baddar

Keyword(s): Markov Models, Sequence Similarity, Protein Homology, Homology Detection, Remote Homology, Sensitivity Loss, Computationally Intensive, Profile HMM, Programming Algorithms, Remote Homology Detection

Remote homology detection is the problem of detecting homology between sequences with low sequence similarity. It is a hard computational problem, and no single approach works well in all cases. Methods based on profile hidden Markov models (HMMs) often detect remote homologs with higher sensitivity than other commonly used approaches. However, calculating similarity scores in profile HMM methods is computationally intensive because they rely on dynamic programming algorithms. In this paper we introduce SHsearch, a new method for remote protein homology detection. The method is implemented as a modification of HHsearch, a remote protein homology detection method based on comparing two profile HMMs. The motivation for the modification was to reduce the run time of HHsearch significantly with minimal loss of sensitivity. SHsearch compares only the important submodels of the query and database HMMs instead of the complete models, and thereby achieves a significant speedup over HHsearch with minimal loss in sensitivity. On SCOP 1.63, SHsearch achieved an 88x speedup with an 8.2% loss in sensitivity relative to HHsearch at an error rate of 10%, which was deemed an acceptable tradeoff.
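To make the cost of dynamic programming concrete, the following is a minimal sketch of a global alignment score (Needleman-Wunsch) on two plain sequences. It is not the HMM-HMM comparison used by HHsearch or SHsearch; the function name and the scoring parameters are assumptions chosen for illustration. It only shows that the table has one cell per pair of positions, so the work grows with the product of the two lengths, which is why restricting the comparison to smaller submodels can reduce run time.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (illustrative scoring scheme).

    Fills a (len(a) + 1) x (len(b) + 1) table row by row, keeping only
    the previous row in memory. Time is proportional to len(a) * len(b).
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against empty prefix of `b`
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in `b`
                            curr[j - 1] + gap))  # gap in `a`
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATACA"))
```

Halving both sequence lengths cuts the number of table cells to roughly a quarter, which gives an intuition for the kind of saving that comparing submodels rather than complete models can offer.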


SHsearch: a method for fast remote homology detection

DOI: 10.7287/peerj.preprints.27111, 2018

Author(s): Mohamed Baddar

Keyword(s): Markov Models, Sequence Similarity, Protein Homology, Homology Detection, Remote Homology, Sensitivity Loss, Computationally Intensive, Profile HMM, Programming Algorithms, Remote Homology Detection



Efficient Multiple Sequences Alignment Algorithm Generation via Components Assembly Under PAR Framework

Frontiers in Genetics, DOI: 10.3389/fgene.2020.628175, 2021, Vol 11

Author(s): Haipeng Shi, Haihe Shi, Shenghua Xu

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Sequence Similarity, Alignment Algorithm, Pairwise Sequence Alignment, Multiple Sequence, Sequence Alignment Algorithm, Alignment Algorithms, Sequence Similarity Analysis, High Level

Sequence alignment is a key algorithm in bioinformatics and is widely used in sequence similarity analysis and genome sequence database search. Existing research focuses mainly on specific steps of the algorithm or on specific problems, and lacks a high-level, abstract algorithm framework for the domain. Multiple sequence alignment algorithms are more complex, redundant, and difficult to understand; it is not easy for users to select the appropriate algorithm, and computing errors may occur. Building on our pairwise sequence alignment algorithm component library and the PAR software platform, we develop a few extension components for the multiple sequence alignment domain. With these components, a specific multiple sequence alignment algorithm can be designed and its corresponding program (in C++, Java, or Python) generated efficiently, which improves both the development efficiency of complex algorithms and the accuracy of sequence alignment calculation. A star alignment algorithm is designed and generated to demonstrate the development process.
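For readers unfamiliar with star alignment, here is a minimal sketch of the generic center-star heuristic built on a pairwise aligner. It is not the program generated by the PAR platform, and the function names and scoring parameters are assumptions for illustration. The center is taken to be the sequence with the highest total pairwise score; every other sequence is aligned to it, and gaps introduced in the center are propagated to all rows. Input sequences are assumed to be non-empty in number and to contain no '-' characters.

```python
from itertools import combinations


def align_pair(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch with traceback; returns (score, gapped_a, gapped_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def star_align(seqs):
    """Center-star multiple alignment; returns gapped rows in input order."""
    n = len(seqs)
    totals = [0] * n
    for i, j in combinations(range(n), 2):
        s = align_pair(seqs[i], seqs[j])[0]
        totals[i] += s
        totals[j] += s
    center = max(range(n), key=totals.__getitem__)
    rows = {center: list(seqs[center])}  # sequence index -> gapped row
    for k in range(n):
        if k == center:
            continue
        _, ca, sa = align_pair(seqs[center], seqs[k])
        master = rows[center]
        merged = {idx: [] for idx in rows}
        merged[k] = []
        p = q = 0  # positions in the old center row and in the new pairwise row
        while p < len(master) or q < len(ca):
            if p < len(master) and master[p] == "-":
                # Gap already present in the center: the new sequence gets a gap.
                for idx in rows:
                    merged[idx].append(rows[idx][p])
                merged[k].append("-"); p += 1
            elif q < len(ca) and ca[q] == "-":
                # New gap in the center: pad every existing row.
                for idx in rows:
                    merged[idx].append("-")
                merged[k].append(sa[q]); q += 1
            else:
                # Same center residue on both sides: copy the column.
                for idx in rows:
                    merged[idx].append(rows[idx][p])
                merged[k].append(sa[q]); p += 1; q += 1
        rows = merged
    return ["".join(rows[i]) for i in range(n)]


if __name__ == "__main__":
    for row in star_align(["GATTACA", "GATACA", "GTTACA", "GATTAGA"]):
        print(row)
```

All returned rows have the same length, and removing the gap characters from each row recovers the corresponding input sequence.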


On the Role of Structural Information in Remote Homology Detection and Sequence Alignment: New Methods Using Hybrid Sequence Profiles

Journal of Molecular Biology, DOI: 10.1016/j.jmb.2003.10.025, 2003, Vol 334 (5), pp. 1043-1062, Cited By ~ 60

Author(s): Christopher L. Tang, Lei Xie, Ingrid Y.Y. Koh, Shoshana Posy, Emil Alexov, ...

Keyword(s): Sequence Alignment, Structural Information, Homology Detection, Remote Homology, New Methods, Hybrid Sequence, Sequence Profiles, Remote Homology Detection


SHsearch: a method for fast remote homology detection

DOI: 10.7287/peerj.preprints.27111v1, 2018

Author(s): Mohamed Baddar

Keyword(s): Markov Models, Sequence Similarity, Protein Homology, Homology Detection, Remote Homology, Sensitivity Loss, Computationally Intensive, Profile HMM, Programming Algorithms, Remote Homology Detection



SVM-BALSA: Remote homology detection based on Bayesian sequence alignment

Computational Biology and Chemistry, DOI: 10.1016/j.compbiolchem.2005.09.006, 2005, Vol 29 (6), pp. 440-443, Cited By ~ 16

Author(s): Bobbie-Jo Webb-Robertson, Christopher Oehmen, Melissa Matzke

Keyword(s): Sequence Alignment, Homology Detection, Remote Homology, Remote Homology Detection


Protein Remote Homology Detection by Combining Pseudo Dimer Composition with an Ensemble Learning Method

Current Proteomics, DOI: 10.2174/157016461302160514002939, 2016, Vol 13 (2), pp. 86-91, Cited By ~ 7

Author(s): Bin Liu, Junjie Chen, Shanyi Wang

Keyword(s): Ensemble Learning, Learning Method, Homology Detection, Remote Homology, Remote Homology Detection


A Comprehensive Analysis of Sequence Alignment Algorithms for LongRead Sequencing

Current Bioinformatics, DOI: 10.2174/1574893611666160115213144, 2016, Vol 11 (3), pp. 375-381

Author(s): Yu Zhang, Jian Tai He, Yangde Zhang, Ke Zuo

Keyword(s): Sequence Alignment, Comprehensive Analysis, Alignment Algorithms


INTEGRATION OF n-GRAM LANGUAGE MODELS IN MULTIPLE CLASSIFIER SYSTEMS FOR OFFLINE HANDWRITTEN TEXT LINE RECOGNITION

International Journal of Pattern Recognition and Artificial Intelligence, DOI: 10.1142/s0218001408006855, 2008, Vol 22 (07), pp. 1301-1321, Cited By ~ 2

Author(s): Roman Bertolami, Horst Bunke

Keyword(s): Language Model, Language Models, Combination Method, Text Line, Multiple Classifier Systems, Classifier Systems, Handwritten Text, Handwritten Text Recognition, Multiple Classifier, N-Gram

Current multiple classifier systems for unconstrained handwritten text recognition do not provide a straightforward way to utilize language model information. In this paper, we describe a generic method to integrate a statistical n-gram language model into the combination of multiple offline handwritten text line recognizers. The proposed method first builds a word transition network and then rescores this network with an n-gram language model. Experimental evaluation conducted on a large dataset of offline handwritten text lines shows that the proposed approach improves the recognition accuracy over a reference system as well as over the original combination method that does not include a language model.
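The rescoring step described above can be illustrated with a small sketch. It assumes a simplified word transition network, represented as a list of slots where each slot maps candidate words to a combined recognizer log-score, and a bigram language model given as a dictionary of log-probabilities. The network representation, the function name, the `lm_weight` and `unk_logp` parameters, and the toy numbers are all assumptions for illustration; the paper's actual network construction and n-gram order are not specified here.

```python
import math


def rescore(network, bigram_logp, lm_weight=1.0, unk_logp=-10.0):
    """Viterbi search over a slot-based word network with bigram rescoring.

    network:     list of slots; each slot is {word: recognizer log-score}
    bigram_logp: {(previous_word, word): log-probability}; "<s>" marks the start
    Returns the best-scoring word sequence as a list.
    """
    # best[word] = (score of the best path ending in `word`, that path)
    best = {"<s>": (0.0, [])}
    for slot in network:
        nxt = {}
        for word, rec_score in slot.items():
            nxt[word] = max(
                ((score + rec_score
                  + lm_weight * bigram_logp.get((prev, word), unk_logp),
                  path + [word])
                 for prev, (score, path) in best.items()),
                key=lambda item: item[0],
            )
        best = nxt
    return max(best.values(), key=lambda item: item[0])[1]


if __name__ == "__main__":
    # Toy network: the recognizers alone slightly prefer "cut" in slot 2.
    network = [
        {"the": math.log(0.6), "she": math.log(0.4)},
        {"cat": math.log(0.4), "cut": math.log(0.6)},
    ]
    bigrams = {
        ("<s>", "the"): math.log(0.5), ("<s>", "she"): math.log(0.5),
        ("the", "cat"): math.log(0.5), ("the", "cut"): math.log(0.01),
        ("she", "cat"): math.log(0.01), ("she", "cut"): math.log(0.05),
    }
    print(rescore(network, bigrams, lm_weight=0.0))  # recognizer scores only
    print(rescore(network, bigrams))                 # with the language model
```

In this toy example the recognizer scores alone select "the cut", while adding the bigram scores changes the result to "the cat", which is the kind of correction language model rescoring is meant to provide.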


Automated Source Code Generation and Auto-Completion Using Deep Learning: Comparing and Discussing Current Language Model-Related Approaches

AI, DOI: 10.3390/ai2010001, 2021, Vol 2 (1), pp. 1-16

Author(s): Juan Cruz-Benito, Sanjay Vishwakarma, Francisco Martin-Fernandez, Ismael Faro

Keyword(s): Deep Learning, Learning Community, Programming Languages, Language Processing, Code Generation, Language Model, Language Models, Stochastic Gradient Descent, Network Architectures, Learning Architectures

In recent years, the use of deep learning in language models has gained much attention. Some research projects claim that they can generate text that can be interpreted as human writing, enabling new possibilities in many application areas. Among the different areas related to language processing, one of the most notable in applying this type of modeling is programming languages. For years, the machine learning community has been researching this software engineering area, pursuing goals like applying different approaches to auto-complete, generate, fix, or evaluate code programmed by humans. Considering the increasing popularity of the deep learning-enabled language models approach, we found a lack of empirical papers that compare different deep learning architectures to create and use language models based on programming code. This paper compares different neural network architectures like Average Stochastic Gradient Descent (ASGD) Weight-Dropped LSTMs (AWD-LSTMs), AWD-Quasi-Recurrent Neural Networks (QRNNs), and Transformer while using transfer learning and different forms of tokenization to see how they behave in building language models using a Python dataset for code generation and filling mask tasks. Considering the results, we discuss each approach’s different strengths and weaknesses and what gaps we found to evaluate the language models or to apply them in a real programming context.
