Hidden Markov models in biological sequence analysis

2001 ◽  
Vol 45 (3.4) ◽  
pp. 449-454 ◽  
Author(s):  
E. Birney


2019 ◽  
Vol 35 (19) ◽  
pp. 3829-3830 ◽  
Author(s):  
Shaun P Wilkinson

Abstract Summary Hidden Markov models (HMMs) and profile HMMs form an integral part of biological sequence analysis, supporting an ever-growing list of applications. The aphid R package can be used to derive, train, plot, import and export HMMs and profile HMMs in the R environment. Computationally intensive dynamic programming recursions, such as the Viterbi, forward and backward algorithms, are implemented in C++ and parallelized for increased speed and efficiency. Availability and implementation The aphid package is released under the GPL-3 license and is freely available for download from CRAN and GitHub (https://github.com/shaunpwilkinson/aphid). Supplementary information Supplementary data are available at Bioinformatics online.
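
For readers unfamiliar with these recursions, the sketch below shows Viterbi decoding for a discrete HMM in log space. It is a minimal illustration in Python/NumPy, not the aphid implementation (aphid is an R package with C++ internals); the parameter names init, trans and emis are hypothetical.

```python
# Minimal Viterbi decoding for a discrete HMM (log space).
# Hypothetical parameter names; illustrative only, not aphid's code.
import numpy as np

def viterbi(obs, init, trans, emis):
    """Most probable state path for an observation sequence.

    obs   : observation indices, length T
    init  : initial state log-probabilities, shape (S,)
    trans : transition log-probabilities, shape (S, S)
    emis  : emission log-probabilities, shape (S, M)
    """
    T, S = len(obs), len(init)
    v = np.full((T, S), -np.inf)       # v[t, s]: best log-prob ending in s at t
    ptr = np.zeros((T, S), dtype=int)  # back-pointers for traceback
    v[0] = init + emis[:, obs[0]]
    for t in range(1, T):
        scores = v[t - 1][:, None] + trans          # (previous, current)
        ptr[t] = np.argmax(scores, axis=0)
        v[t] = scores[ptr[t], np.arange(S)] + emis[:, obs[t]]
    path = [int(np.argmax(v[-1]))]                  # best final state
    for t in range(T - 1, 0, -1):
        path.append(int(ptr[t, path[-1]]))
    return path[::-1]
```

Working in log space sidesteps the numerical underflow that plain probability products suffer on long sequences, a standard precaution in HMM implementations.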


2019 ◽  
Vol 35 (24) ◽  
pp. 5309-5312
Author(s):  
Ioannis A Tamposis ◽  
Konstantinos D Tsirigos ◽  
Margarita C Theodoropoulou ◽  
Panagiota I Kontou ◽  
Georgios N Tsaousis ◽  
...  

Abstract Summary JUCHMME is an open-source software package designed to fit arbitrary custom Hidden Markov Models (HMMs) with a discrete alphabet of symbols. We incorporate a large collection of standard algorithms for HMMs as well as a number of extensions and evaluate the software on various biological problems. Importantly, the JUCHMME toolkit includes several additional features that allow for easy building and evaluation of custom HMMs, which could be a useful resource for the research community. Availability and implementation http://www.compgen.org/tools/juchmme, https://github.com/pbagos/juchmme. Supplementary information Supplementary data are available at Bioinformatics online.
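
As a companion to the Viterbi sketch above, the snippet below illustrates the other standard recursion such a toolkit provides: the forward algorithm, which evaluates the likelihood of a sequence over a discrete alphabet. It is a minimal Python sketch using the same hypothetical parameter conventions, not JUCHMME code.

```python
# Log-space forward algorithm for a discrete-alphabet HMM.
# Hypothetical parameter names; illustrative only, not JUCHMME code.
import numpy as np
from scipy.special import logsumexp

def forward_loglik(obs, init, trans, emis):
    """log P(obs) under the model; all parameters are in log space."""
    f = init + emis[:, obs[0]]        # f[s] = log P(x_1, state_1 = s)
    for x in obs[1:]:
        f = logsumexp(f[:, None] + trans, axis=0) + emis[:, x]
    return logsumexp(f)               # marginalize over the final state
```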


2013 ◽  
Vol 411-414 ◽  
pp. 2106-2110
Author(s):  
Shi Ping Du ◽  
Jian Wang ◽  
Yu Ming Wei

A hidden Markov model (HMM) encompasses a large class of stochastic process models and has been successfully applied to a number of scientific and engineering problems, including speech and other pattern-recognition problems, and biological sequence analysis. A major restriction of the conventional HMM, however, is that it is ill-suited to capturing interactions among different models. A variety of coupled hidden Markov models (CHMMs) have recently been proposed as extensions of the HMM to better characterize multiple interdependent sequences. The resulting models have multiple state variables that are temporally coupled via matrices of conditional probabilities. This paper focuses on the coupled discrete HMM with two state variables in the network. By generalizing the forward-backward, Viterbi and Baum-Welch algorithms commonly used in conventional HMMs to accommodate two state variables, several new formulae are theoretically derived that solve the probability-evaluation, decoding and training problems for the 2-chain coupled discrete HMM.
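
To make the generalization concrete, the sketch below shows a forward recursion over the joint state of two coupled chains, under the common CHMM factorization in which each chain's next state is conditioned on both chains' current states. This is an illustrative reading of the model class, not the authors' derivation; the array names are hypothetical, and the recursion is kept in linear space for clarity (a practical implementation would rescale alpha at each step to avoid underflow).

```python
# Forward recursion for a 2-chain coupled discrete HMM (illustrative sketch).
# trans1[i, j, k] = P(chain1 -> k | chain1 = i, chain2 = j)
# trans2[i, j, l] = P(chain2 -> l | chain1 = i, chain2 = j)
import numpy as np

def coupled_forward(obs1, obs2, init1, init2, trans1, trans2, emis1, emis2):
    """Joint likelihood P(obs1, obs2) for two temporally coupled chains."""
    # alpha[i, j] = P(observations up to t, chain1 = i, chain2 = j)
    alpha = np.outer(init1 * emis1[:, obs1[0]], init2 * emis2[:, obs2[0]])
    for t in range(1, len(obs1)):
        # Sum over every previous joint state (i, j) for each new pair (k, l)
        alpha = np.einsum('ij,ijk,ijl->kl', alpha, trans1, trans2)
        alpha *= np.outer(emis1[:, obs1[t]], emis2[:, obs2[t]])
    return alpha.sum()
```

The joint state space is the Cartesian product of the two chains' state sets, so the evaluation, decoding and training recursions all run over pairs of states, which is where the generalized 2-chain formulae depart from the conventional single-chain algorithms.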


2018 ◽  
Vol 35 (13) ◽  
pp. 2208-2215 ◽  
Author(s):  
Ioannis A Tamposis ◽  
Konstantinos D Tsirigos ◽  
Margarita C Theodoropoulou ◽  
Panagiota I Kontou ◽  
Pantelis G Bagos

Abstract Motivation Hidden Markov Models (HMMs) are probabilistic models widely used in computational sequence analysis. HMMs are fundamentally unsupervised models; in the most important applications, however, they are trained in a supervised manner: training examples accompanied by labels corresponding to different classes are given as input, and the set of parameters that maximizes the joint probability of sequences and labels is estimated. A main problem with this approach is that, in the majority of cases, labels are hard to find and thus the amount of training data is limited. On the other hand, there are plenty of unclassified (unlabeled) sequences deposited in the public databases that could potentially contribute to the training procedure. This approach is called semi-supervised learning and could be very helpful in many applications. Results We propose here a method for semi-supervised learning of HMMs that can incorporate labeled, unlabeled and partially labeled data in a straightforward manner. The algorithm is based on a variant of the Expectation-Maximization (EM) algorithm, in which the missing labels of the unlabeled or partially labeled data are treated as the missing data. We apply the algorithm to several biological problems, namely the prediction of transmembrane protein topology for alpha-helical and beta-barrel membrane proteins and the prediction of archaeal signal peptides. The results are very promising, since the algorithms presented here can significantly improve the prediction performance of even the top-scoring classifiers. Supplementary information Supplementary data are available at Bioinformatics online.
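
The key mechanism, treating missing labels as the missing data in EM, can be pictured as a label-constrained forward pass: states that contradict a known label are masked out, while unlabeled positions remain unconstrained, so fully labeled, partially labeled and unlabeled sequences all flow through the same recursion. The Python sketch below is a hypothetical illustration of this idea, not the authors' code; state_labels maps each HMM state to a class label, and labels holds the per-position label, or None where it is unknown.

```python
# Label-constrained log-space forward pass (semi-supervised E-step sketch).
# Hypothetical names; illustrative only, not the authors' implementation.
import numpy as np
from scipy.special import logsumexp

def constrained_forward(obs, labels, state_labels, init, trans, emis):
    """Forward pass restricted to state paths consistent with known labels."""
    def mask(lab):
        # 0 for admissible states, -inf for states whose class contradicts lab
        ok = np.array([lab is None or sl == lab for sl in state_labels])
        return np.where(ok, 0.0, -np.inf)

    f = init + emis[:, obs[0]] + mask(labels[0])
    for x, lab in zip(obs[1:], labels[1:]):
        f = logsumexp(f[:, None] + trans, axis=0) + emis[:, x] + mask(lab)
    return f  # logsumexp(f) = log P(obs, labels); posteriors feed the M-step
```

With every label set to None this reduces to the ordinary unsupervised forward pass, and with every label observed it reduces to fully supervised training, which is why the three data types can be mixed within one EM procedure.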

