JUCHMME: a Java Utility for Class Hidden Markov Models and Extensions for biological sequence analysis

Abstract

Summary: JUCHMME is an open-source software package designed to fit arbitrary custom Hidden Markov Models (HMMs) with a discrete alphabet of symbols. We incorporate a large collection of standard algorithms for HMMs as well as a number of extensions and evaluate the software on various biological problems. Importantly, the JUCHMME toolkit includes several additional features that allow for easy building and evaluation of custom HMMs, which could be a useful resource for the research community.

Availability and implementation: http://www.compgen.org/tools/juchmme, https://github.com/pbagos/juchmme.

Supplementary information: Supplementary data are available at Bioinformatics online.

aphid: an R package for analysis with profile hidden Markov models

Bioinformatics, DOI 10.1093/bioinformatics/btz159, 2019, Vol 35 (19), pp. 3829-3830, Cited by ~8

Author(s): Shaun P Wilkinson

Keyword(s): Hidden Markov Models, Markov Models, Hidden Markov, R Package, Supplementary Information, Biological Sequence, Profile HMMs, Biological Sequence Analysis, Import And Export, Computationally Intensive

Abstract

Summary: Hidden Markov models (HMMs) and profile HMMs form an integral part of biological sequence analysis, supporting an ever-growing list of applications. The aphid R package can be used to derive, train, plot, import and export HMMs and profile HMMs in the R environment. Computationally intensive dynamic programming recursions, such as the Viterbi, forward and backward algorithms, are implemented in C++ and parallelized for increased speed and efficiency.

Availability and implementation: The aphid package is released under the GPL-3 license, and is freely available for download from CRAN and GitHub (https://github.com/shaunpwilkinson/aphid).

Supplementary information: Supplementary data are available at Bioinformatics online.
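As an illustration of the dynamic programming recursions named in this abstract, the following is a minimal Python sketch of the Viterbi algorithm in log space. It is not the aphid implementation (which is written in C++ and exposed to R); the function name `viterbi`, its parameters, and the two-state fair/loaded die model are assumptions chosen only for the example.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_path) for an observation sequence."""

    def log(p):
        # Zero probabilities map to -inf so they never win the max.
        return math.log(p) if p > 0 else float("-inf")

    # v[t][s]: log-probability of the best path that ends in state s at time t
    v = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, v[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            v[t][s] = score + log(emit_p[s][obs[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return v[-1][last], path


# Hypothetical model: a fair die (F) and a die loaded towards six (L).
states = ["F", "L"]
start_p = {"F": 0.5, "L": 0.5}
trans_p = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
emit_p = {
    "F": {c: 1 / 6 for c in "123456"},
    "L": {**{c: 0.1 for c in "12345"}, "6": 0.5},
}

log_prob, path = viterbi("666666", states, start_p, trans_p, emit_p)
print("".join(path), log_prob)
```

Working in log space avoids numerical underflow on long sequences, which is one reason such recursions are usually written with sums of logarithms rather than products of probabilities.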

Hidden Markov models in biological sequence analysis

IBM Journal of Research and Development, DOI 10.1147/rd.453.0449, 2001, Vol 45 (3.4), pp. 449-454, Cited by ~43

Author(s): E. Birney

Keyword(s): Sequence Analysis, Hidden Markov Models, Markov Models, Hidden Markov, Biological Sequence, Biological Sequence Analysis

High Speed Biological Sequence Analysis With Hidden Markov Models on Reconfigurable Platforms

IEEE Transactions on Information Technology in Biomedicine, DOI 10.1109/titb.2007.904632, 2009, Vol 13 (5), pp. 740-746, Cited by ~13

Author(s): T.F. Oliver, B. Schmidt, Y. Jakop, D.L. Maskell

Keyword(s): Sequence Analysis, Hidden Markov Models, High Speed, Markov Models, Hidden Markov, Biological Sequence, Biological Sequence Analysis, Reconfigurable Platforms

Propositionalisation of Profile Hidden Markov Models for Biological Sequence Analysis

AI 2008: Advances in Artificial Intelligence - Lecture Notes in Computer Science, DOI 10.1007/978-3-540-89378-3_27, 2008, pp. 278-288, Cited by ~1

Author(s): Stefan Mutter, Bernhard Pfahringer, Geoffrey Holmes

Keyword(s): Sequence Analysis, Hidden Markov Models, Markov Models, Hidden Markov, Biological Sequence, Biological Sequence Analysis, Profile Hidden Markov Models

High Speed Biological Sequence Analysis with Hidden Markov Models on Reconfigurable Platforms

IEEE Transactions on Information Technology in Biomedicine, DOI 10.1109/titb.2008.917898, 2009

Author(s): T. Oliver, B. Schmidt, Y. Jakop, D. Maskell

Keyword(s): Sequence Analysis, Hidden Markov Models, High Speed, Markov Models, Hidden Markov, Biological Sequence, Biological Sequence Analysis, Reconfigurable Platforms

Hidden Markov Models and their Applications in Biological Sequence Analysis

Current Genomics, DOI 10.2174/138920209789177575, 2009, Vol 10 (6), pp. 402-415, Cited by ~127

Author(s): Byung-Jun Yoon

Keyword(s): Sequence Analysis, Hidden Markov Models, Markov Models, Hidden Markov, Biological Sequence, Biological Sequence Analysis

Biological Sequence Analysis with Hidden Markov Models on an FPGA

Advances in Computer Systems Architecture - Lecture Notes in Computer Science, DOI 10.1007/11572961_34, 2005, pp. 429-439

Author(s): Jacop Yanto, Timothy F. Oliver, Bertil Schmidt, Douglas L. Maskell

Keyword(s): Sequence Analysis, Hidden Markov Models, Markov Models, Hidden Markov, Biological Sequence, Biological Sequence Analysis

The Learning Algorithms of Coupled Discrete Hidden Markov Models

Applied Mechanics and Materials, DOI 10.4028/www.scientific.net/amm.411-414.2106, 2013, Vol 411-414, pp. 2106-2110

Author(s): Shi Ping Du, Jian Wang, Yu Ming Wei

Keyword(s): Hidden Markov Models, Viterbi Algorithm, Markov Models, Hidden Markov, Process Models, State Variables, Biological Sequence, Training Problem, Biological Sequence Analysis, Coupled Hidden Markov Models

A hidden Markov model (HMM) encompasses a large class of stochastic process models and has been successfully applied to a number of scientific and engineering problems, including speech and other pattern recognition problems, and biological sequence analysis. A major restriction of the conventional HMM, however, is that it is ill-suited to capturing the interactions among different models. A variety of coupled hidden Markov models (CHMMs) have recently been proposed as extensions of the HMM to better characterize multiple interdependent sequences. The resulting models have multiple state variables that are temporally coupled via matrices of conditional probabilities. This paper focuses on the coupled discrete HMM with two state variables in the network. By generalizing the forward-backward, Viterbi and Baum-Welch algorithms commonly used with conventional HMMs to accommodate two state variables, several new formulae are theoretically derived for the probability evaluation, decoding and training problems of the 2-chain coupled discrete HMM.
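The probability evaluation problem for a two-chain coupled model can be sketched as a forward recursion over pairs of states. The Python below assumes one common coupling, in which each chain's next state depends on the current states of both chains and each chain emits its own symbol; the paper's exact parameterization is not given in the abstract, so the function name `coupled_forward`, the array layout, and all numbers are illustrative assumptions.

```python
from itertools import product


def coupled_forward(obs1, obs2, pi1, pi2, A1, A2, B1, B2):
    """Likelihood of two aligned observation sequences under a 2-chain coupled HMM.

    A1[i][j][k] = P(chain 1 in state k at t+1 | chain 1 in i, chain 2 in j at t)
    A2[i][j][l] = P(chain 2 in state l at t+1 | chain 1 in i, chain 2 in j at t)
    B1[i][o], B2[j][o] are the per-chain emission probabilities.
    """
    pairs = list(product(range(len(pi1)), range(len(pi2))))

    # Initialisation: the chains start independently (an assumption of this sketch).
    alpha = {
        (i, j): pi1[i] * B1[i][obs1[0]] * pi2[j] * B2[j][obs2[0]]
        for i, j in pairs
    }

    # Recursion: sum over all previous state pairs, then emit from both chains.
    for o1, o2 in zip(obs1[1:], obs2[1:]):
        alpha = {
            (k, l): B1[k][o1] * B2[l][o2]
            * sum(alpha[i, j] * A1[i][j][k] * A2[i][j][l] for i, j in pairs)
            for k, l in pairs
        }
    return sum(alpha.values())


# Hypothetical parameters: two states and two symbols per chain.
pi1, pi2 = [0.6, 0.4], [0.5, 0.5]
A1 = [[[0.7, 0.3], [0.4, 0.6]], [[0.2, 0.8], [0.5, 0.5]]]
A2 = [[[0.9, 0.1], [0.3, 0.7]], [[0.6, 0.4], [0.1, 0.9]]]
B1 = [[0.8, 0.2], [0.3, 0.7]]
B2 = [[0.5, 0.5], [0.1, 0.9]]
params = (pi1, pi2, A1, A2, B1, B2)

print(coupled_forward([0, 1, 1], [1, 0, 1], *params))
```

With N states per chain, each step costs on the order of N^4 operations because every pair of current states feeds every pair of next states. This is the price of treating the two chains jointly rather than as independent HMMs.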

Semi-supervised learning of Hidden Markov Models for biological sequence analysis

Bioinformatics, DOI 10.1093/bioinformatics/bty910, 2018, Vol 35 (13), pp. 2208-2215, Cited by ~5

Author(s): Ioannis A Tamposis, Konstantinos D Tsirigos, Margarita C Theodoropoulou, Panagiota I Kontou, Pantelis G Bagos

Keyword(s): Sequence Analysis, Supervised Learning, Hidden Markov Models, Markov Models, Hidden Markov, Transmembrane Protein, Training Data, Supplementary Information, Training Procedure, Partially Labeled Data

Abstract

Motivation: Hidden Markov Models (HMMs) are probabilistic models widely used in computational sequence analysis. HMMs are basically unsupervised models. However, in the most important applications, they are trained in a supervised manner. Training examples accompanied by labels corresponding to different classes are given as input, and the set of parameters that maximizes the joint probability of sequences and labels is estimated. A main problem with this approach is that, in the majority of cases, labels are hard to find and thus the amount of training data is limited. On the other hand, there are plenty of unclassified (unlabeled) sequences deposited in the public databases that could potentially contribute to the training procedure. This approach is called semi-supervised learning and could be very helpful in many applications.

Results: Here we propose a method for semi-supervised learning of HMMs that can incorporate labeled, unlabeled and partially labeled data in a straightforward manner. The algorithm is based on a variant of the Expectation-Maximization (EM) algorithm, where the missing labels of the unlabeled or partially labeled data are considered as the missing data. We apply the algorithm to several biological problems, namely the prediction of transmembrane protein topology for alpha-helical and beta-barrel membrane proteins and the prediction of archaeal signal peptides. The results are very promising, since the algorithms presented here can significantly improve the prediction performance of even the top-scoring classifiers.

Supplementary information: Supplementary data are available at Bioinformatics online.
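One way to picture how labeled, unlabeled and partially labeled sequences can share a single likelihood is a forward pass that only admits states whose class label agrees with the known labels, and admits every state where the label is missing. The Python below is a minimal sketch of that idea and not the authors' algorithm; the function `forward_with_labels`, the three-state topology model and its probabilities are assumptions made for the example.

```python
def forward_with_labels(obs, labels, states, state_label, start_p, trans_p, emit_p):
    """P(obs, known labels): sum over all state paths consistent with the labels.

    labels[t] is a class label, or None where the label is missing.
    """

    def allowed(s, t):
        return labels[t] is None or state_label[s] == labels[t]

    # Initialisation: states that contradict a known label get probability 0.
    alpha = {
        s: start_p[s] * emit_p[s][obs[0]] if allowed(s, 0) else 0.0
        for s in states
    }
    # Recursion: standard forward step, masked by the label constraint.
    for t in range(1, len(obs)):
        alpha = {
            s: emit_p[s][obs[t]] * sum(alpha[r] * trans_p[r][s] for r in states)
            if allowed(s, t)
            else 0.0
            for s in states
        }
    return sum(alpha.values())


# Hypothetical model: inside (i), membrane (M), outside (o);
# symbols are h (hydrophobic) and p (polar).
states = ["in", "mem", "out"]
state_label = {"in": "i", "mem": "M", "out": "o"}
start_p = {"in": 0.5, "mem": 0.2, "out": 0.3}
trans_p = {
    "in": {"in": 0.8, "mem": 0.2, "out": 0.0},
    "mem": {"in": 0.1, "mem": 0.8, "out": 0.1},
    "out": {"in": 0.0, "mem": 0.2, "out": 0.8},
}
emit_p = {
    "in": {"h": 0.3, "p": 0.7},
    "mem": {"h": 0.9, "p": 0.1},
    "out": {"h": 0.4, "p": 0.6},
}
model = (states, state_label, start_p, trans_p, emit_p)

unlabeled = forward_with_labels("phhp", [None] * 4, *model)
partial = forward_with_labels("phhp", [None, "M", "M", None], *model)
print(unlabeled, partial)
```

A fully labeled sequence, a partially labeled one and an unlabeled one all pass through the same routine; only the mask differs. In an EM setting, the corresponding backward pass and expected counts would be masked in the same way.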

Using evolutionary Expectation Maximization to estimate indel rates

Bioinformatics, DOI 10.1093/bioinformatics/bti177, 2005, Vol 21 (10), pp. 2294-2300, Cited by ~21

Author(s): Ian Holmes

Keyword(s): EM Algorithm, Hidden Markov Models, Expectation Maximization, Phylogenetic Trees, Markov Models, Hidden Markov, Biological Sequence, Sequence Alignments, Multiple Sequence, Stochastic Grammars

Abstract

Motivation: The Expectation Maximization (EM) algorithm, in the form of the Baum–Welch algorithm (for hidden Markov models) or the Inside-Outside algorithm (for stochastic context-free grammars), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modelling, it would be useful to apply the EM algorithm to estimate not only the probability parameters of the stochastic grammar, but also the instantaneous mutation rates of the underlying evolutionary model (to facilitate the development of stochastic grammars based on phylogenetic trees, also known as Statistical Alignment). Recently, we showed how to do this for the point substitution component of the evolutionary process; here, we extend these results to the indel process.

Results: We present an algorithm for maximum-likelihood estimation of insertion and deletion rates from multiple sequence alignments, using EM, under the single-residue indel model owing to Thorne, Kishino and Felsenstein (the ‘TKF91’ model). The algorithm converges extremely rapidly, gives accurate results on simulated data that are an improvement over parsimonious estimates (which are shown to underestimate the true indel rate), and gives plausible results on experimental data (coronavirus envelope domains). Owing to the algorithm's close similarity to the Baum–Welch algorithm for training hidden Markov models, it can be used in an ‘unsupervised’ fashion to estimate rates for unaligned sequences, or to estimate several sets of rates for sequences with heterogeneous rates.

Availability: Software implementing the algorithm and the benchmark is available under GPL from http://www.biowiki.org/
