Look4TRs: A de-novo tool for detecting simple tandem repeats using self-supervised hidden Markov models

Abstract. Motivation: Simple tandem repeats, microsatellites in particular, have regulatory functions, links to several diseases, and applications in biotechnology. There is an immediate need for an accurate tool for detecting microsatellites in newly sequenced genomes. The currently available tools are either sensitive or specific, but not both; some require manual parameter adjustment. Results: We propose Look4TRs, the first application of self-supervised hidden Markov models to discovering microsatellites. Look4TRs adapts itself to the input genomes, balancing high sensitivity and a low false positive rate, and it calibrates itself automatically. We evaluated Look4TRs on 26 eukaryotic genomes. Based on the F measure, which combines sensitivity and false positive rate, Look4TRs outperformed TRF and MISA, the most widely used tools, by 78% and 84%, respectively. Look4TRs outperformed the second and third best tools, MsDetector and Tantan, by 17% and 34%. On eight bacterial genomes, Look4TRs outperformed the second and third best tools by 27% and 137%. Availability: https://github.com/TulsaBioinformaticsToolsmith/Look4TRs Supplementary information: Supplementary data are available at Bioinformatics online and at https://drive.google.com/open?id=1cIcS7Gvj0wj1B81-rnTU_OAG3IiNH54Y.

Look4TRs: A de-novo tool for detecting simple tandem repeats using self-supervised hidden Markov models

10.1101/449801 ◽

2018 ◽

Cited By ~ 1

Author(s):

Alfredo Velasco ◽

Benjamin T. James ◽

Vincent D. Wells ◽

Hani Z. Girgis

Keyword(s):

Hidden Markov Models ◽

False Positive ◽

Tandem Repeats ◽

Markov Models ◽

De Novo ◽

Hidden Markov ◽

False Positive Rate ◽

High Sensitivity ◽

Positive Rate ◽

Simple Tandem Repeats

Abstract: Simple tandem repeats, microsatellites in particular, have regulatory functions, links to several diseases, and applications in biotechnology. Sequences of thousands of species will be available soon. There is an immediate need for an accurate tool for detecting microsatellites in the new genomes. The currently available tools have limitations. As a remedy, we propose Look4TRs, the first application of self-supervised hidden Markov models to discovering microsatellites. It adapts itself to the input genomes, balancing high sensitivity and a low false positive rate. It calibrates itself automatically, which frees the user from adjusting parameters manually and leads to consistent results across studies. We evaluated Look4TRs on eight genomes. Based on the F-measure, which combines sensitivity and false positive rate, Look4TRs outperformed TRF and MISA, the most widely used tools, by 106% and 82%, respectively. Look4TRs outperformed the second best tool, MsDetector or Tantan, by 11%. Look4TRs represents a technical advance in the annotation of microsatellites.

Semi-supervised learning of Hidden Markov Models for biological sequence analysis

Bioinformatics ◽

10.1093/bioinformatics/bty910 ◽

2018 ◽

Vol 35 (13) ◽

pp. 2208-2215 ◽

Cited By ~ 5

Author(s):

Ioannis A Tamposis ◽

Konstantinos D Tsirigos ◽

Margarita C Theodoropoulou ◽

Panagiota I Kontou ◽

Pantelis G Bagos

Keyword(s):

Sequence Analysis ◽

Supervised Learning ◽

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov ◽

Transmembrane Protein ◽

Training Data ◽

Supplementary Information ◽

Training Procedure ◽

Partially Labeled Data

Abstract. Motivation: Hidden Markov Models (HMMs) are probabilistic models widely used in computational sequence analysis. HMMs are essentially unsupervised models; however, in their most important applications they are trained in a supervised manner. Training examples accompanied by labels corresponding to different classes are given as input, and the set of parameters that maximizes the joint probability of sequences and labels is estimated. A main problem with this approach is that, in the majority of cases, labels are hard to find, so the amount of training data is limited. On the other hand, the public databases hold plenty of unclassified (unlabeled) sequences that could contribute to the training procedure. Training on labeled and unlabeled sequences together is called semi-supervised learning and could be very helpful in many applications. Results: We propose a method for semi-supervised learning of HMMs that can incorporate labeled, unlabeled and partially labeled data in a straightforward manner. The algorithm is based on a variant of the Expectation-Maximization (EM) algorithm in which the missing labels of the unlabeled or partially labeled data are treated as the missing data. We apply the algorithm to several biological problems, namely the prediction of transmembrane protein topology for alpha-helical and beta-barrel membrane proteins and the prediction of archaeal signal peptides. The results are very promising, since the algorithms presented here can significantly improve the prediction performance of even the top-scoring classifiers. Supplementary information: Supplementary data are available at Bioinformatics online.
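
To make the idea of treating missing labels as missing data concrete, here is a minimal Python sketch of a forward pass that respects whatever labels are known. It is not the authors' algorithm: it shows only the likelihood computation that an EM-style procedure could build on. It assumes, for simplicity, that each label corresponds to exactly one state, and the function name `constrained_forward` and the toy parameters are hypothetical.

```python
from typing import List, Optional, Sequence


def constrained_forward(
    obs: Sequence[int],
    labels: Sequence[Optional[int]],
    start: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
) -> float:
    """Probability of obs jointly with the known labels.

    labels[t] is a state index when position t is labeled, or None when
    the label is missing. Assumption: one label per state.
    """
    n_states = len(start)

    def allowed(t: int, s: int) -> bool:
        # A state is allowed if the position is unlabeled or the label matches.
        return labels[t] is None or labels[t] == s

    # Initialisation: states that contradict a known label get probability 0.
    alpha: List[float] = [
        start[s] * emit[s][obs[0]] if allowed(0, s) else 0.0
        for s in range(n_states)
    ]
    # Recursion: the usual forward update, again masked by the known labels.
    for t in range(1, len(obs)):
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
            if allowed(t, s)
            else 0.0
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state, two-symbol model used only for illustration.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0]

unlabeled = constrained_forward(obs, [None, None, None], start, trans, emit)
partial = constrained_forward(obs, [0, None, None], start, trans, emit)
full = constrained_forward(obs, [0, 1, 0], start, trans, emit)
```

With every label set to `None` the function reduces to the ordinary forward algorithm, and with every label given it returns the joint probability of the sequence and its labelling, so labeled, partially labeled and unlabeled sequences can all pass through the same routine.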

aphid: an R package for analysis with profile hidden Markov models

Bioinformatics ◽

10.1093/bioinformatics/btz159 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3829-3830 ◽

Cited By ~ 8

Author(s):

Shaun P Wilkinson

Keyword(s):

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov ◽

R Package ◽

Supplementary Information ◽

Biological Sequence ◽

Profile Hmms ◽

Biological Sequence Analysis ◽

Import And Export ◽

Computationally Intensive

Abstract. Summary: Hidden Markov models (HMMs) and profile HMMs form an integral part of biological sequence analysis, supporting an ever-growing list of applications. The aphid R package can be used to derive, train, plot, import and export HMMs and profile HMMs in the R environment. Computationally intensive dynamic programming recursions, such as the Viterbi, forward and backward algorithms, are implemented in C++ and parallelized for increased speed and efficiency. Availability and implementation: The aphid package is released under the GPL-3 license and is freely available for download from CRAN and GitHub (https://github.com/shaunpwilkinson/aphid). Supplementary information: Supplementary data are available at Bioinformatics online.
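
As an illustration of the kind of dynamic programming recursion mentioned above, the following is a minimal log-space Viterbi sketch in plain Python. It is not aphid's code or API (aphid is an R package with C++ internals); the function name `viterbi` and the toy two-state model are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def _log(p: float) -> float:
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(
    obs: Sequence[int],
    start: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the most probable state path and its log joint probability."""
    n = len(start)
    # Best log score of any path ending in each state after the first symbol.
    score = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    back: List[List[int]] = []  # back[t][s] = best predecessor of state s
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + _log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + _log(trans[best][s]) + _log(emit[s][o]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, max(score)


# Hypothetical two-state, two-symbol model used only for illustration.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
path, logp = viterbi([0, 1, 0], start, trans, emit)
```

Working in log space avoids numerical underflow on long sequences, which is one reason such recursions are usually written this way.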

JUCHMME: a Java Utility for Class Hidden Markov Models and Extensions for biological sequence analysis

Bioinformatics ◽

10.1093/bioinformatics/btz533 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5309-5312

Author(s):

Ioannis A Tamposis ◽

Konstantinos D Tsirigos ◽

Margarita C Theodoropoulou ◽

Panagiota I Kontou ◽

Georgios N Tsaousis ◽

...

Keyword(s):

Hidden Markov Models ◽

Open Source Software ◽

Markov Models ◽

Hidden Markov ◽

Research Community ◽

Supplementary Information ◽

Biological Sequence ◽

Large Collection ◽

Biological Sequence Analysis ◽

Open Source Software Package

Abstract. Summary: JUCHMME is an open-source software package designed to fit arbitrary custom Hidden Markov Models (HMMs) with a discrete alphabet of symbols. We incorporate a large collection of standard algorithms for HMMs as well as a number of extensions and evaluate the software on various biological problems. Importantly, the JUCHMME toolkit includes several additional features that allow for easy building and evaluation of custom HMMs, which could be a useful resource for the research community. Availability and implementation: http://www.compgen.org/tools/juchmme, https://github.com/pbagos/juchmme. Supplementary information: Supplementary data are available at Bioinformatics online.

Improving diagnosis of acute appendicitis with atypical findings by Tc-99m HMPAO leukocyte scan

Nuklearmedizin ◽

10.1055/s-0038-1623994 ◽

2002 ◽

Vol 41 (01) ◽

pp. 37-41 ◽

Cited By ~ 3

Author(s):

S. Shung-Shung ◽

S. Yu-Chien ◽

Y. Mei-Due ◽

W. Hwei-Chung ◽

A. Kao

Keyword(s):

Acute Appendicitis ◽

False Positive ◽

False Positive Rate ◽

Accurate Method ◽

Clinical Findings ◽

Pathological Findings ◽

Lower Quadrant ◽

Predictive Values ◽

Positive Rate

Summary. Aim: Even with careful observation, the overall false-positive rate of laparotomy remains 10-15% when acute appendicitis is suspected. Therefore, the clinical efficacy of the Tc-99m HMPAO labeled leukocyte (TC-WBC) scan for the diagnosis of acute appendicitis in patients presenting with atypical clinical findings was assessed. Patients and Methods: Eighty patients presenting with acute abdominal pain and possible acute appendicitis but atypical findings were included in this study. After intravenous injection of TC-WBC, serial anterior abdominal/pelvic images at 30, 60, 120 and 240 min with 800k counts were obtained with a gamma camera. Any abnormal localization of radioactivity in the right lower quadrant of the abdomen, equal to or greater than bone marrow activity, was considered a positive scan. Results: 36 of the 49 patients with positive TC-WBC scans received appendectomy. They all proved to have positive pathological findings. Five positive TC-WBC scans were not related to acute appendicitis but were due to other pathological lesions. Eight patients were not operated on, and clinical follow-up after one month revealed no acute abdominal condition. Three of the 31 patients with negative TC-WBC scans received appendectomy. They also presented positive pathological findings. The remaining 28 patients did not receive operations and showed no evidence of appendicitis after at least one month of follow-up. The overall sensitivity, specificity, accuracy, positive and negative predictive values of the TC-WBC scan for diagnosing acute appendicitis were 92, 78, 86, 82, and 90%, respectively. Conclusion: The TC-WBC scan provides a rapid and highly accurate method for the diagnosis of acute appendicitis in patients with equivocal clinical examination. It proved useful in reducing the false-positive rate of laparotomy and in shortening the time necessary for clinical observation.
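
The five reported figures follow from a 2x2 table of scan result against final diagnosis, which the abstract does not print. The sketch below computes them from one possible reading of the counts, and that reading is an assumption: 36 true positives, 3 false negatives, 28 true negatives and 8 false positives (the unoperated scan-positive patients), with the five scans attributed to other lesions left out. Under that reading sensitivity, specificity and both predictive values round to the reported 92, 78, 82 and 90%, while accuracy comes out near 85% rather than the reported 86%, so the authors' exact tabulation may differ.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard test-accuracy measures from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),               # diseased correctly flagged
        "specificity": tn / (tn + fp),               # healthy correctly cleared
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),                       # positive predictive value
        "npv": tn / (tn + fn),                       # negative predictive value
    }


# Assumed reading of the abstract's counts (not stated as a table there).
metrics = diagnostic_metrics(tp=36, fp=8, fn=3, tn=28)
percentages = {name: round(100 * value) for name, value in metrics.items()}
```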

Predicting Fetal Chromosome Anomalies in the First Trimester Using Pregnancy Associated Plasma Protein-A: A Comparison of Statistical Methods

Methods of Information in Medicine ◽

10.1055/s-0038-1634910 ◽

1993 ◽

Vol 32 (02) ◽

pp. 175-179 ◽

Cited By ~ 7

Author(s):

B. Brambati ◽

T. Chard ◽

J. G. Grudzinskas ◽

M. C. M. Macintosh

Keyword(s):

Logistic Regression ◽

General Population ◽

Likelihood Ratio ◽

False Positive ◽

False Positive Rate ◽

Ratio Method ◽

Detection Rates ◽

Gaussian Distributions ◽

Positive Rate ◽

Likelihood Ratio Method

Abstract: The analysis of the clinical efficiency of a biochemical parameter in the prediction of chromosome anomalies is described, using a database of 475 cases including 30 abnormalities. A comparison was made of two different approaches to the statistical analysis: the use of Gaussian frequency distributions and likelihood ratios, and logistic regression. Both methods computed that for a 5% false-positive rate approximately 60% of anomalies are detected on the basis of maternal age and serum PAPP-A. The logistic regression analysis is appropriate where the outcome variable (chromosome anomaly) is binary and the detection rates refer to the original data only. The likelihood ratio method is used to predict the outcome in the general population. The latter method depends on the data or some transformation of the data fitting a known frequency distribution (Gaussian in this case). The precision of the predicted detection rates is limited by the small sample of abnormals (30 cases). Varying the means and standard deviations (to the limits of their 95% confidence intervals) of the fitted log Gaussian distributions resulted in a detection rate varying between 42% and 79% for a 5% false-positive rate. Thus, although the likelihood ratio method is potentially the better method in determining the usefulness of a test in the general population, larger numbers of abnormal cases are required to stabilise the means and standard deviations of the fitted log Gaussian distributions.
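
The Gaussian likelihood ratio approach can be sketched for a single marker as follows. This is a simplified illustration, not the study's model: it ignores maternal age, and the means and standard deviation below are hypothetical placeholders rather than the fitted PAPP-A parameters. It also assumes that affected pregnancies have lower marker values and that both groups share one standard deviation, in which case the likelihood ratio is monotone in the marker and a cutoff on the marker is equivalent to a cutoff on the likelihood ratio.

```python
from statistics import NormalDist
from typing import Tuple


def detection_rate_at_fpr(
    unaffected: NormalDist, affected: NormalDist, fpr: float
) -> Tuple[float, float]:
    """Return (cutoff, detection rate) for a fixed false-positive rate.

    Screen-positive means a marker value below the cutoff, so the cutoff is
    the fpr-quantile of the unaffected distribution.
    """
    cutoff = unaffected.inv_cdf(fpr)
    return cutoff, affected.cdf(cutoff)


def likelihood_ratio(x: float, unaffected: NormalDist, affected: NormalDist) -> float:
    """How much more likely a marker value x is under 'affected'."""
    return affected.pdf(x) / unaffected.pdf(x)


# Hypothetical log-scale marker distributions (placeholders, not study values).
unaffected = NormalDist(mu=0.0, sigma=0.25)
affected = NormalDist(mu=-0.4, sigma=0.25)

cutoff, detection = detection_rate_at_fpr(unaffected, affected, fpr=0.05)
lr_at_cutoff = likelihood_ratio(cutoff, unaffected, affected)
```

Because the detection rate depends directly on the fitted means and standard deviations, re-running the function with parameters moved to the limits of their confidence intervals is one way to see the kind of spread the abstract reports.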

Estimating Personality Impression from Speech Record Using Hidden Markov Models

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.135.1517 ◽

2015 ◽

Vol 135 (12) ◽

pp. 1517-1523 ◽

Cited By ~ 1

Author(s):

Yicheng Jin ◽

Takuto Sakuma ◽

Shohei Kato ◽

Tsutomu Kunitachi

Keyword(s):

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov

Hidden Markov Processes

10.23943/princeton/9780691133157.001.0001 ◽

2014 ◽

Cited By ~ 2

Author(s):

M. Vidyasagar

Keyword(s):

Hidden Markov Models ◽

Markov Processes ◽

Viterbi Algorithm ◽

Markov Models ◽

Hidden Markov ◽

Local Alignment ◽

Biological Applications ◽

Standard Material ◽

Hidden Markov Processes ◽

Genomics And Proteomics

This book explores important aspects of Markov and hidden Markov processes and the applications of these ideas to various problems in computational biology. It starts from first principles, so that no previous knowledge of probability is necessary. However, the work is rigorous and mathematical, making it useful to engineers and mathematicians, even those not interested in biological applications. A range of exercises is provided, including drills to familiarize the reader with concepts and more advanced problems that require deep thinking about the theory. Biological applications are taken from post-genomic biology, especially genomics and proteomics. The topics examined include standard material such as the Perron–Frobenius theorem, transient and recurrent states, hitting probabilities and hitting times, maximum likelihood estimation, the Viterbi algorithm, and the Baum–Welch algorithm. The book also discusses topics not usually covered at the basic level, such as ergodicity of Markov processes, Markov Chain Monte Carlo (MCMC), information theory, and large deviation theory for both i.i.d. and Markov processes. It also presents state-of-the-art realization theory for hidden Markov models. Among biological applications, it offers an in-depth look at the BLAST (Basic Local Alignment Search Tool) algorithm, including a comprehensive explanation of the underlying theory. Other applications such as profile hidden Markov models are also explored.
