Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression

2015
Author(s): John Wiedenhoeft, Eric Brugel, Alexander Schliep

Abstract

By combining Haar wavelets with Bayesian Hidden Markov Models, we improve detection of genomic copy number variants (CNV) in array CGH experiments compared to the state of the art, including standard Gibbs sampling. At the same time, we achieve drastically reduced running times, as the method concentrates computational effort on chromosomal segments which are difficult to call, by dynamically and adaptively recomputing consecutive blocks of observations likely to share a copy number. This makes routine diagnostic use and re-analysis of legacy data collections feasible; to this end, we also propose an effective automatic prior. An open source software implementation of our method is available at http://bioinformatics.rutgers.edu/Software/HaMMLET/. The web supplement is at http://bioinformatics.rutgers.edu/Supplements/HaMMLET/.

Author Summary

Identifying large-scale genome deletions and duplications, or copy number variants (CNV), accurately in populations or individual patients is a crucial step in identifying disease factors or diagnosing an individual patient's disease type. Hidden Markov Models (HMM) are a type of statistical model widely used for CNV detection, as well as for other biological applications such as the analysis of gene expression time course data or the analysis of discrete-valued DNA and protein sequences.

As with many statistical models, there are two fundamentally different inference approaches. In the frequentist framework, a single estimate of the model parameters is used as the basis for subsequent inference, making the identification of CNV dependent on the quality of that estimate. This is an acute problem for HMM, as methods for finding globally optimal parameters are not known. Alternatively, one can use a Bayesian approach and integrate over all possible parameter choices. While the latter is known to lead to significantly better results, its much larger computational effort (up to hundreds of times larger) has so far prevented wide adoption.

Our proposed method addresses this by combining Haar wavelets and HMM. We greatly accelerate fully Bayesian HMMs while simultaneously improving convergence, and thus the accuracy, of the Gibbs sampler used for Bayesian computations, leading to substantial improvements over the state of the art.
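As a rough illustration of how Haar wavelets can group "consecutive blocks of observations likely to share a copy number", the sketch below marks a block boundary wherever some Haar detail coefficient with a discontinuity at that position exceeds a threshold. This is not the HaMMLET implementation: the function names, the fixed threshold, and the power-of-two length restriction are all assumptions made for the example.

```python
import math


def haar_block_boundaries(data, threshold):
    """Return positions where a large Haar detail coefficient has a discontinuity.

    Hypothetical helper for illustration only; assumes len(data) is a power of two.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    # strongest[pos] = largest |coefficient| of any Haar wavelet that jumps at pos
    strongest = [0.0] * (n + 1)
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            left = sum(data[start:start + half])
            right = sum(data[start + half:start + size])
            # Orthonormal Haar detail coefficient for this support
            coeff = abs(left - right) / math.sqrt(size)
            for pos in (start, start + half, start + size):
                strongest[pos] = max(strongest[pos], coeff)
        size *= 2
    return [pos for pos in range(1, n) if strongest[pos] > threshold]


def blocks_from_boundaries(n, boundaries):
    """Turn boundary positions into half-open (start, end) blocks."""
    edges = [0] + list(boundaries) + [n]
    return list(zip(edges[:-1], edges[1:]))


signal = [0, 0, 0, 0, 2, 2, 2, 2]
boundaries = haar_block_boundaries(signal, threshold=1.0)
print(blocks_from_boundaries(len(signal), boundaries))
```

Each block could then be summarized by its sufficient statistics and treated as a unit by a sampler, which is where the savings would come from. A jump that does not fall on a dyadic position tends to produce several small blocks around it in this sketch, so the grouping errs on the side of splitting.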

2017, Vol 33 (8), pp. 2765-2779
Author(s): António Simões, José Manuel Viegas, José Torres Farinha, Inácio Fonseca

Author(s): Xiaoqiang Wang, Emilie Lebarbier, Julie Aubert, Stéphane Robin

Abstract. Hidden Markov models provide a natural statistical framework for the detection of copy number variations (CNV) in genomics. In this context, we define a hidden Markov process that underlies all individuals jointly in order to detect and classify genomic regions into different states (typically deletion, normal or amplification). Structural variations from different individuals may be dependent. This is the case in agronomy, where varietal selection programs exist and species share a common phylogenetic past. We propose to take these dependencies into account in the HMM. When dealing with a large number of series, maximum likelihood inference (classically performed using the EM algorithm) becomes intractable. We thus propose an approximate inference algorithm based on a variational approach (VEM), implemented in the CHMM R package. A simulation study is performed to assess the performance of the proposed method, and an application to the detection of structural variations in plant genomes is presented.
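To make the three-state setting (deletion, normal, amplification) concrete, here is a minimal sketch of Viterbi decoding for a single series with Gaussian emissions. It is not the coupled model or the variational EM algorithm of the CHMM package; the state means, the shared standard deviation, the uniform initial distribution, and the `stay` probability are all hypothetical values chosen for illustration.

```python
import math

STATES = ("deletion", "normal", "amplification")


def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs, means, sd, stay):
    """Most likely state path; transitions keep the state with probability `stay`."""
    n_states = len(means)
    switch = (1.0 - stay) / (n_states - 1)
    log_trans = [
        [math.log(stay if i == j else switch) for j in range(n_states)]
        for i in range(n_states)
    ]
    log_init = math.log(1.0 / n_states)  # assumed uniform start
    score = [log_init + log_gauss(obs[0], m, sd) for m in means]
    back = []
    for x in obs[1:]:
        prev, score, pointers = score, [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j] + log_gauss(x, means[j], sd))
        back.append(pointers)
    # Trace the best path backwards from the best final state
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


log_ratios = [0.0, 0.1, -0.1, -1.0, -0.9, -1.1, 0.0, 0.05, 1.0, 0.9]
path = viterbi(log_ratios, means=(-1.0, 0.0, 1.0), sd=0.3, stay=0.9)
print([STATES[s] for s in path])
```

Decoding each series independently like this ignores exactly the between-individual dependence that the abstract sets out to model.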


2003, Vol 7 (5), pp. 652-667
Author(s): M. F. Lambert, J. P. Whiting, A. V. Metcalfe

Abstract. Hidden Markov models (HMMs) can allow for the varying wet and dry cycles in the climate without the need to simulate supplementary climate variables. The fitting of a parametric HMM relies upon assumptions for the state conditional distributions. It is shown that inappropriate assumptions about state conditional distributions can lead to biased estimates of state transition probabilities. An alternative non-parametric model with a hidden state structure that overcomes this problem is described. It is shown that a two-state non-parametric model produces accurate estimates of both transition probabilities and the state conditional distributions. The non-parametric model can be used directly or as a technique for identifying appropriate state conditional distributions to apply when fitting a parametric HMM. The non-parametric model is fitted to data from ten rainfall stations and four streamflow gauging stations at varying distances inland from the Pacific coast of Australia. Evidence for hydrological persistence, though not mathematical persistence, was identified in both rainfall and streamflow records, with the latter showing hidden states with longer sojourn times. Persistence appears to increase with distance from the coast.

Keywords: Hidden Markov models, non-parametric, two-state model, climate states, persistence, probability distributions
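For readers unfamiliar with sojourn times: in a two-state Markov chain, the number of consecutive steps spent in a state is geometrically distributed, so its mean is 1 / (1 - p_stay), where p_stay is the probability of remaining in that state. The short sketch below computes this and the long-run share of time in each state. The function names and the transition probabilities in the usage lines are hypothetical and are not taken from the paper.

```python
def expected_sojourn(p_stay):
    """Mean number of consecutive steps spent in a state with self-transition p_stay."""
    if not 0.0 <= p_stay < 1.0:
        raise ValueError("p_stay must be in [0, 1)")
    return 1.0 / (1.0 - p_stay)


def stationary_two_state(p_dry_to_wet, p_wet_to_dry):
    """Long-run fractions of time (dry, wet) for a two-state chain."""
    total = p_dry_to_wet + p_wet_to_dry
    if total <= 0.0:
        raise ValueError("at least one transition probability must be positive")
    return (p_wet_to_dry / total, p_dry_to_wet / total)


# Hypothetical transition probabilities, for illustration only
print(expected_sojourn(0.9))
print(stationary_two_state(p_dry_to_wet=0.1, p_wet_to_dry=0.2))
```

Under this reading, a record with longer sojourn times corresponds to self-transition probabilities closer to one.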


2017, Vol 23 (4)
Author(s): Abdelaziz Nasroallah, Karima Elkimakh

Abstract. One of the most used variants of hidden Markov models (HMMs) is the standard case where the time is discrete and the state spaces (hidden and observed spaces) are finite. In this framework, we are interested in HMMs whose emission process results from a combination of independent Markov chains. Principally, we assume that the emission process evolves as follows: given a hidden state realization

