Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression

Abstract

By combining Haar wavelets with Bayesian Hidden Markov Models, we improve detection of genomic copy number variants (CNV) in array CGH experiments compared to the state of the art, including standard Gibbs sampling. At the same time, we achieve drastically reduced running times, as the method concentrates computational effort on chromosomal segments which are difficult to call, by dynamically and adaptively recomputing consecutive blocks of observations likely to share a copy number. This makes routine diagnostic use and re-analysis of legacy data collections feasible; to this end, we also propose an effective automatic prior. An open source software implementation of our method is available at http://bioinformatics.rutgers.edu/Software/HaMMLET/. The web supplement is at http://bioinformatics.rutgers.edu/Supplements/HaMMLET/.

Author Summary

Identifying large-scale genome deletions and duplications, or copy number variants (CNV), accurately in populations or individual patients is a crucial step in indicating disease factors or diagnosing an individual patient's disease type. Hidden Markov Models (HMM) are a type of statistical model widely used for CNV detection, as well as other biological applications such as the analysis of gene expression time course data or the analysis of discrete-valued DNA and protein sequences.

As with many statistical models, there are two fundamentally different inference approaches. In the frequentist framework, a single estimate of the model parameters is used as the basis for subsequent inference, making the identification of CNV dependent on the quality of that estimate. This is an acute problem for HMM, as methods for finding globally optimal parameters are not known. Alternatively, one can use a Bayesian approach and integrate over all possible parameter choices. While the latter is known to lead to significantly better results, its much larger computational effort, up to hundreds of times greater, has so far prevented wide adoption.

Our proposed method addresses this by combining Haar wavelets and HMM. We greatly accelerate fully Bayesian HMMs, while simultaneously improving convergence and thus the accuracy of the Gibbs sampler used for Bayesian computations, leading to substantial improvements over the state of the art.
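The idea of grouping consecutive observations that likely share a copy number can be illustrated with a small sketch. This is not the HaMMLET implementation: the function names, the fixed `threshold` parameter, and the choice of per-block summaries (count, sum, sum of squares) are assumptions made for illustration, whereas the abstract describes blocks that are recomputed dynamically during sampling. The sketch computes orthonormal Haar detail coefficients and places a block boundary wherever a large coefficient has a discontinuity.

```python
import math


def haar_boundary_weights(y):
    """For each position t, return the largest absolute Haar detail
    coefficient of any wavelet that has a discontinuity at t
    (the start, middle, or end of its support)."""
    n = len(y)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two length")
    # Prefix sums give any segment sum in constant time.
    prefix = [0.0]
    for value in y:
        prefix.append(prefix[-1] + value)
    weights = [0.0] * (n + 1)
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            mid, end = start + half, start + size
            left = prefix[mid] - prefix[start]
            right = prefix[end] - prefix[mid]
            # Orthonormal Haar detail coefficient for this support.
            coeff = abs(left - right) / math.sqrt(size)
            for pos in (start, mid, end):
                weights[pos] = max(weights[pos], coeff)
        size *= 2
    return weights


def compress_into_blocks(y, threshold):
    """Cut the signal where the boundary weight exceeds the (hypothetical)
    threshold and summarise each block by (count, sum, sum of squares)."""
    weights = haar_boundary_weights(y)
    n = len(y)
    cuts = [0] + [t for t in range(1, n) if weights[t] > threshold] + [n]
    blocks = []
    for a, b in zip(cuts, cuts[1:]):
        segment = y[a:b]
        blocks.append((len(segment), sum(segment), sum(v * v for v in segment)))
    return blocks


if __name__ == "__main__":
    signal = [0.0] * 4 + [2.0] * 4  # one jump in the middle
    print(compress_into_blocks(signal, threshold=1.0))
```

A sampler working on such blocks would need only the per-block summaries rather than every observation, which is one plausible reading of where the reported speed-up comes from; a lower threshold yields more, smaller blocks.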

Bayesian inference and state number determination for hidden Markov models: an application to the information content of the yield curve about inflation

Journal of Econometrics ◽ 10.1016/j.jeconom.2003.12.010 ◽ 2004 ◽ Vol 123 (2) ◽ pp. 327-344 ◽ Cited By ~ 19

Author(s): Nicolas Chopin ◽ Florian Pelgrin

Keyword(s): Bayesian Inference ◽ Hidden Markov Models ◽ Information Content ◽ Markov Models ◽ Yield Curve ◽ Hidden Markov

Innovative Machine Learning Methodologies (Καινοτόμες μεθοδολογίες μηχανικής μαθήσεως)

10.12681/eadd/17396 ◽ 2008

Author(s): Σωτήριος Χατζής

Keyword(s): Bayesian Inference ◽ Gaussian Process ◽ Hidden Markov Models ◽ Fuzzy Clustering ◽ Markov Models ◽ Hidden Markov ◽ Process Models ◽ Variational Bayesian Inference ◽ Gaussian Process Models ◽ Statistical Clustering

The aim of this doctoral thesis was the thorough study of machine learning methodologies and the charting of new directions in the field, through the introduction of original methodologies and innovative, revolutionary views of pattern recognition. Great emphasis was placed on variational Bayesian inference techniques, which in the author's opinion represent the future of pattern recognition methodologies based on statistical clustering approaches, with the contribution of an original model for robust pattern recognition in multidimensional data, as well as on fuzzy clustering methodologies. The largest and most important contribution of the thesis lies in this latter area: it introduces a new view of what fuzzy clustering is, in the sense of which machine learning tasks can be carried out with it, under which the FCM algorithm emerges as an advantageous alternative to the EM algorithm (and other statistical clustering approaches) for training many forms of probabilistic generative models. In addition, this work provided a novel hidden Markov model algorithm, offering considerable advantages over current techniques across a very wide range of applications, and, finally, a new speaker identification method based on Gaussian process models.

Variational Inference for Coupled Hidden Markov Models Applied to the Joint Detection of Copy Number Variations

The International Journal of Biostatistics ◽ 10.1515/ijb-2018-0023 ◽ 2019 ◽ Vol 15 (1) ◽ Cited By ~ 1

Author(s): Xiaoqiang Wang ◽ Emilie Lebarbier ◽ Julie Aubert ◽ Stéphane Robin

Keyword(s): Hidden Markov Models ◽ Copy Number ◽ Markov Models ◽ Hidden Markov ◽ R Package ◽ Copy Number Variations ◽ Structural Variations ◽ Statistical Framework ◽ Coupled Hidden Markov Models ◽ Hidden Markov Process

Abstract

Hidden Markov models provide a natural statistical framework for the detection of copy number variations (CNV) in genomics. In this context, we define a hidden Markov process that underlies all individuals jointly in order to detect genomic regions and classify them into different states (typically deletion, normal or amplification). Structural variations from different individuals may be dependent. This is the case in agronomy, where varietal selection programs exist and species share a common phylogenetic past. We propose to take these dependencies into account in the HMM. When dealing with a large number of series, maximum likelihood inference (classically performed using the EM algorithm) becomes intractable. We thus propose an approximate inference algorithm based on a variational approach (VEM), implemented in the CHMM R package. A simulation study is performed to assess the performance of the proposed method, and an application to the detection of structural variations in plant genomes is presented.
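To make the three-state setting concrete, here is a minimal single-series sketch. It does not reproduce the coupled model or the VEM algorithm of the CHMM package: it decodes one series with the standard Viterbi algorithm, and the Gaussian emission means, shared standard deviation, and `stay` probability are hypothetical values chosen purely for illustration.

```python
import math

STATES = ("deletion", "normal", "amplification")


def log_gauss(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs, means=(-1.0, 0.0, 1.0), sd=0.2, stay=0.99):
    """Most likely state path for one series of log-ratio observations.

    Assumes a uniform initial distribution and a transition matrix with
    probability `stay` on the diagonal, the rest split evenly."""
    k = len(means)
    switch = (1.0 - stay) / (k - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(k)]
                 for i in range(k)]
    score = [math.log(1.0 / k) + log_gauss(obs[0], m, sd) for m in means]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for x in obs[1:]:
        prev, score, pointers = score, [], []
        for j in range(k):
            best = max(range(k), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j] + log_gauss(x, means[j], sd))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(k), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


if __name__ == "__main__":
    series = [0.0, 0.1, -0.1, -1.0, -0.9, -1.1, 0.0, 0.1]
    print(viterbi(series))
```

In the coupled setting described in the abstract, the hidden states of different individuals would no longer be decoded independently as they are here, which is what motivates the variational approximation.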
