An improved hidden Markov model for the characterization of homozygous-by-descent segments in individual genomes

2021
Author(s): Tom Druet, Mathieu Gautier

Inbreeding results from the mating of related individuals and has negative consequences because it brings together deleterious variants in one individual. Genomic estimates of inbreeding coefficients are preferred to pedigree-based estimators because they measure the realized inbreeding levels and are more robust to pedigree errors. Several methods that identify homozygous-by-descent (HBD) segments with hidden Markov models (HMM) have recently been developed and are particularly valuable when the information is degraded or heterogeneous (e.g., low-fold sequencing, low marker density, heterogeneous genotype quality or variable marker spacing). We previously developed a multiple-HBD-class HMM in which HBD segments are classified into different groups based on their length (e.g., recent versus old HBD segments), but we recently observed that for high inbreeding levels with many HBD segments, the estimated contributions might be biased towards more recent classes (i.e., those associated with long HBD segments), although the overall estimated level of inbreeding remained unbiased. We herein propose an updated multiple-HBD-class model in which the HBD classification is modeled in successive nested levels. Each level has a distinct rate that specifies the expected length of HBD segments and is directly related to the number of generations to the ancestors. The non-HBD classes are now modeled as a mixture of HBD segments from later generations and shorter non-HBD segments (i.e., both with higher rates). The updated model had better statistical properties and performed better on simulated data than our previous version. We also show that the parameters of the model are easier to interpret and that the model is more robust to the choice of the number of classes. Overall, the new model results in an improved partitioning of inbreeding into different HBD classes and should be preferred in applications relying on the length of estimated HBD segments.
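
The link between a class rate and segment length can be illustrated with a small calculation. This is a minimal sketch under assumptions the abstract does not state: that HBD segment lengths in a class are exponentially distributed with the class rate expressed per Morgan, and that the rate is roughly twice the number of generations to the common ancestor. The function names are hypothetical.

```python
def expected_hbd_length_cm(rate):
    """Expected HBD segment length in centimorgans for a class with this rate.

    Assumption: lengths are exponentially distributed with `rate` per Morgan,
    so the mean is 1 / rate Morgans, i.e. 100 / rate cM.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 100.0 / rate


def approx_generations_to_ancestor(rate):
    """Rough age of a class in generations, assuming rate ~ 2 * generations."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return rate / 2.0


if __name__ == "__main__":
    # Illustrative rates only; they are not taken from the paper.
    for rate in (4, 16, 64, 256):
        print(rate, expected_hbd_length_cm(rate), approx_generations_to_ancestor(rate))
```

Under these assumptions, higher rates correspond to shorter segments and more distant ancestors, which matches the abstract's description of recent classes as those with long segments.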

2018, Vol 30 (1), pp. 216-236
Author(s): Rasmus Troelsgaard, Lars Kai Hansen

Model-based classification of sequence data using a set of hidden Markov models is a well-known technique. The involved score function, which is often based on the class-conditional likelihood, can, however, be computationally demanding, especially for long data sequences. Inspired by recent theoretical advances in spectral learning of hidden Markov models, we propose a score function based on third-order moments. In particular, we propose to use the Kullback-Leibler divergence between theoretical and empirical third-order moments for classification of sequence data with discrete observations. The proposed method provides lower computational complexity at classification time than the usual likelihood-based methods. In order to demonstrate the properties of the proposed method, we perform classification of both simulated data and empirical data from a human activity recognition study.
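
The idea of scoring a sequence by comparing third-order moments can be sketched as follows. This is an illustration only, not the authors' implementation: it assumes the third-order moment is the joint distribution of three consecutive discrete observations, that `pi` is the state distribution at the start of a triple, and that the score is KL(empirical || theoretical); the abstract does not specify these details. All function names are hypothetical.

```python
import math
from itertools import product


def theoretical_triples(pi, A, B):
    """P(x1, x2, x3) for three consecutive observations under an HMM.

    pi[i]: state distribution at the first position, A[i][j]: transition
    probabilities, B[i][x]: emission probabilities.
    """
    n, m = len(pi), len(B[0])
    probs = {}
    for x1, x2, x3 in product(range(m), repeat=3):
        total = 0.0
        for i, j, k in product(range(n), repeat=3):
            total += pi[i] * B[i][x1] * A[i][j] * B[j][x2] * A[j][k] * B[k][x3]
        probs[(x1, x2, x3)] = total
    return probs


def empirical_triples(obs, m):
    """Relative frequencies of consecutive observation triples in `obs`."""
    if len(obs) < 3:
        raise ValueError("need at least three observations")
    counts = {t: 0 for t in product(range(m), repeat=3)}
    for t in zip(obs, obs[1:], obs[2:]):
        counts[t] += 1
    total = len(obs) - 2
    return {t: c / total for t, c in counts.items()}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over a shared key set; q is floored at eps to avoid log(0)."""
    return sum(pv * math.log(pv / max(q[t], eps)) for t, pv in p.items() if pv > 0)


def classify(obs, models, m):
    """Return the name of the model whose theoretical triples are closest."""
    emp = empirical_triples(obs, m)
    scores = {name: kl_divergence(emp, theoretical_triples(*params))
              for name, params in models.items()}
    return min(scores, key=scores.get)
```

In practice the theoretical moments depend only on the model, so they could be computed once per class and cached; classification then costs one pass over the sequence to count triples, independent of any likelihood recursion.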


2020
Author(s): Malte Lüken, Šimon Kucharský, Ingmar Visser

Eye-tracking allows researchers to infer cognitive processes from eye movements that are classified into distinct events. Parsing the events is typically done by algorithms. Previous algorithms have successfully used hidden Markov models (HMMs) for classification but can still be improved in several respects. To address these, we developed gazeHMM, an algorithm that uses an HMM as a generative model, has no critical parameters to be set by users, and does not require human-coded data as input. The algorithm classifies gaze data into fixations, saccades, and optionally postsaccadic oscillations and smooth pursuits. We evaluated gazeHMM’s performance in a simulation study, showing that it successfully recovered HMM parameters and hidden states. Parameters were less well recovered when we included a smooth pursuit state and/or added even small amounts of noise to the simulated data. We applied generative models with different numbers of events to benchmark data. Comparing them indicated that HMMs with more events than expected had most likely generated the data. We also applied the full algorithm to benchmark data and assessed its similarity to human coding. For static stimuli, gazeHMM showed high similarity and outperformed other algorithms in this regard. For dynamic stimuli, gazeHMM tended to switch rapidly between fixations and smooth pursuits but still displayed higher similarity than other algorithms. Concluding that gazeHMM can be used in practice, we recommend parsing smooth pursuits only for exploratory purposes. Future HMM algorithms could use covariates to better capture eye movement processes and explicitly model event durations to classify smooth pursuits more accurately.


2013
Author(s): Olivier Gimenez, Laetitia Blanc, Aurélien Besnard, Roger Pradel, Paul Doherty, ...

1. Occupancy – the proportion of area occupied by a species – is a key notion for addressing important questions in ecology, biogeography and conservation biology. Occupancy models allow estimation of, and inference about, species occurrence while accounting for false absences (i.e., imperfect species detection).
2. Most occupancy models can be formulated as hidden Markov models (HMM) in which the state process captures the Markovian dynamics of the actual but latent states, while the observation process consists of observations made from these underlying states.
3. We show how occupancy models can be implemented in program E-SURGE, which was initially developed to analyse capture-recapture data in the HMM framework. Replacing individuals by sites gives the user access to several features of E-SURGE that are either not available together or not available at all in standard occupancy software: i) user-friendly model specification through a SAS/R-like syntax without having to write custom code, ii) decomposition of the observation and state processes into several steps to provide flexible parameterisation, iii) up-to-date diagnostics of model identifiability and iv) advanced numerical algorithms to produce fast and reliable results (including site random effects).
4. To illustrate E-SURGE features, we provide simulated data and details of the implementation for the analysis of several occupancy models. These detailed examples are gathered in a companion wiki platform: http://occupancyinesurge.wikidot.com/.
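
The HMM formulation in point 2 can be made concrete with a minimal single-season sketch. It is not E-SURGE syntax and is not taken from the paper: it assumes two latent states (unoccupied, occupied) that do not change within the season, a constant occupancy probability `psi`, a constant per-visit detection probability `p`, and no false positives. The function name is hypothetical.

```python
def occupancy_likelihood(history, psi, p):
    """Likelihood of one site's detection history (1 = detected, 0 = not).

    Computed with the HMM forward recursion over two latent states:
    state 0 = unoccupied, state 1 = occupied.
    """
    if not history:
        raise ValueError("history must contain at least one visit")
    init = [1.0 - psi, psi]                 # initial state distribution
    trans = [[1.0, 0.0], [0.0, 1.0]]        # closed season: states never change
    emit = [[1.0, 0.0], [1.0 - p, p]]       # emit[state][observation]

    alpha = [init[s] * emit[s][history[0]] for s in range(2)]
    for y in history[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(2)) * emit[s][y]
                 for s in range(2)]
    return sum(alpha)


if __name__ == "__main__":
    # A site never detected in three visits is either occupied and missed
    # three times, or truly unoccupied.
    print(occupancy_likelihood([0, 0, 0], psi=0.6, p=0.3))
```

Replacing the identity transition matrix with colonisation and extinction probabilities would give a dynamic (multi-season) variant within the same recursion, which is the flexibility the HMM view provides.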


2011, Vol 23 (5), pp. 1071-1132
Author(s): Sean Escola, Alfredo Fontanini, Don Katz, Liam Paninski

Given recent experimental results suggesting that neural circuits may evolve through multiple firing states, we develop a framework for estimating state-dependent neural response properties from spike train data. We modify the traditional hidden Markov model (HMM) framework to incorporate stimulus-driven, non-Poisson point-process observations. For maximal flexibility, we allow external, time-varying stimuli and the neurons’ own spike histories to drive both the spiking behavior in each state and the transitioning behavior between states. We employ an appropriately modified expectation-maximization algorithm to estimate the model parameters. The expectation step is solved by the standard forward-backward algorithm for HMMs. The maximization step reduces to a set of separable concave optimization problems if the model is restricted slightly. We first test our algorithm on simulated data and are able to fully recover the parameters used to generate the data and accurately recapitulate the sequence of hidden states. We then apply our algorithm to a recently published data set in which the observed neuronal ensembles displayed multistate behavior and show that inclusion of spike history information significantly improves the fit of the model. Additionally, we show that a simple reformulation of the state space of the underlying Markov chain allows us to implement a hybrid half-multistate, half-histogram model that may be more appropriate for capturing the complexity of certain data sets than either a simple HMM or a simple peristimulus time histogram model alone.
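
For reference, the standard forward-backward recursion used in the expectation step can be sketched for a plain HMM. This is a simplified illustration with discrete observations and fixed transition probabilities, not the stimulus-driven point-process model described in the abstract; the function name and argument layout are assumptions.

```python
import math


def forward_backward(pi, A, B, obs):
    """Posterior state probabilities and log-likelihood for a discrete HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][o] : probability of emitting symbol o in state i
    obs     : non-empty list of observed symbol indices
    """
    n, T = len(pi), len(obs)

    # Forward pass, normalised at each step to avoid numerical underflow.
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [pi[i] * B[i][obs[0]] for i in range(n)]
        else:
            row = [sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                   for j in range(n)]
        c = sum(row)
        scale.append(c)
        alpha.append([x / c for x in row])

    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                       for j in range(n)) / scale[t + 1]
                   for i in range(n)]

    # With this scaling, alpha[t][i] * beta[t][i] is the posterior P(state_t = i | obs).
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    log_likelihood = sum(math.log(c) for c in scale)
    return gamma, log_likelihood
```

The posteriors in `gamma` are what a maximization step would use as weights; in the model described above, that step is a set of concave optimization problems rather than the closed-form updates of a plain HMM.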


2020, Vol 43 (1), pp. 71-82
Author(s): Sebastian George, Ambily Jose

Hidden Markov Models (HMMs) are well suited to explaining serial dependency in time series count data. These models assume that the observations are generated from a finite mixture of distributions governed by a Markov chain (MC). The Poisson Hidden Markov Model (P-HMM) may be the most widely used method for modelling such data. However, in real-life scenarios, this model is not always the best choice. Taking this into account, in this paper we use the Generalised Poisson Distribution (GPD) for modelling count data. This distribution can accommodate the overdispersion and underdispersion that the Poisson model cannot. Here, we develop the Generalised Poisson Hidden Markov Model (GP-HMM) by combining the GPD with an HMM for modelling such data. The results of a study on simulated data and an application to real data, monthly cases of Leptospirosis in the state of Kerala in South India, show good convergence properties and indicate that the GP-HMM is a better method than the P-HMM.
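
How the GPD relaxes the Poisson assumption can be seen from its probability mass function. The sketch below assumes Consul's parameterisation, P(X = x) = theta * (theta + lam * x)^(x - 1) * exp(-theta - lam * x) / x!, which the abstract does not spell out; the function and parameter names are hypothetical.

```python
import math


def gpd_pmf(x, theta, lam):
    """Generalised Poisson probability mass at count x (Consul's form, assumed).

    theta > 0 is the rate-like parameter and lam the dispersion parameter:
    lam = 0 gives the ordinary Poisson, 0 < lam < 1 gives overdispersion,
    and lam < 0 gives underdispersion (with the support truncated where
    theta + lam * x would become non-positive).
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    if x < 0:
        return 0.0
    base = theta + lam * x
    if base <= 0:
        return 0.0  # outside the truncated support for negative lam
    log_p = math.log(theta) + (x - 1) * math.log(base) - base - math.lgamma(x + 1)
    return math.exp(log_p)


def gpd_mean_variance(theta, lam):
    """Mean and variance for 0 <= lam < 1: theta/(1-lam) and theta/(1-lam)**3."""
    return theta / (1.0 - lam), theta / (1.0 - lam) ** 3
```

For 0 <= lam < 1 the variance-to-mean ratio is 1 / (1 - lam)^2, so a single extra parameter lets each hidden state's emission distribution be more dispersed than a Poisson with the same mean.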

