Complex ecological phenotypes on phylogenetic trees: a hidden Markov model for comparative analysis of multivariate count data

bioRxiv ◽

10.1101/640334 ◽

2019 ◽

Cited By ~ 1

Author(s):

Michael C. Grundler ◽

Daniel L. Rabosky

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Resource Use ◽

Count Data ◽

Phylogenetic Trees ◽

Hidden Markov ◽

Categorical Variables ◽

Individual Species ◽

Ecological Traits ◽

Sampling Variation

ABSTRACT The evolutionary dynamics of complex ecological traits – including multistate representations of diet, habitat, and behavior – remain poorly understood. Reconstructing the tempo, mode, and historical sequence of transitions involving such traits poses many challenges for comparative biologists, owing to their multidimensional nature and intraspecific variability. Continuous-time Markov chains (CTMC) are commonly used to model ecological niche evolution on phylogenetic trees but are limited by the assumption that taxa are monomorphic and that states are univariate categorical variables. Thus, a necessary first step when using standard CTMC models is to categorize species into a predetermined number of ecological states. This approach potentially confounds interpretation of state assignments with effects of sampling variation because it does not directly incorporate empirical observations of resource use into the statistical inference model. The neglect of sampling variation, along with univariate representations of true multivariate phenotypes, potentially leads to the distortion and loss of information, with substantial implications for downstream macroevolutionary analyses. In this study, we develop a hidden Markov model using a Dirichlet-multinomial framework to model resource use evolution on phylogenetic trees. Unlike existing CTMC implementations, states are unobserved probability distributions from which observed data are sampled. Our approach is expressly designed to model ecological traits that are intraspecifically variable and to account for uncertainty in state assignments of terminal taxa arising from effects of sampling variation. The method uses multivariate count data for individual species to simultaneously infer the number of ecological states, the proportional utilization of different resources by different states, and the phylogenetic distribution of ecological states among living species and their ancestors.
The method is general and may be applied to any data expressible as a set of observational counts from different categories.
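The abstract describes hidden states as probability distributions from which each species' counts are sampled. It does not give formulas. One plausible reading is that each state has its own Dirichlet-multinomial distribution over resource categories. The sketch below computes the standard Dirichlet-multinomial log-pmf and, for a single species, the posterior probability of each state. The state names ("generalist", "specialist"), their concentration vectors, and the species counts are hypothetical. Nothing here is taken from the authors' software.

```python
from math import exp, lgamma, log


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial(alpha)."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient n! / prod(x_k!)
    log_coef = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Gamma(a0) / Gamma(n + a0) * prod_k Gamma(x_k + a_k) / Gamma(a_k)
    log_ratio = lgamma(a0) - lgamma(n + a0)
    log_ratio += sum(lgamma(x + a) - lgamma(a) for x, a in zip(counts, alpha))
    return log_coef + log_ratio


def state_posterior(counts, state_alphas, prior=None):
    """Posterior probability that a species' counts came from each hidden state."""
    k = len(state_alphas)
    prior = prior if prior is not None else [1.0 / k] * k
    logs = [log(p) + dirichlet_multinomial_logpmf(counts, a)
            for p, a in zip(prior, state_alphas)]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    weights = [exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical states over three resource categories (e.g. three prey types).
generalist = [2.0, 2.0, 2.0]
specialist = [20.0, 1.0, 1.0]
species_counts = [12, 1, 0]  # hypothetical diet records for one species
print(state_posterior(species_counts, [generalist, specialist]))
```

With only 13 observations, the posterior remains a probability rather than a hard label. This is the kind of sampling uncertainty the abstract says pre-categorization ignores.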

Complex Ecological Phenotypes on Phylogenetic Trees: A Markov Process Model for Comparative Analysis of Multivariate Count Data

Systematic Biology ◽

10.1093/sysbio/syaa031 ◽

2020 ◽

Vol 69 (6) ◽

pp. 1200-1211

Author(s):

Michael Grundler ◽

Daniel L Rabosky

Keyword(s):

Count Data ◽

Ecological Niche ◽

Phylogenetic Trees ◽

Process Model ◽

Evolutionary Dynamics ◽

Categorical Variables ◽

Individual Species ◽

Niche Evolution ◽

Ecological Traits ◽

Sampling Variation

Abstract The evolutionary dynamics of complex ecological traits—including multistate representations of diet, habitat, and behavior—remain poorly understood. Reconstructing the tempo, mode, and historical sequence of transitions involving such traits poses many challenges for comparative biologists, owing to their multidimensional nature. Continuous-time Markov chains are commonly used to model ecological niche evolution on phylogenetic trees but are limited by the assumption that taxa are monomorphic and that states are univariate categorical variables. A necessary first step in the analysis of many complex traits is therefore to categorize species into a predetermined number of univariate ecological states, but this procedure can lead to distortion and loss of information. This approach also confounds interpretation of state assignments with effects of sampling variation because it does not directly incorporate empirical observations for individual species into the statistical inference model. In this study, we develop a Dirichlet-multinomial framework to model resource use evolution on phylogenetic trees. Our approach is expressly designed to model ecological traits that are multidimensional and to account for uncertainty in state assignments of terminal taxa arising from effects of sampling variation. The method uses multivariate count data across a set of discrete resource categories sampled for individual species to simultaneously infer the number of ecological states, the proportional utilization of different resources by different states, and the phylogenetic distribution of ecological states among living species and their ancestors. The method is general and may be applied to any data expressible as a set of observational counts from different categories. [Comparative methods; Dirichlet multinomial; ecological niche evolution; macroevolution; Markov model.]
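Inferring "the phylogenetic distribution of ecological states among living species and their ancestors" requires combining per-species state likelihoods along the tree. A standard way to do this is Felsenstein's pruning algorithm. The sketch below applies pruning with a fixed number of hidden states k under several assumptions that the abstract does not specify:

- State changes follow an equal-rates Mk model.
- The root state has a uniform prior.
- The tip vectors are given directly. They are hypothetical values standing in for P(species counts | state s), which could come from a Dirichlet-multinomial.

The published method also infers k itself, which this sketch does not attempt.

```python
from math import exp


def mk_transition(k, rate, t):
    """Equal-rates Mk transition probabilities P(t) for k hidden states."""
    decay = exp(-k * rate * t)
    same = 1.0 / k + (k - 1) / k * decay
    diff = (1.0 - decay) / k
    return [[same if i == j else diff for j in range(k)] for i in range(k)]


def conditional_likelihood(node, tip_lik, k, rate):
    """Felsenstein pruning: L[s] = P(data below node | node in state s).

    A node is (name, [(child, branch_length), ...]); a tip has no children.
    """
    name, children = node
    if not children:
        return list(tip_lik[name])
    result = [1.0] * k
    for child, branch_length in children:
        child_lik = conditional_likelihood(child, tip_lik, k, rate)
        p = mk_transition(k, rate, branch_length)
        for s in range(k):
            result[s] *= sum(p[s][j] * child_lik[j] for j in range(k))
    return result


def tree_likelihood(root, tip_lik, k, rate):
    """Tree likelihood under a uniform root prior, plus the root-state posterior."""
    root_lik = conditional_likelihood(root, tip_lik, k, rate)
    total = sum(v / k for v in root_lik)
    return total, [(v / k) / total for v in root_lik]


# Hypothetical tree ((A:1.0,B:1.0):0.5,C:1.5) and per-state tip likelihoods.
tree = ("root", [(("AB", [(("A", []), 1.0), (("B", []), 1.0)]), 0.5),
                 (("C", []), 1.5)])
tips = {"A": [0.30, 0.01], "B": [0.25, 0.02], "C": [0.02, 0.40]}
lik, root_post = tree_likelihood(tree, tips, k=2, rate=0.3)
print(lik, root_post)
```

For large trees, real implementations usually rescale or work in log space to avoid underflow. This toy version omits that step.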

Hidden Markov model in multiple testing on dependent count data

Journal of Statistical Computation and Simulation ◽

10.1080/00949655.2019.1710507 ◽

2020 ◽

Vol 90 (5) ◽

pp. 889-906

Author(s):

Weizhe Su ◽

Xia Wang

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Count Data ◽

Multiple Testing ◽

Hidden Markov

ProbPFP: a multiple sequence alignment algorithm combining hidden Markov model optimized by particle swarm optimization with partition function

BMC Bioinformatics ◽

10.1186/s12859-019-3132-7 ◽

2019 ◽

Vol 20 (S18) ◽

Cited By ~ 1

Author(s):

Qing Zhan ◽

Nan Wang ◽

Shuilin Jin ◽

Renjie Tan ◽

Qinghua Jiang ◽

...

Keyword(s):

Partition Function ◽

Markov Model ◽

Hidden Markov Model ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Phylogenetic Trees ◽

Hidden Markov ◽

Particle Swarm ◽

Alignment Algorithm ◽

Multiple Sequence

Abstract Background Multiple sequence alignment (MSA) procedures rely on the substitution scores of pairwise alignments. To compute adaptive alignment scores, researchers usually use hidden Markov models (HMMs) or probabilistic consistency methods such as the partition function. Recent studies show that optimizing the parameters of the HMM, as well as integrating the HMM with the partition function, can raise alignment accuracy. However, these studies did not consider combining the partition function with an optimized HMM, which could further improve alignment accuracy. Results A novel MSA algorithm, ProbPFP, is presented in this paper. It integrates an HMM optimized by particle swarm optimization (PSO) with the partition function. The PSO algorithm was applied to optimize the HMM's parameters. The posterior probabilities obtained by the HMM were then combined with those obtained by the partition function to calculate an integrated substitution score for alignment. To evaluate the effectiveness of ProbPFP, we compared it with 13 outstanding or classic MSA methods. The results demonstrate that ProbPFP alignments achieved the highest mean TC and mean SP scores on two benchmark datasets, SABmark and OXBench, and the second-highest mean TC and mean SP scores on the benchmark dataset BAliBASE. ProbPFP was also compared with 4 other outstanding methods by reconstructing phylogenetic trees for six protein families extracted from the TreeFam database, based on the alignments obtained by these 5 methods. The results indicate that the reference trees are closer to the trees reconstructed from ProbPFP alignments than to those reconstructed from the alignments of the other methods. Conclusions We propose a new multiple sequence alignment method combining an optimized HMM and the partition function.
The results indicate that this method can substantially improve alignment accuracy.
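The abstract says the HMM posteriors and the partition-function posteriors are "combined" into an integrated substitution score. It does not give the combination rule or the decoding step. The sketch below makes two assumptions:

- The rule is a simple weighted average of the two residue-pair posterior matrices. The `weight=0.5` default is hypothetical.
- Alignment uses maximum expected accuracy (MEA) decoding without gap penalties, as in ProbCons-style aligners.

The PSO parameter search is not shown, and this is not ProbPFP's actual code.

```python
def combine_posteriors(post_hmm, post_pf, weight=0.5):
    """Weighted average of two residue-pair posterior matrices (hypothetical rule)."""
    return [[weight * a + (1.0 - weight) * b for a, b in zip(row_h, row_p)]
            for row_h, row_p in zip(post_hmm, post_pf)]


def mea_align(post):
    """Maximum expected accuracy alignment of two sequences.

    post[i][j] is the posterior probability that residue i of x is aligned
    to residue j of y. Returns the best total posterior and the aligned pairs.
    """
    n = len(post)
    m = len(post[0]) if n else 0
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + post[i - 1][j - 1],  # align i-1 with j-1
                           dp[i - 1][j],   # residue i-1 of x opposite a gap
                           dp[i][j - 1])   # residue j-1 of y opposite a gap
    # Trace back through the table to recover the aligned residue pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + post[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


post_hmm = [[0.9, 0.1], [0.1, 0.8]]  # hypothetical posteriors from an HMM
post_pf = [[0.7, 0.2], [0.0, 0.6]]   # hypothetical posteriors from a partition function
score, pairs = mea_align(combine_posteriors(post_hmm, post_pf))
print(score, pairs)
```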

Profile Comparer Extended: phylogeny of lytic polysaccharide monooxygenase families using profile hidden Markov model alignments

F1000Research ◽

10.12688/f1000research.21104.1 ◽

2019 ◽

Vol 8 ◽

pp. 1834 ◽

Cited By ~ 3

Author(s):

Gerben P. Voshol ◽

Peter J. Punt ◽

Erik Vijgenboom

Keyword(s):

Markov Model ◽

Phylogenetic Tree ◽

Hidden Markov Model ◽

Family Relationships ◽

Phylogenetic Trees ◽

Hidden Markov ◽

Lytic Polysaccharide Monooxygenase ◽

Profile Hidden Markov Model ◽

Polysaccharide Monooxygenase ◽

Intra Family

Insight into the inter- and intra-family relationships of protein families is important, since it can aid understanding of substrate specificity evolution and help assign putative functions to proteins of unknown function. To study both these inter- and intra-family relationships, the ability to build phylogenetic trees using the most sensitive sequence similarity search methods (e.g. profile hidden Markov model (pHMM)–pHMM alignments) is required. However, existing solutions require very long calculation times to obtain the phylogenetic tree. Therefore, a faster protocol is required to make this approach practical for research. To contribute to this goal, we extended the original Profile Comparer program (PRC) for the construction of large pHMM phylogenetic trees at speeds several orders of magnitude faster than pHMM-tree. As an example, PRC Extended (PRCx) was used to study the phylogeny of over 10,000 sequences of lytic polysaccharide monooxygenase (LPMO) from more than seven families. Using the newly developed program we were able to reveal previously unknown homologs of LPMOs, namely the PFAM Egh16-like family. Moreover, we show that substrate specificities have evolved independently several times within the LPMO superfamily. Furthermore, the LPMO phylogenetic tree does not seem to follow taxonomy-based classification.
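Building a tree from pairwise pHMM–pHMM comparisons usually means turning similarity scores into distances and then clustering. The abstract does not say which normalization or tree-building algorithm PRCx uses. The sketch below makes two hypothetical choices:

- Distances are computed as d = 1 - s_ij / sqrt(s_ii * s_jj).
- The tree is built with UPGMA, a simple clock-like method, and written out as a Newick string.

The input scores are made up for illustration. Real scores would come from PRC.

```python
from math import sqrt


def scores_to_distances(scores):
    """Hypothetical normalization of self-comparable similarity scores to distances."""
    n = len(scores)
    return [[0.0 if i == j else 1.0 - scores[i][j] / sqrt(scores[i][i] * scores[j][j])
             for j in range(n)] for i in range(n)]


def upgma(names, dist):
    """Build a rooted UPGMA tree and return it as a Newick string."""
    # cluster id -> (newick text, number of leaves, node height)
    clusters = {i: (names[i], 1, 0.0) for i in range(len(names))}
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        na, sa, ha = clusters.pop(a)
        nb, sb, hb = clusters.pop(b)
        h = dab / 2.0  # the new node sits halfway between the merged clusters
        newick = f"({na}:{h - ha:.3f},{nb}:{h - hb:.3f})"
        # Size-weighted average distance from the merged cluster to each other cluster.
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            d[(c, next_id)] = (sa * dac + sb * dbc) / (sa + sb)
        d = {key: v for key, v in d.items() if a not in key and b not in key}
        clusters[next_id] = (newick, sa + sb, h)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


scores = [[100.0, 80.0, 30.0], [80.0, 100.0, 30.0], [30.0, 30.0, 100.0]]
print(upgma(["LPMO_A", "LPMO_B", "LPMO_C"], scores_to_distances(scores)))
```

For large data sets, neighbor-joining is often preferred because it does not assume a molecular clock. UPGMA is used here only because it is short.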

Generalized Poisson Hidden Markov Model for Overdispersed or Underdispersed Count Data

Revista Colombiana de Estadística ◽

10.15446/rce.v43n1.77542 ◽

2020 ◽

Vol 43 (1) ◽

pp. 71-82

Author(s):

Sebastian George ◽

Ambily Jose

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Count Data ◽

Markov Models ◽

Hidden Markov ◽

Real Life ◽

Simulated Data ◽

Real Data ◽

Good Convergence ◽

Finite Mixture Of Distributions

The most suitable statistical method for explaining serial dependency in time series count data is that based on hidden Markov models (HMMs). These models assume that the observations are generated from a finite mixture of distributions, where the mixture component active at each time point is governed by an underlying Markov chain (MC). The Poisson hidden Markov model (P-HMM) may be the most widely used method for modelling such data. However, in real-life scenarios this model is not always the best choice. Taking this into account, in this paper we adopt the generalised Poisson distribution (GPD) for modelling count data, since it can accommodate both overdispersion and underdispersion relative to the Poisson model. Here, we develop the generalised Poisson hidden Markov model (GP-HMM) by combining the GPD with an HMM for modelling such data. The results on simulated data and an application to real data, monthly cases of leptospirosis in the state of Kerala in South India, show good convergence properties and indicate that the GP-HMM performs better than the P-HMM.
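The abstract does not state its GPD parameterization. The sketch below assumes Consul's form:

P(X = x) = θ(θ + λx)^(x−1) e^(−(θ + λx)) / x!

- λ > 0 gives overdispersion, λ < 0 gives underdispersion, and λ = 0 recovers the Poisson distribution.
- For λ < 0, counts with θ + λx ≤ 0 fall outside the support.

The GP-HMM likelihood is computed with the standard scaled forward algorithm. The parameter names, the two-state setup, and the counts are hypothetical. The paper's estimation procedure is not reproduced here.

```python
from math import exp, lgamma, log


def gp_logpmf(x, theta, lam):
    """Log pmf of Consul's generalized Poisson distribution GP(theta, lam)."""
    rate = theta + lam * x
    if rate <= 0:  # outside the support when lam < 0 (underdispersion)
        return float("-inf")
    return log(theta) + (x - 1) * log(rate) - rate - lgamma(x + 1)


def gp_hmm_loglik(obs, init, trans, thetas, lams):
    """Scaled forward algorithm: log P(obs) under a GP-HMM."""
    k = len(init)
    log_lik = 0.0
    alpha = []
    for t, x in enumerate(obs):
        emit = [exp(gp_logpmf(x, thetas[s], lams[s])) for s in range(k)]
        if t == 0:
            alpha = [init[s] * emit[s] for s in range(k)]
        else:
            alpha = [emit[s] * sum(alpha[r] * trans[r][s] for r in range(k))
                     for s in range(k)]
        scale = sum(alpha)  # P(x_t | x_1, ..., x_{t-1})
        log_lik += log(scale)
        alpha = [a / scale for a in alpha]  # renormalize to avoid underflow
    return log_lik


counts = [3, 5, 2, 14, 18, 11, 4, 2]  # hypothetical monthly case counts
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
print(gp_hmm_loglik(counts, init, trans, thetas=[2.0, 10.0], lams=[0.3, -0.1]))
```

Setting every λ to 0 turns this into a P-HMM likelihood. The two models can then be compared on the same data.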

Traffic Analysis Of Anonymity Protocol Using Hidden Markov Model (HMM)-based Model Confidence In The Media

10.33811/1847-003-013-010 ◽

2018 ◽

pp. 187

Author(s):

قاسم عبود مهدى

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Traffic Analysis ◽

Model Confidence ◽

The Media

Multiple Sequence Alignment and Profile Analysis of Protein Family Using Hidden Markov Model

International Journal of Scientific Research ◽

10.15373/22778179/june2013/66 ◽

2012 ◽

Vol 2 (6) ◽

pp. 208-211

Author(s):

Navjot Kaur ◽

Rajbir Singh Cheema ◽

Harmandeep Singh

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Profile Analysis ◽

Hidden Markov ◽

Protein Family ◽

Multiple Sequence

Auscultating Diagnosis for Hemodialysis Shunt Stenosis using a Self-Organizing Map and Hidden Markov Model

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.132.1589 ◽

2012 ◽

Vol 132 (10) ◽

pp. 1589-1594 ◽

Cited By ~ 2

Author(s):

Hayato Waki ◽

Yutaka Suzuki ◽

Osamu Sakata ◽

Mizuya Fukasawa ◽

Hatsuhiro Kato

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Self Organizing Map ◽

Self Organizing

A Fast Voice Command Recognition Algorithm Based on the Hidden Markov Model Stationary Distribution

Vestnik MEI ◽

10.24160/1993-6982-2018-5-65-72 ◽

2018 ◽

Vol 5 (5) ◽

pp. 65-72

Author(s):

Pavel A. Paramonov ◽

Ivan V. Ognev

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Stationary Distribution ◽

Hidden Markov ◽

Recognition Algorithm ◽

Voice Command

Engaging Voluntary Contributions in Online Communities: A Hidden Markov Model

MIS Quarterly ◽

10.25300/misq/2018/14196 ◽

2018 ◽

Vol 42 (1) ◽

pp. 83-100 ◽

Cited By ~ 21

Author(s):

Wei Chen ◽

Xiahua Wei ◽

Kevin Xiaoguo Zhu ◽

...

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Online Communities ◽

Hidden Markov ◽

Voluntary Contributions