No-U-Turn sampling for phylogenetic trees

2021 ◽  
Author(s):  
Johannes Wahle

The inference of phylogenetic trees from sequence data has become a staple in evolutionary research. Bayesian inference of such trees is predominantly based on the Metropolis-Hastings algorithm. For high-dimensional and correlated data this algorithm is known to be inefficient. There are gradient-based algorithms to speed up such inference. Building on recent research which uses gradient-based approaches for the inference of phylogenetic trees in a Bayesian framework, I present an algorithm which is capable of performing No-U-Turn sampling for phylogenetic trees. As an extension to Hamiltonian Monte Carlo methods, No-U-Turn sampling comes with the same benefits, such as proposing distant new states with a high acceptance probability, but eliminates the need to manually tune hyperparameters. Evaluated on real data sets, the new sampler converges faster to the target distribution. The results also indicate that the new algorithm traverses a higher number of topologies during sampling than traditional Markov Chain Monte Carlo approaches. This new algorithm leads to a more efficient exploration of the posterior distribution of phylogenetic tree topologies.
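
The Hamiltonian Monte Carlo ingredients named in this abstract can be illustrated on a continuous toy problem. The sketch below is not the paper's tree-space sampler: it assumes a standard normal target, uses hypothetical names (`grad_log_p`, `leapfrog`, `is_u_turn`, `hamiltonian`), and only extends a trajectory forward from a fixed start, whereas full No-U-Turn sampling builds the trajectory in both directions by doubling.

```python
import numpy as np

def grad_log_p(theta):
    """Gradient of the log-density of a standard normal (toy target, an assumption)."""
    return -theta

def leapfrog(theta, r, eps, grad=grad_log_p):
    """One leapfrog step of size eps for position theta and momentum r."""
    r_half = r + 0.5 * eps * grad(theta)          # half step for momentum
    theta_new = theta + eps * r_half              # full step for position
    r_new = r_half + 0.5 * eps * grad(theta_new)  # second half step for momentum
    return theta_new, r_new

def is_u_turn(theta_minus, theta_plus, r_minus, r_plus):
    """True when either end of the trajectory starts moving back toward the other."""
    delta = theta_plus - theta_minus
    return bool(np.dot(delta, r_minus) < 0 or np.dot(delta, r_plus) < 0)

def hamiltonian(theta, r):
    """Potential plus kinetic energy for the standard normal toy target."""
    return 0.5 * np.dot(theta, theta) + 0.5 * np.dot(r, r)

# Extend a trajectory forward until the U-turn check fires (capped for safety).
theta0 = np.array([1.0, 0.0])
r0 = np.array([0.0, 1.0])
theta, r = theta0.copy(), r0.copy()
steps = 0
while not is_u_turn(theta0, theta, r0, r) and steps < 1000:
    theta, r = leapfrog(theta, r, eps=0.1)
    steps += 1
```

The stopping rule replaces a hand-tuned number of leapfrog steps, which is the hyperparameter the abstract says No-U-Turn sampling removes.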

2015 ◽  
Vol 2015 ◽  
pp. 1-13
Author(s):  
Jianwei Ding ◽  
Yingbo Liu ◽  
Li Zhang ◽  
Jianmin Wang

Condition monitoring systems are widely used to monitor the working condition of equipment, generating a vast amount and variety of telemetry data in the process. The main task of surveillance is to analyze these routinely collected telemetry data in order to assess the working condition of the equipment. However, with the rapid increase in the volume of telemetry data, it is a nontrivial task to analyze all the telemetry data to understand the working condition of the equipment without any a priori knowledge. In this paper, we propose a probabilistic generative model called the working condition model (WCM), which is capable of simulating how event sequence data are generated and depicting the working condition of equipment at runtime. With the help of WCM, we are able to analyze how the event sequence data behave in different working modes and to detect the working mode of an event sequence (working condition diagnosis). Furthermore, we have applied WCM to illustrative applications such as automated detection of anomalous event sequences during the runtime of equipment. Our experimental results on real data sets demonstrate the effectiveness of the model.


2006 ◽  
Vol 63 (3) ◽  
pp. 576-596 ◽  
Author(s):  
Jerome Pella ◽  
Michele Masuda

Although population mixtures often include contributions from novel populations as well as from baseline populations previously sampled, unlabeled mixture individuals can be separated to their sources from genetic data. A Gibbs and split–merge Markov chain Monte Carlo sampler is described for successively partitioning a genetic mixture sample into plausible subsets of individuals from each of the baseline and extra-baseline populations present. The subsets are selected to satisfy the Hardy–Weinberg and linkage equilibrium conditions expected for large, panmictic populations. The number of populations present can be inferred from the distribution for counts of subsets per partition drawn by the sampler. To further summarize the sampler's output, co-assignment probabilities of mixture individuals to the same subsets are computed from the partitions and are used to construct a binary tree of their relatedness. The tree graphically displays the clusters of mixture individuals together with a quantitative measure of the evidence supporting their various separate and common sources. The methodology is applied to several simulated and real data sets to illustrate its use and demonstrate the sampler's superior performance.


2013 ◽  
Vol 58 (2) ◽  
Author(s):  
Yanzhen Bu ◽  
Hongxing Niu ◽  
Luping Zhang

Seven species of Cylicocyclus Ihle, 1922 (Nematoda: Strongylidae) were collected from donkeys from Henan Province, China. Five samples of each species were selected for sequencing. Sixteen different internal transcribed spacer (ITS) sequences representing the seven species of Cylicocyclus were obtained. Sequence differences among species were lower in the first internal transcribed spacer (ITS-1) than in the second internal transcribed spacer (ITS-2). Phylogenetic analyses were conducted using the combined ITS-1 and ITS-2 data sets from the present study and using reference sequences from the GenBank database. The MP and ML trees were similar in topology. The phylogenetic trees were divided into two clades. Clade I included 8 species of Cylicocyclus; within this group, Cylicocyclus leptostomus (Kotlan, 1920) is nested between different samples of Cylicocyclus ashworthi (LeRoux, 1924), suggesting C. ashworthi may represent a species complex. Clade II included Cylicocyclus elongatus (Looss, 1900) and Cylicocyclus ultrajectinus (Ihle, 1920); however, these two species always clustered with the comparative species (Petrovinema poculatum (Looss, 1900) and Poteriostomum imparidentatum Quiel, 1919), suggesting that C. elongatus and C. ultrajectinus represent members of other genera.


2019 ◽  
Vol 42 (2) ◽  
pp. 225-243
Author(s):  
Emilio A. Coelho-Barros ◽  
Jorge A. Achcar ◽  
Edson Z. Martinez ◽  
Nasser Davarzani ◽  
Heike I. Grabsch

In this paper, we introduce a Bayesian approach for segmented Weibull distributions which could be a good alternative for analyzing medical survival data in the presence of censored observations and covariates. With the Bayesian estimated change-points we could get an excellent fit of the proposed model to any data sets. With the proposed methodology, it is also possible to identify survival time intervals where a covariate could have significantly different effects when compared to other lifetime intervals, an important point from a clinical view. The Bayesian estimates are obtained using standard Markov Chain Monte Carlo methods. Some examples with real data sets illustrate the proposed methodology and its potential clinical value.


2020 ◽  
Vol 7 (3) ◽  
pp. 191315
Author(s):  
Amani A. Alahmadi ◽  
Jennifer A. Flegg ◽  
Davis G. Cochrane ◽  
Christopher C. Drovandi ◽  
Jonathan M. Keith

The behaviour of many processes in science and engineering can be accurately described by dynamical system models consisting of a set of ordinary differential equations (ODEs). Often these models have several unknown parameters that are difficult to estimate from experimental data, in which case Bayesian inference can be a useful tool. In principle, exact Bayesian inference using Markov chain Monte Carlo (MCMC) techniques is possible; however, in practice, such methods may suffer from slow convergence and poor mixing. To address this problem, several approaches based on approximate Bayesian computation (ABC) have been introduced, including Markov chain Monte Carlo ABC (MCMC ABC) and sequential Monte Carlo ABC (SMC ABC). While the system of ODEs describes the underlying process that generates the data, the observed measurements invariably include errors. In this paper, we argue that several popular ABC approaches fail to adequately model these errors because the acceptance probability depends on the choice of the discrepancy function and the tolerance without any consideration of the error term. We observe that the so-called posterior distributions derived from such methods do not accurately reflect the epistemic uncertainties in parameter values. Moreover, we demonstrate that these methods provide minimal computational advantages over exact Bayesian methods when applied to two ODE epidemiological models with simulated data and one with real data concerning malaria transmission in Afghanistan.
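
The acceptance rule criticised in this abstract can be seen in a minimal rejection-ABC sketch. Everything specific here is an assumption for illustration and is not taken from the paper: a one-parameter decay ODE dy/dt = -k*y with y(0) = 1 (solved in closed form), a uniform prior on k, a Euclidean discrepancy, and the hypothetical names `simulate`, `discrepancy` and `abc_rejection`.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(k, t):
    """Closed-form solution of the toy ODE dy/dt = -k*y with y(0) = 1 (an assumption)."""
    return np.exp(-k * t)

# Synthetic observations: ODE solution plus Gaussian measurement error.
t = np.linspace(0.0, 5.0, 20)
k_true = 0.7
sigma = 0.05
y_obs = simulate(k_true, t) + rng.normal(0.0, sigma, size=t.size)

def discrepancy(y_sim, y_obs):
    """Euclidean distance between simulated and observed trajectories."""
    return float(np.linalg.norm(y_sim - y_obs))

def abc_rejection(y_obs, t, n_draws, tol, rng):
    """Keep prior draws whose noise-free simulation lies within tol of the data."""
    prior_draws = rng.uniform(0.0, 2.0, size=n_draws)  # hypothetical prior on k
    accepted = [k for k in prior_draws
                if discrepancy(simulate(k, t), y_obs) < tol]
    return np.array(accepted)

accepted = abc_rejection(y_obs, t, n_draws=5000, tol=0.5, rng=rng)
```

Note that `sigma` never enters `abc_rejection`: whether a draw is kept depends only on the discrepancy function and the tolerance, so the spread of `accepted` reflects the choice of `tol` rather than the measurement error model.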


Author(s):  
Uyen Mai ◽  
Siavash Mirarab

Phylogenetic trees inferred from sequence data often have branch lengths measured in the expected number of substitutions and therefore do not have divergence times estimated. These trees give an incomplete view of evolutionary histories since many applications of phylogenies require time trees. Many methods have been developed to convert the inferred branch lengths from substitution units to time units using calibration points, but none is universally accepted as they are challenged in both scalability and accuracy under complex models. Here, we introduce a new method that formulates dating as a nonconvex optimization problem where the variance of log-transformed rate multipliers is minimized across the tree. On simulated and real data, we show that our method, wLogDate, is often more accurate than alternatives and is more robust to various model assumptions.


2015 ◽  
Vol 112 (7) ◽  
pp. 2058-2063 ◽  
Author(s):  
Marc Hellmuth ◽  
Nicolas Wieseke ◽  
Marcus Lechner ◽  
Hans-Peter Lenhof ◽  
Martin Middendorf ◽  
...  

Phylogenomics heavily relies on well-curated sequence data sets that comprise, for each gene, exclusively 1:1 orthologs. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building upon recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided orthologs and paralogs can be distinguished with a degree of certainty. Starting from tree-free estimates of orthology, cograph editing can sufficiently reduce the noise to find correct event-annotated gene trees. The information of gene trees can then directly be translated into constraints on the species trees. Although the resolution is very poor for individual gene families, we show that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees, even in the presence of horizontal gene transfer.


Author(s):  
Haitham Yousof ◽  
Ahmed Z Afify ◽  
Morad Alizadeh ◽  
G. G. Hamedani ◽  
S. Jahanshahi ◽  
...  

In this work, we introduce a new class of continuous distributions called the generalized Poisson family, which extends the quadratic rank transmutation map. We provide some special models for the new family. Some of its mathematical properties, including Rényi and q-entropies, order statistics and characterizations, are derived. The model parameters are estimated by the maximum likelihood method. Monte Carlo simulations are used to assess the performance of the maximum likelihood estimators. The flexibility of the proposed family is illustrated by means of two applications to real data sets.


Author(s):  
Mohamed Ibrahim Mohamed

In this work, we introduce a new extension of the Fréchet distribution. A sufficient set of its mathematical and statistical properties is derived. The parameters are estimated using several different estimation methods. The performance of the proposed estimation methods is studied by Monte Carlo simulations. The potential of the proposed model is analyzed through two data sets. The weighted least squares method is the best method for modelling the breaking stress data and the least squares method is the best method for modelling the strengths data; however, all other methods performed well for both data sets. Moreover, the new model gives the best fits among all other fitted extensions of the Fréchet model to these data. So, it could be chosen as the best model for modelling real breaking stress and strengths data.
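
A Monte Carlo study of an estimator, of the kind this abstract describes, can be sketched as follows. The sketch makes several assumptions that are not from the paper: it uses the plain two-parameter Fréchet distribution with CDF F(x) = exp(-(x/scale)^(-shape)) rather than the new extension, it implements only an ordinary least squares fit on the linearised CDF with plotting positions i/(n+1), and the names `sample_frechet`, `least_squares_fit` and `monte_carlo` are hypothetical.

```python
import numpy as np

def sample_frechet(shape, scale, n, rng):
    """Draw n Fréchet variates by inverting F(x) = exp(-(x/scale)**(-shape))."""
    u = rng.uniform(size=n)
    return scale * (-np.log(u)) ** (-1.0 / shape)

def least_squares_fit(x):
    """Ordinary least squares fit on the linearised CDF.

    Uses -log(-log F(x)) = shape*log(x) - shape*log(scale), with the
    empirical CDF replaced by plotting positions i/(n+1).
    """
    n = x.size
    p = np.arange(1, n + 1) / (n + 1)
    y = -np.log(-np.log(p))
    slope, intercept = np.polyfit(np.log(np.sort(x)), y, 1)
    return slope, np.exp(-intercept / slope)  # (shape, scale)

def monte_carlo(shape, scale, n, reps, seed=0):
    """Estimate bias and mean squared error of the fit over reps replications."""
    rng = np.random.default_rng(seed)
    est = np.array([least_squares_fit(sample_frechet(shape, scale, n, rng))
                    for _ in range(reps)])
    truth = np.array([shape, scale])
    bias = est.mean(axis=0) - truth
    mse = ((est - truth) ** 2).mean(axis=0)
    return bias, mse

bias, mse = monte_carlo(shape=2.0, scale=1.0, n=200, reps=200)
```

Comparing several estimation methods amounts to swapping `least_squares_fit` for another estimator and tabulating the resulting `bias` and `mse` side by side.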


2020 ◽  
Vol 8 (1) ◽  
pp. 304-317 ◽  
Author(s):  
Hamid Esmaeili ◽  
Fazlollah Lak ◽  
Morad Alizadeh ◽  
Mohammad esmail Dehghan monfared

A new family of skew distributions is introduced by extending the alpha skew logistic distribution proposed by Hazarika-Chakraborty [9]. This family of distributions is called the alpha-beta skew logistic (ABSLG) distribution. The density function, moments, and skewness and kurtosis coefficients are derived. The parameters of the new family are estimated by the maximum likelihood and moments methods. The performance of the obtained estimators is examined via a Monte Carlo simulation. The flexibility, usefulness and suitability of ABSLG are illustrated by analyzing two real data sets.

