Evaluating the robustness of phylogenetic methods to among-site variability in substitution processes

2008 ◽  
Vol 363 (1512) ◽  
pp. 4013-4021 ◽  
Author(s):  
Mark T Holder ◽  
Derrick J Zwickl ◽  
Christophe Dessimoz

Computer simulations provide a flexible method for assessing the power and robustness of phylogenetic inference methods. Unfortunately, simulated data are often obviously atypical of data encountered in studies of molecular evolution. Unrealistic simulations can lead to conclusions that are irrelevant to real-data analyses or can provide a biased view of which methods perform well. Here, we present a software tool designed to generate data under a complex codon model that allows each residue in the protein sequence to have a different set of equilibrium amino acid frequencies. The software can obtain maximum-likelihood estimates of the parameters of the Halpern and Bruno model from empirical data and a fixed tree; given an arbitrary tree and a fixed set of parameters, the software can then simulate artificial datasets. We present the results of a simulation experiment using randomly generated tree shapes and substitution parameters estimated from 1610 mammalian cytochrome b sequences. We tested tree inference at the amino acid, nucleotide and codon levels and under parsimony, maximum-likelihood, Bayesian and distance criteria (for a total of more than 650 analyses on each dataset). Based on these simulations, nucleotide-level analyses seem to be more accurate than amino acid and codon analyses. The performance of distance-based phylogenetic methods appears to be quite sensitive to the choice of model and the form of rate heterogeneity used. Further studies are needed to assess the generality of these conclusions. For example, fitting parameters of the Halpern and Bruno model to sequences from other genes will reveal the extent to which our conclusions were influenced by the choice of cytochrome b. Incorporating codon bias and more sources of heterogeneity into the simulator will be crucial to determining whether the current results are caused by a bias in the current simulation study in favour of nucleotide analyses.
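The simulator described above targets a site-specific codon model; as a far simpler sketch of the same simulate-down-a-tree idea, the snippet below evolves nucleotide sequences under the Jukes-Cantor (JC69) model on a hard-coded four-taxon tree. The tree shape, branch lengths, sequence length and seed are illustrative choices, not values from the paper.

```python
import math
import random

BASES = "ACGT"

def evolve(seq, t, rng):
    """Evolve a sequence along a branch of length t under JC69:
    each site changes with probability p = (3/4) * (1 - exp(-4t/3))."""
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for b in seq:
        if rng.random() < p:
            out.append(rng.choice([x for x in BASES if x != b]))
        else:
            out.append(b)
    return "".join(out)

def simulate(tree, root_seq, rng):
    """tree: a leaf name (str) or a tuple (left, right, t_left, t_right).
    Returns a dict mapping leaf name -> simulated sequence."""
    leaves = {}
    def walk(node, seq):
        if isinstance(node, str):
            leaves[node] = seq
            return
        left, right, tl, tr = node
        walk(left, evolve(seq, tl, rng))
        walk(right, evolve(seq, tr, rng))
    walk(tree, root_seq)
    return leaves

rng = random.Random(1)
root = "".join(rng.choice(BASES) for _ in range(200))
# ((A,B),(C,D)) topology: short terminal branches, longer internal ones.
tree = (("A", "B", 0.05, 0.05), ("C", "D", 0.05, 0.05), 0.2, 0.2)
tips = simulate(tree, root, rng)
```

Pairwise site differences among the simulated tips recover the intended groupings (A with B, C with D), which is the kind of signal the tree inference methods in the study are asked to detect.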

PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0249027
Author(s):  
Abdulhakim A. Al-Babtain ◽  
Ibrahim Elbatal ◽  
Christophe Chesneau ◽  
Mohammed Elgarhy

The estimation of the entropy of a random system or process is of interest in many scientific applications. The aim of this article is the analysis of the entropy of the famous Kumaraswamy distribution, an aspect which, surprising as it may seem, has not previously been the subject of particular attention. With this in mind, six different entropy measures are considered and expressed analytically via the beta function. A numerical study is performed to discuss the behavior of these measures. Subsequently, we investigate their estimation through a semi-parametric approach combining the obtained expressions and the maximum likelihood estimation approach. Maximum likelihood estimates for the considered entropy measures are thus derived. The convergence properties of these estimates are demonstrated through simulated data, showing their numerical efficiency. Concrete applications to two real data sets are provided.
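As a numerical companion to the closed-form beta-function expressions mentioned above (which are not reproduced here), the sketch below approximates the Shannon entropy of the Kumaraswamy distribution by midpoint integration of -f ln f; the parameter values are arbitrary.

```python
import math

def kw_pdf(x, a, b):
    """Kumaraswamy density on (0,1): f(x) = a*b*x^(a-1)*(1-x^a)^(b-1)."""
    return a * b * x ** (a - 1) * (1.0 - x ** a) ** (b - 1)

def shannon_entropy(a, b, n=20000):
    """Midpoint-rule approximation of H = -integral of f ln f over (0,1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        f = kw_pdf(x, a, b)
        if f > 0.0:
            total -= f * math.log(f) * h
    return total

# Kumaraswamy(1,1) reduces to Uniform(0,1), whose differential entropy is 0;
# any non-uniform density on (0,1), e.g. Kumaraswamy(2,2), has negative entropy.
h_uniform = shannon_entropy(1.0, 1.0)
h_peaked = shannon_entropy(2.0, 2.0)
```

The uniform case gives a free sanity check on the integrator before trusting it at other parameter values.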


2020 ◽  
Vol 9 (1) ◽  
pp. 61-81
Author(s):  
Lazhar BENKHELIFA

A new lifetime model, with four positive parameters, called the Weibull Birnbaum-Saunders distribution is proposed. The proposed model extends the Birnbaum-Saunders distribution and provides great flexibility in modeling data in practice. Some mathematical properties of the new distribution are obtained including expansions for the cumulative and density functions, moments, generating function, mean deviations, order statistics and reliability. Estimation of the model parameters is carried out by the maximum likelihood estimation method. A simulation study is presented to show the performance of the maximum likelihood estimates of the model parameters. The flexibility of the new model is examined by applying it to two real data sets.
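The Weibull Birnbaum-Saunders likelihood itself is not reproduced here; as a minimal illustration of the kind of simulation study described (simulate, fit by maximum likelihood, compare with the truth), the sketch below fits a plain two-parameter Weibull via its profile-likelihood equation. The parameter values and sample size are arbitrary.

```python
import math
import random

def weibull_mle(xs, lo=0.05, hi=50.0, iters=60):
    """Profile-likelihood MLE for a two-parameter Weibull.  The shape k
    solves  sum(x^k ln x)/sum(x^k) - 1/k - mean(ln x) = 0,  which is
    increasing in k, so bisection finds the unique root; the scale is
    then mean(x^k)^(1/k)."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)
    def g(k):
        num = sum(x ** k * lx for x, lx in zip(xs, logs))
        den = sum(x ** k for x in xs)
        return num / den - 1.0 / k - mean_log
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    scale = (sum(x ** k for x in xs) / len(xs)) ** (1.0 / k)
    return k, scale

# Simulate-then-refit check: draw from Weibull(scale=1.5, shape=2.0).
rng = random.Random(7)
data = [rng.weibullvariate(1.5, 2.0) for _ in range(5000)]
k_hat, s_hat = weibull_mle(data)
```

Repeating this over many replicates and tabulating bias and mean squared error is the usual shape of the simulation studies these abstracts refer to.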


2020 ◽  
Author(s):  
Yoonjee Kang ◽  
Denis Thieffry ◽  
Laura Cantini

Networks are powerful tools to represent and investigate biological systems. The development of algorithms inferring regulatory interactions from functional genomics data has been an active area of research. With the advent of single-cell RNA-seq data (scRNA-seq), numerous methods specifically designed to take advantage of single-cell datasets have been proposed. However, published benchmarks on single-cell network inference are mostly based on simulated data. Once applied to real data, these benchmarks take into account only a small set of genes and only compare the inferred networks with an imposed ground-truth.

Here, we benchmark four single-cell network inference methods based on their reproducibility, i.e. their ability to infer similar networks when applied to two independent datasets for the same biological condition. We tested each of these methods on real data from three biological conditions: human retina, T-cells in colorectal cancer, and human hematopoiesis.

GENIE3 proves to be the most reproducible algorithm, independently of the single-cell sequencing platform, the cell type annotation system, the number of cells constituting the dataset, or the thresholding applied to the links of the inferred networks. In order to ensure the reproducibility and ease extension of this benchmark study, we implemented all the analyses in scNET, a Jupyter notebook available at https://github.com/ComputationalSystemsBiology/scNET.
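The reproducibility criterion used in this benchmark can be illustrated with a toy computation: given two inferred networks as weighted edge lists, compare their top-k edges with a Jaccard index. The edge dictionaries below are made-up placeholders, not data from the study.

```python
def topk_edges(weights, k):
    """Top-k edges (by absolute weight) of an inferred network,
    given a dict {(regulator, target): weight}."""
    ranked = sorted(weights, key=lambda e: abs(weights[e]), reverse=True)
    return set(ranked[:k])

def reproducibility(net1, net2, k):
    """Jaccard index between the top-k edge sets of two networks
    inferred from independent datasets of the same condition."""
    a, b = topk_edges(net1, k), topk_edges(net2, k)
    return len(a & b) / len(a | b)

# Hypothetical inferred networks from two independent datasets.
netA = {("G1", "G2"): 0.9, ("G1", "G3"): 0.7, ("G2", "G3"): 0.1}
netB = {("G1", "G2"): 0.8, ("G2", "G3"): 0.6, ("G1", "G4"): 0.2}
rep = reproducibility(netA, netB, 2)
```

Sweeping k reproduces the thresholding dimension the abstract mentions: a method is robust if its score stays high across a range of cutoffs.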


Author(s):  
Fiaz Ahmad Bhatti ◽  
G. G. Hamedani ◽  
Haitham M. Yousof ◽  
Azeem Ali ◽  
Munir Ahmad

A flexible lifetime distribution with increasing, decreasing, inverted bathtub and modified bathtub hazard rate, called the Modified Burr XII-Inverse Weibull (MBXII-IW) distribution, is introduced and studied. The density function of the MBXII-IW distribution can be exponential, left-skewed, right-skewed or symmetrical in shape. Descriptive measures on the basis of quantiles, moments, order statistics and reliability measures are theoretically established. The MBXII-IW distribution is characterized via different techniques. Parameters of the MBXII-IW distribution are estimated using the maximum likelihood method. A simulation study is performed to illustrate the performance of the maximum likelihood estimates (MLEs). The potentiality of the MBXII-IW distribution is demonstrated by its application to real data sets: serum-reversal times and quarterly earnings.


2020 ◽  
Vol 15 (4) ◽  
pp. 2481-2510
Author(s):  
Fastel Chipepa ◽  
Divine Wanduku ◽  
Broderick Olusegun Oluyede

A new flexible and versatile generalized family of distributions, namely, the half logistic odd Weibull-Topp-Leone-G (HLOW-TL-G) distribution, is presented. The distribution can be traced back to the exponentiated-G distribution. We derive the statistical properties of the proposed family of distributions. Maximum likelihood estimates of the HLOW-TL-G family of distributions are also presented. Five special cases of the proposed family are presented. A simulation study and real data applications on one of the special cases are also presented.


2012 ◽  
Vol 2012 ◽  
pp. 1-6 ◽  
Author(s):  
Vasileios Pappas ◽  
Konstantinos Adamidis ◽  
Sotirios Loukas

A four-parameter family of Weibull distributions is introduced, as an example of a more general class created along the lines of Marshall and Olkin, 1997. Various properties of the distribution are explored and its usefulness in modelling real data is demonstrated using maximum likelihood estimates.
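For concreteness, Marshall and Olkin's 1997 scheme adds a tilt parameter alpha to a baseline survival function S via G(x) = alpha*S(x) / (1 - (1 - alpha)*S(x)); the sketch below applies it to a two-parameter Weibull baseline, a simpler relative of the four-parameter family introduced here. Parameter values are arbitrary.

```python
import math

def weibull_sf(x, k, lam):
    """Baseline Weibull survival function: S(x) = exp(-(x/lam)^k)."""
    return math.exp(-((x / lam) ** k))

def mo_sf(x, alpha, k, lam):
    """Marshall-Olkin extended survival function:
    G(x) = alpha*S(x) / (1 - (1-alpha)*S(x)),  alpha > 0.
    alpha = 1 recovers the baseline distribution."""
    s = weibull_sf(x, k, lam)
    return alpha * s / (1.0 - (1.0 - alpha) * s)
```

The transform preserves G(0) = 1, keeps the survival function decreasing, and reduces to the baseline at alpha = 1, which makes those three facts convenient unit checks.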


Author(s):  
JIJU GILLARIOSE ◽  
Lishamol Tomy

In this article, we defined a new four-parameter model called the Marshall-Olkin extended power Lomax distribution and studied its properties. Limiting distributions of sample maxima and sample minima are derived. The reliability of a system when both stress and strength follow the new distribution is discussed, and the associated characteristics are computed for simulated data. Finally, utilizing maximum likelihood estimation, the goodness of fit of the distribution is tested on real data.
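Stress-strength reliability R = P(X > Y) can be estimated by straightforward Monte Carlo; as a simplified illustration, the sketch below uses plain Lomax variates drawn by inversion rather than the Marshall-Olkin extended power Lomax of the article. For two Lomax laws sharing a scale, R = alpha_y / (alpha_x + alpha_y), which the simulation should reproduce.

```python
import random

def lomax_sample(alpha, lam, rng):
    """Inverse-CDF draw from Lomax(alpha, lam): F(x) = 1 - (1 + x/lam)^(-alpha)."""
    u = rng.random()
    return lam * ((1.0 - u) ** (-1.0 / alpha) - 1.0)

def stress_strength(alpha_x, alpha_y, lam, n=100000, seed=0):
    """Monte Carlo estimate of R = P(X > Y): strength X exceeds stress Y."""
    rng = random.Random(seed)
    wins = sum(lomax_sample(alpha_x, lam, rng) > lomax_sample(alpha_y, lam, rng)
               for _ in range(n))
    return wins / n

# Symmetric case: identical laws give R = 1/2.
r_sym = stress_strength(2.0, 2.0, 1.0)
```

The closed-form benchmark makes the Monte Carlo error directly visible, which is how such simulated-data characteristics are usually validated.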


Entropy ◽  
2019 ◽  
Vol 21 (5) ◽  
pp. 510
Author(s):  
Bo Peng ◽  
Zhengqiu Xu ◽  
Min Wang

We introduce a new three-parameter lifetime distribution, the exponentiated Lindley geometric distribution, which exhibits increasing, decreasing, unimodal, and bathtub shaped hazard rates. We provide statistical properties of the new distribution, including shape of the probability density function, hazard rate function, quantile function, order statistics, moments, residual life function, mean deviations, Bonferroni and Lorenz curves, and entropies. We use maximum likelihood estimation of the unknown parameters, and an Expectation-Maximization algorithm is also developed to find the maximum likelihood estimates. The Fisher information matrix is provided to construct the asymptotic confidence intervals. Finally, two real-data examples are analyzed for illustrative purposes.
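The Fisher-information route to asymptotic confidence intervals can be shown on a much simpler model than the exponentiated Lindley geometric: for an exponential rate lambda, the MLE is 1/mean and I(lambda) = n/lambda^2, giving the Wald interval below. The sample size and true rate are arbitrary.

```python
import math
import random

def exp_rate_wald_ci(xs, z=1.96):
    """Wald 95% CI for an exponential rate: the MLE is lam = n / sum(x),
    the Fisher information is I(lam) = n / lam^2, so the asymptotic
    standard error is lam / sqrt(n)."""
    n = len(xs)
    lam = n / sum(xs)
    se = lam / math.sqrt(n)
    return lam, (lam - z * se, lam + z * se)

rng = random.Random(3)
data = [rng.expovariate(2.0) for _ in range(5000)]  # true rate: 2.0
lam_hat, (lo, hi) = exp_rate_wald_ci(data)
```

For multi-parameter families such as the one in this article, the scalar 1/I(lam) is replaced by the inverse of the Fisher information matrix, but the interval construction is the same.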


2012 ◽  
Vol 94 (3) ◽  
pp. 151-161 ◽  
Author(s):  
NATHAN HALL ◽  
LAINA MERCER ◽  
DAISY PHILLIPS ◽  
JONATHAN SHAW ◽  
AMY D. ANDERSON

In this paper, we developed and compared several expectation-maximization (EM) algorithms to find maximum likelihood estimates of individual inbreeding coefficients using molecular marker information. The first method estimates the inbreeding coefficient for a single individual and assumes that allele frequencies are known without error. The second method jointly estimates inbreeding coefficients and allele frequencies for a set of individuals that have been genotyped at several loci. The third method generalizes the second method to include the case in which null alleles may be present. In particular, it is able to jointly estimate individual inbreeding coefficients and allele frequencies, including the frequencies of null alleles, and accounts for missing data. We compared our methods with several other estimation procedures using simulated data and found that our methods perform well. The maximum likelihood estimators consistently gave among the lowest root-mean-square errors (RMSE) of all the estimators that were compared. Our estimator that accounts for null alleles performed particularly well and was able to tease apart the effects of null alleles, randomly missing genotypes and differing degrees of inbreeding among members of the datasets we analysed. To illustrate the performance of our estimators, we analysed previously published datasets on mice (Mus musculus) and white-tailed deer (Odocoileus virginianus).
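The first setting above (one individual, allele frequencies known) can be sketched for biallelic loci: treat a latent per-locus identity-by-descent indicator as missing data, compute its posterior in the E-step, and average the posteriors in the M-step. The genotype encoding and starting value below are illustrative choices, not details taken from the paper.

```python
def em_inbreeding(genotypes, freqs, iters=200):
    """EM for one individual's inbreeding coefficient F with known allele
    frequencies.  genotypes[i] in {0, 1, 2} = copies of allele A at locus i;
    freqs[i] = frequency p of allele A.  Latent Z_i = 1 if locus i is IBD:
    P(AA) = p^2(1-F) + pF,  P(Aa) = 2p(1-p)(1-F),  P(aa) = q^2(1-F) + qF."""
    F = 0.5  # arbitrary starting value
    for _ in range(iters):
        post = []
        for g, p in zip(genotypes, freqs):
            q = 1.0 - p
            if g == 2:    # AA homozygote
                post.append(p * F / (p * p * (1.0 - F) + p * F))
            elif g == 0:  # aa homozygote
                post.append(q * F / (q * q * (1.0 - F) + q * F))
            else:         # heterozygote: cannot be IBD
                post.append(0.0)
        F = sum(post) / len(post)       # M-step: mean posterior IBD
        F = min(max(F, 1e-9), 1.0 - 1e-9)
    return F

f_het = em_inbreeding([1] * 50, [0.5] * 50)  # fully heterozygous -> F near 0
```

The paper's second and third methods extend this loop to update allele (and null-allele) frequencies jointly with the inbreeding coefficients.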


2018 ◽  
Vol 3 ◽  
pp. 33 ◽  
Author(s):  
John A. Lees ◽  
Michelle Kendall ◽  
Julian Parkhill ◽  
Caroline Colijn ◽  
Stephen D. Bentley ◽  
...  

Background: Phylogenetic reconstruction is a necessary first step in many analyses which use whole genome sequence data from bacterial populations. There are many available methods to infer phylogenies, and these have various advantages and disadvantages, but few unbiased comparisons of the range of approaches have been made. Methods: We simulated data from a defined 'true tree' using a realistic evolutionary model. We built phylogenies from these data using a range of methods, and compared reconstructed trees to the true tree using two measures, noting the computational time needed for different phylogenetic reconstructions. We also used real data from Streptococcus pneumoniae alignments to compare individual core gene trees to a core genome tree. Results: We found that, as expected, maximum likelihood trees from good quality alignments were the most accurate, but also the most computationally intensive. Using less accurate phylogenetic reconstruction methods, we were able to obtain results of comparable accuracy; we found that approximate results can rapidly be obtained using genetic distance based methods. In real data we found that highly conserved core genes, such as those involved in translation, gave an inaccurate tree topology, whereas genes involved in recombination events gave inaccurate branch lengths. We also show a tree-of-trees, relating the results of different phylogenetic reconstructions to each other. Conclusions: We recommend three approaches, depending on requirements for accuracy and computational time. For the most accurate tree, use of either RAxML or IQ-TREE with an alignment of variable sites produced by mapping to a reference genome is best. Quicker approaches that do not perform full maximum likelihood optimisation may be useful for many analyses requiring a phylogeny, as generating a high quality input alignment is likely to be the major limiting factor of accurate tree topology.
We have publicly released our simulated data and code to enable further comparisons.
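The genetic-distance shortcut recommended in the results can be sketched in a few lines: compute the proportion of differing sites between aligned sequences and apply the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) before feeding the matrix to a clustering or neighbor-joining step. The toy sequences are placeholders, not data from the study.

```python
import math

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jc_distance(a, b):
    """Jukes-Cantor corrected distance d = -(3/4)*ln(1 - 4p/3).
    Requires p < 0.75 (sequences not mutationally saturated)."""
    p = p_distance(a, b)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy alignment: a corrected distance matrix over all sequence pairs.
seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "TCGAACGTTG"}
dmat = {(i, j): jc_distance(seqs[i], seqs[j])
        for i in seqs for j in seqs if i < j}
```

The correction always stretches the raw proportion upward, compensating for multiple substitutions at the same site, which is why distance methods on corrected matrices can approximate likelihood-based branch lengths quickly.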

