Evaluating the robustness of phylogenetic methods to among-site variability in substitution processes

2008 ◽  
Vol 363 (1512) ◽  
pp. 4013-4021 ◽  
Author(s):  
Mark T Holder ◽  
Derrick J Zwickl ◽  
Christophe Dessimoz

Computer simulations provide a flexible method for assessing the power and robustness of phylogenetic inference methods. Unfortunately, simulated data are often obviously atypical of data encountered in studies of molecular evolution. Unrealistic simulations can lead to conclusions that are irrelevant to real-data analyses or can provide a biased view of which methods perform well. Here, we present a software tool designed to generate data under a complex codon model that allows each residue in the protein sequence to have a different set of equilibrium amino acid frequencies. The software can obtain maximum-likelihood estimates of the parameters of the Halpern and Bruno model from empirical data and a fixed tree; given an arbitrary tree and a fixed set of parameters, the software can then simulate artificial datasets. We present the results of a simulation experiment using randomly generated tree shapes and substitution parameters estimated from 1610 mammalian cytochrome b sequences. We tested tree inference at the amino acid, nucleotide and codon levels and under parsimony, maximum-likelihood, Bayesian and distance criteria (for a total of more than 650 analyses on each dataset). Based on these simulations, nucleotide-level analyses seem to be more accurate than amino acid and codon analyses. The performance of distance-based phylogenetic methods appears to be quite sensitive to the choice of model and the form of rate heterogeneity used. Further studies are needed to assess the generality of these conclusions. For example, fitting parameters of the Halpern and Bruno model to sequences from other genes will reveal the extent to which our conclusions were influenced by the choice of cytochrome b. Incorporating codon bias and more sources of heterogeneity into the simulator will be crucial to determining whether the current results are caused by a bias in the current simulation study in favour of nucleotide analyses.
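The simulator described above targets a site-specific codon model; as a far simpler sketch of the same simulate-down-a-tree idea, the snippet below evolves nucleotide sequences under the Jukes-Cantor (JC69) model on a hard-coded four-taxon tree. The tree shape, branch lengths, sequence length and seed are illustrative choices, not values from the paper.

```python
import math
import random

BASES = "ACGT"

def evolve(seq, t, rng):
    """Evolve a sequence along a branch of length t under JC69:
    each site changes with probability p = (3/4) * (1 - exp(-4t/3))."""
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for b in seq:
        if rng.random() < p:
            out.append(rng.choice([x for x in BASES if x != b]))
        else:
            out.append(b)
    return "".join(out)

def simulate(tree, root_seq, rng):
    """tree: a leaf name (str) or a tuple (left, right, t_left, t_right).
    Returns a dict mapping leaf name -> simulated sequence."""
    leaves = {}
    def walk(node, seq):
        if isinstance(node, str):
            leaves[node] = seq
            return
        left, right, tl, tr = node
        walk(left, evolve(seq, tl, rng))
        walk(right, evolve(seq, tr, rng))
    walk(tree, root_seq)
    return leaves

rng = random.Random(1)
root = "".join(rng.choice(BASES) for _ in range(200))
# ((A,B),(C,D)) topology: short terminal branches, longer internal ones.
tree = (("A", "B", 0.05, 0.05), ("C", "D", 0.05, 0.05), 0.2, 0.2)
tips = simulate(tree, root, rng)
```

Pairwise site differences among the simulated tips recover the intended groupings (A with B, C with D), which is the kind of signal the tree inference methods in the study are asked to detect.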

PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0249027
Author(s):  
Abdulhakim A. Al-Babtain ◽  
Ibrahim Elbatal ◽  
Christophe Chesneau ◽  
Mohammed Elgarhy

The estimation of the entropy of a random system or process is of interest in many scientific applications. The aim of this article is the analysis of the entropy of the famous Kumaraswamy distribution, an aspect which, surprising as it may seem, has not previously been the subject of particular attention. With this in mind, six different entropy measures are considered and expressed analytically via the beta function. A numerical study is performed to discuss the behavior of these measures. Subsequently, we investigate their estimation through a semi-parametric approach combining the obtained expressions and the maximum likelihood estimation approach. Maximum likelihood estimates for the considered entropy measures are thus derived. The convergence properties of these estimates are demonstrated through simulated data, showing their numerical efficiency. Concrete applications to two real data sets are provided.
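As a numerical companion to the closed-form beta-function expressions mentioned above (which are not reproduced here), the sketch below approximates the Shannon entropy of the Kumaraswamy distribution by midpoint integration of -f ln f; the parameter values are arbitrary.

```python
import math

def kw_pdf(x, a, b):
    """Kumaraswamy density on (0,1): f(x) = a*b*x^(a-1)*(1-x^a)^(b-1)."""
    return a * b * x ** (a - 1) * (1.0 - x ** a) ** (b - 1)

def shannon_entropy(a, b, n=20000):
    """Midpoint-rule approximation of H = -integral of f ln f over (0,1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        f = kw_pdf(x, a, b)
        if f > 0.0:
            total -= f * math.log(f) * h
    return total

# Kumaraswamy(1,1) reduces to Uniform(0,1), whose differential entropy is 0;
# any non-uniform density on (0,1), e.g. Kumaraswamy(2,2), has negative entropy.
h_uniform = shannon_entropy(1.0, 1.0)
h_peaked = shannon_entropy(2.0, 2.0)
```

The uniform case gives a free sanity check on the integrator before trusting it at other parameter values.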


2020 ◽  
Vol 9 (1) ◽  
pp. 61-81
Author(s):  
Lazhar BENKHELIFA

A new lifetime model, with four positive parameters, called the Weibull Birnbaum-Saunders distribution is proposed. The proposed model extends the Birnbaum-Saunders distribution and provides great flexibility in modeling data in practice. Some mathematical properties of the new distribution are obtained including expansions for the cumulative and density functions, moments, generating function, mean deviations, order statistics and reliability. Estimation of the model parameters is carried out by the maximum likelihood estimation method. A simulation study is presented to show the performance of the maximum likelihood estimates of the model parameters. The flexibility of the new model is examined by applying it to two real data sets.
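The Weibull Birnbaum-Saunders likelihood itself is not reproduced here; as a minimal illustration of the kind of simulation study described (simulate, fit by maximum likelihood, compare with the truth), the sketch below fits a plain two-parameter Weibull via its profile-likelihood equation. The parameter values and sample size are arbitrary.

```python
import math
import random

def weibull_mle(xs, lo=0.05, hi=50.0, iters=60):
    """Profile-likelihood MLE for a two-parameter Weibull.  The shape k
    solves  sum(x^k ln x)/sum(x^k) - 1/k - mean(ln x) = 0,  which is
    increasing in k, so bisection finds the unique root; the scale is
    then mean(x^k)^(1/k)."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)
    def g(k):
        num = sum(x ** k * lx for x, lx in zip(xs, logs))
        den = sum(x ** k for x in xs)
        return num / den - 1.0 / k - mean_log
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    scale = (sum(x ** k for x in xs) / len(xs)) ** (1.0 / k)
    return k, scale

# Simulate-then-refit check: draw from Weibull(scale=1.5, shape=2.0).
rng = random.Random(7)
data = [rng.weibullvariate(1.5, 2.0) for _ in range(5000)]
k_hat, s_hat = weibull_mle(data)
```

Repeating this over many replicates and tabulating bias and mean squared error is the usual shape of the simulation studies these abstracts refer to.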


2020 ◽  
Author(s):  
Yoonjee Kang ◽  
Denis Thieffry ◽  
Laura Cantini

Networks are powerful tools to represent and investigate biological systems. The development of algorithms inferring regulatory interactions from functional genomics data has been an active area of research. With the advent of single-cell RNA-seq data (scRNA-seq), numerous methods specifically designed to take advantage of single-cell datasets have been proposed. However, published benchmarks on single-cell network inference are mostly based on simulated data. Once applied to real data, these benchmarks take into account only a small set of genes and only compare the inferred networks with an imposed ground-truth.

Here, we benchmark four single-cell network inference methods based on their reproducibility, i.e. their ability to infer similar networks when applied to two independent datasets for the same biological condition. We tested each of these methods on real data from three biological conditions: human retina, T-cells in colorectal cancer, and human hematopoiesis.

GENIE3 proves to be the most reproducible algorithm, independently of the single-cell sequencing platform, the cell type annotation system, the number of cells constituting the dataset, or the thresholding applied to the links of the inferred networks. In order to ensure the reproducibility and ease extension of this benchmark study, we implemented all the analyses in scNET, a Jupyter notebook available at https://github.com/ComputationalSystemsBiology/scNET.
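The reproducibility criterion used in this benchmark can be illustrated with a toy computation: given two inferred networks as weighted edge lists, compare their top-k edges with a Jaccard index. The edge dictionaries below are made-up placeholders, not data from the study.

```python
def topk_edges(weights, k):
    """Top-k edges (by absolute weight) of an inferred network,
    given a dict {(regulator, target): weight}."""
    ranked = sorted(weights, key=lambda e: abs(weights[e]), reverse=True)
    return set(ranked[:k])

def reproducibility(net1, net2, k):
    """Jaccard index between the top-k edge sets of two networks
    inferred from independent datasets of the same condition."""
    a, b = topk_edges(net1, k), topk_edges(net2, k)
    return len(a & b) / len(a | b)

# Hypothetical inferred networks from two independent datasets.
netA = {("G1", "G2"): 0.9, ("G1", "G3"): 0.7, ("G2", "G3"): 0.1}
netB = {("G1", "G2"): 0.8, ("G2", "G3"): 0.6, ("G1", "G4"): 0.2}
rep = reproducibility(netA, netB, 2)
```

Sweeping k reproduces the thresholding dimension the abstract mentions: a method is robust if its score stays high across a range of cutoffs.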


Author(s):  
Fiaz Ahmad Bhatti ◽  
G. G. Hamedani ◽  
Haitham M. Yousof ◽  
Azeem Ali ◽  
Munir Ahmad

A flexible lifetime distribution with increasing, decreasing, inverted bathtub and modified bathtub hazard rate, called the Modified Burr XII-Inverse Weibull (MBXII-IW) distribution, is introduced and studied. The density function of the MBXII-IW distribution can be exponential, left-skewed, right-skewed or symmetrical in shape. Descriptive measures on the basis of quantiles, moments, order statistics and reliability measures are theoretically established. The MBXII-IW distribution is characterized via different techniques. Parameters of the MBXII-IW distribution are estimated using the maximum likelihood method. A simulation study is performed to illustrate the performance of the maximum likelihood estimates (MLEs). The potentiality of the MBXII-IW distribution is demonstrated by its application to real data sets: serum-reversal times and quarterly earnings.


2020 ◽  
Vol 15 (4) ◽  
pp. 2481-2510
Author(s):  
Fastel Chipepa ◽  
Divine Wanduku ◽  
Broderick Olusegun Oluyede

A new flexible and versatile generalized family of distributions, namely, the half logistic odd Weibull-Topp-Leone-G (HLOW-TL-G) distribution, is presented. The distribution can be traced back to the exponentiated-G distribution. We derive the statistical properties of the proposed family of distributions. Maximum likelihood estimates of the HLOW-TL-G family of distributions are also presented. Five special cases of the proposed family are presented. A simulation study and real data applications on one of the special cases are also presented.


2012 ◽  
Vol 2012 ◽  
pp. 1-6 ◽  
Author(s):  
Vasileios Pappas ◽  
Konstantinos Adamidis ◽  
Sotirios Loukas

A four-parameter family of Weibull distributions is introduced, as an example of a more general class created along the lines of Marshall and Olkin, 1997. Various properties of the distribution are explored and its usefulness in modelling real data is demonstrated using maximum likelihood estimates.
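For concreteness, Marshall and Olkin's 1997 scheme adds a tilt parameter alpha to a baseline survival function S via G(x) = alpha*S(x) / (1 - (1 - alpha)*S(x)); the sketch below applies it to a two-parameter Weibull baseline, a simpler relative of the four-parameter family introduced here. Parameter values are arbitrary.

```python
import math

def weibull_sf(x, k, lam):
    """Baseline Weibull survival function: S(x) = exp(-(x/lam)^k)."""
    return math.exp(-((x / lam) ** k))

def mo_sf(x, alpha, k, lam):
    """Marshall-Olkin extended survival function:
    G(x) = alpha*S(x) / (1 - (1-alpha)*S(x)),  alpha > 0.
    alpha = 1 recovers the baseline distribution."""
    s = weibull_sf(x, k, lam)
    return alpha * s / (1.0 - (1.0 - alpha) * s)
```

The transform preserves G(0) = 1, keeps the survival function decreasing, and reduces to the baseline at alpha = 1, which makes those three facts convenient unit checks.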


Author(s):  
JIJU GILLARIOSE ◽  
Lishamol Tomy

In this article, we defined a new four-parameter model called the Marshall-Olkin extended power Lomax distribution and studied its properties. Limiting distributions of sample maxima and sample minima are derived. The reliability of a system when both stress and strength follow the new distribution is discussed, and the associated characteristics are computed for simulated data. Finally, utilizing maximum likelihood estimation, the goodness of fit of the distribution is tested on real data.
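Stress-strength reliability R = P(X > Y) can be estimated by straightforward Monte Carlo; as a simplified illustration, the sketch below uses plain Lomax variates drawn by inversion rather than the Marshall-Olkin extended power Lomax of the article. For two Lomax laws sharing a scale, R = alpha_y / (alpha_x + alpha_y), which the simulation should reproduce.

```python
import random

def lomax_sample(alpha, lam, rng):
    """Inverse-CDF draw from Lomax(alpha, lam): F(x) = 1 - (1 + x/lam)^(-alpha)."""
    u = rng.random()
    return lam * ((1.0 - u) ** (-1.0 / alpha) - 1.0)

def stress_strength(alpha_x, alpha_y, lam, n=100000, seed=0):
    """Monte Carlo estimate of R = P(X > Y): strength X exceeds stress Y."""
    rng = random.Random(seed)
    wins = sum(lomax_sample(alpha_x, lam, rng) > lomax_sample(alpha_y, lam, rng)
               for _ in range(n))
    return wins / n

# Symmetric case: identical laws give R = 1/2.
r_sym = stress_strength(2.0, 2.0, 1.0)
```

The closed-form benchmark makes the Monte Carlo error directly visible, which is how such simulated-data characteristics are usually validated.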


Entropy ◽  
2019 ◽  
Vol 21 (5) ◽  
pp. 510
Author(s):  
Bo Peng ◽  
Zhengqiu Xu ◽  
Min Wang

We introduce a new three-parameter lifetime distribution, the exponentiated Lindley geometric distribution, which exhibits increasing, decreasing, unimodal, and bathtub shaped hazard rates. We provide statistical properties of the new distribution, including shape of the probability density function, hazard rate function, quantile function, order statistics, moments, residual life function, mean deviations, Bonferroni and Lorenz curves, and entropies. We use maximum likelihood estimation of the unknown parameters, and an Expectation-Maximization algorithm is also developed to find the maximum likelihood estimates. The Fisher information matrix is provided to construct the asymptotic confidence intervals. Finally, two real-data examples are analyzed for illustrative purposes.
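The Fisher-information route to asymptotic confidence intervals can be shown on a much simpler model than the exponentiated Lindley geometric: for an exponential rate lambda, the MLE is 1/mean and I(lambda) = n/lambda^2, giving the Wald interval below. The sample size and true rate are arbitrary.

```python
import math
import random

def exp_rate_wald_ci(xs, z=1.96):
    """Wald 95% CI for an exponential rate: the MLE is lam = n / sum(x),
    the Fisher information is I(lam) = n / lam^2, so the asymptotic
    standard error is lam / sqrt(n)."""
    n = len(xs)
    lam = n / sum(xs)
    se = lam / math.sqrt(n)
    return lam, (lam - z * se, lam + z * se)

rng = random.Random(3)
data = [rng.expovariate(2.0) for _ in range(5000)]  # true rate: 2.0
lam_hat, (lo, hi) = exp_rate_wald_ci(data)
```

For multi-parameter families such as the one in this article, the scalar 1/I(lam) is replaced by the inverse of the Fisher information matrix, but the interval construction is the same.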


2012 ◽  
Vol 94 (3) ◽  
pp. 151-161 ◽  
Author(s):  
NATHAN HALL ◽  
LAINA MERCER ◽  
DAISY PHILLIPS ◽  
JONATHAN SHAW ◽  
AMY D. ANDERSON

In this paper, we developed and compared several expectation-maximization (EM) algorithms to find maximum likelihood estimates of individual inbreeding coefficients using molecular marker information. The first method estimates the inbreeding coefficient for a single individual and assumes that allele frequencies are known without error. The second method jointly estimates inbreeding coefficients and allele frequencies for a set of individuals that have been genotyped at several loci. The third method generalizes the second method to include the case in which null alleles may be present. In particular, it is able to jointly estimate individual inbreeding coefficients and allele frequencies, including the frequencies of null alleles, and accounts for missing data. We compared our methods with several other estimation procedures using simulated data and found that our methods perform well. The maximum likelihood estimators consistently gave among the lowest root-mean-square errors (RMSE) of all the estimators that were compared. Our estimator that accounts for null alleles performed particularly well and was able to tease apart the effects of null alleles, randomly missing genotypes and differing degrees of inbreeding among members of the datasets we analysed. To illustrate the performance of our estimators, we analysed previously published datasets on mice (Mus musculus) and white-tailed deer (Odocoileus virginianus).
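The first setting above (one individual, allele frequencies known) can be sketched for biallelic loci: treat a latent per-locus identity-by-descent indicator as missing data, compute its posterior in the E-step, and average the posteriors in the M-step. The genotype encoding and starting value below are illustrative choices, not details taken from the paper.

```python
def em_inbreeding(genotypes, freqs, iters=200):
    """EM for one individual's inbreeding coefficient F with known allele
    frequencies.  genotypes[i] in {0, 1, 2} = copies of allele A at locus i;
    freqs[i] = frequency p of allele A.  Latent Z_i = 1 if locus i is IBD:
    P(AA) = p^2(1-F) + pF,  P(Aa) = 2p(1-p)(1-F),  P(aa) = q^2(1-F) + qF."""
    F = 0.5  # arbitrary starting value
    for _ in range(iters):
        post = []
        for g, p in zip(genotypes, freqs):
            q = 1.0 - p
            if g == 2:    # AA homozygote
                post.append(p * F / (p * p * (1.0 - F) + p * F))
            elif g == 0:  # aa homozygote
                post.append(q * F / (q * q * (1.0 - F) + q * F))
            else:         # heterozygote: cannot be IBD
                post.append(0.0)
        F = sum(post) / len(post)       # M-step: mean posterior IBD
        F = min(max(F, 1e-9), 1.0 - 1e-9)
    return F

f_het = em_inbreeding([1] * 50, [0.5] * 50)  # fully heterozygous -> F near 0
```

The paper's second and third methods extend this loop to update allele (and null-allele) frequencies jointly with the inbreeding coefficients.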


2018 ◽  
Vol 3 ◽  
pp. 33 ◽  
Author(s):  
John A. Lees ◽  
Michelle Kendall ◽  
Julian Parkhill ◽  
Caroline Colijn ◽  
Stephen D. Bentley ◽  
...  

Background: Phylogenetic reconstruction is a necessary first step in many analyses which use whole genome sequence data from bacterial populations. There are many available methods to infer phylogenies, and these have various advantages and disadvantages, but few unbiased comparisons of the range of approaches have been made. Methods: We simulated data from a defined 'true tree' using a realistic evolutionary model. We built phylogenies from these data using a range of methods, and compared reconstructed trees to the true tree using two measures, noting the computational time needed for different phylogenetic reconstructions. We also used real data from Streptococcus pneumoniae alignments to compare individual core gene trees to a core genome tree. Results: We found that, as expected, maximum likelihood trees from good quality alignments were the most accurate, but also the most computationally intensive. Using less accurate phylogenetic reconstruction methods, we were able to obtain results of comparable accuracy; we found that approximate results can rapidly be obtained using genetic distance based methods. In real data we found that highly conserved core genes, such as those involved in translation, gave an inaccurate tree topology, whereas genes involved in recombination events gave inaccurate branch lengths. We also show a tree-of-trees, relating the results of different phylogenetic reconstructions to each other. Conclusions: We recommend three approaches, depending on requirements for accuracy and computational time. For the most accurate tree, use of either RAxML or IQ-TREE with an alignment of variable sites produced by mapping to a reference genome is best. Quicker approaches that do not perform full maximum likelihood optimisation may be useful for many analyses requiring a phylogeny, as generating a high quality input alignment is likely to be the major limiting factor of accurate tree topology.
We have publicly released our simulated data and code to enable further comparisons.
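The genetic-distance shortcut recommended in the results can be sketched in a few lines: compute the proportion of differing sites between aligned sequences and apply the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) before feeding the matrix to a clustering or neighbor-joining step. The toy sequences are placeholders, not data from the study.

```python
import math

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jc_distance(a, b):
    """Jukes-Cantor corrected distance d = -(3/4)*ln(1 - 4p/3).
    Requires p < 0.75 (sequences not mutationally saturated)."""
    p = p_distance(a, b)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy alignment: a corrected distance matrix over all sequence pairs.
seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "TCGAACGTTG"}
dmat = {(i, j): jc_distance(seqs[i], seqs[j])
        for i in seqs for j in seqs if i < j}
```

The correction always stretches the raw proportion upward, compensating for multiple substitutions at the same site, which is why distance methods on corrected matrices can approximate likelihood-based branch lengths quickly.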

