The Influence of Model Violation on Phylogenetic Inference: A Simulation Study

2021
Author(s):
Suha Naser-Khdour
Rob Lanfear
Bui Quang Minh

Phylogenetic inference typically assumes that the data has evolved under Stationary, Reversible and Homogeneous (SRH) conditions. Many empirical and simulation studies have shown that assuming SRH conditions can lead to significant errors in phylogenetic inference when the data violates these assumptions. Yet, many simulation studies focused on extreme non-SRH conditions that represent worst-case scenarios and not the average empirical dataset. In this study, we simulate datasets under various degrees of non-SRH conditions using empirically derived parameters to mimic real data and examine the effects of incorrectly assuming SRH conditions on inferring phylogenies. Our results show that maximum likelihood inference is generally quite robust to a wide range of SRH model violations but is inaccurate under extreme convergent evolution.

2021
Author(s):
Jakob Raymaekers
Peter J. Rousseeuw

Abstract Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.
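
For orientation, the two classical transformations named in this abstract can be sketched as follows. This is a minimal illustration of the standard Box–Cox and Yeo–Johnson formulas only; it is not the robust modification or estimator the authors propose, and the function names and sample values are chosen here for illustration.

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Classical Box-Cox transform; only defined for strictly positive x."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson(x: float, lam: float) -> float:
    """Classical Yeo-Johnson transform; defined for any real x."""
    if x >= 0:
        if lam == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    # Negative branch uses the mirrored power 2 - lam.
    if lam == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lam) - 1.0) / (2.0 - lam)


# Hypothetical right-skewed sample with one large value (illustrative only).
skewed = [0.1, 0.5, 1.0, 2.0, 50.0]
transformed = [yeo_johnson(v, 0.0) for v in skewed]
print(transformed)
```

With the parameter set to 1, Yeo–Johnson leaves the data unchanged, which makes that value a convenient sanity check when experimenting with other parameter choices.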


2020
Vol 15 (4)
pp. 2481-2510
Author(s):
Fastel Chipepa
Divine Wanduku
Broderick Olusegun Oluyede

A new flexible and versatile generalized family of distributions, namely the half logistic odd Weibull-Topp-Leone-G (HLOW-TL-G) distribution, is presented. The distribution can be traced back to the exponentiated-G distribution. We derive the statistical properties of the proposed family of distributions. Maximum likelihood estimates of the HLOW-TL-G family of distributions are also presented. Five special cases of the proposed family are presented. A simulation study and real data applications on one of the special cases are also presented.


2018
Author(s):
Rafael F. Guerrero
Matthew W. Hahn

Abstract Convergent evolution is often inferred when a trait is incongruent with the species tree. However, trait incongruence can also arise from changes that occur on discordant gene trees, a process referred to as hemiplasy. Hemiplasy is rarely taken into account in studies of convergent evolution, despite the fact that phylogenomic studies have revealed rampant discordance. Here, we study the relative probabilities of homoplasy (including convergence and reversal) and hemiplasy for an incongruent trait. We derive expressions for the probabilities of the two events, showing that they depend on many of the same parameters. We find that hemiplasy is as likely as, or more likely than, homoplasy for a wide range of conditions, even when levels of discordance are low. We also present a new method to calculate the ratio of these two probabilities (the “hemiplasy risk factor”) along the branches of a phylogeny of arbitrary length. Such calculations can be applied to any tree in order to identify when and where incongruent traits may be more likely to be due to hemiplasy than homoplasy.


2017
Vol 40 (1)
pp. 105-121
Author(s):
Marwa Khalil

This paper considers the problem of estimating reliability in a multicomponent stress-strength model in which the system consists of k components, each having a strength and each experiencing a random stress. The reliability of such a system is obtained when the strength and stress variables follow the Lindley distribution. The system is regarded as alive only if at least r out of k (r < k) strengths exceed the stress. The multicomponent reliability of the system is denoted by R_{r,k}. The maximum likelihood estimator (MLE), uniformly minimum variance unbiased estimator (UMVUE) and Bayes estimator of R_{r,k} are obtained. A simulation study is performed to compare the different estimators of R_{r,k}. Real data is used as a practical application of the proposed model.
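
The "at least r out of k" criterion can be written out in the general form commonly used for multicomponent stress-strength systems. The expression below is a sketch under assumptions not stated in the abstract: the k strengths are independent with a common distribution function F, and the stress Y is independent of them with distribution function G. It is not the paper's closed-form result for the Lindley case.

```latex
R_{r,k}
  = P\bigl(\text{at least } r \text{ of } X_1,\dots,X_k \text{ exceed } Y\bigr)
  = \sum_{i=r}^{k} \binom{k}{i}
    \int_{0}^{\infty} \bigl[1 - F(y)\bigr]^{i} \bigl[F(y)\bigr]^{k-i} \, dG(y)
```

Each term of the sum is the probability that exactly i of the k strengths exceed a stress of size y, averaged over the stress distribution.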


2018
Vol 33 (1)
pp. 31-43
Author(s):
Bol A. M. Atem
Suleman Nasiru
Kwara Nantomah

Abstract This article studies the properties of the Topp–Leone linear exponential distribution. The parameters of the new model are estimated using maximum likelihood estimation, and simulation studies are performed to examine the finite sample properties of the parameters. An application of the model is demonstrated using a real data set. Finally, a bivariate extension of the model is proposed.


2021
pp. 096228022110342
Author(s):
Denis Talbot
Awa Diop
Mathilde Lavigne-Robichaud
Chantal Brisson

Background: The change in estimate is a popular approach for selecting confounders in epidemiology. It is recommended in epidemiologic textbooks and articles over significance tests of coefficients, but concerns have been raised about its validity. Few simulation studies have been conducted to investigate its performance. Methods: An extensive simulation study was conducted to compare different implementations of the change in estimate method. The implementations were also compared when estimating the association of body mass index with diastolic blood pressure in the PROspective Québec Study on Work and Health. Results: All methods were liable to introduce substantial bias and to produce confidence intervals that included the true effect much less often than expected in at least some scenarios. Overall, mixed results were obtained regarding the accuracy of estimators, as measured by the mean squared error. No implementation adequately differentiated confounders from non-confounders. In the real data analysis, none of the implementations decreased the estimated standard error. Conclusion: Based on these results, it is questionable whether change in estimate methods are beneficial in general, considering their low ability to improve the precision of estimates without introducing bias and their inability to yield valid confidence intervals or to identify true confounders.
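
As a rough illustration of what a change-in-estimate rule computes, the sketch below compares a crude regression slope with one adjusted for a single candidate confounder. It is one hypothetical implementation, not any of those the authors evaluated: the 10% threshold, the choice of the adjusted estimate as the denominator, the linear model, and the toy data are all assumptions made here.

```python
def centered_cross(a, b):
    """Sum of cross-products of a and b after centering each at its mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


def crude_slope(x, y):
    """Least-squares slope of y on x alone."""
    return centered_cross(x, y) / centered_cross(x, x)


def adjusted_slope(x, z, y):
    """Least-squares coefficient of x when y is regressed on x and z."""
    sxx, szz, sxz = centered_cross(x, x), centered_cross(z, z), centered_cross(x, z)
    sxy, szy = centered_cross(x, y), centered_cross(z, y)
    return (sxy * szz - sxz * szy) / (sxx * szz - sxz ** 2)


def relative_change(x, z, y):
    """Relative change in the x coefficient caused by adjusting for z."""
    crude = crude_slope(x, y)
    adjusted = adjusted_slope(x, z, y)
    return abs(crude - adjusted) / abs(adjusted)


THRESHOLD = 0.10  # assumed cut-off; a common convention, not taken from the paper

# Toy data (illustrative only): z affects both the exposure x and the outcome y.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 0.0, 3.0, 2.0, 5.0, 4.0]
y = [xi + 2.0 * zi for xi, zi in zip(x, z)]

change = relative_change(x, z, y)
keep_z = change > THRESHOLD  # the rule would retain z as a confounder
print(change, keep_z)
```

A rule of this kind looks only at how much the point estimate moves, which is consistent with the abstract's finding that it does not guarantee valid confidence intervals.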


2021
Vol 2
pp. 1
Author(s):
Haitham M. Yousof
Mustafa C. Korkmaz
G.G. Hamedani
Mohamed Ibrahim

In this work, we derive a novel extension of the Chen distribution. Some statistical properties of the new model are derived. Numerical analysis for mean, variance, skewness and kurtosis is presented. Some characterizations of the proposed distribution are presented. Different classical estimation methods under uncensored schemes, such as the maximum likelihood, Anderson–Darling, weighted least squares and right-tail Anderson–Darling methods, are considered. Simulation studies are performed in order to compare and assess the above-mentioned estimation methods. To compare the applicability of the four classical methods, two applications to real data sets are analyzed.


2019
Vol 56 (2)
pp. 185-210
Author(s):
Abraão D. C. Nascimento
Kássio F. Silva
Gauss M. Cordeiro
Morad Alizadeh
Haitham M. Yousof
...

Abstract We study some mathematical properties of a new generator of continuous distributions called the Odd Nadarajah-Haghighi (ONH) family. In particular, three special models in this family are investigated, namely the ONH gamma, beta and Weibull distributions. The family density function is given as a linear combination of exponentiated densities. Further, we propose a bivariate extension and various characterization results of the new family. We determine the maximum likelihood estimates of ONH parameters for complete and censored data. We provide a simulation study to verify the precision of these estimates. We illustrate the performance of the new family by means of a real data set.


2016
Vol 2016
pp. 1-13
Author(s):
Guoqi Qian
Yuehua Wu
Davide Ferrari
Puxue Qiao
Frédéric Hollande

Regression clustering is a statistical learning and data mining method that combines unsupervised and supervised learning and is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model-selection-based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with an analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method.

