Legofit: Estimating Population History from Genetic Data

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Alan R. Rogers

Abstract. Background: Our current understanding of archaic admixture in humans relies on statistical methods with large biases, whose magnitudes depend on the sizes and separation times of ancestral populations. To avoid these biases, it is necessary to estimate these parameters simultaneously with those describing admixture. Genetic estimates of population histories also confront problems of statistical identifiability: different models or different combinations of parameter values may fit the data equally well. To deal with this problem, we need methods of model selection and model averaging, which are lacking from most existing software. Results: The Legofit software package allows simultaneous estimation of parameters describing admixture, and the sizes and separation times of ancestral populations. It includes facilities for data manipulation, estimation, analysis of residuals, model selection, and model averaging. Conclusions: Legofit uses genetic data to study the history of a subdivided population. It is unaffected by recent history and can therefore focus on the deep history of population size, subdivision, and admixture. It outperforms several statistical methods that have been widely used to study population history and should be useful in any species for which DNA sequence data are available from several populations.
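
The model-averaging step this abstract refers to can be illustrated generically. The Python sketch below combines per-model parameter estimates using exponential information-criterion weights; it is a minimal, hedged illustration rather than Legofit's own bootstrap-based machinery, and the model names, parameter names, and criterion values are invented for the example.

```python
# Minimal, illustrative sketch of model averaging over parameter estimates.
# Weights are Akaike-style exp(-0.5 * delta); Legofit's own machinery differs
# in detail, so treat this as a generic example, not the package's algorithm.
import numpy as np

def model_average(estimates, criteria):
    """estimates: dict model -> dict parameter -> estimate
    criteria: dict model -> information-criterion value (smaller is better)."""
    models = list(estimates)
    crit = np.array([criteria[m] for m in models])
    delta = crit - crit.min()
    w = np.exp(-0.5 * delta)
    w /= w.sum()
    params = sorted({p for m in models for p in estimates[m]})
    averaged = {}
    for p in params:
        vals = np.array([estimates[m].get(p, np.nan) for m in models])
        mask = ~np.isnan(vals)
        # Average only over models that contain the parameter (one convention).
        averaged[p] = float(np.sum(w[mask] * vals[mask]) / np.sum(w[mask]))
    return dict(zip(models, w)), averaged

# Toy usage: two hypothetical models of population history.
weights, avg = model_average(
    {"no_admixture": {"N_anc": 20000.0},
     "admixture":    {"N_anc": 18000.0, "m_frac": 0.02}},
    {"no_admixture": 1012.4, "admixture": 1003.1})
print(weights, avg)
```

An alternative convention treats a parameter as zero in models that omit it; which choice is appropriate depends on how the parameter is interpreted.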


Author(s):  
Suzana de Siqueira Santos ◽  
Daniel Yasumasa Takahashi ◽  
João Ricardo Sato ◽  
Carlos Eduardo Ferreira ◽  
André Fujita

2019 ◽  
Vol 50 (6) ◽  
pp. 1665-1678 ◽  
Author(s):  
Kenechukwu Okoli ◽  
Maurizio Mazzoleni ◽  
Korbinian Breinl ◽  
Giuliano Di Baldassarre

Abstract We compare statistical and hydrological methods for estimating design floods by proposing a framework based on a synthetic scenario that is treated as 'truth' and used as a benchmark for analysing results. To illustrate the framework, we use probability model selection and model averaging as statistical methods, while continuous simulations made with a simple and a relatively complex rainfall–runoff model are used as hydrological methods. The results of our numerical exercise show that design floods estimated with a simple rainfall–runoff model have small parameter uncertainty and limited errors, even for high return periods. Statistical methods perform better than the linear reservoir model in terms of median errors for high return periods, but their uncertainty (i.e., the variance of the error) is larger. Moreover, selecting the single best-fitting probability distribution is associated with numerous outliers. On the contrary, using multiple probability distributions, regardless of their capability in fitting the data, leads to significantly fewer outliers while keeping a similar accuracy. Thus, we find that, among the statistical methods, model averaging is a better option than model selection. Our results also show the relevance of the precautionary principle in design flood estimation, and thus help develop general recommendations for practitioners and experts involved in flood risk reduction.
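
To make the contrast between selecting one probability distribution and averaging over several concrete, the hedged Python sketch below fits a few candidate distributions (Gumbel, GEV, lognormal are assumptions for illustration) to simulated annual maxima and combines the resulting 100-year quantiles with AIC weights; the study's own distribution set and weighting scheme may differ.

```python
# Illustrative sketch: estimate a design flood (the 100-year event) by fitting
# several candidate distributions to annual maxima and averaging the resulting
# quantiles with AIC weights. Data and distribution set are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
annual_maxima = stats.gumbel_r.rvs(loc=100, scale=25, size=60, random_state=rng)

T = 100                      # return period in years
p = 1 - 1 / T                # non-exceedance probability
candidates = {"gumbel": stats.gumbel_r,
              "gev": stats.genextreme,
              "lognorm": stats.lognorm}

quantiles, aic = {}, {}
for name, dist in candidates.items():
    params = dist.fit(annual_maxima)                     # maximum likelihood fit
    loglik = np.sum(dist.logpdf(annual_maxima, *params))
    aic[name] = 2 * len(params) - 2 * loglik
    quantiles[name] = dist.ppf(p, *params)               # per-model 100-yr flood

delta = np.array(list(aic.values())) - min(aic.values())
w = np.exp(-0.5 * delta)
w /= w.sum()
averaged_q100 = float(np.sum(w * np.array(list(quantiles.values()))))
print(quantiles, dict(zip(aic, w.round(3))), round(averaged_q100, 1))
```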


Genetics ◽  
2000 ◽  
Vol 155 (3) ◽  
pp. 1429-1437
Author(s):  
Oliver G Pybus ◽  
Andrew Rambaut ◽  
Paul H Harvey

Abstract We describe a unified set of methods for the inference of demographic history using genealogies reconstructed from gene sequence data. We introduce the skyline plot, a graphical, nonparametric estimate of demographic history. We discuss both maximum-likelihood parameter estimation and demographic hypothesis testing. Simulations are carried out to investigate the statistical properties of maximum-likelihood estimates of demographic parameters. The simulations reveal that (i) the performance of exponential growth model estimates is determined by a simple function of the true parameter values and (ii) under some conditions, estimates from reconstructed trees perform as well as estimates from perfect trees. We apply our methods to HIV-1 sequence data and find strong evidence that subtypes A and B have different demographic histories. We also provide the first (albeit tentative) genetic evidence for a recent decrease in the growth rate of subtype B.
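
The classic skyline estimator described here has a very compact form: during the interval in which i lineages coexist, the expected coalescent waiting time is 2N / (i(i-1)), so each inter-coalescent interval yields an estimate of N. The Python sketch below applies this to a toy set of intervals rather than the HIV-1 data; the function name and the example values are illustrative.

```python
# Compact sketch of the classic skyline plot estimator: the interval during
# which i lineages coexist estimates N as interval_length * i * (i - 1) / 2.
import numpy as np

def classic_skyline(intervals):
    """intervals[k] = waiting time while (n - k) lineages exist,
    ordered from n lineages down to 2 (branch-length units)."""
    n = len(intervals) + 1                      # number of sampled sequences
    lineages = np.arange(n, 1, -1)              # n, n-1, ..., 2
    ne_hat = np.asarray(intervals) * lineages * (lineages - 1) / 2.0
    midpoints = np.cumsum(intervals) - np.asarray(intervals) / 2.0
    return midpoints, ne_hat                    # plot as a step function of time

# Toy genealogy of 5 sequences:
mid, ne = classic_skyline([0.1, 0.15, 0.3, 0.7])
print(list(zip(mid.round(3), ne.round(3))))
```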


Radiocarbon ◽  
2013 ◽  
Vol 55 (2) ◽  
pp. 720-730 ◽  
Author(s):  
Christopher Bronk Ramsey ◽  
Sharen Lee

OxCal is a widely used software package for the calibration of radiocarbon dates and the statistical analysis of ¹⁴C and other chronological information. The program aims to make statistical methods easily available to researchers and students working in a range of different disciplines. This paper will look at the recent and planned developments of the package. The recent additions to the statistical methods are primarily aimed at providing more robust models, in particular through model averaging for deposition models and through different multiphase models. The paper will also examine how these new models have been implemented and explore the implications for researchers who might benefit from their use. In addition, a new approach to the evaluation of marine reservoir offsets will be presented. As the quantity and complexity of chronological data increase, it is also important to have efficient methods for the visualization of such extensive data sets; methods for the presentation of spatial and geographical data that are planned for future versions of OxCal will also be discussed.
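
At its core, calibrating a single radiocarbon determination amounts to evaluating a normal likelihood against the calibration curve over a grid of calendar ages. The Python sketch below does this with a synthetic linear "curve" (not IntCal) and a flat prior, and summarizes roughly 95.4% of the posterior mass; it is a minimal illustration of the principle, not OxCal's implementation.

```python
# Minimal sketch of single-date radiocarbon calibration: the likelihood of a
# calendar age theta given a lab determination (mu, sigma) is a normal density
# centred on the calibration curve value at theta. The curve here is synthetic.
import numpy as np

cal_years = np.arange(3000, 3501)                       # toy calendar-year grid (BP)
curve_c14 = 2800 + 0.45 * (cal_years - 3000)            # synthetic 14C ages
curve_err = np.full_like(cal_years, 15.0, dtype=float)  # synthetic curve error

def calibrate(c14_age, c14_err):
    var = c14_err ** 2 + curve_err ** 2
    like = np.exp(-0.5 * (c14_age - curve_c14) ** 2 / var) / np.sqrt(var)
    return like / like.sum()                             # flat prior on calendar age

post = calibrate(2900.0, 30.0)
# Rough 95.4% highest-posterior summary by sorting probabilities:
order = np.argsort(post)[::-1]
keep = order[np.cumsum(post[order]) <= 0.954]
print(cal_years[keep].min(), cal_years[keep].max())
```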


2017 ◽  
Author(s):  
Rebecca L. Koscik ◽  
Derek L. Norton ◽  
Samantha L. Allison ◽  
Erin M. Jonaitis ◽  
Lindsay R. Clark ◽  
...  

Objective: In this paper we apply Information-Theoretic (IT) model averaging to characterize a set of complex interactions in a longitudinal study on cognitive decline. Prior research has identified numerous genetic (including sex), education, health, and lifestyle factors that predict cognitive decline. Traditional model selection approaches (e.g., backward or stepwise selection) attempt to find the models that best fit the observed data; these techniques risk the interpretation that only the selected predictors are important. In reality, several models may fit similarly well but result in different conclusions (e.g., about the size and significance of parameter estimates); inference from traditional model selection approaches can lead to overly confident conclusions. Method: Here we use longitudinal cognitive data from ~1550 late-middle-aged adults in the Wisconsin Registry for Alzheimer's Prevention study to examine the effects of sex, the Apolipoprotein E (APOE) ɛ4 allele (non-modifiable factors), and literacy achievement (modifiable) on cognitive decline. For each outcome, we applied IT model averaging to a model set with combinations of interactions among sex, APOE, literacy, and age. Results: For a list-learning test, model-averaged results showed better performance for women vs. men, with faster decline among men; increased literacy was associated with better performance, particularly among men. APOE had less of an effect on cognitive performance in this age range (~40-70). Conclusions: These results illustrate the utility of the IT approach and point to literacy as a potential modifier of decline. Whether the protective effect of literacy is due to educational attainment or to intrinsic verbal intellectual ability is the topic of ongoing work.
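
The IT machinery referred to here reduces to computing Akaike weights, proportional to exp(-0.5 * ΔAIC), over a candidate model set and averaging coefficients across models. The hedged Python sketch below does this on simulated data with placeholder variable names (memory, age, sex, apoe4, literacy); it uses ordinary least squares rather than the longitudinal mixed models one would fit to the WRAP data, and treats a coefficient as zero in models that omit it, which is one common convention.

```python
# Hedged sketch of Information-Theoretic model averaging with Akaike weights
# on synthetic data. Candidate models differ in which interactions they include.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.uniform(40, 70, n),
    "sex": rng.integers(0, 2, n),          # 0 = women, 1 = men (toy coding)
    "apoe4": rng.integers(0, 2, n),
    "literacy": rng.normal(0, 1, n),
})
df["memory"] = (50 - 0.15 * df.age - 0.05 * df.age * df.sex
                + 2.0 * df.literacy + rng.normal(0, 3, n))

formulas = [
    "memory ~ age + sex + apoe4 + literacy",
    "memory ~ age * sex + apoe4 + literacy",
    "memory ~ age * sex + apoe4 + literacy + age:literacy",
    "memory ~ age * apoe4 + sex + literacy",
]
fits = [smf.ols(f, data=df).fit() for f in formulas]
aic = np.array([f.aic for f in fits])
w = np.exp(-0.5 * (aic - aic.min()))
w /= w.sum()                                # Akaike weights

# Model-averaged coefficients, treating a term as 0 where a model omits it.
terms = sorted({t for f in fits for t in f.params.index})
avg = {t: float(sum(wi * fi.params.get(t, 0.0) for wi, fi in zip(w, fits)))
       for t in terms}
print(dict(zip(formulas, w.round(3))))
print({k: round(v, 3) for k, v in avg.items()})
```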


2020 ◽  
Vol 19 (2) ◽  
Author(s):  
Beata Śpiewak ◽  
Anna Barańska

The paper compares the mechanisms of two separately constructed statistical methods for detecting outliers in real estate market analysis. For this purpose, databases containing various types of real estate from local markets were created. The parameters of functional models describing the dependencies prevailing on the examined markets were then estimated. Subsequently, two statistical tools, the Baarda method and the analysis of model residuals, were used to detect outliers in the collected datasets. The last stage was a comparison of the parameter estimates of the analyzed models and the measures of their quality before and after the removal of outliers. The results indicate that the algorithms of the chosen outlier-detection methods eliminate a relatively small number of observations while at the same time improving the parameters of the functional model and its fit to the analyzed dataset. This provides a basis for developing criteria for selecting statistical methods that search for gross errors in the analyzed databases, depending, among other things, on the functional model used, the type of property, and the number of properties.
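
Baarda's method ("data snooping") flags an observation when its standardized residual, w_i = v_i / (σ0 · sqrt(1 - h_ii)), exceeds a normal critical value, then removes the worst offender and refits until none remain. The Python sketch below applies this to a synthetic one-regressor price model; the data, the value of σ0, and the 3.29 critical value (roughly α = 0.001) are illustrative assumptions, not the paper's settings.

```python
# Illustrative sketch of Baarda-style data snooping on a simple linear price
# model with synthetic data and one planted gross error.
import numpy as np

def data_snooping(A, y, sigma0, crit=3.29):
    """A: design matrix, y: observations, sigma0: a-priori unit std error."""
    keep = np.arange(len(y))
    while True:
        Ak, yk = A[keep], y[keep]
        x, *_ = np.linalg.lstsq(Ak, yk, rcond=None)
        v = yk - Ak @ x                                   # residuals
        H = Ak @ np.linalg.inv(Ak.T @ Ak) @ Ak.T          # hat matrix
        w = v / (sigma0 * np.sqrt(1.0 - np.diag(H)))      # Baarda w-statistics
        worst = np.argmax(np.abs(w))
        if np.abs(w[worst]) <= crit:
            return x, keep
        keep = np.delete(keep, worst)                     # drop flagged point

rng = np.random.default_rng(1)
area = rng.uniform(30, 120, 40)
price = 2000 * area + rng.normal(0, 5000, 40)
price[5] += 80000                                         # planted gross error
A = np.column_stack([np.ones_like(area), area])
coef, kept = data_snooping(A, price, sigma0=5000)
print(coef, set(range(40)) - set(kept))                   # flagged observations
```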


2021 ◽  
Author(s):  
Carlos R Oliveira ◽  
Eugene D Shapiro ◽  
Daniel M Weinberger

Vaccine effectiveness (VE) studies are often conducted after the introduction of new vaccines to ensure they provide protection in real-world settings. Although susceptible to confounding, the test-negative case-control study design is the most efficient method to assess VE post-licensure. Control of confounding is often needed during the analyses, which is most efficiently done through multivariable modeling. When a large number of potential confounders are being considered, it can be challenging to know which variables need to be included in the final model. This paper highlights the importance of considering model uncertainty by re-analyzing a Lyme VE study using several confounder selection methods. We propose an intuitive Bayesian Model Averaging (BMA) framework for this task and compare the performance of BMA to that of traditional single-best-model-selection methods. We demonstrate how BMA can be advantageous in situations when there is uncertainty about model selection by systematically considering alternative models and increasing transparency.
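
One common way to operationalize BMA in this setting is the BIC approximation: each candidate logistic model receives posterior probability proportional to exp(-0.5 · BIC), the vaccination coefficient is averaged across models, and VE is reported as 1 minus the resulting odds ratio. The Python sketch below does this on simulated test-negative data; the confounder names and effect sizes are placeholders, not the Lyme disease study's actual variables or the authors' exact framework.

```python
# Hedged sketch of BIC-approximated Bayesian Model Averaging for a
# test-negative design, using simulated data and placeholder confounders.
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 2000
df = pd.DataFrame({
    "age": rng.uniform(5, 70, n),
    "rural": rng.integers(0, 2, n),
    "season": rng.integers(0, 2, n),
})
lin_vacc = -0.5 + 0.3 * df.rural
df["vaccinated"] = (rng.random(n) < 1 / (1 + np.exp(-lin_vacc))).astype(int)
lin_case = -1.0 - 0.9 * df.vaccinated + 0.4 * df.rural + 0.01 * df.age
df["case"] = (rng.random(n) < 1 / (1 + np.exp(-lin_case))).astype(int)

confounders = ["age", "rural", "season"]
models, bics, betas = [], [], []
for k in range(len(confounders) + 1):
    for combo in itertools.combinations(confounders, k):
        formula = "case ~ vaccinated" + "".join(" + " + c for c in combo)
        fit = smf.logit(formula, data=df).fit(disp=0)
        models.append(formula)
        bics.append(fit.bic)
        betas.append(fit.params["vaccinated"])

bics = np.array(bics)
post = np.exp(-0.5 * (bics - bics.min()))
post /= post.sum()                                  # approximate model posteriors
beta_avg = float(np.sum(post * np.array(betas)))    # model-averaged log odds ratio
print(dict(zip(models, post.round(3))))
print("model-averaged VE:", round(1 - np.exp(beta_avg), 3))
```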

