Legofit: Estimating Population History from Genetic Data

Mapping Intimacies ◽

10.1101/613067 ◽

2019 ◽

Cited By ~ 3

Author(s):

Alan R. Rogers

Keyword(s):

Model Selection ◽

Statistical Methods ◽

Model Averaging ◽

Population History ◽

Simultaneous Estimation ◽

Estimation Of Parameters ◽

Estimation Model ◽

Data Manipulation ◽

Archaic Admixture ◽

Parameter Values

Abstract. Background: Our current understanding of archaic admixture in humans relies on statistical methods with large biases, whose magnitudes depend on the sizes and separation times of ancestral populations. To avoid these biases, it is necessary to estimate these parameters simultaneously with those describing admixture. Genetic estimates of population histories also confront problems of statistical identifiability: different models or different combinations of parameter values may fit the data equally well. To deal with this problem, we need methods of model selection and model averaging, which are lacking from most existing software. Results: The Legofit software package allows simultaneous estimation of parameters describing admixture and other aspects of population history. It includes facilities for data manipulation, estimation, model selection, and model averaging. It outperforms several statistical methods that have been widely used to study archaic admixture in humans.

Legofit: estimating population history from genetic data

BMC Bioinformatics ◽

10.1186/s12859-019-3154-1 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 1

Author(s):

Alan R. Rogers

Keyword(s):

Model Selection ◽

Statistical Methods ◽

Sequence Data ◽

Model Averaging ◽

Genetic Data ◽

Population History ◽

Simultaneous Estimation ◽

Estimation Of Parameters ◽

History Of ◽

Parameter Values

Abstract. Background: Our current understanding of archaic admixture in humans relies on statistical methods with large biases, whose magnitudes depend on the sizes and separation times of ancestral populations. To avoid these biases, it is necessary to estimate these parameters simultaneously with those describing admixture. Genetic estimates of population histories also confront problems of statistical identifiability: different models or different combinations of parameter values may fit the data equally well. To deal with this problem, we need methods of model selection and model averaging, which are lacking from most existing software. Results: The Legofit software package allows simultaneous estimation of parameters describing admixture, and the sizes and separation times of ancestral populations. It includes facilities for data manipulation, estimation, analysis of residuals, model selection, and model averaging. Conclusions: Legofit uses genetic data to study the history of a subdivided population. It is unaffected by recent history and can therefore focus on the deep history of population size, subdivision, and admixture. It outperforms several statistical methods that have been widely used to study population history and should be useful in any species for which DNA sequence data is available from several populations.

Statistical Methods in Graphs: Parameter Estimation, Model Selection, and Hypothesis Test

Mathematical Foundations and Applications of Graph Entropy ◽

10.1002/9783527693245.ch6 ◽

2016 ◽

pp. 183-202 ◽

Cited By ~ 1

Author(s):

Suzana de Siqueira Santos ◽

Daniel Yasumasa Takahashi ◽

João Ricardo Sato ◽

Carlos Eduardo Ferreira ◽

André Fujita

Keyword(s):

Parameter Estimation ◽

Model Selection ◽

Statistical Methods ◽

Hypothesis Test ◽

Estimation Model

A systematic comparison of statistical and hydrological methods for design flood estimation

Hydrology Research ◽

10.2166/nh.2019.188 ◽

2019 ◽

Vol 50 (6) ◽

pp. 1665-1678 ◽

Cited By ~ 6

Author(s):

Kenechukwu Okoli ◽

Maurizio Mazzoleni ◽

Korbinian Breinl ◽

Giuliano Di Baldassarre

Keyword(s):

Model Selection ◽

Statistical Methods ◽

Model Averaging ◽

Return Periods ◽

Rainfall Runoff ◽

Design Flood ◽

High Return ◽

Flood Estimation ◽

Rainfall Runoff Model ◽

Runoff Model

Abstract We compare statistical and hydrological methods to estimate design floods by proposing a framework that assumes a synthetic scenario, considered as ‘truth’, and uses it as a benchmark for analysing results. To illustrate the framework, we used probability model selection and model averaging as statistical methods, while continuous simulations made with a simple and a relatively complex rainfall–runoff model are used as hydrological methods. The results of our numerical exercise show that design floods estimated by using a simple rainfall–runoff model have small parameter uncertainty and limited errors, even for high return periods. Statistical methods perform better than the linear reservoir model in terms of median errors for high return periods, but their uncertainty (i.e., variance of the error) is larger. Moreover, selecting the best fitting probability distribution is associated with numerous outliers. On the contrary, using multiple probability distributions, regardless of their capability in fitting the data, leads to significantly fewer outliers, while keeping a similar accuracy. Thus, we find that, among the statistical methods, model averaging is a better option than model selection. Our results also show the relevance of the precautionary principle in design flood estimation, and thus help develop general recommendations for practitioners and experts involved in flood risk reduction.

Nonlinear predictive model selection and model averaging using information criteria

Systems Science & Control Engineering ◽

10.1080/21642583.2018.1496042 ◽

2018 ◽

Vol 6 (1) ◽

pp. 319-328 ◽

Cited By ~ 4

Author(s):

Yuanlin Gu ◽

Hua-Liang Wei ◽

Michael M. Balikhin

Keyword(s):

Model Selection ◽

Predictive Model ◽

Model Averaging ◽

Information Criteria

An Integrated Framework for the Inference of Viral Population History From Reconstructed Genealogies

Genetics ◽

10.1093/genetics/155.3.1429 ◽

2000 ◽

Vol 155 (3) ◽

pp. 1429-1437

Author(s):

Oliver G Pybus ◽

Andrew Rambaut ◽

Paul H Harvey

Keyword(s):

Maximum Likelihood ◽

Sequence Data ◽

Demographic History ◽

Population History ◽

Maximum Likelihood Estimates ◽

Viral Population ◽

True Parameter ◽

Subtype B ◽

Exponential Growth Model ◽

Parameter Values

Abstract We describe a unified set of methods for the inference of demographic history using genealogies reconstructed from gene sequence data. We introduce the skyline plot, a graphical, nonparametric estimate of demographic history. We discuss both maximum-likelihood parameter estimation and demographic hypothesis testing. Simulations are carried out to investigate the statistical properties of maximum-likelihood estimates of demographic parameters. The simulations reveal that (i) the performance of exponential growth model estimates is determined by a simple function of the true parameter values and (ii) under some conditions, estimates from reconstructed trees perform as well as estimates from perfect trees. We apply our methods to HIV-1 sequence data and find strong evidence that subtypes A and B have different demographic histories. We also provide the first (albeit tentative) genetic evidence for a recent decrease in the growth rate of subtype B.

Recent and Planned Developments of the Program OxCal

Radiocarbon ◽

10.1017/s0033822200057878 ◽

2013 ◽

Vol 55 (2) ◽

pp. 720-730 ◽

Cited By ~ 641

Author(s):

Christopher Bronk Ramsey ◽

Sharen Lee

Keyword(s):

Statistical Analysis ◽

Statistical Methods ◽

Software Package ◽

Model Averaging ◽

Data Sets ◽

Radiocarbon Dates ◽

New Approach ◽

New Models ◽

Multiphase Models ◽

Deposition Models

OxCal is a widely used software package for the calibration of radiocarbon dates and the statistical analysis of 14C and other chronological information. The program aims to make statistical methods easily available to researchers and students working in a range of different disciplines. This paper will look at the recent and planned developments of the package. The recent additions to the statistical methods are primarily aimed at providing more robust models, in particular through model averaging for deposition models and through different multiphase models. The paper will look at how these new models have been implemented and explore the implications for researchers who might benefit from their use. In addition, a new approach to the evaluation of marine reservoir offsets will be presented. As the quantity and complexity of chronological data increase, it is also important to have efficient methods for the visualization of such extensive data sets. Methods for the presentation of spatial and geographical data, to be embedded within planned future versions of OxCal, will also be discussed.

Parameter Estimation, Model Selection and Classification

Digital Audio Restoration ◽

10.1007/978-1-4471-1561-8_4 ◽

1998 ◽

pp. 69-95

Author(s):

Simon J. Godsill ◽

Peter J. W. Rayner

Keyword(s):

Parameter Estimation ◽

Model Selection ◽

Estimation Model

Characterizing the effects of sex, APOE ɛ4, and literacy on mid-life cognitive trajectories: Application of Information-Theoretic model-averaging and multi-model inference techniques to the Wisconsin Registry for Alzheimer’s Prevention Study

10.1101/229237 ◽

2017 ◽

Author(s):

Rebecca L. Koscik ◽

Derek L. Norton ◽

Samantha L. Allison ◽

Erin M. Jonaitis ◽

Lindsay R. Clark ◽

...

Keyword(s):

Model Selection ◽

Cognitive Decline ◽

Model Averaging ◽

Parameter Estimates ◽

Theoretic Model ◽

Traditional Model ◽

Test Model ◽

Prevention Study ◽

Information Theoretic ◽

Modifiable Factors

Objective: In this paper we apply Information-Theoretic (IT) model averaging to characterize a set of complex interactions in a longitudinal study on cognitive decline. Prior research has identified numerous genetic (including sex), education, health and lifestyle factors that predict cognitive decline. Traditional model selection approaches (e.g., backward or stepwise selection) attempt to find models that best fit the observed data; these techniques risk interpretations that only the selected predictors are important. In reality, several models may fit similarly well but result in different conclusions (e.g., about size and significance of parameter estimates); inference from traditional model selection approaches can lead to overly confident conclusions. Method: Here we use longitudinal cognitive data from ~1550 late-middle-aged adults in the Wisconsin Registry for Alzheimer’s Prevention study to examine the effects of sex, Apolipoprotein E (APOE) ɛ4 allele (non-modifiable factors), and literacy achievement (modifiable) on cognitive decline. For each outcome, we applied IT model averaging to a model set with combinations of interactions among sex, APOE, literacy, and age. Results: For a list-learning test, model-averaged results showed better performance for women vs men, with faster decline among men; increased literacy was associated with better performance, particularly among men. APOE had less of an effect on cognitive performance in this age range (~40-70). Conclusions: These results illustrate the utility of the IT approach and point to literacy as a potential modifier of decline. Whether the protective effect of literacy is due to educational attainment or intrinsic verbal intellectual ability is the topic of ongoing work.

Chosen statistical methods for the detection of outliers in real estate market analysis

Acta Scientiarum Polonorum Administratio Locorum ◽

10.31648/aspal.4608 ◽

2020 ◽

Vol 19 (2) ◽

Author(s):

Beata Śpiewak ◽

Anna Barańska

Keyword(s):

Real Estate ◽

Statistical Methods ◽

Residue Analysis ◽

Functional Model ◽

Real Estate Market ◽

Market Analysis ◽

Parameters Estimation ◽

Estimation Of Parameters ◽

Before And After ◽

Detection Of Outliers

This paper compares the mechanisms of two separately constructed statistical methods for the detection of outliers in real estate market analysis. For this purpose, databases with various types of real estate from local markets were created. The parameters of functional models describing the dependencies prevailing in the examined markets were then estimated. Subsequently, two statistical tools, the Baarda method and model residue analysis, were used to detect outliers in the collected datasets. The last stage compared the parameter estimates of the analyzed models and the measures of their quality before and after the removal of outliers. The results indicate that the algorithms of the chosen outlier-detection methods allow a smaller number of outliers to be eliminated while still improving the parameters of the functional model and its fit to the analyzed dataset. This provides a basis for developing criteria for selecting statistical methods that look for gross errors in the analyzed databases, depending on, among other things, the functional model used, the type of property, and the number of properties.

Bayesian Model Averaging to Account for Model Uncertainty in Estimates of a Vaccine's Effectiveness

10.1101/2021.05.12.21257126 ◽

2021 ◽

Author(s):

Carlos R Oliveira ◽

Eugene D Shapiro ◽

Daniel M Weinberger

Keyword(s):

Model Selection ◽

Model Uncertainty ◽

Bayesian Model ◽

Bayesian Model Averaging ◽

Model Averaging ◽

Selection Methods ◽

Final Model ◽

Negative Case ◽

Confounder Selection ◽

Control Study

Vaccine effectiveness (VE) studies are often conducted after the introduction of new vaccines to ensure they provide protection in real-world settings. Although susceptible to confounding, the test-negative case-control study design is the most efficient method to assess VE post-licensure. Control of confounding is often needed during the analyses, which is most efficiently done through multivariable modeling. When a large number of potential confounders are being considered, it can be challenging to know which variables need to be included in the final model. This paper highlights the importance of considering model uncertainty by re-analyzing a Lyme VE study using several confounder selection methods. We propose an intuitive Bayesian Model Averaging (BMA) framework for this task and compare the performance of BMA to that of traditional single-best-model-selection methods. We demonstrate how BMA can be advantageous in situations when there is uncertainty about model selection by systematically considering alternative models and increasing transparency.
