Selecting among three basic fitness landscape models: additive, multiplicative and stickbreaking

2017 ◽  
Author(s):  
Craig R. Miller ◽  
James T. Van Leuven ◽  
Holly A. Wichman ◽  
Paul Joyce

Abstract: Fitness landscapes map genotypes to organismal fitness. Their topography depends on how mutational effects interact (epistasis) and is important for understanding evolutionary processes such as speciation, the rate of adaptation, the advantage of recombination, and predictability versus stochasticity of evolution. The growing amount of empirical data has made it possible to better test landscape models empirically. We argue that this endeavor will benefit from the development and use of meaningful null models against which to compare more complex models. Here we develop statistical and computational methods for fitting fitness data from mutation combinatorial networks to three simple models: additive, multiplicative and stickbreaking. We employ a Bayesian framework for doing model selection. Using simulations, we demonstrate that our methods work and we explore their statistical performance: bias, error, and the power to discriminate among models. We then illustrate our approach and its flexibility by analyzing several previously published datasets. An R-package that implements our methods is available in the CRAN repository under the name Stickbreaker.
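The abstract names the three models but does not give their equations. As a rough illustration only, the sketch below uses commonly cited forms: additive effects sum, multiplicative effects scale fitness by a factor, and stickbreaking effects each close a fraction of the remaining distance to a fitness boundary. The function names, the parameter d (distance from wild type to the boundary), and the numeric effects are assumptions made for this example and are not taken from the Stickbreaker package.

```python
from math import prod

# Illustrative forms only; the abstract does not state these equations.

def additive(w_wt, effects):
    """Each mutation adds a fixed amount to wild-type fitness."""
    return w_wt + sum(effects)

def multiplicative(w_wt, effects):
    """Each mutation scales fitness by a factor (1 + s)."""
    return w_wt * prod(1.0 + s for s in effects)

def stickbreaking(w_wt, d, fractions):
    """Each mutation closes a fraction u of the remaining distance
    to an assumed fitness boundary located at w_wt + d."""
    remaining = prod(1.0 - u for u in fractions)
    return w_wt + d * (1.0 - remaining)

# Hypothetical double mutant built from two single-mutant effects.
w_add = additive(1.0, [0.2, 0.3])
w_mult = multiplicative(1.0, [0.2, 0.3])
w_stick = stickbreaking(1.0, 1.0, [0.2, 0.3])
print(w_add, w_mult, w_stick)
```

With these toy numbers the stickbreaking double mutant falls below the additive one and the multiplicative double mutant falls above it, which is the kind of difference a model selection procedure would need to detect.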

2019 ◽  
pp. 004912411988245 ◽  
Author(s):  
J. Mulder ◽  
A. E. Raftery

The Schwarz or Bayesian information criterion (BIC) is one of the most widely used tools for model comparison in social science research. The BIC, however, is not suitable for evaluating models with order constraints on the parameters of interest. This article explores two extensions of the BIC for evaluating order-constrained models, one where a truncated unit information prior is used under the order-constrained model and the other where a truncated local unit information prior is used. The first prior is centered on the maximum likelihood estimate, and the latter prior is centered on a null value. Several analyses show that the order-constrained BIC based on the local unit information prior works better as an Occam’s razor for evaluating order-constrained models and results in lower error probabilities. The methodology based on the local unit information prior is implemented in the R package “BICpack” which allows researchers to easily apply the method for order-constrained model selection. The usefulness of the methodology is illustrated using data from the European Values Study.
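For orientation, the standard (unconstrained) BIC is k ln(n) minus twice the maximized log-likelihood, with lower values preferred. The sketch below computes only this standard criterion; it does not implement the order-constrained extensions described in the abstract, and the log-likelihoods, parameter counts, and sample size are hypothetical values chosen for illustration.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: k * ln(n) - 2 * ln(L_hat). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical fits of two models to the same 100 observations.
bic_simple = bic(log_likelihood=-120.0, n_params=3, n_obs=100)
bic_complex = bic(log_likelihood=-118.5, n_params=5, n_obs=100)

# The model with the smaller BIC is preferred.
preferred = "simple" if bic_simple < bic_complex else "complex"
print(bic_simple, bic_complex, preferred)
```

Here the more complex model fits slightly better, but its two extra parameters cost more than the gain in log-likelihood, so the simpler model is preferred.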


2015 ◽  
Vol 7 (1) ◽  
Author(s):  
Georgia Tsiliki ◽  
Cristian R. Munteanu ◽  
Jose A. Seoane ◽  
Carlos Fernandez-Lozano ◽  
Haralambos Sarimveis ◽  
...  

2021 ◽  
pp. 136700692110319
Author(s):  
Lena V. Kremin ◽  
Krista Byers-Heinlein

Aims and Objectives: Bilingualism is a complex construct, and it can be difficult to define and model. This paper proposes that the field of bilingualism can draw from other fields of psychology, by integrating advanced psychometric models that incorporate both categorical and continuous properties. These models can unify the widespread use of bilingual and monolingual groups that exist in the literature with recent proposals that bilingualism should be viewed as a continuous variable. Approach: In the paper, we highlight two models of potential interest: the factor mixture model and the grade-of-membership model. These models simultaneously allow for the formation of different categories of speakers and for continuous variation to exist within these categories. We discuss how these models could be implemented in bilingualism research, including how to develop these models. When using either of the two models, researchers can conduct their analyses on either the categorical or continuous information, or a combination of the two, depending on which is most appropriate to address their research question. Conclusions: The field of bilingualism research could benefit from incorporating more complex models into definitions of bilingualism. To help various subfields of bilingualism research converge on appropriate models, we encourage researchers to pre-register their model selection and planned analyses, as well as to share their data and analysis scripts. Originality: The paper uniquely proposes the incorporation of advanced statistical psychometric methods for defining and modeling bilingualism. Significance: Conceptualizing bilingualism within the context of these more flexible models will allow a wide variety of research questions to be addressed. Ultimately, this will help to advance theory and lead to a fuller and deeper understanding of bilingualism.


2019 ◽  
Author(s):  
Javier de Velasco Oriol ◽  
Antonio Martinez-Torteya ◽  
Victor Trevino ◽  
Israel Alanis ◽  
Edgar E. Vallejo ◽  
...  

Abstract Background: Machine learning models have proven to be useful tools for the analysis of genetic data. However, with the availability of a wide variety of such methods, model selection has become increasingly difficult, both from the human and computational perspective. Results: We present the R package FRESA.CAD Binary Classification Benchmarking, which performs systematic comparisons between a collection of representative machine learning methods for solving binary classification problems on genetic datasets. Conclusions: FRESA.CAD Binary Benchmarking proves to be a useful tool across a variety of binary classification problems involving the analysis of genetic data, showing both quantitative and qualitative advantages over similar packages.


2021 ◽  
Author(s):  
Joseph R Mihaljevic ◽  
Seth Borkovec ◽  
Saikanth Ratnavale ◽  
Toby D Hocking ◽  
Kelsey E Banister ◽  
...  

1. Simulating the dynamics of realistically complex models of infectious disease is conceptually challenging and computationally expensive. This results in a heavy reliance on customized software and, correspondingly, lower reproducibility across disease modeling studies. 2. SPARSEMOD stands for SPAtial Resolution-SEnsitive Models of Outbreak Dynamics. The goal of our project, encapsulated by the SPARSEMODr R package, is to offer a framework for rapidly simulating the dynamics of stochastic and spatially-explicit models of infectious disease for use in pedagogical and applied contexts. 3. We outline the universal functions of our package that allow for user-customization while demonstrating the common workflow. 4. SPARSEMODr offers an extendable framework that should allow the open-source community of disease modelers to add new model types and functionalities in future releases.


2019 ◽  
Author(s):  
Victor A. Meszaros ◽  
Miles D. Miller-Dickson ◽  
C. Brandon Ogbunugafor

In silico approaches have served a central role in the development of evolutionary theory for generations. This especially applies to the concept of the fitness landscape, one of the most important abstractions in evolutionary genetics, and one which has benefited from the presence of large empirical data sets only in the last decade or so. In this study, we propose a method that allows us to generate enormous data sets that walk the line between in silico and empirical: word usage frequencies as catalogued by the Google ngram corpora. These data can be codified or analogized in terms of a multidimensional empirical fitness landscape towards the examination of advanced concepts such as adaptive landscape by environment interactions, clonal competition, higher-order epistasis and countless others. We argue that the greater Lexical Landscapes approach can serve as a platform that offers an astronomical number of fitness landscapes for exploration (at least) or theoretical formalism (potentially) in evolutionary biology.


2020 ◽  
Author(s):  
Maximilian Knoll ◽  
Jennifer Furkel ◽  
Jürgen Debus ◽  
Amir Abdollahi ◽  
André Karch ◽  
...  

Abstract Background: Projection of future cancer incidence is an important task in cancer epidemiology. The results are of interest also for biomedical research and public health policy. Age-Period-Cohort (APC) models, usually based on long-term cancer registry data (>20 years), are established for such projections. In many countries (including Germany), however, nationwide long-term data are not yet available. General guidance on statistical approaches for projections using rather short-term data is challenging and software to enable researchers to easily compare approaches is lacking. Methods: To enable a comparative analysis of the performance of statistical approaches to cancer incidence projection, we developed an R package (incAnalysis), supporting in particular Bayesian models fitted by Integrated Nested Laplace Approximations (INLA). Its use is demonstrated by an extensive empirical evaluation of operating characteristics (bias, coverage and precision) of potentially applicable models differing by complexity. Observed long-term data from three cancer registries (SEER-9, NORDCAN, Saarland) were used for benchmarking. Results: Overall, coverage was high (mostly >90%) for Bayesian APC models (BAPC), whereas less complex models showed differences in coverage dependent on projection period. Intercept-only models yielded values below 20% for coverage. Bias increased and precision decreased for longer projection periods (>15 years) for all except intercept-only models. Precision was lowest for complex models such as BAPC models, generalized additive models with multivariate smoothers and generalized linear models with age x period interaction effects. Conclusion: The incAnalysis R package allows a straightforward comparison of cancer incidence rate projection approaches. Further detailed and targeted investigations into model performance in addition to the presented empirical results are recommended to derive guidance on appropriate statistical projection methods in a given setting.


2020 ◽  
Author(s):  
Dylan Marshall ◽  
Haobo Wang ◽  
Michael Stiffler ◽  
Justas Dauparas ◽  
Peter Koo ◽  
...  

Abstract: If disentangled properly, patterns distilled from evolutionarily related sequences of a given protein family can inform their traits, such as their structure and function. Recent years have seen an increase in the complexity of generative models towards capturing these patterns; from sitewise to pairwise to deep and variational. In this study we evaluate the degree of structure and fitness patterns learned by a suite of progressively complex models. We introduce pairwise saliency, a novel method for evaluating the degree of captured structural information. We also quantify the fitness information learned by these models by using them to predict the fitness of mutant sequences and then correlate these predictions against their measured fitness values. We observe that models that inform structure do not necessarily inform fitness and vice versa, contrasting recent claims in this field. Our work highlights a dearth of consistency across fitness assays and provides a general approach for understanding the pairwise decomposable relations learned by a given generative sequence model.
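The abstract does not say which correlation measure is used to compare predicted and measured fitness. A rank correlation is one plausible choice, since model scores and assay values are often on different scales. The sketch below is a minimal Spearman correlation under that assumption; the function names and the toy scores are hypothetical and do not come from the study.

```python
import math

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical model scores and measured fitness for four mutants.
predicted = [0.1, 0.4, 0.35, 0.8]
measured = [1.0, 2.0, 3.0, 4.0]
rho = spearman(predicted, measured)
print(rho)
```

A value near 1 would indicate that the model orders mutants much as the assay does, regardless of the absolute scale of its scores.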

