A community-maintained standard library of population genetic models

eLife, 2020, Vol 9

Author(s):
Jeffrey R. Adrion, Christopher B. Cole, Noah Dukler, Jared G. Galloway, Ariella L. Gladstein, ...
Abstract: The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven, open-source project that provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.
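A minimal sketch of the workflow the abstract describes (choose a species, a published demographic model, a contig, and a simulation engine) follows. It assumes the Python API of the 0.1.x-era stdpopsim releases; names such as get_samples changed in later versions, so treat this as illustrative rather than canonical.

```python
# Minimal sketch of the stdpopsim workflow, assuming the 0.1.x-era API
# (later releases specify samples as a dict keyed by population name).
import stdpopsim

species = stdpopsim.get_species("HomSap")                  # catalog entry for humans
model = species.get_demographic_model("OutOfAfrica_3G09")  # a published 3-population model
contig = species.get_contig("chr22")                       # chromosome 22 from the catalog
samples = model.get_samples(10, 10, 10)                    # 10 haploid samples per population

engine = stdpopsim.get_engine("msprime")                   # one of the supported backends
ts = engine.simulate(model, contig, samples)               # returns a tskit tree sequence
print(ts.num_sites, "variant sites simulated")
```

The command-line interface mentioned in the abstract exposes the same catalog, so the equivalent simulation can be run without writing any Python; exact flags vary by release.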


2015

Author(s):
Olivier Mazet, Willy Rodríguez, Simona Grusea, Simon Boitard, Lounès Chikhi

Most species are structured and influenced by processes that either increase or reduce gene flow between populations. However, most population genetic inference methods ignore population structure and reconstruct a history characterized by population size changes, under the assumption that species behave as panmictic units. This is potentially problematic, since population structure can generate spurious signals of population size change. Moreover, when the model assumed for demographic inference is misspecified, genomic data will likely increase the precision of misleading, if not meaningless, parameters. In a context of model uncertainty (panmixia versus structure), genomic data may thus not necessarily lead to improved statistical inference. We consider two haploid genomes and develop a theory that explains why any demographic model (with or without population size changes) will necessarily be interpreted as a series of population size changes by inference methods that ignore structure. We introduce a new parameter, the IICR (inverse instantaneous coalescence rate), and show that it is equivalent to a population size only in panmictic models, and is mostly misleading for structured models. We argue that this general issue affects all population genetics methods that ignore population structure. We take the PSMC method as an example and show that it infers population size changes that never took place. We apply our approach to human genomic data and find a reduction in gene flow at the start of the Pleistocene, a major increase throughout the Middle Pleistocene, and an abrupt disconnection preceding the emergence of modern humans.
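For the two-genome setting considered above, the IICR has a compact definition in terms of the pairwise coalescence time T_2: it is the inverse of the hazard rate of T_2. Restating it, following the authors' published definition:

```latex
\lambda(t) = \frac{f_{T_2}(t)}{\Pr(T_2 > t)},
\qquad
\operatorname{IICR}(t) = \frac{1}{\lambda(t)} = \frac{\Pr(T_2 > t)}{f_{T_2}(t)}
```

As a sanity check, in a panmictic population of constant size, T_2 is exponential, the hazard rate is constant, and the IICR is flat (equal to the scaled population size). Under population structure, the IICR varies over time even when no size change occurred, which is exactly the spurious signal described above.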


Author(s):  
John Anderson

Research in natural resource management may be characterized as a search for an understanding of patterns and processes relating to a particular resource. Modeling is a crucial tool in these efforts: resource scientists use models to help them conceptualize, understand, test, predict, or assess various aspects of the resource being studied. One central function, however, underlies all of these uses: a model simulates the way in which a real system would behave under conditions of interest to the user, and illustrates changes over time. Such a model may be used to determine the consequences of particular situations, leaving judgment of the attractiveness of those consequences to the user. Particularly in the case of complex ecosystems, such a model may also serve to clarify interactions and contribute to a deeper understanding of ecological phenomena. In recent years, computer-based models have become the most significant tool of resource managers, for two reasons. First, any model must accurately portray the real system it represents if research based on the model is to be reliable; computer technology has greatly increased the extent and the detail to which ecosystems can be modeled, and thus the accuracy of these models. The other reason for the extensive use of computer models is the flexibility that the computer as a tool brings to the modeling process. Many ecosystems are poorly understood, and complex models of such poorly understood systems are almost never completed in a single pass. Rather, modeling such a system is an iterative process, with a partial understanding generating new hypotheses, which in turn generate changes to the model based on further research. Computer technology brings flexibility and ease of modification to the modeling process, naturally supporting this iterative development. In addition, as the alternatives available in resolving resource management problems become increasingly expensive, and the resources themselves become increasingly scarce and valuable, such models become vital tools not only in the direct management of resources, but also in the control of expenses associated with resource management.


2013, Vol 5 (2), pp. 55-77

Author(s):
Anthony H. Dekker

In this paper, the author explores epistemological aspects of simulation, with a particular focus on using simulations to provide recommendations to managers and other decision-makers. The author presents formal definitions of knowledge (as justified true belief) and of simulation, and shows that a simple model, the Kuramoto model of coupled oscillators, satisfies the simulation definition (and therefore generates knowledge) through a justified mapping from the real world. The author argues that, for more complex models, such a justified mapping requires three techniques: using an appropriate and justified theoretical construct; using appropriate and justified values for model parameters; and testing or other verification processes to ensure that the mapping is correctly defined. The author illustrates these three techniques with experiments and models from the literature, including the Long House Valley model of Axtell et al., the SAFTE model of sleep, and the Segregation model of Wilensky.
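The Kuramoto model used as the worked example above is compact enough to state in a few lines. The sketch below uses plain Euler integration with illustrative parameter values, none of which are taken from the paper:

```python
# Toy Kuramoto model of N coupled oscillators:
#   dtheta_i/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)
import numpy as np

rng = np.random.default_rng(0)
N, K, dt, steps = 100, 2.0, 0.01, 5000
omega = rng.normal(0.0, 1.0, N)        # natural frequencies
theta = rng.uniform(0, 2 * np.pi, N)   # initial phases

for _ in range(steps):
    # mean over j of sin(theta_j - theta_i), for each oscillator i
    coupling = np.sin(theta[None, :] - theta[:, None]).mean(axis=1)
    theta += dt * (omega + K * coupling)

# Kuramoto order parameter: r near 1 means the oscillators synchronized
r = np.abs(np.exp(1j * theta).mean())
print(f"order parameter r = {r:.3f}")
```

With coupling K above the synchronization threshold, a run ends with r close to 1; it is this kind of mapping from model state to an observable real-world quantity that the simulation definition above is meant to justify.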


2015, Vol 16 (1)

Author(s):
Patrick P. Putnam, Philip A. Wilsey, Ge Zhang

Author(s):
Martin Kapun, Joaquin C. B. Nunez, María Bogaerts-Márquez, Jesús Murga-Moreno, Margot Paris, ...

Abstract: Drosophila melanogaster is a leading model in population genetics and genomics, and a growing number of whole-genome datasets from natural populations of this species have been published in recent years. A major challenge is the integration of disparate datasets, often generated using different sequencing technologies and bioinformatic pipelines, which hampers our ability to address questions about the evolution of this species. Here we address these issues by developing a bioinformatics pipeline that maps pooled sequencing (Pool-Seq) reads from D. melanogaster to a hologenome consisting of fly and symbiont genomes and estimates allele frequencies using either a heuristic (PoolSNP) or a probabilistic (SNAPE-pooled) variant caller. We use this pipeline to generate the largest repository of genomic data available for D. melanogaster to date, encompassing 271 previously published and unpublished population samples from over 100 locations in more than 20 countries on four continents. Several of these locations have been sampled in different seasons across multiple years. This dataset, which we call Drosophila Evolution over Space and Time (DEST), is coupled with sampling and environmental metadata. A web-based genome browser and web portal provide easy access to the SNP dataset. We further provide guidelines on how to use Pool-Seq data for model-based demographic inference. Our aim is to provide this scalable platform as a community resource that can be easily extended via future efforts toward an even more extensive cosmopolitan dataset. Our resource will enable population geneticists to analyze spatio-temporal genetic patterns and evolutionary dynamics of D. melanogaster populations in unprecedented detail.
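The heuristic branch of the pipeline (PoolSNP) is described as calling alleles from pooled read counts subject to minimum-count and minimum-frequency filters. The toy sketch below illustrates that general idea only; it is not the actual PoolSNP implementation, and the thresholds are invented for illustration:

```python
# Toy heuristic allele-frequency estimation from Pool-Seq read counts,
# in the spirit of (but not identical to) PoolSNP: an allele is reported
# only if it clears minimum-count and minimum-frequency filters.
from typing import Optional

MIN_COUNT = 4    # illustrative: minimum reads supporting the allele
MIN_FREQ = 0.01  # illustrative: minimum pooled allele frequency

def pooled_allele_freq(alt_reads: int, depth: int) -> Optional[float]:
    """Return the pooled ALT allele frequency, or None if filters fail."""
    if depth == 0:
        return None
    freq = alt_reads / depth
    if alt_reads < MIN_COUNT or freq < MIN_FREQ:
        return None  # treat as likely sequencing error, not a variant
    return freq

# Summing counts across pools before filtering keeps low-frequency alleles
# that are supported consistently across samples.
pools = [(3, 80), (5, 95), (4, 70)]  # (alt_reads, depth) per pool
alt = sum(a for a, _ in pools)
depth = sum(d for _, d in pools)
print(pooled_allele_freq(alt, depth))  # 12/245 ≈ 0.049
```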


Author(s):
Aleksandar Stevanovic, Nikola Mitrovic

The current method of organizing traffic flows in urban networks uses directional right-of-way links to move traffic between urban intersections. Conflict resolution between vehicles is exercised almost exclusively at the intersections, which turns them into bottlenecks of our urban traffic systems. Even an attempt to model a different organization of traffic hits a major barrier, because traditional simulation models do not offer enough flexibility to model bidirectional traffic on individual links in the network. This paper presents flexible arterial utilization simulation modeling (FAUSIM), a novel microsimulation platform designed to address this deficiency of traditional tools. The outputs of this tool were successfully validated against a commonly used Vissim model. The paper then illustrates the ability of FAUSIM to model conventional and unconventional traffic control scenarios. In the combined alternate-direction lane assignment and reservation-based intersection control (CADLARIC) scenario, directional driving paths alternate between neighboring lanes to align vehicles for fewer left- and right-turn conflicts, and a reservation-based algorithm processes the remaining conflicts at intersections. This scenario is compared with conventional fixed-time (FT) control. The results of the experiments, executed on a small three-intersection corridor, show that CADLARIC significantly outperforms conventional driving under FT control in terms of traffic efficiency (delays and stops). While FT control generates fewer (potential) conflicting events, CADLARIC reliably handles conflicting situations both inside and outside the intersections. Future research should further validate the FAUSIM platform and investigate several other unconventional traffic scenarios with connected and automated vehicles.
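The reservation-based control used within CADLARIC amounts to vehicles requesting exclusive time slots on conflicting intersection cells and being admitted only when every requested slot is free. A toy sketch of that admission rule follows; the data structures and slot granularity are assumptions for illustration, not FAUSIM's implementation:

```python
# Toy reservation-based intersection manager: a request is granted only if
# every (cell, time_slot) pair it needs is still unreserved.
from typing import Iterable, Tuple

class ReservationManager:
    def __init__(self) -> None:
        self.reserved: set = set()  # holds (cell_id, time_slot) pairs

    def request(self, vehicle: str, slots: Iterable[Tuple[int, int]]) -> bool:
        slots = list(slots)
        if any(s in self.reserved for s in slots):
            return False             # conflict: vehicle must wait and retry
        self.reserved.update(slots)  # grant: the slots now belong to the vehicle
        return True

mgr = ReservationManager()
# Vehicle A crosses cells 1 and 2 at slots 10 and 11; B wants cell 2 at slot 11.
print(mgr.request("A", [(1, 10), (2, 11)]))  # True: granted
print(mgr.request("B", [(2, 11), (3, 12)]))  # False: cell 2 at slot 11 is taken
```

In a full simulation, a rejected vehicle would slow down and re-request later slots, which is how a reservation scheme trades a small admission delay for the elimination of fixed signal phases.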


2020, Vol 49 (D1), pp. D1130-D1137

Author(s):
María Peña-Chilet, Gema Roldán, Javier Perez-Florido, Francisco M. Ortuño, Rosario Carmona, ...

Abstract: Knowledge of the genetic variability of the local population is of utmost importance in personalized medicine and has been revealed as a critical factor for the discovery of new disease variants. Here, we present the Collaborative Spanish Variability Server (CSVS), which currently contains more than 2000 genomes and exomes of unrelated Spanish individuals. This database has been generated in a collaborative crowdsourcing effort, collecting sequencing data produced by local genomic projects and for other purposes. Sequences have been grouped by ICD10 upper categories. A web interface allows querying the database while excluding one or more ICD10 categories. In this way, aggregated allele frequencies for a pseudo-control Spanish population can be obtained for diseases belonging to the removed categories. In addition to pseudo-control studies, some population studies are possible, such as estimating the prevalence of pharmacogenomic variants. These genomic data have also been used to define the first Spanish Genome Reference Panel (SGRP1.0) for imputation. This is the first local repository of variability entirely produced by a crowdsourcing effort and constitutes an example for future initiatives to characterize local variability worldwide. CSVS is also part of the GA4GH Beacon network. CSVS can be accessed at: http://csvs.babelomics.org/.
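The pseudo-control mechanism described above (exclude one or more ICD10 categories, then aggregate allele counts over the remainder) can be illustrated with a small sketch. This shows the aggregation idea only; the category labels and counts are invented, and this is not CSVS's actual API or schema:

```python
# Toy CSVS-style pseudo-control aggregation: per-ICD10-category allele
# counts for one variant are summed after excluding the categories related
# to the disease under study. All names and numbers here are invented.
counts_by_icd10 = {          # category -> (alt_alleles, total_alleles)
    "C00-D48": (12, 4000),   # neoplasms
    "G00-G99": (7, 2600),    # diseases of the nervous system
    "I00-I99": (15, 5200),   # diseases of the circulatory system
}

def pseudo_control_freq(counts: dict, excluded: set) -> float:
    """Aggregate allele frequency over all categories not excluded."""
    alt = sum(a for cat, (a, _) in counts.items() if cat not in excluded)
    tot = sum(n for cat, (_, n) in counts.items() if cat not in excluded)
    return alt / tot

# Studying a neurological disease: drop G00-G99 to form pseudo-controls.
print(f"{pseudo_control_freq(counts_by_icd10, {'G00-G99'}):.5f}")  # ≈ 0.00293
```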

