Efficient pedigree recording for fast population genetics simulation

2018 ◽  
Author(s):  
Jerome Kelleher ◽  
Kevin R. Thornton ◽  
Jaime Ashander ◽  
Peter L. Ralph

Abstract: In this paper we describe how to efficiently record the entire genetic history of a population in forwards-time, individual-based population genetics simulations with arbitrary breeding models, population structure and demography. This approach dramatically reduces the computational burden of tracking individual genomes by allowing us to simulate only those loci that may affect reproduction (those having non-neutral variants). The genetic history of the population is recorded as a succinct tree sequence as introduced in the software package msprime, on which neutral mutations can be quickly placed afterwards. Recording the results of each breeding event requires storage that grows linearly with time, but there is a great deal of redundancy in this information. We solve this storage problem by providing an algorithm to quickly ‘simplify’ a tree sequence by removing this irrelevant history for a given set of genomes. By periodically simplifying the history with respect to the extant population, we show that the total storage space required is modest and overall large efficiency gains can be made over classical forward-time simulations. We implement a general-purpose framework for recording and simplifying genealogical data, which can be used to make simulations of any population model more efficient. We modify two popular forwards-time simulation frameworks to use this new approach and observe efficiency gains in large, whole-genome simulations of one to two orders of magnitude. In addition to speed, our method for recording pedigrees has several advantages: (1) All marginal genealogies of the simulated individuals are recorded, rather than just genotypes. (2) A population of N individuals with M polymorphic sites can be stored in O(N log N + M) space, making it feasible to store a simulation’s entire final generation as well as its history. (3) A simulation can easily be initialized with a more efficient coalescent simulation of deep history. The software for recording and processing tree sequences is named tskit.

Author Summary: Sexually reproducing organisms are related to the others in their species by the complex web of parent-offspring relationships that constitute the pedigree. In this paper, we describe a way to record all of these relationships, as well as how genetic material is passed down through the pedigree, during a forwards-time population genetic simulation. To make effective use of this information, we describe both efficient storage methods for this embellished pedigree and a way to remove all information that is irrelevant to the genetic history of a given set of individuals, which dramatically reduces the required amount of storage space. Storing this information allows us to produce whole-genome sequence from simulations of large populations in which we have not explicitly recorded new genomic mutations; we find that these simulations run up to 50 times faster than those forced to explicitly carry along that information.
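The record-then-simplify idea in this abstract can be illustrated with a toy sketch. This is not the authors' algorithm and does not use tskit: it assumes a haploid, non-recombining Wright-Fisher population, so each birth is a single (parent, child) edge, whereas the paper's tree sequences track genomic intervals under recombination. The function names `simulate_pedigree` and `simplify` are hypothetical, chosen only for this example.

```python
import random


def simulate_pedigree(n_individuals, n_generations, seed=1):
    """Toy haploid Wright-Fisher model (an assumption of this sketch).

    Records one child -> parent edge per birth and returns the full
    record together with the IDs of the final (extant) generation.
    """
    rng = random.Random(seed)
    current = list(range(n_individuals))  # founder IDs
    next_id = n_individuals
    parent_of = {}  # child ID -> parent ID
    for _ in range(n_generations):
        offspring = []
        for _ in range(n_individuals):
            child = next_id
            next_id += 1
            parent_of[child] = rng.choice(current)
            offspring.append(child)
        current = offspring
    return parent_of, current


def simplify(parent_of, samples):
    """Keep only edges that lie on a path from a sample back to a founder."""
    kept = {}
    stack = list(samples)
    while stack:
        node = stack.pop()
        parent = parent_of.get(node)  # None for founders
        if parent is not None and node not in kept:
            kept[node] = parent
            stack.append(parent)
    return kept


parent_of, extant = simulate_pedigree(n_individuals=50, n_generations=200)
kept = simplify(parent_of, extant)
print(len(parent_of), "edges recorded;", len(kept), "kept after simplification")
```

The full record grows by one edge per birth, while the simplified record holds only the lineages ancestral to the extant individuals. A real simplification step would go further than this sketch, for example by also collapsing ancestors that have only a single retained descendant lineage.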

2015 ◽  
Author(s):  
PingHsun Hsieh ◽  
Krishna R Veeramah ◽  
Joseph Lachance ◽  
Sarah A Tishkoff ◽  
Jeffrey D Wall ◽  
...  

African Pygmies practicing a mobile hunter-gatherer lifestyle are phenotypically and genetically diverged from other anatomically modern humans, and they likely experienced strong selective pressures due to their unique lifestyle in the Central African rainforest. To identify genomic targets of adaptation, we sequenced the genomes of four Biaka Pygmies from the Central African Republic and jointly analyzed these data with the genome sequences of three Baka Pygmies from Cameroon and nine Yoruba farmers. To account for the complex demographic history of these populations that includes both isolation and gene flow, we fit models using the joint allele frequency spectrum and validated them using independent approaches. Our two best-fit models both suggest ancient divergence between the ancestors of the farmers and Pygmies, 90,000 or 150,000 years ago. We also find that bi-directional asymmetric gene-flow is statistically better supported than a single pulse of unidirectional gene flow from farmers to Pygmies, as previously suggested. We then applied complementary statistics to scan the genome for evidence of selective sweeps and polygenic selection. We found that conventional statistical outlier approaches were biased toward identifying candidates in regions of high mutation or low recombination rate. To avoid this bias, we assigned P-values for candidates using whole-genome simulations incorporating demography and variation in both recombination and mutation rates. We found that genes and gene sets involved in muscle development, bone synthesis, immunity, reproduction, cell signaling and development, and energy metabolism are likely to be targets of positive natural selection in Western African Pygmies or their recent ancestors.


2019 ◽  
Author(s):  
Shuai Sun ◽  
Yue Wang ◽  
Xiao Du ◽  
Lei Li ◽  
Xiaoning Hong ◽  
...  

Abstract: Mekong tiger perch (Datnioides undecimradiatus) is an ornamental fish and a vulnerable species belonging to the order Lobotiformes. Here, we report a ∼595 Mb D. undecimradiatus genome, which is the first whole genome sequence in the order Lobotiformes. Based on this genome, the phylogenetic tree analysis suggested that Lobotiformes and Sciaenidae are closer than Tetraodontiformes, resolving a long-time dispute. We depicted the pigment synthesis pathway in Mekong tiger perch, and the results confirmed that this pathway had evolved from the shared whole genome duplication. We also estimated the demographic history of Mekong tiger perch, showing that the effective population size suffered a continuous reduction, possibly related to the contraction of immune-related genes. Our study provides a reference genome resource for the Lobotiformes, as well as insights into the phylogeny of Eupercaria and biological conservation.


2018 ◽  
Vol 7 (12) ◽  
Author(s):  
Mohamed A. Abouelkhair ◽  
Rebecca E. Rifkin ◽  
Remigiusz M. Grzeskowiak ◽  
Alexandru S. Biris ◽  
David E. Anderson ◽  
...  

Staphylococcus aureus is the causative agent of multiple infections, including bacteremia, infective endocarditis, osteomyelitis, septic arthritis, and prosthetic device infections. We report here the first whole-genome sequence for four S. aureus sequence type 398 isolates from clinical cases of osteomyelitis in four goats with a history of orthopedic surgery.


2003 ◽  
Vol 185 (4) ◽  
pp. 1316-1325 ◽  
Author(s):  
David H. Spencer ◽  
Arnold Kas ◽  
Eric E. Smith ◽  
Christopher K. Raymond ◽  
Elizabeth H. Sims ◽  
...  

ABSTRACT Whole-genome shotgun sequencing was used to study the sequence variation of three Pseudomonas aeruginosa isolates, two from clonal infections of cystic fibrosis patients and one from an aquatic environment, relative to the genomic sequence of reference strain PAO1. The majority of the PAO1 genome is represented in these strains; however, at least three prominent islands of PAO1-specific sequence are apparent. Conversely, ∼10% of the sequencing reads derived from each isolate fail to align with the PAO1 backbone. While average sequence variation among all strains is roughly 0.5%, regions of pronounced differences were evident in whole-genome scans of nucleotide diversity. We analyzed two such divergent loci, the pyoverdine and O-antigen biosynthesis regions, by complete resequencing. A thorough analysis of isolates collected over time from one of the cystic fibrosis patients revealed independent mutations resulting in the loss of O-antigen synthesis alternating with a mucoid phenotype. Overall, we conclude that most of the PAO1 genome represents a core P. aeruginosa backbone sequence while the strains addressed in this study possess additional genetic material that accounts for at least 10% of their genomes. Approximately half of these additional sequences are novel.


Genes ◽  
2018 ◽  
Vol 9 (10) ◽  
pp. 474 ◽  
Author(s):  
Yoji Igarashi ◽  
Hong Zhang ◽  
Engkong Tan ◽  
Masashi Sekino ◽  
Kazutoshi Yoshitake ◽  
...  

The Japanese eel (Anguilla japonica), European eel (Anguilla anguilla), and American eel (Anguilla rostrata) are migratory, catadromous, temperate zone fish sharing several common life cycle features. The population genetics of panmixia in these eel species has already been investigated. Our extensive population genetics analysis was based on 1400 Gb of whole-genome sequence (WGS) data from 84 eels. It demonstrated that a Japanese eel group from the Kuma River differed from other populations of the same species. Even after removing the potential adapted/selected single nucleotide polymorphism (SNP) data, and with very small differences (fixation index [Fst] = 0.01), we obtained results consistently indicating that panmixia does not occur in Japanese eels. The life cycle of the Japanese eel is well-established and the Kuma River is in the center of its habitat. Nevertheless, simple reproductive isolation is not the probable cause of non-panmixia in this species. We propose that the combination of spawning area subdivision, philopatry, and habitat preference/avoidance accounts for the non-panmixia in the Japanese eel population. We named this hypothesis the “reproductive isolation like subset mapping” (RISM) model. This finding may be indicative of the initial stages of sympatric speciation in these eels.

