Widespread selection against deleterious mutations in the Drosophila genome

2020 ◽  
Author(s):  
Pavel Khromov ◽  
Alexandre V. Morozov

Abstract: We have developed a computational approach to simultaneous genome-wide inference of key population genetics parameters (selection strengths, mutation rates rescaled by the effective population size, and the fraction of viable genotypes) solely from an alignment of genomic sequences sampled from the same population. Our approach is based on a generalization of the Ewens sampling formula, used to compute steady-state probabilities of allelic counts in a neutrally evolving population, to populations subjected to selective constraints. Patterns of polymorphisms observed in alignments of genomic sequences are used as input to Approximate Bayesian Computation, which employs the generalized Ewens sampling formula to infer the distributions of population genetics parameters. After carrying out extensive validation of our approach on synthetic data, we have applied it to the evolution of the Drosophila melanogaster genome, where an alignment of 197 genomic sequences is available for a single ancestral-range population from Zambia, Africa. We have divided the Drosophila genome into 100-bp windows and assumed that sequences in each window can exist in either a low- or a high-fitness state. Thus, the steady-state population in our model is subject to a constant influx of deleterious mutations, which shape the observed frequencies of allelic counts in each window. Our approach, which focuses on deleterious mutations and accounts for intra-window linkage and epistasis, provides an alternative description of background selection. We find that most of the Drosophila genome evolves under selective constraints imposed by deleterious mutations. These constraints are not confined to known functional regions of the genome such as coding sequences and may reflect global biological processes such as the necessity to maintain chromatin structure. Furthermore, we find that inference of mutation rates in the presence of selection leads to mutation rate estimates that are several-fold higher than neutral estimates widely used in the literature. Our computational pipeline can be used in any organism for which a sample of genomic sequences from the same population is available.
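
The inference step described in this abstract can be illustrated with a toy rejection-sampling version of Approximate Bayesian Computation. The sketch below is not the authors' pipeline: it swaps in the neutral model (a Hoppe urn with scaled mutation rate `theta`) in place of the generalized Ewens sampling formula, uses the number of distinct alleles as the only summary statistic, and assumes a uniform prior. All function names, the prior range, and the observed values are hypothetical.

```python
import random


def simulate_num_alleles(n, theta, rng):
    """Number of distinct alleles in a neutral sample of size n.

    In the neutral Hoppe urn, draw i + 1 starts a new allele with
    probability theta / (theta + i), independently of earlier draws.
    """
    return sum(rng.random() < theta / (theta + i) for i in range(n))


def abc_rejection(observed_k, n, prior_max, draws, rng, tolerance=0):
    """Keep prior draws of theta whose simulated allele count matches the data.

    Assumption: uniform prior on (1e-6, prior_max); the lower bound only
    avoids a division by zero at theta = 0.
    """
    accepted = []
    for _ in range(draws):
        theta = rng.uniform(1e-6, prior_max)
        simulated_k = simulate_num_alleles(n, theta, rng)
        if abs(simulated_k - observed_k) <= tolerance:
            accepted.append(theta)
    return accepted


# Hypothetical data: 10 distinct alleles observed in a sample of 50 sequences.
rng = random.Random(0)
posterior = abc_rejection(observed_k=10, n=50, prior_max=10.0, draws=5000, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted values of `theta` approximate the posterior distribution under these assumptions. A version closer to the abstract would presumably replace the neutral simulator with one that includes selection and would use richer summaries of the allelic counts in each window.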

This paper is concerned with models for the genetic variation of a sample of gametes from a large population. The need for consistency between different sample sizes limits the mathematical possibilities to what are here called ‘partition structures’. Distinctive among them is the structure described by the Ewens sampling formula, which is shown to enjoy a characteristic property of non-interference between the different alleles. This characterization explains the robustness of the Ewens formula when neither selection nor recurrent mutation is significant, although different structures arise from selective and ‘charge-state’ models.


1990 ◽  
Vol 27 (1) ◽  
pp. 28-43 ◽  
Author(s):  
Jennie C. Hansen

For each n > 0, the Ewens sampling formula from population genetics is a measure on the set of all partitions of the integer n. To determine the limiting distributions for the part sizes of a partition with respect to the measures given by this formula, we associate to each partition a step function on [0, 1]. Each jump in the function equals the number of parts in the partition of a certain size. We normalize these functions and show that the induced measures on D[0, 1] converge to Wiener measure. This result complements Kingman's frequency limit theorem [10] for the Ewens partition structure.
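
To make the phrase "a measure on the set of all partitions of the integer n" concrete, here is a minimal sketch that evaluates the standard form of the Ewens sampling formula, P(a_1, ..., a_n) = n! / (θ(θ+1)...(θ+n-1)) × ∏_j (θ/j)^{a_j} / a_j!, where a_j is the number of parts of size j. The parameter name `theta` and the helper functions are assumptions for illustration; the abstract itself does not name them.

```python
from math import factorial, prod


def ewens_probability(counts, theta):
    """Ewens probability of a partition given as counts[j - 1] = a_j.

    a_j is the number of parts (alleles) of size j, so the partitioned
    integer is n = sum_j j * a_j.
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))
    rising = prod(theta + i for i in range(n))  # theta (theta + 1) ... (theta + n - 1)
    weight = prod((theta / j) ** a / factorial(a)
                  for j, a in enumerate(counts, start=1))
    return factorial(n) / rising * weight


def partitions(n, max_part=None):
    """Yield each partition of n as a non-increasing tuple of part sizes."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def to_counts(parts, n):
    """Convert a tuple of part sizes into the count vector (a_1, ..., a_n)."""
    counts = [0] * n
    for p in parts:
        counts[p - 1] += 1
    return counts


# The probabilities over all partitions of n = 5 should sum to 1.
total = sum(ewens_probability(to_counts(p, 5), 1.5) for p in partitions(5))
```

Summing over every partition of a small n is a quick sanity check that the formula defines a probability measure for the chosen `theta`.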


2016 ◽  
Author(s):  
Pavel Khromov ◽  
Constantin D. Malliaris ◽  
Alexandre V. Morozov

Abstract: In considering evolution of transcribed regions, regulatory modules, and other genomic loci of interest, we are often faced with a situation in which the number of allelic states greatly exceeds the population size. In this limit, the population eventually adopts a steady state characterized by mutation-selection-drift balance. Although new alleles continue to be explored through mutation, the statistics of the population, and in particular the probabilities of seeing specific allelic configurations in samples taken from a population, do not change with time. In the absence of selection, probabilities of allelic configurations are given by the Ewens sampling formula, widely used in population genetics to detect deviations from neutrality. Here we develop an extension of this formula to arbitrary, possibly epistatic, fitness landscapes. Although our approach is general, we focus on the class of landscapes in which alleles are grouped into two, three, or several fitness states. This class of landscapes yields sampling probabilities that are computationally more tractable, and can form a basis for the inference of selection signatures from sequence data. We demonstrate that, for a sizeable range of mutation rates and selection coefficients, the steady-state allelic diversity is not neutral. Therefore, it may be used to infer selection coefficients, as well as other key evolutionary parameters, using high-throughput sequencing of evolving populations to collect data on locus polymorphisms. We also carry out numerical investigation of various approximations involved in deriving our sampling formulas, such as the infinite allele limit and the “full connectivity” assumption in which each allele can mutate into any other allele. We find that our theory remains sufficiently accurate even if these assumptions are relaxed. Thus, our framework establishes a theoretical foundation for inferring selection signatures from samples of sequences produced by evolution on epistatic fitness landscapes.


1992 ◽  
Vol 29 (1) ◽  
pp. 1-10 ◽  
Author(s):  
Gudrun Trieb

In recent papers by Hoppe and Donnelly it has been shown that a Pólya urn model generating the Ewens sampling formula (population genetics) parallels a construction of Kingman using a Poisson–Dirichlet ‘paintbox’. Even the jump chain of Kingman's n-coalescent can be constructed using the urn. The properties of a certain process based on the coalescent also are derived. This process was introduced by Hoppe.
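
The urn mentioned in this abstract can be sketched as a short simulation. This is a minimal version of the commonly described Hoppe urn, not code from the paper: draw i + 1 creates a new allele with probability θ/(θ + i) and otherwise copies an existing individual chosen uniformly at random. The function name and the parameter `theta` are assumptions for illustration.

```python
import random


def hoppe_urn(n, theta, rng):
    """Simulate n draws from a Hoppe urn and return the allele family sizes.

    The resulting sizes, viewed as a partition of n, are distributed
    according to the Ewens sampling formula with parameter theta (> 0).
    """
    sizes = []
    for i in range(n):
        # With i individuals already sampled, a new allele appears with
        # probability theta / (theta + i); this equals 1 for the first draw.
        if rng.random() < theta / (theta + i):
            sizes.append(1)
        else:
            # Otherwise copy a uniformly chosen earlier individual, so
            # family k grows with probability sizes[k] / i.
            k = rng.choices(range(len(sizes)), weights=sizes)[0]
            sizes[k] += 1
    return sizes


rng = random.Random(42)
sample_sizes = hoppe_urn(10, 1.0, rng)
```

Averaging the number of families over many runs gives a simple check against the known expectation, the sum of θ/(θ + i) for i from 0 to n - 1.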


2017 ◽  
Vol 13 (3) ◽  
pp. 20160849 ◽  
Author(s):  
Tanya Singh ◽  
Meredith Hyun ◽  
Paul Sniegowski

Mutation is the ultimate source of the genetic variation—including variation for mutation rate itself—that fuels evolution. Natural selection can raise or lower the genomic mutation rate of a population by changing the frequencies of mutation rate modifier alleles associated with beneficial and deleterious mutations. Existing theory and observations suggest that where selection is minimized, rapid systematic evolution of mutation rate either up or down is unlikely. Here, we report systematic evolution of higher and lower mutation rates in replicate hypermutable Escherichia coli populations experimentally propagated at very small effective size—a circumstance under which selection is greatly reduced. Several populations went extinct during this experiment, and these populations tended to evolve elevated mutation rates. In contrast, populations that survived to the end of the experiment tended to evolve decreased mutation rates. We discuss the relevance of our results to current ideas about the evolution, maintenance and consequences of high mutation rates.


2021 ◽  
Vol 134 (5) ◽  
pp. 1343-1362
Author(s):  
Alex C. Ogbonna ◽  
Luciano Rogerio Braatz de Andrade ◽  
Lukas A. Mueller ◽  
Eder Jorge de Oliveira ◽  
Guillaume J. Bauchet

Key message: Brazilian cassava diversity was characterized through population genetics and clustering approaches, highlighting contrasted genetic groups and spatial genetic differentiation.

Abstract: Cassava (Manihot esculenta Crantz) is a major staple root crop of the tropics, originating from the Amazonian region. In this study, 3354 cassava landraces and modern breeding lines from the Embrapa Cassava Germplasm Bank (CGB) were characterized. All individuals were subjected to genotyping-by-sequencing (GBS), identifying 27,045 single-nucleotide polymorphisms (SNPs). Identity-by-state and population structure analyses revealed a unique set of 1536 individuals and 10 distinct genetic groups with heterogeneous linkage disequilibrium (LD). On this basis, a density of 1300–4700 SNP markers was selected for large-effect quantitative trait loci (QTL) detection. Identified genetic groups were further characterized for population genetics parameters including minor allele frequency (MAF), observed heterozygosity ($H_o$), effective population size estimate ($\widehat{N}_e$) and polymorphism information content (PIC). Selection footprints and introgressions of M. glaziovii were detected. Spatial population structure analysis revealed five ancestral populations related to distinct Brazilian ecoregions. Estimation of historical relationships among identified populations suggests an early population split from Amazonian to Atlantic forest and Caatinga ecoregions and active gene flows. This study provides a thorough genetic characterization of ex situ germplasm resources from cassava’s center of origin, South America, with results shedding light on Brazilian cassava characteristics and its biogeographical landscape. These findings support and facilitate the use of genetic resources in modern breeding programs including implementation of association mapping and genomic selection strategies.

