Widespread selection against deleterious mutations in the Drosophila genome

10.1101/2020.02.12.946392 ◽

2020 ◽

Author(s):

Pavel Khromov ◽

Alexandre V. Morozov

Keyword(s):

Population Genetics ◽

Steady State ◽

Genomic Sequences ◽

Mutation Rates ◽

Drosophila Genome ◽

Deleterious Mutations ◽

Effective Population ◽

Ewens Sampling Formula ◽

Selective Constraints ◽

Sampling Formula

Abstract: We have developed a computational approach to simultaneous genome-wide inference of key population genetics parameters: selection strengths, mutation rates rescaled by the effective population size and the fraction of viable genotypes, solely from an alignment of genomic sequences sampled from the same population. Our approach is based on a generalization of the Ewens sampling formula, used to compute steady-state probabilities of allelic counts in a neutrally evolving population, to populations subjected to selective constraints. Patterns of polymorphisms observed in alignments of genomic sequences are used as input to Approximate Bayesian Computation, which employs the generalized Ewens sampling formula to infer the distributions of population genetics parameters. After carrying out extensive validation of our approach on synthetic data, we have applied it to the evolution of the Drosophila melanogaster genome, where an alignment of 197 genomic sequences is available for a single ancestral-range population from Zambia, Africa. We have divided the Drosophila genome into 100-bp windows and assumed that sequences in each window can exist in either low- or high-fitness state. Thus, the steady-state population in our model is subject to a constant influx of deleterious mutations, which shape the observed frequencies of allelic counts in each window. Our approach, which focuses on deleterious mutations and accounts for intra-window linkage and epistasis, provides an alternative description of background selection. We find that most of the Drosophila genome evolves under selective constraints imposed by deleterious mutations. These constraints are not confined to known functional regions of the genome such as coding sequences and may reflect global biological processes such as the necessity to maintain chromatin structure. Furthermore, we find that inference of mutation rates in the presence of selection leads to mutation rate estimates that are several-fold higher than neutral estimates widely used in the literature. Our computational pipeline can be used in any organism for which a sample of genomic sequences from the same population is available.
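For orientation, the neutral Ewens sampling formula that this abstract takes as its starting point can be written down in a few lines. The sketch below is the standard neutral formula only, not the authors' generalized version with selection. The function name `ewens_probability` and the argument layout (`counts[j-1]` is the number of alleles seen exactly j times in the sample, `theta` is the rescaled mutation rate) are assumptions made for this illustration.

```python
from math import factorial, prod


def ewens_probability(counts, theta):
    """Neutral Ewens sampling probability of an allelic configuration.

    counts[j-1] is the number of distinct alleles that appear exactly
    j times in the sample (hypothetical layout chosen for this sketch).
    theta is the population-scaled mutation rate and must be positive.
    """
    # Sample size n is the total number of sequences: sum_j j * a_j.
    n = sum(j * a for j, a in enumerate(counts, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = prod(theta + i for i in range(n))
    # Product over class sizes of (theta / j)^a_j / a_j!.
    weight = prod(
        (theta / j) ** a / factorial(a)
        for j, a in enumerate(counts, start=1)
    )
    return factorial(n) / rising * weight
```

As a quick check, for a sample of two sequences the probability of seeing two distinct alleles is `ewens_probability([2, 0], theta)`, which equals theta / (1 + theta), and the probabilities over all configurations of a given sample size sum to one.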

Random partitions in population genetics

Proceedings of the Royal Society of London Series A - Mathematical and Physical Sciences ◽

10.1098/rspa.1978.0089 ◽

1978 ◽

Vol 361 (1704) ◽

pp. 1-20 ◽

Cited By ~ 96

Keyword(s):

Population Genetics ◽

Genetic Variation ◽

Charge State ◽

Characteristic Property ◽

Large Population ◽

Recurrent Mutation ◽

Random Partitions ◽

Ewens Sampling Formula ◽

Sampling Formula ◽

State Models

This paper is concerned with models for the genetic variation of a sample of gametes from a large population. The need for consistency between different sample sizes limits the mathematical possibilities to what are here called ‘partition structures’. Distinctive among them is the structure described by the Ewens sampling formula, which is shown to enjoy a characteristic property of non-interference between the different alleles. This characterization explains the robustness of the Ewens formula when neither selection nor recurrent mutation is significant, although different structures arise from selective and ‘charge-state’ models.

A functional central limit theorem for the Ewens sampling formula

Journal of Applied Probability ◽

10.2307/3214593 ◽

1990 ◽

Vol 27 (1) ◽

pp. 28-43 ◽

Cited By ~ 13

Author(s):

Jennie C. Hansen

Keyword(s):

Population Genetics ◽

Central Limit Theorem ◽

Limit Theorem ◽

Central Limit ◽

Step Function ◽

Frequency Limit ◽

Ewens Sampling Formula ◽

Sampling Formula ◽

Partition Structure ◽

Functional Central Limit

For each n > 0, the Ewens sampling formula from population genetics is a measure on the set of all partitions of the integer n. To determine the limiting distributions for the part sizes of a partition with respect to the measures given by this formula, we associate to each partition a step function on [0, 1]. Each jump in the function equals the number of parts in the partition of a certain size. We normalize these functions and show that the induced measures on D[0, 1] converge to Wiener measure. This result complements Kingman's frequency limit theorem [10] for the Ewens partition structure.

Generalization of the Ewens sampling formula to arbitrary fitness landscapes

10.1101/065011 ◽

2016 ◽

Author(s):

Pavel Khromov ◽

Constantin D. Malliaris ◽

Alexandre V. Morozov

Keyword(s):

Steady State ◽

High Throughput Sequencing ◽

Sequence Data ◽

Allelic Diversity ◽

Fitness Landscapes ◽

Selection Signatures ◽

Ewens Sampling Formula ◽

Regulatory Modules ◽

Sampling Formula ◽

Selection Probabilities

Abstract: In considering evolution of transcribed regions, regulatory modules, and other genomic loci of interest, we are often faced with a situation in which the number of allelic states greatly exceeds the population size. In this limit, the population eventually adopts a steady state characterized by mutation-selection-drift balance. Although new alleles continue to be explored through mutation, the statistics of the population, and in particular the probabilities of seeing specific allelic configurations in samples taken from a population, do not change with time. In the absence of selection, probabilities of allelic configurations are given by the Ewens sampling formula, widely used in population genetics to detect deviations from neutrality. Here we develop an extension of this formula to arbitrary, possibly epistatic, fitness landscapes. Although our approach is general, we focus on the class of landscapes in which alleles are grouped into two, three, or several fitness states. This class of landscapes yields sampling probabilities that are computationally more tractable, and can form a basis for the inference of selection signatures from sequence data. We demonstrate that, for a sizeable range of mutation rates and selection coefficients, the steady-state allelic diversity is not neutral. Therefore, it may be used to infer selection coefficients, as well as other key evolutionary parameters, using high-throughput sequencing of evolving populations to collect data on locus polymorphisms. We also carry out numerical investigation of various approximations involved in deriving our sampling formulas, such as the infinite allele limit and the “full connectivity” assumption in which each allele can mutate into any other allele. We find that our theory remains sufficiently accurate even if these assumptions are relaxed. Thus, our framework establishes a theoretical foundation for inferring selection signatures from samples of sequences produced by evolution on epistatic fitness landscapes.

A Pólya urn model and the coalescent

Journal of Applied Probability ◽

10.2307/3214786 ◽

1992 ◽

Vol 29 (1) ◽

pp. 1-10 ◽

Cited By ~ 3

Author(s):

Gudrun Trieb

Keyword(s):

Population Genetics ◽

Urn Model ◽

Ewens Sampling Formula ◽

Sampling Formula ◽

Pólya Urn ◽

Polya Urn ◽

Pólya Urn Model

In recent papers by Hoppe and Donnelly it has been shown that a Pólya urn model generating the Ewens sampling formula (population genetics) parallels a construction of Kingman using a Poisson–Dirichlet ‘paintbox’. Even the jump chain of Kingman's n-coalescent can be constructed using the urn. The properties of a certain process based on the coalescent also are derived. This process was introduced by Hoppe.
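The urn scheme mentioned in this abstract is commonly described as follows: the i-th draw (counting from zero) creates a new allele with probability theta / (theta + i) and otherwise copies a uniformly chosen earlier draw. A minimal simulation sketch under that description is given below. The function name `hoppe_urn` and its parameters are hypothetical, chosen for this illustration, and the code does not reproduce the paper's coalescent construction.

```python
import random
from collections import Counter


def hoppe_urn(n, theta, rng=None):
    """Simulate n draws from a Hoppe-style urn with mutation weight theta.

    Returns the allele class sizes (a partition of n), largest first.
    """
    rng = rng or random.Random()
    labels = []      # allele label assigned to each draw so far
    next_label = 0   # label to give to the next new allele
    for i in range(n):
        # The "mutation" ball has weight theta; the i existing balls have weight 1 each.
        if rng.random() < theta / (theta + i):
            labels.append(next_label)
            next_label += 1
        else:
            # Copy the allele of a uniformly chosen earlier draw.
            labels.append(rng.choice(labels))
    return sorted(Counter(labels).values(), reverse=True)
```

Calling `hoppe_urn(197, 2.0, random.Random(1))`, for example, yields one random partition of 197 into allele class sizes. Repeating the call many times gives an empirical distribution that can be compared against the neutral Ewens sampling formula.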

A functional central limit theorem for the Ewens sampling formula

Journal of Applied Probability ◽

10.1017/s0021900200038407 ◽

1990 ◽

Vol 27 (01) ◽

pp. 28-43 ◽

Cited By ~ 4

Author(s):

Jennie C. Hansen

Keyword(s):

Population Genetics ◽

Central Limit Theorem ◽

Limit Theorem ◽

Central Limit ◽

Step Function ◽

Frequency Limit ◽

Ewens Sampling Formula ◽

Sampling Formula ◽

Partition Structure ◽

Functional Central Limit

Evolution of mutation rates in hypermutable populations of Escherichia coli propagated at very small effective population size

Biology Letters ◽

10.1098/rsbl.2016.0849 ◽

2017 ◽

Vol 13 (3) ◽

pp. 20160849 ◽

Cited By ~ 7

Author(s):

Tanya Singh ◽

Meredith Hyun ◽

Paul Sniegowski

Keyword(s):

Escherichia Coli ◽

Genetic Variation ◽

Mutation Rate ◽

Mutation Rates ◽

Deleterious Mutations ◽

Effective Population ◽

Ultimate Source ◽

Systematic Evolution ◽

Small Effective Size ◽

Small Effective Population Size

Mutation is the ultimate source of the genetic variation—including variation for mutation rate itself—that fuels evolution. Natural selection can raise or lower the genomic mutation rate of a population by changing the frequencies of mutation rate modifier alleles associated with beneficial and deleterious mutations. Existing theory and observations suggest that where selection is minimized, rapid systematic evolution of mutation rate either up or down is unlikely. Here, we report systematic evolution of higher and lower mutation rates in replicate hypermutable Escherichia coli populations experimentally propagated at very small effective size—a circumstance under which selection is greatly reduced. Several populations went extinct during this experiment, and these populations tended to evolve elevated mutation rates. In contrast, populations that survived to the end of the experiment tended to evolve decreased mutation rates. We discuss the relevance of our results to current ideas about the evolution, maintenance and consequences of high mutation rates.

A Pólya urn model and the coalescent

Journal of Applied Probability ◽

10.1017/s0021900200106576 ◽

1992 ◽

Vol 29 (01) ◽

pp. 1-10

Author(s):

Gudrun Trieb

Keyword(s):

Population Genetics ◽

Urn Model ◽

Ewens Sampling Formula ◽

Sampling Formula ◽

Pólya Urn ◽

Polya Urn ◽

Pólya Urn Model

Central limit theorem for the prefix exchange distance under Ewens sampling formula

Discrete Mathematics ◽

10.1016/j.disc.2020.112206 ◽

2021 ◽

Vol 344 (2) ◽

pp. 112206

Author(s):

Simona Grusea ◽

Anthony Labarre

Keyword(s):

Central Limit Theorem ◽

Limit Theorem ◽

Central Limit ◽

Ewens Sampling Formula ◽

Sampling Formula

Comprehensive genotyping of a Brazilian cassava (Manihot esculenta Crantz) germplasm bank: insights into diversification and domestication

Theoretical and Applied Genetics ◽

10.1007/s00122-021-03775-5 ◽

2021 ◽

Vol 134 (5) ◽

pp. 1343-1362

Author(s):

Alex C. Ogbonna ◽

Luciano Rogerio Braatz de Andrade ◽

Lukas A. Mueller ◽

Eder Jorge de Oliveira ◽

Guillaume J. Bauchet

Keyword(s):

Population Genetics ◽

Population Structure ◽

Manihot Esculenta ◽

Ex Situ ◽

Germplasm Bank ◽

Effective Population ◽

Manihot Esculenta Crantz ◽

Genetic Groups ◽

Population Structure Analysis ◽

Modern Breeding

Key message: Brazilian cassava diversity was characterized through population genetics and clustering approaches, highlighting contrasted genetic groups and spatial genetic differentiation. Abstract: Cassava (Manihot esculenta Crantz) is a major staple root crop of the tropics, originating from the Amazonian region. In this study, 3354 cassava landraces and modern breeding lines from the Embrapa Cassava Germplasm Bank (CGB) were characterized. All individuals were subjected to genotyping-by-sequencing (GBS), identifying 27,045 single-nucleotide polymorphisms (SNPs). Identity-by-state and population structure analyses revealed a unique set of 1536 individuals and 10 distinct genetic groups with heterogeneous linkage disequilibrium (LD). On this basis, a density of 1300–4700 SNP markers was selected for large-effect quantitative trait loci (QTL) detection. Identified genetic groups were further characterized for population genetics parameters including minor allele frequency (MAF), observed heterozygosity ($H_o$), effective population size estimate ($\widehat{N}_e$) and polymorphism information content (PIC). Selection footprints and introgressions of M. glaziovii were detected. Spatial population structure analysis revealed five ancestral populations related to distinct Brazilian ecoregions. Estimation of historical relationships among identified populations suggests an early population split from Amazonian to Atlantic forest and Caatinga ecoregions and active gene flows. This study provides a thorough genetic characterization of ex situ germplasm resources from cassava’s center of origin, South America, with results shedding light on Brazilian cassava characteristics and its biogeographical landscape. These findings support and facilitate the use of genetic resources in modern breeding programs including implementation of association mapping and genomic selection strategies.

The Y‐STR landscape of coastal southeastern Han: Forensic characteristics, haplotype analyses, mutation rates, and population genetics

Electrophoresis ◽

10.1002/elps.202100037 ◽

2021 ◽

Author(s):

Haoliang Fan ◽

Ying Zeng ◽

Weiwei Wu ◽

Hong Liu ◽

Quyi Xu ◽

...

Keyword(s):

Population Genetics ◽

Mutation Rates
