Sporadic Occurrence of Recent Selective Sweeps from Standing Variation in Humans as Revealed by an Approximate Bayesian Computation Approach

Genetics ◽  
2021 ◽  
Author(s):  
Guillaume Laval ◽  
Etienne Patin ◽  
Pierre Boutillier ◽  
Lluis Quintana-Murci

Abstract During their dispersals over the last 100,000 years, modern humans have been exposed to a large variety of environments, resulting in genetic adaptation. While genome-wide scans for the footprints of positive Darwinian selection have increased knowledge of genes and functions potentially involved in human local adaptation, they have globally produced evidence of a limited contribution of selective sweeps in humans. Conversely, studies based on machine learning algorithms suggest that recent sweeps from standing variation are widespread in humans, an observation that has been recently questioned. Here, we sought to formally quantify the number of recent selective sweeps in humans, by leveraging approximate Bayesian computation and whole-genome sequence data. Our computer simulations revealed suitable ABC estimations, regardless of the frequency of the selected alleles at the onset of selection and the completion of sweeps. Under a model of recent selection from standing variation, we inferred that an average of 68 (from 56 to 79) and 140 (from 94 to 198) sweeps occurred over the last 100,000 years of human history, in African and Eurasian populations, respectively. The former estimation is compatible with human adaptation rates estimated since divergence with chimps, and reveals numbers of sweeps per generation per site in the range of values estimated in Drosophila. Our results confirm the rarity of selective sweeps in humans and show a low contribution of sweeps from standing variation to recent human adaptation.

Author(s):  
Guillaume Laval ◽  
Etienne Patin ◽  
Pierre Boutillier ◽  
Lluis Quintana-Murci

Over the last 100,000 years, humans have spread across the globe and encountered a highly diverse set of environments to which they have had to adapt. Genome-wide scans of selection are powerful tools for detecting selective sweeps. However, because of unknown fractions of undetected sweeps and false discoveries, the numbers of detected sweeps often poorly reflect actual numbers of selective sweeps in populations. The thousands of soft sweeps on standing variation recently evidenced in humans have also been interpreted as a majority of misclassified neutral regions. In such a context, the extent of human adaptation remains little understood. We present a new rationale to estimate these actual numbers of sweeps expected over the last 100,000 years (denoted by X) from genome-wide population data, considering both hard sweeps and selective sweeps on standing variation. We implemented an approximate Bayesian computation framework and showed, based on computer simulations, that such a method can properly estimate X. We then jointly estimated the number of selective sweeps, their mean intensity and age in several 1000G African, European and Asian populations. Our estimations of X, found weakly sensitive to demographic misspecifications, revealed very limited numbers of sweeps regardless of the frequency of the selected alleles at the onset of selection and the completion of sweeps. We estimated ∼80 sweeps on average across fifteen 1000G populations when assuming incomplete sweeps only and ∼140 selective sweeps in non-African populations when incorporating complete sweeps in our simulations. The method proposed may help to address controversies on the number of selective sweeps in populations, guiding further genome-wide investigations of recent positive selection.


2020 ◽  
Author(s):  
Erik R Funk ◽  
Garth M Spellman ◽  
Kevin Winker ◽  
Jack J Withrow ◽  
Kristen C Ruegg ◽  
...  

Abstract Understanding how gene flow affects population divergence and speciation remains challenging. Differentiating one evolutionary process from another can be difficult because multiple processes can produce similar patterns, and more than one process can occur simultaneously. Although simple population models produce predictable results, how these processes balance in taxa with patchy distributions and complicated natural histories is less certain. These types of populations might be highly connected through migration (gene flow), but can experience stronger effects of genetic drift and inbreeding, or localized selection. Although different signals can be difficult to separate, the application of high-throughput sequence data can provide the resolution necessary to distinguish many of these processes. We present whole-genome sequence data for an avian species group with an alpine and arctic tundra distribution to examine the role that different population genetic processes have played in their evolutionary history. Rosy-finches inhabit high elevation mountaintop sky islands and high-latitude island and continental tundra. They exhibit extensive plumage variation coupled with low levels of genetic variation. Additionally, the number of species within the complex is debated, making them excellent for studying the forces involved in the process of diversification, as well as an important species group in which to investigate species boundaries. Total genomic variation suggests a broadly continuous pattern of allele frequency changes across the mainland taxa of this group in North America. However, phylogenomic analyses recover multiple distinct, well-supported groups that coincide with previously described morphological variation and current species-level taxonomy. Tests of introgression using D-statistics and approximate Bayesian computation reveal significant levels of introgression between multiple North American taxa. These results provide insight into the balance between divergent and homogenizing population genetic processes and highlight remaining challenges in interpreting conflict between different types of analytical approaches with whole-genome sequence data. [ABBA-BABA; approximate Bayesian computation; gene flow; phylogenomics; speciation; whole-genome sequencing.]
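The D-statistic (ABBA-BABA) test named in this abstract can be illustrated with a minimal sketch. It assumes the textbook definition D = (nABBA - nBABA) / (nABBA + nBABA) for a four-taxon tree (((P1, P2), P3), outgroup) with one haploid sequence per taxon. The function name `d_statistic` and the toy sequences are hypothetical and are not taken from the study, which works with whole-genome data and would also need significance testing (for example, a block jackknife) that is omitted here.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns and return (n_abba, n_baba, D).

    Assumes four aligned haploid sequences of equal length. The outgroup
    allele is treated as ancestral ("A"); any other allele is derived ("B").
    Gaps and ambiguity codes are not handled in this sketch.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned and of equal length")

    n_abba = 0
    n_baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep biallelic sites only
        if c == o:
            continue  # P3 must carry the derived allele
        if a == o and b == c:
            n_abba += 1  # P2 shares the derived allele with P3
        elif b == o and a == c:
            n_baba += 1  # P1 shares the derived allele with P3

    informative = n_abba + n_baba
    if informative == 0:
        raise ValueError("no ABBA or BABA sites found")
    return n_abba, n_baba, (n_abba - n_baba) / informative


# Toy alignment (hypothetical data): three ABBA sites and one BABA site.
toy = d_statistic("AAAGAAAG", "GGGAAAGG", "GGGGAGAG", "AAAAAAAA")
print(toy)
```

Under this definition, a D value near zero is what incomplete lineage sorting alone is expected to produce, while an excess of ABBA (positive D) or BABA (negative D) sites suggests gene flow between P3 and P2 or P1, respectively.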


PLoS Genetics ◽  
2016 ◽  
Vol 12 (3) ◽  
pp. e1005877 ◽  
Author(s):  
Simon Boitard ◽  
Willy Rodríguez ◽  
Flora Jay ◽  
Stefano Mona ◽  
Frédéric Austerlitz

2014 ◽  
Author(s):  
Camille Roux ◽  
John Pannell

Despite its importance in the diversification of many eucaryote clades, particularly plants, detailed genomic analysis of polyploid species is still in its infancy, with published analysis of only a handful of model species to date. Fundamental questions concerning the origin of polyploid lineages (e.g., auto- vs. allopolyploidy) and the extent to which polyploid genomes display different modes of inheritance are poorly resolved for most polyploids, not least because they have hitherto required detailed karyotypic analysis or the analysis of allele segregation at multiple loci in pedigrees or artificial crosses, which are often not practical for non-model species. However, the increasing availability of sequence data for non-model species now presents an opportunity to apply established approaches for the evolutionary analysis of genomic data to polyploid species complexes. Here, we ask whether approximate Bayesian computation (ABC), applied to sequence data produced by next-generation sequencing technologies from polyploid taxa, allows correct inference of the evolutionary and demographic history of polyploid lineages and their close relatives. We use simulations to investigate how the number of sampled individuals, the number of surveyed loci and their length affect the accuracy and precision of evolutionary and demographic inferences by ABC, including the mode of polyploidisation, mode of inheritance of polyploid taxa, the relative timing of genome duplication and speciation, and effective population sizes of contributing lineages. We also apply the ABC framework we develop to sequence data from diploid and polyploid species of the plant genus Capsella, for which we infer an allopolyploid origin for tetraploid C. bursa-pastoris ≈ 90,000 years ago. In general, our results indicate that ABC is a promising and powerful method for uncovering the origin and subsequent evolution of polyploid species.


2020 ◽  
Vol 635 ◽  
pp. A136 ◽  
Author(s):  
G. Aufort ◽  
L. Ciesla ◽  
P. Pudlo ◽  
V. Buat

Although galaxies are found to follow a tight relation between their star formation rate and stellar mass, they are expected to exhibit complex star formation histories (SFH) with short-term fluctuations. The goal of this pilot study is to present a method that identifies galaxies that undergo strong variation in star formation activity in the last ten to some hundred million years. In other words, the proposed method determines whether a variation in the last few hundred million years of the SFH is needed to properly model the spectral energy distribution (SED) rather than a smooth normal SFH. To do so, we analyzed a sample of COSMOS galaxies with 0.5 < z < 1 and log M* > 8.5 using high signal-to-noise ratio broadband photometry. We applied approximate Bayesian computation, a custom statistical method for performing model choice, which is associated with machine-learning algorithms to provide the probability that a flexible SFH is preferred based on the observed flux density ratios of galaxies. We present the method and test it on a sample of simulated SEDs. The input information fed to the algorithm is a set of broadband UV to NIR (rest-frame) flux ratios for each galaxy. The choice of using colors is made to remove any difficulty linked to normalization when classification algorithms are used. The method has an error rate of 21% in recovering the correct SFH and is sensitive to SFR variations larger than 1 dex. A more traditional SED-fitting method using CIGALE is tested to achieve the same goal, based on fit comparisons through the Bayesian information criterion, but the best error rate we obtained is higher, 28%. We applied our new method to the COSMOS galaxies sample. The stellar mass distribution of galaxies with a strong to decisive evidence against the smooth delayed-τ SFH peaks at lower M* than for galaxies where the smooth delayed-τ SFH is preferred. We discuss the fact that this result does not come from any bias due to our training. Finally, we argue that flexible SFHs are needed to be able to cover the largest possible SFR-M* parameter space.


2019 ◽  
Vol 36 (11) ◽  
pp. 2451-2461 ◽  
Author(s):  
Bo-Wen Zhang ◽  
Lin-Lin Xu ◽  
Nan Li ◽  
Peng-Cheng Yan ◽  
Xin-Hua Jiang ◽  
...  

Abstract Persian walnut (Juglans regia) is cultivated worldwide for its high-quality wood and nuts, but its origin has remained mysterious because in phylogenies it occupies an unresolved position between American black walnuts and Asian butternuts. Equally unclear is the origin of the only American butternut, J. cinerea. We resequenced the whole genome of 80 individuals from 19 of the 22 species of Juglans and assembled the genome of its relatives Pterocarya stenoptera and Platycarya strobilacea. Using phylogenetic-network analysis of single-copy nuclear genes, genome-wide site pattern probabilities, and Approximate Bayesian Computation, we discovered that J. regia (and its landrace J. sigillata) arose as a hybrid between the American and the Asian lineages and that J. cinerea resulted from massive introgression from an immigrating Asian butternut into the genome of an American black walnut. Approximate Bayesian Computation modeling placed the hybrid origin in the late Pliocene, ∼3.45 My, with both parental lineages since having gone extinct in Europe.


2018 ◽  
Author(s):  
Flora Jay ◽  
Simon Boitard ◽  
Frédéric Austerlitz

Abstract Species generally undergo a complex demographic history, consisting, in particular, of multiple changes in population size. Genome-wide sequencing data are potentially highly informative for reconstructing this demographic history. A crucial point is to extract the relevant information from these very large datasets. Here we designed an approach for inferring past demographic events from a moderate number of fully sequenced genomes. Our new approach uses Approximate Bayesian Computation (ABC), a simulation-based statistical framework that allows (i) identifying the best demographic scenario among several competing scenarios, and (ii) estimating the best-fitting parameters under the chosen scenario. ABC relies on the computation of summary statistics. Using a cross-validation approach, we showed that statistics such as the lengths of haplotypes shared between individuals, or the decay of linkage disequilibrium with distance, can be combined with classical statistics (e.g., heterozygosity, Tajima’s D) to accurately infer complex demographic scenarios including bottlenecks and expansion periods. We also demonstrated the importance of simultaneously estimating the genotyping error rate. Applying our method to genome-wide human sequence databases, we finally showed that a model consisting of a bottleneck followed by a Paleolithic and a Neolithic expansion was the most relevant for Eurasian populations.
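The general logic of ABC described in this abstract (draw parameters from a prior, simulate data, reduce them to summary statistics, keep the draws whose summaries fall closest to the observed ones) can be sketched with plain rejection sampling. This is a minimal toy illustration, not the authors' pipeline: the Gaussian simulator, the uniform prior, the two summary statistics, and all function names below are assumptions made for the example, whereas the study relies on coalescent simulations and genomic statistics such as haplotype lengths and linkage disequilibrium decay.

```python
import random
import statistics


def summary(data):
    """Reduce a data set to summary statistics (toy choice: mean and sd)."""
    return (statistics.mean(data), statistics.stdev(data))


def prior_sampler(rng):
    """Draw one parameter value from the prior (assumed Uniform(0, 10))."""
    return rng.uniform(0.0, 10.0)


def simulator(theta, rng, n=50):
    """Simulate a data set given a parameter (toy model: Normal(theta, 1))."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, n_keep, rng):
    """Keep the n_keep prior draws whose simulated summaries are closest
    (Euclidean distance) to the summaries of the observed data."""
    s_obs = summary(observed)
    scored = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        s_sim = summary(simulator(theta, rng))
        distance = sum((a - b) ** 2 for a, b in zip(s_sim, s_obs)) ** 0.5
        scored.append((distance, theta))
    scored.sort(key=lambda pair: pair[0])
    return [theta for _, theta in scored[:n_keep]]


# Pseudo-observed data generated with a known parameter (theta = 3).
rng = random.Random(42)
observed = simulator(3.0, rng)
posterior = abc_rejection(observed, n_draws=5000, n_keep=50, rng=rng)
print("approximate posterior mean:", statistics.mean(posterior))
```

The retained draws approximate the posterior distribution of the parameter. Model choice, item (i) in the abstract, follows the same idea, with a model index drawn alongside the parameters and the accepted indices tallied per model.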


2014 ◽  
Author(s):  
Krishna Veeramah ◽  
August E Woerner ◽  
Laurel Johnstone ◽  
Ivo Gut ◽  
Marta Gut ◽  
...  

Gibbons are believed to have diverged from the larger great apes ~16.8 Mya and today reside in the rainforests of Southeast Asia. Based on their diploid chromosome number, the family Hylobatidae is divided into four genera, Nomascus, Symphalangus, Hoolock and Hylobates. Genetic studies attempting to elucidate the phylogenetic relationships among gibbons using karyotypes, mtDNA, the Y chromosome, and short autosomal sequences have been inconclusive. To examine the relationships among gibbon genera in more depth, we performed 2nd generation whole genome sequencing to a mean of ~15X coverage in two individuals from each genus. We developed a coalescent-based Approximate Bayesian Computation method incorporating a model of sequencing error generated by high coverage exome validation to infer the branching order, divergence times, and effective population sizes of gibbon taxa. Although Hoolock and Symphalangus are likely sister taxa, we could not confidently resolve a single bifurcating tree despite the large amount of data analyzed. Our combined results support the hypothesis that all four gibbon genera diverged at approximately the same time. Assuming an autosomal mutation rate of 1 × 10⁻⁹/site/year, this speciation process occurred ~5 Mya during a period in the Early Pliocene characterized by climatic shifts and fragmentation of the Sunda shelf forests. Whole genome sequencing of additional individuals will be vital for inferring the extent of gene flow among species after the separation of the gibbon genera.

