SNP-level FST outperforms window statistics for detecting soft sweeps in local adaptation

2022 ◽  
Author(s):  
Tiago da Silva Ribeiro ◽  
José A Galván ◽  
John E Pool

Local adaptation can lead to elevated genetic differentiation at the targeted genetic variant and nearby sites. Selective sweeps come in different forms, and depending on the initial and final frequencies of a favored variant, very different patterns of genetic variation may be produced. If local selection favors an existing variant that had already recombined onto multiple genetic backgrounds, then the width of elevated genetic differentiation (high FST) may be too narrow to detect using a typical windowed genome scan, even if the targeted variant becomes highly differentiated. We therefore used a simulation approach to investigate the power of SNP-level FST (specifically, the maximum SNP FST value within a window) to detect diverse scenarios of local adaptation, and compared it against whole-window FST and the Comparative Haplotype Identity statistic. We found that SNP FST had superior power to detect complete or mostly complete soft sweeps, but lesser power than window-wide statistics to detect partial hard sweeps. To investigate the relative enrichment and nature of SNP FST outliers from real data, we applied the two FST statistics to a panel of Drosophila melanogaster populations. We found that SNP FST had a genome-wide enrichment of outliers compared to demographic expectations, and though it yielded a lesser enrichment than window FST, it detected mostly unique outlier genes and functional categories. Our results suggest that SNP FST is highly complementary to typical window-based approaches for detecting local adaptation, and merits inclusion in future genome scans and methodologies.
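The contrast between the two FST statistics can be illustrated with a minimal sketch. It assumes Hudson's FST estimator and a ratio-of-sums window average; the abstract does not say which estimator was used, and the function names, sample sizes, and allele frequencies here are hypothetical.

```python
def hudson_terms(p1, p2, n1, n2):
    """Numerator and denominator of Hudson's FST for one biallelic SNP.

    p1, p2: sample allele frequencies; n1, n2: sampled chromosomes per population.
    (Assumed estimator, not necessarily the one used in the study.)
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def window_fst(freqs1, freqs2, n1, n2):
    """Whole-window FST: summed numerators divided by summed denominators."""
    terms = [hudson_terms(a, b, n1, n2) for a, b in zip(freqs1, freqs2)]
    total_den = sum(d for _, d in terms)
    return sum(n for n, _ in terms) / total_den if total_den > 0 else float("nan")


def max_snp_fst(freqs1, freqs2, n1, n2):
    """Maximum per-SNP FST in the window, skipping sites fixed for the same allele."""
    values = []
    for a, b in zip(freqs1, freqs2):
        num, den = hudson_terms(a, b, n1, n2)
        if den > 0:
            values.append(num / den)
    return max(values) if values else float("nan")


# Hypothetical window: only the third SNP is strongly differentiated,
# as might happen after a narrow soft sweep.
pop1 = [0.1, 0.5, 0.9, 0.3]
pop2 = [0.1, 0.5, 0.1, 0.3]
print(window_fst(pop1, pop2, 50, 50), max_snp_fst(pop1, pop2, 50, 50))
```

In this toy window the single differentiated SNP is diluted by its undifferentiated neighbors in the window-wide value, while the maximum SNP value retains the signal.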

2015 ◽  
Author(s):  
Tim B Bigdeli ◽  
Donghyung Lee ◽  
Brien P Riley ◽  
Vladimir I Vladimirov ◽  
Ayman H Fanous ◽  
...  

Genome scans, including both genome-wide association studies and deep sequencing, continue to discover a growing number of significant association signals for various traits. However, variants meeting genome-wide significance criteria often explain far less of the overall trait variance than “sub-threshold” association signals. To extract these sub-threshold signals, there is a need for methods that accurately estimate the mean of all (normally distributed) test statistics from a genome scan (i.e., Z-scores). This is currently achieved by the difficult procedure of adjusting all Z-score (χ_1^2) statistics for “winner’s curse” (multiple testing). Given that multiple testing adjustments are much simpler for p-values, we propose a method for estimating Z-score means by i) first adjusting their p-values for multiple testing and then ii) transforming the adjusted p-values to upper tail Z-scores with the sign of the original statistics. Because a False Discovery Rate (FDR) procedure is used for multiple testing adjustment, we denote this method FDR Inverse Quantile Transformation (FIQT). When compared to competitors, e.g. Empirical Bayes (including proposed improvements), FIQT is i) more accurate and ii) more computationally efficient by orders of magnitude. Its accuracy advantage is substantial at larger sample sizes and/or moderate numbers of association signals. Practical application of FIQT to Z-scores from the first Psychiatric Genomics Consortium (PGC) schizophrenia study predicts a non-trivial fraction of the significant signal regions from the subsequent published PGC schizophrenia studies. Finally, we suggest that FIQT might be i) used to improve subject-level risk prediction and ii) further improved by modelling the noncentrality of χ_1^2 statistics.
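The two steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes two-sided p-values and the Benjamini-Hochberg FDR procedure (the abstract says only that an FDR procedure is used), and the names `bh_adjust` and `fiqt_like` are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def fiqt_like(zscores, floor=1e-300):
    """Shrink Z-scores: adjust p-values for FDR, then map back to signed Z-scores."""
    # Step i: two-sided p-values, adjusted for multiple testing (assumed: BH).
    two_sided = [2.0 * _STD_NORMAL.cdf(-abs(z)) for z in zscores]
    adjusted = bh_adjust(two_sided)
    shrunk = []
    for z, p in zip(zscores, adjusted):
        p = max(p, floor)  # guard against underflow to zero for extreme Z-scores
        # Step ii: upper-tail quantile of the adjusted p-value, with the original sign.
        magnitude = -_STD_NORMAL.inv_cdf(p / 2.0)
        shrunk.append(math.copysign(magnitude, z))
    return shrunk


print(fiqt_like([5.0, 2.0, -1.0, 0.1]))
```

Because an adjusted p-value is never smaller than the raw one, each transformed Z-score is pulled toward zero, which is the shrinkage that counteracts the winner's curse.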


2017 ◽  
Author(s):  
Xing Chen ◽  
Yi-Hsiang Hsu

Pleiotropic effects occur when a single genetic variant independently influences multiple phenotypes. In genetic epidemiological studies, multiple endo-phenotypes or correlated traits are commonly tested separately in a univariate statistical framework to identify associations with genetic determinants. Subsequently, a simple look-up of overlapping univariate results is applied to identify pleiotropic genetic effects. However, this strategy offers limited power to detect pleiotropy. In contrast, combining correlated traits into a composite test provides a powerful approach for detecting pleiotropic genes. Here, we propose a two-stage approach to identify potential pleiotropic effects by utilizing aggregated results from large-scale genome-wide association study (GWAS) meta-analyses. In the first stage, we developed two novel approaches (direct linear combining, dLC; and empirical combining, eLC) that combine correlated univariate test statistics to screen potential pleiotropic variants on a genome-wide scale, using either individual-level or aggregated data. Our simulations indicated that dLC and eLC outperform other popular multivariate approaches (such as principal component analysis (PCA), multivariate analysis of variance (MANOVA), canonical correlation analysis (CCA), generalized estimating equations (GEE), linear mixed effects models (LME) and the O’Brien combining approach). In particular, eLC provides a notable increase in power when the genetic variant exhibits both protective and deleterious effects. In the second stage, we developed a unique approach, conditional pleiotropy testing (cPLT), to examine pleiotropic effects using individual-level data for candidate variants identified in Stage 1. Simulations demonstrated reduced type 1 error for cPLT in identifying pleiotropic genetic variants compared to the typical conditional strategy. We validated our two-stage approach by performing a bivariate GWAS on two correlated quantitative traits, high-density lipoprotein (HDL) and triglycerides (TG), in the Genetic Analysis Workshop 16 (GAW16) simulation dataset. In summary, the proposed two-stage approach allows us to leverage aggregated summary statistics from univariate GWAS and improves the power to identify potential pleiotropy while maintaining valid false-positive rates.

Author Summary: Pleiotropy, occurring when a single genetic variant contributes to multiple phenotypes, remains difficult to identify in genome-wide association studies (GWAS). To leverage data for multiple phenotypes and incorporate univariate GWAS summary results, we propose a novel two-stage approach for discovering potential pleiotropic variants. In the first stage, two novel combining approaches were developed to screen potential pleiotropic variants on a genome-wide scale. Simulations demonstrated the superior statistical power of these approaches over other multivariate methods. In the second stage, our approach was used to identify potential pleiotropy in the candidate marker sets generated from the first stage. The proposed two-stage approach was applied to the GAW16 simulation dataset to discover pleiotropic variants associated with high-density lipoprotein and triglycerides. In summary, we demonstrate that the proposed two-stage approach can be applied as a viable and robust strategy to accommodate phenotypic and genetic heterogeneity for discovering potential pleiotropy on a genome-wide scale.
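The general idea of combining correlated univariate test statistics can be sketched as follows. This is not the authors' dLC or eLC, whose definitions the abstract does not give; it is a generic weighted linear combination of Z-scores, standardized by the trait correlation matrix so that the result is standard normal under the null. The function name `combined_z` and the example values are hypothetical.

```python
import math


def combined_z(zscores, corr, weights=None):
    """Weighted sum of correlated Z-scores, rescaled to unit variance under the null.

    zscores: per-trait Z-scores for one variant.
    corr: correlation matrix of the Z-scores under the null (list of lists).
    weights: optional per-trait weights; equal weights are assumed if omitted.
    """
    k = len(zscores)
    w = weights if weights is not None else [1.0] * k
    numerator = sum(wi * zi for wi, zi in zip(w, zscores))
    # Variance of the weighted sum is w' R w.
    variance = sum(w[i] * corr[i][j] * w[j] for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)


# Two traits with an assumed null correlation of 0.5
R = [[1.0, 0.5], [0.5, 1.0]]
print(combined_z([2.0, 2.0], R))   # concordant effects reinforce each other
print(combined_z([3.0, -3.0], R))  # opposite-sign effects cancel in a plain sum
```

The second call shows why a simple equal-weight sum loses power when a variant has effects in opposite directions, which is the setting in which the abstract reports the largest gain for eLC.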


2018 ◽  
Vol 14 (1) ◽  
pp. 53-61
Author(s):  
Xinfeng Li ◽  
Fang Chen ◽  
Jinfeng Xiao ◽  
Shan-Ho Chou ◽  
Xuming Li ◽  
...  

Background: Riboswitches are structured elements that usually reside in the noncoding regions of mRNAs, where they bind various ligands to control the expression of a wide variety of downstream genes. To date, more than twenty different classes of riboswitches have been characterized that sense various metabolites, including purines and their derivatives, coenzymes, amino acids, and metal ions. Objective: This study aims to analyze the genome-wide distribution of riboswitches and the functions of the corresponding downstream genes in prokaryotes. Results: In this study, we have completed a genome context analysis of 27 riboswitches to elucidate the metabolic capacities of riboswitch-mediated gene regulation across 3,079 completely sequenced prokaryotic genomes. Furthermore, Clusters of Orthologous Groups of proteins (COG) annotation was applied to predict and classify the possible functions of the corresponding downstream genes of these riboswitches. We found that they could all be successfully annotated and grouped into 20 different COG functional categories, in which the two main clusters "coenzyme metabolism [H]" and "amino acid transport and metabolism [E]" were the most significantly enriched. Conclusion: Riboswitches are found to be widespread in bacteria, among which the three main classes of TPP-, cobalamin- and SAM-riboswitches were the most widely distributed. We found that a wide variety of functions were associated with the corresponding downstream genes, suggesting that a wide range of regulatory roles are mediated by these riboswitches in prokaryotes.


2022 ◽  
Vol 1 (1) ◽  
Author(s):  
S Volis ◽  
I Shulgina ◽  
B Dyuzgenbekova

Environmental variation can be large across a wide range of spatial scales resulting in complex patterns of local adaptation across species ranges. We analyzed the scale, genetic mechanism and direct climatic causes of local adaptation in a widely distributed grass Hordeum spontaneum. We performed artificial crosses of maternal plants representing the same Negev desert population with plants originating elsewhere. Pollen donors were plants from other Negev desert populations, non-desert Israeli populations sampled along an aridity gradient, and accessions covering the entire species range. Our study included planting of inter-population hybrids under favorable and simulated desert experimental conditions, followed by analysis of their performance, variation in adaptive traits and relationship with climatic parameters at sampling locations. The combined results of parental phenotypic variation and performance of hybrids were consistent with local selection, reflecting the importance of both regional and local climates. The adaptive genetic differentiation of barley desert populations had a complex architecture. None of the three effects (additive, dominance and epistasis) were fully responsible for this differentiation. Although genetic effects not related to extrinsic selection appear to contribute to genetic differentiation in barley, epistatic effects arising from local selection clearly predominated. The short-term effect of gene flow by pollen was generally negative, indicating that a majority of the new allele combinations created by recombination were maladaptive. However, the long-term effect of occasional pollen flow from other desert populations appears to be positive, as some new recombined genotypes were superior in fitness to the maternal plants even in the F2 generation.


2014 ◽  
Vol 106 (2) ◽  
pp. 166-176 ◽  
Author(s):  
U. K. Reddy ◽  
L. Abburi ◽  
V. L. Abburi ◽  
T. Saminathan ◽  
R. Cantrell ◽  
...  

Author(s):  
Guillaume Laval ◽  
Etienne Patin ◽  
Pierre Boutillier ◽  
Lluis Quintana-Murci

Over the last 100,000 years, humans have spread across the globe and encountered a highly diverse set of environments to which they have had to adapt. Genome-wide scans of selection are powerful tools for detecting selective sweeps. However, because of unknown fractions of undetected sweeps and false discoveries, the numbers of detected sweeps often poorly reflect the actual numbers of selective sweeps in populations. The thousands of soft sweeps on standing variation recently reported in humans have also been interpreted as consisting largely of misclassified neutral regions. In this context, the extent of human adaptation remains poorly understood. We present a new rationale to estimate the actual number of sweeps expected over the last 100,000 years (denoted by X) from genome-wide population data, considering both hard sweeps and selective sweeps on standing variation. We implemented an approximate Bayesian computation framework and showed, based on computer simulations, that such a method can properly estimate X. We then jointly estimated the number of selective sweeps and their mean intensity and age in several 1000G African, European and Asian populations. Our estimates of X, which were only weakly sensitive to demographic misspecification, revealed very limited numbers of sweeps regardless of the frequency of the selected alleles at the onset of selection and the completion of sweeps. We estimated ∼80 sweeps on average across fifteen 1000G populations when assuming incomplete sweeps only, and ∼140 selective sweeps in non-African populations when incorporating complete sweeps in our simulations. The proposed method may help to address controversies over the number of selective sweeps in populations, guiding further genome-wide investigations of recent positive selection.


PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0257461
Author(s):  
Antonios Kominakis ◽  
Eirini Tarsani ◽  
Ariadne L. Hager-Theodorides ◽  
Ioannis Mastranestasis ◽  
Dimitra Gkelia ◽  
...  

In Greece, a number of local sheep breeds are raised in a wide range of ecological niches across the country. These breeds can be used for the identification of genetic variants that contribute to local adaptation. To this end, 50k genotypes of 90 local sheep from mainland Greece (Epirus, n = 35 and Peloponnesus, n = 55) were used, as well as 147 genotypes of sheep from insular Greece (Skyros, n = 21; Lemnos, n = 36; and Lesvos, n = 90). Principal components and phylogenetic analyses, along with admixture and spatial point pattern analyses, suggested genetic differentiation of ‘mainland-island’ populations. Genome scans for signatures of selection and genome-wide association analysis (GWAS) pointed to one highly differentiating marker on OAR4 (FST = 0.39, FLK = 21.93, FDR p-value = 0.10) that also displayed genome-wide significance (FDR p-value = 0.002) in the GWAS. A total of six positional candidate genes (LOC106990429, ZNF804B, TEX47, STEAP4, SRI and ADAM22) were identified within 500 kb flanking regions around the significant marker. In addition, two QTLs related to fat tail deposition are reported in genomic regions 800 kb downstream of the significant marker. Based on gene ontology analysis and literature evidence, the identified candidate genes possess biological functions relevant to local adaptation that are worth further investigation.


2021 ◽  
Vol 12 ◽  
Author(s):  
Aamir Saleem ◽  
Hilde Muylle ◽  
Jonas Aper ◽  
Tom Ruttink ◽  
Jiao Wang ◽  
...  

Targeted and untargeted selection, including domestication and breeding efforts, can reduce genetic diversity in breeding germplasm and create selective sweeps in crop genomes. The genomic regions at which selective sweeps are detected can reveal important information about signatures of selection. We have analyzed the genetic diversity within a soybean germplasm collection relevant for breeding in Europe (the EUCLEG collection), and have identified selective sweeps through a genome-wide scan comparing that collection to Chinese soybean collections. This work involved genotyping of 480 EUCLEG soybean accessions, including 210 improved varieties, 216 breeding lines and 54 landraces, using the 355K SoySNP microarray. SNP calling of 477 EUCLEG accessions together with 328 Chinese soybean accessions identified 224,993 high-quality SNP markers. Population structure analysis revealed a clear differentiation between the EUCLEG collection and the Chinese materials. Further, the EUCLEG collection was sub-structured into five subgroups that were differentiated by geographical origin. No clear association between subgroups and maturity group was detected. The genetic diversity was lower in the EUCLEG collection than in the Chinese collections. Selective sweep analysis revealed 23 selective sweep regions distributed over 12 chromosomes. Co-localization of these selective sweep regions with previously reported QTLs and genes revealed that various signatures of selection in the EUCLEG collection may be related to domestication and improvement traits including seed protein and oil content, phenology, nitrogen fixation, yield components, disease resistance and quality. No signatures of selection related to stem determinacy were detected. In addition, the absence of signatures of selection for a substantial number of QTLs related to yield, protein content, oil content and phenological traits suggests the presence of substantial genetic diversity in the EUCLEG collection.
Taken together, the results obtained demonstrate that the available genetic diversity in the EUCLEG collection can be further exploited for research and breeding purposes. However, incorporation of exotic material can be considered to broaden its genetic base.

