Biases in GWAS – the dog that did not bark

2019
Author(s):  
C M Schooling

Abstract
Background: Genome-wide association studies (GWAS) of specific diseases are central to scientific discovery. However, the bias that arises from inevitably recruiting only people who have survived both their genetic make-up and disease-specific competing risks has not been comprehensively considered.
Methods: We identified sources of bias using directed acyclic graphs and tested for them in the UK Biobank GWAS by making comparisons across the survival distribution, proxied by age at recruitment.
Results: Associations of genetic variants with some diseases depended on their effect on survival. Variants associated with common harmful diseases had weaker or reversed associations with subsequent diseases that shared causes.
Conclusion: Genetic studies of diseases that involve surviving other common diseases are open to selection bias that can generate systematic type 2 error. GWAS that ignore such selection bias are most suitable for monogenic diseases. Genetic effects on age at recruitment may indicate both potential bias in disease-specific GWAS and relevance to population health.
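The selection mechanism described above can be illustrated with a toy simulation. This is a hypothetical sketch, not the author's analysis. It assumes a DAG in which a variant G and an unmeasured shared cause U both raise the liability to a fatal disease A (G → A ← U), U also raises a later trait B, and G has a small direct effect on B. Only people without A survive to recruitment, so conditioning on survival makes G and U negatively correlated among recruits, which weakens (here, roughly cancels or reverses) the G–B association. All effect sizes, the 0.3 carrier frequency and the 1.5 threshold are assumptions chosen for illustration.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_survivor_bias(n=100_000, seed=42):
    """Hypothetical DAG: G -> A <- U -> B, plus G -> B.
    Anyone who develops the harmful disease A dies before recruitment.
    Returns (G-B slope in everyone, G-B slope in survivors only)."""
    rng = random.Random(seed)
    g_all, b_all, g_surv, b_surv = [], [], [], []
    for _ in range(n):
        g = 1 if rng.random() < 0.3 else 0              # variant carrier (assumed frequency)
        u = rng.gauss(0.0, 1.0)                         # shared cause of A and B
        has_a = (1.0 * g + 1.0 * u + rng.gauss(0.0, 1.0)) > 1.5  # fatal disease A
        b = 0.2 * g + 1.0 * u + rng.gauss(0.0, 1.0)     # liability for later disease B
        g_all.append(g)
        b_all.append(b)
        if not has_a:                                   # selection: survivors only
            g_surv.append(g)
            b_surv.append(b)
    return slope(g_all, b_all), slope(g_surv, b_surv)

full, survivors = simulate_survivor_bias()
print(f"G-B association: everyone={full:.3f}, survivors={survivors:.3f}")
```

In the full population the slope recovers the assumed direct effect of about 0.2; among survivors it is pulled towards zero or below, even though nothing about the biology of B has changed.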

Author(s):  
Jack W. O’Sullivan
John P. A. Ioannidis

Abstract
With the establishment of large biobanks, the discovery of single nucleotide polymorphisms (SNPs) associated with various phenotypes has accelerated. An open question is whether SNPs that reached genome-wide significance in earlier genome-wide association studies (GWAS) are also replicated in later GWAS conducted in biobanks. To address this question, we examined a publicly available GWAS database and, for each phenotype, identified two independent GWAS: an earlier “discovery” GWAS and a later replication GWAS done in the UK Biobank. The analysis evaluated 136,318,924 SNPs (of which 6,289 had reached p < 5 × 10⁻⁸ in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0%, and it was lower for binary than for quantitative phenotypes (58.1% versus 94.8%, respectively). SNP effect sizes decreased by 18.0% for binary phenotypes but increased by 12.0% for quantitative phenotypes. Using the discovery SNP effect size, trait type (binary or quantitative), and discovery p-value, we built and validated a model that predicted SNP replication with an area under the receiver operating characteristic curve of 0.90. While non-replication may often reflect lack of power rather than genuine false-positive findings, these results provide insight into which discovered associations are likely to be seen again in subsequent GWAS.
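The 0.90 figure is an area under the ROC curve. As a hedged illustration of what that number means, the sketch below computes the AUC directly as the probability that a randomly chosen replicated SNP receives a higher predicted score than a randomly chosen non-replicated one (ties count one half). The predicted probabilities and outcomes are hypothetical, and the function name `roc_auc` is illustrative; the abstract does not describe how the authors computed their AUC.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. Labels: 1 = replicated, 0 = not replicated."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical model outputs and replication outcomes (not study data).
predicted = [0.95, 0.90, 0.80, 0.60, 0.55, 0.30, 0.20]
replicated = [1, 1, 0, 1, 0, 0, 0]
print(roc_auc(predicted, replicated))
```

Here 11 of the 12 positive/negative pairs are ranked correctly, giving about 0.92. The pairwise loop is quadratic; for millions of SNPs a rank-based (Mann–Whitney) computation would be the practical choice.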


2019
Vol 116 (4)
pp. 1195-1200
Author(s):  
Daniel J. Wilson

Analysis of “big data” frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example, in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the familywise error rate (FWER) is considered the strongest protection against false positives but makes it difficult to reach the multiple testing-corrected significance threshold. Here, I introduce the harmonic mean p-value (HMP), which controls the FWER while greatly improving statistical power by combining dependent tests using the generalized central limit theorem. I show that the HMP effortlessly combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses in examples of a human GWAS for neuroticism and a joint human–pathogen GWAS for hepatitis C viral load. The HMP simultaneously tests all ways to group hypotheses, allowing the smallest groups of hypotheses that retain significance to be sought. The power of the HMP to detect significant hypothesis groups is greater than the power of the Benjamini–Hochberg procedure to detect significant hypotheses, although the latter only controls the weaker false discovery rate (FDR). The HMP has broad implications for the analysis of large datasets, because it enhances the potential for scientific discovery.
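A minimal sketch of the weighted HMP for a group R of tests, assuming weights w_i that sum to 1 over all L tests: HMP_R = (Σ w_i) / (Σ w_i / p_i), summing over i in R, with R declared significant when HMP_R ≤ α · w_R, where w_R is the group's total weight. That direct threshold is only approximate for small α; the exact procedure adjusts α using a Landau distribution, which this sketch does not implement. The group of ten p-values of 10⁻⁷ among one million tests is a hypothetical example of individually nonsignificant hypotheses that are significant together.

```python
def harmonic_mean_p(pvalues, weights):
    """Weighted harmonic mean p-value: sum(w) / sum(w / p)."""
    if len(pvalues) != len(weights):
        raise ValueError("pvalues and weights must have the same length")
    return sum(weights) / sum(w / p for p, w in zip(pvalues, weights))

def hmp_group_significant(pvalues, weights, alpha=0.05):
    """Approximate group test: significant if HMP_R <= alpha * w_R.
    Assumes weights over ALL L tests sum to 1. The exact test replaces
    alpha with a Landau-distribution-adjusted level (not done here)."""
    w_r = sum(weights)
    return harmonic_mean_p(pvalues, weights) <= alpha * w_r

# Hypothetical setting: L = 1,000,000 tests with equal weights 1/L.
L = 1_000_000
group_p = [1e-7] * 10            # each p-value misses the Bonferroni cutoff
group_w = [1 / L] * 10
bonferroni_cutoff = 0.05 / L     # 5e-8

print(any(p <= bonferroni_cutoff for p in group_p))  # no single test passes
print(hmp_group_significant(group_p, group_w))       # the group of ten does
```

With equal p-values the HMP of the group is simply 10⁻⁷, but its threshold scales with the group weight (10 × 10⁻⁶), so the group clears 5 × 10⁻⁷ while no member clears 5 × 10⁻⁸.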


2020
Vol 2 (7A)
Author(s):  
Megan De Ste Croix
Dave Neelam
Neil Oldfield
Jay Lucidarme
David Turner
...  

Despite ongoing vaccination programmes, Neisseria meningitidis causes over 700 cases of invasive meningococcal disease (IMD) in the UK each year. In 2017-18, the MenW and MenY capsular groups caused 38% of all IMD cases. Current policy is to generate genome sequences of all meningococcal disease isolates. Using this resource, we aim to understand how genetic variation contributes to phenotypic differences between carriage and disease isolates. We are adapting a variety of assays, designed to mimic carriage and disease behaviours, for high-throughput phenotypic testing of multiple meningococcal isolates from carriage and from cases of IMD. We have selected 335 MenW cc11 and MenY cc23 isolates and are currently testing subsets of them in cell culture (CaLu3), growth and biofilm assays. Phenotypic differences will be used as input data for genome-wide association studies that aim to identify the specific genomic variants, or combinations of variants, underlying the observed differences. Genomic data will include whole-genome sequences and repeat-mediated phase variation states. Our preliminary data have detected variation in the ability of cc11 and cc23 isolates to disrupt monolayers of CaLu3 cells, indicating that minor genetic differences between phylogenetically similar organisms may be physiologically important for both carriage and disease. We will also discuss progress in establishing successful high-throughput assays for testing multiple isolates.


2021
Author(s):  
Adam C. Naj
Ganna Leonenko
Xueqiu Jian
Benjamin Grenier-Boley
Maria Carolina Dalmasso
...  

Risk for late-onset Alzheimer's disease (LOAD) is driven by multiple loci, primarily identified by genome-wide association studies, many of which are common variants with minor allele frequencies (MAF) > 0.01. To identify additional common and rare LOAD risk variants, we performed a GWAS on 25,170 LOAD subjects and 41,052 cognitively normal controls in 44 datasets from the International Genomics of Alzheimer's Project (IGAP). Existing genotype data were imputed using the dense, high-resolution Haplotype Reference Consortium (HRC) r1.1 reference panel. Stage 1 associations with P < 10⁻⁵ were meta-analyzed with the European Alzheimer's Disease Biobank (EADB) (n = 20,301 cases; 21,839 controls) (stage 2: combined IGAP and EADB). An expanded meta-analysis was performed using a GWAS of parental AD/dementia history in the UK Biobank (UKBB) (n = 35,214 cases; 180,791 controls) (stage 3: combined IGAP, EADB, and UKBB). Common variant (MAF ≥ 0.01) associations were identified for 29 loci in stage 2, including novel genome-wide significant associations at TSPAN14 (P = 2.33 × 10⁻¹²), SHARPIN (P = 1.56 × 10⁻⁹), and ATF5/SIGLEC11 (P = 1.03 × 10⁻⁸), and associations newly significant without using AD proxy cases at MTSS1L/IL34 (P = 1.80 × 10⁻⁸), APH1B (P = 2.10 × 10⁻¹³), and CLNK (P = 2.24 × 10⁻¹⁰). Rare variant (MAF < 0.01) associations with genome-wide significance in stage 2 included multiple variants in APOE and TREM2, and a novel association of a rare variant (rs143080277; MAF = 0.0054; P = 2.69 × 10⁻⁹) in NCK2, which was further strengthened by the inclusion of UKBB data in stage 3 (P = 7.17 × 10⁻¹³). Single-nucleus sequencing data show that NCK2 is highly expressed in amyloid-responsive microglial cells, suggesting a role in LOAD pathology.
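The abstract says stage 1 results were "meta-analyzed" with EADB but does not name the method. A fixed-effect inverse-variance meta-analysis is a common choice for combining per-stage GWAS estimates, so the sketch below assumes it: each study's log-odds ratio is weighted by 1/SE², and the pooled estimate gives a z statistic and two-sided P. The input numbers are hypothetical, and the sketch ignores complications such as the proxy-phenotype scale of the stage 3 UK Biobank data.

```python
import math

def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study estimates.
    Returns (pooled beta, pooled SE, z, two-sided P)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return pooled, pooled_se, z, p

# Hypothetical log-odds ratios and SEs for one variant in two stages.
beta, se, z, p = ivw_fixed_effect([0.10, 0.10], [0.02, 0.02])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} P={p:.2e}")
```

Two studies that each give z = 5 combine to z ≈ 7.07, illustrating why adding a replication stage can push a suggestive stage 1 signal past the 5 × 10⁻⁸ genome-wide threshold.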


2020
Author(s):  
Dan Ju
Iain Mathieson

Abstract
Skin pigmentation is a classic example of a polygenic trait that has experienced directional selection in humans. Genome-wide association studies have identified well over a hundred pigmentation-associated loci, and genomic scans in present-day and ancient populations have identified selective sweeps for a small number of light pigmentation-associated alleles in Europeans. It is unclear whether selection has operated on all the genetic variation associated with skin pigmentation or on just a small number of large-effect variants. Here, we address this question using ancient DNA from 1158 individuals from West Eurasia covering a period of 40,000 years, combined with genome-wide association summary statistics from the UK Biobank. We find a robust signal of directional selection in ancient West Eurasians on skin pigmentation variants ascertained in the UK Biobank, but this signal is driven mostly by a limited number of large-effect variants. Consistent with this observation, a polygenic selection test in present-day populations fails to detect selection with the full set of variants; rather, only the top five variants show strong evidence of selection. Our data allow us to disentangle the effects of admixture and selection. Most notably, a large-effect variant at SLC24A5 was introduced to Europe by migrations of Neolithic farming populations but continued to be under selection after admixture. This study shows that the response to selection for light skin pigmentation in West Eurasia was driven by a relatively small proportion of the variants that are associated with present-day phenotypic variation.
Significance: Some of the genes responsible for the evolution of light skin pigmentation in Europeans show signals of positive selection in present-day populations. Recently, genome-wide association studies have highlighted the highly polygenic nature of skin pigmentation. It is unclear whether selection has operated on all of these genetic variants or just a subset. By studying variation in over a thousand ancient genomes from West Eurasia covering 40,000 years, we are able to study both the aggregate behavior of pigmentation-associated variants and the evolutionary history of individual variants. We find that the evolution of light skin pigmentation in Europeans was driven by frequency changes in a relatively small fraction of the genetic variants that are associated with variation in the trait today.
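The claim that a change is "driven by a relatively small proportion of the variants" can be made concrete with a simple decomposition. Under an additive model, a population's expected polygenic score is 2 Σ β_i p_i, so the change between two time points splits exactly into per-variant terms 2 β_i Δp_i. This is a hedged illustration with hypothetical effect sizes and allele frequencies; it is not the selection test used in the study and cannot by itself distinguish selection from drift or admixture.

```python
def expected_score(betas, freqs):
    """Expected additive polygenic score of a diploid population:
    2 * sum(beta_i * p_i), with p_i the effect-allele frequency."""
    return 2.0 * sum(b * p for b, p in zip(betas, freqs))

def per_variant_contributions(betas, freqs_then, freqs_now):
    """Per-variant terms 2 * beta_i * (p_now_i - p_then_i);
    they sum exactly to the change in expected score."""
    return [2.0 * b * (p1 - p0) for b, p0, p1 in zip(betas, freqs_then, freqs_now)]

# Hypothetical effects (towards lighter pigmentation), listed from largest
# to smallest, and allele frequencies at two time points; NOT study values.
betas      = [0.50, 0.40, 0.05, 0.04, 0.03, 0.02]
freqs_then = [0.10, 0.20, 0.40, 0.50, 0.30, 0.60]
freqs_now  = [0.95, 0.70, 0.42, 0.49, 0.33, 0.61]

contrib = per_variant_contributions(betas, freqs_then, freqs_now)
total = expected_score(betas, freqs_now) - expected_score(betas, freqs_then)
share_top_two = sum(contrib[:2]) / total   # the two largest-effect variants
print(f"total change={total:.3f}, share from two largest-effect variants={share_top_two:.1%}")
```

In this made-up example nearly all of the score change comes from the two large-effect variants, which is the pattern the abstract reports for West Eurasian pigmentation.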


2021
Author(s):  
Weihua Meng
Parminder Reel
Charvi Nangia
Aravind Rajendrakumar
Harry Hebert
...  

Headache is one of the commonest complaints that doctors need to address in clinical settings. The genetic mechanisms of different types of headache are not well understood. In this study, we performed a meta-analysis of genome-wide association studies (GWAS) of the self-reported headache phenotype from the UK Biobank cohort and the self-reported migraine phenotype from the 23andMe resource, using metaUSAT, a meta-analysis method for genetically correlated phenotypes (N = 397,385). We identified 38 loci for headache, of which 34 had been reported before and 4 were new. The LRP1-STAT6-SDR9C7 region on chromosome 12 was the most significantly associated locus, with lead SNP rs11172113 (P = 1.24 × 10⁻⁶²). The ONECUT2 locus on chromosome 18 was the strongest signal among the 4 new loci, with lead SNP rs673939 (P = 1.29 × 10⁻⁹). Our study demonstrated that the genetically correlated phenotypes of self-reported headache and self-reported migraine can be meta-analysed together, in theory and in practice, to boost study power and identify new variants for headache. This study has paved the way for a large GWAS meta-analysis involving cohorts with different, though genetically correlated, headache phenotypes.


2020
Author(s):  
Lucas D. Ward
Ho-Chou Tu
Chelsea Quenneville
Alexander O. Flynn-Carroll
Margaret M. Parker
...  

Abstract
To better understand the molecular pathways underlying liver health and disease, we performed genome-wide association studies (GWAS) of circulating levels of alanine aminotransferase (ALT) and aspartate aminotransferase (AST) in 408,300 subjects from four ethnic groups in the UK Biobank, focusing on variants associated with both enzymes. Among these variants, the strongest effect is from a rare (MAF in White British = 0.12%) missense variant, Thr95Ile (rs188273166), in SLC30A10, the gene encoding the manganese efflux transporter; this variant is associated with a 5.9% increase in ALT and a 4.2% increase in AST. Carriers have a higher prevalence of all-cause liver disease (OR = 1.70; 95% CI = 1.24 to 2.34) and of extrahepatic bile duct cancer (OR = 23.8; 95% CI = 9.1 to 62.1) than non-carriers. Over 4% of the cases of extrahepatic cholangiocarcinoma in the UK Biobank carry SLC30A10 Thr95Ile. Unlike the SLC30A10 variants known to cause the recessive syndrome hypermanganesemia with dystonia-1 (HMNDYT1), the Thr95Ile variant has a detectable effect even in the heterozygous state. Also unlike HMNDYT1-causing variants, Thr95Ile yields a protein that is properly trafficked to the plasma membrane when expressed in HeLa cells. These results suggest that coding variation in SLC30A10 affects liver health in more individuals than the small population of HMNDYT1 patients.
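The carrier comparisons above are reported as odds ratios with 95% confidence intervals. As a hedged sketch of how such a figure is obtained in the simplest case, the code computes an unadjusted odds ratio and a Wald (Woolf) interval on the log scale from a 2×2 table of carriers versus non-carriers. The counts are hypothetical, and the abstract does not say whether the published ORs were adjusted for covariates or estimated another way.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald (Woolf) CI from a 2x2 table:
         a = carriers with disease,     b = carriers without disease,
         c = non-carriers with disease, d = non-carriers without disease.
    Assumes no zero cells."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical counts, not UK Biobank data.
or_, lo, hi = odds_ratio_ci(a=20, b=980, c=10, d=990)
print(f"OR = {or_:.2f}; 95% CI = {lo:.2f} to {hi:.2f}")
```

Because the interval is symmetric on the log scale, very small cell counts (as with a rare variant and a rare cancer) produce the kind of wide, asymmetric interval seen for extrahepatic bile duct cancer.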


2018
Author(s):  
Mashaal Sohail
Robert M. Maier
Andrea Ganna
Alex Bloemendal
Alicia R. Martin
...  

Abstract
Genetic predictions of height differ among human populations, and these differences are too large to be explained by genetic drift. This observation has been interpreted as evidence of polygenic adaptation. Differences across populations were detected using SNPs associated with height at genome-wide significance, and many studies also found that the signals grew stronger when large numbers of subsignificant SNPs were analyzed. This has led to excitement about the prospect of analyzing large fractions of the genome to detect subtle signals of selection, and to claims of polygenic adaptation for multiple traits. Polygenic adaptation studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the height analyses in the UK Biobank, a much more homogeneously designed study. Our results show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population structure.
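Why would adding many subsignificant SNPs amplify a structure artefact? A stylized sketch, under loudly hypothetical assumptions: the population difference in expected polygenic score is 2 Σ β_i (p_i^A − p_i^B). If uncorrected structure adds a term proportional to each SNP's frequency difference to its estimated effect (modelled here as `strat_bias * (pA - pB)`), the bias in the score difference is 2 · strat_bias · Σ (p_i^A − p_i^B)², which is always positive and grows roughly linearly with the number of SNPs, whereas the true signal here is just noise around zero. The frequency and effect distributions are invented for illustration; this is not the paper's analysis.

```python
import random

def score_difference(betas, freq_a, freq_b):
    """Difference in expected additive polygenic score between populations
    A and B: 2 * sum(beta_i * (pA_i - pB_i))."""
    return 2.0 * sum(b * (pa - pb) for b, pa, pb in zip(betas, freq_a, freq_b))

def simulate_stratification(n_snps, strat_bias, seed=1):
    """Hypothetical model: true effects are independent of the A-B frequency
    difference, but each estimated effect picks up a stratification term
    strat_bias * (pA - pB). Returns (true difference, estimated difference)."""
    rng = random.Random(seed)
    freq_a = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
    freq_b = [min(0.99, max(0.01, p + rng.gauss(0.0, 0.05))) for p in freq_a]
    true_betas = [rng.gauss(0.0, 0.01) for _ in range(n_snps)]
    est_betas = [b + strat_bias * (pa - pb)
                 for b, pa, pb in zip(true_betas, freq_a, freq_b)]
    return (score_difference(true_betas, freq_a, freq_b),
            score_difference(est_betas, freq_a, freq_b))

for m in (100, 1_000, 10_000):
    true_d, est_d = simulate_stratification(m, strat_bias=0.02)
    print(f"{m:>6} SNPs: true diff={true_d:+.3f}, estimated diff={est_d:+.3f}")
```

A tiny per-SNP bias that would be invisible at a handful of genome-wide significant loci accumulates into an apparent population difference once thousands of SNPs are included.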


Author(s):  
Mathew Vithayathil
Paul Carter
Siddhartha Kar
Amy M. Mason
Stephen Burgess
...  

Abstract
Objectives: To investigate the causal role of body mass index, body fat composition and height in cancer.
Design: Two-stage Mendelian randomisation study.
Setting: Previous genome-wide association studies and the UK Biobank.
Participants: Genetic instrumental variables for body mass index (BMI), fat mass index (FMI), fat-free mass index (FFMI) and height from previous genome-wide association studies and the UK Biobank. Cancer outcomes from 367 586 participants of European descent from the UK Biobank.
Main outcome measures: Risk of overall cancer and of 22 site-specific cancers for genetic instrumental variables for BMI, FMI, FFMI and height.
Results: Genetically predicted BMI (per 1 kg/m²) was not associated with overall cancer risk (OR 0.99; 95% confidence interval (CI) 0.98-1.00; p=0.105). Elevated BMI was associated with increased risk of stomach cancer (OR 1.15, 95% CI 1.05-1.26; p=0.003) and decreased risk of melanoma (OR 0.96, 95% CI 0.92-1.00; p=0.044). For sex-specific cancers, BMI was positively associated with uterine cancer (OR 1.08, 95% CI 1.01-1.14; p=0.015) but inversely associated with breast (OR 0.95, 95% CI 0.92-0.98; p=0.001), prostate (OR 0.95, 95% CI 0.92-0.99; p=0.007) and testicular cancer (OR 0.89, 95% CI 0.81-0.98; p=0.017). Elevated FMI (per 1 kg/m²) was associated with gastrointestinal cancer (stomach cancer OR 4.23, 95% CI 1.18-15.13, p=0.027; colorectal cancer OR 1.94, 95% CI 1.23-3.07; p=0.004). Increased height (per 1 standard deviation, approximately 6.5 cm) was associated with increased risk of overall cancer (OR 1.06; 95% CI 1.04-1.09; p=2.97 × 10⁻⁸) and of most site-specific cancers, with the strongest estimates for kidney, non-Hodgkin lymphoma, colorectal, lung, melanoma and breast cancer.
Conclusions: There is little evidence for BMI as a causal risk factor for cancer. BMI may have a causal role in sex-specific cancers, although with inconsistent directions of effect, and FMI in gastrointestinal malignancies. Greater height is a risk factor for overall cancer and for cancer at multiple sites.
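The abstract does not name the Mendelian randomisation estimator. A widely used summary-data estimator is inverse-variance weighted (IVW) regression, sketched here as an assumption: each variant's Wald ratio β_Y/β_X is combined with first-order weights β_X²/SE_Y², giving a causal log-odds ratio per unit of exposure that is exponentiated to an OR like those above. The sketch assumes independent, valid instruments and uses the fixed-effect SE; the three variants and their summary statistics are hypothetical.

```python
import math

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted MR estimate from summary statistics.
    Equivalent to a weighted mean of Wald ratios beta_Y / beta_X with
    weights beta_X**2 / se_Y**2. Returns (estimate, fixed-effect SE)."""
    num = sum(bx * by / sy ** 2
              for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / sy ** 2 for bx, sy in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)

# Hypothetical variant effects on height (SD units) and on cancer (log-odds).
beta_x = [0.10, 0.20, 0.05]
beta_y = [0.006, 0.012, 0.003]
se_y = [0.002, 0.003, 0.002]

est, se = ivw_mr(beta_x, beta_y, se_y)
odds_ratio = math.exp(est)
ci = (math.exp(est - 1.96 * se), math.exp(est + 1.96 * se))
print(f"OR per SD = {odds_ratio:.3f}; 95% CI {ci[0]:.3f}-{ci[1]:.3f}")
```

Here every variant has the same Wald ratio (0.06), so the IVW estimate equals it; in real data, heterogeneity between ratios is a warning sign of pleiotropy, which robust MR methods try to address.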


2019
Vol 10 (1)
Author(s):  
Karoline Kuchenbaecker
Nikita Telkar
Theresa Reiker
Robin G. Walters
...  

Abstract
Most genome-wide association studies are based on samples of European descent. We assess whether the genetic determinants of blood lipids, a major cardiovascular risk factor, are shared across populations. Genetic correlations for lipids between European-ancestry and Asian cohorts are not significantly different from 1. A genetic risk score based on LDL-cholesterol-associated loci has consistent effects on serum levels in samples from the UK, Uganda and Greece (r = 0.23–0.28, p < 1.9 × 10⁻¹⁴). Overall, there is evidence of reproducibility for ~75% of the major lipid loci from European discovery studies, except for triglyceride loci in the Ugandan samples (10% of loci). Individual transferable loci are identified using trans-ethnic colocalization. Ten of fourteen loci not transferable to the Ugandan population have pleiotropic associations with BMI in Europeans; none of the transferable loci do. The non-transferable loci might affect lipids by modifying food intake in environments rich in certain nutrients, which suggests a potential role for gene-environment interactions.
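The r = 0.23–0.28 values describe how well a genetic risk score tracks serum LDL cholesterol. The abstract does not describe how the score was built; a common construction, assumed here, is a weighted allele count, Σ_j β_j · dosage_ij, with weights β_j taken from the (European) discovery GWAS. The sketch computes such a score and its Pearson correlation with measured levels. The effect sizes, genotypes and LDL values are hypothetical.

```python
def genetic_risk_score(genotypes, weights):
    """Weighted genetic risk score per individual: sum_j weight_j * dosage_ij,
    where dosage is the count (0, 1 or 2) of the LDL-raising allele."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical per-allele LDL-C effects (mmol/L) at three loci, genotypes
# for five individuals, and their measured LDL-C; not data from the study.
ldl_weights = [0.10, 0.05, 0.08]
genotypes = [[0, 1, 2], [2, 2, 1], [1, 0, 0], [0, 0, 1], [2, 1, 2]]
ldl = [3.1, 3.6, 2.9, 2.8, 3.7]

grs = genetic_risk_score(genotypes, ldl_weights)
print([round(s, 2) for s in grs], round(pearson_r(grs, ldl), 2))
```

Applying the same European-derived weights in each population is what makes a consistent correlation across the UK, Ugandan and Greek samples informative about transferability.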

