Principals about principal components in statistical genetics

2018 ◽  
Vol 20 (6) ◽  
pp. 2200-2216 ◽  
Author(s):  
Fentaw Abegaz ◽  
Kridsadakorn Chaichoompu ◽  
Emmanuelle Génin ◽  
David W Fardo ◽  
Inke R König ◽  
...  

Abstract Principal components (PCs) are widely used in statistics and refer to a relatively small number of uncorrelated variables derived from an initial pool of variables, while explaining as much of the total variance as possible. Principal component analysis (PCA) is also a popular technique in statistical genetics. Achieving optimal results requires a thorough understanding of the different implementations of PCA and of their impact on study results, compared with alternative approaches. In this review, we focus on the possibilities, limitations and role of PCs in ancestry prediction, genome-wide association studies, rare variants analyses, imputation strategies, meta-analysis and epistasis detection. We also describe several variations of classic PCA that deserve increased attention in statistical genetics applications.
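As a rough illustration of the definition above (a few uncorrelated variables that explain as much variance as possible), the following minimal sketch computes PCs with a singular value decomposition. The genotype matrix is simulated and hypothetical, and the helper name `pca` is an assumption for illustration; it is not one of the implementations the review compares.

```python
import numpy as np

def pca(X, n_components):
    """Return PC scores and explained-variance ratios for a samples x variables matrix."""
    Xc = X - X.mean(axis=0)                       # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # projections onto the leading PCs
    explained = (S ** 2) / np.sum(S ** 2)         # share of total variance per PC
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
# Hypothetical genotype matrix: 200 individuals x 50 SNPs coded as 0/1/2 allele counts
G = rng.binomial(2, 0.3, size=(200, 50)).astype(float)

scores, explained = pca(G, n_components=5)
print("explained variance ratios:", np.round(explained, 3))
```

The score columns are mutually uncorrelated by construction, and the explained-variance ratios are returned in decreasing order, which is what allows a small number of PCs to stand in for many correlated variables.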

2018 ◽  
Author(s):  
BW Kunkle ◽  
B Grenier-Boley ◽  
R Sims ◽  
JC Bis ◽  
AC Naj ◽  
...  

Introduction Late-onset Alzheimer’s disease (LOAD, onset age > 60 years) is the most prevalent dementia in the elderly1, and risk is partially driven by genetics2. Many of the loci responsible for this genetic risk were identified by genome-wide association studies (GWAS)3–8. To identify additional LOAD risk loci, we performed the largest GWAS to date (89,769 individuals), analyzing both common and rare variants. We confirm 20 previous LOAD risk loci and identify four new genome-wide loci (IQCK, ACE, ADAM10, and ADAMTS1). Pathway analysis of these data implicates the immune system and lipid metabolism, and for the first time tau binding proteins and APP metabolism. These findings show that genetic variants affecting APP and Aβ processing are not only associated with early-onset autosomal dominant AD but also with LOAD. Analysis of AD risk genes and pathways shows enrichment for rare variants (P = 1.32 × 10⁻⁷), indicating that additional rare variants remain to be identified.


Blood ◽  
2012 ◽  
Vol 119 (10) ◽  
pp. 2392-2400 ◽  
Author(s):  
Jessica Dennis ◽  
Candice Y. Johnson ◽  
Adeniyi Samuel Adediran ◽  
Mariza de Andrade ◽  
John A. Heit ◽  
...  

Abstract The endothelial protein C receptor (EPCR) limits thrombus formation by enhancing activation of the protein C anticoagulant pathway, and therefore may play a role in the etiology of thrombotic disorders. The rs867186 single-nucleotide polymorphism in the PROCR gene (g.6936A > G, c.4600A > G), resulting in a serine-to-glycine substitution at codon 219, has been associated with reduced activation of the protein C pathway, although its association with thrombosis risk remains unclear. The present study is a comprehensive systematic review and meta-analysis, including unpublished genome-wide association study results, conducted to evaluate the evidence for an association between rs867186 and 2 common thrombotic outcomes, venous thromboembolism (VTE) and myocardial infarction (MI), which are hypothesized to share some etiologic pathways. MEDLINE, EMBASE, and HuGE Navigator were searched through July 2011 to identify relevant epidemiologic studies, and data were summarized using random-effects meta-analysis. Twelve candidate-gene studies and 13 genome-wide association studies were analyzed (11 VTE and 14 MI, including 37 415 cases and 84 406 noncases). Under the additive genetic model, the odds of VTE increased by a factor of 1.22 (95% confidence interval, 1.11-1.33, P < .001) for every additional copy of the G allele. No evidence for association with MI was observed.
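The abstract states only that a random-effects meta-analysis was used. As a hedged sketch of how such pooling can work, the code below uses the DerSimonian-Laird estimator of between-study variance, which is an assumption here and not necessarily the estimator the authors applied. The per-study odds ratios and standard errors are made up for illustration.

```python
import math

def random_effects_meta(log_ors, ses):
    """Pool per-study log odds ratios with DerSimonian-Laird random effects.

    Returns (pooled_log_or, pooled_se, tau2).
    """
    w = [1.0 / se ** 2 for se in ses]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                          # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_ors)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2

# Hypothetical per-allele odds ratios and standard errors of the log odds ratio
study_ors = [1.10, 1.25, 1.40]
study_ses = [0.08, 0.10, 0.15]

pooled, se, tau2 = random_effects_meta([math.log(o) for o in study_ors], study_ses)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"pooled OR = {math.exp(pooled):.2f} (95% CI {low:.2f}-{high:.2f}), tau^2 = {tau2:.4f}")
```

When the studies agree closely, the estimated between-study variance is truncated at zero and the result coincides with a fixed-effect inverse-variance analysis.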


Cells ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 306 ◽  
Author(s):  
Pinchas Cohen ◽  
...  

Mitochondrial genome-wide association studies identify mitochondrial single nucleotide polymorphisms (mtSNPs) that associate with disease or disease-related phenotypes. Most mitochondrial and nuclear genome-wide association studies adjust for genetic ancestry by including principal components derived from nuclear DNA, but not from mitochondrial DNA, as covariates in statistical regression analyses. Furthermore, there is no standard when controlling for genetic ancestry during mitochondrial and nuclear genetic interaction association scans, especially across ethnicities with substantial mitochondrial genetic heterogeneity. The purpose of this study is to (1) compare the degree of ethnic variation captured by principal components calculated from microarray-defined nuclear and mitochondrial DNA and (2) assess the utility of mitochondrial principal components for association studies. Analytic techniques used in this study include a principal component analysis for genetic ancestry, decision-tree classification for self-reported ethnicity, and linear regression for association tests. Data from the Health and Retirement Study, which includes self-reported White, Black, and Hispanic Americans, were used for all analyses. We report that (1) mitochondrial principal component analysis (PCA) captures ethnic variation to a similar or slightly greater degree than nuclear PCA in Blacks and Hispanics, (2) nuclear and mitochondrial DNA classify self-reported ethnicity to a high degree but with a similar level of error, and (3) mitochondrial principal components can be used as covariates to adjust for population stratification in association studies with complex traits, as demonstrated by our analysis of height, a phenotype with a high heritability. Overall, genetic association studies might reveal true and robust mtSNP associations when including mitochondrial principal components as regression covariates.
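The idea of using principal components as regression covariates can be sketched on simulated data. Everything below is hypothetical: two populations with different allele frequencies, a phenotype that depends on population only, and a helper named `ols_coefs`. It is not the Health and Retirement Study analysis and makes no distinction between nuclear and mitochondrial markers.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_pop, n_snps = 500, 100

# Hypothetical two-population sample; allele frequencies differ by population
pop = np.repeat([0, 1], n_per_pop)
freq = np.where(pop[:, None] == 0, 0.2, 0.8) * np.ones((1, n_snps))
G = rng.binomial(2, freq).astype(float)

# The phenotype depends on population only, so no SNP has a true effect
y = 2.0 * pop + rng.normal(size=2 * n_per_pop)

# First PC from all SNPs except the one being tested
rest = G[:, 1:] - G[:, 1:].mean(axis=0)
U, S, Vt = np.linalg.svd(rest, full_matrices=False)
pc1 = U[:, 0] * S[0]

def ols_coefs(y, *covariates):
    """Least-squares coefficients for an intercept plus the given covariates."""
    X = np.column_stack([np.ones_like(y), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

snp = G[:, 0]
beta_unadj = ols_coefs(y, snp)[1]       # confounded by population structure
beta_adj = ols_coefs(y, snp, pc1)[1]    # adjusted for the first PC
print(f"unadjusted beta = {beta_unadj:.3f}, PC-adjusted beta = {beta_adj:.3f}")
```

In this setup the unadjusted SNP coefficient is inflated by stratification, while the PC-adjusted coefficient shrinks toward zero, which is the behavior that motivates including PCs as covariates.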


2013 ◽  
Vol 2013 ◽  
pp. 1-4 ◽  
Author(s):  
Yingchang Lu ◽  
Sinae Kane ◽  
Haoyan Chen ◽  
Argentina Leon ◽  
Ethan Levin ◽  
...  

Recent genome-wide association studies (GWAS) have identified multiple genetic risk factors for psoriasis, but their association with age of onset has been only marginally explored. The goal of this study was to evaluate known risk alleles of psoriasis for association with age of psoriasis onset in three well-defined case-only cohorts totaling 1,498 psoriasis patients. We selected 39 genetic variants from psoriasis GWAS and tested these variants for association with age of psoriasis onset in a meta-analysis. We found that rs10484554 and rs12191877 near HLA-C and rs17716942 near IFIH1 were associated with age of psoriasis onset with false discovery rate < 0.05. The association between rs17716942 and age of onset was not replicated in a fourth independent cohort of 489 patients. The imputed HLA-C*06:02 allele demonstrated a much stronger association with age of psoriasis onset than rs10484554 and rs12191877. We conclude that despite the discovery of numerous psoriasis risk alleles, HLA-C*06:02 still plays the most important role in determining the age of onset of psoriasis. Larger studies are needed to evaluate the contribution of other risk alleles, including IFIH1, to age of psoriasis onset.


SLEEP ◽  
2020 ◽  
Vol 43 (9) ◽  
Author(s):  
Om Prakash Kafle ◽  
Shiqiang Cheng ◽  
Mei Ma ◽  
Ping Li ◽  
Bolun Cheng ◽  
...  

Abstract Study Objectives: Insomnia is a common sleep disorder and constitutes a major issue in modern society. We provide new clues for revealing the association between environmental chemicals and insomnia. Methods: Three genome-wide association study (GWAS) summary datasets of insomnia (n = 113,006, n = 1,331,010, and n = 453,379, respectively) were derived from the UK Biobank, 23andMe, and deCODE. The chemical–gene interaction dataset was downloaded from the Comparative Toxicogenomics Database. First, we conducted a meta-analysis of the three datasets of insomnia using the METAL software. Using the result of the meta-analysis, transcriptome-wide association studies were performed to calculate the expression association testing statistics of insomnia. Then chemical-related gene set enrichment analysis (GSEA) was used to explore the association between chemicals and insomnia. Results: For the GWAS meta-analysis dataset of insomnia, we identified 42 chemicals associated with insomnia in brain tissue (p < 0.05) by GSEA. We detected five important chemicals, pinosylvin (p = 0.0128), bromobenzene (p = 0.0134), clonidine (p = 0.0372), gabapentin (p = 0.0372), and melatonin (p = 0.0404), which are directly associated with insomnia. Conclusion: Our study results provide new clues for revealing the roles of environmental chemicals in the development of insomnia.


2018 ◽  
Author(s):  
Benedikt von der Heyde ◽  
Anastasia Emmanouilidou ◽  
Eugenia Mazzaferro ◽  
Silvia Vicenzi ◽  
Ida Höijer ◽  
...  

Abstract A meta-analysis of genome-wide association studies (GWAS) identified eight loci that are associated with heart rate variability (HRV), but candidate genes in these loci remain uncharacterized. We developed an image- and CRISPR/Cas9-based pipeline to systematically characterize candidate genes for HRV in live zebrafish embryos. Nine zebrafish orthologues of six human candidate genes were targeted simultaneously in eggs from fish that transgenically express GFP on smooth muscle cells (Tg[acta2:GFP]), to visualize the beating heart. An automated analysis of repeated 30 s recordings of beating atria in 381 live, intact zebrafish embryos at 2 and 5 days post-fertilization highlighted genes that influence HRV (hcn4 and si:dkey-65j6.2 [KIAA1755]); heart rate (rgs6 and hcn4); and the risk of sinoatrial pauses and arrests (hcn4). Exposure to 10 or 25 µM ivabradine – an open channel blocker of HCNs – for 24 h resulted in a dose-dependent higher HRV and lower heart rate at 5 days post-fertilization. Hence, our screen confirmed the role of established genes for heart rate and rhythm (RGS6 and HCN4); showed that ivabradine reduces heart rate and increases HRV in zebrafish embryos, as it does in humans; and highlighted a novel gene that plays a role in HRV (KIAA1755).


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Caroline M. Nievergelt ◽  
Adam X. Maihofer ◽  
Torsten Klengel ◽  
Elizabeth G. Atkinson ◽  
Chia-Yen Chen ◽  
...  

Abstract The risk of posttraumatic stress disorder (PTSD) following trauma is heritable, but robust common variants have yet to be identified. In a multi-ethnic cohort including over 30,000 PTSD cases and 170,000 controls we conduct a genome-wide association study of PTSD. We demonstrate SNP-based heritability estimates of 5–20%, varying by sex. Three genome-wide significant loci are identified, 2 in European and 1 in African-ancestry analyses. Analyses stratified by sex implicate 3 additional loci in men. Along with other novel genes and non-coding RNAs, a Parkinson’s disease gene involved in dopamine regulation, PARK2, is associated with PTSD. Finally, we demonstrate that polygenic risk for PTSD is significantly predictive of re-experiencing symptoms in the Million Veteran Program dataset, although specific loci did not replicate. These results demonstrate the role of genetic variation in the biology of risk for PTSD and highlight the necessity of conducting sex-stratified analyses and expanding GWAS beyond European ancestry populations.


2020 ◽  
Author(s):  
Uladzislau Rudakou ◽  
Eric Yu ◽  
Lynne M Krohn ◽  
Jennifer A Ruskey ◽  
Farnaz Asayesh ◽  
...  

Genome-wide association studies (GWAS) have identified numerous loci associated with Parkinson's disease. The specific genes and variants that drive the associations within the vast majority of these loci are unknown. We aimed to perform a comprehensive analysis of selected genes to determine the potential role of rare and common genetic variants within these loci. We fully sequenced 32 genes from 25 loci previously associated with Parkinson's disease in 2,657 patients and 3,647 controls from three cohorts. Capture was done using molecular inversion probes targeting the exons, exon-intron boundaries and untranslated regions (UTRs) of the genes of interest, followed by sequencing. Quality control was performed to include only high-quality variants. We examined the role of rare variants (minor allele frequency < 0.01) using optimized sequence kernel association tests (SKAT-O). The association of common variants was estimated using regression models adjusted for age, sex and ethnicity as required in each cohort, followed by a meta-analysis. After Bonferroni correction, we identified a burden of rare variants in SYT11, FGF20 and GCH1 associated with Parkinson's disease. Nominal associations were identified in 21 additional genes. Previous reports suggested that the SYT11 GWAS association is driven by variants in the nearby GBA gene. However, the association of SYT11 was mainly driven by a rare 3' UTR variant (rs945006601) and was independent of GBA variants (p=5.23E-05 after exclusion of all GBA variant carriers). The association of FGF20 was driven by a rare 5' UTR variant (rs1034608171) located in the promoter region. The previously reported association of GCH1 with Parkinson's disease is driven by rare nonsynonymous variants, some of which are known to cause dopamine-responsive dystonia. We also identified two LRRK2 variants, p.Arg793Met and p.Gln1353Lys, in ten and eight controls, respectively, but not in patients.
We identified common variants associated with Parkinson's disease in MAPT, TMEM175, BST1, SNCA and GPNMB which are all in strong linkage disequilibrium (LD) with known GWAS hits in their respective loci. A common coding PM20D1 variant, p.Ile149Val, was nominally associated with reduced risk of Parkinson's disease (OR 0.73, 95% CI 0.60-0.89, p=1.161E-03). This variant is not in LD with the top GWAS hits within this locus and may represent a novel association. These results further demonstrate the importance of fine mapping of GWAS loci, and suggest that SYT11, FGF20, and potentially PM20D1, BST1 and GPNMB should be considered for future studies as possible Parkinson's disease-related genes.


2021 ◽  
Author(s):  
Minako Imamura ◽  
Atsushi Takahashi ◽  
Masatoshi Matsunami ◽  
Momoko Horikoshi ◽  
Minoru Iwata ◽  
...  

Abstract Several reports have suggested that genetic susceptibility contributes to the development and progression of diabetic retinopathy. We aimed to identify genetic loci that confer susceptibility to diabetic retinopathy in Japanese patients with type 2 diabetes. We analysed 5 790 508 single nucleotide polymorphisms (SNPs) in 8880 Japanese patients with type 2 diabetes, 4839 retinopathy cases and 4041 controls, as well as 2217 independent Japanese patients with type 2 diabetes, 693 retinopathy cases, and 1524 controls. The results of these two genome-wide association studies (GWAS) were combined with an inverse variance meta-analysis (Stage-1), followed by de novo genotyping for the candidate SNP loci (p < 1.0 × 10⁻⁴) in an independent case–control study (Stage-2, 2260 cases and 723 controls). After combining the association data (Stage-1 and -2) using meta-analysis, the associations of two loci reached a genome-wide significance level: rs12630354 near STT3B on chromosome 3, p = 1.62 × 10⁻⁹, odds ratio (OR) = 1.17, 95% confidence interval (CI) 1.11–1.23, and rs140508424 within PALM2 on chromosome 9, p = 4.19 × 10⁻⁸, OR = 1.61, 95% CI 1.36–1.91. However, the associations of these two loci were not replicated in Korean, European, or African American populations. Gene-based analysis using Stage-1 GWAS data identified a gene-level association of EHD3 with susceptibility to diabetic retinopathy (p = 2.17 × 10⁻⁶). In conclusion, we identified two novel SNP loci, STT3B and PALM2, and a novel gene, EHD3, that confer susceptibility to diabetic retinopathy; however, further replication studies are required to validate these associations.
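An inverse variance meta-analysis, as named in the abstract, weights each study's log odds ratio by the reciprocal of its variance. The sketch below assumes that each study reports an odds ratio with a 95% confidence interval and recovers the standard error from the interval width. The function names and the two input studies are hypothetical and are not the Stage-1 results.

```python
import math

Z95 = 1.959964  # normal quantile for a two-sided 95% interval

def se_from_ci(lower, upper):
    """Standard error of the log odds ratio recovered from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)

def fixed_effect_meta(studies):
    """Combine (odds_ratio, ci_lower, ci_upper) tuples by inverse-variance weighting.

    Returns (pooled_or, ci_lower, ci_upper).
    """
    weights = [1.0 / se_from_ci(lo, hi) ** 2 for _, lo, hi in studies]
    log_ors = [math.log(odds_ratio) for odds_ratio, _, _ in studies]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return math.exp(pooled), math.exp(pooled - Z95 * se), math.exp(pooled + Z95 * se)

# Hypothetical results for one SNP in two independent studies
studies = [(1.20, 1.05, 1.37), (1.12, 0.95, 1.32)]
pooled_or, ci_low, ci_high = fixed_effect_meta(studies)
print(f"pooled OR = {pooled_or:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

More precise studies receive more weight, and the pooled interval is narrower than that of any single contributing study.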

