Sparse Principal Component Analysis for Identifying Ancestry-Informative Markers in Genome-Wide Association Studies

2012 ◽  
Vol 36 (4) ◽  
pp. 293-302 ◽  
Author(s):  
Seokho Lee ◽  
Michael P. Epstein ◽  
Richard Duncan ◽  
Xihong Lin
2014 ◽  
Vol 94 (5) ◽  
pp. 662-676 ◽  
Author(s):  
Hugues Aschard ◽  
Bjarni J. Vilhjálmsson ◽  
Nicolas Greliche ◽  
Pierre-Emmanuel Morange ◽  
David-Alexandre Trégouët ◽  
...  

Animals ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 1147
Author(s):  
Asha M. Miles ◽  
Christian J. Posbergh ◽  
Heather J. Huson

Our objectives were to robustly characterize a cohort of Holstein cows for udder and teat type traits and perform high-density genome-wide association studies for those traits within the same group of animals, thereby improving the accuracy of the phenotypic measurements and genomic association study. Additionally, we sought to identify a novel udder and teat trait composite risk index to determine loci with potential pleiotropic effects related to mastitis. This approach was aimed at improving the biological understanding of the genetic factors influencing mastitis. Cows (N = 471) were genotyped on the Illumina BovineHD777k beadchip and scored for front and rear teat length, width, end shape, and placement; fore udder attachment; udder cleft; udder depth; rear udder height; and rear udder width. We used principal component analysis to create a single composite measure describing type traits previously linked to high odds of developing mastitis within our cohort of cows. Genome-wide associations were performed, and 28 genomic regions were significantly associated (Bonferroni-corrected p < 0.05). Interrogation of these genomic regions revealed a number of biologically plausible genes that may contribute to the development of mastitis and whose functions range from regulating cell proliferation to immune system signaling, including ZNF683, DHX9, CUX1, TNNT1, and SPRY1. Genetic investigation of the risk composite trait implicated a novel locus and candidate genes that have potentially pleiotropic effects related to mastitis.
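As a rough illustration of the composite-trait idea described above, the sketch below builds one score from several correlated traits by taking their first principal component, and computes a Bonferroni per-test threshold. It is not the authors' pipeline: the trait values are simulated, the function names are hypothetical, and the marker count of 500,000 is a placeholder, not the number of tests used in the study.

```python
import math
import random

def standardize(columns):
    """Center each trait column to mean 0 and scale it to unit variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        out.append([(x - mean) / sd for x in col])
    return out

def first_pc_loadings(z_columns, iters=500):
    """Leading eigenvector of the trait correlation matrix (power iteration)."""
    k, n = len(z_columns), len(z_columns[0])
    corr = [[sum(a * b for a, b in zip(z_columns[i], z_columns[j])) / (n - 1)
             for j in range(k)] for i in range(k)]
    # A uniform start works here because the toy traits are positively
    # correlated; a random start is the safer general choice.
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def composite_scores(columns):
    """Return per-animal scores on the first PC, plus the trait loadings."""
    z = standardize(columns)
    loadings = first_pc_loadings(z)
    n = len(z[0])
    scores = [sum(loadings[j] * z[j][i] for j in range(len(z))) for i in range(n)]
    return scores, loadings

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

# Toy data (assumption): three traits driven by one shared latent factor.
rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(200)]
traits = [[x + 0.5 * rng.gauss(0, 1) for x in latent] for _ in range(3)]

scores, loadings = composite_scores(traits)
threshold = bonferroni_threshold(0.05, 500_000)  # placeholder marker count
```

The composite score can then be used as the phenotype in a single association scan, with each marker's p-value compared against `threshold`.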


2019 ◽  
Vol 35 (17) ◽  
pp. 3046-3054 ◽  
Author(s):  
Anastasia Gurinovich ◽  
Harold Bae ◽  
John J Farrell ◽  
Stacy L Andersen ◽  
Stefano Monti ◽  
...  

Motivation: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery. Results: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects’ ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype. Availability and implementation: PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found at the following GitHub repository: https://github.com/gurinovich/PopCluster with instructions on its installation and usage. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Daiwei Zhang ◽  
Rounak Dey ◽  
Seunggeun Lee

Population stratification (PS) is a major confounder in genome-wide association studies (GWAS) and can lead to false positive associations. To adjust for PS, principal component analysis (PCA)-based ancestry prediction has been widely used. Simple projection (SP) based on principal component loading and recently developed data augmentation-decomposition-transformation (ADP), such as LASER and TRACE, are popular methods for predicting PC scores. However, they are either biased or computationally expensive. The predicted PC scores from SP can be biased toward the null. On the other hand, since ADP requires running PCA separately for each study sample on the augmented data set, its computational cost is high. To address these problems, we develop and propose two alternative approaches, bias-adjusted projection (AP) and online ADP (OADP). Using random matrix theory, AP asymptotically estimates and adjusts for the bias of SP. OADP uses computationally efficient online singular value decomposition, which can greatly reduce the computation cost of ADP. We carried out extensive simulation studies to show that these alternative approaches are unbiased and the computation times can be 10-100 times faster than ADP. We applied our approaches to UK-Biobank data of 488,366 study samples with 2,492 samples from the 1000 Genomes data as the reference. AP and OADP required 7 and 75 CPU hours, respectively, while the projected computation time of ADP is 2,534 CPU hours. Furthermore, when we only used the European reference samples in the 1000 Genomes to infer sub-European ancestry, SP clearly showed bias, unlike the proposed approaches. By using AP and OADP, we can infer ancestry and adjust for PS robustly and efficiently.
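The bias of simple projection mentioned above can be seen in a small simulation. The sketch below is only a toy demonstration of the shrinkage effect, not an implementation of AP, OADP, LASER, or TRACE: it uses Gaussian features as a stand-in for genotypes, and all function names, sample sizes, and the `shift` parameter are assumptions chosen for illustration. It fits PC1 on a small reference panel with many more features than samples, then projects fresh samples drawn from the same two populations onto the reference loadings.

```python
import math
import random

def simulate(rng, n_per_pop, p, shift):
    """Two populations whose feature means sit at +shift and -shift."""
    rows = []
    for sign in (+1, -1):
        for _ in range(n_per_pop):
            rows.append([sign * shift + rng.gauss(0, 1) for _ in range(p)])
    return rows

def leading_pc(ref, rng, iters=300):
    """PC1 loadings and reference scores, via the n x n Gram matrix."""
    n, p = len(ref), len(ref[0])
    means = [sum(r[j] for r in ref) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in ref]
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    # Random start: a constant vector lies in the null space of the Gram
    # matrix of column-centered data, so it would never converge.
    u = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        w = [sum(gram[i][k] * u[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        u = [t / norm for t in w]
    # Map the sample-space eigenvector back to unit-length feature loadings.
    v = [sum(x[i][j] * u[i] for i in range(n)) for j in range(p)]
    vnorm = math.sqrt(sum(t * t for t in v))
    v = [t / vnorm for t in v]
    ref_scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return means, v, ref_scores

def simple_projection(samples, means, v):
    """Center new samples with reference means and project onto loadings."""
    return [sum((s[j] - means[j]) * v[j] for j in range(len(v))) for s in samples]

def mean_abs(values):
    return sum(abs(t) for t in values) / len(values)

rng = random.Random(1)
ref = simulate(rng, n_per_pop=15, p=300, shift=0.2)   # reference panel
new = simulate(rng, n_per_pop=15, p=300, shift=0.2)   # study samples
means, v, ref_scores = leading_pc(ref, rng)
new_scores = simple_projection(new, means, v)

# A ratio below 1 means projected scores are pulled toward zero
# relative to the scores of the reference samples themselves.
shrinkage = mean_abs(new_scores) / mean_abs(ref_scores)
```

Because the reference loadings partly fit noise in the reference panel, the projected study samples land closer to zero than reference samples from the same populations, which is the kind of bias a correction method has to undo.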


Cells ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 306 ◽  
Author(s):  
Pinchas Cohen ◽  
...  

Mitochondrial genome-wide association studies identify mitochondrial single nucleotide polymorphisms (mtSNPs) that associate with disease or disease-related phenotypes. Most mitochondrial and nuclear genome-wide association studies adjust for genetic ancestry by including principal components derived from nuclear DNA, but not from mitochondrial DNA, as covariates in statistical regression analyses. Furthermore, there is no standard when controlling for genetic ancestry during mitochondrial and nuclear genetic interaction association scans, especially across ethnicities with substantial mitochondrial genetic heterogeneity. The purpose of this study is to (1) compare the degree of ethnic variation captured by principal components calculated from microarray-defined nuclear and mitochondrial DNA and (2) assess the utility of mitochondrial principal components for association studies. Analytic techniques used in this study include a principal component analysis for genetic ancestry, decision-tree classification for self-reported ethnicity, and linear regression for association tests. Data from the Health and Retirement Study, which includes self-reported White, Black, and Hispanic Americans, were used for all analyses. We report that (1) mitochondrial principal component analysis (PCA) captures ethnic variation to a similar or slightly greater degree than nuclear PCA in Blacks and Hispanics, (2) nuclear and mitochondrial DNA classify self-reported ethnicity to a high degree but with a similar level of error, and (3) mitochondrial principal components can be used as covariates to adjust for population stratification in association studies with complex traits, as demonstrated by our analysis of height, a phenotype with high heritability. Overall, genetic association studies might reveal true and robust mtSNP associations when including mitochondrial principal components as regression covariates.
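To make the role of principal components as regression covariates concrete, here is a minimal sketch under simplifying assumptions. It does not use mitochondrial data or the Health and Retirement Study: genotypes are simulated for two hypothetical populations with different allele frequencies, the phenotype differs by population but has no true SNP effect, and every function name and parameter value is illustrative.

```python
import math
import random

def ols(columns, y):
    """Coefficients for y ~ intercept + columns, from the normal equations."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in design) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(design, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def first_pc_scores(markers, rng, iters=200):
    """Per-individual scores on PC1 of the centered marker columns."""
    m, n = len(markers), len(markers[0])
    centered = []
    for col in markers:
        mean = sum(col) / n
        centered.append([g - mean for g in col])
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
            for j in range(m)] for i in range(m)]
    v = [rng.gauss(0, 1) for _ in range(m)]
    for _ in range(iters):  # power iteration for the leading eigenvector
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return [sum(v[j] * centered[j][i] for j in range(m)) for i in range(n)]

def genotype(rng, freq):
    """Allele count (0, 1, or 2) for a diploid individual."""
    return (rng.random() < freq) + (rng.random() < freq)

rng = random.Random(2)
pop = [0] * 100 + [1] * 100                      # hidden population labels
freq = {0: 0.2, 1: 0.8}                          # assumed allele frequencies
markers = [[genotype(rng, freq[g]) for g in pop] for _ in range(50)]
snp = [genotype(rng, freq[g]) for g in pop]      # tested SNP, no true effect
pheno = [2.0 * g + rng.gauss(0, 1) for g in pop] # phenotype differs by population

pc1 = first_pc_scores(markers, rng)
beta_naive = ols([snp], pheno)[1]         # confounded by population structure
beta_adjusted = ols([snp, pc1], pheno)[1] # PC1 included as a covariate
```

In this toy setting the unadjusted SNP coefficient is inflated purely by population structure, while adding PC1 as a covariate pulls it back toward zero. In practice several leading components are usually included, not only the first.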

