Inferring population structure in biobank-scale genomic data

2021 ◽  
Author(s):  
Alec M Chiu ◽  
Erin K Molloy ◽  
Zilong Tan ◽  
Ameet Talwalkar ◽  
Sriram Sankararaman

Inferring the structure of human populations from genetic variation data is a key task in population and medical genomic studies. While a number of methods for population structure inference have been proposed, current methods are impractical to run on biobank-scale genomic datasets containing millions of individuals and genetic variants. We introduce SCOPE, a method for population structure inference that is orders of magnitude faster than existing methods while achieving comparable accuracy. SCOPE infers population structure in about a day on a dataset containing one million individuals and one million variants, as well as on the UK Biobank dataset containing 488,363 individuals and 569,346 variants. Furthermore, SCOPE can leverage allele frequencies from previous studies to improve the interpretability of population structure estimates.

2019 ◽  
Author(s):  
Aman Agrawal ◽  
Alec M. Chiu ◽  
Minh Le ◽  
Eran Halperin ◽  
Sriram Sankararaman

Principal component analysis (PCA) is a key tool for understanding population structure and controlling for population stratification in genome-wide association studies (GWAS). With the advent of large-scale datasets of genetic variation, there is a need for methods that can compute principal components (PCs) with scalable computational and memory requirements. We present ProPCA, a highly scalable method based on a probabilistic generative model, which computes the top PCs on genetic variation data efficiently. We applied ProPCA to compute the top five PCs on genotype data from the UK Biobank, consisting of 488,363 individuals and 146,671 SNPs, in less than thirty minutes. Leveraging the population structure inferred by ProPCA within the White British individuals in the UK Biobank, we scanned for SNPs that are not well explained by the PCs to identify several novel genome-wide signals of recent putative selection, including missense mutations in RPGRIP1L and TLR4.

Author summary: Principal component analysis is a commonly used technique for understanding population structure and genetic variation. With the advent of large-scale datasets that contain the genetic information of hundreds of thousands of individuals, there is a need for methods that can compute principal components (PCs) with scalable computational and memory requirements. In this study, we present ProPCA, a highly scalable statistical method to compute genetic PCs efficiently. We systematically evaluate the accuracy and robustness of our method on large-scale simulated data and apply it to the UK Biobank. Leveraging the population structure inferred by ProPCA within the White British individuals in the UK Biobank, we identify several novel signals of putative recent selection.
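The abstract says only that ProPCA rests on a probabilistic generative model. As a rough illustration of how such models yield PCs without forming a full covariance matrix, the Python sketch below runs an EM iteration for the top component in the zero-noise limit of probabilistic PCA (Roweis-style EM-PCA); for a single component the E- and M-steps together amount to power iteration on Y Yᵀ. This is an assumption about the general family of algorithm, not ProPCA's implementation; the function name `em_pca_top1`, the one-component restriction, and the toy data are hypothetical.

```python
def em_pca_top1(Y, iters=100):
    """Top principal direction of a row-centered d x n matrix Y via EM-PCA (k = 1).

    Rows of Y are features (e.g. standardized variants), columns are samples.
    Returns (unit-norm loading vector of length d, score for each of n samples).
    """
    d, n = len(Y), len(Y[0])
    # Arbitrary non-zero start; it must not be orthogonal to the top direction.
    c = [1.0 + i for i in range(d)]
    for _ in range(iters):
        # E-step: latent coordinate of each sample, x = (c^T c)^-1 c^T Y.
        cc = sum(v * v for v in c)
        x = [sum(c[i] * Y[i][j] for i in range(d)) / cc for j in range(n)]
        # M-step: refit the loading vector, c = Y x^T (x x^T)^-1.
        xx = sum(v * v for v in x)
        c = [sum(Y[i][j] * x[j] for j in range(n)) / xx for i in range(d)]
    norm = sum(v * v for v in c) ** 0.5
    c = [v / norm for v in c]
    # Project each sample onto the unit-norm direction to get its PC score.
    scores = [sum(c[i] * Y[i][j] for i in range(d)) for j in range(n)]
    return c, scores


# Toy example: 2 features x 4 samples, variation mostly along (1, 1).
loading, pc_scores = em_pca_top1([[2.0, -2.0, 0.1, -0.1],
                                  [2.0, -2.0, -0.1, 0.1]])
```

Each iteration needs only products with Y and its transpose, which is what makes this family of methods attractive for genotype matrices with hundreds of thousands of rows and columns; a practical version would estimate several components at once using a numerical linear algebra library.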


Nutrients ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 2218
Author(s):  
Shuai Yuan ◽  
Paul Carter ◽  
Amy M. Mason ◽  
Stephen Burgess ◽  
Susanna C. Larsson

Coffee consumption has been linked to a lower risk of cardiovascular disease in observational studies, but whether the associations are causal is not known. We conducted a Mendelian randomization (MR) investigation to assess the potential causal role of coffee consumption in cardiovascular disease. Twelve independent genetic variants were used to proxy coffee consumption. Summary-level data for the associations between the 12 genetic variants and cardiovascular diseases were taken from the UK Biobank with up to 35,979 cases and the FinnGen consortium with up to 17,325 cases. Genetic predisposition to higher coffee consumption was not associated with any of the 15 studied cardiovascular outcomes in univariable MR analysis. The odds ratio per 50% increase in genetically predicted coffee consumption ranged from 0.97 (95% confidence interval (CI), 0.63, 1.50) for intracerebral hemorrhage to 1.26 (95% CI, 1.00, 1.58) for deep vein thrombosis in the UK Biobank and from 0.86 (95% CI, 0.50, 1.49) for subarachnoid hemorrhage to 1.34 (95% CI, 0.81, 2.22) for intracerebral hemorrhage in FinnGen. The null findings remained in multivariable MR analyses adjusted for genetically predicted body mass index and smoking initiation, except for a suggestive positive association for intracerebral hemorrhage (odds ratio 1.91; 95% CI, 1.03, 3.54) in FinnGen. This MR study showed limited evidence that coffee consumption affects the risk of developing cardiovascular disease, suggesting that previous observational studies may have been confounded.
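The abstract does not say which MR estimator produced the odds ratios. A common default for combining several genetic instruments from summary-level data is the fixed-effect inverse-variance weighted (IVW) estimator, sketched below in Python as an assumption rather than the study's exact method. The function name `ivw_estimate` and the example inputs are hypothetical; the estimate is on the log-odds scale per unit of the exposure as coded in the variant-exposure associations.

```python
import math


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomization estimate.

    beta_exp[i]: association of variant i with the exposure
    beta_out[i]: association of variant i with the outcome (log odds)
    se_out[i]:   standard error of beta_out[i]
    """
    # Weighted regression of beta_out on beta_exp through the origin,
    # with weights 1 / se_out^2.
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out))
    beta = num / den
    se = math.sqrt(1.0 / den)
    # Convert the log-odds estimate and its 95% CI to the odds-ratio scale.
    odds_ratio = math.exp(beta)
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return beta, se, odds_ratio, ci


# Hypothetical summary statistics for two variants.
beta, se, odds_ratio, ci = ivw_estimate([0.1, 0.2], [0.02, 0.04], [0.01, 0.01])
```

The IVW estimate is valid only if every variant affects the outcome solely through the exposure; a multivariable analysis, like the one adjusting for body mass index and smoking initiation above, instead regresses the outcome associations on several exposure associations jointly.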


eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Paul Carter ◽  
Mathew Vithayathil ◽  
Siddhartha Kar ◽  
Rahul Potluri ◽  
Amy M Mason ◽  
...  

Laboratory studies have suggested oncogenic roles of lipids, as well as anticarcinogenic effects of statins. Here we assess the potential effect of statin therapy on cancer risk using evidence from human genetics. We obtained associations of lipid-related genetic variants with the risk of overall and 22 site-specific cancers for 367,703 individuals in the UK Biobank. In total, 75,037 individuals had a cancer event. Variants in the HMGCR gene region, which represent proxies for statin treatment, were associated with overall cancer risk (odds ratio [OR] per one standard deviation decrease in low-density lipoprotein [LDL] cholesterol 0.76, 95% confidence interval [CI] 0.65–0.88, p=0.0003), but variants in gene regions representing alternative lipid-lowering treatment targets (PCSK9, LDLR, NPC1L1, APOC3, LPL) were not. Genetically predicted LDL cholesterol was not associated with overall cancer risk (OR per standard deviation increase 1.01, 95% CI 0.98–1.05, p=0.50). Our results predict that statins reduce cancer risk but other lipid-lowering treatments do not. This suggests that statins reduce cancer risk through a cholesterol-independent pathway.


Nature ◽  
2018 ◽  
Vol 562 (7726) ◽  
pp. 203-209 ◽  
Author(s):  
Clare Bycroft ◽  
Colin Freeman ◽  
Desislava Petkova ◽  
Gavin Band ◽  
Lloyd T. Elliott ◽  
...  

2019 ◽  
Vol 21 (1) ◽  
Author(s):  
Ravi K. Narang ◽  
Ruth Topless ◽  
Murray Cadzow ◽  
Greg Gamble ◽  
Lisa K. Stamp ◽  
...  

2021 ◽  
Author(s):  
David Curtis

Aims: The study aimed to identify specific genes and functional genetic variants affecting susceptibility to two alcohol-related phenotypes: heavy drinking and problem drinking.

Methods: Phenotypic and exome sequence data were downloaded from the UK Biobank. The number of drinks reported in the last 24 hours was used to define heavy drinking, while responses to a mental health questionnaire defined problem drinking. Gene-wise weighted burden analysis was applied, with genetic variants that were rarer and/or had a more severe functional effect being weighted more highly. Additionally, previously reported variants of interest were analysed individually.

Results: Among exome-sequenced subjects, there were 8,166 cases and 84,461 controls for heavy drinking and 7,811 cases and 59,606 controls for problem drinking. No gene was formally significant after correction for multiple testing, but three genes possibly related to autism (FOXP1, ARHGAP33 and CDH9) were significant at p < 0.001, along with VGF, which may also be of psychiatric interest. Well-established associations with rs1229984 in ADH1B and rs671 in ALDH2 were confirmed, but previously reported variants in ALDH1B1 and GRM3 were not associated with either phenotype.

Conclusions: This large study fails to conclusively implicate any novel genes or variants. More definitive results may be obtained when sequence data for the remaining UK Biobank participants become available and/or if data can be obtained for a more extreme phenotype such as alcohol dependence disorder. This research has been conducted using the UK Biobank Resource.

Short summary: Tests for association of rare, functional genetic variants with heavy drinking and problem drinking confirm the known effects of variants in ADH1B and ALDH2 but fail to implicate novel variants or genes. Results for three genes potentially related to autism suggest they might exert a protective effect.
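The abstract describes gene-wise weighted burden analysis in which rarer and more functionally severe variants get larger weights, but it does not give the weighting scheme. The Python sketch below shows the general idea under assumed weights: a Madsen-Browning-style frequency term 1/sqrt(MAF(1 - MAF)) multiplied by a per-variant functional weight. Both the formula and the names `variant_weight` and `burden_scores` are illustrative assumptions, not the study's actual scheme.

```python
import math


def variant_weight(maf, func_weight):
    # Hypothetical weighting: a frequency term that grows as the minor allele
    # frequency (MAF) shrinks, multiplied by an annotation-based weight that is
    # larger for more severe predicted functional effects.
    return func_weight / math.sqrt(maf * (1.0 - maf))


def burden_scores(genotypes, mafs, func_weights):
    """Per-subject weighted burden score for one gene.

    genotypes[s][v]: minor-allele count (0, 1 or 2) of subject s at variant v
    mafs[v]:         minor allele frequency of variant v
    func_weights[v]: assumed functional-severity weight of variant v
    """
    w = [variant_weight(m, f) for m, f in zip(mafs, func_weights)]
    # Sum of minor-allele counts, each scaled by its variant's weight.
    return [sum(g * wv for g, wv in zip(row, w)) for row in genotypes]


# Three hypothetical subjects, two variants: a rare severe one, a common mild one.
scores = burden_scores([[1, 0], [0, 2], [0, 0]], [0.01, 0.25], [10.0, 1.0])
```

The resulting per-subject scores for a gene would then be compared between cases and controls, for example by logistic regression of case status on the score with covariates such as ancestry principal components.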


2020 ◽  
Author(s):  
Dan Ju ◽  
Iain Mathieson

Skin pigmentation is a classic example of a polygenic trait that has experienced directional selection in humans. Genome-wide association studies have identified well over a hundred pigmentation-associated loci, and genomic scans in present-day and ancient populations have identified selective sweeps for a small number of light pigmentation-associated alleles in Europeans. It is unclear whether selection has operated on all the genetic variation associated with skin pigmentation or on just a small number of large-effect variants. Here, we address this question using ancient DNA from 1158 individuals from West Eurasia covering a period of 40,000 years, combined with genome-wide association summary statistics from the UK Biobank. We find a robust signal of directional selection in ancient West Eurasians on skin pigmentation variants ascertained in the UK Biobank, but this signal is driven mostly by a limited number of large-effect variants. Consistent with this observation, a polygenic selection test in present-day populations fails to detect selection with the full set of variants; only the top five show strong evidence of selection. Our data allow us to disentangle the effects of admixture and selection. Most notably, a large-effect variant at SLC24A5 was introduced to Europe by migrations of Neolithic farming populations but continued to be under selection post-admixture. This study shows that the response to selection for light skin pigmentation in West Eurasia was driven by a relatively small proportion of the variants that are associated with present-day phenotypic variation.

Significance: Some of the genes responsible for the evolution of light skin pigmentation in Europeans show signals of positive selection in present-day populations. Recently, genome-wide association studies have highlighted the highly polygenic nature of skin pigmentation. It is unclear whether selection has operated on all of these genetic variants or just a subset. By studying variation in over a thousand ancient genomes from West Eurasia covering 40,000 years, we are able to study both the aggregate behavior of pigmentation-associated variants and the evolutionary history of individual variants. We find that the evolution of light skin pigmentation in Europeans was driven by frequency changes in a relatively small fraction of the genetic variants that are associated with variation in the trait today.


Author(s):  
Juba Nait Saada ◽  
Georgios Kalantzis ◽  
Derek Shyr ◽  
Martin Robinson ◽  
Alexander Gusev ◽  
...  

Detection of identical-by-descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of genomic analyses. We developed a new method, called FastSMC, that enables accurate biobank-scale detection of IBD segments transmitted by common ancestors living up to several hundreds of generations in the past. FastSMC combines a fast heuristic search for IBD segments with accurate coalescent-based likelihood calculations and enables estimating the age of common ancestors transmitting IBD regions. We applied FastSMC to 487,409 phased samples from the UK Biobank and detected the presence of ∼214 billion IBD segments transmitted by shared ancestors within the past 1,500 years. We quantified time-dependent shared ancestry within and across 120 postcodes, obtaining a fine-grained picture of genetic relatedness within the past two millennia in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the localization of a sample’s birth coordinates from genomic data. We sought evidence of recent positive selection by identifying loci with unusually strong shared ancestry within recent millennia and we detected 12 genome-wide significant signals, including 7 novel loci. We found IBD sharing to be highly predictive of the sharing of ultra-rare variants in exome sequencing samples from the UK Biobank. Focusing on loss-of-function variation discovered using exome sequencing, we devised an IBD-based association test and detected 29 associations with 7 blood-related traits, 20 of which were not detected in the exome sequencing study. These results underscore the importance of modelling distant relatedness to reveal subtle population structure, recent evolutionary history, and rare pathogenic variation.


2021 ◽  
Author(s):  
Isabel Gamache ◽  
Marc-André Legault ◽  
Jean-Christophe Grenier ◽  
Rocio Sanchez ◽  
Eric Rhéaume ◽  
...  

Pharmacogenomic studies have revealed associations between rs1967309 in the adenylyl cyclase type 9 (ADCY9) gene and clinical responses to the cholesteryl ester transfer protein (CETP) modulator dalcetrapib; however, the mechanism behind this interaction is still unknown. Here, we characterized selective signals at the locus associated with the pharmacogenomic response in human populations, and we show that the rs1967309 region exhibits signatures of natural selection in several human populations. Furthermore, we identified a variant in CETP, rs158477, which is in long-range linkage disequilibrium with rs1967309 in the Peruvian population. The signal is mainly seen in males, a sex-specific result that is replicated in the LIMAA cohort of over 3400 Peruvians. We further detected interaction effects of these two SNPs with sex on cardiovascular phenotypes in the UK Biobank, in line with the sex-specific genotype associations found in Peruvians at these loci. Analyses of RNA-seq data further suggest an epistatic interaction between the two SNPs on CETP expression levels in multiple tissues. We propose that ADCY9 and CETP coevolved during recent human evolution, which points towards a biological link between dalcetrapib's pharmacogene ADCY9 and its therapeutic target CETP.

