Space is the Place: Effects of Continuous Spatial Structure on Analysis of Population Genetic Data

Mapping Intimacies ◽

10.1101/659235 ◽

2019 ◽

Cited By ~ 4

Author(s):

C.J. Battey ◽

Peter L. Ralph ◽

Andrew D. Kern

Keyword(s):

Population Genetic ◽

Isolation By Distance ◽

Association Studies ◽

Genetic Data ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide ◽

Mixed Populations ◽

Demographic Inference

ABSTRACTReal geography is continuous, but standard models in population genetics are based on discrete, well-mixed populations. As a result many methods of analyzing genetic data assume that samples are a random draw from a well-mixed population, but are applied to clustered samples from populations that are structured clinally over space. Here we use simulations of populations living in continuous geography to study the impacts of dispersal and sampling strategy on population genetic summary statistics, demographic inference, and genome-wide association studies. We find that most common summary statistics have distributions that differ substantially from that seen in well-mixed populations, especially when Wright’s neighborhood size is less than 100 and sampling is spatially clustered. Stepping-stone models reproduce some of these effects, but discretizing the landscape introduces artifacts which in some cases are exacerbated at higher resolutions. The combination of low dispersal and clustered sampling causes demographic inference from the site frequency spectrum to infer more turbulent demographic histories, but averaged results across multiple simulations were surprisingly robust to isolation by distance. We also show that the combination of spatially autocorrelated environments and limited dispersal causes genome-wide association studies to identify spurious signals of genetic association with purely environmentally determined phenotypes, and that this bias is only partially corrected by regressing out principal components of ancestry. Last, we discuss the relevance of our simulation results for inference from genetic variation in real organisms.

Space is the Place: Effects of Continuous Spatial Structure on Analysis of Population Genetic Data

Genetics ◽

10.1534/genetics.120.303143 ◽

2020 ◽

Vol 215 (1) ◽

pp. 193-214 ◽

Cited By ~ 7

Author(s):

C. J. Battey ◽

Peter L. Ralph ◽

Andrew D. Kern

Keyword(s):

Population Genetic ◽

Association Studies ◽

Sampling Strategy ◽

Genetic Data ◽

Systematic Bias ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Limited Dispersal ◽

Mixed Populations ◽

Demographic Inference

Real geography is continuous, but standard models in population genetics are based on discrete, well-mixed populations. As a result, many methods of analyzing genetic data assume that samples are a random draw from a well-mixed population, but are applied to clustered samples from populations that are structured clinally over space. Here, we use simulations of populations living in continuous geography to study the impacts of dispersal and sampling strategy on population genetic summary statistics, demographic inference, and genome-wide association studies (GWAS). We find that most common summary statistics have distributions that differ substantially from those seen in well-mixed populations, especially when Wright’s neighborhood size is < 100 and sampling is spatially clustered. “Stepping-stone” models reproduce some of these effects, but discretizing the landscape introduces artifacts that in some cases are exacerbated at higher resolutions. The combination of low dispersal and clustered sampling causes demographic inference from the site frequency spectrum to infer more turbulent demographic histories, but averaged results across multiple simulations revealed surprisingly little systematic bias. We also show that the combination of spatially autocorrelated environments and limited dispersal causes GWAS to identify spurious signals of genetic association with purely environmentally determined phenotypes, and that this bias is only partially corrected by regressing out principal components of ancestry. Last, we discuss the relevance of our simulation results for inference from genetic variation in real organisms.

CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies

Nucleic Acids Research ◽

10.1093/nar/gkz1026 ◽

2019 ◽

Cited By ~ 2

Author(s):

Jianhua Wang ◽

Dandan Huang ◽

Yao Zhou ◽

Hongcheng Yao ◽

Huanhuan Liu ◽

...

Keyword(s):

Fine Mapping ◽

Genetic Variants ◽

Association Studies ◽

Complex Trait ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide ◽

Credible Sets ◽

Causal Variants

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.

Genetic Data Analysis

Handbook of Research on Disease Prediction Through Data Analytics and Machine Learning - Advances in Medical Diagnosis, Treatment, and Care ◽

10.4018/978-1-7998-2742-9.ch017 ◽

2021 ◽

pp. 358-372

Author(s):

M. Shamila ◽

Amit Kumar Tyagi

Keyword(s):

Data Analysis ◽

Association Studies ◽

Single Gene ◽

Genetic Data ◽

Building Blocks ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Human Beings ◽

Genome Wide ◽

Genetic Data Analysis

Genome-wide association studies (GWAS) or genetic data analysis is used to discover common genetic factors which influence the health of human beings and become a part of a disease. The concept of using genomics has increased in recent years, especially in e-healthcare. Today there is huge improvement required in this field or genomics. Note that the terms genomics and genetics are not similar terms here. Basically, the human genome is made up of DNA, which consists of four different chemical building blocks (called bases and abbreviated A, T, C, and G). Based on this, we differentiate each and every human being living on earth. The term ‘genetics' originated from the Greek word ‘genetikos'. It means ‘origin'. In simple terms, genetics can be defined as a branch of biology, which deals with the study of the functionalities and composition of a single gene in an organism. There are mainly three branches of genetics, which include classical genetics, molecular genetics, and population genetics.

Leveraging effect size distributions to improve polygenic risk scores derived from summary statistics of genome-wide association studies

PLoS Computational Biology ◽

10.1371/journal.pcbi.1007565 ◽

2020 ◽

Vol 16 (2) ◽

pp. e1007565 ◽

Cited By ~ 1

Author(s):

Shuang Song ◽

Wei Jiang ◽

Lin Hou ◽

Hongyu Zhao

Keyword(s):

Effect Size ◽

Association Studies ◽

Genome Wide Association ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Size Distributions ◽

Summary Statistics ◽

Polygenic Risk ◽

Genome Wide

Genome-Wide Association Studies of Schizophrenia and Bipolar Disorder in a Diverse Cohort of US Veterans

Schizophrenia Bulletin ◽

10.1093/schbul/sbaa133 ◽

2020 ◽

Author(s):

Tim B Bigdeli ◽

Ayman H Fanous ◽

Yuli Li ◽

Nallakkandi Rajeevan ◽

Frederick Sayward ◽

...

Keyword(s):

Bipolar Disorder ◽

Association Studies ◽

Genome Wide Association ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Susceptibility Loci ◽

New Associations ◽

Genome Wide ◽

Us Veterans

Abstract Background Schizophrenia (SCZ) and bipolar disorder (BIP) are debilitating neuropsychiatric disorders, collectively affecting 2% of the world’s population. Recognizing the major impact of these psychiatric disorders on the psychosocial function of more than 200 000 US Veterans, the Department of Veterans Affairs (VA) recently completed genotyping of more than 8000 veterans with SCZ and BIP in the Cooperative Studies Program (CSP) #572. Methods We performed genome-wide association studies (GWAS) in CSP #572 and benchmarked the predictive value of polygenic risk scores (PRS) constructed from published findings. We combined our results with available summary statistics from several recent GWAS, realizing the largest and most diverse studies of these disorders to date. Results Our primary GWAS uncovered new associations between CHD7 variants and SCZ, and novel BIP associations with variants in Sortilin Related VPS10 Domain Containing Receptor 3 (SORCS3) and downstream of PCDH11X. Combining our results with published summary statistics for SCZ yielded 39 novel susceptibility loci including CRHR1, and we identified 10 additional findings for BIP (28 326 cases and 90 570 controls). PRS trained on published GWAS were significantly associated with case-control status among European American (P < 10–30) and African American (P < .0005) participants in CSP #572. Conclusions We have demonstrated that published findings for SCZ and BIP are robustly generalizable to a diverse cohort of US veterans. Leveraging available summary statistics from GWAS of global populations, we report 52 new susceptibility loci and improved fine-mapping resolution for dozens of previously reported associations.

metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

Bioinformatics ◽

10.1093/bioinformatics/btw052 ◽

2016 ◽

Vol 32 (13) ◽

pp. 1981-1989 ◽

Cited By ~ 66

Author(s):

Anna Cichonska ◽

Juho Rousu ◽

Pekka Marttinen ◽

Antti J. Kangas ◽

Pasi Soininen ◽

...

Keyword(s):

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide

A unified framework for variance component estimation with summary statistics in genome-wide association studies

The Annals of Applied Statistics ◽

10.1214/17-aoas1052 ◽

2017 ◽

Vol 11 (4) ◽

pp. 2027-2051 ◽

Cited By ~ 42

Author(s):

Xiang Zhou

Keyword(s):

Variance Component ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Variance Component Estimation ◽

Summary Statistics ◽

Unified Framework ◽

Genome Wide ◽

Component Estimation

Multiple phenotype association tests using summary statistics in genome-wide association studies

Biometrics ◽

10.1111/biom.12735 ◽

2017 ◽

Vol 74 (1) ◽

pp. 165-175 ◽

Cited By ~ 19

Author(s):

Zhonghua Liu ◽

Xihong Lin

Keyword(s):

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Association Tests ◽

Genome Wide ◽

Multiple Phenotype

SumVg: Total heritability explained by all variants in genome-wide association studies based on summary statistics with standard error estimates

10.1101/016857 ◽

2015 ◽

Author(s):

Hon-Cheong SO ◽

Pak C. SHAM

Keyword(s):

Error Estimates ◽

Standard Error ◽

Association Studies ◽

Parametric Bootstrap ◽

R Package ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide ◽

Key Questions

Genome-wide association studies (GWAS) have become increasingly popular these days and one of the key questions is how much heritability could be explained by all variants in GWAS. We have previously proposed an approach to answer this question, based on recovering the "true" z-statistics from a set of observed z-statistics. Only summary statistics are required. However, methods for standard error (SE) estimation are not available yet, thereby limiting the interpretation of the results. In this study we developed resampling-based approaches to estimate the SE and the methods are implemented in an R package. We found that delete-d-jackknife and parametric bootstrap approaches provide good estimates of the SE. Methods to compute the sum of heritability explained and the corresponding SE are implemented in the R package SumVg, available at https://sites.google.com/site/honcheongso/software/var-totalvg

Beyond SNP Heritability: Polygenicity and Discoverability of Phenotypes Estimated with a Univariate Gaussian Mixture Model

10.1101/133132 ◽

2017 ◽

Cited By ~ 8

Author(s):

Dominic Holland ◽

Oleksandr Frei ◽

Rahul Desikan ◽

Chun-Chieh Fan ◽

Alexey A. Shadrin ◽

...

Keyword(s):

Association Studies ◽

Causal Snps ◽

Reference Panel ◽

Causal Effects ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Common Variants ◽

Genome Wide ◽

Causal Variants

AbstractEstimating the polygenicity (proportion of causally associated single nucleotide polymorphisms (SNPs)) and discoverability (effect size variance) of causal SNPs for human traits is currently of considerable interest. SNP-heritability is proportional to the product of these quantities. We present a basic model, using detailed linkage disequilibrium structure from an extensive reference panel, to estimate these quantities from genome-wide association studies (GWAS) summary statistics. We apply the model to diverse phenotypes and validate the implementation with simulations. We find model polygenicities ranging from ≃ 2 × 10−5to ≃ 4 × 10−3, with discoverabilities similarly ranging over two orders of magnitude. A power analysis allows us to estimate the proportions of phenotypic variance explained additively by causal SNPs reaching genome-wide significance at current sample sizes, and map out sample sizes required to explain larger portions of additive SNP heritability. The model also allows for estimating residual inflation (or deflation from over-correcting of z-scores), and assessing compatibility of replication and discovery GWAS summary statistics.Author SummaryThere are ~10 million common variants in the genome of humans with European ancestry. For any particular phenotype a number of these variants will have some causal effect. It is of great interest to be able to quantify the number of these causal variants and the strength of their effect on the phenotype.Genome wide association studies (GWAS) produce very noisy summary statistics for the association between subsets of common variants and phenotypes. For any phenotype, these statistics collectively are difficult to interpret, but buried within them is the true landscape of causal effects. In this work, we posit a probability distribution for the causal effects, and assess its validity using simulations. Using a detailed reference panel of ~11 million common variants – among which only a small fraction are likely to be causal, but allowing for non-causal variants to show an association with the phenotype due to correlation with causal variants – we implement an exact procedure for estimating the number of causal variants and their mean strength of association with the phenotype. We find that, across different phenotypes, both these quantities – whose product allows for lower bound estimates of heritability – vary by orders of magnitude.