Low-coverage sequencing: Implications for design of complex trait association studies

Abstract Motivation Very low depth sequencing has been proposed as a cost-effective approach to capture low-frequency and rare variation in complex trait association studies. However, a full characterisation of the genotype quality and association power for very low depth sequencing designs is still lacking. Results We perform cohort-wide whole genome sequencing (WGS) at low depth in 1,239 individuals (990 at 1x depth and 249 at 4x depth) from an isolated population, and establish a robust pipeline for calling and imputing very low depth WGS genotypes from standard bioinformatics tools. Using genotyping chip, whole-exome sequencing (WES, 75x depth) and high-depth (22x) WGS data in the same samples, we examine in detail the sensitivity of this approach, and show that imputed 1x WGS recapitulates 95.2% of variants found by imputed GWAS with an average minor allele concordance of 97% for common and low-frequency variants. In our study, 1x further allowed the discovery of 140,844 true low-frequency variants with 73% genotype concordance when compared to high-depth WGS data. Finally, using association results for 57 quantitative traits, we show that very low depth WGS is an efficient alternative to imputed GWAS chip designs, allowing the discovery of up to twice as many true association signals than the classical imputed GWAS design. Supplementary Data Supplementary Data are appended to this manuscript.


Discovery of Rare Variants via Sequencing: Implications for the Design of Complex Trait Association Studies

PLoS Genetics, 2009, Vol 5 (5), pp. e1000481

DOI: 10.1371/journal.pgen.1000481

Cited By ~ 101

Author(s): Bingshan Li, Suzanne M. Leal

Keyword(s): Rare Variants, Association Studies, Complex Trait, Trait Association


Replication Strategies for Rare Variant Complex Trait Association Studies via Next-Generation Sequencing

The American Journal of Human Genetics, 2010, Vol 87 (6), pp. 790-801

DOI: 10.1016/j.ajhg.2010.10.025

Cited By ~ 45

Author(s): Dajiang J. Liu, Suzanne M. Leal

Keyword(s): Next Generation Sequencing, Rare Variant, Association Studies, Complex Trait, Next Generation, Trait Association, Generation Sequencing


Very low-depth whole-genome sequencing in complex trait association studies

Bioinformatics, 2018, Vol 35 (15), pp. 2555-2561

DOI: 10.1093/bioinformatics/bty1032

Cited By ~ 17

Author(s): Arthur Gilly, Lorraine Southam, Daniel Suveges, Karoline Kuchenbaecker, Rachel Moore, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Association Studies, Low Frequency, Cost Effective, Complex Trait, Supplementary Information, Whole Genome, Trait Association, High Depth

Abstract Motivation Very low-depth sequencing has been proposed as a cost-effective approach to capture low-frequency and rare variation in complex trait association studies. However, a full characterization of the genotype quality and association power for very low-depth sequencing designs is still lacking. Results We perform cohort-wide whole-genome sequencing (WGS) at low depth in 1239 individuals (990 at 1× depth and 249 at 4× depth) from an isolated population, and establish a robust pipeline for calling and imputing very low-depth WGS genotypes from standard bioinformatics tools. Using genotyping chip, whole-exome sequencing (75× depth) and high-depth (22×) WGS data in the same samples, we examine in detail the sensitivity of this approach, and show that imputed 1× WGS recapitulates 95.2% of variants found by imputed GWAS with an average minor allele concordance of 97% for common and low-frequency variants. In our study, 1× further allowed the discovery of 140 844 true low-frequency variants with 73% genotype concordance when compared to high-depth WGS data. Finally, using association results for 57 quantitative traits, we show that very low-depth WGS is an efficient alternative to imputed GWAS chip designs, allowing the discovery of up to twice as many true association signals than the classical imputed GWAS design. Availability and implementation The HELIC genotype and WGS datasets have been deposited to the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/home): EGAD00010000518; EGAD00010000522; EGAD00010000610; EGAD00001001636, EGAD00001001637. The peakplotter software is available at https://github.com/wtsi-team144/peakplotter, the transformPhenotype app can be downloaded at https://github.com/wtsi-team144/transformPhenotype. Supplementary information Supplementary data are available at Bioinformatics online.


Faculty Opinions recommendation of Extremely low-coverage sequencing and imputation increases power for genome-wide association studies.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2012

DOI: 10.3410/f.717960893.793463835

Author(s): Nicola Mulder

Keyword(s): Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Low Coverage


CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies

Nucleic Acids Research, 2019

DOI: 10.1093/nar/gkz1026

Cited By ~ 2

Author(s): Jianhua Wang, Dandan Huang, Yao Zhou, Hongcheng Yao, Huanhuan Liu, ...

Keyword(s): Fine Mapping, Genetic Variants, Association Studies, Complex Trait, Genome Wide Association, Genome Wide Association Studies, Summary Statistics, Genome Wide, Credible Sets, Causal Variants

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.


Efficient phasing and imputation of low-coverage sequencing data using large reference panels

2020

DOI: 10.1101/2020.04.14.040329

Cited By ~ 2

Author(s): S. Rubinacci, D.M. Ribeiro, R. Hofmeister, O. Delaneau

Keyword(s): Paradigm Shift, Rare Variants, Association Studies, Cost Effective, Human Populations, Sequencing Data, Snp Arrays, Genomic Studies, Low Coverage, The Impact

Abstract Low-coverage whole genome sequencing followed by imputation has been proposed as a cost-effective genotyping approach for disease and population genetics studies. However, its competitiveness against SNP arrays is undermined as current imputation methods are computationally expensive and unable to leverage large reference panels. Here, we describe a method, GLIMPSE, for phasing and imputation of low-coverage sequencing datasets from modern reference panels. We demonstrate its remarkable performance across different coverages and human populations. It achieves imputation of a full genome for less than $1, outperforming existing methods by orders of magnitude, with an increased accuracy of more than 20% at rare variants. We also show that 1x coverage enables effective association studies and is better suited than dense SNP arrays to access the impact of rare variations. Overall, this study demonstrates the promising potential of low-coverage imputation and suggests a paradigm shift in the design of future genomic studies.


Reference Trait Analysis Reveals Correlations Between Gene Expression and Quantitative Traits in Disjoint Samples

Genetics, 2019, Vol 212 (3), pp. 919-929

DOI: 10.1534/genetics.118.301865

Author(s): Daniel A. Skelly, Narayanan Raghupathy, Raymond F. Robledo, Joel H. Graber, Elissa J. Chesler

Keyword(s): Gene Expression, Canonical Correlation, Complex Traits, Behavioral Genetics, Association Studies, Complex Trait, Integrated Analysis, Data Set, Trait Analysis, Molecular Features

Systems genetic analysis of complex traits involves the integrated analysis of genetic, genomic, and disease-related measures. However, these data are often collected separately across multiple study populations, rendering direct correlation of molecular features to complex traits impossible. Recent transcriptome-wide association studies (TWAS) have harnessed gene expression quantitative trait loci (eQTL) to associate unmeasured gene expression with a complex trait in genotyped individuals, but this approach relies primarily on strong eQTL. We propose a simple and powerful alternative strategy for correlating independently obtained sets of complex traits and molecular features. In contrast to TWAS, our approach gains precision by correlating complex traits through a common set of continuous phenotypes instead of genetic predictors, and can identify transcript–trait correlations for which the regulation is not genetic. In our approach, a set of multiple quantitative “reference” traits is measured across all individuals, while measures of the complex trait of interest and transcriptional profiles are obtained in disjoint subsamples. A conventional multivariate statistical method, canonical correlation analysis, is used to relate the reference traits and traits of interest to identify gene expression correlates. We evaluate power and sample size requirements of this methodology, as well as performance relative to other methods, via extensive simulation and analysis of a behavioral genetics experiment in 258 Diversity Outbred mice involving two independent sets of anxiety-related behaviors and hippocampal gene expression. After splitting the data set and hiding one set of anxiety-related traits in half the samples, we identified transcripts correlated with the hidden traits using the other set of anxiety-related traits and exploiting the highest canonical correlation (R = 0.69) between the trait data sets. We demonstrate that this approach outperforms TWAS in identifying associated transcripts. Together, these results demonstrate the validity, reliability, and power of reference trait analysis for identifying relations between complex traits and their molecular substrates.
