Association score testing for rare variants and binary traits in family data with shared controls

Abstract Recognizing that family data provide a unique advantage for identifying rare risk variants in genetic association studies, many cohorts with related samples have undergone whole genome sequencing in large initiatives such as the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. Analyzing rare variants poses challenges for binary traits in that some genotype categories may have few or no observed events, causing bias and inflation in commonly used methods. Several methods have recently been proposed to better handle rare variants while accounting for family relationships, but their performance has not been thoroughly evaluated together. Here we compare several existing approaches, including SAIGE and approaches not limited to related samples, using simulations based on the Framingham Heart Study samples and genotype data from the Illumina HumanExome BeadChip, where rare variants are the majority. We found that logistic regression with a likelihood ratio test applied to related samples was the only approach that did not have inflated type I error rates in both the single variant test (SVT) and gene-based tests. It was followed by Firth logistic regression, applied to either related or unrelated samples, which had inflation only in its direction-insensitive gene-based test at prevalence 0.01. This held even though, in theory, logistic regression and Firth logistic regression do not account for relatedness among samples. SAIGE had inflation in SVT at prevalence 0.1 or lower, and the inflation was eliminated with a minor allele count filter of 5. As for power, no approach consistently outperformed the others across all single variant tests and gene-based tests.
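As a rough illustration of two ingredients named in the abstract, the sketch below combines a minor allele count (MAC) filter with a single-variant likelihood ratio test from logistic regression. It is not the authors' pipeline. The function names, the 0/1/2 genotype coding, the absence of covariates, and the reading of "filter of 5" as MAC >= 5 are all assumptions made for the example. In the abstract the MAC filter is reported for SAIGE, and pairing it with the likelihood ratio test here is only for demonstration. Like plain logistic regression, the sketch ignores relatedness among samples.

```python
import math


def minor_allele_count(genotypes):
    """Minor allele count for diploid genotypes coded 0, 1 or 2 (assumed coding)."""
    alt = sum(genotypes)
    return min(alt, 2 * len(genotypes) - alt)


def _sigmoid(eta):
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def _loglik(y, g, b0, b1):
    # Bernoulli log-likelihood for logit P(y = 1) = b0 + b1 * g.
    total = 0.0
    for yi, gi in zip(y, g):
        eta = b0 + b1 * gi
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def lrt_pvalue(y, g, min_mac=5, max_iter=25):
    """1-df likelihood ratio test of genotype g on binary trait y.

    Returns None when the variant fails the assumed MAC >= min_mac filter.
    """
    if minor_allele_count(g) < min_mac:
        return None
    n, cases = len(y), sum(y)
    if cases == 0 or cases == n:
        raise ValueError("trait must contain both cases and controls")

    # Null model (intercept only) has a closed-form maximum.
    p0 = cases / n
    ll_null = cases * math.log(p0) + (n - cases) * math.log(1.0 - p0)

    # Full model: Newton-Raphson on (b0, b1), started at the null fit.
    b0, b1 = math.log(p0 / (1.0 - p0)), 0.0
    for _ in range(max_iter):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for yi, gi in zip(y, g):
            p = _sigmoid(b0 + b1 * gi)
            w = p * (1.0 - p)
            s0 += yi - p
            s1 += (yi - p) * gi
            h00 += w
            h01 += w * gi
            h11 += w * gi * gi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break  # near-singular information matrix, e.g. under separation
        d0 = (h11 * s0 - h01 * s1) / det
        d1 = (h00 * s1 - h01 * s0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break

    stat = max(0.0, 2.0 * (_loglik(y, g, b0, b1) - ll_null))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


# Toy data: 20 non-carriers (2 cases) and 10 heterozygous carriers (8 cases).
trait = [1] * 2 + [0] * 18 + [1] * 8 + [0] * 2
geno = [0] * 20 + [1] * 10
print(minor_allele_count(geno), lrt_pvalue(trait, geno))
```

The likelihood ratio statistic stays finite even when a genotype category contains no events, which is one reason it can behave differently from a Wald test for rare variants. A Firth-type penalized fit, also mentioned in the abstract, would need an extra penalty term that this sketch does not include.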

Assessing the Contribution Family Data Can Make to Case-Control Studies of Rare Variants

Annals of Human Genetics ◽ 10.1111/j.1469-1809.2011.00660.x ◽ 2011 ◽ Vol 75 (5) ◽ pp. 630-638

Cited By ~ 8

Author(s): David Curtis

Keyword(s): Rare Variants ◽ Case Control ◽ Family Data ◽ Case Control Studies

Challenges and directions: an analysis of Genetic Analysis Workshop 17 data by collapsing rare variants within family data

BMC Proceedings ◽ 10.1186/1753-6561-5-s9-s30 ◽ 2011 ◽ Vol 5 (S9)

Cited By ~ 2

Author(s): Peng Lin ◽ Michael Hamm ◽ Sarah Hartz ◽ Zhehao Zhang ◽ John P Rice

Keyword(s): Genetic Analysis ◽ Genetic Analysis Workshop ◽ Rare Variants ◽ Family Data

FastGWA-GLMM: a generalized linear mixed model association tool for biobank-scale data

10.21203/rs.3.rs-128758/v1 ◽ 2021

Author(s): Jian Yang ◽ Longda Jiang ◽ Zhili Zheng

Keyword(s): Complex Traits ◽ Mixed Model ◽ Linear Mixed Model ◽ Rare Variants ◽ Sparse Matrix ◽ Extreme Case ◽ Generalized Linear Mixed Model ◽ Binary Complex ◽ Binary Traits ◽ The UK

Abstract Compared to linear mixed model-based genome-wide association (GWA) methods, generalized linear mixed model (GLMM)-based methods have better statistical properties when applied to binary traits but are computationally much slower. Here, leveraging efficient sparse matrix-based algorithms, we developed a GLMM-based GWA tool (called fastGWA-GLMM) that is orders of magnitude faster than the state-of-the-art tool (e.g., ~37 times faster when n=400,000) with more scalable memory usage. We show by simulation that the fastGWA-GLMM test-statistics of both common and rare variants are well-calibrated under the null, even for traits with an extreme case-control ratio (e.g., 0.1%). We applied fastGWA-GLMM to the UK Biobank data of 456,348 individuals, 11,842,647 variants and 2,989 binary traits (full summary statistics available at http://fastgwa.info/ukbimpbin) and identified 259 rare variants associated with 75 traits, demonstrating the use of imputed genotype data in a large cohort to discover rare variants for binary complex traits.
