A framework for detecting noncoding rare variant associations of large-scale whole-genome sequencing studies

Large-scale whole-genome sequencing studies have enabled analysis of noncoding rare variants' (RVs) associations with complex human traits. Variant set analysis is a powerful approach to study RV association, and a key component of it is constructing RV sets for analysis. However, existing methods have limited ability to define analysis units in the noncoding genome. Furthermore, there is a lack of robust pipelines for comprehensive and scalable noncoding RV association analysis. Here we propose a computationally efficient noncoding RV association-detection framework that uses STAAR (variant-set test for association using annotation information) to group noncoding variants in gene-centric analysis based on functional categories. We also propose SCANG (scan the genome)-STAAR, which uses dynamic window sizes and incorporates multiple functional annotations in a non-gene-centric analysis. We furthermore develop STAARpipeline to perform flexible noncoding RV association analysis, including gene-centric analysis as well as fixed-window-based and dynamic-window-based non-gene-centric analysis. We apply STAARpipeline to identify noncoding RV sets associated with four quantitative lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several noncoding RV associations in an additional 9,123 TOPMed samples.


P4-097: RARE VARIANTS IN FAMILIAL LATE-ONSET ALZHEIMER'S DISEASE IDENTIFIED FROM LARGE SCALE WHOLE GENOME SEQUENCING

Alzheimer's & Dementia ◽

10.1016/j.jalz.2019.06.3757 ◽

2019 ◽

Vol 15 ◽

pp. P1312-P1312

Author(s):

Badri N. Vardarajan ◽

James Jaworski ◽

Gary W. Beecham ◽

Sandra Barral ◽

Dolly Reyes-Dumeyer ◽

...

Keyword(s):

Alzheimer's Disease ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Rare Variants ◽

Late Onset ◽

Whole Genome


Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale

Nature Genetics ◽

10.1038/s41588-020-0676-4 ◽

2020 ◽

Vol 52 (9) ◽

pp. 969-983 ◽

Cited By ~ 5

Author(s):

Xihao Li ◽

Zilin Li ◽

Hufeng Zhou ◽

Sheila M. Gaynor ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Association Analysis ◽

Genome Sequencing ◽

Rare Variant ◽

In Silico ◽

Whole Genome ◽

Rare Variant Association ◽

Functional Annotations ◽

Sequencing Studies


Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies

The American Journal of Human Genetics ◽

10.1016/j.ajhg.2018.12.012 ◽

2019 ◽

Vol 104 (2) ◽

pp. 260-274 ◽

Cited By ~ 26

Author(s):

Han Chen ◽

Jennifer E. Huffman ◽

Jennifer A. Brody ◽

Chaolong Wang ◽

Seunggeun Lee ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Mixed Model ◽

Whole Genome ◽

Association Tests ◽

Binary Traits ◽

Sequencing Studies


Group-based variant calling leveraging next-generation supercomputing for large-scale whole-genome sequencing studies

BMC Bioinformatics ◽

10.1186/s12859-015-0736-4 ◽

2015 ◽

Vol 16 (1) ◽

Cited By ~ 11

Author(s):

Kristopher A. Standish ◽

Tristan M. Carland ◽

Glenn K. Lockwood ◽

Wayne Pfeiffer ◽

Mahidhar Tatineni ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Variant Calling ◽

Whole Genome ◽

Next Generation ◽

Sequencing Studies


Identification of putative causal loci in whole-genome sequencing data via knockoff statistics

Nature Communications ◽

10.1038/s41467-021-22889-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Zihuai He ◽

Linxi Liu ◽

Chen Wang ◽

Yann Le Guen ◽

Justin Lee ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Rare Variants ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Association Tests ◽

Sequencing Project ◽

Risk Variants ◽

Sequencing Studies

Abstract: The analysis of whole-genome sequencing studies is challenging due to the large number of rare variants in noncoding regions and the lack of natural units for testing. We propose a statistical method to detect and localize rare and common risk variants in whole-genome sequencing studies based on a recently developed knockoff framework. It can (1) prioritize causal variants over associations due to linkage disequilibrium, thereby improving interpretability; (2) help distinguish the signal due to rare variants from shadow effects of significant common variants nearby; (3) integrate multiple knockoffs for improved power, stability, and reproducibility; and (4) flexibly incorporate state-of-the-art and future association tests to achieve the benefits proposed here. In applications to whole-genome sequencing data from the Alzheimer’s Disease Sequencing Project (ADSP) and COPDGene samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program, we show that our method, compared with conventional association tests, can lead to substantially more discoveries.
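
The abstract above does not spell out the selection rule, so the following is only a rough sketch of the generic single-knockoff "knockoff+" filter from the broader knockoff literature, not the multiple-knockoff procedure this paper proposes. It assumes each variant or window j already has a statistic W_j that is large and positive when the original feature looks more important than its knockoff copy; the function names and the toy statistics are hypothetical.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Smallest threshold t whose estimated false discovery proportion is <= q.

    Returns float('inf') when no candidate threshold meets the bound.
    """
    # Candidate thresholds are the distinct nonzero magnitudes of the statistics.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy count of false positives
        positives = sum(1 for x in w if x >= t)   # features that would be selected
        # The "+1" is the conservative knockoff+ correction.
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    tau = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= tau]


# Toy statistics (hypothetical): positive values favor the original feature.
w_stats = [5.0, 4.0, 3.5, 3.0, 2.5, -1.0, 0.5, -0.2, 2.0, 1.5]
selected = knockoff_select(w_stats, q=0.25)
```

With these toy values and a target false discovery rate of 0.25, the threshold lands at 0.5 and the two features with negative statistics are left out. A stricter target such as 0.1 selects nothing here, because with only ten features the "+1" correction keeps the estimated proportion above the bound.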


Identification of putative causal loci in whole-genome sequencing data via knockoff statistics

10.1101/2021.03.08.434451 ◽

2021 ◽

Author(s):

Zihuai He ◽

Linxi Liu ◽

Chen Wang ◽

Yann Le Guen ◽

Justin Lee ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Rare Variants ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Association Tests ◽

Sequencing Project ◽

Risk Variants ◽

Sequencing Studies

Abstract: The analysis of whole-genome sequencing studies is challenging due to the large number of rare variants in noncoding regions and the lack of natural units for testing. We propose a statistical method to detect and localize rare and common risk variants in whole-genome sequencing studies based on a recently developed knockoff framework. It can (1) prioritize causal variants over associations due to linkage disequilibrium, thereby improving interpretability; (2) help distinguish the signal due to rare variants from shadow effects of significant common variants nearby; (3) integrate multiple knockoffs for improved power, stability, and reproducibility; and (4) flexibly incorporate state-of-the-art and future association tests to achieve the benefits proposed here. In applications to whole-genome sequencing data from the Alzheimer’s Disease Sequencing Project (ADSP) and COPDGene samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program, we show that our method, compared with conventional association tests, can lead to substantially more discoveries.


Faculty Opinions recommendation of Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.738541309.793579562 ◽

2020 ◽

Author(s):

Inês Barroso ◽

Eleanor Wheeler

Keyword(s):

Whole Genome Sequencing ◽

Association Analysis ◽

Genome Sequencing ◽

Rare Variant ◽

In Silico ◽

Whole Genome ◽

Rare Variant Association ◽

Functional Annotations ◽

Sequencing Studies


Haplocheck: Phylogeny-based Contamination Detection in Mitochondrial and Whole-Genome Sequencing Studies

10.1101/2020.05.06.080952 ◽

2020 ◽

Cited By ~ 1

Author(s):

Hansi Weissensteiner ◽

Lukas Forer ◽

Liane Fendt ◽

Azin Kheirkhah ◽

Antonio Salas ◽

...

Keyword(s):

Mitochondrial Genome ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Nuclear DNA ◽

Nuclear Genome ◽

Whole Genome ◽

Sequencing Studies ◽

Project Data ◽

The Impact

Abstract: Within-species contamination is a major issue in sequencing studies, especially for mitochondrial studies. Contamination can be detected by analysing the nuclear genome or by inspecting the heteroplasmic sites in the mitochondrial genome. Existing methods using the nuclear genome are computationally expensive, and no suitable tool for detecting contamination in large-scale mitochondrial datasets is available. Here we present haplocheck, a tool that requires only the mitochondrial genome to detect contamination in both mitochondrial and whole-genome sequencing studies. Haplocheck is able to distinguish between contaminated and real heteroplasmic sites using the mitochondrial phylogeny. By applying haplocheck to the 1000 Genomes Project data, we (1) show high concordance in contamination estimates between mitochondrial and nuclear DNA and (2) quantify the impact of mitochondrial copy numbers on the mitochondrial-based contamination results. Haplocheck complements leading nuclear DNA-based contamination tools, and can therefore be used as a proxy tool in nuclear genome studies. Haplocheck is available both as a command-line tool at https://github.com/genepi/haplocheck and as a cloud web service producing interactive reports that facilitate navigation through the phylogeny of contaminated samples.
