Identification of putative causal loci in whole-genome sequencing data via knockoff statistics

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Zihuai He ◽  
Linxi Liu ◽  
Chen Wang ◽  
Yann Le Guen ◽  
Justin Lee ◽  
...  

Abstract: The analysis of whole-genome sequencing studies is challenging due to the large number of rare variants in noncoding regions and the lack of natural units for testing. We propose a statistical method to detect and localize rare and common risk variants in whole-genome sequencing studies based on a recently developed knockoff framework. It can (1) prioritize causal variants over associations due to linkage disequilibrium, thereby improving interpretability; (2) help distinguish the signal due to rare variants from shadow effects of significant common variants nearby; (3) integrate multiple knockoffs for improved power, stability, and reproducibility; and (4) flexibly incorporate state-of-the-art and future association tests to achieve the benefits proposed here. In applications to whole-genome sequencing data from the Alzheimer’s Disease Sequencing Project (ADSP) and COPDGene samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program, we show that our method, compared with conventional association tests, can lead to substantially more discoveries.
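As a rough illustration of how knockoff statistics turn association scores into a discovery set, the sketch below implements the generic single-knockoff "knockoff+" selection rule from the broader knockoff literature. It is not the authors' multiple-knockoff procedure. The function names (`knockoff_plus_threshold`, `knockoff_select`), the choice of W as the difference between original and knockoff scores, and all input numbers are assumptions made for illustration only.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Return the smallest t such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or infinity when no candidate threshold qualifies."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        est_false = 1 + sum(1 for x in w if x <= -t)  # knockoff "wins" plus one
        n_selected = sum(1 for x in w if x >= t)      # original "wins"
        if est_false / max(1, n_selected) <= q:
            return t
    return float("inf")


def knockoff_select(t_orig: Sequence[float],
                    t_knock: Sequence[float],
                    q: float) -> List[int]:
    """Select indices whose original score beats its knockoff by at least
    the knockoff+ threshold. W_j = T_j - T~_j is one common (assumed) choice."""
    w = [a - b for a, b in zip(t_orig, t_knock)]
    tau = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= tau]


# Hypothetical association scores for 8 variants (or windows) and their knockoffs.
scores_original = [9.0, 7.5, 6.0, 5.0, 4.0, 0.5, 1.0, 1.5]
scores_knockoff = [0.5, 0.5, 0.0, 1.0, 0.5, 1.0, 0.5, 1.0]

selected = knockoff_select(scores_original, scores_knockoff, q=0.25)
print(selected)
```

Because a knockoff copy mimics the correlation structure of the original variants without carrying any signal, negative W values act as an internal estimate of the number of false discoveries among the large positive ones. With a stricter target (for example `q=0.1`), this toy input yields no selections, since eight features are too few to certify such a low false discovery rate.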


2021 ◽  
Author(s):  
KE Joyce ◽  
E Onabanjo ◽  
S Brownlow ◽  
F Nur ◽  
KO Olupona ◽  
...  

Abstract: Possession of a clinical or molecular disease label alters the context in which life-course events operate, but rarely explains the phenotypic variability observed by clinicians. Whole genome sequencing of unselected endothelial vasculopathy patients demonstrated that more than a third had rare, likely deleterious variants in clinically relevant genes unrelated to their vasculopathy (1 in 10 within platelet genes; 1 in 8 within coagulation genes; and 1 in 4 within erythrocyte hemolytic genes). High erythrocyte membrane variant rates paralleled genomic damage and prevalence indices in the general population. In blinded analyses, patients with greater hemorrhagic severity that had been attributed solely to their vasculopathy had more deleterious variants in platelet (Spearman ρ=0.25, p=0.008) and coagulation (Spearman ρ=0.21, p=0.024) genes. We conclude that rare diseases can provide insights for medicine beyond their primary pathophysiology, and propose a framework based on rare variants to inform interpretative approaches to accelerate clinical impact from whole genome sequencing.


2020 ◽  
Author(s):  
Yingxi Yang ◽  
Yuchen Yang ◽  
Le Huang ◽  
Jai G. Broome ◽  
Adolfo Correa ◽  
...  

Abstract: With advances in whole genome sequencing (WGS) technology, multiple statistical methods for aggregate association testing have been developed. Many common approaches aggregate variants in a genomic window of fixed or varying size and do not rely on existing knowledge to define appropriate test units. As a result, most identified regions are not clearly linked to genes, which limits biological understanding. Functional information from new technologies (such as Hi-C and its derivatives), which can help link enhancers to the genes they affect, can be leveraged to predefine variant sets for aggregate testing in WGS. Therefore, in this paper we propose the eSCAN (Scan the Enhancers) method for genome-wide assessment of enhancer regions in sequencing studies, combining the dynamic window selection of SCANG with increased incorporation of genomic annotation. eSCAN searches biologically meaningful windows, increasing power and aiding biological interpretation, as demonstrated by simulation studies under a wide range of scenarios. We also apply eSCAN to association analysis of blood cell traits using TOPMed WGS data from the Women’s Health Initiative (WHI) and the Jackson Heart Study (JHS). Results from this real data example show that eSCAN is able to capture more significant signals, and that these signals are shorter and drive the association of larger regions detected by other methods.
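To make the idea of predefined variant sets concrete, the sketch below collapses per-variant genotypes into per-sample burden counts for a few annotated windows. This is a plain burden aggregation chosen for illustration, not the eSCAN test itself. The function name `burden_scores`, the window names, the coordinates, and the genotype values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def burden_scores(genotypes: Sequence[Sequence[int]],
                  positions: Sequence[int],
                  windows: Dict[str, Tuple[int, int]]) -> Dict[str, List[int]]:
    """For each named window (inclusive start/end), sum each sample's
    minor-allele counts over the variants that fall inside the window.

    genotypes[i][k] is sample i's minor-allele count (0, 1 or 2) at variant k;
    positions[k] is the coordinate of variant k on a single chromosome."""
    scores: Dict[str, List[int]] = {}
    for name, (start, end) in windows.items():
        idx = [k for k, pos in enumerate(positions) if start <= pos <= end]
        scores[name] = [sum(sample[k] for k in idx) for sample in genotypes]
    return scores


# Hypothetical data: 3 samples, 5 variants, 2 enhancer-like windows.
positions = [100, 150, 420, 480, 900]
genotypes = [
    [0, 1, 0, 0, 2],
    [1, 1, 0, 2, 0],
    [0, 0, 0, 0, 0],
]
windows = {"enhancer_A": (90, 200), "enhancer_B": (400, 500)}

scores = burden_scores(genotypes, positions, windows)
print(scores)
```

Each window's score vector could then be tested against a trait with any regression-based association test. The variant at position 900 falls outside both windows and is ignored, which mirrors how annotation-defined test units restrict attention to regions with a plausible link to a gene.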

