A framework for detecting noncoding rare variant associations of large-scale whole-genome sequencing studies

2021
Author(s): Zilin Li, Xihao Li, Hufeng Zhou, Sheila M Gaynor, Margaret Sunitha Selvaraj, ...

Large-scale whole-genome sequencing studies have enabled analysis of associations between noncoding rare variants (RVs) and complex human traits. Variant set analysis is a powerful approach to studying RV associations, and a key component of it is constructing the RV sets to be analyzed. However, existing methods have limited ability to define analysis units in the noncoding genome. Furthermore, robust pipelines for comprehensive and scalable noncoding RV association analysis are lacking. Here we propose a computationally efficient noncoding RV association-detection framework that uses STAAR (variant-set test for association using annotation information) to group noncoding variants in gene-centric analysis based on functional categories. We also propose SCANG (scan the genome)-STAAR, which uses dynamic window sizes and incorporates multiple functional annotations in a non-gene-centric analysis. We further develop STAARpipeline to perform flexible noncoding RV association analysis, including gene-centric analysis as well as fixed-window-based and dynamic-window-based non-gene-centric analysis. We apply STAARpipeline to identify noncoding RV sets associated with four quantitative lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several noncoding RV associations in an additional 9,123 TOPMed samples.
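To make the idea of a fixed-window analysis unit concrete, the following is a minimal sketch of how rare variants might be grouped into overlapping fixed-size windows before a set-based test is applied. It is not STAARpipeline's API: the function name `fixed_windows`, the 2,000 bp window, the 1,000 bp step, and the minimum of two variants per set are all illustrative assumptions that do not come from the abstract.

```python
from bisect import bisect_left


def fixed_windows(positions, window_size=2000, step=1000, min_variants=2):
    """Group variant positions into overlapping fixed-size windows.

    Hypothetical helper: window_size, step and min_variants are
    illustrative defaults, not values taken from the abstract.
    Each window is the half-open interval [start, start + window_size).
    """
    positions = sorted(positions)
    if not positions:
        return []

    windows = []
    # Align the first window to a multiple of the step size.
    start = (positions[0] // step) * step
    last = positions[-1]
    while start <= last:
        end = start + window_size
        lo = bisect_left(positions, start)  # first variant at or after start
        hi = bisect_left(positions, end)    # first variant at or after end
        if hi - lo >= min_variants:
            windows.append((start, end, positions[lo:hi]))
        start += step
    return windows


if __name__ == "__main__":
    # Toy base-pair positions of rare variants on one chromosome.
    for start, end, members in fixed_windows([100, 900, 1500, 2100, 5200]):
        print(start, end, members)
```

Each returned set would then be passed to a variant-set test. A dynamic-window approach differs in that the window sizes are chosen from the data rather than fixed in advance.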

2019, Vol 15, pp. P1312-P1312
Author(s): Badri N. Vardarajan, James Jaworski, Gary W. Beecham, Sandra Barral, Dolly Reyes-Dumeyer, ...

2019, Vol 104 (2), pp. 260-274
Author(s): Han Chen, Jennifer E. Huffman, Jennifer A. Brody, Chaolong Wang, Seunggeun Lee, ...

2015, Vol 16 (1)
Author(s): Kristopher A. Standish, Tristan M. Carland, Glenn K. Lockwood, Wayne Pfeiffer, Mahidhar Tatineni, ...

2021, Vol 12 (1)
Author(s): Zihuai He, Linxi Liu, Chen Wang, Yann Le Guen, Justin Lee, ...

Abstract: The analysis of whole-genome sequencing studies is challenging due to the large number of rare variants in noncoding regions and the lack of natural units for testing. We propose a statistical method to detect and localize rare and common risk variants in whole-genome sequencing studies based on a recently developed knockoff framework. It can (1) prioritize causal variants over associations due to linkage disequilibrium, thereby improving interpretability; (2) help distinguish the signal due to rare variants from shadow effects of significant common variants nearby; (3) integrate multiple knockoffs for improved power, stability, and reproducibility; and (4) flexibly incorporate state-of-the-art and future association tests to achieve the benefits proposed here. In applications to whole-genome sequencing data from the Alzheimer’s Disease Sequencing Project (ADSP) and COPDGene samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program, we show that our method, compared with conventional association tests, can lead to substantially more discoveries.
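As a rough illustration of how a knockoff-based selection step works, the sketch below applies the standard single-knockoff "knockoff+" threshold to a vector of feature statistics. It is not the authors' multiple-knockoff procedure. It assumes that a statistic W_j has already been computed for each variant or window, with large positive values meaning the original feature looks more important than its knockoff copy. The function names and the toy numbers are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.

    Generic knockoff+ rule, used here as an assumption about the
    selection step. Returns np.inf when no threshold meets the target.
    """
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = np.sort(np.abs(w[w != 0]))
    for t in candidates:
        estimated_fdp = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if estimated_fdp <= q:
            return float(t)
    return np.inf


def select_features(w, q):
    """Indices of features whose statistic reaches the threshold."""
    w = np.asarray(w, dtype=float)
    return np.flatnonzero(w >= knockoff_plus_threshold(w, q))


if __name__ == "__main__":
    # Toy statistics (made up for illustration only).
    w = [5.0, 4.0, 3.0, 2.5, 2.0, 1.5, -1.0, 0.5, 3.5, 4.5]
    print(knockoff_plus_threshold(w, q=0.2))
    print(select_features(w, q=0.2))
```

Negative statistics act as a built-in estimate of false positives: the more features whose knockoff copy outscores the original at a given threshold, the higher the estimated false discovery proportion at that threshold.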


Author(s): Hansi Weissensteiner, Lukas Forer, Liane Fendt, Azin Kheirkhah, Antonio Salas, ...

Abstract: Within-species contamination is a major issue in sequencing studies, especially for mitochondrial studies. Contamination can be detected by analysing the nuclear genome or by inspecting the heteroplasmic sites in the mitochondrial genome. Existing methods using the nuclear genome are computationally expensive, and no suitable tool for detecting contamination in large-scale mitochondrial datasets is available. Here we present haplocheck, a tool that requires only the mitochondrial genome to detect contamination in both mitochondrial and whole-genome sequencing studies. Haplocheck is able to distinguish between contaminated and real heteroplasmic sites using the mitochondrial phylogeny. By applying haplocheck to the 1000 Genomes Project data, we (1) show high concordance in contamination estimates between mitochondrial and nuclear DNA and (2) quantify the impact of mitochondrial copy numbers on the mitochondrial-based contamination results. Haplocheck complements leading nuclear-DNA-based contamination tools, and can therefore be used as a proxy tool in nuclear genome studies. Haplocheck is available both as a command-line tool at https://github.com/genepi/haplocheck and as a cloud web-service producing interactive reports that facilitate navigation through the phylogeny of contaminated samples.


2016, Vol 94 (suppl_5), pp. 146-146
Author(s): D. M. Bickhart, L. Xu, J. L. Hutchison, J. B. Cole, D. J. Null, ...
