Detecting Disease Association With Rare Variants Using Weighted Entropy

Author(s):  
Yumei Li ◽  
Xinrong Xiang ◽  
Yang Xiang

Abstract Background: The rapid development of sequencing technology and the simultaneous availability of large quantities of sequence data provide an unprecedented opportunity for researchers to detect rare variants associated with disease. However, no existing statistical method has uniform power in all scenarios, because each is affected to some degree by nonfunctional variants and by variants with opposite effects. The present study focuses on identifying rare variants associated with disease. Results: We present a robust approach to identifying rare variants using weighted entropy theory. The approach takes the proportion of the minor allele among all k variants as its probability distribution, which reduces the noise incurred by non-causal variants, and uses a weight to strike a balance between deleterious rare variants and protective rare variants, which makes the method less sensitive to variants with opposite effects. Through simulation studies, we investigate the performance of our method for rare variant association analyses as well as for common variant association analyses and compare it with the Burden test and the SKAT-O test. The simulation studies show that the proposed method is valid and outperforms the two existing methods. Moreover, the proposed method is only slightly affected by non-causal variants and opposite-effect variants, and it shows high and stable power across the parameter settings considered. Conclusions: We conclude that the proposed method can be used effectively to detect rare variants associated with disease.
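The abstract above describes the ingredients of the approach (minor-allele proportions across the k variants used as a probability distribution, plus a per-variant weight) but not the exact test statistic. The following is a minimal sketch of those two ingredients only. The function names, the toy genotypes, and the equal weights are illustrative assumptions, and the weighted entropy form used here, H_w = -sum_j w_j p_j log p_j, is the textbook definition rather than a formula taken from the article.

```python
import math

def minor_allele_proportions(genotypes):
    """Share of all observed minor alleles that falls at each of the k variants.

    genotypes: one row per individual, each row holding minor-allele
    counts (0, 1 or 2) at the k variants. Hypothetical input layout.
    """
    k = len(genotypes[0])
    counts = [sum(person[j] for person in genotypes) for j in range(k)]
    total = sum(counts)
    if total == 0:
        return [0.0] * k
    return [c / total for c in counts]

def weighted_entropy(probs, weights):
    """Textbook weighted entropy; terms with p = 0 contribute nothing."""
    return -sum(w * p * math.log(p) for p, w in zip(probs, weights) if p > 0)

# Toy data: 4 cases and 4 controls genotyped at k = 3 rare variants.
cases = [[1, 0, 0], [1, 0, 1], [0, 0, 0], [2, 0, 0]]
controls = [[0, 1, 0], [0, 0, 1], [0, 0, 0], [0, 1, 0]]
weights = [1.0, 1.0, 1.0]  # assumed equal weights, not the article's choice

p_cases = minor_allele_proportions(cases)
p_controls = minor_allele_proportions(controls)
print(weighted_entropy(p_cases, weights), weighted_entropy(p_controls, weights))
```

How the case and control quantities are combined into a test statistic, and how the weights separate deleterious from protective variants, is left open here because the abstract does not say.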

2021 ◽  
Author(s):  
Megan Null ◽  
Josée Dupuis ◽  
Christopher R. Gignoux ◽  
Audrey E. Hendricks

Abstract Identification of rare variant associations is crucial to fully characterize the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirror the distribution of rare variants and the haplotype structure of real data. Additionally, importing real variant annotation enables in silico comparison of methods that focus on putative causal variants, such as rare variant association tests and polygenic scoring methods. Existing simulation methods are either unable to employ real variant annotation or severely under- or over-estimate the number of singletons and doubletons, reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real variant annotations. We highlight RAREsim’s utility across various genetic regions, sample sizes, ancestries, and variant classes.


2017 ◽  
Author(s):  
Xiaowei Zhan ◽  
Sai Chen ◽  
Yu Jiang ◽  
Mengzhen Liu ◽  
William G. Iacono ◽  
...  

Abstract Motivation: There is great interest in understanding the impact of rare variants in human diseases using large sequence datasets. In deep sequence datasets of >10,000 samples, ∼10% of the variant sites are observed to be multi-allelic. Many of the multi-allelic variants have been shown to be functional and disease relevant. Proper analysis of multi-allelic variants is critical to the success of a sequencing study, but existing methods do not properly handle multi-allelic variants and can produce highly misleading association results. Results: We propose novel methods to encode multi-allelic sites, conduct single variant and gene-level association analyses, and perform meta-analysis for multi-allelic variants. We evaluated these methods through extensive simulations and the study of a large meta-analysis of ∼18,000 samples on the cigarettes-per-day phenotype. We showed that our joint modeling approach provided an unbiased estimate of genetic effects, greatly improved the power of single variant association tests, and enhanced gene-level tests over existing approaches. Availability: Software packages implementing these methods are available at https://github.com/zhanxw/rvtests and http://genome.sph.umich.edu/wiki/RareMETAL.


2019 ◽  
Author(s):  
Elizabeth T. Cirulli ◽  
Simon White ◽  
Robert W. Read ◽  
Gai Elhanan ◽  
William J Metcalf ◽  
...  

Defining the effects that rare variants can have on human phenotypes is essential to advancing our understanding of human health and disease. Large-scale human genetic analyses have thus far focused on common variants, but the development of large cohorts of deeply phenotyped individuals with exome sequence data has now made comprehensive analyses of rare variants possible. We analyzed the effects of rare (MAF<0.1%) variants on 3,166 phenotypes in 40,468 exome-sequenced individuals from the UK Biobank and performed replication as well as meta-analyses with 1,067 phenotypes in 13,470 members of the Healthy Nevada Project (HNP) cohort who underwent Exome+ sequencing at Helix. Our analyses of non-benign coding and loss of function (LoF) variants identified 78 gene-based associations that passed our statistical significance threshold (p<5×10⁻⁹). These are associations in which carrying any rare coding or LoF variant in the gene is associated with an enrichment for a specific phenotype, as opposed to GWAS-based associations of strictly single variants. Importantly, our results do not suffer from the test statistic inflation that is often seen with rare variant analyses of biobank-scale data because of our rare variant-tailored methodology, which includes a step that optimizes the carrier frequency threshold for each phenotype based on prevalence. Of the 47 discovery associations whose phenotypes were represented in the replication cohort, 98% showed effects in the expected direction, and 45% attained formal replication significance (p<0.001). Six additional significant associations were identified in our meta-analysis of both cohorts.
Among the results, we confirm known associations of PCSK9 and APOB variation with LDL levels; we extend knowledge of variation in the TYRP1 gene, previously associated with blonde hair color only in Solomon Islanders, to blonde hair color in individuals of European ancestry; we show that PAPPA, a gene in which common variants had previously been associated with height via GWAS, contains rare variants that decrease height; and we make the novel discovery that STAB1 variation is associated with blood flow in the brain. Our results are available for download and interactive browsing in an app (https://ukb.research.helix.com). This comprehensive analysis of the effects of rare variants on human phenotypes marks one of the first steps in the next big phase of human genetics, where large, deeply phenotyped cohorts with next generation sequence data will elucidate the effects of rare variants.


Genes ◽  
2020 ◽  
Vol 11 (5) ◽  
pp. 586
Author(s):  
Yu Jiang ◽  
Sai Chen ◽  
Xingyan Wang ◽  
Mengzhen Liu ◽  
William G. Iacono ◽  
...  

There is great interest in understanding the impact of rare variants in human diseases using large sequence datasets. In deep sequence datasets of >10,000 samples, ~10% of the variant sites are observed to be multi-allelic. Many of the multi-allelic variants have been shown to be functional and disease-relevant. Proper analysis of multi-allelic variants is critical to the success of a sequencing study, but existing methods do not properly handle multi-allelic variants and can produce highly misleading association results. We discuss practical issues and methods to encode multi-allelic sites, conduct single-variant and gene-level association analyses, and perform meta-analysis for multi-allelic variants. We evaluated these methods through extensive simulations and the study of a large meta-analysis of ~18,000 samples on the cigarettes-per-day phenotype. We showed that our joint modeling approach provided an unbiased estimate of genetic effects, greatly improved the power of single-variant association tests among methods that can properly estimate allele effects, and enhanced gene-level tests over existing approaches. Software packages implementing these methods are available online.
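The abstract above mentions encoding multi-allelic sites without giving the scheme. One common way to do this, shown here purely as an illustrative assumption and not as the article's method, is to turn each VCF-style genotype call into one dosage column per alternative allele, so that the alleles can enter a joint model as separate covariates. The function name and the input layout are hypothetical.

```python
def encode_multiallelic(genotype_calls, n_alt):
    """Convert VCF-style calls (e.g. '0/1', '1/2', '2|2') at one site into
    per-sample dosage vectors, one entry per alternative allele.

    A call containing a missing allele ('.') yields a row of None values.
    """
    rows = []
    for call in genotype_calls:
        alleles = call.replace("|", "/").split("/")
        if "." in alleles:
            rows.append([None] * n_alt)
            continue
        row = [0] * n_alt
        for allele in alleles:
            index = int(allele)
            if index > 0:  # allele 0 is the reference allele
                row[index - 1] += 1
        rows.append(row)
    return rows

# A tri-allelic site (reference plus two alternative alleles), five samples.
calls = ["0/0", "0/1", "1/2", "2|2", "./."]
dosages = encode_multiallelic(calls, n_alt=2)
print(dosages)
```

Keeping one column per alternative allele avoids collapsing distinct alleles into a single "non-reference" count, which is one way misleading effect estimates can arise at such sites.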


2019 ◽  
Vol 101 ◽  
Author(s):  
Lifeng Liu ◽  
Pengfei Wang ◽  
Jingbo Meng ◽  
Lili Chen ◽  
Wensheng Zhu ◽  
...  

Abstract In recent years, there has been an increasing interest in detecting disease-related rare variants in sequencing studies. Numerous studies have shown that common variants can only explain a small proportion of the phenotypic variance for complex diseases. More and more evidence suggests that some of this missing heritability can be explained by rare variants. Considering the importance of rare variants, researchers have proposed a considerable number of methods for identifying the rare variants associated with complex diseases. Extensive research has been carried out on testing the association between rare variants and dichotomous, continuous or ordinal traits. So far, however, there has been little discussion about the case in which both genotypes and phenotypes are ordinal variables. This paper introduces a method based on the γ-statistic, called OV-RV, for examining disease-related rare variants when both genotypes and phenotypes are ordinal. At present, little is known about the asymptotic distribution of the γ-statistic when conducting association analyses for rare variants. One advantage of OV-RV is that it provides a robust estimation of the distribution of the γ-statistic by employing the permutation approach proposed by Fisher. We also perform extensive simulations to investigate the numerical performance of OV-RV under various model settings. The simulation results reveal that OV-RV is valid and efficient; namely, it controls the type I error approximately at the pre-specified significance level and achieves greater power at the same significance level. We also apply OV-RV for rare variant association studies of diastolic blood pressure.
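The abstract above combines a γ-statistic for two ordinal variables with a permutation approach. The sketch below assumes that the γ-statistic is Goodman and Kruskal's gamma (concordant minus discordant pairs, divided by their sum) and estimates a two-sided p-value by shuffling the phenotype labels. How OV-RV scores an ordinal genotype across several rare variants is not stated in the abstract, so the single ordinal genotype score used here, the function names, and the toy data are all assumptions.

```python
import random

def gamma_statistic(x, y):
    """Goodman-Kruskal gamma for two ordinal sequences; tied pairs are ignored."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    if concordant + discordant == 0:
        return 0.0
    return (concordant - discordant) / (concordant + discordant)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for gamma, shuffling y against x."""
    rng = random.Random(seed)
    observed = abs(gamma_statistic(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(gamma_statistic(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Toy data: ordinal genotype score (0, 1, 2) and ordinal phenotype grade (1..3).
genotype = [0, 0, 0, 1, 1, 2, 0, 1, 2, 0]
phenotype = [1, 1, 2, 2, 3, 3, 1, 2, 3, 2]
print(gamma_statistic(genotype, phenotype), permutation_p_value(genotype, phenotype))
```

The permutation step sidesteps the unknown asymptotic distribution mentioned in the abstract, at the cost of recomputing the statistic once per shuffle.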


2020 ◽  
Author(s):  
Amol C. Shetty ◽  
Jeffrey O’Connell ◽  
Braxton D. Mitchell ◽  
Timothy D. O’Connor ◽  
...  

Abstract Motivation: The global human population has experienced explosive growth from a few million to roughly 7 billion people in the last 10,000 years. Accompanying this growth has been the accumulation of rare variants that can inform our understanding of human evolutionary history. Common variants have primarily been used to infer the structure of the human population and relatedness between two individuals. However, with the increasing abundance of rare variants observed in large-scale projects, such as Trans-Omics for Precision Medicine (TOPMed), the use of rare variants to decipher cryptic relatedness and fine-scale population structure can be beneficial to the study of population demographics and association studies. Identity-by-descent (IBD) is an important framework used for identifying these relationships. IBD segments are broken down by recombination over time, such that longer shared haplotypes give strong evidence of recent relatedness while shorter shared haplotypes are indicative of more distant relationships. Current methods to identify IBD accurately detect only long segments (>2 cM) found in related individuals. Algorithm: We describe a metric that leverages rare variants shared between individuals to improve the detection of short IBD segments. We computed IBD segments using existing methods implemented in Refined IBD and enriched the signal using our metric, which facilitates the detection of short IBD segments (<2 cM) by explicitly incorporating rare variants. Results: To test our new metric, we simulated datasets involving populations with varying divergence time scales. We show that rare-variant IBD identifies shorter segments with greater confidence and enables the detection of older divergence between populations. As an example, we applied our metric to the Old-Order Amish cohort with known genealogies dating 14 generations back to validate its ability to detect genetic relatedness between distant relatives. This analysis shows that our method increases the accuracy of identifying shorter segments that in turn capture distant relationships. Conclusions: We describe a method to enrich the detection of short IBD segments using rare-variant sharing within IBD segments. Leveraging rare-variant sharing improves the information content of short IBD segments beyond what common variants alone provide. We validated the method in both simulated and empirical datasets. This method can benefit association analyses, IBD mapping analyses, and demographic inferences.


BMC Genetics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Yang Xiang ◽  
Xinrong Xiang ◽  
Yumei Li

Abstract Background: The rapid development of sequencing technology and the simultaneous availability of large quantities of sequence data have facilitated the identification of rare variants associated with quantitative traits. However, existing statistical methods depend on certain assumptions and thus lack uniform power. The present study focuses on mapping rare variants associated with quantitative traits. Results: We propose a two-stage strategy to identify rare variants of quantitative traits using a phenotype extreme selection design and the Kullback-Leibler distance, where the first stage is association analysis and the second stage is fine mapping. We present a statistic for the first stage and a linkage disequilibrium measure for the second stage. Theoretical analysis and simulation studies showed that (1) the power of the proposed statistic for association analysis increased with the stringency of the sample selection and was only slightly affected by non-causal variants and opposite-effect variants, (2) the statistic achieved higher power than three commonly used methods, and (3) the linkage disequilibrium measure for fine mapping was independent of the frequencies of non-causal variants and dependent only on the frequencies of causal variants. Conclusions: We conclude that the two-stage strategy can be used effectively to map rare variants associated with quantitative traits.


2015 ◽  
Author(s):  
Yi-Juan Hu ◽  
Peizhou Liao ◽  
Henry Richard Johnston ◽  
Andrew Allen ◽  
Glen Satten

Next-generation sequencing of DNA provides an unprecedented opportunity to discover rare genetic variants associated with complex diseases and traits. However, when testing the association between rare variants and traits of interest, the current practice of first calling underlying genotypes and then treating the called values as known is prone to false positive findings, especially when genotyping errors are systematically different between cases and controls. This happens whenever cases and controls are sequenced at different depths or on different platforms. In this article, we provide a likelihood-based approach to testing rare variant associations that directly models sequencing reads without calling genotypes. We consider the (weighted) burden test statistic, which is the (weighted) sum of the score statistic for assessing effects of individual variants on the trait of interest. Because variant locations are unknown, we develop a simple, computationally efficient screening algorithm to estimate the loci that are variants. Because our burden statistic may not have mean zero after screening, we develop a novel bootstrap procedure for assessing the significance of the burden statistic. We demonstrate through extensive simulation studies that the proposed tests are robust to a wide range of differential sequencing qualities between cases and controls, and are at least as powerful as the standard genotype calling approach when the latter controls type I error. An application to the UK10K data reveals novel rare variants in gene BTBD18 associated with childhood onset obesity. The relevant software is freely available.
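The abstract above defines the (weighted) burden statistic as the (weighted) sum of per-variant score statistics. The sketch below illustrates only that weighted-sum form, and it does so with called genotypes treated as known, which is the practice the article argues against; the article's own score statistics are built from a likelihood on the sequencing reads, which is not reproduced here. The per-variant score used, U_j = sum_i (y_i - mean(y)) g_ij, the function names, and the toy data are assumptions.

```python
def score_statistics(genotypes, trait):
    """Per-variant score U_j = sum_i (y_i - mean(y)) * g_ij.

    genotypes: one row per individual with allele counts at k variants.
    trait: one value per individual (0/1 case status or a quantitative trait).
    """
    n = len(trait)
    mean_y = sum(trait) / n
    k = len(genotypes[0])
    return [
        sum((trait[i] - mean_y) * genotypes[i][j] for i in range(n))
        for j in range(k)
    ]

def weighted_burden(genotypes, trait, weights):
    """Weighted sum of the per-variant score statistics."""
    scores = score_statistics(genotypes, trait)
    return sum(w * u for w, u in zip(weights, scores))

# Toy data: 4 individuals, 2 variants, binary trait.
genotypes = [[1, 0], [0, 1], [0, 0], [1, 0]]
trait = [1, 0, 0, 1]
print(score_statistics(genotypes, trait))
print(weighted_burden(genotypes, trait, weights=[1.0, 1.0]))
```

Because the scores carry a sign, variants with opposite effects can cancel in the sum, which is a known limitation of burden-type statistics.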


2018 ◽  
Author(s):  
Robert M. Porsch ◽  
Timothy Mak ◽  
Clara Tang ◽  
Pak C. Sham

Abstract A number of rare variant tests have been developed to explore the effect of low-frequency genetic variations on complex phenotypes. However, an often neglected aspect of these tests is the position of the genetic variations. Here we propose a way to assess differences in the spatial organization of rare variants by comparing their positional distributions between affected and unaffected subjects. To do so, we have formulated an adaptation of the well-known Kolmogorov-Smirnov (KS) test, called KS-Burden, which combines KS with a simple gene burden approach. The performance of our test was evaluated under a comprehensive simulation framework using real data and various scenarios. Our results show that the KS-Burden test is able to outperform the commonly used SKAT-O test, as well as others, in the presence of clusters of causal variants within a genomic region. Furthermore, our test is able to maintain competitive statistical power in scenarios unfavorable to its original assumptions. Hence, the KS-Burden test is a valuable alternative to existing tests and provides better statistical power in the presence of causal clusters within a gene.
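The positional comparison described above can be illustrated with the classical two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical distribution functions of variant positions in affected and unaffected subjects. This is a sketch of the KS component only; how KS-Burden combines it with a burden term is not given in the abstract and is not attempted here. The function name and the positions are made up for illustration.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    points = sorted(set(sample_a) | set(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    largest_gap = 0.0
    for x in points:
        cdf_a = sum(1 for value in sample_a if value <= x) / n_a
        cdf_b = sum(1 for value in sample_b if value <= x) / n_b
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

# Positions (in base pairs within a gene) of rare alleles carried by each group.
case_positions = [100, 120, 130, 900]      # clustered near the start
control_positions = [100, 400, 700, 900]   # spread along the gene
print(ks_statistic(case_positions, control_positions))
```

A large value indicates that rare alleles sit in different parts of the region in the two groups, which is the clustering signal a pure burden count cannot see.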

