Utilizing mutual information for detecting rare and common variants associated with a categorical trait

PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2139 ◽  
Author(s):  
Leiming Sun ◽  
Chan Wang ◽  
Yue-Qing Hu

Background. Genome-wide association studies have succeeded in detecting novel common variants associated with complex diseases. Owing to rapid advances in next-generation sequencing technology, large amounts of sequencing data are being generated, which offers great opportunities to identify rare variants that could explain a larger proportion of the missing heritability. Many effective and powerful methods have been proposed, but they are usually limited to continuous, dichotomous, or ordinal traits. Traits with nominal categorical features are common in complex diseases, especially in mental disorders, which motivates incorporating the characteristics of a categorical trait into association studies of rare and common variants. Methods. We construct two simple and intuitive nonparametric tests, MIT and aMIT, based on mutual information, for detecting association between genetic variants in a gene or region and a categorical trait. MIT and aMIT gauge the differences among the distributions of rare and common variants across a region given each value of the categorical trait. If there is little association between the variants and the trait, MIT and aMIT are approximately zero; the larger the differences among the distributions, the larger the values of MIT and aMIT. Therefore, MIT and aMIT have the potential to detect functional variants. Results. We checked the validity of the proposed statistics and compared them with existing ones through extensive simulation studies covering varied numbers of rare causal, rare non-causal, common causal, and common non-causal variants, deleterious and protective effects, various minor allele frequencies, and different levels of linkage disequilibrium.
The results show that our methods have higher statistical power than conventional ones, including the likelihood-based score test, in most cases where (1) there are multiple genetic variants in a gene or region; (2) both protective and deleterious variants are present; (3) both rare and common variants exist; and (4) more than half of the variants are neutral. The proposed tests were applied to data from the Collaborative Study on the Genetics of Alcoholism, where they performed competently. Discussion. As a complement to existing methods, which mainly focus on quantitative traits, this study provides the nonparametric tests MIT and aMIT for detecting variants associated with a categorical trait. In future work, we plan to investigate the association between rare variants and multiple categorical traits.
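To make the mutual-information idea concrete, here is a minimal plug-in sketch in Python. It is not the authors' MIT or aMIT statistic. It assumes, for illustration only, that each individual's genotypes in the region are summarized as a tuple of minor-allele counts (a hypothetical encoding), estimates I(trait; genotype pattern) from empirical frequencies, and attaches a permutation p-value. The function names `mutual_information` and `permutation_pvalue` are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Hashable, Sequence, Tuple


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in nats from paired observations.

    Zero when the empirical joint distribution factorizes (no association);
    larger when the distribution of X differs across the values of Y.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), nxy in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with counts
        mi += (nxy / n) * math.log(nxy * n / (px[x] * py[y]))
    return mi


def permutation_pvalue(genotypes: Sequence[Hashable], trait: Sequence[Hashable],
                       n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Observed MI and a permutation p-value obtained by shuffling trait labels."""
    rng = random.Random(seed)
    observed = mutual_information(genotypes, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point ties count as "at least as extreme"
        if mutual_information(genotypes, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 3-variant genotype patterns and a 3-level trait
    genotypes = [(0, 0, 1), (1, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 1, 0)] * 5
    trait = ["A", "B", "A", "C", "B", "C"] * 5
    mi, p = permutation_pvalue(genotypes, trait)
    print(f"MI = {mi:.3f} nats, permutation p = {p:.3f}")
```

Treating each multi-locus pattern as one category keeps the sketch short, but with many variants most patterns become unique, so a practical statistic would need some summarization of the region; how MIT and aMIT handle this is described in the paper, not here.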

2014 ◽  
Vol 99 (11) ◽  
pp. E2400-E2411 ◽  
Author(s):  
Seung Hun Lee ◽  
Moo Il Kang ◽  
Seong Hee Ahn ◽  
Kyeong-Hye Lim ◽  
Gun Eui Lee ◽  
...  

Context: Osteoporotic fracture risk is highly heritable, but genome-wide association studies have explained only a small proportion of the heritability to date. Genetic data may improve prediction of fracture risk in osteopenic subjects and assist early intervention and management. Objective: To detect common and rare variants in coding and regulatory regions related to osteoporosis-related traits, and to investigate whether genetic profiling improves the prediction of fracture risk. Design and Setting: This cross-sectional study was conducted in three clinical units in Korea. Participants: Postmenopausal women with extreme phenotypes (n = 982) were used for the discovery set, and 3895 participants were used for the replication set. Main Outcome Measure: We performed targeted resequencing of 198 genes. Genetic risk scores from common variants (GRS-C) and from common and rare variants (GRS-T) were calculated. Results: Nineteen common variants in 17 genes (of the discovered 34 functional variants in 26 genes) and 31 rare variants in five genes (of the discovered 87 functional variants in 15 genes) were associated with one or more osteoporosis-related traits. Accuracy of fracture risk classification was improved in the osteopenic patients by adding GRS-C to fracture risk assessment models (6.8%; P < .001) and was further improved by adding GRS-T (9.6%; P < .001). GRS-C improved classification accuracy for vertebral and nonvertebral fractures by 7.3% (P = .005) and 3.0% (P = .091), and GRS-T further improved accuracy by 10.2% (P < .001) and 4.9% (P = .008), respectively. Conclusions: Our results suggest that both common and rare functional variants may contribute to osteoporotic fracture and that adding genetic profiling data to current models could improve the prediction of fracture risk in an osteopenic individual.
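The abstract does not give the formula for GRS-C or GRS-T. A common construction, assumed here, is a weighted sum of risk-allele counts, with per-allele weights (for example log odds ratios) estimated in a discovery set. The sketch below computes a common-variant-only score and a common-plus-rare score from the same panel; the `Variant` type, the variant IDs, the weights, and the rare-variant cutoff of MAF < 0.01 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Variant:
    variant_id: str   # hypothetical identifier
    maf: float        # minor allele frequency in the discovery set
    weight: float     # assumed per-allele effect, e.g. a log odds ratio


def genetic_risk_score(dosages: Dict[str, int], variants: Iterable[Variant],
                       include_rare: bool, rare_maf: float = 0.01) -> float:
    """Weighted sum of risk-allele counts (0, 1 or 2) over the selected variants.

    include_rare=False mimics a common-variant score (GRS-C);
    include_rare=True mimics a common-plus-rare score (GRS-T).
    Missing genotypes count as 0 risk alleles (a simplifying assumption).
    """
    score = 0.0
    for v in variants:
        if v.maf < rare_maf and not include_rare:
            continue  # skip rare variants for the common-only score
        score += v.weight * dosages.get(v.variant_id, 0)
    return score


if __name__ == "__main__":
    panel = [Variant("c1", 0.30, 0.10), Variant("c2", 0.15, 0.20),
             Variant("r1", 0.002, 0.90)]
    person = {"c1": 2, "c2": 1, "r1": 1}
    print(genetic_risk_score(person, panel, include_rare=False))  # 2*0.10 + 1*0.20, about 0.4
    print(genetic_risk_score(person, panel, include_rare=True))   # adds 1*0.90, about 1.3
```

In a prediction setting, such a score would then be added as an extra predictor to an existing fracture risk model, which is the kind of comparison the abstract reports.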


2020 ◽  
Vol 66 (1) ◽  
pp. 11-23
Author(s):  
Yukihide Momozawa ◽  
Keijiro Mizukami

Abstract Genome-wide association studies have identified more than 10,000 genetic variants associated with various phenotypes and diseases. Although the majority are common variants, rare variants with minor allele frequencies above 0.1% have been investigated by imputation and by using disease-specific custom SNP arrays. Rare variants, revealed mainly by sequencing analysis, have played unique roles in the genetics of complex human diseases owing to features that distinguish them from common variants. These roles include providing hypothesis-free evidence for gene causality, precise targets for functional analyses aimed at understanding disease mechanisms, new favorable targets for drug development, and genetic markers of high disease risk for personalized medicine. As whole-genome sequencing identifies more rare variants, the roles attributed to them will also expand. However, better estimation of the functional impact of rare variants across the whole genome is needed to enhance their contribution to improving human health.


2019 ◽  
Vol 29 (2) ◽  
pp. 589-602
Author(s):  
Chan Wang ◽  
Shufang Deng ◽  
Leiming Sun ◽  
Liming Li ◽  
Yue-Qing Hu

Genome-wide association studies aim to identify common or rare variants associated with common diseases and to explain more of their heritability. It is well known that common diseases are influenced by multiple single nucleotide polymorphisms (SNPs) that are usually correlated in location or function. To detect association signals powerfully, it is highly desirable to take into account the correlations, or linkage disequilibrium (LD), among multiple SNPs when testing for association. In this article, we propose a test, SLIDE, that captures the difference in average multi-locus genotypes between cases and controls, and we derive its variance–covariance matrix under the retrospective design. This matrix is composed of the pairwise LD between SNPs, so SLIDE can borrow strength from an external database of the population of interest, containing a few thousand to hundreds of thousands of individuals, to improve the power to detect association. Extensive simulations show that SLIDE is clearly superior to existing methods, especially in situations involving both common and rare variants and both protective and deleterious variants. Furthermore, the efficiency of the proposed method is demonstrated in an application to data from the Wellcome Trust Case Control Consortium.
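The abstract does not spell out SLIDE's formula. To illustrate the general idea it describes (contrast the mean multi-locus genotype vectors of cases and controls, and scale the contrast by a covariance matrix built from pairwise LD that could come from an external reference panel), here is a generic Wald-type quadratic form. This is an assumption-laden sketch, not SLIDE itself: it assumes additive 0/1/2 coding and Hardy-Weinberg equilibrium, so that per-SNP variances are 2p(1-p) and the correlation between allele counts equals the allelic LD r. The function name `ld_quadratic_statistic` is hypothetical.

```python
from typing import Tuple

import numpy as np


def ld_quadratic_statistic(case_geno: np.ndarray, control_geno: np.ndarray,
                           ld_r: np.ndarray, maf: np.ndarray) -> Tuple[float, int]:
    """Quadratic-form test of mean genotype differences using an external LD matrix.

    case_geno, control_geno: (n_individuals, m_snps) arrays of 0/1/2 allele counts.
    ld_r: (m, m) SNP correlation matrix, e.g. estimated from a reference panel.
    maf: (m,) allele frequencies used for the per-SNP variances 2p(1-p) (HWE assumed).
    Returns (T, m); under the null, T is approximately chi-square with m df.
    """
    n1, n0 = case_geno.shape[0], control_geno.shape[0]
    d = case_geno.mean(axis=0) - control_geno.mean(axis=0)
    sd = np.sqrt(2.0 * maf * (1.0 - maf))
    sigma = ld_r * np.outer(sd, sd)          # per-individual genotype covariance
    var_d = sigma * (1.0 / n1 + 1.0 / n0)    # covariance of the mean difference
    # Solve rather than invert explicitly; a singular LD matrix would need pinv.
    t_stat = float(d @ np.linalg.solve(var_d, d))
    return t_stat, int(d.size)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    maf = np.array([0.3, 0.05])
    ld = np.eye(2)  # the simulated SNPs below are independent
    cases = rng.binomial(2, maf, size=(500, 2))
    controls = rng.binomial(2, maf, size=(500, 2))
    print(ld_quadratic_statistic(cases, controls, ld, maf))
```

A p-value would come from the chi-square survival function with m degrees of freedom (for example `scipy.stats.chi2.sf`). Because only `ld_r` and `maf` enter the covariance, they can be taken from a large external panel, which is the motivation the abstract gives.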


2017 ◽  
Vol 37 (suppl_1) ◽  
Author(s):  
Jacqueline S Dron ◽  
Jian Wang ◽  
Cécile Low-Kam ◽  
Sumeet A Khetarpal ◽  
John F Robinson ◽  
...  

Rationale: Although HDL-C levels are known to have a complex genetic basis, most studies have focused solely on identifying rare variants with large phenotypic effects to explain extreme HDL-C phenotypes. Objective: Here we concurrently evaluate the contribution of both rare and common genetic variants, as well as large-scale copy number variations (CNVs), towards extreme HDL-C concentrations. Methods: In clinically ascertained patients with low (N = 136) and high (N = 119) HDL-C profiles, we applied our targeted next-generation sequencing panel (LipidSeq™) to sequence genes involved in HDL metabolism, which were subsequently screened for rare variants and CNVs. We also developed a novel polygenic trait score (PTS) to assess patients’ genetic accumulations of common variants that have been shown by genome-wide association studies to associate primarily with HDL-C levels. Two additional cohorts of patients with extremely low and high HDL-C (total N = 1,746 and N = 1,139, respectively) were used for PTS validation. Results: In the discovery cohort, 32.4% of low HDL-C patients carried rare variants or CNVs in primary (ABCA1, APOA1, LCAT) and secondary (LPL, LMF1, GPD1, APOE) HDL-C–altering genes. Additionally, 13.4% of high HDL-C patients carried rare variants or CNVs in primary (SCARB1, CETP, LIPC, LIPG) and secondary (APOC3, ANGPTL4) HDL-C–altering genes. For polygenic effects, patients with abnormal HDL-C profiles but without rare variants or CNVs were ~2-fold more likely to have an extreme PTS compared to normolipidemic individuals, indicating an increased frequency of common HDL-C–associated variants in these patients. Similar results in the two validation cohorts demonstrate that this novel PTS successfully quantifies common variant accumulation, further characterizing the polygenic basis for extreme HDL-C phenotypes.
Conclusions: Patients with extreme HDL-C levels have various combinations of rare variants, common variants, or CNVs driving their phenotypes. Fully characterizing the genetic basis of HDL-C levels must extend to encompass multiple types of genetic determinants—not just rare variants—to further our understanding of this complex, controversial quantitative trait.
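The abstract does not define how the PTS or an "extreme" PTS is computed. The sketch below assumes that the PTS is a weighted count of HDL-C-associated alleles and that "extreme" means above a percentile cutoff of the normolipidemic reference distribution (the 90th percentile here is an arbitrary assumption). It then reports how many times more often patients exceed that cutoff than reference individuals, which is the kind of "~2-fold" comparison quoted above. All names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def polygenic_trait_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted count of trait-associated alleles; SNPs absent from dosages count as 0."""
    return sum(w * dosages.get(snp, 0) for snp, w in weights.items())


def extreme_fraction(scores: Sequence[float], cutoff: float) -> float:
    """Fraction of scores strictly above the cutoff."""
    return sum(s > cutoff for s in scores) / len(scores)


def enrichment_of_extreme_scores(patient_scores: Sequence[float],
                                 reference_scores: Sequence[float],
                                 upper_percentile: int = 90) -> float:
    """Ratio of the fraction of patients to the fraction of reference individuals
    above the reference distribution's `upper_percentile`-th percentile
    (an assumed definition of an extreme PTS)."""
    cut_points: List[float] = statistics.quantiles(reference_scores, n=100)
    cutoff = cut_points[upper_percentile - 1]
    ref = extreme_fraction(reference_scores, cutoff)
    if ref == 0:
        raise ValueError("no reference individuals above the cutoff")
    return extreme_fraction(patient_scores, cutoff) / ref


if __name__ == "__main__":
    # Hypothetical scores, not data from the study
    reference = [float(x) for x in range(100)]
    patients = [95.0] * 20 + [10.0] * 80
    print(enrichment_of_extreme_scores(patients, reference))
```

A two-sided definition (below a low percentile or above a high one) would follow the same pattern; which tail counts as extreme depends on whether the patients have low or high HDL-C.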


2011 ◽  
Vol 26 (S2) ◽  
pp. 1346-1346
Author(s):  
D. Benmessaoud ◽  
A.-M. Lepagnol-Bestel ◽  
M. Delepine ◽  
J. Hager ◽  
J.-M. Moalic ◽  
...  

Genome-wide association studies (GWAS) of schizophrenia (SZ) patients have identified common variants in ten genes, including SMARCA2 (Koga et al., HMG, 2009). We found that the SZ-GWAS genes are part of an interacting network centered on SMARCA2 (Loe-Mie et al., HMG, 2010). Furthermore, SMARCA2 was found to be disrupted in SZ (Walsh et al., Science, 2008). SMARCA2 encodes the ATPase (BRM) of the SWI/SNF chromatin remodeling complex, which lies at the interface between the genome and environmental adaptation. Taking advantage of an Algerian trio cohort of one hundred SZ patients (Benmessaoud et al., BMC Psychiatry, 2008), we replicated the association of SNP rs2296212, which is located in exon 33, was previously reported as associated in the Koga study, and results in a D1546E amino acid change in the SMARCA2 protein. We studied SMARCA2 codons and found that exon 33 displays a signature of positive evolution in the primate lineage. Our working hypothesis is that coding regions displaying positive selection are targets of novel rare variants. To address this question, we sequenced two exons displaying positive evolution and one exon without evidence of positive evolution. We found (i) that rare variants are significantly in excess in SZ patients compared with their parents (p = 0.038, Fisher test) and (ii) a higher proportion of rare variants in the primate-accelerated exons than in the non-evolutionary exon in SZ patients (p = 0.032, Fisher test). SMARCA2 exon sequencing and whole-exome sequencing of patients harboring the common variant rs2296212 are in progress. Altogether, these results are expected to give new insights into the genetic architecture of SZ.
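Both comparisons above are reported as Fisher exact tests on counts. For readers who want to run that kind of comparison, here is a dependency-free sketch of the two-sided Fisher exact test for a 2×2 table, built on the hypergeometric distribution and the common convention of summing the probabilities of all tables (with the same margins) that are no more likely than the observed one. The row and column labels in the comments and the counts in the usage example are hypothetical, not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Example layout (hypothetical): rows = patients / parents,
    columns = rare-variant carriers / non-carriers.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Relative tolerance so tables tied in probability with the observed one are counted
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, tail)


if __name__ == "__main__":
    # Hypothetical counts: 12 of 100 patients vs 4 of 200 parents carry a rare variant
    print(fisher_exact_two_sided(12, 88, 4, 196))
```

The same two-sided convention is used by `scipy.stats.fisher_exact`, which is the usual choice when SciPy is available.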


2011 ◽  
Vol 26 (S2) ◽  
pp. 2007-2007
Author(s):  
J. Mendlewicz

The lifetime prevalence of mood disorders is estimated at around 20% in the general population, making them a leading cause of disability worldwide and a major public health issue.¹ The etiology of mood disorders is still unknown, but their various phenotypes are believed to be caused by multiple genetic variants interacting in complex ways with environmental vulnerability factors. Therefore, the identification of biomarkers and environmental markers is crucial to improving our understanding and diagnosis as well as our treatments. Despite intensive and costly research over more than two decades to unravel susceptibility genes, and although pathophysiological pathways of interest have been recognized, results have so far been inconsistent, and not a single genetic biomarker of depression has been identified and replicated. More recent systematic genome-wide association studies (GWAS) in large samples have reported weak associations for some genetic variants, but multiple rare variants may together confer only part of the susceptibility to depression. Structural variations, such as copy-number variations (CNVs), may also be promising. Methodological issues and limitations will also be critically discussed in light of the complexity of gene-environment interactions (epigenetic modulation of gene expression)² and in relation to future prospects for individualized pharmacotherapy of depressive illness.


2019 ◽  
Vol 39 (10) ◽  
Author(s):  
Tae-Joon Park ◽  
Heun-Sik Lee ◽  
Young Jin Kim ◽  
Bong-Jo Kim

Abstract Metabolome-genome-wide association studies (mGWASs) are useful for understanding the genetic regulation of metabolites in complex diseases, including type 2 diabetes (T2D). Numerous genetic variants associated with T2D-related metabolites have been identified in previous mGWASs; however, these analyses seem to have difficulty detecting genetic variants with functional effects. An exome array focused on potentially functional variants is an alternative platform for gaining insight into the genetics of biochemical conversion processes. In the present study, we performed an mGWAS using 27,140 non-synonymous variants included in the Illumina HumanExome BeadChip and nine T2D-related metabolites identified by a targeted metabolomics approach to evaluate 2,338 Korean individuals from the Korea Association REsource (KARE) cohort. A linear regression analysis with age, sex, BMI, and T2D status as covariates was performed to identify novel non-synonymous variants associated with T2D-related metabolites. We found significant associations between glycine and CPS1 (rs1047883) and between PC ae C36:0 and CYP4F2 (rs2108622) (P < 2.05 × 10⁻⁷, after Bonferroni correction for multiple testing). Of the two significantly associated variants, rs1047883 was newly identified, whereas rs2108622 had previously been reported to be associated with T2D-related traits. These findings expand our understanding of the genetic determinants of T2D-related metabolites and provide a basis for further functional validation.
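The reported threshold is consistent with a Bonferroni correction of α = 0.05 over all variant-metabolite pairs: 0.05 / (27,140 × 9) = 0.05 / 244,260 ≈ 2.05 × 10⁻⁷. The abstract does not state this derivation; it is an inference from the numbers given. The sketch below shows that arithmetic together with a per-variant ordinary least squares fit of a metabolite level on genotype, adjusting for covariates such as age, sex, BMI, and T2D status. It is illustrative, not the authors' pipeline, and the function names are hypothetical.

```python
import numpy as np


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def genotype_t_statistic(y, genotype, covariates):
    """OLS fit of y ~ 1 + genotype + covariates; returns (beta, t) for the genotype term.

    y: (n,) metabolite levels; genotype: (n,) 0/1/2 allele counts;
    covariates: (n, k) matrix, e.g. columns for age, sex, BMI and T2D status.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y), genotype, covariates])
    coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    if rank < X.shape[1]:
        raise ValueError("design matrix is rank deficient")
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]            # residual degrees of freedom
    sigma2 = resid @ resid / dof             # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)    # covariance of the coefficients
    return float(coef[1]), float(coef[1] / np.sqrt(cov[1, 1]))


if __name__ == "__main__":
    print(f"{bonferroni_threshold(0.05, 27140 * 9):.2e}")
```

The p-value for the genotype term would come from a t distribution with n − k − 2 degrees of freedom (for example `scipy.stats.t.sf`) and is then compared with the Bonferroni threshold.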


2018 ◽  
Vol 100 ◽  
Author(s):  
Lili Chen ◽
Yong Wang ◽
Yajing Zhou

Summary Pleiotropy, the effect of one variant on multiple traits, is widespread in complex diseases. Joint analysis of multiple traits can improve the statistical power to detect genetic variants and uncover the underlying genetic mechanisms. Currently, many existing methods target a single common variant or only rare variants, yet increasing evidence shows that complex diseases are caused by both common and rare variants. Here we propose a region-based method, built on a variable reduction method (abbreviated as MULVR), to test the association of both rare and common variants with multiple traits. Because MULVR may lose power in the presence of noise traits, we also propose MULVR-O, which uses MULVR to jointly analyse the optimal number of traits associated with the genetic variants, guarding against the effect of noise traits. Extensive simulation studies show that the proposed method (MULVR-O) can be applied not only to multiple quantitative traits but also to qualitative traits, and that it is more powerful than several competing methods in most scenarios. An application to two genes (SHBG and CHRM3) and two phenotypes (systolic and diastolic blood pressure) from the GAW19 dataset illustrates that the proposed methods (MULVR and MULVR-O) are feasible and efficient region-based methods.


2012 ◽  
Vol 6 ◽  
pp. BBI.S8852 ◽  
Author(s):  
Ao Yuan ◽  
Guanjie Chen ◽  
Yanxun Zhou ◽  
Amy Bentley ◽  
Charles Rotimi

Genome-wide association studies (GWAS) have been successful in detecting common genetic variants underlying common traits and diseases. Despite these successes, the percentage of trait variance explained by GWAS signals has been, at best, modest, leaving the so-called “missing heritability” unexplained. Also, the predictive power of common variants identified by GWAS has not been encouraging. Given these observations, the fact that GWAS by design often do not account for the effects of rare variants, and the availability of sequence data, there is a growing need for robust analytic approaches to evaluate the contribution of rare variants to common complex diseases. Here we propose a new method that enables simultaneous analysis of the association of rare and common variants with disease etiology. We refer to this method as SCARVA (simultaneous common and rare variants analysis). SCARVA is simple to use and efficient. We used SCARVA to analyze two independent real datasets to identify rare and common variants underlying variation in obesity among participants in the Africa America Diabetes Mellitus (AADM) study and in plasma triglyceride levels in the Dallas Heart Study (DHS). We found common and rare variants associated with both traits, consistent with published results.
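SCARVA's internals are not described in the abstract. To illustrate what a simultaneous analysis of rare and common variants can look like in practice, the sketch below uses a different, widely used strategy in the spirit of combined multivariate and collapsing approaches: each common variant stays as its own predictor, while rare variants are collapsed into a single carrier indicator, so both can enter one regression model. This is not SCARVA, and the MAF cutoff of 0.01 and the function name `build_design_row` are assumptions.

```python
from typing import List, Sequence


def build_design_row(genotypes: Sequence[int], mafs: Sequence[float],
                     rare_maf: float = 0.01) -> List[float]:
    """One individual's predictors: the allele count of each common variant,
    followed by a single indicator that is 1.0 if any rare variant
    (MAF below the assumed cutoff) carries at least one minor allele."""
    if len(genotypes) != len(mafs):
        raise ValueError("genotypes and mafs must have the same length")
    common = [float(g) for g, f in zip(genotypes, mafs) if f >= rare_maf]
    rare_carrier = any(g > 0 for g, f in zip(genotypes, mafs) if f < rare_maf)
    return common + [1.0 if rare_carrier else 0.0]


if __name__ == "__main__":
    # Hypothetical region with two common and two rare variants
    mafs = [0.25, 0.004, 0.12, 0.002]
    print(build_design_row([1, 0, 2, 1], mafs))
    print(build_design_row([0, 0, 0, 0], mafs))
```

Rows built this way can be stacked into a design matrix and passed to any linear or logistic regression routine, with a joint test of all coefficients assessing the region as a whole.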

