PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects

2019 ◽  
Vol 35 (17) ◽  
pp. 3046-3054 ◽  
Author(s):  
Anastasia Gurinovich ◽  
Harold Bae ◽  
John J Farrell ◽  
Stacy L Andersen ◽  
Stefano Monti ◽  
...  

Abstract Motivation: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery. Results: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects’ ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype. Availability and implementation: PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found at the following GitHub repository: https://github.com/gurinovich/PopCluster with instructions on its installation and usage. Supplementary information: Supplementary data are available at Bioinformatics online.
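As a rough illustration of what "statistically different genetic effects" between two subsets of individuals can mean, the sketch below compares the log odds ratio of a variant in two groups with a simple two-sided z-test. This is not the PopCluster procedure (which combines logistic regression, principal component analysis, hierarchical clustering and tree parsing); it is a swapped-in textbook heterogeneity test. The function names and all allele counts are hypothetical.

```python
import math


def log_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (log odds ratio, standard error) for a 2x2 allele-count table.

    A 0.5 continuity correction is added to every cell so that a zero
    count does not produce an infinite estimate.
    """
    a, b, c, d = (x + 0.5 for x in (case_alt, case_ref, ctrl_alt, ctrl_ref))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def heterogeneity_test(table_1, table_2):
    """Two-sided z-test for a difference in log odds ratio between two groups."""
    lor1, se1 = log_odds_ratio(*table_1)
    lor2, se2 = log_odds_ratio(*table_2)
    z = (lor1 - lor2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided normal tail probability: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
group_a = (120, 80, 90, 110)   # variant looks associated in this group
group_b = (95, 105, 100, 100)  # little or no association in this group

z, p_value = heterogeneity_test(group_a, group_b)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value here only says that the two pre-defined groups differ in effect size. Discovering the groups themselves from genotype data, without ethnicity labels, is the part that PopCluster addresses.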

2018 ◽  
Author(s):  
Cox Lwaka Tamba ◽  
Yuan-Ming Zhang

Abstract Background: Recent developments in technology result in the generation of big data. In genome-wide association studies (GWAS), we can get tens of millions of SNPs that need to be tested for association with a trait of interest. This poses a great computational challenge, and there is a need for fast algorithms in GWAS methodologies. These algorithms must ensure high power in QTN detection, high accuracy in QTN estimation and a low false positive rate. Results: Here, we accelerated the mrMLM algorithm by using the GEMMA idea, matrix transformations and identities. The target functions and derivatives in vector/matrix forms for each marker scanning are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. All potentially associated QTNs with P-values ≤ 0.01 are evaluated in a multi-locus model by the LARS algorithm and/or EM-Empirical Bayes. We call the algorithm FASTmrMLM. Numerical simulation studies and real data analysis validated FASTmrMLM. FASTmrMLM reduces the running time of mrMLM by more than 50%. FASTmrMLM also shows high statistical power in QTN detection, high accuracy in QTN estimation and a low false positive rate as compared to GEMMA, FarmCPU and mrMLM. Real data analysis shows that FASTmrMLM was able to detect more previously reported genes than all the other methods: GEMMA/EMMA, FarmCPU and mrMLM. Conclusions: FASTmrMLM is a fast and reliable algorithm in multi-locus GWAS and ensures high statistical power, high accuracy of estimates and a low false positive rate. Author Summary: The current developments in technology result in the generation of a vast amount of data. In genome-wide association studies, we can get tens of millions of markers that need to be tested for association with a trait of interest. Due to the computational challenge faced, we developed a fast algorithm for genome-wide association studies. Our approach is a two-stage method. In the first step, we used matrix transformations and identities to speed up the testing of each random marker effect. The target functions and derivatives, which are in vector/matrix forms for each marker scanning, are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. In the second step, we selected all potentially associated SNPs and evaluated them in a multi-locus model. In simulation studies, our algorithm significantly reduces the computing time. The new method also shows high statistical power in detecting significant markers, high accuracy in marker effect estimation and a low false positive rate. We also used the new method to identify relevant genes in real data analysis. We recommend our approach as a fast and reliable method for carrying out a multi-locus genome-wide association study.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Shuquan Rao ◽  
Yao Yao ◽  
Daniel E. Bauer

Abstract Genome-wide association studies (GWAS) have uncovered thousands of genetic variants that influence risk for human diseases and traits. Yet understanding the mechanisms by which these genetic variants, mainly noncoding, have an impact on associated diseases and traits remains a significant hurdle. In this review, we discuss emerging experimental approaches that are being applied for functional studies of causal variants and translational advances from GWAS findings to disease prevention and treatment. We highlight the use of genome editing technologies in GWAS functional studies to modify genomic sequences, with proof-of-principle examples. We discuss the challenges in interrogating causal variants, points for consideration in experimental design and interpretation of GWAS locus mechanisms, and the potential for novel therapeutic opportunities. With the accumulation of knowledge of functional genetics, therapeutic genome editing based on GWAS discoveries will become increasingly feasible.


Author(s):  
Jianhua Wang ◽  
Dandan Huang ◽  
Yao Zhou ◽  
Hongcheng Yao ◽  
Huanhuan Liu ◽  
...  

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.


2018 ◽  
Vol 35 (14) ◽  
pp. 2512-2514 ◽  
Author(s):  
Bongsong Kim ◽  
Xinbin Dai ◽  
Wenchao Zhang ◽  
Zhaohong Zhuang ◽  
Darlene L Sanchez ◽  
...  

Abstract Summary: We present GWASpro, a high-performance web server for the analyses of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data, coupled with complex replicated experimental designs such as found in plant science investigations and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs that may include replications, treatments, locations and times, can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators. Availability and implementation: GWASpro is freely available at https://bioinfo.noble.org/GWASPRO. Supplementary information: Supplementary data are available at Bioinformatics online.


2011 ◽  
Vol 40 (D1) ◽  
pp. D1047-D1054 ◽  
Author(s):  
Mulin Jun Li ◽  
Panwen Wang ◽  
Xiaorong Liu ◽  
Ee Lyn Lim ◽  
Zhangyong Wang ◽  
...  

2014 ◽  
Vol 94 (5) ◽  
pp. 662-676 ◽  
Author(s):  
Hugues Aschard ◽  
Bjarni J. Vilhjálmsson ◽  
Nicolas Greliche ◽  
Pierre-Emmanuel Morange ◽  
David-Alexandre Trégouët ◽  
...  

2020 ◽  
Vol 36 (15) ◽  
pp. 4374-4376
Author(s):  
Ninon Mounier ◽  
Zoltán Kutalik

Abstract Summary: Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs) and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation: The bGWAS package is freely available under a GPL-2 License, and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information: Supplementary data are available at Bioinformatics online.
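To make the idea of combining a prior effect with an observed effect concrete, here is a minimal sketch under a generic normal-normal model. It assumes the observed effect is distributed as N(beta, se^2), the alternative hypothesis places an N(prior_mean, prior_var) prior on beta, and the null hypothesis fixes beta = 0. This is a textbook conjugate calculation, not the formula used inside the bGWAS package, and every name and number in it is hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def bayes_factor(beta_hat, se, prior_mean, prior_var):
    """Bayes factor for H1: beta ~ N(prior_mean, prior_var) against H0: beta = 0.

    Under H1 the marginal distribution of beta_hat is
    N(prior_mean, se^2 + prior_var); under H0 it is N(0, se^2).
    """
    marginal_h1 = normal_pdf(beta_hat, prior_mean, se ** 2 + prior_var)
    marginal_h0 = normal_pdf(beta_hat, 0.0, se ** 2)
    return marginal_h1 / marginal_h0


def posterior_effect(beta_hat, se, prior_mean, prior_var):
    """Precision-weighted posterior mean and variance of beta under H1."""
    w_obs = 1.0 / se ** 2
    w_prior = 1.0 / prior_var
    post_var = 1.0 / (w_obs + w_prior)
    post_mean = post_var * (w_obs * beta_hat + w_prior * prior_mean)
    return post_mean, post_var


# Hypothetical variant: observed effect 0.04 (SE 0.015), prior effect 0.03.
bf = bayes_factor(beta_hat=0.04, se=0.015, prior_mean=0.03, prior_var=0.0004)
mean, var = posterior_effect(beta_hat=0.04, se=0.015, prior_mean=0.03, prior_var=0.0004)
print(f"BF = {bf:.1f}, posterior mean = {mean:.4f}, posterior SD = {math.sqrt(var):.4f}")
```

When the observed effect agrees with an informative prior, the Bayes factor grows and the posterior variance shrinks below the squared standard error, which is the intuition behind the power gain the abstract describes.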


2015 ◽  
Vol 44 (D1) ◽  
pp. D869-D876 ◽  
Author(s):  
Mulin Jun Li ◽  
Zipeng Liu ◽  
Panwen Wang ◽  
Maria P. Wong ◽  
Matthew R. Nelson ◽  
...  

Author(s):  
Yun Li ◽  
George T. O’Connor ◽  
Josée Dupuis ◽  
Eric Kolaczyk

Abstract In genome-wide association studies (GWAS), it is of interest to identify genetic variants associated with phenotypes. For a given phenotype, the associated genetic variants are usually a sparse subset of all possible variants. Traditional Lasso-type estimation methods can therefore be used to detect important genes. But the relationship between genotypes at one variant and a phenotype may be influenced by other variables, such as sex and lifestyle. Hence it is important to be able to incorporate gene-covariate interactions into the sparse regression model. In addition, because there is biological knowledge on the manner in which genes work together in structured groups, it is desirable to incorporate this information as well. In this paper, we present a novel sparse regression methodology for gene-covariate models in association studies that not only allows such interactions but also considers biological group structure. Simulation results show that our method substantially outperforms another method, in which interaction is considered, but group structure is ignored. Application to data on total plasma immunoglobulin E (IgE) concentrations in the Framingham Heart Study (FHS), using sex and smoking status as covariates, yields several potentially interesting gene-covariate interactions.


2019 ◽  
Vol 40 (Supplement_1) ◽  
Author(s):  
M Oguri ◽  
K Kato ◽  
H Horibe ◽  
T Fujimaki ◽  
J Sakuma ◽  
...  

Abstract Background: Early-onset coronary artery disease (CAD) has a strong genetic component. Although genome-wide association studies have identified various genes and loci significantly associated with CAD mainly in European ancestry populations, genetic variants that contribute to susceptibility to this condition in Japanese individuals remain to be identified definitively. Purpose: The purpose of the study was to identify genetic variants that confer susceptibility to early-onset CAD in Japanese individuals. We have now performed exome-wide association studies (EWASs) in subjects with early-onset CAD and controls. Methods: A total of 7256 individuals aged ≤65 years were enrolled in the study. The EWAS was conducted with 1482 subjects with CAD and 5774 controls. Genotyping of single nucleotide polymorphisms (SNPs) was performed with Illumina Human Exome-12 DNA Analysis BeadChip or Infinium Exome-24 BeadChip arrays. The relation of allele frequencies for 31,465 SNPs that passed quality control to CAD was examined with Fisher's exact test. To compensate for multiple comparisons of allele frequencies with CAD, we applied a false discovery rate (FDR) of <0.05 for statistical significance of association. Results: The relation of allele frequencies for 31,465 SNPs to CAD with the use of Fisher's exact test showed that 170 SNPs were significantly (FDR <0.05) associated with CAD. Multivariable logistic regression analysis with adjustment for age, sex, and the prevalence of hypertension, diabetes mellitus, and dyslipidemia revealed that 162 SNPs were significantly (P<0.05) related to CAD. A stepwise forward selection procedure was performed to examine the effects of genotypes for the 162 SNPs on CAD. A total of 54 SNPs were significant (P<0.05) and independent [coefficient of determination (R²), 0.0008 to 0.0297] determinants of CAD. These SNPs together accounted for 15.5% of the cause of CAD. After examination of results from previous genome-wide association studies and linkage disequilibrium of the identified SNPs, we newly identified 21 genes (RNF2, YEATS2, USP45, ITGB8, TNS3, FAM170B-AS1, PRKG1, BTRC, MKI67, STIM1, OR52E4, KIAA1551, MON2, PLUT, LINC00354, TRPM1, ADAT1, KRT27, LIPE, GFY, EIF3L) and five chromosomal regions (2p13, 4q31.2, 5q12, 13q34, 20q13.2) that were significantly associated with CAD. Gene ontology analysis showed that various biological functions were predicted in the 18 genes identified in the present study. The network analysis revealed that the 18 genes had potential direct or indirect interactions with the 30 genes previously shown to be associated with CAD or with the 228 genes identified in previous genome-wide association studies of CAD. Conclusion: We have newly identified 26 loci that confer susceptibility to CAD. Determination of genotypes for the SNPs at these loci may prove informative for assessment of the genetic risk for CAD in Japanese individuals.
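The screening step described in this abstract (a Fisher's exact test per SNP followed by an FDR threshold of 0.05) can be sketched as follows. The abstract does not say which FDR procedure was used, so the Benjamini-Hochberg step-up procedure below is an assumption, and the allele counts are hypothetical rather than taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking the hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank  # keep the largest rank that passes
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical allele counts per SNP: (case_alt, case_ref, ctrl_alt, ctrl_ref).
snp_tables = {
    "snp_1": (60, 140, 30, 170),
    "snp_2": (48, 152, 50, 150),
    "snp_3": (75, 125, 45, 155),
}
p_vals = [fisher_exact_two_sided(*t) for t in snp_tables.values()]
for (name, _), p, keep in zip(snp_tables.items(), p_vals, benjamini_hochberg(p_vals)):
    print(f"{name}: p = {p:.4g}, passes FDR < 0.05: {keep}")
```

With tens of thousands of SNPs, as in the study, a library implementation would be faster than exact integer binomials, but the logic of the two steps is the same.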

