Large Scale Association Analysis for Drug Addiction: Results from SNP to Gene

Many genetic association studies use single nucleotide polymorphism (SNP) data to identify genetic variants for complex diseases. Although SNP-based associations are the most common in genome-wide association studies (GWAS), gene-based association analysis has received increasing attention as a way to understand the genetic etiologies of complex diseases. While both methods have been used to analyze the same data, few genome-wide association studies compare the results or examine the connection between them. We performed a comprehensive analysis of the data from the Study of Addiction: Genetics and Environment (SAGE) and compared the results from the SNP-based and gene-based analyses. Our results suggest that the gene-based method complements the individual SNP-based analysis, and that conceptually the two are closely related. In terms of gene findings, our results validate many genes for substance dependence that were previously reported either from analyses of the same dataset or from animal studies.

Secure large-scale genome-wide association studies using homomorphic encryption

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1918257117 ◽

2020 ◽

Vol 117 (21) ◽

pp. 11608-11613 ◽

Cited By ~ 1

Author(s):

Marcelo Blatt ◽

Alexander Gusev ◽

Yuriy Polyakov ◽

Shafi Goldwasser

Keyword(s):

Large Scale ◽

Homomorphic Encryption ◽

Association Studies ◽

Genome Wide Association ◽

Single Server ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

User Interactions ◽

Individual Level ◽

Genome Wide

Genome-wide association studies (GWASs) seek to identify genetic variants associated with a trait, and have been a powerful approach for understanding complex diseases. A critical challenge for GWASs has been the dependence on individual-level data that typically have strict privacy requirements, creating an urgent need for methods that preserve the individual-level privacy of participants. Here, we present a privacy-preserving framework based on several advances in homomorphic encryption and demonstrate that it can perform an accurate GWAS analysis for a real dataset of more than 25,000 individuals, keeping all individual data encrypted and requiring no user interactions. Our extrapolations show that it can evaluate GWASs of 100,000 individuals and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 h on a single server node (or in 11 min on 31 server nodes running in parallel). Our performance results are more than one order of magnitude faster than prior state-of-the-art results using secure multiparty computation, which requires continuous user interactions, with the accuracy of both solutions being similar. Our homomorphic encryption advances can also be applied to other domains where large-scale statistical analyses over encrypted data are needed.
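
The two runtime figures quoted above can be related with a quick back-of-the-envelope check. The sketch below assumes the work divides evenly across nodes with no communication overhead; that assumption, and the function name `ideal_parallel_minutes`, are illustrative and not taken from the paper.

```python
def ideal_parallel_minutes(single_node_hours: float, nodes: int) -> float:
    """Runtime in minutes if the job splits evenly across nodes.

    Perfect linear scaling is an assumption made for this check only.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return single_node_hours * 60.0 / nodes


# Figures from the abstract: 5.6 h on one node, 31 nodes in parallel.
minutes = ideal_parallel_minutes(5.6, 31)
print(f"{minutes:.1f} min on 31 nodes under ideal scaling")
```

Under that assumption the result is just under 11 minutes, in line with the quoted figure, which suggests the extrapolation treats the computation as close to perfectly parallel.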

Meta-Analysis of Allergy Genome-Wide Association Studies

10.1101/2020.12.24.424368 ◽

2020 ◽

Author(s):

Ronin Sharma

Keyword(s):

Association Studies ◽

Meta Analysis ◽

Genetic Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Significance Level ◽

Genome Wide ◽

Linear Regressions

Abstract Allergies are complex conditions involving both environmental and genetic factors. The genetic basis underlying allergic disease is investigated through genetic association studies. Genome-wide association studies (GWAS) leverage sequenced data to identify genetic mutations, such as single-nucleotide polymorphisms (SNPs), associated with phenotypes of interest. Machine learning can be used to analyze large datasets and generate predictive models. In this study, several classification models were created to predict the significance level of SNPs associated with allergies. Summary statistics were obtained from the GWAS Catalog and combined from several studies. Biological features such as chromosomal location, base pair location, effect allele, and odds ratio were used to train the models. The models ranged from simple linear regressions to multi-layer neural networks. The final models reached accuracies of 80% and reflect the features that have the largest impact on a SNP’s association level.

SCEBE: an efficient and scalable algorithm for genome-wide association studies on longitudinal outcomes with mixed-effects modeling

Briefings in Bioinformatics ◽

10.1093/bib/bbaa130 ◽

2020 ◽

Author(s):

Min Yuan ◽

Xu Steven Xu ◽

Yaning Yang ◽

Yinsheng Zhou ◽

Yi Li ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Mixed Effects ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Scalable Algorithm ◽

Longitudinal Outcomes ◽

Mixed Effects Modeling ◽

Genome Wide

Abstract Genome-wide association studies (GWAS) using longitudinal phenotypes collected over time are appealing because of their improved power. However, the computational burden has been a challenge because of the complex algorithms needed to model longitudinal data. Approximation methods based on empirical Bayesian estimates (EBEs) from mixed-effects modeling have been developed to expedite the analysis. However, our analysis demonstrated that bias in both the association test and the estimation remains an issue for the existing EBE-based methods. We propose an incredibly fast and unbiased method (simultaneous correction for EBE, SCEBE) that can correct the bias in the naive EBE approach and provide unbiased P-values and estimates of effect size. Through application to Alzheimer’s Disease Neuroimaging Initiative data with 6 414 695 single nucleotide polymorphisms, we demonstrated that SCEBE can efficiently perform large-scale GWAS with longitudinal outcomes, providing a nearly 10 000-fold improvement in computational efficiency and shortening the computation time from months to minutes. The SCEBE package and the example datasets are available at https://github.com/Myuan2019/SCEBE.

GWASpro: a high-performance genome-wide association analysis server

Bioinformatics ◽

10.1093/bioinformatics/bty989 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2512-2514 ◽

Cited By ~ 4

Author(s):

Bongsong Kim ◽

Xinbin Dai ◽

Wenchao Zhang ◽

Zhaohong Zhuang ◽

Darlene L Sanchez ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

Linear Mixed Model ◽

Association Studies ◽

Learning Curves ◽

Experimental Designs ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide

Abstract Summary We present GWASpro, a high-performance web server for the analyses of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data, coupled with complex replicated experimental designs such as found in plant science investigations and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs that may include replications, treatments, locations and times, can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators. Availability and implementation GWASpro is freely available at https://bioinfo.noble.org/GWASPRO. Supplementary information Supplementary data are available at Bioinformatics online.

The Mediation Effects of Aluminum in Plasma and Dipeptidyl Peptidase Like Protein 6 (DPP6) Polymorphism on Renal Function via Genome-Wide Typing Association

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph181910484 ◽

2021 ◽

Vol 18 (19) ◽

pp. 10484

Author(s):

Ting-Hao Chen ◽

Chen-Cheng Yang ◽

Kuei-Hau Luo ◽

Chia-Yen Dai ◽

Yao-Chung Chuang ◽

...

Keyword(s):

Renal Function ◽

Mediation Analysis ◽

Association Studies ◽

Dipeptidyl Peptidase ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Mediation Effects ◽

Genome Wide ◽

Estimated Glomerular Filtration Rates

Aluminum (Al) toxicity is related to renal failure and the failure of other systems. Although some genome-wide association studies (GWAS) have been conducted in Australia and England, to our knowledge there has been no GWAS of Han Chinese. This research therefore used whole-genome genotypes from the Taiwan Biobank to explore the association between plasma Al concentrations and renal function. Participants, who completed questionnaire interviews, biomarker measurements, and genotyping, were drawn from the Taiwan Biobank database. We then measured their plasma Al concentrations with ICP-MS in the laboratory at Kaohsiung Medical University. We used these data in genome-wide association (GWA) tests to look for candidate genes and to relate plasma Al concentration to renal function. Furthermore, we examined the path relationship between single nucleotide polymorphisms (SNPs), Al concentrations, and estimated glomerular filtration rates (eGFR) through a mediation analysis with 3000 bootstrap replications. Following the principles of GWAS, we focused on three SNPs within the dipeptidyl peptidase-like protein 6 (DPP6) gene on chromosome 7: rs10224371, rs2316242, and rs10268004. The results of the mediation analysis showed that all of the selected SNPs indirectly affected eGFR through the mediation of Al concentrations. Our analysis revealed the association between DPP6 SNPs, plasma Al concentrations, and eGFR. However, further longitudinal studies and research on the mechanism are needed. To our knowledge, this is the first study to explore the association between DPP6 SNPs and plasma Al as they affect eGFR.
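
The abstract does not spell out the mediation model, so the following is only a minimal sketch of one common approach: the product-of-coefficients indirect effect (SNP to Al, then Al to eGFR adjusting for SNP) with a percentile bootstrap over 3000 replicates. The simulated data, effect sizes, and all function names are hypothetical and are not the study's data or code.

```python
import random


def indirect_effect(snp, al, egfr):
    """Product-of-coefficients indirect effect a*b.

    a: slope of Al on SNP dosage.
    b: partial slope of eGFR on Al, adjusting for SNP dosage.
    """
    n = len(snp)
    mx, mm, my = sum(snp) / n, sum(al) / n, sum(egfr) / n
    sxx = sum((x - mx) ** 2 for x in snp)
    smm = sum((m - mm) ** 2 for m in al)
    sxm = sum((x - mx) * (m - mm) for x, m in zip(snp, al))
    sxy = sum((x - mx) * (y - my) for x, y in zip(snp, egfr))
    smy = sum((m - mm) * (y - my) for m, y in zip(al, egfr))
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(snp, al, egfr, replicates=3000, seed=0):
    """95% percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(snp)
    estimates = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(indirect_effect(
                [snp[i] for i in idx],
                [al[i] for i in idx],
                [egfr[i] for i in idx],
            ))
        except ZeroDivisionError:
            continue  # skip degenerate resamples (e.g. no genotype variation)
    estimates.sort()
    low = estimates[int(0.025 * len(estimates))]
    high = estimates[int(0.975 * len(estimates)) - 1]
    return low, high


# Hypothetical simulated data: true indirect effect is 0.5 * -4.0 = -2.0.
rng = random.Random(1)
snp = [rng.choice([0, 1, 2]) for _ in range(200)]
al = [1.0 + 0.5 * g + rng.gauss(0, 0.5) for g in snp]
egfr = [100.0 - 4.0 * a + rng.gauss(0, 1.0) for a in al]

point = indirect_effect(snp, al, egfr)
ci_low, ci_high = bootstrap_ci(snp, al, egfr)
print(f"indirect effect {point:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes zero is the usual evidence for mediation in this framework; a real analysis would also adjust for covariates, which this sketch omits.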

Genome-wide association studies of yield-related traits in high-latitude japonica rice

BMC Genomic Data ◽

10.1186/s12863-021-00995-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Guomin Zhang ◽

Rongsheng Wang ◽

Juntao Ma ◽

Hongru Gao ◽

Lingwei Deng ◽

...

Keyword(s):

Linear Model ◽

High Latitude ◽

Association Studies ◽

Genome Wide Association ◽

Japonica Rice ◽

Genome Wide Association Studies ◽

Heilongjiang Province ◽

Nucleotide Polymorphisms ◽

Coding Region ◽

Genome Wide

Abstract Background Heilongjiang Province is a high-quality japonica rice cultivation area in China. One in ten bowls of Chinese rice is produced here. Increasing yield is one of the main aims of rice production in this area. However, yield is a complex quantitative trait composed of many factors. The purpose of this study was to determine how many genetic loci are associated with yield-related traits. Genome-wide association studies (GWAS) were performed on 450 accessions collected from northeast Asia, including Russia, Korea, Japan and Heilongjiang Province of China. These accessions consist of elite varieties and landraces introduced into Heilongjiang Province decade ago. Results After resequencing of the 450 accessions, 189,019 single nucleotide polymorphisms (SNPs) were used for association studies by two different models, a general linear model (GLM) and a mixed linear model (MLM), examining four traits: days to heading (DH), plant height (PH), panicle weight (PW) and tiller number (TI). Over 25 SNPs were found to be associated with each trait. Among them, 22 SNPs were selected to identify candidate genes, and 2, 8, 1 and 11 SNPs were found to be located in 3′ UTR region, intron region, coding region and intergenic region, respectively. Conclusions All SNPs detected in this research may become candidates for further fine mapping and may be used in the molecular breeding of high-latitude rice.
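
To illustrate the simpler of the two models mentioned above, here is a minimal sketch of a single-SNP test under a general linear model with no covariates: the trait is regressed on allele dosage (0, 1 or 2). The toy data, the function name `snp_association`, and the normal approximation used for the p-value are assumptions made for this example; the mixed linear model, which additionally accounts for relatedness, is not shown.

```python
import math


def snp_association(genotypes, phenotypes):
    """Regress a trait on allele dosage for one SNP.

    Returns (slope, t_statistic, approx_p). The p-value uses a normal
    approximation to the t distribution, an assumption that keeps the
    sketch free of third-party dependencies.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    rss = sum((y - intercept - slope * g) ** 2
              for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = slope / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided, normal approximation
    return slope, t, p


# Hypothetical toy data: dosages for ten accessions and plant height in cm.
dosage = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
height = [90, 92, 95, 97, 101, 99, 91, 96, 100, 94]

slope, t_stat, p_value = snp_association(dosage, height)
print(f"slope={slope:.2f} cm per allele, t={t_stat:.2f}, p~{p_value:.2g}")
```

In a genome-wide scan this test would be repeated for every SNP, with the significance threshold adjusted for the number of tests.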

The development of genome-wide association studies and their application to complex diseases, including lupus

Lupus ◽

10.1177/0961203313492870 ◽

2013 ◽

Vol 22 (12) ◽

pp. 1205-1213 ◽

Cited By ~ 10

Author(s):

J Bentham ◽

TJ Vyse

Keyword(s):

Association Studies ◽

Complex Diseases ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Retrospective Association Analysis of Longitudinal Binary Traits Identifies Important Loci and Pathways in Cocaine Use

Genetics ◽

10.1534/genetics.119.302598 ◽

2019 ◽

Vol 213 (4) ◽

pp. 1225-1236 ◽

Cited By ~ 1

Author(s):

Weimiao Wu ◽

Zhong Wang ◽

Ke Xu ◽

Xinyu Zhang ◽

Amei Amei ◽

...

Keyword(s):

Association Analysis ◽

Binary Data ◽

Association Studies ◽

Association Test ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Cocaine Use ◽

Genome Wide ◽

A Genome ◽

Time Varying Covariates

Longitudinal phenotypes have been increasingly available in genome-wide association studies (GWAS) and electronic health record-based studies for identification of genetic variants that influence complex traits over time. For longitudinal binary data, there remain significant challenges in gene mapping, including misspecification of the model for phenotype distribution due to ascertainment. Here, we propose L-BRAT (Longitudinal Binary-trait Retrospective Association Test), a retrospective, generalized estimating equation-based method for genetic association analysis of longitudinal binary outcomes. We also develop RGMMAT, a retrospective, generalized linear mixed model-based association test. Both tests are retrospective score approaches in which genotypes are treated as random conditional on phenotype and covariates. They allow both static and time-varying covariates to be included in the analysis. Through simulations, we illustrated that retrospective association tests are robust to ascertainment and other types of phenotype model misspecification, and gain power over previous association methods. We applied L-BRAT and RGMMAT to a genome-wide association analysis of repeated measures of cocaine use in a longitudinal cohort. Pathway analysis implicated association with opioid signaling and axonal guidance signaling pathways. Lastly, we replicated important pathways in an independent cocaine dependence case-control GWAS. Our results illustrate that L-BRAT is able to detect important loci and pathways in a genome scan and to provide insights into genetic architecture of cocaine use.

Power Comparison of Admixture Mapping and Direct Association Analysis in Genome-Wide Association Studies

Genetic Epidemiology ◽

10.1002/gepi.21616 ◽

2012 ◽

Vol 36 (3) ◽

pp. 235-243 ◽

Cited By ~ 19

Author(s):

Huaizhen Qin ◽

Xiaofeng Zhu

Keyword(s):

Association Analysis ◽

Association Studies ◽

Genome Wide Association ◽

Admixture Mapping ◽

Genome Wide Association Studies ◽

Power Comparison ◽

Genome Wide ◽

Direct Association

Animal-ImputeDB: a comprehensive database with multiple animal reference panels for genotype imputation

Nucleic Acids Research ◽

10.1093/nar/gkz854 ◽

2019 ◽

Vol 48 (D1) ◽

pp. D659-D667 ◽

Cited By ~ 2

Author(s):

Wenqian Yang ◽

Yanbo Yang ◽

Cecheng Zhao ◽

Kun Yang ◽

Dongyang Wang ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

High Quality ◽

Single Nucleotide ◽

Genome Wide ◽

Whole Genome Resequencing ◽

Missing Genotypes

Abstract Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is a public database with genomic reference panels of 13 animal species for online genotype imputation, genetic variant search, and free download. Genotype imputation is a process of estimating missing genotypes in terms of the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs) and thus can be widely used in large-scale genome-wide association studies (GWASs) using relatively inexpensive and low-density SNP arrays. However, most animals except humans lack high-quality reference panels, which greatly limits the application of genotype imputation in animals. To overcome this limitation, we developed Animal-ImputeDB, which is dedicated to collecting genotype data and whole-genome resequencing data of nonhuman animals from various studies and databases. A computational pipeline was developed to process different types of raw data to construct reference panels. Finally, 13 high-quality reference panels including ∼400 million SNPs from 2265 samples were constructed. In Animal-ImputeDB, an easy-to-use online tool consisting of two popular imputation tools was designed for the purpose of genotype imputation. Collectively, Animal-ImputeDB serves as an important resource for animal genotype imputation and will greatly facilitate research on animal genomic selection and genetic improvement.
