Deep learning-based identification of genetic variants: Application to Alzheimer's disease classification

Author(s):  
Taeho Jo ◽  
Kwangsik Nho ◽  
Paula Bice ◽  
Andrew J Saykin

Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. Although deep learning has been used in several genetic studies, it is challenging to apply in genome-wide association studies (GWAS) with high-dimensional genomic data. Here we propose a novel three-step deep learning approach to identify phenotype-related single nucleotide polymorphisms (SNPs) and develop accurate classification models. In the first step, we divided the whole genome into non-overlapping fragments of an optimal size and then ran a convolutional neural network (CNN) on each fragment to select phenotype-associated fragments. In the second step, using an overlapping window approach, we ran the CNN on the selected fragments to calculate phenotype influence scores (PIS) and identify phenotype-associated SNPs based on PIS. In the third step, we ran the CNN on all identified SNPs to develop a classification model. We tested our approach using genome-wide genotyping data for Alzheimer's disease (AD) (N=981; cognitively normal older adults (CN)=650 and AD=331). Our approach identified the well-known APOE region as the most significant genetic locus for AD. Our classification model achieved an area under the curve (AUC) of 0.82, which outperformed the traditional machine learning approaches Random Forest and XGBoost. By using a novel deep learning-based GWAS approach, we were able to identify AD-associated SNPs and develop a better classification model for AD.
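The difference between the non-overlapping fragments of the first step and the overlapping windows of the second step can be illustrated with a minimal Python sketch. It only computes index ranges over an ordered list of SNPs; it is not the authors' implementation, and the function names, fragment size, window size, and stride are hypothetical values chosen for illustration. The CNN and the PIS calculation are not specified in the abstract and are deliberately left out.

```python
from typing import List, Tuple


def non_overlapping_fragments(n_snps: int, fragment_size: int) -> List[Tuple[int, int]]:
    """Tile n_snps positions with half-open (start, end) ranges that do not overlap."""
    return [
        (start, min(start + fragment_size, n_snps))
        for start in range(0, n_snps, fragment_size)
    ]


def overlapping_windows(start: int, end: int, window_size: int, stride: int) -> List[Tuple[int, int]]:
    """Slide a window of window_size over [start, end), moving by stride each time."""
    if end - start <= window_size:
        # The fragment is no larger than one window, so use it whole.
        return [(start, end)]
    windows = [
        (s, s + window_size)
        for s in range(start, end - window_size + 1, stride)
    ]
    # Add one final window if the stride left the tail of the fragment uncovered.
    if windows[-1][1] < end:
        windows.append((end - window_size, end))
    return windows


# Hypothetical toy sizes: 10 SNPs, fragments of 4, windows of 4 moved by 2.
fragments = non_overlapping_fragments(n_snps=10, fragment_size=4)
windows = overlapping_windows(0, 10, window_size=4, stride=2)
print(fragments)
print(windows)
```

With a stride smaller than the window size, each SNP falls into several windows, so a per-SNP score can in principle be aggregated from more than one model evaluation; how the published method does this is not stated here.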

2019 ◽  
Author(s):  
Javier de Velasco Oriol ◽  
Edgar E. Vallejo ◽  
Karol Estrada ◽  

Alzheimer’s disease (AD) is the leading form of dementia. Over 25 million cases have been estimated worldwide, and this number is predicted to double every 20 years. Although a variety of clinical markers are available for the diagnosis of AD, accurate and timely diagnosis of the disease remains elusive. Recently, over a dozen genetic variants predisposing to the disease have been identified by genome-wide association studies. However, these genetic variants explain only a small fraction of the estimated genetic component of the disease. Therefore, useful predictions of AD from genetic data cannot rely on these markers exclusively, as they are not sufficiently informative predictors. In this study, we propose the use of deep neural networks for the prediction of late-onset Alzheimer’s disease from a large number of genetic variants. Experimental results indicate that the proposed model holds promise for producing useful predictions for the clinical diagnosis of AD.


2020 ◽  
Author(s):  
Pavel P Kuksa ◽  
Chia-Lun Lui ◽  
Wei Fu ◽  
Liming Qu ◽  
Yi Zhao ◽  
...  

Background: Alzheimer's disease (AD) genetic findings span progressively larger genome-wide association studies (GWASs) for various outcomes and populations. These genetic findings are obtained from single GWASs or from joint or meta-analyses of multiple GWAS datasets. However, no single resource provides harmonized and searchable information on all AD genetic associations obtained from these analyses, nor does any link the identified genetic variants and reported genes with other supporting functional genomic evidence. Methods: We created the Alzheimer's Disease Variant Portal (ADVP), which provides unified access to a uniquely extensive collection of high-quality GWAS association results for AD. Records in ADVP are curated from the genome-wide significant and suggestive loci reported in the AD genetics literature. ADVP contains curated results from all AD GWAS publications by the Alzheimer's Disease Genetics Consortium (ADGC) since 2009 and AD GWAS publications identified from other public catalogs (GWAS Catalog). Genetic association information was systematically extracted from these publications, harmonized, and organized into three types of tables. These tables included structured publication, variant, and association categories to ensure consistent representation of all AD genetic findings. All extracted AD genetic associations were further annotated and integrated with the NIAGADS Genomics DB in order to provide extensive biological and functional genomics annotations. Results: Currently, ADVP contains 6,990 AD-association records curated from >200 AD GWAS publications corresponding to >900 unique genomic loci and >1,800 unique genetic variants. The ADVP collection contains genetic findings from >80 cohorts and across various populations, including Caucasians, Hispanics, African-Americans, and Asians. Of all the association records, 46% are disease-risk, 13% are related to expression quantitative trait analyses, and 27% are related to AD endophenotypes and neuropathology.
The ADVP web interface allows users to access AD association records by individual variants, genes, publications, and genomic regions of interest, and through genome-wide interactive variant views. ADVP is integrated with the NIAGADS Alzheimer's Genomics Database. Researchers can explore additional biological annotations at the genetic variant or gene level and view cross-referenced functional genomics evidence provided by other public resources. Conclusions: ADVP is the largest, most up-to-date, and comprehensive literature-derived collection of AD genetic associations. All records have been systematically curated, harmonized, and comprehensively annotated. ADVP is freely accessible at https://advp.niagads.org/.


2021 ◽  
Author(s):  
Priyanka Baloni ◽  
Matthias Arnold ◽  
Herman Moreno ◽  
Kwangsik Nho ◽  
Luna Buitrago ◽  
...  

Dysregulation of sphingomyelin (SM) and ceramide metabolism has been implicated in Alzheimer's Disease (AD). Genome-wide and transcriptome-wide association studies have identified various genes and genetic variants in lipid metabolism that are associated with AD. However, the molecular mechanisms of sphingomyelin and ceramide disruption remain to be determined. Evaluation of peripheral lipidomic profiles is useful in providing perspective on metabolic dysregulation in preclinical and clinical AD states. In this study, we focused on the sphingolipid pathway and carried out multi-omic analyses to identify central and peripheral metabolic changes in AD patients and correlate them to imaging features and cognitive performance in amyloidogenic mouse models. Our multi-omic approach was based on (a) 2114 human post-mortem brain transcriptomics to identify differentially expressed genes; (b) in silico metabolic flux analysis on 1708 context-specific metabolic networks to identify differential reaction fluxes; (c) multimodal neuroimaging analysis on 1576 participants to associate genetic variants in the SM pathway with AD pathogenesis; (d) plasma metabolomic and lipidomic analysis to identify associations of lipid species with dysregulation in AD; and (e) metabolite genome-wide association studies (mGWAS) to define receptors within the pathway as potential drug targets. Our findings from complementary approaches suggested that depletion of S1P compensated for AD cellular pathology, likely by upregulating the SM pathway, suggesting that modulation of S1P signaling may have protective effects in AD. We tested this hypothesis in APP/PS1 mice and showed that prolonged exposure to fingolimod, an S1P signaling modulator approved for treatment of multiple sclerosis, alleviated the cognitive impairment in mice. Our multi-omic approach identified potential targets in the SM pathway and suggested modulators of S1P metabolism as possible candidates for AD treatment.


2021 ◽  
Author(s):  
Jielin Xu ◽  
Yuan Hou ◽  
Yadi Zhou ◽  
Ming Hu ◽  
Feixiong Cheng

Human genome sequencing studies have identified numerous loci associated with complex diseases, including Alzheimer's disease (AD). Translating human genetic findings (i.e., genome-wide association studies [GWAS]) to pathobiology and therapeutic discovery, however, remains a major challenge. To address this critical problem, we present a network topology-based deep learning framework to identify disease-associated genes (NETTAG). NETTAG is capable of integrating multi-genomics data along with the protein-protein interactome to infer putative risk genes and drug targets impacted by GWAS loci. Specifically, we leverage non-coding GWAS loci effects on expression quantitative trait loci (eQTLs), histone-QTLs, and transcription factor binding-QTLs, enhancers and CpG islands, promoter regions, open chromatin, and promoter flanking regions. The key premises of NETTAG are that disease risk genes exhibit distinct functional characteristics compared to non-risk genes and therefore can be distinguished by their aggregated genomic features under the human protein interactome. Applying NETTAG to the latest AD GWAS data, we identified 156 putative AD-risk genes (e.g., APOE, BIN1, GSK3B, MARK4, and PICALM). We showed that predicted risk genes are: 1) significantly enriched in AD-related pathobiological pathways, 2) more likely to be differentially expressed in the transcriptome and proteome of AD brains, and 3) enriched in druggable targets with approved medicines (e.g., choline and ibudilast). In summary, our findings suggest that understanding of human pathobiology and therapeutic development could benefit from a network-based deep learning methodology that utilizes GWAS findings under the multimodal genomic analyses.


2011 ◽  
Vol 39 (4) ◽  
pp. 910-916 ◽  
Author(s):  
Rita J. Guerreiro ◽  
John Hardy

In the present review, we look back at the recent history of GWAS (genome-wide association studies) in AD (Alzheimer's disease) and integrate the major findings with current knowledge of biological processes and pathways. These topics are essential for the development of animal models, which will be fundamental to our complete understanding of AD.


2021 ◽  
Author(s):  
Adam C. Naj ◽  
Ganna Leonenko ◽  
Xueqiu Jian ◽  
Benjamin Grenier-Boley ◽  
Maria Carolina Dalmasso ◽  
...  

Risk for late-onset Alzheimer's disease (LOAD) is driven by multiple loci primarily identified by genome-wide association studies, many of which are common variants with minor allele frequencies (MAF)>0.01. To identify additional common and rare LOAD risk variants, we performed a GWAS on 25,170 LOAD subjects and 41,052 cognitively normal controls in 44 datasets from the International Genomics of Alzheimer's Project (IGAP). Existing genotype data were imputed using the dense, high-resolution Haplotype Reference Consortium (HRC) r1.1 reference panel. Stage 1 associations of P<10⁻⁵ were meta-analyzed with the European Alzheimer's Disease Biobank (EADB) (n=20,301 cases; 21,839 controls) (stage 2 combined IGAP and EADB). An expanded meta-analysis was performed using a GWAS of parental AD/dementia history in the UK Biobank (UKBB) (n=35,214 cases; 180,791 controls) (stage 3 combined IGAP, EADB, and UKBB). Common variant (MAF≥0.01) associations were identified for 29 loci in stage 2, including novel genome-wide significant associations at TSPAN14 (P=2.33×10⁻¹²), SHARPIN (P=1.56×10⁻⁹), and ATF5/SIGLEC11 (P=1.03×10⁻⁸), and newly significant associations without using AD proxy cases in MTSS1L/IL34 (P=1.80×10⁻⁸), APH1B (P=2.10×10⁻¹³), and CLNK (P=2.24×10⁻¹⁰). Rare variant (MAF<0.01) associations with genome-wide significance in stage 2 included multiple variants in APOE and TREM2, and a novel association of a rare variant (rs143080277; MAF=0.0054; P=2.69×10⁻⁹) in NCK2, further strengthened with the inclusion of UKBB data in stage 3 (P=7.17×10⁻¹³). Single-nucleus sequence data shows that NCK2 is highly expressed in amyloid-responsive microglial cells, suggesting a role in LOAD pathology.
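The frequency and significance cutoffs quoted above can be made concrete with a small Python sketch. The MAF boundary of 0.01 and the stage 1 cutoff of 1e-5 come from the abstract; the genome-wide significance threshold of 5e-8 is the widely used convention and is an assumption here, since the abstract does not state the threshold the authors applied. The function and constant names are hypothetical.

```python
MAF_COMMON = 0.01        # common/rare boundary quoted in the abstract
STAGE1_P = 1e-5          # stage 1 cutoff quoted in the abstract
GENOME_WIDE_P = 5e-8     # conventional threshold; ASSUMED, not stated in the abstract


def frequency_class(maf: float) -> str:
    """Label a variant 'common' (MAF >= 0.01) or 'rare' (MAF < 0.01)."""
    return "common" if maf >= MAF_COMMON else "rare"


def passes_stage1(p_value: float) -> bool:
    """True if the association would be carried forward from stage 1."""
    return p_value < STAGE1_P


def is_genome_wide_significant(p_value: float) -> bool:
    """True if the P value is below the assumed conventional threshold."""
    return p_value < GENOME_WIDE_P


# Values for the NCK2 variant as reported in the abstract (stage 2).
nck2 = {"rsid": "rs143080277", "maf": 0.0054, "p": 2.69e-9}
nck2_class = frequency_class(nck2["maf"])
nck2_significant = is_genome_wide_significant(nck2["p"])
print(nck2["rsid"], nck2_class, nck2_significant)
```

Under these assumptions the NCK2 variant is classified as rare and falls below the conventional genome-wide threshold, which matches how the abstract describes it.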

