Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype

2019 ◽  
Vol 35 (14) ◽  
pp. i538-i547 ◽  
Author(s):  
Bojian Yin ◽  
Marleen Balvert ◽  
Rick A A van der Spek ◽  
Bas E Dutilh ◽  
Sander Bohté ◽  
...  

Abstract
Motivation: Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease caused by aberrations in the genome. While several disease-causing variants have been identified, a major part of heritability remains unexplained. ALS is believed to have a complex genetic basis where non-additive combinations of variants constitute disease, which cannot be picked up using the linear models employed in classical genotype–phenotype association studies. Deep learning, on the other hand, is highly promising for identifying such complex relations. We therefore developed a deep-learning based approach for the classification of ALS patients versus healthy individuals from the Dutch cohort of the Project MinE dataset. Based on recent insight that regulatory regions harbor the majority of disease-associated variants, we employ a two-step approach: first, promoter regions that are likely associated with ALS are identified, and second, individuals are classified based on their genotype in the selected genomic regions. Both steps employ a deep convolutional neural network. The network architecture accounts for the structure of genome data by applying convolution only to parts of the data where this makes sense from a genomics perspective.
Results: Our approach identifies potentially ALS-associated promoter regions, and generally outperforms other classification methods. Test results support the hypothesis that non-additive combinations of variants contribute to ALS. Architectures and protocols developed are tailored toward processing population-scale, whole-genome data. We consider this a relevant first step toward deep learning assisted genotype–phenotype association in whole genome-sized data.
Availability and implementation: Our code will be available on GitHub, together with a synthetic dataset (https://github.com/byin-cwi/ALS-Deeplearning). The data used in this study are available to bona-fide researchers upon request.
Supplementary information: Supplementary data are available at Bioinformatics online.
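The abstract states that convolution is applied only to parts of the data where this makes sense from a genomics perspective. One way to picture this, as a minimal sketch and not the authors' architecture, is to slide a kernel inside each promoter region separately so that no output mixes variants from two different regions. The genotype encoding (0, 1, 2 alternate-allele counts), the region contents, the kernel, and the function names below are all illustrative assumptions.

```python
from typing import List, Sequence


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide the kernel over the signal without padding (CNN-style cross-correlation)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def regionwise_conv(
    regions: Sequence[Sequence[float]], kernel: Sequence[float]
) -> List[List[float]]:
    """Apply the same kernel inside each region on its own.

    Because every region is convolved separately, no output value
    combines variants that belong to two different regions.
    """
    return [conv1d_valid(region, kernel) for region in regions]


# Hypothetical genotypes for one individual in two promoter regions
# (assumed encoding: number of alternate alleles per variant).
promoter_a = [0, 1, 2, 0]
promoter_b = [2, 2, 0]
kernel = [1.0, -1.0]  # arbitrary illustrative weights, not learned

features = regionwise_conv([promoter_a, promoter_b], kernel)
print(features)

# For contrast: convolving the concatenated vector yields one extra
# output that straddles the boundary between the two regions.
naive = conv1d_valid(promoter_a + promoter_b, kernel)
print(len(naive), sum(len(f) for f in features))
```

Convolving the concatenated genotype vector would add an output that spans the boundary between the two regions, which is the kind of mixing the region-wise form avoids.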

2019 ◽  
Author(s):  
Bojian Yin ◽  
Marleen Balvert ◽  
Rick A. A. van der Spek ◽  
Bas E. Dutilh ◽  
Sander Bohté ◽  
...  

Abstract
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease caused by aberrations in the genome. While several disease-causing variants have been identified, a major part of heritability remains unexplained. ALS is believed to have a complex genetic basis where non-additive combinations of variants constitute disease, which cannot be picked up using the linear models employed in classical genotype-phenotype association studies. Deep learning, on the other hand, is highly promising for identifying such complex relations. We therefore developed a deep-learning based approach for the classification of ALS patients versus healthy individuals from the Dutch cohort of the Project MinE dataset. Based on recent insight that regulatory regions on the genome play a major role in ALS, we employ a two-step approach: first, promoter regions that are likely associated with ALS are identified, and second, individuals are classified based on their genotype in the selected genomic regions. Both steps employ a deep convolutional neural network. The network architecture accounts for the structure of genome data by applying convolution only to parts of the data where this makes sense from a genomics perspective.
Our approach identifies potential ALS-associated genetic variants, and generally outperforms other classification methods. Test results support the hypothesis that ALS is caused by non-additive combinations of variants. Our method can be applied to large-scale whole genome data. We consider this a first step towards genotype-phenotype association with deep learning that is tailored to genomics and can deal with genome-sized data.


2020 ◽  
Author(s):  
Mike A. Nalls ◽  
Cornelis Blauwendraat ◽  
Lana Sargent ◽  
Dan Vitale ◽  
Hampton Leonard ◽  
...  

Summary
Background: Previous research using genome-wide association studies (GWAS) has identified variants that may contribute to lifetime risk of multiple neurodegenerative diseases. However, whether there are common mechanisms that link neurodegenerative diseases is uncertain. Here, we focus on one gene, GRN, encoding progranulin, and the potential mechanistic interplay between genetic risk, gene expression in the brain and inflammation across multiple common neurodegenerative diseases.
Methods: We utilized GWAS, expression quantitative trait locus (eQTL) mapping and Bayesian colocalization analyses to evaluate potential causal and mechanistic inferences. We integrate various molecular data types from public resources to infer disease connectivity and shared mechanisms using a data-driven process.
Findings: eQTL analyses combined with GWAS identified significant functional associations between increasing genetic risk in the GRN region and decreased expression of the gene in Parkinson’s, Alzheimer’s and amyotrophic lateral sclerosis. Additionally, colocalization analyses show a connection between blood-based inflammatory biomarkers relating to platelets and GRN expression in the frontal cortex.
Interpretation: GRN expression mediates neuroinflammation function related to general neurodegeneration. This analysis suggests shared mechanisms for Parkinson’s, Alzheimer’s and amyotrophic lateral sclerosis.
Funding: National Institute on Aging, National Institute of Neurological Disorders and Stroke, and the Michael J. Fox Foundation.


2021 ◽  
Author(s):  
Kailin Xia ◽  
Linjing Zhang ◽  
Gan Zhang ◽  
Yajun Wang ◽  
Tao Huang ◽  
...  

Abstract
Background: Observational studies have suggested that telomere length is associated with amyotrophic lateral sclerosis (ALS). However, it remains unclear whether this association is causal. We employed a two-sample Mendelian randomization (MR) approach to explore the causal relationship between leukocyte telomere length (LTL) and ALS based on the most cited and the most recent and largest LTL genome-wide association studies (GWASs) that measured LTL with the Southern blot method (n = 9190) and ALS GWAS summary data (n = 80,610). We adopted the inverse variance weighted (IVW) method to examine the effect of LTL on ALS and used the weighted median method, simple median method, MR Egger method and MR PRESSO method to perform sensitivity analyses.
Results: We found that genetically determined longer LTL was inversely associated with the risk of ALS (OR = 0.846, 95% CI: 0.744–0.962, P = 0.011), which was mainly driven by rs940209 in the OBFC1 gene, suggesting a potential effect of OBFC1 on ALS. In sensitivity analyses, this was confirmed with the MR Egger method (OR = 0.647, 95% CI: 0.447–0.936, P = 0.050), and a similar trend was shown with the weighted median method (OR = 0.893, P = 0.201) and simple median method (OR = 0.935, P = 0.535). The MR Egger analyses did not suggest directional pleiotropy, showing an intercept of 0.025 (P = 0.168). Neither an influence of instrumental outliers nor heterogeneity was found.
Conclusions: Our results suggest that genetically predicted longer LTL has a causal relationship with a lower risk of ALS and underscore the importance of protecting against telomere loss in ALS.
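For readers unfamiliar with the inverse variance weighted (IVW) method mentioned above, the following is a minimal sketch of the standard fixed-effect IVW estimator, which averages per-variant Wald ratios weighted by their precision. The summary statistics are made-up numbers for illustration, not data from the study, and the function name `ivw_estimate` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Return the fixed-effect IVW causal estimate and its standard error.

    Each variant contributes a Wald ratio (outcome effect / exposure effect),
    weighted by (exposure effect / outcome standard error) squared.
    """
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    return beta, math.sqrt(1.0 / total)


# Made-up summary statistics for three hypothetical variants:
# effect on the exposure, effect on the outcome (log odds), outcome SE.
bx = [0.10, 0.20, 0.15]
by = [-0.02, -0.03, -0.03]
se = [0.010, 0.015, 0.012]

beta, se_beta = ivw_estimate(bx, by, se)

# Convert the log-odds estimate to an odds ratio with a 95% interval.
odds_ratio = math.exp(beta)
ci_low = math.exp(beta - 1.96 * se_beta)
ci_high = math.exp(beta + 1.96 * se_beta)
print(f"OR = {odds_ratio:.3f}, 95% CI: {ci_low:.3f}-{ci_high:.3f}")
```

An odds ratio below 1 in this toy setting corresponds to the direction reported in the abstract, where a genetically longer LTL is associated with lower ALS risk. The sensitivity methods listed (weighted median, MR Egger, MR PRESSO) relax different assumptions and are not shown here.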


2021 ◽  
Author(s):  
Helgi Hilmarsson ◽  
Arvind S. Kumar ◽  
Richa Rastogi ◽  
Carlos D. Bustamante ◽  
Daniel Mas Montserrat ◽  
...  

Abstract
As genome-wide association studies and genetic risk prediction models are extended to globally diverse and admixed cohorts, ancestry deconvolution has become an increasingly important tool. Also known as local ancestry inference (LAI), this technique identifies the ancestry of each region of an individual’s genome, thus permitting downstream analyses to account for genetic effects that vary between ancestries. Since existing LAI methods were developed before the rise of massive, whole genome biobanks, they are computationally burdened by these large next-generation datasets. Current LAI algorithms also fail to harness the potential of whole genome sequences, falling well short of the accuracy that such high variant densities can enable. Here we introduce Gnomix, a set of algorithms that address each of these points, achieving higher accuracy and swifter computational performance than any existing LAI method, while also enabling portable models that are particularly useful when training data are not shareable due to privacy or other restrictions. We demonstrate Gnomix (and its swift phase correction counterpart Gnofix) on worldwide whole-genome data from both humans and canids and utilize its high-resolution accuracy to identify the location of ancient New World haplotypes in the Xoloitzcuintle, dating back over 100 generations. Code is available at https://github.com/AI-sandbox/gnomix.


2020 ◽  
pp. jmedgenet-2020-106866 ◽  
Author(s):  
Emily P McCann ◽  
Lyndal Henden ◽  
Jennifer A Fifita ◽  
Katharine Y Zhang ◽  
Natalie Grima ◽  
...  

Background: Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease with phenotypic and genetic heterogeneity. Approximately 10% of cases are familial, while remaining cases are classified as sporadic. To date, >30 genes and several hundred genetic variants have been implicated in ALS.
Methods: Seven hundred and fifty-seven sporadic ALS cases were recruited from Australian neurology clinics. Detailed clinical data and whole genome sequencing (WGS) data were available from 567 and 616 cases, respectively, of which 426 cases had both datasets available. As part of a comprehensive genetic analysis, 853 genetic variants previously reported as ALS-linked mutations or disease-associated alleles were interrogated in sporadic ALS WGS data. Statistical analyses were performed to identify correlation between clinical variables, and between phenotype and the number of ALS-implicated variants carried by an individual. Relatedness between individuals carrying identical variants was assessed using identity-by-descent analysis.
Results: Forty-three ALS-implicated variants from 18 genes, including C9orf72, ATXN2, TARDBP, SOD1, SQSTM1 and SETX, were identified in Australian sporadic ALS cases. One-third of cases carried at least one variant and 6.82% carried two or more variants, implicating a potential oligogenic or polygenic basis of ALS. Relatedness was detected between two sporadic ALS cases carrying a SOD1 p.I114T mutation, and among three cases carrying a SQSTM1 p.K238E mutation. Oligogenic/polygenic sporadic ALS cases showed earlier age of onset than those with no reported variant.
Conclusion: We confirm phenotypic associations among ALS cases, and highlight the contribution of genetic variation to all forms of ALS.

