A machine learning approach to predicting autism risk genes: Validation of known genes and discovery of new candidates

2018 ◽  
Author(s):  
Ying Lin ◽  
Anjali M. Rajadhyaksha ◽  
James B. Potash ◽  
Shizhong Han

Abstract Autism spectrum disorder (ASD) is a complex neurodevelopmental condition with a strong genetic basis. The role of de novo mutations in ASD has been well established, but the set of genes implicated to date is still far from complete. The current study employs a machine learning-based approach to predict ASD risk genes using features from spatiotemporal gene expression patterns in human brain, gene-level constraint metrics, and other gene variation features. The genes identified through our prediction model were enriched for independent sets of ASD risk genes, and tended to be differentially expressed in ASD brains, especially in the frontal and parietal cortex. The highest-ranked genes not only included those with strong prior evidence for involvement in ASD (for example, TCF20 and FBOX11), but also indicated potentially novel candidates, such as DOCK3, MYCBP2 and CAND1, which are all involved in neuronal development. Through extensive validations, we also showed that our method outperformed state-of-the-art scoring systems for ranking ASD candidate genes. Gene ontology enrichment analysis of our predicted risk genes revealed biological processes clearly relevant to ASD, including neuronal signaling, neurogenesis, and chromatin remodeling, but also highlighted other potential mechanisms that might underlie ASD, such as regulation of RNA alternative splicing and ubiquitination pathway related to protein degradation. Our study demonstrates that human brain spatiotemporal gene expression patterns and gene-level constraint metrics can help predict ASD risk genes. Our gene ranking system provides a useful resource for prioritizing ASD candidate genes.

Blood ◽  
2004 ◽  
Vol 104 (11) ◽  
pp. 2897-2897
Author(s):  
Torsten Haferlach ◽  
Helmut Loeffler ◽  
Alexander Kohlmann ◽  
Martin Dugas ◽  
Wolfgang Hiddemann ◽  
...  

Abstract Balanced chromosomal rearrangements leading to fusion genes on the molecular level define distinct biological subsets in AML. The four balanced rearrangements (t(15;17), t(8;21), inv(16), and 11q23/MLL) show a close correlation to cytomorphology and gene expression patterns. We here focused on seven AML with t(8;16)(p11;p13). This translocation is rare (7/3515 cases in our own cohort). It is more frequently found in therapy-related AML than in de novo AML (3/258 t-AML, and 4/3287 de novo, p=0.0003). Cytomorphologically, AML with t(8;16) is characterized by striking features: In all 7 cases the positivity for myeloperoxidase on bone marrow smears was >70% and intriguingly, in parallel >80% of blast cells stained strongly positive for non-specific esterase (NSE) in all cases. Thus, these cases cannot be classified according to FAB categories. These data suggest that AML-t(8;16) arise from a very early stem cell with both myeloid and monoblastic potential. Furthermore, we detected erythrophagocytosis in 6/7 cases that was described as specific feature in AML with t(8;16). Four pts. had chromosomal aberrations in addition to t(8;16), 3 of these were t-AML all showing aberrations of 7q. Survival was poor with 0, 1, 1, 2, 20 and 18+ (after alloBMT) mo., one lost to follow-up, respectively. We then analyzed gene expression patterns in 4 cases (Affymetrix U133A+B). First we compared t(8;16) AML with 46 AML FAB M1, 41 M4, 9 M5a, and 16 M5b, all with normal karyotype. Hierarchical clustering and principal component analyses (PCA) revealed that t(8;16) AML were intercalating with FAB M4 and M5b and did not cluster near to M1. Thus, monocytic characteristics influence the gene expression pattern more strongly than myeloid characteristics. Next we compared the t(8;16) AML with the 4 other balanced subtypes according to the WHO classification (t(15;17): 43; t(8;21): 40; inv(16): 49; 11q23/MLL-rearrangements: 50). 
Using support vector machines the overall accuracy for correct subgroup assignment was 97.3% (10-fold CV), and 96.8% (2/3 training and 1/3 test set, 100 runs). In PCA and hierarchical cluster analysis the t(8;16) were grouped in the vicinity of the 11q23 cases. However, in a pairwise comparison these two subgroups could be discriminated with an accuracy of 94.4% (10-fold CV). Genes with a specific expression in AML-t(8;16) were further investigated in pathway analyses (Ingenuity). 15 of the top 100 genes associated with AML-t(8;16) were involved in the CMYC-pathway with up-regulation of BCOR, COXB5, CDK10, FLI1, HNRPA2B1, NSEP1, PDIP38, RAD50, SUPT5H, TLR2 and USP33, and down-regulation of ERG, GATA2, NCOR2 and RPS20. CEBP beta, known to play a role in myelomonocytic differentiation, was also up-regulated in t(8;16)-AML. Ten additional genes out of the 100 top differentially expressed genes were also involved in this pathway with up-regulation of DDB2, HIST1H3D, NSAP1, PTPNS1, RAN, USP4, TRIM8, ZNF278 and down-regulation of KIT and MBD2. In conclusion, AML with t(8;16) is a specific subtype of AML with unique characteristics in morphology and gene expression patterns. It is more frequently found in t-AML, and outcome is inferior in comparison to other AML with balanced translocations. Due to its unique features, it is a candidate for inclusion into the WHO classification as a specific entity.
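
The frequency comparison in this abstract (3/258 t-AML versus 4/3287 de novo AML, p=0.0003) can be checked with a few lines of arithmetic. The abstract does not say which test was used, so the sketch below assumes an uncorrected Pearson chi-square test on the 2x2 table; the function name is hypothetical. Under that assumption the p-value comes out close to the reported one. With so few positive cases, an exact test would be the more conservative choice and would give a larger p-value.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square statistic and p-value (1 d.f.)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # With 1 degree of freedom the chi-square tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Counts as given in the abstract (assumed to be the tested table).
t_aml_pos, t_aml_total = 3, 258
de_novo_pos, de_novo_total = 4, 3287

chi2, p_value = pearson_chi2_2x2(
    t_aml_pos, t_aml_total - t_aml_pos,
    de_novo_pos, de_novo_total - de_novo_pos,
)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```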


Genome ◽  
2015 ◽  
Vol 58 (6) ◽  
pp. 305-313 ◽  
Author(s):  
Jagesh Kumar Tiwari ◽  
Sapna Devi ◽  
S. Sundaresha ◽  
Poonam Chandel ◽  
Nilofer Ali ◽  
...  

Genes involved in photoassimilate partitioning and changes in hormonal balance are important for potato tuberization. In the present study, we investigated gene expression patterns in the tuber-bearing potato somatic hybrid (E1-3) and control non-tuberous wild species Solanum etuberosum (Etb) by microarray. Plants were grown under controlled conditions and leaves were collected at eight tuber developmental stages for microarray analysis. A t-test analysis identified a total of 468 genes (94 up-regulated and 374 down-regulated) that were statistically significant (p ≤ 0.05) and differentially expressed in E1-3 and Etb. Gene Ontology (GO) characterization of the 468 genes revealed that 145 were annotated and 323 were of unknown function. Further, these 145 genes were grouped based on GO biological processes followed by molecular function and (or) PGSC description into 15 gene sets, namely (1) transport, (2) metabolic process, (3) biological process, (4) photosynthesis, (5) oxidation-reduction, (6) transcription, (7) translation, (8) binding, (9) protein phosphorylation, (10) protein folding, (11) ubiquitin-dependent protein catabolic process, (12) RNA processing, (13) negative regulation of protein, (14) methylation, and (15) mitosis. RT-PCR analysis of 10 selected highly significant genes (p ≤ 0.01) confirmed the microarray results. Overall, we show that candidate genes induced in leaves of E1-3 were implicated in tuberization processes such as transport, carbohydrate metabolism, phytohormones, and transcription/translation/binding functions. Hence, our results provide an insight into the candidate genes induced in leaf tissues during tuberization in E1-3.


2016 ◽  
Author(s):  
Shahar Shohat ◽  
Eyal Ben-David ◽  
Sagiv Shifman

Abstract Genetic susceptibility to intellectual disability (ID), autism spectrum disorder (ASD) and schizophrenia (SCZ) often arises from mutations in the same genes, suggesting that they share common mechanisms. We studied genes with de novo mutations in the three disorders and genes implicated by SCZ genome-wide association study (GWAS). Using biological annotations and brain gene expression, we show that mutation class explains enrichment patterns more than specific disorder. Genes with loss of function mutations and genes with missense mutations were enriched with different pathways, shared with genes intolerant to mutations. Specific gene expression patterns were found for each disorder. ID genes were preferentially expressed in fetal cortex, ASD genes also in fetal cerebellum and striatum, and genes associated with SCZ were most significantly enriched in adolescent cortex. Our study suggests that convergence across neuropsychiatric disorders stems from vulnerable pathways to genetic variations, but spatiotemporal activity of genes contributes to specific phenotypes.


2018 ◽  
Author(s):  
Leo Brueggeman ◽  
Tanner Koomar ◽  
Jacob J Michaelson

Abstract Background Genes are one of the most powerful windows into the biology of autism, and it has been estimated that perhaps a thousand or more genes may confer risk. However, fewer than 100 genes are currently viewed as having robust enough evidence to be considered true "autism genes". Massive genetic studies are underway to produce data to implicate additional genes, but this approach, although necessary, is costly and slow-moving. Methods We approach autism gene discovery as a machine learning problem, rather than a genetic association problem, and use genome-scale data as predictors for identifying further genes that have similar properties in the feature space compared to established autism risk genes. This approach, which we call forecASD, integrates spatiotemporal gene expression, heterogeneous network data, and previous gene-level predictors of autism association into an ensemble classifier that yields a single score that indexes each gene’s evidence for being involved in the etiology of autism. Results We demonstrate that forecASD has substantially increased sensitivity and specificity compared to previous gene-level predictors of autism association, including genetic measures such as TADA. On an independent test set, consisting of newly-released pilot data from the SPARK Genomics Consortium, we show that forecASD best predicts which genes will have an excess of likely gene disrupting (LGD) de novo mutations. We further use independent data from a recent post mortem study of case/control gene expression to show that forecASD is also a significant predictor of genes implicated in ASD through differential expression. Using forecASD results, we show which molecular pathways are currently under-represented in the autism literature and likely represent under-appreciated biological mechanisms of autism. 
Finally, forecASD correctly predicted 12 of 16 genes implicated at FDR=0.2 by the latest ASD gene discovery study, while also identifying the most likely false positives among the candidate genes. Conclusions These results demonstrate that forecASD bridges the gap between genetic- and expression-based ASD gene discovery, and provides a data-driven replacement for much of the manual filtering and curation that is a critical step in ensuring the robustness of gene discovery studies.


2008 ◽  
Vol 18 (3) ◽  
pp. 139-149 ◽  
Author(s):  
Yanfang Ren ◽  
J. Derek Bewley ◽  
Xiaofeng Wang

Abstract The rice (Oryza sativa L.) cv. Taichung 65, a japonica subspecies, was used to characterize the isoform, protein and gene expression patterns of endo-β-mannanase during and after seed germination. Activity assays and isoform analyses of whole grains or seed parts (scutellum, aleurone layer and starchy endosperm) revealed that seeds began to express endo-β-mannanase activity at 48 h from the start of imbibition at 25°C, after the completion of germination of most seeds. Three isoforms of endo-β-mannanase (pI 8.86, pI 8.92 and pI 8.98) were detected in the aleurone layer and starchy endosperm, but only two (pI 8.86 and pI 8.92) were present in the scutellum. The endo-β-mannanase in the starchy endosperm was mainly from the aleurone layer. Western blot analysis, using a tomato anti-endo-β-mannanase antibody, indicated that an endo-β-mannanase protein was present in an inactive form in dry grains. The amount of this protein decreased in the scutellum, but increased in the aleurone layer during and after germination. Thus, the increase in endo-β-mannanase activity in rice grains may be due to the activation of extant proteins and/or the de novo synthesis of the enzyme. Northern blot analysis showed that four putative rice endo-β-mannanase genes (OsMAN1, OsMAN2, OsMAN6 and OsMANP) were expressed in germinating and germinated rice grains. However, OsMANP was not expressed in the scutellum. The amount of OsMAN6 mRNA decreased after the completion of germination and paralleled the decline in endo-β-mannanase protein. In the aleurone layer, the increase of OsMAN2, OsMAN6 and OsMANP mRNA preceded the increase of endo-β-mannanase protein.


2021 ◽  
Author(s):  
Chayaporn Suphavilai ◽  
Hatairat Yingtaweesittikul

Background: Transcriptomic profiles have become crucial information in understanding diseases and improving treatments. While dysregulated gene sets are identified via pathway analysis, various machine learning models have been proposed for predicting phenotypes such as disease type and drug response based on gene expression patterns. However, these models still lack interpretability, as well as the ability to integrate prior knowledge from a protein-protein interaction network. Results: We propose Grandline, a graph convolutional neural network that can integrate gene expression data and the structure of the protein interaction network to predict a specific phenotype. Transforming the interaction network into a spectral domain enables convolution of neighbouring genes and pinpointing high-impact subnetworks, which allow better interpretability of deep learning models. Grandline achieves high phenotype prediction accuracy (67-85% in 8 use cases), comparable to state-of-the-art machine learning models while requiring a smaller number of parameters, allowing it to learn complex but interpretable gene expression patterns from biological datasets. Conclusion: To improve the interpretability of phenotype prediction based on gene expression patterns, we developed Grandline using a graph convolutional neural network technique to integrate protein interaction information. We focus on improving the ability to learn nonlinear relationships between gene expression patterns and a given phenotype and on incorporating prior knowledge, which are the main challenges of machine learning models for biological datasets. The graph convolution allows us to aggregate information from relevant genes and reduces the number of trainable parameters, facilitating model training for a small-sized biological dataset.
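
To make the idea of "convolution of neighbouring genes" concrete, here is a minimal sketch of a single graph convolution layer. The abstract does not give Grandline's layer definition, so this uses the common first-order approximation of spectral graph convolution (a symmetrically normalised adjacency matrix with self-loops) as an assumption. The function names, the toy network and the weights are all hypothetical.

```python
import math

def normalized_adjacency(edges, n):
    """Return D^{-1/2} (A + I) D^{-1/2} as a dense matrix for an undirected graph."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loop so each gene keeps its own signal
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    degree = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]

def graph_conv(a_hat, x, w):
    """One layer: ReLU(A_hat @ X @ W), with X of shape n x f and W of shape f x h."""
    n, f, h = len(x), len(w), len(w[0])
    # Aggregate each gene's features over its network neighbourhood.
    agg = [[sum(a_hat[i][k] * x[k][c] for k in range(n)) for c in range(f)]
           for i in range(n)]
    # Apply the shared weight matrix, then the ReLU nonlinearity.
    return [[max(0.0, sum(agg[i][c] * w[c][j] for c in range(f))) for j in range(h)]
            for i in range(n)]

# Toy input: three hypothetical genes in a chain 0 - 1 - 2,
# each with a single expression value.
edges = [(0, 1), (1, 2)]
expression = [[1.0], [0.0], [2.0]]
weights = [[1.0]]  # placeholder; a trained model would learn these

a_hat = normalized_adjacency(edges, 3)
hidden = graph_conv(a_hat, expression, weights)
print(hidden)
```

Because the weight matrix is shared across all genes, the number of trainable parameters depends on the feature dimensions rather than on the number of genes, which is consistent with the parameter savings the abstract describes.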


2021 ◽  
Author(s):  
Jack M. Fu ◽  
F. Kyle Satterstrom ◽  
Minshi Peng ◽  
Harrison Brand ◽  
Ryan L. Collins ◽  
...  

Individuals with autism spectrum disorder (ASD) or related neurodevelopmental disorders (NDDs) often carry disruptive mutations in genes that are depleted of functional variation in the broader population. We build upon this observation and exome sequencing from 154,842 individuals to explore the allelic diversity of rare protein-coding variation contributing risk for ASD and related NDDs. Using an integrative statistical model, we jointly analyzed rare protein-truncating variants (PTVs), damaging missense variants, and copy number variants (CNVs) derived from exome sequencing of 63,237 individuals from ASD cohorts. We discovered 71 genes associated with ASD at a false discovery rate (FDR) ≤ 0.001, a threshold approximately equivalent to exome-wide significance, and 183 genes at FDR ≤ 0.05. Associations were predominantly driven by de novo PTVs, damaging missense variants, and CNVs: 57.4%, 21.2%, and 8.32% of evidence, respectively. Though fewer in number, CNVs conferred greater relative risk than PTVs, and repeat-mediated de novo CNVs exhibited strong maternal bias in parent-of-origin (e.g., 92.3% of 16p11.2 CNVs), whereas all other CNVs showed a paternal bias. To explore how genes associated with ASD and NDD overlap or differ, we analyzed our ASD cohort alongside a developmental delay (DD) cohort from the deciphering developmental disorders study (DDD; n=91,605 samples). We first reanalyzed the DDD dataset using the same models as the ASD cohorts, then performed joint analyses of both cohorts and identified 373 genes contributing to NDD risk at FDR ≤ 0.001 and 662 NDD risk genes at FDR ≤ 0.05. Of these NDD risk genes, 54 genes (125 genes at FDR ≤ 0.05) were unique to the joint analyses and not significant in either cohort alone. Our results confirm overlap of most ASD and DD risk genes, although many differ significantly in frequency of mutation. 
Analyses of single-cell transcriptome datasets showed that genes associated predominantly with DD were strongly enriched for earlier neurodevelopmental cell types, whereas genes displaying stronger evidence for association in ASD cohorts were more enriched for maturing neurons. The ASD risk genes were also enriched for genes associated with schizophrenia from a separate rare coding variant analysis of 121,570 individuals, emphasizing that these neuropsychiatric disorders share common pathways to risk.
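
The gene counts above depend on where the FDR threshold is set (71 genes at FDR ≤ 0.001 versus 183 at FDR ≤ 0.05). The abstract does not specify how FDR was estimated. The sketch below uses the Benjamini-Hochberg procedure as a generic stand-in, not the authors' integrative model, to show how a stricter threshold shrinks the gene list. The gene names and p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues

def genes_at_fdr(gene_pvalues, threshold):
    """List the genes whose q-value is at or below the threshold."""
    genes = list(gene_pvalues)
    qvalues = benjamini_hochberg([gene_pvalues[g] for g in genes])
    return [g for g, q in zip(genes, qvalues) if q <= threshold]

# Hypothetical per-gene p-values, for illustration only.
toy_pvalues = {"GENE_A": 0.00001, "GENE_B": 0.004, "GENE_C": 0.03, "GENE_D": 0.6}

strict = genes_at_fdr(toy_pvalues, 0.001)
lenient = genes_at_fdr(toy_pvalues, 0.05)
print(strict, lenient)
```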


2018 ◽  
Author(s):  
Aimee Lee S. Houde ◽  
Oliver P. Günther ◽  
Jeffrey Strohm ◽  
Tobi J. Ming ◽  
Shaorong Li ◽  
...  

Abstract Early marine survival of juvenile salmon is intimately associated with their physiological condition during ocean entry and especially smoltification. Smoltification is a developmental parr–smolt transformation allowing salmon to acquire the trait of seawater tolerance in preparation for marine living. Traditionally, this developmental process has been monitored using gill Na+/K+-ATPase (NKA) activity or plasma hormones, but gill gene expression can be reliably used. Here, we describe the discovery of candidate genes from gill tissue for staging smoltification using comparisons of microarray studies with particular focus on the commonalities between anadromous Rainbow trout and Sockeye salmon datasets, as well as literature comparison encompassing more species. A subset of 37 candidate genes mainly from the microarray analyses was used for TaqMan qPCR assay design and their monthly expression patterns were validated using gill samples from four groups, representing three species and two ecotypes: Coho salmon, Sockeye salmon, stream-type Chinook salmon, and ocean-type Chinook salmon. The best smoltification biomarkers, as measured by consistent changes across these four groups, were genes involved in ion regulation, oxygen transport, and immunity. Smoltification gene expression patterns (using the top 10 biomarkers) were confirmed by significant correlations with NKA activity and were associated with changes in body brightness, caudal fin darkness, and caudal peduncle length. We incorporate gene expression patterns of pre-smolt, smolt, and de-smolt trials from acute seawater transfers using a companion study to develop a preliminary seawater tolerance classification model for ocean-type Chinook salmon. This work demonstrates the potential of gene expression biomarkers to stage smoltification and classify juveniles as pre-smolt, smolt, or de-smolt.


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Shadi Eshghi Sahraei ◽  
Michelle Cleary ◽  
Jan Stenlid ◽  
Mikael Brandström Durling ◽  
Malin Elfstrand

Abstract Background With the expanding ash dieback epidemic that has spread across the European continent, an improved functional understanding of the disease development in afflicted hosts is needed. The study investigated whether differences in necrosis extension between common ash (Fraxinus excelsior) trees with different levels of susceptibility to the fungus Hymenoscyphus fraxineus are associated with, and can be explained by, the differences in gene expression patterns. We inoculated seemingly healthy branches of each of two resistant and susceptible ash genotypes with H. fraxineus grown in a common garden. Results Ten months after the inoculation, the length of necrosis on the resistant genotypes was shorter than on the susceptible genotypes. RNA sequencing of bark samples collected at the border of necrotic lesions and from healthy tissues distal to the lesion revealed relatively limited differences in gene expression patterns between susceptible and resistant genotypes. At the necrosis front, only 138 transcripts were differentially expressed between the genotype categories while 1082 were differentially expressed in distal, non-symptomatic tissues. Among these differentially expressed genes, several genes in the mevalonate (MVA) and iridoid pathways were found to be co-regulated, possibly indicating increased fluxes through these pathways in response to H. fraxineus. Comparison of transcriptional responses of symptomatic and non-symptomatic ash in a controlled greenhouse experiment revealed a relatively small set of genes that were differentially and concordantly expressed in both studies. This gene-set included the rate-limiting enzyme in the MVA pathway and a number of transcription factors. Furthermore, several of the concordantly expressed candidate genes show significant similarity to genes encoding players in the abscisic acid- or jasmonate-signalling pathways. 
Conclusions A set of candidate genes, concordantly expressed between field and greenhouse experiments, was identified. The candidates are associated with hormone signalling and specialized metabolite biosynthesis pathways indicating the involvement of these pathways in the response of the host to infection by H. fraxineus.


2020 ◽  
Vol 61 (7) ◽  
pp. 1348-1364
Author(s):  
M Luisa Hernández ◽  
Elena Lima-Cabello ◽  
Juan de D Alché ◽  
José M Martínez-Rivas ◽  
Antonio J Castro

Abstract Pollen lipids are essential for sexual reproduction, but our current knowledge regarding lipid dynamics in growing pollen tubes is still very scarce. Here, we report unique lipid composition and associated gene expression patterns during olive pollen germination. Up to 376 genes involved in the biosynthesis of all lipid classes, except suberin, cutin and lipopolysaccharides, are expressed in olive pollen. The fatty acid profile of olive pollen is markedly different compared with other plant organs. Triacylglycerol (TAG), containing mostly C12–C16 saturated fatty acids, constitutes the bulk of olive pollen lipids. These compounds are partially mobilized, and the released fatty acids enter the β-oxidation pathway to yield acetyl-CoA, which is converted into sugars through the glyoxylate cycle during the course of pollen germination. Our data suggest that fatty acids are synthesized de novo and incorporated into glycerolipids by the ‘eukaryotic pathway’ in elongating pollen tubes. Phosphatidic acid is synthesized de novo in the endomembrane system during pollen germination and seems to have a central role in pollen tube lipid metabolism. The coordinated action of fatty acid desaturases FAD2–3 and FAD3B might explain the increase in linoleic and alpha-linolenic acids observed in germinating pollen. Continuous synthesis of TAG by the action of diacylglycerol acyltransferase 1 (DGAT1) enzyme, but not phospholipid:diacylglycerol acyltransferase (PDAT), also seems plausible. All these data allow for a better understanding of lipid metabolism during the olive reproductive process, which may in the future help increase olive fruit yield and, therefore, olive oil production.

