Can machine learning aid in identifying disease genes? The case of autism spectrum disorder

2020 ◽  
Author(s):  
Margot Gunning ◽  
Paul Pavlidis

Abstract: Discovering genes involved in complex human genetic disorders is a major challenge. Many have suggested that machine learning (ML) algorithms using gene networks can be used to supplement traditional genetic association-based approaches to predict or prioritize disease genes. However, questions have been raised about the utility of ML methods for this type of task due to biases within the data, and poor real-world performance. Using autism spectrum disorder (ASD) as a test case, we sought to investigate the question: Can machine learning aid in the discovery of disease genes? We collected thirteen published ASD gene prioritization studies and evaluated their performance using known and novel high-confidence ASD genes. We also investigated their biases towards generic gene annotations, like number of association publications. We found that ML methods which do not incorporate genetics information have limited utility for prioritization of ASD risk genes. These studies perform at a comparable level to generic measures of likelihood for the involvement of genes in any condition, and do not out-perform genetic association studies. Future efforts to discover disease genes should be focused on developing and validating statistical models for genetic association, specifically for association between rare variants and disease, rather than developing complex machine learning methods using complex heterogeneous biological data with unknown reliability.

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Margot Gunning ◽  
Paul Pavlidis

Abstract: Discovering genes involved in complex human genetic disorders is a major challenge. Many have suggested that machine learning (ML) algorithms using gene networks can be used to supplement traditional genetic association-based approaches to predict or prioritize disease genes. However, questions have been raised about the utility of ML methods for this type of task due to biases within the data, and poor real-world performance. Using autism spectrum disorder (ASD) as a test case, we sought to investigate the question: can machine learning aid in the discovery of disease genes? We collected 13 published ASD gene prioritization studies and evaluated their performance using known and novel high-confidence ASD genes. We also investigated their biases towards generic gene annotations, like number of association publications. We found that ML methods which do not incorporate genetics information have limited utility for prioritization of ASD risk genes. These studies perform at a comparable level to generic measures of likelihood for the involvement of genes in any condition, and do not out-perform genetic association studies. Future efforts to discover disease genes should be focused on developing and validating statistical models for genetic association, specifically for association between rare variants and disease, rather than developing complex machine learning methods using complex heterogeneous biological data with unknown reliability.
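As an illustration of what "evaluating performance using known high-confidence genes" can look like in practice, the following is a minimal sketch that scores a gene ranking against a set of positives using the area under the ROC curve. The abstract does not state which metric the authors used, so AUROC is an assumption here, and the function name, gene labels, and scores are all hypothetical placeholders.

```python
def auroc(scores, positives):
    """Area under the ROC curve for a gene ranking.

    scores: dict mapping gene symbol -> prioritization score (higher = more likely).
    positives: set of gene symbols treated as true disease genes.
    Computed as the probability that a randomly chosen positive outscores a
    randomly chosen negative, counting ties as one half (Mann-Whitney form).
    """
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores and placeholder gene labels; not data from the study.
toy_scores = {"GENE_A": 0.9, "GENE_B": 0.8, "GENE_C": 0.4, "GENE_D": 0.3, "GENE_E": 0.1}
toy_positives = {"GENE_A", "GENE_C"}
toy_auc = auroc(toy_scores, toy_positives)
```

Running the same function on a generic annotation, such as a per-gene publication count, would give one simple baseline of the kind the abstract describes when comparing ML rankings with generic measures.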


2019 ◽  
Author(s):  
Sun Jae Moon ◽  
Jin Seub Hwang ◽  
Rajesh Kana ◽  
John Torous ◽  
Jung Won Kim

BACKGROUND In recent years, machine learning algorithms have been increasingly applied in biomedical fields. In particular, their application has been drawing more attention in the field of psychiatry, for instance, as diagnostic tests/tools for autism spectrum disorder. However, given their complexity and potential clinical implications, there is an ongoing need for further research on their accuracy. OBJECTIVE The current study aims to summarize the evidence for the accuracy of machine learning algorithms in diagnosing autism spectrum disorder (ASD) through systematic review and meta-analysis. METHODS MEDLINE, Embase, CINAHL Complete (with OpenDissertations), PsycINFO and IEEE Xplore Digital Library databases were searched on November 28th, 2018. Studies that used a machine learning algorithm partially or fully in classifying ASD from controls and provided accuracy measures were included in our analysis. A bivariate random effects model was applied to the pooled data in meta-analysis. Subgroup analysis was used to investigate and resolve the source of heterogeneity between studies. True-positive, false-positive, false-negative and true-negative values from individual studies were used to calculate the pooled sensitivity and specificity values, draw SROC curves, and obtain area under the curve (AUC) and partial AUC. RESULTS A total of 43 studies were included for the final analysis, of which meta-analysis was performed on 40 studies (53 samples with 12,128 participants). A structural MRI subgroup meta-analysis (12 samples with 1,776 participants) showed the sensitivity at 0.83 (95% CI 0.76 to 0.89), specificity at 0.84 (95% CI 0.74 to 0.91), and AUC/pAUC at 0.90/0.83. An fMRI/deep neural network (DNN) subgroup meta-analysis (five samples with 1,345 participants) showed the sensitivity at 0.69 (95% CI 0.62 to 0.75), the specificity at 0.66 (95% CI 0.61 to 0.70), and AUC/pAUC at 0.71/0.67.
CONCLUSIONS Machine learning algorithms that used structural MRI features in diagnosis of ASD were shown to have accuracy that is similar to currently used diagnostic tools.
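The per-study quantities that feed such a meta-analysis can be sketched as follows. This is only the single-study calculation of sensitivity and specificity from confusion-matrix counts; it does not implement the bivariate random effects pooling named in the abstract, and the counts below are invented for illustration.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) for one study's 2x2 counts.

    sensitivity = TP / (TP + FN): share of ASD cases classified as ASD.
    specificity = TN / (TN + FP): share of controls classified as controls.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("study needs at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for a single study (not taken from the review).
sens, spec = sensitivity_specificity(tp=40, fp=8, fn=10, tn=42)
```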


Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 1053
Author(s):  
Jasleen Dhaliwal ◽  
Ying Qiao ◽  
Kristina Calli ◽  
Sally Martell ◽  
Simone Race ◽  
...  

Autism Spectrum Disorder (ASD) is the most common neurodevelopmental disorder in children and shows high heritability. However, how inherited variants contribute to ASD in multiplex families remains unclear. Using whole-genome sequencing (WGS) in a family with three affected children, we identified multiple inherited DNA variants in ASD-associated genes and pathways (RELN, SHANK2, DLG1, SCN10A, KMT2C and ASH1L). All are shared among the three children, except ASH1L, which is only present in the most severely affected child. The compound heterozygous variants in RELN, and the maternally inherited variant in SHANK2, are considered to be major risk factors for ASD in this family. Both genes are involved in neuron activities, including synaptic functions and the GABAergic neurotransmission system, which are highly associated with ASD pathogenesis. DLG1 is also involved in synapse functions, and KMT2C and ASH1L are involved in chromatin organization. Our data suggest that multiple inherited rare variants, each with a subthreshold and/or variable effect, may converge to certain pathways and contribute quantitatively and additively, or alternatively act via a 2nd-hit or multiple-hits to render pathogenicity of ASD in this family. Additionally, this multiple-hits model further supports the quantitative trait hypothesis of a complex genetic, multifactorial etiology for the development of ASDs.


Open Biology ◽  
2018 ◽  
Vol 8 (5) ◽  
pp. 180031 ◽  
Author(s):  
Shani Stern ◽  
Sara Linker ◽  
Krishna C. Vadodaria ◽  
Maria C. Marchetto ◽  
Fred H. Gage

Personalized medicine has become increasingly relevant to many medical fields, promising more efficient drug therapies and earlier intervention. The development of personalized medicine is coupled with the identification of biomarkers and classification algorithms that help predict the responses of different patients to different drugs. In the last 10 years, the Food and Drug Administration (FDA) has approved several genetically pre-screened drugs labelled as pharmacogenomics in the fields of oncology, pulmonary medicine, gastroenterology, haematology, neurology, rheumatology and even psychiatry. Clinicians have long cautioned that what may appear to be similar patient-reported symptoms may actually arise from different biological causes. With growing populations being diagnosed with different psychiatric conditions, it is critical for scientists and clinicians to develop precision medication tailored to individual conditions. Genome-wide association studies have highlighted the complicated nature of psychiatric disorders such as schizophrenia, bipolar disorder, major depression and autism spectrum disorder. Following these studies, association studies are needed to look for genomic markers of responsiveness to available drugs of individual patients within the population of a specific disorder. In addition to GWAS, the advent of new technologies such as brain imaging, cell reprogramming, sequencing and gene editing has given us the opportunity to look for more biomarkers that characterize a therapeutic response to a drug and to use all these biomarkers for determining treatment options. In this review, we discuss studies that were performed to find biomarkers of responsiveness to different available drugs for four brain disorders: bipolar disorder, schizophrenia, major depression and autism spectrum disorder. We provide recommendations for using an integrated method that will use available techniques for a better prediction of the most suitable drug.


2020 ◽  
Author(s):  
Haishuai Wang ◽  
Paul Avillach

BACKGROUND In the United States, about 3 million people have autism spectrum disorder (ASD), and around 1 out of 59 children are diagnosed with ASD. People with ASD have characteristic social communication deficits and repetitive behaviors. The causes of this disorder remain unknown; however, in up to 25% of cases, a genetic cause can be identified. Detecting ASD as early as possible is desirable because early detection of ASD enables timely interventions in children with ASD. Identification of ASD based on objective pathogenic mutation screening is the major first step toward early intervention and effective treatment of affected children. OBJECTIVE Recent investigation interrogated genomics data for detecting and treating autism disorders, in addition to the conventional clinical interview as a diagnostic test. Since deep neural networks perform better than shallow machine learning models on complex and high-dimensional data, in this study, we sought to apply deep learning to genetic data obtained across thousands of simplex families at risk for ASD to identify contributory mutations and to create an advanced diagnostic classifier for autism screening. METHODS After preprocessing the genomics data from the Simons Simplex Collection, we extracted top ranking common variants that may be protective or pathogenic for autism based on a chi-square test. A convolutional neural network–based diagnostic classifier was then designed using the identified significant common variants to predict autism. The performance was then compared with shallow machine learning–based classifiers and randomly selected common variants. RESULTS The selected contributory common variants were significantly enriched in chromosome X while chromosome Y was also discriminatory in determining the identification of autistic from nonautistic individuals. The ARSD, MAGEB16, and MXRA5 genes had the largest effect in the contributory variants. 
Thus, screening algorithms were adapted to include these common variants. The deep learning model yielded an area under the receiver operating characteristic curve of 0.955 and an accuracy of 88% for identifying autistic from nonautistic individuals. Our classifier demonstrated a significant improvement over standard autism screening tools, by an average of 13% in classification accuracy. CONCLUSIONS Common variants are informative for autism identification. Our findings also suggest that the deep learning process is a reliable method for distinguishing the diseased group from the control group based on the common variants of autism.
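The chi-square filtering step mentioned in the methods can be illustrated with a minimal sketch for a single variant. It assumes a 2x2 table of carrier versus non-carrier counts in cases and controls, uses the uncorrected Pearson statistic with one degree of freedom, and relies on invented counts; the abstract does not specify the exact table layout or any corrections the authors applied.

```python
import math


def chi_square_2x2(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 df).

    Uses the closed form n * (ad - bc)^2 / (row and column totals multiplied).
    For one degree of freedom the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    """
    a, b, c, d = case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for one variant (not data from the study).
chi2, p_value = chi_square_2x2(30, 70, 10, 90)
```

Applying this to every variant and sorting by the statistic would give a top-ranking list of the kind the abstract describes. In a real analysis the resulting p-values would also need correction for the number of variants tested.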


2021 ◽  
Author(s):  
Astrid Rybner ◽  
Emil Trenckner Jessen ◽  
Marie Damsgaard Mortensen ◽  
Stine Nyhus Larsen ◽  
Ruth Grossman ◽  
...  

Background: Machine learning (ML) approaches show increasing promise to identify vocal markers of Autism Spectrum Disorder (ASD). Nonetheless, it is unclear to what extent such markers generalize to new speech samples collected in diverse settings such as using a different speech task or a different language. Aim: In this paper, we systematically assess the generalizability of ML findings across a variety of contexts. Methods: We re-train a promising published ML model of vocal markers of ASD on novel cross-linguistic datasets following a rigorous pipeline to minimize overfitting, including cross-validated training and ensemble models. We test the generalizability of the models by testing them on i) different participants from the same study, performing the same task; ii) the same participants, performing a different (but similar) task; iii) a different study with participants speaking a different language, performing the same type of task. Results: While model performance is similar to previously published findings when trained and tested on data from the same study (out-of-sample performance), there is considerable variance between studies. Crucially, the models do not generalize well to new similar tasks and not at all to new languages. The ML pipeline is openly shared. Conclusion: Generalizability of ML models of vocal markers - and more generally biobehavioral markers - of ASD is an issue. We outline three recommendations researchers could take in order to be more explicit about generalizability and improve it in future studies.


2016 ◽  
Vol 113 (52) ◽  
pp. 15054-15059 ◽  
Author(s):  
Xiao Ji ◽  
Rachel L. Kember ◽  
Christopher D. Brown ◽  
Maja Bućan

Autism spectrum disorder (ASD) is a heterogeneous, highly heritable neurodevelopmental syndrome characterized by impaired social interaction, communication, and repetitive behavior. It is estimated that hundreds of genes contribute to ASD. We asked if genes with a strong effect on survival and fitness contribute to ASD risk. Human orthologs of genes with an essential role in pre- and postnatal development in the mouse [essential genes (EGs)] are enriched for disease genes and under strong purifying selection relative to human orthologs of mouse genes with a known nonlethal phenotype [nonessential genes (NEGs)]. This intolerance to deleterious mutations, commonly observed haploinsufficiency, and the importance of EGs in development suggest a possible cumulative effect of deleterious variants in EGs on complex neurodevelopmental disorders. With a comprehensive catalog of 3,915 mammalian EGs, we provide compelling evidence for a stronger contribution of EGs to ASD risk compared with NEGs. By examining the exonic de novo and inherited variants from 1,781 ASD quartet families, we show a significantly higher burden of damaging mutations in EGs in ASD probands compared with their non-ASD siblings. The analysis of EGs in the developing brain identified clusters of coexpressed EGs implicated in ASD. Finally, we suggest a high-priority list of 29 EGs with potential ASD risk as targets for future functional and behavioral studies. Overall, we show that large-scale studies of gene function in model organisms provide a powerful approach for prioritization of genes and pathogenic variants identified by sequencing studies of human disease.

