Prediction of Autism Spectrum Disorder Using Feature Engineering for Machine Learning Classifiers

Autism spectrum disorder (ASD) is a combination of developmental anomalies that causes social and behavioral impairments, affecting around 2% of US children. Common symptoms include difficulties in communication and social interaction, along with behavioral disabilities. Symptoms can appear in early childhood, yet repeated visits to a pediatric specialist are needed before a diagnosis is reached. Even then, the diagnosis is usually subjective, and scores can vary from one specialist to another. Previous literature suggests that differences in brain development, together with environmental and/or genetic factors, play a role in the development of autism, yet scientists still do not know the exact pathology of this disorder. Currently, the gold-standard diagnosis of ASD relies on diagnostic evaluations such as the Autism Diagnostic Observation Schedule (ADOS) or the Autism Diagnostic Interview–Revised (ADI-R). These instruments involve an intensive, lengthy, and subjective process comprising a set of behavioral and communication tests and clinical history information, conducted by a team of qualified clinicians. Emerging advances in neuroimaging and machine learning techniques can provide a fast and objective alternative to conventional, repetitive observational assessments. This paper presents a thorough study of feature engineering tools for extracting discriminative information from diffusion tensor imaging (DTI) of white matter connectivity, combined with a machine learning framework for accurately classifying autistic individuals. This work highlights important findings on the brain areas affected in autism and presents promising accuracy results. We verified the proposed framework on a large, publicly available DTI dataset of 225 subjects from the Autism Brain Imaging Data Exchange-II (ABIDE-II) initiative, achieving a global balanced accuracy of up to 99% across the 5 sites with 5-fold cross-validation.
The data were slightly imbalanced, comprising 125 autistic and 100 typically developing (TD) subjects. The balanced accuracy achieved by the proposed technique is the highest reported in the literature, which highlights the importance of the feature engineering steps involved in extracting useful knowledge and the promising potential of neuroimaging for the diagnosis of autism.
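Balanced accuracy, the metric reported above, is the mean of sensitivity and specificity, so a classifier cannot score well on a 125 ASD / 100 TD split simply by favoring the majority class. The Python sketch below is not the authors' pipeline; it assumes binary labels with 1 = ASD and 0 = TD (our own convention) and shows the metric together with a simple stratified 5-fold assignment in which every fold keeps the overall class ratio. The helper names `balanced_accuracy` and `stratified_folds` are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def stratified_folds(labels, k=5):
    """Assign each sample index to one of k folds.

    Indices of each class are dealt out round-robin, so every fold keeps
    roughly the overall class ratio. No shuffling is done here.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return [sorted(f) for f in folds]
```

With `labels = [1] * 125 + [0] * 100`, each of the five folds receives 25 ASD and 20 TD subjects, and a classifier that predicts ASD for everyone gets a balanced accuracy of 0.5 even though its plain accuracy is 125/225 ≈ 0.56. A real pipeline would shuffle indices within each class before dealing them out, and would fit any feature engineering step on the training folds only.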

Diagnostic test accuracy for use of machine learning in diagnosis of autism spectrum disorder: A Systematic Review and Meta-Analysis (Preprint)

10.2196/preprints.14108

2019

Author(s): Sun Jae Moon, Jin Seub Hwang, Rajesh Kana, John Torous, Jung Won Kim

Keyword(s): Machine Learning, Systematic Review, Autism Spectrum Disorder, Meta Analysis, Learning Algorithms, Structural MRI, Autism Spectrum, Machine Learning Algorithms, Spectrum Disorder, Test Accuracy

BACKGROUND In recent years, machine learning algorithms have been increasingly applied in biomedical fields. In particular, their application has drawn growing attention in psychiatry, for instance as diagnostic tests or tools for autism spectrum disorder. However, given their complexity and potential clinical implications, there is an ongoing need for further research on their accuracy. OBJECTIVE The current study aims to summarize the evidence on the accuracy of machine learning algorithms in diagnosing autism spectrum disorder (ASD) through a systematic review and meta-analysis. METHODS The MEDLINE, Embase, CINAHL Complete (with OpenDissertations), PsycINFO, and IEEE Xplore Digital Library databases were searched on November 28, 2018. Studies that used a machine learning algorithm, partially or fully, to classify ASD versus controls and that provided accuracy measures were included in our analysis. A bivariate random-effects model was applied to the pooled data in the meta-analysis, and subgroup analysis was used to investigate and resolve the sources of heterogeneity between studies. True-positive, false-positive, false-negative, and true-negative values from individual studies were used to calculate the pooled sensitivity and specificity, draw summary receiver operating characteristic (SROC) curves, and obtain the area under the curve (AUC) and partial AUC (pAUC). RESULTS A total of 43 studies were included in the final analysis, of which 40 (53 samples with 12,128 participants) were meta-analyzed. A structural MRI subgroup meta-analysis (12 samples with 1,776 participants) showed a sensitivity of 0.83 (95% CI 0.76 to 0.89), a specificity of 0.84 (95% CI 0.74 to 0.91), and an AUC/pAUC of 0.90/0.83. An fMRI/deep neural network (DNN) subgroup meta-analysis (five samples with 1,345 participants) showed a sensitivity of 0.69 (95% CI 0.62 to 0.75), a specificity of 0.66 (95% CI 0.61 to 0.70), and an AUC/pAUC of 0.71/0.67.
CONCLUSIONS Machine learning algorithms that used structural MRI features in the diagnosis of ASD showed accuracy similar to that of currently used diagnostic tools.
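The review pools per-study true-positive, false-positive, false-negative, and true-negative counts. Its bivariate random-effects model jointly models logit sensitivity and logit specificity, including between-study heterogeneity and their correlation, and fitting it needs a mixed-model library. The Python sketch below is deliberately simpler and is not the review's method: it computes each study's sensitivity TP/(TP+FN) and specificity TN/(TN+FP) and pools them separately with fixed-effect inverse-variance weights on the logit scale, so it ignores heterogeneity and the sensitivity/specificity correlation. All function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def pooled_proportion(events, non_events):
    """Fixed-effect inverse-variance pooling of proportions on the logit scale.

    events/non_events are per-study counts, e.g. (TP, FN) for sensitivity
    or (TN, FP) for specificity.
    """
    num = den = 0.0
    for e, n in zip(events, non_events):
        if e == 0 or n == 0:          # 0.5 continuity correction for empty cells
            e, n = e + 0.5, n + 0.5
        lg = logit(e / (e + n))
        w = 1 / (1 / e + 1 / n)       # inverse of the approximate logit variance
        num += w * lg
        den += w
    return inv_logit(num / den)


def pooled_sens_spec(studies):
    """studies: list of (TP, FP, FN, TN) tuples, one per study."""
    tp, fp, fn, tn = zip(*studies)
    return pooled_proportion(tp, fn), pooled_proportion(tn, fp)
```

For two hypothetical studies with counts (80, 10, 20, 90) and (60, 20, 40, 80), the per-study sensitivities are 0.8 and 0.6, and the pooled estimate falls between them, weighted toward the study whose logit estimate is more precise.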

Genotype-Based Deep Learning in Autism Spectrum Disorder: Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants (Preprint)

10.2196/preprints.24754

2020

Author(s): Haishuai Wang, Paul Avillach

Keyword(s): Machine Learning, Autism Spectrum Disorder, Deep Learning, The United States, Autism Spectrum, Repetitive Behaviors, Spectrum Disorder, Screening Tools, Common Variants, Autism Screening

BACKGROUND In the United States, about 3 million people have autism spectrum disorder (ASD), and around 1 in 59 children is diagnosed with ASD. People with ASD have characteristic social communication deficits and repetitive behaviors. The causes of this disorder remain unknown; however, a genetic cause can be identified in up to 25% of cases. Detecting ASD as early as possible is desirable because early detection enables timely interventions in children with ASD. Identifying ASD through objective screening for pathogenic mutations is the major first step toward early intervention and effective treatment of affected children. OBJECTIVE Recent investigations have interrogated genomic data for detecting and treating autism, in addition to the conventional clinical interview used as a diagnostic test. Since deep neural networks perform better than shallow machine learning models on complex, high-dimensional data, in this study we sought to apply deep learning to genetic data obtained from thousands of simplex families at risk for ASD, in order to identify contributory mutations and create an advanced diagnostic classifier for autism screening. METHODS After preprocessing the genomic data from the Simons Simplex Collection, we extracted the top-ranking common variants that may be protective or pathogenic for autism based on a chi-square test. A convolutional neural network–based diagnostic classifier was then designed using the identified significant common variants to predict autism. Its performance was compared with that of shallow machine learning–based classifiers and with randomly selected common variants. RESULTS The selected contributory common variants were significantly enriched on chromosome X, and chromosome Y was also discriminative in distinguishing autistic from nonautistic individuals. The ARSD, MAGEB16, and MXRA5 genes had the largest effects among the contributory variants.
Thus, screening algorithms were adapted to include these common variants. The deep learning model yielded an area under the receiver operating characteristic curve of 0.955 and an accuracy of 88% in distinguishing autistic from nonautistic individuals. Our classifier demonstrated a significant improvement over standard autism screening tools, by an average of 13% in classification accuracy. CONCLUSIONS Common variants are informative for autism identification. Our findings also suggest that deep learning is a reliable method for distinguishing the diseased group from the control group based on the common variants of autism.
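The abstract states only that top-ranking common variants were selected with a chi-square test. As a hedged illustration, the Python sketch below assumes each variant is summarized as a 2x2 table of carrier counts (cases vs. controls, variant present vs. absent), computes Pearson's chi-square statistic without continuity correction, converts it to a 1-degree-of-freedom p-value via the identity P(chi2 > x) = erfc(sqrt(x / 2)), and keeps the k highest-scoring variants. The table layout, the variant IDs, and the function names are assumptions, not details from the study.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no Yates correction) for the table

                  variant present   variant absent
        cases            a                b
        controls         c                d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column carries no association signal
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def chi2_pvalue_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def top_variants(tables, k):
    """tables: dict mapping a (hypothetical) variant ID to its (a, b, c, d) counts.

    Returns the k variant IDs with the largest chi-square statistic.
    """
    ranked = sorted(tables.items(), key=lambda kv: chi2_2x2(*kv[1]), reverse=True)
    return [vid for vid, _ in ranked[:k]]
```

For example, a variant carried by 30 of 40 cases but only 10 of 40 controls gives a statistic of 20.0 (p well below 0.001), while one carried equally often in both groups scores 0. In a genome-wide setting, the ranking would normally be followed by a multiple-testing correction before any variant is called significant.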

Identifying Challenging Behavior Profiles and Exploring their Impact on Treatment Efficacy in Autism Spectrum Disorder using Unsupervised Machine Learning (Preprint)

JMIR Medical Informatics

10.2196/27793

2021

Author(s): Julie Gardner-Hoag, Marlena Novack, Chelsea Parlett-Pelleriti, Elizabeth Stevens, Dennis Dixon, ...

Keyword(s): Machine Learning, Autism Spectrum Disorder, Treatment Efficacy, Challenging Behavior, Autism Spectrum, Spectrum Disorder, Unsupervised Machine Learning, Impact On Treatment, Behavior Profiles

Vocal markers of Autism Spectrum Disorder: assessing the generalizability of machine learning models

10.1101/2021.11.22.469538

2021

Author(s): Astrid Rybner, Emil Trenckner Jessen, Marie Damsgaard Mortensen, Stine Nyhus Larsen, Ruth Grossman, ...

Keyword(s): Machine Learning, Autism Spectrum Disorder, Model Performance, Autism Spectrum, Spectrum Disorder, Future Studies, Out Of Sample, Biobehavioral Markers, Similar Task, Machine Learning Models

Background: Machine learning (ML) approaches show increasing promise for identifying vocal markers of Autism Spectrum Disorder (ASD). Nonetheless, it is unclear to what extent such markers generalize to new speech samples collected in diverse settings, such as a different speech task or a different language. Aim: In this paper, we systematically assess the generalizability of ML findings across a variety of contexts. Methods: We retrain a promising published ML model of vocal markers of ASD on novel cross-linguistic datasets, following a rigorous pipeline to minimize overfitting that includes cross-validated training and ensemble models. We test the generalizability of the models on (i) different participants from the same study performing the same task; (ii) the same participants performing a different (but similar) task; and (iii) a different study whose participants speak a different language while performing the same type of task. Results: While model performance is similar to previously published findings when trained and tested on data from the same study (out-of-sample performance), there is considerable variance between studies. Crucially, the models do not generalize well to new, similar tasks and do not generalize at all to new languages. The ML pipeline is openly shared. Conclusion: The generalizability of ML models of vocal markers (and, more generally, biobehavioral markers) of ASD is an issue. We outline three recommendations that researchers could adopt to be more explicit about generalizability and to improve it in future studies.
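Test (iii) above amounts to holding out an entire study. The Python sketch below shows only that splitting logic, under stated assumptions: each recording carries a study label, and the labels and the function name `leave_one_study_out` are hypothetical; this is not the authors' shared pipeline. Every study serves once as the test set while the model is trained only on the others, so nothing from the held-out study is seen during training.

```python
def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_indices, test_indices) for each study.

    study_ids[i] is the study that recording i comes from. Each study is held
    out once as the test set; training uses only recordings from other studies.
    """
    studies = list(dict.fromkeys(study_ids))  # unique labels, first-seen order
    for held_out in studies:
        train = [i for i, s in enumerate(study_ids) if s != held_out]
        test = [i for i, s in enumerate(study_ids) if s == held_out]
        yield held_out, train, test
```

Test (i) would instead group by a participant identifier, so that recordings from one participant never appear in both the training and the test folds; the same function works if participant IDs are passed in place of study labels.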
