Abstract
Background: Alzheimer’s disease (AD) is a common form of dementia that impairs patients’ language, memory, cognition, and visuospatial ability. A growing number of studies seek non-invasive, accessible, and cost-effective methods for detecting AD. Speech has been shown to be associated with AD, so diagnosing AD in a doctor’s office may soon be feasible.
Methods: We used the ADReSS 2020 dataset, which is balanced for gender and age, to detect AD. First, we extracted three categories of features: acoustic features extracted with the openSMILE software, BERT embeddings extracted automatically, and more complex linguistic features extracted manually. The linguistic features are based on POS tags, lexical richness, fluency, and semantic features. Next, seven classifiers were used to distinguish AD patients from normal controls: SVM, Logistic Regression, Random Forest, Extra Trees, AdaBoost, LightGBM, and a novel ensemble approach that uses a majority voting strategy to compensate for the errors of individual base classifiers. Finally, ten-fold cross-validation was adopted to evaluate our approach. In addition, individual feature sets and their combinations were fed to the six base classifiers and to the ensemble of classifiers.
Results: The ensemble of classifiers achieved the top classification result on the test set, with a best accuracy of 85.4%. Among the feature sets, linguistic features performed best, reaching an accuracy of 85.6% with the LightGBM classifier, and the SFS approach was used to identify seven discriminative linguistic features.
Conclusions: The statistical and experimental results demonstrate that speech can be used to predict AD effectively from acoustic and linguistic features. Stronger classifiers and discriminative features are vital to the final results. We emphasise that the best linguistic features for predicting AD are based on POS tags, lexical richness, fluency, and semantic features. An ensemble of classifiers usually performs better than a single classifier.
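The majority voting strategy mentioned in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the label strings "AD" and "CN", the example votes, and the rule for breaking ties (which can occur with six base classifiers) are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels, tie_label="AD"):
    """Return the label chosen by most base classifiers for one sample.

    tie_label is a hypothetical tie-breaking rule: with an even number of
    voters a 3-3 split is possible, and the source does not say how such
    ties are resolved.
    """
    ranked = Counter(labels).most_common()
    # A tie exists when the two most frequent labels have equal counts.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]


def ensemble_predict(per_classifier_predictions, tie_label="AD"):
    """Combine predictions (one list per classifier) sample by sample."""
    return [
        majority_vote(votes, tie_label)
        for votes in zip(*per_classifier_predictions)
    ]


# Made-up predictions from six base classifiers on three samples.
example_votes = [
    ["AD", "CN", "AD"],  # SVM
    ["AD", "CN", "CN"],  # Logistic Regression
    ["CN", "CN", "AD"],  # Random Forest
    ["AD", "AD", "CN"],  # Extra Trees
    ["AD", "CN", "CN"],  # AdaBoost
    ["CN", "CN", "AD"],  # LightGBM
]

print(ensemble_predict(example_votes))  # ['AD', 'CN', 'AD']
```

In this example the first sample is decided 4 to 2, the second 5 to 1, and the third is a 3-3 split that falls back to the assumed tie label. A single classifier's error on a sample is outvoted whenever most of the other classifiers are correct, which is the intuition behind using the ensemble.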