Abstract
Objective
Performance estimates for machine learning (ML) classifiers are affected by sample size and class imbalance in the training data, yet performance is often reported with balanced data. We explore how varying sample size and dementia conversion base rate affects the performance of a classifier that predicts future dementia.
Method
Longitudinal data from the National Alzheimer’s Coordinating Center (NACC) Uniform Data Set (UDS) were used. All participants had mild cognitive impairment (MCI) at baseline. A random forest classifier (RFC) was trained to predict dementia at 1, 2, and 3 years. Predictors included baseline neuropsychological test scores, demographics, and health history. Cases were sampled at multiple sample sizes (N = 125, 250, 500, 1000, and 2000) and base rates (0.1, 0.2, 0.3, 0.4, and 0.5). Performance was evaluated using the Matthews Correlation Coefficient (MCC).
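The sampling design can be sketched as follows. This is a minimal illustration, not the study's procedure: the abstract does not state how cases were drawn, so the sketch assumes stratified sampling without replacement, and the function name `sample_at_base_rate` and the pool of 3000 converters and 3000 non-converters are hypothetical.

```python
import random

def sample_at_base_rate(labels, n, base_rate, seed=0):
    """Return indices of a random subsample of size n in which
    round(n * base_rate) cases are positive (label 1).

    Assumption: cases are drawn without replacement, separately
    from the positive and negative pools.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos = round(n * base_rate)
    n_neg = n - n_pos
    if n_pos > len(pos) or n_neg > len(neg):
        raise ValueError("not enough cases for the requested N and base rate")
    idx = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    rng.shuffle(idx)  # mix converters and non-converters
    return idx

# Hypothetical pool: 1 = converted to dementia, 0 = did not convert.
labels = [1] * 3000 + [0] * 3000

# Cross every sample size with every base rate, as in the design above,
# and record how many converters each subsample contains.
design = {}
for n in (125, 250, 500, 1000, 2000):
    for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
        idx = sample_at_base_rate(labels, n, rate, seed=n)
        design[(n, rate)] = sum(labels[i] for i in idx)
```

Repeating each cell of the grid with different seeds would give a distribution of performance estimates per cell, from which both the mean and the variability can be read.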
Results
For balanced data (N = 1000), the classifier predicts conversion to dementia at 3 years with an MCC of 0.54 (sensitivity = 0.79; specificity = 0.75). As expected, mean classifier performance declines as the conversion rate decreases. Likewise, the variability of estimates increases with smaller sample sizes. For a conversion rate of 30%, consistent with many memory clinics, classifier performance declines only moderately (MCC = 0.44). At conversion rates of 10% and 20%, performance approaches chance. Performance trends are illustrated in Figure 1.
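The balanced-data figures are mutually consistent: given sensitivity, specificity, and base rate, the MCC follows from the standard confusion-matrix definition. The sketch below is an illustrative check only; the helper name `mcc_from_rates` is hypothetical, and because the abstract reports sensitivity and specificity only for the balanced case, the check applies to that case alone.

```python
import math

def mcc_from_rates(sensitivity, specificity, base_rate):
    """Matthews Correlation Coefficient from sensitivity, specificity,
    and the proportion of positive cases (base rate)."""
    # Confusion-matrix cells expressed as proportions of all cases.
    tp = sensitivity * base_rate
    fn = (1 - sensitivity) * base_rate
    tn = specificity * (1 - base_rate)
    fp = (1 - specificity) * (1 - base_rate)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal is empty.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

# Reported balanced case: sensitivity 0.79, specificity 0.75, base rate 0.5.
balanced = mcc_from_rates(0.79, 0.75, 0.5)
print(round(balanced, 2))  # 0.54
```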
Conclusions
Such classifiers may have clinical utility in memory clinics with higher conversion rates. The expected tradeoffs are observed: smaller sample sizes increase the variance of performance estimates, and higher base rates of positive cases improve overall performance. Results provide potential guidelines for sample size and recruitment targets in studies using RFC designs.