Performance of Various Machine Learning Classifiers on Small Datasets with Varying Dimensionalities: A Study
Classification is an important supervised learning technique that is used by many applications. An important factor on which the performance of a classifier depends is the size of the dataset using which the classifier is going to be trained. In this manuscript the authors analyzed five different classification techniques (namely decision trees, KNN, SVM, linear discriminant and Ensemble method) in terms of AUC and predictive accuracy when trained using small datasets with different dimensionalities. The study was done using a dataset with 24 features and 400 instances (samples). The results showed that in general ensemble method (using boosted trees) performed better than others but its performance degraded a bit with reduced dimensionality.