Statistical Study of Machine Learning Algorithms Using Parametric and Non-Parametric Tests
The emerging area of the internet of things (IoT) generates a large amount of data from applications such as health care and smart cities. This data needs to be analyzed in order to derive useful inferences, and machine learning (ML) plays a significant role in that analysis. Selecting the optimal algorithm from the available set of algorithms/classifiers is difficult, because the performance of an algorithm differs when it is applied to datasets from different application domains. It is also difficult to tell whether an observed difference in performance is real or is due to random variation in the test data, the training data, or the internal randomness of the learning algorithms. This study takes these issues into account when comparing ML algorithms for binary and multivariate classification, and provides guidelines for the statistical validation of results. The results show that the accuracy of one algorithm differs from that of the others by more than the critical difference (CD) over binary and multivariate datasets obtained from different application domains.