Comparative analysis of Gaussian mixture model, logistic regression and random forest for big data classification using map reduce

Author(s):  
Vikas Singh ◽  
Rahul K. Gupta ◽  
Rahul K. Sevakula ◽  
Nishchal K. Verma
2017 ◽  
Vol 7 (1.5) ◽  
pp. 97
Author(s):  
T. Surekha ◽  
R. Siva Rama Prasad

Data volumes are growing enormously as information technology develops, and classifying such data is complex in terms of both computation time and information extraction. Big data classification is further complicated by uncertainties associated with imbalanced datasets. To overcome these issues, this paper introduces a novel big data classification method, Log Decision Tree and Map Reduce Framework (LDT-MRF), which uses the Log Decision Tree (LDT) and the Map Reduce Framework (MRF) to perform parallel data classification. A novel parameter, termed Log-entropy, is used to select the best feature attribute for classification, and the classification itself is performed by the LDT. Experiments are carried out on three datasets: the Cleveland dataset, the Switzerland dataset, and the Breast Cancer dataset. The proposed method is compared with existing methods using sensitivity, specificity, and accuracy as performance metrics. It achieves a sensitivity of 84.7596%, a specificity of 74.633%, and an accuracy of 80.9088%, which are higher than those of the existing big data classification methods it is compared against.
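The three reported metrics follow the standard confusion-matrix definitions for binary classification. The sketch below is illustrative only and is not taken from the paper. It assumes a two-class problem, such as disease vs. no disease in the Cleveland data, with labels encoded as integers and `1` as the positive class. The function name `binary_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Compute sensitivity, specificity and accuracy (as fractions) for a binary task.

    Hypothetical helper: assumes `positive` marks the positive class and every
    other label value is treated as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # true negatives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP); accuracy = correct / total.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}


if __name__ == "__main__":
    m = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
    # Multiply by 100 to express the values as percentages.
    print({k: round(100 * v, 4) for k, v in m.items()})
```

For the toy labels in the usage block, two of three positives are detected and one of two negatives is correctly rejected. This gives a sensitivity of 2/3, a specificity of 1/2, and an accuracy of 3/5.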
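The abstract does not define Log-entropy. The sketch below therefore substitutes standard Shannon entropy with information gain to show how a MapReduce-style pass could choose the best split attribute. It is a general illustration and not the LDT-MRF algorithm. In the map step, each data partition emits counts keyed by (feature, value, class label). In the reduce step, these counts are summed. The driver then scores each feature from the aggregated counts. The function names, the categorical-feature assumption, and the feature names in the usage example are all hypothetical. The partitions are also processed sequentially here, whereas a real framework would run each map task on a separate node.

```python
import math
from collections import Counter
from functools import reduce
from typing import Dict, List, Tuple

# A row is (feature -> categorical value, class label); hypothetical representation.
Row = Tuple[Dict[str, str], str]


def map_counts(partition: List[Row]) -> Counter:
    """Map step: count (feature, value, label) triples within one partition."""
    counts: Counter = Counter()
    for features, label in partition:
        for feat, val in features.items():
            counts[(feat, val, label)] += 1
    return counts


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge partial counts from two partitions."""
    return a + b


def entropy(label_counts: Counter) -> float:
    """Shannon entropy (base 2) of a class-label distribution."""
    total = sum(label_counts.values())
    return -sum((c / total) * math.log2(c / total) for c in label_counts.values() if c > 0)


def best_feature(partitions: List[List[Row]]) -> Tuple[str, float]:
    """Return the feature with the highest information gain, and that gain."""
    counts = reduce(reduce_counts, map(map_counts, partitions), Counter())

    # Regroup the aggregated counts as feature -> value -> label distribution.
    by_feature: Dict[str, Dict[str, Counter]] = {}
    for (feat, val, label), c in counts.items():
        by_feature.setdefault(feat, {}).setdefault(val, Counter())[label] += c

    gains: Dict[str, float] = {}
    for feat, branches in by_feature.items():
        parent: Counter = Counter()
        for label_counts in branches.values():
            parent.update(label_counts)
        n = sum(parent.values())
        # Gain = H(parent) - sum over values of (|branch| / n) * H(branch).
        weighted = sum(sum(lc.values()) / n * entropy(lc) for lc in branches.values())
        gains[feat] = entropy(parent) - weighted

    best = max(gains, key=gains.get)
    return best, gains[best]


if __name__ == "__main__":
    p1 = [({"chest_pain": "yes", "smoker": "no"}, "disease"),
          ({"chest_pain": "no", "smoker": "no"}, "healthy")]
    p2 = [({"chest_pain": "yes", "smoker": "yes"}, "disease"),
          ({"chest_pain": "no", "smoker": "yes"}, "healthy")]
    print(best_feature([p1, p2]))
```

Counts are additive across partitions, so the reduce step only needs to sum them. This is why split selection over categorical attributes parallelizes cleanly. In the toy example, `chest_pain` separates the two classes perfectly and has a gain of 1 bit, while `smoker` has a gain of 0.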

