LDT-MRF: Log decision tree and map reduce framework to clinical big data classification

2017 ◽  
Vol 7 (1.5) ◽  
pp. 97
Author(s):  
T. Surekha ◽  
R. Siva Rama Prasad

Data volumes are growing enormously as information technology develops, and classifying such data is complex in terms of both time and information extraction. Moreover, big data classification carries uncertainties associated with unbalanced datasets. To overcome these issues, this paper introduces a novel method of big data classification. The method, Log Decision Tree and Map Reduce Framework (LDT-MRF), uses the Log Decision Tree (LDT) and the Map Reduce Framework (MRF) to perform parallel data classification. A novel parameter termed Log-entropy is used to select the best feature attribute, and the classification itself is performed by the LDT. Experiments are carried out on three datasets: the Cleveland dataset, the Switzerland dataset, and the Breast Cancer dataset. A comparative analysis using sensitivity, specificity, and accuracy as performance metrics is carried out to demonstrate the effectiveness of the proposed method. The sensitivity, specificity, and accuracy of the proposed method are 84.7596%, 74.633%, and 80.9088%, respectively, which are higher than those of the existing big data classification methods compared.
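The abstract does not define Log-entropy, so the following is only a minimal sketch of the general idea of entropy-based attribute selection for a decision tree. It substitutes standard Shannon entropy and information gain for the paper's measure, and the attribute names and records are hypothetical toy data, not taken from the Cleveland, Switzerland, or Breast Cancer datasets.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    # Group the class labels by the value each record takes for the attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(
        len(part) / len(labels) * shannon_entropy(part)
        for part in partitions.values()
    )
    return shannon_entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records (assumption: categorical attributes only).
rows = [
    {"chest_pain": "yes", "smoker": "no"},
    {"chest_pain": "yes", "smoker": "yes"},
    {"chest_pain": "no", "smoker": "no"},
    {"chest_pain": "no", "smoker": "yes"},
]
labels = ["disease", "disease", "healthy", "healthy"]

chosen = best_attribute(rows, labels, ["chest_pain", "smoker"])
print(chosen)
```

In a MapReduce setting, the per-value label counts could plausibly be gathered by mappers and combined by a reducer before the gain is computed, but that arrangement is an assumption here rather than something the abstract states.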

2020 ◽  
pp. 1742-1763
Author(s):  
Neha Bansal ◽  
R.K. Singh ◽  
Arun Sharma

This article describes how classification algorithms have emerged as strong meta-learning techniques for accurately and efficiently analyzing the masses of data generated by the widespread use of the internet and other sources. In particular, there is a need for a mechanism that classifies unstructured data into an organized form. Classification techniques over big transactional databases may provide users with the required data from large datasets in a more simplified way. To organize and clearly represent the current state of classification algorithms for big data, the paper discusses various concepts and algorithms and presents an exhaustive review of existing classification algorithms over big data classification frameworks and other novel frameworks. It provides a comprehensive comparison from both a theoretical and an empirical perspective. The effectiveness of the candidate classification algorithms is measured through a number of performance metrics, such as implementation technique, data source validation, and scalability.


At present, Big Data applications such as social networking, medical healthcare, agriculture, banking, the stock market, education, and Facebook are generating data at very high speed. The volume and velocity of Big Data play a significant role in the performance of Big Data applications. The performance of a Big Data application can be affected by various parameters; quick search, efficiency, and accuracy are some of the dominant parameters that affect the overall performance of any Big Data application. Because of the direct and indirect involvement of the 7Vs characteristics of Big Data, every Big Data service expects high performance, and high performance is the biggest challenge in the present changing scenario. In this paper we propose a Big Data classification approach to speed up Big Data applications. This is a review paper: we refer to various Big Data technologies and the related work in the field of Big Data classification. After studying the literature, we identify the gaps in existing work and methods. Finally, we propose a novel approach to Big Data classification based on Deep Learning and the Apache Spark architecture. The proposed work has two stages: the first is feature selection and the second is Big Data classification. Apache Spark is the most suitable and dominant technology for implementing the proposed work. Apache Spark has two kinds of nodes, initial nodes and final nodes; feature selection takes place in the initial nodes and Big Data classification in the final nodes.


Author(s):  
Ahmad B. A. Hassanat

Big Data classification has recently received a great deal of attention due to the main properties of Big Data: volume, variety, and velocity. The furthest-pair-based binary search tree (FPBST) shows great potential for Big Data classification. This work attempts to improve the performance of the FPBST in terms of computation time, space consumed, and accuracy. The major enhancement converts the resultant BST to a decision tree, which removes the need for the slow K-nearest neighbors (KNN) step and yields a smaller tree; this reduces memory usage, speeds up both the training and testing phases, and increases classification accuracy. The proposed decision trees are based on calculating the probabilities of each class at each node using various methods; these probabilities are then used in the testing phase to classify an unseen example. The experimental results on several small, intermediate, and big machine learning datasets show the efficiency of the proposed methods in terms of space, speed, and accuracy compared with the FPBST, which indicates great potential for further enhancing the proposed methods for use in practice.
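The idea of storing class probabilities at a tree node and using them at test time can be illustrated with a minimal sketch. The abstract mentions "various methods" for computing the probabilities without naming them, so the relative-frequency estimate below is an assumption, and the function names and labels are hypothetical.

```python
from collections import Counter


def class_probabilities(labels):
    """Relative frequency of each class among the training labels at a node."""
    total = len(labels)
    return {cls: count / total for cls, count in Counter(labels).items()}


def predict(node_probs):
    """Classify an unseen example that reaches this node: most probable class."""
    return max(node_probs, key=node_probs.get)


# Hypothetical training labels that ended up in one node of the tree.
node_labels = ["benign", "benign", "malignant", "benign"]

probs = class_probabilities(node_labels)
prediction = predict(probs)
print(probs, prediction)
```

Because the prediction is read directly from the stored probabilities, no nearest-neighbor search over the node's examples is needed at test time, which matches the motivation given in the abstract.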


IEEE Access ◽  
2022 ◽  
pp. 1-1
Author(s):  
Hasan N. Ali ◽  
Ahmad B. Hassanat ◽  
Ahmad S. Tarawneh ◽  
Malek Alrashidi ◽  
Mansoor Alghamdi ◽  
...  
