Adaptive hybrid optimization enabled stack autoencoder-based MapReduce framework for big data classification

Purpose: Big data is growing so quickly that conventional software tools struggle to manage it. Moreover, imbalanced data in massive data sets is a major constraint for researchers.

Design/methodology/approach: This paper introduces a big data classification technique that uses the MapReduce framework together with a proposed optimization algorithm, named the chicken-based bacterial foraging (CBF) algorithm. The CBF algorithm integrates the bacterial foraging optimization (BFO) algorithm with the cat swarm optimization (CSO) algorithm. The model runs in two phases, training and testing. In the training phase, big data produced by different distributed sources is processed in parallel by the mappers, which perform preprocessing and feature selection based on the CBF algorithm. Preprocessing eliminates redundant and inconsistent data; feature selection then extracts the significant features from the preprocessed data to improve classification accuracy. The selected features are fed into the reducers, where a deep belief network (DBN) classifier trained with the CBF algorithm classifies the data into classes. At the end of training, each reducer outputs a trained model, which is then used to handle incremental data. In the testing phase, the incremental data are split into subsets and fed into different mappers. Each mapper holds a trained model from the training phase and uses it to classify its subset. The outputs of the mappers are then fused and fed into the reducer for classification.

Findings: The maximum accuracy and Jaccard coefficient are obtained on the epileptic seizure recognition database. The proposed CBF-DBN achieves a maximal accuracy of 91.129%, whereas the existing neural network (NN), DBN and naive Bayes classifier-term frequency–inverse document frequency (NBC-TFIDF) methods achieve 82.894%, 86.184% and 86.512%, respectively. Its maximal Jaccard coefficient is 88.928%, whereas NN, DBN and NBC-TFIDF reach 75.891%, 79.850% and 81.103%, respectively.

Originality/value: This paper proposes a big data classification method for categorizing massive data sets under the constraints of huge data. Classification runs on the MapReduce framework in training and testing phases, so that the data are handled in parallel. In the training phase, the big data is partitioned into subsets and fed into the mappers, where feature extraction selects the significant features. The features are passed to the reducers, which classify the data with a DBN classifier trained using the proposed CBF algorithm and output the trained models. In the testing phase, new incremental data are split into subsets and fed into the mappers, which classify them using the trained models from the training phase. The classified results from each mapper are fused and fed into the reducer for the classification of big data.
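
The two-phase flow described in this abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: variance ranking stands in for CBF-based feature selection, a nearest-centroid model stands in for the CBF-trained DBN, and all function names (`map_train`, `reduce_train`, `map_test`, `train`, `classify`) are hypothetical. It assumes one reducer per partition and that each partition contains every class.

```python
from statistics import pvariance


def split(data, n):
    """Split data into at most n contiguous chunks, preserving order."""
    size = max(1, -(-len(data) // n))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_train(partition, k):
    """Mapper (training): preprocess, then select k features.

    Assumption: variance ranking is a stand-in for CBF feature selection.
    """
    # Preprocessing: drop redundant (duplicate) records, keeping order.
    rows = list(dict.fromkeys(partition))
    n_features = len(rows[0][0])
    scores = [pvariance([x[j] for x, _ in rows]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [([x[j] for j in keep], y) for x, y in rows]


def reduce_train(keep, rows):
    """Reducer (training): fit a model on the selected features.

    Assumption: nearest-centroid is a stand-in for the CBF-trained DBN.
    """
    sums, counts = {}, {}
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
    return {"keep": keep, "centroids": centroids}


def map_test(model, subset):
    """Mapper (testing): classify a subset with the mapper's trained model."""
    labels = []
    for x in subset:
        z = [x[j] for j in model["keep"]]
        labels.append(min(
            model["centroids"],
            key=lambda y: sum((a - b) ** 2
                              for a, b in zip(z, model["centroids"][y])),
        ))
    return labels


def train(data, n_mappers, k):
    """Training phase: one trained model per partition."""
    return [reduce_train(*map_train(p, k)) for p in split(data, n_mappers)]


def classify(models, incoming):
    """Testing phase: split incremental data, classify per mapper, fuse."""
    parts = split(incoming, len(models))
    return [lab for m, p in zip(models, parts) for lab in map_test(m, p)]
```

Here fusion is plain concatenation in input order; the abstract does not specify how the mapper outputs are fused, so a real system might combine them differently.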

Rider-Deep Belief Network-Based MapReduce Framework for Big Data Classification

Smart Computing Techniques and Applications - Smart Innovation, Systems and Technologies ◽

10.1007/978-981-16-0878-0_24 ◽

2021 ◽

pp. 241-250

Author(s):

Sridhar Gujjeti ◽

Suresh Pabboju

Keyword(s):

Big Data ◽

Data Classification ◽

Deep Belief Network ◽

Mapreduce Framework ◽

Belief Network ◽

Big Data Classification

Energy and Trust Aware Secure Routing Algorithm for Big Data Classification Using Mapreduce Framework in Iot Networks

International Journal of Modeling Simulation and Scientific Computing ◽

10.1142/s1793962321500100 ◽

2020 ◽

Author(s):

S. Md. Mujeeb ◽

R. Praveen Sam ◽

K. Madhavi

Keyword(s):

Big Data ◽

Routing Algorithm ◽

Data Classification ◽

Secure Routing ◽

Mapreduce Framework ◽

Big Data Classification

Big data classification with optimization driven MapReduce framework

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-210062 ◽

2021 ◽

Vol 25 (2) ◽

pp. 173-183

Author(s):

Mujeeb Shaik Mohammed ◽

Praveen Sam Rachapudy ◽

Madhavi Kasa

Keyword(s):

Big Data ◽

Optimization Algorithm ◽

Moving Average ◽

Imbalanced Data ◽

Data Classification ◽

Bat Algorithm ◽

True Positive Rate ◽

Mapreduce Framework ◽

Positive Rate ◽

Big Data Classification

With technical advances, the volume of big data grows day by day, and traditional software tools struggle to handle it. In addition, imbalanced data within big data is a major concern for researchers. To manage big data effectively and to deal with imbalanced data, this paper proposes a new optimization algorithm. Big data classification is performed using the MapReduce framework, in which the map and reduce functions are based on the proposed algorithm. The algorithm is named the Exponential Bat algorithm (E-Bat) and integrates the Exponential Weighted Moving Average (EWMA) with the Bat Algorithm (BA). The map function selects the features, which are passed to the reducer module for classification using a Neural Network (NN). The E-Bat-based MapReduce framework is evaluated on four standard databases: Breast cancer, Hepatitis, Pima Indian diabetes and Heart disease. The experimental results show that the proposed method achieves a maximal accuracy of 0.8829 and a True Positive Rate (TPR) of 0.9090.
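
The abstract does not say how E-Bat combines EWMA with the bat update, so the following sketch shows only the standard EWMA recursion, s_t = alpha * x_t + (1 - alpha) * s_(t-1), which is the building block it names. The function name `ewma`, the parameter name `alpha`, and seeding the series with the first observation are assumptions made for illustration.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average of a sequence.

    Assumption: the series is seeded with the first observation,
    which is one common convention among several.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        prev = smoothed[-1] if smoothed else x
        # Larger alpha weights the newest observation more heavily.
        smoothed.append(alpha * x + (1 - alpha) * prev)
    return smoothed


print(ewma([1.0, 2.0, 3.0], alpha=0.5))  # [1.0, 1.5, 2.25]
```

With `alpha=1` the output equals the input; smaller values give smoother, slower-reacting averages.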

Design and Development of Bayesian Optimization Algorithms for Big Data Classification Based on MapReduce Framework

Advances in Intelligent Systems and Computing - International Conference on Intelligent and Smart Computing in Data Analytics ◽

10.1007/978-981-33-6176-8_6 ◽

2021 ◽

pp. 47-53

Author(s):

Chitrakant Banchhor ◽

N. Srinivasu

Keyword(s):

Big Data ◽

Optimization Algorithms ◽

Data Classification ◽

Bayesian Optimization ◽

Mapreduce Framework ◽

Design And Development ◽

Big Data Classification

FCNB: Fuzzy Correlative Naive Bayes Classifier with MapReduce Framework for Big Data Classification

Journal of Intelligent Systems ◽

10.1515/jisys-2018-0020 ◽

2018 ◽

Vol 29 (1) ◽

pp. 994-1006

Author(s):

Chitrakant Banchhor ◽

N. Srinivasu

Keyword(s):

Big Data ◽

Naive Bayes ◽

Fuzzy Theory ◽

Data Classification ◽

Classification Problem ◽

Extraction Process ◽

Naïve Bayes ◽

Mapreduce Framework ◽

Skin Segmentation ◽

Big Data Classification

The term “big data” means a large amount of data, and big data management refers to the efficient handling, organization, or use of large volumes of structured and unstructured data belonging to an organization. Because ever more raw data is becoming available, extracting knowledge from big data is very difficult for most classical data mining and machine learning tools. In a previous paper, the correlative naive Bayes (CNB) classifier was developed for big data classification. This work incorporates fuzzy theory into the CNB classifier to develop the fuzzy CNB (FCNB) classifier. The proposed FCNB classifier solves the big data classification problem by using the MapReduce framework and thus achieves improved classification results. Initially, the database is converted to a probabilistic index table, in which data and attributes are presented in rows and columns, respectively. Then, the membership degree of the unique symbols present in each attribute of the data is found. Finally, the proposed FCNB classifier finds the class of the data based on the training information. The FCNB classifier is evaluated on the localization and skin segmentation datasets. Its results are analyzed using sensitivity, specificity, and accuracy, and compared with various existing works.
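
For reference, the three evaluation metrics named in this abstract follow from confusion-matrix counts in the binary case. The sketch below uses the standard definitions; the function name `binary_metrics` and its arguments are hypothetical, and it says nothing about how the paper itself computed its figures.

```python
def binary_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity and accuracy for a binary problem.

    `positive` is the label treated as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)

    def ratio(num, den):
        # Undefined ratios (no samples of that class) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "accuracy": ratio(tp + tn, len(pairs)),
    }
```

For example, with true labels `[1, 1, 1, 0, 0]` and predictions `[1, 1, 0, 0, 1]`, sensitivity is 2/3, specificity is 1/2 and accuracy is 3/5.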

Folk Song Computer Big Data Classification and Analysis Research Based on National Style Characteristics

Journal of Physics Conference Series ◽

10.1088/1742-6596/1744/3/032117 ◽

2021 ◽

Vol 1744 (3) ◽

pp. 032117

Author(s):

Jin Yang

Keyword(s):

Big Data ◽

Data Classification ◽

Folk Song ◽

Big Data Classification ◽

National Style

Application research of Sports Place System based on big data classification technology

2020 International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE) ◽

10.1109/icbase51474.2020.00020 ◽

2020 ◽

Author(s):

Liu Hu ◽

Qingxuan Zeng

Keyword(s):

Big Data ◽

Data Classification ◽

Application Research ◽

Big Data Classification

Research on complex attribute big data classification based on iterative fuzzy clustering algorithm

Web Intelligence ◽

10.3233/web-210463 ◽

2021 ◽

pp. 1-12

Author(s):

Li Qian

Keyword(s):

Big Data ◽

Fuzzy Clustering ◽

Classification Accuracy ◽

Clustering Algorithm ◽

Principal Component ◽

Data Classification ◽

Fisher Discriminant Analysis ◽

Fuzzy Clustering Algorithm ◽

Local Fisher Discriminant Analysis ◽

Big Data Classification

To overcome the low classification accuracy of traditional methods, this paper proposes a new classification method for complex attribute big data based on an iterative fuzzy clustering algorithm. First, principal component analysis and kernel local Fisher discriminant analysis are used to reduce the dimensionality of the complex attribute big data. Second, the Bloom Filter data structure is introduced to eliminate redundancy in the data after dimensionality reduction. Third, the data, with redundancy removed, is classified in parallel by the iterative fuzzy clustering algorithm, which completes the complex attribute big data classification. Finally, the simulation results show that the accuracy, the normalized mutual information index and the Richter’s index of the proposed method are close to 1 and that the RDV value is low, which indicates that the proposed method has high classification effectiveness and fast convergence speed.
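
The redundancy-elimination step can be illustrated with a minimal Bloom filter. This is a generic sketch, not the paper's implementation: the class `BloomFilter`, the helper `drop_duplicates`, the bit-array size, the number of hash functions and the SHA-256-based hashing are all assumptions.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive n_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def drop_duplicates(records, bloom):
    """Keep each record the first time the filter has not seen it."""
    unique = []
    for record in records:
        if record not in bloom:
            bloom.add(record)
            unique.append(record)
    return unique
```

Because a Bloom filter can report a false positive, this approach may occasionally discard a record that was not actually a duplicate; a larger bit array or a different number of hash functions lowers that rate at the cost of memory or time.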
