Chicken swarm foraging algorithm for big data classification using the deep belief network classifier

Purpose: Innovation in big data is advancing so quickly that conventional software tools face several problems in managing it. Moreover, the occurrence of imbalanced data in massive data sets is a major constraint for the research industry.

Design/methodology/approach: The paper introduces a big data classification technique that runs on the MapReduce framework and is driven by a proposed optimization algorithm, named the chicken-based bacterial foraging (CBF) algorithm. The proposed algorithm integrates the bacterial foraging optimization (BFO) algorithm with the cat swarm optimization (CSO) algorithm. The model runs in two phases, training and testing. In the training phase, the big data produced by different distributed sources is processed in parallel by the mappers, which perform preprocessing and feature selection based on the proposed CBF algorithm. Preprocessing eliminates redundant and inconsistent data, and feature selection then extracts the significant features from the preprocessed data to improve classification accuracy. The selected features are fed into the reducers for classification using the deep belief network (DBN) classifier, which is trained using the proposed CBF algorithm so that the data are classified into various classes. At the end of the training process, the individual reducers present the trained models, so incremental data can be handled effectively using the models from the training phase. In the testing phase, the incremental data are split into subsets and fed into different mappers for classification. Each mapper contains a trained model obtained from the training phase and uses it to classify the incremental data. After classification, the outputs of the mappers are fused and fed into the reducer for the final classification.

Findings: The maximum accuracy and Jaccard coefficient are obtained using the epileptic seizure recognition database. The proposed CBF-DBN achieves a maximal accuracy of 91.129%, whereas the existing neural network (NN), DBN and naive Bayes classifier-term frequency–inverse document frequency (NBC-TFIDF) methods achieve 82.894%, 86.184% and 86.512%, respectively. The proposed CBF-DBN achieves a maximal Jaccard coefficient of 88.928%, whereas the existing NN, DBN and NBC-TFIDF methods achieve 75.891%, 79.850% and 81.103%, respectively.

Originality/value: The paper proposes a big data classification method for categorizing massive data sets under the constraints of huge data. Classification is performed on the MapReduce framework in training and testing phases so that the data are handled in parallel. In the training phase, the big data is partitioned into subsets and fed into the mappers, where the feature extraction step extracts the significant features. The obtained features are passed to the reducers, which classify the data using the DBN classifier trained with the proposed CBF algorithm. The trained model is the output of this phase. In the testing phase, the incremental data are considered: new data are first split into subsets and fed into the mappers, which classify them using the trained models obtained from the training phase. The classified results from each mapper are fused and fed into the reducer for the classification of big data.

CNB-MRF: Adapting Correlative Naive Bayes Classifier and MapReduce Framework for Big Data Classification

International Review on Computers and Software (IRECOS). DOI: 10.15866/irecos.v11i11.10116. 2016, Vol 11 (11), pp. 1007. Cited by ~3.

Author(s): Chitrakant Banchhor, N. Srinivasu

Keyword(s): Big Data, Naive Bayes, Data Classification, Naïve Bayes, Naive Bayes Classifier, Bayes Classifier, Naïve Bayes Classifier, MapReduce Framework, Big Data Classification

Neural Network for Big Data Sets

DOI: 10.4018/978-1-6684-2408-7.ch003. 2022, pp. 41-67.

Author(s): Vo Ngoc Phu, Vo Thi Ngoc Tran

Keyword(s): Neural Network, Big Data, Computer Science, Large Scale, Massive Data, Data Sets, Massive Data Sets, Large Scale Data, Commercial Applications, Novel Model

Machine learning (ML), neural networks (NN), evolutionary algorithms (EA), fuzzy systems (FSs) and computer science more broadly have been prominent and significant for many years. They have been applied in many different areas and have contributed much to the development of many large-scale corporations and organizations. These corporations and organizations have generated large amounts of information and massive data sets (MDSs), and such big data sets (BDSs) pose challenges for many commercial applications and research efforts. Many ML, NN, EA, FS and other computer science algorithms have therefore been developed to handle these massive data sets. To support this process, the authors present the possible NN algorithms for large-scale data sets (LSDSs) in this chapter. Finally, they present a novel NN model for BDSs in a sequential environment (SE) and a distributed network environment (DNE).

An Intelligent Metaheuristic Binary Pigeon Optimization-Based Feature Selection and Big Data Classification in a MapReduce Environment

Mathematics. DOI: 10.3390/math9202627. 2021, Vol 9 (20), pp. 2627.

Author(s): Felwa Abukhodair, Wafaa Alsaggaf, Amani Tariq Jamal, Sayed Abdel-Khalek, Romany F. Mansour

Keyword(s): Feature Selection, Big Data, Short Term Memory, Programming Model, Data Classification, Massive Data, Effective Performance, Different Dimensions, Long Short Term Memory, Big Data Classification

Big Data techniques are effective for systematically extracting and analyzing massive data, and can manage data more proficiently than conventional data handling approaches. Recently, several schemes have been developed for handling big datasets with many features. At the same time, feature selection (FS) methodologies aim to eliminate repetitive, noisy, and unwanted features that degrade the classifier results. Since conventional methods have failed to scale to massive data, the design of new Big Data classification models is essential. In this respect, this study focuses on the design of metaheuristic optimization-based big data classification in a MapReduce (MOBDC-MR) environment. The MOBDC-MR technique aims to choose optimal features and classify big data effectively. It involves a binary pigeon optimization algorithm (BPOA)-based FS technique to reduce complexity and increase accuracy, and employs a beetle antenna search (BAS) with long short-term memory (LSTM) model for big data classification. The presented MOBDC-MR technique has been realized on Hadoop with the MapReduce programming model. Its performance was validated using a benchmark dataset, and the results were investigated under several measures. The MOBDC-MR technique demonstrated promising performance over the other existing techniques under different dimensions.
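
Binary-encoded feature selection of the kind mentioned in this abstract can be illustrated with a small sketch. It makes several assumptions that are not taken from the paper: each candidate is a bit mask over the features, the fitness is a weighted sum of leave-one-out 1-nearest-neighbour error and the fraction of features kept (the weight `alpha` is hypothetical), and a simple greedy bit-flip search stands in for the binary pigeon optimization algorithm, whose update rules are not given here.

```python
def loo_error(X, y, mask):
    """Leave-one-out 1-nearest-neighbour error on the masked-in features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    wrong = 0
    for i, xi in enumerate(X):
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((xi[j] - X[k][j]) ** 2 for j in cols),
        )
        wrong += y[nearest] != y[i]
    return wrong / len(X)


def fitness(X, y, mask, alpha=0.9):
    """Lower is better: error weighted against the share of features kept."""
    if not any(mask):
        return float("inf")  # an empty feature set is never acceptable
    return alpha * loo_error(X, y, mask) + (1 - alpha) * sum(mask) / len(mask)


def greedy_bit_flip(X, y):
    """Stand-in search (not BPOA): flip one bit at a time while it helps."""
    mask = [1] * len(X[0])
    best = fitness(X, y, mask)
    improved = True
    while improved:
        improved = False
        for j in range(len(mask)):
            candidate = mask.copy()
            candidate[j] = 1 - candidate[j]
            score = fitness(X, y, candidate)
            if score < best:
                mask, best, improved = candidate, score, True
    return mask, best


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [
    (0.0, 3.0, 1.0), (0.1, 9.0, 7.0), (0.2, 1.0, 4.0),
    (1.0, 2.5, 6.5), (1.1, 8.0, 1.5), (1.2, 1.5, 4.2),
]
y = [0, 0, 0, 1, 1, 1]

mask, best = greedy_bit_flip(X, y)
print(mask, loo_error(X, y, mask))
```

A population-based metaheuristic would explore many masks at once instead of one greedy path, but it would score each candidate with a fitness function of this general shape.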

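Datamasser og sansemiljøer - Iscenesættelsen af big data mellem information, omslutning og overvågning (Data Masses and Sensory Environments: The Staging of Big Data between Information, Envelopment and Surveillance)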

MedieKultur Journal of media and communication research. DOI: 10.7146/mediekultur.v31i59.20004. 2016, Vol 31 (59), pp. 26.

Author(s): Ulrik Schmidt

Keyword(s): Big Data, Dynamic Environments, Digital Culture, Massive Data, Data Sets, Data Visualisation, Massive Data Sets, Sensory Environment, Major Trend, Everyday Communication

“Data Masses and Sensory Environments” explores a major trend in current digital culture to visualise massive data sets in the form of abstract, dynamic environments. This ‘performative’ staging of big data manifests what we could think of as big data aesthetics proper because it gives the ‘big’ and ‘massive’ properties of big data a direct and perceptible visual expression. Drawing on several recent examples of big data visualisation, the article examines the different manifestations and aesthetic potential of such performative big data aesthetics. It is concluded that the performative ‘massification’ of big data in abstract environments has important implications for our everyday communication with and through data because it potentially generates a conflict between the comprehension of information and a more abstract and defocused ‘ambient’ sensation of being surrounded by a ubiquitous and all-encompassing sensory environment.

Adaptive hybrid optimization enabled stack autoencoder-based MapReduce framework for big data classification

2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE). DOI: 10.1109/ic-etite47903.2020.6366147. 2020.

Author(s): S. Md. Mujeeb, R. Praveen Sam, K. Madhavi

Keyword(s): Big Data, Data Classification, Hybrid Optimization, MapReduce Framework, Big Data Classification

A Study on the Error of Distributed Algorithms for Big Data Classification with SVM

The ANZIAM Journal. DOI: 10.1017/s1446181116000390. 2017, Vol 58 (3-4), pp. 231-237.

Author(s): Cheng Wang, Feilong Cao

Keyword(s): Support Vector Machine, Big Data, Distributed Algorithm, Data Classification, Classification Error, Support Vector, Data Sets, Gaussian Kernels, Big Data Classification, The Given

The error of a distributed algorithm for big data classification with a support vector machine (SVM) is analysed in this paper. First, the given big data sets are divided into small subsets, on which the classical SVM with Gaussian kernels is used. Then, the classification error of the SVM for each subset is analysed based on the Tsybakov exponent, geometric noise, and width of the Gaussian kernels. Finally, the whole error of the distributed algorithm is estimated in terms of the error of each subset.

Rider-Deep Belief Network-Based MapReduce Framework for Big Data Classification

Smart Computing Techniques and Applications - Smart Innovation, Systems and Technologies. DOI: 10.1007/978-981-16-0878-0_24. 2021, pp. 241-250.

Author(s): Sridhar Gujjeti, Suresh Pabboju

Keyword(s): Big Data, Data Classification, Deep Belief Network, MapReduce Framework, Belief Network, Big Data Classification

Energy and Trust Aware Secure Routing Algorithm for Big Data Classification Using MapReduce Framework in IoT Networks

International Journal of Modeling Simulation and Scientific Computing. DOI: 10.1142/s1793962321500100. 2020.

Author(s): S. Md. Mujeeb, R. Praveen Sam, K. Madhavi

Keyword(s): Big Data, Routing Algorithm, Data Classification, Secure Routing, MapReduce Framework, Big Data Classification

Neural Network for Big Data Sets

Computational Intelligence in the Internet of Things - Advances in Computational Intelligence and Robotics. DOI: 10.4018/978-1-5225-7955-7.ch012. 2019, pp. 271-303.

Author(s): Vo Ngoc Phu, Vo Thi Ngoc Tran

Keyword(s): Neural Network, Big Data, Computer Science, Large Scale, Massive Data, Data Sets, Massive Data Sets, Large Scale Data, Commercial Applications, Novel Model

Semantics for Big Data Sets

Advances in Data Mining and Database Management - Handbook of Research on Big Data and the IoT. DOI: 10.4018/978-1-5225-7432-3.ch007. 2019, pp. 101-124.

Author(s): Vo Ngoc Phu, Vo Thi Ngoc Tran

Keyword(s): Information Technology, Big Data, Computer Science, Knowledge Discovery, Massive Data, Data Sets, Massive Data Sets, Data Semantics, The World, Commercial Applications

Information technology, computer science and related fields have developed steadily in many countries around the world. Their subfields have already made many crucial contributions to everyone's life: production, politics, advertisement, etc. In particular, big data semantics, scientific and knowledge discovery, and intelligence are subareas that are gaining more interest. Therefore, the authors present semantics for massive data sets in this chapter. This is significant for commercial applications, studies and researchers around the world.
