KM-MBFO: A Hybrid Hadoop Map Reduce Access for Clustering Big Data by Adopting Modified Bacterial Foraging Optimization Algorithm

K-Means (KM) clustering is a powerful and frequently used clustering algorithm, but it has its own limitations, for example a slow convergence rate and a tendency to become trapped in local optima. Many swarm-intelligence-based procedures combined with KM have therefore been proposed for clustering, and their performance, variations, and applications in data grouping have been demonstrated. In this paper we propose a parallel strategy for the KM-MBFO mechanism, implemented on the Hadoop Distributed File System (HDFS), to reduce execution time. The Mapper generates the population for the data set to be clustered. The Modified Bacterial Foraging Optimization (MBFO) algorithm evaluates the fitness of the population to choose the optimal K values in terms of execution time and classification error. Through simulated test results, we assess the performance of the proposed KM-MBFO scheme.
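
As a rough illustration of how K-Means maps onto map and reduce phases, the following Python sketch runs one iteration in-process. It is not the authors' KM-MBFO implementation: it does not use Hadoop, it omits the MBFO search entirely, and the names `mapper`, `reducer`, `kmeans_step`, and `sse` are hypothetical. The sum of squared errors is shown only as one plausible fitness measure; the abstract itself names execution time and classification error.

```python
from collections import defaultdict
from math import dist  # available from Python 3.8


def mapper(point, centroids):
    """Emit (index of nearest centroid, point), as a map task might."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reducer(points):
    """Average the points assigned to one centroid to get its new position."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans_step(data, centroids):
    """One K-Means iteration written as a map phase followed by a reduce phase."""
    groups = defaultdict(list)
    for point in data:
        key, value = mapper(point, centroids)
        groups[key].append(value)
    # Keep the old centroid if no point was assigned to it.
    return [reducer(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]


def sse(data, centroids):
    """Sum of squared distances to the nearest centroid (assumed fitness value)."""
    return sum(min(dist(p, c) ** 2 for c in centroids) for p in data)


# Toy data set with two obvious groups and two initial centroids.
data = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids = kmeans_step(data, [(0.0, 0.0), (10.0, 10.0)])
```

In a real Hadoop job the grouping by key would be done by the shuffle stage rather than by a dictionary, and an optimizer such as MBFO could call a function like `sse` to score candidate solutions.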

Multiobjective fuzzy knowledge‐based bacterial foraging optimization for congestion control in clustered wireless sensor networks

International Journal of Communication Systems ◽

10.1002/dac.4949 ◽

2021

Author(s):

Elaheh Moharamkhani ◽

Behrouz Zadmehr ◽

Saeideh Memarian ◽

Mohammad Javad Saber ◽

Mohammad Shokouhifar

Keyword(s):

Wireless Sensor Networks ◽

Sensor Networks ◽

Congestion Control ◽

Wireless Sensor ◽

Bacterial Foraging Optimization ◽

Bacterial Foraging ◽

Knowledge Based ◽

Fuzzy Knowledge

A Dynamic Genetic Algorithm for Clustering Problems

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.411-414.1884 ◽

2013 ◽

Vol 411-414 ◽

pp. 1884-1893

Author(s):

Yong Chun Cao ◽

Ya Bin Shao ◽

Shuang Liang Tian ◽

Zheng Qi Cai

Keyword(s):

Genetic Algorithm ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Life ◽

Search Space ◽

Adaptive Mutation ◽

Data Sets ◽

Data Set ◽

Local Optima ◽

Clustering Problems

Because many GA-based clustering algorithms suffer from degeneracy and easily fall into local optima, a novel dynamic genetic algorithm for clustering problems (DGA) is proposed. The algorithm adopts variable-length coding to represent individuals and performs the crossover operation in parallel within subpopulations of individuals of the same length, which allows DGA to explore the search space more effectively and to obtain automatically the proper number of clusters and the proper partition of a given data set. The algorithm also uses a dynamic crossover probability and an adaptive mutation probability, which prevent it from getting stuck at a locally optimal solution. Clustering experiments on three artificial data sets and two real-life data sets show that DGA achieves better performance and higher accuracy on clustering problems.
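
The idea of crossing individuals only within subpopulations of equal length can be sketched as follows. This is a minimal illustration under stated assumptions, not the DGA from the paper: each individual is assumed to be a list of cluster centers (so its length is the number of clusters), single-point crossover and random pairing are assumed operators, and all function names are hypothetical. The dynamic crossover probability and adaptive mutation probability are not modeled because the abstract does not give their formulas.

```python
import random
from collections import defaultdict


def group_by_length(population):
    """Split the population into subpopulations of equal chromosome length."""
    groups = defaultdict(list)
    for individual in population:
        groups[len(individual)].append(individual)
    return groups


def single_point_crossover(a, b, rng):
    """Swap the tails of two equal-length parents at a random cut point."""
    if len(a) < 2:
        return a[:], b[:]
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def crossover_same_length(population, rng):
    """Cross each individual only with a partner encoding the same number of clusters."""
    offspring = []
    for members in group_by_length(population).values():
        members = members[:]
        rng.shuffle(members)
        # Pair neighbours; an unpaired individual is copied unchanged.
        for i in range(0, len(members) - 1, 2):
            offspring.extend(single_point_crossover(members[i], members[i + 1], rng))
        if len(members) % 2:
            offspring.append(members[-1][:])
    return offspring


# Individuals with 2, 3, and 1 cluster centers respectively.
population = [
    [(0, 0), (1, 1)],
    [(2, 2), (3, 3)],
    [(0, 0), (5, 5), (9, 9)],
    [(1, 1), (4, 4), (8, 8)],
    [(7, 7)],
]
children = crossover_same_length(population, random.Random(0))
```

Because parents always have the same length, every child is a valid encoding with the same number of clusters as its parents, and the subpopulations can be processed independently of one another.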

An Optimal Data Placement Strategy for Improving System Performance of Massive Data Applications Using Graph Clustering

International Journal of Ambient Computing and Intelligence ◽

10.4018/ijaci.2018070102 ◽

2018 ◽

Vol 9 (3) ◽

pp. 15-30 ◽

Cited By ~ 4

Author(s):

S. Vengadeswaran ◽

S. R. Balasundaram

Keyword(s):

Big Data ◽

Execution Time ◽

Clustering Algorithm ◽

Graph Clustering ◽

Data Placement ◽

Data Locality ◽

Query Execution ◽

Data Set ◽

Statistical Measures ◽

Default Data

This article describes how the time taken to execute a query and return the results increases exponentially as the data size increases, leading to longer waiting times for the user. Hadoop, with its distributed processing capability, is considered an efficient solution for processing such large data. Hadoop's Default Data Placement Strategy (HDDPS) allocates the data blocks randomly across the cluster of nodes without considering any of the execution parameters. As a result, the blocks required for execution are often not available on the local machine, so the data has to be transferred across the network, which leads to a data locality issue. It is also commonly observed that most data-intensive applications show grouping semantics, so only a part of the Big-Data set is utilized during query execution. Since such execution parameters and grouping behavior are not considered, the default placement does not perform well, resulting in several shortcomings such as decreased local map task execution, increased query execution time, and query latency. To overcome such issues, an Optimal Data Placement Strategy (ODPS) based on grouping semantics is proposed. Initially, the user history log is dynamically analyzed to identify the access pattern, which is depicted as a graph. Markov clustering, a graph clustering algorithm, is applied to identify groupings among the dataset. Then, an Optimal Data Placement Algorithm (ODPA) is proposed based on the statistical measures estimated from the clustered graph. This in turn reorganizes the default data layouts in HDFS to achieve improved performance for Big-Data sets in a heterogeneous distributed environment. The proposed strategy is tested on a 15-node cluster placed in a single-rack topology. The results show that it is more efficient for massive datasets, reducing query execution time by 26% and improving data locality by 38% compared to HDDPS.
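
For readers unfamiliar with Markov clustering, the sketch below shows its two alternating steps, expansion (squaring the column-stochastic matrix) and inflation (element-wise powering followed by column normalization), on a small access graph. It is a generic, pure-Python version of the algorithm, not the ODPS/ODPA code from the article; the graph, the inflation value of 2.0, the fixed iteration count, and the 1e-6 threshold are all assumptions chosen for illustration.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)]
            for r in range(n)]


def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_clustering(adjacency, inflation=2.0, iterations=30):
    """Return clusters (sorted lists of node indices) found by Markov clustering."""
    n = len(adjacency)
    # Add self-loops, then turn the adjacency matrix into transition probabilities.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)  # expansion: flow spreads along longer walks
        m = normalize_columns([[v ** inflation for v in row] for row in m])  # inflation
    # Each row with non-negligible entries lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical co-access graph: files 0-2 and files 3-5 are queried together,
# with a single weak link between file 2 and file 3.
access_graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
groups = markov_clustering(access_graph)
```

A placement strategy could then try to co-locate the blocks of files that fall into the same group, which is the intuition behind using grouping semantics to improve data locality.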

Image Segmentation Based on Bacterial Foraging and FCM Algorithm

Recent Algorithms and Applications in Swarm Intelligence Research ◽

10.4018/978-1-4666-2479-5.ch011 ◽

2013 ◽

pp. 209-222

Author(s):

Hongwei Mo ◽

Yujing Yin

Keyword(s):

Image Segmentation ◽

Clustering Algorithm ◽

Optimal Algorithm ◽

Data Sets ◽

Bacterial Foraging Optimization ◽

Objective Criterion ◽

Bacterial Foraging ◽

Bacterial Foraging Algorithm ◽

Search Capability ◽

Bacterial Foraging Optimization Algorithm

This paper addresses image segmentation by clustering in the domain of image processing. The clustering algorithm considered here is Fuzzy C-Means (FCM), which is widely adopted in this field. The Bacterial Foraging Optimization Algorithm is an optimization algorithm inspired by the foraging behavior of E. coli. To reinforce the global search capability of FCM, the Bacterial Foraging Algorithm is employed to optimize the objective criterion function, which depends on the centroids in FCM. To validate the composite algorithm (BF-FCM), cluster validation indexes were used to obtain numerical results and to guide the choice of the best solution found by BF-FCM. Several experiments were conducted on three UCI data sets. For image segmentation, BF-FCM successfully segmented 8 typical grey-scale images, and most of them obtained the desired effects. All the experimental results show that BF-FCM performs better than standard FCM.
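
For reference, the standard FCM objective over N data points x_i and C centroids v_j, with fuzzifier m > 1, is given below together with the usual membership update. Whether BF-FCM optimizes exactly this form is an assumption, since the abstract does not state the criterion function; the symbols are the conventional ones, not taken from the paper.

```latex
J_m(U, V) = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{\,m} \, \lVert x_i - v_j \rVert^2,
\qquad \text{subject to } \sum_{j=1}^{C} u_{ij} = 1 \ \text{for every } i,

u_{ij} = \left[ \sum_{k=1}^{C}
  \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{\frac{2}{m-1}}
\right]^{-1}.
```

Since the memberships u_{ij} are determined by the centroids through the second expression, J_m can be treated as a function of the centroids alone, which is what lets a population-based optimizer such as bacterial foraging search over centroid positions.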

An Intelligent Artificial Bee Colony and Adaptive Bacterial Foraging Optimization Scheme for reliable breast cancer diagnosis

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200618143705 ◽

2020 ◽

Vol 13

Author(s):

S. Punitha ◽

A. Amuthan ◽

K. Suresh Joseph

Keyword(s):

Breast Cancer ◽

Cancer Detection ◽

Cancer Diagnosis ◽

Artificial Bee Colony ◽

Breast Cancer Diagnosis ◽

Bacterial Foraging Optimization ◽

Data Set ◽

Cancer Data ◽

Bacterial Foraging ◽

Bee Colony

Breast cancer must be detected at an early, localized stage to improve the chance of survival, since it is considered a major affliction for women around the globe. Most of the intelligent approaches devised for breast cancer require expertise in order to reliably identify patterns that indicate the presence of cancer cells and to determine the possible treatment for breast cancer patients. Moreover, the majority of existing schemes in the literature are labor- and time-intensive, which has a significant impact on the diagnosis time needed to detect breast cancer cells. An Intelligent Artificial Bee Colony and Adaptive Bacterial Foraging Optimization (IABC-ABFO) scheme is proposed to improve the local and global search ability in selecting the optimal feature subsets and the optimal parameters of the artificial neural network (ANN) used for breast cancer diagnosis. In the proposed IABC-ABFO approach, the traditional ABC algorithm used for cancer detection is improved by integrating an adaptive bacterial foraging process into the onlooker bee and employed bee phases, which results in improved exploitation and exploration. The evaluation of the proposed IABC-ABFO approach on the Wisconsin breast cancer data set confirmed an enhanced mean classification accuracy of 99.52% relative to the existing baseline cancer detection schemes.

Image Segmentation Based on Bacterial Foraging and FCM Algorithm

International Journal of Swarm Intelligence Research ◽

10.4018/jsir.2011070102 ◽

2011 ◽

Vol 2 (3) ◽

pp. 16-28

Author(s):

Hongwei Mo ◽

Yujing Yin

Keyword(s):

Image Segmentation ◽

Clustering Algorithm ◽

Optimal Algorithm ◽

Data Sets ◽

Criterion Function ◽

Bacterial Foraging Optimization ◽

Objective Criterion ◽

Bacterial Foraging ◽

Bacterial Foraging Algorithm ◽

Search Capability

Study and Analysis of Data Mining Algorithms for Identifying the Students’ for Psychology Motivation

Asian Journal of Computer Science and Technology ◽

10.51983/ajcst-2019.8.s2.2018 ◽

2019 ◽

Vol 8 (S2) ◽

pp. 83-87

Author(s):

S. Peerbasha ◽

M. Mohamed Surputheen

Keyword(s):

Data Mining ◽

Academic Performance ◽

Student Performance ◽

Clustering Algorithm ◽

Learning Algorithm ◽

Data Set ◽

Knowledge Based ◽

Data Mining Algorithms ◽

Data Prediction ◽

Clustering And Classification

The development of many educational institutions depends on students' learning and understanding capabilities. Here, we analyze students' academic profiles, including their grades and various cumulative attributes. Academic performance could be improved by a motivational approach. Student performance is analyzed through a knowledge-based data mining process. However, the accuracy of predictions made from the student data set is limited. We propose a novel machine learning algorithm based on subspace clustering and multi-perspective classification techniques to identify students who require psychological motivation. Relational patterns are also extracted to form enhanced clustering classes. This reveals new relations between students and their educational performance across the various attributes, using a surf scale nested clustering approach based on an intelligent predicting system from soft computing processing tasks. Taking time-factor analysis and complexity into account improves the data prediction rate and yields an efficient clustering algorithm that maximizes clustering and classification accuracy for improving academic performance.

Performance Evaluation of an Independent Time Optimized Infrastructure for Big Data Analytics that Maintains Symmetry

Symmetry ◽

10.3390/sym12081274 ◽

2020 ◽

Vol 12 (8) ◽

pp. 1274 ◽

Cited By ~ 1

Author(s):

Satvik Vats ◽

Bharat Bhushan Sagar ◽

Karan Singh ◽

Ali Ahmadian ◽

Bruno A. Pansera

Keyword(s):

Machine Learning ◽

Execution Time ◽

Data Analytics ◽

Big Data Analytics ◽

Data Sets ◽

Data Set ◽

Demilitarized Zone ◽

Proposed Model ◽

Hadoop Distributed File System ◽

Selection Of

Traditional data analytics tools are designed to deal with asymmetrical types of data, i.e., structured, semi-structured, and unstructured. The diverse behavior of data produced by different sources requires the selection of suitable tools. The restriction of resources for dealing with a huge volume of data is a challenge for these tools and affects their execution time. Therefore, in the present paper, we propose a time optimization model that shares a common HDFS (Hadoop Distributed File System) between three Name-nodes (Master Nodes), three Data-nodes, and one Client-node. These nodes work under the DeMilitarized zone (DMZ) to maintain symmetry. Machine learning jobs are explored from an independent platform to realize this model. On the first node (Name-node 1), Mahout is installed with all machine learning libraries through the Maven repositories. On the second node (Name-node 2), R is connected to Hadoop and runs through the shiny-server. Splunk is configured on the third node (Name-node 3) and is used to analyze the logs. Experiments are performed on the proposed and legacy models to evaluate response time, execution time, and throughput. K-means clustering, Naive Bayes, and recommender algorithms are run on three different data sets, i.e., the movie rating, newsgroup, and Spam SMS data sets, representing structured, semi-structured, and unstructured data, respectively. The selection of tools defines data independence; e.g., the Newsgroup data set is run on Mahout because the other tools are not compatible with this data. The results show that the proposed model overcomes the resource limitations of the legacy model. In addition, the proposed model can process any kind of algorithm on different sets of data, which reside in their native formats.

Computer-Aided Diagnosis in Jaundice: Comparison of Knowledge-based and Probabilistic Approaches

Methods of Information in Medicine ◽

10.1055/s-0038-1634634 ◽

1996 ◽

Vol 35 (01) ◽

pp. 41-51 ◽

Cited By ~ 3

Author(s):

F. Molino ◽

D. Furia ◽

F. Bar ◽

S. Battista ◽

N. Cappello ◽

...

Keyword(s):

Clinical Presentation ◽

Clinical Information ◽

Diagnostic Value ◽

Clinical Findings ◽

Clinical Documentation ◽

Data Set ◽

Knowledge Based ◽

Number Of Patients ◽

Support Tools ◽

Aided Diagnosis

The study reported in this paper is aimed at evaluating the effectiveness of a knowledge-based expert system (ICTERUS) in diagnosing jaundiced patients, compared with a statistical system based on probabilistic concepts (TRIAL). The performances of both systems have been evaluated using the same set of data in the same number of patients. Both systems are spin-off products of the European project Euricterus, an EC-COMACBME Project designed to document the occurrence and diagnostic value of clinical findings in the clinical presentation of jaundice in Europe, and have been developed as decision-making tools for the identification of the cause of jaundice based only on clinical information and routine investigations. Two groups of jaundiced patients were studied, including 500 (retrospective sample) and 100 (prospective sample) subjects, respectively. All patients were independently submitted to both decision-support tools. The input of both systems was the data set agreed within the Euricterus Project. The performances of both systems were evaluated with respect to the reference diagnoses provided by experts on the basis of the full clinical documentation. Results indicate that both systems are clinically reliable, although the diagnostic prediction provided by the knowledge-based approach is slightly better.
