KM-MBFO: A Hybrid Hadoop Map Reduce Access for Clustering Big Data by Adopting Modified Bacterial Foraging Optimization Algorithm

K-Means (KM) clustering is a powerful and frequently used clustering algorithm, but it has its own limitations, for example a slow convergence rate and a tendency to become trapped in local optima. Therefore, many swarm-intelligence-based procedures combined with KM have been presented for clustering, together with their performance, variations, and applications in data grouping. In this paper we propose a parallel strategy for the KM-MBFO mechanism, implemented on the Hadoop Distributed File System (HDFS), to reduce execution time. The Mapper produces the population for the data set to be clustered. The Modified Bacterial Foraging Optimization (MBFO) algorithm evaluates the fitness of the population to choose the optimal K values in terms of execution time and classification error. Through simulated test results, we assess the performance of the proposed KM-MBFO scheme.
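The abstract does not give the mapper, the reducer, or the MBFO update rules, so the following is only a minimal sketch of how one K-Means iteration can be phrased as a map phase followed by a reduce phase. It runs in plain Python rather than on Hadoop, and the names `map_assign`, `reduce_mean`, `kmeans_step`, and `sse` are hypothetical. The `sse` function is an assumed stand-in for a fitness value that an optimizer such as MBFO might score; it is not the paper's fitness function.

```python
from collections import defaultdict
from math import dist


def map_assign(point, centroids):
    """Mapper: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reducer: average all points that share one centroid key."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_step(points, centroids):
    """One K-Means iteration: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_assign(p, centroids)
        groups[key].append(value)
    # Keep the old centroid when no point was assigned to it.
    return [reduce_mean(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]


def sse(points, centroids):
    """Sum of squared distances to the nearest centroid (assumed fitness proxy)."""
    return sum(min(dist(p, c) for c in centroids) ** 2 for p in points)
```

In a real Hadoop job the grouping done here by `defaultdict` would be handled by the shuffle phase, and each candidate set of centroids (one member of the population) could be scored independently, which is what makes the evaluation easy to parallelize.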

2013 ◽  
Vol 411-414 ◽  
pp. 1884-1893
Author(s):  
Yong Chun Cao ◽  
Ya Bin Shao ◽  
Shuang Liang Tian ◽  
Zheng Qi Cai

Because many GA-based clustering algorithms suffer from degeneracy and easily fall into local optima, a novel dynamic genetic algorithm for clustering problems (DGA) is proposed. The algorithm adopts variable-length coding to represent individuals and performs the crossover operation in parallel within subpopulations of individuals of the same length, which allows DGA to explore the search space more effectively and to obtain automatically the proper number of clusters and the proper partition of a given data set. The algorithm also uses a dynamic crossover probability and an adaptive mutation probability, which prevent it from getting stuck at a local optimum. Clustering results from experiments on three artificial data sets and two real-life data sets show that DGA achieves better performance and higher accuracy on clustering problems.


2018 ◽  
Vol 9 (3) ◽  
pp. 15-30 ◽  
Author(s):  
S. Vengadeswaran ◽  
S. R. Balasundaram

This article describes how the time taken to execute a query and return results increases exponentially as data size grows, leading to longer waiting times for the user. Hadoop, with its distributed processing capability, is considered an efficient solution for processing such large data. Hadoop's Default Data Placement Strategy (HDDPS) allocates data blocks randomly across the cluster nodes without considering any execution parameters. As a result, the blocks required for execution are often unavailable on the local machine and must be transferred across the network, which creates a data locality issue. It is also commonly observed that most data-intensive applications show grouping semantics, so only part of the Big-Data set is used during query execution. Because such execution parameters and grouping behavior are not considered, the default placement performs poorly, resulting in shortcomings such as fewer local map task executions, increased query execution time, and higher query latency. To overcome these issues, an Optimal Data Placement Strategy (ODPS) based on grouping semantics is proposed. Initially, the user history log is dynamically analyzed to identify the access pattern, which is depicted as a graph. Markov clustering, a graph clustering algorithm, is applied to identify groupings within the data set. Then, an Optimal Data Placement Algorithm (ODPA) is proposed based on statistical measures estimated from the clustered graph. This in turn reorganizes the default data layout in HDFS to achieve improved performance for Big-Data sets in a heterogeneous distributed environment. The proposed strategy is tested on a 15-node cluster placed in a single-rack topology. The results show it to be more efficient for massive data sets, reducing query execution time by 26% and improving data locality by 38% compared to HDDPS.
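The abstract names Markov clustering but does not describe how it is configured, so the sketch below shows only the generic algorithm: alternate expansion (squaring a column-stochastic matrix) with inflation (element-wise powering and renormalizing) until the flow concentrates inside clusters. The function names, the inflation value of 2.0, the fixed iteration count, and the 1e-6 threshold are all assumptions chosen for illustration. Here a node could stand for a data block and an edge for two blocks accessed by the same query.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)]
            for r in range(n)]


def matmul(a, b):
    """Plain square-matrix product, kept dependency-free for the sketch."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, iterations=50):
    """Return clusters (frozensets of node indices) found by basic MCL."""
    n = len(adjacency)
    # Add self-loops, a common MCL convention, then normalize the columns.
    m = [[adjacency[r][c] + (1.0 if r == c else 0.0) for c in range(n)]
         for r in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                  # expansion
        m = [[x ** inflation for x in row] for row in m]  # inflation
        m = normalize_columns(m)
    # Every row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(c for c, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters
```

For a graph made of two disconnected triangles, `markov_cluster` returns the two triangles as separate clusters. A placement algorithm could then try to co-locate the blocks of each cluster, although how ODPA actually does this is not stated in the abstract.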


Author(s):  
S. Punitha ◽  
A. Amuthan ◽  
K. Suresh Joseph

Breast cancer must be detected at an early, localized stage to improve the chance of survival, since it is considered a major threat to women around the globe. Most of the intelligent approaches devised for breast cancer require expertise for the reliable identification of patterns that indicate the presence of cancerous cells and that determine the possible treatment for breast cancer patients in order to improve their chance of survival. Moreover, the majority of the existing schemes in the literature are labor-intensive and time-consuming, which has a considerable impact on the time needed to detect breast cancer cells. An Intelligent Artificial Bee Colony and Adaptive Bacterial Foraging Optimization (IABC-ABFO) scheme is proposed to provide better local and global search ability in selecting the optimal feature subsets and the optimal parameters of the ANN used for breast cancer diagnosis. In the proposed IABC-ABFO approach, the traditional ABC algorithm used for cancer detection is improved by integrating an adaptive bacterial foraging process into the onlooker bee and employee bee phases, which results in improved exploitation and exploration. An evaluation of the proposed IABC-ABFO approach on the Wisconsin breast cancer data set confirmed an enhanced mean classification accuracy of 99.52% compared with the existing baseline cancer detection schemes.


2011 ◽  
Vol 2 (3) ◽  
pp. 16-28
Author(s):  
Hongwei Mo ◽  
Yujing Yin

This paper addresses the issue of image segmentation by clustering in the domain of image processing. The clustering algorithm considered here is Fuzzy C-Means (FCM), which is widely adopted in this field. The Bacterial Foraging Optimization Algorithm is an optimization algorithm inspired by the foraging behavior of E. coli. To reinforce the global search capability of FCM, the Bacterial Foraging Algorithm was employed to optimize the objective criterion function, which depends on the centroids in FCM. To evaluate the validity of the composite algorithm, cluster validation indexes were used to obtain numerical results and to guide the search toward the best solution found by BF-FCM. Several experiments were conducted on three UCI data sets. For image segmentation, BF-FCM successfully segmented 8 typical grey-scale images, and most of them achieved the desired effect. All the experimental results show that BF-FCM performs better than standard FCM.
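The abstract does not write out the objective criterion function. For reference, the standard FCM formulation is given here as an assumption about what is being optimized, not as the authors' exact criterion. With data points $x_i$, centroids $c_j$, memberships $u_{ij}$, and fuzzifier $m > 1$:

```latex
% Standard FCM objective: weighted within-cluster squared distance
J_m(U, C) = \sum_{i=1}^{N} \sum_{j=1}^{K} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2},
\qquad \text{subject to } \sum_{j=1}^{K} u_{ij} = 1 \text{ for every } i .

% Alternating updates used by standard FCM
u_{ij} = \left[ \sum_{k=1}^{K}
  \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}
\right]^{-1},
\qquad
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}} .
```

Because the memberships follow from the centroids, $J_m$ can be treated as a function of the centroids alone. A population-based optimizer such as bacterial foraging can then search over centroid positions directly, which is one plausible reading of "interrelated to centroids" in the abstract.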


2019 ◽  
Vol 8 (S2) ◽  
pp. 83-87
Author(s):  
S. Peerbasha ◽  
M. Mohamed Surputheen

The development of many educational institutions depends on the learning performance and understanding capabilities of their students. Here, we analyze students' academic profiles using their grades and various cumulative attributes. Academic performance in learning subjects could be improved by a motivational approach. The analysis of student performance is carried out through a knowledge-based data mining process. However, the accuracy of the information predicted from the student data set is insufficient. We therefore propose a novel machine learning algorithm based on subspace clustering and multi-perspective classification techniques to identify students who require psychological motivation. Relational patterns are also extracted to form enhanced clustering classes. This discovers new relations between students and their educational performance across the various attributes, using a surf scale nested clustering approach based on an intelligent predicting system from soft computing processing tasks. The approach improves the data prediction rate by considering time factor analysis and complexity, in order to design and develop an efficient clustering algorithm that maximizes clustering and classification accuracy for improving academic performance.


Symmetry ◽  
2020 ◽  
Vol 12 (8) ◽  
pp. 1274 ◽  
Author(s):  
Satvik Vats ◽  
Bharat Bhushan Sagar ◽  
Karan Singh ◽  
Ali Ahmadian ◽  
Bruno A. Pansera

Traditional data analytics tools are designed to deal with asymmetrical types of data, i.e., structured, semi-structured, and unstructured. The diverse behavior of data produced by different sources requires the selection of suitable tools. The limited resources available to deal with a huge volume of data are a challenge for these tools and affect their execution time. Therefore, in the present paper, we propose a time optimization model that shares a common HDFS (Hadoop Distributed File System) among three Name-nodes (Master Nodes), three Data-nodes, and one Client-node. These nodes work under the DeMilitarized zone (DMZ) to maintain symmetry. Machine learning jobs are explored from an independent platform to realize this model. On the first node (Name-node 1), Mahout is installed with all machine learning libraries through the Maven repositories. On the second node (Name-node 2), R, connected to Hadoop, runs through the shiny-server. Splunk is configured on the third node (Name-node 3) and is used to analyze the logs. Experiments are performed on the proposed and legacy models to evaluate response time, execution time, and throughput. K-means clustering, Naive Bayes, and recommender algorithms are run on three different data sets, i.e., movie rating, newsgroup, and Spam SMS data sets, representing structured, semi-structured, and unstructured data, respectively. The selection of tools defines data independence; e.g., the Newsgroup data set runs on Mahout because the other tools are not compatible with this data. The outcome supports the hypothesis that the proposed model overcomes the resource limitations of the legacy model. In addition, the proposed model can process any kind of algorithm on different sets of data, which reside in their native formats.


1996 ◽  
Vol 35 (01) ◽  
pp. 41-51 ◽  
Author(s):  
F. Molino ◽  
D. Furia ◽  
F. Bar ◽  
S. Battista ◽  
N. Cappello ◽  
...  

The study reported in this paper is aimed at evaluating the effectiveness of a knowledge-based expert system (ICTERUS) in diagnosing jaundiced patients, compared with a statistical system based on probabilistic concepts (TRIAL). The performances of both systems have been evaluated using the same set of data in the same number of patients. Both systems are spin-off products of the European project Euricterus, an EC-COMACBME Project designed to document the occurrence and diagnostic value of clinical findings in the clinical presentation of jaundice in Europe, and have been developed as decision-making tools for the identification of the cause of jaundice based only on clinical information and routine investigations. Two groups of jaundiced patients were studied, including 500 (retrospective sample) and 100 (prospective sample) subjects, respectively. All patients were independently submitted to both decision-support tools. The input of both systems was the data set agreed within the Euricterus Project. The performances of both systems were evaluated with respect to the reference diagnoses provided by experts on the basis of the full clinical documentation. Results indicate that both systems are clinically reliable, although the diagnostic prediction provided by the knowledge-based approach is slightly better.

