Automatic Threshold Selections by exploration and exploitation of optimization algorithm in Record Deduplication

2016, Vol. 12 (11), pp. 4515-4522
Author(s): K. Deepa, C. Vivek, S. Palanivel Rajan

A deduplication process uses a similarity function and a threshold to decide whether two entries are duplicates. Setting this threshold is important for achieving high accuracy, and it usually relies heavily on human intervention. Swarm intelligence algorithms such as particle swarm optimization (PSO) and the artificial bee colony (ABC) algorithm have been used to select the threshold automatically when finding duplicate records. Although these algorithms perform well, they still have a weakness in the solution search equation, which generates new candidate solutions from the information in previous solutions. The proposed work addresses two problems: first, it finds an optimal search equation using a genetic algorithm (GA); second, it adopts a modified ABC algorithm to obtain the optimal threshold, so that duplicate records are detected more accurately and human intervention is reduced. The CORA dataset is used to analyze the proposed algorithm.
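The threshold-selection idea above can be illustrated with a minimal sketch that uses the standard ABC search equation, v = x_i + phi * (x_i - x_k), over a one-dimensional threshold in [0, 1]. Several assumptions are ours, not the paper's: `difflib.SequenceMatcher.ratio` stands in for the similarity function, the F1 score on a small labeled set of record pairs serves as the fitness, the toy `pairs` list is hypothetical (it is not CORA data), and the GA-derived search equation and the paper's ABC modifications are not reproduced.

```python
import random
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity function (assumption): matching ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def f1_at_threshold(pairs, threshold):
    """F1 score when pairs with similarity >= threshold are declared duplicates."""
    tp = fp = fn = 0
    for a, b, is_dup in pairs:
        predicted = similarity(a, b) >= threshold
        if predicted and is_dup:
            tp += 1
        elif predicted and not is_dup:
            fp += 1
        elif not predicted and is_dup:
            fn += 1
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def abc_threshold(pairs, n_sources=10, cycles=50, limit=10, seed=0):
    """Standard (unmodified) ABC over a 1-D threshold; maximizes F1."""
    rng = random.Random(seed)
    fit = lambda t: f1_at_threshold(pairs, t)
    sources = [rng.random() for _ in range(n_sources)]  # food sources = thresholds
    fitness = [fit(t) for t in sources]
    trials = [0] * n_sources

    def try_neighbour(i):
        # Standard ABC search equation with a random partner k != i, clipped to [0, 1].
        k = rng.choice([j for j in range(n_sources) if j != i])
        phi = rng.uniform(-1.0, 1.0)
        v = min(1.0, max(0.0, sources[i] + phi * (sources[i] - sources[k])))
        fv = fit(v)
        if fv > fitness[i]:  # greedy selection keeps the better source
            sources[i], fitness[i], trials[i] = v, fv, 0
        else:
            trials[i] += 1

    best_t, best_f = max(zip(sources, fitness), key=lambda p: p[1])
    for _ in range(cycles):
        for i in range(n_sources):  # employed bee phase
            try_neighbour(i)
        for _ in range(n_sources):  # onlooker bee phase: fitness-proportional choice
            if sum(fitness) > 0:
                i = rng.choices(range(n_sources), weights=fitness)[0]
            else:
                i = rng.randrange(n_sources)
            try_neighbour(i)
        for i in range(n_sources):  # scout bee phase: abandon exhausted sources
            if trials[i] > limit:
                sources[i] = rng.random()
                fitness[i] = fit(sources[i])
                trials[i] = 0
        t, f = max(zip(sources, fitness), key=lambda p: p[1])
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Hypothetical labeled pairs: (record_a, record_b, is_duplicate).
pairs = [
    ("J. Smith, Data Mining, 1999", "John Smith. Data mining. 1999", True),
    ("Neural nets for OCR", "Neural networks for OCR", True),
    ("Neural nets for OCR", "Genetic algorithms in search", False),
    ("A survey of record linkage", "Record linkage: a survey", True),
    ("Fast sorting", "Parallel graph coloring", False),
]
best_threshold, best_f1 = abc_threshold(pairs)
print(f"threshold={best_threshold:.3f}  F1={best_f1:.3f}")
```

Treating the threshold as a food source keeps the sketch one-dimensional; a real system would typically evaluate on far more labeled pairs and might search over several similarity functions at once.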

2014, Vol. 951, pp. 239-244
Author(s): Xiao Qiang Xu, De Ming Lei

The lot streaming (LS) problem in a job shop with equal-size sub-lots and intermittent idling is considered. An effective swarm intelligence approach based on the artificial bee colony (ABC) algorithm is proposed to minimize the total penalty of tardiness and earliness. In the first period of ABC, both the employed bee phase and the onlooker bee phase perform lot/sub-lot scheduling. In the second period, the LS conditions are determined in the employed bee phase, and the lots/sub-lots are scheduled in the onlooker bee phase. Every few cycles, the worst solution in the swarm is replaced with the elite one. Computational results show that ABC offers promising advantages.
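Two pieces of the approach above can be sketched in isolation: the earliness/tardiness objective and the periodic replacement of the worst solution by the elite one. This is a minimal sketch under stated assumptions: the `Lot` structure, the per-lot due dates, and the unit penalty weights are hypothetical, a lot is assumed to finish when its last sub-lot finishes, and the job-shop decoding that would produce the sub-lot completion times is not shown.

```python
from dataclasses import dataclass


@dataclass
class Lot:
    due_date: float
    sublot_completion_times: list  # completion time of each sub-lot (hypothetical input)
    earliness_weight: float = 1.0  # hypothetical penalty per time unit early
    tardiness_weight: float = 1.0  # hypothetical penalty per time unit late


def total_et_penalty(lots):
    """Sum of weighted earliness and tardiness over all lots."""
    total = 0.0
    for lot in lots:
        c = max(lot.sublot_completion_times)  # assumed: lot completes with its last sub-lot
        total += lot.earliness_weight * max(0.0, lot.due_date - c)
        total += lot.tardiness_weight * max(0.0, c - lot.due_date)
    return total


def replace_worst_with_elite(swarm, costs, elite, elite_cost):
    """Overwrite the highest-cost solution with a copy of the elite; return its index."""
    worst = max(range(len(swarm)), key=lambda i: costs[i])
    swarm[worst] = list(elite)
    costs[worst] = elite_cost
    return worst


lots = [Lot(10.0, [8.0, 9.0]), Lot(12.0, [13.0, 15.0], tardiness_weight=2.0)]
print(total_et_penalty(lots))  # lot 1 is 1 unit early, lot 2 is 3 units late at weight 2
```

In a full ABC loop, `replace_worst_with_elite` would be called only when the cycle counter is a multiple of some interval, matching the "every few cycles" rule; the interval length is a tuning choice not given in the abstract.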


Author(s): Shi Cheng, Yuhui Shi, Quande Qin

Premature convergence occurs when swarm intelligence algorithms search for optima. A swarm intelligence algorithm has two kinds of ability: the exploration of new possibilities and the exploitation of old certainties. Exploration means that an algorithm can search more regions of the search space, increasing the chance of finding good enough solutions. Exploitation, in contrast, means that an algorithm concentrates on refining the promising areas it has already found. An algorithm should balance exploration and exploitation; that is, computational resources should be allocated so that the algorithm can find good enough solutions effectively. Diversity measures the distribution of individuals' information, so observing how the distribution and diversity change reveals the degree of exploration and exploitation.
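One way to turn the diversity idea above into a number is a dimension-wise position diversity: for each dimension, take the mean absolute deviation of the individuals from the population centroid, then average over dimensions. This is a minimal sketch; the abstract does not fix a particular formula, so this specific measure and the list-of-lists population format are assumptions.

```python
def position_diversity(population):
    """Mean absolute deviation from the centroid, averaged over all dimensions.

    `population` is a list of equal-length real-valued position vectors.
    """
    n = len(population)
    dims = len(population[0])
    centroid = [sum(ind[j] for ind in population) / n for j in range(dims)]
    per_dim = [
        sum(abs(ind[j] - centroid[j]) for ind in population) / n
        for j in range(dims)
    ]
    return sum(per_dim) / dims


spread = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]  # individuals far apart
clustered = [[0.9, 1.0], [1.1, 1.0]]                       # individuals close together
print(position_diversity(spread), position_diversity(clustered))
```

Recording this value every iteration gives a diversity curve: a high or rising value suggests the swarm is still exploring, while a value that collapses toward zero early is a typical symptom of premature convergence.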

