Feature Selection for a Real-World Learning Task

Author(s):  
D. Kollmar ◽  
D.H. Hellmann
Author(s):  
Hamid Naceur Benkhlaed ◽  
Djamal Berrabah ◽  
Nassima Dif ◽  
Faouzi Boufares

Record linkage (RL), also known as entity resolution, is one of the important processes in the data quality field: it detects duplicate records that refer to the same real-world entity in one or more datasets. The most critical step in the RL process is blocking, which reduces the quadratic complexity of comparing every record with every other by dividing the data into a set of blocks, so that matching is done only between records in the same block. However, selecting the best blocking keys to divide the data is a hard task, and in most cases it is done by a domain expert. In this paper, a novel unsupervised approach for automatic blocking key selection is proposed. The approach is based on the recently proposed meta-heuristic bald eagle search (BES) optimization algorithm, and it treats blocking key selection as a feature selection problem. Experiments on real-world datasets showed that BES for feature selection outperformed existing approaches in the literature and returned the best blocking keys.
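The effect of blocking described in the abstract can be illustrated with a minimal sketch. It is not the authors' method and it does not implement BES. It only shows how a binary mask over record fields (the feature selection view of a blocking key) determines which record pairs are compared. The records, field names, and helper functions below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Toy records; the field names and values are assumptions made up for this sketch.
RECORDS = [
    {"id": 1, "surname": "smith", "city": "leeds", "zip": "LS1"},
    {"id": 2, "surname": "smyth", "city": "leeds", "zip": "LS1"},
    {"id": 3, "surname": "jones", "city": "york", "zip": "YO1"},
    {"id": 4, "surname": "jones", "city": "york", "zip": "YO1"},
    {"id": 5, "surname": "brown", "city": "leeds", "zip": "LS2"},
    {"id": 6, "surname": "braun", "city": "hull", "zip": "HU1"},
]
FIELDS = ["surname", "city", "zip"]


def blocking_key(record, mask):
    """Build a record's blocking key from the fields whose mask bit is 1."""
    return tuple(record[f] for f, keep in zip(FIELDS, mask) if keep)


def candidate_pairs(records, mask):
    """Return the id pairs compared when records are blocked with `mask`."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record, mask)].append(record["id"])
    pairs = set()
    for ids in blocks.values():
        # Matching happens only between records that share a block.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def all_pairs(records):
    """Return every id pair, i.e. the quadratic comparison without blocking."""
    return set(combinations(sorted(r["id"] for r in records), 2))


if __name__ == "__main__":
    print("no blocking:", len(all_pairs(RECORDS)))
    for mask in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]:
        print(mask, len(candidate_pairs(RECORDS, mask)))
```

In this toy setting, a search procedure such as the one the paper proposes would explore the space of masks and score each one. How a mask is scored without labels is specific to the paper and is not reproduced here.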


2019 ◽  
pp. 389
Author(s):  
زينب عبدالأمير ◽  
علياء كريم عبدالحسن

2021 ◽  
Vol 1881 (2) ◽  
pp. 022080
Author(s):  
Zhiqiang Wu ◽  
Lizong Zhang ◽  
Gang Yu ◽  
Ying Wang ◽  
Tao Huang ◽  
...  
