Improving similarity join algorithms using vertical clustering techniques

Author(s):  
Lisa Tan ◽  
Farshad Fotouhi ◽  
William Grosky
Author(s):  
Leonardo Andrade Ribeiro ◽  
Alfredo Cuzzocrea ◽  
Karen Aline Alves Bezerra ◽  
Ben Hur Bahia do Nascimento

2021 ◽  
Vol 12 (3) ◽  
Author(s):  
Leonardo Andrade Ribeiro ◽  
Felipe Ferreira Borges ◽  
Diego Oliveira

We consider the problem of efficiently answering set similarity joins on multi-attribute data. Traditional set similarity join algorithms assume string data represented by a single set and, thus, miss the opportunity to exploit predicates over multiple attributes to reduce the number of similarity computations. In this article, we present a framework to enhance existing algorithms with additional filters for dealing with multi-attribute data. We then instantiate this framework with a lightweight filtering technique based on a simple, yet effective data structure, for which exact and probabilistic implementations are evaluated. In this context, we devise a cost model to identify the best attribute ordering to reduce processing time. Moreover, alternative approaches are also investigated and a new algorithm combining key ideas from previous work is introduced. Finally, we present a thorough experimental evaluation, which demonstrates that our main proposal is efficient and significantly outperforms competing algorithms.
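To make the multi-attribute setting concrete, here is a minimal sketch of a naive nested-loop similarity join in which each record carries one token set per attribute. It is not the framework or the data structure proposed in the article; the function names, the choice of Jaccard similarity, and the sample records are assumptions made for illustration. It only shows why attribute order matters: a pair is dropped at the first attribute that fails its threshold, so the remaining similarity computations for that pair are skipped.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def multi_attribute_join(records, thresholds):
    """Return index pairs (i, j) whose every attribute meets its threshold.

    records: list of tuples of sets, one set per attribute (hypothetical layout).
    thresholds: one Jaccard threshold per attribute, checked in the given order.
    """
    results = []
    for (i, r), (j, s) in combinations(enumerate(records), 2):
        # all() over a generator short-circuits: the first failing attribute
        # prunes the pair before later attributes are compared.
        if all(jaccard(x, y) >= t for x, y, t in zip(r, s, thresholds)):
            results.append((i, j))
    return results


# Hypothetical records: (title tokens, venue tokens).
records = [
    ({"data", "base", "join"}, {"vldb"}),
    ({"data", "base", "joins"}, {"vldb"}),
    ({"data", "base", "join"}, {"sigmod"}),
]
print(multi_attribute_join(records, thresholds=(0.5, 1.0)))
```

In this sketch, placing a selective, cheap-to-compare attribute first would prune more pairs early. The abstract says the article's cost model is used to pick the best attribute ordering; nothing here reproduces that model.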


Author(s):  
Leonardo Andrade Ribeiro ◽  
Alfredo Cuzzocrea ◽  
Karen Aline Alves Bezerra ◽  
Ben Hur Bahia do Nascimento


2017 ◽  
Vol 2 (1) ◽  
pp. 214-234
Author(s):  
Amer Al-Badarneh ◽  
Amnah Al-Abdi ◽  
Sana’a Al-Shboul ◽  
Hassan Najadat ◽  
...  

Author(s):  
Lisa Tan ◽  
Farshad Fotouhi ◽  
William Grosky ◽  
Horia F. Pop ◽  
Noureddine Mouaddib

2020 ◽  
Author(s):  
Andrea Giani ◽  
de Souza Patricia Borges ◽  
Stefania Bartoletti ◽  
Flavio Morselli ◽  
Andrea Conti ◽  
...  
