Similarity Search Problem Research on Multi-dimensional Data Sets

In this paper we propose a new heuristic algorithm for solving the maximum clique problem (MCP). While the proposed algorithm (called TrustCLQ) follows a general approach to solving MCP, it is almost independent of the vertex order and does not exploit a partition of the graph into independent sets. The algorithm was tested on graphs from the DIMACS library, which are often used to benchmark MCP algorithms. On the DIMACS data sets, TrustCLQ was compared with the well-known ILS heuristic algorithm, as well as with a standard algorithm from the networkx library. Moreover, TrustCLQ has been tested on Facebook social graphs.
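
To make the problem concrete, the following is a minimal sketch of an exact baseline for MCP: Bron-Kerbosch enumeration with pivoting plus a simple size bound. It is not TrustCLQ and not the networkx routine mentioned above; the graph representation (a dict of adjacency sets) and the function name `max_clique` are illustrative assumptions. Its worst-case running time is exponential, which is why heuristics are used on large instances.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # hypothetical representation: vertex -> set of neighbours

def max_clique(graph: Graph) -> Set[int]:
    """Return one maximum clique via Bron-Kerbosch with pivoting (exact, exponential worst case)."""
    best: Set[int] = set()

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it beats the best found so far
            if len(r) > len(best):
                best = set(r)
            return
        # Bound: even adding every remaining candidate cannot exceed the current best
        if len(r) + len(p) <= len(best):
            return
        # Pivot u with the most neighbours in p, to cut down the branching
        u = max(p | x, key=lambda v: len(p & graph[v]))
        for v in list(p - graph[u]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return best

# Small example: vertices 0-3 are pairwise adjacent, vertex 4 hangs off vertex 3
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
g: Graph = {v: set() for v in range(5)}
for a, b in edges:
    g[a].add(b)
    g[b].add(a)
print(sorted(max_clique(g)))
```

Pivoting skips branches that could only rediscover cliques already reachable through the pivot's neighbours, and the size bound prunes branches that cannot produce a larger clique.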

A filter method for the weighted local similarity search problem

Combinatorial Pattern Matching (Lecture Notes in Computer Science), 1997, pp. 191-205
DOI: 10.1007/3-540-63220-4_60

Author(s): Enno Ohlebusch

Keyword(s): Similarity Search, Filter Method, Search Problem, Local Similarity

Finding Similar Documents Using Frequent Pattern Mining Methods

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2019, Vol 27 (01), pp. 73-96
DOI: 10.1142/s0218488519500041
Cited by ~1

Author(s): Mohammad Karim Sohrabi, Hossein Azgomi

Keyword(s): Similarity Search, Pattern Mining, Frequent Pattern Mining, Frequent Itemset, Frequent Pattern, Search Problem, High Quality, Massive Datasets, Dynamic Selection, Support Threshold

Mining massive datasets raises various problems, one of which is finding similar documents. The shingling method converts this problem into a set-based problem. Some existing methods use min-hashing to compress the shingle sets produced by shingling and then apply locality-sensitive hashing (LSH) to select candidate pairs for similarity search from among all pairs of documents. In this paper, an Apriori-based method is proposed for finding similar documents using a frequent itemset mining approach. To this end, the Apriori algorithm is modified and customized for the similarity search problem. The most important advantages of the proposed method are modeling the similarity search problem as a frequent pattern mining problem, using a modified version of Apriori, and dynamically selecting the minimum support threshold; together these give it an acceptable execution time and high-quality results. The proposed method finds similar documents in less time than the combined method and the MCVM method because it generates fewer candidate pairs. Furthermore, experimental results show the high quality of the proposed method's results.
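
The pipeline the abstract attributes to existing methods (shingling, then min-hashing, then LSH to pick candidate pairs) can be sketched as below. This is a generic textbook version, not the Apriori-based method the paper proposes. The shingle length k=5, the 100 hash functions split into 25 bands of 4 rows, the 0.5 verification threshold, and all function names are illustrative assumptions.

```python
import hashlib
import random
from itertools import combinations
from typing import Dict, List, Set, Tuple

PRIME = (1 << 61) - 1  # Mersenne prime for the universal hash family

def shingles(text: str, k: int = 5) -> Set[str]:
    """Set of character k-shingles of a lowercased, whitespace-normalized document."""
    t = " ".join(text.lower().split())
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard similarity, used to verify LSH candidates."""
    return len(a & b) / len(a | b) if a | b else 1.0

def stable_hash(s: str) -> int:
    # 64-bit hash stable across runs (the built-in hash() of str is salted per process)
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")

def make_hashers(n: int, seed: int = 0) -> List[Tuple[int, int]]:
    """n random (a, b) pairs defining h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]

def minhash(sh: Set[str], hashers: List[Tuple[int, int]]) -> List[int]:
    """MinHash signature: for each hash function, the minimum value over the shingle set."""
    base = [stable_hash(s) for s in sh]
    return [min((a * x + b) % PRIME for x in base) for a, b in hashers]

def lsh_candidates(sigs: Dict[str, List[int]], bands: int, rows: int) -> Set[Tuple[str, str]]:
    """Banding: two documents with an identical band of `rows` values form a candidate pair."""
    buckets: Dict[Tuple[int, Tuple[int, ...]], List[str]] = {}
    for doc, sig in sigs.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets.setdefault(key, []).append(doc)
    return {pair for docs in buckets.values() for pair in combinations(sorted(docs), 2)}

docs = {
    "d1": "The quick brown fox jumps over the lazy dog",
    "d2": "The quick brown fox jumped over the lazy dog",
    "d3": "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
}
sh = {d: shingles(t) for d, t in docs.items()}
hashers = make_hashers(100)  # 100 hash functions = 25 bands x 4 rows
sigs = {d: minhash(s, hashers) for d, s in sh.items()}
cands = lsh_candidates(sigs, bands=25, rows=4)
# Verify candidates exactly; keep only pairs at or above the illustrative 0.5 threshold
similar = {(x, y) for x, y in cands if jaccard(sh[x], sh[y]) >= 0.5}
print(similar)
```

More rows per band raise the similarity level at which pairs tend to become candidates, while more bands lower it; the exact Jaccard check removes any false positives that LSH lets through.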

Ordering-Based Causal Discovery with Reinforcement Learning

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021
DOI: 10.24963/ijcai.2021/491

Author(s): Xiaoqiang Wang, Yali Du, Shengyu Zhu, Liangjun Ke, Zhitang Chen, ...

Keyword(s): Reinforcement Learning, Directed Graphs, Real Data, Causal Discovery, Small Scale, Search Problem, Data Sets, Proposed Model, Markov Decision, Improved Performance

Discovering causal relations among a set of variables is a long-standing question in many empirical sciences. Recently, reinforcement learning (RL) has achieved promising results in causal discovery from observational data. However, searching the space of directed graphs and enforcing acyclicity through implicit penalties tend to be inefficient, which restricts the existing RL-based method to small-scale problems. In this work, we propose a novel RL-based approach to causal discovery that incorporates RL into the ordering-based paradigm. Specifically, we formulate the ordering search problem as a multi-step Markov decision process, implement the ordering-generating process with an encoder-decoder architecture, and finally use RL to optimize the proposed model based on reward mechanisms designed for each ordering. A generated ordering is then processed by variable selection to obtain the final causal graph. We analyze the consistency and computational complexity of the proposed method, and empirically show that a pretrained model can be exploited to accelerate training. Experimental results on both synthetic and real data sets show that the proposed method performs much better than the existing RL-based method.
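
The ordering-based paradigm relies on a simple fact: if every edge points from an earlier to a later variable in a fixed ordering, the graph is acyclic, so no acyclicity penalty is needed. The sketch below covers only the last step the abstract mentions, turning a given ordering into a graph by variable selection. It assumes linear relations, fits ordinary least squares without an intercept, and keeps parents whose coefficient magnitude exceeds a hypothetical threshold of 0.3. The RL search over orderings, the encoder-decoder, and the reward design are not reproduced, and the paper's variable selection procedure may differ.

```python
import random
from typing import List, Sequence, Set, Tuple

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares(X: List[List[float]], y: List[float]) -> List[float]:
    """OLS without intercept via the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def dag_from_ordering(data: List[List[float]], order: Sequence[int],
                      threshold: float = 0.3) -> Set[Tuple[int, int]]:
    """Regress each variable on its predecessors in `order`; keep edges with |coef| > threshold."""
    edges: Set[Tuple[int, int]] = set()
    for pos, j in enumerate(order):
        preds = list(order[:pos])
        if not preds:
            continue  # the first variable in the ordering has no candidate parents
        X = [[row[i] for i in preds] for row in data]
        y = [row[j] for row in data]
        for i, coef in zip(preds, least_squares(X, y)):
            if abs(coef) > threshold:
                edges.add((i, j))  # edge i -> j always points forward, so the graph is acyclic
    return edges

# Synthetic linear chain x0 -> x1 -> x2 with Gaussian noise
rng = random.Random(0)
data = []
for _ in range(2000):
    x0 = rng.gauss(0, 1)
    x1 = 2.0 * x0 + rng.gauss(0, 1)
    x2 = -1.5 * x1 + rng.gauss(0, 1)
    data.append([x0, x1, x2])
order = [0, 1, 2]
edges = dag_from_ordering(data, order)
print(sorted(edges))
```

Given the correct ordering, regressing x2 on both x0 and x1 leaves a near-zero coefficient on x0, so the spurious edge x0 -> x2 is dropped by the threshold.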

RSSMSO: Rapid Similarity Search on Metric Space Object Stored in Cloud Environment

International Journal of Organizational and Collective Intelligence, 2016, Vol 6 (3), pp. 33-49
DOI: 10.4018/ijoci.2016070103
Cited by ~2

Author(s): Raghavendra S., Nithyashree K., Geeta C.M., Rajkumar Buyya, Venugopal K. R., ...

Keyword(s): Service Provider, Similarity Search, Low Cost, Real Data, Data Transformation, Third Party, Data Sets, Outsourced Data, Search Service, Confidential Data

This paper considers a cloud computing environment in which the data owner outsources the similarity search service to a third-party service provider. Privacy of the outsourced data is important because the data may be confidential: it should be available to authorized client groups but not revealed to the service provider that stores it. For this scenario, the paper presents a technique called RSSMSO with four phases: build, query, data transformation, and search. The build and query phases upload and query the data, respectively; the data transformation phase transforms the data before it is submitted to the service provider, so that similarity queries run on the transformed data; and the search phase finds objects similar to the query object. RSSMSO improves query accuracy at a low communication cost. Experiments on real data sets show that the proposed work provides privacy and achieves accuracy at a low cost compared with FDH.
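
The "transform, then search on transformed data" workflow can be illustrated with a distance-preserving transformation. The sketch below is a swapped-in technique, not RSSMSO's transformation: the owner applies a secret rotation and translation to 2-D points, and the provider answers k-nearest-neighbor queries using only transformed points. The class `SecretKey`, the function `server_knn`, and the sample points are hypothetical. On its own this offers little protection, since the provider still sees exact pairwise distances; it only shows how the phases fit together.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]

class SecretKey:
    """Hypothetical owner-side key: a rotation angle and a translation, both kept secret."""
    def __init__(self, seed: int) -> None:
        rng = random.Random(seed)
        self.theta = rng.uniform(0, 2 * math.pi)
        self.dx = rng.uniform(-100, 100)
        self.dy = rng.uniform(-100, 100)

    def transform(self, p: Point) -> Point:
        # Rotation followed by translation preserves Euclidean distances
        c, s = math.cos(self.theta), math.sin(self.theta)
        x, y = p
        return (c * x - s * y + self.dx, s * x + c * y + self.dy)

def server_knn(stored: List[Point], q: Point, k: int) -> List[int]:
    """Provider side: indices of the k nearest stored points, computed on transformed data only."""
    return sorted(range(len(stored)), key=lambda i: math.dist(stored[i], q))[:k]

key = SecretKey(seed=42)
objects = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (1.2, 0.9)]
uploaded = [key.transform(p) for p in objects]  # build + data transformation phases
query = (1.0, 0.8)
tq = key.transform(query)                       # query phase: same secret key
result = server_knn(uploaded, tq, k=2)          # search phase, run by the provider
print(result)
```

Because distances are preserved, the provider's answer matches a plaintext search, while the raw coordinates never leave the owner and authorized clients.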
