GDC-a-CGI: efficient algorithms for dynamic graph data cleaning and indexing

In this article, we introduce HashGraph, a new scalable approach for building hash tables that uses concepts taken from sparse graph representations, hence the name HashGraph. HashGraph introduces a new way to deal with hash collisions that uses neither “open addressing” nor “separate chaining,” yet has the benefits of both approaches. HashGraph currently works for static inputs. Recent progress with dynamic graph data structures suggests that HashGraph might be extendable to dynamic inputs as well. We show that HashGraph can deal with a large number of hash values per entry without loss of performance. Last, we show a new querying algorithm for value lookups. We experimentally compare HashGraph to several state-of-the-art implementations and find that it outperforms them by 2× on average when the inputs are unique and by as much as 40× when the input contains duplicates. The implementation of HashGraph in this article is for NVIDIA GPUs. HashGraph can build a hash table at a rate of 2.5 billion keys per second on an NVIDIA GV100 GPU and can query at nearly the same rate.
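The abstract does not spell out the layout, so the following is only a minimal CPU-side sketch of one way a hash table can borrow from a sparse graph representation: buckets are stored like rows of a compressed sparse row (CSR) structure, with an offsets array and a single contiguous values array. The function names `build_csr_table` and `count_matches` and the two-pass count/prefix-sum/place scheme are assumptions made for illustration, not the authors' GPU implementation.

```python
from itertools import accumulate


def build_csr_table(keys, num_buckets):
    """Build a CSR-style hash table and return (offsets, values).

    Illustrative sketch only: bucket b owns values[offsets[b]:offsets[b + 1]].
    """
    # Pass 1: count how many keys hash to each bucket.
    counts = [0] * num_buckets
    for key in keys:
        counts[hash(key) % num_buckets] += 1

    # Prefix sum over the counts gives each bucket's start offset.
    offsets = [0] + list(accumulate(counts))

    # Pass 2: place each key into the next free slot of its bucket.
    cursor = offsets[:-1].copy()
    values = [None] * len(keys)
    for key in keys:
        bucket = hash(key) % num_buckets
        values[cursor[bucket]] = key
        cursor[bucket] += 1
    return offsets, values


def count_matches(table, num_buckets, key):
    """Count occurrences of key by scanning only its bucket's slice."""
    offsets, values = table
    bucket = hash(key) % num_buckets
    bucket_slice = values[offsets[bucket]:offsets[bucket + 1]]
    return sum(1 for stored in bucket_slice if stored == key)


keys = [3, 7, 3, 10, 7, 3]
table = build_csr_table(keys, num_buckets=4)
```

In this layout, colliding and duplicate keys simply sit next to each other in one contiguous slice, so no probing sequence or per-bucket linked list is needed.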


Continuous matching of evolving patterns over dynamic graph data

World Wide Web ◽

10.1007/s11280-020-00860-5 ◽

2021 ◽

Author(s):

Qianzhen Zhang ◽

Deke Guo ◽

Xiang Zhao ◽

Xi Wang

Keyword(s):

Pattern Matching ◽

Wide Spectrum ◽

Dynamic Graph ◽

Matching Problem ◽

Graph Pattern Matching ◽

Graph Data ◽

Graph Pattern ◽

Matching Process ◽

Data Graph ◽

Pattern Graph

The scale of many graphs is growing rapidly, which poses a serious challenge for developing processing and analytic algorithms. Graph pattern matching is one of the most primitive of these tasks and finds a wide spectrum of applications, yet its performance is often affected by the size and dynamicity of graphs. To handle large dynamic graphs, incremental pattern matching has been proposed to avoid re-computing matches of patterns over the entire data graph, thereby reducing matching time and improving overall execution performance. Owing to the complexity of the problem, little work has been reported so far, and most of it addresses graph pattern matching only in the scenario where the data graph alone varies. In this article, we address a more complicated but very practical graph pattern matching problem, continuous matching of evolving patterns over dynamic graph data, and present a novel algorithm for continuous pattern matching as both the pattern graph and the data graph change. Specifically, we propose a concise representation of partial matching solutions, which helps avoid re-computing matches of the pattern and speeds up the subsequent matching process. To support updates of the data graph and the pattern graph, we propose an incremental maintenance strategy that efficiently maintains the intermediate results. Moreover, we devise an effective model for estimating the step-wise cost of pattern evaluation to drive the matching process. Extensive experiments verify the superiority of .
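To make the idea of incremental matching concrete, here is a small sketch that is much simpler than the problem the abstract describes. It assumes a fixed three-vertex path pattern with distinct vertex labels (A -> B -> C) and handles only edge insertions into the data graph; changes to the pattern graph, edge deletions, and cost estimation are all out of scope. The class `PathMatcher` and its methods are hypothetical names, not the authors' algorithm.

```python
from collections import defaultdict


class PathMatcher:
    """Incrementally maintains matches of a labeled path pattern A -> B -> C.

    Illustrative sketch: assumes the three pattern labels are distinct.
    """

    def __init__(self, labels, pattern=("A", "B", "C")):
        self.labels = labels                 # vertex id -> label
        self.pattern = pattern
        self.out_edges = defaultdict(set)    # u -> successors of u
        self.in_edges = defaultdict(set)     # v -> predecessors of v
        self.matches = set()                 # all (u, v, w) matches so far

    def insert_edge(self, u, v):
        """Add edge u -> v and return only the matches it newly creates."""
        if v in self.out_edges[u]:
            return set()                     # duplicate edge: nothing new
        self.out_edges[u].add(v)
        self.in_edges[v].add(u)

        a, b, c = self.pattern
        label_u, label_v = self.labels[u], self.labels[v]
        new_matches = set()

        # Case 1: the new edge plays the role of the pattern edge A -> B,
        # so extend it forward with existing B -> C edges.
        if label_u == a and label_v == b:
            for w in self.out_edges[v]:
                if self.labels[w] == c:
                    new_matches.add((u, v, w))

        # Case 2: the new edge plays the role of the pattern edge B -> C,
        # so extend it backward with existing A -> B edges.
        if label_u == b and label_v == c:
            for t in self.in_edges[u]:
                if self.labels[t] == a:
                    new_matches.add((t, u, v))

        self.matches |= new_matches
        return new_matches


labels = {1: "A", 2: "B", 3: "C", 4: "C"}
matcher = PathMatcher(labels)
first = matcher.insert_edge(2, 3)    # B -> C alone completes nothing yet
second = matcher.insert_edge(1, 2)   # A -> B joins the stored B -> C edge
third = matcher.insert_edge(2, 4)    # a second B -> C edge adds one match
```

Each insertion inspects only the neighborhood of the new edge, which is the basic saving that incremental matching offers over re-running the match on the whole data graph.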


Efficient Algorithms for Cleaning and Indexing of Graph data

International Journal of Open Source Software and Processes ◽

10.4018/ijossp.2020070101 ◽

2020 ◽

Vol 11 (3) ◽

pp. 1-19

Author(s):

Santhosh Kumar D. K. ◽

Demain Antony DMello

Keyword(s):

Data Cleaning ◽

Graph Data ◽

Data Indexing ◽

Storage And Retrieval ◽

Graph Indexing ◽

Data Inconsistency ◽

Single Piece ◽

Scalable Graph Indexing ◽

Unstructured Information ◽

Existing Data

Information extraction and analysis from enormous graph data is expanding rapidly. According to the survey, 80% of researchers spend more than 40% of their project time on data cleaning, which signifies a huge need for data cleaning. Owing to the characteristics of big data, storage and retrieval is another major concern, and it is addressed by data indexing. Existing data cleaning techniques try to clean graph data based on information such as structural attributes and event log sequences. Cleaning graph data on a single piece of information alone will not increase the performance of computation. Along with nodes, labels can also be inconsistent, so it is highly desirable to clean both to improve performance. This paper addresses the aforesaid issue by proposing a graph data cleaning algorithm that detects unstructured information along with inconsistent labeling, cleans the data by applying rules, and verifies the result based on data inconsistency. The authors also propose an indexing algorithm based on the CSS-tree to build efficient and scalable graph indexing on top of Hadoop.
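As a rough illustration of rule-based cleaning followed by indexing, the sketch below normalizes inconsistent node labels, drops nodes without a usable label, removes edges left dangling, and then builds a lookup index. The rule table `LABEL_RULES` and all function names are hypothetical, and the index is a plain sorted array searched with binary search, used here as a simple stand-in rather than an actual CSS-tree or a Hadoop job.

```python
from bisect import bisect_left

# Hypothetical rules mapping inconsistent label spellings to a canonical form.
LABEL_RULES = {"person": "Person", "PERSON": "Person", "org": "Organization"}


def clean_graph(nodes, edges, rules=LABEL_RULES):
    """Clean nodes (dict: id -> label or None) and edges (list of (u, v))."""
    cleaned = {}
    for node_id, label in nodes.items():
        # Rule 1: drop nodes that carry no usable label.
        if label is None or not label.strip():
            continue
        # Rule 2: trim whitespace and map the label to its canonical form.
        label = label.strip()
        cleaned[node_id] = rules.get(label, label)
    # Rule 3: drop edges whose endpoints did not survive cleaning.
    kept = [(u, v) for (u, v) in edges if u in cleaned and v in cleaned]
    return cleaned, kept


def build_label_index(nodes):
    """Return a sorted array of (label, node_id) pairs."""
    return sorted((label, node_id) for node_id, label in nodes.items())


def lookup(index, label):
    """Binary-search the sorted array and collect all ids with this label."""
    i = bisect_left(index, (label,))
    result = []
    while i < len(index) and index[i][0] == label:
        result.append(index[i][1])
        i += 1
    return result


raw_nodes = {1: "person", 2: " Person ", 3: None, 4: "org", 5: "PERSON"}
raw_edges = [(1, 2), (2, 3), (4, 5)]
nodes, edges = clean_graph(raw_nodes, raw_edges)
index = build_label_index(nodes)
```

Cleaning before indexing matters here: without label normalization, the three spellings of "Person" would land in three separate index ranges and a lookup would miss two of the nodes.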
