scalable graph indexing

2020, Vol 11 (3), pp. 1-19
Author(s): Santhosh Kumar D. K., Demain Antony DMello

Information extraction and analysis from very large graph data sets is expanding rapidly. Survey results indicate that 80% of researchers spend more than 40% of their project time on data cleaning, which signals a strong need for effective data cleaning. Because of the characteristics of big data, storage and retrieval are another major concern, one that data indexing addresses. Existing data cleaning techniques try to clean graph data based on information such as structural attributes and event log sequences. Cleaning graph data on a single piece of information alone will not improve computational performance. Labels, as well as nodes, can be inconsistent, so it is highly desirable to clean both. This paper addresses the issue by proposing a graph data cleaning algorithm that detects unstructured information together with inconsistent labeling, cleans the data by applying rules, and verifies the result based on data inconsistency. The authors also propose an indexing algorithm based on the CSS-tree to build an efficient and scalable graph index on top of Hadoop.
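The abstract does not describe the indexing algorithm itself, so the following is only a rough illustration of the general idea behind a pointer-free search directory. It assumes that "CSS-tree" refers to the cache-sensitive search tree, in which a directory of separator keys is laid out over a sorted array and child positions are computed by arithmetic rather than stored as pointers. The class name `DirectoryIndex`, the `fanout` parameter, and the integer keys (standing in for vertex IDs) are all hypothetical, and nothing here involves Hadoop or reproduces the authors' method.

```python
from bisect import bisect_left


class DirectoryIndex:
    """Simplified, pointer-free multi-level directory over sorted keys.

    Illustrative only: a real cache-sensitive layout would pack all
    levels into one contiguous array with nodes sized to a cache line.
    """

    def __init__(self, keys, fanout=4):
        if fanout < 2:
            raise ValueError("fanout must be at least 2")
        self.fanout = fanout
        self.leaves = sorted(keys)
        # levels[0] is the sorted leaf array. Each higher level stores the
        # largest key of every fanout-sized block of the level below it.
        self.levels = [self.leaves]
        while len(self.levels[-1]) > fanout:
            below = self.levels[-1]
            summary = [
                below[min(start + fanout, len(below)) - 1]
                for start in range(0, len(below), fanout)
            ]
            self.levels.append(summary)

    def find(self, key):
        """Return the position of key in the sorted leaf array, or -1."""
        block = 0  # index of the block to inspect on the current level
        for level in reversed(self.levels):
            start = block * self.fanout
            end = min(start + self.fanout, len(level))
            # First slot in this block whose key is >= the search key.
            pos = bisect_left(level, key, start, end)
            if pos == end:
                return -1  # key is larger than every key in the block
            # The slot index on this level is the block index one level down.
            block = pos
        return block if self.leaves[block] == key else -1


# Hypothetical vertex IDs; find() reports the position in sorted order.
index = DirectoryIndex([10, 3, 7, 25, 18, 1, 42, 30, 15, 21], fanout=3)
position = index.find(21)
```

Each level is searched only within one block of `fanout` keys, so a lookup touches a small, contiguous slice per level. That locality is the property such layouts aim to exploit, although Python lists do not expose real cache behavior.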

