An Efficient KNN Query Processing Approach on High-Dimensional Data Objects in P2P Systems

In high-dimensional indexes, the overflow split has been shown to be critical to the performance of kNN query processing. Traditional methods split a node into two parts; however, they tend to produce many overlapping regions in high-dimensional spaces, which significantly degrades retrieval performance. In this paper, we propose a method named KSR-Tree, which uses a clustering-based split algorithm to divide the node into multiple parts, so that most of the overlapping regions are guaranteed to be placed in the same node. This approach not only increases the capacity for newly arrived records, but also decreases the splitting overhead and reduces the overlapping regions; as a result, nodes split less frequently and retrieval performance improves. In our experiments, KSR-Tree significantly improved the performance of kNN query processing.
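The idea of a multi-way, clustering-based split can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm KSR-Tree uses, so plain Lloyd's k-means stands in for it here; the function names `kmeans_split` and `bounding_box`, the evenly spaced seeding, and the tuple representation of entries are all assumptions made for illustration, not the authors' method.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def kmeans_split(points, k, iterations=20):
    """Partition the entries of an overflowing node into at most k groups.

    Hypothetical helper: Lloyd's k-means is an assumed stand-in for the
    clustering step, which the abstract does not specify.
    """
    k = min(k, len(points))
    # Deterministic seeding (an assumption): k evenly spaced entries.
    step = max(1, len(points) // k)
    centers = [points[i * step] for i in range(k)]
    groups = []
    for _ in range(iterations):
        # Assignment step: each entry joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    # Empty groups are dropped, so fewer than k parts may be returned.
    return [g for g in groups if g]


def bounding_box(group):
    """Minimum bounding rectangle of a group as (lower corner, upper corner)."""
    lows = tuple(min(axis) for axis in zip(*group))
    highs = tuple(max(axis) for axis in zip(*group))
    return lows, highs
```

Calling `kmeans_split(entries, 3)` on the entries of a full node yields up to three groups, each of which would become a new node described by its `bounding_box`. Because nearby entries tend to land in the same group, the resulting rectangles tend to overlap less than those of an arbitrary two-way split, which is the effect the abstract attributes to the clustering-based split.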


PROM: Efficient matching query processing on high-dimensional data

Information Sciences ◽ 10.1016/j.ins.2015.05.005 ◽ 2015 ◽ Vol 322 ◽ pp. 1-19

Cited By ~ 1

Author(s): Chunyang Ma ◽ Yongluan Zhou ◽ Lidan Shou ◽ Gang Chen

Keyword(s): Query Processing ◽ High Dimensional Data ◽ High Dimensional ◽ Efficient Matching


An Efficient Approach for Query Processing of Incomplete High Dimensional Data Streams

Advances in Intelligent Systems and Computing - Advanced Machine Learning Technologies and Applications ◽ 10.1007/978-3-030-69717-4_57 ◽ 2021 ◽ pp. 602-612

Author(s): Fatma M. Najib ◽ Rasha M. Ismail ◽ Nagwa L. Badr ◽ Tarek F. Gharib

Keyword(s): Query Processing ◽ Data Streams ◽ High Dimensional Data ◽ High Dimensional ◽ Efficient Approach


An efficient algorithm for hyperspherical range query processing in high-dimensional data space

Information Processing Letters ◽ 10.1016/s0020-0190(01)00318-0 ◽ 2002 ◽ Vol 83 (2) ◽ pp. 115-123

Author(s): Dong-Ho Lee ◽ Shin Heu ◽ Hyoung-Joo Kim

Keyword(s): Query Processing ◽ Efficient Algorithm ◽ High Dimensional Data ◽ Range Query ◽ High Dimensional ◽ Data Space ◽ Range Query Processing


Fuzzy Relational Scattered Distance Based Clustering Method for Sparsely Distributed High Dimensional Data Objects

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.e6633.018520 ◽ 2020 ◽ Vol 8 (5) ◽ pp. 4044-4049

Keyword(s): Data Analysis ◽ Distance Measure ◽ High Dimensional Data ◽ Fuzzy Model ◽ Sparse Data ◽ High Dimensional ◽ Measure Model ◽ Data Points ◽ Data Objects ◽ Grid Based

Clustering is one of the most significant ideas in data mining. It is an unsupervised learning model. Clustering high-dimensional data is more complex because of the intrinsic sparsity of such data. Existing methods for reducing immaterial clusters are based on spectral clustering and graph-based learning algorithms, whose lack of sparsity and polynomial time complexity compromise their efficiency when applied to sparse high-dimensional data. This paper concentrates on clustering sparsely distributed high-dimensional data objects. The Fuzzy Relational Scattered Distance Based Clustering (FRSDBC) method is developed with three models: a Geometric Median Based Fuzzy model, a Scattered Distance measure model, and a Grid based clustered sparse data representation model. The Geometric Median Based Fuzzy model calculates the geometric median of the similar sparse data objects and then of the non-similar sparse data objects to fit the relational fuzziness across data points. It is involved in the subspace reduction of data objects. The Scattered Distance measure model is used to measure the distance between the inner and outer objects. Grid based clustering is used to calculate the area of the cluster in the FRSDBC method. The main idea of the FRSDBC method is to cluster data points over sparsely distributed data within limited processing time. The Clustering Time, Clustering Accuracy and Space Complexity of each method are analyzed. The results of the FRSDBC method are compared with those of other techniques: they are more accurate and easier to understand, and the clustering time is substantially lower. It is widely used in many practical applications such as weather forecasting, share trading, medical data analysis and aerial data analysis.
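The geometric median mentioned in the abstract is the point that minimizes the sum of Euclidean distances to a set of data points. The abstract does not state how FRSDBC computes it, so the sketch below uses Weiszfeld's iteration, a standard choice that is assumed here; the function name `geometric_median` and the parameters `iterations`, `tol` and `eps` are hypothetical.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def geometric_median(points, iterations=200, tol=1e-9, eps=1e-12):
    """Approximate the geometric median with Weiszfeld's iteration.

    Assumed technique: the abstract names the geometric median but not
    the algorithm used to compute it.
    """
    dim = len(points[0])
    # Start from the centroid (arithmetic mean) of the points.
    current = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    for _ in range(iterations):
        # Each point is weighted by the inverse of its distance to the
        # current estimate; eps guards against division by zero.
        weights = [1.0 / max(dist(p, current), eps) for p in points]
        total = sum(weights)
        updated = tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        )
        if dist(updated, current) < tol:
            return updated  # the estimate has stopped moving
        current = updated
    return current
```

Unlike the arithmetic mean, the geometric median is not pulled far by a single distant point: for three copies of `(0.0, 0.0)` and one point at `(10.0, 0.0)`, the mean lies at `x = 2.5`, while the iteration above converges toward the origin.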
