An Efficient KNN Query Processing Approach on High-Dimensional Data Objects in P2P Systems

Author(s):  
Baiyou Qiao ◽  
Linlin Ding ◽  
Xiaoyang Wang ◽  
Yong Wei
2013 ◽  
Vol 411-414 ◽  
pp. 366-369
Author(s):  
Wei Zhang ◽  
Han Hu Wang ◽  
Hui Li

In high-dimensional indexes, the overflow split strategy has been shown to be critical to the performance of kNN query processing. Traditional methods split an overflowing node into two parts, but in high-dimensional spaces this tends to produce many overlapping regions, which significantly degrades retrieval performance. In this paper, we propose a method named KSR-Tree, which uses a clustering-based split algorithm to divide the node into multiple parts, so that most overlapping regions are guaranteed to be placed in the same node. This approach not only increases the capacity for newly arrived records, but also decreases the splitting overhead and reduces the overlapping regions; as a result, node splits become less frequent and retrieval performance improves. Our experimental results show that KSR-Tree significantly improves the performance of kNN query processing.
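
The idea of splitting an overflowing node into several groups by clustering, rather than into two halves, can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm KSR-Tree uses, so plain k-means is assumed here purely for illustration; the function name kmeans_split and its parameters are hypothetical and not taken from the paper.

```python
import math
import random


def kmeans_split(points, k, iters=20, seed=0):
    """Split a node's entries into at most k groups with plain k-means.

    Assumption: k-means stands in for the paper's unspecified
    clustering-based split; this is not the KSR-Tree algorithm itself.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k entries as initial centers
    groups = []
    for _ in range(iters):
        # Assign every entry to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group (keep it if empty).
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Drop empty groups so every resulting node holds at least one entry.
    return [g for g in groups if g]


# Hypothetical overflowing node holding six 2-D entries.
node_entries = [(0.0, 0.2), (0.3, 0.1), (0.1, 0.4),
                (9.0, 9.5), (9.4, 9.1), (5.0, 0.3)]
parts = kmeans_split(node_entries, k=3)
```

Entries that lie close together end up in the same group, which is the intuition behind keeping heavily overlapping regions inside one node after the split.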


2015 ◽  
Vol 322 ◽  
pp. 1-19 ◽  
Author(s):  
Chunyang Ma ◽  
Yongluan Zhou ◽  
Lidan Shou ◽  
Gang Chen

2020 ◽  
Vol 8 (5) ◽  
pp. 4044-4049

Clustering is one of the most significant ideas in data mining. It is an unsupervised learning model. Clustering high-dimensional data is more complex because of the intrinsic sparsity of such data. Existing methods for reducing immaterial clusters are based on spectral clustering and graph-based learning algorithms, whose lack of sparsity and polynomial time complexity compromise their efficiency when applied to sparse high-dimensional data. This paper concentrates on clustering sparsely distributed high-dimensional data objects. The Fuzzy Relational Scattered Distance Based Clustering (FRSDBC) method is developed with three models: a Geometric Median Based Fuzzy model, a Scattered Distance measure model, and a Grid based clustered sparse data representation model. The Geometric Median Based Fuzzy model calculates the geometric median of similar sparse data objects and then of the non-similar sparse data objects to fit the relational fuzziness across data points. It is involved in the subspace reduction of data objects. The Scattered Distance measure model is used to measure the distance between the inner and outer objects. Grid based clustering is used to calculate the area of the cluster in the FRSDBC method. The main idea of the FRSDBC method is to cluster data points over sparsely distributed data within a limited processing time. The Clustering Time, Clustering Accuracy and Space Complexity of each method are analyzed. The results of the FRSDBC method are compared with those of other techniques; the results obtained are more accurate and easy to understand, and the clustering time is substantially low with the FRSDBC method. It is widely used in many practical applications such as weather forecasting, share trading, medical data analysis and aerial data analysis.
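
The geometric median mentioned above is the point that minimizes the sum of Euclidean distances to a set of data points. The abstract does not state how FRSDBC computes it, so the sketch below uses Weiszfeld's iteration, a standard method swapped in here as an assumption; the function name geometric_median and its parameters are hypothetical.

```python
import math


def geometric_median(points, iters=200, eps=1e-9):
    """Approximate the geometric median with Weiszfeld's iteration.

    Assumption: this standard method is used only for illustration;
    the abstract does not describe the computation used in FRSDBC.
    """
    dim = len(points[0])
    # Start from the centroid (arithmetic mean) of the points.
    est = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            dist = math.dist(p, est)
            if dist < eps:
                continue  # simplification: skip a point sitting on the estimate
            weight = 1.0 / dist
            den += weight
            for d in range(dim):
                num[d] += weight * p[d]
        if den == 0.0:
            break  # every point coincides with the estimate
        new = [x / den for x in num]
        moved = math.dist(new, est)
        est = new
        if moved < eps:
            break
    return tuple(est)


# Hypothetical data: three nearby points and one far-away outlier.
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
median = geometric_median(sample)
```

Unlike the arithmetic mean, the geometric median is pulled less strongly toward the outlier, which is one reason it is often preferred for sparse or noisy data.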


2008 ◽  
Vol 2 (3) ◽  
pp. 234-247 ◽  
Author(s):  
Mei Li ◽  
Wang-Chien Lee ◽  
Anand Sivasubramaniam ◽  
Jizhong Zhao

2018 ◽  
Vol 30 (10) ◽  
pp. 1838-1851 ◽  
Author(s):  
Mingjie Tang ◽  
Yongyang Yu ◽  
Walid G. Aref ◽  
Qutaibah M. Malluhi ◽  
Mourad Ouzzani
