Distributed Mining of Outliers from Large Multi-Dimensional Databases
A data point in a given dataset is considered an outlier when it is distant from its nearest neighbours. This definition rests on a distance measure. Detecting outliers in distributed environments, however, is challenging. Many approaches to mining outliers in such environments have been proposed, but a faster and more efficient method is still desired. In this paper we employ a novel hierarchical index tree. Its hierarchical structure enables space pruning, while its clustering property speeds up the search for the neighbours of a given data point. Its time complexity is linear in the size of the dataset and in the number of dimensions. A nearest-neighbour search built on top of the hierarchical tree (Hi-tree) avoids unnecessary computations and prunes unpromising points. We propose an algorithm named Distributed Mining of Outliers using Hi-tree (DMOH). The index tree also lends itself to parallel processing. We built a prototype application as a proof of concept. Our empirical study shows the efficiency of the proposed algorithm on top of the Hi-tree.
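To make the distance-based notion of an outlier concrete, the following is a minimal brute-force sketch in Python. It is not the Hi-tree index or the DMOH algorithm, neither of which is specified in this abstract. It assumes one common scoring rule, namely ranking each point by the Euclidean distance to its k-th nearest neighbour. The function names `knn_outlier_scores` and `top_outliers` and the parameters `k` and `n` are hypothetical and chosen only for illustration.

```python
import heapq
import math


def knn_outlier_scores(points, k):
    """Score each point by the Euclidean distance to its k-th nearest neighbour.

    Assumption: a larger k-NN distance means the point is more outlying.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    scores = []
    for i, p in enumerate(points):
        # Brute-force scan over all other points: no index, no pruning.
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        # The last of the k smallest distances is the k-th nearest neighbour distance.
        scores.append(heapq.nsmallest(k, dists)[-1])
    return scores


def top_outliers(points, k, n):
    """Return the indices of the n points with the largest k-NN distance."""
    scores = knn_outlier_scores(points, k)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:n]


if __name__ == "__main__":
    # Four points on a unit square plus one point far away from them.
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(top_outliers(points, k=2, n=1))  # prints [4]
```

This baseline compares every pair of points, so its cost grows quadratically with the dataset size. That cost is what an index with space pruning and clustered neighbour search, such as the Hi-tree described above, is meant to reduce.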