A k-Nearest Neighbor Centroid-Based Outlier Detection Method

Author(s):  
Xiaochun Wang ◽  
Xiali Wang ◽  
Mitch Wilkes
2012 ◽  
Vol 468-471 ◽  
pp. 2504-2509
Author(s):  
Qiang Da Yang ◽  
Zhen Quan Liu

The on-line estimation of key hard-to-measure process variables using soft-sensor techniques has received extensive attention in industrial production. The precision of on-line estimation is closely related to the accuracy of the soft-sensor model, which in turn depends strongly on the accuracy of the modeling data. Addressing the particular way outliers are defined in the soft-sensor modeling process, this paper proposes an outlier detection method based on k-nearest neighbor (k-NN). The proposed method can be implemented directly from data, without prior knowledge of or assumptions about the process. Simulation results and a practical application show that the proposed k-NN-based outlier detection method detects outliers effectively and has high application value.
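As a rough illustration of the general idea of scoring outliers from k-NN distances, the sketch below rates each sample by its mean distance to its k nearest neighbors. It is not the method of the paper above, whose details the abstract does not give. The function names, the sample values, and the "factor times the median score" threshold are all hypothetical choices made for this example.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by its mean distance to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_outliers(points, k=3, factor=3.0):
    """Flag points whose score exceeds factor * median score.

    The thresholding rule is an assumption of this sketch, not taken
    from the abstract.
    """
    scores = knn_outlier_scores(points, k)
    median = sorted(scores)[len(scores) // 2]
    return [s > factor * median for s in scores]


# Toy data: five samples close together and one far away.
samples = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0), (8.0, 8.0)]
flags = flag_outliers(samples, k=3)
print(flags)
```

On this toy data only the last sample is flagged. No process model or distributional assumption is needed, which matches the data-driven character the abstract describes.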


Author(s):  
Rico Andrian ◽  
Saipul Anwar ◽  
Meizano Ardhi Muhammad ◽  
Akmal Junaidi

Lampung has the only engineered in situ butterfly breeding site in Indonesia, Gita Persada Butterfly Park, which has approximately 211 butterfly species. Butterflies can be classified according to the patterns on their wings. The difficulty the human eye has in distinguishing these patterns motivates building a butterfly identification system based on pattern recognition. This study uses 6 species of butterflies: Papilio memnon, Troides helena, Papilio nephelus, Cethosia penthesilea, Papilio peranthus, and Pachliopta aristolochiae. The dataset consists of 600 butterfly images, each showing the upper side of the wings. The pre-processing stage applies scaling, segmentation, and grayscale conversion. The feature extraction stage uses Canny edge detection, applying smoothing, edge strength, edge direction, non-maximum suppression, and hysteresis thresholding. The classification stage uses the K-Nearest Neighbor (KNN) method with values k = 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 and 23, obtained under the Rule of Thumb. Identifying a butterfly requires a classification time of 8 seconds. The highest accuracy, 80%, is obtained with k = 5.
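The classification stage described above can be sketched as a plain majority vote among the k nearest training samples. This is a minimal sketch, assuming feature vectors have already been extracted from the edge images. The two-dimensional feature values and the labels `species_a` and `species_b` are hypothetical placeholders, not data from the study.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=5):
    """Return the majority label among the k training samples nearest to query."""
    # Indices of training samples ordered by Euclidean distance to the query.
    order = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors (placeholders standing in for extracted edge features).
train_features = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.85),
]
train_labels = ["species_a"] * 3 + ["species_b"] * 3

near_a = knn_predict(train_features, train_labels, (0.2, 0.2), k=3)
near_b = knn_predict(train_features, train_labels, (0.8, 0.8), k=5)
print(near_a, near_b)
```

An odd k rules out tied votes only when there are two classes. With six species, ties remain possible and a tie-breaking rule would have to be chosen.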


2018 ◽  
Vol 8 (8) ◽  
pp. 1248 ◽  
Author(s):  
Haiqing Yao ◽  
Xiuwen Fu ◽  
Yongsheng Yang ◽  
Octavian Postolache

Outlier detection has attracted wide attention for its broad applications, such as fault diagnosis and intrusion detection. Among these, outlier analysis in data streams, which are highly uncertain and unbounded, is more challenging. Recent major work on outlier detection has focused on the principles of the local outlier factor, and there are few studies on incremental updating strategies, which are vital to outlier detection in data streams. In this paper, a novel incremental local outlier detection approach is introduced to dynamically evaluate local outliers in the data stream. An extended local neighborhood consisting of k nearest neighbors, reverse nearest neighbors and shared nearest neighbors is estimated for each data point. A theoretical analysis of the algorithm's complexity for the insertion of new data and the deletion of old data in the composite neighborhood shows that the amount of affected data in the incremental calculation is finite. Finally, experiments performed on both synthetic and real datasets verify its scalability and outlier detection accuracy. All results show that the proposed approach has performance comparable to state-of-the-art k nearest neighbor-based methods.
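The composite neighborhood mentioned above can be illustrated with a small static sketch. The definitions used here are assumptions of the example, since the abstract does not state them: a reverse nearest neighbor of p is any point that has p among its own k nearest neighbors, and a shared nearest neighbor of p is any point whose k-NN set overlaps that of p. The sketch recomputes everything from scratch and does not attempt the incremental insertion and deletion updates the paper studies.

```python
import math


def knn_sets(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        result.append(set(others[:k]))
    return result


def extended_neighborhood(points, k):
    """Union of k-NN, reverse k-NN and shared k-NN index sets for each point."""
    knn = knn_sets(points, k)
    n = len(points)
    extended = []
    for i in range(n):
        # Points that list i among their own k nearest neighbors.
        reverse = {j for j in range(n) if j != i and i in knn[j]}
        # Points whose k-NN set has at least one member in common with i's.
        shared = {j for j in range(n) if j != i and knn[i] & knn[j]}
        extended.append(knn[i] | reverse | shared)
    return extended


# Toy one-dimensional data: three nearby points and one distant point.
points = [(0.0,), (1.0,), (2.5,), (10.0,)]
ext = extended_neighborhood(points, k=1)
print(ext)
```

In this toy run the distant point at index 3 ends up with the smallest extended neighborhood, because no other point selects it as a neighbor or shares a neighbor with it.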

