SNN_flow: a shared nearest-neighbor-based clustering method for inhomogeneous origin-destination flows

Author(s):  
Qiliang Liu ◽  
Jie Yang ◽  
Min Deng ◽  
Ci Song ◽  
Wenkai Liu
2019 ◽  
Vol 11 (3) ◽  
pp. 350 ◽  
Author(s):  
Qiang Li ◽  
Qi Wang ◽  
Xuelong Li

A hyperspectral image (HSI) has many bands, and adjacent bands are highly correlated, so representative subsets must be found before further analysis. Band selection is an effective approach to this problem: it removes the redundant bands of an HSI. Many band selection methods have been proposed recently, but most of them achieve very poor accuracy when only a small number of bands is selected and require multiple iterations, which defeats the purpose of band selection. We therefore propose an efficient clustering method based on shared nearest neighbors (SNNC) for optimal hyperspectral band selection, with the following contributions: (1) the local density of each band is computed from its shared nearest neighbors, which reflects the local distribution characteristics more accurately; (2) to obtain a band subset carrying a large amount of information, the information entropy of each band is used as one of the weight factors; (3) the optimal band subset is selected automatically from the change in slope of the ranked scores. The experimental results show that, compared with other methods, the proposed method has competitive computational time, and the selected bands achieve higher overall classification accuracy on different data sets, especially when the number of bands is small.
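Below is a minimal Python sketch of the three ingredients described in the abstract, written from the abstract alone rather than from the paper: the neighborhood size k, the histogram bin count in band_entropy, the density-times-entropy weighting, and the simple second-difference elbow rule in select_bands are illustrative assumptions, and the function names are ours, not the authors' published implementation.

    import numpy as np

    def band_knn(bands, k):
        """Indices of each band's k nearest bands (Euclidean, self excluded).
        bands: (n_bands, n_pixels) array, one flattened band per row."""
        d = np.linalg.norm(bands[:, None, :] - bands[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        return np.argsort(d, axis=1)[:, :k]

    def snn_local_density(knn):
        """Local density of a band = summed sizes of its shared-nearest-
        neighbor sets with its own k neighbors (one common SNN formulation)."""
        sets = [set(row) for row in knn]
        return np.array([sum(len(sets[i] & sets[j]) for j in knn[i])
                         for i in range(len(knn))], dtype=float)

    def band_entropy(bands, n_bins=64):
        """Shannon entropy of each band's grey-level histogram (in bits)."""
        ent = np.empty(len(bands))
        for i, b in enumerate(bands):
            p, _ = np.histogram(b, bins=n_bins)
            p = p[p > 0] / p.sum()          # drop empty bins, normalize
            ent[i] = -(p * np.log2(p)).sum()
        return ent

    def select_bands(bands, k=10):
        """Rank bands by density * entropy, then cut the ranking where the
        slope of the sorted score curve changes most (a simple elbow rule)."""
        score = snn_local_density(band_knn(bands, k)) * band_entropy(bands)
        order = np.argsort(score)[::-1]
        s = score[order]
        cut = int(np.argmax(np.diff(s, n=2))) + 1   # largest slope change
        return order[:max(cut, 1)]

    # toy usage: a random 50-band cube of 32x32 pixels
    rng = np.random.default_rng(0)
    cube = rng.random((50, 32 * 32))
    print(select_bands(cube, k=10))

The product of density and entropy is just one plausible combination; the abstract says entropy is "one of the weight factors", so the paper may weight additional terms.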


2018 ◽  
Vol 74 ◽  
pp. 1-14 ◽  
Author(s):  
Yikun Qin ◽  
Zhu Liang Yu ◽  
Chang-Dong Wang ◽  
Zhenghui Gu ◽  
Yuanqing Li

Author(s):  
V. Suresh Babu ◽  
P. Viswanath ◽  
Narasimha M. Murty

Non-parametric methods such as the nearest neighbor classifier (NNC) and Parzen-window density estimation (Duda, Hart & Stork, 2000) are more general than parametric methods because they make no assumptions about the form of the probability distribution, and they perform well in practice on large data sets. These methods estimate, explicitly or implicitly, the probability density at a given point of the feature space by counting the points that fall in a small region around it. Popular classifiers that take this approach are the NNC and its variants, such as the k-nearest neighbor classifier (k-NNC) (Duda, Hart & Stork, 2000); DBSCAN, a popular density-based clustering method (Han & Kamber, 2001), uses it as well. These methods perform well, especially on larger data sets: the asymptotic error rate of the NNC is less than twice the Bayes error (Cover & Hart, 1967), and DBSCAN can find arbitrarily shaped clusters while detecting noisy outliers (Ester, Kriegel & Xu, 1996).

The most prominent difficulty in applying non-parametric methods to large data sets is their computational burden. The space and classification-time complexities of the NNC and k-NNC are O(n), where n is the training set size, and the time complexity of DBSCAN is O(n²), so these methods do not scale to large data sets. Two common remedies are: (1) reduce the training set size with editing techniques that eliminate training patterns which are redundant in some sense (Dasarathy, 1991), as the condensed NNC does (Hart, 1968); and (2) use only a few selected prototypes from the data set, as the Leaders-subleaders and l-DBSCAN methods do (Vijaya, Murthy & Subramanian, 2004; Viswanath & Rajwala, 2006). Both remedies reduce the computational burden, but they can also degrade the method's performance. Enriched prototypes can improve performance, as in Asharaf and Murthy (2003), where the prototypes are derived using adaptive rough fuzzy set theory, and in Suresh Babu and Viswanath (2007), where the prototypes are used along with their relative weights. Prototypes can be derived with a clustering method such as the leaders method (Spath, 1980) or the k-means method (Jain, Dubes & Chen, 1987), which partition the data set so that each block (cluster) is represented by a prototype called a leader, centroid, etc. But such prototypes cannot be used to estimate the probability density, because the density information present in the data set is lost when the prototypes are derived.

The chapter proposes a modified leaders clustering method, the counted-leader method, which, while deriving the leaders, preserves the crucial density information in the form of a count that can be used to estimate densities. On this basis, the chapter presents a fast and efficient nearest-prototype classifier, the counted k-nearest leader classifier (ck-NLC), which is on par with the conventional k-NNC but considerably faster, and a density-based clustering method, l-DBSCAN, which is a faster and scalable version of DBSCAN (Viswanath & Rajwala, 2006).
Formally, under some assumptions, the number of leaders is shown to be upper-bounded by a constant that is independent of both the data set size and the distribution from which the data are drawn.
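As a concrete illustration of the idea, here is a minimal Python sketch under stated assumptions: the threshold value tau, the per-class derivation of leaders, and the count-weighted vote in ck_nlc are our illustrative choices and names, not the chapter's exact formulation.

    import numpy as np

    def counted_leaders(X, tau):
        """Single-scan leaders clustering that also records, per leader, the
        number of patterns it absorbs -- the count that preserves the density
        information otherwise lost when deriving prototypes."""
        leaders, counts = [], []
        for x in X:
            for i, l in enumerate(leaders):
                if np.linalg.norm(x - l) <= tau:  # x follows an existing leader
                    counts[i] += 1
                    break
            else:                                 # no leader within tau: new leader
                leaders.append(x)
                counts.append(1)
        return np.array(leaders), np.array(counts)

    def ck_nlc(query, leaders, counts, labels, k=3):
        """Counted k-nearest leader classifier: vote among the k nearest
        leaders, weighting each leader's class by its follower count."""
        nearest = np.argsort(np.linalg.norm(leaders - query, axis=1))[:k]
        votes = {}
        for i in nearest:
            votes[labels[i]] = votes.get(labels[i], 0) + counts[i]
        return max(votes, key=votes.get)

    # toy usage: two Gaussian classes, leaders derived per class
    rng = np.random.default_rng(1)
    X0 = rng.normal(0.0, 0.5, size=(200, 2))
    X1 = rng.normal(2.0, 0.5, size=(200, 2))
    L, C, Y = [], [], []
    for label, Xc in enumerate((X0, X1)):
        l, c = counted_leaders(Xc, tau=0.4)
        L.append(l); C.append(c); Y.extend([label] * len(l))
    print(ck_nlc(np.array([0.2, 0.1]),
                 np.vstack(L), np.hstack(C), np.array(Y), k=3))

Because each query is compared against the leaders rather than all n training patterns, classification cost drops from O(n) to O(number of leaders), which, as stated above, is bounded by a constant independent of n.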


2021 ◽  
pp. 487-494
Author(s):  
Yufang Zhang ◽  
Yongfang Zha ◽  
Lintao Li ◽  
Zhongyang Xiong
