A new k-nearest neighbor density-based clustering method and its application to hyperspectral images

Author(s):  
Claude Cariou ◽  
Kacem Chehdi
Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2213
Author(s):  
Ahyeong Lee ◽  
Saetbyeol Park ◽  
Jinyoung Yoo ◽  
Jungsook Kang ◽  
Jongguk Lim ◽  
...  

Biofilms formed on the surfaces of agro-food processing facilities can cause food poisoning by providing an environment in which bacteria can grow. Hygiene management through early detection is therefore important. This study aimed to assess the feasibility of detecting Escherichia coli (E. coli) and Salmonella typhimurium (S. typhimurium) on the surfaces of food processing facilities by using fluorescence hyperspectral imaging. E. coli and S. typhimurium were cultured on high-density polyethylene and stainless steel coupons, which are the main materials used in food processing facilities. We obtained fluorescence hyperspectral images over the 420–730 nm range under illumination from a 365 nm UV light source. The images were used to perform discriminant analyses (linear discriminant analysis, k-nearest neighbor analysis, and partial least squares discriminant analysis) to identify and classify coupons on which bacteria could be cultured. The specificity and sensitivity for E. coli (1–4 log CFU·cm−2) and S. typhimurium (1–6 log CFU·cm−2) were over 90% for most machine learning models used, and the highest performances were generally obtained with the k-nearest neighbor (k-NN) model. Applying the trained model to the hyperspectral images confirmed that biofilms were detected well. This result indicates that biofilms could be inspected rapidly using fluorescence hyperspectral images.
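To illustrate the k-NN classification step mentioned in this abstract, here is a minimal sketch of majority-vote k-NN on per-pixel spectra. The three-band "spectra", the labels "clean" and "biofilm", the function name `knn_predict`, and the choice of Euclidean distance with k = 3 are all assumptions made for illustration; they are not the authors' data, settings, or code.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training spectrum,
    # sorted so the closest training points come first.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Made-up 3-band spectra (hypothetical values, not measurements).
train_x = [(0.10, 0.20, 0.10), (0.12, 0.18, 0.11), (0.11, 0.21, 0.09),
           (0.60, 0.70, 0.65), (0.58, 0.72, 0.66), (0.61, 0.69, 0.63)]
train_y = ["clean", "clean", "clean", "biofilm", "biofilm", "biofilm"]

pred_a = knn_predict(train_x, train_y, (0.59, 0.71, 0.64))
pred_b = knn_predict(train_x, train_y, (0.11, 0.19, 0.10))
print(pred_a, pred_b)
```

Applying such a function to every pixel of an image would produce a classification map of the kind the abstract describes, although a real pipeline would typically use a library implementation and a tuned value of k.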


2021 ◽  
Vol 25 (6) ◽  
pp. 1453-1471
Author(s):  
Chunhua Tang ◽  
Han Wang ◽  
Zhiwen Wang ◽  
Xiangkun Zeng ◽  
Huaran Yan ◽  
...  

Most density-based clustering algorithms suffer from difficult parameter setting, high time complexity, poor noise recognition, and weak clustering of datasets with uneven density. To solve these problems, this paper proposes the FOP-OPTICS algorithm (Finding of the Ordering Peaks Based on OPTICS), a substantial improvement of OPTICS (Ordering Points To Identify the Clustering Structure). The proposed algorithm finds the demarcation point (DP) in the Augmented Cluster-Ordering generated by OPTICS and uses the reachability-distance of the DP as the neighborhood radius eps of its corresponding cluster. It thereby overcomes the weakness of most algorithms in clustering datasets with uneven densities. By computing the distance to the k-nearest neighbor of each point, it reduces the time complexity of OPTICS; by calculating density-mutation points within the clusters, it can efficiently recognize noise. The experimental results show that FOP-OPTICS has the lowest time complexity and outperforms other algorithms in parameter setting and noise recognition.
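The quantity this abstract relies on, the distance from each point to its k-th nearest neighbor, can be sketched as follows. This is only a brute-force illustration of that quantity, not the FOP-OPTICS algorithm itself; the function name `k_distance` and the toy points are assumptions.

```python
import math

def k_distance(points, k):
    """Return, for each point, the distance to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[k - 1])
    return result

# Four points on a unit square plus one far-away point (toy data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
kd = k_distance(points, k=2)
print(kd)
```

In this toy set the isolated point has a much larger k-distance than the four clustered points, which is why k-distance is a common signal for sparse regions and noise. The brute-force loop is quadratic in the number of points; a spatial index would normally be used for larger datasets.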


2018 ◽  
Vol 74 ◽  
pp. 1-14
Author(s):  
Yikun Qin ◽  
Zhu Liang Yu ◽  
Chang-Dong Wang ◽  
Zhenghui Gu ◽  
Yuanqing Li

2005 ◽  
Vol 02 (02) ◽  
pp. 167-180
Author(s):  
SEUNG-JOON OH ◽  
JAE-YEARN KIM

Clustering of sequences is relatively less explored but it is becoming increasingly important in data mining applications such as web usage mining and bioinformatics. The web user segmentation problem uses web access log files to partition a set of users into clusters such that users within one cluster are more similar to one another than to the users in other clusters. Similarly, grouping protein sequences that share a similar structure can help to identify sequences with similar functions. However, few clustering algorithms consider sequentiality. In this paper, we study how to cluster sequence datasets. Due to the high computational complexity of hierarchical clustering algorithms for clustering large datasets, a new clustering method is required. Therefore, we propose a new scalable clustering method using sampling and a k-nearest-neighbor method. Using a splice dataset and a synthetic dataset, we show that the quality of clusters generated by our proposed approach is better than that of clusters produced by traditional algorithms.


2020 ◽  
Vol 12 (22) ◽  
pp. 3745
Author(s):  
Claude Cariou ◽  
Steven Le Moan ◽  
Kacem Chehdi

We investigated nearest-neighbor density-based clustering for hyperspectral image analysis. Four existing techniques were considered that rely on a K-nearest neighbor (KNN) graph to estimate local density and to propagate labels through algorithm-specific labeling decisions. We first improved two of these techniques, a KNN variant of the density peaks clustering method dpc, and a weighted-mode variant of knnclust, so the four methods use the same input KNN graph and only differ by their labeling rules. We propose two regularization schemes for hyperspectral image analysis: (i) a graph regularization based on mutual nearest neighbors (MNN) prior to clustering to improve cluster discovery in high dimensions; (ii) a spatial regularization to account for correlation between neighboring pixels. We demonstrate the relevance of the proposed methods on synthetic data and hyperspectral images, and show they achieve superior overall performances in most cases, outperforming the state-of-the-art methods by up to 20% in kappa index on real hyperspectral images.
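As a rough illustration of the ingredients named in this abstract, the sketch below builds a KNN graph, derives a simple local density from it, and applies a mutual nearest neighbor (MNN) filter. The density formula (inverse of the mean distance to the K neighbors), the function names, and the toy points are assumptions chosen for illustration; the paper's actual estimators and labeling rules are not reproduced here.

```python
import math

def knn_graph(points, k):
    """Map each point index to the list of its k nearest neighbor indices."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = [j for _, j in others[:k]]
    return graph

def mutual_knn_graph(graph):
    """Keep edge i -> j only if j also lists i among its neighbors."""
    return {i: [j for j in nbrs if i in graph[j]] for i, nbrs in graph.items()}

def local_density(points, graph):
    """Illustrative density: inverse of the mean distance to the K neighbors."""
    return {
        i: len(nbrs) / sum(math.dist(points[i], points[j]) for j in nbrs)
        for i, nbrs in graph.items()
    }

# Three nearby points and one distant point (toy data, no duplicates).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
knn = knn_graph(points, k=2)
mnn = mutual_knn_graph(knn)
density = local_density(points, knn)
print(mnn, density)
```

In this example the distant point keeps no edges after the MNN filter, because none of its neighbors list it in return, and it also receives the lowest density. That is the intuition behind using mutual neighbors to regularize the graph before clustering.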


2021 ◽  
Vol 11 (19) ◽  
pp. 9124
Author(s):  
Hongzhe Jiang ◽  
Liancheng Ye ◽  
Xingpeng Li ◽  
Minghong Shi

Chinese walnuts have extraordinary nutritional and organoleptic qualities, and counterfeit Chinese walnut products are pervasive in the market. The aim of this study was to investigate the feasibility of the hyperspectral imaging (HSI) technique for accurately identifying and visualizing Chinese walnut varieties. Hyperspectral images of 400 Chinese walnuts, including 200 samples of the Ningguo variety and 200 samples of the Lin’an variety, were acquired in the range of 400–1000 nm. Spectra were extracted from representative regions of interest (ROIs), and principal component analysis (PCA) of the spectra showed that the second principal component (PC2) was potentially effective in variety identification. The PC transformation was also applied to the hyperspectral images to produce an exploratory visualization based on pixel-wise PC scores. Three modeling methods, partial least squares-discriminant analysis (PLS-DA), k-nearest neighbor (KNN), and support vector machine (SVM), were individually employed to develop classification models. Results indicated that the PLS-DA model built on raw full spectra performed best, with correct classification rates (CCRs) of 97.33%, 95.33%, and 92.00% in the calibration, cross-validation, and prediction sets, respectively. The successive projections algorithm (SPA), competitive adaptive reweighted sampling (CARS), and PC loadings were individually used to select effective wavelengths. Subsequently, the simplified PLS-DA model based on wavelengths selected by CARS yielded the best CCRs of 96.33%, 95.67%, and 91.00% in the three sets. This optimal CARS-PLS-DA model achieved a sensitivity of 93.62%, a specificity of 88.68%, an area under the receiver operating characteristic curve (AUC) of 0.91, and a Kappa coefficient of 0.82 in the prediction set. Classification maps were finally generated by classifying the variety of each pixel in multispectral images at the CARS-selected wavelengths, and the general variety was then readily discernible. These results demonstrated that features extracted from HSI had outstanding discriminative ability and could serve as a reliable tool for the further development of an on-line identification system for Chinese walnut varieties.
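The evaluation metrics quoted in this abstract (sensitivity, specificity, and the Kappa coefficient) can all be derived from a two-class confusion matrix. The sketch below uses the standard definitions; the counts are made up for illustration and are not the study's results, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    observed = (tp + tn) / n       # observed agreement
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa

# Hypothetical counts: 50 positive and 50 negative samples.
sens, spec, kappa = binary_metrics(tp=40, fn=10, fp=5, tn=45)
print(sens, spec, kappa)
```

Kappa is lower than the raw accuracy here (0.85) because it discounts the agreement that a classifier would reach by chance given the class proportions.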


Author(s):  
M. Jeyanthi ◽  
C. Velayutham

Brain-computer interfaces (BCI) play a vital role in research and in science and technology development. Classification is a data mining technique used to predict group membership for data instances. Analyses of BCI data are challenging because feature extraction and classification of these data are more difficult than those applied to raw data. In this paper, we extract statistical Haralick features from the raw EEG data. The features are then normalized, and binning is used to improve the accuracy of the predictive models by reducing noise and eliminating some irrelevant attributes. Classification is then performed on a BCI dataset using several techniques: Naïve Bayes, the k-nearest neighbor classifier, and the SVM classifier. Finally, we propose the SVM classification algorithm for the BCI dataset.
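The normalization and binning steps described in this abstract could look roughly like the following. The abstract does not say which schemes were used, so min-max normalization and equal-width binning are assumptions here, as are the function names and the sample values.

```python
def min_max_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bin(values, n_bins):
    """Map values in [0, 1] to integer bin indices 0 .. n_bins - 1."""
    # min() keeps the value 1.0 inside the last bin.
    return [min(int(v * n_bins), n_bins - 1) for v in values]

# Hypothetical values of one extracted feature.
feature = [2.0, 4.0, 6.0, 10.0]
normalized = min_max_normalize(feature)
binned = equal_width_bin(normalized, n_bins=4)
print(normalized, binned)
```

Replacing each value with its bin index smooths small fluctuations within a bin, which is one way binning can reduce noise before the features reach a classifier.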

