Navigation Decision Support: Discover of Vessel Traffic Anomaly According to the Historic Marine Data

In recent years, marine traffic has increased dramatically. Marine traffic safety depends heavily on the mariner’s decisions and on the particular situation. The watch officer must continuously observe marine traffic for anomalies, because anomaly detection is crucial for predicting dangerous situations and making timely decisions for safe navigation. In this paper, we present marine traffic anomaly detection that combines the DBSCAN clustering algorithm (Density-Based Spatial Clustering of Applications with Noise) with k-nearest neighbors analysis among the clusters and particular vessels. The clustering algorithm is applied to historic marine traffic data, a set of vessel turn points. In our experiments, the total number of turn points was about 3 million, and about 160 megabytes of computer storage was used. A formal numerical criterion for comparing an anomaly with the normal traffic flow case is proposed. It makes it possible to detect vessels outside the typical traffic pattern. The proposed method ensures the right decisions when detecting anomalous vessel situations at different oceanic scales or under different hydrometeorological conditions.
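As a rough illustration of the idea in this abstract, the sketch below clusters a handful of made-up turn points with a plain O(n^2) DBSCAN and then scores a vessel position by its mean distance to the k nearest clustered points. The function names, the toy coordinates, and the scoring rule are assumptions for illustration; the paper's actual numerical criterion is not given in the abstract.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN. Returns one label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

def knn_score(position, points, labels, k):
    """Mean distance from position to its k nearest clustered (non-noise) points."""
    dists = sorted(math.dist(position, p)
                   for p, lab in zip(points, labels) if lab != -1)
    nearest = dists[:k]
    return sum(nearest) / len(nearest) if nearest else math.inf

# Hypothetical turn points: two dense groups and one stray point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
near = knn_score((0.5, 0.5), points, labels, k=3)  # inside a traffic cluster
far = knn_score((5, 5), points, labels, k=3)       # away from both clusters
```

A position whose score is much larger than the scores typical of clustered points would be a candidate anomaly; where to put that threshold is a tuning decision this sketch does not settle.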

Research on Anomaly Detection Method Based on DBSCAN Clustering Algorithm

2020 5th International Conference on Information Science, Computer Technology and Transportation (ISCTT) ◽

10.1109/isctt51595.2020.00083 ◽

2020 ◽

Author(s):

Dingsheng Deng

Keyword(s):

Anomaly Detection ◽

Clustering Algorithm ◽

Detection Method ◽

Dbscan Clustering

DBSCANI: Noise-Resistant Method for Missing Value Imputation

Journal of Intelligent Systems ◽

10.1515/jisys-2014-0172 ◽

2016 ◽

Vol 25 (3) ◽

pp. 431-440 ◽

Cited By ~ 1

Author(s):

Archana Purwar ◽

Sandeep Kumar Singh

Keyword(s):

Spatial Data ◽

Missing Values ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Data Sets ◽

Quality Of Data ◽

Data Set ◽

Dbscan Clustering ◽

Density Based Clustering

Abstract: Ensuring the quality of data is an important task in data mining. The validity of mining algorithms is reduced if the data is not of good quality. The quality of data can be assessed in terms of missing values (MV) as well as noise present in the data set. Various imputation techniques have been studied in MV research, but little attention has been given to noise in earlier work. Moreover, to the best of our knowledge, no one has used density-based spatial clustering of applications with noise (DBSCAN) for MV imputation. This paper proposes a novel technique, density-based imputation (DBSCANI), built on density-based clustering to deal with incomplete values in the presence of noise. The density-based clustering algorithm proposed by Kriegel groups objects according to their density in spatial databases. The high-density regions are known as clusters, and the low-density regions refer to the noise objects in the data set. Extensive experiments have been performed on the Iris data set from the life science domain and Jain’s (2D) data set from the shape data sets. The performance of the proposed method is evaluated using root mean square error (RMSE) and compared with existing K-means imputation (KMI). Results show that our method is more noise resistant than KMI on the data sets under study.
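One way to picture cluster-based imputation that ignores noise is sketched below. It assumes the complete records have already been labeled by a density-based clustering step (label -1 for noise) and fills a missing attribute with the mean of that attribute in the cluster of the nearest non-noise record. This rule, the helper names `impute` and `rmse`, and the toy data are illustrative assumptions, not the DBSCANI procedure itself, which the abstract does not spell out.

```python
import math

def impute(record, complete, labels):
    """Fill None entries of record using attribute means of the nearest non-noise cluster."""
    observed = [d for d, v in enumerate(record) if v is not None]

    def partial_dist(row):
        # Distance computed on the observed attributes only.
        return math.sqrt(sum((record[d] - row[d]) ** 2 for d in observed))

    # The nearest complete, non-noise record decides which cluster to use.
    candidates = [(partial_dist(row), lab)
                  for row, lab in zip(complete, labels) if lab != -1]
    _, cluster = min(candidates)
    members = [row for row, lab in zip(complete, labels) if lab == cluster]
    return [v if v is not None else sum(m[d] for m in members) / len(members)
            for d, v in enumerate(record)]

def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical complete records with precomputed cluster labels (-1 = noise).
complete = [(1.0, 2.0), (1.2, 2.2), (5.0, 9.0), (5.2, 9.4), (3.0, 50.0)]
labels = [0, 0, 1, 1, -1]

filled = impute([5.1, None], complete, labels)  # second attribute is missing
error = rmse([filled[1]], [9.3])                # 9.3 is a made-up ground truth
```

Because the noisy record `(3.0, 50.0)` carries label -1, it never contributes to the imputed value, which is the property the abstract emphasizes.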

Multivariate weather anomaly detection using DBSCAN clustering algorithm

Journal of Physics Conference Series ◽

10.1088/1742-6596/1869/1/012077 ◽

2021 ◽

Vol 1869 (1) ◽

pp. 012077

Author(s):

S Wibisono ◽

M T Anwar ◽

A Supriyanto ◽

I H A Amin

Keyword(s):

Anomaly Detection ◽

Clustering Algorithm ◽

Dbscan Clustering

Disambiguating USPTO inventor names with semantic fingerprinting and DBSCAN clustering

The Electronic Library ◽

10.1108/el-12-2018-0232 ◽

2019 ◽

Vol 37 (2) ◽

pp. 225-239 ◽

Cited By ~ 1

Author(s):

Hongqi Han ◽

Yongsheng Yu ◽

Lijun Wang ◽

Xiaorui Zhai ◽

Yaxin Ran ◽

...

Keyword(s):

Clustering Algorithm ◽

Spatial Clustering ◽

Clustering Algorithms ◽

The United States ◽

Comparison Function ◽

Binary Number ◽

Content Type ◽

Dbscan Clustering ◽

String Comparison ◽

And Storage

Purpose: The aim of this study is to present a novel approach based on semantic fingerprinting and a clustering algorithm called density-based spatial clustering of applications with noise (DBSCAN), which can be used to convert inventor records into 128-bit semantic fingerprints. Inventor disambiguation is a method used to discover a unique set of underlying inventors and map a set of patents to their corresponding inventors. Resolving the ambiguities between inventors is necessary to improve the quality of the patent database and to ensure accurate entity-level analysis. Most existing methods are based on machine learning and, while they often show good performance, this comes at the cost of time, computational power and storage space.

Design/methodology/approach: Using DBSCAN, the meta and textual data in inventor records are converted into 128-bit semantic fingerprints. However, rather than using a string comparison or cosine similarity to calculate the distance between pair-wise fingerprint records, a binary number comparison function was used in DBSCAN. DBSCAN then clusters the inventor records based on this distance to disambiguate inventor names.

Findings: Experiments conducted on the PatentsView campaign database of the United States Patent and Trademark Office show that this method disambiguates inventor names with recall greater than 99 per cent in less time and with a substantially smaller storage requirement.

Research limitations/implications: A better semantic fingerprint algorithm and a better distance function may improve precision. Setting different clustering parameters for each block, or using other clustering algorithms, will be considered to improve the accuracy of the disambiguation results even further.

Originality/value: Compared with the existing methods, the proposed method does not rely on feature selection and complex feature comparison computation. Most importantly, running time and storage requirements are drastically reduced.
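The abstract does not say which binary comparison function is used. A natural candidate, assumed here purely for illustration, is the Hamming distance between two fingerprints stored as integers, which could serve as the distance in a DBSCAN neighborhood query. The names `hamming` and `neighbors` and the sample fingerprints are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints stored as Python ints."""
    return bin(a ^ b).count("1")

def neighbors(fingerprints, i, eps_bits):
    """Indices of fingerprints within eps_bits differing bits of fingerprints[i]."""
    return [j for j, fp in enumerate(fingerprints)
            if hamming(fingerprints[i], fp) <= eps_bits]

# Hypothetical 128-bit fingerprints.
fp_a = (1 << 128) - 1      # all 128 bits set
fp_b = fp_a ^ 0b111        # differs from fp_a in the 3 lowest bits
fp_c = 0                   # differs from fp_a in all 128 bits

close = neighbors([fp_a, fp_b, fp_c], 0, eps_bits=5)
```

Comparing integers with XOR avoids string handling entirely, which is consistent with the running-time and storage argument made in the abstract.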

Dimensional Reduction of Data for Anomaly Detection and Speed Performance using PCA and DBSCAN

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1041.1291s219 ◽

2019 ◽

Vol 9 (1S2) ◽

pp. 39-41

Keyword(s):

Anomaly Detection ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Test Results ◽

High Complexity ◽

Medical Sciences ◽

Data Set ◽

Network Intrusion ◽

Speed Performance ◽

Data Points

Anomaly detection is a major problem faced by many industries, including network intrusion detection and the medical sciences. Fields such as astronomy and research also face difficulties in finding effective anomaly detection, and several techniques have been applied to such problems. Clustering is a technique that many researchers have employed. The most commonly used clustering algorithm is DBSCAN, a well-known algorithm in data mining and machine learning whose name stands for density-based spatial clustering of applications with noise. Because of its high computational complexity, the dimensionality of the data points must be reduced. PCA is used to reduce the dimensionality and produce a new data set, which is then passed to DBSCAN. The test results were precise, so the methodology can be adopted. The combination of PCA and DBSCAN was verified, and the resulting analysis shows a speedup of 25% with quality at 80% when the dimensionality of the data set is reduced by half.

Massively scalable density based clustering (DBSCAN) on the HPCC systems big data platform

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i1.pp207-214 ◽

2021 ◽

Vol 10 (1) ◽

pp. 207

Author(s):

Yatish H. R. ◽

Shubham Milind Phal ◽

Tanmay Sanjay Hukkeri ◽

Lili Xu ◽

Shobha G ◽

...

Keyword(s):

Clustering Algorithm ◽

Spatial Clustering ◽

Computation Time ◽

Large Data ◽

Single Node ◽

Data Set ◽

Traffic Pattern ◽

Density Based Clustering ◽

Data Points ◽

Hpcc Systems

Dealing with large samples of unlabeled data is a key challenge in today’s world, especially in applications such as traffic pattern analysis and disaster management. DBSCAN, or density-based spatial clustering of applications with noise, is a well-known density-based clustering algorithm. Its key strengths lie in its capability to detect outliers and handle arbitrarily shaped clusters. However, the algorithm, being fundamentally sequential in nature, proves expensive and time consuming when operated on extensively large data chunks. This paper thus presents a novel implementation of a parallel and distributed DBSCAN algorithm on the HPCC Systems platform. The approach seeks to fully parallelize the implementation by making use of the optimal distributed architecture of HPCC Systems and performing a tree-based union to merge local clusters. The proposed approach was tested on both synthetic and standard datasets (MFCCs Data Set) and found to be completely accurate. Additionally, when compared against a single-node setup, a significant decrease in computation time was observed with no impact on accuracy. The parallelized algorithm performed eight times better for a higher number of data points and takes exponentially less time as the number of data points increases.
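The merge step can be pictured with a disjoint-set forest, which is one common reading of a "tree-based union"; the abstract does not confirm the exact data structure. In the sketch below, each node reports a local cluster id for every point it saw, and whenever a point is claimed by two local clusters those clusters are joined. As a simplification, every shared point is assumed to be a core point (a real implementation would not merge through border points). All names and data are hypothetical.

```python
class DisjointSet:
    """Union-find over hashable keys, with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:        # compress the path just walked
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Deterministic choice: the smaller key becomes the root.
            self.parent[max(ra, rb)] = min(ra, rb)

def merge_local_clusters(local_labels):
    """Map each point id to a global cluster key, given per-node local labels."""
    ds = DisjointSet()
    seen = {}  # point id -> first (node, local cluster id) that claimed it
    for node, labels in local_labels.items():
        for point, cid in labels.items():
            key = (node, cid)
            ds.find(key)                     # register the local cluster
            if point in seen:
                ds.union(seen[point], key)   # same point, two local clusters
            else:
                seen[point] = key
    return {point: ds.find(key) for point, key in seen.items()}

# Point 3 lies in the overlap of the two partitions.
local = {
    "node0": {1: 0, 2: 0, 3: 1},
    "node1": {3: 0, 4: 0, 5: 1},
}
merged = merge_local_clusters(local)
```

Here points 3 and 4 end up in the same global cluster because point 3 links local cluster 1 on `node0` with local cluster 0 on `node1`.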

GNN-DBSCAN: A new density-based algorithm using grid and the nearest neighbor

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211922 ◽

2021 ◽

pp. 1-13

Author(s):

Li Yihong ◽

Wang Yunpeng ◽

Li Tao ◽

Lan Xiaolong ◽

Song Han

Keyword(s):

Mutual Information ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Spatial Clustering ◽

Clustering Algorithms ◽

Adjusted Rand Index ◽

K Nearest Neighbors ◽

Normalized Mutual Information ◽

Core Samples ◽

Real World Datasets

DBSCAN (density-based spatial clustering of applications with noise) is one of the most widely used density-based clustering algorithms, which can find clusters of arbitrary shape, determine the number of clusters, and identify noise samples automatically. However, the performance of DBSCAN is significantly limited because it is quite sensitive to the parameters eps and MinPts. Eps is the radius of the eps-neighborhood, and MinPts is the minimum number of points required within it. Additionally, a dataset with large variations in density will probably trap DBSCAN because its parameters are fixed. To overcome these limitations, we propose a new density-based clustering algorithm called GNN-DBSCAN, which uses an adaptive Grid to divide the dataset and defines local core samples by using the Nearest Neighbor. With the help of the grid, the dataset space is divided into a finite number of cells. After that, the nearest neighbors lying in every filled cell and the adjacent filled cells are defined as the local core samples. Then, GNN-DBSCAN obtains global core samples by enhancing and screening the local core samples. In this way, our algorithm can identify higher-quality core samples than DBSCAN. Lastly, given these global core samples, it uses a dynamic radius based on k-nearest neighbors to cluster the dataset. The dynamic radius can overcome the problems of DBSCAN caused by its fixed parameter eps. Therefore, our method can perform better on datasets with large variations in density. Experiments on synthetic and real-world datasets were conducted. The results indicate that the average Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI) and V-measure of our proposed algorithm exceed those of the existing algorithms DBSCAN, DPC, ADBSCAN, and HDBSCAN.

A Quantification Method for Supraharmonic Emissions Based on Outlier Detection Algorithms

Energies ◽

10.3390/en14196404 ◽

2021 ◽

Vol 14 (19) ◽

pp. 6404

Author(s):

Hui Zhou ◽

Zesen Gui ◽

Jiang Zhang ◽

Qun Zhou ◽

Xueshan Liu ◽

...

Keyword(s):

Outlier Detection ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Skewed Distribution ◽

Distribution Model ◽

Distribution Data ◽

Original Spectrum ◽

Detection Algorithms ◽

Quantification Method ◽

Dbscan Clustering

Based on outlier detection algorithms, a feasible quantification method for supraharmonic emission signals is presented. It is designed to tackle the requirements of high-resolution and low data volume simultaneously in the frequency domain. The proposed method was developed from the skewed distribution data model and the self-tuning parameters of density-based spatial clustering of applications with noise (DBSCAN) algorithm. Specifically, the data distribution of the supraharmonic band was analyzed first by the Jarque–Bera test. The threshold was determined based on the distribution model to filter out noise. Subsequently, the DBSCAN clustering algorithm parameters were adjusted automatically, according to the k-dist curve slope variation and the dichotomy parameter seeking algorithm, followed by the clustering. The supraharmonic emission points were analyzed as outliers. Finally, simulated and experimental data were applied to verify the effectiveness of the proposed method. On the basis of the detection results, a spectrum with the same resolution as the original spectrum was obtained. The amount of data declined by more than three orders of magnitude compared to the original spectrum. The presented method will benefit the analysis of quantification for the amplitude and frequency of supraharmonic emissions.
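The k-dist curve mentioned in this abstract can be computed directly: sort every point's distance to its k-th nearest neighbor and look for the place where the curve bends sharply. The sketch below picks eps just before the largest jump in the sorted curve, which is a deliberately simple knee heuristic assumed for illustration; the paper's slope-variation rule and dichotomy search are not reproduced. Function names and data are hypothetical.

```python
import math

def k_dist_curve(points, k):
    """Ascending list of each point's distance to its k-th nearest neighbor."""
    curve = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        curve.append(dists[k - 1])
    return sorted(curve)

def eps_from_knee(curve):
    """Value just before the largest jump in the sorted k-dist curve."""
    jumps = [curve[i + 1] - curve[i] for i in range(len(curve) - 1)]
    knee = max(range(len(jumps)), key=jumps.__getitem__)
    return curve[knee]

# Four tightly packed points and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
curve = k_dist_curve(points, k=2)
eps = eps_from_knee(curve)
```

Points whose k-dist lies above the chosen eps, such as `(8, 8)` here, are the ones a subsequent DBSCAN run would tend to leave unclustered.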

A Modified DBSCAN Algorithm for Anomaly Detection in Time-series Data with Seasonality

The International Arab Journal of Information Technology ◽

10.34028/iajit/19/1/3 ◽

2022 ◽

Vol 19 (1) ◽

Author(s):

Praphula Jain ◽

Mani Shankar Bajpai ◽

Rajendra Pamula

Keyword(s):

Time Series ◽

Anomaly Detection ◽

Clustering Algorithm ◽

Time Series Data ◽

Spatial Clustering ◽

Series Data ◽

Practical Applications ◽

Dbscan Algorithm ◽

Local Anomalies ◽

Seasonal Data

Anomaly detection concerns identifying anomalous observations or patterns that are a deviation from the dataset's expected behaviour. The detection of anomalies has significant and practical applications in several industrial domains such as public health, finance, Information Technology (IT), security, medical, energy, and climate studies. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) Algorithm is a density-based clustering algorithm with the capability of identifying anomalous data. In this paper, a modified DBSCAN algorithm is proposed for anomaly detection in time-series data with seasonality. For experimental evaluation, a monthly temperature dataset was employed and the analysis set forth the advantages of the modified DBSCAN over the standard DBSCAN algorithm for the seasonal datasets. From the result analysis, we may conclude that DBSCAN is used for finding the anomalies in a dataset but fails to find local anomalies in seasonal data. The proposed Modified DBSCAN approach helps to find both the global and local anomalies from the seasonal data. Using normal DBSCAN we are able to get 19 (2.16%) anomaly points. While using the modified approach for DBSCAN we are able to get 42 (4.79%) anomaly points. In comparison we can say that we are able to get 2.11% more anomalies using the modified DBSCAN approach. Hence, the proposed Modified DBSCAN algorithm outperforms in comparison with the DBSCAN algorithm to find local anomalies.

Clustering by Detecting Density Peaks and Assigning Points by Similarity-First Search Based on Weighted K-Nearest Neighbors Graph

Complexity ◽

10.1155/2020/1731075 ◽

2020 ◽

Vol 2020 ◽

pp. 1-17

Author(s):

Qi Diao ◽

Yaping Dai ◽

Qichao An ◽

Weixing Li ◽

Xiaoxue Feng ◽

...

Keyword(s):

Clustering Algorithm ◽

Spatial Clustering ◽

Local Density ◽

Search Algorithm ◽

Real Data ◽

Nearest Neighbors ◽

Adjusted Rand Index ◽

Clustering Methods ◽

K Nearest Neighbors ◽

Density Peaks

This paper presents an improved clustering algorithm for categorizing data with arbitrary shapes. Most of the conventional clustering approaches work only with round-shaped clusters. Arbitrary shapes can be handled by clustering by fast search and find of density peaks (DPC), but in some cases this method is limited by its density peaks and allocation strategy. To overcome these limitations, two improvements are proposed in this paper. To describe the clustering center more comprehensively, the definitions of local density and relative distance are fused with multiple distances, including K-nearest neighbors (KNN) and shared-nearest neighbors (SNN). A similarity-first search algorithm is designed to search for the best-matching cluster centers for noncenter points in a weighted KNN graph. Extensive comparison with several existing methods, e.g., the traditional DPC algorithm, density-based spatial clustering of applications with noise (DBSCAN), affinity propagation (AP), FKNN-DPC, and K-means, has been carried out. Experiments based on synthetic data and real data show that the proposed clustering algorithm can outperform DPC, DBSCAN, AP, and K-means in terms of the clustering accuracy (ACC), the adjusted mutual information (AMI), and the adjusted Rand index (ARI).
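For orientation, the two quantities that baseline DPC computes for every point, local density and relative distance, can be sketched as follows. This is the traditional cutoff-based formulation, not the KNN/SNN-fused definitions proposed in the paper; the cutoff `d_c`, the function name, and the sample points are assumptions.

```python
import math

def density_peaks(points, d_c):
    """Return (rho, delta): local density and relative distance per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho[i]: number of other points closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        # delta[i]: distance to the nearest point of strictly higher density;
        # points with no denser point get their largest distance instead.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Two small groups, each with one central point (indices 3 and 7).
points = [(0, 0), (0, 1), (1, 0), (0.4, 0.4),
          (10, 10), (10, 11), (11, 10), (10.4, 10.4)]
rho, delta = density_peaks(points, d_c=0.8)

# Cluster centers combine high density with large relative distance.
gamma = [r * d for r, d in zip(rho, delta)]
centers = sorted(sorted(range(len(points)), key=gamma.__getitem__)[-2:])
```

Ranking by the product of density and relative distance is the usual way to pick centers in baseline DPC; the remaining points are then assigned to clusters, which is the step the paper replaces with its similarity-first search.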
