Highway Event Detection Algorithm Based on Improved Fast Peak Clustering

To mine traffic events from large volumes of highway data, this paper proposes an improved fast peak clustering algorithm for processing highway toll data. The highway toll data are first analyzed, and a data cleaning method based on the sum of similarity coefficients is proposed to process the raw data. Next, an improved fast peak clustering algorithm is proposed to overcome the excessive subjectivity of the original algorithm. Finally, the improved algorithm is applied to highway traffic condition analysis and abnormal event mining, yielding more accurate and intuitive clustering results. The proposed algorithm is compared with two classical algorithms, k-means and density-based spatial clustering of applications with noise (DBSCAN), and with the original (unimproved) fast peak clustering algorithm. It is faster and more accurate than all three and reveals the complex relationships within massive data more efficiently. During the reform of the toll system, the algorithm can automatically and efficiently analyze massive toll data and detect abnormal events, providing a theoretical basis and data support for highway operation monitoring and maintenance.
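
For orientation, the following is a minimal sketch of the original fast peak (density peak) clustering idea that the paper improves on. It is not the authors' improved algorithm. Each point gets two quantities. The local density rho counts the neighbours inside a cutoff distance dc. The value delta is the distance to the nearest point of higher density. Points with a large product rho * delta become cluster centres. Choosing the number of centres by hand is the subjective step the abstract refers to, so here it is simply assumed to be given as n_clusters. The function and parameter names are hypothetical.

```python
import math

def density_peaks(points, dc, n_clusters):
    """Sketch of density peak clustering with a cut-off kernel (O(n^2) distances)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff distance dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Process points from highest to lowest density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the densest point
            continue
        # delta: distance to the closest point that precedes i in the density ordering.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j
    # Centres: largest gamma = rho * delta; the densest point is always a centre.
    gamma = [rho[i] * delta[i] for i in range(n)]
    ranked = [i for i in sorted(range(n), key=lambda i: -gamma[i]) if i != order[0]]
    centres = [order[0]] + ranked[: n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c
    # Assign every other point, in decreasing density, to its nearest higher-density neighbour's cluster.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, rho, delta
```

On two well-separated groups of five points with dc = 1.5, each group forms its own cluster.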

Toward Global Earthquake Early Warning with the MyShake Smartphone Seismic Network, Part 1: Simulation Platform and Detection Algorithm

Seismological Research Letters ◽

10.1785/0220190177 ◽

2020 ◽

Vol 91 (4) ◽

pp. 2206-2217

Author(s):

Qingkai Kong ◽

Robert Martin-Short ◽

Richard M. Allen

Keyword(s):

Early Warning ◽

Large Scale ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Detection Algorithm ◽

Seismic Network ◽

Earthquake Early Warning ◽

Simulation Platform ◽

Simulation Performance ◽

The World

Abstract The MyShake project aims to build a global smartphone seismic network that leverages crowdsourcing to facilitate large-scale earthquake early warning and other applications. The MyShake mobile application first detects earthquake shaking on a single phone. The earthquake is then confirmed on the MyShake servers by a “network detection” algorithm that is activated by multiple single-phone detections. In this first part of a two-part series, we present a simulation platform and a network detection algorithm for testing earthquake scenarios at various locations around the world. The proposed network detection algorithm builds on the classic density-based spatial clustering of applications with noise (DBSCAN) algorithm, modified to account for temporal characteristics and to associate new triggers. We test the network detection algorithm on real data recorded by MyShake users during the 4 January 2018 M 4.4 Berkeley and 10 June 2016 M 5.2 Borrego Springs earthquakes to demonstrate the system’s utility. We want to test the entire detection procedure and to understand the first-order performance of MyShake at locations worldwide with different population and tectonic characteristics. To do so, we then present a software platform that simulates earthquake triggers in hypothetical MyShake networks. Part two of this series explores MyShake early warning simulation performance in selected regions around the world.
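
The simplest way to give DBSCAN a temporal dimension is to require neighbouring triggers to be close in both space and time. The following is a minimal sketch of that idea, not MyShake's actual network detection algorithm. It assumes that trigger locations have already been projected to a local kilometre grid. The names eps_km, window_s and min_pts are hypothetical, and so are their values. The association of later triggers with an existing event is not shown.

```python
import math

def st_dbscan(triggers, eps_km, window_s, min_pts):
    """triggers: list of (x_km, y_km, t_s). Returns one label per trigger; -1 means noise."""
    n = len(triggers)

    def neighbours(i):
        # Neighbourhood = within eps_km in space AND within window_s in time (includes i itself).
        xi, yi, ti = triggers[i]
        return [j for j, (xj, yj, tj) in enumerate(triggers)
                if math.hypot(xi - xj, yi - yj) <= eps_km and abs(ti - tj) <= window_s]

    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may later be absorbed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # only core triggers expand the cluster
                queue.extend(j_nbrs)
    return labels
```

In the sketch, a trigger at the same place but 100 s later is not merged into the event, because it falls outside the time window.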

Anomaly Detection Algorithm Based on CFSFDP

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2020.p0453 ◽

2020 ◽

Vol 24 (4) ◽

pp. 453-460

Author(s):

Weiwu Ren ◽

Jianfei Zhang ◽

Xiaoqiang Di ◽

Yinan Lu ◽

Bochen Zhang ◽

...

Keyword(s):

Anomaly Detection ◽

Real Time ◽

Data Storage ◽

Clustering Algorithm ◽

Detection Algorithm ◽

Original Algorithm ◽

Redundant Data ◽

Data Points ◽

Behavior Profiles ◽

Improved Algorithm

Clustering by fast search and find of density peaks (CFSFDP) is a simple and crisp density clustering algorithm, but it is not suitable for direct application to anomaly detection. Its clustering results contain a large amount of redundant density information, so using them directly as behavior profiles makes the computation and storage costs of anomaly detection high. Therefore, an improved CFSFDP-based algorithm is proposed for anomaly detection. The improved algorithm represents behavior profiles with a small number of data points and their radii, and deletes redundant data points that do not support any profile. This reduces the data storage and distance calculations needed to generate profiles, and it shrinks the profile search space during detection. Numerous experiments show that the improved algorithm generates profiles faster than density-based spatial clustering of applications with noise (DBSCAN). They also show better profile precision than adaptive real-time anomaly detection with incremental clustering (ADWICE). The improved algorithm inherits CFSFDP's ability to find clusters of arbitrary shape while improving storage and computation performance. Compared with DBSCAN and ADWICE, the improved CFSFDP-based anomaly detection algorithm offers a better balance between detection precision and real-time performance.
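
A profile can be stored as a few representative points, each with a radius. Detection then only has to check whether a new observation falls inside any stored ball. The following is a minimal sketch of that representation. The greedy covering step is a hypothetical stand-in used for illustration. It is not the authors' CFSFDP-based procedure. The radius value and function names are assumptions.

```python
import math

def build_profile(points, radius):
    """Greedy compaction (stand-in): keep a point only if no kept point already covers it."""
    kept = []
    for p in points:
        if not any(math.dist(p, c) <= radius for c in kept):
            kept.append(p)
    return [(c, radius) for c in kept]

def is_anomalous(x, profile):
    """An observation is anomalous if it lies outside every (centre, radius) ball."""
    return not any(math.dist(x, c) <= r for c, r in profile)
```

Only the kept centres are searched at detection time. This is where the reduction in storage and in distance calculations comes from.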

A DBSCAN based Algorithm for Ship Spot Area Detection in AIS Trajectory Data

MATEC Web of Conferences ◽

10.1051/matecconf/201929101008 ◽

2019 ◽

Vol 291 ◽

pp. 01008 ◽

Cited By ~ 1

Author(s):

Bao Lei

Keyword(s):

Clustering Algorithm ◽

Original Data ◽

Detection Algorithm ◽

Traffic Information ◽

Data Sets ◽

Trajectory Data ◽

Location Data ◽

Spot Area ◽

Maritime Traffic ◽

Sample Data

The big data acquired by the Automatic Identification System (AIS) contain abundant maritime traffic information. With the wide application of data mining in various fields in recent years, the mining of AIS data has drawn the attention of researchers. Based on ship AIS location data, this paper studies a spot area detection algorithm. First, sample data are preprocessed from the original data, and the residence points of each ship are identified from its speed and course changes. Then a DBSCAN-based clustering algorithm clusters latitude-longitude lattice cells into spot areas. Experiments on real AIS datasets show that the algorithm is efficient and correct.
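
The lattice step can be pictured as a grid-based analogue of DBSCAN. Residence points are snapped to latitude-longitude cells. Cells that hold at least min_count points act as "core" cells. Adjacent core cells are then joined into one spot area. The following is a minimal sketch under explicit assumptions. The paper also uses course change when it identifies residence points, but its rule is not given, so this sketch keeps only a hypothetical low-speed filter. The cell size and thresholds are placeholders.

```python
from collections import defaultdict

def spot_areas(records, cell_deg, speed_max_kn, min_count):
    """records: (lat, lon, speed_kn). Returns a list of spot areas, each a set of (row, col) cells."""
    counts = defaultdict(int)
    for lat, lon, speed in records:
        if speed <= speed_max_kn:  # hypothetical residence rule: low speed only
            counts[(int(lat // cell_deg), int(lon // cell_deg))] += 1
    dense = {cell for cell, k in counts.items() if k >= min_count}
    areas, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        # Flood-fill over 8-connected dense cells.
        area, stack = set(), [start]
        seen.add(start)
        while stack:
            r, c = stack.pop()
            area.add((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        areas.append(area)
    return areas
```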

Big data outlier detection model based on improved density peak algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189456 ◽

2020 ◽

pp. 1-10

Author(s):

Mengliang Shao ◽

Deyu Qi ◽

Huili Xue

Keyword(s):

Big Data ◽

Fault Detection ◽

Outlier Detection ◽

Clustering Algorithm ◽

Detection Algorithm ◽

Load Curve ◽

Density Peak ◽

Original Algorithm ◽

Detection Model ◽

Density Peak Clustering

Outlier detection is an important branch of data mining. This paper proposes an advanced fast density peak outlier detection algorithm suited to the characteristics of big data. The algorithm is an outlier detection method built on an improved density peak clustering algorithm. Although it draws on a clustering idea, it skips the clustering process itself, which reduces the time complexity relative to cluster-based outlier detection algorithms. It also absorbs the advantages of neighbor-based outlier detection, such as insensitivity to data dimensionality. In the power industry, outlier detection can be used for grid fault detection, equipment fault detection, and power abnormality detection. A simulation experiment on the daily load curves of single and multiple transformers in a certain province shows that the improved algorithm can effectively detect outliers in the data.
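
A density peak view of outliers uses the same two quantities as density peak clustering but never forms clusters. An outlier has low local density and lies far from any denser point. The following is a minimal sketch of that idea, not the paper's exact model. As a neighbour-based ingredient, it assumes a k-nearest-neighbour density: the inverse of the mean distance to the k nearest neighbours. The thresholds rho_max and delta_min are hypothetical.

```python
import math

def density_peak_outliers(points, k, rho_max, delta_min):
    """Flag points with low kNN density (rho) and a large distance to denser points (delta)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = []
    for i in range(n):
        knn = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(1.0 / (sum(knn) / len(knn) + 1e-12))  # epsilon guards duplicate points
    flags = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta = min(higher) if higher else max(dist[i])  # densest point: max distance by convention
        # Requiring low density as well keeps the densest point from being flagged by its large delta.
        flags.append(rho[i] < rho_max and delta > delta_min)
    return flags
```

No cluster assignment step is needed, which matches the abstract's point that the method skips the clustering process itself.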

An Enhanced Spectral Clustering Algorithm with S-Distance

Symmetry ◽

10.3390/sym13040596 ◽

2021 ◽

Vol 13 (4) ◽

pp. 596

Author(s):

Krishna Kumar Sharma ◽

Ayan Seal ◽

Enrique Herrera-Viedma ◽

Ondrej Krejcar

Keyword(s):

Spectral Clustering ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Clustering Algorithms ◽

Rank Test ◽

Customer Churn ◽

Signed Rank ◽

Signed Rank Test ◽

Spectral Clustering Algorithm ◽

Industrial Databases

Calculating and monitoring customer churn metrics is important for companies seeking to retain customers and earn more profit. In this study, a churn prediction framework is developed using modified spectral clustering (SC). The similarity measure plays a crucial role in clustering industrial data to predict churn accurately. The linear Euclidean distance in traditional SC is therefore replaced by the non-linear S-distance (Sd). The Sd is derived from the concept of S-divergence (SD), and several of its characteristics are discussed in this work. Experiments are conducted to validate the proposed clustering algorithm on four synthetic, eight UCI, and two industrial databases, as well as one telecommunications database related to customer churn. Three existing clustering algorithms (k-means, density-based spatial clustering of applications with noise, and conventional SC) are also implemented on these 15 databases. The empirical results show that the proposed clustering algorithm outperforms the three existing algorithms in terms of Jaccard index, F-score, recall, precision, and accuracy. Finally, we test the significance of the clustering results with the Wilcoxon signed-rank test, the Wilcoxon rank-sum test, and sign tests. The comparative study shows that the results of the proposed algorithm are interesting, especially for clusters of arbitrary shape.
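
The S-divergence between symmetric positive definite matrices X and Y is log det((X + Y)/2) - (1/2) log det(XY). Its square root is a metric. The following is a minimal sketch of how such a distance could replace the Euclidean distance in a spectral clustering affinity. It assumes that each data vector has strictly positive entries and is treated as a diagonal matrix. Under that assumption, the determinants reduce to sums of logarithms over the entries. The paper's exact construction and its choice of sigma may differ, so the function names and sigma are hypothetical.

```python
import math

def s_distance(x, y):
    """sqrt of S-divergence, assuming x and y are positive vectors viewed as diagonal matrices."""
    s = sum(math.log((a + b) / 2) - 0.5 * (math.log(a) + math.log(b)) for a, b in zip(x, y))
    return math.sqrt(max(s, 0.0))  # AM >= GM makes s >= 0; clamp tiny negative round-off

def affinity_matrix(data, sigma):
    """Gaussian affinity W_ij = exp(-Sd(x_i, x_j)^2 / (2 sigma^2)) as input to spectral clustering."""
    return [[math.exp(-s_distance(p, q) ** 2 / (2 * sigma ** 2)) for q in data] for p in data]
```

Given the resulting matrix W, a standard spectral clustering pipeline would then form a graph Laplacian and cluster its leading eigenvectors.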

Community Detection Based on Graph Representation Learning in Evolutionary Networks

Applied Sciences ◽

10.3390/app11104497 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4497

Author(s):

Dongming Chen ◽

Mingshuo Nie ◽

Jie Wang ◽

Yun Kong ◽

Dongqi Wang ◽

...

Keyword(s):

Community Detection ◽

Network Structure ◽

Clustering Algorithm ◽

Laplacian Matrix ◽

Representation Learning ◽

Detection Algorithm ◽

Graph Representation ◽

Time Slice ◽

Current Time ◽

Evolutionary Networks

To analyze temporal structures in evolutionary networks, we propose a community detection algorithm based on graph representation learning. The algorithm uses a Laplacian matrix to obtain node relationship information from the directly connected edges of the network structure at the previous time slice. A deep sparse autoencoder then learns a representation of the network structure at the current time slice. Finally, the K-means clustering algorithm partitions the resulting low-dimensional feature matrix for the current time slice into communities. Experiments on three real datasets show that the proposed algorithm outperforms the baselines in effectiveness and feasibility.
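
The abstract does not say which Laplacian is used. The following is a minimal sketch of the unnormalised graph Laplacian L = D - A, where A is the adjacency matrix of directly connected edges and D is the diagonal degree matrix. It assumes an undirected, unweighted snapshot of one time slice. How the matrix is fed into the deep sparse autoencoder is not shown.

```python
def laplacian(n, edges):
    """Unnormalised Laplacian L = D - A of an undirected, unweighted graph with n nodes."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u][v] = adj[v][u] = 1
    lap = [[-adj[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        lap[i][i] = sum(adj[i])  # node degree on the diagonal
    return lap
```

Every row of L sums to zero. An isolated node contributes a zero row.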

Tree-ART2 Learning Model for Spatial Clustering in Second Dimension

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.543-547.1934 ◽

2014 ◽

Vol 543-547 ◽

pp. 1934-1938

Author(s):

Ming Xiao

Keyword(s):

Network Model ◽

Spatial Data ◽

Data Clustering ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Adaptive Resonance Theory ◽

Spatial Distance ◽

Resonance Theory ◽

Adaptive Resonance ◽

Vector Module

When clustering two-dimensional spatial data, Adaptive Resonance Theory (ART) suffers from pattern drift and from the loss of vector magnitude (module) information. It also adapts poorly to irregularly distributed spatial data. To address these problems, a Tree-ART2 network model is proposed. It retains the memory of old patterns and maintains the spatial-distance constraint by learning and adjusting the long-term memory (LTM) patterns and the amplitude information of vectors. In addition, introducing a tree structure into the model reduces the subjective dependence on the vigilance parameter and decreases the occurrence of pattern mixing. Comparative experiments show that the Tree-ART2 (TART2) network has higher plasticity and adaptability.
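
ART2 normalises its input vectors, which is one way the magnitude ("module") information can be lost. The following is a heavily simplified, ART-style vigilance test that illustrates this loss. It is not the full ART2 F1/F2 dynamics, and it is not the Tree-ART2 remedy. The match score is assumed to be the cosine similarity of the normalised vectors, and rho is a hypothetical vigilance value.

```python
import math

def normalise(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def passes_vigilance(input_vec, prototype, rho):
    """Simplified match test: cosine similarity of unit vectors compared against vigilance rho."""
    a, b = normalise(input_vec), normalise(prototype)
    return sum(x * y for x, y in zip(a, b)) >= rho
```

Here the points (3, 6) and (1, 2) are far apart in the plane but pass the test against each other. For two-dimensional spatial clustering, this is exactly the wrong behaviour.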

Event Mining Through Clustering

Journal of Intelligent Systems ◽

10.1515/jisys-2013-0025 ◽

2014 ◽

Vol 23 (1) ◽

pp. 59-73

Author(s):

E. Umamaheswari ◽

T.V. Geetha

Keyword(s):

Clustering Algorithm ◽

Semantic Representation ◽

Clustering Algorithms ◽

Event Semantics ◽

Information Retrieval Evaluation ◽

Event Mining ◽

Unique Word ◽

Event Clustering ◽

Cluster Efficiency ◽

Single Cluster

Abstract Traditional document clustering algorithms use text-based features such as unique word count and concept count to cluster documents. Event mining, by contrast, is the extraction of specific events, their related sub-events, and the associated semantic relations from documents. This work discusses an approach to event mining through clustering. A Universal Networking Language (UNL)-based subgraph, a semantic representation of the document, is used as the input for clustering. Our research explores three different feature sets for event clustering and compares the approaches used for specific event mining. In our previous work, the clustering algorithm used UNL-based event semantics to represent event context for clustering. However, this approach grouped different events with similar semantics into the same cluster. Hence, instead of considering only UNL event semantics, we assign additional weights to the similarity between event contexts based on event-related attributes such as time, place, and persons. Although this places each specific event in a single cluster, the sub-events related to a specific event are not necessarily in the same cluster. Therefore, to improve cluster efficiency, connective terms between two sentences and their representation as UNL subgraphs were also considered for similarity determination. By combining UNL semantics, event-specific argument similarity, and connective-term concepts between sentences, we obtained clusters for specific events and their sub-events. We used 112 000 Tamil documents from the Forum for Information Retrieval Evaluation data corpus and achieved good results. We also compared our approach with the previous state-of-the-art approach on the Reuters-RCV1 corpus and achieved a 30% improvement in precision.
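
The weighting step can be illustrated with a simple combined similarity. The similarity of semantic content is blended with the similarity of event arguments (time, place, persons). Under this scheme, two events that share semantics but have different arguments are no longer treated as identical. The following is a minimal sketch under hypothetical choices. Events are represented as sets. Jaccard similarity is used throughout. The weights w_sem and w_args are placeholders. The authors' UNL subgraph similarity and their connective-term component are not reproduced.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def event_similarity(e1, e2, w_sem=0.5, w_args=0.5):
    """Hypothetical blend: semantic overlap plus mean overlap of time/place/persons arguments."""
    sem = jaccard(e1["semantics"], e2["semantics"])
    args = sum(jaccard(e1[key], e2[key]) for key in ("time", "place", "persons")) / 3
    return w_sem * sem + w_args * args
```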

Density-based adaptive spatial clustering algorithm for identifying local high-density areas in georeferenced documents

2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽

10.1109/smc.2014.6973959 ◽

2014 ◽

Cited By ~ 7

Author(s):

Tatsuhiro Sakai ◽

Keiichi Tamura ◽

Hajime Kitakami

Keyword(s):

Clustering Algorithm ◽

Spatial Clustering ◽

High Density

Aerosol Plume Detection Algorithm Based on Image Segmentation of Scanning Atmospheric Lidar Data

Journal of Atmospheric and Oceanic Technology ◽

10.1175/jtech-d-15-0125.1 ◽

2016 ◽

Vol 33 (4) ◽

pp. 697-712 ◽

Cited By ~ 4

Author(s):

R. Andrew Weekley ◽

R. Kent Goodrich ◽

Larry B. Cornman

Keyword(s):

Coordinate System ◽

Original Data ◽

Detection Algorithm ◽

Polar Coordinate System ◽

Lidar Data ◽

Image Processing Algorithm ◽

Polar Coordinate ◽

Scanning Lidar ◽

Performance Statistics ◽

Background Fields

Abstract An image-processing algorithm has been developed to identify aerosol plumes in scanning lidar backscatter data. The images in this case consist of lidar data in a polar coordinate system. Each full lidar scan is taken as a fixed image in time, and sequences of such scans are treated as functions of time. The data are analyzed both in the original backscatter polar coordinate system and in a lagged coordinate system. The lagged coordinate system is a scatterplot of two datasets, such as subregions taken from the same lidar scan (spatial delay) or two sequential scans (time delay). Processing in the lagged coordinate system allows clusters of data to be found and classified. The classification step determines which clusters are valid aerosol plumes and which come from artifacts such as noise, hard targets, or background fields. These cluster classification techniques are skillful because they use both local and global properties. Furthermore, more information is available because both the original data and the lag data are used. Performance statistics are presented for a limited set of data processed by the algorithm, comparing its results with subjective truth data identified by a human.
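
A time-delay lagged coordinate system pairs each range gate's backscatter value in one scan with its value in the next scan. In the resulting scatterplot, unchanged background falls near the diagonal. The following is a minimal sketch of building those pairs. It assumes each scan is a list of beams and each beam is a list of range-gate values on the same polar grid. The off-diagonal rule and its tolerance are a hypothetical simplification. They are not the paper's cluster classification step, which also uses local and global cluster properties.

```python
def lag_pairs(scan_a, scan_b):
    """Time-delay pairs: (value at scan t, value at scan t+1) for each co-located range gate."""
    return [(a, b) for beam_a, beam_b in zip(scan_a, scan_b) for a, b in zip(beam_a, beam_b)]

def off_diagonal(pairs, tol):
    """Indices of pairs whose value changed by more than tol between the two scans."""
    return [i for i, (a, b) in enumerate(pairs) if abs(b - a) > tol]
```

A spatial-delay scatterplot would be built the same way, by pairing two subregions of one scan instead of two scans.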