A DBSCAN based Algorithm for Ship Spot Area Detection in AIS Trajectory Data

The big data acquired by the AIS system contain abundant maritime traffic information. With the wide application of data mining in various fields in recent years, the mining of AIS data has drawn the attention of researchers. Based on ship AIS location data, this paper studies a spot area detection algorithm. First, the sample data are pre-processed from the original data, and the residence points of each ship are identified according to the ship's speed and course change. Then a DBSCAN-based clustering algorithm is used to cluster the residence points into several latitude-longitude lattices, which are the spot areas. Experiments on real AIS data sets show that the algorithm is efficient and correct.
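
The two steps in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record fields (`lat`, `lon`, `sog`), the speed threshold, and the `eps_m`/`min_pts` values are all assumptions, the residence-point filter uses speed only (the abstract also mentions course change), and the coordinates are made-up sample values.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))

def stay_points(records, max_speed_kn=0.5):
    """Keep positions whose speed over ground is below a hypothetical threshold."""
    return [(r["lat"], r["lon"]) for r in records if r["sog"] < max_speed_kn]

def dbscan(points, eps_m, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Includes i itself, as in the usual definition of the eps-neighbourhood.
        return [j for j in range(len(points))
                if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # noise reached from a core point becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:         # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels

# Made-up sample records: three slow positions close together, one ship under way,
# and one isolated stop.
records = [
    {"lat": 31.2300, "lon": 121.4900, "sog": 0.1},
    {"lat": 31.2301, "lon": 121.4901, "sog": 0.0},
    {"lat": 31.2302, "lon": 121.4899, "sog": 0.2},
    {"lat": 31.3000, "lon": 121.6000, "sog": 12.0},
    {"lat": 31.4000, "lon": 121.7000, "sog": 0.1},
]
labels = dbscan(stay_points(records), eps_m=100.0, min_pts=3)
```

With these sample values the three nearby slow positions form one cluster and the isolated stop is labelled as noise. The neighbour search here is quadratic, so a spatial index would be needed for real AIS volumes.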


Pedestrian detection algorithm based on improved multi-scale feature fusion

Journal of Physics Conference Series ◽ 10.1088/1742-6596/2078/1/012008 ◽ 2021 ◽ Vol 2078 (1) ◽ pp. 012008

Author(s): Hui Liu ◽ Keyang Cheng

Keyword(s): Clustering Algorithm ◽ Feature Fusion ◽ Pedestrian Detection ◽ Detection Algorithm ◽ Data Sets ◽ False Detection ◽ Scale Feature ◽ Multi Scale ◽ Dilated Convolution ◽ Small Targets

Abstract: To address false detections and missed detections of small and occluded targets in pedestrian detection, a pedestrian detection algorithm based on improved multi-scale feature fusion is proposed. First, because PANet, the multi-scale feature fusion module of YOLOv4, does not consider the interaction between scales, PANet is improved to reduce the semantic gap between scales, and an attention mechanism is introduced to learn the importance of different layers and so strengthen feature fusion. Then, dilated convolution is introduced to reduce the information loss that occurs during downsampling. Finally, the K-means clustering algorithm is used to redesign the anchor boxes, and the loss function is modified to detect a single category. Experimental results show that, on the INRIA and WiderPerson data sets under different congestion conditions, the improved algorithm reaches an AP of 96.83% and 59.67%, respectively. Compared with the pedestrian detection results of the YOLOv4 model, this is an improvement of 2.41% and 1.03%, respectively. False detections and missed detections of small and occluded targets are significantly reduced.
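
The anchor-box step can be illustrated with a short sketch. The abstract only says that K-means is used to redesign the anchors; the `1 - IoU` distance below is the choice commonly used with YOLO-style detectors and is an assumption here, as are the function names, the deterministic initialisation, and the made-up box sizes.

```python
def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height) and aligned at the same corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=50):
    """Cluster (w, h) pairs using distance 1 - IoU; returns k anchors sorted by area."""
    anchors = list(boxes[:k])  # assumption: start from the first k boxes for determinism
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for b in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda c: iou_wh(b, anchors[c]))
            groups[best].append(b)
        new = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else anchors[c]
            for c, g in enumerate(groups)
        ]
        if new == anchors:     # assignments no longer change the anchors
            break
        anchors = new
    return sorted(anchors, key=lambda a: a[0] * a[1])

# Made-up pedestrian-like boxes (tall and narrow) at two scales.
boxes = [(10, 30), (60, 180), (12, 34), (58, 170), (11, 32), (62, 190)]
anchors = kmeans_anchors(boxes, k=2)
```

On this toy input the two anchors settle on the mean small box and the mean large box. In practice the initialisation is usually random and the number of anchors matches the detector's heads.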


An efficient trajectory-clustering algorithm based on an index tree

Transactions of the Institute of Measurement and Control ◽ 10.1177/0142331211423284 ◽ 2011 ◽ Vol 34 (7) ◽ pp. 850-861 ◽ Cited By ~ 15

Author(s): Guan Yuan ◽ Shixiong Xia ◽ Lei Zhang ◽ Yong Zhou ◽ Cheng Ji

Keyword(s): Radio Frequency Identification ◽ Clustering Algorithm ◽ Real Data ◽ Structural Similarity ◽ Location Based Services ◽ Similarity Function ◽ Data Sets ◽ Trajectory Clustering ◽ Trajectory Data ◽ Index Tree

With the development of location-based services, such as the Global Positioning System and Radio Frequency Identification, a great deal of trajectory data can be collected. Therefore, how to mine knowledge from these data has become an attractive topic. In this paper, we propose an efficient trajectory-clustering algorithm based on an index tree. First, an index tree is proposed to store trajectories and their similarity matrix, with which trajectories can be retrieved efficiently. Second, a new concept of trajectory structure is introduced to analyse both the internal and external features of trajectories. Third, trajectories are partitioned into trajectory segments according to their corners. Fourth, the similarity between every pair of trajectory segments is computed with a structural similarity function. Finally, trajectory segments are grouped into different clusters according to their location in the different levels of the index tree. Experimental results on real data sets demonstrate not only the efficiency and effectiveness of our algorithm, but also its flexibility: feature sensitivity can be adjusted through parameters, and the clustering results are of more practical significance.
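
The partitioning step (cutting trajectories at their corners) can be sketched as below. The abstract does not define a corner, so this example assumes a corner is an interior point where the heading changes by more than a threshold; the function names and the 45-degree default are hypothetical.

```python
import math

def turning_angle(a, b, c):
    """Absolute change of heading in degrees at b when moving a -> b -> c."""
    h1 = math.atan2(b[1] - a[1], b[0] - a[0])
    h2 = math.atan2(c[1] - b[1], c[0] - b[0])
    d = abs(h2 - h1)
    return math.degrees(min(d, 2 * math.pi - d))

def split_at_corners(traj, min_turn_deg=45.0):
    """Cut a polyline at every interior point whose turning angle exceeds the threshold."""
    segments, start = [], 0
    for i in range(1, len(traj) - 1):
        if turning_angle(traj[i - 1], traj[i], traj[i + 1]) > min_turn_deg:
            segments.append(traj[start:i + 1])
            start = i                  # the corner point also starts the next segment
    segments.append(traj[start:])
    return segments

# An L-shaped path: east for two steps, then north for two steps.
traj = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
segments = split_at_corners(traj)
```

The L-shaped path is cut once, at the 90-degree turn, and the corner point is shared by both resulting segments.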


An Analysis of Differential Privacy Research in Location and Trajectory Data

10.21203/rs.3.rs-94765/v1 ◽ 2020

Author(s): Fatima Zahra Errounda ◽ Yan Liu

Keyword(s): Location Privacy ◽ Differential Privacy ◽ State Of The Art ◽ Original Data ◽ Privacy Preserving ◽ The State ◽ Trajectory Data ◽ Powerful Technique ◽ Location Data ◽ Single User

Abstract: Location and trajectory data are routinely collected to generate valuable knowledge about users' behavior patterns. However, releasing location data may jeopardize the privacy of the individuals involved. Differential privacy is a powerful technique that prevents an adversary from inferring the presence or absence of an individual in the original data solely from the observed data. The first challenge in applying differential privacy to location data is that it usually involves a single user. This shifts the adversary's target to the user's locations instead of presence or absence in the original data. The second challenge is that the inherent correlation between location data, due to the regularity and predictability of people's movements, gives the adversary an advantage in inferring information about individuals. In this paper, we review the differentially private approaches that tackle these challenges. Our goal is to help newcomers to the field better understand the state of the art by providing a research map that highlights the different challenges in designing differentially private frameworks that address the characteristics of location data. We find that in protecting an individual's location privacy, the attention of differential privacy mechanisms shifts to preventing the adversary from inferring the original location from the observed one. Moreover, we find that privacy-preserving mechanisms make use of the predictability and regularity of users' movements to design protections for users' privacy in trajectory data. Finally, we explore how well the presented frameworks succeed in protecting users' locations and trajectories against well-known privacy attacks.
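
As background for the presence-or-absence setting described above, the following sketch shows the classic Laplace mechanism applied to per-cell visit counts. It is a generic illustration, not one of the location-specific frameworks the review covers. The cell names, the counts, and the assumption that each individual contributes at most one visit to one cell (sensitivity 1) are hypothetical.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverting the CDF."""
    u = rng.random() - 0.5              # uniform on [-0.5, 0.5)
    u = max(u, -0.5 + 1e-12)            # guard against log(0) at the boundary
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_cell_counts(counts, epsilon, rng):
    """Release per-cell visit counts with Laplace noise of scale sensitivity / epsilon."""
    sensitivity = 1.0                   # assumption: one person changes one count by at most 1
    scale = sensitivity / epsilon
    return {cell: n + laplace_noise(scale, rng) for cell, n in counts.items()}

# Hypothetical grid cells and visit counts.
noisy = private_cell_counts({"cell_A": 10, "cell_B": 3}, epsilon=0.5, rng=random.Random(1))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger protection at the cost of accuracy. The single-user and correlation challenges named in the abstract are exactly the cases this simple mechanism does not cover.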


An Automatic Extraction Method of Coach Operation Information from Historical Trajectory Data

Journal of Advanced Transportation ◽ 10.1155/2019/3634942 ◽ 2019 ◽ Vol 2019 ◽ pp. 1-15 ◽ Cited By ~ 2

Author(s): Jun Li ◽ Qingqi Li ◽ Yan Zhu ◽ Yan Ma ◽ Yubin Xu ◽ ...

Keyword(s): Clustering Algorithm ◽ Information Service ◽ Economic Cost ◽ Site Investigation ◽ Road Transport ◽ Traffic Information ◽ Trajectory Data ◽ Dbscan Clustering ◽ Dense Point ◽ Historical Trajectory

The quality of travel services for road transport relies heavily on the richness of transport operation data. Currently, most types of data, including coach operation data, are collected by manual investigation, which is time-consuming and labor-intensive and significantly hinders the realization of intelligent traffic information services. To address these problems, this paper introduces a method for automatically extracting coach operation information from the historical GPS trajectory data of a large number of coaches. The method first analyzes the trajectory characteristics of coaches within stations and identifies highly dense point clusters as coach stations using the DBSCAN clustering algorithm. Then the schedule information is obtained by conducting error adjustment on the actual arrival and departure time series of multiple shifts, and the name of each coach station is queried from the point of interest (POI) and geographical name databases provided by an online map. Finally, the regular driving route of coaches is extracted by an incremental trajectory merging method. The proposed method is applied to historical trajectory data from the Beijing-Tianjin-Hebei region in China, and experimental results show an extraction accuracy of 84%, verifying its effectiveness and feasibility. The proposed method uses data mining techniques to extract coach operation information from big trajectory data and saves much of the labor, time, and economic cost required by on-site investigation.


An Efficient MapReduce-Based Parallel Clustering Algorithm for Distributed Traffic Subarea Division

Discrete Dynamics in Nature and Society ◽ 10.1155/2015/793010 ◽ 2015 ◽ Vol 2015 ◽ pp. 1-18 ◽ Cited By ~ 7

Author(s): Dawen Xia ◽ Binfeng Wang ◽ Yantao Li ◽ Zhuobo Rong ◽ Zili Zhang

Keyword(s): Intelligent Transportation Systems ◽ Large Scale ◽ Clustering Algorithm ◽ Transportation Systems ◽ Division Problem ◽ Data Sets ◽ Trajectory Data ◽ Computing Platform ◽ Distributed Computing Platform ◽ Parallel Clustering

Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs). Since existing methods may not be suitable for big traffic data processing, this paper presents a MapReduce-based Parallel Three-Phase K-Means (Par3PKM) algorithm for solving the traffic subarea division problem on the widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K-Means and then employ the MapReduce paradigm to redesign the optimized K-Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identifying method to connect the borders of clustering results for each cluster. Finally, we divide Beijing into traffic subareas based on real-world trajectory data sets generated by 12,000 taxis over a period of one month using the proposed approach. Experimental evaluation results indicate that, compared with K-Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, greater accuracy, and better scalability and can effectively divide traffic subareas with big taxi trajectory data.
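
To make the MapReduce formulation of K-Means concrete, here is a single-process sketch of one iteration split into a map step and a reduce step. It is not Par3PKM: it uses plain squared Euclidean distance and given starting centroids, whereas the paper modifies both the distance metric and the initialization. The function names and the sample points are assumptions.

```python
from collections import defaultdict

def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, (point, 1)) for every point."""
    for p in points:
        idx = min(range(len(centroids)),
                  key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        yield idx, (p, 1)

def reduce_phase(pairs, centroids):
    """Reduce: sum points and counts per key, then average to get the new centroids."""
    sums = defaultdict(lambda: [0.0] * len(centroids[0]))
    counts = defaultdict(int)
    for idx, (p, n) in pairs:
        sums[idx] = [s + x for s, x in zip(sums[idx], p)]
        counts[idx] += n
    # A centroid that received no points is kept unchanged.
    return [
        tuple(s / counts[i] for s in sums[i]) if counts[i] else centroids[i]
        for i in range(len(centroids))
    ]

# Made-up 2-D points forming two groups.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (12.0, 10.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
new_centroids = reduce_phase(map_phase(points, centroids), centroids)
```

Because the reduce step only needs per-key sums and counts, the map output can be partitioned across workers and combined locally before the final average, which is what makes the iteration suitable for a platform such as Hadoop.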


Highway Event Detection Algorithm Based on Improved Fast Peak Clustering

Mathematical Problems in Engineering ◽ 10.1155/2021/7318216 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-13

Author(s): Lili Pei ◽ Zhaoyun Sun ◽ Yuxi Han ◽ Wei Li ◽ Huaixin Zhao

Keyword(s): Clustering Algorithm ◽ Spatial Clustering ◽ Original Data ◽ Detection Algorithm ◽ Massive Data ◽ Highway Traffic ◽ Traffic Condition ◽ Original Algorithm ◽ Event Mining ◽ Complex Relationships

Aiming at the mining of traffic events based on large amounts of highway data, this paper proposes an improved fast peak clustering algorithm to process highway toll data. The highway toll data are first analyzed, and a data cleaning method based on the sum of similar coefficients is proposed to process the original data. Next, to avoid the shortcomings of the excessive subjectivity of the original algorithm, an improved fast peak clustering algorithm is proposed. Finally, the improved algorithm is applied to highway traffic condition analysis and abnormal event mining to obtain more accurate and intuitive clustering results. Compared with two classical algorithms, namely, the k-means and density-based spatial clustering of applications with noise (DBSCAN) algorithms, as well as the unimproved original fast peak clustering algorithm, the proposed algorithm is faster and more accurate and can reveal the complex relationships among massive data more efficiently. During the process of reforming the toll system, the algorithm can automatically and more efficiently analyze massive toll data and detect abnormal events, thereby providing a theoretical basis and data support for the operation monitoring and maintenance of highways.
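
For readers unfamiliar with fast peak (density peak) clustering, the sketch below computes the two quantities the original, unimproved algorithm is built on: the local density `rho` (neighbours within a cut-off distance `d_c`) and `delta` (distance to the nearest point of higher density). It does not reproduce the improvements proposed in the paper; the cut-off kernel, the function name, and the sample points are assumptions.

```python
import math

def density_peaks(points, d_c):
    """Return (rho, delta) lists for density-peak clustering with a cut-off kernel."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the user-chosen cut-off distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Points with no denser neighbour take their largest distance by convention.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Made-up 1-D layout embedded in the plane: two groups of three points each.
points = [(0, 0), (1, 0), (-1, 0), (10, 0), (11, 0), (9, 0)]
rho, delta = density_peaks(points, d_c=1.2)
```

Points that combine a high `rho` with a high `delta` (here the middle point of each group) are the candidate cluster centres. Both the cut-off distance and the selection of centres are chosen by the user in the original algorithm.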


Important Location Identification and Personal Location Inference Based on Mobile Subscriber Location Data

MATEC Web of Conferences ◽ 10.1051/matecconf/201817303086 ◽ 2018 ◽ Vol 173 ◽ pp. 03086 ◽ Cited By ~ 1

Author(s): Zhen Yang ◽ Wang Hong-jun

Keyword(s): Language Processing ◽ Clustering Algorithm ◽ Smart Cities ◽ Mobile Terminal ◽ Trajectory Data ◽ Density Peak ◽ Data Set ◽ Location Data ◽ Or Groups ◽ Density Peak Clustering

As an emerging type of spatial trajectory data, mobile terminal location data can be widely used to analyze the behavior characteristics and interests of individuals or groups in smart cities, transportation planning and other civil fields. It can also be used to track suspects in anti-terrorism security and public opinion management. Because subscriber location data differ in size and distribution, suitable input parameters for clustering are difficult to determine. To address this problem, an improved density peak clustering algorithm is proposed, and its performance is verified on the UCI data set. First, important locations are identified by the proposed algorithm, and the personal location is then inferred from the subscriber's schedule and the maximum cluster. Next, the algorithm uses Google's reverse geocoding technology to obtain the semantic names corresponding to the coordinate points, and introduces natural language processing to perform word frequency statistics and keyword extraction. Simulation results on the Geolife data set show that the algorithm is feasible for identifying important locations and inferring personal locations.


Vessel Crowd Movement Pattern Mining for Maritime Traffic Management

LOGI – Scientific Journal on Transport and Logistics ◽ 10.2478/logi-2019-0020 ◽ 2019 ◽ Vol 10 (2) ◽ pp. 105-115

Author(s): Rong Wen ◽ Wenjing Yan

Keyword(s): Decision Making ◽ Traffic Management ◽ Clustering Algorithm ◽ Pattern Mining ◽ Movement Pattern ◽ Temporal Data Mining ◽ Trajectory Data ◽ Maritime Traffic ◽ Travel Behaviors ◽ Crowd Movement

Abstract: The goal of maritime traffic management is to provide a safe and efficient maritime environment for different types of vessels, facilitating port logistics and supply chain business. However, current maritime traffic management relies mainly on massive amounts of individual vessel data for decision making. The lack of a macro-level understanding of vessel crowd movement around ports challenges maritime safety and traffic efficiency. In this paper, we describe a spatio-temporal data mining method to discover crowd movement patterns of vessels from their short-term history data. The method first captures vessels' crowd movement features by building tracklets from their speed and location. A movement vector clustering algorithm is then developed to find the different travel behaviors of different groups of vessels. Applying nonparametric regression to the classified vessel movement vectors, which represent the crowd travel behaviors, yields an overall vessel movement pattern. We tested the method on real trajectory data of vessels near Singapore ports. Compared with the actual massive vessel movement data, the method was able to extract vessels' crowd movement information. Risk-area hotspots in terms of vessel traffic and speed can be identified. The method can be used to provide decision-making support for maritime traffic management.
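
The abstract does not say which nonparametric regression is used. As one common example, the sketch below uses a Nadaraya-Watson kernel estimator to smooth vessel speeds along a one-dimensional coordinate. The estimator choice, the Gaussian kernel, the bandwidth, and the sample values are all assumptions made for illustration.

```python
import math

def nadaraya_watson(x_query, xs, ys, bandwidth):
    """Gaussian-kernel weighted average of ys around x_query.

    Assumes at least one sample lies close enough to x_query for its
    weight not to underflow to zero.
    """
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Made-up samples: distance along a channel (km) and observed speed (knots).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [12.0, 11.0, 8.0, 6.0, 5.5]
smoothed = [nadaraya_watson(x, xs, ys, bandwidth=1.0) for x in xs]
```

Each smoothed value is a weighted mean of the observed speeds, so it always lies between the smallest and largest observation. The bandwidth controls how local the estimated movement pattern is.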


Design of intelligent acquisition system for moving object trajectory data under cloud computing

Journal of Intelligent Systems ◽ 10.1515/jisys-2020-0152 ◽ 2021 ◽ Vol 30 (1) ◽ pp. 763-773

Author(s): Yang Zhang ◽ Abhinav Asthana ◽ Sudeep Asthana ◽ Shaweta Khanna ◽ Ioan-Cosmin Mihai

Keyword(s): Cloud Computing ◽ Clustering Algorithm ◽ Programming Model ◽ Hot Spot ◽ Moving Object ◽ Trajectory Clustering ◽ Trajectory Data ◽ Spot Area ◽ Time Period ◽ Moving Object Trajectory

Abstract: To study an intelligent collection system for moving object trajectory data under cloud computing, information useful to passengers and taxi drivers is extracted from massive trajectory data. Using cloud computing technology, this paper designs a trajectory clustering algorithm that combines the density-based DBSCAN clustering algorithm with the MapReduce programming model. Characteristic time periods are determined from 8 days of data from 15,000 taxis in Shenzhen. Passenger hot spot areas are obtained by clustering the passenger load points in each time period, which verifies the feasibility of a passenger load point recommendation application based on trajectory clustering. In the absence of holidays, the number of passenger hotspots tends to be stable, so cluster analysis is reliable. The recommendation application has been demonstrated through experiments, and the implementation results show that its design is reasonable and feasible in practice.


RECENT RESULTS IN HIERARCHICAL CLUSTERING: I–THE REDUCIBLE NEIGHBORHOODS CLUSTERING ALGORITHM

International Journal of Pattern Recognition and Artificial Intelligence ◽ 10.1142/s0218001493000285 ◽ 1993 ◽ Vol 07 (03) ◽ pp. 541-571 ◽ Cited By ~ 5

Author(s): MICHEL BRUYNOOGHE

Keyword(s): Hierarchical Clustering ◽ Speech Processing ◽ Clustering Algorithm ◽ Large Data ◽ Original Data ◽ Large Data Sets ◽ Data Sets ◽ Data Set ◽ Hierarchical Clustering Algorithm ◽ Better Than

The clustering of large data sets is of great interest in fields such as pattern recognition, numerical taxonomy, and image or speech processing. The traditional Ascendant Hierarchical Algorithm (AHC) cannot be run for sets of more than a few thousand elements. The reducible neighborhoods clustering algorithm presented in this paper overcomes the limits of the traditional hierarchical clustering algorithm by generating an exact hierarchy on a large data set. The theoretical justification of this algorithm is the so-called Bruynooghe reducibility principle, which lays down the condition under which the exact hierarchy may be constructed locally, by carrying out aggregations in restricted regions of the representation space. As with the Day and Edelsbrunner algorithm, the maximum theoretical time complexity of the reducible neighborhoods clustering algorithm is O(n² log n), regardless of the chosen clustering strategy. However, the reducible neighborhoods clustering algorithm uses the original data table, and its practical performance is far better than that of Day and Edelsbrunner's algorithm, thus allowing the hierarchical clustering of large data sets, i.e. those composed of more than 10,000 objects.
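
For context, the baseline that the paper improves on can be sketched as a naive agglomerative procedure. This is not the reducible neighborhoods algorithm: it searches all cluster pairs at every step, which is what limits the traditional approach to small data sets. The function name, the complete-linkage default, and the sample points are assumptions.

```python
def naive_ahc(points, linkage=max):
    """Merge clusters until one remains; returns merges as (members_a, members_b, height)."""
    def d(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Exhaustive search for the closest pair of clusters: the step that does not scale.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                h = linkage(d(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merges.append((clusters[a], clusters[b], h))
        clusters[a] = clusters[a] + clusters[b]   # new list, so recorded merges stay intact
        del clusters[b]
    return merges

# Made-up 1-D points; linkage=max gives complete linkage.
points = [(0.0,), (1.0,), (5.0,)]
merges = naive_ahc(points)
```

In this naive form every step recomputes the distances between all cluster pairs from the raw points, so the running time grows roughly cubically with the number of points. The reducibility principle described above is what allows aggregations to be carried out locally instead.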
