Evaluating the effect of compressing algorithms for trajectory similarity and classification problems

GeoInformatica ◽

10.1007/s10707-021-00434-1 ◽

2021 ◽

Author(s):

Antonios Makris ◽

Camila Leite da Silva ◽

Vania Bogorny ◽

Luis Otavio Alvares ◽

Jose Antonio Macedo ◽

...

Keyword(s):

Trajectory Analysis ◽

Similarity Measures ◽

Classification Problems ◽

Trajectory Data ◽

Compression Algorithms ◽

Time Ratio ◽

Ratio Speed ◽

Trajectory Similarity ◽

Real World Datasets ◽

The Impact

Abstract: During the last few years, the volume of trajectory data has expanded to unprecedented levels. This growth is challenging traditional trajectory analysis approaches, and solutions are sought in other domains. In this work, we focus on data compression techniques that aim to minimize the size of trajectory data while, at the same time, minimizing the impact on trajectory analysis methods. To this end, we evaluate five lossy compression algorithms: Douglas-Peucker (DP), Time Ratio (TR), Speed Based (SP), Time Ratio Speed Based (TR_SP) and Speed Based Time Ratio (SP_TR). The comparison is performed using four distinct real-world datasets against six different dynamically assigned thresholds. The effectiveness of the compression is evaluated using classification techniques and similarity measures. The results show that there is a trade-off between the compression rate and the achieved quality. There is no “best algorithm” for every case, and the choice of the proper compression algorithm is application-dependent.
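As an illustration of the first algorithm named in the abstract, here is a minimal sketch of Douglas-Peucker line simplification. It is not the authors' implementation: it assumes planar `(x, y)` coordinates rather than geographic ones, and the tolerance `epsilon` is a fixed, hypothetical parameter rather than one of the dynamically assigned thresholds used in the paper.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates assumed)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Return a simplified copy of points; epsilon is a hypothetical tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the segment first-last.
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        # Every interior point is within tolerance: keep only the endpoints.
        return [first, last]
    # Otherwise keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right


track = [(0, 0), (1, 0), (2, 3), (3, 0), (4, 0)]
simplified = douglas_peucker(track, 1.0)
```

A larger `epsilon` discards more points, which is one concrete form of the trade-off between compression rate and quality that the abstract reports.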


An Approach to Spatiotemporal Trajectory Clustering Based on Community Detection

Wireless Communications and Mobile Computing ◽

10.1155/2021/5582341 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Xin Wang ◽

Xinzheng Niu ◽

Jiahui Zhu ◽

Zuoyan Liu

Keyword(s):

Community Detection ◽

Trajectory Analysis ◽

Moving Objects ◽

Clustering Algorithms ◽

Similarity Measures ◽

Detection Algorithm ◽

Similarity Matrix ◽

Trajectory Data ◽

Trajectory Similarity ◽

Community Detection Algorithm

Nowadays, large volumes of multimodal data have been collected for analysis. An important type of data is trajectory data, which contains both time and space information. Trajectory analysis and clustering are essential to learn the pattern of moving objects. Computing trajectory similarity is a key aspect of trajectory analysis, but it is very time consuming. To address this issue, this paper presents an improved branch and bound strategy based on time slice segmentation, which reduces the time to obtain the similarity matrix by decreasing the number of distance calculations required to compute similarity. Then, the similarity matrix is transformed into a trajectory graph and a community detection algorithm is applied on it for clustering. Extensive experiments were done to compare the proposed algorithms with existing similarity measures and clustering algorithms. Results show that the proposed method can effectively mine the trajectory cluster information from the spatiotemporal trajectories.
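The step from a similarity matrix to a trajectory graph can be sketched as follows. This is only an illustration under stated assumptions: the `threshold` used to create edges is hypothetical, and connected components are used as a deliberately simplified stand-in for the community detection algorithm, which the abstract does not specify. The branch and bound strategy for computing the matrix is not reproduced.

```python
def similarity_graph(sim, threshold):
    """Build an undirected graph: link i and j when sim[i][j] >= threshold."""
    n = len(sim)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Group nodes by connectivity (a simple stand-in for community detection)."""
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


# Hypothetical similarity matrix for four trajectories.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.15, 0.1],
    [0.1, 0.15, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
clusters = connected_components(similarity_graph(sim, 0.5))
```

A real community detection method would also split densely connected groups that are joined by only a few edges, which connected components cannot do.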


Multi-Aspect Embedding for Attribute-Aware Trajectories

Symmetry ◽

10.3390/sym11091149 ◽

2019 ◽

Vol 11 (9) ◽

pp. 1149

Author(s):

Thapana Boonchoo ◽

Xiang Ao ◽

Qing He

Keyword(s):

Real World ◽

Execution Time ◽

State Of The Art ◽

Representation Learning ◽

Learning Approach ◽

Trajectory Data ◽

Trajectory Mining ◽

Trajectory Similarity ◽

Effectiveness And Efficiency ◽

Real World Datasets

Motivated by the proliferation of trajectory data produced by advanced GPS-enabled devices, trajectories are gaining in complexity and beginning to include additional attributes beyond the coordinates alone. As a consequence, this creates the potential to define the similarity between two attribute-aware trajectories. However, most existing trajectory similarity approaches focus only on location-based proximities and fail to capture the semantic similarities encompassed by these additional asymmetric attributes (aspects) of trajectories. In this paper, we propose multi-aspect embedding for attribute-aware trajectories (MAEAT), a representation learning approach for trajectories that simultaneously models the similarities according to their multiple aspects. MAEAT is built upon a sentence embedding algorithm and directly learns whole-trajectory embeddings by predicting the context aspect tokens when given a trajectory. Two kinds of token generation methods are proposed to extract multiple aspects from the raw trajectories, and a regularization is devised to control the importance among aspects. Extensive experiments on the benchmark and real-world datasets show the effectiveness and efficiency of the proposed MAEAT compared to the state-of-the-art and baseline methods. The results of MAEAT can well support representative downstream trajectory mining and management tasks, and the algorithm outperforms other compared methods in execution time by at least two orders of magnitude.


Identifying Human Mobility via Trajectory Embeddings

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/234 ◽

2017 ◽

Cited By ~ 39

Author(s):

Qiang Gao ◽

Fan Zhou ◽

Kunpeng Zhang ◽

Goce Trajcevski ◽

Xucheng Luo ◽

...

Keyword(s):

Human Mobility ◽

Route Planning ◽

Classification Problem ◽

Personalized Recommendation ◽

Mobility Patterns ◽

Classification Problems ◽

Trajectory Data ◽

Trajectory Classification ◽

Spatio Temporal ◽

Real World Datasets

Understanding human trajectory patterns is an important task in many location based social networks (LBSNs) applications, such as personalized recommendation and preference-based route planning. Most of the existing methods classify a trajectory (or its segments) based on spatio-temporal values and activities, into some predefined categories, e.g., walking or jogging. We tackle a novel trajectory classification problem: we identify and link trajectories to users who generate them in the LBSNs, a problem called Trajectory-User Linking (TUL). Solving the TUL problem is not a trivial task because: (1) the number of the classes (i.e., users) is much larger than the number of motion patterns in the common trajectory classification problems; and (2) the location based trajectory data, especially the check-ins, are often extremely sparse. To address these challenges, a Recurrent Neural Networks (RNN) based semi-supervised learning model, called TULER (TUL via Embedding and RNN) is proposed, which exploits the spatio-temporal data to capture the underlying semantics of user mobility patterns. Experiments conducted on real-world datasets demonstrate that TULER achieves better accuracy than the existing methods.


Trajectory Similarity Learning with Auxiliary Supervision and Optimal Matching

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/444 ◽

2020 ◽

Author(s):

Hanyuan Zhang ◽

Xinyu Zhang ◽

Qize Jiang ◽

Baihua Zheng ◽

Zhenbang Sun ◽

...

Keyword(s):

Real World ◽

Representation Learning ◽

Similarity Learning ◽

Trajectory Data ◽

Optimal Matching ◽

Training Samples ◽

Trajectory Similarity ◽

Similarity Computation ◽

Real World Datasets ◽

Relationship Of

Trajectory similarity computation is a core problem in the field of trajectory data queries. However, the high time complexity of calculating the trajectory similarity has always been a bottleneck in real-world applications. Learning-based methods can map trajectories into a uniform embedding space to calculate the similarity of two trajectories with embeddings in constant time. In this paper, we propose a novel trajectory representation learning framework, Traj2SimVec, that performs scalable and robust trajectory similarity computation. We use a simple and fast trajectory simplification and indexing approach to obtain triplet training samples efficiently. We make the framework more robust by making full use of the sub-trajectory similarity information as auxiliary supervision. Furthermore, the framework supports the point matching query by modeling the optimal matching relationship of trajectory points under different distance metrics. The comprehensive experiments on real-world datasets demonstrate that our model substantially outperforms all existing approaches.


A comparative analysis of trajectory similarity measures

GIScience & Remote Sensing ◽

10.1080/15481603.2021.1908927 ◽

2021 ◽

pp. 1-27

Author(s):

Yaguang Tao ◽

Alan Both ◽

Rodrigo I. Silveira ◽

Kevin Buchin ◽

Stef Sijben ◽

...

Keyword(s):

Comparative Analysis ◽

Similarity Measures ◽

Trajectory Similarity


A Person-to-Person and Person-to-Place COVID-19 Contact Tracing System Based on OGC IndoorGML

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10010002 ◽

2020 ◽

Vol 10 (1) ◽

pp. 2 ◽

Cited By ~ 1

Author(s):

Soroush Ojagh ◽

Sara Saeedi ◽

Steve H. L. Liang

Keyword(s):

Data Model ◽

Low Cost ◽

Large Body ◽

Contact Tracing ◽

Trajectory Data ◽

Sequential Order ◽

Human Contact ◽

Wide Availability ◽

Data Points ◽

The Impact

With the wide availability of low-cost proximity sensors, a large body of research focuses on digital person-to-person contact tracing applications that use proximity sensors. In most contact tracing applications, the impact of SARS-CoV-2 spread through touching contaminated surfaces in enclosed places is overlooked. This study is focused on tracing human contact within indoor places using the open OGC IndoorGML standard. This paper proposes a graph-based data model that considers the semantics of indoor locations, time, and users’ contexts in a hierarchical structure. The functionality of the proposed data model is evaluated for a COVID-19 contact tracing application with scalable system architecture. Indoor trajectory preprocessing is enabled by spatial topology to detect and remove semantically invalid real-world trajectory points. Results show that 91.18% of semantically invalid indoor trajectory data points are filtered out. Moreover, indoor trajectory data analysis is innovatively empowered by semantic user contexts (e.g., disinfecting activities) extracted from user profiles. In an enhanced contact tracing scenario, considering disinfecting activities and the sequential order of visits to common places improved the contact tracing results, reducing unnecessary potential contacts by 44.98%. However, the average execution time of person-to-place contact tracing increased by 58.3%.


Scalable kernel-based SVM classification algorithm on imbalance air quality data for proficient healthcare

Complex & Intelligent Systems ◽

10.1007/s40747-021-00435-5 ◽

2021 ◽

Author(s):

Shwet Ketu ◽

Pramod Kumar Mishra

Keyword(s):

Air Pollution ◽

Air Quality ◽

Class Imbalance ◽

Imbalanced Data ◽

Classification Algorithm ◽

Quality Data ◽

Pollution Level ◽

Classification Problems ◽

Chi Square ◽

The Impact

Abstract: In the last decade, we have seen drastic changes in the air pollution level, which has become a critical environmental issue. It should be handled carefully to develop solutions for proficient healthcare. Reducing the impact of air pollution on human health is possible only if the data is correctly classified. Many classification problems suffer from the class imbalance issue. Learning from imbalanced data is always a challenging task for researchers, and possible solutions have been developed over time. In this paper, we focus on dealing with the imbalanced class distribution in such a way that the classification algorithm does not compromise its performance. The proposed algorithm is based on the concept of the adjusting kernel scaling (AKS) method to deal with the multi-class imbalanced dataset. The kernel function's selection has been evaluated with the help of weighting criteria and the chi-square test. All the experimental evaluation has been performed on the sensor-based Indian Central Pollution Control Board (CPCB) dataset. The proposed algorithm achieves the highest accuracy, 99.66%, among the compared classification algorithms, namely AdaBoost (59.72%), Multi-Layer Perceptron (95.71%), GaussianNB (80.87%), and SVM (96.92%). The results of the proposed algorithm are also better than those of the existing methods in the literature. It is also clear from these results that our proposed algorithm is efficient for dealing with class imbalance problems along with enhanced performance. Thus, accurate classification of air quality through our proposed algorithm will be useful for improving the existing preventive policies and will also help in enhancing the capabilities of effective emergency response in the worst pollution situations.


A New Approach to Measuring the Similarity of Indoor Semantic Trajectories

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10020090 ◽

2021 ◽

Vol 10 (2) ◽

pp. 90

Author(s):

Jin Zhu ◽

Dayu Cheng ◽

Weiwei Zhang ◽

Ci Song ◽

Jie Chen ◽

...

Keyword(s):

Similarity Measure ◽

Semantic Information ◽

Edit Distance ◽

Similarity Measures ◽

Indoor Positioning ◽

Synthetic Dataset ◽

Shopping Mall ◽

Indoor Space ◽

Trajectory Similarity ◽

Indoor Spaces

People spend more than 80% of their time in indoor spaces, such as shopping malls and office buildings. Indoor trajectories collected by indoor positioning devices, such as WiFi and Bluetooth devices, can reflect human movement behaviors in indoor spaces. Insightful indoor movement patterns can be discovered from indoor trajectories using various clustering methods. These methods are based on a measure that reflects the degree of similarity between indoor trajectories. Researchers have proposed many trajectory similarity measures. However, existing trajectory similarity measures ignore the indoor movement constraints imposed by the indoor space and the characteristics of indoor positioning sensors, which leads to an inaccurate measure of indoor trajectory similarity. Additionally, most of these works focus on the spatial and temporal dimensions of trajectories and pay less attention to indoor semantic information. Integrating indoor semantic information such as the indoor point of interest into the indoor trajectory similarity measurement is beneficial to discovering pedestrians having similar intentions. In this paper, we propose an accurate and reasonable indoor trajectory similarity measure called the indoor semantic trajectory similarity measure (ISTSM), which considers the features of indoor trajectories and indoor semantic information simultaneously. The ISTSM is a modification of the edit distance, which is a measure of the distance between string sequences. The key component of the ISTSM is an indoor navigation graph that is transformed from an indoor floor plan representing the indoor space for computing accurate indoor walking distances. The indoor walking distances and indoor semantic information are fused into the edit distance seamlessly. The ISTSM is evaluated using a synthetic dataset and a real dataset for a shopping mall. The experiment with the synthetic dataset reveals that the ISTSM is more accurate and reasonable than three other popular trajectory similarity measures, namely the longest common subsequence (LCSS), edit distance on real sequence (EDR), and the multidimensional similarity measure (MSM). The case study of a shopping mall shows that the ISTSM effectively reveals the movement patterns of indoor customers.
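For readers unfamiliar with the edit distance that the ISTSM builds on, the following is a minimal sketch of the classic unit-cost edit distance applied to sequences of visited places. It is not the ISTSM itself: the place labels are hypothetical, and the indoor walking distances and semantic information that the paper fuses into the cost are not modeled here.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            substitution = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,                 # delete x from a
                curr[j - 1] + 1,             # insert y into a
                prev[j - 1] + substitution,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical indoor trajectories as sequences of visited places.
visit_a = ["entrance", "cafe", "bookstore", "exit"]
visit_b = ["entrance", "bookstore", "exit"]
distance = edit_distance(visit_a, visit_b)
```

Replacing the fixed unit costs with costs derived from walking distance or semantic difference is the general direction the abstract describes, although the exact cost function is not given there.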


Trajectory Similarity Analysis with the Weight of Direction and k-Neighborhood for AIS Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10110757 ◽

2021 ◽

Vol 10 (11) ◽

pp. 757

Author(s):

Pin Nie ◽

Zhenjie Chen ◽

Nan Xia ◽

Qiuhao Huang ◽

Feixue Li

Keyword(s):

Traffic Management ◽

Similarity Analysis ◽

Automatic Identification ◽

Identification System ◽

Trajectory Data ◽

Motion Direction ◽

Maritime Traffic ◽

A Cell ◽

Robustness To Noise ◽

Trajectory Similarity

Automatic Identification System (AIS) data have been widely used in many fields, such as collision detection, navigation, and maritime traffic management. Similarity analysis is an important process for most AIS trajectory analysis topics. However, most traditional AIS trajectory similarity analysis methods calculate the distance between trajectory points, which requires complex and time-consuming calculations, often leading to substantial errors when processing AIS trajectory data characterized by substantial differences in length or uneven trajectory points. Therefore, we propose a cell-based similarity analysis method that combines the weight of the direction and k-neighborhood (WDN-SIM). This method quantifies the similarity between trajectories based on the degree of proximity and differences in motion direction. In terms of its effectiveness and efficiency, WDN-SIM outperformed seven traditional methods for trajectory similarity analysis. Particularly, WDN-SIM has a high robustness to noise and can distinguish the similarities between trajectories under complex situations, such as when there are opposing directions of motion, large differences in length, and uneven point distributions.


DANNP: an efficient artificial neural network pruning tool

PeerJ Computer Science ◽

10.7717/peerj-cs.137 ◽

2017 ◽

Vol 3 ◽

pp. e137 ◽

Cited By ~ 7

Author(s):

Mona Alshahrani ◽

Othman Soufan ◽

Arturo Magana-Mora ◽

Vladimir B. Bajic

Keyword(s):

Neural Network ◽

State Of The Art ◽

Model Performance ◽

Training Data ◽

Classification Problems ◽

Link Type ◽

On Line ◽

Pruning Algorithms ◽

Artificial Neural ◽

The Impact

Background: Artificial neural networks (ANNs) are a robust class of machine learning models and are a frequent choice for solving classification problems. However, determining the structure of the ANNs is not trivial as a large number of weights (connection links) may lead to overfitting the training data. Although several ANN pruning algorithms have been proposed for the simplification of ANNs, these algorithms are not able to efficiently cope with intricate ANN structures required for complex classification problems. Methods: We developed DANNP, a web-based tool, that implements parallelized versions of several ANN pruning algorithms. The DANNP tool uses a modified version of the Fast Compressed Neural Network software implemented in C++ to considerably enhance the running time of the ANN pruning algorithms we implemented. In addition to the performance evaluation of the pruned ANNs, we systematically compared the set of features that remained in the pruned ANN with those obtained by different state-of-the-art feature selection (FS) methods. Results: Although the ANN pruning algorithms are not entirely parallelizable, DANNP was able to speed up the ANN pruning up to eight times on a 32-core machine, compared to the serial implementations. To assess the impact of the ANN pruning by DANNP tool, we used 16 datasets from different domains. In eight out of the 16 datasets, DANNP significantly reduced the number of weights by 70%–99%, while maintaining a competitive or better model performance compared to the unpruned ANN. Finally, we used a naïve Bayes classifier derived with the features selected as a byproduct of the ANN pruning and demonstrated that its accuracy is comparable to those obtained by the classifiers trained with the features selected by several state-of-the-art FS methods. The FS ranking methodology proposed in this study allows the users to identify the most discriminant features of the problem at hand. To the best of our knowledge, DANNP (publicly available at www.cbrc.kaust.edu.sa/dannp) is the only available and on-line accessible tool that provides multiple parallelized ANN pruning options. Datasets and DANNP code can be obtained at www.cbrc.kaust.edu.sa/dannp/data.php and https://doi.org/10.5281/zenodo.1001086.
