scholarly journals Similarity Preserving Representation Learning for Time Series Clustering

Author(s):  
Qi Lei ◽  
Jinfeng Yi ◽  
Roman Vaculin ◽  
Lingfei Wu ◽  
Inderjit S. Dhillon

A considerable amount of clustering algorithms take instance-feature matrices as their inputs. As such, they cannot directly analyze time series data due to its temporal nature, usually unequal lengths, and complex properties. This is a great pity since many of these algorithms are effective, robust, efficient, and easy to use. In this paper, we bridge this gap by proposing an efficient representation learning framework that is able to convert a set of time series with various lengths to an instance-feature matrix. In particular, we guarantee that the pairwise similarities between time series are well preserved after the transformation , thus the learned feature representation is particularly suitable for the time series clustering task. Given a set of $n$ time series, we first construct an $n\times n$ partially-observed similarity matrix by randomly sampling $\mathcal{O}(n \log n)$ pairs of time series and computing their pairwise similarities. We then propose an efficient algorithm that solves a non-convex and NP-hard problem to learn new features based on the partially-observed similarity matrix. By conducting extensive empirical studies, we demonstrate that the proposed framework is much more effective, efficient, and flexible compared to other state-of-the-art clustering methods.

2016 ◽  
Vol 50 (1) ◽  
pp. 41-57 ◽  
Author(s):  
Linghe Huang ◽  
Qinghua Zhu ◽  
Jia Tina Du ◽  
Baozhen Lee

Purpose – Wiki is a new form of information production and organization, which has become one of the most important knowledge resources. In recent years, with the increase of users in wikis, “free rider problem” has been serious. In order to motivate editors to contribute more to a wiki system, it is important to fully understand their contribution behavior. The purpose of this paper is to explore the law of dynamic contribution behavior of editors in wikis. Design/methodology/approach – After developing a dynamic model of contribution behavior, the authors employed both the metrological and clustering methods to process the time series data. The experimental data were collected from Baidu Baike, a renowned Chinese wiki system similar to Wikipedia. Findings – There are four categories of editors: “testers,” “dropouts,” “delayers” and “stickers.” Testers, who contribute the least content and stop contributing rapidly after editing a few articles. After editing a large amount of content, dropouts stop contributing completely. Delayers are the editors who do not stop contributing during the observation time, but they may stop contributing in the near future. Stickers, who keep contributing and edit the most content, are the core editors. In addition, there are significant time-of-day and holiday effects on the number of editors’ contributions. Originality/value – By using the method of time series analysis, some new characteristics of editors and editor types were found. Compared with the former studies, this research also had a larger sample. Therefore, the results are more scientific and representative and can help managers to better optimize the wiki systems and formulate incentive strategies for editors.


2014 ◽  
Vol 2014 ◽  
pp. 1-19 ◽  
Author(s):  
Seyedjamal Zolhavarieh ◽  
Saeed Aghabozorgi ◽  
Ying Wah Teh

Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.


Author(s):  
Pēteris Grabusts ◽  
Arkady Borisov

Clustering Methodology for Time Series MiningA time series is a sequence of real data, representing the measurements of a real variable at time intervals. Time series analysis is a sufficiently well-known task; however, in recent years research has been carried out with the purpose to try to use clustering for the intentions of time series analysis. The main motivation for representing a time series in the form of clusters is to better represent the main characteristics of the data. The central goal of the present research paper was to investigate clustering methodology for time series data mining, to explore the facilities of time series similarity measures and to use them in the analysis of time series clustering results. More complicated similarity measures include Longest Common Subsequence method (LCSS). In this paper, two tasks have been completed. The first task was to define time series similarity measures. It has been established that LCSS method gives better results in the detection of time series similarity than the Euclidean distance. The second task was to explore the facilities of the classical k-means clustering algorithm in time series clustering. As a result of the experiment a conclusion has been drawn that the results of time series clustering with the help of k-means algorithm correspond to the results obtained with LCSS method, thus the clustering results of the specific time series are adequate.


2022 ◽  
Vol 3 (1) ◽  
pp. 1-26
Author(s):  
Omid Hajihassani ◽  
Omid Ardakanian ◽  
Hamzeh Khazaei

The abundance of data collected by sensors in Internet of Things devices and the success of deep neural networks in uncovering hidden patterns in time series data have led to mounting privacy concerns. This is because private and sensitive information can be potentially learned from sensor data by applications that have access to this data. In this article, we aim to examine the tradeoff between utility and privacy loss by learning low-dimensional representations that are useful for data obfuscation. We propose deterministic and probabilistic transformations in the latent space of a variational autoencoder to synthesize time series data such that intrusive inferences are prevented while desired inferences can still be made with sufficient accuracy. In the deterministic case, we use a linear transformation to move the representation of input data in the latent space such that the reconstructed data is likely to have the same public attribute but a different private attribute than the original input data. In the probabilistic case, we apply the linear transformation to the latent representation of input data with some probability. We compare our technique with autoencoder-based anonymization techniques and additionally show that it can anonymize data in real time on resource-constrained edge devices.


Author(s):  
Nobuhiko TSUDA ◽  
Yukihiro HAMASUNA ◽  
Yasunori ENDO

2021 ◽  
Author(s):  
◽  
Ali Alqahtani

The use of deep learning has grown increasingly in recent years, thereby becoming a much-discussed topic across a diverse range of fields, especially in computer vision, text mining, and speech recognition. Deep learning methods have proven to be robust in representation learning and attained extraordinary achievement. Their success is primarily due to the ability of deep learning to discover and automatically learn feature representations by mapping input data into abstract and composite representations in a latent space. Deep learning’s ability to deal with high-level representations from data has inspired us to make use of learned representations, aiming to enhance unsupervised clustering and evaluate the characteristic strength of internal representations to compress and accelerate deep neural networks.Traditional clustering algorithms attain a limited performance as the dimensionality in-creases. Therefore, the ability to extract high-level representations provides beneficial components that can support such clustering algorithms. In this work, we first present DeepCluster, a clustering approach embedded in a deep convolutional auto-encoder. We introduce two clustering methods, namely DCAE-Kmeans and DCAE-GMM. The DeepCluster allows for data points to be grouped into their identical cluster, in the latent space, in a joint-cost function by simultaneously optimizing the clustering objective and the DCAE objective, producing stable representations, which is appropriate for the clustering process. Both qualitative and quantitative evaluations of proposed methods are reported, showing the efficiency of deep clustering on several public datasets in comparison to the previous state-of-the-art methods.Following this, we propose a new version of the DeepCluster model to include varying degrees of discriminative power. This introduces a mechanism which enables the imposition of regularization techniques and the involvement of a supervision component. The key idea of our approach is to distinguish the discriminatory power of numerous structures when searching for a compact structure to form robust clusters. The effectiveness of injecting various levels of discriminatory powers into the learning process is investigated alongside the exploration and analytical study of the discriminatory power obtained through the use of two discriminative attributes: data-driven discriminative attributes with the support of regularization techniques, and supervision discriminative attributes with the support of the supervision component. An evaluation is provided on four different datasets.The use of neural networks in various applications is accompanied by a dramatic increase in computational costs and memory requirements. Making use of the characteristic strength of learned representations, we propose an iterative pruning method that simultaneously identifies the critical neurons and prunes the model during training without involving any pre-training or fine-tuning procedures. We introduce a majority voting technique to compare the activation values among neurons and assign a voting score to evaluate their importance quantitatively. This mechanism effectively reduces model complexity by eliminating the less influential neurons and aims to determine a subset of the whole model that can represent the reference model with much fewer parameters within the training process. Empirically, we demonstrate that our pruning method is robust across various scenarios, including fully-connected networks (FCNs), sparsely-connected networks (SCNs), and Convolutional neural networks (CNNs), using two public datasets.Moreover, we also propose a novel framework to measure the importance of individual hidden units by computing a measure of relevance to identify the most critical filters and prune them to compress and accelerate CNNs. Unlike existing methods, we introduce the use of the activation of feature maps to detect valuable information and the essential semantic parts, with the aim of evaluating the importance of feature maps, inspired by novel neural network interpretability. A majority voting technique based on the degree of alignment between a se-mantic concept and individual hidden unit representations is utilized to evaluate feature maps’ importance quantitatively. We also propose a simple yet effective method to estimate new convolution kernels based on the remaining crucial channels to accomplish effective CNN compression. Experimental results show the effectiveness of our filter selection criteria, which outperforms the state-of-the-art baselines.To conclude, we present a comprehensive, detailed review of time-series data analysis, with emphasis on deep time-series clustering (DTSC), and a founding contribution to the area of applying deep clustering to time-series data by presenting the first case study in the context of movement behavior clustering utilizing the DeepCluster method. The results are promising, showing that the latent space encodes sufficient patterns to facilitate accurate clustering of movement behaviors. Finally, we identify state-of-the-art and present an outlook on this important field of DTSC from five important perspectives.


2019 ◽  
Author(s):  
Srishti Mishra ◽  
Zohair Shafi ◽  
Santanu Pathak

Data driven decision making is becoming increasingly an important aspect for successful business execution. More and more organizations are moving towards taking informed decisions based on the data that they are generating. Most of this data are in temporal format - time series data. Effective analysis across time series data sets, in an efficient and quick manner is a challenge. The most interesting and valuable part of such analysis is to generate insights on correlation and causation across multiple time series data sets. This paper looks at methods that can be used to analyze such data sets and gain useful insights from it, primarily in the form of correlation and causation analysis. This paper focuses on two methods for doing so, Two Sample Test with Dynamic Time Warping and Hierarchical Clustering and looks at how the results returned from both can be used to gain a better understanding of the data. Moreover, the methods used are meant to work with any data set, regardless of the subject domain and idiosyncrasies of the data set, primarily, a data agnostic approach.


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Saeed Aghabozorgi ◽  
Teh Ying Wah ◽  
Tutut Herawan ◽  
Hamid A. Jalab ◽  
Mohammad Amin Shaygan ◽  
...  

Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using thek-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets.


Sign in / Sign up

Export Citation Format

Share Document