Weighted z-Distance-Based Clustering and Its Application to Time-Series Data

2019 ◽  
Vol 9 (24) ◽  
pp. 5469
Author(s):  
Zhao-Yu Wang ◽  
Chen-Yu Wu ◽  
Yan-Ting Lin ◽  
Shie-Jue Lee

Clustering is the practice of dividing given data into similar groups and is one of the most widely used methods for unsupervised learning. Lee and Ouyang proposed a self-constructing clustering (SCC) method in which the user specifies the similarity threshold in advance, instead of the number of clusters. For a given set of instances, SCC performs only one training cycle on those instances. Once an instance has been assigned to a cluster, the assignment is never changed afterwards. The clusters produced may therefore depend on the order in which the instances are considered, and assignment errors are more likely to occur. Also, all dimensions are equally weighted, which may not be suitable for certain applications, e.g., time-series clustering. In this paper, the following improvements are proposed. Two or more training cycles are performed on the instances, and an instance can be re-assigned to another cluster in each cycle, so the clusters produced are less likely to be affected by the feeding order of the instances. In addition, each input dimension can be weighted differently in the clustering process, and the weight values are learned adaptively from the data. Experiments on a number of real-world benchmark datasets demonstrate the effectiveness of the proposed ideas.
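The abstract does not give the exact z-distance or the weight-learning rule, so the following is only a minimal sketch of the general idea under stated assumptions: a threshold-based incremental clustering in which each instance is taken out of its cluster and re-assigned in every cycle, and the distance to a cluster is a hypothetical weighted sum of squared per-dimension z-scores. The weights are passed in as fixed numbers (learning them from data is omitted), and `sigma0`, a floor on each cluster's standard deviation so that singleton clusters have a usable spread, is an assumption of this sketch.

```python
import math

def cluster_stats(members, sigma0):
    """Per-dimension mean and standard deviation, with the deviation floored at sigma0."""
    n, dims = len(members), len(members[0])
    mean = [sum(p[j] for p in members) / n for j in range(dims)]
    std = [max(math.sqrt(sum((p[j] - mean[j]) ** 2 for p in members) / n), sigma0)
           for j in range(dims)]
    return mean, std

def weighted_z_distance(x, mean, std, weights):
    """Hypothetical weighted z-distance: weighted sum of squared per-dimension z-scores."""
    return math.sqrt(sum(w * ((xi - m) / s) ** 2
                         for xi, m, s, w in zip(x, mean, std, weights)))

def scc_multi_cycle(data, weights, threshold, sigma0=1.0, cycles=3):
    """Threshold-based incremental clustering with re-assignment in every cycle."""
    labels = [None] * len(data)
    next_id = 0
    for _ in range(cycles):
        changed = False
        for i, x in enumerate(data):
            old = labels[i]
            labels[i] = None  # take x out of its cluster before comparing
            clusters = {}
            for k, lab in enumerate(labels):
                if lab is not None:
                    clusters.setdefault(lab, []).append(data[k])
            best, best_d = None, math.inf
            for lab, members in clusters.items():
                mean, std = cluster_stats(members, sigma0)
                d = weighted_z_distance(x, mean, std, weights)
                if d < best_d:
                    best, best_d = lab, d
            if best is None or best_d > threshold:
                # No cluster is similar enough: reuse x's emptied cluster or open a new one
                if old is not None and old not in clusters:
                    best = old
                else:
                    best, next_id = next_id, next_id + 1
            labels[i] = best
            changed = changed or best != old
        if not changed:  # a full cycle without re-assignments: stop early
            break
    # Renumber clusters 0, 1, 2, ... in order of first appearance
    remap = {}
    return [remap.setdefault(lab, len(remap)) for lab in labels]

data = [(0, 0), (0.1, 0.2), (0.2, 0.1), (10, 10), (10.1, 9.9), (9.8, 10.2)]
print(scc_multi_cycle(data, weights=[1, 1], threshold=3.0))  # [0, 0, 0, 1, 1, 1]
```

Setting a weight to 0 makes the corresponding dimension irrelevant to cluster membership, which is the kind of per-dimension control the paper motivates for time-series data.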


2014 ◽  
Vol 2014 ◽  
pp. 1-19 ◽  
Author(s):  
Seyedjamal Zolhavarieh ◽  
Saeed Aghabozorgi ◽  
Ying Wah Teh

Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and background related to subsequence time series clustering. The reviewed literature is categorized into three groups: the preproof, interproof, and postproof periods. Moreover, various state-of-the-art approaches to subsequence time series clustering are discussed under each of these categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.



Author(s):  
Pēteris Grabusts ◽  
Arkady Borisov

Clustering Methodology for Time Series Mining

A time series is a sequence of real values representing the measurements of a real variable at successive time intervals. Time series analysis is a well-known task; however, in recent years research has been carried out on applying clustering to time series analysis. The main motivation for representing a time series in the form of clusters is to better capture the main characteristics of the data. The central goal of this paper was to investigate clustering methodology for time series data mining, to explore time series similarity measures, and to use them in analysing time series clustering results. More sophisticated similarity measures include the Longest Common Subsequence (LCSS) method. Two tasks were completed. The first was to define time series similarity measures; it was established that the LCSS method detects time series similarity better than the Euclidean distance. The second was to explore the capabilities of the classical k-means algorithm for time series clustering. The experiment showed that the results of time series clustering with the k-means algorithm agree with the results obtained with the LCSS method, so the clustering results for the specific time series are adequate.
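To show why LCSS can behave differently from the Euclidean distance, here is a minimal sketch of the common real-valued LCSS formulation: two points match when they differ by at most `eps`, and an optional `delta` limits how far apart in time matched points may be. The normalisation by the shorter length and the toy series are assumptions for illustration; the paper's exact parameter settings are not stated in the abstract.

```python
import math

def lcss_length(a, b, eps, delta=None):
    """Length of the longest common subsequence of two real-valued series."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close_in_time = delta is None or abs(i - j) <= delta
            if close_in_time and abs(a[i - 1] - b[j - 1]) <= eps:
                dp[i][j] = dp[i - 1][j - 1] + 1  # points match: extend the subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

def lcss_similarity(a, b, eps, delta=None):
    """LCSS length normalised to [0, 1] by the shorter series."""
    return lcss_length(a, b, eps, delta) / min(len(a), len(b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [0, 1, 2, 3, 2, 1, 0, 0]
shifted = [0, 0, 1, 2, 3, 2, 1, 0]   # same shape, one step later
spiked = [0, 1, 2, 30, 2, 1, 0, 0]   # same shape with one outlier
print(lcss_similarity(a, shifted, eps=0.1), euclidean(a, shifted))  # 0.875 2.449...
print(lcss_similarity(a, spiked, eps=0.1), euclidean(a, spiked))    # 0.875 27.0
```

A time shift or a single outlier inflates the Euclidean distance, while LCSS simply skips the unmatched points, which is one plausible reason it detected similarity better in the paper's experiments.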



2021 ◽  
Vol 5 (6) ◽  
pp. 840-854
Author(s):  
Jesmeen M. Z. H. ◽  
J. Hossen ◽  
Azlan Bin Abd. Aziz

Recent years have seen significant growth in the adoption of smart home devices. Such deployments rely on a Smart Home System to visualise and analyse the resulting time series. However, system developers face several challenges, such as data quality and data anomaly issues. These anomalies can be due to technical or non-technical faults. It is essential to detect non-technical faults, as they may incur economic costs. In this study, the main objective is to overcome the challenge of training learning models on an unlabelled dataset. Another important consideration is to train the model to discriminate abnormal consumption from seasonal consumption. This paper proposes a system that uses unsupervised learning for time-series data in the smart home environment. First, data were collected from a real-time scenario. Next, seasonal features were generated from the time domain, and the feature-reduction technique PCA was applied to reduce them to two dimensions. These data were then passed through four well-known unsupervised learning models, which were evaluated using the Excess Mass and Mass-Volume methods. The results indicate that LOF (Local Outlier Factor) tends to outperform the other models in detecting anomalies in electricity consumption. The proposed model was further evaluated on a benchmark anomaly dataset, which showed that the system can also work in other fields containing time-series data. The model separates the data into anomalous and normal points. The developed anomaly detector is designed to detect all anomalies as soon as possible, triggering real alarms in real time for energy-consumption time series. It can also adapt to changing values automatically. DOI: 10.28991/esj-2021-01314
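Since LOF performed best in this study, a from-scratch sketch of the standard Local Outlier Factor may help clarify what it measures: each point's local density compared with the density of its neighbours. It assumes the features have already been reduced to two dimensions (the PCA step is not shown), uses exactly `k` nearest neighbours without expanding distance ties, and adds a small constant to avoid division by zero for duplicate points. The feature values are made up. In practice, scikit-learn's `sklearn.neighbors.LocalOutlierFactor` provides a tested implementation.

```python
import math

def local_outlier_factor(points, k=3):
    """Return the LOF score of every point; scores well above 1 suggest outliers."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Exactly k nearest neighbours per point (distance ties are not expanded)
    neighbours = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
                  for i in range(n)]
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_distance[o], dist[i][o]) for o in neighbours[i]]
        lrd.append(1.0 / (sum(reach) / k + 1e-10))  # small constant guards duplicates
    # LOF: average neighbour density relative to the point's own density
    return [sum(lrd[o] for o in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical 2-D features (e.g. after PCA): five ordinary days and one unusual day
features = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = local_outlier_factor(features, k=3)
print([round(s, 2) for s in scores])
```

Points inside the dense group score close to 1, while the isolated point scores far higher; a threshold on the score then separates anomalous from normal points.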



2020 ◽  
Vol 34 (04) ◽  
pp. 4683-4690 ◽  
Author(s):  
Shuheng Li ◽  
Dezhi Hong ◽  
Hongning Wang

Smart Building Technologies hold promise for better livability for residents and lower energy footprints. Yet the rollout of these technologies, from demand response controls to fault detection and diagnosis, significantly lags behind and is impeded by the current practice of manually identifying sensing point relationships, e.g., how equipment is connected or which sensors are co-located in the same space. This manual process is costly, laborious, and still error-prone. We study relation inference among sensor time series. Our key insight is that, as equipment is connected or sensors are co-located in the same physical environment, they are affected by the same real-world events, e.g., a fan turning on or a person entering the room, and thus exhibit correlated changes in their time series data. To this end, we develop a deep metric learning solution that first converts the primitive sensor time series to the frequency domain and then optimizes a representation of sensors that encodes their relations. Building on the learned representation, our solution pinpoints the relationships among sensors by solving a combinatorial optimization problem. Extensive experiments on real-world buildings demonstrate the effectiveness of our solution.
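The key insight (co-located sensors show correlated changes) can be illustrated with a deliberately simple baseline. This sketch is not the paper's method: the frequency-domain conversion and the deep metric learning are replaced by plain Pearson correlation of first differences, and the combinatorial step is replaced by a brute-force one-to-one assignment that is only practical for a handful of sensors (`scipy.optimize.linear_sum_assignment` solves the same assignment problem in polynomial time). The sensor readings are hypothetical.

```python
import itertools
import math

def diffs(series):
    """First differences: the changes that shared events would cause."""
    return [b - a for a, b in zip(series, series[1:])]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def match_sensors(group_a, group_b):
    """Pair each sensor in group_a with a distinct sensor in group_b (equal sizes),
    maximising the total correlation of their changes by brute force."""
    da = [diffs(s) for s in group_a]
    db = [diffs(s) for s in group_b]
    score = [[pearson(x, y) for y in db] for x in da]
    best = max(itertools.permutations(range(len(group_b))),
               key=lambda perm: sum(score[i][j] for i, j in enumerate(perm)))
    return list(best)

# Hypothetical rooms: room 0 has events at steps 2 and 6, room 1 at steps 4 and 8
temp_room0 = [20, 20, 21, 21, 21, 21, 22, 22, 22, 22]
temp_room1 = [20, 20, 20, 20, 21, 21, 21, 21, 22, 22]
flow_room0 = [5, 5, 8, 8, 8, 8, 11, 11, 11, 11]
flow_room1 = [5, 5, 5, 5, 8, 8, 8, 8, 11, 11]
# group_b is deliberately listed in the "wrong" order
print(match_sensors([temp_room0, temp_room1], [flow_room1, flow_room0]))
```

Each temperature sensor is paired with the airflow sensor whose changes happen at the same time steps, even though the two sensor types have different units and scales.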



Kybernetes ◽  
2019 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Hossein Abbasimehr ◽  
Mostafa Shabani

Purpose The purpose of this paper is to propose a new methodology that handles the dynamic behavior of customers over time.
Design/methodology/approach A new methodology based on time series clustering is presented to extract the dominant behavioral patterns of customers over time. The methodology is implemented using bank customers’ transaction data, which are in the form of time series. The data include the recency (R), frequency (F) and monetary (M) attributes of businesses that use the bank's point-of-sale (POS) devices. These data were obtained from the data analysis department of the bank.
Findings An empirical study on the transaction data of 2,531 business customers using the bank's POS devices revealed the dominant behavioral trends through the proposed methodology. The obtained trends were analyzed from a marketing viewpoint. Based on the monetary attribute, customers were divided into four main segments: high-value growing customers, middle-value growing customers, prone-to-churn customers and churners. For each group with a distinctive trend, effective and practical marketing recommendations were devised to improve the bank's relationship with that group. The prone-to-churn segment contains most of the customers; therefore, the bank should run attractive promotions to retain this segment.
Practical implications The discovered trends of customer behavior and the proposed marketing recommendations can help banks devise segment-specific marketing strategies, as they illustrate the dynamic behavior of customers over time. The obtained trends are visualized so that banks can easily interpret and use them. This paper contributes to the literature on customer relationship management (CRM), as the proposed methodology can be effectively applied to different businesses to reveal trends in customer behavior.
Originality/value In the current business environment, customer behavior changes continually over time, and customers churn because of reduced switching costs. Therefore, choosing an effective customer segmentation methodology that accounts for the dynamic behavior of customers is essential for every business. This paper proposes a new methodology to capture customer dynamic behavior using time series clustering on time-ordered data. This is an improvement over previous studies, which have often adopted static segmentation approaches. To the best of the authors’ knowledge, this is the first study that combines the recency, frequency, and monetary model and time series clustering to reveal trends in customer behavior.
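Turning raw POS transactions into per-customer RFM time series is the step that makes time series clustering possible here. The abstract does not define the period length or how recency is measured, so this sketch assumes fixed-length periods (in days) and defines recency as the number of days from the customer's most recent transaction up to the end of each period (`None` if the customer has not transacted yet). The transactions are hypothetical.

```python
from collections import defaultdict

def rfm_series(transactions, period_days, n_periods):
    """Return {customer: [(R, F, M), ...]} with one triple per period."""
    by_customer = defaultdict(list)
    for customer, day, amount in transactions:
        by_customer[customer].append((day, amount))
    result = {}
    for customer, txs in by_customer.items():
        txs.sort()
        series = []
        for p in range(n_periods):
            start, end = p * period_days, (p + 1) * period_days - 1
            in_period = [(d, a) for d, a in txs if start <= d <= end]
            past_days = [d for d, _ in txs if d <= end]
            recency = end - past_days[-1] if past_days else None
            frequency = len(in_period)
            monetary = sum(a for _, a in in_period)
            series.append((recency, frequency, monetary))
        result[customer] = series
    return result

# (customer, day number, amount): hypothetical POS transactions
transactions = [("A", 0, 100), ("A", 5, 50), ("A", 40, 30), ("B", 35, 200)]
for customer, series in rfm_series(transactions, period_days=30, n_periods=2).items():
    print(customer, series)
```

The monetary component of each customer's series (e.g. `[150, 30]` for customer A) is the kind of sequence that could then be fed to a time series clustering algorithm to find growing or declining trends.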



2016 ◽  
Vol 10 (04) ◽  
pp. 461-501 ◽  
Author(s):  
Om Prasad Patri ◽  
Anand V. Panangadan ◽  
Vikrambhai S. Sorathia ◽  
Viktor K. Prasanna

Detecting and responding to real-world events is an integral part of any enterprise or organization, but Semantic Computing has been largely underutilized for complex event processing (CEP) applications. A primary reason for this gap is the difference in the level of abstraction between the high-level semantic models for events and the low-level raw data values received from sensor data streams. In this work, we investigate the need for Semantic Computing in various aspects of CEP, and aim to bridge this gap by utilizing recent advances in time series analytics and machine learning. We build upon the Process-oriented Event Model, which provides a formal approach to model real-world objects and events, and specifies the process of moving from sensors to events. We extend this model to facilitate Semantic Computing and time series data mining directly over the sensor data, which has the advantage of automatically learning the required background knowledge without domain expertise. We illustrate the expressive power of our model in case studies from diverse applications, with particular emphasis on non-intrusive load monitoring in smart energy grids. We also demonstrate that this powerful semantic representation is still highly accurate and performs on par with existing approaches for event detection and classification.
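The abstract does not describe how raw readings become events, but a classic starting point in non-intrusive load monitoring is edge detection: a large step change in aggregate power is treated as an appliance switching on or off. The sketch below is a generic illustration of that low-level step only, not the Process-oriented Event Model; the threshold and wattage values are made up.

```python
def detect_step_events(power, threshold):
    """Return (index, 'on'/'off', delta) for every step change of at least threshold watts."""
    events = []
    for t in range(1, len(power)):
        delta = power[t] - power[t - 1]
        if abs(delta) >= threshold:
            events.append((t, "on" if delta > 0 else "off", delta))
    return events

# Hypothetical aggregate power readings (W): a ~1.5 kW appliance runs for three samples
power = [100, 102, 101, 1601, 1603, 1600, 105, 103]
print(detect_step_events(power, threshold=500))
# [(3, 'on', 1500), (6, 'off', -1495)]
```

Small fluctuations stay below the threshold, so only the two appliance transitions become discrete events that a higher-level semantic model could then label and reason about.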



Author(s):  
Xiaosheng Li ◽  
Jessica Lin ◽  
Liang Zhao

With the increasing power of data storage and advances in data generation and collection technologies, large volumes of time series data have become available, and their content changes rapidly. Data mining methods therefore need low time complexity to handle such huge, fast-changing data. This paper presents a novel time series clustering algorithm with linear time complexity. The proposed algorithm partitions the data by checking for a number of randomly selected symbolic patterns in the time series. Theoretical analysis shows that group structures in the data can be revealed by this process. We evaluate the proposed algorithm extensively on all 85 datasets from the well-known UCR time series archive and compare it with state-of-the-art approaches using statistical analysis. The results show that the proposed method is faster and achieves better accuracy than the rival methods.
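As an illustration of "partitioning by randomly selected symbolic patterns", here is a simplified sketch. It assumes the symbolic patterns are SAX words (z-normalisation, piecewise aggregate approximation, then Gaussian breakpoints) extracted from sliding windows; each series is described by which of a few randomly chosen words it contains, and series with the same presence signature are grouped. How the paper selects patterns, combines partitions, and controls the number of clusters is not given in the abstract, so all of that is hypothetical here. This naive version also recomputes every window from scratch, so unlike the paper's algorithm it is not linear-time.

```python
import random
import string
from statistics import NormalDist, mean, pstdev

def znorm(ts):
    m, s = mean(ts), pstdev(ts)
    return [(x - m) / s for x in ts] if s > 0 else [0.0] * len(ts)

def paa(ts, segments):
    """Piecewise aggregate approximation: the mean of each of `segments` chunks."""
    n = len(ts)
    return [mean(ts[i * n // segments:(i + 1) * n // segments]) for i in range(segments)]

def sax_word(window, segments, alphabet_size):
    """SAX: letters chosen by equiprobable breakpoints of the standard normal."""
    breakpoints = [NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    return "".join(string.ascii_lowercase[sum(v > b for b in breakpoints)]
                   for v in paa(znorm(window), segments))

def sax_words(ts, window, segments, alphabet_size):
    return {sax_word(ts[i:i + window], segments, alphabet_size)
            for i in range(len(ts) - window + 1)}

def random_pattern_partition(dataset, window, segments, alphabet_size, n_patterns, seed=0):
    """Group series by which of a few random SAX words they contain (hypothetical scheme)."""
    bags = [sax_words(ts, window, segments, alphabet_size) for ts in dataset]
    vocabulary = sorted(set().union(*bags))
    patterns = random.Random(seed).sample(vocabulary, min(n_patterns, len(vocabulary)))
    groups = {}
    for idx, bag in enumerate(bags):
        signature = tuple(p in bag for p in patterns)
        groups.setdefault(signature, []).append(idx)
    return list(groups.values())

dataset = [list(range(12)), [2 * x + 5 for x in range(12)],        # rising series
           list(range(12, 0, -1)), [3 * (12 - x) for x in range(12)]]  # falling series
print(random_pattern_partition(dataset, window=6, segments=3, alphabet_size=3, n_patterns=2))
```

Because each window is z-normalised, the two rising series map to the same word ("abc") regardless of offset and scale, and the falling series map to "cba", so the presence signatures separate the two shapes.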



Author(s):  
Pasan Karunaratne ◽  
Masud Moshtaghi ◽  
Shanika Karunasekera ◽  
Aaron Harwood ◽  
Trevor Cohn

In time-series forecasting, regression is a popular method, with Gaussian Process Regression widely held to be the state of the art. The versatility of Gaussian Processes has led to their use in many varied application domains. However, although many real-world applications involve data that follow a working-week structure, where weekends exhibit substantially different behavior from weekdays, methods for explicitly modelling working-week effects in Gaussian Process Regression models have not been proposed. Failing to model the working week explicitly ignores a significant source of information that can be invaluable in forecasting scenarios. In this work we provide novel kernel-combination methods to explicitly model working-week effects in time-series data for more accurate predictions using Gaussian Process Regression. Further, we demonstrate that prediction accuracy can be improved by constraining the non-convex optimization process of finding optimal hyperparameter values. We validate the effectiveness of our methods by performing multi-step prediction on two real-world, publicly available time-series datasets: one relating to electricity Smart Meter data of the University of Melbourne, and the other relating to pedestrian counts in the City of Melbourne.
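The paper's specific kernel combinations are not given in the abstract. As a hypothetical illustration of the general idea, the sketch below multiplies a squared-exponential (RBF) kernel over time by a two-valued "day type" kernel that gives full covariance between days of the same type (weekday/weekday or weekend/weekend) and a reduced covariance `rho` between different types. This is a valid covariance function for |rho| ≤ 1, because the product of two positive semi-definite kernels is positive semi-definite. Hyperparameters are fixed by hand rather than optimised, and the GP posterior mean is computed with plain Gaussian elimination to stay dependency-free.

```python
import math

def working_week_kernel(a, b, lengthscale=2.0, variance=1.0, rho=0.2):
    """Hypothetical kernel: RBF over time times a weekday/weekend similarity."""
    (t1, weekend1), (t2, weekend2) = a, b
    rbf = variance * math.exp(-((t1 - t2) ** 2) / (2 * lengthscale ** 2))
    day_type = 1.0 if weekend1 == weekend2 else rho
    return rbf * day_type

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def gp_posterior_mean(train_x, train_y, test_x, noise=1e-2, **kernel_args):
    """Zero-mean GP regression: mean = k_*^T (K + noise * I)^{-1} y."""
    n = len(train_x)
    K = [[working_week_kernel(train_x[i], train_x[j], **kernel_args)
          + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(K, train_y)
    return [sum(working_week_kernel(x, train_x[i], **kernel_args) * alpha[i]
                for i in range(n)) for x in test_x]

# Hypothetical daily load: 1.0 on weekdays, 0.2 on weekends (days 5 and 6 of each week)
train_x = [(d, d % 7 in (5, 6)) for d in range(14)]
train_y = [0.2 if weekend else 1.0 for _, weekend in train_x]
future = [(d, d % 7 in (5, 6)) for d in range(14, 21)]
print([round(m, 2) for m in gp_posterior_mean(train_x, train_y, future, lengthscale=3.0)])
```

The design choice here is that weekday observations inform weekend predictions only weakly (scaled by `rho`), so the model does not have to blur the two regimes together as a plain RBF kernel over time would.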



Author(s):  
Jason Chen

Cluster analysis is a tool used widely in the Data Mining community and beyond (Everitt et al. 2001). In essence, the method allows us to “summarise” the information in a large data set X by creating a much smaller set C of representative points (called centroids) and a membership map relating each point in X to its representative in C. An obvious but special type of data set that one might want to cluster is a time series data set. Such data have a temporal ordering on their elements, in contrast to non-time series data sets. In this article we explore the area of time series clustering, focusing mainly on a surprising recent result showing that the traditional method of subsequence time series clustering (clustering the subsequences extracted by a sliding window) is meaningless. We then survey the literature of recent papers and argue how time series clustering can be made meaningful.
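The full "meaningless" argument is not reproduced here, but one ingredient behind it is easy to check: with a sliding window of step 1, almost every point of the series appears at every position of some window, so the average of all subsequences is nearly flat whatever the input looks like. Since the size-weighted average of k-means centroids equals that overall mean, the centroids are strongly constrained regardless of the data. The random-walk series and window length below are arbitrary choices for illustration.

```python
import random

def sliding_windows(series, w):
    """All length-w subsequences, taken with step 1 (the usual subsequence setup)."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]

def mean_window(windows):
    """Position-wise average of a list of equal-length windows."""
    return [sum(win[j] for win in windows) / len(windows) for j in range(len(windows[0]))]

rng = random.Random(0)
walk = [0.0]
for _ in range(999):
    walk.append(walk[-1] + rng.gauss(0, 1))  # random-walk test series

centre = mean_window(sliding_windows(walk, 32))
print("range of series:     ", max(walk) - min(walk))
print("range of mean window:", max(centre) - min(centre))
```

Adjacent positions of the mean window differ only by (x[j + n - w + 1] - x[j]) / (n - w + 1), so the mean window's range is at most (w - 1) / (n - w + 1) times the range of the series, here about 3%.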


