Time series event correlation with DTW and Hierarchical Clustering methods

10.7287/peerj.preprints.27959 ◽

2019 ◽

Author(s):

Srishti Mishra ◽

Zohair Shafi ◽

Santanu Pathak

Keyword(s):

Time Series ◽

Hierarchical Clustering ◽

Time Series Data ◽

Series Data ◽

Data Sets ◽

Multiple Time ◽

Clustering Methods ◽

Event Correlation ◽

Data Set ◽

Causation Analysis

Data-driven decision making is becoming an increasingly important aspect of successful business execution. More and more organizations are moving towards making informed decisions based on the data they generate. Most of this data is temporal, that is, time series data. Analyzing multiple time series data sets quickly and efficiently is a challenge. The most interesting and valuable part of such analysis is generating insights on correlation and causation across multiple time series data sets. This paper looks at methods that can be used to analyze such data sets and gain useful insights from them, primarily in the form of correlation and causation analysis. It focuses on two methods, the Two Sample Test with Dynamic Time Warping and Hierarchical Clustering, and looks at how the results returned by both can be used to gain a better understanding of the data. Moreover, the methods are meant to work with any data set, regardless of the subject domain and idiosyncrasies of the data set: a data-agnostic approach.
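The combination named in the abstract, Dynamic Time Warping (DTW) distances fed into hierarchical clustering, can be illustrated with a minimal sketch. This is not the authors' implementation: the absolute-difference cost, the average-linkage rule, the function names `dtw_distance` and `agglomerate`, and the toy series are all assumptions made for illustration, and the Two Sample Test step is not shown.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with an absolute-difference point cost (assumed)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def agglomerate(series, n_clusters):
    """Average-linkage agglomerative clustering over pairwise DTW distances.

    `series` maps a (hypothetical) series name to a list of numbers.
    """
    names = list(series)
    dist = {
        frozenset((p, q)): dtw_distance(series[p], series[q])
        for p, q in combinations(names, 2)
    }

    def linkage(c1, c2):
        total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average DTW distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Toy data: a time-shifted copy should cluster with its original.
series = {
    "a": [0, 1, 2, 3, 2, 1, 0],
    "a_shifted": [0, 0, 1, 2, 3, 2, 1, 0],
    "b": [5, 5, 5, 5, 5, 5, 5],
    "b_noisy": [5, 5, 6, 5, 5, 4, 5],
}
clusters = agglomerate(series, n_clusters=2)
```

Because DTW lets one point align with several neighbours, the shifted copy of `a` is at distance zero from `a`, which is why a warping-based distance can group events that are correlated but offset in time.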

Remaining Useful Life Prediction Using Temporal Convolution with Attention

AI ◽

10.3390/ai2010005 ◽

2021 ◽

Vol 2 (1) ◽

pp. 48-70

Author(s):

Wei Ming Tan ◽

T. Hui Teo

Keyword(s):

Neural Network ◽

Time Series ◽

Time Series Data ◽

Remaining Useful Life ◽

Sensor Data ◽

Series Data ◽

Multiple Time ◽

Data Set ◽

Form Complex ◽

Useful Life

Prognostic techniques attempt to predict the Remaining Useful Life (RUL) of a subsystem or a component. Such techniques often use sensor data that are periodically measured and recorded into a time series data set. Such multivariate data sets form complex and non-linear inter-dependencies across recorded time steps and between sensors. Many existing prognostic algorithms have started to explore Deep Neural Networks (DNNs) and their effectiveness in the field. Although Deep Learning (DL) techniques outperform traditional prognostic algorithms, the networks are generally complex to deploy or train. This paper proposes a Multi-variable Time Series (MTS) focused approach to prognostics that implements a lightweight Convolutional Neural Network (CNN) with an attention mechanism. The convolution filters extract abstract temporal patterns from the multiple time series, while the attention mechanisms review the information across the time axis and select the relevant information. The results suggest that the proposed method not only produces superior accuracy of RUL estimation but also trains many times faster than the reported works. The advantage of deploying the network is also demonstrated on a lightweight hardware platform, where it is not just much more compact but also more efficient for a resource-restricted environment.

Subsequence Time Series Clustering

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch286 ◽

2011 ◽

pp. 1871-1876

Author(s):

Jason Chen

Keyword(s):

Time Series ◽

Traditional Method ◽

Time Series Data ◽

Large Data ◽

Series Data ◽

Data Sets ◽

Data Set ◽

Time Series Clustering ◽

Mining Community ◽

Representative Points

Clustering analysis is a tool used widely in the Data Mining community and beyond (Everitt et al. 2001). In essence, the method allows us to “summarise” the information in a large data set X by creating a very much smaller set C of representative points (called centroids) and a membership map relating each point in X to its representative in C. An obvious but special type of data set that one might want to cluster is a time series data set. Such data has a temporal ordering on its elements, in contrast to non-time series data sets. In this article we explore the area of time series clustering, focusing mainly on a surprising recent result showing that the traditional method for time series clustering is meaningless. We then survey the literature of recent papers and go on to argue how time series clustering can be made meaningful.

A Review on Anomaly Detection in Time Series

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/571032021 ◽

2021 ◽

Vol 10 (3) ◽

pp. 1895-1900

Keyword(s):

Time Series ◽

Anomaly Detection ◽

Time Series Data ◽

Monitoring Network ◽

Series Data ◽

Data Sets ◽

Data Set ◽

The Past ◽

Machine Health Monitoring ◽

Machine Health

Time series are a very common class of data sets. Time series data are easy to obtain from a wide variety of science and finance applications, and anomaly detection for time series has become a prominent research topic. Anomaly identification covers intrusion detection, theft detection, mistake detection, machine health monitoring, network sensor event detection, and habitat disturbance detection. It is also used for removing suspicious data from a data set before production. This review aims to provide a detailed and organized overview of anomaly detection research. In this article we first define what an anomaly in a time series is, and then briefly describe some of the methods suggested in the past two or three years for detecting anomalies in time series.

Case Studies in Multi-unit Longitudinal Models with Random Coefficients and Patterned Correlation Structure

Austrian Journal of Statistics ◽

10.17713/ajs.v29i2.503 ◽

2016 ◽

Vol 29 (2) ◽

pp. 93-110

Author(s):

Johannes Ledolter

Keyword(s):

Time Series ◽

Model Building ◽

Time Series Data ◽

Random Coefficients ◽

Correlation Structure ◽

Model Specification ◽

Series Data ◽

Data Sets ◽

Economic Time Series ◽

Data Set

Modelling issues in multi-unit longitudinal models with random coefficients and patterned correlation structure are illustrated in the context of three data sets. The first data set deals with short time series data on annual death rates and alcohol consumption of twenty-five European countries. The second data set deals with glaciological time series data on snow temperature at 14 different locations within a small glacier in the Austrian Alps. The third data set consists of annual economic time series on factor productivity, and domestic and foreign research/development (R&D) capital stocks. A practical model building approach, consisting of model specification, estimation, and diagnostic checking, is outlined in the context of these three data sets.

The CH-IRP data set: a decade of fortnightly data on δ2H and δ18O in streamflow and precipitation in Switzerland

Earth System Science Data ◽

10.5194/essd-12-3057-2020 ◽

2020 ◽

Vol 12 (4) ◽

pp. 3057-3066

Author(s):

Maria Staudinger ◽

Stefan Seeger ◽

Barbara Herbstritt ◽

Michael Stoelzle ◽

Jan Seibert ◽

...

Keyword(s):

Time Series ◽

Time Series Data ◽

Isotope Composition ◽

Series Data ◽

Data Repository ◽

Data Sets ◽

Individual Study ◽

Data Set ◽

Flow Pathways

Abstract. The stable isotopes of oxygen and hydrogen, 18O and 2H, provide information on water flow pathways and hydrologic catchment functioning. Here a data set of time series data on precipitation and streamflow isotope composition in medium-sized Swiss catchments, CH-IRP, is presented that is unique in terms of its long-term multi-catchment coverage along an alpine to pre-alpine gradient. The data set comprises fortnightly time series of both δ2H and δ18O as well as deuterium excess from streamflow for 23 sites in Switzerland, together with summary statistics of the sampling at each station. Furthermore, time series of δ18O and δ2H in precipitation are provided for each catchment derived from interpolated data sets from the ISOT, GNIP and ANIP networks. For each station we compiled relevant metadata describing both the sampling conditions and catchment characteristics and climate information. Lab standards and errors are provided, and potentially problematic measurements are indicated to help the user decide on the applicability for individual study purposes. For the future, the measurements are planned to be continued at 14 stations as a long-term isotopic measurement network, and the CH-IRP data set will, thus, continuously be extended. The data set can be downloaded from data repository Zenodo at https://doi.org/10.5281/zenodo.4057967 (Staudinger et al., 2020).

Time-series data dynamic density clustering

Intelligent Data Analysis ◽

10.3233/ida-205459 ◽

2021 ◽

Vol 25 (6) ◽

pp. 1487-1506

Author(s):

Hao Chen ◽

Yu Xia ◽

Yuekai Pan ◽

Qing Yang

Keyword(s):

Time Series ◽

Time Series Data ◽

Cluster Structure ◽

Time Slice ◽

Series Data ◽

Multiple Time ◽

Dynamic Density ◽

Data Set ◽

Time Part ◽

Density Clustering

In many clustering problems, the data are not static. Over time, part of the data is likely to change, for example by being updated or erased. Because of this, the timeline can be divided into multiple time slices, and the data within each time slice are static. The data along the timeline then show a series of dynamic intermediate states. The union of the data from all time slices is called the time-series data. The traditional clustering process does not apply directly to such time-series data, and repeating the clustering process at every time slice is very costly. In this paper, we analyze the transition rules of the data set and cluster structure when one time slice shifts to the next. We find a distinct correlation between the data sets, and a succession of cluster structure, across two adjacent time slices, which can be used to reduce the cost of the whole clustering process. Inspired by this, we propose a dynamic density clustering method (DDC) for time-series data. In the simulations, we choose 6 representative problems to construct the time-series data for testing DDC. The results show that DDC achieves high accuracy on all 6 problems while markedly reducing the overall cost.

Dynamic Query Tools for Time Series Data Sets: Timebox Widgets for Interactive Exploration

Information Visualization ◽

10.1057/palgrave.ivs.9500061 ◽

2004 ◽

Vol 3 (1) ◽

pp. 1-18 ◽

Cited By ~ 176

Author(s):

Harry Hochheiser ◽

Ben Shneiderman

Keyword(s):

Time Series ◽

User Interfaces ◽

Time Series Data ◽

Series Data ◽

Time Interval ◽

Data Sets ◽

Multiple Time ◽

Interactive Exploration ◽

Additional Support ◽

Dynamic Query

Timeboxes are rectangular widgets that can be used in direct-manipulation graphical user interfaces (GUIs) to specify query constraints on time series data sets. Timeboxes are used to specify simultaneously two sets of constraints: given a set of N time series profiles, a timebox covering time periods x1 … x2 (x1 ≤ x2) and values y1 … y2 (y1 ≤ y2) will retrieve only those n ∈ N that have values y1 ≤ y ≤ y2 during all times x1 ≤ x ≤ x2. TimeSearcher is an information visualization tool that combines timebox queries with overview displays, query-by-example facilities, and support for queries over multiple time-varying attributes. Query manipulation tools including pattern inversion and ‘leaders & laggards’ graphical bookmarks provide additional support for interactive exploration of data sets. Extensions to the basic timebox model that provide additional expressivity include variable time timeboxes, which can be used to express queries with variability in the time interval, and angular queries, which search for ranges of differentials, rather than absolute values. Analysis of the algorithmic requirements for providing dynamic query performance for timebox queries showed that a sequential search outperformed searches based on geometric indices. Design studies helped identify the strengths and weaknesses of the query tools. Extended case studies involving the analysis of two different types of data from molecular biology experiments provided valuable feedback and validated the utility of both the timebox model and the TimeSearcher tool. TimeSearcher is available at http://www.cs.umd.edu/hcil/timesearcher
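The timebox constraint described in the abstract can be sketched as a sequential scan over the profiles. This is only an illustration, not TimeSearcher's code: the function name `timebox_query`, the use of integer list positions as time points, and the sample profiles are assumptions.

```python
def timebox_query(profiles, x1, x2, y1, y2):
    """Return the names of profiles whose values stay within [y1, y2]
    at every time index x with x1 <= x <= x2.

    Assumptions: `profiles` maps a name to a list of values, and time
    points are the integer positions in that list.
    """
    if x1 < 0 or x1 > x2 or y1 > y2:
        raise ValueError("expected 0 <= x1 <= x2 and y1 <= y2")
    matches = []
    for name, values in profiles.items():  # sequential search, one pass
        covers_window = len(values) > x2   # profile must span the whole window
        if covers_window and all(y1 <= v <= y2 for v in values[x1 : x2 + 1]):
            matches.append(name)
    return matches


# Hypothetical profiles for illustration.
profiles = {
    "up": [1, 2, 3, 4, 5],
    "flat": [3, 3, 3, 3, 3],
    "spike": [3, 9, 3, 3, 3],
}
selected = timebox_query(profiles, x1=1, x2=3, y1=2, y2=4)
```

Here `spike` is rejected because it leaves the value range at time index 1, while the other two profiles stay inside the box for the whole window.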

Exploring the dynamic contribution behavior of editors in wikis based on time series analysis

Program electronic library and information systems ◽

10.1108/prog-06-2013-0034 ◽

2016 ◽

Vol 50 (1) ◽

pp. 41-57 ◽

Cited By ~ 1

Author(s):

Linghe Huang ◽

Qinghua Zhu ◽

Jia Tina Du ◽

Baozhen Lee

Keyword(s):

Time Series ◽

Time Series Analysis ◽

Time Series Data ◽

Time Of Day ◽

Observation Time ◽

Series Data ◽

Clustering Methods ◽

Content Type ◽

Free Rider Problem ◽

Series Analysis

Purpose – The wiki is a new form of information production and organization, and it has become one of the most important knowledge resources. In recent years, as the number of wiki users has increased, the “free rider problem” has become serious. In order to motivate editors to contribute more to a wiki system, it is important to fully understand their contribution behavior. The purpose of this paper is to explore the law of dynamic contribution behavior of editors in wikis. Design/methodology/approach – After developing a dynamic model of contribution behavior, the authors employed both the metrological and clustering methods to process the time series data. The experimental data were collected from Baidu Baike, a renowned Chinese wiki system similar to Wikipedia. Findings – There are four categories of editors: “testers,” “dropouts,” “delayers” and “stickers.” Testers contribute the least content and stop contributing rapidly after editing a few articles. Dropouts stop contributing completely after editing a large amount of content. Delayers are the editors who do not stop contributing during the observation time, but they may stop contributing in the near future. Stickers, who keep contributing and edit the most content, are the core editors. In addition, there are significant time-of-day and holiday effects on the number of editors’ contributions. Originality/value – By using the method of time series analysis, some new characteristics of editors and editor types were found. Compared with former studies, this research also had a larger sample. Therefore, the results are more scientific and representative and can help managers to better optimize wiki systems and formulate incentive strategies for editors.

Studying monthly rainfall over Dibrugarh, Assam: Use of SARIMA approach

MAUSAM ◽

10.54302/mausam.v68i2.637 ◽

2021 ◽

Vol 68 (2) ◽

pp. 349-356

Author(s):

J. HAZARIKA ◽

B. PATHAK ◽

A. N. PATOWARY

Keyword(s):

Time Series ◽

Time Series Data ◽

Moving Average ◽

Demand Management ◽

Arima Model ◽

Monthly Rainfall ◽

Series Data ◽

Data Set ◽

Modeling And Forecasting ◽

Moving Average Model

Understanding the rainfall pattern is a hard problem that bears on several regional environmental issues of water resources management, with implications for agriculture, climate change, and natural calamities such as floods and droughts. Statistical computing, modeling and forecasting are key instruments for studying these patterns. Time series analysis and forecasting have become a major tool in different applications in hydrology and environmental fields. Among the most effective approaches for analyzing time series data is the ARIMA (Autoregressive Integrated Moving Average) model introduced by Box and Jenkins. In this study, an attempt has been made to use the Box-Jenkins methodology to build an ARIMA model for monthly rainfall data taken from Dibrugarh for the period 1980-2014, with a total of 420 points. We investigated and found that the ARIMA(0, 0, 0)(0, 1, 1)_12 model is suitable for the given data set. As such, this model can be used to forecast the pattern of monthly rainfall for the upcoming years, which can help decision makers establish priorities in terms of agriculture, flood management, water demand management, etc.
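For readers unfamiliar with the seasonal notation, the selected model (no non-seasonal terms, one seasonal difference, one seasonal moving-average term, period 12) can be written out as follows. This uses the usual Box-Jenkins sign convention; the symbols B (backshift operator), Θ₁ (seasonal MA coefficient) and ε_t (white noise) are standard notation assumed here, and the abstract does not report the fitted value of Θ₁.

```latex
% ARIMA(0,0,0)(0,1,1)_{12} for monthly rainfall y_t
(1 - B^{12})\, y_t = (1 - \Theta_1 B^{12})\, \varepsilon_t
\quad\Longleftrightarrow\quad
y_t = y_{t-12} + \varepsilon_t - \Theta_1\, \varepsilon_{t-12}
```

In words, each month's rainfall is modeled as the value for the same month one year earlier, plus the current shock, minus a fraction of the shock from twelve months before.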