Exploratory Time Series Data Mining by Genetic Clustering

Author(s):  
T. Warren Liao

In this chapter, we present genetic algorithm (GA) based methods developed for clustering univariate time series with equal or unequal length as an exploratory step of data mining. These methods basically implement the k-medoids algorithm. Each chromosome encodes in binary the data objects serving as the k-medoids. To compare their performance, both fixed-parameter and adaptive GAs were used. We first employed the synthetic control chart data set to investigate the performance of three fitness functions, two distance measures, and other GA parameters such as population size, crossover rate, and mutation rate. Two more sets of time series with or without known number of clusters were also experimented: one is the cylinder-bell-funnel data and the other is the novel battle simulation data. The clustering results are presented and discussed.

2008 ◽  
pp. 942-962
Author(s):  
T. Warren Liao

In this chapter, we present genetic algorithm (GA) based methods developed for clustering univariate time series with equal or unequal length as an exploratory step of data mining. These methods basically implement the k-medoids algorithm. Each chromosome encodes in binary the data objects serving as the k-medoids. To compare their performance, both fixed-parameter and adaptive GAs were used. We first employed the synthetic control chart data set to investigate the performance of three fitness functions, two distance measures, and other GA parameters such as population size, crossover rate, and mutation rate. Two more sets of time series with or without known number of clusters were also experimented: one is the cylinder-bell-funnel data and the other is the novel battle simulation data. The clustering results are presented and discussed.


2014 ◽  
Vol 1 (4) ◽  
pp. 51-68 ◽  
Author(s):  
Daniel Hebert ◽  
Billie Anderson ◽  
Alan Olinsky ◽  
J. Michael Hardin

Modern technologies have allowed for the amassment of data at a rate never encountered before. Organizations are now able to routinely collect and process massive volumes of data. A plethora of regularly collected information can be ordered using an appropriate time interval. The data would thus be developed into a time series. Time series data mining methodology identifies commonalities between sets of time-ordered data. Time series data mining detects similar time series using a technique known as dynamic time warping (DTW). This research provides a practical application of time series data mining. A real-world data set was provided to the authors by dunnhumby. A time series data mining analysis is performed using retail grocery store chain data and results are provided.


2021 ◽  
Vol 2066 (1) ◽  
pp. 012043
Author(s):  
Wei Wang ◽  
Xiaohui Hu ◽  
Mingye Wang ◽  
Yao Du

Abstract With the rapid development of computer technology, Internet technology and artificial intelligence technology, the amount of global data has exploded. However, the single-machine serial mode of traditional data mining cannot be directly transplanted to the cloud platform. Only by parallelizing and improving many classic data mining algorithms can the cloud computing platform and data mining be effectively combined. Therefore, it is of great significance to the research and implementation of parallel algorithm technology for time series data mining. The purpose of this paper is to study the research and implementation of parallel algorithm technology for time series data mining. This paper adopts the method of literature data, mathematical statistics, logic analysis and other research methods to study the parallel algorithm technology research and realization of time series data mining, mainly to make useful explorations of time series data mining and visualization technology. It embodies the design ideas of big data analysis tools, and finally reflects the power and market value of data analysis tools through the display of the platform. Research shows that running in the same data set and the same experimental environment, the improved parallel collaborative filtering algorithm ACF in this paper has higher time running efficiency than the parallel algorithm MCF based on the cooccurrence matrix, and in the case of larger data sets, the more obvious the time difference.


2021 ◽  
Vol 1 (2) ◽  
pp. 20-37
Author(s):  
Rajae KRIBII ◽  
Youssef FAKIR

In recent times, the urge to collect data and analyze it has grown. Time stamping a data set is an important part of the analysis and data mining as it can give information that is more useful. Different mining techniques have been designed for mining time-series data, sequential patterns for example seeks relationships between occurrences of sequential events and finds if there exist any specific order of the occurrences. Many Algorithms has been proposed to study this data type based on the apriori approach. In this paper we compare two basic sequential algorithms which are General Sequential algorithm (GSP) and Sequential PAttern Discovery using Equivalence classes (SPADE). These two algorithms are based on the Apriori algorithms. Experimental results have shown that SPADE consumes less time than GSP algorithm.


Author(s):  
Marcus Erz ◽  
Jeremy Floyd Kielman ◽  
Bahar Selvi Uzun ◽  
Gabriele Stefanie Guehring

Abstract As the digital transformation is taking place, more and more data is being generated and collected.To generate meaningful information and knowledge researchers use various data mining techniques. In addition to classification, clustering, and forecasting, outlier or anomaly detection is one of the most important research areas in time series analysis. In this paper we present a method for detecting anomalies in multidimensional time series using a graph-based algorithm. We transform time series data to graphs prior to calculating the outlier since it offers a wide range of graph-based methods for anomaly detection. Furthermore the dynamics of the data is taken into consideration by implementing a window of a certain size that leads to multiple graphs in different time frames. We use feature extraction and aggregation to finally compare distance measures of two time-dependent graphs. The effectiveness of our algorithm is demonstrated on the Numenta Anomaly Benchmark with various anomaly types as well as the KPI-Anomaly-Detection data set of 2018 AIOps competition.


AI ◽  
2021 ◽  
Vol 2 (1) ◽  
pp. 48-70
Author(s):  
Wei Ming Tan ◽  
T. Hui Teo

Prognostic techniques attempt to predict the Remaining Useful Life (RUL) of a subsystem or a component. Such techniques often use sensor data which are periodically measured and recorded into a time series data set. Such multivariate data sets form complex and non-linear inter-dependencies through recorded time steps and between sensors. Many current existing algorithms for prognostic purposes starts to explore Deep Neural Network (DNN) and its effectiveness in the field. Although Deep Learning (DL) techniques outperform the traditional prognostic algorithms, the networks are generally complex to deploy or train. This paper proposes a Multi-variable Time Series (MTS) focused approach to prognostics that implements a lightweight Convolutional Neural Network (CNN) with attention mechanism. The convolution filters work to extract the abstract temporal patterns from the multiple time series, while the attention mechanisms review the information across the time axis and select the relevant information. The results suggest that the proposed method not only produces a superior accuracy of RUL estimation but it also trains many folds faster than the reported works. The superiority of deploying the network is also demonstrated on a lightweight hardware platform by not just being much compact, but also more efficient for the resource restricted environment.


Author(s):  
Shadi Aljawarneh ◽  
Aurea Anguera ◽  
John William Atwood ◽  
Juan A. Lara ◽  
David Lizcano

AbstractNowadays, large amounts of data are generated in the medical domain. Various physiological signals generated from different organs can be recorded to extract interesting information about patients’ health. The analysis of physiological signals is a hard task that requires the use of specific approaches such as the Knowledge Discovery in Databases process. The application of such process in the domain of medicine has a series of implications and difficulties, especially regarding the application of data mining techniques to data, mainly time series, gathered from medical examinations of patients. The goal of this paper is to describe the lessons learned and the experience gathered by the authors applying data mining techniques to real medical patient data including time series. In this research, we carried out an exhaustive case study working on data from two medical fields: stabilometry (15 professional basketball players, 18 elite ice skaters) and electroencephalography (100 healthy patients, 100 epileptic patients). We applied a previously proposed knowledge discovery framework for classification purpose obtaining good results in terms of classification accuracy (greater than 99% in both fields). The good results obtained in our research are the groundwork for the lessons learned and recommendations made in this position paper that intends to be a guide for experts who have to face similar medical data mining projects.


MAUSAM ◽  
2021 ◽  
Vol 68 (2) ◽  
pp. 349-356
Author(s):  
J. HAZARIKA ◽  
B. PATHAK ◽  
A. N. PATOWARY

Perceptive the rainfall pattern is tough for the solution of several regional environmental issues of water resources management, with implications for agriculture, climate change, and natural calamity such as floods and droughts. Statistical computing, modeling and forecasting data are key instruments for studying these patterns. The study of time series analysis and forecasting has become a major tool in different applications in hydrology and environmental fields. Among the most effective approaches for analyzing time series data is the ARIMA (Autoregressive Integrated Moving Average) model introduced by Box and Jenkins. In this study, an attempt has been made to use Box-Jenkins methodology to build ARIMA model for monthly rainfall data taken from Dibrugarh for the period of 1980- 2014 with a total of 420 points.  We investigated and found that ARIMA (0, 0, 0) (0, 1, 1)12 model is suitable for the given data set. As such this model can be used to forecast the pattern of monthly rainfall for the upcoming years, which can help the decision makers to establish priorities in terms of agricultural, flood, water demand management etc.  


Sign in / Sign up

Export Citation Format

Share Document