Querying of Time Series for Big Data Analytics

Time series data emerge naturally in many fields of applied sciences and engineering including but not limited to statistics, signal processing, mathematical finance, weather and power consumption forecasting. Although time series data have been well studied in the past, they still present a challenge to the scientific community. Advanced operations such as classification, segmentation, prediction, anomaly detection and motif discovery are very useful especially for machine learning as well as other scientific fields. The advent of Big Data in almost every scientific domain motivates us to provide an in-depth study of the state of the art approaches associated with techniques for efficient querying of time series. This chapters aims at providing a comprehensive review of the existing solutions related to time series representation, processing, indexing and querying operations.

Download Full-text

A Long Short Term Memory with Peephole Connections and Generative Adversarial Network Based Collaborative Methodology to Identify Outliers in ECG Dataset

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9273 ◽

2020 ◽

Vol 17 (8) ◽

pp. 3798-3803

Author(s):

M. D. Anto Praveena ◽

B. Bharathi

Keyword(s):

Time Series ◽

Big Data ◽

Data Analytics ◽

Time Series Data ◽

Short Term Memory ◽

Big Data Analytics ◽

Data Preprocessing ◽

Heterogeneous Data ◽

Series Data ◽

Outlier Identification

Big Data analytics has become an upward field, and it plays a pivotal role in Healthcare and research practices. Big data analytics in healthcare cover vast numbers of dynamic heterogeneous data integration and analysis. Medical records of patients include several data including medical conditions, medications and test findings. One of the major challenges of analytics and prediction in healthcare is data preprocessing. In data preprocessing the outlier identification and correction is the important challenge. Outliers are exciting values that deviates from other values of the attribute; they may simply experimental errors or novelty. Outlier identification is the method of identifying data objects with somewhat different behaviors than expectations. Detecting outliers in time series data is different from normal data. Time series data are the data that are in a series of certain time periods. This kind of data are identified and cleared to bring the quality dataset. In this proposed work a hybrid outlier detection algorithm extended LSTM-GAN is helped to recognize the outliers in time series data. The outcome of the proposed extended algorithm attained better enactment in the time series analysis on ECG dataset processing compared with traditional methodologies.

Download Full-text

Time Series Data Mining: A Case Study With Big Data Analytics Approach

IEEE Access ◽

10.1109/access.2020.2966553 ◽

2020 ◽

Vol 8 ◽

pp. 14322-14328 ◽

Cited By ~ 3

Author(s):

Fang Wang ◽

Menggang Li ◽

Yiduo Mei ◽

Wenrui Li

Keyword(s):

Data Mining ◽

Time Series ◽

Big Data ◽

Data Analytics ◽

Time Series Data ◽

Big Data Analytics ◽

Series Data ◽

Time Series Data Mining

Download Full-text

Edge4TSC: Binary Distribution Tree-Enabled Time Series Classification in Edge Environment

Sensors ◽

10.3390/s20071908 ◽

2020 ◽

Vol 20 (7) ◽

pp. 1908

Author(s):

Chao Ma ◽

Xiaochuan Shi ◽

Wei Li ◽

Weiping Zhu

Keyword(s):

Time Series ◽

Deep Learning ◽

Classification Accuracy ◽

Time Series Data ◽

Series Representation ◽

Series Data ◽

Feature Engineering ◽

Time Series Classification ◽

Binary Distribution ◽

New Time

In the past decade, time series data have been generated from various fields at a rapid speed, which offers a huge opportunity for mining valuable knowledge. As a typical task of time series mining, Time Series Classification (TSC) has attracted lots of attention from both researchers and domain experts due to its broad applications ranging from human activity recognition to smart city governance. Specifically, there is an increasing requirement for performing classification tasks on diverse types of time series data in a timely manner without costly hand-crafting feature engineering. Therefore, in this paper, we propose a framework named Edge4TSC that allows time series to be processed in the edge environment, so that the classification results can be instantly returned to the end-users. Meanwhile, to get rid of the costly hand-crafting feature engineering process, deep learning techniques are applied for automatic feature extraction, which shows competitive or even superior performance compared to state-of-the-art TSC solutions. However, because time series presents complex patterns, even deep learning models are not capable of achieving satisfactory classification accuracy, which motivated us to explore new time series representation methods to help classifiers further improve the classification accuracy. In the proposed framework Edge4TSC, by building the binary distribution tree, a new time series representation method was designed for addressing the classification accuracy concern in TSC tasks. By conducting comprehensive experiments on six challenging time series datasets in the edge environment, the potential of the proposed framework for its generalization ability and classification accuracy improvement is firmly validated with a number of helpful insights.

Download Full-text

Piecewise Trend Approximation: A Ratio-Based Time Series Representation

Abstract and Applied Analysis ◽

10.1155/2013/603629 ◽

2013 ◽

Vol 2013 ◽

pp. 1-7 ◽

Cited By ~ 4

Author(s):

Jingpei Dan ◽

Weiren Shi ◽

Fangyan Dong ◽

Kaoru Hirota

Keyword(s):

Time Series ◽

Time Series Data ◽

Series Representation ◽

Feature Space ◽

Original Data ◽

Series Data ◽

High Dimensional ◽

Original Time Series ◽

Data Space ◽

Original Time

A time series representation, piecewise trend approximation (PTA), is proposed to improve efficiency of time series data mining in high dimensional large databases. PTA represents time series in concise form while retaining main trends in original time series; the dimensionality of original data is therefore reduced, and the key features are maintained. Different from the representations that based on original data space, PTA transforms original data space into the feature space of ratio between any two consecutive data points in original time series, of which sign and magnitude indicate changing direction and degree of local trend, respectively. Based on the ratio-based feature space, segmentation is performed such that each two conjoint segments have different trends, and then the piecewise segments are approximated by the ratios between the first and last points within the segments. To validate the proposed PTA, it is compared with classical time series representations PAA and APCA on two classical datasets by applying the commonly used K-NN classification algorithm. For ControlChart dataset, PTA outperforms them by 3.55% and 2.33% higher classification accuracy and 8.94% and 7.07% higher for Mixed-BagShapes dataset, respectively. It is indicated that the proposed PTA is effective for high dimensional time series data mining.

Download Full-text

Using SAR satellite data time-series for regional glacier mapping

10.5194/tc-2017-136 ◽

2017 ◽

Cited By ~ 1

Author(s):

Solveig H. Winsvold ◽

Andreas Kääb ◽

Christopher Nuth ◽

Liss M. Andreassen ◽

Ward van Pelt ◽

...

Keyword(s):

Time Series ◽

Satellite Data ◽

Time Series Data ◽

Series Representation ◽

Series Data ◽

Landsat 8 ◽

Change Analysis ◽

Surface Mass ◽

Sentinel 2A ◽

Optical Satellite Images

Abstract. With dense SAR satellite data time-series it is possible to map surface and subsurface glacier properties that vary in time. On Sentinel-1A and Radarsat-2 backscatter images over mainland Norway and Svalbard, we have used descriptive methods for outlining the possibilities of using SAR time-series for mapping glaciers. We present five application scenarios, where the first shows potential for tracking transient snow lines with SAR backscatter time-series, and correlates with both optical satellite images (Sentinel-2A and Landsat 8) and equilibrium line altitudes derived from in situ surface mass balance data. In the second application scenario, time-series representation of glacier facies corresponding to SAR glacier zones shows potential for a more accurate delineation of the zones and how they change in time. The third application scenario investigates the firn evolution using dense SAR backscatter time-series together with a coupled energy balance and multi-layer firn model. We find strong correlation between backscatter signals with both the modeled firn air-content and modeled wetness in the firn. In the fourth application scenario, we highlight how winter rain events can be detected in SAR time-series, revealing important information about the area extent of internal accumulation. Finally, in the last application scenario, averaged summer SAR images were found to have potential in assisting the process of mapping glaciers outlines, especially in the presence of seasonal snow. Altogether we present examples of how to map glaciers and to further understand glaciological processes using the existing and future massive amount of multi-sensor time-series data. Our results reveal the potential of satellite imagery for automatically derived products as important input in modeling assessments and glacier change analysis.

Download Full-text

PhilDB - The time series database with built-in change logging

10.7287/peerj.preprints.1488v1 ◽

2015 ◽

Author(s):

Andrew MacDonald

Keyword(s):

Time Series ◽

Big Data ◽

Open Source ◽

High Performance ◽

Time Series Data ◽

Handling Time ◽

Series Data ◽

Meta Data ◽

Static Data ◽

Data Tracking

PhilDB is an open-source time series database. It supports storage of time series datasets that are dynamic, that is recording updates to existing values in a log as they occur. Recent open-source systems, such as InfluxDB and OpenTSDB, have been developed to indefinitely store long-period, high-resolution time series data. Unfortunately they require a large initial installation investment before use because they are designed to operate over a cluster of servers to achieve high-performance writing of static data in real time. In essence, they have a ‘big data’ approach to storage and access. Other open-source projects for handling time series data that don’t take the ‘big data’ approach are also relatively new and are complex or incomplete. None of these systems gracefully handle revision of existing data while tracking values that changed. Unlike ‘big data’ solutions, PhilDB has been designed for single machine deployment on commodity hardware, reducing the barrier to deployment. PhilDB eases loading of data for the user by utilising an intelligent data write method. It preserves existing values during updates and abstracts the update complexity required to achieve logging of data value changes. PhilDB improves accessing datasets by two methods. Firstly, it uses fast reads which make it practical to select data for analysis. Secondly, it uses simple read methods to minimise effort required to extract data. PhilDB takes a unique approach to meta-data tracking; optional attribute attachment. This facilitates scaling the complexities of storing a wide variety of data. That is, it allows time series data to be loaded as time series instances with minimal initial meta-data, yet additional attributes can be created and attached to differentiate the time series instances as a wider variety of data is needed. PhilDB was written in Python, leveraging existing libraries. This paper describes the general approach, architecture, and philosophy of the PhilDB software.

Download Full-text

SMM: Leveraging Metadata for Contextually Salient Multi-Variate Motif Discovery

Applied Sciences ◽

10.3390/app112210873 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10873

Author(s):

Silvestro R. Poccia ◽

K. Selçuk Candan ◽

Maria Luisa Sapino

Keyword(s):

Time Series ◽

Motif Discovery ◽

Time Series Data ◽

Contextual Information ◽

Multimedia Data ◽

Series Data ◽

Motif Search ◽

Fixed Subset ◽

Simplifying Assumptions ◽

Expensive Process

A common challenge in multimedia data understanding is the unsupervised discovery of recurring patterns, or motifs, in time series data. The discovery of motifs in uni-variate time series is a well studied problem and, while being a relatively new area of research, there are also several proposals for multi-variate motif discovery. Unfortunately, motif search among multiple variates is an expensive process, as the potential number of sub-spaces in which a pattern can occur increases exponentially with the number of variates. Consequently, many multi-variate motif search algorithms make simplifying assumptions, such as searching for motifs across all variates individually, assuming that the motifs are of the same length, or that they occur on a fixed subset of variates. In this paper, we are interested in addressing a relatively broad form of multi-variate motif detection, which seeks frequently occurring patterns (of possibly differing lengths) in sub-spaces of a multi-variate time series. In particular, we aim to leverage contextual information to help select contextually salient patterns and identify the most frequent patterns among all. Based on these goals, we first introduce the contextually salient multi-variate motif (CS-motif) discovery problem and then propose a salient multi-variate motif (SMM) algorithm that, unlike existing methods, is able to seek a broad range of patterns in multi-variate time series.

Download Full-text

An end-to-end Novel Forecasting Model for Crime Prediction based on Big Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f9153.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 3704-3708

Keyword(s):

Time Series ◽

Big Data ◽

Data Analytics ◽

Linear Time ◽

Big Data Analytics ◽

Series Data ◽

Data Sets ◽

Data Set ◽

Main Category ◽

Crime Prediction

Big data analytics is a field in which we analyse and process information from large or convoluted data sets to be managed by methods of data-processing. Big data analytics is used in analysing the data and helps in predicting the best outcome from the data sets. Big data analytics can be very useful in predicting crime and also gives the best possible solution to solve that crime. In this system we will be using the past crime data set to find out the pattern and through that pattern we will be predicting the range of the incident. The range of the incident will be determined by the decision model and according to the range the prediction will be made. The data sets will be nonlinear and in the form of time series so in this system we will be using the prophet model algorithm which is used to analyse the non-linear time series data. The prophet model categories in three main category and i.e. trends, seasonality, and holidays. This system will help crime cell to predict the possible incident according to the pattern which will be developed by the algorithm and it also helps to deploy right number of resources to the highly marked area where there is a high chance of incidents to occur. The system will enhance the crime prediction system and will help the crime department to use their resources more efficiently.

Download Full-text

A forecasting of stock trading price using time series information based on big data

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i3.pp2548-2554 ◽

2021 ◽

Vol 11 (3) ◽

pp. 2548

Author(s):

Soo-Tai Nam ◽

Chan-Yong Jin ◽

Seong-Yoon Shin

Keyword(s):

Time Series ◽

Big Data ◽

Stock Price ◽

Euclidean Distance ◽

Time Series Data ◽

Series Data ◽

Analysis Tool ◽

Large Set ◽

Data Generation ◽

Management Tools

Big data is a large set of structured or unstructured data that can collect, store, manage, and analyze data with existing database management tools. And it means the technique of extracting value from these data and interpreting the results. Big data has three characteristics: The size of existing data and other data (volume), the speed of data generation (velocity), and the variety of information forms (variety). The time series data are obtained by collecting and recording the data generated in accordance with the flow of time. If the analysis of these time series data, found the characteristics of the data implies that feature helps to understand and analyze time series data. The concept of distance is the simplest and the most obvious in dealing with the similarities between objects. The commonly used and widely known method for measuring distance is the Euclidean distance. This study is the result of analyzing the similarity of stock price flow using 793,800 closing prices of 1,323 companies in Korea. Visual studio and Excel presented calculate the Euclidean distance using an analysis tool. We selected “000100” as a target domestic company and prepared for big data analysis. As a result of the analysis, the shortest Euclidean distance is the code “143860” company, and the calculated value is “11.147”. Therefore, based on the results of the analysis, the limitations of the study and theoretical implications are suggested.

Download Full-text

Using SAR satellite data time series for regional glacier mapping

The Cryosphere ◽

10.5194/tc-12-867-2018 ◽

2018 ◽

Vol 12 (3) ◽

pp. 867-890 ◽

Cited By ~ 11

Author(s):

Solveig H. Winsvold ◽

Andreas Kääb ◽

Christopher Nuth ◽

Liss M. Andreassen ◽

Ward J. J. van Pelt ◽

...

Keyword(s):

Time Series ◽

Satellite Data ◽

Time Series Data ◽

Series Representation ◽

Series Data ◽

Landsat 8 ◽

Surface Mass ◽

Rain Events ◽

Sentinel 2A ◽

Optical Satellite Images

Abstract. With dense SAR satellite data time series it is possible to map surface and subsurface glacier properties that vary in time. On Sentinel-1A and RADARSAT-2 backscatter time series images over mainland Norway and Svalbard, we outline how to map glaciers using descriptive methods. We present five application scenarios. The first shows potential for tracking transient snow lines with SAR backscatter time series and correlates with both optical satellite images (Sentinel-2A and Landsat 8) and equilibrium line altitudes derived from in situ surface mass balance data. In the second application scenario, time series representation of glacier facies corresponding to SAR glacier zones shows potential for a more accurate delineation of the zones and how they change in time. The third application scenario investigates the firn evolution using dense SAR backscatter time series together with a coupled energy balance and multilayer firn model. We find strong correlation between backscatter signals with both the modeled firn air content and modeled wetness in the firn. In the fourth application scenario, we highlight how winter rain events can be detected in SAR time series, revealing important information about the area extent of internal accumulation. In the last application scenario, averaged summer SAR images were found to have potential in assisting the process of mapping glaciers outlines, especially in the presence of seasonal snow. Altogether we present examples of how to map glaciers and to further understand glaciological processes using the existing and future massive amount of multi-sensor time series data.

Download Full-text