Mining precise cause and effect rules in large time series data of socio-economic indicators

Time series data occur in many real-life applications, ranging from science and engineering to business. In many of these applications, it is often desirable to search a large time series database for sequences similar to a query sequence. Such similarity-based retrieval is also a basic subroutine in several advanced time series data mining tasks, such as clustering, classification, motif finding, anomaly detection, rule discovery and visualization. Although several different approaches have been developed, most are based on the common premise of dimensionality reduction and spatial access methods. This survey gives an overview of recent research and shows how the methods fit into a general framework of feature extraction.

The durability of economic indicators in container shipping demand: a case study of East Asia–US container transport

Maritime Business Review ◽

10.1108/mabr-12-2020-0075 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Tomoya Kawasaki ◽

Takuma Matsuda ◽

Yui-yip Lau ◽

Xiaowen Fu

Keyword(s):

Time Series ◽

East Asia ◽

Time Series Data ◽

Economic Indicators ◽

Series Data ◽

Container Shipping ◽

Content Type ◽

The USA ◽

Container Movement ◽

The Impact

Purpose: In the maritime industry, it is vital to have a reliable forecast of container shipping demand. Although indicators of economic conditions have been used in modeling container shipping demand on major routes such as those from East Asia to the USA, the duration of such indicators’ effects on container movement demand has not been systematically examined. To bridge this gap in research, this study aims to identify the important US economic indicators that significantly affect the volume of container movements and to empirically reveal the duration of such impacts.

Design/methodology/approach: The durability of economic indicators on container movements is identified by a vector autoregression (VAR) model using monthly time-series data. The VAR model makes it possible to analyze the effect of economic indicators at time t-k on container movement at time t. The model considers nine US economic indicators as explanatory variables that are likely to affect container movements. Time-series data are used for 228 months, from January 2001 to December 2019.

Findings: On the mainland China route, “building permits” has a high impact with a duration of 14 months, reflecting the fact that China exports a high volume of housing-related goods to the USA. On the South Korea and Japan routes, where high volumes of machinery goods are exported to the USA, the “index of industrial production” has a high impact with durations of 11 and 13 months, respectively. On the Taiwan route, where several types of goods are transported with significant shares, “building permits” and the “index of industrial production” both have important effects.

Originality/value: Freight demand forecasting for bulk cargo is a popular research field because of the public availability of several time-series data sets. However, no study to date has measured the impact and durability of economic indicators on container movement. To bridge this gap in the literature, this paper develops a time-series model of container movement from East Asia to the USA.
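For readers unfamiliar with the technique, a VAR model in its generic textbook form can be written as below. This is only an illustrative sketch: the symbols $y_t$, $c$, $A_k$, $p$ and $\varepsilon_t$ are assumptions introduced here, and the abstract does not state the lag order or the exact specification the authors used.

```latex
% Generic VAR(p): y_t stacks container movement and the economic indicators
% observed in month t (an assumed arrangement, not taken from the paper).
\[
  y_t = c + \sum_{k=1}^{p} A_k \, y_{t-k} + \varepsilon_t
\]
% c is a vector of intercepts, A_k is the coefficient matrix for lag k,
% and \varepsilon_t is a vector of error terms.
```

In this form, the entries of $A_k$ that link an indicator at time $t-k$ to container movement at time $t$ carry the lagged effect the abstract describes.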

Interactive Visualization Adopting Dimensionality Reduction Techniques for Pattern Recognition in Large Temporal Datasets

10.23889/suthesis.56896 ◽

2021 ◽

Author(s):

Mohammed Ali

Keyword(s):

Machine Learning ◽

Time Series ◽

Dimensionality Reduction ◽

Visual Analytics ◽

Large Time ◽

Time Series Data ◽

Machine Learning Algorithms ◽

Series Data ◽

Reduction Techniques ◽

Dimensionality Reduction Techniques

In this thesis, we focus on time-series data, which is commonly used by experts in different domains to explore and understand phenomena or behaviors under consideration, assisting them in making decisions, predicting the future or solving problems. Utilizing sensor devices is one of the common ways of collecting time-series data. These devices collect large volumes of raw data, including multi-dimensional time-series data, and each value is associated with the time-stamp corresponding to when it was recorded. However, finding interesting patterns or behaviors in a large amount of data is not simple due to the nature of the data and other challenges related to its size and scalability, high dimensionality, complexity, representation, and unique structure.

Researchers tend to use time-series chart visualization, which is usually unsuitable because limited screen resolution cannot accommodate the large size of the data. Hence, occlusion and overplotting issues occur, limiting or complicating the exploration and analysis tasks. Another challenge concerns the labeling of patterns in large time-series data, which is time-consuming and requires a great deal of expert knowledge.

These issues are addressed in this thesis to improve the exploration, analysis and presentation of time-series data and to enable users to gain insights into large and multi-dimensional time-series datasets using a combination of dimensionality reduction techniques and interactive visual methods. The provided solutions will help researchers from various domains who deal with large and multi-dimensional time-series data to explore and analyze such data efficiently, with little effort and in record time.

Initially, we explore the area of integration between machine learning algorithms and interactive visualization techniques for exploring and understanding time-series data, specifically looking at clustering and classification for time-series data in visual analytics. The survey is considered to be a valuable guide for both new researchers and experts in the emerging field of integrating machine learning algorithms into visual analytics.

Next, we present a novel approach that aims to explore, analyze, and present large temporal datasets through one image. The proposed approach uses a sliding window and dimensionality reduction techniques to depict large time-series data as points in a 2D scatter plot. The approach provides novel solutions to many pattern discovery issues and can deal with both univariate and multivariate time-series data.

Following this, our proposed approach is combined with both visualization and interaction techniques into one system called TimeCluster, a visual analytics tool that allows users to visualize, explore and interact with large time-series data. The system addresses different issues such as anomaly detection, the discovery of frequent patterns, and the labeling of interesting patterns in large time-series data, all in a single system. We deploy our system with different time-series datasets and report real-world case studies of its utility.

Later, the linkage between the 1D view (time-series chart) and the 2D view (the 2D embedding of the time-series data), together with parallel interactions such as selection and labeling, is employed to explore and examine the effectiveness of recent developments in machine learning and dimension reduction in the context of time-series data exploration. We design a user study to evaluate and validate the effectiveness of the linkage between the 1D and 2D visualizations and their fitness for projecting time-series data; different dimensionality reduction techniques are examined, evaluated and compared within our experimental setting.

Lastly, we conclude with our findings and outline possible areas for future work.
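The sliding-window idea described in this abstract can be illustrated with a small sketch. It is not the TimeCluster implementation: PCA (computed with NumPy's SVD) stands in for whichever dimensionality reduction techniques the thesis evaluates, and the function names, window width, step size and synthetic signal are all assumptions made for illustration.

```python
import numpy as np

def sliding_windows(series, width, step=1):
    """Cut a (possibly multivariate) series into flattened fixed-width windows."""
    x = np.asarray(series, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a univariate series as a single column
    starts = range(0, x.shape[0] - width + 1, step)
    return np.stack([x[s:s + width].ravel() for s in starts])

def pca_2d(windows):
    """Project each window onto the first two principal components (PCA as a stand-in)."""
    centered = windows - windows.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Hypothetical data: a sine wave with a short injected anomaly.
t = np.linspace(0, 8 * np.pi, 400)
signal = np.sin(t)
signal[200:210] += 3.0

points = pca_2d(sliding_windows(signal, width=40, step=5))
print(points.shape)  # one 2D point per window
```

Each row of `points` could then be drawn in a scatter plot; under this sketch, windows that overlap the injected anomaly would be expected to land away from the loop traced by the regular sine-wave windows.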

Grasp heuristic for time series compression with piecewise aggregate approximation

RAIRO - Operations Research ◽

10.1051/ro/2018089 ◽

2019 ◽

Vol 53 (1) ◽

pp. 243-259 ◽

Cited By ~ 3

Author(s):

Vanel Steve Siyou Fotso ◽

Engelbert Mephu Nguifo ◽

Philippe Vaslin

Keyword(s):

Data Mining ◽

Time Series ◽

Large Time ◽

Time Series Data ◽

Optimal Parameter ◽

Series Data ◽

Efficiency And Effectiveness ◽

Number Of Segments ◽

Grasp Heuristic

The Piecewise Aggregate Approximation (PAA) is widely used in time series data mining because it makes it possible to discretize time series and reduce their length, and because it serves as a subroutine in algorithms for pattern discovery, indexing, and classification of time series. However, it requires setting one parameter: the number of segments to consider during the discretization. The optimal parameter value is highly data dependent, in particular for large time series. This paper presents a heuristic for time series compression with PAA that minimizes the loss of information. The heuristic is built upon the well-known metaheuristic GRASP and strengthened with the inclusion of a specific global search component. An extensive experimental evaluation on several time series datasets demonstrated its efficiency and effectiveness in terms of compression ratio, compression interpretability and classification.
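As background, plain PAA with a fixed number of segments can be sketched as follows. This shows only the basic transform, not the paper's GRASP-based heuristic; the function names, the use of `np.array_split` to handle lengths that do not divide evenly, and the squared-error loss measure are assumptions made for illustration.

```python
import numpy as np

def paa(series, n_segments):
    """Replace the series by the mean of each of n_segments near-equal chunks."""
    x = np.asarray(series, dtype=float)
    if not 1 <= n_segments <= x.size:
        raise ValueError("n_segments must be between 1 and len(series)")
    # np.array_split tolerates lengths that are not a multiple of n_segments.
    return np.array([chunk.mean() for chunk in np.array_split(x, n_segments)])

def reconstruction_error(series, n_segments):
    """Sum of squared errors when every point is replaced by its segment mean
    (a hypothetical way to quantify the loss of information)."""
    x = np.asarray(series, dtype=float)
    chunks = np.array_split(x, n_segments)
    return float(sum(((c - c.mean()) ** 2).sum() for c in chunks))

series = [1.0, 3.0, 2.0, 4.0, 10.0, 12.0, 11.0, 13.0]
print(paa(series, 2))  # two segment means
print([reconstruction_error(series, k) for k in (1, 2, 4, 8)])
```

The second printout shows the trade-off the abstract refers to: more segments lose less information but compress less, which is why the number of segments has to be chosen per dataset.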
