Mining Massive Time Series Data: With Dimensionality Reduction Techniques

Visual Analytics ◽

Large Time ◽

Time Series Data ◽

Machine Learning Algorithms ◽

Series Data ◽

Reduction Techniques ◽

In this thesis, we focus on time-series data, which domain experts in many fields use to explore and understand phenomena or behaviors, assisting them in making decisions, predicting the future, or solving problems. Sensor devices are one of the most common ways of collecting time-series data. These devices collect large volumes of raw data, including multi-dimensional time-series data, and each value is associated with the timestamp at which it was recorded. However, finding interesting patterns or behaviors in a large amount of data is not simple, owing to the nature of the data and to challenges related to its size and scalability, high dimensionality, complexity, representation, and unique structure.

Researchers tend to use time-series chart visualization, which is usually unsuitable because limited screen resolution cannot accommodate the size of the data. Hence, occlusion and overplotting occur, limiting or complicating exploration and analysis tasks. Another challenge concerns the labeling of patterns in large time-series data, which is time-consuming and requires a great deal of expert knowledge.

These issues are addressed in this thesis to improve the exploration, analysis, and presentation of time-series data and to enable users to gain insights into large and multi-dimensional time-series datasets using a combination of dimensionality reduction techniques and interactive visual methods. The provided solutions will help researchers from various domains who deal with large and multi-dimensional time-series data to explore and analyze such data efficiently, with little effort and in record time.

Initially, we explore the integration of machine learning algorithms and interactive visualization techniques for exploring and understanding time-series data, specifically looking at clustering and classification for time-series data in visual analytics. The resulting survey is considered a valuable guide for both new researchers and experts in the emerging field of integrating machine learning algorithms into visual analytics.

Next, we present a novel approach that aims to explore, analyze, and present large temporal datasets through one image. The proposed approach uses a sliding window and dimensionality reduction techniques to depict a large time series as points in a 2D scatter plot. The approach provides novel solutions to many pattern discovery issues and can deal with both univariate and multivariate time-series data.

Following this, our proposed approach is combined with visualization and interaction techniques in one system called TimeCluster, a visual analytics tool that allows users to visualize, explore, and interact with large time-series data. The system addresses anomaly detection, the discovery of frequent patterns, and the labeling of interesting patterns in large time-series data, all in a single system. We deploy our system with different time-series datasets and report real-world case studies of its utility.

Later, the linkage between the 1D view (the time-series chart) and the 2D view (the 2D embedding of the time-series data), together with parallel interactions such as selection and labeling, is employed to explore and examine the effectiveness of recent developments in machine learning and dimensionality reduction in the context of time-series data exploration. We design a user study to evaluate and validate the effectiveness of the linkage between the 1D and 2D visualizations and their fitness for projecting time-series data; different dimensionality reduction techniques are examined, evaluated, and compared within our experimental setting.

Lastly, we summarize our findings and outline possible areas for future work.
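The sliding-window idea in the abstract above can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say which dimensionality reduction technique is used, so plain PCA (computed with NumPy's SVD) is assumed here as a stand-in, and the function names `sliding_windows` and `pca_2d`, the window width, the step, and the synthetic signal are all hypothetical choices made for illustration.

```python
import numpy as np

def sliding_windows(series, width, step=1):
    """Stack overlapping windows of a (T,) or (T, d) series into rows.

    Each row is one window flattened to length width * d, so univariate
    and multivariate series are handled the same way.
    """
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, None]  # treat univariate input as one channel
    starts = range(0, len(series) - width + 1, step)
    return np.stack([series[s:s + width].ravel() for s in starts])

def pca_2d(windows):
    """Project each window onto the first two principal components.

    PCA is an assumption of this sketch; any 2D embedding could be
    substituted here.
    """
    centered = windows - windows.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Synthetic example: a sine wave with a short injected level shift.
t = np.linspace(0, 20 * np.pi, 2000)
signal = np.sin(t)
signal[1200:1260] += 2.0  # hypothetical anomaly, for illustration only

windows = sliding_windows(signal, width=60, step=5)
points = pca_2d(windows)  # one 2D point per window, ready for a scatter plot
```

Each row of `points` corresponds to one window of the original series, so windows with similar shapes land close together in the scatter plot while unusual windows tend to fall away from the main structure.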

Time Series Data Representation and Dimensionality Reduction Techniques

Algorithms for Intelligent Systems - Applications of Machine Learning ◽

10.1007/978-981-15-3357-0_18 ◽

2020 ◽

pp. 267-284

Author(s):

Anshul Sharma ◽

Abhinav Kumar ◽

Anil Kumar Pandey ◽

Rishav Singh

Keyword(s):

Time Series ◽

Time Series Data ◽

Data Representation ◽

Series Data ◽

Reduction Techniques ◽

Similarity Measures and Dimensionality Reduction Techniques for Time Series Data Mining

Advances in Data Mining Knowledge Discovery and Applications ◽

10.5772/49941 ◽

2012 ◽

Cited By ~ 21

Author(s):

Carmelo Cassisi ◽

Placido Montalto ◽

Marco Aliotta ◽

Andrea Cannata ◽

Alfredo Pulvirenti

Keyword(s):

Data Mining ◽

Time Series ◽

Time Series Data ◽

Similarity Measures ◽

Series Data ◽

Time Series Data Mining ◽

Reduction Techniques ◽

Dimensionality Reduction for the Analysis of Time Series Data from Wind Turbines

Scientific Computing and Algorithms in Industrial Simulations ◽

10.1007/978-3-319-62458-7_16 ◽

2017 ◽

pp. 317-339 ◽

Cited By ~ 2

Author(s):

Jochen Garcke ◽

Rodrigo Iza-Teran ◽

Marvin Marks ◽

Mandar Pathare ◽

Dirk Schollbach ◽

...

Keyword(s):

Time Series ◽

Wind Turbines ◽

Time Series Data ◽

Series Data ◽

Analysis Of Time Series

A Novel Fractal Representation for Dimensionality Reduction of Large Time Series Data

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-642-01307-2_105 ◽

2009 ◽

pp. 989-996 ◽

Cited By ~ 3

Author(s):

Poat Sajjipanon ◽

Chotirat Ann Ratanamahatana

Keyword(s):

Time Series ◽

Large Time ◽

Time Series Data ◽

Series Data ◽

Fractal Representation

Dimensionality reduction of fMRI time series data using locally linear embedding

Magnetic Resonance Materials in Physics Biology and Medicine ◽

10.1007/s10334-010-0204-0 ◽

2010 ◽

Vol 23 (5-6) ◽

pp. 327-338 ◽

Cited By ~ 13

Author(s):

Peter Mannfolk ◽

Ronnie Wirestam ◽

Markus Nilsson ◽

Freddy Ståhlberg ◽

Johan Olsrud

Keyword(s):

Time Series ◽

Time Series Data ◽

Series Data ◽

Locally Linear Embedding ◽

Fmri Time Series ◽

Linear Embedding ◽

Locally Linear

Effects of dimensionality reduction techniques on time series similarity measurements

2008 IEEE/ACS International Conference on Computer Systems and Applications ◽

10.1109/aiccsa.2008.4493534 ◽

2008 ◽

Author(s):

Ghazi Al-Naymat ◽

Javid Taheri

Keyword(s):

Time Series ◽

Reduction Techniques ◽

A Visual Analytics Framework for Reviewing Multivariate Time-Series Data with Dimensionality Reduction

IEEE Transactions on Visualization and Computer Graphics ◽

10.1109/tvcg.2020.3028889 ◽

2020 ◽

pp. 1-1

Author(s):

Takanori Fujiwara ◽

Shilpika ◽

Naohisa Sakamoto ◽

Jorji Nonaka ◽

Keiji Yamamoto ◽

...

Keyword(s):

Time Series ◽

Visual Analytics ◽

Time Series Data ◽

Multivariate Time Series ◽

Series Data

Automatic Crop Classification in Northeastern China by Improved Nonlinear Dimensionality Reduction for Satellite Image Time Series

Remote Sensing ◽

10.3390/rs12172726 ◽

2020 ◽

Vol 12 (17) ◽

pp. 2726 ◽

Cited By ~ 1

Author(s):

Yongguang Zhai ◽

Nan Wang ◽

Lifu Zhang ◽

Lei Hao ◽

Caihong Hao

Keyword(s):

Time Series ◽

Time Series Data ◽

Satellite Image ◽

Series Data ◽

Nonlinear Dimensionality Reduction ◽

Distribution Map ◽

Dimensionality Reduction Technique ◽

Classification Tasks ◽

Crop Classification

Accurate and timely information on the spatial distribution of crops is of great significance to precision agriculture and food security. Many cropland mapping methods that use satellite image time series rely on expert knowledge to extract phenological features for identifying crops. Automatically obtaining meaningful features from time-series data for crop classification remains a challenge. In this study, we developed an automated method based on satellite image time series to map the spatial distribution of three major crops (maize, rice, and soybean) in northeastern China. The core method is a nonlinear dimensionality reduction technique. However, existing nonlinear dimensionality reduction techniques cannot handle missing data and are not designed for subsequent classification tasks. Therefore, the nonlinear dimensionality reduction algorithm Landmark–Isometric feature mapping (L–ISOMAP) is improved. The advantage of the improved L–ISOMAP is that it does not need to reconstruct time series for missing data, and it can automatically obtain meaningful feature metrics for classification. The improved L–ISOMAP was applied to Landsat 8 full-band time-series data acquired during the crop-growing season in the three northeastern provinces of China; the dimensionality-reduced bands were then fed into a random forest classifier to produce a crop distribution map. The results show that the mapped crop area is consistent with official statistics. The 2015 crop distribution map was evaluated against a collected reference dataset, and the overall classification accuracy and Kappa index were 83.68% and 0.7519, respectively. The geographical characteristics of the major crops in the three provinces of northeast China were analyzed. This study demonstrated that the improved L–ISOMAP method can be used to automatically extract features for crop classification. For future work, there is great potential for applying automatic mapping algorithms to other data or classification tasks.
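The abstract above reports an overall classification accuracy and a Kappa index. As a reminder of how these two figures are commonly derived from a confusion matrix, here is a small sketch using the standard Cohen's kappa definition, kappa = (p_o - p_e) / (1 - p_e). The function name and the 3x3 matrix are hypothetical and do not reproduce the study's data; the abstract does not state how its Kappa index was computed.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples of reference class i predicted as class j.
    """
    n = sum(sum(row) for row in confusion)
    # Observed agreement p_o: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    # Chance agreement p_e from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical three-class matrix, for illustration only (not the paper's data).
toy_confusion = [
    [50, 5, 5],
    [4, 40, 6],
    [6, 5, 29],
]
accuracy, kappa = overall_accuracy_and_kappa(toy_confusion)
```

Kappa discounts the agreement that would be expected by chance, which is why it is lower than the overall accuracy for the same map.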

Parallel Dimensionality Reduction Transformation for Time-Series Data

2009 First Asian Conference on Intelligent Information and Database Systems ◽

10.1109/aciids.2009.48 ◽

2009 ◽

Author(s):

Hoang Chi Thanh

Keyword(s):

Time Series ◽

Time Series Data ◽

Series Data