A Simulation Study on Clustering Multivariate Time Series Using Kernel Variant Multi-Way Principal Component Analysis

2017 ◽  
Vol 25 (2) ◽  
pp. 229-253
Author(s):  
Hwanseok Choi ◽  
Cheolwoo Lee ◽  
Jin Q Jeon

Conventional time series modeling may not satisfy model validity requirements for short-period time series data. In this study, we apply Kernel Variant Multi-Way Principal Component Analysis (KMPCA) to cluster multivariate time series data that have multiple dimensions with auto- and cross-correlations. We then use simulation to check whether the method clusters such data well in general. Two simulation studies were conducted, covering two different mean structures and nine combinations of auto- and cross-correlations. The results showed that KMPCA clusters the two mean-structure groups with success rates above 90% when an appropriate kernel function is used. We also found that when the mean structures are the same, auto-correlation, the number of temporal points, and the kernel function parameter have statistically significant effects on clustering performance. The second- and third-order interactions involving each of those factors also affect clustering success rates. Among the main factors, the kernel function parameter is the most critical one to consider for obtaining better performance. A similar error structure may obstruct the clustering performance: strong cross-correlation, weak auto-correlation, and a larger number of temporal points. The paper also discusses some limitations of the KMPCA model and suggests directions for future research that could improve the model.
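The simulation design described in this abstract can be illustrated with a minimal sketch. It is not the authors' KMPCA implementation: it assumes an RBF kernel, AR(1) errors with equicorrelated innovations, a constant mean shift between the two groups, a median-distance default for the kernel parameter, and a plain 2-means step on the leading kernel scores. All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_group(n, mean, phi, rho, rng):
    """n series of shape (T, p): fixed mean plus AR(1) errors (coefficient phi)
    whose innovations share a common cross-correlation rho (assumed structure)."""
    T, p = mean.shape
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    chol = np.linalg.cholesky(cov)
    out = np.empty((n, T, p))
    for i in range(n):
        u = rng.standard_normal((T, p)) @ chol.T  # cross-correlated innovations
        e = np.zeros((T, p))
        e[0] = u[0]
        for t in range(1, T):
            e[t] = phi * e[t - 1] + u[t]          # auto-correlated errors
        out[i] = mean + e
    return out

def rbf_kernel_pca(X, gamma=None, n_components=2):
    """Kernel PCA scores with an RBF kernel; gamma is the kernel parameter."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    n = len(X)
    if gamma is None:                              # median heuristic (assumption)
        gamma = 1.0 / np.median(d2[np.triu_indices(n, k=1)])
    K = np.exp(-gamma * d2)
    J = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(J @ K @ J)         # centered kernel matrix
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def two_means(Z, n_iter=50):
    """Plain 2-means, initialized at the extremes of the first score."""
    centers = Z[[np.argmin(Z[:, 0]), np.argmax(Z[:, 0])]].copy()
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = Z[labels == k].mean(axis=0)
    return labels

def run_demo(seed=0, n=30, T=20, p=3, phi=0.5, rho=0.5, gamma=None):
    rng = np.random.default_rng(seed)
    mean_a = np.zeros((T, p))
    mean_b = np.full((T, p), 3.0)                  # hypothetical mean shift
    data = np.concatenate([simulate_group(n, mean_a, phi, rho, rng),
                           simulate_group(n, mean_b, phi, rho, rng)])
    truth = np.repeat([0, 1], n)
    X = data.reshape(2 * n, T * p)                 # multi-way unfolding: one row per series
    scores = rbf_kernel_pca(X, gamma)
    acc = float(np.mean(two_means(scores) == truth))
    return scores, max(acc, 1.0 - acc)             # success rate up to label swap

scores, rate = run_demo()
print(f"clustering success rate: {rate:.2f}")
```

Varying `phi`, `rho`, `T`, and `gamma` in `run_demo` gives a rough feel for the factors the study reports as influential, although the numbers from this sketch are not comparable to the paper's results.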

2017 ◽  
Vol 139 (6) ◽  
Author(s):  
Afshin Abbasi Hoseini ◽  
Sverre Steen

A framework is presented for data mining in multivariate time series collected over hours of ship operation to extract vessel states from the data. The measurements made by a ship monitoring system lead to a collection of time-organized in-service data. Usually, these time series datasets are large, complicated, and high-dimensional. The purpose of time-series data mining is to bridge the gap between a massive database and the meaningful information hidden in the data. An important aspect of the proposed framework is selecting relevant variables, eliminating unnecessary information or noise, and extracting the essential features of the problem so that the vessel behavior can be identified reliably. Principal component analysis (PCA) is employed to address multicollinearity in the data and to reduce dimensionality. The data mining approach itself is built on unsupervised data clustering using a self-organizing map (SOM) and k-means, together with k-nearest neighbors search (k-NNS) for searching and recovering specific information from the database. As a case study, the results are based on onboard monitoring data of the Norwegian University of Science and Technology (NTNU) research vessel, “Gunnerus.” The scope of this work is limited to detecting ship maneuvers. However, it is extendable to a wide range of smart marine applications. As illustrated in the results, this approach is effective in identifying previously unknown states of the ship with acceptable accuracy.
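The PCA, k-means, and k-NNS steps named in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' framework: the data are synthetic (two latent factors mixed into six collinear channels, standing in for monitoring signals), the SOM step is omitted, the 95% variance cutoff is arbitrary, and every name below is hypothetical.

```python
import numpy as np

def pca_reduce(X, var_kept=0.95):
    """Standardize, then keep the fewest principal components whose
    cumulative explained variance reaches var_kept."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    k = int(np.searchsorted(np.cumsum(explained), var_kept)) + 1
    k = min(k, len(S))
    return Xs @ Vt[:k].T, k

def kmeans(Z, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def knn_search(Z, query, k=5):
    """Indices of the k stored samples closest to the query (Euclidean)."""
    return np.argsort(np.linalg.norm(Z - query, axis=1))[:k]

# Synthetic stand-in for monitoring data: two operating states, six collinear channels.
rng = np.random.default_rng(0)
n = 100
mixing = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]], dtype=float)
latent = np.vstack([rng.normal(0.0, 1.0, (n, 2)), rng.normal(5.0, 1.0, (n, 2))])
X = latent @ mixing.T + 0.1 * rng.standard_normal((2 * n, 6))
truth = np.repeat([0, 1], n)

Z, n_components = pca_reduce(X)            # collinear channels collapse to few components
labels, _ = kmeans(Z, k=2)                 # unsupervised state discovery
neighbors = knn_search(Z, Z[n], k=5)       # retrieve samples similar to one from state 1
print(n_components, labels[:5], neighbors)
```

Running PCA before clustering matters here because the six channels carry only two independent sources of variation; clustering on the reduced scores avoids counting the same information several times.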


Author(s):  
Fayed Alshammri ◽  
Jiazhu Pan

This paper proposes an extension of principal component analysis to non-stationary multivariate time series data. A criterion for determining the number of final retained components is proposed. An advance correlation matrix is developed to evaluate dynamic relationships among the chosen components. The theoretical properties of the proposed method are given. Many simulation experiments show our approach performs well on both stationary and non-stationary data. Real data examples are also presented as illustrations. We develop four packages using the statistical software R that contain the needed functions to obtain and assess the results of the proposed method.


Water ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 1633
Author(s):  
Elena-Simona Apostol ◽  
Ciprian-Octavian Truică ◽  
Florin Pop ◽  
Christian Esposito

Due to the exponential growth of Internet of Things networks and the massive amount of time series data collected from them, it is essential to apply efficient methods for Big Data analysis in order to extract meaningful information and statistics. Anomaly detection is an important part of time series analysis, improving the quality of further analysis, such as prediction and forecasting. Thus, detecting sudden change points that reflect normal behavior and distinguishing them from abnormal behavior, i.e., outliers, is a crucial step in minimizing the false positive rate and building accurate machine learning models for prediction and forecasting. In this paper, we propose a rule-based decision system that enhances anomaly detection in multivariate time series using change point detection. Our architecture uses a pipeline that automatically detects real anomalies and removes the false positives introduced by change points. We employ both traditional and deep learning unsupervised algorithms: in total, five anomaly detection and five change point detection algorithms. Additionally, we propose a new confidence metric based on the support for a time series point to be an anomaly and the support for the same point to be a change point. In our experiments, we use a large real-world dataset containing multivariate time series about water consumption collected from smart meters. As an evaluation metric, we use Mean Absolute Error (MAE). The low MAE values show that the algorithms accurately determine anomalies and change points. The experimental results strengthen our assumption that anomaly detection can be improved by determining and removing change points, and they validate the correctness of our proposed rules in real-world scenarios. Furthermore, the proposed rule-based decision support system enables users to make informed decisions regarding the status of the water distribution network and to perform predictive and proactive maintenance effectively.
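The idea of combining detector outputs by rule can be sketched as below. The abstract does not give the actual rules or the formula of the confidence metric, so everything here is an assumption made for illustration: support is taken as the fraction of detectors that flag a point, the rule keeps a point as a real anomaly only when anomaly support reaches a threshold and change-point support does not, and the confidence value is a hypothetical product of the two supports. The votes and ground truth are toy values.

```python
def support(votes):
    """Fraction of detectors that flag each time index.
    votes: one 0/1 list per detector, all of equal length."""
    n_detectors = len(votes)
    return [sum(column) / n_detectors for column in zip(*votes)]

def real_anomalies(anomaly_votes, change_votes, threshold=0.5):
    """Hypothetical rule: keep a point as an anomaly when anomaly support is
    high and change-point support is low. Not the paper's actual rule set."""
    a = support(anomaly_votes)
    c = support(change_votes)
    # Assumed confidence: high when detectors agree on "anomaly" and
    # disagree with "change point".
    confidence = [ai * (1.0 - ci) for ai, ci in zip(a, c)]
    flags = [int(ai >= threshold and ci < threshold) for ai, ci in zip(a, c)]
    return flags, confidence

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy votes from three anomaly detectors and three change-point detectors.
anomaly_votes = [
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 1],
]
change_votes = [
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
truth = [0, 0, 1, 0, 0, 0, 0, 0]  # index 5 is a change point, not an anomaly

flags, confidence = real_anomalies(anomaly_votes, change_votes)
mae = mean_absolute_error(truth, flags)
print(flags, mae)
```

In this toy case index 5 is flagged by every anomaly detector, yet the rule discards it because most change-point detectors flag it too, which is the kind of false positive the abstract describes removing.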


2021 ◽  
Vol 13 (3) ◽  
pp. 67
Author(s):  
Eric Hitimana ◽  
Gaurav Bajpai ◽  
Richard Musabe ◽  
Louis Sibomana ◽  
Jayavel Kayalvizhi

Many countries worldwide face challenges in managing building incident-prevention measures for fire disasters. The most critical issues are the localization, identification, and detection of room occupants. The Internet of Things (IoT), together with machine learning, has been shown to make buildings smarter by providing real-time data acquisition through sensors and actuators for prediction mechanisms. This paper proposes the implementation of an IoT framework to capture indoor environmental parameters as multivariate time-series data for occupancy detection. The Long Short Term Memory (LSTM) deep learning algorithm is used to infer the presence of human beings. An experiment is conducted in an office room using multivariate time series as predictors in a regression forecasting problem. The results demonstrate that the developed system can obtain, process, and store environmental information. The collected information was fed to the LSTM algorithm, which was compared with other machine learning algorithms: Support Vector Machine, Naïve Bayes Network, and Multilayer Perceptron Feed-Forward Network. The outcomes based on the parametric calibrations demonstrate that LSTM performs better in the context of the proposed application.
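Using a multivariate time series as predictors for a recurrent model usually requires slicing it into fixed-length windows first. The sketch below shows that preparation step only, not the LSTM itself and not the authors' pipeline; the lookback length, the function name, and the placeholder arrays are assumptions for illustration.

```python
import numpy as np

def make_windows(features, target, lookback):
    """Turn a (T, F) feature series and a (T,) target series into
    X of shape (T - lookback, lookback, F) and y of shape (T - lookback,).
    X[i] holds the lookback steps that precede the step predicted by y[i]."""
    T = len(features)
    X = np.stack([features[i:i + lookback] for i in range(T - lookback)])
    y = target[lookback:]
    return X, y

# Placeholder data: 10 time steps, 2 sensor channels, one target per step.
features = np.arange(20, dtype=float).reshape(10, 2)
target = np.arange(10, dtype=float)

X, y = make_windows(features, target, lookback=3)
print(X.shape, y.shape)
```

The resulting `(samples, timesteps, features)` layout is the input shape that common LSTM implementations expect, so each sample carries a short history of all sensor channels rather than a single reading.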


2018 ◽  
Vol 15 (147) ◽  
pp. 20180695 ◽  
Author(s):  
Simone Cenci ◽  
Serguei Saavedra

Biotic interactions are expected to play a major role in shaping the dynamics of ecological systems. Yet, quantifying the effects of biotic interactions has been challenging due to a lack of appropriate methods to extract accurate measurements of interaction parameters from experimental data. One of the main limitations of existing methods is that the parameters inferred from noisy, sparsely sampled, nonlinear data are seldom uniquely identifiable. That is, many different parameters can be compatible with the same dataset and can generalize to independent data equally well. Hence, it is difficult to justify conclusive assertions about the effect of biotic interactions without information about their associated uncertainty. Here, we develop an ensemble method based on model averaging to quantify the uncertainty associated with the effect of biotic interactions on community dynamics from non-equilibrium ecological time-series data. Our method is able to detect the most informative time intervals for each biotic interaction within a multivariate time series and can be easily adapted to different regression schemes. Overall, this novel approach can be used to associate a time-dependent uncertainty with the effect of biotic interactions. Moreover, because we quantify uncertainty with minimal assumptions about the data-generating process, our approach can be applied to any data for which interactions among variables strongly affect the overall dynamics of the system.

