A Complete Software Stack for IoT Time-Series Analysis that Combines Semantics and Machine Learning—Lessons Learned from the Dyversify Project

Companies are increasingly gathering and analyzing time-series data, driven by the rising number of IoT devices. Many works in the literature describe analysis systems built using either data-driven or semantic (knowledge-driven) techniques. However, few, if any, describe hybrid combinations of the two. Dyversify, a collaborative project between industry and academia, investigated how event and anomaly detection can be performed on time-series data in such a hybrid setting. We built a proof-of-concept analysis platform, using a microservice architecture to ensure scalability and fault tolerance. The platform comprises time-series ingestion, long-term storage, data semantification, event detection using data-driven and semantic techniques, dynamic visualization, and user feedback. In this work, we describe the system architecture of this hybrid analysis platform and give an overview of the different components and their interactions. As such, the main contribution of this work is an experience report with challenges faced and lessons learned.

Causal Mechanism Transfer Network for Time Series Domain Adaptation in Mechanical Systems

ACM Transactions on Intelligent Systems and Technology ◽ 10.1145/3445033 ◽ 2021 ◽ Vol 12 (2) ◽ pp. 1-21

Author(s): Zijian Li ◽ Ruichu Cai ◽ Hong Wei Ng ◽ Marianne Winslett ◽ Tom Z. J. Fu ◽ ...

Keyword(s): Time Series ◽ Time Series Data ◽ Domain Adaptation ◽ Mechanical Systems ◽ Lessons Learned ◽ Data Driven ◽ Series Data ◽ Causal Mechanism ◽ Model Generalization ◽ High Dynamics

Data-driven models are becoming essential parts of modern mechanical systems, commonly used to capture the behavior of various equipment and varying environmental characteristics. Although these models adapt well to high dynamics and aging equipment, they usually require massive numbers of labels, mostly contributed by human engineers at a high cost. Fortunately, domain adaptation enhances model generalization by utilizing labeled source data and unlabeled target data. However, mainstream domain adaptation methods cannot achieve ideal performance on time series data, since they assume that the conditional distributions are equal. This assumption works well for static data but is inapplicable to time series data. Even the first-order Markov dependence assumption requires dependence between any two consecutive time steps. In this article, we assume that the causal mechanism is invariant and present our Causal Mechanism Transfer Network (CMTN) for time series domain adaptation. By capturing the causal mechanisms of time series data, CMTN allows data-driven models to exploit existing data and labels from similar systems, such that the resulting model on a new system is highly reliable even with limited data. We report our empirical results and lessons learned from two real-world case studies, on chiller plant energy optimization and boiler fault detection, in which CMTN outperforms the existing state-of-the-art method.

Particularities of data mining in medicine: lessons learned from patient medical time series data analysis

EURASIP Journal on Wireless Communications and Networking ◽ 10.1186/s13638-019-1582-2 ◽ 2019 ◽ Vol 2019 (1) ◽ Cited By ~ 2

Author(s): Shadi Aljawarneh ◽ Aurea Anguera ◽ John William Atwood ◽ Juan A. Lara ◽ David Lizcano

Keyword(s): Data Mining ◽ Time Series ◽ Knowledge Discovery ◽ Time Series Data ◽ Medical Patient ◽ Lessons Learned ◽ Physiological Signals ◽ Knowledge Discovery In Databases ◽ Series Data ◽ Data Mining Techniques

Nowadays, large amounts of data are generated in the medical domain. Various physiological signals generated from different organs can be recorded to extract interesting information about patients’ health. The analysis of physiological signals is a hard task that requires the use of specific approaches such as the Knowledge Discovery in Databases process. The application of such a process in the domain of medicine has a series of implications and difficulties, especially regarding the application of data mining techniques to data, mainly time series, gathered from medical examinations of patients. The goal of this paper is to describe the lessons learned and the experience gathered by the authors in applying data mining techniques to real medical patient data, including time series. In this research, we carried out an exhaustive case study working on data from two medical fields: stabilometry (15 professional basketball players, 18 elite ice skaters) and electroencephalography (100 healthy patients, 100 epileptic patients). We applied a previously proposed knowledge discovery framework for classification purposes, obtaining good results in terms of classification accuracy (greater than 99% in both fields). The good results obtained in our research are the groundwork for the lessons learned and recommendations made in this position paper, which is intended as a guide for experts who face similar medical data mining projects.

Estimating infection-related human mobility networks based on time series data of COVID-19 infection in Japan

10.1101/2021.08.02.21261486 ◽ 2021

Author(s): Tetsuya Yamada ◽ Shoi Shi

Keyword(s): Time Series ◽ Infectious Diseases ◽ Time Series Data ◽ Human Mobility ◽ Emerging Infectious Diseases ◽ Human Movement ◽ Data Driven ◽ Disease Spread ◽ Series Data ◽ Data Infrastructure

Comprehensive and evidence-based countermeasures against emerging infectious diseases have become increasingly important in recent years. COVID-19 and many other infectious diseases are spread by human movement and contact, but complex transportation networks in the 21st century make it difficult to predict disease spread in rapidly changing situations. It is especially challenging to estimate the network of infection transmission in countries where the traffic and human movement data infrastructure is not yet developed. In this study, we devised a method to estimate the network of transmission of COVID-19 from the time series data of its infection and applied it to determine its spread across areas in Japan. We incorporated the effects of soft lockdowns, such as the declaration of a state of emergency, and changes in the infection network due to government-sponsored travel promotion, and predicted the spread of infection using the Tokyo Olympics as a model. The models used in this study are available online, and our data-driven infection network models are scalable, whether at the level of a city, town, country, or continent, and applicable anywhere in the world, as long as time-series data of infections per region are available. These estimations of effective distance and the depiction of infectious disease networks based on actual infection data are expected to be useful in devising data-driven countermeasures against emerging infectious diseases worldwide.

Partial Correlation-Based Attention for Multivariate Time Series Forecasting

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i10.7132 ◽ 2020 ◽ Vol 34 (10) ◽ pp. 13720-13721

Author(s): Won Kyung Lee

Keyword(s): Time Series ◽ Partial Correlation ◽ Time Series Data ◽ Multivariate Time Series ◽ Time Series Forecasting ◽ Data Driven ◽ Series Data ◽ Time Lags ◽ Dependency Structure ◽ Agnostic Learning

Multivariate time-series forecasting has great potential in various domains. However, it is challenging to find the dependency structure among the time-series variables and the appropriate time lags for each variable, both of which change dynamically over time. In this study, I suggest a partial correlation-based attention mechanism that overcomes the shortcomings of existing attention mechanisms based on pair-wise comparisons. Moreover, I propose data-driven, series-wise, multi-resolution convolutional layers to represent the input time-series data for domain-agnostic learning.
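
The contrast the abstract draws between pair-wise comparison and partial correlation can be illustrated with a small sketch. This is not the author's attention mechanism, only the standard first-order partial correlation of two series given one control series. The function names `pearson` and `partial_corr` and the toy series `x`, `y`, `z` are hypothetical and chosen purely for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, control):
    """Correlation of a and b after removing the linear effect of control."""
    r_ab = pearson(a, b)
    r_ac = pearson(a, control)
    r_bc = pearson(b, control)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Toy data (assumed): x and y each equal the shared driver z plus
# a noise term, and the two noise terms are uncorrelated with z
# and with each other.
z = [-3, -1, 1, 3]
x = [-2, -2, 0, 4]   # z + [1, -1, -1, 1]
y = [-4, 2, -2, 4]   # z + [-1, 3, -3, 1]

print("pairwise corr(x, y):   ", pearson(x, y))
print("partial corr(x, y | z):", partial_corr(x, y, z))
```

In this toy data, x and y are related only through z, so the pairwise coefficient is clearly positive while the partial correlation is approximately zero. That is the kind of spurious dependency a purely pair-wise comparison cannot distinguish from a direct one.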

A Hybrid Learning Approach to Prognostics and Health Management Applied to Military Ground Vehicles Using Time-Series and Maintenance Event Data

Annual Conference of the PHM Society ◽ 10.36001/phmconf.2020.v12i1.1146 ◽ 2020 ◽ Vol 12 (1) ◽ pp. 10

Author(s): W Glenn Bond ◽ Haley Dozier ◽ Thomas L Arnold ◽ Michael Y Lam ◽ Quyen T Dong ◽ ...

Keyword(s): Time Series ◽ Data Analytics ◽ High Performance ◽ Time Series Data ◽ Data Driven ◽ Series Data ◽ Event Data ◽ Ground Vehicles ◽ Operational Data ◽ Operational Time

Attempts to leverage operational time-series data in Condition Based Maintenance (CBM) approaches to optimize the life cycle management and Reliability, Availability, and Maintainability (RAM) of military vehicles have encountered several obstacles over decades of data collection. These obstacles have beset similar approaches on civilian ground vehicles, as well as on aircraft and other complex systems. Analysis of operational data is critical because it represents a continuous recording of the state of the system. Applying rudimentary data analytics to operational data can provide insights like fuel usage patterns or observed reliability of one vehicle or even a fleet. Monitoring trends and analyzing patterns in this data over time, however, can provide insight into the health of a vehicle, a complex system, or a fleet, predicting mean time to failure or compiling logistic or life cycle needs. Such High-Performance Data Analytics (HPDA) on operational time-series datasets has been historically difficult due to the large amount of data gathered from vehicle sensors, the lack of association between clusters observed in the data and failures or unscheduled maintenance events, and the deficiency of unsupervised learning techniques for time-series data. We present an HPDA environment and a method of discovering patterns in vehicle operational data that determines models for predicting the likelihood of imminent failure, referred to as Parameter-Based Indicators (PBIs). Our method is a data-driven approach that uses both time-series and relational maintenance data. This hybrid approach combines both supervised and unsupervised machine learning and data analytic techniques to correlate labeled, relational maintenance event data with unlabeled operational time-series data utilizing the DoD High Performance Computing (HPC) capabilities at the U.S. Army Engineer Research and Development Center. 
In leveraging both time-series and relational data, we demonstrate a means of fast, purely data-driven model creation that is more broadly applicable and requires less a priori information than physics-informed, data-driven models. By blending these approaches, this system will be able to relate some lifecycle management goals through the workflow to generate specific PBIs that will predict failures or highlight appropriate areas of concern in individual or collective vehicle histories.

Energy Demand Relationship: Theory and Empirical Application. A Short Note

10.20944/preprints202001.0008.v1 ◽ 2020

Author(s): Fakhri J. Hasanov ◽ Jeyhun L. Mikayilov

Keyword(s): Time Series ◽ Production Function ◽ Energy Demand ◽ Demand Function ◽ Time Series Data ◽ Short Note ◽ Data Driven ◽ Series Data ◽ Industrial Energy ◽ Empirical Analyses

In this short note, we described step-by-step derivations of the industrial energy demand function from the production function framework and provided researchers with two specifications. We then applied these theoretical specifications to time series data as an empirical analysis. We concluded that theory should be considered at the beginning of empirical analyses, but the data should also be allowed to speak freely. Hence, the main suggestion of this short note is that combining theory-driven and data-driven approaches is the better strategy in empirical analyses.

Time series prediction using machine learning: a case of Bitcoin returns

Studies in Economics and Finance ◽ 10.1108/sef-06-2021-0217 ◽ 2021 ◽ Vol ahead-of-print (ahead-of-print)

Author(s): Irfan Haider Shakri

Keyword(s): Machine Learning ◽ Time Series ◽ Time Series Data ◽ Time Series Prediction ◽ Predictive Ability ◽ Absolute Error ◽ Data Driven ◽ Series Data ◽ Policy Uncertainty ◽ Content Type

Purpose: The purpose of this study is to compare five data-driven ML techniques for predicting the time series data of Bitcoin returns, namely, alternating model tree, random forest (RF), multiple linear regression, multi-layer perceptron regression and M5 Tree algorithms. Design/methodology/approach: The data used to forecast Bitcoin returns range from 8 July 2010 to 30 Aug 2020. This study used several predictors of Bitcoin returns, including economic policy uncertainty, equity market volatility index, S&P returns, USD/EURO exchange rates, oil and gold prices, volatilities and returns. Five statistical indexes, namely, correlation coefficient, mean absolute error, root mean square error, relative absolute error and root relative squared error, are determined. The results of these metrics are used to develop a colour intensity ranking. Findings: Among the machine learning (ML) techniques used in this study, RF models have shown superior predictive ability for estimating Bitcoin returns. Originality/value: This study is the first of its kind to use and compare ML models in the prediction of Bitcoin. More studies can be carried out using further cryptocurrencies and other ML data-driven models in future.
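
The five statistical indexes named in the abstract can be sketched as follows. This is a minimal illustration under assumed textbook definitions, in which the two relative errors are measured against a baseline that always predicts the mean of the actual values. The paper does not state its exact formulas, so the function name `regression_metrics` and the sample data are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return the five indexes for paired actual/predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    sq_err = [(a - p) ** 2 for a, p in zip(actual, predicted)]

    # Spread of the actual values around their own mean; this is the
    # error of the assumed baseline that always predicts mean_a.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)

    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "correlation": cov / math.sqrt(base_sq * var_p),
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        "rae": sum(abs_err) / base_abs,
        "rrse": math.sqrt(sum(sq_err) / base_sq),
    }


# Hypothetical returns, for illustration only (not data from the study).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, a relative absolute error or root relative squared error below 1 means the model beats the mean-only baseline, which makes the two relative indexes comparable across series with different scales.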

Data-driven approach for noise reduction in pressure-sensitive paint data based on modal expansion and time-series data at optimally placed points

Physics of Fluids ◽ 10.1063/5.0049071 ◽ 2021 ◽ Vol 33 (7) ◽ pp. 077105

Author(s): Tomoki Inoue ◽ Yu Matsuda ◽ Tsubasa Ikami ◽ Taku Nonomura ◽ Yasuhiro Egami ◽ ...

Keyword(s): Time Series ◽ Noise Reduction ◽ Time Series Data ◽ Data Driven ◽ Series Data ◽ Pressure Sensitive Paint ◽ Modal Expansion ◽ Pressure Sensitive ◽ Data Driven Approach

A Data-Driven Long Time-Series Electrical Line Trip Fault Prediction Method Using an Improved Stacked-Informer Network

Sensors ◽ 10.3390/s21134466 ◽ 2021 ◽ Vol 21 (13) ◽ pp. 4466

Author(s): Li Guo ◽ Runze Li ◽ Bin Jiang

Keyword(s): Time Series ◽ Prediction Accuracy ◽ Time Series Data ◽ Short Term Memory ◽ Fault Prediction ◽ Data Driven ◽ Series Data ◽ Sequence Prediction ◽ Long Time Series ◽ Long Time

The monitoring of electrical equipment and power grid systems is essential for power transmission and distribution. Predicting faults in advance from long monitored sequences is of great significance for ensuring the safe operation of the power system. Many studies using models such as the recurrent neural network (RNN) and the long short-term memory (LSTM) network have shown an outstanding ability to increase prediction accuracy. However, some limitations still prevent those methods from predicting long time-series sequences in real-world applications. To address these issues, this paper proposes a data-driven method using an improved stacked-Informer network and applies it to electrical line trip fault sequence prediction. The method constructs a stacked-Informer network to extract the underlying features of long-sequence time-series data, and combines the gradient centralization (GC) technique with the optimizer to replace the Adam optimizer used in the original Informer network. It has superior generalization ability and higher training efficiency. The data sequences used for experimental validation were collected from the wind and solar hybrid substation located in Zhangjiakou city, China. The experimental results and analysis show that the presented method can improve fault sequence prediction accuracy and achieve fast training in real scenarios.
