A Robust Data-Driven Method for Multiseasonality and Heteroscedasticity in Time Series Preprocessing

Internet of Things (IoT) is emerging, and 5G enables much more data transport from mobile and wireless sources. The data to be transmitted is too much compared to link capacity. Labelling data and transmit only useful part of the collected data or their features is a promising solution for this challenge. Abnormal data are valuable due to the need to train models and to detect anomalies when being compared to already overflowing normal data. Labelling can be done in data sources or edges to balance the load and computing between sources, edges, and centres. However, unsupervised labelling method is still a challenge preventing to implement the above solutions. Two main problems in unsupervised labelling are long-term dynamic multiseasonality and heteroscedasticity. This paper proposes a data-driven method to handle modelling and heteroscedasticity problems. The method contains the following main steps. First, raw data are preprocessed and grouped. Second, main models are built for each group. Third, models are adapted back to the original measured data to get raw residuals. Fourth, raw residuals go through deheteroscedasticity and become normalized residuals. Finally, normalized residuals are used to conduct anomaly detection. The experimental results with real-world data show that our method successfully increases receiver-operating characteristic (AUC) by about 30%.

Download Full-text

A comparison of data sources for creating a long-term time series of daily gridded solar radiation for Europe

Solar Energy ◽

10.1016/j.solener.2013.11.007 ◽

2014 ◽

Vol 99 ◽

pp. 152-171 ◽

Cited By ~ 49

Author(s):

Jędrzej S. Bojanowski ◽

Anton Vrieling ◽

Andrew K. Skidmore

Keyword(s):

Time Series ◽

Solar Radiation ◽

Data Sources

Download Full-text

Data-driven decomposition of long-term echosounder time series from ocean observatories

The Journal of the Acoustical Society of America ◽

10.1121/1.5014920 ◽

2017 ◽

Vol 142 (4) ◽

pp. 2719-2719 ◽

Cited By ~ 1

Author(s):

Wu-Jung Lee ◽

Valentina Staneva ◽

Bernease Herman ◽

Aleksandr Aravkin

Keyword(s):

Time Series ◽

Data Driven

Download Full-text

Fast and effective pseudo transfer entropy for bivariate data-driven causal inference

Scientific Reports ◽

10.1038/s41598-021-87818-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Riccardo Silini ◽

Cristina Masoller

Keyword(s):

Time Series ◽

Causal Inference ◽

Gaussian Approximation ◽

Computational Cost ◽

Transfer Entropy ◽

Data Driven ◽

Computational Time ◽

Real World Data ◽

Short Time Series ◽

Short Time

AbstractIdentifying, from time series analysis, reliable indicators of causal relationships is essential for many disciplines. Main challenges are distinguishing correlation from causality and discriminating between direct and indirect interactions. Over the years many methods for data-driven causal inference have been proposed; however, their success largely depends on the characteristics of the system under investigation. Often, their data requirements, computational cost or number of parameters limit their applicability. Here we propose a computationally efficient measure for causality testing, which we refer to as pseudo transfer entropy (pTE), that we derive from the standard definition of transfer entropy (TE) by using a Gaussian approximation. We demonstrate the power of the pTE measure on simulated and on real-world data. In all cases we find that pTE returns results that are very similar to those returned by Granger causality (GC). Importantly, for short time series, pTE combined with time-shifted (T-S) surrogates for significance testing strongly reduces the computational cost with respect to the widely used iterative amplitude adjusted Fourier transform (IAAFT) surrogate testing. For example, for time series of 100 data points, pTE and T-S reduce the computational time by $$82\%$$ 82 % with respect to GC and IAAFT. We also show that pTE is robust against observational noise. Therefore, we argue that the causal inference approach proposed here will be extremely valuable when causality networks need to be inferred from the analysis of a large number of short time series.

Download Full-text

RECURRENCE QUANTIFICATION ANALYSIS IN WATERSHED ECOSYSTEM RESEARCH

International Journal of Bifurcation and Chaos ◽

10.1142/s0218127411028921 ◽

2011 ◽

Vol 21 (04) ◽

pp. 1113-1125 ◽

Cited By ~ 5

Author(s):

HOLGER LANGE

Keyword(s):

Time Series ◽

Recurrence Quantification Analysis ◽

Data Driven ◽

Model Data ◽

Ecosystem Research ◽

Data Comparison ◽

Recurrence Quantification ◽

Quantification Analysis ◽

Term Monitoring

In ecosystem research, data-driven approaches to modeling are of major importance. Models are more often than not shaped by the spatiotemporal structure of the observations: an inverse modeling approach prevails. Here, I investigate the insights obtained from Recurrence Quantification Analysis of observed ecosystem time series. As a typical example of available long-term monitoring data, I choose time series from hydrology and hydrochemistry. Besides providing insights into the nonstationary and nonlinear dynamics of these variables, RQA also enables a detailed and temporally local model-data comparison.

Download Full-text

Data-driven methods for dengue prediction and surveillance using real-world and Big Data: A systematic review

PLoS Neglected Tropical Diseases ◽

10.1371/journal.pntd.0010056 ◽

2022 ◽

Vol 16 (1) ◽

pp. e0010056

Author(s):

Emmanuelle Sylvestre ◽

Clarisse Joachim ◽

Elsa Cécilia-Joseph ◽

Guillaume Bouzillé ◽

Boris Campillo-Gimenez ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Big Data ◽

Real World ◽

Data Sources ◽

Data Driven ◽

Real World Data ◽

Learning Methods ◽

Machine Learning Methods ◽

Dengue Prediction

Background Traditionally, dengue surveillance is based on case reporting to a central health agency. However, the delay between a case and its notification can limit the system responsiveness. Machine learning methods have been developed to reduce the reporting delays and to predict outbreaks, based on non-traditional and non-clinical data sources. The aim of this systematic review was to identify studies that used real-world data, Big Data and/or machine learning methods to monitor and predict dengue-related outcomes. Methodology/Principal findings We performed a search in PubMed, Scopus, Web of Science and grey literature between January 1, 2000 and August 31, 2020. The review (ID: CRD42020172472) focused on data-driven studies. Reviews, randomized control trials and descriptive studies were not included. Among the 119 studies included, 67% were published between 2016 and 2020, and 39% used at least one novel data stream. The aim of the included studies was to predict a dengue-related outcome (55%), assess the validity of data sources for dengue surveillance (23%), or both (22%). Most studies (60%) used a machine learning approach. Studies on dengue prediction compared different prediction models, or identified significant predictors among several covariates in a model. The most significant predictors were rainfall (43%), temperature (41%), and humidity (25%). The two models with the highest performances were Neural Networks and Decision Trees (52%), followed by Support Vector Machine (17%). We cannot rule out a selection bias in our study because of our two main limitations: we did not include preprints and could not obtain the opinion of other international experts. Conclusions/Significance Combining real-world data and Big Data with machine learning methods is a promising approach to improve dengue prediction and monitoring. Future studies should focus on how to better integrate all available data sources and methods to improve the response and dengue management by stakeholders.

Download Full-text

Traffic Speed Prediction Based on Heterogeneous Graph Attention Residual Time Series Convolutional Networks

AI ◽

10.3390/ai2040039 ◽

2021 ◽

Vol 2 (4) ◽

pp. 650-661

Author(s):

Yan Du ◽

Xizhong Qin ◽

Zhenhong Jia ◽

Kun Yu ◽

Mengmeng Lin

Keyword(s):

Time Series ◽

Traffic Accidents ◽

Traffic Prediction ◽

Traffic Data ◽

Unified Framework ◽

Real World Data ◽

Convolutional Network ◽

Residual Time ◽

Social Events

Accurate and timely traffic forecasting is an important task for the realization of urban smart traffic. The random occurrence of social events such as traffic accidents will make traffic prediction particularly difficult. At the same time, most of the existing prediction methods rely on prior knowledge to obtain traffic maps and the obtained map structure cannot be guaranteed to be accurate for the current learning task. In addition, traffic data is highly non-linear and long-term dependent, so it is more difficult to achieve accurate prediction. In response to the above problems, this paper proposes a new integrated unified architecture for traffic prediction based on heterogeneous graph attention network combined with residual-time-series convolutional network, which is called HGA-ResTCN. First, the heterogeneous graph attention is used to capture the changes in the relationship between the traffic graph nodes caused by social events, so as to learn the link weights between the target node and its neighbor nodes; at the same time, by introducing the timing of residual links convolutional network to capture the long-term dependence of complex traffic data. These two models are integrated into a unified framework to learn in an end-to-end manner. Through testing on real-world data sets, the results show that the accuracy of the model in this paper is better than other proposed baselines.

Download Full-text

A new dam structural response estimation paradigm powered by deep learning and transfer learning techniques

Structural Health Monitoring ◽

10.1177/14759217211009780 ◽

2021 ◽

pp. 147592172110097

Author(s):

Yangtao Li ◽

Tengfei Bao ◽

Zhixin Gao ◽

Xiaosong Shu ◽

Kang Zhang ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Deep Learning ◽

Transfer Learning ◽

Health Monitoring ◽

Structural Response ◽

Data Driven ◽

Structural Health ◽

Learning Techniques

With the rapid development of information and communication techniques, dam structural health assessment based on data collected from structural health monitoring systems has become a trend. This allows for applying data-driven methods for dam safety analysis. However, data-driven models in most related literature are statistical and shallow machine learning models, which cannot capture the time series patterns or learn from long-term dependencies of dam structural response time series. Furthermore, the effectiveness and applicability of these models are only validated in a small data set and part of monitoring points in a dam structural health monitoring system. To address the problems, this article proposes a new modeling paradigm based on various deep learning and transfer learning techniques. The paradigm utilizes one-dimensional convolutional neural networks to extract the inherent features from dam structural response–related environmental quantity monitoring data. Then bidirectional gated recurrent unit with a self-attention mechanism is used to learn from long-term dependencies, and transfer learning is utilized to transfer knowledge learned from the typical monitoring point to the others. The proposed paradigm integrates the powerful modeling capability of deep learning networks and the flexible transferability of transfer learning. Rather than traditional models that rely on experience for feature selection, the proposed deep learning–based paradigm directly utilizes environmental monitoring time series as inputs to accurately estimate dam structural response changes. A high arch dam in long-term service is selected as the case study, and three monitoring items, including dam displacement, crack opening displacement, and seepage are used as the research objects. The experimental results show that the proposed paradigm outperforms conventional and shallow machine learning–based methods in all 41 tested monitoring points, which indicates that the proposed paradigm is capable of dealing with dam structural response estimation with high accuracy and robustness.

Download Full-text

878-P: Real-World Data from U.S. Patients Using a Long-Term Implantable Continuous Glucose Monitoring (CGM) System: Age Effect on Glycemic Control

Diabetes ◽

10.2337/db20-878-p ◽

2020 ◽

Vol 69 (Supplement 1) ◽

pp. 878-P

Author(s):

KATHERINE TWEDEN ◽

SAMANWOY GHOSH-DASTIDAR ◽

ANDREW D. DEHENNIS ◽

FRANCINE KAUFMAN

Keyword(s):

Glycemic Control ◽

Continuous Glucose Monitoring ◽

Real World ◽

Glucose Monitoring ◽

Age Effect ◽

Real World Data ◽

World Data

Download Full-text

401-P: Mortality, Major Adverse Cardiovascular Events (MACDE), and Diabetic Complications in Men with Hypogonadism and Type 2 Diabetes (T2DM) Receiving Long-Term Treatment with Testosterone Undecanoate Injections (TU): 11-Year Real-World Data

Diabetes ◽

10.2337/db20-401-p ◽

2020 ◽

Vol 69 (Supplement 1) ◽

pp. 401-P

Author(s):

AHMAD HAIDER ◽

KARIM S. HAIDER ◽

FARID SAAD

Keyword(s):

Type 2 Diabetes ◽

Diabetic Complications ◽

Cardiovascular Events ◽

Term Treatment ◽

Major Adverse Cardiovascular Events ◽

Real World Data ◽

Testosterone Undecanoate ◽

Long Term Treatment

Download Full-text

Aerosol optical depth retrievals at the Izaña Atmospheric Observatory from 1941 to 2013 by using artificial neural networks

Atmospheric Measurement Techniques ◽

10.5194/amt-9-53-2016 ◽

2016 ◽

Vol 9 (1) ◽

pp. 53-62 ◽

Cited By ~ 13

Author(s):

R. D. García ◽

O. E. García ◽

E. Cuevas ◽

V. E. Cachorro ◽

A. Barreto ◽

...

Keyword(s):

Neural Networks ◽

Time Series ◽

Artificial Neural Networks ◽

Aerosol Optical Depth ◽

Optical Depth ◽

Mineral Dust ◽

Dust Particles ◽

Filter Radiometer ◽

Artificial Neural

Abstract. This paper presents the reconstruction of a 73-year time series of the aerosol optical depth (AOD) at 500 nm at the subtropical high-mountain Izaña Atmospheric Observatory (IZO) located in Tenerife (Canary Islands, Spain). For this purpose, we have combined AOD estimates from artificial neural networks (ANNs) from 1941 to 2001 and AOD measurements directly obtained with a Precision Filter Radiometer (PFR) between 2003 and 2013. The analysis is limited to summer months (July–August–September), when the largest aerosol load is observed at IZO (Saharan mineral dust particles). The ANN AOD time series has been comprehensively validated against coincident AOD measurements performed with a solar spectrometer Mark-I (1984–2009) and AERONET (AErosol RObotic NETwork) CIMEL photometers (2004–2009) at IZO, obtaining a rather good agreement on a daily basis: Pearson coefficient, R, of 0.97 between AERONET and ANN AOD, and 0.93 between Mark-I and ANN AOD estimates. In addition, we have analysed the long-term consistency between ANN AOD time series and long-term meteorological records identifying Saharan mineral dust events at IZO (synoptical observations and local wind records). Both analyses provide consistent results, with correlations >  85 %. Therefore, we can conclude that the reconstructed AOD time series captures well the AOD variations and dust-laden Saharan air mass outbreaks on short-term and long-term timescales and, thus, it is suitable to be used in climate analysis.

Download Full-text