Estimating the Causal Effect from Partially Observed Time Series

Author(s):  
Akane Iseki ◽  
Yusuke Mukuta ◽  
Yoshitaka Ushiku ◽  
Tatsuya Harada

Many real-world systems involve interacting time series. The ability to detect causal dependencies between system components from observed time series of their outputs is essential for understanding system behavior. The quantification of causal influences between time series is based on the definition of some causality measure. Partial Canonical Correlation Analysis (Partial CCA) and its extensions are examples of methods used for robustly estimating the causal relationships between two multidimensional time series even when the time series are short. These methods assume that the input data are complete and have no missing values. However, real-world data often contain missing values. It is therefore crucial to estimate the causality measure robustly even when the input time series is incomplete. Treating this problem as a semi-supervised learning problem, we propose a novel semi-supervised extension of probabilistic Partial CCA called semi-Bayesian Partial CCA. Our method exploits the information in samples with missing values to prevent the overfitting of parameter estimation even when there are few complete samples. Experiments based on synthesized and real data demonstrate the ability of the proposed method to estimate causal relationships more correctly than existing methods when the data contain missing values, the dimensionality is large, and the number of samples is small.

Author(s):  
Amir Hossein Adineh ◽  
Zahra Narimani ◽  
Suresh Chandra Satapathy

Over the last decades, time series analysis has been of particular practical importance. Domains such as financial data analysis, biological data analysis, and speech recognition inherently deal with time-dependent signals. Monitoring the past behavior of a signal is key to precisely predicting the behavior of a system in the near future. In scenarios such as financial data prediction, the predominant signal has periodic behavior (starting from the beginning of the month, week, etc.), and a general trend and seasonal behavior can also be assumed. The Autoregressive Integrated Moving Average (ARIMA) model and its seasonal extension, SARIMA, have been widely used in forecasting time series data and are capable of dealing with seasonal behavior and trends in the data. Although the data may behave autoregressively, and trends and seasonality can be detected and handled by SARIMA, the data are not always exactly compatible with SARIMA (or, more generally, ARIMA) assumptions. In addition, SARIMA does not assume missing data, whereas real-world data can always have missing values for reasons such as holidays on which no data are recorded. Working hours that differ between weekdays may also produce irregular patterns compared with what SARIMA assumes. In this paper, we investigate the effectiveness of applying SARIMA to such real-world data and demonstrate preprocessing methods that make the data more suitable for SARIMA modeling. The data in this study are derived from the transactions of a mutual fund investment company. They contain missing values (single points and intervals) as well as irregularities caused by the number of working hours differing between weekdays, which makes the data inconsistent and leads to poor results without preprocessing. In addition, the number of data points was not adequate at the time of analysis to fit a SARIMA model. Preprocessing steps such as filling in missing values, along with techniques for making the data consistent, have been proposed to deal with these problems. Results show that the prediction performance of SARIMA on this real-world data set is significantly improved by applying the preprocessing steps introduced to deal with these circumstances. The proposed preprocessing steps can be used in the analysis of other real-world time series data.
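
The abstract does not say how the missing values were filled, so the following is only a minimal sketch of one plausible preprocessing step: replacing each missing point with the mean of the observed values at the same position in the seasonal cycle. The function name `fill_seasonal_mean`, the use of `None` as the missing marker, and the toy series are all assumptions made for illustration, not the authors' procedure.

```python
from statistics import mean


def fill_seasonal_mean(series, period):
    """Return a copy of `series` where each None is replaced by the mean of
    the observed values at the same position within the seasonal cycle.

    Hypothetical helper: the paper's actual imputation rule is not given.
    """
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            # Observed values that share the same phase (index modulo period).
            peers = [
                v for j, v in enumerate(series)
                if j % period == i % period and v is not None
            ]
            # If no observation exists at this phase, the gap is left as None.
            if peers:
                filled[i] = mean(peers)
    return filled


# Toy series with a cycle of length 3 and one missing observation.
daily = [10.0, 12.0, None, 11.0, 13.0, 14.0, 12.0, 9.0, 13.0]
filled = fill_seasonal_mean(daily, period=3)
```

Imputing from the same seasonal position keeps the periodic structure that a seasonal model relies on, whereas a global mean would flatten it. For a weekly cycle in daily data, `period` would be 7.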


Author(s):  
Marcelo N. de Sousa ◽  
Ricardo Sant’Ana ◽  
Rigel P. Fernandes ◽  
Julio Cesar Duarte ◽  
José A. Apolinário ◽  
...  

In outdoor RF localization systems, particularly where line of sight cannot be guaranteed or where multipath effects are severe, information about the terrain may improve the performance of the position estimate. Given the difficulties in obtaining real data, a ray-tracing fingerprint is a viable option. Nevertheless, despite good simulation results, the performance of systems trained only with simulated features degrades when they process real-life data. This work intends to improve the localization accuracy when using ray-tracing fingerprints and a few field data obtained from an adverse environment where a large number of measurements is not an option. We employ a machine learning (ML) algorithm to explore the multipath information. We selected the random forest and gradient boosting algorithms, both considered efficient tools in the literature. In a strict simulation scenario (simulated data for training, validating, and testing), we obtained the same good results found in the literature (error around 2 m). In a real-world system (simulated data for training, real data for validating and testing), both ML algorithms resulted in a mean positioning error around 100 m. We have also obtained experimental results for noisy (artificially added Gaussian noise) and mismatched (with a null subset of) features. From the simulations carried out in this work, our study revealed that enhancing the ML model with a few real-world data improves the overall localization performance. Among the ML algorithms employed herein, we also observed that, under noisy conditions, the random forest algorithm achieved a slightly better result than the gradient boosting algorithm. However, they achieved similar results in a mismatch experiment. This work’s practical implication is that multipath information, once rejected in old localization techniques, now represents a significant source of information whenever we have prior knowledge to train the ML algorithm.


Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 969
Author(s):  
Miguel C. Soriano ◽  
Luciano Zunino

Time-delayed interactions naturally appear in a multitude of real-world systems due to the finite propagation speed of physical quantities. Often, the time scales of the interactions are unknown to an external observer and need to be inferred from time series of observed data. We explore, in this work, the properties of several ordinal-based quantifiers for the identification of time-delays from time series. To that end, we generate artificial time series of stochastic and deterministic time-delay models. We find that the presence of a nonlinearity in the generating model has consequences for the distribution of ordinal patterns and, consequently, on the delay-identification qualities of the quantifiers. Here, we put forward a novel ordinal-based quantifier that is particularly sensitive to nonlinearities in the generating model and compare it with previously-defined quantifiers. We conclude from our analysis on artificially generated data that the proper identification of the presence of a time-delay and its precise value from time series benefits from the complementary use of ordinal-based quantifiers and the standard autocorrelation function. We further validate these tools with a practical example on real-world data originating from the North Atlantic Oscillation weather phenomenon.
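
As a rough illustration of what an ordinal-based quantifier looks like, the sketch below computes normalized permutation entropy, a widely used ordinal measure, for a chosen embedding lag. It is not the novel quantifier proposed in the paper, whose definition is not given in the abstract. The function names and the toy alternating series are assumptions made for this example.

```python
import math
from collections import Counter


def ordinal_pattern(window):
    """Indices that sort the window; ties keep their original order."""
    return tuple(sorted(range(len(window)), key=lambda k: window[k]))


def permutation_entropy(series, order, lag):
    """Normalized Shannon entropy of ordinal patterns of length `order`
    built from samples spaced `lag` steps apart."""
    span = (order - 1) * lag
    patterns = Counter(
        ordinal_pattern(series[i:i + span + 1:lag])
        for i in range(len(series) - span)
    )
    total = sum(patterns.values())
    entropy = -sum((c / total) * math.log(c / total) for c in patterns.values())
    # Divide by the maximum entropy, log(order!), to map the value into [0, 1].
    return entropy / math.log(math.factorial(order))


# Toy signal with period 2: 0, 1, 0, 1, ...
alternating = [i % 2 for i in range(51)]
entropy_lag1 = permutation_entropy(alternating, order=2, lag=1)
entropy_lag2 = permutation_entropy(alternating, order=2, lag=2)
```

In this toy case the entropy is maximal at lag 1, where rising and falling pairs are equally frequent, and drops to zero at lag 2, where every window holds two equal values. Scanning the lag and looking for such extrema is the general idea behind using ordinal statistics to locate a characteristic time scale.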


2019 ◽  
Vol 22 (2) ◽  
pp. 255-270 ◽  
Author(s):  
Manuel D. Ortigueira ◽  
Valeriy Martynyuk ◽  
Mykola Fedula ◽  
J. Tenreiro Machado

The ability of the so-called Caputo-Fabrizio (CF) and Atangana-Baleanu (AB) operators to create suitable models for real data is tested with real-world data. Two alternative models based on the CF and AB operators are assessed and compared with known models for data sets obtained from electrochemical capacitors and the human body electrical impedance. The results show that the CF and AB descriptions perform poorly when compared with the classical fractional derivatives.


2020 ◽  
Vol 34 (04) ◽  
pp. 5956-5963
Author(s):  
Xianfeng Tang ◽  
Huaxiu Yao ◽  
Yiwei Sun ◽  
Charu Aggarwal ◽  
Prasenjit Mitra ◽  
...  

Multivariate time series (MTS) forecasting is widely used in various domains, such as meteorology and traffic. Due to limitations on data collection, transmission, and storage, real-world MTS data usually contain missing values, making it infeasible to apply existing MTS forecasting models such as linear regression and recurrent neural networks. Though many efforts have been devoted to this problem, most of them rely solely on local dependencies for imputing missing values, which ignores global temporal dynamics. Local dependencies and patterns become less useful when the missing ratio is high or the data have consecutive missing values, whereas exploring global patterns can alleviate this problem. Thus, jointly modeling local and global temporal dynamics is very promising for MTS forecasting with missing values. However, work in this direction is rather limited. Therefore, we study a novel problem of MTS forecasting with missing values by jointly exploring local and global temporal dynamics. We propose a new framework that leverages a memory network to explore global patterns given estimations from local perspectives. We further introduce adversarial training to enhance the modeling of the global temporal distribution. Experimental results on real-world datasets show the effectiveness of the proposed framework for MTS forecasting with missing values and its robustness under various missing ratios.


Author(s):  
Juheng Zhang ◽  
Xiaoping Liu ◽  
Xiao-Bai Li

We study strategically missing data problems in predictive analytics with regression. In many real-world situations, such as financial reporting, college admission, job application, and marketing advertisement, data providers often conceal certain information on purpose in order to gain a favorable outcome. It is important for the decision-maker to have a mechanism to deal with such strategic behaviors. We propose a novel approach to handle strategically missing data in regression prediction. The proposed method derives imputation values of strategically missing data based on the Support Vector Regression models. It provides incentives for the data providers to disclose their true information. We show that with the proposed method imputation errors for the missing values are minimized under some reasonable conditions. An experimental study on real-world data demonstrates the effectiveness of the proposed approach.


Author(s):  
Christos N. Stefanakos

In the present work, return periods of various level values of significant wave height in the Gulf of Mexico are given. The predictions are based on a recently published method for nonstationary extreme-value calculations. This enhanced method efficiently exploits the nonstationary modeling of wind or wave time series and a new definition of return period using the MEan Number of Upcrossings of the level value x* (MENU method). The whole procedure is applied to long-term measurements of wave height in the Gulf of Mexico. Two kinds of data have been used: long-term time series of buoy measurements, and satellite altimeter data. The measured time series are incomplete, and a novel procedure for filling in missing values is applied before proceeding with the extreme-value calculations. Results are compared with several variants of traditional methods, and the proposed method gives more realistic estimates than the traditional predictions. This is in accordance with the results of other methods that also take into account the dependence structure of the examined time series.
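
To make the notion of an upcrossing concrete, here is a small sketch that counts how often a sampled series crosses a level from below and derives a crude empirical mean time between upcrossings. It assumes a stationary, gap-free record with a fixed sampling step, which is a deliberate simplification and not the nonstationary MENU procedure of the paper. The function names and the sample values are hypothetical.

```python
def count_upcrossings(series, level):
    """Count transitions from at-or-below `level` to strictly above it."""
    return sum(
        1 for previous, current in zip(series, series[1:])
        if previous <= level < current
    )


def mean_time_between_upcrossings(series, level, step_hours):
    """Record duration divided by the number of upcrossings.

    Returns None when the level is never upcrossed. This is a stationary
    simplification used only for illustration.
    """
    crossings = count_upcrossings(series, level)
    if crossings == 0:
        return None
    duration_hours = (len(series) - 1) * step_hours
    return duration_hours / crossings


# Hypothetical significant wave heights (m) sampled every 3 hours.
heights = [1.0, 2.5, 1.5, 3.0, 3.2, 0.8, 2.1]
crossings = count_upcrossings(heights, level=2.0)
interval = mean_time_between_upcrossings(heights, level=2.0, step_hours=3.0)
```

Counting upcrossings rather than exceedances means that a long storm staying above the level is counted once, which is why this statistic is sensitive to the dependence structure of the series.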


Mathematics ◽  
2019 ◽  
Vol 7 (6) ◽  
pp. 511 ◽  
Author(s):  
Ivo Petráš ◽  
Ján Terpák

This paper deals with the application of fractional calculus as a tool for the mathematical modeling and analysis of real processes, so-called fractional-order processes. It is well known that most real industrial processes are fractional-order ones. The main purpose of the article is to demonstrate a simple and effective method for the treatment of the output of fractional processes in the form of time series. The proposed method is based on fractional-order differentiation/integration using the Grünwald–Letnikov definition of the fractional-order operators. With this simple approach, we observe important properties in the time series and make decisions in real process control. Finally, an illustrative example for a real data set from a steelmaking process is presented.
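
The Grünwald–Letnikov definition lends itself to a direct numerical approximation on uniformly sampled data: the operator of order alpha at sample k is approximated by a weighted sum of all samples up to k, scaled by the step raised to the power minus alpha, with weights given by signed generalized binomial coefficients. The sketch below is a minimal version of that standard discretization, not the authors' implementation. The function names and the test signal are assumptions for illustration.

```python
def gl_weights(alpha, count):
    """First `count` Grünwald–Letnikov weights w_j = (-1)^j * binom(alpha, j),
    computed with the recurrence w_j = w_{j-1} * (1 - (alpha + 1) / j)."""
    weights = [1.0]
    for j in range(1, count):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / j))
    return weights


def gl_derivative(samples, alpha, step):
    """Approximate the order-`alpha` derivative of uniformly sampled data.

    A negative `alpha` gives fractional integration. The full history of the
    signal is used at every sample, so the cost grows quadratically.
    """
    weights = gl_weights(alpha, len(samples))
    scale = step ** (-alpha)
    return [
        scale * sum(weights[j] * samples[k - j] for j in range(k + 1))
        for k in range(len(samples))
    ]


# Samples of f(t) = t on a grid with step 0.5.
ramp = [0.0, 0.5, 1.0, 1.5, 2.0]
first_order = gl_derivative(ramp, alpha=1.0, step=0.5)
half_order = gl_derivative(ramp, alpha=0.5, step=0.5)
```

With `alpha=1.0` the weights reduce to 1, -1, 0, 0, ..., so the scheme becomes the ordinary backward difference, and with `alpha=0.0` it returns the samples unchanged. These two limits are a convenient sanity check before applying non-integer orders to real data.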


2017 ◽  
Vol 33 (S1) ◽  
pp. 149-149
Author(s):  
Gordon Bache ◽  
Sukh Tatla ◽  
Deborah Simpson

INTRODUCTION: A conventional approach to communicating value is to model the budget impact of a medicine and the associated formulations in which it is available to be prescribed. However, such an approach does not demonstrate the actual realization of the proposed impact. This abstract outlines an approach to presenting retrospective data back to healthcare professionals (HCP) that blends assumptions and real-world data. For illustrative purposes, we present the results of an application of the model for subcutaneously delivered trastuzumab in an anonymized trust in Yorkshire and Humber.
METHODS: The authors developed a model that examined one calendar year (from April 2014) of redistributed sales data for both the intravenous and subcutaneous formulations of trastuzumab for every National Health Service (NHS) trust in England. A series of baseline assumptions (1) were used to model the resource impact of different formulations such as chair time, HCP time, pharmacy preparation time, consumables, wastage, and other considerations. Impacts were estimated at the individual attendance level and scaled to the caseload. These baseline assumptions could then be overwritten by the individual trust using local data.
RESULTS: The site delivered approximately 985 doses of subcutaneous trastuzumab over a period of 12 months from April 2014, which represented about 76 percent of the total number of doses delivered. Chair time is estimated to have reduced by 22 minutes per attendance, resulting in a total saving of 361 hours. HCP administration time is estimated to have reduced by 23 minutes per attendance, resulting in a total saving of 378 hours based on changing 985 IV doses to SC therapy.
CONCLUSIONS: Blending real data and assumptions to provide a retrospective assessment of actual benefits realized back to HCPs is a powerful tool for demonstrating real-world value at both an individual trust and system level.
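
The reported totals can be reproduced by scaling the per-attendance savings to the caseload. The short sketch below does only that arithmetic with the figures quoted in the abstract. The function name is an assumption, and treating each of the 985 doses as one attendance is an assumption as well.

```python
def total_hours_saved(attendances, minutes_saved_per_attendance):
    """Scale a per-attendance saving in minutes to a total in hours."""
    return attendances * minutes_saved_per_attendance / 60


# Figures quoted in the abstract: 985 doses, 22 and 23 minutes per attendance.
chair_hours = total_hours_saved(985, 22)
hcp_hours = total_hours_saved(985, 23)
```

Rounded to the nearest hour, these values match the 361 and 378 hours stated in the results.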


Author(s):  
Narayan Puthanmadam Subramaniyam ◽  
Reik V. Donner ◽  
Davide Caron ◽  
Gabriella Panuccio ◽  
Jari Hyttinen

Identifying causal relationships is a challenging yet crucial problem in many fields of science like epidemiology, climatology, ecology, genomics, economics and neuroscience, to mention only a few. Recent studies have demonstrated that ordinal partition transition networks (OPTNs) allow inferring the coupling direction between two dynamical systems. In this work, we generalize this concept to the study of the interactions among multiple dynamical systems and we propose a new method to detect causality in multivariate observational data. By applying this method to numerical simulations of coupled linear stochastic processes as well as two examples of interacting nonlinear dynamical systems (coupled Lorenz systems and a network of neural mass models), we demonstrate that our approach can reliably identify the direction of interactions and the associated coupling delays. Finally, we study real-world observational microelectrode array electrophysiology data from rodent brain slices to identify the causal coupling structures underlying epileptiform activity. Our results, both from simulations and real-world data, suggest that OPTNs can provide a complementary and robust approach to infer causal effect networks from multivariate observational data.

