Heterogeneous Graphical Granger Causality by Minimum Message Length

The heterogeneous graphical Granger model (HGGM) for causal inference among processes with distributions from an exponential family is efficient in scenarios where the number of time observations is much greater than the number of time series, normally by several orders of magnitude. However, in the case of “short” time series, the inference in HGGM often suffers from overestimation. To remedy this, we use the minimum message length (MML) principle to determine the causal connections in the HGGM. Minimum message length, a Bayesian information-theoretic method for statistical model selection, applies Occam’s razor in the following way: even when models are equal in their measure of fit-accuracy to the observed data, the one generating the most concise explanation of the data is more likely to be correct. Based on the dispersion coefficient of the target time series and on the initial maximum likelihood estimates of the regression coefficients, we propose a minimum message length criterion to select the subset of time series causally connected with each target time series, and we derive its form for various exponential distributions. We propose two algorithms, the genetic-type algorithm HMMLGA and exHMML, to find the subset. We demonstrate the superiority of both algorithms in synthetic experiments with respect to the comparison methods LiNGAM, HGGM and statistical framework Granger causality (SFGC). In the real data experiments, we use the methods to discriminate between the pregnancy and labor phases using electrohysterogram data of Icelandic mothers from the PhysioNet database. We further analyse Austrian climatological time measurements and their temporal interactions in rainy and sunny day scenarios. In both experiments, the results of HMMLGA had the most realistic interpretation with respect to the comparison methods. We provide our code in Matlab. To the best of our knowledge, this is the first work using the MML principle for causal inference in HGGM.

Minimum Message Length in Hybrid ARMA and LSTM Model Forecasting

Entropy ◽

10.3390/e23121601 ◽

2021 ◽

Vol 23 (12) ◽

pp. 1601

Author(s):

Zheng Fang ◽

David L. Dowe ◽

Shelton Peiris ◽

Dedi Rosadi

Keyword(s):

Time Series ◽

Environmental Science ◽

Short Term Memory ◽

ARIMA Model ◽

Theoretic Approach ◽

ARMA Models ◽

Information Theoretic ◽

Minimum Message Length ◽

Message Length ◽

Real World Datasets

Modeling and analysis of time series are important in applications including economics, engineering, environmental science and social science. Selecting the best time series model with accurate parameters in forecasting is a challenging objective for scientists and academic researchers. Hybrid models combining neural networks and traditional Autoregressive Moving Average (ARMA) models are being used to improve the accuracy of modeling and forecasting time series. Most of the existing time series models are selected by information-theoretic approaches, such as AIC, BIC, and HQ. This paper revisits a model selection technique based on Minimum Message Length (MML) and investigates its use in hybrid time series analysis. MML is a Bayesian information-theoretic approach and has been used in selecting the best ARMA model. We utilize the long short-term memory (LSTM) approach to construct a hybrid ARMA-LSTM model and show that MML performs better than AIC, BIC, and HQ in selecting the model—both in the traditional ARMA models (without LSTM) and with hybrid ARMA-LSTM models. These results held on simulated data and both real-world datasets that we considered. We also develop a simple MML ARIMA model.
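
The abstract above names AIC, BIC and HQ as the usual selection criteria. As a rough illustration only (it does not implement the paper's MML procedure or any ARMA/LSTM hybrid), the following Python sketch scores Gaussian AR(p) models fitted by least squares with those three criteria. The function names `ar_information_criteria` and `best_order`, the restriction to pure AR models, and the parameter count k = p + 2 are assumptions made for this example.

```python
import numpy as np

def ar_information_criteria(y, max_order):
    """Score AR(0..max_order) models, fitted by least squares, with AIC, BIC and HQ.

    All orders are fitted on the same targets y[max_order:], so the
    criteria are comparable across orders.
    """
    y = np.asarray(y, dtype=float)
    target = y[max_order:]
    n = target.size
    results = {}
    for p in range(max_order + 1):
        # Design matrix: intercept plus lags 1..p aligned with the targets.
        columns = [np.ones(n)]
        columns += [y[max_order - lag : y.size - lag] for lag in range(1, p + 1)]
        X = np.column_stack(columns)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        sigma2 = resid @ resid / n          # ML estimate of the noise variance
        k = p + 2                           # assumed count: intercept, p lags, variance
        fit = n * np.log(sigma2)            # -2 log-likelihood up to a constant
        results[p] = {
            "AIC": fit + 2 * k,
            "BIC": fit + k * np.log(n),
            "HQ": fit + 2 * k * np.log(np.log(n)),
        }
    return results

def best_order(results, criterion):
    """Return the order with the smallest value of the chosen criterion."""
    return min(results, key=lambda p: results[p][criterion])
```

Calling `best_order(ar_information_criteria(y, 5), "BIC")` on a series `y` returns the lag order that BIC prefers among AR(0) to AR(5). The three criteria differ only in how strongly they penalize the parameter count.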

Causal inference with multiple time series: principles and problems

Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences ◽

10.1098/rsta.2011.0613 ◽

2013 ◽

Vol 371 (1997) ◽

pp. 20110613 ◽

Cited By ~ 42

Author(s):

Michael Eichler

Keyword(s):

Time Series ◽

Causal Inference ◽

Granger Causality ◽

Latent Variables ◽

Time Series Data ◽

Series Data ◽

Multiple Time ◽

Identification Algorithm ◽

Multiple Time Series ◽

Theoretical Justification

I review the use of the concept of Granger causality for causal inference from time-series data. First, I give a theoretical justification by relating the concept to other theoretical causality measures. Second, I outline possible problems with spurious causality and approaches to tackle these problems. Finally, I sketch an identification algorithm that learns causal time-series structures in the presence of latent variables. The description of the algorithm is non-technical and thus accessible to applied scientists who are interested in adopting the method.
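
For readers unfamiliar with the basic concept, the classical bivariate linear Granger test can be sketched in a few lines of Python. This is an illustrative sketch under simplifying assumptions (two series, linear autoregression, no latent variables) and is not the identification algorithm described in the abstract. The names `granger_f_statistic` and `_lag_matrix` are hypothetical.

```python
import numpy as np

def _lag_matrix(series, lags):
    """Columns series[t-1], ..., series[t-lags] for t = lags, ..., len(series)-1."""
    n = series.size
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])

def granger_f_statistic(cause, effect, lags=2):
    """F statistic for 'cause Granger-causes effect' in a linear model.

    Compares a restricted regression (effect on its own lags) with an
    unrestricted one (own lags plus lags of the candidate cause).
    """
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    target = effect[lags:]
    n = target.size
    restricted = np.column_stack([np.ones(n), _lag_matrix(effect, lags)])
    unrestricted = np.column_stack([restricted, _lag_matrix(cause, lags)])

    def rss(X):
        # Residual sum of squares of the least-squares fit of target on X.
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        return resid @ resid

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = n - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)
```

A large statistic suggests that past values of `cause` improve the prediction of `effect` beyond what the past of `effect` already provides. As the abstract notes, such a finding can be spurious when relevant variables are unobserved.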

PRESEE: An MDL/MML Algorithm to Time-Series Stream Segmenting

The Scientific World Journal ◽

10.1155/2013/386180 ◽

2013 ◽

Vol 2013 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Kaikuo Xu ◽

Yexi Jiang ◽

Mingjie Tang ◽

Changan Yuan ◽

Changjie Tang

Keyword(s):

Time Series ◽

Real Time ◽

Processing Speed ◽

Data Stream ◽

Minimum Description Length ◽

Data Types ◽

Minimum Message Length ◽

Message Length ◽

Different Types ◽

State Of Art

The time-series stream is one of the most common data types in the data mining field. It is prevalent in fields such as stock markets, ecology, and medical care. Segmentation is a key step to accelerate the processing speed of time-series stream mining. Previous segmenting algorithms mainly focused on improving precision and paid little attention to efficiency. Moreover, the performance of these algorithms depends heavily on parameters, which are hard for users to set. In this paper, we propose PRESEE (parameter-free, real-time, and scalable time-series stream segmenting algorithm), which greatly improves the efficiency of time-series stream segmenting. PRESEE is based on both MDL (minimum description length) and MML (minimum message length) methods, which allow it to segment the data automatically. To evaluate the performance of PRESEE, we conduct several experiments on time-series streams of different types and compare it with the state-of-the-art algorithm. The empirical results show that PRESEE is very efficient for real-time stream datasets, improving segmenting speed nearly ten times. The novelty of this algorithm is further demonstrated by the application of PRESEE in segmenting real-time stream datasets from the ChinaFLUX sensor network data stream.

Minimum Message Length in Hybrid ARMA and LSTM Model Forecasting

10.20944/preprints202110.0049.v1 ◽

2021 ◽

Author(s):

Zheng Fang ◽

David L. Dowe ◽

Shelton Peiris ◽

Dedi Rosadi

Keyword(s):

Time Series ◽

Deep Learning ◽

Short Term Memory ◽

Moving Average ◽

ARMA Model ◽

Autoregressive Moving Average ◽

Information Theoretic ◽

Minimum Message Length ◽

Message Length ◽

Long Short Term Memory

We investigate the power of time series analysis based on a variety of information-theoretic approaches from statistics (AIC, BIC) and machine learning (Minimum Message Length), and we then compare their efficacy with traditional time series models and with hybrids involving deep learning. More specifically, we develop AIC, BIC and Minimum Message Length (MML) ARMA (autoregressive moving average) time series models, with this Bayesian information-theoretic MML ARMA modelling already being new work. We then study deep learning based algorithms in time series forecasting, using Long Short Term Memory (LSTM), and we combine this with the ARMA modelling to produce a hybrid ARMA-LSTM prediction. Part of the purpose of using LSTM is to seek to capture any hidden information in the residuals left from the traditional ARMA model. We show not only that MML outperforms earlier statistical approaches to ARMA modelling, but also that the hybrid MML ARMA-LSTM models outperform both ARMA models and LSTM models.

Identifying error and maintenance intervention of pavement roughness time series with minimum message length inference

International Journal of Pavement Engineering ◽

10.1080/10298430802621549 ◽

2010 ◽

Vol 11 (1) ◽

pp. 37-47

Author(s):

Matthew Byrne ◽

David Albrecht ◽

Jay Sanjayan

Keyword(s):

Time Series ◽

Minimum Message Length ◽

Message Length ◽

Pavement Roughness ◽

Maintenance Intervention

Minimum message length analysis of multiple short time series

Statistics & Probability Letters ◽

10.1016/j.spl.2015.09.021 ◽

2016 ◽

Vol 110 ◽

pp. 318-328 ◽

Cited By ~ 1

Author(s):

Daniel F. Schmidt ◽

Enes Makalic

Keyword(s):

Time Series ◽

Short Time Series ◽

Minimum Message Length ◽

Message Length ◽

Length Analysis ◽

Short Time

Minimum Message Length Moving Average Time Series Data Mining

2005 ICSC Congress on Computational Intelligence Methods and Applications ◽

10.1109/cima.2005.1662352 ◽

2006 ◽

Author(s):

M. Sak ◽

D.L. Dowe ◽

S. Ray

Keyword(s):

Data Mining ◽

Time Series ◽

Time Series Data ◽

Moving Average ◽

Series Data ◽

Minimum Message Length ◽

Time Series Data Mining ◽

Message Length

Minimum Message Length in Hybrid ARMA and LSTM Model Forecasting

10.20944/preprints202110.0049.v2 ◽

2021 ◽

Author(s):

Zheng Fang ◽

David L. Dowe ◽

Shelton Peiris ◽

Dedi Rosadi

Keyword(s):

Time Series ◽

Deep Learning ◽

Short Term Memory ◽

Moving Average ◽

ARMA Model ◽

Autoregressive Moving Average ◽

Information Theoretic ◽

Minimum Message Length ◽

Message Length ◽

Long Short Term Memory

Poisson Graphical Granger Causality by Minimum Message Length

Machine Learning and Knowledge Discovery in Databases - Lecture Notes in Computer Science ◽

10.1007/978-3-030-67658-2_30 ◽

2021 ◽

pp. 526-541

Author(s):

Kateřina Hlaváčková-Schindler ◽

Claudia Plant

Keyword(s):

Granger Causality ◽

Minimum Message Length ◽

Message Length ◽

Graphical Granger Causality

Kumaraswamy Generalized Power Lomax Distribution and Its Applications

Stats ◽

10.3390/stats4010003 ◽

2021 ◽

Vol 4 (1) ◽

pp. 28-45

Author(s):

Vasili B.V. Nagarjuna ◽

R. Vishnu Vardhan ◽

Christophe Chesneau

Keyword(s):

Hazard Rate ◽

Real Data ◽

Rate Function ◽

Maximum Likelihood Estimates ◽

Parameter Estimates ◽

Parameter Distribution ◽

Data Sets ◽

Lomax Distribution ◽

Entropy Measures ◽

Modeling Behavior

In this paper, a new five-parameter distribution is proposed using the functionalities of the Kumaraswamy generalized family of distributions and the features of the power Lomax distribution. It is named the Kumaraswamy generalized power Lomax distribution. In a first approach, we derive its main probability and reliability functions, with a visualization of its modeling behavior by considering different parameter combinations. As a prime quality, the corresponding hazard rate function is very flexible; it possesses decreasing, increasing and inverted (upside-down) bathtub shapes. Decreasing-increasing-decreasing shapes are also observed. Some important characteristics of the Kumaraswamy generalized power Lomax distribution are derived, including moments, entropy measures and order statistics. The second approach is statistical. The maximum likelihood estimates of the parameters are described and a brief simulation study shows their effectiveness. Two real data sets are taken to show how the proposed distribution can be applied concretely; parameter estimates are obtained and fitting comparisons are performed with other well-established Lomax based distributions. The Kumaraswamy generalized power Lomax distribution turns out to be the best, capturing fine details in the structure of the data considered.
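
To make the five-parameter construction concrete, here is a hedged Python sketch of the distribution function and its inverse. It assumes the standard Kumaraswamy-G form F(x) = 1 - (1 - G(x)^a)^b applied to a power Lomax baseline G(x) = 1 - (lam / (lam + x^beta))^alpha. The parameter names `a`, `b`, `alpha`, `beta`, `lam` and the function names are choices made for this example, and the paper's own notation and parametrization may differ.

```python
import numpy as np

def power_lomax_cdf(x, alpha, beta, lam):
    """Assumed baseline: power Lomax CDF for x >= 0."""
    x = np.asarray(x, dtype=float)
    return 1.0 - (lam / (lam + x**beta)) ** alpha

def kw_power_lomax_cdf(x, a, b, alpha, beta, lam):
    """Kumaraswamy-G construction applied to the power Lomax baseline."""
    g = power_lomax_cdf(x, alpha, beta, lam)
    return 1.0 - (1.0 - g**a) ** b

def kw_power_lomax_quantile(u, a, b, alpha, beta, lam):
    """Closed-form inverse of kw_power_lomax_cdf for 0 < u < 1."""
    u = np.asarray(u, dtype=float)
    # Invert the Kumaraswamy layer to recover the baseline probability.
    g = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)
    # Invert the power Lomax baseline.
    return (lam * ((1.0 - g) ** (-1.0 / alpha) - 1.0)) ** (1.0 / beta)
```

Because the quantile function is available in closed form under these assumptions, samples can be drawn by passing uniform random numbers to `kw_power_lomax_quantile`.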
