Causal Discovery for Climate Time Series in the Presence of Unobserved Variables

Author(s):  
Andreas Gerhardus ◽  
Jakob Runge

<p>Scientific inquiry seeks to understand natural phenomena by understanding their underlying processes, i.e., by identifying cause and effect. Beyond mere scientific curiosity, an understanding of cause and effect relationships is necessary to predict the effect of changing dynamical regimes and to attribute extreme events to potential causes. It is thus important to ask how, in cases where controlled experiments are not feasible, causation can still be inferred from the statistical dependencies in observed time series.</p><p>A central obstacle to such inference is the potential existence of unobserved causally relevant variables. Arguably, such variables are present more often than not; an example is unmeasured deep oceanic variables in atmospheric processes. Unobserved variables can act as confounders (meaning they are a common cause of two or more observed variables) and thus introduce spurious, i.e., non-causal, dependencies. Despite these complications, the last three decades have seen the development of so-called causal discovery algorithms (an example being FCI by Spirtes et al., 1999) that are often able to identify spurious associations and to distinguish them from genuine causation. This opens the possibility of a data-driven approach to inferring cause and effect relationships among climate variables, thereby contributing to a better understanding of Earth's complex climate system.</p><p>These methods are, however, not yet well adapted to some specific challenges that climate time series often come with, e.g., strong autocorrelation, time lags, and nonlinearities. To close this methodological gap, we generalize the ideas of the recent PCMCI causal discovery algorithm (Runge et al., 2019) to time series where unobserved causally relevant variables may exist (in contrast, PCMCI assumes no confounding). Further, we present preliminary applications to modes of climate variability.</p>
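The confounding problem described in this abstract can be illustrated with a minimal simulation sketch. It is not FCI or PCMCI; it only shows, under an assumed linear Gaussian toy model, that a hidden common cause `u` makes `x` and `y` correlated although neither causes the other, and that the dependence largely vanishes once `u` is regressed out. All names (`simulate_confounded`, `pearson`, `residuals`) and all coefficients are hypothetical choices for illustration.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def residuals(target, regressor):
    """Residuals of an ordinary least squares fit of target on regressor."""
    n = len(target)
    mean_t, mean_r = sum(target) / n, sum(regressor) / n
    slope = (sum((r - mean_r) * (t - mean_t) for r, t in zip(regressor, target))
             / sum((r - mean_r) ** 2 for r in regressor))
    return [(t - mean_t) - slope * (r - mean_r) for r, t in zip(regressor, target)]


def simulate_confounded(n=5000, seed=0):
    """Assumed toy model: u drives both x and y; x and y do not interact."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [ui + rng.gauss(0, 1) for ui in u]
    y = [ui + rng.gauss(0, 1) for ui in u]
    return u, x, y


u, x, y = simulate_confounded()
spurious = pearson(x, y)                              # about 0.5 in expectation
adjusted = pearson(residuals(x, u), residuals(y, u))  # about 0 in expectation
print(f"corr(x, y) = {spurious:.2f}, partial corr given u = {adjusted:.2f}")
```

In a real application `u` is not measured, so this adjustment is unavailable; that is the situation in which algorithms allowing for latent confounders become necessary.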

2021 ◽  
Author(s):  
Andreas Gerhardus ◽  
Jakob Runge

<p>The quest to understand cause and effect relationships is at the basis of the scientific enterprise. In cases where the classical approach of controlled experimentation is not feasible, methods from the modern framework of causal discovery provide an alternative way to learn about cause and effect from observational, i.e., non-experimental, data. Recent years have seen an increasing interest in these methods from various scientific fields, for example in the climate and Earth system sciences (where large-scale experimentation is often infeasible) as well as machine learning and artificial intelligence (where models based on an understanding of cause and effect promise to be more robust under changing conditions).</p><p>In this contribution we present the novel LPCMCI algorithm for learning the cause and effect relationships in multivariate time series. The algorithm is specifically adapted to several challenges that are prevalent in time series considered in the climate and Earth system sciences, for example strong autocorrelations, combinations of time-lagged and contemporaneous causal relationships, as well as nonlinearities. It moreover allows for the existence of latent confounders, i.e., it allows for unobserved common causes. While this complication is faced in most realistic scenarios, especially when investigating a system as complex as Earth's climate system, it is nevertheless assumed away in many existing algorithms. We demonstrate applications of LPCMCI to examples from a climate context and compare its performance to competing methods.</p><p>Related reference:<br>Gerhardus, Andreas and Runge, Jakob (2020). High-recall causal discovery for autocorrelated time series with latent confounders. In Advances in Neural Information Processing Systems 33 pre-proceedings (NeurIPS 2020).</p>


2021 ◽  
Author(s):  
Jakob Runge ◽  
Andreas Gerhardus

<p>Discovering causal dependencies from observational time series datasets is a major problem in better understanding the complex dynamical system Earth. Recent methodological advances have addressed major challenges such as high dimensionality and nonlinearity (PCMCI, Runge et al., Sci. Adv. 2019), as well as instantaneous causal links (PCMCI+, Runge, UAI 2020) and hidden variables (LPCMCI, Gerhardus and Runge, 2020), but many more remain. In this presentation I will give an overview of challenges and methods and present a recent approach, Ensemble-PCMCI, to analyze ensembles of climate time series. An example of such ensembles is initialized ensemble forecasts. Since the individual samples can then be created from several time series instead of from different time steps of a single time series, such cases allow one to relax the assumption of stationarity and hence to analyze whether and how the underlying causal relationships change over time. We compare Ensemble-PCMCI to other methods and discuss preliminary applications.</p><p>Runge et al., Detecting and quantifying causal associations in large nonlinear time series datasets, Science Advances eaau4996 (2019).</p><p>Runge, J. Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence, UAI 2020, Toronto, Canada, 2019, AUAI Press, 2020</p><p>Gerhardus, A. & Runge, J. High-recall causal discovery for autocorrelated time series with latent confounders. Advances in Neural Information Processing Systems, 2020, 33</p>
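The sampling idea in this abstract, drawing samples across ensemble members at a fixed time step rather than across time steps of one series, can be sketched as follows. This is not Ensemble-PCMCI itself, only a plain lagged correlation computed across members. The toy process (a lag-1 link from `x` to `y` that switches on halfway through) and all names (`simulate_ensemble`, `lagged_dependence_at`) are assumptions made for illustration.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_ensemble(members=2000, steps=6, seed=1):
    """Assumed non-stationary toy process: x[t-1] -> y[t] only for t >= steps // 2."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(members):
        x = [rng.gauss(0, 1) for _ in range(steps)]
        y = [rng.gauss(0, 1)]
        for t in range(1, steps):
            coeff = 0.0 if t < steps // 2 else 1.0
            y.append(coeff * x[t - 1] + rng.gauss(0, 1))
        ensemble.append((x, y))
    return ensemble


def lagged_dependence_at(ensemble, t, lag=1):
    """Correlation of x[t - lag] and y[t], with one sample per ensemble member."""
    xs = [x[t - lag] for x, _ in ensemble]
    ys = [y[t] for _, y in ensemble]
    return pearson(xs, ys)


ensemble = simulate_ensemble()
for t in range(1, 6):
    print(t, round(lagged_dependence_at(ensemble, t), 2))
```

Because each time step gets its own estimate, a link that appears only in the second half of the window shows up as a change in the per-step values, which a single stationary estimate pooled over time would blur.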


Author(s):  
Jennifer L. Castle ◽  
David F. Hendry

Shared features of economic and climate time series imply that tools for empirically modeling nonstationary economic outcomes are also appropriate for studying many aspects of observational climate-change data. Greenhouse gas emissions, such as carbon dioxide, nitrous oxide, and methane, are a major cause of climate change as they cumulate in the atmosphere and reradiate the sun’s energy. As these emissions are currently mainly due to economic activity, economic and climate time series have commonalities, including considerable inertia, stochastic trends, and distributional shifts, and hence the same econometric modeling approaches can be applied to analyze both phenomena. Moreover, both disciplines lack complete knowledge of their respective data-generating processes (DGPs), so model search retaining viable theory but allowing for shifting distributions is important. Reliable modeling of both climate and economic-related time series requires finding an unknown DGP (or close approximation thereto) to represent multivariate evolving processes subject to abrupt shifts. Consequently, to ensure that the DGP is nested within a much larger set of candidate determinants, model formulations to search over should comprise all potentially relevant variables, their dynamics, indicators for perturbing outliers, shifts, trend breaks, and nonlinear functions, while retaining well-established theoretical insights. Econometric modeling of climate-change data requires a sufficiently general model selection approach to handle all these aspects. Machine learning with multipath block searches commencing from very general specifications, usually with more candidate explanatory variables than observations, to discover well-specified and undominated models of the nonstationary processes under analysis, offers a rigorous route to analyzing such complex data.
To do so requires applying appropriate indicator saturation estimators (ISEs), a class that includes impulse indicators for outliers, step indicators for location shifts, multiplicative indicators for parameter changes, and trend indicators for trend breaks. All ISEs entail more candidate variables than observations, often by a large margin when implementing combinations, yet can detect the impacts of shifts and policy interventions to avoid nonconstant parameters in models, as well as improve forecasts. To characterize nonstationary observational data, one must handle all substantively relevant features jointly: A failure to do so leads to nonconstant and mis-specified models and hence incorrect theory evaluation and policy analyses.
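To make the claim about candidate counts concrete, the following sketch builds two of the indicator sets named above for a series of `n_obs` observations. It covers only the construction of the candidate columns, not the block search that selects among them. The function names are hypothetical, and the step-indicator convention used here (equal to 1 up to and including observation `j`, 0 afterwards) is an assumption, since the abstract does not define it.

```python
def impulse_indicators(n_obs):
    """One column per observation: 1 at that observation, 0 elsewhere."""
    return [[1 if t == j else 0 for t in range(n_obs)] for j in range(n_obs)]


def step_indicators(n_obs):
    """One column per observation: 1 up to and including j, 0 afterwards.

    Assumed convention; the last column is all ones and so coincides with
    an intercept.
    """
    return [[1 if t <= j else 0 for t in range(n_obs)] for j in range(n_obs)]


n_obs = 5
candidates = impulse_indicators(n_obs) + step_indicators(n_obs)
print(f"{len(candidates)} candidate indicators for {n_obs} observations")
for column in candidates:
    print(column)
```

Combining just these two sets already gives twice as many candidate columns as observations, so they cannot all be estimated at once; this is why selection has to proceed over blocks of candidates rather than by a single regression.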


Author(s):  
J. Doblas ◽  
A. Carneiro ◽  
Y. Shimabukuro ◽  
S. Sant’Anna ◽  
L. Aragão ◽  
...  

Abstract. In this study we analyse the factors of variability of Sentinel-1 C-band radar backscattering over tropical rainforests, and propose a method to reduce the effects of this variability on deforestation detection algorithms. To do so, we developed a random forest regression model that relates Sentinel-1 gamma nought values with local climatological data and forest structure information. The model was trained using long time-series of 26 relevant variables, sampled over 6 undisturbed tropical forest areas. The resulting model explained 71.64% and 73.28% of the SAR signal variability for VV and VH polarizations, respectively. Once the best model for every polarization was selected, it was used to stabilize extracted pixel-level data of forested and non-deforested areas, which resulted in a 10 to 14% reduction of time-series variability, in terms of standard deviation. Then a statistically robust deforestation detection algorithm was applied to the stabilized time-series. The results show that the proposed method reduced the rate of false positives for both polarizations, especially for VV (from 21% to 2%, α=0.01). Meanwhile, the omission errors increased for both polarizations (from 27% to 37% in VV and from 27% to 33% in VH, α=0.01). The proposed method yielded slightly better results when compared with an alternative state-of-the-art approach (spatial normalization).


2017 ◽  
Vol 2017 ◽  
pp. 1-17 ◽  
Author(s):  
Tatjana Tasic ◽  
Sladjana Jovanovic ◽  
Omer Mohamoud ◽  
Tamara Skoric ◽  
Nina Japundzic-Zigon ◽  
...  

Objectives. This paper analyses temporal dependency in time series recorded from aging rats, both healthy ones and those with early developed hypertension. The aim is to explore the effects of age and hypertension on the mutual relationship of samples along the time axis. Methods. A copula method is applied to raw and to differentially coded signals. The latter were additionally binary encoded for a joint conditional entropy application. The signals were recorded from freely moving male Wistar rats and from spontaneously hypertensive rats, aged 3 months and 12 months. Results. The highest level of comonotonic behavior of pulse interval with respect to systolic blood pressure is observed at time lags τ = 0, 3, and 4, while a strong counter-monotonic behavior occurs at time lags τ = 1 and 2. Conclusion. Dynamic range of aging rats is considerably reduced in hypertensive groups. Conditional entropy of the systolic blood pressure signal, compared to unconditional, shows an increased level of discrepancy, except for a time lag of 1, where the equality is preserved in spite of the memory of the differential coder. The antiparallel streams play an important role at single beat time lag.
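The binary encoding and lagged conditional entropy mentioned in the Methods can be sketched as below. The abstract does not specify the encoding, so the rule used here (1 if the signal increases to the next sample, 0 otherwise) is an assumption, as are the function names and the sample values, which are made up and are not data from the study.

```python
from collections import Counter
from math import log2


def differential_binary(signal):
    """Assumed encoding: 1 if the signal increases to the next sample, else 0."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(signal, signal[1:])]


def conditional_entropy(target, given, lag=0):
    """H(target[t] | given[t - lag]) in bits, from empirical frequencies."""
    # zip pairs given[i] with target[i + lag] and truncates to the shorter one.
    pairs = list(zip(given, target[lag:]))
    joint = Counter(pairs)
    marginal = Counter(g for g, _ in pairs)
    n = len(pairs)
    return -sum((c / n) * log2(c / marginal[g]) for (g, _), c in joint.items())


# Made-up illustrative values, not measurements from the paper.
systolic_pressure = [120, 123, 121, 125, 124, 128, 126, 127]
pulse_interval = [180, 178, 181, 177, 179, 175, 176, 174]

sbp_bits = differential_binary(systolic_pressure)
pi_bits = differential_binary(pulse_interval)
for lag in range(3):
    print(lag, round(conditional_entropy(sbp_bits, pi_bits, lag), 3))
```

A value near zero at some lag means the encoded pulse interval at that lag almost determines the direction of change in systolic pressure, while a value near the unconditional entropy means it carries little information.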

