scholarly journals Explainable AI Framework for Multivariate Hydrochemical Time Series

Author(s):  
Michael Thrun ◽  
Alfred Ultsch ◽  
Lutz Breuer

The understanding of water quality and its underlying processes is important for the protection of aquatic environments enabling the rare opportunity of access to a domain expert. Hence, an explainable AI (XAI) framework is proposed that is applicable to multivariate time series resulting in explanations that are interpretable by a domain expert. The XAI combines in three steps a data-driven choice of a distance measure with explainable cluster analysis through supervised decision trees. The multivariate time series consists of water quality measurements, including nitrate, electrical conductivity, and twelve other environmental parameters. The relationships between water quality and the environmental parameters are investigated by identifying similar days within a cluster and dissimilar days between clusters. The XAI does not depend on prior knowledge about data structure, and its explanations are tendentially contrastive. The relationships in the data can be visualized by a topographic map representing high-dimensional structures. Two comparable decision-based XAIs were unable to provide meaningful and relevant explanations from the multivariate time series data. Open-source code in R for the three steps of the XAI framework is provided.

2021 ◽  
Vol 3 (1) ◽  
pp. 170-204
Author(s):  
Michael C. Thrun ◽  
Alfred Ultsch ◽  
Lutz Breuer

The understanding of water quality and its underlying processes is important for the protection of aquatic environments. With the rare opportunity of access to a domain expert, an explainable AI (XAI) framework is proposed that is applicable to multivariate time series. The XAI provides explanations that are interpretable by domain experts. In three steps, it combines a data-driven choice of a distance measure with supervised decision trees guided by projection-based clustering. The multivariate time series consists of water quality measurements, including nitrate, electrical conductivity, and twelve other environmental parameters. The relationships between water quality and the environmental parameters are investigated by identifying similar days within a cluster and dissimilar days between clusters. The framework, called DDS-XAI, does not depend on prior knowledge about data structure, and its explanations are tendentially contrastive. The relationships in the data can be visualized by a topographic map representing high-dimensional structures. Two state of the art XAIs called eUD3.5 and iterative mistake minimization (IMM) were unable to provide meaningful and relevant explanations from the three multivariate time series data. The DDS-XAI framework can be swiftly applied to new data. Open-source code in R for all steps of the XAI framework is provided and the steps are structured application-oriented.


2020 ◽  
Author(s):  
Michael C. Thrun ◽  
Alfred Ultsch ◽  
Lutz Breuer

Abstract. The understanding of water quality and its underlying processes is important for the protection of aquatic environments. Here an explainable AI (XAI) based multivariate time series analytical framework is applied on high-frequency water quality measurements including nitrate and electrical conductivity and twelve other environmental parameters. The relationships between water quality and the environmental parameters are investigated by a cluster analysis which does not depend on prior knowledge about data structure. The cluster analysis is designed to find similar days within a cluster and dissimilar days between clusters. This allows for the data-driven choice of a distance measure. Using a swarm based AI system, the resulting cluster define three states of water bodies, which can be visualized by a topographic map of high-dimensional structures. These structures are explained by rules extracted from decision trees. The rules generated by the XAI system improve the understanding of aquatic environments. The model description presented here allows to extract meaningful, useful, and new knowledge from multivariate time series.


Mathematics ◽  
2021 ◽  
Vol 9 (21) ◽  
pp. 2708
Author(s):  
Achilleas Anastasiou ◽  
Peter Hatzopoulos ◽  
Alex Karagrigoriou ◽  
George Mavridoglou

In this work, we focus on the development of new distance measure algorithms, namely, the Causality Within Groups (CAWG), the Generalized Causality Within Groups (GCAWG) and the Causality Between Groups (CABG), all of which are based on the well-known Granger causality. The proposed distances together with the associated algorithms are suitable for multivariate statistical data analysis including unsupervised classification (clustering) purposes for the analysis of multivariate time series data with emphasis on financial and economic data where causal relationships are frequently present. For exploring the appropriateness of the proposed methodology, we implement, for illustrative purposes, the proposed algorithms to hierarchical clustering for the classification of 19 EU countries based on seven variables related to health resources in healthcare systems.


Water ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 1633
Author(s):  
Elena-Simona Apostol ◽  
Ciprian-Octavian Truică ◽  
Florin Pop ◽  
Christian Esposito

Due to the exponential growth of the Internet of Things networks and the massive amount of time series data collected from these networks, it is essential to apply efficient methods for Big Data analysis in order to extract meaningful information and statistics. Anomaly detection is an important part of time series analysis, improving the quality of further analysis, such as prediction and forecasting. Thus, detecting sudden change points with normal behavior and using them to discriminate between abnormal behavior, i.e., outliers, is a crucial step used to minimize the false positive rate and to build accurate machine learning models for prediction and forecasting. In this paper, we propose a rule-based decision system that enhances anomaly detection in multivariate time series using change point detection. Our architecture uses a pipeline that automatically manages to detect real anomalies and remove the false positives introduced by change points. We employ both traditional and deep learning unsupervised algorithms, in total, five anomaly detection and five change point detection algorithms. Additionally, we propose a new confidence metric based on the support for a time series point to be an anomaly and the support for the same point to be a change point. In our experiments, we use a large real-world dataset containing multivariate time series about water consumption collected from smart meters. As an evaluation metric, we use Mean Absolute Error (MAE). The low MAE values show that the algorithms accurately determine anomalies and change points. The experimental results strengthen our assumption that anomaly detection can be improved by determining and removing change points as well as validates the correctness of our proposed rules in real-world scenarios. Furthermore, the proposed rule-based decision support systems enable users to make informed decisions regarding the status of the water distribution network and perform effectively predictive and proactive maintenance.


2021 ◽  
Vol 13 (3) ◽  
pp. 67
Author(s):  
Eric Hitimana ◽  
Gaurav Bajpai ◽  
Richard Musabe ◽  
Louis Sibomana ◽  
Jayavel Kayalvizhi

Many countries worldwide face challenges in controlling building incidence prevention measures for fire disasters. The most critical issues are the localization, identification, detection of the room occupant. Internet of Things (IoT) along with machine learning proved the increase of the smartness of the building by providing real-time data acquisition using sensors and actuators for prediction mechanisms. This paper proposes the implementation of an IoT framework to capture indoor environmental parameters for occupancy multivariate time-series data. The application of the Long Short Term Memory (LSTM) Deep Learning algorithm is used to infer the knowledge of the presence of human beings. An experiment is conducted in an office room using multivariate time-series as predictors in the regression forecasting problem. The results obtained demonstrate that with the developed system it is possible to obtain, process, and store environmental information. The information collected was applied to the LSTM algorithm and compared with other machine learning algorithms. The compared algorithms are Support Vector Machine, Naïve Bayes Network, and Multilayer Perceptron Feed-Forward Network. The outcomes based on the parametric calibrations demonstrate that LSTM performs better in the context of the proposed application.


2018 ◽  
Vol 15 (147) ◽  
pp. 20180695 ◽  
Author(s):  
Simone Cenci ◽  
Serguei Saavedra

Biotic interactions are expected to play a major role in shaping the dynamics of ecological systems. Yet, quantifying the effects of biotic interactions has been challenging due to a lack of appropriate methods to extract accurate measurements of interaction parameters from experimental data. One of the main limitations of existing methods is that the parameters inferred from noisy, sparsely sampled, nonlinear data are seldom uniquely identifiable. That is, many different parameters can be compatible with the same dataset and can generalize to independent data equally well. Hence, it is difficult to justify conclusive assertions about the effect of biotic interactions without information about their associated uncertainty. Here, we develop an ensemble method based on model averaging to quantify the uncertainty associated with the effect of biotic interactions on community dynamics from non-equilibrium ecological time-series data. Our method is able to detect the most informative time intervals for each biotic interaction within a multivariate time series and can be easily adapted to different regression schemes. Overall, this novel approach can be used to associate a time-dependent uncertainty with the effect of biotic interactions. Moreover, because we quantify uncertainty with minimal assumptions about the data-generating process, our approach can be applied to any data for which interactions among variables strongly affect the overall dynamics of the system.


Sign in / Sign up

Export Citation Format

Share Document