Missing data imputation for multisite rainfall networks: a comparison between geostatistical interpolation and data-mining estimation on different terrain types

Author(s): Fabio Oriani, Simon Stisen, Mehmet C. Demirel, Gregoire Mariethoz

In the era of big data, missing data imputation remains a delicate topic, both for the analysis of natural processes and for providing input data to physical models. We propose here a comparative study of missing data imputation for daily rainfall, a variable that can exhibit a complex structure composed of a dry/wet pattern and anisotropic sharp variations.

The seven algorithms considered can be grouped into two families: geostatistical interpolation techniques based on inverse-distance weighting and Kriging, widely used in gap-filling [1], and data-driven techniques based on the analysis of historical data patterns. The latter family has already been applied to rainfall generation [2, 3], but it was not originally designed for historical datasets with many gaps. This is because these algorithms usually operate in a rigid framework: when a rainfall value is estimated for one station, the other stations are treated as predictor variables and must all be informed (i.e., have observed values). To overcome this limitation, we propose here (i) an adaptation of k-nearest neighbor (KNN) and (ii) a new algorithm called Vector Sampling (VS) that combines concepts of multiple-point statistics and resampling. These data-driven algorithms can draw estimates from largely and variably incomplete data patterns, allowing the target dataset to serve at the same time as the training dataset.

Tested on different case studies from Denmark, Australia, and Switzerland, the algorithms show different performance that appears to be related to terrain type: on flat terrain with spatially uniform rain events, geostatistical interpolation tends to minimize the error, while in mountainous regions with non-stationary rainfall statistics, data mining better recovers the complex rainfall patterns. The VS algorithm, being faster than KNN and requiring minimal parametrization, is a convenient option for routine application if a representative historical dataset is available. VS is open-source and freely available at .

REFERENCES:
org/
org/
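To make the contrast between the two families concrete, here is a minimal sketch in Python with NumPy (an assumption; the abstract gives no code). It operates on a hypothetical days × stations array in which NaN marks a gap. `idw_fill` estimates a gap from same-day observations at other stations, weighted by inverse distance. `knn_fill` instead averages the target station over the k most similar historical days, and computes similarity only on stations observed on both days, so incomplete days can still act as predictors. Both function names are hypothetical, Kriging is omitted, and this KNN variant illustrates the idea only. It is not the authors' adaptation.

```python
import numpy as np


def idw_fill(values, coords, power=2.0):
    """Geostatistical-family baseline: inverse-distance weighting of same-day observations.

    values: (n_days, n_stations) array, NaN marks a missing record.
    coords: (n_stations, 2) station coordinates (assumed distinct).
    """
    filled = values.copy()
    # Pairwise distances between stations.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    for t in range(values.shape[0]):
        obs = ~np.isnan(values[t])
        if not obs.any():
            continue  # no station reported that day: leave the gap
        for s in np.where(~obs)[0]:
            w = 1.0 / dist[s, obs] ** power
            filled[t, s] = np.sum(w * values[t, obs]) / np.sum(w)
    return filled


def knn_fill(values, k=3):
    """Data-driven family: average the target station over the k most similar days.

    Similarity uses only stations observed on both days, so incomplete days still
    count as predictors and the gappy dataset doubles as its own training set.
    """
    filled = values.copy()
    n_days = values.shape[0]
    for t in range(n_days):
        for s in np.where(np.isnan(values[t]))[0]:
            candidates = []
            for d in range(n_days):
                if d == t or np.isnan(values[d, s]):
                    continue  # a usable day must have observed the target station
                shared = ~np.isnan(values[t]) & ~np.isnan(values[d])
                if not shared.any():
                    continue
                rmse = np.sqrt(np.mean((values[t, shared] - values[d, shared]) ** 2))
                candidates.append((rmse, values[d, s]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                filled[t, s] = np.mean([v for _, v in candidates[:k]])
    return filled
```

The design difference mirrors the terrain finding in the abstract. IDW assumes the same-day field is smooth in space. KNN assumes that past multi-station patterns recur, which can capture sharp or non-stationary structure that distance weighting smooths out.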

2020, Vol 21 (10), pp. 2325-2341
Author(s): Fabio Oriani, Simon Stisen, Mehmet C. Demirel, Gregoire Mariethoz

Abstract: Missing rainfall data are a major limitation for distributed hydrological modeling and climate studies. Practitioners need reliable approaches that can be employed on a daily basis, often with spatial data too sparse to feed complex predictive models. In this study we compare different automatic approaches to missing data imputation, including geostatistical interpolation and pattern-based estimation algorithms. We introduce two pattern-based approaches based on the analysis of historical data patterns: (i) an iterative version of K-nearest neighbor (IKNN) and (ii) a new algorithm called vector sampling (VS) that combines concepts of multiple-point statistics and resampling. Both algorithms can draw estimates from variably incomplete data patterns, allowing the target dataset to serve at the same time as the training dataset. Tested on five case studies from Denmark, Australia, and Switzerland, the algorithms show different performance that appears to be related to terrain type: on flat terrain with spatially homogeneous rain events, geostatistical interpolation tends to minimize the average error, while in mountainous regions with nonstationary rainfall statistics, data mining better recovers the rainfall patterns. The VS algorithm, requiring minimal parameterization, is a convenient option for routine application on complex and poorly gauged terrain.
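The abstract only says that VS combines multiple-point statistics with resampling. The sketch below therefore shows one generic way those two ideas fit together: a direct-sampling-style random scan of the historical record. It is a hypothetical illustration under stated assumptions, not the published VS algorithm. The distance (mean absolute difference), the acceptance `threshold`, the `max_fraction` scan limit, and the name `resample_fill` are all choices made for this example. The key point it shows is that every gap of a day is copied from one matched historical day as a whole vector, so co-located wet/dry structure is preserved.

```python
import numpy as np


def resample_fill(values, threshold=0.5, max_fraction=0.5, seed=None):
    """Hypothetical direct-sampling-style gap filling (not the published VS code).

    For each day with gaps, other days are visited in random order. A candidate is
    usable if it observed every station missing on the target day; its mismatch is the
    mean absolute difference over stations observed on both days. Scanning stops at
    the first candidate with mismatch <= threshold, or once max_fraction of the other
    days has been visited; the best candidate seen so far is then used. All gaps of the
    target day are copied from that single day, keeping the multi-station vector coherent.
    """
    rng = np.random.default_rng(seed)
    filled = values.copy()
    n_days = values.shape[0]
    n_scan = max(1, int(max_fraction * (n_days - 1)))
    for t in range(n_days):
        miss = np.isnan(values[t])
        if not miss.any():
            continue
        known = ~miss
        best_day, best_dist, visited = None, np.inf, 0
        for d in rng.permutation(n_days):
            if d == t:
                continue
            visited += 1
            usable = not np.isnan(values[d, miss]).any()
            shared = known & ~np.isnan(values[d])
            if usable and shared.any():
                dist = np.mean(np.abs(values[t, shared] - values[d, shared]))
                if dist < best_dist:
                    best_day, best_dist = d, dist
            if best_dist <= threshold or visited >= n_scan:
                break
        if best_day is not None:
            filled[t, miss] = values[best_day, miss]
    return filled
```

Because the scan stops early once a good enough match is found, the cost per gap is bounded by `max_fraction`. This is one plausible reason why a resampling approach can be faster than an exhaustive neighbor search, though the abstract does not state the actual mechanism.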


Proceedings, 2018, Vol 2 (11), pp. 698
Author(s): Klemen Kenda, Filip Koprivec, Dunja Mladenić

In this study, an algorithm for missing data imputation is presented. The algorithm uses measurements from neighboring sensors to estimate the missing values. A data-driven approach is used: the methodology selects, among the available measurements and candidate modeling algorithms, the combination whose model has the lowest error, and uses it to produce the estimate. The methodology was tested on Ljubljana polje aquifer data and produced near-perfect results.
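The selection step described above can be sketched as an exhaustive search. For every subset of the neighbor sensors that are reporting right now, and for every candidate model, fit on a training window, score on a validation window, and keep the lowest-error pair. This is a minimal illustration assuming only two hypothetical model types (least-squares linear regression and a mean baseline) and RMSE as the error metric. The abstract names neither, and the function names are invented for this example.

```python
import itertools

import numpy as np


def fit_linear(X, y):
    """Least-squares linear model with intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def fit_mean(X, y):
    """Baseline model that always predicts the training mean."""
    m = float(np.mean(y))
    return lambda Xn: np.full(len(Xn), m)


MODELS = {"linear": fit_linear, "mean": fit_mean}


def best_estimate(train_X, train_y, valid_X, valid_y, x_now, available):
    """Choose the (model, sensor subset) with the lowest validation RMSE, then predict.

    train_X / valid_X: (n, n_sensors) neighbor-sensor histories; x_now: current readings;
    available: indices of neighbor sensors that are reporting at the moment.
    """
    best = (np.inf, None, None, None)
    for r in range(1, len(available) + 1):
        for subset in itertools.combinations(available, r):
            cols = list(subset)
            for name, fit in MODELS.items():
                predict = fit(train_X[:, cols], train_y)
                rmse = np.sqrt(np.mean((predict(valid_X[:, cols]) - valid_y) ** 2))
                if rmse < best[0]:
                    best = (rmse, name, cols, predict)
    rmse, name, cols, predict = best
    return predict(x_now[cols][None, :])[0], name, cols, rmse
```

Restricting the search to `available` sensors is what lets the method keep working when some neighbors are themselves offline. The number of subsets grows as 2^n, so this brute-force form only suits a handful of neighbors.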


Attribute reduction and missing data imputation have considerable influence on classification and other data mining tasks. Hybrid methodologies such as fuzzy rough sets are more robust for dealing with imprecision and uncertainty in both discrete and continuous data. The fuzzy rough attribute reduction with imputation (FRARI) algorithm is proposed for attribute reduction with missing value imputation. Using FRARI, a complete, reduced dataset can be generated, which is of great importance in various branches of artificial intelligence concerned with mining databases. The efficiency and effectiveness of the proposed algorithm are demonstrated experimentally on a real-life dataset.
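The abstract does not describe FRARI's internals. For orientation only, the sketch below shows the standard fuzzy-rough attribute-reduction building block: a fuzzy-rough dependency degree driven by a greedy QuickReduct search. It assumes a per-attribute similarity of 1 − |a(x) − a(y)| / range(a), combines attributes with the min t-norm, uses the Łukasiewicz implicator, and takes crisp decision classes. These are common textbook choices, not necessarily FRARI's. The imputation half of FRARI is not shown, and all function names are hypothetical.

```python
import numpy as np


def similarity(X, attrs):
    """Fuzzy similarity R(x, y) = min over attributes of 1 - |a(x) - a(y)| / range(a)."""
    n = X.shape[0]
    R = np.ones((n, n))
    for a in attrs:
        col = X[:, a]
        span = col.max() - col.min()
        if span == 0:
            continue  # constant attribute: every pair fully similar
        R = np.minimum(R, 1.0 - np.abs(col[:, None] - col[None, :]) / span)
    return R


def dependency(X, y, attrs):
    """Fuzzy-rough dependency degree of the crisp decision y on the attribute subset."""
    R = similarity(X, attrs)
    differ = y[:, None] != y[None, :]
    # Lower-approximation membership (Lukasiewicz implicator): only objects from
    # other decision classes lower it, each to 1 - R(x, y).
    lower = np.where(differ, 1.0 - R, 1.0).min(axis=1)
    return lower.mean()


def quick_reduct(X, y, tol=1e-9):
    """Greedy QuickReduct: repeatedly add the attribute that most increases dependency."""
    remaining = list(range(X.shape[1]))
    target = dependency(X, y, remaining)
    reduct, gamma = [], dependency(X, y, [])
    while remaining and gamma < target - tol:
        best_gamma, best_a = max((dependency(X, y, reduct + [a]), a) for a in remaining)
        if best_gamma <= gamma + tol:
            break  # no remaining attribute improves dependency
        reduct.append(best_a)
        remaining.remove(best_a)
        gamma = best_gamma
    return reduct, gamma
```

The search stops once the reduct reaches the dependency of the full attribute set, so redundant attributes are never added.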


2019, Vol 50 (3), pp. 860-877
Author(s): Jie Lin, NianHua Li, Md Ashraful Alam, Yuqing Ma

Abstract: Due to cluster instability, not in the cluster monitoring system. This paper focuses on missing data imputation for cluster monitoring applications and proposes a new hybrid multiple imputation framework. This approach differs from conventional multiple imputation techniques in that it imputes missing data for an arbitrary missing pattern using an architecture that combines model-based and data-driven components. Essentially, a deep neural network, acting as the data model, extracts deep features from the data; these features are then processed by regression or data-driven strategies to estimate the missing data under an arbitrary missing pattern. The paper provides evidence that, if a deep neural network can be trained to construct deep features of the data, imputation based on those features outperforms imputation based directly on the original data. In the experiments, the proposed method is compared with other conventional multiple imputation approaches across varying missing data patterns, missing ratios, and datasets, including real cluster data. The results show that, for larger missing ratios and varied missing patterns, the proposed algorithm achieves more accurate and stable imputation.
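The core claim above is that imputing from extracted features beats imputing from raw columns. A learned deep network is out of scope for a short example, so the sketch below swaps in a deliberately simpler, classical technique: iterative truncated-SVD imputation. Here the low-rank factors serve as linear "features". Each round reconstructs the matrix from those features and refills only the gaps. This is a named substitute for illustration. It is neither the authors' deep-feature framework nor a multiple-imputation procedure, and `iterative_svd_impute` with its `rank`, `n_iter`, and `tol` parameters is hypothetical. It assumes every column has at least one observed value.

```python
import numpy as np


def iterative_svd_impute(X, rank=2, n_iter=100, tol=1e-8):
    """Impute NaNs by alternating a rank-`rank` SVD reconstruction with re-filling the gaps.

    The truncated SVD factors act as linear extracted features; observed entries are
    never modified. Assumes each column has at least one observed value.
    """
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)  # initial guess: column means
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu  # reconstruct from features
        new = np.where(mask, approx, X)  # update only the missing entries
        converged = np.max(np.abs(new - filled)) < tol
        filled = new
        if converged:
            break
    return filled
```

Replacing the SVD step with an encoder/decoder trained on the observed entries would move this loop toward the deep-feature idea in the abstract, at the cost of many more hyperparameters.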

