Evaluation of Seven Gap-Filling Techniques for Daily Station-Based Rainfall Datasets in South Ethiopia

Meteorological stations, particularly in developing countries, often have large numbers of missing values in their climate records (rainfall and temperature). A common way to handle the gaps is simply to exclude them from analysis, but this leads to partial and biased results. Filling the gaps using reference datasets is a better and widely used approach. This study therefore evaluates seven gap-filling techniques for daily rainfall datasets at five meteorological stations in Wolaita Zone and its surroundings in South Ethiopia. The techniques considered were simple arithmetic mean (SAM), normal ratio method (NRM), correlation coefficient weighting (CCW), inverse distance weighting (IDW), multiple linear regression (MLR), empirical quantile mapping (EQM), and empirical quantile mapping plus (EQM+). They were chosen for their computational simplicity and appreciable accuracy. Performance was evaluated using mean absolute error (MAE), root mean square error (RMSE), skill score (SS), and Pearson’s correlation coefficient (R). MLR outperformed the other techniques at all five stations, with the lowest RMSE and the highest SS and R. Four techniques (SAM, NRM, CCW, and IDW) performed similarly and ranked second at all stations, with a few exceptions depending on the time series. EQM+ improved the performance of the gap-filling techniques at some stations, though not substantially. In general, MLR is suggested for filling missing values in daily rainfall time series, although the second-ranked techniques could also be used depending on the required time series (period) at each station. The techniques performed better at stations located at higher altitudes.
The authors expect this paper to contribute substantially to the achievement of Sustainable Development Goal 13 (climate action) by identifying gap-filling techniques with better accuracy.
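
As a rough illustration of three of the simpler techniques named in this abstract (SAM, NRM, and IDW), the following Python sketch estimates one missing daily value from neighbouring stations. The function names, the example numbers, and the IDW exponent of 2 are assumptions made for illustration; they are not taken from the study.

```python
from statistics import mean


def fill_sam(neighbor_values):
    """Simple arithmetic mean of the neighbours' rainfall on the missing day."""
    return mean(neighbor_values)


def fill_nrm(neighbor_values, neighbor_normals, target_normal):
    """Normal ratio method: scale each neighbour by the ratio of the target
    station's long-term normal to that neighbour's normal, then average."""
    scaled = [target_normal / n * v
              for v, n in zip(neighbor_values, neighbor_normals)]
    return mean(scaled)


def fill_idw(neighbor_values, distances_km, power=2.0):
    """Inverse distance weighting with weights 1 / distance**power.
    The exponent 2 is a common default, assumed here."""
    weights = [1.0 / d ** power for d in distances_km]
    weighted_sum = sum(w * v for w, v in zip(weights, neighbor_values))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Hypothetical neighbours: daily rainfall (mm), annual normals (mm), distances (km).
    values = [10.0, 20.0, 30.0]
    normals = [1000.0, 1200.0, 1500.0]
    distances = [10.0, 20.0, 40.0]
    print(fill_sam(values))
    print(fill_nrm(values, normals, target_normal=1200.0))
    print(fill_idw(values, distances))
```

The three estimators differ only in how the neighbours are weighted: equally (SAM), by climatological ratio (NRM), or by proximity (IDW).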


A multiple threshold method for fitting the generalized Pareto distribution and a simple representation of the rainfall process

Hydrology and Earth System Sciences Discussions ◽

10.5194/hessd-7-4957-2010 ◽

2010 ◽

Vol 7 (4) ◽

pp. 4957-4994 ◽

Cited By ~ 5

Author(s):

R. Deidda

Keyword(s):

Time Series ◽

Distribution Function ◽

Pareto Distribution ◽

Generalized Pareto Distribution ◽

Daily Rainfall ◽

Rainfall Time Series ◽

Threshold Method ◽

Generalized Pareto ◽

Multiple Threshold ◽

Optimum Threshold

Abstract. Previous studies indicate that the generalized Pareto distribution (GPD) is a suitable distribution function to reliably describe the exceedances of daily rainfall records above a proper optimum threshold, which should be selected as small as possible to retain the largest sample while assuring an acceptable fit. Such an optimum threshold may differ from site to site, consequently affecting not only the GPD scale parameter but also the probability of threshold exceedance. Thus a first objective of this paper is to derive expressions to parameterize a simple threshold-invariant three-parameter distribution function that is able to describe zero and nonzero values of rainfall time series while assuring a perfect overlap with the GPD fitted to the exceedances of any threshold larger than the optimum one. Since the proposed distribution does not depend on the local thresholds adopted for fitting the GPD, it will only reflect the on-site climatic signature and thus appears particularly suitable for hydrological applications and regional analyses. A second objective is to develop and test the Multiple Threshold Method (MTM) to infer the parameters of interest from the exceedances of a wide range of thresholds, again using the concept of parameter threshold-invariance. We show the ability of the MTM to fit historical daily rainfall time series recorded with different resolutions. Finally, we demonstrate the superiority of the MTM fit over the standard single-threshold fit, often adopted for partial duration series, by evaluating and comparing performance on Monte Carlo samples drawn from GPDs with different shape and scale parameters and different discretizations.
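
The threshold dependence of the GPD scale parameter mentioned in this abstract can be illustrated with the standard threshold-stability property: if excesses over a threshold u follow a GPD with shape xi and scale sigma_u, then excesses over a higher threshold v follow a GPD with the same shape and scale sigma_u + xi (v - u). The Python sketch below checks this numerically. It is a generic illustration, not the paper's own parameterization; the function names and numbers are assumptions.

```python
def gpd_sf(excess, xi, sigma):
    """Survival function P(excess over threshold > x) of a GPD.
    Assumes shape xi != 0 and excess >= 0."""
    base = 1.0 + xi * excess / sigma
    if base <= 0.0:
        return 0.0  # beyond the upper end point (only possible when xi < 0)
    return base ** (-1.0 / xi)


def rescaled_sigma(sigma_u, xi, u, v):
    """Scale of the excesses over a higher threshold v >= u."""
    return sigma_u + xi * (v - u)


if __name__ == "__main__":
    # Hypothetical values: shape, scale at u = 2 mm, and a higher threshold v = 10 mm.
    xi, sigma_u, u, v = 0.1, 8.0, 2.0, 10.0
    sigma_v = rescaled_sigma(sigma_u, xi, u, v)
    y = 5.0  # excess over v, in mm
    # Conditional survival derived from the fit at u ...
    from_u = gpd_sf((v - u) + y, xi, sigma_u) / gpd_sf(v - u, xi, sigma_u)
    # ... agrees with the GPD written directly for threshold v.
    print(from_u, gpd_sf(y, xi, sigma_v))
    # The combination sigma - xi * threshold does not change with the threshold.
    print(sigma_u - xi * u, sigma_v - xi * v)
```

Because the shape and the combination sigma - xi * threshold are both unchanged, they are natural candidates for a threshold-invariant description of the tail.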


A Quantile Mapping Method to Fill in Discontinued Daily Precipitation Time Series

Water ◽

10.3390/w12082304 ◽

2020 ◽

Vol 12 (8) ◽

pp. 2304

Author(s):

Manolis G. Grillakis ◽

Christos Polykretis ◽

Stelios Manoudakis ◽

Konstantinos D. Seiradakis ◽

Dimitrios D. Alexakis

Keyword(s):

Time Series ◽

Missing Values ◽

Climate Models ◽

Daily Precipitation ◽

Mapping Method ◽

Precipitation Time Series ◽

Quantile Mapping ◽

Step Procedure ◽

Mediterranean Island ◽

Data Statistics

We present and assess a method to estimate missing values in daily precipitation time series for the Mediterranean island of Crete. The method involves a quantile mapping methodology originally developed for the bias correction of climate model output. The overall methodology is based on a two-step procedure: (a) assessment of missing values from nearby stations and (b) adjustment of the biases in the probability density function of the filled values towards the existing data of the target. The methodology is assessed for its performance in filling in the time series of a dense precipitation station network with large gaps on the island of Crete, Greece. The results indicate that quantile mapping can improve the statistics of the filled-in missing data, as well as the wet day fraction. Conceptual limitations of the method are discussed, and guidance on its correct application is provided.
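
Step (b) of the two-step procedure can be sketched with a plain empirical quantile mapping: a first-guess value is located in the empirical distribution of first-guess values and replaced by the observation at the same rank in the target station's distribution. This is a minimal sketch, not the authors' implementation; the function name, the rank convention, and the sample values are assumptions.

```python
from bisect import bisect_right


def empirical_quantile_map(value, source_sample, target_sample):
    """Map `value` from the source (first-guess) distribution onto the
    target (observed) distribution by matching empirical ranks."""
    src = sorted(source_sample)
    tgt = sorted(target_sample)
    # Number of source values <= value, i.e. its empirical rank.
    rank = bisect_right(src, value)
    # Matching position in the target sample (ceiling division, integers only).
    idx = (rank * len(tgt) + len(src) - 1) // len(src) - 1
    # Clamp so values outside the source range map to the target extremes.
    idx = min(len(tgt) - 1, max(0, idx))
    return tgt[idx]


if __name__ == "__main__":
    # Hypothetical overlap period: first-guess estimates vs. target observations (mm).
    first_guess = [0, 0, 2, 4, 6, 8, 10, 12, 14, 16]
    observed = [0, 0, 0, 3, 6, 9, 12, 15, 18, 21]
    for v in (0, 2, 10, 100):
        print(v, "->", empirical_quantile_map(v, first_guess, observed))
```

In this toy sample the target station has more dry days than the first guess, so a light first-guess amount is mapped to zero, which is the kind of wet-day-fraction adjustment the abstract refers to.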


Modelling of monsoon rainfall for a mesoscale catchment in North-West India II: stochastic rainfall simulations

Hydrology and Earth System Sciences ◽

10.5194/hess-10-807-2006 ◽

2006 ◽

Vol 10 (6) ◽

pp. 807-815 ◽

Cited By ~ 1

Author(s):

E. Zehe ◽

A. K. Singh ◽

A. Bárdossy

Keyword(s):

Time Series ◽

Meteorological Data ◽

Daily Rainfall ◽

Time Variability ◽

Western India ◽

Data Sets ◽

Rainfall Time Series ◽

North West ◽

Monthly Scale ◽

Circulation Patterns

Abstract. In this study we present a robust method for generating precipitation time series for the Anas catchment in north-western India. The method employs a multivariate stochastic simulation model driven by a time series of objectively classified circulation patterns (CPs). A companion study (Zehe et al., 2006) showed that CPs classified from the 500 or 700 hPa levels are suitable for explaining the space-time variability of precipitation in that area. The model is calibrated using observed rainfall time series for the period 1985–1992 for two different CP time series, one from the 500 hPa level and the other from the 700 hPa level, and 200 realizations of daily rainfall are simulated for the period 1985–1994. Simulations using the CPs from the 500 hPa level as input yield a good match of the observed averages and standard deviations of daily rainfall, and they also perform well at the monthly scale. With the 700 hPa level CPs as input, the model clearly underestimates the standard deviation and performs much worse at the monthly scale, especially in the validation period 1993–1994. The results give evidence that CPs from the 500 hPa level, in combination with a multivariate stochastic model, make up a suitable tool for reducing the sparsity of precipitation data in developing regions with sparse hydro-meteorological data sets.


A Comparison of Three Gap Filling Techniques for Eddy Covariance Net Carbon Fluxes in Short Vegetation Ecosystems

Advances in Meteorology ◽

10.1155/2015/260580 ◽

2015 ◽

Vol 2015 ◽

pp. 1-12 ◽

Cited By ~ 8

Author(s):

Xiaosong Zhao ◽

Yao Huang

Keyword(s):

Time Series ◽

Standard Deviation ◽

Diurnal Variation ◽

Eddy Covariance ◽

Nonlinear Regression ◽

Missing Values ◽

The Other ◽

Gap Filling ◽

Benchmark Datasets ◽

Filling Method

Missing data are an inevitable problem when measuring CO2, water, and energy fluxes between biosphere and atmosphere with eddy covariance systems. To find the optimum gap-filling method for short vegetation, we review three methods, mean diurnal variation (MDV), look-up tables (LUT), and nonlinear regression (NLR), for estimating missing values of net ecosystem CO2 exchange (NEE) in eddy covariance time series and evaluate their performance for different artificial gap scenarios based on benchmark datasets from marsh and cropland sites in China. The cumulative errors of the three methods show no consistent bias trends, ranging between −30 and +30 mgCO2 m−2 from May to October at the three sites. To reduce the sum bias as much as possible, combined gap-filling methods were selected for short vegetation: the NLR or LUT method was used between the onset of rapid plant growth in spring and the end of the growing period, and the MDV method was used for the other stages. The sum relative error (SRE) of the optimum method ranged between −2 and +4% for the four gap levels at the three sites, except for 55% gaps at the soybean site, and the standard deviation of the error was also clearly reduced.
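
Of the three methods, mean diurnal variation is the simplest to sketch: a missing value is replaced by the mean of the valid values observed at the same time of day on adjacent days. The Python sketch below assumes a flat list with a fixed number of slots per day and `None` marking gaps; the function name and the window length are assumptions, not settings from the paper.

```python
def fill_mdv(series, slots_per_day, window_days=7):
    """Fill gaps (None) with the mean of valid values at the same time of day
    within +/- window_days. Gaps with no valid neighbour stay None."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        neighbours = []
        for day_offset in range(-window_days, window_days + 1):
            j = i + day_offset * slots_per_day
            if day_offset != 0 and 0 <= j < len(series) and series[j] is not None:
                neighbours.append(series[j])
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


if __name__ == "__main__":
    # Hypothetical flux record with two slots per day over four days.
    flux = [1.0, 10.0, None, 12.0, 3.0, None, 5.0, 14.0]
    print(fill_mdv(flux, slots_per_day=2, window_days=1))
```

Only original observations are averaged, so a filled value is never reused to fill another gap.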


SC-Earth: A Station-based Serially Complete Earth Dataset from 1950 to 2019

Journal of Climate ◽

10.1175/jcli-d-21-0067.1 ◽

2021 ◽

pp. 1-47

Author(s):

Guoqiang Tang ◽

Martyn P. Clark ◽

Simon Michael Papalexiou

Keyword(s):

Missing Values ◽

Meteorological Data ◽

Oceanic Islands ◽

Dew Point ◽

Time Shift ◽

Gap Filling ◽

Quantile Mapping ◽

Station Data ◽

Historical Climatology ◽

The Tropics

Abstract. Meteorological data from ground stations suffer from temporal discontinuities caused by missing values and short measurement periods. Gap filling and reconstruction techniques have proven to be effective in producing serially complete station datasets (SCDs) that are used for a myriad of meteorological applications (e.g., developing gridded meteorological datasets and validating models). To our knowledge, all SCDs are developed at regional scales. In this study, we developed the serially complete Earth (SC-Earth) dataset, which provides daily precipitation, mean temperature, temperature range, dew-point temperature, and wind speed data from 1950 to 2019. SC-Earth utilizes raw station data from the Global Historical Climatology Network-Daily (GHCN-D) and the Global Surface Summary of the Day (GSOD). A unified station repository is generated based on GHCN-D and GSOD after station merging and strict quality control. ERA5 is optimally matched with station data considering the time shift issue and then used to assist the global gap filling. SC-Earth is generated by merging estimates from 15 strategies based on quantile mapping, spatial interpolation, machine learning, and multi-strategy merging. The final estimates are bias corrected using a combination of quantile mapping and quantile delta mapping. Comprehensive validation demonstrates that SC-Earth has high accuracy around the globe, with degraded quality in the tropics and oceanic islands due to sparse station networks, strong spatial precipitation gradients, and degraded ERA5 estimates. Meanwhile, SC-Earth inherits potential limitations such as inhomogeneity and precipitation undercatch from raw station data, which may affect its application in some cases. Overall, the high-quality and high-density SC-Earth dataset will benefit research in fields of hydrology, ecology, meteorology, and climate.


Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain

Atmosphere ◽

10.3390/atmos12091158 ◽

2021 ◽

Vol 12 (9) ◽

pp. 1158

Author(s):

Juan Antonio Bellido-Jiménez ◽

Javier Estévez Gualda ◽

Amanda Penélope García-Marín

Keyword(s):

Machine Learning ◽

Missing Values ◽

Mean Bias Error ◽

Semiarid Region ◽

Bias Error ◽

Spatial And Temporal Variability ◽

Rainfall Time Series ◽

Learning Models ◽

Gap Filling ◽

Machine Learning Models

Missing data are a common problem in hydrometeorological datasets, usually due to sensor malfunction, deficiencies in record storage and transmission, or other issues in data recovery procedures. These missing values are a primary source of problems when analyzing and modeling spatial and temporal variability. Accurate gap-filling techniques for rainfall time series are therefore necessary to obtain complete datasets, which are crucial for studying the evolution of climate change. In this work, several machine learning models were assessed for gap-filling rainfall data, using different approaches and locations in the semiarid region of Andalusia (Southern Spain). Based on the results, the use of neighbor data from within a 50 km radius clearly outperformed the other approaches assessed, with RMSE (root mean squared error) values up to 1.246 mm/day, MBE (mean bias error) values up to −0.001 mm/day, and R2 values up to 0.898. In addition, results for inland areas outperformed those for coastal areas in most locations, indicating that performance depends on the distance to the sea (with an improvement of up to 63.89% in terms of RMSE). Finally, machine learning (ML) models, especially the multilayer perceptron (MLP), notably outperformed simple linear regression estimates at the coastal sites, whereas at inland locations the improvements were less significant.
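
The three scores quoted in this abstract can be computed as in the Python sketch below. Two conventions are assumed here because the abstract does not state them: MBE is taken as the mean of estimate minus observation, and R2 as the coefficient of determination (1 minus the ratio of residual to total sum of squares) rather than the squared Pearson correlation. The sample values are made up.

```python
from math import sqrt


def rmse(observed, estimated):
    """Root mean squared error, in the units of the data (e.g. mm/day)."""
    n = len(observed)
    return sqrt(sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n)


def mbe(observed, estimated):
    """Mean bias error, assumed here as mean(estimate - observation)."""
    n = len(observed)
    return sum(e - o for o, e in zip(observed, estimated)) / n


def r2(observed, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed convention)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical daily rainfall (mm/day): observations withheld as artificial gaps.
    obs = [0.0, 2.0, 4.0, 6.0]
    est = [1.0, 2.0, 3.0, 6.0]
    print(rmse(obs, est), mbe(obs, est), r2(obs, est))
```

A near-zero MBE together with a non-zero RMSE, as in this toy case, means that positive and negative errors cancel on average even though individual days are still misestimated.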


Hybrid of ARIMA-GARCH Modeling in Rainfall Time Series

Jurnal Teknologi ◽

10.11113/jt.v63.1908 ◽

2013 ◽

Vol 63 (2) ◽

Cited By ~ 2

Author(s):

Fadhilah Yusof ◽

Ibrahim Lawal Kane ◽

Zulkifli Yusop

Keyword(s):

Time Series ◽

Daily Rainfall ◽

Rainfall Series ◽

Empirical Modeling ◽

Dependence Structure ◽

Rainfall Time Series ◽

Arima Models ◽

Starting Point ◽

Garch Modeling ◽

Squared Residuals

The dependence structure of rainfall is usually very complex in both time and space. This paper shows that the daily rainfall series of Ipoh and Alorsetar are affected by nonlinear characteristics of the variance, often referred to as variance clustering or volatility, where large changes tend to follow large changes and small changes tend to follow small changes. Most empirical modeling of hydrological time series has focused on modeling and predicting the mean behavior of the series through conventional Autoregressive Moving Average (ARMA) modeling following the Box-Jenkins methodology. These conventional models assume that the series is stationary, that is, that it has a constant mean and either constant variance or season-dependent variances. They do not take into account the second-order moment or conditional variance, but they form a good starting point for time series analysis. The residuals from preliminary ARIMA models fitted to the daily rainfall time series were tested for ARCH behavior. Inspection of the autocorrelation structure of the residuals and the squared residuals showed that the residuals are uncorrelated but the squared residuals are autocorrelated, and the Ljung-Box test confirmed this. The McLeod-Li test and a test based on the Lagrange multiplier (LM) principle were applied to the squared residuals from the ARIMA models. The results of these auxiliary tests give clear evidence to reject the null hypothesis of no ARCH effect, indicating that GARCH modeling is necessary. The composite ARIMA-GARCH model therefore captures the dynamics of the daily rainfall series in the study areas more precisely. For the monthly average rainfall series of the same locations, by contrast, a seasonal ARIMA model proved suitable.
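
The check on squared residuals described in this abstract can be sketched with the Ljung-Box statistic applied to the squared residuals, which is the form of the McLeod-Li test: Q = n (n + 2) times the sum over lags k = 1..m of r_k^2 / (n - k), where r_k is the lag-k sample autocorrelation of the squared residuals. The Python sketch below is a minimal stand-alone version with assumed function names; it returns only the statistic, which would then be compared with a chi-square quantile with m degrees of freedom (about 11.07 at the 5% level for m = 5).

```python
def autocorr(x, lag):
    """Lag-`lag` sample autocorrelation of the sequence x."""
    n = len(x)
    mean_x = sum(x) / n
    denom = sum((v - mean_x) ** 2 for v in x)
    num = sum((x[t] - mean_x) * (x[t - lag] - mean_x) for t in range(lag, n))
    return num / denom


def mcleod_li_q(residuals, max_lag):
    """Ljung-Box Q statistic computed on the squared residuals."""
    squared = [r * r for r in residuals]
    n = len(squared)
    return n * (n + 2) * sum(
        autocorr(squared, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


if __name__ == "__main__":
    # Hypothetical residuals: a calm stretch, a volatile stretch, then calm again.
    calm = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6]
    volatile = [5.0, -4.0, 6.0, -5.5, 4.5, -6.0]
    residuals = calm + volatile + calm
    print(mcleod_li_q(residuals, max_lag=5))
```

A large Q relative to the chi-square quantile suggests that the variance is not constant over time, which is the situation in which the abstract concludes that a GARCH component is needed.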


Rainfall variability in Malay Peninsula region of Southeast Asia using gridded data

E3S Web of Conferences ◽

10.1051/e3sconf/20198101002 ◽

2019 ◽

Vol 81 ◽

pp. 01002

Author(s):

Vishal Singh ◽

Xiaosheng Qin

Keyword(s):

Time Series ◽

Southeast Asia ◽

Rainfall Variability ◽

Rainfall Time Series ◽

Malay Peninsula ◽

Gap Filling ◽

Mathematical Functions ◽

Rainfall Analysis ◽

The Past ◽

Filling Analysis

Southeast Asia is recognized as a region vulnerable to climate change, as it has been significantly affected by many extreme events in the past. This study carried out a rainfall analysis over the Malay Peninsula region of Southeast Asia using historical (1981-2007) gridded rainfall datasets (0.5°×0.5°). Rainfall variability was analyzed over intra-decadal time series. The uncertainty in the datasets was also checked by comparing multiple global rainfall datasets. A rainfall gap-filling analysis was conducted to produce more accurate rainfall time series after testing multiple mathematical functions. Frequency-based rainfall extreme indices such as Dry Days and Wet Days were generated to assess rainfall variability over the study area. Our results reveal notable variation in rainfall over the Malay Peninsula during the historical period (1981-2007).


A multiple threshold method for fitting the generalized Pareto distribution to rainfall time series

Hydrology and Earth System Sciences ◽

10.5194/hess-14-2559-2010 ◽

2010 ◽

Vol 14 (12) ◽

pp. 2559-2575 ◽

Cited By ~ 37

Author(s):

R. Deidda

Keyword(s):

Time Series ◽

Distribution Function ◽

Pareto Distribution ◽

Generalized Pareto Distribution ◽

Daily Rainfall ◽

Rainfall Time Series ◽

Threshold Method ◽

Generalized Pareto ◽

Multiple Threshold ◽

Optimum Threshold

Abstract. Previous studies indicate that the generalized Pareto distribution (GPD) is a suitable distribution function to reliably describe the exceedances of daily rainfall records above a proper optimum threshold, which should be selected as small as possible to retain the largest sample while assuring an acceptable fit. Such an optimum threshold may differ from site to site, consequently affecting not only the GPD scale parameter but also the probability of threshold exceedance. Thus a first objective of this paper is to derive expressions to parameterize a simple threshold-invariant three-parameter distribution function that assures a perfect overlap with the GPD fitted to the exceedances over any threshold larger than the optimum one. Since the proposed distribution does not depend on the local thresholds adopted for fitting the GPD, it is expected to reflect the on-site climatic signature and thus appears particularly suitable for hydrological applications and regional analyses. A second objective is to develop and test the Multiple Threshold Method (MTM) to infer the parameters of interest by using exceedances over a wide range of thresholds, again applying the concept of parameter threshold-invariance. We show the ability of the MTM to fit historical daily rainfall time series recorded with different resolutions and with a significant percentage of heavily quantized data. Finally, we demonstrate the superiority of the MTM fit over the standard single-threshold fit, often adopted for partial duration series, by evaluating and comparing performance on Monte Carlo samples drawn from GPDs with different shape and scale parameters and different discretizations.


Simulation of rainfall time-series from different climatic regions using the Direct Sampling technique

Hydrology and Earth System Sciences Discussions ◽

10.5194/hessd-11-3213-2014 ◽

2014 ◽

Vol 11 (3) ◽

pp. 3213-3247 ◽

Cited By ~ 4

Author(s):

F. Oriani ◽

J. Straubhaar ◽

P. Renard ◽

G. Mariethoz

Keyword(s):

Time Series ◽

Daily Rainfall ◽

Sampling Technique ◽

Training Image ◽

Training Dataset ◽

Multiple Point ◽

Rainfall Time Series ◽

Climatic Regions ◽

Direct Sampling ◽

Time Series Simulation

Abstract. The Direct Sampling technique, belonging to the family of multiple-point statistics, is proposed as a non-parametric alternative to the classical autoregressive and Markov-chain based models for daily rainfall time-series simulation. The algorithm makes use of the patterns contained inside the training image (the past rainfall record) to reproduce the complexity of the signal without inferring its prior statistical model: the time-series is simulated by sampling the training dataset where a sufficiently similar neighborhood exists. The advantage of this approach is the capability of simulating complex statistical relations by respecting the similarity of the patterns at different scales. The technique is applied to daily rainfall records from different climate settings, using a standard setup and without performing any optimization of the parameters. The results show that the overall statistics as well as the dry/wet spells patterns are simulated accurately. Also the extremes at the higher temporal scale are reproduced exhaustively, reducing the well known problem of over-dispersion.
