Challenging Problems of Quality Assurance and Quality Control (QA/QC) of Meteorological Time Series Data

Author(s):  
Boris Faybishenko ◽  
Roelof Versteeg ◽  
Gilberto Pastorello ◽  
Dipankar Dwivedi ◽  
Charuleka Varadharajan ◽  
...  

Abstract Representativeness and quality of collected meteorological data impact accuracy and precision of climate, hydrological, and biogeochemical analyses and predictions. We developed a comprehensive Quality Assurance (QA) and Quality Control (QC) statistical framework, consisting of three major steps: Step 1—Preliminary data exploration, i.e., processing of raw datasets, with the challenging problems of time formatting and combining datasets of different lengths and different time intervals; Step 2—QA of the datasets, including detecting and flagging of duplicates, outliers, and extreme data; and Step 3—the development of time series of a desired frequency, imputation of missing values, visualization and a final statistical summary. The paper includes two use cases based on the time series data collected at the Billy Barr meteorological station (East River Watershed, Colorado), and the Barro Colorado Island (BCI, Panama) meteorological station. The developed statistical methods are suitable for both real-time and post-data-collection QA/QC analysis of meteorological datasets.
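The QA and time-series steps described above can be illustrated with a minimal sketch. It is not the authors' framework: the median/MAD outlier rule, the threshold `k`, the linear interpolation, and the names `flag_records` and `regular_series` are all assumptions chosen for illustration, and the sample values are made up.

```python
from datetime import datetime, timedelta
from statistics import median


def flag_records(records, k=3.0):
    """Label each (timestamp, value) pair as 'ok', 'duplicate', or 'outlier'.

    The outlier rule (more than k scaled MADs from the median) is an
    assumption for this sketch, not the criterion used in the paper.
    """
    values = [v for _, v in records]
    med = median(values)
    # Scaled median absolute deviation; 1.4826 makes it comparable to a
    # standard deviation for normally distributed data.
    mad = 1.4826 * median(abs(v - med) for v in values)
    seen = set()
    flagged = []
    for ts, v in records:
        if ts in seen:
            flag = "duplicate"  # repeated timestamp: the first record is kept
        elif mad > 0 and abs(v - med) > k * mad:
            flag = "outlier"
        else:
            flag = "ok"
        seen.add(ts)
        flagged.append((ts, v, flag))
    return flagged


def regular_series(flagged, start, step, n):
    """Build n evenly spaced values and linearly interpolate interior gaps."""
    good = {ts: v for ts, v, flag in flagged if flag == "ok"}
    grid = [start + i * step for i in range(n)]
    series = [good.get(ts) for ts in grid]  # None marks a missing value
    known = [i for i, v in enumerate(series) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            series[i] = series[a] + w * (series[b] - series[a])
    return series


# Hypothetical hourly temperatures with one duplicate, one spike, one gap.
t0 = datetime(2020, 1, 1)
hour = timedelta(hours=1)
records = [(t0, 10.0), (t0 + hour, 10.5), (t0 + hour, 10.5),
           (t0 + 2 * hour, 55.0), (t0 + 4 * hour, 11.5), (t0 + 5 * hour, 12.0)]
flagged = flag_records(records)
series = regular_series(flagged, t0, hour, 6)
```

Flagging rather than deleting suspect records keeps the raw data intact, so a different outlier rule can be applied later without reprocessing the original files.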

Author(s):  
B. Faybishenko ◽  
R. Versteeg ◽  
G. Pastorello ◽  
D. Dwivedi ◽  
C. Varadharajan ◽  
...  

Abstract Representativeness and quality of collected meteorological data impact accuracy and precision of climate, hydrological, and biogeochemical analyses and predictions. We developed a comprehensive Quality Assurance (QA) and Quality Control (QC) statistical framework, consisting of three major phases: Phase I: preliminary data exploration, i.e., processing of raw datasets, with the challenging problems of time formatting and combining datasets of different lengths and different time intervals; Phase II: QA of the datasets, including detecting and flagging of duplicates, outliers, and extreme data; and Phase III: the development of time series of a desired frequency, imputation of missing values, visualization and a final statistical summary. The paper includes two use cases based on the time series data collected at the Billy Barr meteorological station (East River Watershed, Colorado), and the Barro Colorado Island (BCI, Panama) meteorological station. The developed statistical framework is suitable for both real-time and post-data-collection QA/QC analysis of meteorological datasets.


Hydrology ◽  
2018 ◽  
Vol 5 (4) ◽  
pp. 63 ◽  
Author(s):  
Benjamin Nelsen ◽  
D. Williams ◽  
Gustavious Williams ◽  
Candace Berrett

Complete and accurate data are necessary for analyzing and understanding trends in time-series datasets; however, many of the available time-series datasets have gaps that affect the analysis, especially in the earth sciences. As most available data have missing values, researchers use various interpolation methods or ad hoc approaches to data imputation. Because analysis based on inaccurate data can lead to inaccurate conclusions, more accurate data imputation methods support more reliable analysis. We present a spatial-temporal data imputation method using Empirical Mode Decomposition (EMD) based on spatial correlations. We call this method EMD-spatial data imputation or EMD-SDI. Though this method is applicable to other time-series datasets, here we demonstrate the method using temperature data. The EMD algorithm decomposes data into periodic components called intrinsic mode functions (IMFs) and exactly reconstructs the original signal by summing these IMFs. EMD-SDI initially decomposes the data from the target station and other stations in the region into IMFs. EMD-SDI evaluates each IMF from the target station in turn and selects the IMF from other stations in the region whose periodic behavior is most correlated with the target IMF. EMD-SDI then replaces a section of missing data in the target station IMF with the section from the most closely correlated IMF from the regional stations. We found that EMD-SDI selects the IMFs used for reconstruction from different stations throughout the region, not necessarily the station closest in the geographic sense. EMD-SDI accurately filled data gaps from 3 months to 5 years in length in our tests and compares favorably to a simple temporal method. EMD-SDI leverages regional correlation and the fact that different stations can be subject to different periodic behaviors. In addition to data imputation, the EMD-SDI method provides IMFs that can be used to better understand regional correlations and processes.
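The donor-selection step can be sketched for a single component as follows. This is a simplified illustration, not the published EMD-SDI code: the decomposition into IMFs is not implemented, each list stands in for one IMF that a separate EMD routine would supply, Pearson correlation over the target's observed samples is an assumed similarity measure, and `fill_from_best_donor` and the station names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def fill_from_best_donor(target, donors):
    """Fill None entries in target from the most correlated donor series.

    target: list of floats with None in the gap.
    donors: dict mapping a station name to a complete series of equal length.
    Correlation is computed only where the target has observations.
    """
    obs = [i for i, v in enumerate(target) if v is not None]
    t_obs = [target[i] for i in obs]
    best = max(donors,
               key=lambda name: pearson(t_obs, [donors[name][i] for i in obs]))
    filled = [donors[best][i] if v is None else v
              for i, v in enumerate(target)]
    return best, filled


# Made-up component values; the gap covers indices 4 and 5.
target = [0.0, 1.0, 0.0, -1.0, None, None, 0.0, -1.0]
donors = {
    "near": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6, 0.5, 0.4],
    "far": [0.1, 1.1, 0.0, -0.9, 0.1, 1.0, 0.1, -1.0],
}
best, filled = fill_from_best_donor(target, donors)
```

In this toy case the donor labelled "far" is chosen because its behavior tracks the target, which mirrors the abstract's observation that the best donor is not necessarily the geographically closest station.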


2018 ◽  
Vol 17 (02) ◽  
pp. 1850017 ◽  
Author(s):  
Mahdi Kalantari ◽  
Masoud Yarmohammadi ◽  
Hossein Hassani ◽  
Emmanuel Sirimal Silva

Missing values in time series data are a well-known and important problem that researchers in many fields have studied extensively. In this paper, a new nonparametric approach for missing value imputation in time series is proposed. The main novelty of this research is applying the L1 norm-based version of Singular Spectrum Analysis (SSA), namely L1-SSA, which is robust against outliers. The performance of the new imputation method has been compared with many other established methods by applying them to various real and simulated time series. The results confirm that the SSA-based methods, especially L1-SSA, can provide better imputation than the other methods.


2020 ◽  
Author(s):  
César Capinha ◽  
Ana Ceia-Hasse ◽  
Andrew M. Kramer ◽  
Christiaan Meijer

Abstract Temporal data is ubiquitous in ecology, and ecologists often face the challenge of accurately differentiating these data into predefined classes, such as biological entities or ecological states. The usual approach transforms the temporal data into static predictors of the classes. However, recent deep learning techniques can perform the classification using raw time series, eliminating subjective and resource-consuming data transformation steps, and potentially improving classification results. We present a general approach for time series classification that considers multiple deep learning algorithms and illustrate it with three case studies: i) insect species identification from wingbeat spectrograms; ii) species distribution modelling from climate time series; and iii) the classification of phenological phases from continuous meteorological data. The deep learning approach delivered ecologically sensible and accurate classifications, proving its potential for wide applicability across subfields of ecology. We recommend deep learning as an alternative to techniques requiring the transformation of time series data.


2021 ◽  
Vol 6 (1) ◽  
pp. 1-4
Author(s):  
Bo Yuan Chang ◽  
Mohamed A. Naiel ◽  
Steven Wardell ◽  
Stan Kleinikkink ◽  
John S. Zelek

Over the past years, researchers have proposed various methods to discover causal relationships among time-series data as well as algorithms to fill in missing entries in time-series data. Little to no work has been done on combining the two strategies for the purpose of learning causal relationships from unevenly sampled multivariate time-series data. In this paper, we examine how the causal parameters learnt from unevenly sampled data (with missing entries) deviate from the parameters learnt from evenly sampled data (without missing entries). However, obtaining the causal relationship from a given time series requires evenly sampled data, which suggests filling in the missing data values before estimating the causal parameters. Therefore, the proposed method applies a Gaussian Process Regression (GPR) model for missing data recovery, followed by several pairwise Granger causality equations in Vector Autoregressive form to fit the recovered data and obtain the causal parameters. Experimental results show that the causal parameters obtained with GPR data filling have much lower RMSE than those from the dummy model (fill with last seen entry) at all missing-value percentages, suggesting that GPR data filling better preserves the causal relationships than dummy data filling and should therefore be considered when learning causality from unevenly sampled time series.
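The dummy baseline named above (fill with last seen entry) and the RMSE metric can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and sample values are hypothetical, the GPR and Granger causality steps are not implemented, and RMSE is computed here on the filled series values, whereas the paper reports RMSE on the estimated causal parameters.

```python
from math import sqrt


def fill_last_seen(series):
    """Replace each None with the most recent observed value.

    Leading None entries stay None because nothing has been seen yet.
    """
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def rmse(estimate, truth):
    """Root-mean-square error between two equal-length numeric sequences."""
    pairs = list(zip(estimate, truth))
    return sqrt(sum((e - t) ** 2 for e, t in pairs) / len(pairs))


# Made-up evenly sampled series and a copy with two entries removed.
truth = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [1.0, None, None, 4.0, 5.0]
filled = fill_last_seen(observed)
error = rmse(filled, truth)
```

On a trending series such as this one, last-seen filling holds the value flat across the gap, which is one reason a model-based fill could be expected to distort downstream estimates less.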


Author(s):  
Yonghong Luo ◽  
Ying Zhang ◽  
Xiangrui Cai ◽  
Xiaojie Yuan

Missing values, which appear in most multivariate time series, prevent advanced analysis of multivariate time series data. Existing imputation approaches try to deal with missing values by deletion, statistical imputation, machine learning based imputation and generative imputation. However, these methods either cannot handle temporal information or require multiple stages. This paper proposes an end-to-end generative model, E²GAN, to impute missing values in multivariate time series. With the help of the discriminative loss and the squared error loss, E²GAN can impute the incomplete time series with the nearest generated complete time series in one stage. Experiments on multiple real-world datasets show that our model outperforms the baselines on imputation accuracy and achieves state-of-the-art classification/regression results on the downstream applications. Additionally, our method is more time-efficient than multi-stage methods when training the neural networks.


2018 ◽  
Vol 66 (2) ◽  
pp. 143-152 ◽  
Author(s):  
Marcia S. Batalha ◽  
Maria C. Barbosa ◽  
Boris Faybishenko ◽  
Martinus Th. van Genuchten

Abstract Accurate estimates of infiltration and groundwater recharge are critical for many hydrologic, agricultural and environmental applications. Anticipated climate change in many regions of the world, especially in tropical areas, is expected to increase the frequency of high-intensity, short-duration precipitation events, which in turn will affect the groundwater recharge rate. Estimates of recharge are often obtained using monthly or even annually averaged meteorological time series data. In this study we employed the HYDRUS-1D software package to assess the sensitivity of groundwater recharge calculations to using meteorological time series of different temporal resolutions (i.e., hourly, daily, weekly, monthly and yearly averaged precipitation and potential evaporation rates). Calculations were applied to three sites in Brazil having different climatological conditions: a tropical savanna (the Cerrado), a humid subtropical area (the temperate southern part of Brazil), and a very wet tropical area (Amazonia). To simplify our current analysis, we did not consider any land use effects by ignoring root water uptake. Temporal averaging of meteorological data was found to lead to significant bias in predictions of groundwater recharge, with much greater estimated recharge rates in the case of very uneven temporal rainfall distributions during the year involving distinct wet and dry seasons. For example, at the Cerrado site, using daily averaged data produced recharge rates up to 9 times greater than using yearly averaged data. In all cases, an increase in the time of averaging of meteorological data led to lower estimates of groundwater recharge, especially at sites having coarse-textured soils. Our results show that temporal averaging limits the ability of simulations to predict deep penetration of moisture in response to precipitation, so that water remains in the upper part of the vadose zone subject to upward flow and evaporation.

