A Data Quality Assessment Approach in the SmartWork Project’s Time-series Data Imputation Paradigm

2021 ◽  
Author(s):  
Georgios Papoulias ◽  
Otilia Kocsis ◽  
Konstantinos Moustakas
Author(s):  
Gilberto Pastorello ◽  
Deb Agarwal ◽  
Dario Papale ◽  
Taghrid Samak ◽  
Carlo Trotta ◽  
...  

Hydrology ◽  
2018 ◽  
Vol 5 (4) ◽  
pp. 63 ◽  
Author(s):  
Benjamin Nelsen ◽  
D. Williams ◽  
Gustavious Williams ◽  
Candace Berrett

Complete and accurate data are necessary for analyzing and understanding trends in time-series datasets; however, many available time-series datasets have gaps that affect the analysis, especially in the earth sciences. Because most available data have missing values, researchers use various interpolation methods or ad hoc approaches to data imputation. Since analysis based on inaccurate data can lead to inaccurate conclusions, more accurate imputation methods support more reliable analysis. We present a spatial-temporal data imputation method using Empirical Mode Decomposition (EMD) based on spatial correlations. We call this method EMD-spatial data imputation, or EMD-SDI. Though this method is applicable to other time-series datasets, here we demonstrate it using temperature data. The EMD algorithm decomposes data into periodic components called intrinsic mode functions (IMFs) and exactly reconstructs the original signal by summing these IMFs. EMD-SDI first decomposes the data from the target station and from other stations in the region into IMFs. It then evaluates each IMF from the target station in turn and selects the IMF from the other stations in the region whose periodic behavior is most correlated with the target IMF. EMD-SDI then replaces the section of missing data in the target station IMF with the corresponding section from the most closely correlated regional IMF. We found that EMD-SDI selects the IMFs used for reconstruction from different stations throughout the region, not necessarily from the geographically closest station. In our tests, EMD-SDI accurately filled data gaps from 3 months to 5 years in length and compared favorably to a simple temporal method. EMD-SDI leverages regional correlation and the fact that different stations can be subject to different periodic behaviors. In addition to data imputation, the EMD-SDI method provides IMFs that can be used to better understand regional correlations and processes.
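
The selection-and-replacement step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the IMFs have already been computed by some EMD routine, that all series share one time axis, and that the gap is given as a boolean mask. The function name `fill_gap_from_regional_imfs` and its parameters are hypothetical, and Pearson correlation over the observed samples is an assumed similarity measure.

```python
import numpy as np


def fill_gap_from_regional_imfs(target_imfs, regional_imfs, gap_mask):
    """Fill a gap in each target IMF from the best-correlated regional IMF.

    target_imfs   : array (n_imfs, n_samples), IMFs of the target station
                    (values inside the gap are ignored and may be NaN).
    regional_imfs : array (n_candidates, n_samples), IMFs pooled from the
                    other stations in the region.
    gap_mask      : boolean array (n_samples,), True where data are missing.

    Returns the reconstructed signal and the chosen candidate index per IMF.
    All names here are illustrative assumptions, not the paper's API.
    """
    target_imfs = np.asarray(target_imfs, dtype=float)
    regional_imfs = np.asarray(regional_imfs, dtype=float)
    gap_mask = np.asarray(gap_mask, dtype=bool)
    observed = ~gap_mask

    filled = target_imfs.copy()
    chosen = []
    for i, imf in enumerate(target_imfs):
        # Correlate only over samples where the target was observed.
        corrs = [
            np.corrcoef(imf[observed], cand[observed])[0, 1]
            for cand in regional_imfs
        ]
        best = int(np.nanargmax(corrs))
        chosen.append(best)
        # Replace the missing section with the matching regional section.
        filled[i, gap_mask] = regional_imfs[best, gap_mask]

    # EMD reconstructs a signal by summing its IMFs.
    return filled.sum(axis=0), chosen
```

Because each target IMF is matched independently, the chosen candidates may come from different stations, which is consistent with the abstract's observation that the geographically closest station is not necessarily selected.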


2020 ◽  
Vol 27 (1) ◽  
Author(s):  
E Afrifa‐Yamoah ◽  
U. A. Mueller ◽  
S. M. Taylor ◽  
A. J. Fisher

2021 ◽  
pp. 1-24
Author(s):  
Kelly McMann ◽  
Daniel Pemstein ◽  
Brigitte Seim ◽  
Jan Teorell ◽  
Staffan Lindberg

Abstract Political scientists routinely face the challenge of assessing the quality (validity and reliability) of measures in order to use them in substantive research. While stand-alone assessment tools exist, researchers rarely combine them comprehensively. Further, while a large literature informs data producers, data consumers lack guidance on how to assess existing measures for use in substantive research. We delineate a three-component practical approach to data quality assessment that integrates complementary multimethod tools to assess: (1) content validity; (2) the validity and reliability of the data generation process; and (3) convergent validity. We apply our quality assessment approach to the corruption measures from the Varieties of Democracy (V-Dem) project, both illustrating our rubric and unearthing several quality advantages and disadvantages of the V-Dem measures, compared to other existing measures of corruption.


2008 ◽  
Vol 47 (4) ◽  
pp. 1006-1016 ◽  
Author(s):  
Guang-Yu Shi ◽  
Tadahiro Hayasaka ◽  
Atsumu Ohmura ◽  
Zhi-Hua Chen ◽  
Biao Wang ◽  
...  

Abstract Solar radiation is one of the most important factors affecting climate and the environment. Routine measurements of irradiance are valuable for climate change research because of long time series and areal coverage. In this study, a set of quality assessment (QA) algorithms is used to test the quality of daily solar global, direct, and diffuse radiation measurements taken at 122 observatories in China during 1957–2000. The QA algorithms include a physical threshold test (QA1), a global radiation sunshine duration test (QA2), and a standard deviation test applied to time series of annually averaged solar global radiation (QA3). The results show that the percentages of global, direct, and diffuse solar radiation data that fail to pass QA1 are 3.07%, 0.01%, and 2.52%, respectively; the percentages of global solar radiation data that fail to pass QA2 and QA3 are 0.77% and 0.49%, respectively. The method implemented by the Global Energy Balance Archive is also applied to check the data quality of solar radiation in China. Of the 84 stations with a time series longer than 20 yr, suspect data were found at 35. Based on data that passed the QA tests, trends in ground solar radiation and the effect of the data quality assessment on the trends are analyzed. There is a decrease in ground solar global and direct radiation in China over the years under study. Although the quality assessment process has significant effects on the data from individual stations and/or time periods, it does not affect the long-term trends in the data.
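
As a rough illustration of two of the tests named in this abstract, the sketch below flags values outside a physical range (in the spirit of QA1) and annual means that lie far from the series mean (in the spirit of QA3). The abstract does not give the actual limits or criteria, so the function names, the `upper_limit` argument, and the multiplier `k` are assumptions chosen for illustration only.

```python
import numpy as np


def physical_threshold_flags(values, upper_limit):
    """Flag measurements outside an assumed physical range [0, upper_limit].

    upper_limit may be a scalar or an array of per-sample limits; how the
    limit is derived is not stated in the abstract.
    """
    values = np.asarray(values, dtype=float)
    return (values < 0) | (values > upper_limit)


def std_deviation_flags(annual_means, k=3.0):
    """Flag annual means more than k sample standard deviations from the mean.

    k = 3.0 is a placeholder default, not a value taken from the study.
    """
    annual_means = np.asarray(annual_means, dtype=float)
    center = annual_means.mean()
    spread = annual_means.std(ddof=1)
    return np.abs(annual_means - center) > k * spread
```

The share of flagged values, for example `physical_threshold_flags(values, upper_limit).mean() * 100`, would correspond to the kind of failure percentage the abstract reports for each test.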


2017 ◽  
Vol 23 (1) ◽  
pp. 641-650 ◽  
Author(s):  
Clemens Arbesser ◽  
Florian Spechtenhauser ◽  
Thomas Muhlbacher ◽  
Harald Piringer
