SCDNA: a serially complete precipitation and temperature dataset for North America from 1979 to 2018

Abstract. Probabilistic methods are useful to estimate the uncertainty in spatial meteorological fields (e.g., the uncertainty in spatial patterns of precipitation and temperature across large domains). In ensemble probabilistic methods, “equally plausible” ensemble members are used to approximate the probability distribution, and hence the uncertainty, of a spatially distributed meteorological variable conditioned on the available information. The ensemble members can be used to evaluate the impact of uncertainties in spatial meteorological fields for a myriad of applications. This study develops the Ensemble Meteorological Dataset for North America (EMDNA). EMDNA has 100 ensemble members with daily precipitation amount, mean daily temperature, and daily temperature range at 0.1° spatial resolution (approx. 10 km grids) from 1979 to 2018, derived from a fusion of station observations and reanalysis model outputs. The station data used in EMDNA are from a serially complete dataset for North America (SCDNA) that fills gaps in precipitation and temperature measurements using multiple strategies. Outputs from three reanalysis products are regridded, corrected, and merged using Bayesian model averaging. Optimal interpolation (OI) is used to merge station- and reanalysis-based estimates. EMDNA estimates are generated using spatiotemporally correlated random fields to sample from the OI estimates. Evaluation results show that (1) the merged reanalysis estimates outperform raw reanalysis estimates, particularly in high latitudes and mountainous regions; (2) the OI estimates are more accurate than the reanalysis and station-based regression estimates, with the most notable improvements for precipitation evident in sparsely gauged regions; and (3) EMDNA estimates exhibit good performance according to the diagrams and metrics used for probabilistic evaluation. We discuss the limitations of the current framework and highlight that further research is needed to improve ensemble meteorological datasets. Overall, EMDNA is expected to be useful for hydrological and meteorological applications in North America. The entire dataset and a teaser dataset (a small subset of EMDNA for easy download and preview) are available at https://doi.org/10.20383/101.0275 (Tang et al., 2020a).
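The merging step named in the abstract, optimal interpolation, can be illustrated in its simplest scalar form: one background value (here standing in for a reanalysis-based estimate) and one observation-like value (standing in for a station-based estimate) are combined with a weight set by their error variances. This is a minimal textbook sketch, not the EMDNA implementation; the function name `oi_merge`, the single-point setup, and the assumption of unbiased, uncorrelated errors with known variances are all illustrative.

```python
def oi_merge(background, observation, var_background, var_observation):
    """Scalar optimal interpolation of two estimates of the same quantity.

    Assumes (for illustration) unbiased, mutually uncorrelated errors
    with known variances. Returns (analysis, analysis_error_variance).
    """
    if var_background < 0 or var_observation < 0:
        raise ValueError("error variances must be non-negative")
    total = var_background + var_observation
    if total == 0:
        raise ValueError("at least one error variance must be positive")
    # Gain: how far the analysis moves from the background toward the observation.
    gain = var_background / total
    analysis = background + gain * (observation - background)
    # The analysis error variance is never larger than the background variance.
    var_analysis = (1.0 - gain) * var_background
    return analysis, var_analysis


if __name__ == "__main__":
    # Equal error variances: the analysis is the simple average.
    print(oi_merge(10.0, 14.0, 4.0, 4.0))
```

With equal variances the two inputs are averaged; as the background variance shrinks toward zero the analysis stays at the background value. A gridded application would replace the scalar gain with a matrix built from spatial error covariances.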


SCDNA: a serially complete precipitation and temperature dataset for North America from 1979 to 2018

Earth System Science Data ◽

10.5194/essd-12-2381-2020 ◽

2020 ◽

Vol 12 (4) ◽

pp. 2381-2409 ◽

Cited By ~ 1

Author(s):

Guoqiang Tang ◽

Martyn P. Clark ◽

Andrew J. Newman ◽

Andrew W. Wood ◽

Simon Michael Papalexiou ◽

...

Keyword(s):

North America ◽

Auxiliary Information ◽

Mean Value ◽

Meteorological Station ◽

Maximum Temperature ◽

Quantile Mapping ◽

Sensitivity Experiment ◽

Station Data ◽

Precipitation And Temperature ◽

Observation Period

Abstract. Station-based serially complete datasets (SCDs) of precipitation and temperature observations are important for hydrometeorological studies. Motivated by the lack of serially complete station observations for North America, this study develops an SCD from 1979 to 2018 from station data. The new SCD for North America (SCDNA) includes daily precipitation, minimum temperature (Tmin), and maximum temperature (Tmax) data for 27 276 stations. Raw meteorological station data were obtained from the Global Historical Climatology Network Daily (GHCN-D), the Global Surface Summary of the Day (GSOD), Environment and Climate Change Canada (ECCC), and a compiled station database in Mexico. Stations with records of at least 8 years were selected, location-corrected, and subjected to strict quality control. Outputs from three reanalysis products (ERA5, JRA-55, and MERRA-2) provided auxiliary information to estimate station records. Infilling during the observation period and reconstruction beyond the observation period were accomplished by combining estimates from 16 strategies (variants of quantile mapping, spatial interpolation, and machine learning). A sensitivity experiment was conducted by assuming that 30 % of observations from stations were missing; this enabled independent validation and provided a reference for reconstruction. Quantile mapping and mean value corrections were applied to the final estimates. The median Kling–Gupta efficiency (KGE′) values of the final SCDNA for all stations are 0.90, 0.98, and 0.99 for precipitation, Tmin, and Tmax, respectively. The SCDNA is closer to station observations than the four benchmark gridded products and can be used in applications that require either quality-controlled meteorological station observations or reconstructed long-term estimates for analysis and modeling. The dataset is available at https://doi.org/10.5281/zenodo.3735533 (Tang et al., 2020).
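The KGE′ scores quoted above refer to the modified Kling–Gupta efficiency, commonly written as KGE′ = 1 − sqrt((r − 1)² + (β − 1)² + (γ − 1)²), where r is the linear correlation, β the ratio of means, and γ the ratio of coefficients of variation between estimates and observations. The sketch below follows that common definition; the function name `kge_prime` is illustrative, and how the paper handles series whose mean is near zero (possible for temperature in °C) is not stated in the abstract and is not addressed here.

```python
import math


def kge_prime(sim, obs):
    """Modified Kling-Gupta efficiency of `sim` against `obs` (1.0 is perfect)."""
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("sim and obs must have equal length >= 2")
    mu_s = sum(sim) / n
    mu_o = sum(obs) / n
    # Population standard deviations and covariance.
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    if mu_s == 0 or mu_o == 0 or sd_s == 0 or sd_o == 0:
        raise ValueError("KGE' is undefined for zero mean or zero variance")
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)                    # correlation component
    beta = mu_s / mu_o                         # bias ratio
    gamma = (sd_s / mu_s) / (sd_o / mu_o)      # variability ratio (CV ratio)
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


if __name__ == "__main__":
    observed = [1.0, 2.0, 4.0, 8.0]
    print(kge_prime(observed, observed))
    print(kge_prime([2.0 * v for v in observed], observed))
```

A series that doubles every observation keeps r and γ at 1 but has β = 2, so only the bias term lowers the score.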


The use of serially complete station data to improve the temporal continuity of gridded precipitation and temperature estimates

Journal of Hydrometeorology ◽

10.1175/jhm-d-20-0313.1 ◽

2021 ◽

Author(s):

Guoqiang Tang ◽

Martyn P. Clark ◽

Simon Michael Papalexiou

Keyword(s):

North America ◽

Spatial Interpolation ◽

Missing Values ◽

Meteorological Data ◽

Maximum Temperature ◽

Spatial Density ◽

Gap Filling ◽

Station Network ◽

Interpolation Methods ◽

Precipitation And Temperature

Abstract. Stations are an important source of meteorological data, but they often suffer from missing values and short observation periods. Gap filling is widely used to generate serially complete datasets (SCDs), which are subsequently used to produce gridded meteorological estimates. However, the value of SCDs in spatial interpolation is scarcely studied. Based on our recent efforts to develop an SCD over North America (SCDNA), we explore the extent to which gap filling improves gridded precipitation and temperature estimates. We address two specific questions: (1) Can SCDNA improve the statistical accuracy of gridded estimates in North America? (2) Can SCDNA improve estimates of trends in gridded data? In addressing these questions, we also evaluate the extent to which results depend on the spatial density of the station network and the spatial interpolation methods used. Results show that the improvement in statistical interpolation due to gap filling is most pronounced for precipitation, followed by minimum temperature and maximum temperature. The improvement is larger when the station network is sparse and when simpler interpolation methods are used. SCDs can also notably reduce the uncertainties in spatial interpolation. Our evaluation across North America from 1979 to 2018 demonstrates that SCDs improve the accuracy of interpolated estimates for most stations and days. SCDNA-based interpolation also yields better trend estimates than observation-based interpolation. This occurs because the set of stations used for interpolation can change during a given period, which causes changepoints in interpolated temperature estimates and affects the long-term trends of observation-based interpolation; using SCDNA avoids this problem. Overall, SCDs improve the performance of gridded precipitation and temperature estimates.
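The point about a changing station set can be made concrete with a simple interpolator. The abstract does not name the interpolation methods it tests, so inverse distance weighting (IDW) is used here purely as an assumed example of a "simpler" method; the function name `idw`, the planar coordinates, and the sample values are all hypothetical.

```python
import math


def idw(target, stations, power=2.0):
    """Inverse-distance-weighted estimate at `target` = (x, y).

    `stations` is a list of (x, y, value); a value of None marks a missing
    observation and that station is skipped for this time step.
    """
    num = 0.0
    den = 0.0
    for x, y, value in stations:
        if value is None:
            continue  # the station drops out of the interpolation
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return value  # target coincides with a reporting station
        weight = dist ** -power
        num += weight * value
        den += weight
    if den == 0.0:
        raise ValueError("no station reports a value")
    return num / den


if __name__ == "__main__":
    point = (0.0, 0.0)
    print(idw(point, [(1.0, 0.0, 10.0), (-1.0, 0.0, 20.0)]))  # both report
    print(idw(point, [(1.0, 0.0, 10.0), (-1.0, 0.0, None)]))  # one is missing
    print(idw(point, [(1.0, 0.0, 10.0), (-1.0, 0.0, 18.0)]))  # gap-filled value
```

When the second station goes missing, the estimate jumps to the remaining station's value even though nothing changed physically; supplying a gap-filled value keeps the contributing station set fixed, which is the mechanism the abstract credits for cleaner trends.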


EMDNA: Ensemble Meteorological Dataset for North America

10.5194/essd-2020-303 ◽

2020 ◽

Author(s):

Guoqiang Tang ◽

Martyn P. Clark ◽

Simon Michael Papalexiou ◽

Andrew J. Newman ◽

Andrew W. Wood ◽

...

Keyword(s):

North America ◽

Bayesian Model Averaging ◽

Precipitation Amount ◽

Daily Temperature ◽

Probabilistic Methods ◽

Optimal Interpolation ◽

Small Subset ◽

Mountainous Regions ◽

The Impact ◽

Precipitation And Temperature

Abstract. Probabilistic methods are useful to estimate the spatial variability in meteorological conditions (e.g., spatial patterns of precipitation and temperature across large domains). In ensemble probabilistic methods, equally plausible ensemble members are used to approximate the probability distribution, and hence the uncertainty, of a spatially distributed meteorological variable conditioned on the available information. The ensemble can be used to evaluate the impact of the uncertainties in a myriad of applications. This study develops the Ensemble Meteorological Dataset for North America (EMDNA). EMDNA has 100 members with daily precipitation amount, mean daily temperature, and daily temperature range at 0.1° spatial resolution from 1979 to 2018, derived from a fusion of station observations and reanalysis model outputs. The station data used in EMDNA are from a serially complete dataset for North America (SCDNA) that fills gaps in precipitation and temperature measurements using multiple strategies. Outputs from three reanalysis products are regridded, corrected, and merged using Bayesian model averaging. Optimal interpolation (OI) is used to merge station- and reanalysis-based estimates. EMDNA estimates are generated based on OI estimates and spatiotemporally correlated random fields. Evaluation results show that (1) the merged reanalysis estimates outperform raw reanalysis estimates, particularly in high latitudes and mountainous regions; (2) the OI estimates are more accurate than the reanalysis and station-based regression estimates, with the most notable improvement for precipitation occurring in sparsely gauged regions; and (3) EMDNA estimates exhibit good performance according to the diagrams and metrics used for probabilistic evaluation. We also discuss the limitations of the current framework and highlight that persistent efforts are needed to further develop probabilistic methods and ensemble datasets. Overall, EMDNA is expected to be useful for hydrological and meteorological applications in North America. The whole dataset and a teaser dataset (a small subset of EMDNA for easy download and preview) are available at https://doi.org/10.20383/101.0275 (Tang et al., 2020a).
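The idea of generating members from a central estimate plus correlated random numbers can be sketched for a single grid cell. This is a heavily simplified illustration, not the EMDNA procedure: it assumes Gaussian errors, correlation in time only (a first-order autoregressive process with lag-1 correlation `rho`), and no special treatment of precipitation occurrence or skewness. The function name `generate_ensemble` and all parameter names are hypothetical.

```python
import math
import random


def generate_ensemble(means, sds, n_members, rho, seed=None):
    """Draw `n_members` time series around `means` with spread `sds`.

    Standard-normal deviates follow an AR(1) process with lag-1
    correlation `rho`, so consecutive days are perturbed coherently.
    """
    if len(means) != len(sds):
        raise ValueError("means and sds must have the same length")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    # Scaling the innovation keeps the deviates at unit variance.
    innovation_scale = math.sqrt(1.0 - rho * rho)
    members = []
    for _ in range(n_members):
        series = []
        deviate = rng.gauss(0.0, 1.0)
        for t, (mean, sd) in enumerate(zip(means, sds)):
            if t > 0:
                deviate = rho * deviate + innovation_scale * rng.gauss(0.0, 1.0)
            series.append(mean + sd * deviate)
        members.append(series)
    return members


if __name__ == "__main__":
    ensemble = generate_ensemble(
        means=[5.0, 6.0, 4.5], sds=[1.0, 1.2, 0.8], n_members=100, rho=0.7, seed=42
    )
    print(len(ensemble), len(ensemble[0]))
```

A spatial version would draw each day's deviates as a correlated field across grid cells rather than as independent per-cell series.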


SCDNA: a serially complete precipitation and temperature dataset for North America from 1979 to 2018

10.5194/essd-2020-92 ◽

2020 ◽

Author(s):

Guoqiang Tang ◽

Martyn P. Clark ◽

Andrew J. Newman ◽

Andrew W. Wood ◽

Simon Michael Papalexiou ◽

...

Keyword(s):

North America ◽

Auxiliary Information ◽

Mean Value ◽

Meteorological Station ◽

Maximum Temperature ◽

Quantile Mapping ◽

Sensitivity Experiment ◽

Station Data ◽

Precipitation And Temperature ◽

Observation Period

Abstract. Station-based serially complete datasets (SCDs) of precipitation and temperature observations are important for hydrometeorological studies. Motivated by the lack of serially complete station observations for North America, this study develops an SCD from 1979 to 2018 from station data. The new SCD for North America (SCDNA) includes daily precipitation, minimum temperature (Tmin), and maximum temperature (Tmax) data for 27 280 stations. Raw meteorological station data were obtained from the Global Historical Climatology Network Daily (GHCN-D), the Global Surface Summary of the Day (GSOD), Environment and Climate Change Canada (ECCC), and a compiled station database in Mexico. Stations with records of at least 8 years were selected, location-corrected, and subjected to strict quality control. Outputs from three reanalysis products (ERA5, JRA-55, and MERRA-2) provided auxiliary information to estimate station records and were also used as an assessment benchmark. Infilling during the observation period and reconstruction beyond the observation period were accomplished by combining estimates from 16 strategies (variants of quantile mapping, spatial interpolation, and machine learning). A sensitivity experiment was conducted by assuming that 30 % of station observations were missing; this enabled independent validation and provided a reference for reconstruction. Quantile mapping and mean-value corrections were applied to the final estimates. The median Kling–Gupta efficiency (KGE) values of the final SCDNA for all stations are 0.90, 0.98, and 0.99 for precipitation, Tmin, and Tmax, respectively. The SCDNA is closer to station observations than four benchmark gridded products and can be used in applications that require either quality-controlled meteorological station observations or reconstructed long-term estimates for analysis and modelling. The dataset is available at https://doi.org/10.5281/zenodo.3735534 (Tang et al., 2020).
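Quantile mapping, used here both as an infilling strategy and as a final correction, adjusts a value so that its rank in a source sample (for example, reanalysis output at a station) matches the same rank in a target sample (the station's own observations). The sketch below is a generic empirical version with linear interpolation between sorted samples; the abstract does not specify which variants the authors use, and the function name `empirical_quantile_map` and the sample numbers are illustrative.

```python
from bisect import bisect_right


def empirical_quantile_map(x, source, target):
    """Map `x` from the empirical distribution of `source` onto `target`."""
    src = sorted(source)
    tgt = sorted(target)
    if len(src) < 2 or len(tgt) < 2:
        raise ValueError("need at least two values in each sample")
    # Step 1: non-exceedance probability of x within the source sample.
    if x <= src[0]:
        p = 0.0
    elif x >= src[-1]:
        p = 1.0
    else:
        i = bisect_right(src, x) - 1  # src[i] <= x < src[i + 1]
        p = (i + (x - src[i]) / (src[i + 1] - src[i])) / (len(src) - 1)
    # Step 2: value at the same probability in the target sample.
    pos = p * (len(tgt) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(tgt) - 1)
    return tgt[lo] + (pos - lo) * (tgt[hi] - tgt[lo])


if __name__ == "__main__":
    reanalysis = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical source sample
    station = [10.0, 20.0, 30.0, 40.0, 50.0]    # hypothetical target sample
    print(empirical_quantile_map(2.5, reanalysis, station))
```

Values outside the source range are clamped to the target extremes in this sketch; operational variants often extrapolate or fit parametric tails instead.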
