LGHAP: a Long-term Gap-free High-resolution Air Pollutants concentration dataset derived via tensor-flow-based multimodal data fusion
Abstract. Developing a big data analytics framework for generating a Long-term Gap-free High-resolution Air Pollutants concentration dataset (abbreviated as LGHAP) is of great significance for environmental management and Earth system science analysis. By synergistically integrating multimodal aerosol data acquired from diverse sources via a tensor-flow-based data fusion method, a gap-free aerosol optical depth (AOD) dataset with daily 1 km resolution covering China for the period 2000–2020 was generated. Specifically, data gaps in daily AOD images from MODIS aboard Terra were reconstructed from a set of AOD data tensors acquired from satellites, numerical analysis, and in situ air quality data, by combining spatial pattern recognition for high-dimensional gridded image analysis with knowledge transfer in statistical data mining. To our knowledge, this is the first long-term gap-free high-resolution AOD dataset in China, from which spatially contiguous PM2.5 and PM10 concentrations were estimated using an ensemble learning approach. Ground validation results indicate that the LGHAP AOD data are in good agreement with in situ AOD observations from AERONET, with an R of 0.91 and an RMSE of 0.21. Meanwhile, the PM2.5 and PM10 estimates also agreed well with ground measurements, with R of 0.95 and 0.94 and RMSE of 12.03 and 19.56 μg m−3, respectively. The LGHAP provides a suite of long-term gap-free gridded maps with high resolution to better examine aerosol changes in China over the past two decades, from which three distinct variation periods of haze pollution were revealed. Additionally, the proportion of the population exposed to unhealthy PM2.5 across China increased from 50.60 % in 2000 to 63.81 % in 2014 and then dropped drastically to 34.03 % in 2020.
Overall, the generated LGHAP aerosol dataset has great potential to support multidisciplinary applications in Earth observation, climate change, public health, ecosystem assessment, and environmental management. The daily AOD, PM2.5, and PM10 datasets are publicly accessible at https://doi.org/10.5281/zenodo.5652257 (Bai et al., 2021a), https://doi.org/10.5281/zenodo.5652265 (Bai et al., 2021b), and https://doi.org/10.5281/zenodo.5652263 (Bai et al., 2021c), respectively. The monthly and annual mean datasets can be found at https://doi.org/10.5281/zenodo.5655797 (Bai et al., 2021d) and https://doi.org/10.5281/zenodo.5655807 (Bai et al., 2021e), respectively. Python, MATLAB, R, and IDL code is also provided to help users read and visualize these data.
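The validation statistics quoted in the abstract, the correlation coefficient R and the root-mean-square error (RMSE), can be computed for any set of collocated estimate and observation pairs. The sketch below is illustrative only: it assumes R denotes the Pearson correlation coefficient, the function names `pearson_r` and `rmse` are hypothetical, and it is not the code distributed with the dataset.

```python
import math


def pearson_r(estimates, observations):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(estimates)
    if n != len(observations) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_e = sum(estimates) / n
    mean_o = sum(observations) / n
    # Sums of cross products and squared deviations about the means.
    s_eo = sum((e - mean_e) * (o - mean_o)
               for e, o in zip(estimates, observations))
    s_ee = sum((e - mean_e) ** 2 for e in estimates)
    s_oo = sum((o - mean_o) ** 2 for o in observations)
    if s_ee == 0 or s_oo == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_eo / math.sqrt(s_ee * s_oo)


def rmse(estimates, observations):
    """Root-mean-square error, in the same units as the inputs."""
    n = len(estimates)
    if n != len(observations) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = ((e - o) ** 2 for e, o in zip(estimates, observations))
    return math.sqrt(sum(squared_errors) / n)
```

In practice, each gridded estimate would first be paired with the ground measurement at the same location and day, and pairs with missing values would be dropped before calling either function.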