AQ-Bench: A Benchmark Dataset for Machine Learning on Global Air Quality Metrics

Mapping Intimacies ◽

10.5194/essd-2020-380 ◽

2021 ◽

Author(s):

Clara Betancourt ◽

Timo Stomberg ◽

Scarlet Stadtler ◽

Ribana Roscher ◽

Martin G. Schultz

Keyword(s):

Machine Learning ◽

Air Quality ◽

Tropospheric Ozone ◽

Environmental Science ◽

Quality Metrics ◽

Quality Data ◽

Easy Access ◽

Linear Regression Method ◽

Learning Methods ◽

Machine Learning Methods

Abstract. With the AQ-Bench dataset, we contribute to the recent developments towards shared data usage and machine learning methods in the field of environmental science. The dataset presented here enables researchers to relate global air quality metrics to easy-access metadata and to explore different machine learning methods for obtaining estimates of air quality based on this metadata. AQ-Bench contains a unique collection of aggregated air quality data from the years 2010–2014 and metadata at more than 5500 air quality monitoring stations all over the world, provided by the first Tropospheric Ozone Assessment Report (TOAR). It focuses in particular on metrics of tropospheric ozone, which has a detrimental effect on climate, human morbidity and mortality, as well as crop yields. We validate these data as a machine learning benchmark by providing a well-defined task together with a suitable evaluation metric. Baseline scores obtained from a linear regression method, a fully connected neural network and random forest are provided for reference. AQ-Bench offers a low-threshold entrance for all machine learners with an interest in environmental science and for atmospheric scientists who are interested in applying machine learning techniques. It enables them to start with a real-world problem relevant to humans and nature. The dataset and introductory machine learning code are available at https://doi.org/10.23728/b2share.30d42b5a87344e82855a486bf2123e9f (Betancourt et al., 2020) and https://gitlab.version.fz-juelich.de/toar/ozone-mapping . AQ-Bench thus provides a blueprint for environmental benchmark datasets as well as an example for data re-use according to the FAIR principles.

AQ-Bench: a benchmark dataset for machine learning on global air quality metrics

Earth System Science Data ◽

10.5194/essd-13-3013-2021 ◽

2021 ◽

Vol 13 (6) ◽

pp. 3013-3033

Author(s):

Clara Betancourt ◽

Timo Stomberg ◽

Ribana Roscher ◽

Martin G. Schultz ◽

Scarlet Stadtler

Keyword(s):

Machine Learning ◽

Air Quality ◽

Tropospheric Ozone ◽

Environmental Science ◽

Quality Metrics ◽

Quality Data ◽

Easy Access ◽

Linear Regression Method ◽

Learning Methods ◽

Machine Learning Methods

Abstract. With the AQ-Bench dataset, we contribute to the recent developments towards shared data usage and machine learning methods in the field of environmental science. The dataset presented here enables researchers to relate global air quality metrics to easy-access metadata and to explore different machine learning methods for obtaining estimates of air quality based on this metadata. AQ-Bench contains a unique collection of aggregated air quality data from the years 2010–2014 and metadata at more than 5500 air quality monitoring stations all over the world, provided by the first Tropospheric Ozone Assessment Report (TOAR). It focuses in particular on metrics of tropospheric ozone, which has a detrimental effect on climate, human morbidity and mortality, as well as crop yields. The purpose of this dataset is to produce estimates of various long-term ozone metrics based on time-independent local site conditions. We combine this task with a suitable evaluation metric. Baseline scores obtained from a linear regression method, a fully connected neural network and random forest are provided for reference and validation. AQ-Bench offers a low-threshold entrance for all machine learners with an interest in environmental science and for atmospheric scientists who are interested in applying machine learning techniques. It enables them to start with a real-world problem relevant to humans and nature. The dataset and introductory machine learning code are available at https://doi.org/10.23728/b2share.30d42b5a87344e82855a486bf2123e9f (Betancourt et al., 2020) and https://gitlab.version.fz-juelich.de/esde/machine-learning/aq-bench (Betancourt et al., 2021). AQ-Bench thus provides a blueprint for environmental benchmark datasets as well as an example for data re-use according to the FAIR principles.
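
The benchmark idea in the abstract above (fit a baseline model on station metadata, then score it on held-out stations) can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not name the evaluation metric, so the coefficient of determination (R²) is assumed here, and the single feature and all values are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: one metadata feature per station and one ozone metric.
# These numbers are synthetic and do not come from AQ-Bench.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.1, 2.9, 5.2, 6.8]
test_x = [4.0, 5.0]
test_y = [9.0, 11.1]

slope, intercept = fit_line(train_x, train_y)
test_pred = [slope * x + intercept for x in test_x]
print(f"R2 on held-out stations: {r2_score(test_y, test_pred):.3f}")
```

Scoring on stations that were not used for fitting is what makes a baseline comparable across methods; a real experiment would use the dataset's own features and split.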

Machine Learning Methods for Air Quality Monitoring

Proceedings of the 3rd International Conference on Networking, Information Systems & Security ◽

10.1145/3386723.3387835 ◽

2020 ◽

Author(s):

Mohamed Akram Zaytar ◽

Chaker El Amrani

Keyword(s):

Machine Learning ◽

Air Quality ◽

Quality Monitoring ◽

Air Quality Monitoring ◽

Learning Methods ◽

Machine Learning Methods

Applying Machine-Learning Methods Based on Causality Analysis to Determine Air Quality in China

Polish Journal of Environmental Studies ◽

10.15244/pjoes/99639 ◽

2019 ◽

Vol 28 (5) ◽

pp. 3877-3885

Author(s):

Bocheng Wang

Keyword(s):

Machine Learning ◽

Air Quality ◽

Causality Analysis ◽

Learning Methods ◽

Machine Learning Methods

Monitoring the Variation of Vegetation Water Content with Machine Learning Methods: Point–Surface Fusion of MODIS Products and GNSS-IR Observations

Remote Sensing ◽

10.3390/rs11121440 ◽

2019 ◽

Vol 11 (12) ◽

pp. 1440 ◽

Cited By ~ 1

Author(s):

Qiangqiang Yuan ◽

Shuwen Li ◽

Linwei Yue ◽

Tongwen Li ◽

Huanfeng Shen ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Water Content ◽

Linear Regression Method ◽

Learning Methods ◽

Machine Learning Methods ◽

Vegetation Water Content ◽

Drought Prediction ◽

Vegetation Water ◽

Surface Fusion

Vegetation water content (VWC) is recognized as an important parameter in vegetation growth studies, natural disasters such as forest fires, and drought prediction. Recently, the Global Navigation Satellite System Interferometric Reflectometry (GNSS-IR) has emerged as an important technique for monitoring vegetation information. The normalized microwave reflection index (NMRI) was developed to reflect the change of VWC based on this fact. However, NMRI uses local site-based data, and the sparse distribution hinders the application of NMRI. In this study, we obtained a 500 m spatially continuous NMRI product by integrating GNSS-IR site data with other VWC-related products using the point–surface fusion technique. The auxiliary data in the fusion process include the normalized difference vegetation index (NDVI), gross primary productivity (GPP), and precipitation. Meanwhile, the fusion performance of three machine learning methods, i.e., the back-propagation neural network (BPNN), generalized regression neural network (GRNN), and random forest (RF), is compared and analyzed. The machine learning methods achieve satisfactory results, with cross-validation R values of 0.71–0.83 and RMSEs of 0.025–0.037. The results show a clear improvement over the traditional multiple linear regression method, which achieves R (RMSE) values of only about 0.4 (0.045). This indicates that the machine learning methods can better learn the complex nonlinear relationship between NMRI and the input VWC-related index. Among the machine learning methods, the RF model obtained the best results. Long time-series NMRI images with a 500 m spatial resolution in the western part of the continental U.S. were then obtained. The results show that the spatial distribution of the NMRI product is consistent with a drought situation from 2012 to 2014 in the U.S., which verifies the feasibility of analyzing and predicting drought times and distribution ranges by using the 500 m fusion product.
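
The two scores quoted in the abstract above, the correlation coefficient R and the root-mean-square error (RMSE), can be computed as in the following minimal sketch. It assumes R denotes the Pearson correlation between observed and predicted values, which the abstract does not state explicitly. The sample values are invented for illustration and are not taken from the paper.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical observed and predicted index values (synthetic).
observed = [0.10, 0.15, 0.20, 0.25]
predicted = [0.12, 0.14, 0.21, 0.23]
print(f"R = {pearson_r(observed, predicted):.2f}, RMSE = {rmse(observed, predicted):.3f}")
```

In a cross-validation setting, both functions would be applied to the predictions for each held-out fold and the results then summarized across folds.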

Evaluating hourly air quality forecasting in Canada with nonlinear updatable machine learning methods

Air Quality Atmosphere & Health ◽

10.1007/s11869-016-0414-3 ◽

2016 ◽

Vol 10 (2) ◽

pp. 195-211 ◽

Cited By ~ 22

Author(s):

Huiping Peng ◽

Aranildo R. Lima ◽

Andrew Teakles ◽

Jian Jin ◽

Alex J. Cannon ◽

...

Keyword(s):

Machine Learning ◽

Air Quality ◽

Learning Methods ◽

Machine Learning Methods ◽

Air Quality Forecasting

Comparison of Different Machine Learning Methods to Forecast Air Quality Index

Lecture Notes in Electrical Engineering - Frontier Computing ◽

10.1007/978-981-13-3648-5_27 ◽

2019 ◽

pp. 235-245

Author(s):

Bo Liu ◽

Chao Shi ◽

Jianqiang Li ◽

Yong Li ◽

Jianlei Lang ◽

...

Keyword(s):

Machine Learning ◽

Air Quality ◽

Quality Index ◽

Air Quality Index ◽

Learning Methods ◽

Machine Learning Methods

Air Quality Prediction Using Machine Learning Methods: A Case Study of Bjelave Neighborhood, Sarajevo, BiH

Advanced Technologies, Systems, and Applications V - Lecture Notes in Networks and Systems ◽

10.1007/978-3-030-54765-3_29 ◽

2020 ◽

pp. 423-434

Author(s):

Emina Džaferović ◽

Kanita Karađuzović-Hadžiabdić

Keyword(s):

Machine Learning ◽

Air Quality ◽

Quality Prediction ◽

Learning Methods ◽

Machine Learning Methods ◽

Air Quality Prediction

Identify the contribution of elevated industrial plume to ground air quality by optical and machine learning methods

Environmental Research Communications ◽

10.1088/2515-7620/ab7634 ◽

2020 ◽

Vol 2 (2) ◽

pp. 021005

Author(s):

Limin Feng ◽

Ting Yang ◽

Dawei Wang ◽

Zifa Wang ◽

Yuepeng Pan ◽

...

Keyword(s):

Machine Learning ◽

Air Quality ◽

Learning Methods ◽

Machine Learning Methods

A concept of the air quality monitoring system in the city of Lublin with machine learning methods to detect data outliers

MATEC Web of Conferences ◽

10.1051/matecconf/201925203009 ◽

2019 ◽

Vol 252 ◽

pp. 03009 ◽

Cited By ~ 3

Author(s):

Tomasz Cieplak ◽

Tomasz Rymarczyk ◽

Robert Tomaszewski

Keyword(s):

Machine Learning ◽

Air Quality ◽

Monitoring System ◽

Low Cost ◽

Quality Monitoring ◽

Spatiotemporal Variability ◽

Air Quality Monitoring ◽

Learning Methods ◽

Machine Learning Methods ◽

The City

This paper presents a concept for the design of an air quality monitoring system and describes a selection of data quality analysis methods. A high level of industrialisation increases the risk of natural disasters related to environmental pollution, such as air pollution by gases and clouds of dust (carbon monoxide, sulphur oxides, nitrogen oxides). That is why research related to monitoring this type of phenomenon is extremely important. Low-cost air quality sensors are increasingly used to monitor air parameters in urban areas. These sensors are used to obtain an image of the spatiotemporal variability in the concentration of air pollutants. Despite their low price, which is important for economic accessibility, low-cost sensors are more prone to producing erroneous results than professional air quality monitors. The described study focuses on the analysis of outliers, which are particularly interesting for further analysis, as well as on modelling with machine learning methods for air quality assessment in the city of Lublin.

Global fine resolution mapping of ozone metrics through explainable machine learning

10.5194/egusphere-egu21-7596 ◽

2021 ◽

Author(s):

Clara Betancourt ◽

Scarlet Stadtler ◽

Timo Stomberg ◽

Ann-Kathrin Edrich ◽

Ankit Patnala ◽

...

Keyword(s):

Machine Learning ◽

Environmental Factors ◽

Tropospheric Ozone ◽

High Performance ◽

Learning Task ◽

Error Modeling ◽

Data Driven ◽

Model Parameters ◽

Learning Methods ◽

Machine Learning Methods

Through the availability of multi-year ground-based ozone observations on a global scale, substantial geospatial metadata, and high-performance computing capacities, it is now possible to use machine learning for a global data-driven ozone assessment. In this presentation, we will show a novel, completely data-driven approach to map tropospheric ozone globally. Our goal is to interpolate ozone metrics and aggregated statistics from the database of the Tropospheric Ozone Assessment Report (TOAR) onto a global 0.1° × 0.1° resolution grid. It is challenging to interpolate ozone, a toxic greenhouse gas, because its formation depends on many interconnected environmental factors on small scales. We conduct the interpolation with various machine learning methods trained on aggregated hourly ozone data from five years at more than 5500 locations worldwide. We use several geospatial datasets as training inputs to provide proxy input for environmental factors controlling ozone formation, such as precursor emissions and climate. The resulting maps contain different ozone metrics, i.e. statistical aggregations which are widely used to assess air pollution impacts on health, vegetation, and climate. The key aspects of this contribution are twofold: First, we apply explainable machine learning methods to the data-driven ozone assessment. Second, we discuss dominant uncertainties relevant to the ozone mapping and quantify their impact whenever possible. Our methods include a thorough a-priori uncertainty estimation of the various data and methods, assessment of scientific consistency, finding critical model parameters, using ensemble methods, and performing error modeling. Our work aims to increase the reliability and integrity of the derived ozone maps through the provision of scientific robustness to a data-centric machine learning task. This study hence represents a blueprint for how to formulate an environmental machine learning task scientifically, gather the necessary data, and develop a data-driven workflow that focuses on optimizing transparency and applicability of its product to maximize its scientific knowledge return.
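
To give a sense of the target grid mentioned in the abstract above, the following sketch maps a latitude/longitude pair to a cell of a global 0.1° grid, which has 1800 rows and 3600 columns. The row/column convention (rows counted from 90°S, columns from 180°W) and the function name `cell_index` are assumptions made for illustration; the abstract does not describe the grid layout actually used.

```python
import math

CELLS_PER_DEGREE = 10            # 0.1 degree resolution
N_ROWS = 180 * CELLS_PER_DEGREE  # 1800 latitude rows
N_COLS = 360 * CELLS_PER_DEGREE  # 3600 longitude columns


def cell_index(lat, lon):
    """Return (row, col) of the 0.1 degree cell containing (lat, lon).

    Assumed convention: row 0 starts at 90S, column 0 starts at 180W.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    row = math.floor((lat + 90.0) * CELLS_PER_DEGREE)
    row = min(row, N_ROWS - 1)   # the north pole falls into the last row
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE) % N_COLS  # wrap at 180E
    return row, col


# Hypothetical station coordinates, for illustration only.
print(cell_index(50.92, 6.36))
print(f"total cells: {N_ROWS * N_COLS}")
```

A grid at this resolution has roughly 6.5 million cells, compared with the few thousand station locations used for training, which is why the mapping relies on gridded geospatial inputs.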
