GIS-based air quality modelling: spatial prediction of PM10 for Selangor State, Malaysia using machine learning algorithms

Author(s):  
Abdulwaheed Tella ◽  
Abdul-Lateef Balogun
2020 ◽  
Author(s):  
Hanna Meyer ◽  
Edzer Pebesma

Spatial mapping is an important task in environmental science to reveal spatial patterns and changes of the environment. In this context, predictive modelling using flexible machine learning algorithms has become very popular. However, given the diversity of modelled (global) maps of environmental variables, there is a growing impression that machine learning is a magic tool that can map everything. Recently, the reliability of such maps has been increasingly questioned, calling for a reliable quantification of uncertainties.

Though spatial (cross-)validation gives a general error estimate for the predictions, models are usually applied to make predictions for a much larger area or might even be transferred to an area they were not trained on. When predictions are made on heterogeneous landscapes, there will be areas whose environmental properties have not been observed in the training data and hence have not been learned by the algorithm. This is problematic because most machine learning algorithms are weak at extrapolation and can only make reliable predictions for environments with conditions the model has knowledge about. Hence, predictions for environmental conditions that differ significantly from the training data have to be considered uncertain.

To approach this problem, we suggest a measure of uncertainty that allows identifying locations where predictions should be regarded with care. The proposed uncertainty measure is based on distances to the training data in the multidimensional predictor variable space. However, distances are not equally relevant across the feature space: some variables are more important than others in the machine learning model and hence are mainly responsible for prediction patterns. Therefore, we weight the distances by the model-derived importance of the predictors.

As a case study we use a simulated area-wide response variable for Europe, bio-climatic variables as predictors, and simulated field samples. Random Forest is applied as the algorithm to predict the simulated response. The model is then used to make predictions for all of Europe. We then calculate the corresponding uncertainty and compare it to the area-wide true prediction error. The results show that the uncertainty map reflects the patterns in the true error very well and considerably outperforms ensemble-based standard deviations of predictions as an indicator of uncertainty.

The resulting map of uncertainty gives valuable insights into spatial patterns of prediction uncertainty, which is important when the predictions are used as a baseline for decision making or subsequent environmental modelling. Hence, we suggest that a map of distance-based uncertainty should be given in addition to prediction maps.
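
The importance-weighted distance idea in the abstract above can be illustrated with a short sketch. This is not the authors' implementation. The abstract only states that distances to the training data in predictor space are weighted by variable importance, so the z-score scaling, the Euclidean metric, the use of the nearest training point as the score, and the function name `weighted_distance_uncertainty` are all assumptions made here for illustration. The toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def weighted_distance_uncertainty(train, new, importance):
    """Distance from each new point to its nearest training point in
    standardized, importance-weighted predictor space.

    Assumptions (not from the abstract): z-score scaling, Euclidean
    distance, and nearest-neighbour distance as the uncertainty score.
    """
    n_vars = len(importance)
    # Column-wise mean and standard deviation from the training data only.
    cols = [[row[j] for row in train] for j in range(n_vars)]
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns

    def transform(row):
        # Standardize, then stretch each axis by its importance weight.
        return [importance[j] * (row[j] - mu[j]) / sd[j] for j in range(n_vars)]

    train_t = [transform(r) for r in train]
    return [min(math.dist(transform(p), t) for t in train_t) for p in new]


# Hypothetical toy data: two predictors, the first far more important.
train = [[10.0, 200.0], [12.0, 220.0], [11.0, 180.0], [13.0, 210.0]]
importance = [0.9, 0.1]
inside = [11.5, 205.0]    # similar to the training conditions
outside = [25.0, 205.0]   # far outside the training range of predictor 1
scores = weighted_distance_uncertainty(train, [inside, outside], importance)
print(scores)
```

In this sketch a larger score means the location is further from anything the model has seen along the predictors it relies on most, so the second point would be flagged as an extrapolation while the first would not.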


2020 ◽  
Vol 20 (22) ◽  
pp. 13562-13570 ◽  
Author(s):  
Ennio Gambi ◽  
Giulia Temperini ◽  
Rossana Galassi ◽  
Linda Senigagliesi ◽  
Adelmo De Santis

Generally, air pollution refers to the release of various pollutants into the air that threaten human health and the planet. Air pollution is one of the most serious dangers humanity has ever faced. It causes major damage to animals, plants, and more; if it continues, humans will face serious consequences in the coming years. The major pollutants come from transport and industry. To address this problem, major sectors have to predict the air quality impact of transport and industry. Existing work has several disadvantages. One existing project estimates PM2.5 concentration using a photograph-based method. However, a photographic method alone is not sufficient, because it measures only PM2.5 and therefore misses other major pollutants and the information needed for controlling pollution. We therefore propose machine learning techniques delivered through a GUI application. Multiple datasets from different sources can be combined into a generalized dataset, and various machine learning algorithms are applied to obtain results with maximum accuracy. Comparing these algorithms identifies the one with the best accuracy. Our evaluation provides a comprehensive guide to the sensitivity analysis of model parameters with regard to overall performance in predicting air quality pollutants, measured by accuracy. In addition, we discuss and compare the performance of the machine learning algorithms on the dataset and evaluate the GUI-based air quality prediction by attributes.


Author(s):  
Zouiten Mohammed ◽  
Chaaouan Hanae ◽  
Setti Larbi

Forest fires have caused considerable losses to ecologies, societies and economies worldwide. To minimize these losses and reduce forest fires, modeling and predicting their occurrence is valuable because it can support forest fire prevention and management. In recent years, the convolutional neural network (CNN) has become an important state-of-the-art deep learning algorithm, and its implementation has enriched many fields. A competitive spatial prediction model for automatic early detection of wild forest fires using machine learning algorithms can therefore be proposed. This model can help researchers predict forest fires and identify risk zones. A system applying machine learning to geodata will be able to notify interested parties and authorities in real time by providing alerts and presenting them on maps based on geographical processing, allowing more efficient analysis of the situation. This research extends the application of machine learning algorithms for early forest fire prediction to detection and representation in geographical information system (GIS) maps.


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Qi Shao ◽  
Yongming Xu ◽  
Hanyi Wu

COVID-19 has swept through the world since December 2019 and caused a large number of cases and deaths. Spatial prediction of the spread of the epidemic is greatly important for disease control and management. In this study, we predicted the cumulative confirmed cases (CCCs) from Jan 17 to Mar 1, 2020, in mainland China at the city level, using machine learning algorithms, geographically weighted regression (GWR), and partial least squares regression (PLSR) based on population flow, geolocation, meteorological, and socioeconomic variables. The validation results showed that machine learning algorithms and GWR achieved good performance. These models could not effectively predict CCCs in Wuhan, the first city that reported COVID-19 cases in China, but performed well in other cities. Random Forest (RF) outperformed the other methods with a CV-R² of 0.84. In this model, the population flow from Wuhan to other cities (WP) was the most important feature, and the other features also made considerable contributions to the prediction accuracy. Compared with RF, GWR showed slightly worse performance (CV-R² = 0.81) but required fewer spatial independent variables. This study explored the spatial prediction of the epidemic based on multisource spatial independent variables, providing references for the estimation of CCCs in the regions lacking accurate and timely.
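
For readers unfamiliar with the CV-R² figures quoted above, the following sketch shows one common way such a score is obtained. The abstract does not define the metric or the fold scheme, so the pooled out-of-fold definition, the interleaved 5-fold split, and the one-predictor least-squares model (standing in for RF or GWR) are all assumptions made for illustration. The data are hypothetical and unrelated to the study.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def cv_r2(x, y, k=5):
    """Pooled k-fold cross-validated R^2: every observation is predicted
    by a model that never saw it, then R^2 is computed once over all
    held-out predictions."""
    n = len(x)
    preds = [0.0] * n
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # deterministic interleaved folds
        x_tr = [x[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        a, b = fit_line(x_tr, y_tr)
        for i in test_idx:
            preds[i] = a + b * x[i]
    my = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: response rises with the predictor, plus fixed noise.
x = [float(i) for i in range(20)]
noise = [0.5, -0.3, 0.2, -0.6, 0.1, 0.4, -0.2, 0.3, -0.5, 0.0] * 2
y = [2.0 + 1.5 * xi + e for xi, e in zip(x, noise)]
score = cv_r2(x, y, k=5)
print(round(score, 3))
```

Because each prediction comes from a model fitted without that observation, a score computed this way is usually lower than the R² measured on the training data, which is why it is the more conservative figure to report.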

