Supplementary material to "Evaluation of digital soil mapping approaches with large sets of environmental covariates"

Author(s):  
Madlene Nussbaum ◽  
Kay Spiess ◽  
Andri Baltensweiler ◽  
Urs Grob ◽  
Armin Keller ◽  
...  
2017 ◽  

Abstract. Spatial assessment of soil functions requires maps of basic soil properties. Unfortunately, these are either missing for many regions or are not available at the desired spatial resolution or down to the required soil depth. Field-based generation of large soil data sets and of conventional soil maps remains costly. Meanwhile, soil legacy data and comprehensive sets of spatial environmental data are available for many regions. Digital soil mapping (DSM) approaches, which relate soil data (responses) to environmental data (covariates), face the challenge of building statistical models from large sets of covariates originating, for example, from airborne imaging spectroscopy or multi-scale terrain analysis. We evaluated six approaches for DSM in three study regions in Switzerland (Berne, Greifensee, ZH forest) by mapping effective soil depth available to plants (SD), pH, soil organic matter (SOM), effective cation exchange capacity (ECEC), clay, silt, gravel content and bulk density for four soil layers (totalling 48 responses). Models were built from 300–500 environmental covariates by selecting linear models through (1) grouped lasso and (2) an ad hoc stepwise procedure for robust external-drift kriging (georob). For (3) geoadditive models we selected penalized smoothing spline terms by componentwise gradient boosting (geoGAM). We further used two tree-based methods: (4) boosted regression trees (BRT) and (5) Random Forest (RF). Lastly, we computed (6) weighted model averages (MA) from the predictions obtained from methods 1–5. Lasso, georob and geoGAM successfully selected strongly reduced sets of covariates (subsets of 3–6 % of all covariates). Automatically selecting a sparse trend model for georob was, however, difficult: the ad hoc procedure applied was computationally inefficient and over-fitted the data.
Differences in predictive performance, tested on independent validation data, were mostly small and did not reveal a single best method for the 48 responses. Nevertheless, RF was often the best among methods 1–5 (28 of 48 responses), but was outcompeted by MA for 14 of these 28 responses. RF tended to over-fit the data. The performance of BRT was slightly worse than that of RF. GeoGAM performed poorly on some responses and was best for only 7 of 48 responses. The predictive precision of lasso was intermediate. All models generally had small bias. Only the computationally very efficient lasso had a slightly larger bias, likely because it tended to under-fit the data. In summary, although differences were small, the frequencies of best and worst performance clearly favoured RF if a single method is applied and MA if multiple prediction models can be developed.
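To make method (6) more concrete, a minimal Python sketch of a weighted model average follows. The weighting rule is an assumption for illustration only: each method's weight is taken as proportional to the inverse of a validation mean squared error, a common choice that the abstract does not confirm. The function names, method errors and predictions are hypothetical.

```python
def inverse_mse_weights(mse_by_method):
    """Weights proportional to 1/MSE, normalised to sum to 1.

    Assumption for illustration: the abstract does not state how the
    MA weights were computed.
    """
    inverse = {m: 1.0 / mse for m, mse in mse_by_method.items()}
    total = sum(inverse.values())
    return {m: w / total for m, w in inverse.items()}


def model_average(predictions, weights):
    """Weighted average of the per-method predictions at each site."""
    n_sites = len(next(iter(predictions.values())))
    return [
        sum(weights[m] * predictions[m][i] for m in predictions)
        for i in range(n_sites)
    ]


# Hypothetical validation MSEs and predictions at two sites
weights = inverse_mse_weights({"RF": 1.0, "lasso": 2.0, "BRT": 2.0})
ma = model_average(
    {"RF": [4.0, 6.0], "lasso": [8.0, 6.0], "BRT": [0.0, 2.0]}, weights
)
```

Here RF receives weight 0.5 and lasso and BRT 0.25 each, so the averaged predictions are 4.0 and 5.0. Methods with lower error pull the average towards their predictions while the other methods still contribute, which is one plausible reason an average can outperform each single method.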


Geoderma ◽  
2017 ◽  
Vol 303 ◽  
pp. 118-132 ◽  
Author(s):  
Xiao-Lin Sun ◽  
Hui-Li Wang ◽  
Yu-Guo Zhao ◽  
Chaosheng Zhang ◽  
Gan-Lin Zhang

SOIL ◽  
2018 ◽  
Vol 4 (1) ◽  
pp. 1-22 ◽  
Author(s):  
Madlene Nussbaum ◽  
Kay Spiess ◽  
Andri Baltensweiler ◽  
Urs Grob ◽  
Armin Keller ◽  
...  

Abstract. The spatial assessment of soil functions requires maps of basic soil properties. Unfortunately, these are either missing for many regions or are not available at the desired spatial resolution or down to the required soil depth. The field-based generation of large soil datasets and conventional soil maps remains costly. Meanwhile, legacy soil data and comprehensive sets of spatial environmental data are available for many regions. Digital soil mapping (DSM) approaches relating soil data (responses) to environmental data (covariates) face the challenge of building statistical models from large sets of covariates originating, for example, from airborne imaging spectroscopy or multi-scale terrain analysis. We evaluated six approaches for DSM in three study regions in Switzerland (Berne, Greifensee, ZH forest) by mapping the effective soil depth available to plants (SD), pH, soil organic matter (SOM), effective cation exchange capacity (ECEC), clay, silt, gravel content and fine fraction bulk density for four soil depths (totalling 48 responses). Models were built from 300–500 environmental covariates by selecting linear models through (1) grouped lasso and (2) an ad hoc stepwise procedure for robust external-drift kriging (georob). For (3) geoadditive models we selected penalized smoothing spline terms by component-wise gradient boosting (geoGAM). We further used two tree-based methods: (4) boosted regression trees (BRTs) and (5) random forest (RF). Lastly, we computed (6) weighted model averages (MAs) from the predictions obtained from methods 1–5. Lasso, georob and geoGAM successfully selected strongly reduced sets of covariates (subsets of 3–6 % of all covariates). Differences in predictive performance, tested on independent validation data, were mostly small and did not reveal a single best method for 48 responses. Nevertheless, RF was often the best among methods 1–5 (28 of 48 responses), but was outcompeted by MA for 14 of these 28 responses.
RF tended to over-fit the data. The performance of BRT was slightly worse than that of RF. GeoGAM performed poorly on some responses and was best for only 7 of 48 responses. The prediction accuracy of lasso was intermediate. All models generally had small bias. Only the computationally very efficient lasso had slightly larger bias because it tended to under-fit the data. In summary, although differences were small, the frequencies of the best and worst performance clearly favoured RF if a single method is applied and MA if multiple prediction models can be developed.
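The comparison on independent validation data, and the tally of how often each method is best, can be sketched in Python as below. This is a simplified illustration under stated assumptions: bias is defined here as the mean of prediction minus observation, methods are ranked by RMSE alone (the study may have used further criteria), and the response names and values are invented.

```python
import math


def bias(observed, predicted):
    """Mean error (predicted minus observed); 0 indicates no bias."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error of the predictions."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def count_best(validation):
    """Count how often each method has the lowest RMSE.

    validation maps each response to (observed, {method: predicted}).
    """
    counts = {}
    for observed, by_method in validation.values():
        best = min(by_method, key=lambda m: rmse(observed, by_method[m]))
        counts[best] = counts.get(best, 0) + 1
    return counts


# Hypothetical validation data for two responses and two methods
validation = {
    "pH_0-10cm": ([5.0, 6.0, 7.0], {"RF": [5.0, 6.0, 8.0], "lasso": [6.0, 7.0, 8.0]}),
    "SOM_0-10cm": ([2.0, 2.0], {"RF": [3.0, 3.0], "lasso": [2.5, 1.5]}),
}
counts = count_best(validation)
```

In this toy case each method wins one response. The lasso predictions for the second response have zero bias but a non-zero RMSE, which shows why small bias alone does not imply good predictive precision.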


2021 ◽  
Author(s):  
Ruhollah Taghizadeh-Mehrjardi ◽  
Razieh Sheikhpour ◽  
Norair Toomanian ◽  
Thomas Scholten

The most critical issue in applying digital soil mapping is its limited transferability. Modelling soil properties for regions where no or only sparse soil information is available is highly uncertain when low-cost geospatial environmental covariates are used alone. To overcome this drawback, transfer learning has been introduced in various environmental sciences, including soil science. The general idea behind extrapolating soil information with transfer learning is that the target area is similar to the reference area, e.g. in terms of soil-forming factors, so that the same machine learning rules can be applied. So far, supervised machine learning has been used to transfer soil information from reference to target areas whose environmental characteristics are very similar. Hence, it is unclear how machine learning performs in target regions with different environmental characteristics. Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data (reference area) with a large amount of unlabeled data (target area) during training. In this study, we explored whether semi-supervised learning could improve the transferability of digital soil mapping relative to supervised learning methods. Soil data and associated environmental covariates were obtained for two arid regions. Semi-supervised and supervised learning models were trained on the data from the reference area and then tested on the data from the target area. The results indicated that semi-supervised learning transferred soil information from one area to another better than the supervised learning method.
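The abstract does not say which semi-supervised algorithm was used. As one common example, here is a minimal Python sketch of self-training with a nearest-centroid classifier, assuming categorical soil information (e.g. soil classes) and numeric covariates. Unlabeled target-area samples whose class assignment is least ambiguous are pseudo-labeled first, and the class centroids are re-estimated after each addition, so they can drift towards the target area. All function names and data are hypothetical.

```python
import math


def centroids(points, labels):
    """Mean covariate vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        total = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            total[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in total] for y, total in sums.items()}


def predict(cents, x):
    """Assign x to the class with the nearest centroid."""
    return min(cents, key=lambda y: math.dist(x, cents[y]))


def self_train(x_ref, y_ref, x_target, per_round=1):
    """Self-training: pseudo-label the most confident target samples first.

    Confidence is the margin between the distances to the nearest and the
    second-nearest centroid (requires at least two classes).
    """
    xs, ys = list(x_ref), list(y_ref)
    pool = list(x_target)
    while pool:
        cents = centroids(xs, ys)
        scored = []
        for x in pool:
            ranked = sorted((math.dist(x, c), y) for y, c in cents.items())
            margin = ranked[1][0] - ranked[0][0]
            scored.append((margin, x, ranked[0][1]))
        scored.sort(key=lambda s: s[0], reverse=True)
        for _, x, y in scored[:per_round]:
            xs.append(x)
            ys.append(y)
            pool.remove(x)
    return centroids(xs, ys)


# Hypothetical 2-D covariates: labeled reference area, unlabeled target area
x_ref, y_ref = [(0.0, 0.0), (4.0, 0.0)], ["A", "B"]
x_target = [(1.0, 0.0), (1.5, 0.0), (2.0, 0.0), (5.0, 0.0)]
supervised = centroids(x_ref, y_ref)
semi = self_train(x_ref, y_ref, x_target)
```

With the reference centroids alone, the point (2.5, 0.0) is assigned to class B. After self-training on the target samples, it falls in class A. Whether such a shift is an improvement depends on the true target labels; the sketch only shows the mechanism by which unlabeled target data can influence the model.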


2021 ◽  
Author(s):  
Giulio Genova ◽  
Luis de Sousa ◽  
Tanja Mimmo ◽  
Luigi Borruso ◽  
Laura Poggio

High-quality global soil maps are crucial for facing challenges such as reducing soil erosion, adapting to and mitigating climate change, ensuring food and water security, and planning biodiversity conservation. To obtain accurate and robust maps of soil properties, research and development are needed to identify the most appropriate prediction models and to develop efficient and robust workflows. A few recent studies have used Artificial Neural Networks (ANN) in Digital Soil Mapping, in some cases improving the accuracy of the predicted maps compared with other methods such as Random Forest (RF). In this study, we tested different ANN architectures on a global topsoil dataset of ca. 110 000 samples and compared the results with the more traditional RF approach. The target variables are pH, Soil Organic Carbon, Sand, Silt, and Clay. We selected 40 environmental covariates from a pool of over 400 to represent the most important soil-forming factors. We tried simpler architectures (single input – single target) using point observations for one target variable with the corresponding raster cell values of spatially explicit environmental covariates. We also used more complex architectures (multi input – multi target) that incorporate contextual information surrounding an observation (convolutional) and multiple target variables. Preliminary results show that increasing the number of hidden layers in the neural network does not significantly influence the results, while changing the type of architecture can play a bigger role in the overall accuracy of the model. The overall prediction accuracy of the ANN was comparable with that of the RF model. We conclude that ANNs are a promising, relatively new approach for Global Digital Soil Mapping and that further research is needed to improve their performance.
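The difference between the simpler single-cell inputs and the contextual inputs used by the convolutional architectures can be illustrated by how covariate values are extracted at an observation. The Python sketch below rests on assumptions not stated in the abstract: rasters are nested lists, the window half-width `half` is a hypothetical parameter, and cells beyond the raster edge are filled by clamping to the nearest edge cell. The study's actual preprocessing is not described.

```python
def cell_input(rasters, row, col):
    """Simpler input: one value per covariate at the observation's cell."""
    return [raster[row][col] for raster in rasters]


def patch(raster, row, col, half=1):
    """(2*half+1) x (2*half+1) window centred on (row, col).

    Indices outside the raster are clamped to the nearest edge cell
    (an assumption; other padding schemes are possible).
    """
    n_rows, n_cols = len(raster), len(raster[0])
    window = []
    for dr in range(-half, half + 1):
        r = min(max(row + dr, 0), n_rows - 1)
        window.append(
            [raster[r][min(max(col + dc, 0), n_cols - 1)] for dc in range(-half, half + 1)]
        )
    return window


def contextual_input(rasters, row, col, half=1):
    """Contextual (convolutional) input: one window per covariate, channels first."""
    return [patch(raster, row, col, half) for raster in rasters]


# Hypothetical 3 x 3 covariate rasters (e.g. elevation and NDVI)
elevation = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ndvi = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
```

For an observation in cell (2, 0), `cell_input([elevation, ndvi], 2, 0)` yields `[7, 0.7]`. `contextual_input` instead returns a 3 x 3 window per covariate, so the network also sees the neighbourhood of the sampling location.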


2014 ◽  
Vol 38 (3) ◽  
pp. 706-717 ◽  
Author(s):  
Waldir de Carvalho Junior ◽  
Cesar da Silva Chagas ◽  
Philippe Lagacherie ◽  
Braz Calderano Filho ◽  
Silvio Barge Bhering

Soil properties have an enormous impact on the economic and environmental aspects of agricultural production. Quantitative relationships between soil properties and the factors that influence their variability are the basis of digital soil mapping. The predictive models of soil properties evaluated in this work are statistical (multiple linear regression, MLR) and geostatistical (ordinary kriging and co-kriging). The study was conducted in the municipality of Bom Jardim, RJ, using a soil database with 208 sampling points. Predictive models were evaluated for the sand, silt and clay fractions, pH in water and organic carbon at six depths, following the specifications of the GlobalSoilMap consortium for digital soil mapping at the global level. Continuous covariates and categorical predictors were used, and their contributions to the model were assessed. Only the environmental covariates elevation, aspect, stream power index (SPI), soil wetness index (SWI), normalized difference vegetation index (NDVI) and the b3/b2 band ratio were significantly correlated with soil properties. The predictive models had a mean coefficient of determination of 0.21. The best results were obtained with the geostatistical models; the highest coefficient of determination (0.43) was obtained for the sand fraction at 60 to 100 cm depth. Using a sparse data set of soil properties for digital mapping can explain only part of the spatial variation of these properties. The results may be related to the sampling density and to the quantity and quality of the environmental covariates and predictive models used.
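The coefficient of determination reported above is one minus the ratio of the residual sum of squares to the total sum of squares. The Python sketch below fits a least-squares line with a single covariate and reports R². This is a deliberate simplification of MLR to one predictor, and the covariate and response values are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x with one covariate."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical covariate values and a soil property (e.g. sand content)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
r2 = r_squared(y, fitted)
```

This toy fit gives an intercept of 2.2, a slope of 0.6 and an R² of about 0.6. The `r_squared` function only compares observed and predicted values, so it applies equally to predictions from kriging or any other model.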

