Performance of three machine learning algorithms for predicting soil organic carbon in German agricultural soil

Abstract. Soil organic carbon (SOC), as the largest terrestrial carbon pool, has the potential to influence climate change and its mitigation; consequently, SOC monitoring is important in the frameworks of various international treaties. There is therefore a need for high-resolution SOC maps. Machine learning (ML) offers new opportunities to produce such maps because of its capacity to mine large datasets. The aim of this study was therefore to test three algorithms commonly used in digital soil mapping, namely random forest (RF), boosted regression trees (BRT) and support vector machine for regression (SVR), on the first German Agricultural Soil Inventory to model the SOC content of agricultural topsoils. Nested cross-validation was implemented for model evaluation and parameter tuning. Moreover, grid search and the differential evolution algorithm were applied to ensure that each algorithm was suitably tuned and optimised. The SOC content in the German Agricultural Soil Inventory was highly variable, ranging from 4 g kg−1 to 480 g kg−1. However, only 4 % of all soils contained more than 87 g kg−1 SOC and were considered organic or degraded organic soils. The results show that SVR provided the best performance, with an RMSE of 32 g kg−1, when the algorithms were trained on the full dataset. However, the average RMSE of all algorithms decreased by 34 % when mineral and organic soils were modelled separately, with the best result again from SVR (RMSE of 21 g kg−1). Model performance is often limited by the size and quality of the soil dataset available for calibration and validation. Therefore, the impact of enlarging the training data was tested by including 1223 data points from the European Land Use/Land Cover Area Frame Survey for agricultural sites in Germany. Model performance improved by at most 1 % for mineral soils and 2 % for organic soils.
Despite the capability of machine learning algorithms in general, and SVR in particular, to model SOC on a national scale, the study showed that the most important factor for improving model performance was the separate modelling of mineral and organic soils.
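The nested cross-validation described in this abstract can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' pipeline. It rests on several assumptions. A hypothetical k-nearest-neighbour regressor on a single covariate stands in for RF, BRT and SVR. The fold counts (5 outer, 3 inner) are arbitrary. Only grid search is shown, not differential evolution. All function names are illustrative. The point is the structure: hyperparameters are chosen in inner folds that never see the outer test fold, so the outer RMSE is not inflated by the tuning.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def knn_predict(train_x, train_y, x, k):
    """Hypothetical stand-in model: k-nearest-neighbour regression on one covariate."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def k_folds(n, n_folds, seed):
    """Shuffle indices 0..n-1 and deal them into n_folds roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def fold_split(xs, ys, fold):
    """Return (train_x, train_y) with the indices in fold held out."""
    held_out = set(fold)
    keep = [i for i in range(len(xs)) if i not in held_out]
    return [xs[i] for i in keep], [ys[i] for i in keep]


def cv_rmse(xs, ys, k, n_folds, seed):
    """Inner loop: mean k-fold RMSE for one candidate hyperparameter value."""
    scores = []
    for fold in k_folds(len(xs), n_folds, seed):
        tr_x, tr_y = fold_split(xs, ys, fold)
        preds = [knn_predict(tr_x, tr_y, xs[i], k) for i in fold]
        scores.append(rmse([ys[i] for i in fold], preds))
    return sum(scores) / len(scores)


def nested_cv(xs, ys, grid, outer=5, inner=3, seed=0):
    """Outer loop estimates error; grid search runs only on each outer training set."""
    outer_scores, chosen = [], []
    for fold in k_folds(len(xs), outer, seed):
        tr_x, tr_y = fold_split(xs, ys, fold)
        # Tune on outer-training data only, so the outer test fold stays unseen.
        best_k = min(grid, key=lambda k: cv_rmse(tr_x, tr_y, k, inner, seed + 1))
        preds = [knn_predict(tr_x, tr_y, xs[i], best_k) for i in fold]
        outer_scores.append(rmse([ys[i] for i in fold], preds))
        chosen.append(best_k)
    return sum(outer_scores) / len(outer_scores), chosen


# Usage on synthetic data (not the inventory data).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2.0 * x + rng.gauss(0, 1.0) for x in xs]
score, picks = nested_cv(xs, ys, grid=[1, 3, 5, 9])
print(f"outer-fold mean RMSE: {score:.2f}; chosen k per fold: {picks}")
```

Because the outer folds can select different hyperparameters, the reported score estimates the performance of the whole tuning procedure rather than of one fixed model.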

Supplementary material to "Performance of three machine learning algorithms for predicting soil organic carbon in German agricultural soil"

DOI: 10.5194/soil-2021-107-supplement (2021)

Author(s): Ali Sakhaee, Anika Gebauer, Mareike Ließ, Axel Don

Keyword(s): Machine Learning; Organic Carbon; Soil Organic Carbon; Agricultural Soil; Learning Algorithms; Machine Learning Algorithms; Supplementary Material

Predicting and Mapping of Soil Organic Carbon Using Machine Learning Algorithms in Northern Iran

Remote Sensing, Vol 12 (14), pp. 2234 (2020). DOI: 10.3390/rs12142234. Cited by ~6

Author(s): Mostafa Emadi, Ruhollah Taghizadeh-Mehrjardi, Ali Cherati, Majid Danesh, Amir Mosavi, ...

Keyword(s): Machine Learning; Organic Carbon; Soil Organic Carbon; Learning Algorithms; Machine Learning Algorithms; Gradient Boosting; Support Vector; Composite Surface; Auxiliary Data; Extreme Gradient Boosting

Estimation of soil organic carbon (SOC) content is of utmost importance for understanding the chemical, physical, and biological functions of soil. This study applies the machine learning algorithms support vector machines (SVM), artificial neural networks (ANN), regression tree, random forest (RF), extreme gradient boosting (XGBoost), and a conventional deep neural network (DNN) to advance prediction models of SOC. The models were trained with 1879 composite surface soil samples and 105 auxiliary variables as predictors. A genetic algorithm was used for feature selection to identify effective variables. The results indicate that precipitation is the most important predictor, driving 14.9% of SOC spatial variability, followed by the normalized difference vegetation index (12.5%), the daytime temperature index of the Moderate Resolution Imaging Spectroradiometer (10.6%), multiresolution valley bottom flatness (8.7%) and land use (8.2%). Based on 10-fold cross-validation, the DNN model was the superior algorithm, with the lowest prediction error and uncertainty. In terms of accuracy, DNN yielded a mean absolute error of 0.59%, a root mean squared error of 0.75%, a coefficient of determination of 0.65, and Lin's concordance correlation coefficient of 0.83. SOC content was highest in the udic soil moisture regime class, with a mean value of 3.71%, followed by the aquic (2.45%) and xeric (2.10%) classes. Soils in dense forestlands had the highest SOC contents, whereas soils of younger geological age and alluvial fans had lower SOC. The proposed DNN (hidden layers = 7, size = 50) is a promising algorithm for handling large numbers of auxiliary variables at the province scale. Owing to its flexible structure and its ability to extract more information from the auxiliary data surrounding the sampled observations, it achieved high accuracy in predicting the SOC baseline map with minimal uncertainty.
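The four accuracy statistics reported for the DNN can be computed as below. This is a plain-Python sketch, not the authors' code. It assumes R² is the coefficient of determination 1 − SS_res/SS_tot; some studies instead report the squared Pearson correlation. It also uses population (1/n) moments in Lin's concordance correlation coefficient (CCC). The toy values are hypothetical. They show why CCC is reported: predictions offset by a constant bias are perfectly correlated with the observations, yet CCC falls below 1.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def lin_ccc(obs, pred):
    """Lin's concordance correlation coefficient using population (1/n) moments."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    var_o = sum((o - mo) ** 2 for o in obs) / n
    var_p = sum((p - mp) ** 2 for p in pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    return 2.0 * cov / (var_o + var_p + (mo - mp) ** 2)


# Hypothetical SOC values in percent; predictions biased by +0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(mae(observed, predicted))                 # 0.5
print(rmse(observed, predicted))                # 0.5
print(r2(observed, predicted))                  # 0.8
print(round(lin_ccc(observed, predicted), 3))   # 0.909
```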

The application of machine learning algorithms in predicting soil organic carbon/matter

DOI: 10.31274/etd-20200624-203 (2020)

Author(s): Yones Khaledian

Keyword(s): Machine Learning; Organic Carbon; Soil Organic Carbon; Learning Algorithms; Machine Learning Algorithms; Carbon Matter

High-resolution digital mapping of soil organic carbon and soil total nitrogen using DEM derivatives, Sentinel-1 and Sentinel-2 data based on machine learning algorithms

The Science of The Total Environment, Vol 729, pp. 138244 (2020). DOI: 10.1016/j.scitotenv.2020.138244. Cited by ~2

Author(s): Tao Zhou, Yajun Geng, Jie Chen, Jianjun Pan, Dagmar Haase, ...

Keyword(s): Machine Learning; High Resolution; Organic Carbon; Soil Organic Carbon; Total Nitrogen; Learning Algorithms; Machine Learning Algorithms; Soil Total Nitrogen; Digital Mapping; Sentinel 2

The large scale digital mapping of soil organic carbon using machine learning algorithms

Dokuchaev Soil Bulletin, Vol 91, pp. 46-62 (2018). DOI: 10.19047/0136-1694-2018-91-46-62. Cited by ~1

Author(s): A. V. Chinilin, I. Yu. Savin

Keyword(s): Machine Learning; Organic Carbon; Soil Organic Carbon; Large Scale; Learning Algorithms; Machine Learning Algorithms; Digital Mapping

Prediction of Dansgaard-Oeschger events using machine learning

DOI: 10.5194/egusphere-egu21-9699 (2021)

Author(s): Nuno Moniz, Susana Barbosa

Keyword(s): Machine Learning; Time Series; Prediction Models; Learning Algorithms; Ice Core; Model Performance; Predictive Performance; Oxygen Isotopic Composition; Machine Learning Algorithms; Support Vector

The Dansgaard-Oeschger (DO) events are among the most striking examples of abrupt climate change in Earth's history, representing temperature oscillations of about 8 to 16 degrees Celsius within a few decades. DO events have been studied extensively in paleoclimatic records, particularly in ice core proxies such as the Greenland NGRIP record of oxygen isotopic composition.

This work addresses the anticipation of DO events using machine learning algorithms. We consider the NGRIP time series from 20 to 60 kyr b2k on the GICC05 timescale at 20-year temporal resolution. Forecasting horizons range from 0 (nowcasting) to 400 years. We adopt three machine learning algorithms (random forests, support vector machines, and logistic regression) trained on 5 kyr windows. We validate on the subsequent 5 kyr test windows, using the timestamps of the DO events classified in Greenland by Rasmussen et al. (2014). We perform experiments with both sliding and growing windows.

Results show that predictions with sliding windows are better overall, indicating that modelling is affected by non-stationary characteristics of the time series. The predictive performance of the three algorithms is similar, with slightly better performance of random forest models for shorter forecast horizons. The predictive capability of the models decreases as the forecasting horizon lengthens but remains reasonable up to 120 years. The loss of performance is mostly related to imprecision in determining the start and end times of events and to misidentifying some periods as DO events.
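The difference between the sliding and growing training windows described above can be sketched as index bookkeeping. This Python sketch makes assumptions that the abstract does not state. Samples are ordered oldest first. Each new training window advances by one full window length (5 kyr, i.e. 250 samples at 20-year resolution). A forecast horizon of h years becomes a label shift of h/20 samples. The helper names are hypothetical.

```python
STEP_YEARS = 20  # temporal resolution of the series


def window_splits(n_samples, window, mode="sliding"):
    """Yield (train_start, train_end, test_start, test_end) index bounds, ends exclusive.

    sliding: fixed-length training window that moves forward with the test window.
    growing: training window anchored at index 0 that expands over time.
    Assumes (hypothetically) that the window advances by its own length each time.
    """
    if mode not in ("sliding", "growing"):
        raise ValueError("mode must be 'sliding' or 'growing'")
    test_start = window
    while test_start + window <= n_samples:
        train_start = test_start - window if mode == "sliding" else 0
        yield train_start, test_start, test_start, test_start + window
        test_start += window


def horizon_labels(event_flags, horizon_years):
    """Label for time step t is the event flag at t + horizon (0 years = nowcasting).

    Pair features at step t with the returned list's element t; the last
    `shift` feature rows have no label and are dropped.
    """
    shift = horizon_years // STEP_YEARS
    return event_flags[shift:]


n = (60_000 - 20_000) // STEP_YEARS  # 2000 samples between 60 and 20 kyr b2k
window = 5_000 // STEP_YEARS         # 250 samples per 5 kyr window
for mode in ("sliding", "growing"):
    splits = list(window_splits(n, window, mode))
    print(mode, len(splits), splits[0], splits[-1])
```

Both modes produce the same test windows. Only the training history differs, which is what makes the comparison in the abstract a test of non-stationarity.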

A comparative study between a new method and other machine learning algorithms for soil organic carbon and total nitrogen prediction using near infrared spectroscopy

Chemometrics and Intelligent Laboratory Systems, Vol 195, pp. 103873 (2019). DOI: 10.1016/j.chemolab.2019.103873. Cited by ~7

Author(s): Rabie Reda, Taoufiq Saffaj, Bouzida Ilham, Ouadi Saidi, Kadmiri Issam, ...

Keyword(s): Machine Learning; Infrared Spectroscopy; Organic Carbon; Soil Organic Carbon; Comparative Study; Total Nitrogen; Near Infrared Spectroscopy; Near Infrared; Learning Algorithms; Machine Learning Algorithms

Comparative Analysis of Two Machine Learning Algorithms in Predicting Site-Level Net Ecosystem Exchange in Major Biomes

Remote Sensing, Vol 13 (12), pp. 2242 (2021). DOI: 10.3390/rs13122242

Author(s): Jianzhao Liu, Yunjiang Zuo, Nannan Wang, Fenghui Yuan, Xinhao Zhu, ...

Keyword(s): Machine Learning; Learning Algorithms; Substantial Reduction; Model Performance; Extreme Heat; Machine Learning Algorithms; Gradient Boosting; Ecological Data; Extreme Gradient Boosting; The Impact

The net ecosystem CO2 exchange (NEE) is a critical parameter for quantifying terrestrial ecosystems and their contributions to ongoing climate change. The accumulation of ecological data calls for more advanced quantitative approaches to assist NEE prediction. In this study, we applied two widely used machine learning algorithms, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), to build models simulating NEE in major biomes based on the FLUXNET dataset. Both models accurately predicted NEE in all biomes, while XGBoost had higher computational efficiency (6 to 62 times faster than RF). Among environmental variables, net solar radiation, soil water content and soil temperature were the most important, whereas precipitation and wind speed were less important for simulating temporal variations of site-level NEE in both models. Both models performed consistently well under extreme climate conditions. Extreme heat and dryness led to much worse model performance in grassland (extreme heat: R² = 0.66–0.71, normal: R² = 0.78–0.81; extreme dryness: R² = 0.14–0.30, normal: R² = 0.54–0.55), but had less impact on forest (extreme heat: R² = 0.50–0.78, normal: R² = 0.59–0.87; extreme dryness: R² = 0.86–0.90, normal: R² = 0.81–0.85). Extreme wet conditions did not change model performance in forest ecosystems (R² changed by −0.03 to 0.03 compared with normal) but led to a substantial reduction in model performance in cropland (R² decreased by 0.20 to 0.27 compared with normal). Extreme cold conditions did not change model performance much in forest and woody savannas (R² decreased by 0.01 to 0.08 and by 0.09 compared with normal, respectively). Our study showed that both models need more than 2.5 years of daily training samples to reach good performance and more than 5.4 years of daily samples to reach optimal performance.
In summary, both RF and XGBoost are applicable machine learning algorithms for predicting ecosystem NEE, and XGBoost is more feasible than RF in terms of accuracy and efficiency.
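The extreme-versus-normal comparison above amounts to scoring the same predictions on two subsets of days. The sketch below makes two assumptions. First, it uses a hypothetical definition of "extreme" as days whose driver variable (e.g. temperature) exceeds the 90th percentile of the record; the abstract does not state the study's thresholds. Second, it takes R² as 1 − SS_res/SS_tot. The RF or XGBoost predictions themselves are taken as given, and the toy series are invented for illustration.

```python
def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def r2_by_condition(obs, pred, driver, q=90):
    """Score predictions separately on 'extreme' days (driver above its q-th
    percentile, a hypothetical threshold) and on the remaining 'normal' days."""
    threshold = percentile(driver, q)
    groups = {"extreme": [], "normal": []}
    for i, value in enumerate(driver):
        groups["extreme" if value > threshold else "normal"].append(i)
    return {
        name: r2([obs[i] for i in idx], [pred[i] for i in idx])
        for name, idx in groups.items()
    }


# Hypothetical daily series: the model is biased only on the two hottest days.
temps = [float(t) for t in range(20)]
nee_obs = [0.1 * t for t in range(20)]
nee_pred = [0.1 * t + (0.5 if t >= 18 else 0.0) for t in range(20)]
print(r2_by_condition(nee_obs, nee_pred, temps))
```

Each subset needs at least two non-identical observations, otherwise SS_tot is zero and R² is undefined.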

Data based predictive models for odor perception

Scientific Reports, Vol 10 (1) (2020). DOI: 10.1038/s41598-020-73978-1

Author(s): Rinu Chacko, Deepak Jain, Manasi Patwardhan, Abhishek Puri, Shirish Karande, ...

Keyword(s): Machine Learning; Data Analytics; Model Performance; Structural Features; Machine Learning Algorithms; Gradient Boosting; Support Vector; Structure Property; Odor Perception; The Impact

Abstract. Machine learning and data analytics are increasingly used for quantitative structure-property relationship (QSPR) applications in the chemical domain, where the traditional Edisonian approach to knowledge discovery has not been fruitful. The perception of odorant stimuli is one such application, as olfaction is the least understood of all the senses. In this study, we employ machine learning algorithms and data analytics to assess the efficacy of a data-driven approach for predicting the perceptual attributes of an odorant, namely the odorant characters (OC) "sweet" and "musky". We first analyze a psychophysical dataset containing the perceptual ratings of 55 subjects to reveal patterns in their ratings. We then use the data to train several machine learning algorithms, such as random forest, gradient boosting and support vector machine, to predict the odor characters, and we report the structural features that correlate well with the odor characters based on the optimal model. Furthermore, we analyze the impact of data quality on model performance by comparing the semantic descriptors generally associated with a given odorant to its perception by the majority of subjects. The study presents a methodology for developing models of odor perception and provides insights into the perception of odorants by untrained human subjects and the effect of the inherent bias in the perception data on model performance. The models and methodology developed here could be used to predict the odor characters of new odorants.
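The data-quality analysis above compares a descriptor-based label with what most subjects actually perceived. That comparison can be sketched as a majority vote. Everything here is hypothetical: the 0 to 100 rating scale, the threshold of 50 for "perceived", the odorant identifiers and the ratings are illustrative, and the study's own procedure may differ.

```python
def perceived_by_majority(ratings, threshold=50):
    """True if more than half of the subjects rated the odor character at or
    above the (hypothetical) threshold on an assumed 0-100 scale."""
    votes = sum(1 for r in ratings if r >= threshold)
    return votes > len(ratings) / 2


def descriptor_agreement(descriptor_labels, subject_ratings, threshold=50):
    """Fraction of odorants whose semantic-descriptor label (True = 'sweet')
    matches the majority perception of the subjects."""
    matches = sum(
        1
        for odorant, label in descriptor_labels.items()
        if perceived_by_majority(subject_ratings[odorant], threshold) == label
    )
    return matches / len(descriptor_labels)


# Hypothetical "sweet" ratings from five subjects for three odorants.
ratings = {
    "odorant_A": [80, 70, 65, 20, 90],
    "odorant_B": [10, 30, 60, 5, 15],
    "odorant_C": [40, 45, 55, 30, 20],
}
descriptors = {"odorant_A": True, "odorant_B": False, "odorant_C": True}
print(descriptor_agreement(descriptors, ratings))  # 2 of 3 odorants agree
```

A low agreement flags odorants whose descriptor-based labels would be noisy training targets.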

Using Machine Learning Algorithms to Estimate Soil Organic Carbon Variability with Environmental Variables and Soil Nutrient Indicators in an Alluvial Soil

Land, Vol 9 (12), pp. 487 (2020). DOI: 10.3390/land9120487

Author(s): Kingsley JOHN, Isong Abraham Isong, Ndiye Michael Kebonye, Esther Okon Ayito, Prince Chapman Agyeman, ...

Keyword(s): Machine Learning; Organic Carbon; Soil Organic Carbon; Catchment Area; Vegetation Index; Model Performance; Soil Nutrient; Machine Learning Algorithms; Topographic Wetness Index; Wetness Index

Soil organic carbon (SOC) is an important indicator of soil quality and directly determines soil fertility. Hence, understanding its spatial distribution and controlling factors is necessary for efficient and sustainable soil nutrient management. In this study, machine learning algorithms including artificial neural network (ANN), support vector machine (SVM), cubist regression, random forests (RF) and multiple linear regression (MLR) were chosen to advance the prediction of SOC. A total of sixty (n = 60) soil samples were collected within the research area at 30 cm soil depth, and their SOC content was measured using the Walkley–Black method. Of these samples, 80% were used for model training, and 21 auxiliary variables were included as predictors. The predictors included effective cation exchange capacity (ECEC), base saturation (BS), calcium to magnesium ratio (Ca_Mg), potassium to magnesium ratio (K_Mg), potassium to calcium ratio (K_Ca), elevation, plan curvature, total catchment area, channel network base level, topographic wetness index, clay index, iron index, normalized difference built-up index (NDBI), ratio vegetation index (RVI), soil adjusted vegetation index (SAVI), normalized difference vegetation index (NDVI), normalized difference moisture index (NDMI) and land surface temperature (LST). Mean absolute error (MAE), root-mean-square error (RMSE) and R² were used to assess model performance. The results showed a mean SOC of 1.62% with a coefficient of variation (CV) of 47%. The best-performing model was RF (R² = 0.68), followed by cubist (R² = 0.51), SVM (R² = 0.36), ANN (R² = 0.36) and MLR (R² = 0.17). The soil nutrient indicators, topographic wetness index and total catchment area were identified as indicators for the spatial prediction of SOC in flat, homogeneous topography.
Future studies should include other auxiliary predictors (e.g., soil physical and chemical properties, and lithological data) and cover a broader range of soil types to improve model performance.
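Several of the remote-sensing predictors listed above are simple band ratios. The sketch below applies the standard definitions to surface reflectance values. It assumes a soil-brightness factor L = 0.5 for SAVI, a common default that the abstract does not state. It does not assume any particular sensor, because the abstract does not name one. The pixel values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil adjusted vegetation index; soil_factor (L) = 0.5 is an assumed default."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def rvi(nir, red):
    """Ratio vegetation index."""
    return nir / red


def ndmi(nir, swir1):
    """Normalized difference moisture index (NIR vs shortwave infrared 1)."""
    return (nir - swir1) / (nir + swir1)


def ndbi(swir1, nir):
    """Normalized difference built-up index; the sign-flipped counterpart of NDMI."""
    return (swir1 - nir) / (swir1 + nir)


# Hypothetical reflectances for one pixel.
pixel = {"red": 0.08, "nir": 0.35, "swir1": 0.20}
print(f"NDVI={ndvi(pixel['nir'], pixel['red']):.3f}")
print(f"SAVI={savi(pixel['nir'], pixel['red']):.3f}")
print(f"RVI={rvi(pixel['nir'], pixel['red']):.3f}")
print(f"NDMI={ndmi(pixel['nir'], pixel['swir1']):.3f}")
print(f"NDBI={ndbi(pixel['swir1'], pixel['nir']):.3f}")
```

Because NDBI and NDMI use the same two bands with opposite signs, including both as predictors adds no independent information for tree-based models such as RF.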
