Using Machine Learning Algorithms to Estimate Soil Organic Carbon Variability with Environmental Variables and Soil Nutrient Indicators in an Alluvial Soil

Land ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 487
Author(s):  
Kingsley JOHN ◽  
Isong Abraham Isong ◽  
Ndiye Michael Kebonye ◽  
Esther Okon Ayito ◽  
Prince Chapman Agyeman ◽  
...  

Soil organic carbon (SOC) is an important indicator of soil quality and directly determines soil fertility. Understanding its spatial distribution and controlling factors is therefore necessary for efficient and sustainable soil nutrient management. In this study, machine learning algorithms including artificial neural network (ANN), support vector machine (SVM), cubist regression, random forests (RF), and multiple linear regression (MLR) were applied to improve the prediction of SOC. A total of 60 soil samples were collected in the study area at a depth of 30 cm and measured for SOC content using the Walkley–Black method. Of these samples, 80% were used for model training, and 21 auxiliary variables were included as predictors. The predictors included effective cation exchange capacity (ECEC), base saturation (BS), calcium to magnesium ratio (Ca_Mg), potassium to magnesium ratio (K_Mg), potassium to calcium ratio (K_Ca), elevation, plan curvature, total catchment area, channel network base level, topographic wetness index, clay index, iron index, normalized difference built-up index (NDBI), ratio vegetation index (RVI), soil adjusted vegetation index (SAVI), normalized difference vegetation index (NDVI), normalized difference moisture index (NDMI) and land surface temperature (LST). Mean absolute error (MAE), root-mean-square error (RMSE) and the coefficient of determination (R²) were used to evaluate model performance. The results showed a mean SOC of 1.62% with a coefficient of variation (CV) of 47%. The best-performing model was RF (R² = 0.68), followed by cubist (R² = 0.51), SVM (R² = 0.36), ANN (R² = 0.36) and MLR (R² = 0.17). The soil nutrient indicators, topographic wetness index and total catchment area were identified as important predictors for the spatial prediction of SOC in flat, homogeneous topography.
Future studies should include additional auxiliary predictors (e.g., soil physical and chemical properties and lithological data) and cover a broader range of soil types to improve model performance.
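The evaluation statistics named above (MAE, RMSE, R² and the coefficient of variation) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, R² is computed as 1 − SS_res/SS_tot (some studies instead report the squared Pearson correlation), the CV uses the sample standard deviation, and the SOC values are invented.

```python
import math
from statistics import mean, stdev

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root-mean-square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2(obs, pred):
    """Coefficient of determination as 1 - SS_res / SS_tot (one common definition)."""
    m = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical observed vs. predicted SOC (%) for four validation samples.
obs = [1.2, 1.8, 0.9, 2.4]
pred = [1.0, 1.9, 1.1, 2.0]
print(f"MAE={mae(obs, pred):.3f}  RMSE={rmse(obs, pred):.3f}  "
      f"R2={r2(obs, pred):.3f}  CV={cv_percent(obs):.1f}%")
```

MAE and RMSE are in the units of the target (here % SOC); RMSE weights large errors more heavily, so a gap between the two hints at a few poorly predicted samples.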

2021 ◽  
Author(s):  
Ali Sakhaee ◽  
Anika Gebauer ◽  
Mareike Ließ ◽  
Axel Don

Abstract. Soil organic carbon (SOC), as the largest terrestrial carbon pool, has the potential to influence climate change and its mitigation; consequently, SOC monitoring is important within the frameworks of several international treaties. There is therefore a need for high-resolution SOC maps. Machine learning (ML) offers new opportunities for producing such maps because it can mine large datasets. The aim of this study was therefore to test three algorithms commonly used in digital soil mapping on the first German Agricultural Soil Inventory to model agricultural topsoil SOC content: random forest (RF), boosted regression trees (BRT) and support vector machine for regression (SVR). Nested cross-validation was used for model evaluation and parameter tuning. Moreover, grid search and a differential evolution algorithm were applied to ensure that each algorithm was suitably tuned and optimised. The SOC content of the German Agricultural Soil Inventory was highly variable, ranging from 4 g kg−1 to 480 g kg−1. However, only 4 % of all soils contained more than 87 g kg−1 SOC and were considered organic or degraded organic soils. The results show that SVR provided the best performance, with an RMSE of 32 g kg−1, when the algorithms were trained on the full dataset. The average RMSE of all algorithms, however, decreased by 34 % when mineral and organic soils were modelled separately, with the best result again from SVR (RMSE of 21 g kg−1). Model performance is often limited by the size and quality of the soil dataset available for calibration and validation. Therefore, the effect of enlarging the training data was tested by including 1223 data points from the European Land Use/Land Cover Area Frame Survey for agricultural sites in Germany. Model performance improved by at most 1 % for mineral soils and 2 % for organic soils.
Despite the capability of machine learning algorithms in general, and SVR in particular, for modelling SOC on a national scale, the study showed that the most important factor in improving model performance was modelling mineral and organic soils separately.
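Nested cross-validation keeps hyperparameter tuning inside each outer training split, so the outer-fold error is not optimistically biased by the tuning itself. The standard-library sketch below illustrates the idea; it is not the study's pipeline. As assumptions: a one-dimensional k-nearest-neighbour regressor stands in for RF, BRT and SVR, only grid search is shown (no differential evolution), and all function names and the synthetic data are hypothetical.

```python
import math
import random

# Hypothetical stand-in model: 1-D k-nearest-neighbour regression (not RF/BRT/SVR).
def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def kfold_indices(n, k, seed):
    """Shuffle indices 0..n-1 reproducibly and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fold_rmse(x, y, fold, k_neighbours):
    """Train on everything outside `fold`, return RMSE on `fold`."""
    held_out = set(fold)
    tr = [i for i in range(len(x)) if i not in held_out]
    tr_x, tr_y = [x[i] for i in tr], [y[i] for i in tr]
    preds = [knn_predict(tr_x, tr_y, x[j], k_neighbours) for j in fold]
    return rmse([y[j] for j in fold], preds)

def inner_cv_rmse(x, y, k_neighbours, n_folds, seed):
    """Mean k-fold RMSE for one hyperparameter value (inner loop)."""
    folds = kfold_indices(len(x), n_folds, seed)
    return sum(fold_rmse(x, y, f, k_neighbours) for f in folds) / n_folds

def nested_cv(x, y, grid, outer_folds=5, inner_folds=3, seed=0):
    """Return (chosen k, outer-fold RMSE) for each outer fold."""
    results = []
    for fold in kfold_indices(len(x), outer_folds, seed):
        held_out = set(fold)
        tr = [i for i in range(len(x)) if i not in held_out]
        x_tr, y_tr = [x[i] for i in tr], [y[i] for i in tr]
        # Grid search on the outer-training data only; the outer fold stays unseen.
        best_k = min(grid, key=lambda k: inner_cv_rmse(x_tr, y_tr, k, inner_folds, seed + 1))
        results.append((best_k, fold_rmse(x, y, fold, best_k)))
    return results

rng = random.Random(42)
x = [i / 10 for i in range(40)]
y = [2 + math.sin(v) + rng.gauss(0, 0.1) for v in x]  # synthetic, noisy target
for best_k, err in nested_cv(x, y, grid=[1, 3, 5, 7]):
    print(f"chosen k={best_k}, outer RMSE={err:.3f}")
```

The mean of the outer-fold RMSEs is the performance estimate; the chosen hyperparameter may differ between outer folds, which is itself a useful signal of tuning stability.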


2019 ◽  
Vol 11 (13) ◽  
pp. 3569 ◽  
Author(s):  
Li Qi ◽  
Shuai Wang ◽  
Qianlai Zhuang ◽  
Zijiao Yang ◽  
Shubin Bai ◽  
...  

Quantification of soil organic carbon (SOC) and pH, and of their spatial variation at regional scales, is a foundation for adequately assessing agriculture, pollution control, environmental health and ecosystem functioning, and thus for establishing better land-use and land-management practices. In this study, we used the random forest (RF) model to map the distribution of SOC and pH in the topsoil (0–20 cm) and to estimate SOC and pH changes from 1982 to 2012 in Liaoning Province, Northeast China. A total of 10 covariates (elevation, slope gradient, topographic wetness index (TWI), mean annual temperature (MAT), mean annual precipitation (MAP), visible-red band 3 (B3), near-infrared band 4 (B4), short-wave infrared band 5 (B5), normalized difference vegetation index (NDVI), and land-use data) and sets of 806 (1982) and 973 (2012) soil samples were selected. Cross-validation was used to test the performance and uncertainty of the RF model. We found that the prediction R² for SOC and pH was 0.69 and 0.54 for 1982, and 0.63 and 0.48 for 2012, respectively. Elevation, NDVI, and land use were the main environmental variables affecting the spatial variability of SOC in both periods. By comparison, TWI and mean annual precipitation were the two most critical environmental variables affecting the spatial variation of pH. Over the 30-year period, mean SOC decreased from 18.6 to 16.9 g kg−1 and mean pH from 6.9 to 6.6. The SOC distribution generated with the RF model showed a decreasing trend from east to west across the province in both periods, whereas pH showed the opposite trend. This study provides important information on spatial variations in SOC and pH that agencies and communities in this region can use to evaluate soil quality and to make decisions on remediation and on the prevention of soil acidification and salinization.
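The band-ratio indices used as covariates in these studies (NDVI here; RVI, SAVI, NDMI and NDBI in the first abstract) have standard per-pixel definitions. The sketch below follows the band names given above (B3 = red, B4 = near-infrared, B5 = short-wave infrared, as in Landsat TM/ETM+). It assumes surface-reflectance inputs with non-zero denominators and the common soil-brightness factor L = 0.5 for SAVI; the function names and reflectance values are hypothetical.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def rvi(red, nir):
    """Ratio vegetation index: NIR / Red."""
    return nir / red

def savi(red, nir, soil_factor=0.5):
    """Soil adjusted vegetation index: (1 + L)(NIR - Red) / (NIR + Red + L), L = soil_factor."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)

def ndmi(nir, swir):
    """Normalized difference moisture index: (NIR - SWIR) / (NIR + SWIR)."""
    return (nir - swir) / (nir + swir)

def ndbi(nir, swir):
    """Normalized difference built-up index: (SWIR - NIR) / (SWIR + NIR)."""
    return (swir - nir) / (swir + nir)

# Hypothetical reflectances for one pixel: B3 (red), B4 (NIR), B5 (SWIR).
b3, b4, b5 = 0.08, 0.32, 0.20
print(f"NDVI={ndvi(b3, b4):.3f} RVI={rvi(b3, b4):.2f} SAVI={savi(b3, b4):.3f}")
print(f"NDMI={ndmi(b4, b5):.3f} NDBI={ndbi(b4, b5):.3f}")
```

In practice the same functions apply element-wise to raster arrays. When NDMI and NDBI are built from the same NIR and SWIR bands, NDBI is simply the negative of NDMI, which matters if both enter a model as separate predictors.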


2020 ◽  
Vol 12 (14) ◽  
pp. 2234 ◽  
Author(s):  
Mostafa Emadi ◽  
Ruhollah Taghizadeh-Mehrjardi ◽  
Ali Cherati ◽  
Majid Danesh ◽  
Amir Mosavi ◽  
...  

Estimating soil organic carbon (SOC) content is essential for understanding the chemical, physical, and biological functions of soil. This study applies the machine learning algorithms support vector machines (SVM), artificial neural networks (ANN), regression tree, random forest (RF), extreme gradient boosting (XGBoost), and a conventional deep neural network (DNN) to improve SOC prediction models. The models were trained with 1879 composite surface soil samples and 105 auxiliary variables as predictors. A genetic algorithm was used for feature selection to identify effective variables. The results indicate that precipitation is the most important predictor, accounting for 14.9% of SOC spatial variability, followed by the normalized difference vegetation index (12.5%), the daytime temperature index of the moderate resolution imaging spectroradiometer (10.6%), multiresolution valley bottom flatness (8.7%) and land use (8.2%). Based on 10-fold cross-validation, the DNN model was the best-performing algorithm, with the lowest prediction error and uncertainty. In terms of accuracy, DNN yielded a mean absolute error of 0.59%, a root mean squared error of 0.75%, a coefficient of determination of 0.65, and a Lin's concordance correlation coefficient of 0.83. SOC content was highest in the udic soil moisture regime class (mean 3.71%), followed by the aquic (2.45%) and xeric (2.10%) classes. Soils in dense forestlands had the highest SOC contents, whereas soils of younger geological age and on alluvial fans had lower SOC. The proposed DNN (hidden layers = 7, size = 50) is a promising algorithm for handling large numbers of auxiliary data at the province scale; owing to its flexible structure and its ability to extract more information from the auxiliary data surrounding the sampled observations, it produced a highly accurate SOC baseline map with minimal uncertainty.
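Lin's concordance correlation coefficient, reported above alongside MAE and RMSE, measures agreement with the 1:1 line rather than mere linear association: ρc = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²). A minimal sketch using 1/n (population) moments, as in Lin's original definition (some implementations use n − 1); the function name and example values are hypothetical.

```python
def lins_ccc(obs, pred):
    """Lin's concordance correlation coefficient with 1/n (population) moments."""
    n = len(obs)
    mx = sum(obs) / n
    my = sum(pred) / n
    sxx = sum((o - mx) ** 2 for o in obs) / n
    syy = sum((p - my) ** 2 for p in pred) / n
    sxy = sum((o - mx) * (p - my) for o, p in zip(obs, pred)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

# Hypothetical predictions that are perfectly correlated with the observations
# but shifted by +1: Pearson r would be 1, yet CCC penalizes the bias.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]
print(round(lins_ccc(obs, pred), 3))  # 0.714
```

This is why CCC is often reported next to R²: a model can explain variance well while being systematically biased, and only the concordance term (x̄ − ȳ)² exposes that.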


2014 ◽  
Vol 9 (No. 2) ◽  
pp. 47-57 ◽  
Author(s):  
T. Zádorová ◽  
D. Žížala ◽  
V. Penížek ◽  
Š. Čejková

Colluvial soils, which result from accelerated soil erosion, form a significant part of the soil cover pattern in agricultural landscapes. Their specific terrain position makes it possible to map them using geostatistics and digital terrain modelling. The relationship between colluvial soil extent and terrain and soil variables was studied at a morphologically diverse site in a Luvisol region of Central Bohemia. A further goal was to assess the specificity of the colluviation process with regard to the profile characteristics of Luvisols. The main methods were a detailed field survey, statistical analyses, and detailed processing of a digital elevation model. Statistical analysis showed a strong relationship between the occurrence of colluvial soil, various topographic derivatives, and soil organic carbon content. A multiple range test showed that four topographic derivatives significantly distinguish colluvial soil from other soil units and can therefore be used for colluvial soil delineation. The topographic wetness index (TWI) was found to be the most appropriate terrain predictor. Soil organic carbon content was significantly correlated with five topographic derivatives, most strongly with TWI and plan curvature. Redistribution of soil material at the study site is intensive but less pronounced than in loess regions covered by Chernozems. Soil mass transport is limited mainly to the A horizon; the argic horizon is truncated only on the steepest parts of the slope.
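The topographic wetness index identified here as the best terrain predictor is conventionally defined as TWI = ln(a / tan β), where a is the specific catchment area (upslope contributing area per unit contour width) and β is the local slope angle. The sketch below evaluates that formula per cell. It assumes a and β have already been derived from a DEM with some flow-routing algorithm (not shown), and the minimum-slope clamp for flat cells is a hypothetical choice, not a value from the study.

```python
import math

def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index ln(a / tan(beta)) for one DEM cell.

    specific_catchment_area: upslope contributing area per unit contour width (m).
    slope_deg: local slope angle in degrees.
    min_slope_deg: hypothetical lower bound so flat cells do not divide by zero.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))

# Hypothetical cells: a convergent footslope vs. a steep upper slope.
footslope = twi(specific_catchment_area=800.0, slope_deg=2.0)
upper_slope = twi(specific_catchment_area=20.0, slope_deg=12.0)
print(round(footslope, 2), round(upper_slope, 2))
```

High TWI values mark gently sloping, convergent positions that collect water and sediment, which is consistent with the association between TWI, colluvial soils and elevated SOC described above.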


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5714 ◽  
Author(s):  
Jianli Ding ◽  
Aixia Yang ◽  
Jingzhe Wang ◽  
Vasit Sagan ◽  
Danlin Yu

Soil organic carbon (SOC) is an important soil property with a profound impact on soil quality and plant growth. Using 140 soil samples collected from the Ebinur Lake Wetland National Nature Reserve, Xinjiang Uyghur Autonomous Region, China, this research evaluated the feasibility of using visible/near-infrared (VIS/NIR) spectroscopy data (350–2,500 nm) and simulated EO-1 Hyperion data to estimate SOC in arid wetland regions. Three machine learning algorithms, Ant Colony Optimization-interval Partial Least Squares (ACO-iPLS), Recursive Feature Elimination-Support Vector Machine (RFE-SVM), and Random Forest (RF), were used to select spectral features and estimate SOC. The results indicated that the feature wavelengths related to SOC lay mainly within 745–910 nm and 1,911–2,254 nm. The combination of RFE-SVM and first-derivative pre-processing produced the highest estimation accuracy, with optimal values of Rt (correlation coefficient of the testing set), RMSEt and RPD of 0.91, 0.27% and 2.41, respectively. For the simulated EO-1 Hyperion data, the Support Vector Machine (SVM)-based recursive feature elimination algorithm produced the most accurate SOC estimate, with an Rt of 0.79, an RMSEt of 0.19%, and an RPD of 1.61 for the testing set. This approach provides an efficient, low-cost and potentially highly accurate means of estimating SOC content, and hence supports better management and protection strategies for desert wetland ecosystems.
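First-derivative pre-processing and the RPD statistic reported above can be sketched as follows. The derivative here is a plain finite difference (central in the interior, one-sided at the ends); the study may have used a different scheme, for example one with smoothing, so treat this as an assumption. RPD is computed as the standard deviation of the observed values divided by RMSE, using the sample standard deviation; the function names and toy data are hypothetical.

```python
import math
from statistics import stdev

def first_derivative(reflectance, wavelengths):
    """Finite-difference dR/dλ: central in the interior, one-sided at the ends (needs >= 2 bands)."""
    n = len(reflectance)
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        deriv.append((reflectance[hi] - reflectance[lo]) / (wavelengths[hi] - wavelengths[lo]))
    return deriv

def rpd(obs, pred):
    """Ratio of performance to deviation: SD of observed values / RMSE."""
    err = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return stdev(obs) / err

# Toy spectrum with a constant slope: its derivative should be flat.
wl = list(range(350, 361))
spectrum = [0.1 + 0.0002 * w for w in wl]
print(first_derivative(spectrum, wl)[:3])
# Hypothetical observed vs. predicted SOC (%) for a small testing set.
print(round(rpd([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5]), 2))
```

Differentiating removes additive baseline offsets (for example from illumination or particle-size effects), which is one common reason derivative spectra improve SOC models; it also amplifies noise, so real spectra are usually smoothed first.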

