Enhancing accuracy and interpretability of machine learning models using super learning and permutation feature importance techniques in digital soil mapping
<p>The digital soil mapping (DSM) approach predicts soil characteristics from the relationship between soil observations and related covariates using machine learning (ML) models. In this research, we applied a wide range of ML models (12 base learners) to predict and map soil characteristics. To enhance accuracy and interpretability, we combined the base learner predictions using a super learning strategy. However, a major problem with super learning and other complex models is that the share of individual covariates in the overall result cannot be explicitly quantified. To overcome this restriction and make the super learning models interpretable, we employed model-agnostic interpretation tools, such as permutation feature importance. In particular, we integrated the weight assigned to each ML base learner by super learning with the covariate ranking of each base learner obtained by permutation feature importance to explore the contribution of the covariates to the final prediction. We tested our super learning and permutation feature importance techniques by predicting and mapping physicochemical soil characteristics of Urmia Playa Lake (UPL) sediments in Iran. As expected, our results indicated that super learning significantly improved on the accuracy of the single base learners in predicting soil characteristics. In terms of root mean square error, super learning improved on the performance of linear regression by an average of 45.7%. Furthermore, permutation feature importance allowed us to interpret our results better and to demonstrate the significant contribution of geomorphological features and groundwater data in predicting soil characteristics of UPL sediments.</p>
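<p>The integration of super learner weights with per-learner permutation feature importance can be illustrated with a minimal sketch. The abstract does not give the exact aggregation rule, so the weighted sum of normalised importances used here is an assumption, as are the two toy base learners, the fixed weights (which a real super learner would estimate by cross-validation), and all function names.</p>

```python
import random

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each covariate column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between covariate j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def combined_importance(weights, per_learner_importances):
    """Assumed aggregation: weight each learner's normalised importances
    by its super learner weight and sum over learners."""
    combined = [0.0] * len(per_learner_importances[0])
    for w, imps in zip(weights, per_learner_importances):
        clipped = [max(v, 0.0) for v in imps]  # treat negative importances as zero
        total = sum(clipped) or 1.0            # avoid division by zero
        for j, v in enumerate(clipped):
            combined[j] += w * v / total
    return combined

# Synthetic data: the target depends on covariates 0 and 1, never on covariate 2.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(50)]
y = [3.0 * row[0] + 0.5 * row[1] for row in X]

# Two hypothetical, already "fitted" base learners.
def learner_a(row):
    return 3.0 * row[0]                  # uses covariate 0 only

def learner_b(row):
    return 3.0 * row[0] + 0.5 * row[1]   # uses covariates 0 and 1

# Hypothetical super learner weights (non-negative, summing to 1).
weights = [0.3, 0.7]

per_learner = [permutation_importance(f, X, y) for f in (learner_a, learner_b)]
overall = combined_importance(weights, per_learner)

for j, value in enumerate(overall):
    print(f"covariate {j}: combined importance = {value:.3f}")
```

<p>Because each learner's importances are normalised before weighting, the combined scores sum to the total of the weights, so a covariate's score can be read as its approximate share of the ensemble prediction under this assumed rule.</p>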