Digital mapping of topsoil pH by random forest with residual kriging (RFRK) in a hilly region

Soil Research ◽  
2019 ◽  
Vol 57 (4) ◽  
pp. 387 ◽  
Author(s):  
Lei Wang ◽  
Wei Wu ◽  
Hong-Bin Liu

Soil pH is a vital attribute of soil fertility. The accurate and efficient prediction of soil pH can provide the necessary basic information for agricultural development. In the present study, random forest with residual kriging (RFRK) was used to predict soil pH based on stratum, climate, vegetation and topography in a hilly region. The performance of RFRK was compared with those of the classification and regression tree (CART) and the random forest (RF). Comparative results showed that RFRK provided the best performance. The corresponding values of Lin’s concordance correlation coefficient, coefficient of determination, mean absolute error and root mean square error were as follows: 0.70, 0.51, 0.44 and 0.61 for CART; 0.80, 0.70, 0.34 and 0.48 for RF; and 0.88, 0.80, 0.25 and 0.39 for RFRK. Stratum and average annual temperature were the most important factors affecting the soil pH in the study area. Results indicate that RFRK is a feasible and reliable tool for predicting soil pH in hilly regions.
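The RFRK idea in this abstract can be sketched as follows. A random forest models the trend from stratum, climate, vegetation and topography, and kriging interpolates the spatially correlated residuals. This is a minimal, stdlib-only sketch under several assumptions. The random forest is not trained here; its prediction at the target site is passed in as a number. The residuals are assumed to be observed pH minus the RF prediction at the sample sites. The exponential semivariogram and its `nugget`, `psill` and `rng` parameters are hypothetical choices, not the variogram fitted in the study, and all function names are illustrative.

```python
import math

def exp_semivariogram(h, nugget=0.0, psill=1.0, rng=1.0):
    """Hypothetical exponential semivariogram; gamma(0) is defined as 0."""
    if h == 0.0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-h / rng))

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def krige_residual(coords, residuals, target, **vario):
    """Ordinary-kriging estimate of the residual at `target`."""
    n = len(coords)
    # Semivariances between samples, bordered by the unbiasedness constraint.
    a = [[exp_semivariogram(math.dist(coords[i], coords[j]), **vario) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [exp_semivariogram(math.dist(c, target), **vario) for c in coords] + [1.0]
    weights = solve_linear(a, b)[:n]  # drop the Lagrange multiplier
    return sum(w * r for w, r in zip(weights, residuals))

def rfrk_predict(trend_at_target, coords, residuals, target, **vario):
    """RFRK-style prediction: RF trend plus kriged RF residual."""
    return trend_at_target + krige_residual(coords, residuals, target, **vario)
```

With zero nugget, ordinary kriging is an exact interpolator, so predicting at a sample site returns that site's residual. At the centre of a square of four samples, the weights are equal by symmetry, so the correction is simply the mean residual.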

2020 ◽  
Author(s):  
Leo T. Pham ◽  
Lifeng Luo ◽  
Andrew O. Finley

Abstract. In the past decades, data-driven machine learning (ML) models have emerged as promising tools for short-term streamflow forecasts. Among other qualities, the popularity of ML for such applications is due to the methods' competitive performance compared with alternative approaches, ease of application, and relative lack of strict distributional assumptions. Despite the encouraging results, most applications of ML for streamflow forecasting have been limited to watersheds where rainfall is the major source of runoff. In this study, we evaluate the potential of Random Forest (RF), a popular ML method, to make streamflow forecasts at a 1-day lead time at 86 watersheds in the Pacific Northwest. These watersheds span a range of climatic conditions and physiographic settings and exhibit varied contributions of rainfall and snowmelt to their streamflow. Based on the timing of the center of annual flow volume, the watersheds are classified into three hydrologic regimes: rainfall-dominated, transient, and snowmelt-dominated. RF performance is benchmarked against naive and multiple linear regression (MLR) models and evaluated using four metrics: the coefficient of determination, root mean squared error, mean absolute error, and Kling–Gupta efficiency (KGE). The evaluation metrics suggest that RF performs better in snowmelt-driven watersheds. The largest improvements over the benchmark models are found in rainfall-driven watersheds. We obtain KGE scores in the range of 0.62–0.99. RF performance deteriorates with increasing catchment slope and soil sandiness. We note disagreement between two popular measures of RF variable importance and recommend considering these measures jointly with the physical processes under study. These and other results provide new insights for the effective application of RF-based streamflow forecasting.
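The Kling–Gupta efficiency used above combines correlation, variability and bias into one score. The sketch below assumes the original Gupta et al. (2009) form, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), with α = σ_sim/σ_obs and β = μ_sim/μ_obs. The abstract does not say which KGE variant was used. Reading the "naive" benchmark as a persistence forecast (tomorrow's flow equals today's) is also an assumption, and the helper names are hypothetical.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def kge(sim, obs):
    """Kling-Gupta efficiency; 1.0 means a perfect match."""
    r = pearson_r(sim, obs)                                  # correlation term
    alpha = statistics.pstdev(sim) / statistics.pstdev(obs)  # variability ratio
    beta = statistics.fmean(sim) / statistics.fmean(obs)     # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def persistence_forecast(flows):
    """Hypothetical naive 1-day-ahead benchmark: the forecast for day t+1 is the flow on day t.
    Returns (forecasts, matching observations)."""
    return flows[:-1], flows[1:]

# Usage with made-up daily flows (illustrative values only):
sim, obs = persistence_forecast([12.0, 15.5, 14.2, 18.9, 22.1, 19.4, 16.0])
print(round(kge(sim, obs), 3))
```

Keeping the three components separate makes it easy to see whether a low score comes from timing (r), spread (α) or volume bias (β).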


2020 ◽  
Vol 11 (1) ◽  
pp. 44
Author(s):  
Rahmat Robi Waliyansyah ◽  
Nugroho Dwi Saputro

Higher education institutions regularly hold new-student admissions, and the number of new students can rise or fall from year to year. At the University of PGRI Semarang (UPGRIS), new-student admissions from the 2014/2015 to the 2018/2019 academic year involved many selection stages. To meet the minimum required ratio between the number of students and the available human resources, facilities, and infrastructure, the university needs to predict how much the number of students will increase each year. Forecasting the number of prospective new students who register requires a suitable forecasting method and sufficiently precise calculations. This study uses the Random Forest method, and the forecasting models are evaluated with random sampling and cross-validation. The evaluation metrics are mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the coefficient of determination (R2). The study identifies the five study programs with the highest and the five with the lowest new-student admissions, so that UPGRIS can develop a new strategy for the five lowest programs to reach the desired number of new students.
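The evaluation scheme described above (random sampling, cross-validation, and MAE/MSE/RMSE/R2) can be sketched in plain Python. This is a minimal sketch with assumptions. Records are shuffled once and dealt into k folds, and out-of-fold predictions are pooled before scoring. `fit_line` is a hypothetical least-squares stand-in used only so the example runs without third-party libraries. It is not a random forest; in practice a random forest regressor would be passed as the `fit` argument.

```python
import math
import random

def k_fold_indices(n, k, seed=0):
    """Randomly shuffle record indices, then deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R2 for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}

def cross_validate(xs, ys, fit, k=5, seed=0):
    """Train on k-1 folds, predict the held-out fold, and score the pooled predictions."""
    preds = [0.0] * len(ys)
    for fold in k_fold_indices(len(ys), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(ys)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])
    return regression_metrics(ys, preds)

def fit_line(xs, ys):
    """Hypothetical stand-in model (least squares on one feature), not a random forest."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)
```

Pooling out-of-fold predictions gives one score per metric over all records. Averaging per-fold scores is a common alternative and gives slightly different numbers for R2.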


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e2849 ◽  
Author(s):  
Chunrong Mi ◽  
Falk Huettmann ◽  
Yumin Guo ◽  
Xuesong Han ◽  
Lijia Wen

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies. We employed four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of these four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and is known to perform extremely well in ecological predictions. However, although its use is increasing, its potential is still widely underused in conservation, (spatial) ecological applications and inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled.
This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation.
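Two parts of the workflow above are easy to make concrete: the ensemble that averages the four models' predicted probabilities, and the two accuracy metrics. This is a minimal sketch assuming presence/absence labels at the evaluation cells. It converts probabilities into presence at a hypothetical 0.5 threshold for TSS, since the abstract does not state the threshold. AUC is computed in its rank form: the probability that a randomly chosen presence scores higher than a randomly chosen absence, with ties counted as half.

```python
def ensemble_mean(prob_lists):
    """Average several models' presence probabilities cell by cell."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]

def tss(observed, probs, threshold=0.5):
    """True skill statistic = sensitivity + specificity - 1 at a chosen threshold."""
    tp = fn = fp = tn = 0
    for present, p in zip(observed, probs):
        predicted = p >= threshold
        if present and predicted:
            tp += 1
        elif present:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn) + tn / (tn + fp) - 1.0

def auc(observed, probs):
    """Rank-based AUC: share of presence/absence pairs where the presence scores higher."""
    pos = [p for present, p in zip(observed, probs) if present]
    neg = [p for present, p in zip(observed, probs) if not present]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Usage with hypothetical probabilities from four models at five evaluation cells:
treenet = [0.9, 0.7, 0.2, 0.4, 0.1]
rf = [0.8, 0.6, 0.3, 0.5, 0.2]
cart = [1.0, 0.5, 0.0, 0.5, 0.0]
maxent = [0.7, 0.8, 0.4, 0.3, 0.3]
observed = [1, 1, 0, 0, 0]
ens = ensemble_mean([treenet, rf, cart, maxent])
print(tss(observed, ens), auc(observed, ens))
```

AUC is threshold-free, while TSS depends on the chosen cut-off. This is one reason studies often report both.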


2021 ◽  
Vol 5 (10) ◽  
pp. 271
Author(s):  
Priyanka Gupta ◽  
Nakul Gupta ◽  
Kuldeep K. Saxena ◽  
Sudhir Goyal

Geopolymer is an eco-friendly material used in civil engineering works. For geopolymer concrete (GPC) preparation, waste fly ash (FA) and calcined clay (CC) were used together at proportions of 5%, 10%, and 15%. No systematic methodology has yet been developed for geopolymer mix design. In this study, random forest regression (RFR) was used to forecast compressive strength and split tensile strength. The input variables were caustic soda (NaOH) at 12 M, 14 M, and 16 M; sodium silicate; coarse aggregate passing the 20 mm and 10 mm sieves; crushed stone dust; superplasticizer; curing temperature; curing time; added water; and retention time. The standard age of 28 days was used, and a total of 35 samples with a target compressive strength of 30 MPa were prepared. In all, 20% of the data were used for training and 80% for testing. Model performance was assessed using mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2), and mean squared error (MSE). The results show that the RFR model can predict GPC compressive strength (MAE = 1.85 MPa, MSE = 6.83 MPa², RMSE = 2.61 MPa, and R2 = 0.93) and split tensile strength (MAE = 0.20 MPa, MSE = 0.05 MPa², RMSE = 0.24 MPa, and R2 = 0.90) during training.
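MSE is the square of RMSE, so its unit is MPa² when RMSE is in MPa, and the two reported values must agree up to rounding. The helper below has a hypothetical name. It gives the MSE range compatible with an RMSE rounded to two decimals. An RMSE of 2.61 MPa implies roughly 6.79 to 6.84 MPa², and 0.24 MPa implies roughly 0.055 to 0.060 MPa². This only checks the stated figures for consistency; it does not recompute anything from the study's data.

```python
def implied_mse_range(rmse, decimals=2):
    """MSE interval (in squared units) compatible with an RMSE rounded to `decimals` places."""
    half_step = 0.5 * 10 ** -decimals
    return (rmse - half_step) ** 2, (rmse + half_step) ** 2

# Usage: compare against the MSE reported for each target property.
for name, rmse in [("compressive", 2.61), ("split tensile", 0.24)]:
    lo, hi = implied_mse_range(rmse)
    print(f"{name}: MSE between {lo:.3f} and {hi:.3f} MPa^2")
```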


2016 ◽  
Author(s):  
Chunrong Mi ◽  
Falk Huettmann ◽  
Yumin Guo ◽  
Xuesong Han ◽  
Lijia Wen

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution, and more recently, conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies. We employed four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of these four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and by now is known to perform extremely well in ecological predictions. However, although its use is increasing, its potential is still widely underused in conservation, (spatial) ecological applications and inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled.
This method helps to save model-selection time and effort, and it allows robust and rapid assessments and decisions for efficient conservation.

